Computer Security – ESORICS 2020: 25th European Symposium on Research in Computer Security, ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I [1st ed.] 9783030589509, 9783030589516

The two volume set, LNCS 12308 + 12309, constitutes the proceedings of the 25th European Symposium on Research in Comput

319 71 48MB

English Pages XXVIII, 760 [774] Year 2020

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Front Matter ....Pages i-xxviii
Front Matter ....Pages 1-1
Pine: Enabling Privacy-Preserving Deep Packet Inspection on TLS with Rule-Hiding and Fast Connection Establishment (Jianting Ning, Xinyi Huang, Geong Sen Poh, Shengmin Xu, Jia-Chng Loh, Jian Weng et al.)....Pages 3-22
Bulwark: Holistic and Verified Security Monitoring of Web Protocols (Lorenzo Veronese, Stefano Calzavara, Luca Compagna)....Pages 23-41
A Practical Model for Collaborative Databases: Securely Mixing, Searching and Computing (Shweta Agrawal, Rachit Garg, Nishant Kumar, Manoj Prabhakaran)....Pages 42-63
Front Matter ....Pages 65-65
Deduplication-Friendly Watermarking for Multimedia Data in Public Clouds (Weijing You, Bo Chen, Limin Liu, Jiwu Jing)....Pages 67-87
DANTE: A Framework for Mining and Monitoring Darknet Traffic (Dvir Cohen, Yisroel Mirsky, Manuel Kamp, Tobias Martin, Yuval Elovici, Rami Puzis et al.)....Pages 88-109
Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation (Anisa Halimi, Erman Ayday)....Pages 110-130
Front Matter ....Pages 131-131
Anonymity Preserving Byzantine Vector Consensus (Christian Cachin, Daniel Collins, Tyler Crain, Vincent Gramoli)....Pages 133-152
CANSentry: Securing CAN-Based Cyber-Physical Systems against Denial and Spoofing Attacks (Abdulmalik Humayed, Fengjun Li, Jingqiang Lin, Bo Luo)....Pages 153-173
Distributed Detection of APTs: Consensus vs. Clustering (Juan E. Rubio, Cristina Alcaraz, Ruben Rios, Rodrigo Roman, Javier Lopez)....Pages 174-192
Designing Reverse Firewalls for the Real World (Angèle Bossuat, Xavier Bultel, Pierre-Alain Fouque, Cristina Onete, Thyla van der Merwe)....Pages 193-213
Front Matter ....Pages 215-215
Follow the Blue Bird: A Study on Threat Data Published on Twitter (Fernando Alves, Ambrose Andongabo, Ilir Gashi, Pedro M. Ferreira, Alysson Bessani)....Pages 217-236
Dynamic and Secure Memory Transformation in Userspace (Robert Lyerly, Xiaoguang Wang, Binoy Ravindran)....Pages 237-256
Understanding the Security Risks of Docker Hub (Peiyu Liu, Shouling Ji, Lirong Fu, Kangjie Lu, Xuhong Zhang, Wei-Han Lee et al.)....Pages 257-276
DE-auth of the Blue! Transparent De-authentication Using Bluetooth Low Energy Beacon (Mauro Conti, Pier Paolo Tricomi, Gene Tsudik)....Pages 277-294
Similarity of Binaries Across Optimization Levels and Obfuscation (Jianguo Jiang, Gengwang Li, Min Yu, Gang Li, Chao Liu, Zhiqiang Lv et al.)....Pages 295-315
HART: Hardware-Assisted Kernel Module Tracing on Arm (Yunlan Du, Zhenyu Ning, Jun Xu, Zhilong Wang, Yueh-Hsun Lin, Fengwei Zhang et al.)....Pages 316-337
Zipper Stack: Shadow Stacks Without Shadow (Jinfeng Li, Liwei Chen, Qizhen Xu, Linan Tian, Gang Shi, Kai Chen et al.)....Pages 338-358
Restructured Cloning Vulnerability Detection Based on Function Semantic Reserving and Reiteration Screening (Weipeng Jiang, Bin Wu, Xingxin Yu, Rui Xue, Zhengmin Yu)....Pages 359-376
LegIoT: Ledgered Trust Management Platform for IoT (Jens Neureither, Alexandra Dmitrienko, David Koisser, Ferdinand Brasser, Ahmad-Reza Sadeghi)....Pages 377-396
Front Matter ....Pages 397-397
PrivColl: Practical Privacy-Preserving Collaborative Machine Learning (Yanjun Zhang, Guangdong Bai, Xue Li, Caitlin Curtis, Chen Chen, Ryan K. L. Ko)....Pages 399-418
An Efficient 3-Party Framework for Privacy-Preserving Neural Network Inference (Liyan Shen, Xiaojun Chen, Jinqiao Shi, Ye Dong, Binxing Fang)....Pages 419-439
Deep Learning Side-Channel Analysis on Large-Scale Traces (Loïc Masure, Nicolas Belleville, Eleonora Cagli, Marie-Angela Cornélie, Damien Couroussé, Cécile Dumas et al.)....Pages 440-460
Towards Poisoning the Neural Collaborative Filtering-Based Recommender Systems (Yihe Zhang, Jiadong Lou, Li Chen, Xu Yuan, Jin Li, Tom Johnsten et al.)....Pages 461-479
Data Poisoning Attacks Against Federated Learning Systems (Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, Ling Liu)....Pages 480-501
Interpretable Probabilistic Password Strength Meters via Deep Learning (Dario Pasquini, Giuseppe Ateniese, Massimo Bernaschi)....Pages 502-522
Polisma - A Framework for Learning Attribute-Based Access Control Policies (Amani Abu Jabal, Elisa Bertino, Jorge Lobo, Mark Law, Alessandra Russo, Seraphin Calo et al.)....Pages 523-544
A Framework for Evaluating Client Privacy Leakages in Federated Learning (Wenqi Wei, Ling Liu, Margaret Loper, Ka-Ho Chow, Mehmet Emre Gursoy, Stacey Truex et al.)....Pages 545-566
Front Matter ....Pages 567-567
An Accountable Access Control Scheme for Hierarchical Content in Named Data Networks with Revocation (Nazatul Haque Sultan, Vijay Varadharajan, Seyit Camtepe, Surya Nepal)....Pages 569-590
PGC: Decentralized Confidential Payment System with Auditability (Yu Chen, Xuecheng Ma, Cong Tang, Man Ho Au)....Pages 591-610
Secure Cloud Auditing with Efficient Ownership Transfer (Jun Shen, Fuchun Guo, Xiaofeng Chen, Willy Susilo)....Pages 611-631
Front Matter ....Pages 633-633
Encrypt-to-Self: Securely Outsourcing Storage (Jeroen Pijnenburg, Bertram Poettering)....Pages 635-654
PGLP: Customizable and Rigorous Location Privacy Through Policy Graph (Yang Cao, Yonghui Xiao, Shun Takagi, Li Xiong, Masatoshi Yoshikawa, Yilin Shen et al.)....Pages 655-676
Where Are You Bob? Privacy-Preserving Proximity Testing with a Napping Party (Ivan Oleynikov, Elena Pagnin, Andrei Sabelfeld)....Pages 677-697
Front Matter ....Pages 699-699
Distributed PCFG Password Cracking (Radek Hranický, Lukáš Zobal, Ondřej Ryšavý, Dušan Kolář, Dávid Mikuš)....Pages 701-719
Your PIN Sounds Good! Augmentation of PIN Guessing Strategies via Audio Leakage (Matteo Cardaioli, Mauro Conti, Kiran Balagani, Paolo Gasti)....Pages 720-735
GDPR – Challenges for Reconciling Legal Rules with Technical Reality (Mirosław Kutyłowski, Anna Lauks-Dutka, Moti Yung)....Pages 736-755
Back Matter ....Pages 757-760
Recommend Papers

Computer Security – ESORICS 2020: 25th European Symposium on Research in Computer Security, ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I [1st ed.]
 9783030589509, 9783030589516

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

LNCS 12308

Liqun Chen Ninghui Li Kaitai Liang Steve Schneider (Eds.)

Computer Security – ESORICS 2020 25th European Symposium on Research in Computer Security, ESORICS 2020 Guildford, UK, September 14–18, 2020, Proceedings, Part I

Lecture Notes in Computer Science Founding Editors Gerhard Goos Karlsruhe Institute of Technology, Karlsruhe, Germany Juris Hartmanis Cornell University, Ithaca, NY, USA

Editorial Board Members Elisa Bertino Purdue University, West Lafayette, IN, USA Wen Gao Peking University, Beijing, China Bernhard Steffen TU Dortmund University, Dortmund, Germany Gerhard Woeginger RWTH Aachen, Aachen, Germany Moti Yung Columbia University, New York, NY, USA

12308

More information about this series at http://www.springer.com/series/7410

Liqun Chen Ninghui Li Kaitai Liang Steve Schneider (Eds.) •





Computer Security – ESORICS 2020 25th European Symposium on Research in Computer Security, ESORICS 2020 Guildford, UK, September 14–18, 2020 Proceedings, Part I

123

Editors Liqun Chen University of Surrey Guildford, UK

Ninghui Li Purdue University West Lafayette, IN, USA

Kaitai Liang Delft University of Technology Delft, The Netherlands

Steve Schneider University of Surrey Guildford, UK

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-58950-9 ISBN 978-3-030-58951-6 (eBook) https://doi.org/10.1007/978-3-030-58951-6 LNCS Sublibrary: SL4 – Security and Cryptology © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The two volume set, LNCS 12308 and 12309, contain the papers that were selected for presentation and publication at the 25th European Symposium on Research in Computer Security (ESORICS 2020) which was held together with affiliated workshops during the week September 14–18, 2020. Due to the global COVID-19 pandemic, the conference and workshops ran virtually, hosted by the University of Surrey, UK. The aim of ESORICS is to further research in computer security and privacy by establishing a European forum, bringing together researchers in these areas by promoting the exchange of ideas with system developers and by encouraging links with researchers in related fields. In response to the call for papers, 366 papers were submitted to the conference. These papers were evaluated on the basis of their significance, novelty, and technical quality. Except for a very small number of papers, each paper was carefully evaluated by three to five referees and then discussed among the Program Committee. The papers were reviewed in a single-blind manner. Finally, 72 papers were selected for presentation at the conference, yielding an acceptance rate of 19.7%. We were also delighted to welcome invited talks from Aggelos Kiayias, Vadim Lyubashevsky, and Rebecca Wright. Following the reviews two papers were selected for Best Paper Awards and they share the 1,000 EUR prize generously provided by Springer: “Pine: Enabling privacy-preserving deep packet inspection on TLS with rule-hiding and fast connection establishment” by Jianting Ning, Xinyi Huang, Geong Sen Poh, Shengmin Xu, Jason Loh, Jian Weng, and Robert H. Deng; and “Automatic generation of source lemmas in Tamarin: towards automatic proofs of security protocols” by Véronique Cortier, Stéphanie Delaune, and Jannik Dreier. The Program Committee consisted of 127 members across 25 countries. There were submissions from a total of 1,201 authors across 42 countries, with 24 countries represented among the accepted papers. ESORICS 2020 would not have been possible without the contributions of the many volunteers who freely gave their time and expertise. We would like to thank the members of the Program Committee and the external reviewers for their substantial work in evaluating the papers. We would also like to thank the organization/department chair, Helen Treharne, the workshop chair, Mark Manulis, and all of the workshop co-chairs, the poster chair, Ioana Boureanu, and the ESORICS Steering Committee. We are also grateful to Huawei and IBM Research – Haifa, Israel for their sponsorship that enabled us to support this online event. Finally, we would like to express our thanks to the authors who submitted papers to ESORICS 2020. They, more than anyone else, are what made this conference possible.

vi

Preface

We hope that you will find the proceedings stimulating and a source of inspiration for future research. September 2020

Liqun Chen Ninghui Li Kaitai Liang Steve Schneider

Organization

General Chair Steve Schneider

University of Surrey, UK

Program Chairs Liqun Chen Ninghui Li

University of Surrey, UK Purdue University, USA

Steering Committee Sokratis Katsikas (Chair) Michael Backes Joachim Biskup Frederic Cuppens Sabrina De Capitani di Vimercati Dieter Gollmann Mirek Kutylowski Javier Lopez Jean-Jacques Quisquater Peter Y. A. Ryan Pierangela Samarati Einar Snekkenes Michael Waidner

Program Committee Yousra Aafer Mitsuaki Akiyama Cristina Alcaraz Frederik Armknecht Vijay Atluri Erman Ayday Antonio Bianchi Marina Blanton Carlo Blundo Alvaro Cardenas Berkay Celik Aldar C-F. Chan Sze Yiu Chau

University of Waterloo, Canada NTT, Japan UMA, Spain Universität Mannheim, Germany Rutgers University, USA Bilkent University, Turkey Purdue University, USA University at Buffalo, USA Università degli Studi di Salerno, Italy The University of Texas at Dallas, USA Purdue University, USA BIS Innovation Hub Centre, Hong Kong, China Purdue University, USA

viii

Organization

Rongmao Chen Yu Chen Sherman S. M. Chow Mauro Conti Frédéric Cuppens Nora Cuppens-Boulahia Marc Dacier Sabrina De Capitani di Vimercati Hervé Debar Stéphanie Delaune Roberto Di Pietro Tassos Dimitriou Josep Domingo-Ferrer Changyu Dong Wenliang Du Haixin Duan François Dupressoir Kassem Fawaz Jose-Luis Ferrer-Gomila Sara Foresti David Galindo Debin Gao Joaquin Garcia-Alfaro Thanassis Giannetsos Dieter Gollmann Stefanos Gritzalis Guofei Gu Zhongshu Gu Jinguang Han Feng Hao Juan Hernández-Serrano Xinyi Huang Syed Hussain Shouling Ji Ghassan Karame Sokratis Katsikas Stefan Katzenbeisser Ryan Ko Steve Kremer Marina Krotofil Yonghwi Kwon Costas Lambrinoudakis Kyu Hyung Lee

National University of Defense Technology, China Shandong University, China The Chinese University of Hong Kong, Hong Kong, China University of Padua, Italy Polytechnique Montreal, Canada Polytechnique Montréal, Canada Qatar Computing Research Institute (QCRI), Qatar Università degli Studi di Milano, Italy Télécom SudParis, France University of Rennes, CNRS, IRISA, France Hamad Bin Khalifa University, Qatar Kuwait University, Kuwait Universitat Rovira i Virgili, Spain Newcastle University, UK Syracuse University, Italy Tsinghua University, China University of Bristol, UK University of Wisconsin-Madison, USA University of the Balearic Islands, Spain DI, Università degli Studi di Milano, Italy University of Birmingham, UK Singapore Management University, Singapore Télécom SudParis, France Technical University of Denmark, Denmark Hamburg University of Technology, Germany University of the Aegean, Greece Texas A&M University, USA IBM Research, USA Queen’s University Belfast, UK University of Warwick, UK Universitat Politècnica de Catalunya, Spain Fujian Normal University, China Purdue University, USA Zhejiang University, China NEC Laboratories Europe, Germany Norwegian University of Science and Technology, Norway TU Darmstadt, Germany The University of Queensland, Australia Inria, France FireEye, USA University of Virginia, USA University of Piraeus, Greece University of Georgia, USA

Organization

Shujun Li Yingjiu Li Kaitai Liang Hoon Wei Lim Joseph Liu Rongxing Lu Xiapu Luo Shiqing Ma Leandros Maglaras Mark Manulis Konstantinos Markantonakis Fabio Martinelli Ivan Martinovic Sjouke Mauw Catherine Meadows Weizhi Meng Chris Mitchell Tatsuya Mori Haralambos Mouratidis David Naccache Siaw-Lynn Ng Jianting Ning Satoshi Obana Martín Ochoa Rolf Oppliger Manos Panousis Olivier Pereira Günther Pernul Joachim Posegga Indrajit Ray Kui Ren Giovanni Russello Mark Ryan Reihaneh Safavi-Naini Brendan Saltaformaggio Pierangela Samarati Damien Sauveron Einar Snekkenes Yixin Sun Willy Susilo

ix

University of Kent, UK Singapore Management University, Singapore Delft University of Technology, The Netherlands Trustwave, Singapore Monash University, Australia University of New Brunswick, Canada The Hong Kong Polytechnic University, Hong Kong, China Rutgers University, USA De Montfort University, UK University of Surrey, UK Royal Holloway, University of London, UK IIT-CNR, Italy University of Oxford, UK University of Luxembourg, Luxembourg NRL, USA Technical University of Denmark, Denmark Royal Holloway, University of London, UK Waseda University, Japan University of Brighton, UK Ecole normale supérieur, France Royal Holloway, University of London, UK Singapore Management University, Singapore Hosei University, Japan Universidad del Rosario, Colombia eSECURITY Technologies, Switzerland University of Greenwich, UK UCLouvain, Belgium Universität Regensburg, Germany University of Passau, Germany Colorado State University, USA Zhejiang University, China The University of Auckland, New Zealand University of Birmingham, UK University of Calgary, Canada Georgia Institute of Technology, USA Università degli Studi di Milano, Italy XLIM, UMR University of Limoges, CNRS 7252, France Norwegian University of Science and Technology, Norway University of Virginia, USA University of Wollongong, Australia

x

Organization

Pawel Szalachowski Qiang Tang Qiang Tang Juan Tapiador Dave Jing Tian Nils Ole Tippenhauer Helen Treharne Aggeliki Tsohou Luca Viganò Michael Waidner Cong Wang Lingyu Wang Weihang Wang Edgar Weippl Christos Xenakis Yang Xiang Guomin Yang Kang Yang Xun Yi Yu Yu Tsz Hon Yuen Fengwei Zhang Kehuan Zhang Yang Zhang Yuan Zhang Zhenfeng Zhang Yunlei Zhao Jianying Zhou Sencun Zhu

SUTD, Singapore Luxembourg Institute of Science and Technology, Luxembourg New Jersey Institute of Technology, USA Universidad Carlos III de Madrid, Spain Purdue University, USA CISPA, Germany University of Surrey, UK Ionian University, Greece King’s College London, UK Fraunhofer, Germany City University of Hong Kong, Hong Kong, China Concordia University, Canada SUNY University at Buffalo, USA SBA Research, Austria University of Piraeus, Greece Swinburne University of Technology, Australia University of Wollongong, Australia State Key Laboratory of Cryptology, China RMIT University, Australia Shanghai Jiao Tong University, China The University of Hong Kong, Hong Kong, China SUSTech, China The Chinese University of Hong Kong, Hong Kong, China CISPA Helmholtz Center for Information Security, Germany Fudan University, China Chinese Academy of Sciences, China Fudan University, China Singapore University of Technology and Design, Singapore Penn State University, USA

Workshop Chair Mark Manulis

University of Surrey, UK

Poster Chair Ioana Boureanu

University of Surrey, UK

Organization/Department Chair Helen Treharne

University of Surrey, UK

Organization

Organizing Chair and Publicity Chair Kaitai Liang

Delft University of Technology, The Netherlands

Additional Reviewers Abbasi, Ali Abu-Salma, Ruba Ahlawat, Amit Ahmed, Chuadhry Mujeeb Ahmed, Shimaa Alabdulatif, Abdulatif Alhanahnah, Mohannad Aliyu, Aliyu Alrizah, Mshabab Anceaume, Emmanuelle Angelogianni, Anna Anglés-Tafalla, Carles Aparicio Navarro, Francisco Javier Argyriou, Antonios Asadujjaman, A. S. M. Aschermann, Cornelius Asghar, Muhammad Rizwan Avizheh, Sepideh Baccarini, Alessandro Bacis, Enrico Baek, Joonsang Bai, Weihao Bamiloshin, Michael Barenghi, Alessandro Barrère, Martín Berger, Christian Bhattacherjee, Sanjay Blanco-Justicia, Alberto Blazy, Olivier Bolgouras, Vaios Bountakas, Panagiotis Brandt, Markus Bursuc, Sergiu Böhm, Fabian Camacho, Philippe Cardaioli, Matteo Castelblanco, Alejandra Castellanos, John Henry Cecconello, Stefano

Chaidos, Pyrros Chakra, Ranim Chandrasekaran, Varun Chen, Haixia Chen, Long Chen, Min Chen, Zhao Chen, Zhigang Chengjun Lin Ciampi, Michele Cicala, Fabrizio Costantino, Gianpiero Cruz, Tiago Cui, Shujie Deng, Yi Diamantopoulou, Vasiliki Dietz, Marietheres Divakaran, Dinil Mon Dong, Naipeng Dong, Shuaike Dragan, Constantin Catalin Du, Minxin Dutta, Sabyasachi Eichhammer, Philipp Englbrecht, Ludwig Etigowni, Sriharsha Farao, Aristeidis Faruq, Fatma Fdhila, Walid Feng, Hanwen Feng, Qi Fentham, Daniel Ferreira Torres, Christof Fila, Barbara Fraser, Ashley Fu, Hao Galdi, Clemente Gangwal, Ankit Gao, Wei

xi

xii

Organization

Gardham, Daniel Garms, Lydia Ge, Chunpeng Ge, Huangyi Geneiatakis, Dimitris Genés-Durán, Rafael Georgiopoulou, Zafeiroula Getahun Chekole, Eyasu Ghosal, Amrita Giamouridis, George Giorgi, Giacomo Guan, Qingxiao Guo, Hui Guo, Kaiwen Guo, Yimin Gusenbauer, Mathias Haffar, Rami Hahn, Florian Han, Yufei Hausmann, Christian He, Shuangyu He, Songlin He, Ying Heftrig, Elias Hirschi, Lucca Hu, Kexin Huang, Qiong Hurley-Smith, Darren Iadarola, Giacomo Jeitner, Philipp Jia, Dingding Jia, Yaoqi Judmayer, Aljosha Kalloniatis, Christos Kantzavelou, Ioanna Kasinathan, Prabhakaran Kasra Kermanshahi, Shabnam Kasra, Shabnam Kelarev, Andrei Khandpur Singh, Ashneet Kim, Jongkil Koay, Abigail Kokolakis, Spyros Kosmanos, Dimitrios Kourai, Kenichi Koutroumpouchos, Konstantinos

Koutroumpouchos, Nikolaos Koutsos, Adrien Kuchta, Veronika Labani, Hasan Lai, Jianchang Laing, Thalia May Lakshmanan, Sudershan Lallemand, Joseph Lan, Xiao Lavranou, Rena Lee, Jehyun León, Olga Li, Jie Li, Juanru Li, Shuaigang Li, Wenjuan Li, Xinyu Li, Yannan Li, Zengpeng Li, Zheng Li, Ziyi Limniotis, Konstantinos Lin, Chao Lin, Yan Liu, Jia Liu, Jian Liu, Weiran Liu, Xiaoning Liu, Xueqiao Liu, Zhen Lopez, Christian Losiouk, Eleonora Lu, Yuan Luo, Junwei Ma, Haoyu Ma, Hui Ma, Jack P. K. Ma, Jinhua Ma, Mimi Ma, Xuecheng Mai, Alexandra Majumdar, Suryadipta Manjón, Jesús A. Marson, Giorgia Azzurra Martinez, Sergio Matousek, Petr

Organization

Mercaldo, Francesco Michailidou, Christina Mitropoulos, Dimitris Mohammadi, Farnaz Mohammady, Meisam Mohammed, Ameer Moreira, Jose Muñoz, Jose L. Mykoniati, Maria Nassirzadeh, Behkish Newton, Christopher Ng, Lucien K. L. Ntantogian, Christoforos Önen, Melek Onete, Cristina Oqaily, Alaa Oswald, David Papaioannou, Thanos Parkinson, Simon Paspatis, Ioannis Patsakis, Constantinos Pelosi, Gerardo Pfeffer, Katharina Pitropakis, Nikolaos Poettering, Bertram Poh, Geong Sen Polato, Mirko Poostindouz, Alireza Puchta, Alexander Putz, Benedikt Pöhls, Henrich C. Qiu, Tian Radomirovic, Sasa Rakotonirina, Itsaka Rebollo Monedero, David Rivera, Esteban Rizomiliotis, Panagiotis Román-García, Fernando Sachidananda, Vinay Salazar, Luis Salem, Ahmed Salman, Ammar Sanders, Olivier Scarsbrook, Joshua Schindler, Philipp Schlette, Daniel

Schmidt, Carsten Scotti, Fabio Shahandashti, Siamak Shahraki, Ahmad Salehi Sharifian, Setareh Sharma, Vishal Sheikhalishahi, Mina Shen, Siyu Shrishak, Kris Simo, Hervais Siniscalchi, Luisa Slamanig, Daniel Smith, Zach Solano, Jesús Song, Yongcheng Song, Zirui Soriente, Claudio Soumelidou, Katerina Spielvogel, Korbinian Stifter, Nicholas Sun, Menghan Sun, Yiwei Sun, Yuanyi Tabiban, Azadeh Tang, Di Tang, Guofeng Taubmann, Benjamin Tengana, Lizzy Tian, Yangguang Trujillo, Rolando Turrin, Federico Veroni, Eleni Vielberth, Manfred Vollmer, Marcel Wang, Jiafan Wang, Qin Wang, Tianhao Wang, Wei Wang, Wenhao Wang, Yangde Wang, Yi Wang, Yuling Wang, Ziyuan Weitkämper, Charlotte Wesemeyer, Stephan Whitefield, Jorden

xiii

xiv

Organization

Wiyaja, Dimaz Wong, Donald P. H. Wong, Harry W. H. Wong, Jin-Mann Wu, Chen Wu, Ge Wu, Lei Wuest, Karl Xie, Guoyang Xinlei, He Xu, Fenghao Xu, Jia Xu, Jiayun Xu, Ke Xu, Shengmin Xu, Yanhong Xue, Minhui Yamada, Shota Yang, Bohan Yang, Lin Yang, Rupeng Yang, S. J. Yang, Wenjie Yang, Xu

Yang, Xuechao Yang, Zhichao Yevseyeva, Iryna Yi, Ping Yin, Lingyuan Ying, Jason Yu, Zuoxia Yuan, Lun-Pin Yuan, Xingliang Zhang, Bingsheng Zhang, Fan Zhang, Ke Zhang, Mengyuan Zhang, Yanjun Zhang, Zhikun Zhang, Zongyang Zhao, Yongjun Zhong, Zhiqiang Zhou, Yutong Zhu, Fei Ziaur, Rahman Zobernig, Lukas Zuo, Cong

Keynotes

Decentralising Information and Communications Technology: Paradigm Shift or Cypherpunk Reverie?

Aggelos Kiayias University of Edinburgh and IOHK, UK

Abstract. In the last decade, decentralisation emerged as a much anticipated development in the greater space of information and communications technology. Venerated by some and disparaged by others, blockchain technology became a familiar term, springing up in a wide array of expected and some times unexpected contexts. With the peak of the hype behind us, in this talk I look back, distilling what have we learned about the science and engineering of building secure and reliable systems, then I overview the present state of the art and finally I delve into the future, appraising this technology in its potential to impact the way we design and deploy information and communications technology services.

Lattices and Zero-Knowledge

Vadim Lyubashevsky IBM Research - Zurich, Switzerland

Abstract. Building cryptography based on the presumed hardness of lattice problems over polynomial rings is one of the most promising approaches for achieving security against quantum attackers. One of the reasons for the popularity of lattice-based encryption and signatures in the ongoing NIST standardization process is that they are significantly faster than all other post-quantum, and even many classical, schemes. This talk will discuss the progress in constructions of more advanced lattice-based cryptographic primitives. In particular, I will describe recent work on zero-knowledge proofs which leads to the most efficient post-quantum constructions for certain statements.

Accountability in Computing

Rebecca N. Wright Barnard College, New York, USA

Abstract. Accountability is used often in describing computer-security mechanisms that complement preventive security, but it lacks a precise, agreed-upon definition. We argue for the need for accountability in computing in a variety of settings, and categorize some of the many ways in which this term is used. We identify a temporal spectrum onto which we may place different notions of accountability to facilitate their comparison, including prevention, detection, evidence, judgment, and punishment. We formalize our view in a utility-theoretic way and then use this to reason about accountability in computing systems. We also survey mechanisms providing various senses of accountability as well as other approaches to reasoning about accountability-related properties. This is joint work with Joan Feigenbaum and Aaron Jaggard.

Contents – Part I

Database and Web Security Pine: Enabling Privacy-Preserving Deep Packet Inspection on TLS with Rule-Hiding and Fast Connection Establishment . . . . . . . . . . . . . . . . . Jianting Ning, Xinyi Huang, Geong Sen Poh, Shengmin Xu, Jia-Chng Loh, Jian Weng, and Robert H. Deng Bulwark: Holistic and Verified Security Monitoring of Web Protocols . . . . . . Lorenzo Veronese, Stefano Calzavara, and Luca Compagna A Practical Model for Collaborative Databases: Securely Mixing, Searching and Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shweta Agrawal, Rachit Garg, Nishant Kumar, and Manoj Prabhakaran

3

23

42

System Security I Deduplication-Friendly Watermarking for Multimedia Data in Public Clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weijing You, Bo Chen, Limin Liu, and Jiwu Jing DANTE: A Framework for Mining and Monitoring Darknet Traffic . . . . . . . Dvir Cohen, Yisroel Mirsky, Manuel Kamp, Tobias Martin, Yuval Elovici, Rami Puzis, and Asaf Shabtai Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anisa Halimi and Erman Ayday

67 88

110

Network Security I Anonymity Preserving Byzantine Vector Consensus . . . . . . . . . . . . . . . . . . Christian Cachin, Daniel Collins, Tyler Crain, and Vincent Gramoli CANSentry: Securing CAN-Based Cyber-Physical Systems against Denial and Spoofing Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abdulmalik Humayed, Fengjun Li, Jingqiang Lin, and Bo Luo Distributed Detection of APTs: Consensus vs. Clustering . . . . . . . . . . . . . . . Juan E. Rubio, Cristina Alcaraz, Ruben Rios, Rodrigo Roman, and Javier Lopez

133

153 174

xxii

Contents – Part I

Designing Reverse Firewalls for the Real World . . . . . . . . . . . . . . . . . . . . . Angèle Bossuat, Xavier Bultel, Pierre-Alain Fouque, Cristina Onete, and Thyla van der Merwe

193

Software Security Follow the Blue Bird: A Study on Threat Data Published on Twitter. . . . . . . Fernando Alves, Ambrose Andongabo, Ilir Gashi, Pedro M. Ferreira, and Alysson Bessani

217

Dynamic and Secure Memory Transformation in Userspace . . . . . . . . . . . . . Robert Lyerly, Xiaoguang Wang, and Binoy Ravindran

237

Understanding the Security Risks of Docker Hub . . . . . . . . . . . . . . . . . . . . Peiyu Liu, Shouling Ji, Lirong Fu, Kangjie Lu, Xuhong Zhang, Wei-Han Lee, Tao Lu, Wenzhi Chen, and Raheem Beyah

257

DE-auth of the Blue! Transparent De-authentication Using Bluetooth Low Energy Beacon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mauro Conti, Pier Paolo Tricomi, and Gene Tsudik

277

Similarity of Binaries Across Optimization Levels and Obfuscation . . . . . . . . Jianguo Jiang, Gengwang Li, Min Yu, Gang Li, Chao Liu, Zhiqiang Lv, Bin Lv, and Weiqing Huang

295

HART: Hardware-Assisted Kernel Module Tracing on Arm . . . . . . . . . . . . . Yunlan Du, Zhenyu Ning, Jun Xu, Zhilong Wang, Yueh-Hsun Lin, Fengwei Zhang, Xinyu Xing, and Bing Mao

316

Zipper Stack: Shadow Stacks Without Shadow . . . . . . . . . . . . . . . . . . . . . . Jinfeng Li, Liwei Chen, Qizhen Xu, Linan Tian, Gang Shi, Kai Chen, and Dan Meng

338

Restructured Cloning Vulnerability Detection Based on Function Semantic Reserving and Reiteration Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weipeng Jiang, Bin Wu, Xingxin Yu, Rui Xue, and Zhengmin Yu LegIoT: Ledgered Trust Management Platform for IoT . . . . . . . . . . . . . . . . Jens Neureither, Alexandra Dmitrienko, David Koisser, Ferdinand Brasser, and Ahmad-Reza Sadeghi

359 377

Machine Learning Security PrivColl: Practical Privacy-Preserving Collaborative Machine Learning . . . . . Yanjun Zhang, Guangdong Bai, Xue Li, Caitlin Curtis, Chen Chen, and Ryan K. L. Ko

399

Contents – Part I

An Efficient 3-Party Framework for Privacy-Preserving Neural Network Inference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Liyan Shen, Xiaojun Chen, Jinqiao Shi, Ye Dong, and Binxing Fang Deep Learning Side-Channel Analysis on Large-Scale Traces . . . . . . . . . . . . Loïc Masure, Nicolas Belleville, Eleonora Cagli, Marie-Angela Cornélie, Damien Couroussé, Cécile Dumas, and Laurent Maingault Towards Poisoning the Neural Collaborative Filtering-Based Recommender Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yihe Zhang, Jiadong Lou, Li Chen, Xu Yuan, Jin Li, Tom Johnsten, and Nian-Feng Tzeng

xxiii

419 440

461

Data Poisoning Attacks Against Federated Learning Systems . . . . . . . . . . . . Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu

480

Interpretable Probabilistic Password Strength Meters via Deep Learning. . . . . Dario Pasquini, Giuseppe Ateniese, and Massimo Bernaschi

502

Polisma - A Framework for Learning Attribute-Based Access Control Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Amani Abu Jabal, Elisa Bertino, Jorge Lobo, Mark Law, Alessandra Russo, Seraphin Calo, and Dinesh Verma A Framework for Evaluating Client Privacy Leakages in Federated Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wenqi Wei, Ling Liu, Margaret Loper, Ka-Ho Chow, Mehmet Emre Gursoy, Stacey Truex, and Yanzhao Wu

523

545

Network Security II An Accountable Access Control Scheme for Hierarchical Content in Named Data Networks with Revocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nazatul Haque Sultan, Vijay Varadharajan, Seyit Camtepe, and Surya Nepal

569

PGC: Decentralized Confidential Payment System with Auditability . . . . . . . Yu Chen, Xuecheng Ma, Cong Tang, and Man Ho Au

591

Secure Cloud Auditing with Efficient Ownership Transfer . . . . . . . . . . . . . . Jun Shen, Fuchun Guo, Xiaofeng Chen, and Willy Susilo

611

Privacy Encrypt-to-Self: Securely Outsourcing Storage . . . . . . . . . . . . . . . . . . . . . . Jeroen Pijnenburg and Bertram Poettering

635

xxiv

Contents – Part I

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yang Cao, Yonghui Xiao, Shun Takagi, Li Xiong, Masatoshi Yoshikawa, Yilin Shen, Jinfei Liu, Hongxia Jin, and Xiaofeng Xu Where Are You Bob? Privacy-Preserving Proximity Testing with a Napping Party. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ivan Oleynikov, Elena Pagnin, and Andrei Sabelfeld

655

677

Password and Policy Distributed PCFG Password Cracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . Radek Hranický, Lukáš Zobal, Ondřej Ryšavý, Dušan Kolář, and Dávid Mikuš

701

Your PIN Sounds Good! Augmentation of PIN Guessing Strategies via Audio Leakage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Matteo Cardaioli, Mauro Conti, Kiran Balagani, and Paolo Gasti

720

GDPR – Challenges for Reconciling Legal Rules with Technical Reality . . . . Mirosław Kutyłowski, Anna Lauks-Dutka, and Moti Yung

736

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

757

Contents – Part II

Formal Modelling Automatic Generation of Sources Lemmas in TAMARIN: Towards Automatic Proofs of Security Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Véronique Cortier, Stéphanie Delaune, and Jannik Dreier

3

When Is a Test Not a Proof? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eleanor McMurtry, Olivier Pereira, and Vanessa Teague

23

Hardware Fingerprinting for the ARINC 429 Avionic Bus . . . . . . . . . . . . . . Nimrod Gilboa-Markevich and Avishai Wool

42

Applied Cryptography I Semantic Definition of Anonymity in Identity-Based Encryption and Its Relation to Indistinguishability-Based Definition . . . . . . . . . . . . . . . . . . . . . Goichiro Hanaoka, Misaki Komatsu, Kazuma Ohara, Yusuke Sakai, and Shota Yamada

65

SHECS-PIR: Somewhat Homomorphic Encryption-Based Compact and Scalable Private Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . Jeongeun Park and Mehdi Tibouchi

86

Puncturable Encryption: A Generic Construction from Delegatable Fully Key-Homomorphic Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Willy Susilo, Dung Hoang Duong, Huy Quoc Le, and Josef Pieprzyk

107

Analyzing Attacks Linear Attack on Round-Reduced DES Using Deep Learning . . . . . . . . . . . . Botao Hou, Yongqiang Li, Haoyue Zhao, and Bin Wu

131

Detection by Attack: Detecting Adversarial Samples by Undercover Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qifei Zhou, Rong Zhang, Bo Wu, Weiping Li, and Tong Mo

146

Big Enough to Care Not Enough to Scare! Crawling to Attack Recommender Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fabio Aiolli, Mauro Conti, Stjepan Picek, and Mirko Polato

165

xxvi

Contents – Part II

Active Re-identification Attacks on Periodically Released Dynamic Social Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xihui Chen, Ema Këpuska, Sjouke Mauw, and Yunior Ramírez-Cruz

185

System Security II Fooling Primality Tests on Smartcards . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vladimir Sedlacek, Jan Jancar, and Petr Svenda An Optimizing Protocol Transformation for Constructor Finite Variant Theories in Maude-NPA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Damián Aparicio-Sánchez, Santiago Escobar, Raúl Gutiérrez, and Julia Sapiña

209

230

On the Privacy Risks of Compromised Trigger-Action Platforms . . . . . . . . . Yu-Hsi Chiang, Hsu-Chun Hsiao, Chia-Mu Yu, and Tiffany Hyun-Jin Kim

251

Plenty of Phish in the Sea: Analyzing Potential Pre-attack Surfaces . . . . . . . . Tobias Urban, Matteo Große-Kampmann, Dennis Tatang, Thorsten Holz, and Norbert Pohlmann

272

Post-quantum Cryptography Towards Post-Quantum Security for Cyber-Physical Systems: Integrating PQC into Industrial M2M Communication . . . . . . . . . . . . . . . . . Sebastian Paul and Patrik Scheible CSH: A Post-quantum Secret Handshake Scheme from Coding Theory . . . . . Zhuoran Zhang, Fangguo Zhang, and Haibo Tian

295 317

A Verifiable and Practical Lattice-Based Decryption Mix Net with External Auditing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xavier Boyen, Thomas Haines, and Johannes Müller

336

A Lattice-Based Key-Insulated and Privacy-Preserving Signature Scheme with Publicly Derived Public Key . . . . . . . . . . . . . . . . . . . . . . . . . Wenling Liu, Zhen Liu, Khoa Nguyen, Guomin Yang, and Yu Yu

357

Post-Quantum Adaptor Signatures and Payment Channel Networks . . . . . . . . Muhammed F. Esgin, Oğuzhan Ersoy, and Zekeriya Erkin

378

Security Analysis Linear-Complexity Private Function Evaluation is Practical . . . . . . . . . . . . . Marco Holz, Ágnes Kiss, Deevashwer Rathee, and Thomas Schneider

401

Contents – Part II

Certifying Decision Trees Against Evasion Attacks by Program Analysis . . . . Stefano Calzavara, Pietro Ferrara, and Claudio Lucchese They Might NOT Be Giants Crafting Black-Box Adversarial Examples Using Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rayan Mosli, Matthew Wright, Bo Yuan, and Yin Pan Understanding Object Detection Through an Adversarial Lens . . . . . . . . . . . Ka-Ho Chow, Ling Liu, Mehmet Emre Gursoy, Stacey Truex, Wenqi Wei, and Yanzhao Wu

xxvii

421

439 460

Applied Cryptography II Signatures with Tight Multi-user Security from Search Assumptions . . . . . . . Jiaxin Pan and Magnus Ringerud

485

Biased RSA Private Keys: Origin Attribution of GCD-Factorable Keys . . . . . Adam Janovsky, Matus Nemec, Petr Svenda, Peter Sekan, and Vashek Matyas

505

MAC-in-the-Box: Verifying a Minimalistic Hardware Design for MAC Computation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robert Küennemann and Hamed Nemati

525

Evaluating the Effectiveness of Heuristic Worst-Case Noise Analysis in FHE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anamaria Costache, Kim Laine, and Rachel Player

546

Blockchain I How to Model the Bribery Attack: A Practical Quantification Method in Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hanyi Sun, Na Ruan, and Chunhua Su Updatable Blockchains. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michele Ciampi, Nikos Karayannidis, Aggelos Kiayias, and Dionysis Zindros PrivacyGuard: Enforcing Private Data Usage Control with Blockchain and Attested Off-Chain Contract Execution . . . . . . . . . . . . . . . . . . . . . . . . Yang Xiao, Ning Zhang, Jin Li, Wenjing Lou, and Y. Thomas Hou

569 590

610

xxviii

Contents – Part II

Applied Cryptography III Identity-Based Authenticated Encryption with Identity Confidentiality . . . . . . Yunlei Zhao

633

Securing DNSSEC Keys via Threshold ECDSA from Generic MPC . . . . . . . Anders Dalskov, Claudio Orlandi, Marcel Keller, Kris Shrishak, and Haya Shulman

654

On Private Information Retrieval Supporting Range Queries . . . . . . . . . . . . . Junichiro Hayata, Jacob C. N. Schuldt, Goichiro Hanaoka, and Kanta Matsuura

674

Blockchain II 2-hop Blockchain: Combining Proof-of-Work and Proof-of-Stake Securely. . . Tuyet Duong, Lei Fan, Jonathan Katz, Phuc Thai, and Hong-Sheng Zhou

697

Generic Superlight Client for Permissionless Blockchains. . . . . . . . . . . . . . . Yuan Lu, Qiang Tang, and Guiling Wang

713

LNBot: A Covert Hybrid Botnet on Bitcoin Lightning Network for Fun and Profit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ahmet Kurt, Enes Erdin, Mumin Cebe, Kemal Akkaya, and A. Selcuk Uluagac Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

734

757

Database and Web Security

Pine: Enabling Privacy-Preserving Deep Packet Inspection on TLS with Rule-Hiding and Fast Connection Establishment Jianting Ning1,4 , Xinyi Huang1(B) , Geong Sen Poh2 , Shengmin Xu1 , Jia-Chng Loh2 , Jian Weng3 , and Robert H. Deng4 1

Fujian Provincial Key Laboratory of Network Security and Cryptology, College of Mathematics and Informatics, Fujian Normal University, Fuzhou, China [email protected], [email protected], [email protected] 2 NUS-Singtel Cyber Security Lab, Singapore, Singapore [email protected], [email protected] 3 College of Information Science and Technology, Jinan University, Guangzhou, China [email protected] 4 School of Information Systems, Singapore Management University, Singapore, Singapore [email protected]

Abstract. Transport Layer Security Inspection (TLSI) enables enterprises to decrypt, inspect and then re-encrypt users’ traffic before it is routed to the destination. This breaks the end-to-end security guarantee of the TLS specification and implementation. It also raises privacy concerns since users’ traffic is now known by the enterprises, and third-party middlebox providers providing the inspection services may additionally learn the inspection or attack rules, policies of the enterprises. Two recent works, BlindBox (SIGCOMM 2015) and PrivDPI (CCS 2019) propose privacy-preserving approaches that inspect encrypted traffic directly to address the privacy concern of users’ traffic. However, BlindBox incurs high preprocessing overhead during TLS connection establishment, and while PrivDPI reduces the overhead substantially, it is still notable compared to that of TLSI. Furthermore, the underlying assumption in both approaches is that the middlebox knows the rule sets. Nevertheless, with the services increasingly migrating to third-party cloud-based setting, rule privacy should be preserved. Also, both approaches are static in nature in the sense that addition of any rules requires significant amount of preprocessing and re-instantiation of the protocols. In this paper we propose Pine, a new Privacy-preserving inspection of encrypted traffic protocol that (1) simplifies the preprocessing step of PrivDPI thus further reduces the computation time and communication overhead of establishing the TLS connection between a user and a server; (2) supports rule hiding; and (3) enables dynamic rule addition without the need to re-execute the protocol from scratch. We demonstrate the c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 3–22, 2020. https://doi.org/10.1007/978-3-030-58951-6_1

4

J. Ning et al. superior performance of Pine when compared to PrivDPI through extensive experimentations. In particular, for a connection from a client to a server with 5,000 tokens and 6,000 rules, Pine is approximately 27% faster and saves approximately 92.3% communication cost. Keywords: Network privacy

1

· Traffic inspection · Encrypted traffic

Introduction

According to the recent Internet trends report [11], 87% of today’s web traffic was encrypted, compared to 53% in 2016. Similarly, over 94% of web traffic across Google uses HTTPS encryption [7]. The increasing use of end-to-end encryption to secure web traffic has hampered the ability of existing middleboxes to detect malicious packets via deep packet inspection on the traffic. As a result, security service providers and enterprises deploy tools that perform Man-in-the-Middle (MitM) to decrypt, inspect and re-encrypt traffic before the traffic is sent to the designated server. Such approach is termed as Transport Layer Security Inspection (TLSI) by the National Security Agency (NSA), which recently issued an advisory on TLSI [12] citing potential security issues including insider threats. TLSI introduces additional risks whereby administrators may abuse their authorities to obtain sensitive information from the decrypted traffic. On the other hand, there exists growing privacy concern on the access to users’ data by middleboxes as well as the enterprise gateways. According to a recent survey on TLSI in the US [16], more than 70% of the participants are concerned that middleboxes (or TLS proxies) performing TLSI can be exploited by hackers or used by governments, and close to 50% think it is an invasion to privacy. In general, participants are acceptable to the use of middleboxes by their employers or universities for security purposes but also want assurance that these would not be used by governments for surveillance or by exploited hackers. To alleviate the above concerns on maintaining security of TLS while ensuring privacy of the encrypted traffic, Sherry et al. [20] introduced a solution called BlindBox to perform inspection on encrypted traffic directly. However, BlindBox needs a setup phase that is executed between the middlebox and the client. The setup phase performs two-party computation where the input of the middlebox are the rules, which means that the privacy of rules against the middlebox is not assured. In addition, this setup phase is built based on garbled circuit, and needs to be executed for every session. Due to the properties of garble circuit, such setup phase incurs significant computation and communication overheads. To overcome this limitation, Ning et al. [15] recently proposed PrivDPI with an improved setup phase. A new obfuscated rule generation technique was introduced, which enables the reuse of intermediate values generated during the first TLS session across subsequent sessions. This greatly reduces the computation and communication overheads over a series of sessions. However, there still exists considerable delay during the establishment of a TLS connection since each client is required to run a preprocessing protocol for each new connection. In addition,

Pine

5

as we will show in Sect. 4.1, when the domain of the inspection or attack rules is small, the middlebox could perform brute force guessing for the rules in the setting of PrivDPI. This means that, as in BlindBox, PrivDPI does not provide privacy of rules against the middlebox. However, as noted in [20], most solution providers, such as McAfee, rely on the privacy of their rules in their business model. More so given the increasingly popular cloud-based middlebox services, the privacy of the rules should be preserved against the middleboxes. Given the security and privacy concerns on TLSI, and the current status of the state-of-the-arts, we seek to introduce a new solution that addresses the following issues, in addition to maintaining the security and privacy provisions of BlindBox and PrivDPI: (1) Fast TLS connection establishment without preprocessing in order to eliminate the session setup delay incurred in both BlindBox and PrivDPI; (2) Resisting brute force guessing of the rule sets even for small rule domains; (3) Supporting lightweight rule addition. Our Contributions. We propose Pine, a new protocol for privacy-preserving deep packet inspection on encrypted traffic, for a practical enterprise network setting, where clients connect to the Internet through an enterprise gateway. The main contributions are summarized as follows. – Identifying limitation of PrivDPI. We revisit PrivDPI and demonstrate that in PrivDPI, when the rule domain is small, the middlebox could forge new encrypted rules that gives the middlebox the ability to detect the encrypted traffic with any encrypted rules it generates. – New solution with stronger privacy guarantee. We propose Pine as the new solution for the problem of privacy-preserving deep packet inspection, where stronger privacy is guaranteed. First of all, the privacy of the traffic is protected unless there exists an attack in the traffic. Furthermore, privacy of rules is assured against the middlebox, we call this property rule hiding. This property ensures privacy of rules even when the rule domain is small (e.g. approximately 3000 rules as in existing Network Intrusion Detection (IDS) rules), which addresses the limitation of PrivDPI. In addition, privacy of rules is also assured against the enterprise gateway and the endpoints, we term this property rule privacy. – Amortized setup, fast connection establishment. Pine enables the establishment of a TLS connection with low latency and without the need for an interactive preprocessing protocol as in PrivDPI and BlindBox. The latency-incurring preprocessing protocol is performed offline and is only executed once. Consequently, there is no per-user-connection overhead. Any client can setup a secure TLS connection with a remote server without preprocessing delay. In contrast, in PrivDPI and BlindBox, the more rules there are, the higher the per-connection setup cost is. The speed up of the connection is crucial for low-latency applications. – Lightweight rule addition. Pine is a dynamic protocol in that it allows new rules being added on the fly without affecting the connection between a client and a server. The rule addition is seamless to the clients in the sense that the gateway can locally execute the rule addition phase with the middlebox

6

J. Ning et al.

Fig. 1. Pine system architecture.

without any client involvement. This is beneficial as compared to BlindBox and PrivDPI, where the client would need to re-run the preprocessing protocol from scratch for every connection. In addition to stronger privacy protection, we conduct extensive experiments to demonstrate the superior performance of Pine when compared to PrivDPI. For a connection from a client to a server with 5,000 tokens and a ruleset of 6,000, Pine is approximately 27% faster than PrivDPI, and saves approximately 92.3% communication cost. In particular, the communication cost of Pine is independent of the number of rules, while the communication cost of PrivDPI grows linear with the number of rules.

2

Protocol Overview

Pine shares a similar architecture with BlindBox and PrivDPI, as illustrated in Fig. 1. There are five entities in Pine: Client, Server, Gateway (GW), Rule Generator (RG) and Middlebox (MB). Client and server are the endpoints that send and receive network traffic protected by TLS. GW is a device located between a set of clients and servers that allows network traffic to flow from one endpoint to another endpoint. RG generates the attack rule tuples for MB. The attack rule tuples will be used by MB to detect attacks in the network traffic. Each attack rule describes an attack and contains one or more keywords to be matched in the network communication. Hereafter, we will use the terms “rule” and “attack rule” interchangeably. The role of RG can be performed by organization such as McAfee [18]. MB is a network device that inspects and filters network traffic using the attack rule tuples issued by RG. System Requirements. The primary aim is to provide a privacy-preserving mechanism that can detect any suspicious traffic while at the same time ensure the privacy of endpoint’s traffic. In particular, the system requirements include: – Traffic inspection: Pine retains similar functionality of traditional IDS, i.e., to find a suspicious keyword in the packet. – Rule privacy: The endpoints and GW should not learn the attack rules (i.e., the keywords). This is required especially for security solution providers that generate comprehensive and proprietary rule sets as their unique proposition that help to detect malicious traffic more effectively.

Pine

7

– Traffic privacy: On one hand, MB is not supposed to learn the plaintexts of the network traffic, except for the portions of the traffic that match the rules. On the other hand, GW is not allowed to read the content of the traffic. – Rule hiding: MB is not supposed to learn the attack rules from the attack rule tuples issued by RG in a cloud-based setting where MB resides on a cloud platform. In such a case the cloud-based middlebox is not fully trusted. The security solution providers would want to protect the privacy of their unique rule sets, as was discussed previously in describing rule privacy. Threat Model. There are three types of attackers described as follows. – Malicious endpoint. The first type of attacker is the endpoint (i.e., the client or the server). Similar to BlindBox [20] and PrivDPI [15], at most one of the two endpoints is assumed to be malicious but not both. Such an attacker is the same as the attacker in the traditional IDS whose main goal is to evade detection. As in the traditional IDS [17], it is a fundamental requirement that at least one of the two endpoints is honest. This is because if two malicious endpoints agree on a private key and send the traffic encrypted by this particular key, detection of malicious traffic would be infeasible. – The attacker at the gateway. As in conventional network setting, GW is assumed to be semi-honest. That is, GW honestly follows the protocol specification but may try to learn the plaintexts of the traffic. GW may also try to infer the rules from the messages it received. – The attacker at the middlebox. MB is assumed to be semi-honest, which follows the protocol but may attempt to learn more than allowed from the messages it received. In particular, it may try to read the content of the traffic that passed through it. In addition, it may try to learn the underlying rules of the attack rule tuples issued by RG. Protocol Flow. We present how each phase functions at a high level as follows. – Initialization. RG initializes the system by setting the public parameters. – Setup. GW subscribes the inspection service from RG, in which RG receives a shared secret from GW. RG issues the attack rule tuples to MB. The client and the server will derive some parameters from the key of the primary TLS handshake protocol and install a Pine HTTPS configuration, respectively. – Preprocessing. In this phase, GW interacts with MB to generate a set of reusable randomized rules. In addition, GW generates and sends the initialization parameters to the clients within its domain. – Preparation of Session Detection Rule. In this phase, the reusable randomized rules will be used to generate session detection rules. – Token Encryption. In this phase, a client generates the encrypted token for each token in the payload. The encrypted tokens will be sent along with the traffic encrypted from the payload using regular TLS. – Gateway Checking. For the first session, GW checks whether the attached parameters sent by the client is well-formed. This phase will be run when a client connects to a server for the first time.

8

J. Ning et al.

– Traffic Inspection. MB generates a set of encrypted rules and performs inspection using these encrypted rules. – Traffic Validation. One endpoint performs traffic validation in case the other endpoint is malicious. – Rule Addition. A set of new attack rules will be added in this phase. GW interacts with MB to generate the reusable randomized rule set corresponding to these new attack rules.

3

Preliminaries

Complexity Assumption. The decision Diffie-Hellman (DDH) problem is stated as follows: given g, g x , g y , g z , decide whether z = xy (modulo the order of g), where x, y, z ∈ Zp . We say that a PPT algorithm B has advantage  in x y xy x y z solving the DDH problem if | Pr[B (g,g ,g ,g ) = 1] − Pr[B (g,g ,g ,g ) = 1]| ≥ , where the probability above is taken over the coins of B, g, x, y, z. Definition 1. The DDH assumption holds if no PPT adversary has advantage at least  in solving the DDH problem. Pseudorandom function. A pseudorandom function family PRF is a family of functions {PRFa : U → V |a ∈ A} such that A could be efficiently samplable and all PRF, U , V , A are indexed by a security parameter λ. The security property of a PRF is: for any PPT algorithm B running in λ, it holds that | Pr[B PRFa (·) = 1] − Pr[B R(·) = 1]| = negl(λ), where negl is a negligible function of λ, a and R are uniform over A and (U → V ) respectively. The probability above is taken over the coins of B, a and R. For notational simplicity, we consider one version of the general pseudorandom function notion that is custom-made to fit our implementation. Specifically, the pseudorandom function PRF considered in this paper maps λ-bit strings to elements of Zp . Namely, PRFa : {0, 1}λ → Zp , where a ∈ G. Payload Tokenization. As in BlindBox and PrivDPI, we deploy window-based tokenization to tokenize keywords of a client’s payload. Window-based tokenization follows a simple sliding window algorithm. We adopt 8 bytes per token when we implement the protocol. That is, given a payload “secret key”, an endpoint will generate the tokens “secret k”, “ecret ke” and “cret key”.

4

Protocol

In this section, we first point out the limitation of PrivDPI. To address this problem and further reduce the connection delay, we then present our new protocol. 4.1

Limitation of PrivDPI

We show how PrivDPI fails when the domain of rule is small. We say that the domain of rule is small if one can launch brute force attack to guess the underlying rules given the public parameters. We first recall the setup phase of

Pine

9

PrivDPI. In the setup phase, a middlebox receives (si , Ri , sig(Ri )) for rule ri , where Ri = g αri +si and sig(Ri ) is the signature of Ri . With si and Ri , MB obtains the value g αri . Recall that in PrivDPI, the value A = g α is included in the PrivDPI HTTPS configuration, MB could obtain this value via installing a PrivDPI HTTPS configuration. Since the domain of rule is small, with A and g αri , MB can launch brute force attack to obtain the value of ri via trying every ? candidate value v by checking Av = g αri within the rule domain. In this way, MB could obtain the value ri for Ri and rj for Rj . After the completion of 2 preprocessing protocol, MB obtains the reusable obfuscated rule Ii = g kαri +k kαri +k2 kαrj +k2 for rule ri . Now, MB knows values ri , rj , Ii = g , Ij = g . It (ri −rj )−1 kα kα to obtain a value g . With g , ri and can then computes (Ii /Ij ) 2 2 2 Ii = g kαri +k , it can compute Ii /(g kα )ri = g k . With g k and g kα , MB could forge the reusable obfuscated rule successfully for any rule it chooses. With the forged (but valid) reusable obfuscated rule, MB could detect more than it is allowed, which violates the privacy requirement of the encrypted traffic. 4.2

Description of Our Protocol

Initialization. Let R be the domain of rules, PRF be a pseudorandom function, n be the number of rules and [n] be the set {1, ..., n}. Let AESa (salt) be the AES encryption with key a and message salt. Let Enca (salt) = AESa (salt) mod R, where R is an integer used to reduce the ciphertext size [20]. The initialization phase takes in a security parameter λ and chooses a group G of prime order p. It then chooses a generator g of G, and sets the public parameters as (G, p, g). Setup. GW chooses a key g w for the pseudorandom function PRF, where w ∈ Z∗p . It subscribes the service from RG and sends w to RG. RG first computes W = g w . For a rule set {ri ∈ R}i∈[n] , for i ∈ [n], RG chooses a randomness ki ∈ Zp , calculates rw,i = PRFW (ri ) and Ri = g rw,i +ki . RG chooses a signature scheme with sk as the secret key and pk as the public key. It then signs {Ri }i∈[n] with sk and generates the signature of Ri for i ∈ [n], denote by σi . Finally, it sends the attack rule tuples {(Ri , σi , ki )}i∈[n] to MB. Here, g w is the key ingredient for ensuring the property of rule hiding. The key observation here is that since MB does not know g w or w, it cannot guess the underlying ri of Ri via brute forcing all the possible keywords it chooses. In particular, for a given attack rule tuple (Ri , σi , ki ), MB could obtain the value g rw,i by computing Ri /g ki . Due to the property of pseudorandom function, rw,i is pseudorandom, and hence g rw,i is pseudorandom. Without the knowledge of g w or w, it is impossible to obtain ri even if MB brute forces all possible keywords it chooses. On the other hand, the client and the server install a Pine HTTPS configuration which contains a value R. Let ksk be the key of the regular TLS handshake protocol established by a client and a server. With ksk , the client (resp. the server) derives three keys kT , c, ks . Specifically, kT is a standard TLS key, which is used to encrypt the traffic; c is a random value from Zp , which is used for generating session detection rules; ks is a random value from Zp , which is used as a randomness to mask the parameters sent from the client to the server.

10

J. Ning et al.

Preprocessing. In order to accelerate the network connection between a client and a server (compared to PrivDPI), we introduce a new approach that enables fast connection establishment without executing the preprocessing process per client as in PrivDPI. We start from the common networking scenario in an enterprise setting where there exists a gateway located between a set of clients and a server. The main idea is to let the gateway be the representative of the clients within its domain, who will run the preprocessing protocol with MB for only once. Both the clients and the gateway share the initialization parameters required for connection with the server. In this case, the connection between a client and a server can be established instantly without needing any preprocessing as in PrivDPI since the preprocessing is performed by the gateway and MB beforehand. In other words, we offload the operation of preprocessing to the gateway, which dramatically reduces the computation and communication overhead for the connection between a client and a server. Specifically, in this phase, GW runs a preprocessing protocol with MB to generate a reusable randomized rule set as well as the initialization parameters for the clients within the domain of GW. The preprocessing protocol is run after the TLS handshake protocol, which is described in Fig. 2. Upon the completion of this phase, MB obtains a set of reusable randomized rules which enable MB to perform deep packet detection over the encrypted traffic across a series of sessions. The values I0 , I1 and I2 enable each client within the domain of GW to generate the encrypted tokens. Hence, for any network connection with a server, a client does not need to run the preprocessing phase with MB as compared to BlindBox and PrivDPI. This substantially reduces the delay and communication cost for the network connection between the client and the server, especially for large rule set. Furthermore, in case of adding new rules, a client does not need to re-run the preprocessing protocol as BlindBox and PrivDPI does. This means rule addition has no effect on the client side. Preparation of Session Detection Rule. A set of session detection rules will be generated in this phase. These session detection rules are computed, tailored for every session, from the reusable randomized rules generated from the preprocessing protocol. The generated session detection rules are used as the inputs to generate the corresponding encrypted rules. The protocol is described in Fig. 3, and it is executed for every new session. Token Encryption. Similar to BlindBox and PrivDPI, we adopt the windowbased tokenization approach as described in Sect. 3. After the tokenization step, a client obtains a set of tokens corresponding to the payload. For the first time that a client connects with a server, the client derives a salt from c and stores the salt for future use, where c is the key derived from the key ksk of the TLS handshake protocol. For each token t, a client runs the token encryption algorithm as described in Fig. 4. To prevent the count table T from growing too large, the client will clear T every Z sessions (e.g., Z = 1, 000). In this case, the client will send a new salt to MB, where salt ← salt + maxt countt + 1.

Pine

11

In the above, we describe the token encryption when the endpoint is a client. When the endpoint is a server, the server will first run the same tokenization step, and encrypts the tokens as the step 1 and step 2 described in Fig. 4. Gateway Checking. This phase will be executed when a client connects to a server for the first time. For the traffic sent from the client to a server for the first time, the client attaches (salt, Cks , Cw , Cx , Cy ). This enables the server to perform the validation of the encrypted traffic during the traffic validation phase. Cks and ks serve as the randomness to mask the values g w , g x and g xy . The correctness of Cks will be checked once the traffic reached the server. To ensure that g w , g x and g xy are masked by Cks correctly, GW simply checks whether the following equations hold: Cw = (Cks )w , Cx = (Cks )x and Cy = (Cks )xy . Traffic Detection. During the traffic detection phase, MB performs the equality check between the encrypted tokens in the traffic and the encrypted rules it kept. The traffic detection algorithm is described as follows. MB first initializes a counter table CTr to record the encrypted rule Eri for each rule ri . The encrypted rule Eri for rule ri is computed as Eri = EncSi (salt + countri ), where countri is initialized to be 0. MB then generates a search tree that contains the encrypted rules. If a match is found, MB takes the corresponding action, deletes the old Eri corresponding to ri , increases countri by 1, computes and inserts a new Eri into the tree, where the new Eri is computes as EncSi (salt + countri ). Traffic Validation. If it is the first session between a client and a server, upon receiving (salt, Cks , Cw , Cx , Cy ), the server checks whether the equation Cks = g ks holds, where ks is derived (by the server) from the key ksk of the regular TLS −1 handshake protocol. If the equation holds, the server computes (Cw )(ks ) = g w , −1 −1 (Cx )(ks ) = g x , (Cy )(ks ) = g xy . With the computed (g w , g x , g xy ), the server runs the same token encryption algorithm on the plaintext decrypted from the

Input: MB has inputs {(Ri , σi , ki )}i∈[n] , where Ri = g rw,i +ki ; GW has input pk. The protocol is run between GW and MB: 1. GW chooses a random x ∈ Z∗p , computes X = g x , and sends X to MB. 2. MB sends {(Ri , σi )}i∈[n] to GW. 3. Upon receiving {(Ri , σi )}i∈[n] , GW does: (1) Check if σi is a valid signature on Ri using pk for i ∈ [n]; if not, halt and output ⊥. (2) Choose a random y ∈ Z∗p and compute Y = g y . Compute Xi = (Ri · Y )x = g xrw,i +xki +xy for i ∈ [n], and return {Xi }i∈[n] to MB. 4. MB computes Ki = Xi /(X)ki = g xrw,i +xy for i ∈ [n] as the reusable randomized rule for rule ri . 5. GW sets I0 = xy, I1 = x, I2 = g w as the initialization parameters, and sends (I0 , I1 , I2 ) to the clients within its domain.

Fig. 2. Preprocessing protocol

12

J. Ning et al.

Input: The client (resp. the server) has input c. MB has input {Ki }i∈[n] . The protocol is run among a client, a server and MB: 1. The client computes C = g c and sends C to MB (through GW). Meanwhile, the server sets Cs = c and sends Cs to MB. 2. MB checks whether C equals g Cs . If yes, for i ∈ [n], it calculates Si = (Ki ·C)Cs = g c(xrw,i +xy+c) as the session detection rule for rule ri .

Fig. 3. Session detection rule preparation protocol

encrypted TLS traffic as the client does. The server then checks whether the resulting encrypted tokens equal the encrypted tokens received from MB. If not, it indicates that the client is malicious. On the other hand, if it is the traffic sent from the server to the client, the client will do the same token encryption algorithm as the server does, and compares the resulting encrypted tokens with the received encrypted tokens from MB as well. Rule Addition. In practice, new rules may be required to be added into the system. For a new rule ri ∈ R for i ∈ [n ], RG randomly chooses ki ∈ Zp ,    calculates rw,i = PRFW (ri ) and Ri = g rw,i +ki . It then signs the generated Ri with sk to generate the signature σi of Ri . Finally, it sends the newly added attack rule tuples {Ri , σi , ki }i∈[n ] to MB. For the newly added attack rule tuples, the rule addition protocol is described in Fig. 5, which is a simplified protocol of the preprocessing protocol.

Input: The client has inputs (I0 , I1 , I2 ), a token t, the random keys ks and c, the value R, a salt salt and a counter table T, where I0 = xy, I1 = x and I2 = g w . The algorithm is run by the client as follows: 1. Compute I = I0 + c = xy + c. 2. For each token t: • If there exists no tuple corresponding to t in T: compute tw = PRFI2 (t), Tt = g c(I1 tw +I) = g c(xtw +xy+c) , set countt = 0, compute the encryption of t as Et = EncTt (salt). Finally, insert tuple (t, Tt , countt ) into T. • If there exists a tuple (t , Tt , countt ) in T where t = t: update countt = countt + 1, and compute the encryption of t as Et = EncTt (salt + countt ). 3. If it is the first session, compute Cks = g ks , Cw = (I2 )ks = g wks , Cx = g I1 ks = g xks and Cy = g I0 ks = g xyks . The parameters (salt, Cks , Cw , Cx , Cy ) will be sent along with the encrypted token Et for token t.

Fig. 4. Token encryption algorithm

Pine

13

Input: MB has newly added attack rule tuple set {(Ri , σi , ki )}i∈[n ] , where Ri =   g rw,i +ki ; GW has inputs Y , x. The protocol is run between GW and MB: 1. MB sends {(Ri , σi )}i∈[n ] to GW. 2. Upon receiving {(Ri , σi )}i∈[n ] , GW does: (1) Check if σi is a valid signature on Ri using pk for i ∈ [n ]; if not, halt and output ⊥. (2) Compute Xi = (Ri · Y )x =   g xrw,i +xki +xy for i ∈ [n ], and send {Xi }i∈[n ] to MB.  3. MB computes the reusable randomized rule Ki = Xi /(X)ki for i ∈ [n ].

Fig. 5. Rule addition protocol

5 5.1

Security Middlebox Searchable Encryption

Definition. For a message space M, a middlebox searchable encryption scheme consists of the following algorithms: – Setup(λ): Takes a security parameter λ, outputs a key sk. – TokenEnc(t1 , ..., tn , sk): Takes a token set {ti ∈ M}i∈[n] and the key sk, outputs a set of ciphertexts (c1 , ..., cn ) and a salt salt. – RuleEnc(r, sk): Takes a rule r ∈ M, the key sk, outputs an encrypted rule er . – Match(er , (c1 , ..., cn ), salt): Takes an encrypted rule er , ciphertexts {ci }i∈[n] and salt, outputs the set of indexes {indi }i∈[l] , where indi ∈ [n] for i ∈ [l]. Correctness. We refer the reader to Appendix A for its definition. Security. It is defined between a challenger C and an adversary A. – Setup. C runs Setup(λ) and obtains the key sk. – Challenge. A randomly chooses two sets of tokens S0 = {t0,1 , ..., t0,n }, S1 = {t1,1 , ..., t1,n } from M and gives the two sets to C. Upon receiving S0 and S1 , C flips a random coin b, runs TokenEnc(tb,1 , ..., tb,n , sk) to obtain a set of ciphertexts (c1 , ..., cn ) and a salt salt. It then gives (c1 , ..., cn ) and salt to A. – Query. A randomly chooses a set of rules (r1 , ..., rm ) from M and gives the rules to C. Upon receiving the set of rules, for i ∈ [m], C runs RuleEnc(ri , sk) to obtain encrypted rule eri . C then gives the encrypted rules {eri }i∈[m] to A. – Guess. A outputs a guess b of b. Let I0,i be the index set that match ri in S0 and I1,i be the index set that match ri in S1 . If I0,i = I1,i and b = b for all i, we say that the adversary wins the above game. The advantage of the adversary in the game is defined as Pr[b = b] − 1/2. Definition 2. A middlebox searchable encryption scheme is secure if no PPT adversary has a non-negligible advantage in the game.

14

J. Ning et al.

Construction. The construction below captures the main structure from the security point of view. – Setup(λ): Let PRF be a pseudorandom function. Generate x, y, c, w ∈ Zp , set (x, y, c, g w ) as the key. – TokenEnc(t1 , ..., tn , sk): Let salt be a random salt. For i ∈ [n], do: (a) Let count be the number of times that token ti repeats in the sequence t1 ,...,ti−1 ; (b) Calculate tw,i = PRFgw (ti ), Tti = g c(xtw,i +xy+c) , ci = H(Tti , salt + count). Finally, the algorithm outputs (c1 , ..., cn ) and salt. – RuleEnc(r, sk): Compute rw = PRFgw (r), S = g c(xrw +xy+c) , output H(S). Theorem 1. Suppose H is a random oracle, the construction in Sect. 5.1 is a secure middlebox searchable encryption scheme. The proof of this theorem is provided in Appendix B.1. 5.2

Preprocessing Protocol

Definition. The preprocessing protocol is a two-party computation between GW and MB. Let f : {0, 1}∗ × {0, 1}∗ → {0, 1}∗ × {0, 1}∗ be the process of the computation, where for every inputs (a, b), the outputs are (f1 (a, b), f2 (a, b)). In our protocol, the input of GW is x and the input of MB is a derivation of r, and only MB receives the output. Security. The security requirements include: (a) GW should not learn the value of each rule; (b) MB cannot forge any new reusable randomized rule that is different from the reusable randomized rules obtained during the preprocessing protocol. Intuitively, the second requirement is satisfied if MB cannot obtain the value x. Since both of GW and MB are assumed to be semi-honest, we adopt the security definition with static semi-honest adversaries as in [6]. Let π be the two-party protocol for computing f , Viewπi be the ith party’s view during the execution of π, and Outputπ be the joint output of GW and MB from the execution of π. For our protocol, since f is a deterministic functionality, we adopt the security definition for deterministic functionality as shown below. Definition 3. Let f : {0, 1}∗ × {0, 1}∗ → {0, 1}∗ × {0, 1}∗ be a deterministic functionality. We say that π securely computes f in the presence of static semihonest adversaries if (a) Outputπ equals f (a, b); (b) there exist PPT algorithms c c B1 and B2 such that (1) {B1 (a, f1 (a, b))} ≡ {Viewπ1 (a, b)}, (2) {B2 (b, f2 (a, b))} ≡ {Viewπ2 (a, b)}, where a, b ∈ {0, 1}∗ and |a| = |b|. Protocol. In Fig. 6, we provide a simplified protocol that outlines the main structure of the preprocessing protocol. Lemma 1. No computationally unbounded adversary can guess a rule ri with probability greater than 1/|R| with input Ri . The proof of this lemma is provided in Appendix B.2. Theorem 2. The preprocessing protocol securely computes f in the presence of static semi-honest adversaries assuming the DDH assumption holds. The proof of this theorem is provided in Appendix B.3.

Pine

15

Inputs: GW has inputs x, y ∈ Zp ; MB has inputs ({Ri , ki }i∈[n] ), where Ri = g rw,i +ki . The protocol is run between GW and MB: 1. GW computes X = g x , and sends X to MB. 2. MB sends {Ri }i∈[n] to GW. 3. GW computes Xi = (Ri · g y )x for i ∈ [n], and send {Xi }i∈[n] to MB. 4. MB computes Ki = Xi /(X)ki as the reusable randomized rule for rule ri .

Fig. 6. Simplified preprocessing protocol

5.3

Token Encryption

It captures the security requirement that GW cannot learn the underlying token when given an encrypted token. Definition. For a message space M, a token encryption scheme is as follows: • Setup(λ): Takes as input a security parameter λ, outputs a secret key sk and the public parameters pk. • Enc(pk, sk, t): Takes as input the public parameters pk, a secret key sk and a token t ∈ M, outputs a ciphertext c. Security. It is defined between a challenger C and an adversary A. – Setup: C runs Setup(λ) and sends the public parameters pk to A. – Challenge: A randomly chooses two tokens t0 , t1 from M and sends them to C. C flips a random coin b ∈ {0, 1}, runs c ← Enc(pk, sk, tb ), and sends c to A. – Guess: A outputs a guess b of b. The advantage of an adversary is defined to be Pr[b = b] − 1/2. Definition 4. A token encryption scheme is secure if no PPT adversary has a non-negligible advantage in the security game. Construction. The construction presented below outlines the main structure from the security point of view. – Setup(λ): Let PRF be a pseudorandom function. Choose random value x, y, c, w ∈ Zp , calculate p1 = g c , p2 = g w , p3 = x and p4 = y. Finally, set c as sk and (p1 , p2 , p3 , p4 ) as pk. – Enc(pk, sk, t): Let salt be a random salt. Calculate tw = PRFp2 (t), Tt = g c(xtw +xy+c) , c = H(Tt , salt). Output c and salt. Theorem 3. Suppose H is a random oracle, the construction in Sect. 5.3 is a secure token encryption scheme. The proof of this theorem is provided in Appendix B.4.

16

5.4

J. Ning et al.

Rule Hiding

It captures the security requirement that MB cannot learn the underlying rule when given an attack rule tuple (issued by RG). Definition. For a message space M, a rule hiding scheme is defined as follows: – Setup(λ): Takes as input a security parameter λ, outputs a secret key sk and the public parameters pk. – RuleHide(pk, sk, r): Takes as input the public parameters pk, a secret key sk and a rule r ∈ M, outputs a hidden rule. Security. The security definition for a rule hiding scheme is defined between a challenger C and an adversary A as follows. – Setup: C runs Setup(λ) and gives the public parameters to A. – Challenge: A chooses two random rules r0 , r1 from M, and sends them to C. Upon receiving r0 and r1 , C flips a random coin b, runs RuleHide(pk, sk,rb ) and returns the resulting hidden rule to A. – Guess: A outputs a guess b of b. Construction. – Setup(λ): Let PRF be a pseudorandom function. Choose random k, w ∈ Zp , set g w as sk, k as pk. – RuleHide(pk, sk, r): Calculate rw = PRFsk (r), R = g rw +k , and output R. Theorem 4. Suppose PRF is a pseudorandom function, the construction in Sect. 5.4 is a secure rule hiding scheme. The proof of this theorem is provided in Appendix B.5.

6

Performance Evaluations

We investigate the performance of the network connection between a client and a server. Since PrivDPI perfoms better than BlindBox, we only present the comparison with PrivDPI. Let an one-round connection be a connection from the client to the server. The running time of a one-round connection reflects how fast a client can be connected to a server, and the communication cost captures the amount of overhead data need to be transferred for establishing this connection. Ideally, the running time for one-round connection should be as small as possible. The less running time it incurs, the faster a client can connect to a server. Similarly, it is desirable to minimize network communication overhead. We test the running time and the communication cost of one-round connection for our protocol and PrivDPI respectively. Our experiments are run on a Intel(R) Core i7-8700 CPU running at 3.20 Ghz with 8 GB RAM under 64bit Linux operating system. The CPU supports AES-NI instructions, where

Pine 4000

PrivDPI Pine

3500

700

800

2500

700

2000

600

1500

500

1000

400 300 0

1000

2000 3000 4000 Number of rules

5000

0 0

6000

1200

PrivDPI Pine

900

Time (ms)

800 600

2000 3000 4000 Number of rules

5000

2000

4000 6000 Number of tokens

200

400 8000

10000

300 0

8000

10000

(c) 1600

PrivDPI Pine

1400

PrivDPI Pine

1200 1000

600 500

(d)

200 0

6000

700

400

4000 6000 Number of tokens

500

300 1000

800

2000

600

(b) 1000

1000

0 0

PrivDPI Pine

400

500

(a) 1400 Communication cost (kb)

900

3000

Communication cost (kb)

Time (ms)

800

1000

PrivDPI Pine

Time (ms)

900

Communication cost (kb)

1000

17

800 600 400

500

1000 1500 2000 Number of added rules

2500

3000

200 0

(e)

500

1000 1500 2000 Number of added rules

2500

3000

(f)

Fig. 7. Experimental performances

the encryption of token and the encryption of rule reflect this hardware support. The experiments are built on Charm-crypto [1], and is based on NIST Curve P-256. As stated in Sect. 3, both the rules and the tokens consist of 8 bytes. For simplicity, the payload that we test does not contain repeated tokens. We test each case for 20 times and takes the average. How does the number of rules influence the one-round connection? Figure 7a illustrates the running time for one-round connection with 5,000 tokens when the number of rules range from 600 to 6,000. It is demonstrated that Pine takes less time than PrivDPI for each case, the more rules, the less time Pine takes compared to PrivDPI. This means that it takes less time for a client in Pine to connect to a server. In particular, for 5,000 tokens and 6,000 rules, it takes approximately 665 ms for Pine, while PrivDPI takes approximately 912 ms. That is, the delay for one-round connection of Pine is 27% less than PrivDPI; for 5,000 tokens and 3,000 rules, it takes approximately 488 ms for Pine, while PrivDPI takes approximately 616 ms. In other words, a client in Pine connects to a server with 20.7% faster speed than PrivDPI. Figure 7b shows the communication cost for one-round connection with 5,000 tokens when the number of rules range from 600 to 6,000. The communication cost of PrivDPI grows linearly with the number of rules, while for Pine it is constant. The more rules, the more communication cost PrivDPI incurs. This is because the client in PrivDPI needs to run the preprocessing protocol with MB, and the communication cost incurred by this preprocessing protocol is linear with the number of rules. How does the number of tokens influence the one-round connection? We fix the number of rules to be 3,000, and test the running time and communication cost when the number of tokens range from 1,000 to 10,000.

18

J. Ning et al.

Figure 7c shows that the running time of Pine is linear with the number of tokens in the payload, the same as PrivDPI. However, for each case, the time consumed of Pine is less than PrivDPI, this is due to the following two reasons. The first is that a client in Pine does not need to perform the preprocessing protocol for the 3,000 rules. The second is that, the encryption of a token in PrivDPI mainly takes one multiplication in G, one exponentiation in G, and one AES encryption. While in Pine, the encryption of a token mainly takes one hash operation, one exponentiation in G, and one AES encryption. That is, the token encryption of Pine is faster than that of PrivDPI. Figure 7d shows the communication cost of one-round connection with 3,000 rules when the number of tokens range from 1,000 to 10,000. Similar to the running time, the communication costs of Pine and PrivDPI are both linear with the number of tokens, but Pine incurs less communication than PrivDPI. This is due to the additional communication cost of the preprocessing protocol in PrivDPI for 3,000 rules. How does the number of newly added rules influence the one-round connection? We test the running time and communication cost with 3,000 rules and 5,000 tokens when the number of newly added rules range from 300, to 3,000. Figure 7e shows that Pine takes less time than PrivDPI. For 3,000 newly added rules, Pine takes 424.96 ms, while PrivDPI takes 913.52 ms. That is, Pine is 53.48% faster than PrivDPI. Figure 7f shows that the communication cost of Pine is less than PrivDPI. In particular, the communication cost of Pine is independent of the number of newly added rules, while PrivDPI is linear with the number of newly added rules. This is because the client in Pine does not need to perform preprocessing protocol online.

7

Related Work

Our protocol is constructed based on BlindBox proposed by Sherry et al. [20] and PrivDPI proposed by Ning et al. [15], as was stated in the introduction. BlindBox introduces privacy-preserving deep packet inspection on encrypted traffic directly, while PrivDPI utilises an obfuscated rule generation mechanism with improved performance compared to BlindBox. Using the construction in BlindBox as the underlying component, Lan et al. [9] further proposed Embark that leverages on a trusted enterprise gateway to perform privacy-preserving detection in a cloud-based middlebox setting. In Embark, the enterprise gateway needs to be fully trusted and learns the content of the traffic and the detection rules, although in this case the client does not need to perform any operation as in our protocol. Our work focuses on the original setting of BlindBox and PrivDPI with further performance improvements, new properties and stronger privacy guarantee, while considering the practical enterprise gateway setting, in which the gateway needs not be fully trusted. Canard et al. [4] also proposed a protocol, BlindIDS, based on the concept of BlindBox, that has a better performance. The protocol consists of a token-matching mechanism that is based on pairing-based public key operation. Though practical, it is not compatible to TLS protocol.

Pine

19

Another related line of work focuses on accountability of the middlebox. This means the client and the server are aware of the middlebox that performs inspection on the encrypted traffic and are able to verify the authenticity of these middleboxes. Naylor et al. [14] first proposed such a scheme, termed mcTLS, where the existing TLS protocol is modified in order to achieve the accountability properties. However, Bhargavan et al. [3] showed that mcTLS can be tampered by an attacker to create confusion on the identity of the server that a middlebox is connected to, as well as the possibility for the attacker to inject its own data to the network. Due to this, a formal model on analyzing this type of protocols was proposed. Naylor et al. [13] further proposed a scheme, termed mbTLS, which does not modify the TLS protocol, thus allowing authentication of the middleboxes without needing to replace the existing TLS protocol. More recently, Lee et al. [10] proposed maTLS, a protocol that performs explicit authentication and verification of security parameters. There are also proposals that analyse encrypted traffic without decrypting or inspecting the encrypted payloads. Machine learning models were utilised to detect anomalies based on the meta data of the encrypted traffic. Anderson et al. [2] proposed such techniques for malware detection on encrypted Traffic. Trusted hardware has also been deployed for privacy-preserving deep packet inspection. Most of the proposals utilize the secure enclave of Intel SGX. The main idea is to give the trusted hardware, resided in the middlebox, the session key. These include SGX-Box proposed by Han et al. [8], SafeBricks by Poddar et al. [19] and ShieldBox by Trach et al. [21] and LightBox by Duan et al. [5]. We note that our work can be combined with the accountability protocols, as well as the machine learning based works to provide comprehensive encrypted inspection that encompasses authentication and privacy.

8

Conclusion

In this paper, we proposed Pine, a protocol that allows inspection of encrypted traffic in a privacy-preserving manner. Pine builds upon the settings of BlindBox and techniques of PrivDPI in a practical setting, yet enables hiding of rule sets from the middleboxes with significantly improved performance compared to the two prior works. Furthermore, the protocol allows lightweight rules addition on the fly, which to the best of our knowledge has not been considered previously. Pine utilises the common practical enterprise setting where clients establish connections to Internet servers via an enterprise gateway, in such a way that the gateway assists in establishing the encrypted rule sets without learning the content of the client’s traffic. At the same time, a middlebox inspects the encrypted traffic without learning both the underlying rules and content of the traffic. We demonstrated the improved performance of Pine over PrivDPI through extensive experiments. We believe Pine is a promising approach to detect malicious traffic amid growing privacy concerns for both corporate and individual users. Acknowledgments. This work is supported in part by Singapore National Research Foundation (NRF2018NCR-NSOE004-0001) and AXA Research Fund, in part by

20

J. Ning et al.

the National Natural Science Foundation of China (61822202, 61872089, 61902070, 61972094, 61825203, U1736203, 61732021), Guangdong Provincial Special Funds for Applied Technology Research and Development and Transformation of Key Scientific and Technological Achievements (2016B010124009), and Science and Technology Program of Guangzhou of China (201802010061), and in part by the National Research Foundation, Prime Ministers Office, Singapore under its Corporate Laboratory@University Scheme, National University of Singapore, and Singapore Telecommunications Ltd.

A

Correctness of Middlebox Searchable Encryption

On one hand, for every token that matches a rule r, the match should be detected with probability 1; on the other hand, for a token that does not match r, the probability of the match should be negligibly small. For every sufficiently large security parameter λ and any polynomial n(·) such that n = n(λ), for all t1 , ..., tn ∈ Mn , for each rule r ∈ M, for each index indi satisfying r = tindi and for each index indj satisfying r = tindj , let Exp1 (λ) and Exp2 (λ) be experiments defined as follows: Experiment Exp1 (λ): sk ← Setup(λ); (c1 , ..., cn ), salt ← TokenEnc(t1 , ..., tn , sk); er ← RuleEnc(r, sk); {indk }k∈[l] ← Match(er , (c1 , ..., cn ), salt) : indi ∈ {indk }k∈[l] Experiment Exp2 (λ): sk ← Setup(λ); (c1 , ..., cn ), salt ← TokenEnc(t1 , ..., tn , sk); er ← RuleEnc(r, sk); / {indk }k∈[l] {indk }k∈[l] ← Match(er , (c1 , ..., cn ), salt) : indj ∈ We have Pr [Exp1 (λ)] = 1, Pr [Exp2 (λ)] = negl(λ).

B B.1

Proofs Proof of Theorem 1

The security is proved via one hybrid, which replaces the random oracle with deterministic random values. In particular, the algorithm TokenEnc now is modified as follows: Hybrid.TokenEnc(t1 , ..., tn , sk): Let salt be a random salt. For i ∈ [n], sample a random value Ti in the ciphertext space and set ci = Ti . Finally, output (c1 , ..., cn ) and salt. The algorithm RuleEnc(r, sk) is defined to output a random value R from the ciphertext space with the restriction that: (1) if r equals ti for some ti , R is set to be Ti ; (2) for any future r such that r equals r , the output is set to be R. We have that the outputs of algorithm TokenEnc and algorithm RuleEnc are random, while the the pattern of matching between tokens and rules are preserved. Clearly, the distributions for S0 = {t0,1 , ..., t0,n }, S1 = {t1,1 , ..., t1,n } are the same. Hence, any PPT adversary has a change of distinguishing the two sets of exactly half.

Pine

B.2

21

Proof of Lemma 1

Fix a random R = g r , where r ∈ Zp . We have that the probability for R = Ri is the probability for ki = r − rw,i . Hence, for ∀R ∈ G, Pr[R = Ri ] = 1/p. B.3

Proof of Theorem 2

We construct a simulator for each of the parties, B1 for GW and B2 for MB. For the case when GW is corrupted, B1 needs to generate the view of the incoming messages for GW. The message that GW received is Ri for i ∈ [n]. To simulate Ri for rule ri , B1 chooses a random ui ∈ Zp , calculate Ui = g ui and sets Ui as the incoming message for rule ri which simulates the incoming message from MB to GW. Following Lemma 1, the distribution of the simulated incoming message for GW (i.e., Ui ) is indistinguishable from a real execution of the protocol. We next consider the case when MB is corrupted. The first and the third messages are the incoming message that MB received. For the first message, B2 randomly chooses a value v ∈ Zp , computes V = g v , and sets V as the first incoming message for MB. For the third message, B2 randomly chooses a value vi ∈ Zp for i ∈ [n], computes Vi = g vi and sets Vi as the incoming message during the third step of the protocol. The view of MB for a rule ri in an real execution of the protocol is (ki , Ri ; X, Xi ). The distributions of the real view and the simulated view are (ki , g rw,i +ki ; g x , g xrw,i +xki +xy ) and (ki , g rw,i +ki ; g v , g vi ). Clearly, a PPT adversary cannot distinguish these two distributions if DDH assumption holds. B.4

Proof of Theorem 3

We prove the security by one hybrid, where we replace the random oracle with random values. In particular, the modified algorithm Enc is described as follows: Enc(pk, sk, t): Let salt be a random salt. Sample a random value c∗ from the ciphertext space, and output c∗ and salt. Now we have that the output of algorithm Enc is random. The distributions for challenge tokens t0 and t1 are the same. Hence, there exists no PPT adversary that has a chance of distinguishing the two tokes of exactly half. B.5

Proof of Theorem 4

We proof the security via one hybrid, which replaces the output of PRF with a random value. In particular, during the challenge phase, the challenger chooses a random value v, computes R∗ = g v+k (where k is publicly known to the adversary), returns R∗ to the adversary. Clearly, if the adversary wins the security game, one can build a simulator that utilizes the ability of the adversary to break the pseudorandom property of PRF.

22

J. Ning et al.

References 1. Akinyele, J.A., et al.: Charm: a framework for rapidly prototyping cryptosystems. J. Cryptographic Eng. 3(2), 111–128 (2013). https://doi.org/10.1007/s13389-0130057-3 2. Anderson, B., Paul, S., McGrew, D.A.: Deciphering malware’s use of TLS (without decryption). J. Comput. Virol. Hacking Tech. 14(3), 195–211 (2018). https://doi. org/10.1007/s11416-017-0306-6 3. Bhargavan, K., Boureanu, I., Delignat-Lavaud, A., Fouque, P.A., Onete, C.: A formal treatment of accountable proxying over TLS. In: S&P 2018, pp. 339–356. IEEE Computer Society (2018) 4. Canard, S., Diop, A., Kheir, N., Paindavoine, M., Sabt, M.: BlindIDS: marketcompliant and privacy-friendly intrusion detection system over encrypted traffic. In: AsiaCCS 2017, pp. 561–574. ACM (2017) 5. Duan, H., Wang, C., Yuan, X., Zhou, Y., Wang, Q., Ren, K.: Lightbox: full-stack protected stateful middlebox at lightning speed. In: CCS 2019, pp. 2351–2367 (2019) 6. Goldreich, O.: Foundations of Cryptography: Volume 2, Basic Applications. Cambridge University Press (2009) 7. Google. HTTPS Encryption on the Web (2019). https://transparencyreport. google.com/https/overview?hl=en 8. Han, J., Kim, S., Ha, J., Han, D.: SGX-Box: enabling visibility on encrypted traffic using a secure middlebox module. In: APNet 2017, pp. 99–105. ACM (2017) 9. Lan, C., Sherry, J., Popa, R.A., Ratnasamy, S., Liu, Z.: Embark: securely outsourcing middleboxes to the cloud. In: NSDI 2016, pp. 255–273. USENIX Association (2016) 10. Lee, H., et al.: maTLS: how to make TLS middlebox-aware? In: NDSS 2019 (2019) 11. Meeker, M.: Internet trends (2019). https://www.bondcap.com/report/itr19/ 12. National Security Agency. Managing Risk From Transport Layer Security Inspection (2019). https://www.us-cert.gov/ncas/current-activity/2019/11/19/ nsa-releases-cyber-advisory-managing-risk-transport-layer-security 13. Naylor, D., Li, R., Gkantsidis, C., Karagiannis, T., Steenkiste, P.: And then there were more: secure communication for more than two parties. In: CoNEXT 2017, pp. 88–100. ACM (2017) 14. Naylor, D., et al.: Multi-context TLS (mcTLS): enabling secure in-network functionality in TLS. In: SIGCOMM 2015, pp. 199–212. ACM (2015) 15. Ning, J., Poh, G.S., Loh, J.C.N., Chia, J., Chang, E.C.: PrivDPI: privacypreserving encrypted traffic inspection with reusable obfuscated rules. In: CCS 2019, pp. 1657–1670 (2019) 16. O’Neill, M., Ruoti, S., Seamons, K.E., Zappala, D.: TLS inspection: how often and who cares? IEEE Internet Comput. 21(3), 22–29 (2017) 17. Paxson, V.: Bro: a system for detecting network intruders in real-time. Computer Netw. 31(23–24), 2435–2463 (1999) 18. McAfee Network Security Platform (2019). http://www.mcafee.com/us/products/ network-security-platform.aspx 19. Poddar, R., Lan, C., Popa, R.A., Ratnasamy, S.: SafeBricks: shielding network functions in the cloud. In: NSDI 2018, pp. 201–216. USENIX Association (2018) 20. Sherry, J., Lan, C., Popa, R.A., Ratnasamy, S.: BlindBox: deep packet inspection over encrypted traffic. In: SIGCOMM 2015, pp. 213–226 (2015) 21. Trach, B., Krohmer, A., Gregor, F., Arnautov, S., Bhatotia, P., Fetzer, C.: ShieldBox: secure middleboxes using shielded execution. In: SOSR 2018, pp. 2:1–2:14. ACM (2018)

Bulwark: Holistic and Verified Security Monitoring of Web Protocols Lorenzo Veronese1,2(B) , Stefano Calzavara1 , and Luca Compagna2 1

Universit` a Ca’ Foscari Venezia, Venezia, Italy [email protected] 2 SAP Labs France, Mougins, France

Abstract. Modern web applications often rely on third-party services to provide their functionality to users. The secure integration of these services is a non-trivial task, as shown by the large number of attacks against Single Sign On and Cashier-as-a-Service protocols. In this paper we present Bulwark, a new automatic tool which generates formally verified security monitors from applied pi-calculus specifications of web protocols. The security monitors generated by Bulwark offer holistic protection, since they can be readily deployed both at the client side and at the server side, thus ensuring full visibility of the attack surface against web protocols. We evaluate the effectiveness of Bulwark by testing it against a pool of vulnerable web applications that use the OAuth 2.0 protocol or integrate the PayPal payment system.

Keywords: Formal methods

1

· Web security · Web protocols

Introduction

Modern web applications often rely on third-party services to provide their functionality to users. The trend of integrating an increasing number of these services has turned traditional web applications into multi-party web apps (MPWAs, for short) with at least three communicating actors. In a typical MPWA, a Relying Party (RP) integrates services provided by a Trusted Third Party (TTP). Users interact with the RP and the TTP through a User Agent (UA), which is normally a standard web browser executing a web protocol. For example, many RPs authenticate users through the Single Sign On (SSO) protocols offered by TTPs like Facebook, Google or Twitter, and use Cashier-as-a-Service (CaaS) protocols provided by payment gateway services such as PayPal and Stripe. Unfortunately, previous research showed that the secure integration of thirdparty services is a non-trivial task [4,12,18,20–23]. Vulnerabilities might arise due to errors in the protocol specification [4,12], incorrect implementation practices at the RP [18,21,22] and subtle bugs in the integration APIs provided by the TTP [23]. To secure MPWAs, researchers proposed different approaches, L. Veronese—Now at TU Wien. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 23–41, 2020. https://doi.org/10.1007/978-3-030-58951-6_2

24

L. Veronese et al.

most notably based on run time monitoring [7,10,15,16,25]. The key idea of these proposals is to automatically generate security monitors allowing only the web protocol runs which comply with the expected, ideal run. Security monitors can block or try to automatically fix protocol runs which deviate from the expected outcome. In this paper, we take a retrospective look at the design of previous proposals for the security monitoring of web protocols and identify important limitations in the current state of the art. In particular, we observe that: 1. existing proposals make strong assumptions about the placement of security monitors, by requiring them to be deployed either at the client [7,15] or at the RP [10,16,25]. In our work we show that both choices are sub-optimal, as they cannot prevent all the vulnerabilities identified so far in popular web protocols (see Sect. 4); 2. most existing proposals are not designed with formal verification in mind. They can ensure that web protocol runs are compliant with the expected run, e.g., derived from the network traces collected in an unattacked setting, however they do not provide any guarantee about the actual security properties supported by the expected run itself [10,16,25]. Based on these observations, we claim that none of the existing solutions provides a reliable and comprehensive framework for the security monitoring of web protocols in MPWAs. Contributions. In this paper, we contribute as follows: 1. we perform a systematic, comprehensive design space analysis of previous work and we identify concrete shortcomings in all existing proposals by considering the popular OAuth 2.0 protocol and the PayPal payment system as running examples (Sect. 4); 2. we present Bulwark, a novel proposal exploring a different point of the design space. Bulwark generates formally verified security monitors from applied picalculus specifications of web protocols and lends itself to the appropriate placement of such monitors to have full visibility of the attack surface, while using modern web technologies to support an easy deployment. This way, Bulwark reconciles formal verification with practical security (Sect. 5); 3. we evaluate the effectiveness of Bulwark by testing it against a pool of vulnerable web applications that use the OAuth 2.0 protocol or integrate the PayPal payment system. Our analysis shows that Bulwark is able to successfully mitigate attacks on both the client and the server side (Sect. 6).

2

Motivating Example

As motivating example, shown in Fig. 1, we selected a widely used web protocol, namely OAuth 2.0 in explicit mode, which allows a RP to leverage a TTP for authenticating a user operating a UA.1 The protocol starts (step 1) with the UA 1

The protocol in the figure closely follows the Facebook implementation; details might slightly vary for different TTPs.

Holistic and Verified Security Monitoring of Web Protocols

25

Fig. 1. Motivating example: Facebook OAuth 2.0 explicit mode

visiting the RP’s login page. A login button is provided back that, when clicked, triggers a request to the TTP (steps 2–3). Such a request comprises: client id, the identifier registered for the RP at the TTP; reduri, the URI at RP to which the TTP will redirect the UA after access has been granted; and state, a freshly generated value used by the RP to maintain session binding with the UA. The UA authenticates with the TTP (steps 4–5), which in turn redirects the UA to the reduri at RP with a freshly generated value code and the state value (steps 6–7). The RP verifies the validity of code in a back channel exchange with the TTP (steps 8–9): the TTP acknowledges the validity of code by sending back a freshly generated token indicating the UA has been authenticated. Finally, the RP confirms the successful authentication to the UA (step 10). Securely implementing such a protocol is far from easy and many vulnerabilities have been reported in the past. We discuss below two representative attacks with severe security implications. Session Swapping [21]. Session swapping exploits the lack of contextual binding between the login endpoint (step 2) and the callback endpoint (step 7). This is often the case in RPs that do not provide a state parameter or do not strictly validate it. The attack starts with the attacker signing in to the TTP and obtaining a valid code (step 6). The attacker then tricks an honest user, through CSRF, to send the attacker’s code to the RP, which makes the victim’s UA authenticate at the RP with the attacker’s identity. From there on, the attacker can track the activity of the victim at the RP. The RP can prevent this attack by checking that the value of state at step 7 matches the one that was generated at step 2. The boxed shapes around state represent this invariant in Fig. 1.

26

L. Veronese et al.

Unauthorized Login by Code Redirection [4,14]. Code (and token) redirection attacks exploit the lack of strict validation of the reduri parameter and involve its manipulation by the attacker. The attacker crafts a malicious page which fools the victim into starting the protocol flow at step 3 with valid client id and state from an honest RP, but with a reduri that points to the attacker’s site. The victim then authenticates at the TTP and is redirected to the attacker’s site with the code value. The attacker can then craft the request at step 7 with the victim’s code to obtain the victim’s token (step 9) and authenticate as her at the honest RP. The TTP can prevent this attack by (i) binding the code generated at step 6 to the reduri received at step 3, and (ii) checking, at step 8, that the received code is correctly bound to the supplied reduri. The rounded shapes represent this invariant in Fig. 1.

3

Related Work

We review here existing approaches to the security monitoring of web protocols, focusing in particular on their adoption in modern MPWAs. Each approach can be classified based on the placement of the proposed defense, i.e., we discriminate between client-side and server-side approaches. We highlight here that none of the proposed approaches is able to protect the entire attack surface of web protocols. Moreover, none of the proposed solutions, with the notable exception of WPSE [7], is designed with formal verification in mind and provides clear, precise guarantees about the actual security properties satisfied by the enforced policy. 3.1

Client-Side Defenses

WPSE [7] extends the browser with a security monitor for web protocols that enforces the intended protocol flow, as well as the confidentiality and the integrity of messages. This monitor is able to mitigate many vulnerabilities found in the literature. The authors, however, acknowledge that some classes of attack cannot be prevented by WPSE. In particular, network attacks (like the HTTP variant of the IdP mix-up attack [12]), attacks that do not deviate from the intended protocol flow (like the automatic login CSRF from [4]) and purely server-side attacks are out of scope. OAuthGuard [15] is a browser extension that aims to prevent five types of attacks on OAuth 2.0 and OpenID Connect, including CSRF and impersonation attacks. OAuthGuard essentially shares the same limitations of WPSE, due to the same partial visibility of the attack surface (the client side). Recently Google has shown interest in extending its Chrome browser to monitor SSO protocols,2 however their solution deals with a specific attack against their own implementation of SAML and is not a general approach designed to protect other protocols or TTPs. 2

https://gsuiteupdates.googleblog.com/2018/04/more-secure-sign-in-chrome.html.

Holistic and Verified Security Monitoring of Web Protocols

3.2

27

Server-Side Defenses

InteGuard [25] focuses on the server side of the RP, as its code appears to be more error-prone than that of the TTP. InteGuard is deployed as a proxy in front of the RP that checks invariants within the HTTP messages before they reach the web server. Different types of invariants are automatically generated from the network traces of SSO and CaaS protocols and enable the monitor to link together multiple HTTP sessions in transactions. Thanks to its placement, InteGuard can also monitor back channels (cf. steps 8–9 of Fig. 1). The authors explicitly mention that InteGuard does not offer protection on the TTP, expecting further efforts to be made in that direction. Unfortunately, several attacks can only be detected at the TTP, e.g., some variants of the unauthorized login by auth. code (or token) redirection attack from [4]. AEGIS [10] synthesizes run time monitors to enforce control-flow and dataflow integrity, authorization policies and constraints in web applications. The monitors are server-side proxies generated by extracting invariants from a set of input traces. AEGIS was designed for traditional two-party web applications, hence it does not offer comprehensive protection in the multi-party setting, e.g., due to its inability to monitor messages exchanged on the back channels. However, as mentioned by the authors, it can still mitigate those vulnerabilities which can be detected on the front channel of the RP (e.g., the shop-for-free TomatoCart example [10]). Similar considerations apply to BLOCK [16], a black-box approach for detecting state violation attacks, i.e., attacks which exploit logic flaws in the application to allow some functionality to be accessed at inappropriate states. Guha et al. [13] apply a static control-flow analysis for JavaScript code to construct a request-graph, a model of a well-behaved client as seen by the server application. They then use this model to build a reverse proxy that blocks the requests that violate the expected control flow of the application, and are thus marked as potential attacks. Also this approach was designed for two-party web applications, hence does not offer holistic protection in the multi-party setting. Moreover, protection can only be enforced on web applications which are entirely developed in JavaScript.

4

Design Space Analysis

Starting from our analysis of related work, we analyze the design space of security monitors for web protocols, discussing pros and cons along different axes. Our take-away message is that solutions which assume a fixed placement of a single security monitor, which is the path taken by previous work, are inherently limited in their design for several reasons. 4.1

Methodology

We consider three possible deployment options for security monitors: the first two are taken from the literature, while the last one is a novel proposal we make. In particular, we focus on:

28

L. Veronese et al.

1. browser extensions [7,15]: a browser extension is a plugin module, typically written in JavaScript, that extends the web browser with custom code with powerful capabilities on the browser internals, e.g., arbitrarily accessing the cookie jar and monitoring network traffic; 2. server-side proxies [10,16,25]: a proxy server acts as an intermediary sitting between the web server hosting (part of) the web application and the clients that want to access it; 3. service workers: the Service Worker API3 is a new browser functionality that enables websites to define JavaScript workers running in the background, decoupled from the web application logic. Service workers provide useful features normally not available to JavaScript, e.g., inspecting HTTP requests before they leave the browser, hence are an intriguing deployment choice for client-side security monitors. We evaluate these options with respect to four axes, originally proposed as effective criteria for the analysis of web security solutions [8]: 1. ease of deployment: the practicality of a large-scale deployment of the defense mechanism, i.e., the price to pay for site operators to grant security benefits; 2. usability: the impact on the end-user experience, e.g., whether the user is forced to take actions to activate the protection mechanism; 3. protection: the effectiveness of the defense mechanism, i.e., the supported and unsupported security policies; 4. compatibility: the precision of the defense mechanism, i.e., potential false positives and breakages coming from security enforcement. 4.2

Ease of Deployment and Usability

Service workers are appealing, since they score best with respect to ease of deployment and usability. Specifically, the deployment of a service worker requires site operators to just add a JavaScript file to the web application: when a user visits the web application, the installation of the service worker is transparently performed with no user interaction. Server-side proxies similarly have the advantage of ensuring transparent protection to end users. However, they are harder to deploy than service workers, as they require site operators to have control over the server networking. Even if site operators just needed to apply small modifications to the monitored application, they would have to reroute the inbound/outbound traffic to the proxy. This is typically easy for the TTP, which is usually a major company with full control over its deployment, but it can be impossible for some RPs. RPs are sometimes deployed on managed hosting platforms that may not allow any modification on the server itself, except for the application code. Note that site operators could implement the logic of the proxy directly in the application code, but this solution is impractical, since it would require a significant rewriting of the web 3

https://developer.mozilla.org/en-US/docs/Web/API/Service Worker API.

Holistic and Verified Security Monitoring of Web Protocols

29

application logic. This is also particularly complicated when the web application is built on top of multiple programming languages and frameworks. Finally, browser extensions are certainly the worst solution with respect to usability and ease of deployment. Though installing a browser extension is straightforward, site operators cannot assume that every user will perform this manual installation step. In principle, site operators could require the installation of the browser extension to access their web application, but this would have a major impact on usability and could drive users away. Users’ trust in browser extensions is another problem on its own: extensions, once installed and granted permissions, have very powerful capabilities on the browser internals. Moreover, the extension should be developed for the plethora of popular browsers which are used nowadays. Though major browsers now share the same extension architecture, many implementation details are different and would force developers to release multiple versions of their extensions, which complicates deployment. In the end, installing an extension is feasible for a single user, but relying on browser extensions for a large-scale security enforcement is unrealistic. 4.3

Protection and Compatibility

We study protection and compatibility together, since the visibility of the attack surface is the key enabler of both protection and compatibility. Indeed, the more the monitor has visibility of the protocol messages, the more it becomes able to avoid both false positives and false negatives in detecting potential attacks. We use the notation P1 ↔ P2 to indicate the channel between two parties P1 and P2 . If a monitor has visibility over a channel, then the monitor has visibility over all the messages exchanged on that channel. Visibility. Browser extensions run on the UA and can have visibility of all the messages channeled through it. In particular, browser extensions can request host permissions upon installation to get access to the traffic exchanged with arbitrary hosts, which potentially enables them to inspect and edit any HTTP request and response relayed through the UA. In MPWAs, the UA can thus have visibility over both the channels UA ↔ RP and UA ↔ TTP (shortly indicated as UA ↔ {RP, TTP}). However, the UA itself is not in the position to observe the entire attack surface against web protocols: for example, when messages are sent on the back channel between the RP and the TTP (RP ↔ TTP) like in our motivating example (steps 8–9 in Fig. 1), an extension is unable to provide any protection, as the UA is not involved in the communication at all. Server-side proxies can be categorized into reverse proxies and forward proxies, depending on whether they monitor incoming or outgoing HTTP requests respectively (plus their corresponding HTTP responses). Both approaches are useful and have been proposed in the literature, e.g., InteGuard [25] uses both a reverse proxy and a forward proxy at the RP to capture messages from the UA and to the TTP respectively. This way, InteGuard has full visibility of all the messages flowing through the RP, i.e., RP ↔ {UA, TTP}. However, this is still

30

L. Veronese et al.

not sufficient to fully monitor the attack surface. In particular, server-side proxies cannot inspect values that never leave the UA, like the fragment identifier, which instead plays an important role in the implicit flow of OAuth 2.0. Finally, web applications can register service workers at the UA by means of JavaScript. Service workers can act as network proxies with access to all the traffic exchanged between the UA and the origin4 which registered them. This way, service workers have the same visibility of a reverse-proxy sitting at the server; however, since they run on the UA, they also have access to values which never leave the client, like the fragment identifier. Despite this distinctive advantage, service workers are severely limited by the Same Origin Policy (SOP). In particular, they cannot monitor traffic exchanged between the UA and other origins, which makes them less powerful than browser extensions. For example, contrary to browser extensions like WPSE [7], a single service worker cannot monitor and defend both the RP and the TTP. This limitation can be mitigated by using multiple service workers and/or selectively relaxing SOP using CORS, which however requires collaboration between the RP and the TTP. Since service workers are more limited than browser extensions, they also share their inability to monitor back channels, hence they cannot be a substitute for forward proxies. In the end, we conclude that browser extensions and server-side proxies are complementary in their ability to observe security-relevant protocol components, given their respective positioning, while service workers are strictly less powerful than their alternatives. Cross-Site Scripting (XSS). XSS is a dangerous vulnerability which allows an attacker to inject malicious JavaScript code in a benign web application. Once a web application suffers from XSS, most confidentiality and integrity guarantees are lost, hence claiming security despite XSS is wishful thinking. Nevertheless, we discuss here how browser extensions can offer better mitigation than service workers in presence of XSS vulnerabilities. Specifically, since service workers can be removed by JavaScript, an attacker who was able to exploit an XSS vulnerability would also be able to void the protection offered by service workers. This can be mitigated by defensive programming practices, e.g., overriding the functions required for removing service workers, but it is difficult to assess both the correctness and the compatibility impact of this approach. For example, the deactivation of service workers might be part of the legitimate functionality of the web application or the XSS could be exploited before security-sensitive functions are overridden. Browser extensions, instead, cannot be removed by JavaScript and are potentially more robust against XSS. For example, WPSE [7] replaces secret values with random placeholders before they actually enter the DOM, so that secrets exchanged in the monitored protocol cannot be exfiltrated via XSS; the placeholders are then replaced with the intended values only when they leave the browser towards authorized parties. 4

An origin is a triple including a scheme (HTTP, HTTPS, ...), a host (www.foo.com) and a port (80, 443, ...). Origins represent the standard web security boundary.

Holistic and Verified Security Monitoring of Web Protocols

31

Table 1. Attacks on OAuth 2.0 and PayPal Attack

Channels to observe UA RP TTP ext sw proxy sw proxy OAuth 2.0

1

307 Redirect attack [12]

UA ↔ TTP



×

×





2

Access token eavesdropping [21]

UA ↔ RP







×

×

3

Code/State Leakage via referer header [11, 12]

UA ↔ RP







×

×

4

Code/Token theft via XSS [21]

UA ↔ RP



×

×

×

×

5

Cross Social-Network Request Forgery [4]

UA ↔ RP







×

×

6

Facebook implicit AppId Spoofing [20, 23]

UA ↔ TTP

×

×

×





7

Force/Automatic login CSRF [4, 21]

UA ↔ RP







×

×

8

IdP Mix-Up attack [12] (HTTP variant)

UA ↔ RP

×

×



×

×

9

IdP Mix-Up attack [12] (HTTPS variant)

UA ↔ RP







×

×

10 Naive session integrity attack [12] 11 Open Redirector in OAuth 2.0 [14, 17] 12 Resource Theft by Code/Token Redirection [4, 7] 13 Session swapping [14, 21]

UA ↔ RP







×

×

UA ↔ {RP, TTP}











UA ↔ TTP



×

×

×



UA ↔ RP







×

× ×

UA ↔ RP







×

UA ↔ TTP



×

×





UA ↔ RP





×

×

×

UA ↔ TTP



×

×

×



18 NopCommerce gross change in IPN callback [22]

RP ↔ {UA,TTP}

×

×



×

×

19 NopCommerce gross change in PDT flow [22]

RP ↔ {UA,TTP}

×

×



×

×

20 Shop for free by malicious PayeeId replay [18, 20] RP ↔ {UA,TTP}

×

×



×

×

×

×



×

×

14 Social login CSRF on stateless clients [4, 14] 15 Social login CSRF through TTP Login CSRF [4] 16 Token replay implicit mode [14, 20, 24] 17 Unauth. Login by Code Redirection [4, 14] PayPal

21 Shop for less by Token replay [18, 20]

UA ↔ RP

Tamper Resistance. Since both browser extensions and service workers are installed on the client, they can be tampered with or uninstalled by malicious users or software. This means that the defensive practices put in place by browser extensions and service workers are voided when the client cannot be trusted. This is particularly important for applications like CaaS, where malicious users might be willing to abuse the payment system, e.g., to shop for free. Conversely, serverside proxies are resilient by design to this kind of attacks, since they cannot be accessed at the client side, hence they are more appropriate for web applications where the client cannot be trusted to any extent. Assessment on MPWAs. We now substantiate our general claims by means of a list of known attacks against the OAuth 2.0 protocol and the PayPal payment system, two popular protocols in MPWAs. Table 1 shows this list of attacks. For each attack, we show which channels need to be visible to detect the attack and we conclude whether the attack can be prevented by a browser extension (ext), a service worker (sw ) or a server-side proxy (proxy) deployed on either the RP or the TTP. In general we can see that, in the OAuth 2.0 setting, a browser extension is the most powerful tool, as it can already detect and block by itself most of the attacks (15 out of 17). The exceptions are the Facebook implicit AppId Spoofing

32

L. Veronese et al.

attack [20,23], which can only be detected at the TTP, and the HTTP variant of the IdP Mix-Up attack [12], which is a network attack not observable at the client. Yet, remarkably, a comparable amount of protection can be achieved by using just service workers alone: in particular, the use of service workers at both the RP and the TTP can stop 13 out of 17 attacks. The only notable differences over the browser extension approach are that: (i) the code/token theft via XSS cannot be prevented, though we already discussed that even browser extensions can only partially mitigate the dangers of XSS, and (ii) the resource theft by code/token redirection and the unauthorized login by auth. code redirection cannot be stopped, because they involve a cross-origin redirect that service workers cannot observe by SOP. Remarkably, the combination of service workers and server-side proxies offers transparent protection that goes beyond browser extensions alone: 16 out of 17 attacks are blocked, with the only exception of code/token theft via XSS as explained. The PayPal setting shows a very different trend with respect to OAuth 2.0. Although it is possible to detect the attacks on the client side, it is not safe to do so because both browser extensions and service workers can be uninstalled by malicious customers. For example, such client-side approaches cannot prevent the shop for free attack of [18], where a malicious user replaces the merchant id with her own account id. Moreover, it is worth noticing that PayPal deliberately makes heavy use of back channels (RP ↔ TTP), since messages which are not relayed by the browser cannot be tampered with by malicious customers. This means that server-side proxies are the way to go to protect PayPal-like payment systems, as confirmed by the table. 4.4

Take-Away Messages

Here we highlight the main take-away messages of our design space analysis. In general, we claim that different web protocols require different protection mechanisms, hence every defensive solution which is bound to a specific placement of monitors does not generalize. More specifically: – A clear total order emerges on the ease of deployment and usability axes. Service workers score best there, closely followed by server-side proxies, whose deployment is still feasible and transparent to end-users. Browser extensions are much more problematic, especially for large-scale security enforcement. – With respect to the protection and compatibility axes, browser extensions are indeed a powerful tool, yet they can be replaced by a combination of service workers and server-side proxies to enforce transparent protection, extended to attacks which are not visible at the client alone. In the end, we argue that a combination of service workers and server-side proxies has the potential to reconcile security, compatibility, ease of deployment and usability. In our approach, described in the next section, we thus pursue this research direction.

Holistic and Verified Security Monitoring of Web Protocols Ideal Specification

Monitored Specification

Monitor Generation ProVerif (WebSpi)

a2msw (...)

UA RP TTP ... Protocol Specification ProVerif (WebSpi)

Monitor Generation Functions

Abstract Monitor(s) ServiceWorker

33

Deployment Verification

RPsw T T Psw ... Monitor Code Generation

ServiceWorker Monitor(s)

a2mp (...)

RPp T T Pp ...

Abstract Monitor(s) Proxy

Proxy Monitor(s)

Fig. 2. Monitor generation pipeline

5

Proposed Approach: Bulwark

In this section we present Bulwark,5 our formally verified approach to the holistic security monitoring of web protocols. For space reasons, we present an informal overview and we refer to the online technical report for additional details [2]. 5.1

Overview

Bulwark builds on top of ProVerif, a state-of-the-art protocol verification tool [5]. ProVerif was originally designed for traditional cryptographic protocols, not for web protocols, but previous work showed how it can be extended to the web setting by using the WebSpi library [4]. In particular, WebSpi provides a ProVerif model of a standard web browser and includes support for important threats from the web security literature, e.g., web attackers, who lack traditional Dolev-Yao network capabilities and attack the protocol through a malicious website. Bulwark starts from a ProVerif model of the web protocol to protect, called ideal specification, and generates formally verified security monitors deployed as service workers or server-side proxies. This process builds on an intermediate step called monitored specification. The workflow is summarized in Fig. 2. To explain the intended use case of Bulwark, we focus on the typical setting of a multi-party web application including a TTP which offers integration to a set of RPs, yet the approach is general. The TTP starts by writing down its protocol in ProVerif, expressing the intended security properties by means of standard correspondence assertions (authentication) and (syntactic) secrecy queries supported by ProVerif. For example, the code/token redirection attack against OAuth 2.0 (cf. Sect. 2) can be discovered through the violation of a correspondence assertion [4]. The protocol can then be automatically verified for 5

Bulwark is currently proprietary software at SAP: the tool could be made available upon request and an open-source license is under consideration.

34

L. Veronese et al.

security violations and the TTP can apply protocol fixes until ProVerif does not report any flaw. Since ProVerif is a sound verification tool [6], this process eventually leads to a security proof for an unbounded number of protocol sessions, up to the web model of WebSpi. The WebSpi model, although expressive, is not a complete model of the Web [4]. For example, it does not model advanced security headers such as Content-Security-Policy, frames and frame communication (postMessage). However, the library models enough components of the modern Web to be able to capture all the attacks of Table 1. Once verification is done, the TTP can use Bulwark to automatically generate security monitors for its RPs from the ideal specification, e.g., to complement the traditional protocol SDK that the TTP normally offers anyway with protection for RPs, which are widely known to be the buggiest participants [25]. The TTP could also decide to use Bulwark to generate its own security monitors, so as to be protected even in the case of bugs in its own implementation. 5.2

Monitored Specification

In the monitored specification phase, Bulwark relaxes the ideal assumption that all protocol participants are implemented correctly. In particular, user-selected protocol participants are replaced by inattentive variants which comply with the protocol flow, but forget relevant security checks. Technically, this is done by replacing the ProVerif processes of the participants with new processes generated by automatically removing from the honest participants all the security checks (pattern matches, get/insert and conditionals) on the received messages, which include the invariants represented by the boxed checks in Fig. 1. This approximates the possible mistakes made by honest-but-buggy participants, obtaining processes that are interoperable with the other participants, but whose possible requests and responses are a super set of the original ones. Intuitively, an inattentive participant may be willing to install a monitor to prevent attackers from exploiting the lack of forgotten security checks. On the other hand, a deliberately malicious participant has no interest in doing so. Then, Bulwark extracts from the ideal specification all the security invariant checks forgotten by the inattentive variants of the protocol participants and centralizes them within security monitors. This is done by applying two functions a2mp and a2msw , which derive from the participant specifications new ProVerif processes encoding security monitors deployed as a server-side proxy or a service worker respectively. The a2mp function is a modified version of the a2m function of [19], which generates security monitors for cryptographic protocols. The proxy interposes and relays messages from/to the monitored inattentive participant, after performing the intended security checks. A subtle point here is that the monitor needs to keep track of the values that are already in its knowledge and those which are generated by the monitored participant and become known only after receiving them. A security check can only be executed when all the required values are part of the monitor knowledge. The a2msw function, instead, is defined on top of a2mp and the ideal U A process. This recalls that a service worker is a client-side defense that acts as a

Holistic and Verified Security Monitoring of Web Protocols

35

reverse proxy: a subset of the checks of both the server and the client side can be encoded into the final process running on the client. The function has three main responsibilities: (i) rewriting the proxy to be compatible with the service worker API; (ii) removing the channels and values that a service worker is not able to observe; (iii) plugging the security checks made by the ideal U A process into the service worker. Example 1. Figure 3 illustrates how the RP of our OAuth 2.0 example from Sect. 2 is replaced by I(RP ), an inattentive variant in which the state parameter invariant is not checked. The right-hand side of the figure presents the RP as in Fig. 1 with a few more details on its internals, according to the ideal specification: (i) upon reception of message 1, the RP issues a new value for state and saves it together with the U A session cookie identifier (i.e., state is bound to the client session sid(UA)); and (ii) upon reception of message 7, the RP checks the state parameter and its binding to the client session. The inattentive version I(RP ), as generated by Bulwark, is shown on the left-hand side of Fig. 3: the state is neither saved by I(RP ) nor checked afterward. The left-hand side of the figure also shows the proxy M (RP ) generated by Bulwark as a2mp (RP ) to enforce the state parameter invariant at RP . We can see that the saving and the checking of state are performed by M (RP ). It is worth noticing that the M (RP ) can only save state upon reception of message 2 from I(RP ). The service worker monitor a2msw (RP ) would look very similar to M (RP ).

Fig. 3. Monitor invariant example

Finally, Bulwark produces a monitored specification where each inattentive protocol participant deploys a security monitor both at the client side (service worker) and at the server side (proxy). However, this might be overly conservative, e.g., a single service worker might already suffice for security. To optimize ease of deployment, Bulwark runs again ProVerif on the possible monitor deployment options, starting from the most convenient one, until it finds a setting which

36

L. Veronese et al.

satisfies the security properties of the ideal specification. As an example, consider the system in which the only inattentive participant is the RP . There are three possible options, in decreasing order of ease of deployment: 1. T T P  I(RP )  (a2msw (RP, U A)  U A), where the monitor is deployed as a service worker registered by the RP at the U A; 2. T T P  (I(RP )  a2mp (RP ))  U A, where the monitor is a proxy at RP ; 3. T T P  (I(RP )  a2mp (RP ))  (a2msw (RP, U A)  U A), with both. The first option which passes the ProVerif verification is chosen by Bulwark. 5.3

Monitor Generation

Finally, Bulwark translates the ProVerif monitor processes into real service workers (written in JavaScript) or proxies (written in Python), depending on their placement in the monitored specification. This is a relatively direct one-to-one translation, whose key challenge is mapping the ProVerif messages to the real HTTP messages exchanged in the web protocol. Specifically, different RPs integrating the same TTP will host the protocol at different URLs and each TTP might use different names for the same HTTP parameters or rely on different message encodings (JSON, XML, etc.). Bulwark deals with this problem by means of a configuration file, which drives the monitor generation process by defining the concrete values of the symbols and data constructors that are used by the ProVerif model. When the generated monitor needs to apply e.g., a data destructor on a name, it searches the configuration file for its definition and calls the corresponding function that deconstructs the object into its components. Since data constructors/destructors are directly written in the target language as part of the monitor configuration, different implementations can be generated for the same monitor, so that a single monitor specification created for a real-world participant e.g., the Google TTP, can be easily ported to others, e.g., the Facebook TTP, just by tuning their configuration files.

6 6.1

Experimental Evaluation Methodology

To show Bulwark at work, we focus on the core MPWA scenarios discussed in Sect. 4.3. We first write ideal specifications of the OAuth 2.0 explicit protocol and the PayPal payment system in ProVerif + WebSpi. We also define appropriate correspondence assertions and secrecy queries which rule out all the attacks in Table 1 and we apply known fixes until ProVerif is able to prove security for the ideal specifications. Then, we setup a set of case studies representative of the key vulnerabilities plaguing these scenarios (see Table 2). In particular, we selected vulnerabilities from Table 1 so as to evaluate Bulwark on both the RP and TTP via a combination of proxy and service worker monitors. For each case study, we

Holistic and Verified Security Monitoring of Web Protocols

37

choose a set of inattentive participants and we collect network traces to define the Bulwark configuration files mapping ProVerif messages to actual protocol messages. Finally, we use Bulwark to generate appropriate security monitors and deploy them in our case studies. All our vulnerable case studies, their ideal specifications, and the executable monitors generated by Bulwark are provided as an open-source package to the community [1]. Case Studies. We consider a range of possibilities for OAuth 2.0. We start from an entirely artificial case study, where we develop both the RP and the TTP, introducing known vulnerabilities in both parties (CS1). We then consider integration scenarios with three major TTPs, i.e., Facebook, VK and Google, where we develop our own vulnerable RPs on top of public SDKs (CS2-CS4). Finally, we consider a case study where we have no control of any party, i.e., the integration between Overleaf and Google (CS5). We specifically choose this scenario, since the lack of the state parameter in the Overleaf implementation of OAuth 2.0 introduces known vulnerabilities.6 To evaluate the CaaS scenario, we select legacy versions of three popular e-commerce platforms, suffering from known vulnerabilities in their integration with PayPal, in particular osCommerce 2.3.1 (CS6), NopCommerce 1.6 (CS7) and TomatoCart 1.1.15 (CS8). Evaluation Criteria. We evaluate each case study in terms of four key aspects: (i) security: we experimentally confirm that the monitors stop the exploitation of the vulnerabilities; (ii) compatibility: we experimentally verify that the monitors do not break legitimate protocol runs; (iii) portability: we assess whether our ideal specifications can be used without significant changes across different case studies; and (iv) performance: we show that the time spent to verify the protocol and generate the monitors is acceptable for practical use. Table 2. Test set of vulnerable applications

6

CS

RP

TTP

Protocol

Vuln. (Table 1)

1

artificial RP 1

artificial IdP

OAuth 2.0 explicit

#13 #17

2

artificial RP 2

facebook.com

OAuth 2.0 exp. (graph-sdk 5.7)

#13

3

artificial RP 3

vk.com

OAuth 2.0 exp. (vk-php-sdk 5.100)

#13

4

artificial RP 4

google.com

OAuth 2.0 exp. (google/apiclient 2.4)

#13

5

overleaf.com

google.com

OAuth 2.0 explicit

#13 #14

6

osCommerce 2.3.1

paypal.com

PayPal Standard

#18 #20

7

NopCommerce 1.6

paypal.com

PayPal Standard

#18

8 TomatoCart 1.1.15

paypal.com

PayPal Standard

#21

We responsibly disclosed the issue to Overleaf and they fixed it before publication.

38

6.2

L. Veronese et al.

Experimental Results

The evaluation results are summarized in Table 3 and discussed below. In our case studies, we considered as inattentive participants all the possible sets of known-to-be vulnerable parties, leading to 10 experiments; when multiple experiments can be handled by a single run of Bulwark, their results are grouped together in the table, e.g., the experiments for CS2-CS4. Notice that for CS1 we considered three sets of inattentive participants: only TTP (vulnerability #17); only RP (vulnerability #13); and both RP and TTP (both vulnerabilities). Hence, we have 3 experiments for CS1, 3 experiments for CS2-CS4, 1 experiment for CS5 and 3 experiments for CS6-CS8. Table 3. Generated monitors and run-time Monitor

Gen. Monitors

Ideal Verification Inattentive Verification CS Spec.

1 2 3 4 5 6 7 8

IS1

Time

29m

Parties

RP

TTP

time #verif. sw proxy sw proxy

Prevented Vuln.

TTP

41m

2

×

×

×



RP

15m

1



×

×

×

#13

RP,TTP

54m

3



×

×



#13 #17

#17

IS1

27m

RP

18m

1



×

×

×

#13

IS1*

19m

RP

17m

1



×

×

×

#13 #14

3m

RP

8m

1

×



×

×

#18 #20 #21

IS2

Security and Compatibility. To assess security and compatibility, we created manual tests to exploit each vulnerability of our case studies and we ran them with and without the Bulwark generated monitors. In all the experiments, we confirmed that the known vulnerabilities were prevented only when the monitors were deployed (security) and that we were able to complete legitimate protocol runs successfully both with and without the monitors (compatibility). Based on Table 3, we observe that 5 experiments can be secured by a service worker alone, 4 experiments can be protected by a server-side proxy and only one experiment needed the deployment of two monitors. This heterogeneity confirms the need of holistic security solutions for web protocols. Portability. We can see that the ideal specification IS1 created for our first case study CS1 is portable to CS2-CS4 without any change. This means that different TTPs supporting the OAuth 2.0 explicit protocol like Facebook, VK and Google can use Bulwark straightaway, by just tuning the configuration file to their settings. This would allow them to protect their integration scenarios with RPs that (like ours) make use of the state parameter. This is interesting, since different TTPs typically vary on a range of subtle details, which are all accounted for correctly by the Bulwark configuration files. However, the state

Holistic and Verified Security Monitoring of Web Protocols

39

parameter is not mandatory in the OAuth2 standard and thus TTPs tend to allow integration also with RPs that do not issue it. Case study CS5 captures this variant of OAuth 2.0: removing the state parameter from IS1 is sufficient to create a new ideal specification IS1*, which enables Bulwark towards these scenarios as well. As to PayPal, the ideal specification IS2 is portable to all the case studies CS6-CS8. Overall, our experience indicates that once an ideal specification is created for a protocol, then it is straightforward to reuse it on other integration scenarios based on the same protocol. Performance. We report both the time spent to verify the ideal specification (Verification Time) as well as the time needed to verify the monitors (Monitor Verification). Both steps are performed offline and just once, hence the times in the table are perfectly fine for practical adoption. Verifying the ideal specification never takes more than 30 min, while verifying the monitors might take longer, but never more than one hour in our experiments. The time spent in the latter step depends on how many runs of ProVerif are required to reach a secure monitored specification (see the very end of Sect. 5.2). For example, the first experiment runs ProVerif twice (cf. #verif.) and requires 41 min, while the second experiment runs ProVerif just once and thus takes only 15 min.

7

Conclusion

In this paper we identified shortcomings in previous work on the security monitoring of web protocols and proposed Bulwark, the first holistic and formally verified defensive solution in this research area. Bulwark combines state-of-theart protocol verification tools (ProVerif) with modern web technologies (service workers) to reconcile formal verification with practical security. We showed that Bulwark can generate effective security monitors on different case studies based on the OAuth 2.0 protocol and the PayPal payment system. As future work, we plan to extend Bulwark to add an additional protection layer, i.e., on client-side communication based on JavaScript and the postMessage API. This is important to support modern SDKs making heavy use of these technologies, like the latest versions of the PayPal SDKs, yet challenging given the complexity of sandboxing JavaScript code [3]. On the formal side, we would like to strengthen our definition of “inattentive” participant to cover additional vulnerabilities besides missing invariant checks. For example, we plan to cover participants who forget to include relevant security headers and are supported by appropriately configured monitors in this delicate task. Finally, we would like to further engineer Bulwark to make it easier to use for people who have no experience with ProVerif, e.g., by including support for a graphical notation which is compiled into ProVerif processes, similarly to the approach in [9]. Acknowledgments. Lorenzo Veronese was partially supported by the European Research Council (ERC) under the European Unions Horizon 2020 research (grant agreement No. 771527-BROWSEC).

40

L. Veronese et al.

References 1. Bulwark case studies. https://github.com/secgroup/bulwark-experiments 2. Bulwark: holistic and verified security monitoring of web protocols (Technical report). https://secgroup.github.io/bulwark-experiments/report.pdf 3. Van Acker, S., Sabelfeld, A.: JavaScript sandboxing: isolating and restricting clientside JavaScript. In: Aldini, A., Lopez, J., Martinelli, F. (eds.) FOSAD 2015-2016. LNCS, vol. 9808, pp. 32–86. Springer, Cham (2016). https://doi.org/10.1007/9783-319-43005-8 2 4. Bansal, C., Bhargavan, K., Maffeis, S.: Discovering concrete attacks on website authorization by formal analysis. In: CSF 2012. IEEE (2012) 5. Blanchet, B.: An efficient cryptographic protocol verifier based on prolog rules. In: CSFW 2001. IEEE (2001) 6. Blanchet, B.: Automatic verification of correspondences for security protocols. J. Comput. Secur. 17(4), 363–434 (2009) 7. Calzavara, S., Focardi, R., Maffei, M., Schneidewind, C., Squarcina, M., Tempesta, M.: WPSE: fortifying web protocols via browser-side security monitoring. In: USENIX Security 18. USENIX Association (2018) 8. Calzavara, S., Focardi, R., Squarcina, M., Tempesta, M.: Surviving the web: a journey into web session security. ACM Comput. Surv. 50(1), 1–34 (2017) 9. Carbone, R., Compagna, L., Panichella, A., Ponta, S.E.: Security threat identification and testing. In: ICST 2015. IEEE Computer Society (2015) 10. Compagna, L., dos Santos, D., Ponta, S., Ranise, S.: Aegis: automatic enforcement of security policies in workflow-driven web applications. In: CODASPY 2017. ACM (2017) 11. Fett, D., K¨ usters, R., Schmitz, G.: The web SSO standard OpenID connect: indepth formal security analysis and security guidelines. In: CSF 2017. IEEE (2017) 12. Fett, D., K¨ usters, R., Schmitz, G.: A comprehensive formal security analysis of OAuth 2.0. CCS 2016. ACM (2016) 13. Guha, A., Krishnamurthi, S., Jim, T.: Using static analysis for ajax intrusion detection. In: WWW 2009. ACM (2009) 14. Hardt, D.: The OAuth 2.0 authorization framework. RFC 6749, October 2012 15. Li, W., Mitchell, C.J., Chen, T.: OAuthGuard: protecting user security and privacy with OAuth 2.0 and OpenID connect. In: SSR (2019) 16. Li, X., Xue, Y.: BLOCK: a black-bOx approach for detection of state violation attacks towards web applications. In: ACSAC 2011 (2011) 17. Lodderstedt, T., McGloin, M., Hunt, P.: OAuth 2.0 threat model and security considerations. RFC 6819, January 2013 18. Pellegrino, G., Balzarotti, D.: Toward black-box detection of logic flaws in web applications. In: NDSS (2014) 19. Pironti, A., J¨ urjens, J.: Formally-based black-box monitoring of security protocols. In: Massacci, F., Wallach, D., Zannone, N. (eds.) ESSoS 2010. LNCS, vol. 5965, pp. 79–95. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-1174737 20. Sudhodanan, A., Armando, A., Carbone, R., Compagna, L.: Attack patterns for black-box security testing of multi-party web applications. In: NDSS (2016) 21. Sun, S.T., Beznosov, K.: The devil is in the (implementation) details: an empirical analysis of OAuth SSO systems. In: CCS 2012. ACM (2012) 22. Wang, R., Chen, S., Wang, X., Qadeer, S.: How to shop for free online - security analysis of cashier-as-a-service based web stores. In: S&P. IEEE (2011)

Holistic and Verified Security Monitoring of Web Protocols

41

23. Wang, R., Chen, S., Wang, X.: Signing me onto your accounts through Facebook and Google: a traffic-guided security study of commercially deployed single-sign-on web services. In: S&P 2012. IEEE (2012) 24. Wang, R., Zhou, Y., Chen, S., Qadeer, S., Evans, D., Gurevich, Y.: Explicating SDKs: uncovering assumptions underlying secure authentication and authorization. In: USENIX (2013) 25. Xing, L., Chen, Y., Wang, X., Chen, S.: InteGuard: toward automatic protection of third-party web service integrations. In: NDSS 2013 (2013)

A Practical Model for Collaborative Databases: Securely Mixing, Searching and Computing Shweta Agrawal1 , Rachit Garg2(B) , Nishant Kumar3 , and Manoj Prabhakaran4 1

IIT Madras, Chennai, India UT Austin, Austin, USA [email protected] Microsoft Research India, Bangalore, India 4 IIT Bombay, Mumbai, India 2

3

Abstract. We introduce the notion of a Functionally Encrypted Datastore which collects data anonymously from multiple data-owners, stores it encrypted on an untrusted server, and allows untrusted clients to make select-and-compute queries on the collected data. Little coordination and no communication is required among the data-owners or the clients. Our notion is general enough to capture many real world scenarios that require controlled computation on encrypted data, such as is required for contact tracing in the wake of a pandemic. Our leakage and performance profile is similar to that of conventional searchable encryption systems, while the functionality we offer is significantly richer. In more detail, the client specifies a query as a pair (Q, f ) where Q is a filtering predicate which selects some subset of the dataset and f is a function on some computable values associated with the selected data. We provide efficient protocols for various functionalities of practical relevance. We demonstrate the utility, efficiency and scalability of our protocols via extensive experimentation. In particular, we evaluate the efficiency of our protocols in computations relevant to the Genome Wide Association Studies such as Minor Allele Frequency (MAF), Chi-square analysis and Hamming Distance.

1

Introduction

Many real world scenarios call for performing controlled computation on encrypted data belonging to multiple users. A case in point is that of contact tracing to control the COVID-19 pandemic, where cellphone users may periodically upload their (space, time) co-ordinates to enable tracing of infected persons, but desire the assurance that this data will not be used for any other purpose. Another example is Genome Wide Association Studies (GWAS), which look into entire genomes across different individuals to discover associations between genetic variants and particular diseases or traits [4]. R. Garg—Work done primarily when student at IIT Madras. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 42–63, 2020. https://doi.org/10.1007/978-3-030-58951-6_3

A Practical Model for Collaborative Databases

43

More generally, enabling controlled computation on large-scale, multi-user, encrypted cloud storage is of much practical value in various privacy sensitive situations. Over the last several years, several tools have emerged that offer a variety of approaches towards this problem, offering different trade-offs among security, efficiency and generality. While theoretical schemes based on modern cryptographic tools like secure multi-party computation (MPC), fully homomorphic encryption (FHE) or functional encryption (FE) can provide strong security guarantees, their computational and communication requirements are often prohibitive for large-scale data (see Sect. 5.2, and Footnote 4). At the other end are efficient tools like CryptDB [25], Monomi [29], Seabed [22] and Arx [24], which add a lightweight encryption layer under the hood of conventional database queries, but offer only limited security guarantees and do not support collaborative databases (see Table 2). While there also exist tools which seek to strike a different balance by trading off some efficiency for more robust security guarantees and better support for collaboration – like Symmetric Searchable Encryption (SSE) [10,27], and Controlled Functional Encryption (CFE) [20] – they offer limited functionality. Our Approach: We introduce Functionally Encrypted Datastores (FED), opening up new possibilities in secure cloud storage. FED is a secure cloud-based data store that can collect data anonymously from multiple data-owners, store it encrypted on untrusted servers, and allow untrusted clients to make select-andcompute queries on the collected data. The “select and compute” functionality we support may be viewed as typical (relational) database operations on a single table. A query is specified as a pair (Q, f ) where Q is a filtering predicate which selects some rows of the table, and f is a function on the selected data. Several real world scenarios involve such filtering – e.g., in contact tracing, records which have a (space, time) pair from a set of high-risk pairs are filtered; in GWAS, records are often filtered by presence of a disease (see [3] for more examples). A key feature we seek is that the computation overheads for a select-and-compute query should not scale with the entire database size, but only with the number of selected records. 1.1

Our Model

Before we describe our model, we outline our desiderata: – No central trusted authority. The data-owners should not be required to trust any central server with their private data. – No coordination among offline data-owners. The data-owners should not be required to trust, or even be aware of each other. Additionally, the data owners need not be online when queries arrive. – Untrusted clients, oblivious to the data-owners. The data-owners should not be required to trust or even be aware of the clients who will query the datastore in the future.

44

S. Agrawal et al.

– Efficient on large scale data. Enable handling of large databases (e.g. genomic data from hundreds of thousands of individuals, or census data) efficiently. This may be done by allowing some well-defined leakage to the servers, as is common in the searchable encryption literature. – Anonymity of data. As we allow multiple data-owners, the data items referenced in the leakage above when obtained by a compromised server should not be traced back to the data-owners who contributed those items. It is impossible to satisfy the first three requirements using a single (untrusted) server – it may internally play the role of clients and make any number of arbitrary queries to the database, violating security. Hence, we propose a solution with two servers: one with large storage that stores all the encrypted data, and an auxiliary server who stores only some key material. Either server by itself can be corrupt, but they will be assumed not to collude with each other. This model is well-accepted in the literature for privacy-preserving computation on genomic data [7,16], justified by the fact that in the real world, genomic data in the US [16] may be managed by distinct governmental organizations within the National Institutes of Health and the World Health Organization which are expected not to collude. More generally, this model has been used in multi-server Private Information Retrieval [8], CFE [20],1 SSE [21], Secure Outsourced Computation [18] (including in the context of genomic data [28]), and even large-scale real-world systems like Riposte [9] and Splinter [31]. 1.2

Our Results

As discussed above, our first contribution is the notion of a Functionally Encrypted Datastore (FED), which permits a data-owner to securely outsource its data to a storage server, such that, with the help of an auxiliary server, clients can carry out select-and-compute queries efficiently. We emphasize that our database is anonymously collaborative in the sense that it contains data belonging to multiple data owners but hides the association of the (encrypted) data items with their owners. In addition to this, our contributions include: – A general framework for instantiating FED schemes. The framework is modular, and consists of two components, which may be of independent interest – namely, Searchably Encrypted Datastore (SED) and Computably Encrypted Datastore (CED). We present several constructions to instantiate this framework (see an overview in Sect. 1.3). – We demonstrate the utility, efficiency and scalability of our protocols via extensive experimentation. In experiments that model Genome Wide Association Studies (GWAS), genomic records from 100,000 data owners (each contributing a single record) is securely procured in around 500 s, and a client query that filters up to 12,000 records is answered in less than 15 s with a maximum of 15 MB of total communication in the system. For standard functions in GWAS, like Minor Allele Frequency (MAF) and Chi Square Analysis, 1

In CFE, the storage server is implicit as it carries out no computations.

A Practical Model for Collaborative Databases

45

our single data owner based FED protocol has an overhead of merely 1.5×, compared to SSE schemes (which offer no computing functionality). To the best of our knowledge, no prior work achieves the features of collaborative databases, select and compute functionality and efficiency that we achieve in this work. See Table 2 for a comparison. 1.3

Overview of Constructions

We present several modular constructions of FED (and the single data-owner version sFED), which can be instantiated by plugging in different implementations of its components. Below we give a roadmap of how the two simpler primitives, Searchably Encrypted Datastore (SED) and Computably Encrypted Datastore (CED) can be securely dovetailed into a construction of FED. The starting point of our constructions are single data-owner versions sSED and sCED. We show that these components can be implemented by leveraging constructions from the literature, namely, the Multi-Client SSE (MC-SSE) scheme due to Jarecki et al. [17] and CFE due to Naveed et al. [20]. The search query family supported by our sSED constructions are the same as in [17]. For sCED, we support a few specialized functions, as well as a general function family. The primitives sSED and sCED are of independent interest, and they can also be combined to yield a single data-owner version of FED (called sFED). To upgrade sSED and sCED constructions into full-fledged (multi dataowner) SED and CED schemes, we require several new ideas. One challenge in this setting is to be able to hide the association of the (encrypted) data items with their data-owners. A simple approach, wherein each data-owner sends encryptions/secret shares of its data directly to the two servers, hence does not work. Our approach is to first securely merge the data from the different data-owners in a way that removes this association, and then use the single data-owner constructions on the merged data set. For this, both SED and CED constructions rely on onion secret-sharing techniques. Onion secret-sharing is a non-trivial generalization of the traditional mix-nets [6]. In a mix-net, a set of senders D1 , · · · , Dm want to send their messages M1 , · · · , Mm to a server S, with the help of an auxiliary server A (who does not collude with S), so that neither S nor A learns the association between the messages and the senders. We require the following generalization: each sender Di wants to secret-share its message Mi between two servers S and A; that is, it sets Mi = σi ⊕ ρi , and wants to send ρi to S and σi to A. While the senders want their messages to get randomly permuted in order to remove the association of data with themselves, the association between σi and ρi needs to be retained. As described in Sect. 4.1, onion secret-sharing provides a solution to this problem (and more). In the case of CED, merging essentially consists of a multi-set union which can be solved using onion secret-sharing. But in the case of SED, merging entails merging “search indices” on individual data sets into a search index for the combined data set. A search index is a function that maps a keyword to a set of

46

S. Agrawal et al.

records; merging such indices requires that for each keyword, the sets corresponding to it in the different indices be identified and merged together. For this onion secret-sharing alone is not adequate. We propose two approaches to merge the indices – one in the random oracle model using an Oblivious Pseudorandom Function (OPRF) protocol, and another one with comparable efficiency in the standard model, relying on 2-party Secure Function Evaluation (SFE). Please see Sect. 4 for more details.

2

FED Framework

We use the notion of an ideal functionality [13] to specify our requirements from an FED system. FED is formulated as a two stage functionality, involving an initialization stage and a query stage. Figure 1 and Fig. 2 depict the FED functionality and protocol template schematically. The parties involved are: – Data-owners Di (for i = 1, · · · , m, say) who are online only during the initialization phase. Each data-owner Di has an input Zi ⊆ W × X , where for each (w,  x) ∈ Zi , w form the searchable attributes and x the computable values. Z = i Zi denotes the multi-set union of their inputs. – Storage Server S, which is the only party with a large storage after the initialization stage. – Auxiliary Server A, assumed not to collude with S. – Clients which appear individually during the query phase. A client C has input a query of the form (Q, f ) where Q : W → {0, 1} is a search query on the attributes, and f is a computation function on a multi-set of values. It receives in return f (Q[Z]) where Q[Z] denotes the multi-set consisting of elements in X , with the multiplicity of x being the total multiplicity of elements of the  form (w, x) in Z, but restricted to w that are selected by Q; i.e., μQ[Z] (x) = w∈W:Q(w)=1 μZ (w, x), where μR (y) denotes the multiplicity of an element y in multi-set R. S

We instantiate our framework using protocols that provide security against active corruption of D1 Z1 the clients and passive corruption of the other Q, f entities, allowing any subset not involving both ... FED C f(Q[Z]) the servers to collude. Our protocols provide different levels of (acceptable) leakage and are effi- Dm Zm cient on large scale data. We remark that these protocols indeed satisfy the desiderata outlined A in Sect. 1: The data owners are not required to trust a single central authority or co-ordinate with Fig. 1. FED functionality each other, the clients are oblivious of data own- (dotted lines show leakage) ers and of each other, and anonymity of data is maintained modulo the leakage to the storage server and auxiliary sever.

A Practical Model for Collaborative Databases

47

Keyword Search Queries: Following the searchable encryption literature we focus on “keyword queries.” A keyword query is either a predicate about the presence of a single keyword in a record (document), or a boolean formula over such predicates. The searchable attribute for each record is a set of keywords, w ⊆ K where K is a given keyword space. In terms of the notation above, W = P(K), the power set of K. For instance, in a contact-tracing application, each keyword is a coarse-grained (space, time) pair, and a record is a set of such keywords. Risky co-ordinates are assumed to be explicitly enumerated in the query, and records matching any of them are returned for further computation. A basic search query could be a keyword occurrence query of the form Qτ , for τ ∈ K, defined as Qτ (w) = 1 iff τ ∈ w. A more complex search query can be specified as a boolean formula over several such keyword occurrence predicates. Composite Queries: We shall sometimes allow Q and f to be more general than presented above. Specifically, we allow Q = (Q1 , · · · , Qd ), where each Qi : W → {0, 1} and f = (f0 , f1 , · · · , fd ), where for i > 0, fi are functions on multi-sets of values, and f0 is a function on a d-tuple; we define f (Q[Z]) := f0 (f1 (Q1 [Z]), · · · , fd (Qd [Z])). SED

2.1

Protocol Template

All our protocols for FED use the template illustrated in Fig. 2. This is a fixed protocol (described below) that uses two ideal functionalities SED and CED, which in turn are implemented using secure protocols. Thus to obtain a full protocol for FED, we need to only specify the protocols for SED and CED.

W1 Z1

D1

X1

Zm

A

Xm

Q

Q[W ]

Wm Dm

S

T

C

f

Q, f f(Q[Z])

f(δT[X]) CED

Fig. 2. FED protocol template

Searchably Encrypted Datastore. Recall that in an FED or sFED scheme, a query has two components – a search query Q and a computation function f . The Searchably Encrypted Datastore functionality (SED or sSED) has a similar structure, but supports only the search query; all the record identities that match the search query are revealed to the storage server S. This is referred as access pattern leakage in SSE literature and is leaked to S in most SSE schemes [10,17]. The functionality SED is depicted in Fig. 2: Each data-owner Di maps its input Zi to a pair (Wi , Xi ), where Wi ⊆ W × I and Xi ⊆ I × X such that (w, x) → ((w, id), (id, x)) where id is randomly drawn from (a sufficiently large set) I; the output that  S receives when a client C inputs Q is the set of identities Q[W ] ⊆ I where W = i Wi . Please see Sect. 3.1 and Sect. 4.2 for constructions of sSED and SED respectively. Computably Encrypted Datastore. Our second functionality – CED, or its single data-owner variant sCED – helps us securely carry out a computation on an already filtered data set. The complexity of this computation will be related to the size of the filtered data rather than the entire contents of the data set. In CED, as shown in Fig. 2, a data-owner Di (who is online only during the setup phase) has an input in the form of Xi ⊆ I × X . Later, during the query

48

S. Agrawal et al.

 phase, clients can compute functions on a subset of data, X = i Xi . More precisely, a client C can specify a function f from a pre-determined function family, and the storage server S specifies a set T ⊆ I, and C receives f (δT [X]) where we define δT (id, x) = x if id ∈ T and ⊥ otherwise, and δT [X] is the multiset of x values obtained by applying δT (id, x) to all elements of X. Please see Sect. 3.2 and Sect. 4.2 for constructions of sCED and CED respectively. Putting it Together. In the protocol for FED, during the initialization phase, each D locally maps its input Zi to a pair (Wi , Xi ), where Wi ⊆ W × I and Xi ⊆ I×X as follows: (w, x) → ((w, id), (id, x)) where id is randomly drawn from (a sufficiently large set) I. Then, the parties Di , S and A invoke the initialization phase of the SED functionality, and that of CED (possibly in parallel). During the query phase of FED, S, A and C first invoke the  query phase of SED, so that S obtains T = Q[W ] as the output, where W = i Wi ; then they invoke the query phase of CED (with the input of S being T ), and C obtains f (δT [X]). For correctness, i.e., δT [X] = Q[Z], we rely on there being no collisions when elements are drawn from I. Leakage: For every query (Q, f ) from C, this protocol leaks the set T = Q[W ] to S, and in addition provides S and A with the leakage provided by the sSED and sCED functionalities. Note that Di chooses ids at random to define Wi and Xi and the leakage functions of SED and CED are applied to these sets. Leaking T is what is known in the SSE literature as the access pattern leakage, i.e. it amounts to leaking the pattern of T over multiple queries. Specifically, since the ids are random, T1 , · · · , Tn contains only the information provided by the intersection sizes of all combinations of these sets. Formally, this leakage is given  by pattern(T1 , · · · , Tn ) := {| i∈R Ti |}R⊆[n] . Notation Summary. For ease of reference, we provide a brief summary of our notation in the following Table 1. Table 1. Notation index Notation Meaning

Notation Meaning

W

Universe of search-attributes in the data records

W

Input of D to SED functionality; W ⊆ W × I

I

Universe of identifiers used for the data records

Z

Input of D to FED functionality; Z ⊆ W × X

X

Universe of compute-attributes X in the data records

Input of D to CED functionality; X ⊆ I × X

(Q, f )

Input of C to FED;

T

Set of id’s selected by a query; T ⊆ I; T = Q[W ]

J

The set of unique identifiers for each record in the CED database; J ⊆ I; J = {id|∃x s.t. (id, x) ∈ X}

DB(w)

For any w ∈ K, the set of document indices that contain the keyword w

L

Leakage function

K

Universe of keywords

A Practical Model for Collaborative Databases

3

49

Single Data-Owner Protocols

Our protocols for the single data-owner setting follow the template from Sect. 2.1, but restricted to a single data owner D. We refer to the functionalities FED, SED and CED specialized to this setting as sFED, sSED and sCED respectively. As before, to instantiate a sFED protocol, we need only instantiate sSED and sCED protocols. 3.1

Instantiating sSED

We construct sSED by adapting constructions of symmetric searchable encryption (SSE) from the literature. Specifically, we use the multi-client extension of SSE by Jarecki et al. [17], where the data owner is separate from (and does not trust) the client. The main limitations of MC-SSE compared to sSED are that (1) in the former the data-owner D remains online throughout the protocol whereas in the latter D can be online only during the initialization phase, and (2) in the former the output is delivered to both S and C whereas in the latter it must be delivered only to S. We shall leverage the auxiliary server A to meet the additional requirements of sSED compared to MC-SSE. Our model also does not allow any leakage to the clients and handles active corruption of clients, in contrast to [17]. As a lightweight modification of the MC-OXT protocol of [17], our goals can be achieved by simply letting the auxiliary server A play the role of the data owner during the query phase, as well as the client in the MC-OXT protocol. This incurs some leakage to A, namely the search queries Q and the search outcome T . We provide some other alternatives with different security-efficiency tradeoffs in the full version [3] which maybe of independent interest. The leakage is summarized in Table 3. Note that since sSED is instantiated with random IDs, when used in the sFED protocol template (Fig. 2), only the pattern of the search outcomes, is revealed to A and S. 3.2

Instantiating sCED

We develop several sCED protocols for computing various functions on data filtered by a search query. The functionality, assumptions and leakage incurred by our protocols are summarized in Table 3. It will be convenient to define the set J ⊆ I as J = {id|∃x s.t. (id, x) ∈ X}. During each query, S has an input T ⊆ J (output from the SED functionality) and C has an input f from the computation function family. – Value Retrieval. This is the functionality associated with standard SSE, where selected values are retrieved without any further computation on them. There is a single function in the corresponding computation function family, given by f (δT [X]) = δT [X]. Below we give a simple scheme which relies on A to support multiple clients who do not communicate directly with D.

50

S. Agrawal et al.

Protocol sValRet • Initialization: D picks a PRF key K, sets βid := xid ⊕ FK (id), where xid is the value associated with id, and sends {(id, βid )}id∈J to S and K to A. • Computation: S picks a PRF key K1 , computes a random permutation of T and sends these to A. Additionally, S computes the same permutation of {βid ⊕ FK1 (id) | id ∈ T } and sends this to C. A sends {FK (id) ⊕ FK1 (id) | id ∈ T } to C. C outputs {ai ⊕ bi }i , where {ai }i and {b}i are the messages it received from S and A. • Leakage, LsValRet : On initialization, the ID-set J is leaked to S where J ⊆ I, J = {id|∃x s.t. (id, x) ∈ X}. On each query, T is leaked to A. – Summation. The family FSum consists of the single function f such that  f (S) = x∈S x, where the summation is in a given abelian group which the domain of values is identified with. We provide a simple and efficient sCED protocol sSum, which is is similar to sValRet, but uses the homomorphic property of additive secret-sharing to aggregate the values while they remain shared. The value retrieval and summation protocols can also be extended to the setting where each value x is a vector (x1 , · · · , xm ). Please see the full version [3] for details. – General Functions. We provide a sCED scheme for general functions. Our protocol for general functions, sCktEval makes use of garbled circuits and symmetric encryption, and can be seen as an adaptation of the CFE scheme from [20]. Despite the fact that garbled circuits have been optimised for performance in literature, garbled circuits incur high communication complexity and make this protocol less efficient than our other protocols. Due to space constraints, we refer the reader to [3] for the protocol description. – Composite Queries. Recall that a composite query has Q = (Q1 , · · · , Qd ) and f = (f0 , f1 , · · · , fd ) such that f (Q[Z]) := f0 (f1 (Q1 [Z]), · · · , fd (Qd [Z])). We note that the sSED protocol for non-composite queries directly generalizes to composite queries, by simply running d instances of the original sSED functionality to let S learn Ti = Qi [W ] for each i. But we need to adapt the sCED protocol to avoid revealing each fi (Qi [Z]) to C. Towards this, we observe a common feature in all our sCED protocols, namely that the last step has S and A sending secret shares for the output to C. We leverage this in our protocol sComposite, wherein, instead of S and A sending the final shares of f1 , . . . , fd to C, they carry out a secure 2-party computation of f0 with each input to f0 , namely fi (Qi [Z]) for i ∈ [d] being secret-shared between S and A as described above. This can be implemented for a general f0 using Yao’s garbled circuits and oblivious transfer (OT) [32], or for special functions like linear combinations using standard and simpler protocols. The leakage includes the composed leakage of running d instances to compute (f1 , . . . , fd ). Also, the function f0 is leaked to A and S.

A Practical Model for Collaborative Databases

4

51

Multiple Data-Owner Protocols

Following the template in Sect. 2.1, to instantiate a FED protocol, we instantiate SED and CED protocols in this section (see Table 3 for a quick summary). To do so, we first introduce the primitive of onion-secret sharing, which we will use in our constructions. 4.1

Onion Secret-Sharing

To illustrate onion secret-sharing, let us first consider the following simplified task: Each sender Di wants to share its message Mi between two servers S and A; that is, it sets Mi = σi ⊕ ρi , and wants to send σi to S and ρi to A. While the senders want their messages to get randomly permuted, removing the association of the data with themselves, the association between σi and ρi needs to be retained. Onion secret-sharing provides a solution to this problem, as follows. Below, let P K S be the public-key of S for a semantically secure publickey encryption scheme (P K A is defined analogously), and let [[M ]]P K denote encryption of M using a public-key P K. Now, each Di sends [[(ρi , ζi )]]P K S to A, where ζi is of the form [[σi ]]P K A . A mixes these ciphertexts and forwards them to S, who decrypts them to recover pairs of the form (ρi , ζi ). Now, S reshuffles (or sorts) these pairs, stores ρi and sends ζi (in the new order); A recovers σi from ζi (in the same order as ρi are maintained by S). This can be taken further to incorporate additional functionality. As an example of relevance to us, suppose A wants to add a short, private tag to the messages being secret-shared so that the tag persists even after random permutation. Among the messages which were assigned the same tag, A should not be able to link the shares it receives after the permutation to the ones it originally received; S should obtain no information about the tags. Looking ahead, this additional functionality will be required in one of our SED constructions. One solution is for A to add encrypted tags to the data items, and then while permuting the data items, S would rerandomize the ciphertexts holding the tags. We present an alternate approach, which does not require additional functionality from the public-key encryption scheme, but instead augments onion secret-sharing with extra functionality: – Di holds input Mi and shares it as Mi = σi ⊕ ρi . It creates a 3-way additive secret sharing of 0 (the all 0’s string), as αi ⊕ βi ⊕ γi = 0, and sends (αi , [[βi , ρi , [[γi , σi ]]P K A ]]P K S ) to A. – A assigns tags τi for each of them, and sends (in sorted order) (τi ⊕ αi , [[βi , ρi , [[γi , σi ]]P K A ]]P K S ) to S. – S sends (τi ⊕ αi ⊕ βi , [[γi , σi ]]P K A ) to A, in sorted order; it stores ρi (in the same sorted order). – A recovers (τi ⊕ αi ⊕ βi ⊕ γi , σi ) = (τi , σi ).

52

S. Agrawal et al.

This allows S and A to receive all the shares (in the same permuted order); S learns nothing about the tags; A cannot associate which shares originated from which Di , except for what is revealed by the tag. (Even if S or A collude with some Di , the unlinkability is retained for the remaining Di ). In Sect. 4.2, we use a variant of the above scheme. 4.2

Instantiating SED

We describe a general protocol template to realize the SED functionality, using access to the sSED functionality. The high-level plan is to let A create a merged database so that it can play the role of D for sSED. However, since we require privacy against A, the merged database should lack all information except statistics that we are willing to leak. Hence, during the initialization phase, we not only merge the databases, but also replace the keywords with pseudonyms and keep other associated data encrypted. We use pseudonyms for keywords (rather than encryptions) to support queries: during the query phase, the actual keywords will be mapped to these pseudonyms and revealed to A. These two tasks at the initialization and query phases are formulated as two sub-functionalities— merge-map and map—collectively referred to as the functionality mmap, and are explained below. Functionality Pair mmap = (merge-map, map) – Functionality merge-map takes as inputs Wi from each Di ; it generates a pair of “mapping keys” Kmap = (KS , KA ) (independent of its inputs), and ˆ creates a “merged-and-mapped” input  set, W . Merging simply refers to computing the (multi-set) union W = i Wi ; mapping computes the multi-set ˆ ˆ = MK (id)}, where the ˆ = {(w, W ˆ id)|(w, id) ∈ W, w ˆ = MKmap (w), and id map mapping function M is to be specified when instantiating this template – Functionality map takes KS and KA as inputs from S and A respectively, and a ˆ to C. We shall require query Q from a client C; then it outputs a new query Q ˆ id ˆ ∈ Q( ˆ W ˆ )}, that there is a decoding function D such that Q[W ] = {DKS (id)| ˆ where W is as described above. These functionalities may specify leakages to S and/or A, but not to the dataˆ to C, for security, we owners or clients. Note that since map gives an output Q shall require that it can be simulated from Q. We give two constructions, with different efficiency-security trade-offs. Recall that each data-owner Di has an input Wi ⊆ W × I. In both the solutions, Di i ⊆ K × I such that (w, id) ∈ Wi iff shall use a representation of Wi as a set W i } (w ⊆ K is the set of keywords (searchable attributes) and w = {τ |(τ, id) ∈ W τ ∈ K is a keyword). In the two solutions below, the mapping function M maps the keywords differently; but in both the solutions, an identity id that occurs i is mapped to ζ id ← [[id]]P K (an encryption of id under S’s public-key). in W S i In one scheme, we use an oblivious pseudorandom function (OPRF) protocol to calculate M, while the other relies on a secure function evaluation for equality.

A Practical Model for Collaborative Databases

53

Protocol mmap-OPRF merge-map: – Keys. S generates a keypair for a CPA-secure PKE scheme, (PK S , SK S ), and similarly A generates (PK A , SK A ). The keys PK S and PK A are published. S also generates a PRF key K. – Oblivious Mapping. Each data-owner, for each τ ∈ Ki , Di engages in an Oblivious PRF (OPRF) evaluation protocol with S, in which Di inputs τ , S inputs K and Di receives as output τˆ := FK (τ ). Di carries out Li − |Ki | more OPRF executions (with an arbitrary τ ∈ Ki as its input) with S. id PK S , for each id such that there – Shuffling. Each data-owner Di computes ζiid i . All the data-owners use the help of S to route the is a pair of the form (τ, id) ∈ W set of all pairs (ˆ τ , ζiid ) to A, as follows: id ← (ˆ τ , ζiid ) PK A , and sends it to S. Further • Each Di , for each (τ, id) computes ξi,τ i | more ciphertexts of the form ⊥ PK . Di computes Ni − |W A • S collects all such ciphertexts (Ni of them from Di , for all i), and lexicographically sorts them (or randomly permutes them). The resulting list is sent to A. • A decrypts each item in the received list, and discards elements of the form ⊥ to ˆ where id ˆ = ζiid . W ˆ is defined as obtain a set consisting of pairs of the form (ˆ τ , id), ˆ where w ˆ the set of pairs of the form (w, ˆ id) ˆ contains all τˆ values for each id. ˆ and an empty KA ; S’s output is KS = (SK S , K). – Outputs. A’s outputs are W map: A client C has an input Q which is specified using one or more keywords. For each keyword τ appearing in Q, C engages in an OPRF execution with S, who inputs the key K that is part of KS . The resulting query is output as τˆ.

Fig. 3. Protocol mmap-OPRF implementing functionality mmap.

Construction mmap-OPRF: The pair of protocols for the functionalities merge-map and map, collectively called mmap-OPRF, is shown in Fig. 3. In this solution, the mapping key KA is empty, and KS consists of a PRF key K, in addition to a secret-key for a public key encryption (PKE) scheme, SK S . M maps each keyword τ using a pseudorandom function with the PRF key K. This is implemented using an oblivious PRF execution with S, in both merge-map and map protocols. That is, for w ⊆ K we have MKmap (w) = {FK (τ )|τ ∈ w}, where F is a PRF. Note that mmap-OPRF uses two a priori bounds for each data-owner, Ni and i | ≤ Ni and |Ki | ≤ Li , where Ki = {τ |∃id s.t. (τ, id) ∈ W i } is Li such that |W the set of keywords in Di ’s data. We point out the need for OPRF evaluations. If we allowed S to simply give the key K to every data-owner to carry out the PRF evaluations locally, then, if A colludes with even one data-owner Di , it will be able to learn the actual keywords in the entire data-set.

54

S. Agrawal et al.

 Leakage, LOPRF mmap : S learns the bounds Ni and Li for each i, and A learns i Ni ; from the map phase, S learns the number of keywords in the query. ˆ We also include in LOPRF mmap what the output W to A reveals (being an output, this is not “leakage” for mmap-OPRF; but it manifests as leakage in the SED protocol ˆ reveals an anonymized version of the keywordthat uses this functionality). W identity incidence matrix of the merged data2 , where the actual labels of the rows and columns (i.e., the keywords and the ids) are absent. Assuming adaptive corruption of A after the setup phase (with reliable erasures), the adversary doesn’t get this leakage. Please refer to the full version [3] for a discussion on implications of this leakage, and possible ways to avoid it. Construction mmap-SFE: Our next protocol avoids OPRF evaluations. The idea here is to allow each data owner to send secret-shared keywords between S and A, and rely on secure function evaluation (SFE) to associate the keywords with pseudonymous handles, so that the same keyword (from different data owners) gets assigned the same handle (but beyond that, the handles are uninformative). Further, neither server should be able to link the keyword shares with the data owner from which it originated, necessitating a shuffle of the outputs. Due to the complexity of the above task, a standard application of SFE will be very expensive. Instead, we present a much more efficient protocol that relies on secure evaluation only for equality checks, with the more complex computations carried out locally by the servers; as a trade-off, we shall incur leakage similar to that of the OPRF-based protocol above. Below, we sketch the ideas behind the protocol, with the formal description in [3]. Consider two data owners who have shared keywords τ1 and τ2 among A and S, so that A has (α1 , α2 ) and S has (β1 , β2 ), where τ1 = α1 ⊕β1 and τ2 = α2 ⊕β2 . Note that τ1 = τ2 ⇔ α1 ⊕ β1 = α2 ⊕ β2 ⇔ α1 ⊕ α2 = β1 ⊕ β2 . So, for A to check if τ1 = τ2 , A and S can locally compute α1 ⊕ α2 and β1 ⊕ β2 and use a secure evaluation of the equality function to compare them (with only A learning the result). By comparing all pairs of keywords in this manner, A can identify items with identical keywords and assign them pseudonyms (e.g., small integers), while holding only one share of the keyword. This can be done efficiently using securely pre-computed correlations. Let’s say we wish to compute if strings x = y. We map x, y using a collision resistant hash H to 128 bits, and interpret them as elements in a field, say F = GF (2128 ). The protocol then uses the following pre-computed correlation: Alice holds (a, p) ∈ F2 and Bob holds (b, q) ∈ F2 such that a+b = pq and p, q and one of a, b are uniformly random. Alice sends u := H(x) + p to Bob, where x is her input. Bob replies with v := q(u − H(y)) − b, where y is his input. Alice concludes x = y iff a = v. The data shared by the data owners to the servers includes not just keywords, but also the records (document identifiers) associated with each keyword. To minimize the number of comparisons needed, each data owner packs all the records corresponding to a keyword τ into a single list that is secret-shared along with the keyword (rather than share separate (τ, id) pairs). However, after 2

This is a 0–1 matrix with 1 in the position indexed by (τ, id) iff (τ, id) ∈ rows and columns may be considered to be ordered randomly.

  i Wi . The

A Practical Model for Collaborative Databases

55

handles have been assigned to the keywords, such lists should be unpacked and shuffled to erase the link between multiple records that came from a single list (and hence the same data owner). Each entry in each list can be tagged by A using the keyword handle assigned to the list, but then the entries need to be shuffled while retaining the tag itself. This is accomplished using onion secretsharing – please see the full version  [3] for details. This construction uses O(( i Li )2 ) secure equality checks, where Li is an upper bound on the total number of keywords in Di ’s data. We describe an approach to improve this, the leakage and security analysis as well as a comparison with the OPRF protocol in the full version [3]. Instantiating CED: To upgrade a sCED protocol to a CED protocol we need additional tricks using onion secret-sharing. This results in a more involved security analysis and slower performance. Due to space constraints, we refer the reader to the full version [3] for the description.

5

Implementation and Experimental Results

We show the feasibility and performance of our protocols on realistically largescale computations by running tasks representative of Genome Wide Association Studies (GWAS), which is widely used to study associations between genetic variations and major diseases [4]. Implementation/Setup Details. The system was implemented in C++11 using open-source libraries The MC-OXT protocol of Jarecki et al. [17] was reimplemented for use in our SED protocols. Experiments were performed on a Linux system with 32 GB RAM and a 8-core 3.4 GHz Intel i7-6700 CPU. The network consisted of a simulated local area network (simulated using linux tc command), with an average bandwidth of 620 MBps and ping time of 0.1 ms. Our Functionalities. The specific functions we choose as representative of GWAS were adapted from the iDash competition [1] for secure outsourcing of genomic data. Each record in a GWAS database corresponds to an individual, with fields corresponding to demographic and phenotypic attributes (like sex, race etc.), as well as genetic attributes. In a typical query, the demographic and phenotypic attributes are used for filtering, and statistics are computed over the genetic information. In our experiments, the following statistics were computed. – Minor Allele Frequency (MAF): This can be described using the formula f0 (f1 (Q[Z]), f2 (Q[Z])), where f1 computes N = |Q[Z]|, f2 computes the summation function, and f0 computes the following formula: f0 (x, y) = min(y,2x−y) . We implement f2 using our sSum protocol, while f0 is imple2x mented using our sComposite protocol (using GC for f0 ).

56

S. Agrawal et al.

– Chi-square analysis (ChiSq): Using two search filters for queries Q and Q , χ2 statistic can be abstractly described using the formula f0 (f1 (Q[Z]), f2 (Q[Z]), f3 (Q [Z]), f4 (Q [Z])), where f1 , f3 are summation functions, f2 and f4 compute |Q[Z]| and |Q [Z]| respectively, and (b+d)(ad−bc)2 . As previously, f1 , f3 are implemented f0 (a, b, c, d) = bd(a+c)((b+d)−(a+c)) using our sSum protocol and f0 implements the above formula using our sComposite protocol. – Average Hamming Distance: Hamming distance between 2 genome sequences, often used as a metric for genome comparison, is defined  ∗ x∈Q[Z] Δ(x,x ) , where Δ stands for the hamming distance of fx∗ (Q[Z]) = |Q[Z]| two strings. Here, we consider the entire genotype vector x for each individual (rather than the genotype at a given locus). We implement this using our general functions protocol sCktEval. – Genome retrieval: This is a retrieval task, involving retrieval of genomic data at a locus for individuals selected by a search criteria. This is implemented using our sValRet protocol. Dataset and Queries. We used synthetic data inspired by real-world applications [15], with 10,000–100,000 records and 50 SNPs, and with the number of filtered records ranging 2,000–12,000. For Hamming distance, which we implemented using our protocol for general functions (sCktEval) and costlier than our other protocols, we used 5,000–25,000 records with 600–3500 filtered records. The experimentation parameters were chosen to showcase the observations described below. Effects of changing the experimental setting (e.g., a WAN instead of a LAN model, bigger datasets and keyword spaces, etc.) are discussed in the full version. Metrics. Our results are reported over two metrics – the total time taken and the total communication size, across all entities. These metrics are reported separately for the initialization and query phases. Also, the costs incurred by the SED component of the protocols are shown separately, as this could serve as the baseline search cost against which the FED cost can be compared. We performed experiments in both the single (sFED) and multiple data owner (FED) setting, and in the latter case with both the OPRF-based and SFE-based SED protocols.

A Practical Model for Collaborative Databases

57

Fig. 4. Initialization and query-phase time plots for our four applications. (See Fig. 7 for communication costs.)

5.1

Observations

Linear in Data. In Fig. 4, we plot the initialization and query times (summed over all parties, with serial computation and LAN communication) for various functions, each one using our different protocols. The experiment runs on a single data-owner (even for FED) and the initialization times are plotted against the total number of records, while query times are plotted against the number of filtered records. As expected, all the times are linear in the number of records. Comparing the different versions of the protocols, we note that for initialization, the multiple data-owner protocols (FED-OPRF and FED-SFE) are slower that the single data-owner protocol (sFED) by a factor of about 10–12x and 15–18x respectively. The overhead in the query phase is a more modest 5x factor. Efficiency of Query Phase. From our experimental results in Fig. 4, we observe that the query time for our CED and SED protocols is linearly proportional to the number of filtered records. Comparison to SSE. We show a breakup of our sFED cost between search and compute components in Fig. 5. While the cost for Av. Hamming includes a dataset of 15,000 records and a filtered set of around 1800 records, other protocols involve a dataset of 20,000 records and a filtered set of around Fig. 5. Break-up of sFED cost. 2500 records. As can be observed, the overhead for supporting computation in addition to search ranges between 1.2–1.7x, except for Av. Hamming, which as a result of computing a large function using

58

S. Agrawal et al.

the protocol sCktEval, involves an overhead of 22x. Nevertheless, the computation remains feasible: we evaluate Hamming distance between strings as large as 1.25k bits in 10 s. Scalability with Number of Data-Owners. The setup phase of our protocols scale well with the number of data-owners participating in the system (see Fig. 6). Each data owner provides one record. As can be seen from the plot, our OPRF based protocol scales to 100,000 data owners in under 10 min. The SFE-based protocol (which scales quadratically, but avoids the use of random oracles) scales only up to 2000 owners in a similar time.3 In our experiments data owners were serviced serially by S, but we estimate a drop of 100–200 s for our largest benchmarks if we exploited parallelism. Our protocols’ performance in the query phase only depend on the total number of records input by the data-owners i.e. a single data owner contributing 10,000 records will have the same query performance as 10,000 data owners contributing 1 record. Light-Weight/Efficient Clients. As is apparent from the complexity analysis of our protocols, our clients are extremely efficient and light-weight, typically only performing computation proportional to size of queries and outputs. Indeed, in all our experiments, client’s computation time Fig. 6. Scaling of initialization time of FED/SED with increasing number of datanever exceeded 3 ms. owners (each holding one record)

5.2

Comparison with Generic MPC Approaches

While MPC techniques cannot be directly deployed in our setting, as the dataowners cannot interact with each other during the query phase, they can be adapted to our 2-server architecture (assuming they do not collude with each other). The two servers could use generic semi-honest secure 2-party computation (2PC) on data that is kept secret-shared between them. However, as we note below, such a solution is significantly less efficient than our constructions. We focus on the “Genome Retrieval” functionality for a single keyword match with a single data owner, as a minimalistic task for comparison. Note that for each query, the servers must engage in a joint computation proportional to the entire database size. To ensure a fair comparison, we allow the servers to obtain a

3

The cost for generating correlated inputs to the server was not included, as this can be done ahead of time even before the setup phase in an efficient manner [19]. Here, the size of the field elements used for the secure equality check was aggressively set to 64 bits, which is adequate to avoid hash collisions for a modest number of data owners and distinct keywords.

A Practical Model for Collaborative Databases

59

similar leakage as our constructions do; this significantly simplifies the functions being securely computed, and reduces the costs.4 There are essentially 3 families of generic 2PC protocols which are relevant for the comparison: i) Garbled Circuits (GC) [11], ii) GMW [11] and iii) Fully Homomorphic Encryption (FHE) [2]. For the query phase, GC and GMW suffer heavily in communication cost, since communication between the servers is proportional to the circuit size. For instance, in the case of GC, we can lower bound the communication by restricting to the filtering step in the computation, and then calculating the size of the garbled circuit and oblivious transfers (OT) performed. Concretely, for 100,000 records, this implies a minimum communication size of 576 MB. For GMW, for simplicity, we omit the cost of generation of OT correlations (even though, in steady state, they need to be incurred as more queries come in). Then, considering merely 4 bits of communication per AND gate, for the same setting as above, we lower bound the communication during the query phase by 3.6 MB. On the other hand, for the same 100,000 records and a query for filtering around 12, 000 records, our solution incurs only 2.8 MB of communication. For FHE based implementations, the bottleneck is homomorphic computation. A simple lower bound on execution time may be obtained by bounding the number of multiplications performed and considering the cost of a single homomorphic multiplication. For instance, for the smallest size parameters in the Microsoft SEAL homomorphic encryption library [2], the time for a single homomorphic multiplication is the order of milliseconds. Thus a simple estimate for execution time is at least 7 s, whereas for the same parameters, we achieve a time of 2.4 s. We note that our performance gains increase substantially with increase in the number of records in the database (indeed for 10 million records and a filtered set size of around 500, while we incur 112 KB of communication in T hreshold then 6: ClusterT oClusterM apping[p] ← k ;

Algorithm 2. The recovery & discovery algorithm for finding older appearances of a cluster Require: TI,c , M S, T hreshold Ensure: T rackedCluster 1: N ← N umberOf Instances(TI,c ) 2: T rackedCluster ← emptyString 3: for all M ∈ M S do: 4: C←0 5: for all I ∈ TI,c do: 6: PI ← P (Yi = 1|M ) 7: C ← C + PI 8: Cnormalized ← C/N 9: if Cnormalized > T hreshold then 10: T rackedCluster ← M.name

98

D. Cohen et al.

where predictm (x) returns the probability that x belongs to the positive class using model m. We assign c with the annotation of cluster cj if p(c, cj ) obtains the highest probability for all models in M , and p(c, cj ) > β for some user defined parameter β (we set β = 0.7). Similarly to the Jaccard similarity calculation, one can easily distribute the prediction part as those predictions can also be calculated simultaneously. A formal description is described in Algorithm 2. In cases in which there is no match in any of the classifiers in M , we consider cluster c to be a new cluster. Once a new cluster is found we train a new classifier on this cluster’s data as previously explained. After some time, a concept drift may occur, and the patterns change slightly. To deal with this issue, in cases in which a known cluster appears in the data stream, we update and retrain the corresponding model.

6

Analysis of Darknet Traffic

Unlike other methods that used darknet data streams, e.g. [13], DANTE is not trying to find anomalies on specific ports, but rather find concepts and trends in the data. While methods that find correlations between ports exist [4], (1) they operate offline detecting patterns months after the fact, and (2) do not track the patterns over time. In contrast, DANTE finds new trends online (within minutes) and can detect recurring and novel patterns. To the best of our knowledge there are no online algorithms for detecting and tracking patterns in darknet traffic. Therefore, we demonstrate the usefulness of the proposed approach through a case study on over a year of data. The data was collected from a greynet [6,22], meaning that the unused IPs are from a network that is populated by both active and unused IP addresses. Because of the nature of greynet, it is harder for an attacker to detect if the target host is an active machine or a darknet sensor. 6.1

Configuration and Setup

Dataset. For the purpose of this research, the NSP (Network Service Provider) established 1,126 different unused IP addresses across 12 different subnets from the same organization. All traffic which was sent to these IPs were logged as darknet traffic. The traffic was collected in three batches; the first was recorded during a period of six weeks (44 days) from 10/25/2018 until 12/5/2018 (denoted by Batch 1), the second was recorded during a period of eight weeks (55 days) from 2/1/2019 until 3/26/2019 (denoted by Batch 2) and the third batch was recorded during a period of 37 weeks (257 days) from 1/21/2019 until 10/6/2019 (denoted by Batch 3). Note that Batch 2 and Batch 3 are overlapping, that is because the Batches 1 and 2 recorded in the research phase of the project, where Batch 3 recorded in the deployment stage, in real time. A deep analysis was performed on Batches 1 and 2. Batch 3 was used to demonstrate the system’s long-term stability. In total, 7,918,787,884 packet headers from 4,887,568 different source IP addresses were recorded, resulting in over 3 terabytes of data. Figure 5 shows the number of packets and source IP addresses for every hour in

Mining and Monitoring Darknet Traffic

99

the first two batches. Note that due to a technical problem, one hour at the end of October is missing. Because the missing time is insignificant, and the proposed method can deal with missing time windows, those missing values do not affect the overall results.

Fig. 5. The hourly number of different packets (orange) unique src IPs (blue) in the darknet after the data preprocessing stage. (Color figure online)

Configuration. We choose the step size S to be one hour and the window length L to be four hours, similar to the work of Ban et al. [4]. A one hour step size provides a sufficient amount of data while granting a security expert enough time to react to a detected attack. In addition, we choose the epsilon parameter of DB-SCAN to be 0.3, and the minPts parameter to be 30, as those parameters resulted in an average of four new clusters every day (agreed by the security experts to be a reasonable number of clusters to investigate each day). In addition, we do not want clusters with a small number of src IPs, as those clusters are too small to represent a significant trend in the network, and thus should be treated as noise. Scalable Implementation. To scale to the number of sources and amount of data, DANTE was implemented and evaluated at the NSP’s CERT using Spark on an Hadoop architecture. Those architectures utilized the power of big data to distribute the algorithm on several different machines. We tested the method on a Hadoop cluster consisting of 50 cores and 10 executors. The algorithm takes approximately 62 s to extract, embed, cluster, and map each four hours time window. Data Preprocessing. IPs which sent less than three packets per window were discarded. The majority of these packets are likely miss-configurations (back scatter) and not a malicious attack. By removing those IPs, we reduced the number of packets by 33% percent. 6.2

Results

Because of time constrains, we use Batch 1 and 2 for a comprehensive analysis, and show that DANTE can find malicious attacks within the first few hours, long before any of the online reports. Batch 3 is used to prove that the algorithm can

100

D. Cohen et al.

Fig. 6. Hourly number of sequences 10 month of data, by their types.

Fig. 7. An overview of the different clusters in the data during the data collection period, aggregated by hour. Each color represents a different cluster. (Color figure online)

work over a long period of time, and can detect new and reoccurring clusters even after almost a year. A visualisation of Batches 3 clusters over the 10 months can be seen in Fig. 6. A total of 400 clusters (novel darknet behaviors) were discovered over Batches 1 and 2, and 1,141 clusters were discovered over Batch 3. As previously mentioned, the system discovered four new clusters each day, on average, as can be seen in Fig. 9. This number, determined by the set parameters, is small enough for an analyst to evaluate but large enough to detect new concepts. The vast majority of the src IPs belong to one of 16 large clusters (concepts) over the year. Some of these clusters reoccurred (e.g., botnet code reuse) and others always stayed present (e.g., worms looking for open telnet ports). Figure 7 plots the volume of the discovered clusters over time for Batches 1 and 2. The x-axis represents time, and the y-axis represents the total number of src IPs in the data for each hour. In the visualization component of DANTE (Sect. 3) this graph is updated periodically to help the analysts explore past and current trends. The clusters discovered by DANTE can be roughly divided into four categories: 1. Service Recon Clusters whose sequences have six or more unique ports:Either benign web scanners or agents performing reconnaissance on active services. 2. Basic Attacks & Host Recon Clusters whose sequences consist of a single port: Agents trying compromise a device via a single vulnerability, or performing reconnaissance to discover active network hosts.

Mining and Monitoring Darknet Traffic

101

3. Complex Attacks Clusters whose sequences have two to five unique ports: Agents which are attempting to exploit multiple vulnerabilities per device, are performing multi-step attacks, or are worms. 4. Noise and Outliers A single cluster of benign sequences from misconfigurations, backscatter, or are too small to represent an ongoing trend. In Table 1 we give statistics on these categories, and in Fig. 6 we present the volume of each category in Batch 3 over time. Table 2 provides concrete examples of discovered attack patterns, by cluster, as labeled in Fig. 7.

Table 1. The number of clusters, src IPs, and packets for each of the four cluster families. No. of clusters Batch 1 Batch 2 Batch 3 Service Recon 24 27 326 Basic Attacks & Host Recon 77 71 202 Complex Attacks 78 121 610 Noise and Outliers 1 1 1 Clusters Type

No. of src IPs Batch 1 Batch 2 Batch 3 16,909 78,644 101,316 246,864 313,457 1,216,077 351,318 395,109 2,602,184 115,645 142,920 819,566

Batch 1 9,603,305 50,937,141 19,203,613 582,646,403

No. of packets Batch 2 Batch 3 39,111,889 162,499,127 56,881,565 380,244,017 85,960,212 1,085,777,577 743,977,834 5,935,419,695

Service Recon. Service reconnaissance (port scanning) is performed by attackers to find and exploit backdoors or vulnerabilities in services [27]. In Batches 1 and 2, we identified 51 port scanning clusters with an average of 929 different ports scanned in each cluster. However, it is important to note that a cluster in this group is not necessarily malicious. For example, cluster A in Fig. 7 consists of src IPs of Censys, a security company which scans 40 ports to find and report vulnerable IoT devices. Interestingly, because of the embeddings, DANTE found that 2.2% of the IPs in cluster A do not belong Censys’ subnet. Rather, we verified them to belong to a malicious actor copying Censys to stay under the radar. In some cases, the service recon can consist of multiple ports that belong to the same service, in order to use an exploit on this service even if the host is using an alternative port. For example, Cluster B, occurred on 3/8/2019, consist only of ports that can be associated with HTTP. This cluster consists of 17 ports, such as 80, 8080, 8000, 8008, 8081 and 8181. In addition, most of the src IPs are located in Taiwan (18%), Iran (15%) and Vietnam (12%). Because DANTE assigns similar embedding to those ports, those port were group together and DANTE was able to detect this pattern and issue an alert. At the time of writing of this article, we were not able to find any information on this scan online. This lack of reports could be explained by the fact that there was no significant peak in any of the ports involved, which make it hard for conventional anomaly detectors to detect this pattern. Basic Attacks and Host Recon. Many port sequences consist of a single port accessed repeatedly one or more times. This behavior captures agents which are trying to exploit a single vulnerability (e.g., how the Mirai malware compromises

102

D. Cohen et al.

Table 2. Summaries of the mentioned clusters. Cluster name

Category

Number of src IPs

A

Service Recon

141

B

Service Recon

20,982

C

Basic Attacks & Host Recon

113, 407

D

Basic Attacks & Host Recon

895

E

Basic Attacks & Host Recon

285,651

F

Complex Attacks

43,305

G

Complex Attacks

535

H

Complex Attacks

43,305

Number of packets

Sequence examples

873, 458 [2077, 2077, 8877, 7080, . . . , 304, 3556] 2, 061, 655 [8000, 88, 80, 8000, 8081, 80, 80] 7,061,042 [445, 445, 445] 68,065 [11390, 11390, . . . , 11390, 11390] 42,225,387 [23, 23, 2323] 1,718,440 [9527, 9527, 9527, 5555, 5555, 5555] 7,270 [7550, 7550, 7547, 7547, 7547] 105,258 [7379, 7379, 5379, 5379, 6379, 6379]

devices via telnet). This behavior is also consistent with network scanning [27], where the attacker tries to detect live hosts to map out an organization. By following how campaigns (clusters) reoccur and grow in volume over time, analysts can use DANTE to find trends and discover new vulnerabilities. To visualize this, Fig. 8 contains a scatter plot of each port sequence cluster from Batch 1 and its corresponding port.

Fig. 8. A view of the Basic Attacks clusters from Batch 1. Each dot represents a cluster with the corresponding port shown above (x-axis and y-axis are logarithmic).

Fig. 9. Number of newly discovered clusters over time in the Batch 1 data (daily).

It is interesting to see that, for example, port 445 is accessed by many src IPs, each of which only accessed the port an average of 5.8 times. This type of cluster can be seen in the second largest cluster, C. This cluster consists of port 445 (SMB over IP: commonly used by worms such as Conficker [16,23,41] to compromise Windows machines) and was being attacked at a low rate, once every few hours. However, DANTE also detected that port 5060 (SIP: used to

Mining and Monitoring Darknet Traffic

103

initiate and maintain voice/video calls) was trending, being attacked at a high rate from the USA. Nonetheless, most of the clusters in this category are relatively small and only appear for a few hours. In another example, DANTE detected a network recon campaign between 1:00 and 6:00 AM GMT on 11/25/2018 (cluster D) on the unused port of 11390. The attack consisted of 895 different src IPs which mostly targeted one out of our 1,129 darknet IPs. This indicated which subnet the campaign was targeting and provided us with threat intelligence on a possible software vulnerability/backdoor on port 11390. By using Spark implementation system, DANTE was able to issue an alert to the CERT with that information about a minute after the data arrived. Complex Attacks. These clusters capture patterns involving multiple attack steps or vulnerabilities. This category of clusters often captures the most interesting attack patterns, and the most difficult to detect because of their low volume of traffic which hides them under the noise floor. Without the proposed embedding approach, an attack which involves a sequence of ports would be clustered with the other Basic Attacks and thus the other ports in the attack would go unnoticed –making it impossible to distinguish from other attack campaigns. We note that cluster E contains 30% of all darknet sequences. This is because the cluster captures ports 23 and 2323 which are both used for Telnet. The Mirai botnet and its variants have been using these ports to search for vulnerable devices over last few years [23]. One example of this category occurred on 3/4/2019, where DANTE reported a new large cluster appeared (F in Fig. 7) consisting of two ports and originating from China and Brazil only. Here, 92% of the src IPs sent the sequence {9527, 9527, 9527}, and 8% of the src IPs sent {9527, 9527, 9527, 5555, 5555, 5555} and in the reverse order. This is a clear indication of one or more campaigns launched at the same time aiming to exploit a new vulnerability. Four days after DANTE detected the attack, the attack was reported [31] and related to a vulnerability in an IP-Camera (CVE-2017-11632). Interestingly, none of the reports mentioned that the attackers were also targeting port 5555. This not only demonstrates how DANTE can provide threat intelligence, but how DANTE can detect ongoing attacks. A third example of this category can be seen in a cluster, G, from 11/22/18, which was discovered by DANTE. This cluster contains a pattern of scanning two specific ports, 7547 and 7550, which occurs on 11/22/2018 from 5 to 9 PM GMT. According to the Internet Storm Center (ISC) port search [38], a free tool that monitors the level of malicious port activity, this is the most significant peak of activities for port 7550 in the past two years, however, to the best of our knowledge, there have been no reports of an attack that utilizes this port. The missing information in the reports could suggest a novel attack that utilize those two ports. In addition, port 7547 appears to have a large number of packets arriving each day at 10:00 AM GMT, that were assigned to a different cluster. Unlike cluster G, that cluster consists of port 7547 alone with no additional ports in the scan. 12/22/2018 is the only day when activity on this port peaks

104

D. Cohen et al.

in a different hour (see Fig. 10). There are reports [37] of a known Mirai botnet variant that uses port 7547 to exploit routers. Based on this report we attribute cluster G to Mirai activity. Since DANTE detected that the two ports are used interchangeably by the same src IPs, we suspect that there is a new vulnerability tested on port 7550 of routers. At the time of writing of this article existence or absence of such vulnerability was not yet confirmed by any organization. On 10/31/2018, DANTE generated an alert about a new attack (cluster H) one minute after it began. This attack continued everyday from 11/15/18 until the 11/18/18. In the attack, 1,789 different network hosts sent exactly four packets to two or three of the ports: 5379, 6379, and 7379. According to the ISC, this is the largest peak in the use of port 5379 and the third largest peak for port 7379, although these ports are considered unused. We could not find any report online which identified these ports being used together in an attack. Identifying the source of these IPs can lead a CERT team to uncovering an ongoing campaign. Noise and Outliers. These are all of the patterns which DBScan considered outliers. For example, short sequences or those with too few occurrences. Darknet traffic which falls in this category is typically benign backscatter or misconfigured packets, but not a scan [22,27]. The size of this cluster is directly controlled by the minPts parameter in DBScan and can be changed at any time. As previously mentioned, we chose minPts to be 30 in order to create a reasonable number of clusters per day for a security expert to explore. 6.3

Comparative Discussion

In the previous sections, we demonstrated that DANTE can detect new and reoccurring attack patterns. Unlike methods such as [3,4,27,36] which detect port scanning and methods such as [24,25,27,36] which detect traffic spikes on individual port numbers, DANTE can detect both and distinguishing between them. Furthermore, because DANTE uses Word2vec, it can also detect complex patterns (e.g., CVE-2017-11632). Many well known and dangerous malware generate complex patterns captures. Table 3 exemplifies this claim by listing some of these malware along with the ports which they access during their lifetime. To the best of our knowledge, there are no other works which can detect these kinds of patterns as quickly and accurately as DANTE. Concretely, [15,24,25,27,36] observe traffic on individual ports. Although [3,4] finds the connection between ports, it ignores the meaning and the content of the sequences, preventing it from detecting behaviors such as cluster G (ports 7547 and 7550 together). 6.4

Comparison with Ban et al. [4]

To evaluate our contribution of using Word2vec as a means for mining meaningful patterns, we compare our embedding approach to Ban et al. [4]. Ban et al. used a frequent pattern mining (FPM) algorithm called FP-growth on darknet traffic to group similar ports together. By using FPM, Ban et al. was able to find subsets

Mining and Monitoring Darknet Traffic

105

Table 3. Different malware and the ports they are using for infection and communication.

Fig. 10. Port 7547 during the week of 11/21/18. The orange cluster occurs each day at 10:00 AM GMT; the gray cluster is small and occurs continuously during the week. DANTE detected the attack (blue) which occurred at 5:00 PM 11/22/18. (Color figure online)

Botnet Ports Cryptojacking 6379, 2375, 2376 Wicked 8080, 8443, 80, 81 MIRAI 23, 2323 CONFICKER 445 EMOTET 80, 990, 8090, 50000, 8080, 7080, 443 CERBER 80, 6892 KOVTER 43, 8080 QAKBOT 443, 65400, 2222, 21, 41947 Cryptojacking 22, 7878, 2375, 2376 GH0ST 2011, 2013, 800 NANOCORE 3365

Phase Infection Infection Infection Infection Post-infection

Post-infection Post-infection Post-infection

Post-infection Post-infection Post-infection

of ports that occur together in the data. To the best of our knowledge, this is the most recent work that mines patterns in darknet traffic to find new concepts. We applied their method using their parameters on Batch 1 using the same time windows. The top 20 largest subsets of both FPM and DANTE are presented in Table 4. FPM cannot identify when different subsets are jointly used in novel attack patterns. This can be seen in cluster F, where DANTE clusters two different sets together, the set [9527] and the set [5555, 9527]. This cluster led DANTE to the discovery of an attack on Wireless IP Camera 360 devices (CVE-201711632). In addition, Ban et al.’s method omits all sequences with more than six different ports. However, these patterns are important to detect malicious service reconnaissance. For example, in cluster A, DANTE discovered a malicious actor who was copying Censys’s port scanning behaviors. By using the semantic embedding of ports, DANTE was able to relate similar ports (e.g., Telnet ports 23 and 2323 in cluster E). However, FPM created a set for every permutation although they all capture the same concept (see the three biggest FPM sets in Table 4). Moreover, some ports typically appear in attack sequences and rarely appear alone. This means that there is semantic information which can be learned about these ports, such as the intent and how it is reflected on the other ports in the sequence. For example, in Table 4, port 80 (HTTP) is a single concept according to FPM due to the FP-growth process. In contrast, DANTE re-associates port 80 with other patterns. This can be seen in the top 20 largest concepts where port 80 appears in six different concepts but never appears by itself; in FPM, the singleton [80] is the fifth largest subset. Lastly, FP-growth creates a large number of sets, even when setting the minimum support parameter to be a relatively large value (1,819 sets with minimum

106

D. Cohen et al.

Table 4. Top 20 biggest Batch 1 concepts according to DANTE and Ban et al. Ban et al. [4]

DANTE

Port 1 Port 2 Occur.

Port 1

23

5,783,748 23

2323

1,637,119 37215

2323 23

1,577,142 445

445

Port 2 Port 3 Port 4 Port 5 Port 6 2323

33,439,191 15,294,071 4,404,525

982,528 80

8001

8080

80

886,540 80

8080

433

37215

672,897 22 539,976 8080

8081

8081

383,209 3389

3398

5555

322,826 5555

8080 80

277,633 1433

81

8081

8088

and 10 other ports

3,699,342 3,355,387 2,697,434

8080

37215 23

Occur.

2,403,517 9833

3839

8933

1,383,441 937,267

3413

850,124

244,212 1235 different ports

798,760

187,815 80

8080

3389

178,766 3389

139

615,490

8443

158,892 21

482,898

32764

132,361 81

463,355

1433

130,363 1252 different ports

344,761

9000

129,044 443

445

122,518 8080

8000

80 23

106,023 123

47808 161

104181 5900

519,732

343,922

22 21

11211

5901

8333 5431

1883

8883

and 44 other ports

237,758

501

195,431 150,234

support of 1,000 for a period of six weeks). The resulting number of sets is impractical for a CERT analyst to investigate. An advantage of using DANTE is that only a reasonable number of concepts with high importance are identified per day (average of four a day, as shown in Fig. 9). At the same time, DANTE discovers small yet significant patterns that might represent dangerous attacks (such as cluster G).

7

Conclusion and Future Work

By mining darknet traffic, analysts can get frequent reports on-going and new merging threats facing their network. In this paper, we presented DANTE: a framework which enables network service providers to mine threat intelligence from massive darknet traffic streams. DANTE was able to detect hidden threats within real-world datasets and out performed the state of the art approach. Currently, DANTE is being deployed in Deutsche Telekom’s networks to provide their CERT with better threat intelligence. We hope that the embedding and clustering techniques of this paper will assist researchers and the industry in better securing the Internet. Acknowledgment. This research was partially supported by the CONCORDIA project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement number 830927. We would like to thank Nadav Maman for his help in implementing DANTE.

Mining and Monitoring Darknet Traffic

107

References 1. Bailey, M., Cooke, E., Jahanian, F., Myrick, A., Sinha, S.: Practical darknet measurement. In: 2006 40th Annual Conference on Information Sciences and Systems, pp. 1496–1501. IEEE (2006) 2. Bailey, M., Cooke, E., Jahanian, F., Nazario, J., Watson, D., et al.: The Internet motion sensor-a distributed blackhole monitoring system. In: NDSS (2005) 3. Ban, T., Eto, M., Guo, S., Inoue, D., Nakao, K., Huang, R.: A study on association rule mining of darknet big data. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, July 2015 4. Ban, T., Pang, S., Eto, M., Inoue, D., Nakao, K., Huang, R.: Towards early detection of novel attack patterns through the lens of a large-scale darknet, pp. 341–349, July 2016 5. Ban, T., Zhu, L., Shimamura, J., Pang, S., Inoue, D., Nakao, K.: Detection of botnet activities through the lens of a large-scale darknet. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.-S.M. (eds.) ICONIP 2017. LNCS, vol. 10638, pp. 442–451. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70139-4 45 6. Ban, T., Zhu, L., Shimamura, J., Pang, S., Inoue, D., Nakao, K.: Behavior analysis of long-term cyber attacks in the darknet. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds.) ICONIP 2012. LNCS, vol. 7667, pp. 620–628. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34500-5 73 7. Bartos, K., Sofka, M., Franc, V.: Optimized invariant representation of network traffic for detecting unseen malware variants (2016) 8. Bou-Harb, E., Debbabi, M., Assi, C.: A time series approach for inferring orchestrated probing campaigns by analyzing darknet traffic. In: 2015 10th International Conference on Availability, Reliability and Security, pp. 180–185. IEEE, August 2015 9. Bringer, M.L., Chelmecki, C.A., Fujinoki, H.: A survey: recent advances and future trends in honeypot research. Int. J. Comput. Netw. Inf. Secur. 4(10), 63 (2012) 10. Cao, F., Estert, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise. In: Proceedings of the 2006 SIAM International Conference on Data Mining, pp. 328–339. SIAM (2006) 11. Carnein, M., Trautmann, H.: Optimizing data stream representation: an extensive survey on stream clustering algorithms. Bus. Inf. Syst. Eng. 61(3), 277–297 (2019) 12. Casas, P., Mazel, J., Owezarski, P.: Unsupervised network intrusion detection systems: detecting the unknown without knowledge. Comput. Commun. 35, 772–783 (2012) 13. Choi, S.S., Song, J., Kim, S., Kim, S.: A model of analyzing cyber threats trend and tracing potential attackers based on darknet traffic. Secur. Commun. Netw. 7(10), n/a (2013) ´ Neural visualization of network traffic data for intrusion 14. Corchado, E., Herrero, A.: detection. Appl. Soft Comput. J. 11, 2042–2056 (2010) 15. Coudriau, M., Lahmadi, A., Fran¸cois, J.: Topological analysis and visualisation of network monitoring data: darknet case study. In: 2016 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–6 (2016) 16. Durumeric, Z., Bailey, M., Halderman, J.A.: An Internet-wide view of Internet-wide scanning. In: Proceedings of the 23rd USENIX Conference on Security Symposium, SEC 2014, pp. 65–78. USENIX Association (2014)

108

D. Cohen et al.

17. Ester, M., Wittmann, R.: Incremental generalization for mining in a data warehousing environment. In: Schek, H.-J., Alonso, G., Saltor, F., Ramos, I. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 135–149. Springer, Heidelberg (1998). https:// doi.org/10.1007/BFb0100982 18. Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol. 96, pp. 226–231 (1996) 19. Fachkha, C., Bou-Harb, E., Debbabi, M.: Inferring distributed reflection denial of service attacks from darknet. Comput. Commun. 62, 59–71 (2015) 20. Faria, E.R., Gon¸calves, I.J.C.R., de Carvalho, A.C.P.L.F., Gama, J.: Novelty detection in data streams. Artif. Intell. Rev. 45(2), 235–269 (2015). https://doi.org/10. 1007/s10462-015-9444-8 21. Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams. In: Proceedings of 41st Annual Symposium on Foundations of Computer Science, 2000, pp. 359–366. IEEE (2000) 22. Harrop, W., Armitage, G.: Defining and evaluating Greynets (sparse darknets). In: The IEEE Conference on Local Computer Networks 30th Anniversary (LCN 2005), vol. l, pp. 344–350. IEEE (2005) 23. Heo, H., Shin, S.: Who is knocking on the Telnet port: a large-scale empirical study of network scanning. In: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 625–636. ACM (2018) 24. Inoue, D., et al.: An incident analysis system NICTER and its analysis engines based on data mining techniques. In: K¨ oppen, M., Kasabov, N., Coghill, G. (eds.) ICONIP 2008. LNCS, vol. 5506, pp. 579–586. Springer, Heidelberg (2009). https:// doi.org/10.1007/978-3-642-02490-0 71 25. Kao, C.N., Chang, Y.C., Huang, N.F., Liao, I.J., Liu, R.T., Hung, H.W., et al.: A predictive zero-day network defense using long-term port-scan recording. In: 2015 IEEE Conference on Communications and Network Security (CNS), pp. 695–696. IEEE (2015) 26. Lagraa, S., Francois, J., Lahmadi, A., Miner, M., Hammerschmidt, C., State, R.: BotGM: unsupervised graph mining to detect botnets in traffic flows. In: 2017 1st Cyber Security in Networking Conference (CSNet), pp. 1–8. IEEE, October 2017 27. Liu, J., Fukuda, K.: Towards a taxonomy of darknet traffic. In: 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 37– 43. IEEE, August 2014 28. Maaten, L.V.D., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008) 29. Mairh, A., Barik, D., Verma, K., Jena, D.: Honeypot in network security: a survey. In: Proceedings of the 2011 International Conference on Communication, Computing and Security, pp. 600–605. ACM (2011) 30. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013) 31. Nichols, S.: FBI warns of SIM-swap scams, IBM finds holes in visitor software, 13-year-old girl charged over Javascript prank (2019). https://www.theregister.co. uk/2019/03/09/security roundup 080319 32. Owezarski, P.: A Near Real-Time Algorithm for Autonomous Identification and Characterization of Honeypot Attacks. Technical report (2015). https://hal. archives-ouvertes.fr/hal-01112926 33. Pa, Y.M.P., Suzuki, S., Yoshioka, K., Matsumoto, T., Kasama, T., Rossow, C.: IoTPOT: a novel honeypot for revealing current IoT threats. J. Inf. Process. 24(3), 522–533 (2016)

Mining and Monitoring Darknet Traffic

109

34. Pang, S., et al.: Malicious events grouping via behavior based darknet traffic flow analysis. Wirel. Pers. Commun. 96, 5335–5353 (2017) 35. Singhal, A., Ou, X.: Security risk analysis of enterprise networks using probabilistic attack graphs. Network Security Metrics, pp. 53–73. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66505-4 3 36. Thonnard, O., Dacier, M.: A framework for attack patterns’ discovery in honeynet data. Digit. Invest. 5, S128–S139 (2008) 37. Ullrich, J.: Port 7547 soap remote code execution attack against DSL modems (2016). https://isc.sans.edu/diary/Port+7547+SOAP+Remote+Code+Execution +Attack+Against+DSL+Modems/21759 38. Van Horenbeeck, M.: The sans Internet storm center. In: 2008 WOMBAT Workshop on Information Security Threats Data Collection and Sharing, pp. 17–23. IEEE (2008) 39. Wagner, C., Dulaunoy, A., Wagener, G., Iklody, A.: MISP: the design and implementation of a collaborative threat intelligence sharing platform. In: Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security, pp. 49–56. ACM (2016) 40. Wieting, J., Bansal, M., Gimpel, K., Livescu, K.: Towards universal paraphrastic sentence embeddings. arXiv preprint arXiv:1511.08198 (2015) 41. Wustrow, E., Karir, M., Bailey, M., Jahanian, F., Huston, G.: Internet background radiation revisited. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, New York (2010) 42. Zhang, J., Tong, Y., Qin, T.: Traffic features extraction and clustering analysis for abnormal behavior detection. In: Proceedings of the 2016 International Conference on Intelligent Information Processing - ICIIP 2016, New York (2016) ˇ 43. Skrjanc, I., Ozawa, S., Dovˇzan, D., Tao, B., Nakazato, J., Shimamura, J.: Evolving cauchy possibilistic clustering and its application to large-scale cyberattack monitoring. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–7, November 2017

Efficient Quantification of Profile Matching Risk in Social Networks Using Belief Propagation Anisa Halimi1 and Erman Ayday1,2(B) 1

Case Western Reserve University, Cleveland, OH, USA {anisa.halimi,erman.ayday}@case.edu 2 Bilkent University, Ankara, Turkey

Abstract. Many individuals share their opinions (e.g., on political issues) or sensitive information about them (e.g., health status) on the internet in an anonymous way to protect their privacy. However, anonymous data sharing has been becoming more challenging in today’s interconnected digital world, especially for individuals that have both anonymous and identified online activities. The most prominent example of such data sharing platforms today are online social networks (OSNs). Many individuals have multiple profiles in different OSNs, including anonymous and identified ones (depending on the nature of the OSN). Here, the privacy threat is profile matching: if an attacker links anonymous profiles of individuals to their real identities, it can obtain privacysensitive information which may have serious consequences, such as discrimination or blackmailing. Therefore, it is very important to quantify and show to the OSN users the extent of this privacy risk. Existing attempts to model profile matching in OSNs are inadequate and computationally inefficient for real-time risk quantification. Thus, in this work, we develop algorithms to efficiently model and quantify profile matching attacks in OSNs as a step towards real-time privacy risk quantification. For this, we model the profile matching problem using a graph and develop a belief propagation (BP)-based algorithm to solve this problem in a significantly more efficient and accurate way compared to the state-of-the-art. We evaluate the proposed framework on three real-life datasets (including data from four different social networks) and show how users’ profiles in different OSNs can be matched efficiently and with high probability. We show that the proposed model generation has linear complexity in terms of number of user pairs, which is significantly more efficient than the state-of-the-art (which has cubic complexity). Furthermore, it provides comparable accuracy, precision, and recall compared to state-of-the-art. Thanks to the algorithms that are developed in this work, individuals will be more conscious when sharing data on online platforms. We anticipate that this work will also drive the technology so that new privacy-centered products can be offered by the OSNs. Keywords: Social networks Privacy risk quantification

· Profile matching · Deanonymization ·

c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 110–130, 2020. https://doi.org/10.1007/978-3-030-58951-6_6

Efficient Quantification of Profile Matching Risk in Social Networks

1

111

Introduction

Many individuals, to preserve their privacy and to protect themselves against potential damaging consequences, choose to share content anonymously in the digital space. For instance, people share their opinions about different topics or sensitive information about themselves (e.g., their health status) without sharing their real identities, hoping that they will remain anonymous. Unfortunately, this is non-trivial in today’s interconnected world, in which different activities of individuals can be linked to each other. An attacker, by linking anonymous activities of individuals to their real identities (via other publicly available and identified information about them), can obtain privacy-sensitive information about them. Thus, individuals need tools that show them the scale of their vulnerability against such privacy risks when they share content. In this work, we tackle this problem by focusing on data sharing on online social networks (OSNs). An OSN is a platform, in which, individuals share vast amount of information about themselves such as their social and professional life, hobbies, diseases, friends, and opinions. Via OSNs, people also get in touch with other people that share similar interests or that they already know in real-life [8]. With the widespread availability of the Internet, OSNs have been a part of our lives more than ever. Most individuals have multiple OSN profiles for different purposes. Furthermore, each OSN offers different services via different frameworks, leading individuals share different types of information [9]. Also, in some OSNs, users reveal their real identities (e.g., to find old friends), while in some OSNs, users prefer to remain anonymous (especially in OSNs in which users share anonymous opinions or sensitive information about themselves, such as their health status). Here, the privacy risk is the deanonymization of the anonymous OSN profile of a user using their other OSN profiles, in which the user is identified. Such profile matching across OSNs (i.e., identifying profiles belonging to the same individuals) is a serious privacy threat, especially for individuals that have anonymous profiles in some OSNs and reveal their real identities in others. If an attacker can link anonymous profiles of individuals to their real identities, it can obtain privacy-sensitive information about individuals that is not intended to be linked to their real identities. Such sensitive information can then be used for discrimination or blackmailing. Thus, it is very important to quantify and show the risk of such profile matching attacks in an efficient and accurate way. An OSN can be characterized by (i) its graphical structure (i.e., connections between its users) and (ii) the attributes of its users (i.e., types of information that is shared by its users). The graphical structures of most popular OSNs show strong resemblance to social connections of individuals in real-life (e.g., Facebook). Existing work shows that this fact can be utilized to link accounts of individuals from different OSNs [29]. However, without sufficient background information, just using graphical structure for profile matching becomes computationally infeasible. Furthermore, some OSNs or online platforms either do not have a graphical structure at all (e.g., forums) or their graphical structures do not resemble the real-life connections of the individuals (e.g., health-related OSNs such as PatientsLikeMe [3]). In these types of OSNs, an attacker can uti-

112

A. Halimi and E. Ayday

lize the attributes of the users for profile matching. Thus, to show the scale of the profile matching threat, it is crucial to process both the graphical structure and the other attributes of the users in an efficient and accurate way. In this work, we efficiently model the profile matching problem in OSNs by considering both the graphical structure and other attributes of the users, a step towards delivering real-time information to OSN users about their privacy risks for profile matching due to their sharings on online platforms. Designing efficient privacy risk quantification tools is non-trivial, especially considering the scale of the problem. To overcome this challenge, we develop a novel, graph-based model generation algorithm to solve the profile matching problem in a significantly more efficient and accurate way than the state-of-the-art. We formulate the profile matching problem as finding the marginal probability distributions of random variables representing the possible matches between user profile pairs from the joint probability distribution of many variables. We factorize the joint probability distribution into simpler local functions to compute the marginal probability distributions efficiently. To do so, we formulate the model generation for profile matching by using a graph-based algorithm. That is, we formulate the problem on a factor graph and develop a novel belief propagation (BP)-based algorithm to generate the model efficiently and accurately (compared to the state-of-the-art). The outcome of the model generation will pave the way towards developing real-time risk quantification tools (i.e., inform users about their privacy loss and its consequences as they share new content). Our results show that the proposed model generation algorithm can match user profiles with an accuracy of up to 90% (depending on the amount of information and attributes that users share). As more information is collected about the users profiles in social networks, the accuracy of the BP-based algorithm increases. Also, by analyzing the effect of social networks’ size to obtained precision and recall values, we show the scalability of the proposed model generation algorithm. We also show that by controlling the structure of the proposed graphical model, we can simultaneously improve the efficiency of the proposed model generation algorithm and increase its accuracy. The rest of the paper is organized as follows. In Sect. 2, we discuss the related work. In Sect. 3, we provide the threat model. In Sect. 4, we describe the proposed framework in detail. In Sect. 5, we implement and evaluate the proposed framework using real-life datasets belonging to various OSNs. In Sect. 6, we discuss how the proposed scheme can be used for real-time privacy risk quantification, potential mitigation techniques, and generalization of the proposed scheme for different OSNs. Finally, in Sect. 7, we conclude the paper.

2

Related Work

Several works in the literature have proposed profile matching schemes that leverage network structure, publicly available attributes of the users, or both of them. Profile matching based only on network (graph) structure is widely known as the deanonymization problem. Graph deanonymization (DA) attacks can be classified as (i) seed-based attacks [16,18,19,29,30], in which a set of seeds (users’

Efficient Quantification of Profile Matching Risk in Social Networks

113

accounts in two different networks which belong to the same individual) are known; and (ii) seed-free attacks [33,39], in which no seeds are used. Narayanan and Shmatikov were among the first that proposed a graph deanonymization algorithm [29]. Nilizadeh et al. [30] improved the attack proposed by Narayanan et al. by proposing a community-level deanonymization attack. Korula and Lattanzi [19] proposed a DA attack that by starting from a set of seeds, iteratively matches user pairs with the most number of neighboring mapped pairs. Ji et al. [16] quantified the deanonymizability of social networks from a theoretical perspective (i.e., focusing on social networks that follow a distribution model). Pedarsani et al. [33] proposed a Bayesian-based model to match users across social networks without using seeds. Their model uses node degrees and distances to other nodes. Sharad et al. [36] showed that users’ re-identification (deanonymization) in anonymized social networks can be automated. Ji et al. [17] evaluated several anonymization techniques and deanonymization attacks and showed that all state-of-art anonymization techniques are vulnerable to modern deanonymization attacks. Recently, Zhou et al. [43] proposed DeepLink, a deep neural network based algorithm that leverages network structure for user linkage. Another line of works [11,14,15,23,24,26,27,31,34,38,41,42] have leveraged public information in the users’ profiles (such as user name, profile photo, description, location, and number of friends) for profile matching. Shu et al. [37] provided a broad review of the works that use public information for profile matching. Malhotra et al. [26] built classifiers on various attributes to determine whether two user profiles are matching or not. On the other hand, Zafarani et al. [42] explored user name by analyzing the behaviour patterns of the users, the language used, and the writing style to link users across social media sites. Goga et al. [11] showed that attributes that are hard to be controlled by users, such as location, activity, and writing style, may be sufficient for profile matching. Liu et al. [25] proposed a framework that mainly consists of three steps: behavior similarity modeling, structure consistency modeling, and multiobjective optimization. Goga et al. [12] conducted a detailed analysis of user profiles and their attributes identifying four properties: availability, consistency, non-impersonability, and discriminability. Andreou et al. [5] combined attribute and identity disclosure across social networks. Recently, Halimi et al. [13] proposed a more accurate profile matching framework based on machine learning techniques and optimization algorithms. One common thing about most of these aforementioned approaches is that they rely on training classifiers to determine whether a user pair is a match or not. We implemented some of these approaches and compared with the proposed framework in Sect. 5. Contribution of this Paper: Previous works show that there exists a nonnegligible risk of matching user profiles on offline datasets. Showing the risk on offline datasets is not effective since users need tools that guide them at the time of data sharing in digital world. However, building algorithms that will pave the way towards real-time privacy risk quantification is non-trivial considering the scale of the problem. In this paper, we develop a novel belief propagation

114

A. Halimi and E. Ayday

(BP)-based algorithm to generate the model efficiently and accurately (compared to the state-of-the-art). The proposed algorithm has linear complexity with respect to the number of user pairs (i.e., possible matches), while Hungarian algorithm [21], state-of-the-art that provides the highest accuracy (as shown in Sect. 5.4), has cubic complexity with respect to the number of users. We also show that the proposed algorithm achieves comparable accuracy with the Hungarian algorithm while providing this efficiency advantage.

3

Threat Model

We assume the attacker has access to user profiles in different OSNs. For simplicity we consider two OSNs: user profiles in OSN A (the auxiliary OSN) are linked to their identities, while in OSN T (the target OSN), the profiles of the individuals are anonymized. The attacker’s goal is to match one or multiple user profiles from OSN T to the profiles in OSN A in order to determine the real identities of the users in OSN T . To do such profile matching, we assume that the attacker can only use the publicly available attributes of the users from OSNs A and T . We study the extent of profile matching risk by means of two attacks: targeted attack and global attack. Targeted attack represents a scenario in which the attacker identifies the anonymous profile of a victim (or a set of victims) in OSN T and aims to find the corresponding unanonymized profile of the same victim in OSN A. Global attack represents the case in which the attacker aims to link all profiles in OSN T to their corresponding matches in OSN A.

4

Proposed Model Generation

Let A and T represent the auxiliary and the target OSNs, respectively, in which people publicly share attributes such as date of birth, gender, and location. We represent the profile of a user i in either A or T as Uik , where k ∈ {A, T }. We focus on the most common attributes that are shared in many OSNs and we categorize the profile of a user i as Uik = {nki , ki , gik , pki , fik , aki , tki , ski , rik }, where n denotes the user name,  denotes the location, g denotes the gender, p denotes the profile photo, f denotes the freetext provided by the user in the profile description, a denotes the activity patterns of the user (i.e., time instances at which the user posts), t denotes the interests of the user (based on the sharings of the user), s denotes the sentiment profile of the user, and r denotes the (graph) connectivity pattern of the user. As discussed, the main goal of the attacker is to link the profiles between two OSNs. The overview of the proposed framework is shown in Fig. 1. In the following, we describe the details of the proposed model generation algorithm. 4.1

Categorizing Attributes and Defining Similarity Metrics

Once the attributes of the users are extracted from their profiles, we first categorize them so that we can use them to compute the similarity values of attributes

Efficient Quantification of Profile Matching Risk in Social Networks Data Collection

115

Attributes Categorization

Model Generation User Pairs

OSN A …

X1,1

X1,2

X2,1

f1

f2

f3

X2,2

X3,1

X3,2



xN,3

xN,M

? …

OSN T : profile of user in social network : profile of user in social network

OSN A



fN

g1

g2

g3



gM

OSN T

xij: variable node of user pair ( , ) gj: factor node of fi: factor node of

Fig. 1. Overview of the proposed framework. The proposed framework consists of 3 main steps: (1) data collection, (2) categorization of attributes and computation of attribute similarities, and (3) generation of the model.

between different users. In this section, we define the similarity metrics for each attribute between a user i in OSN A and user j in OSN T . T User Name Similarity - S(nA i , nj ): We use Levenshtein distance [22] to compute the similarity between user names of profiles. T Location Similarity - S(A i , j ): Location information collected from the users’ profiles is usually text-based. We convert the textual information into coordinates via GoogleMaps API [1] and calculate the geographic distance between the corresponding coordinates.

Gender Similarity - S(giA , gjT ): Availability of gender information is mostly problematic in OSNs. Some OSNs do not publicly share the gender information of their users. Furthermore, some OSNs do not even collect this information. In our model, if an OSN does not provide the gender information publicly (or does not have such information), we probabilistically infer the gender information by using a public name database. That is, we use the US social security name database1 and look for a profile’s name (or user name) to probabilistically infer the possible gender of the profile from the distribution of the corresponding name (among males and females) in the name database. We then use this probability as the S(giA , gjT ) value between two profiles. 1

US social security name database includes year of birth, gender, and the corresponding name for babies born in the United States.

116

A. Halimi and E. Ayday

T Profile Photo Similarity - S(pA i , pj ): We calculate the profile photo similarity through a framework named OpenFace [4]. OpenFace is an open source tool performing face recognition. OpenFace first detects the face (in the photo), and then preprocesses it to create a normalized and fixed-size input for the neural network. The features that characterize a person’s face are extracted by the neural network and then used in classifiers or clustering techniques. OpenFace notably offers higher accuracy than previous open source frameworks. Given two T A T profile photos pA i and pj , OpenFace returns the photo similarity, S(pi , pj ), as a real value between 0 (meaning exactly the same photo) and 4.

Freetext Similarity - S(fiA , fjT ): Freetext data in an OSN profile could be a short biographical text or an “about me” page. We use NER (named-entity recognition) [10] to extract features from the freetext information. The extracted features are location, person, organization, money, percent, date, and time. To calculate the freetext similarity between the profiles of two users, we use the cosine similarity between the extracted features from each user. T Activity Pattern Similarity - S(aA i , aj ): To compute the activity pattern similarity, we find the similarity between observed activity patterns of two proA files (e.g., likes or post). Let aA i represent a vector including the times of last |ai | T activities of user i in OSN A. Similarly, aj is a vector including the times of last |aTj | activities of user j in OSN T . First, we compute the time difference between T A T every entry in aA i and aj and we determine min(|ai |, |aj |) pairs whose time difference is the smallest. Then, we compute the normalized distance between these T min(|aA i |, |aj |) pairs to compute the activity pattern similarity between two profiles. T Interest Similarity - S(tA i , tj ): OSNs provide a platform in which users share their opinions via posts (e.g., tweets or tips), and this shared content is composed of different topics. In highlevel, first, we create a topic model using Latent Dirichlet Allocation (LDA) [7]. Then, by using the created model, we compute the topic distribution of each post generated by the users of the auxiliary and the target OSNs. Finally, we compute the interest similarity from the distance of the computed topic distributions. T Sentiment Similarity - S(sA i , sj ): Users typically express their emotions when sharing their opinions about certain issues on OSNs. To determine whether the shared text (e.g., post or tweet) expresses positive or negative sentiment we use sentiment analysis through Python NLTK (natural language toolkit) Text Classification [2]. Given the text to analyze, the sentiment analysis tool returns the probability for positive and negative sentiment in the text. Users’ moods are affected from different factors, so it is realistic to assume that they might change by time (e.g., daily). Thus, we compute the daily sentiment profile of each user, and daily sentiment similarity between the users. For this, first, we compute the normalized distribution of the positive and negative sentiments per day for each user, and then we find the normalized distance between these distributions for each user pair.

Efficient Quantification of Profile Matching Risk in Social Networks

117

Graph Connectivity Similarity - S(riA , rjT ): As in [36], for each user i, we define a feature vector Fi = (c0 , c1 , ..., cn−1 ) of length n made up of components of size b. Each component contains the number of neighbors that have a degree in a particular range, e.g., ck is the count of neighbors with a degree in range [k · b, (k + 1) · b]. We use the feature vector length as 70 and bin size as 15 (as in [36]). 4.2

Generating the Model

We denote the set of profiles that are extracted for training from OSNs A and T as At and Tt , respectively. Profiles are selected such that some profiles in At and Tt belong to the same individual. We let set G include pairs of profiles (UiA , UjT ) from At and Tt that belong to the same individual (i.e., coupled profiles). Similarly, we let set I include pairs of profiles that belong to different individuals (i.e., uncoupled profiles). For each pair of users in sets G and I, we compute the attribute similarities based on the categorizations of the attributes (as discussed in Sect. 4.1). We label the pairs in sets G and I (as coupled and uncoupled) and add them to the training dataset. Then, to identify the weight (contribution) of each attribute, we use logistic regression. Next, we select the profiles to be matched and construct the sets Ae (with size N ) and Te (with size M ). Then, we compute the general similarity S(UiA , UjT ) between every user in Ae and Te using the identified weights of the attributes to obtain the N × M similarity matrix R. Our goal is to obtain a one-to-one matching between the users in Ae and Te that would also maximize the total similarity. One way of solving this problem is to formulate it as an optimization problem and use the Hungarian algorithm, a combinatorial optimization algorithm that solves the assignment problem in polynomial time [21]. It is also possible to formulate profile matching as a classification problem and solve it using machine learning algorithms. Thus, we evaluate and compare the solution of this problem by using both the Hungarian algorithm and other off-the-shelf machine learning algorithms including k-nearest neighbor (KNN), decision tree, random forest, and SVM. Evaluations on different datasets (we will provide the details of the datasets later in Sect. 5.2) show us that Hungarian algorithm provides significantly better precision, recall, and accuracy compared to other machine learning techniques (we will provide the details of our evaluation in Sect. 5.4). However, assuming N users in set Ae and M users in set Te , the running time of the Hungarian algorithm for the above scenario is O(max{N, M }3 ), and hence it is not scalable for large datasets. This raises the need for efficient, accurate, and scalable algorithms for model generation that will pave the way towards real-time privacy risk quantification.

118

4.3

A. Halimi and E. Ayday

Belief Propagation-Based Efficient Formulation of Model Generation

Inspired from the effective use of the message passing algorithms in information theory [35] and reputation management [6], in this research, for the first time, we formulate profile matching as an inference problem that infers the coupled profile pairs and develop an algorithm that relies on belief propagation (BP) on a graphical model. BP algorithm is based on a message-passing strategy for performing efficient inference using graphical models [32]. The problem we consider is different from [6,35] and so is the formulation. In this section we formalize our approach and present the different components that are needed to quantify the profile matching risk. Our goal is to obtain comparable precision, recall, and accuracy values as in the Hungarian algorithm with significantly better efficiency. We represent the marginal probability distribution for a profile pair (i, j) to be a coupled pair as p(xi,j ), where xi,j = 1 if profiles are matched as a result of the algorithm and xi,j = 0, otherwise. Then, we formulate the profile matching (i.e., determining if a profile pair is coupled or uncoupled) as computing the marginal probability distributions of the variables in set X = {xi,j : i ∈ A, j ∈ T }, given the similarity values between the user pairs in the similarity matrix R. Since the number of users in OSNs is high, it is computationally infeasible to compute the marginal probability distributions from the joint probability distribution p(X|R). Thus, we propose to factorize p(X|R) into local functions using a factor graph and run the BP algorithm to compute the marginal probability distributions in linear time (with respect to the number of profile pairs). A factor graph is a bipartite graph containing two sets of nodes (variable and factor nodes) and edges between these two sets. We form a factor graph by setting a variable node for each variable xi,j (i.e., each profile pair). Thus, each variable node represents the marginal probability distribution of that profile pair being coupled or uncoupled. We use two types of factor nodes: (i) “auxiliary” factor node (fi ), representing each user i in OSN A and (ii) “target” factor node (gj ), representing each user j in OSN T . Each factor node is connected to the variable nodes representing its potential matches. Factor nodes represent the statistical relationships between the user attributes and profile matching. Using the factor nodes, the joint probability distribution function can be factorized into products of several local functions, as follows: ⎡ ⎤ N M   1 gj (xσgj , R)⎦, (1) p(X|R) = ⎣ fi (xσfi , R) Z i=1 j=1 where Z is a normalization constant, and σfi (or σgj ) represents the indices of the variable nodes that are connected to factor node fi (or gj ).

Efficient Quantification of Profile Matching Risk in Social Networks

119

Figure 2 shows the factor graph representation of a toy example with 3 users from OSN A and 2 users from OSN T . Here, each user corresponds to a factor node in the graph (shown as a hexagon or rhombus, respectively). Each profile pair is represented by a variable node and shown as a rectangle. Each factor node is connected to the variable nodes it acts on. For example fi is connected to all variable nodes (profile pairs) that contain UiA . The BP algorithm iteratively exchanges messages between the variable and the factor nodes, updating the beliefs on the values of the profile pairs (i.e., being a coupled or an uncoupled profile) at each iteration, until convergence. Next, we introduce the messages i between the variable and the factor A T X X X X X X nodes to compute the marginal distributions using BP. We denote the messages from the variable nodes to f f g g f the factor nodes as μ. We also denote z k (a) (b) the messages from the auxiliary factor nodes to the variable nodes as λ and from the target factor nodes to Fig. 2. Factor graph representation of 3 users from OSN A and 2 users from OSN the variable  nodes as β. The message T . (a) The users in both OSNs A and T . (v+1) (v) μk→i xi,j denotes the probability (b) Factor graph representation of all the of xvi,j = r (r ∈ {0, 1}), at the vth iter- possible profile pairs combinations between   (v) (v) ation. Also, λi→k xi,j denotes the users in OSNs A and T . probability that xvi,j = r (r ∈ {0, 1}) at the vth iteration given R (the messages β can be also expressed similarly). In the following, we describe the message exchange between the variable node x1,4 , the auxiliary factor node f1 , and the target factor node g4 in Fig. 2. For clarity of presentation, we denote the variable and factor nodes x1,4 , f1 , and g4 as i, k, z, respectively. Following the general rules of BP [20], the variable node i generates its message to auxiliary factor node k by multiplying all the messages it receives from its neighbors, excluding k. Note that each variable node has only two neighbors (one auxiliary factor node and one target factor node). Thus, the message from the variable node i to the auxiliary factor node k at the vth iteration is as follows:     (v) (v) (v−1) (v−1) . (2) μi→k x1,4 = βz→i x1,4 1,4

1

1,5

2

2,4

2,5

3

3,4

4

3,5

5

This computation is done at every variable node. The message from the variable node i to the target factor node z is also constructed similarly. Next, factor nodes generate their messages. The message from the auxiliary factor node k to the variable node i is given by:       1 (v) (v) (v) (v) λk→i x1,4 = × S(U1A , U4T ) × (3) fd μd→k x1,4 , Z d∈(∼i)

where (∼ i) means all variable node neighbors of k, except i. We compute function fd as:       (v) (v) (v) (v) fd μd→k x1,4 = 1 − μd→k x1,4 . (4)

120

A. Halimi and E. Ayday

The above computation must be performed for every neighbor of each auxiliary factor node. The message from the target factor node z to the variable node i is also computed similarly. The next iteration is performed in the same way as the vth iteration. The algorithm starts at the variable nodes. In the first iteration (i.e., v = 1), all the variable nodes send to their neighboring factor nodes the same value  (1) (1) (λi→k xi,j = 1/N ), where N is the total number of “auxiliary” factor nodes. The iterations stops when the probability distributions of all variables in X converge. The marginal probability distribution of each variable in X is computed by multiplying all the incoming messages at each variable node. 4.4

-Accurate Model Generation

We also study the limitations and properties of the proposed BP-based model generation algorithm. We particularly analyze if the proposed algorithm maintains any optimality in any sense. For this, we use the following definition: Definition 1 -Accurate Model Generation. We declare a model generation algorithm as -accurate if it can match at least % of the users accurately. Here, accuracy is the number of correctly matched coupled pairs by the proposed algorithm over the total number of coupled pairs. The above definition can also be made in terms of precision or recall (or both) of the proposed algorithm. Thus, for a fixed , we study the conditions for an -accurate algorithm. This also helps us understand the limits of profile matching in OSNs. To have an -accurate algorithm with a high  value, it can be shown that, we require the BP-based algorithm to iteratively increase the accuracy until it converges. This brings about the following sufficient condition about -accuracy. Definition 2 Sufficient Condition. Accuracy of the model generation algorithm increases with each successive iteration (until convergence) if for all cou(2) (1) pled profiles i and j, P r(xi,j = 1) > P r(xi,j = 1) is satisfied. Depending on the fraction of the coupled profile pairs that meet the sufficient condition, -accuracy of the proposed algorithm can be obtained. In Sect. 5, we experimentally explore the cases in which this sufficient condition is satisfied with high probability.

5

Evaluation of the Proposed Mechanism

In this section, we evaluate the proposed BP-based algorithm by using real data from four OSNs. We also study the impact of various parameters to the -accuracy of the proposed algorithm.

Efficient Quantification of Profile Matching Risk in Social Networks

5.1

121

Evaluation Metrics

To evaluate the proposed model, we mainly consider the global attack, in which the goal of the attacker is to match all profiles in Ae to all profiles in Te . In other words, the goal is to deanonymize all anonymous users in the target OSN (who have accounts in the auxiliary OSN). For the evaluation metrics, we use precision, recall, and accuracy. Hungarian algorithm and the proposed BP-based algorithm provide a one-to-one match between all the users. However, we cannot expect that all anonymous users in the target OSN have profiles in the auxiliary OSN. Therefore, some of the provided matches are useless for us. Thus, we select a “similarity threshold” (“probability threshold” for machine learning techniques) for evaluation. Each matching scheme returns 1 (i.e., true match) if the similarity/probability of user pair is higher than the threshold, and 0 otherwise. So, we consider as true positives the pairs that are correctly matched by the algorithm and whose similarity/probability is greater than the threshold. We also compute accuracy as the number of correctly matched coupled pairs identified by the algorithm over the total number of coupled pairs. 5.2

Data Collection

To evaluate our proposed framework, we use three datasets: (i) Dataset 1 (D1): Google+ - Twitter [13], (ii) Dataset 2 (D2): Instagram - Twitter, and (iii) Dataset 3 (D3): Flickr social graph [40]. To collect the coupled profiles in D1 and D2, social links in Google+ profiles and about.me (a social network where users provide links to their OSN profiles) were used, respectively. In terms of dataset sizes, (i) D1 consists of 8000 users in each OSNs where 4000 of them are coupled profiles; (ii) D2 consists of more than 10000 coupled profiles (and more content about the OSN users compared to D1); and (iii) D3 consists of 50000 users. In D1, we use Twitter as our auxiliary OSN (A) and Google+ as our target OSN (T ); in D2, we use Twitter as our auxiliary OSN (A) and Instagram as our target OSN (T ); and in D3, we generate the auxiliary and the target OSN graphs as in [28,36] by using a vertex overlap of 1 and an edge overlap of 0.9. 5.3

Evaluation Settings

Since the model generation process is the same for all three datasets, in the rest of the paper, we hold the discussion over a target and auxiliary network. From each dataset, we select 3000 profile pairs (1500 coupled and 1500 uncoupled) for training. We also select 500 users from the auxiliary OSN and 500 users from the target OSN to construct sets Ae and Te , respectively. Note that none of these users are involved in the training set. Among these profiles, we have 500 coupled pairs and 249500 uncoupled pairs, and hence the goal is to make sure that these 500 users are matched with high confidence in a global attack scenario. Note that we do not use cross-validation because we consider all the possible user combinations to test our model and it is time-consuming to compute all similarity metrics for all combinations. Considering all the combinations instead

122

A. Halimi and E. Ayday

of randomly selecting some user pairs is a more realistic evaluation setting since one can never know which users pairs the attacker will have access to. In cases that there are missing attributes (that are not published by the users) in the dataset, we assign a value for the attribute similarity based on the distributions of the attribute similarity values between the coupled and uncoupled pairs. 5.4

Evaluation of BP-Based Model Generation

In Fig. 3, we show the comparison of the proposed BP-based model generation to [12,13,26,31,36] for each dataset (D1, D2, and D3). [12,26,31,36] use machine learning-based techniques (k-nearest neighbor (KNN), decision tree, random forest, and/or SVM), while [13] uses the Hungarian Algorithm. Our results show that the proposed scheme provides comparable precision and recall compared to the state-of-the-art Hungarian algorithm and it significantly outpowers machine leaning-based algorithms. For instance, the proposed algorithm provides a precision value of around 0.97 (for a similarity threshold of 0.5) in D1. This means, if our proposed algorithm returns a matched profile pair that has a similarity value above 0.5, the corresponding profiles belong to same individual with a high confidence. At the same time, the complexity of the proposed algorithm scales linearly with the number of user pairs, while the Hungarian algorithm suffers from cubic complexity. Note that precision and recall values obtained from D2 are higher compared to the ones from D1 as we collected more information about users in D2. We also compare the BP-based model generation to the deep neural network based algorithm (DeepLink) [43] in D3. For DeepLink, we use the same settings as in [43]. DeepLink achieves an accuracy of 84% in D3 which is slightly less than the one obtained by the proposed algorithm (90%). DeepLink achieves a precision of 0.84, and a recall of 1 while the BP-based algorithm achieves a precision of 0.93 and a recall of 0.9. DeepLink provides a match for each user even if that user does not have a match. 1

0.8

Precision & Recall

Precision & Recall

0.8

1

Precision-DT Recall-DT Precision-KNN Recall-KNN Precision-RF Recall-RF Precision-SVM Recall-SVM Precision-Hungarian Recall-Hungarian Precision-BP Recall-BP

0.6

0.4

0.2

0.8

Precision-DT Recall-DT Precision-KNN Recall-KNN Precision-RF Recall-RF Precision-SVM Recall-SVM Precision-Hungarian Recall-Hungarian Precision-BP Recall-BP

0.6

0.4

Precision & Recall

1

0.2

0 0.2

0.4

0.6

Similarity/Probability Threshold

(a) D1

0.8

0.6

0.4

0.2

0 0

Precision-DT Recall-DT Precision-KNN Recall-KNN Precision-RF Recall-RF Precision-SVM Recall-SVM Precision-Hungarian Recall-Hungarian Precision-BP Recall-BP

0 0

0.2

0.4

0.6

Similarity/Probability Threshold

(b) D2

0.8

0

0.2

0.4

0.6

0.8

Similarity/Probability Threshold

(c) D3

Fig. 3. Comparison of the proposed BP-based scheme with the Hungarian algorithm and machine learning techniques (decision tree - DT, KNN, random forest - RF, and SVM) in terms of precision and recall for D1, D2, and D3.

Efficient Quantification of Profile Matching Risk in Social Networks

123

We also study the effect of the OSNs’ size to precision and recall of the proposed algorithm. In Fig. 4, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in the auxiliary OSN (OSN A) increases while the number of target users (i.e., users in OSN T ) is fixed. We set the number of users in OSN T as 100 and increase the number of users in OSN A from 100 to 1000 in steps of 100. We observe that the precision/recall values of the proposed algorithm only slightly decrease with the increase of auxiliary OSN’s size, which shows the scalability of our proposed algorithm. We achieve similar results for the other two scenarios: (i) when we fix the number of users in OSN A and vary the number of users in OSN T ; and (ii) when we increase the number of users in both OSNs A and T . Due to the space constraints, we present the details of the results of scenarios (i) and (ii) in Fig. 7 and 8, respectively, in Appendix A.

0.9

0.8

0.9

0.8

Precision Recall

0.7 100

200

300

1

Precision & Recall

1

Precision & Recall

Precision & Recall

1

0.9

0.8

Precision Recall

400

500

600

700

800

Number of users in OSN A

(a) D1

900 1000

0.7 100

200

300

Precision Recall

400

500

600

700

800

Number of users in OSN A

(b) D2

900 1000

0.7 100

200

300

400

500

600

700

800

900 1000

Number of users in OSN A

(c) D3

Fig. 4. The effect of auxiliary OSN’s (OSN A) size to precision/recall when the size of target OSN (OSN T ) is 100 in D1, D2, and D3.

Next, we evaluate the -accuracy of the proposed model generation algorithm (introduced in Sect. 4.4). There are many parameters to consider to analyze the -accuracy of the proposed algorithm, such as the average degree of factor nodes, total similarity of each user in the target OSN with the ones in the auxiliary OSN, and number of users in the target and auxiliary OSNs. Here, we experimentally analyze and show the -accuracy of the proposed algorithm considering such parameters. For evaluation, we use all datasets and pick 500 users from OSN T . For all the studied parameters, we observe that at least 97% of coupled profiles that can be correctly matched by the BP-based algorithm satisfy the sufficient condition (introduced in Sect. 4.4). We observe that  value is inversely proportional to the average degree of the factor nodes. In D1, the -accuracy of the proposed algorithm is  = 67 and  = 84 when the average degrees of the factor nodes are 500 and 22, respectively (we discuss more about the results of this experiment in Sect. 5.5).

124

A. Halimi and E. Ayday

To study the impact of user pairs’ similarity, for each user in OSN T , we compute the variance of the similarity values between that user and all users in OSN A. Then, we compute the accuracy of the proposed BP-based (a) D1 (b) D2 algorithm on users with varying variance values. For evaluation, Fig. 5. The effect of variance threshold to we use D1 and D2. Our results accuracy in D1 and D2. For “high variance”, show that -accuracy of the pro- our goal is to match users in OSN T that have posed algorithm is higher for users a variance (for the similarity values between a with higher variances (as shown user in OSN T and all users in OSN A) greater in Fig. 5). For instance, in D1, we than the variance threshold, while for “low variobserve the -accuracy as  = 42 ance”, we consider users that have a variance and  = 90, when we run the pro- value smaller than the variance threshold. posed algorithm only for the users with low variance (lower than 0.008) and high variance (higher than 0.012), respectively. These results show that the vulnerability of the users in an OSN (for the profile matching attack) can be identified by analyzing particular characteristics of the OSNs. Furthermore, we observe that  value is inversely proportional to the number of users in OSN A (as shown in Fig. 6). The proposed algorithm achieves an -accuracy of  = 82 and  = 72 in D1 when the number of users in A is 100 and 1000, respectively while the number of users in T is 100. This decrease in accuracy can be considered as low considering that the number of possible matches increases 10 times (from 10000 to 100000 user pairs). In D2, we observe a similar trend to D1. In D3, accuracy decreases faster with the increase in number of users in OSN A (compared to D1 and D2). This is because, in D3, we only use the graph connectivity attribute for profile matching. Thus, as the number of users in OSN A increases, the number of users with similar graph connectivity patterns also increases causing the decrease in accuracy. 100

100

80

80

60

60

40

40

20

20

High Variance Low Variance

0 1.5

High Variance Low Variance

0

3.5

5.5

7.5

9.5

11.5

Variance Threshold

13.5

2

↓ 10-3

100

100

100

90

90

90

80

80

80

4

6

8

10

12

Variance Threshold

14

16

↓ 10-3

70 100 200 300 400 500 600 700 800 900 1000

70 100 200 300 400 500 600 700 800 900 1000

70 100 200 300 400 500 600 700 800 900 1000

Number of users in OSN A

Number of users in OSN A

Number of users in OSN A

(a) D1

(b) D2

(c) D3

Fig. 6. The effect of auxiliary OSN’s (OSN A) size to -accuracy in D1, D2, and D3. The size of target OSN (OSN T ) is fixed to 100.

Efficient Quantification of Profile Matching Risk in Social Networks

5.5

125

Complexity Analysis of the BP-Based Algorithm

The complexity of the BP-based Table 1. Evaluation of the proposed BP-based algorithm is linear in the num- algorithm with varying the number of variable ber of variable (or factor nodes). nodes. N denotes the number of users in OSN T , In the proposed BP-based algo- and each value in the complexity column shows rithm, we generate a variable the number of variable nodes (i.e., the number node for all potential matches of users pairs used for profile matching). between the target and the auxDataset Complexity Precision Recall Accuracy N2 0.978 0.944 67.4% iliary OSNs. Assuming N users in √ D1 N N 0.955 0.939 84.6% both target and auxiliary OSNs, N log N 0.975 0.966 96% this results in N 2 variable nodes N2 0.978 0.982 90.4% √ in the graph. To analyze the effect D2 N N 0.977 0.954 94.8% of number of variable nodes on the N log N 0.934 0.946 90.6% N2 0.925 0.976 90.4% performance, we experimentally √ D3 N N 0.92 0.973 91% try to change the graph structure N log N 0.932 0.976 95% and limit the number of variable nodes for each user in the target OSN. We heuristically decrease the average degree of the factor nodes from N to √ N and log N by only leaving the variable nodes (user pairs) with the highest similarity values. For evaluation, we use all datasets (D1, D2, and D3). We pick 500 users from OSNs A and T to construct the test dataset (where there are 500 coupled and 249500 uncoupled profile pairs initially). In Table 1, we show the results with varying number of variable nodes. For instance, in D1, we obtain an accuracy of 67.4% when all the potential matches are considered (i.e., with N variable nodes for each user in T ) and an accuracy of 96% when we use only log N variable nodes for each user. These results are important since they show that while reducing the complexity of the proposed BP-based algorithm, we can further improve its accuracy.

6

Discussion

In this section we discuss how the proposed framework can be utilized for sensitive OSNs, and potential mitigation techniques against the identified profile matching risk. 6.1

Profile Matching on Sensitive OSNs

Note that in D1 and D2, users provide the links to their social networks publicly. It is quite hard to obtain coupled profiles from social networks where users share sensitive information such as PatientsLikeMe. We expect to obtain similar results as long as users share similar attributes across OSNs. Considering that these users are more privacy-cautious, mostly non-obvious attributes such as interest, activity, sentiment similarity, or writing style can be used.

126

A. Halimi and E. Ayday

6.2

Mitigation Techniques

We foresee that the OSN can provide recommendations to the users (about the content they share) to reduce their risk for profile matching attacks. Such recommendation may include (i) generalizing or distorting some shared content of the user (e.g., generalizing the shared location or posting a content at a later time); or (ii) choosing not to share some content (especially for attributes that are hard to generalize or distort, such as interest or sentiment). When generating such recommendations, there are two main objectives: (i) content shared by the user should not increase user’s risk for profile matching and (ii) utility of the content shared by the user (or utility of user’s profile) should not decrease due to the applied countermeasures. Using a utility metric for the user’s profile, the proposed framework (in Sect. 4.3) can be used to formulate an optimization between the utility of the user’s profile and privacy of the user. The solution of this optimization problem can provide recommendations to the user about how to (or whether to) share a new content on their profile.

7

Conclusion

In this work, we have proposed a novel message passing-based framework to model the profile matching risk in online social networks (OSNs). We have shown via simulations that the proposed framework provides comparable accuracy, precision, and recall compared to the state-of-the-art, while it is significantly more efficient in terms of its computational complexity. We have also shown that by controlling the structure of the proposed BP-algorithm we can further decrease the complexity of the algorithm while increasing its accuracy. We believe that the proposed framework will be instrumental for OSNs to educate their users about the consequences of their online sharings. It will also pave the way towards real-time privacy risk quantification in OSNs against profile matching attacks. Acknowledgment. We would like to thank the anonymous reviewers and our shepherd Shujun Li for their constructive feedback which has helped us to improve this paper.

Appendix A

Scalability of the BP-Based Algorithm

We study the effect of the OSNs’ size to precision and recall of the proposed algorithm. In Sect. 5.4, we provided the results when the number of users in OSN T is fixed. Here, we provide the results of the other two scenarios. In Fig. 7, for each dataset, we show the precision/recall values of the BP-based algorithm when the number of users in the target OSN (OSN T ) increases while the number of auxiliary users (i.e., users in OSN A) is fixed. We set the number of users in

Efficient Quantification of Profile Matching Risk in Social Networks

0.9

0.8

0.9

0.8

Precision Recall

0.7 100

200

300

1

Precision & Recall

1

Precision & Recall

Precision & Recall

1

0.9

0.8

Precision Recall

400

500

600

700

800

0.7 100

900 1000

127

200

Number of users in OSN T

300

Precision Recall

400

500

600

700

800

0.7 100

900 1000

200

Number of users in OSN T

(a) D1

300

400

500

600

700

800

900 1000

Number of users in OSN T

(b) D2

(c) D3

Fig. 7. The effect of target OSN’s (OSN T ) size to precision/recall when the size of auxiliary OSN (OSN A) is 1000 in D1, D2, and D3.

0.9

0.8

0.9

0.8

Precision Recall

0.7 100

200

300

1

Precision & Recall

1

Precision & Recall

Precision & Recall

1

0.9

0.8

Precision Recall

400

500

600

700

800

900 1000

Number of users in OSNs A and T

(a) D1

0.7 100

200

300

Precision Recall

400

500

600

700

800

900 1000

0.7 100

Number of users in OSNs A and T

200

300

400

500

600

700

800

900 1000

Number of users in OSNs A and T

(b) D2

(c) D3

Fig. 8. The effect of auxiliary and target OSNs’ (OSN A and T ) size to precision/recall in D1, D2, and D3.

OSN A as 1000 and increase the number of users in OSN T from 100 to 1000 in steps of 100. In Fig. 8, for each dataset, we show the precision/recall values of the BPbased algorithm when the number of users in both OSNs (i.e., OSN A and T ) increases from 100 to 1000 in steps of 100. In both scenarios, we observe that the precision/recall values of the proposed algorithm only slightly decrease with the increase of the number of users in the target OSN, or the increase of the number of users in both OSNs, which shows the scalability of our proposed algorithm. To further check the effect of the auxiliary OSN’s size to precision and recall of the BP-based algorithm, we quantify the precision/recall values obtained by the proposed algorithm for larger scales in D3. We fix the number of users in the target OSN (i.e., OSN T ) to 1000 while the numNumber of users in OSN A ber of users in the auxiliary OSN (i.e., OSN A) increases from 1000 to 8000 Fig. 9. The effect of auxiliary OSN’s (OSN in steps of 1000 (in Fig. 4 the number A) size to precision/recall when the size of of users in OSN T was fixed to 100 target OSN (OSN T ) is 1000 in D3. while the number of users in OSN A Precision & Recall

1

0.9

0.8

Precision Recall

0.7 1000

2000

3000

4000

5000

6000

7000

8000

128

A. Halimi and E. Ayday

was increasing from 100 to 1000). We show the results for D3 in Fig. 9. The precision/recall values slightly decrease with the increase of the number of users in OSN A, confirming the scalability of the proposed algorithm. Note that, in D3, we only use the graph connectivity attribute for profile matching. We expect that the decrease in precision/recall values will be smaller when both the graphical structure and other attributes of the users are used to generate the model.

References 1. 2. 3. 4.

5.

6. 7. 8. 9.

10.

11.

12.

13. 14.

15.

16.

Google maps API (2020). https://developers.google.com/maps/ Natural language toolkit (2020). http://www.nltk.org/ Patienslikeme (2020). https://www.patientslikeme.com/ Amos, B., Ludwiczuk, B., Satyanarayanan, M.: OpenFace: a general-purpose face recognition library with mobile applications. Technical report, CMU-CS-16-118, CMU School of Computer Science (2016) Andreou, A., Goga, O., Loiseau, P.: Identity vs. attribute disclosure risks for users with multiple social profiles. In: Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2017) Ayday, E., Fekri, F.: Iterative trust and reputation management using belief propagation. IEEE Trans. Dependable Secur. Comput. 9(3), 375–386 (2012) Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003) Boyd, D.M., Ellison, N.B.: Social network sites: definition, history, and scholarship. J. Comput.-Mediat. Commun. 13(1), 210–230 (2007) Debnath, S., Ganguly, N., Mitra, P.: Feature weighting in content based recommendation system using social network analysis. In: Proceedings of the International Conference on World Wide Web (2008) Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by gibbs sampling. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (2005) Goga, O., Lei, H., Parthasarathi, S.H.K., Friedland, G., Sommer, R., Teixeira, R.: Exploiting innocuous activity for correlating users across sites. In: Proceedings of the 22nd International Conference on World Wide Web (2013) Goga, O., Loiseau, P., Sommer, R., Teixeira, R., Gummadi, K.P.: On the reliability of profile matching across large online social networks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015) Halimi, A., Ayday, E.: Profile matching across unstructured online social networks: threats and countermeasures. arXiv preprint arXiv:1711.01815 (2017) Iofciu, T., Fankhauser, P., Abel, F., Bischoff, K.: Identifying users across social tagging systems. In: Proceedings of the International AAAI Conference on Web and Social Media (2011) Jain, P., Kumaraguru, P., Joshi, A.: @i seek ‘fb.me’: identifying users across multiple online social network. In: Proceedings of the 22nd International Conference on World Wide Web (2013) Ji, S., Li, W., Gong, N.Z., Mittal, P., Beyah, R.: On your social network deanonymizablity: quantification and large scale evaluation with seed knowledge. In: Proceedings of the Network and Distributed System Security Symposium (2015)

Efficient Quantification of Profile Matching Risk in Social Networks

129

17. Ji, S., Li, W., Mittal, P., Hu, X., Beyah, R.: SecGraph: a uniform and opensource evaluation system for graph data anonymization and de-anonymization. In: Proceedings of the 24th USENIX Security Symposium (2015) 18. Ji, S., Li, W., Srivatsa, M., Beyah, R.: Structural data de-anonymization: quantification, practice, and implications. In: Proceedings of ACM SIGSAC Conference on Computer and Communications Security, pp. 1040–1053. ACM (2014) 19. Korula, N., Lattanzi, S.: An efficient reconciliation algorithm for social networks. Proc. VLDB Endowment 7(5), 377–388 (2014) 20. Kschischang, F.R., Frey, B.J., Loeliger, H.A.: Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47(2), 498–519 (2001) 21. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Q. 2(1–2), 83–97 (1955) 22. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966) 23. Liu, J., Zhang, F., Song, X., Song, Y.I., Lin, C.Y., Hon, H.W.: What’s in the name?: an unsupervised approach to link users across communities. In: Proceedings of ACM International Conference on Web Search and Data Mining (2013) 24. Liu, S., Wang, S., Zhu, F.: Structured learning from heterogeneous behavior for social identity linkage. IEEE Trans. Knowl. Data Eng. 27(7), 2005–2019 (2015) 25. Liu, S., Wang, S., Zhu, F., Zhang, J., Krishnan, R.: Hydra: large-scale social identity linkage via heterogeneous behavior modeling. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (2014) 26. Malhotra, A., Totti, L., Meira Jr., W., Kumaraguru, P., Almeida, V.: Studying user footprints in different online social networks. In: Proceedings of the International Conference on Advances in Social Network Analysis and Mining (2012) 27. Motoyama, M., Varghese, G.: I seek you: searching and matching individuals in social networks. In: Proceedings of the 11th International Workshop on Web Information and Data Management (2009) 28. Narayanan, A., Shi, E., Rubinstein, B.I.P.: Link prediction by de-anonymization: how we won the kaggle social network challenge. In: Proceedings of the International Joint Conference on Neural Networks (2011) 29. Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In: Proceedings of IEEE Symposium on Security and Privacy (2009) 30. Nilizadeh, S., Kapadia, A., Ahn, Y.Y.: Community-enhanced de-anonymization of online social networks. In: Proceedings of ACM Conference on Computer and Communications Security (2014) 31. Nunes, A., Calado, P., Martins, B.: Resolving user identities over social networks through supervised learning and rich similarity features. In: Proceedings of ACM Symposium on Applied Computing (2012) 32. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (1988) 33. Pedarsani, P., Figueiredo, D.R., Grossglauser, M.: A Bayesian method for matching two similar graphs without seeds. In: Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing (2013) 34. Perito, D., Castelluccia, C., Kaafar, M.A., Manils, P.: How unique and traceable are usernames? In: Fischer-H¨ ubner, S., Hopper, N. (eds.) PETS 2011. LNCS, vol. 6794, pp. 1–17. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-64222263-4 1 35. Pishro-Nik, H., Fekri, F.: Performance of low-density parity-check codes with linear minimum distance. IEEE Trans. Inf. Theory 52(1), 292–300 (2005)

130

A. Halimi and E. Ayday

36. Sharad, K., Danezis, G.: An automated social graph de-anonymization technique. In: Proceedings of the 13th ACM Workshop on Privacy in the Electronic Society (2014) 37. Shu, K., Wang, S., Tang, J., Zafarani, R., Liu, H.: User identity linkage across online social networks: a review. ACM SIGKDD Explorations Newsletter 18(2), 5–17 (2017) 38. Vosecky, J., Hong, D., Shen, V.Y.: User identification across multiple social networks. In: Proceedings of the International Conference on Networked Digital Technologies (2009) 39. Wondracek, G., Holz, T., Kirda, E., Kruegel, C.: A practical attack to deanonymize social network users. In: Proceedings of IEEE Symposium on Security and Privacy (2010) 40. Zafarani, R., Liu, H.: Social computing data repository at ASU (2009). http:// socialcomputing.asu.edu 41. Zafarani, R., Liu, H.: Connecting corresponding identities across communities. In: Proceedings of the 3rd International AAAI Conference on Web and Social Media (2009) 42. Zafarani, R., Liu, H.: Connecting users across social media sites: a behavioralmodeling approach. In: Proceedings of ACM SIDKDD Conference on Knowledge Discovery and Data Mining (2013) 43. Zhou, F., Liu, L., Zhang, K., Trajcevski, G., Wu, J., Zhong, T.: DeepLink: a deep learning approach for user identity linkage. In: Proceedings of IEEE International Conference on Computer Communications, pp. 1313–1321. IEEE (2018)

Network Security I

Anonymity Preserving Byzantine Vector Consensus Christian Cachin1 , Daniel Collins2,3(B) , Tyler Crain2 , and Vincent Gramoli2,3 1

University of Bern, Bern, Switzerland [email protected] 2 University of Sydney, Sydney, Australia [email protected], [email protected] 3 EPFL, Lausanne, Switzerland [email protected]

Abstract. Collecting anonymous opinions has applications from whistleblowing to complex voting where participants rank candidates by order of preferences. Unfortunately, as far as we know there is no efficient distributed solution to this problem. Previous solutions either require trusted third parties, are inefficient or sacrifice anonymity. In this paper, we propose a distributed solution called the Anonymised Vector Consensus Protocol (AVCP) that reduces the problem of agreeing on a set of anonymous votes to the binary Byzantine consensus problem. The key idea to preserve the anonymity of voters—despite some of them acting maliciously—is to detect double votes through traceable ring signatures. AVCP is resilient-optimal as it tolerates up to a third of Byzantine participants. We show that our algorithm is correct and that it preserves anonymity with at most a linear communication overhead and constant message overhead when compared to a recent consensus baseline. Finally, we demonstrate empirically that the protocol is practical by deploying it on 100 machines geo-distributed in three continents: America, Asia and Europe. Anonymous decisions are reached within 10 s with a conservative choice of traceable ring signatures. Keywords: Anonymity · Byzantine agreement consensus · Distributed computing

1

· Consensus · Vector

Introduction and Related Work

Consider a distributed survey where a group of mutually distrusting participants wish to exchange their opinions about some issue. For example, participants may wish to communicate over the Internet to rank candidates in order of preference to change the governance of a blockchain. Without making additional trust assumptions [1,18,38], one promising approach is to run a Byzantine consensus algorithm [40], or more generally a vector consensus algorithm [16,23,46] to allow for arbitrary votes. In vector consensus, a set of participants decide on a common vector of values, each value being proposed by one process. Unlike c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 133–152, 2020. https://doi.org/10.1007/978-3-030-58951-6_7

134

C. Cachin et al.

interactive consistency [40], a protocol solving vector consensus can be executed without fully synchronous communication channels, and as such is preferable for use over the Internet. Unfortunately, vector consensus protocols tie each participant’s opinion to its identity to ensure one opinion is not overrepresented in the decision and to avoid double voting. There is thus an inherent difficulty in solving vector consensus while preserving anonymity. In this paper, we introduce the anonymity-preserving vector consensus problem that prevents an adversary from discovering the identity of non-faulty participants that propose values in the consensus and we introduce a solution called Anonymised Vector Consensus Protocol (AVCP). To prevent the leader in some Byzantine consensus algorithms [12] from influencing the outcome of the vote by discarding proposals, AVCP reduces the problem to binary consensus that is executed without the need for a traditional leader [17]. We provide a mechanism to prevent Byzantine processes from double voting while also decoupling their ballots from their identity. In particular, we adopt traceable ring signatures [29,35], which enable participants to anonymously prove their membership in a set of participants, exposing a signer’s identity if and only if they sign two different votes. This can disincentivise participants from proposing multiple votes to the consensus. Alternatively, we could use linkable ring signatures [41], but they cannot ensure that Byzantine processes are held accountable when double-signing. We could also have used blind signatures [11, 13], but this would have required an additional trusted authority. We also identify interesting conditions to ensure anonymous communication. Importantly, participants must propagate their signatures anonymously to hide their identity. To this end, we could construct anonymous channels directly. However, these protocols either require additional trusted parties, are not robust or incur O(n) message delays to terminate [14,32,33,37]. Thus, we assume access to anonymous channels, such as through publicly-deployed [22,56] networks which often require sustained network observation or large amounts of resources to deanonymise with high probability [31,44]. Anonymity is ensured then as processes do not reveal their identity via ring signatures and communicate over anonymous channels. If correlation-based attacks on certain anonymous networks are deemed viable [42] and for efficiency’s sake, processes can anonymously broadcast their proposal and then continue the protocol execution over regular channels. However, anonymous channels alone cannot ensure anonymity when combined with regular channels as an adversary may infer identities through latency [2] and message transmission ordering [48]. For example, an adversary may relate late message arrivals from a single process over both channels to guess the identity of a slow participant. This is why, to ensure anonymity, the timing and order of messages exchanged over anonymous channels should be statistically independent (or computationally indistinguishable with a computational adversary) from that of those exchanged over regular channels. In practice, one may attempt to ensure that there is a low correlation by ensuring that message delays over anonymous and regular channels are each sufficiently randomised.

Anonymity Preserving Byzantine Vector Consensus

135

We construct our solution iteratively first by defining the anonymitypreserving all-to-all reliable broadcast problem that may be of independent interest. Here, anonymity and comparable properties to reliable broadcast [7] with n broadcasters are ensured. By constructing a solution to this problem with Bracha’s classical reliable broadcast [5], AVCP can terminate after three regular and one anonymous message delay. With this approach, our experimental results are promising—with 100 geo-distributed nodes, AVCP generally terminates in less than ten seconds. We remark that, to ensure confidentiality of proposals until after termination, threshold encryption [20,52], in which a minimum set of participants must cooperate to decrypt any message, can be used at the cost of an additional message delay. Related constructions. We consider techniques without additional trusted parties. Homomorphic tallying [18] encodes opinions as values that are summed in encrypted form prior to decryption. Such schemes that rely on a public bulletin board for posting votes [34] could use Byzantine consensus [40] and make timing assumptions on vote submission to perform an election. Unfortunately, homomorphic tallying is impractical when the pool of candidates is large and impossible when arbitrary. Nevertheless, using a multiplicative homomorphic scheme [25] requires exponential work in the amount of candidates at tallying time, and additive homomorphic encryption like Paillier’s scheme [47] incurs large (RSA-size) parameters and costly operations. Fully homomorphic encryption [30] is suitable in theory, but at present is untenable for computing complex circuits efficiently. Self-tallying schemes [34], which use homomorphic tallying, are not appropriate as at some point all participants must be correct, which is untenable with arbitrary (Byzantine) behaviour. Constructions involving mixnets [15] allow for arbitrary ballot structure. However, decryption mix-nets are not a priori robust to a single Byzantine or crash failure [15], and re-encryption mix-nets which use proofs of shuffle are generally slow in tallying [1]. With arbitrary encryptions, larger elections can take minutes to hours [39] to tally. Even with precomputation, O(t) processes still need to perform proofs of shuffle [45] in sequence, which incurs untenable latency even without process faults. With faults and/or asynchrony, we could run O(t) consensus instances in sequence but at an additional cost. DC-nets and subsequent variations [14,33] are not sufficiently robust and generally require O(n) message delays for an all-to-all broadcast. On multi-party computation, general techniques like Yao’s garbledcircuits [55] incur untenable overhead given enough complexity in the structure of ballots. Private-set intersection [27] can be efficient for elections that require unanimous agreement, but do not generalise arbitrarily. Roadmap. The paper is structured as follows. Section 2 provides preliminary definitions and specifies the model. Section 3 presents protocols and definitions required for our consensus protocol. Section 4 presents our anonymitypreserving vector consensus solution. Section 5 benchmarks AVCP on up to 100 geo-distributed located on three continents. To conclude, Sect. 6 discusses the use of anonymous and regular channels in practice and future work.

136

2

C. Cachin et al.

Model

We assume the existence of a set of processes P = {p1 , . . . , pn } (where |P | = n, and the ith process is pi ), an adversary A who corrupts t < n3 processes in P , and a trusted dealer D who generates the initial state of each process. For simplicity of exposition, we assume that cryptographic primitives are unbreakable. With concrete primitives that provide computational guarantees, each party could be modelled as being able to execute a number of instructions each message step bounded by a polynomial in a security parameter k [10]. In this case, the transformation of our proofs presented in the companion technical report [8] is straight forward given the hardness assumptions required by the underlying cryptographic schemes. 2.1

Network

We assume that P consists of asynchronous, sequential processes that communicate over reliable, point-to-point channels in an asynchronous network. An asynchronous process is one that executes instructions at its own pace. An asynchronous network is one where message delays are unbounded. A reliable network is such that any message sent by a correct process will eventually be delivered by the intended recipient. We assume that processes can also communicate using reliable one-way anonymous channels, which we soon describe. Each process is equipped with the primitive “send M to pj ”, which sends the message M (possibly a tuple) to process pj ∈ P . For simplicity, we assume that a process can send a message to itself. A process receives a message M by invoking the primitive “receive M ”. Each process may invoke “broadcast M ”, which is short-hand for “for each pi ∈ P do send M to pi end for”. Analogously, processes may invoke “anon send M to pj ” and “anon broadcast M ” over anonymous channels. Since reaching consensus deterministically is impossible in failure-prone asynchronous message-passing systems [26], we assume that partial synchrony holds among processes in P in Sect. 4. That is, we assume there exists a point in time in protocol execution, the global stabilisation time (GST), unknown to all processes, after which the speed of each process and all message transfer delays are upper bounded by a finite constant [24]. 2.2

Adversary

We assume that the adversary A schedules message delivery over the regular channels, restricted to the assumptions of our model (e.g., the reliability of the channels). For each send call made, A determines when the corresponding receive call is invoked. A portion of processes—exactly t < n3 members of P —are initially corrupted by A and may exhibit Byzantine faults [40] over the lifetime of a protocol’s execution. That is, they may deviate from the predefined protocol in an arbitrary way. We assume A can see all computations and messages sent and received by corrupted processes. A non-faulty process is one that is not corrupted

Anonymity Preserving Byzantine Vector Consensus

137

by A and therefore follows the prescribed protocol. A can only observe anon send and anon receive calls made by corrupted processes. A cannot see the (local) computations that non-faulty processes perform. We do not restrict the amount or speed of computation that A can perform. 2.3

Anonymity Assumption

Consider the following experiment. Suppose that pi ∈ P is non-faulty and invokes “anon send m to pj ”, where pj ∈ P is possibly corrupted, and pj invokes anon receive with respect to m. No process can directly invoke send or invoke receive in response to a send call at any time. pj is allowed to use anon send to message corrupted processes if it is corrupted, and can invoke anon recv with respect to messages sent by corrupted processes. Each process is unable to make oracle calls (described below), but is allowed to perform an arbitrary number of local computations. pj then outputs a single guess, g ∈ {1, . . . , n} as to the 1 . identity of pi . Then for any run of the experiment, P r(i = g) ≤ n−t As A can corrupt t processes, the anonymity set [19], i.e., the set of processes pi is indistinguishable from, comprises n − t non-faulty processes. Our definition captures the anonymity of the anonymous channels, but does not consider the effects of regular message passing and timing on anonymity. As such, we can use techniques to establish anonymous channels in practice with varying levels of anonymity with these factors considered. 2.4

Traceable Ring Signatures

Informally, a ring signature [29,50] proves that a signer has knowledge of a private key corresponding to one of many public keys of their choice without revealing the corresponding public key. Hereafter, we consider traceable ring signatures (or TRSs), which are ring signatures that expose the identity of a signer who signs two different messages. To increase flexibility, we can consider traceability with respect to a particular string called an issue [54], allowing signers to maintain anonymity if they sign multiple times, provided they do so each time with respect to a different issue. We now present relevant definitions of the ring signatures which are analogous to those of Fujisaki and Suzuki [29]. Let ID ∈ {0, 1}∗ , which we denote as a tag. We assume that all processes may query an idealised distributed oracle, which implements the following four operations: 1. σ ← Sign(i, ID, m), which takes the integer i ∈ {1, . . . , n}, tag ID ∈ {0, 1}∗ and message m ∈ {0, 1}∗ , and outputs the signature σ ∈ {0, 1}∗ . We restrict Sign such that only process pi ∈ P may invoke Sign with first argument i. 2. b ← VerifySig(ID, m, σ), which takes the tag ID, message m ∈ {0, 1}∗ , and signature σ ∈ {0, 1}∗ , and outputs a bit b ∈ {0, 1}. All parties may query VerifySig. 3. out ← Trace(ID, m, σ, m , σ  ), which takes the tag ID ∈ {0, 1}∗ , messages m, m ∈ {0, 1}∗ and signatures σ, σ  ∈ {0, 1}∗ , and outputs out ∈ {0, 1}∗ ∪

138

C. Cachin et al.

{1, . . . , n} (possibly corresponding to a process pi ). All parties may query Trace. 4. x ← FindIndex(ID, m, σ) takes a tag ID ∈ {0, 1}∗ , a message m ∈ {0, 1}∗ , and a signature σ ∈ {0, 1}∗ , and outputs a value x ∈ {1, . . . , n}. FindIndex may not be called by any party, and exists only for protocol definitions. The distributed oracle satisfies the following relations: – VerifySig(ID, m, σ) = 1 ⇐⇒ ∃ pi ∈ P which invoked σ ← Sign(i, ID, m). – Trace is as below ⇐⇒ σ ← Sign(i, ID, m) and σ  ← Sign(i , ID, m ) where: ⎧ ⎪ if i = i , ⎨“indep” Trace(ID, m, σ, m , σ  ) = “linked” else if m = m , ⎪ ⎩ i otherwise (i = i ∧ m = m ). – If adversary D is given an arbitrary set of signatures S and must identify the 1 signer pi of a signature σ ∈ S by guessing i, P r(i = g) ≤ n−t for any D. The concrete scheme proposed by Fujisaki and Suzuki [29] computationally satisfies these properties in the random oracle model [3] provided the Decisional Diffie-Hellman problem is intractable. In our protocols, where the ring comprises the n processes of P , the resulting signatures are of size O(kn), where k is the security parameter. To simplify the presentation, we assume that its properties hold unconditionally in the following.

3 3.1

Communication Primitives Traceable Broadcast

Suppose a process p wishes to anonymously send a message to a given set of processes P . By invoking anonymous communication channels, p can achieve this, but processes in P \ {p} are unable to verify that p resides in P , and so cannot meaningfully participate in the protocol execution. By using (traceable) ring signatures, p can verify its membership in P over anonymous channels without revealing its identity. To this end, we outline a simple mechanism to replace the invocation of send and receive primitives (over regular channels) with calls to ring signature and anonymous messaging primitives (namely anon send and anon receive). Let (ID, TAG, label, m) be a tuple for which pi ∈ P has invoked send with respect to. Instead of invoking send, pi first calls σ ← Sign(i, T, m), where T is a tag uniquely defined by TAG and label. Then, pi invokes anon send M, where M = (ID, TAG, label, m, σ). Upon an anon receive M call by pj ∈ P , pj verifies that VerifySig(T, m, σ) = 1 and that they have not previously received (m , σ  ) such that Trace(T, m, σ, m , σ  ) = “indep”. Given this check passes, pj invokes receive(ID, TAG, label, m). In our protocols, processes always broadcast messages, so the transformation is used with respect to broadcast calls.

Anonymity Preserving Byzantine Vector Consensus

139

By the properties of the anonymous channels and the signatures, it follows that anonymity as defined in the previous section holds with additional adversarial access to the distributed oracles. We present the proof in the companion technical report [8]. Hereafter, we assume that calls to primitives send and receive are handled by the procedure presented in this subsection unless explicitly stated otherwise. 3.2

Binary Consensus

Broadly, the binary consensus problem involves a set of processes reaching an agreement on a binary value b ∈ {0, 1}. We first recall the definitions that define the binary Byzantine consensus (BBC) problem as stated in [17]. In the following, we assume that every non-faulty process proposes a value, and remark that only the values in the set {0, 1} can be decided by a non-faulty process. 1. BBC-Termination: Every non-faulty process eventually decides on a value. 2. BBC-Agreement: No two non-faulty processes decide on different values. 3. BBC-Validity: If all non-faulty processes propose the same value, no other value can be decided. For simplicity, we present the safe, non-terminating variant of the binary consensus routine from [17] in Algorithm 1. As assumed in the model (Sect. 2), the terminating variant relies on partial synchrony between processes in P . The protocols execute in asynchronous rounds. State. A process keeps track of a binary value est ∈ {0, 1}, corresponding to a process’ current estimate of the decided value, arrays bin values[1..], in which each element is a set S ⊆ {0, 1}, a round number r (initialised to 0), an auxiliary binary value b, and lists of (binary) values valuesr , r = 1, 2, . . . , each of which are initially empty. Messages. Messages of the form (EST , r, b) and (AUX , r, b), where r ≥ 1 and b ∈ {0, 1}, are sent and processed by non-faulty processes. Note that we have omitted the dependency on a label label and identifier ID for simplicity of exposition. BV-Broadcast. To exchange EST messages, the protocol relies on an all-toall communication abstraction, BV-broadcast [43], which is presented in Algorithm 1. When a process adds a value v ∈ {0, 1} to its array bin values[r] for some r ≥ 1, we say that v was BV-delivered. Functions. Let b ∈ {0, 1}. In addition to BV-broadcast and the communication primitives in our model, a process can invoke bin propose(b) to begin executing an instance of binary consensus with input b, and decide(b) to decide the value b. In a given instance of binary consensus, these two functions may be called exactly once. In addition, the function list.append(v) appends the value v to the list list.

140

C. Cachin et al.

Algorithm 1. Safe binary consensus routine 1: bin propose(v): 2: est ← v; r ← 0; 3: repeat: 4: r ← r + 1; 5: BV-broadcast(EST, r, est) 6: wait until (bin values[r] = ∅) 7: broadcast (AUX, r, bin values[r]) 8: wait until (|valuesr | ≥ n − t) ∧ (val ∈ bin values[r] for all val ∈ valuesr ) 9: b ← r (mod 2) 10: if val = w for all val ∈ valuesr where w ∈ {0, 1} then 11: est ← w; 12: if w = b then 13: decide(v) if not yet invoked decide() 14: else 15: est ← b 16: upon intial receipt of (AUX , r, b) for some b ∈ {0, 1} from process pj 17: valuesr .append(b) 18: BV-broadcast(EST, r, vi ): 19: broadcast (EST, r, vi ) 20: upon receipt of (EST, r, v) 21: if (EST, r, v) received from (t + 1) processes and not yet broadcast then 22: broadcast (EST, r, v) 23: if (EST, r, v) received from (2t + 1) processes then 24: bin values[r] ← bin values[r] ∪ {v}

To summarise Algorithm 1, the BV-broadcast component ensures that only a value proposed by a correct process may be decided, and the auxiliary broadcast component ensures that enough processes have received a potentially decidable value to ensure agreement. The interested reader can verify the correctness of the protocol and read a thorough description of how it operates in [17], where the details of the corresponding terminating protocol also reside. 3.3

Anonymity-Preserving All-to-All Reliable Broadcast

To reach eventual agreement in the presence of Byzantine processes without revealing who proposes what, we introduce the anonymity-preserving all-to-all reliable broadcast problem that preserves the anonymity of each honest sender which is reliably broadcasting. In this primitive, all processes are assumed to (anonymously) broadcast a message, and all processes deliver messages over time. It ensures that all honest processes always receive the same message from one (possibly faulty) sender while hiding the identity of any non-faulty sender. Let ID ∈ {0, 1}∗ be an identifier, a string that uniquely identifies an instance of anonymity-preserving all-to-all reliable broadcast, hereafter referred

Anonymity Preserving Byzantine Vector Consensus

141

to as AARB-broadcast. Let m be a message, and σ be the output of the call Sign(i, T, m) for some i ∈ {1, . . . , n}, where T = f (ID, label) for some function f as in traceable broadcast. Each process is equipped with two operations, “AARBP” and “AARB-deliver”. AARBP[ID](m) is invoked once with respect to ID and any message m, denoting the beginning of a process’ execution of AARBP with respect to ID. AARB-deliver[ID](m, σ) is invoked between n−t and n times over the protocol’s execution. When a process invokes AARB-deliver[ID](m, σ), they are said to “AARB-deliver” (m, σ) with respect to ID. Then, given t < n3 , we define a protocol that implements AARB-broadcast with respect to ID as satisfying the following six properties: 1. AARB-Signing: If a non-faulty process pi AARB-delivers a message with respect to ID, then it must be of the form (m, σ), where a process pi ∈ P invoked Sign(i, T, m) and obtained σ as output. 2. AARB-Validity: Suppose that a non-faulty process AARB-delivers (m, σ) with respect to ID. Let i = FindIndex(T, m, σ) denote the output of an idealised call to FindIndex. Then if pi is non-faulty, pi must have anonymously broadcast (m, σ). 3. AARB-Unicity: Consider any point in time in which a non-faulty process p has AARB-delivered more than one tuple with respect to ID. Let delivered = {(m1 , σ1 ), . . . , (ml , σl )}, where |delivered| = l, denote the set of these tuples. For each i ∈ {1, . . . , l}, let outi = FindIndex(T, mi , σi ) denote the output of an idealised call to FindIndex. Then for all distinct pairs of tuples {(mi , σi ), (mj , σj )}, outi = outj . 4. AARB-Termination-1: If a process pi is non-faulty and invokes AARBP[ID](m), all the non-faulty processes eventually AARB-deliver (m, σ) with respect to ID, where σ is the output of the call Sign(i, T, m). 5. AARB-Termination-2: If a non-faulty process AARB-delivers (m, σ) with respect to ID, then all the non-faulty processes eventually AARB-deliver (m, σ) with respect to ID. Firstly, we require AARB-Signing to ensure that the other properties are meaningful. Since messages are anonymously broadcast, properties refer to the index of the signing process determined by an idealised call to FindIndex. In spirit, AARB-Validity ensures if a non-faulty process AARB-delivers a message that was signed by a non-faulty process pi , then pi must have invoked AARBP. Similarly, AARB-Unicity ensures that a non-faulty process will AARB-deliver at most one message signed by each process. We note that AARB-Termination-1 is insufficient for consensus: without AARB-Termination-2, different processes may AARB-deliver different messages produced by the same process if it is faulty, as in the two-step algorithm implementing no-duplicity broadcast [6,49]. Finally, we state the anonymity property: 6. AARB-Anonymity: Suppose that non-faulty process pi invokes AARBP [ID](m) for some m and a given ID, and has previously invoked an arbitrary number of AARBP[IDj ](mj ) calls where ID = IDj for all such j. Suppose that an adversary A is required to output a guess g ∈ {1, . . . , n}, corresponding

142

C. Cachin et al.

to the identity of pi after performing an arbitrary number of computations, allowed oracle calls and invocations of networking primitives. Then for any A 1 . and run, P r(i = g) ≤ n−t Informally, AARB-Anonymity guarantees that the source of an anonymously broadcast message by a non-faulty process is unknown to the adversary, in that it is indistinguishable from n − t (non-faulty) processes. AARBP can be implemented by composing n instances of Bracha’s reliable broadcast algorithm , which we describe and prove correct in the companion technical report [8].

4

Anonymity-Preserving Vector Consensus

In this section, we introduce the anonymity-preserving vector consensus problem and present and discuss the protocol Anonymised Vector Consensus Protocol (AVCP) that solves it. The anonymity-preserving vector consensus problem brings anonymity to the vector consensus problem [23] where non-faulty processes reach an agreement upon a vector containing at least n − t proposed values. More precisely, anonymised vector consensus asserts that the identity of a process who proposes must be indistinguishable from that of all non-faulty processes. As in AARBP, instances of AVCP are identified uniquely by a given value ID. Each process is equipped with two operations. Firstly, “AVCP[ID](m)” begins execution of an instance of AVCP with respect to ID and proposal m. Secondly, “AVC-decide[ID](V )” signals the output of V from the instance of AVCP identified by ID, and is invoked exactly once per identifier. We define a protocol that solves anonymity-preserving vector consensus with respect to these operations as satisfying the following four properties. Firstly, we require an anonymity property, defined analogously to that of AARB-broadcast: 1. AVC-Anonymity: Suppose that non-faulty process pi invokes AVCP[ID](m) for some m and a given ID, and has previously invoked an arbitrary number of AVCP[IDj ](mj ) calls where ID = IDj for all such j. Suppose that an adversary A is required to output a guess g ∈ {1, . . . , n}, corresponding to the identity of pi after performing an arbitrary number of computations, allowed oracle calls and invocations of networking primitives. Then for any A 1 . and run, P r(i = g) ≤ n−t It also requires the original agreement and termination properties of vector consensus: 2. AVC-Agreement: All non-faulty processes that invoke AVC-decide[ID](V ) do so with respect to the same vector V for a given ID. 3. AVC-Termination: Every non-faulty process eventually invokes AVC-decide[ID](V ) for some vector V and a given ID. It also requires a validity property that depends on a pre-determined, deterministic validity predicate valid() [9,17] which we assume is common to all processes. We assume that all non-faulty processes propose a value that satisfies valid().

Anonymity Preserving Byzantine Vector Consensus

143

4. AVC-Validity: Consider each non-faulty process that invokes AVC-decide[ID](V ) for some V and a given ID. Each value v ∈ V must satisfy valid(), and |V | ≥ n − t. Further, at least |V | − t values correspond to the proposals of distinct non-faulty processes. 4.1

AVCP, the Anonymised Vector Consensus Protocol

We present a reduction to binary consensus which may converge in four message steps, as in the reduction to binary consensus of Democratic Byzantine Fault Tolerance (DBFT) [17], at least one of which must be performed over anonymous channels. We present the proof of correctness in the companion technical report [8]. We note that comparable problems [21], including agreement on a core set over an asynchronous network [4], rely on such a reduction but with a probabilistic solution. As in DBFT, we solve consensus deterministically by reliably broadcasting proposals which are then decided upon using n instances of binary consensus. The protocol is divided into two components. Firstly, the reduction component (Algorithm 2) reduces anonymity-preserving vector consensus to binary consensus. Here, one instance of AARBP and n instances of binary consensus are executed. But, since proposals are made anonymously, processes cannot associate proposals with binary consensus instances a priori. Consequently, processes start with n unlabelled binary consensus instances, and label them over time with the hash digest of proposals they deliver (of the form h ∈ {0, 1}∗ ). To cope with messages sent and received in unlabelled binary consensus instances, we require a handler component (Algorithm 3) that replaces function calls made in binary consensus instances. Functions. In addition to the communication primitives detailed in Sect. 2 and the two primitives “AVCP” and “AVC-decide”, the following primitives may be called: – “inst.bin propose(v)”, where inst is an instance of binary consensus and v ∈ {0, 1}, which begins execution of inst with initial value v. – “AARBP” and “AARB-deliver”, as in Sect. 3. – “valid()” as described above. – “m.keys()” (resp. “m.values()”), which returns the keys (resp. values) of a map m. – “item.key()”, which returns the key of item in a map m which is determined by context. – “s.pop()”, which removes and returns a value from set s. – “H(u, v)”, a cryptographic hash function which maps elements u, v ∈ {0, 1}∗ to h ∈ {0, 1}∗ . State. Each process tracks the following variables: – ID ∈ {0, 1}∗ , a common identifier for a given instance of AVCP.

144

C. Cachin et al.

– proposals[], which maps labels of the form l ∈ {0, 1}∗ to AARB-delivered messages of the form (m, σ) ∈ ({0, 1}∗ , {0, 1}∗ ) that may be decided, and is initially empty. – decision count, tracking the number of binary consensus instances for which a decision has been reached, initialised to 0. – decided ones, the set of proposals for which 1 was decided in the corresponding binary consensus instance, initialised to ∅. – labelled [], which maps labels, which are the hash digest h ∈ {0, 1}∗ of AARBdelivered proposals, to binary consensus instances, and is initially empty. – unlabelled , a set of binary consensus instances with no current label, which is initially of cardinality n. – ones[][], which maps two keys, EST and AUX , to maps with integer keys r ≥ 1 which map to a set of labels, all of which are initially empty. – counts[][], which maps two keys, EST and AUX , to maps with integer keys r ≥ 1 which map to an integer n ∈ {0, . . . , n}, all of which are initialised to 0. Messages. In addition to messages propagated in AARBP, non-faulty processes process messages of the form (ID, TAG, r, label , b), where TAG ∈ {EST , AUX }, r ≥ 1, label ∈ {0, 1}∗ and b ∈ {0, 1}. A process buffers a message (ID, TAG, r, label, b) until label labels an instance of binary consensus inst, at which point the message is considered receipt in inst. The handler, described below, ensures that all messages sent by non-faulty processes eventually correspond to a label in their set of labelled consensus instances (i.e., contained in labelled .keys()). Similarly, a non-faulty process can only broadcast such a message after labelling the corresponding instance of binary consensus. Processes also process messages of the form (ID, TAG, r, ones), where TAG ∈ {EST ONES , AUX ONES }, r ≥ 1, and ones is a set of strings corresponding to binary consensus instance labels. Reduction. In the reduction, presented in Algorithm 2, n (initially unlabelled) instances of binary consensus are used, each corresponding to a value that one process in P may propose. Each (non-faulty) process invokes AARBP with respect to ID and their value m (line 2), anonymously broadcasting (m , σ  ) inside the AARBP instance. On AARB-delivery of some message (m, σ), an unlabelled instance of binary consensus is deposited into labelled , whose key (label) is set to H(m, σ) (line 10). Proposals that fulfil valid() are stored in proposals (line 12), and inst.bin propose(1) is invoked with respect to the newly labelled instance inst = labelled [H(m, σ)] if not yet done (line 13). Upon termination of each instance (line 14), provided 1 was decided, the corresponding proposal is added to decided ones (line 16). For either decision value, decision count is incremented (line 16). Once 1 has been decided in n − t instances of binary consensus, processes will propose 0 in all instances that they have not yet proposed in (line 17). Note that upon AARB-delivery of valid messages after this point, bin propose(1) is not invoked at line 13. Upon the termination of all n instances

Anonymity Preserving Byzantine Vector Consensus

145

Algorithm 2. AVCP (1 of 2): Reduction to binary consensus 1: AVCP[ID](m ):  anonymised reliable broadcast of proposal 2: AARBP[ID](m ) 3: wait until |decided ones| ≥ n − t  wait until n − t instances terminate with 1 4: for each inst ∈ unlabelled ∪ labelled .values() such that 5: inst.bin propose() not yet invoked do  propose 0 in all binary consensus not yet invoked 6: Invoke inst.bin propose(0) 7: wait until decision count = n  wait until all n instances of binary consensus terminate 8: AVC-decide[ID](decided ones) 9: upon invocation of AARB-deliver[ID](m, σ) 10: labelled [H(m, σ)] ← unlabelled .pop()  deterministic, common validity function 11: if valid(m, σ) then 12: proposals[H(m, σ)] ← (m, σ) 13: Invoke labelled [H(m, σ)].bin propose(1) if not yet invoked 14: upon inst deciding a value v ∈ {0, 1}, where inst ∈ labelled .values() ∪ unlabelled 15: if v = 1 then  store proposals for which 1 was decided in the corresponding binary consensus 16: decided ones ← decided ones ∪ {proposals[inst.key()]} 17: decision count ← decision count + 1

of binary consensus (after line 7), all non-faulty processes decide their set of values for which 1 was decided in the corresponding instance of binary consensus (line 8). Handler. As proposals are anonymously broadcast, binary consensus instances cannot be associated with process identifiers a priori, and so are labelled by AARB-delivered messages. Thus, we require the handler, which overrides two of the three broadcast calls in the non-terminating variant of the binary consensus of [17] (Algorithm 1). We now describe the handler (Algorithm 3). Let inst be an instance of binary consensus. On calling inst.bin propose(b) (b ∈ {0, 1}) (and at the beginning of each round r ≥ 1), processes invoke BV-broadcast (line 5 of Algorithm 1), immediately calling “broadcast (ID, EST , r, label , b)” (line 19 of Algorithm 1). If b = 1, (ID, EST , r, label , 1) is broadcast, and label is added to the set ones[EST ][r] (line 21). Note that, given AARB-Termination-1, all messages sent by non-faulty processes of the form (ID, EST , r, label , 1) will be deposited in an instance inst labelled by label . Then, as the binary consensus routine terminates when all non-faulty processes propose the same value, all processes will decide the value 1 in n − t instances of binary consensus (i.e., will pass line 3), after which they execute bin propose(0) in the remaining instances of binary consensus. Since these instances may not be labelled when a process wishes to broadcast a value of the form (ID, EST , r, label, 0), we defer their broadcast until “broadcast(ID, EST , r, label, b)” is called in all n instances of binary consensus. At this point (line 23), (ID, EST ONES , r, ones[EST ][r]) is broadcast (line 24).

146

C. Cachin et al.

Algorithm 3. AVCP (2 of 2): Handler of Algorithm 1 18: upon “broadcast (ID,EST, r, label , b)” in inst ∈ labelled .values() ∪ unlabelled 19: if b = 1 then 20: broadcast (ID,EST, r, label , b) 21: ones[EST][r] ← ones[EST][r] ∪ {inst.key()} 22: counts[EST][r] ← counts[EST][r] + 1 23: if counts[EST][r] = n ∧ |ones[EST][r]| < n then 24: broadcast (ID,EST ONES, r, ones[EST][r]) 25: upon “broadcast (ID,AUX, r, label , b)” in inst ∈ labelled .values() ∪ unlabelled 26: if b = 1 then 27: broadcast (ID,AUX, r, label , b) 28: ones[AUX][r] ← ones[AUX][r] ∪ {inst.key()} 29: counts[AUX][r] ← counts[AUX][r] + 1 30: if counts[AUX][r] = n ∧ |ones[AUX][r]| < n then 31: broadcast (ID,AUX ONES, r, ones[AUX][r]) 32: upon receipt of (ID,TAG, r, ones) s.t. TAG ∈ {EST ONES, AUX ONES} 33: wait until one ∈ labelled .keys() ∀one ∈ ones 34: if TAG = EST ONES then 35: TEMP ← EST 36: else TEMP ← AUX 37: for each l ∈ labelled .keys() such that l ∈ / ones do 38: deliver (ID,TEMP, r, l, 0) in labelled [l] 39: for each inst ∈ unlabelled do 40: deliver (ID,TEMP, r, ⊥, 0) in inst

A message of the form (ID, EST ONES , r, ones) is interpreted as the receipt of zeros in all instances not labelled by elements in ones (at lines 38 and 40). This can only be done once all elements of ones label instances of binary consensus (i.e., after line 33). Note that if |ones[EST ][r] = n|, then there are no zeroes to be processed by receiving processes, and so the broadcast at line 24 can be skipped. Handling “broadcast(ID, AUX , r, label , b)” calls (line 7 of Algorithm 1) is identical to the handling of initial “ broadcast(ID, EST , r, label , b)” calls. Note that the third broadcast in the original algorithm, where (ID, EST , r, label , b) is broadcast upon receipt from t + 1 processes if not yet done before (line 21 of Algorithm 1 (BV-Broadcast)), can only occur once the corresponding instance of binary consensus is labelled. Thus, it does not need to be handled. From here, we can see that messages in the handler are processed as if n instances of the original binary consensus algorithm were executing. 4.2

Complexity and Optimizations

Let k be a security parameter, S the size of a signature and c the size of a message digest. In Table 1, we compare the message and communication complexities of AVCP against DBFT [17], which, as written, can be easily altered to solve

Anonymity Preserving Byzantine Vector Consensus

147

Table 1. Comparing the complexity of AVCP and DBFT [17] after GST [24] Complexity Best-case message complexity

AVCP

DBFT

3

O(n3 )

O(n ) 3

O(tn3 )

Worst-case message complexity O(tn ) Best-case complexity Worst-case bit complexity

O((S + c)n3 )

O(n3 )

3

O((S + c)tn ) O(tn3 )

vector consensus. We assume that AVCP is invoking the terminating variant of the binary consensus of [17]. When considering complexity, we only count messages in the binary consensus routines once the global stabilisation time (GST) has been reached [24]. Both best-free and worst-case message complexity are identical between the two protocols. We remark that there exist runs of AVCP where processes are faulty which has the best-case message complexity O(n3 ), such as when a process has crashed. AVCP mainly incurs greater communication complexity proportional to the size of the signatures, which can vary from size O(k) [35,53] to O(kn) [28]. If processes make a single anonymous broadcast per run, the best-case and worst-case bit complexities of AVCP are lowered to O(Sn2 + cn3 ) and O(Sn2 + ctn3 ). As is done in DBFT [17], we can combine the anonymity-preserving all-toall reliable broadcast of a message m and the proposal of the binary value 1 in the first round of a binary consensus instance. To this end, a process may skip the BV-broadcast step in round 1, which may allow AVCP to converge in four message steps, at least one of which must be anonymous. It may be useful to invoke “broadcastTAG[r](b)”, where TAG ∈ {EST , AUX } (lines 20 and 27) when the instance of binary consensus is labelled, rather than simply when b = 1 (i.e., the condition preceding these calls). Since it may take some time for all n instances of binary consensus to synchronise, doing this may speed up convergence in the “faster” instances.

5

Experiments

In order to evaluate the practicality of our solutions, we implemented our distributed protocols and deployed them on Amazon EC2 instances. We refer to each EC2 instance used as a node, corresponding to a ‘process’ as in the protocol descriptions. For each value of n (the number of nodes) chosen, we ran experiments with an equal number of nodes from four regions: Oregon (us-west-2), Ohio (us-east-2), Singapore (ap-southeast-1) and Frankfurt (eu-central-1). The type of instance chosen was c4.xlarge, which provide 7.5 GiB of memory, and 4 vCPUs, i.e., 4 cores of an Intel Xeon E5-2666 v3 processor. We performed between 50 and 60 iterations for each value of n and t we benchmarked. We varied n incrementally, and varied t both with respect to the maximum faulttolerance (i.e., t =  n−1 3 ), and also fixed t = 6 for values of n = 20, 40, . . . All networking code, and the application logic, was written in Python (2.7). As

148

C. Cachin et al.

we have implemented our cryptosystems in golang, we call our libraries from Python using ctypes1 . To simulate reliable channels, nodes communicate over TCP. Nodes begin timing once all connections have been established (i.e., after handshaking). Our protocol, Anonymised Vector Consensus Protocol (AVCP), was implemented on top of the existing DBFT [17] codebase, as was the case with our implementation of AARB-broadcast, i.e., AARBP. We do not use the fast-path optimisation described in Sect. 4, but we hash messages during reliable broadcast to reduce bandwidth consumption. We use the most conservative choice of ring signatures, O(kn)-sized traceable ring signatures [29], which require O(n) operations for each signing and verification call, and O(n2 ) work for tracing overall. Each process makes use of a single anonymous broadcast in each run of the algorithm. To simulate the increased latency afforded by using publicly-deployed anonymous networks, processes invoke a local timeout for 750 ms before invoking anon broadcast, which is a regular broadcast in our experiments.

(a) Comparing AARBP and AVCP

(b) Comparing DBFT and AVCP

Fig. 1. Evaluating the cost of the reliable anonymous broadcast (AARBP) in our solution (AVCP) and the performance of our solution (AVCP) compared to an efficient Byzantine consensus baseline (DBFT) without anonymity preservation

Figure 1a compares the performance of AARBP with that of AVCP. In general, convergence time for AVCP is higher as we need at least three more message steps for a process to decide. Given that the fast-path optimisation is used, requiring 1 additional message step over AARBP in the good case, the difference in performance between AVCP and AARBP would indeed be smaller. Comparing AVCP with t = max and t = 6, we see that when t = 6, convergence is slower. Indeed, AVC-Validity states at least n − t values fulfilling valid() are included in a process’ vector given that they decide. Consequently, as t is smaller, n − t is larger, and so nodes will process and decide more values. 1

https://docs.python.org/2/library/ctypes.html.

Anonymity Preserving Byzantine Vector Consensus

149

Although AARB-delivery may be faster for some messages, nodes generally have to perform more TRS verification/tracing operations. As nodes decide 1 in more instances of binary consensus, messages of the form (ID,TAG, r, ones) are propagated where |ones| is generally larger, slowing down decision time primarily due to the size of the message. Figure 1b compares the performance of DBFT to solve vector consensus against AVCP. Indeed, the difference in performance between AVCP and DBFT when n = 20 and n = 40 is primarily due to AVCP’s 750 ms timeout. As expected when scaling up n further, cryptographic operations result in increasingly slower performance for AVCP. Overall, AVCP performs reasonably well, reaching convergence when n = 100 between 5 and 7 s depending on t, which is practically reasonable, particularly when used to perform elections which are typically occasional.

6

Discussion

It is clear that anonymity is preserved if processes only use anonymous channels to communicate, provided that processes do not double-sign with ring signatures for each message type. For performance and to prevent long-term correlation attacks on anonymous networks like Tor [42], it may be of interest to use anonymous message passing to propose a value, and then to use regular channels for the rest of a given protocol execution. In this setting, the adversary can de-anonymise a non-faulty process by observing message transmission time [2] and the order in which messages are sent and received [48]. For example, a single non-faulty process may be relatively slow, and so the adversary may deduce that messages it delivers late over anonymous channels were sent by that process. Achieving anonymity in this setting in practice depends on the latency guarantees of the anonymous channels, the speed of each process, and the latency guarantees of the regular channels. One possible strategy could be to use public networks like Tor [22] where message transmission time through the network can be measured.2 Then, based on the behaviour of the anonymous channels, processes can vary the timing of their own messages by introducing random message delays [42] to minimise the correlation between messages over the different channels. It may also be useful for processes to synchronise between protocol executions. This prevents a process from being de-anonymised when they, for example, invoke anon send in some instance when all other processes are executing in a different instance. In terms of future work, it is of interest to evaluate anonymity in different formal models [36,51] and with respect to various practical attack vectors [48]. It will be useful also to formalise anonymity under more practical assumptions so that the timing of anonymous and regular message passing do not correlate highly. In addition, a reduction to a randomized [10] binary consensus algorithm would remove the dependency on the weak coordinator used in each round of the binary consensus algorithm we rely on [17]. 2

https://metrics.torproject.org/.

150

C. Cachin et al.

Acknowledgment. This research is supported under Australian Research Council Discovery Projects funding scheme (project number 180104030) entitled “Taipan: A Blockchain with Democratic Consensus and Validated Contracts” and Australian Research Council Future Fellowship funding scheme (project number 180100496) entitled “The Red Belly Blockchain: A Scalable Blockchain for Internet of Things”.

References 1. Adida, B.: Helios: web-based open-audit voting. In: USENIX Security, pp. 335–348 (2008) 2. Back, A., M¨ oller, U., Stiglic, A.: Traffic analysis attacks and trade-offs in anonymity providing systems. In: International Workshop on Information Hiding, pp. 245–257 (2001) 3. Bellare, M., Rogaway, P.: Random oracles are practical: a paradigm for designing efficient protocols. In: CCS, pp. 62–73 (1993) 4. Ben-Or, M., Kelmer, B., Rabin, T.: Asynchronous secure computations with optimal resilience. In: PODC, pp. 183–192 (1994) 5. Bracha, G.: Asynchronous Byzantine agreement protocols. Inf. Comput. 75(2), 130–143 (1987) 6. Bracha, G., Toueg, S.: Resilient consensus protocols. In: PODC, pp. 12–26 (1983) 7. Bracha, G., Toueg, S.: Asynchronous consensus and broadcast protocols. JACM 32(4), 824–840 (1985) 8. Cachin, C., Collins, D., Crain, T., Gramoli, V.: Anonymity preserving Byzantine vector consensus. CoRR abs/1902.10010 (2020). http://arxiv.org/abs/1902.10010 9. Cachin, C., Kursawe, K., Petzold, F., Shoup, V.: Secure and efficient asynchronous broadcast protocols. In: CRYPTO, pp. 524–541 (2001) 10. Cachin, C., Kursawe, K., Shoup, V.: Random oracles in constantinople: practical asynchronous Byzantine agreement using cryptography. J. Cryptol. 18(3), 219–246 (2005) 11. Camp, J., Harkavy, M., Tygar, J.D., Yee, B.: Anonymous atomic transactions. In: In Proceedings of the 2nd USENIX Workshop on Electronic Commerce (1996) 12. Castro, M., Liskov, B.: Practical Byzantine fault tolerance. In: OSDI, vol. 99, pp. 173–186 (1999) 13. Chaum, D.: Blind signatures for untraceable payments. In: Chaum, D., Rivest, R.L., Sherman, A.T. (eds.) Advances in Cryptology, pp. 199–203. Springer, Boston, MA (1983). https://doi.org/10.1007/978-1-4757-0602-4 18 14. Chaum, D.: The dining cryptographers problem: unconditional sender and recipient untraceability. J. Cryptol. 1(1), 65–75 (1988) 15. Chaum, D.L.: Untraceable electronic mail, return addresses, and digital pseudonyms. CACM 24(2), 84–90 (1981) 16. Correia, M., Neves, N.F., Ver´ıssimo, P.: From consensus to atomic broadcast: timefree Byzantine-resistant protocols without signatures. Comput. J. 49(1), 82–96 (2006) 17. Crain, T., Gramoli, V., Larrea, M., Raynal, M.: DBFT: efficient leaderless Byzantine consensus and its application to blockchains. In: NCA, pp. 1–8 (2018) 18. Cramer, R., Franklin, M., Schoenmakers, B., Yung, M.: Multi-authority secretballot elections with linear work. In: Eurocrypt, pp. 72–83 (1996) 19. Danezis, G., Diaz, C.: A survey of anonymous communication channels. Technical report, MSR-TR-2008-35, Microsoft Research (2008)

Anonymity Preserving Byzantine Vector Consensus

151

20. Desmedt, Y.: Threshold cryptosystems. In: Seberry, J., Zheng, Y. (eds.) AUSCRYPT 1992. LNCS, vol. 718, pp. 1–14. Springer, Heidelberg (1993). https:// doi.org/10.1007/3-540-57220-1 47 21. Diamantopoulos, P., Maneas, S., Patsonakis, C., Chondros, N., Roussopoulos, M.: Interactive consistency in practical, mostly-asynchronous systems. In: ICPADS, pp. 752–759 (2015) 22. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: USENIX Security, pp. 21–21 (2004) 23. Doudou, A., Schiper, A.: Muteness failure detectors for consensus with Byzantine processes. In: PODC, p. 315 (1997) 24. Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. JACM 35(2), 288–323 (1988) 25. ElGamal, T.: A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 31(4), 469–472 (1985) 26. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. JACM 32(2), 374–382 (1985) 27. Freedman, M.J., Nissim, K., Pinkas, B.: Efficient private matching and set intersection. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 1–19. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-54024676-3 1 28. Fujisaki, E.: Sub-linear size traceable ring signatures without random oracles. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E95.A, 393–415 (2011) 29. Fujisaki, E., Suzuki, K.: Traceable ring signature. In: Okamoto, T., Wang, X. (eds.) PKC 2007. LNCS, vol. 4450, pp. 181–200. Springer, Heidelberg (2007). https:// doi.org/10.1007/978-3-540-71677-8 13 30. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: STOC, pp. 169– 178 (2009) 31. Gilad, Y., Herzberg, A.: Spying in the dark: TCP and Tor traffic analysis. In: Fischer-H¨ ubner, S., Wright, M. (eds.) PETS 2012. LNCS, vol. 7384, pp. 100–119. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31680-7 6 32. Golle, P., Jakobsson, M., Juels, A., Syverson, P.: Universal re-encryption for mixnets. In: Okamoto, T. (ed.) CT-RSA 2004. LNCS, vol. 2964, pp. 163–178. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24660-2 14 33. Golle, P., Juels, A.: Dining cryptographers revisited. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 456–473. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24676-3 27 34. Groth, J.: Efficient maximal privacy in boardroom voting and anonymous broadcast. In: Juels, A. (ed.) FC 2004. LNCS, vol. 3110, pp. 90–104. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27809-2 10 35. Gu, K., Dong, X., Wang, L.: Efficient traceable ring signature scheme without pairings. Adv. Math. Commun. 14(2), 207–232 (2019) 36. Halpern, J.Y., O’Neill, K.R.: Anonymity and information hiding in multiagent systems. J. Comput. Secur. 13(3), 483–514 (2005) 37. Jakobsson, M.: A practical mix. In: Nyberg, K. (ed.) EUROCRYPT 1998. LNCS, vol. 1403, pp. 448–461. Springer, Heidelberg (1998). https://doi.org/10.1007/ BFb0054145 38. Juels, A., Catalano, D., Jakobsson, M.: Coercion-resistant electronic elections. In: Chaum, D., Jakobsson, M., Rivest, R.L., Ryan, P.Y.A., Benaloh, J., Kutylowski, M., Adida, B. (eds.) Towards Trustworthy Elections. LNCS, vol. 6000, pp. 37–63. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12980-3 2

152

C. Cachin et al.

39. Kulyk, O., Neumann, S., Volkamer, M., Feier, C., Koster, T.: Electronic voting with fully distributed trust and maximized flexibility regarding ballot design. In: EVOTE, pp. 1–10 (2014) 40. Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. TOPLAS 4(3), 382–401 (1982) 41. Liu, J.K., Wei, V.K., Wong, D.S.: Linkable spontaneous anonymous group signature for ad hoc groups. In: Wang, H., Pieprzyk, J., Varadharajan, V. (eds.) ACISP 2004. LNCS, vol. 3108, pp. 325–335. Springer, Heidelberg (2004). https://doi.org/ 10.1007/978-3-540-27800-9 28 42. Mathewson, N., Dingledine, R.: Practical traffic analysis: extending and resisting statistical disclosure. In: Martin, D., Serjantov, A. (eds.) PET 2004. LNCS, vol. 3424, pp. 17–34. Springer, Heidelberg (2005). https://doi.org/10.1007/11423409 2 43. Most´efaoui, A., Moumen, H., Raynal, M.: Signature-free asynchronous binary Byzantine consensus with t < n/3, O(n2 ) messages, and O(1) expected time. JACM 62(4), 31 (2015) 44. Murdoch, S.J., Danezis, G.: Low-cost traffic analysis of Tor. In: S&P, pp. 183–195 (2005) 45. Neff, C.A.: A verifiable secret shuffle and its application to e-voting. In: CCS, pp. 116–125 (2001) 46. Neves, N.F., Correia, M., Verissimo, P.: Solving vector consensus with a wormhole. TPDS 16(12), 1120–1131 (2005) 47. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X 16 48. Raymond, J.-F.: Traffic analysis: protocols, attacks, design issues, and open problems. In: Federrath, H. (ed.) Designing Privacy Enhancing Technologies. LNCS, vol. 2009, pp. 10–29. Springer, Heidelberg (2001). https://doi.org/10.1007/3-54044702-4 2 49. Raynal, M.: Reliable broadcast in the presence of Byzantine processes. In: FaultTolerant Message-Passing Distributed Systems, pp. 61–73. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94141-7 4 50. Rivest, R.L., Shamir, A., Tauman, Y.: How to leak a secret. In: Boyd, C. (ed.) ASIACRYPT 2001. LNCS, vol. 2248, pp. 552–565. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45682-1 32 51. Serjantov, A., Danezis, G.: Towards an information theoretic metric for anonymity. In: Dingledine, R., Syverson, P. (eds.) PET 2002. LNCS, vol. 2482, pp. 41–53. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36467-6 4 52. Shoup, V., Gennaro, R.: Securing threshold cryptosystems against chosen ciphertext attack. J. Cryptol. 15(2), 75–96 (2002) 53. Tsang, P.P., Wei, V.K.: Short linkable ring signatures for E-voting, E-cash and attestation. In: Deng, R.H., Bao, F., Pang, H.H., Zhou, J. (eds.) ISPEC 2005. LNCS, vol. 3439, pp. 48–60. Springer, Heidelberg (2005). https://doi.org/10.1007/ 978-3-540-31979-5 5 54. Tsang, P.P., Wei, V.K., Chan, T.K., Au, M.H., Liu, J.K., Wong, D.S.: Separable linkable threshold ring signatures. In: Canteaut, A., Viswanathan, K. (eds.) INDOCRYPT 2004. LNCS, vol. 3348, pp. 384–398. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30556-9 30 55. Yao, A.C.C.: How to generate and exchange secrets. In: FOCS, pp. 162–167 (1986) 56. Zantout, B., Haraty, R.: I2P data communication system. In: ICN, pp. 401–409 (2011)

CANSentry: Securing CAN-Based Cyber-Physical Systems against Denial and Spoofing Attacks Abdulmalik Humayed1,3 , Fengjun Li1 , Jingqiang Lin2,4 , and Bo Luo1(B) 1

2

The University of Kansas, Lawrence, KS, USA {fli,bluo}@ku.edu School of Cyber Security, University of Science and Technology of China, Hefei, China 3 Jazan University, Jazan, Saudi Arabia [email protected] 4 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China [email protected]

Abstract. The Controller Area Network (CAN) has been widely adopted as the de facto standard to support the communication between the ECUs and other computing components in automotive and industrial control systems. In its initial design, CAN only provided very limited security features, which is seriously behind today’s standards for secure communication. The newly proposed security add-ons are still insufficient to defend against the majority of known breaches in the literature. In this paper, we first present a new stealthy denial of service (DoS) attack against targeted ECUs on CAN. The attack is hardly detectable since the actions are perfectly legitimate to the bus. To defend against this new DoS attack and other denial and spoofing attacks in the literature, we propose a CAN firewall, namely CANSentry, that prevents malicious nodes’ misbehaviors such as injecting unauthorized commands or disabling targeted services. We implement CANSentry on a cost-effective and open-source device, to be deployed between any potentially malicious CAN node and the bus, without needing to modify CAN or existing ECUs. We evaluate CANSentry on a testing platform built with parts from a modern car. The results show that CANSentry successfully prevents attacks that have shown to lead to safety-critical implications.

1

Introduction

The Controller Area Network (CAN ), also referred to as the CAN bus, has been widely adopted as the communication backbone of small and large vehicles, ships, planes, and industrial control systems. When CAN was originally developed, its nodes were not technically ready to be connected to the external world and thus assumed to be isolated and trusted. As a result, CAN was designed without basic security features such as encryption, authentication that are now considered c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 153–173, 2020. https://doi.org/10.1007/978-3-030-58951-6_8

154

A. Humayed et al.

essential to communication networks [14,19]. The protocol’s broadcast nature also increases the likelihood of attacks exploiting these security vulnerabilities. Therefore, a malicious message injected by a compromised electronic control unit (ECU) on the bus, if it conforms to CAN specifications, will be treated the same as a legitimate message from a benign ECU and broadcasted over the bus. Moreover, wireless communication capabilities have been added to CAN nodes for expanded functionality (e.g., TPMS, navigation, entertainment systems) without carefully examining the potential security impacts [20,21,43,46]. This enlarges the attack vector for remote attacks by allowing some previously physically isolated units to connect to external entities through wireless connections. Various attacks have been reported in recent years, ranging from simple bus denial attacks [12,36] and spoofing attacks [27,32] to more sophisticated busoff [23,42] and arbitration denial attacks [3,12,27,36]. In response to the vulnerabilities and attacks, significant research efforts have been devoted to automobile and CAN security. Solutions such as message authentication and intrusion detection, akin to those designed for Internet security, have been proposed to secure CAN [49,50]. However, these proposals suffer from major efficiency issues. For instance, the authentication-based schemes deploy cryptographic keys among ECUs to generate MACs, which inevitably incur non-negligible processing overhead and significantly impact CAN’s transmission speed. Moreover, such defense mechanisms can be defeated if the adversary exploits compromised ECUs to nullify authentication or MAC frames [3]. The intrusion detection-based mechanisms leverage ECUs’ behavioral features to detect abnormal frames on the bus, hence, they have to collect a sufficient number of frames by monitoring the bus and ECUs, which inevitably results in delays in the detection. In this paper, we first present a new stealthy arbitration denial attack against CAN, which takes seemingly legitimate actions to prevent selected CAN nodes from sending messages to the bus. This attack bypasses CAN controllers to inject deliberately crafted messages without triggering packet-based detectors, which allows the adversary to extend the length of the attack and cause severe damages. For instance, the attack can block all steering and breaking messages from being sent to the bus without causing any bus error, while keeping other sub-systems such as the engine operating normally. We implement the attack using an STM32 Nucleo-144 board as the attacker and an instrument panel cluster from a 2014 passenger car as the victim and demonstrate a successful attack. Moreover, we propose a new defense mechanism, namely CANSentry, against the family of general CAN denial and spoofing attacks. This system-level control mechanism addresses the fundamental CAN security problem caused by the protocol’s broadcast nature and lack of authentication without demanding any modification to the CAN standard or the ECUs. CANSentry leverages the malicious behavior of the attacks, e.g., inconsistency between attack ECU state and the bus state, to filter frames from high-risk ECUs (i.e., a few external-facing ECUs) in a bit-by-bit fashion and block unauthorized frames from being sent to the bus. We implement a cost-efficient prototype of CANSentry using the STM32

CANSentry: Securing CAN-Based Cyber-Physical Systems

155

Nucleo-144 development board and demonstrate its effectiveness against recent denial and spoofing attacks including the one we propose in this paper. The contributions of this paper are three-fold: (1) we demonstrate a stealthy selective arbitration denial attack against CAN that prevents selected ECUs from transmitting to the CAN bus without triggering any error or anomaly. (2) More importantly, we design a proof-of-concept CAN firewall, CANSentry, which is deployed between high-risk CAN nodes and the bus to defend against bus/ECU denial attacks and ECU spoofing attacks. And (3) we implement the proposed CAN firewall using low-cost hardware and demonstrate its effectiveness on a platform consisting of components from modern passenger cars. The rest of the paper is organized as follows: we introduce the CAN bus and CAN attacks in Sect. 2 and the threat model in Sect. 3. Then, we present the stealthy selective arbitration denial attack in Sect. 4, and CANSentry design and implementation in Sect. 5, followed by security analysis in Sect. 6. Finally, we conclude the paper in Sect. 7.

2 2.1

Background and Related Work The Controller Area Network (CAN)

CAN has been widely used in many distributed real-time control systems . While components differ, the main concepts/mechanisms remain similar in all CAN applications. A CAN network consists of nodes interconnected by a differential bus. Each node is controlled by a microcontroller (MCU), a.k.a. electronic control unit (ECU) in automotive CAN networks. A node connects to the bus through a controller and a transceiver. The controller is a stand-alone circuit or an MCU module, which implements the protocol, e.g., encoding/decoding frames and error handling. The transceiver converts between logic data and physical bits. Frame Prioritization. In CAN, frame prioritization is realized by the arbitration ID (a.k.a. CAN ID). Figure 1 (a) shows a simplified CAN frame. It starts with the 11-bit arbitration ID (or 29-bit in the extended format), which determines the frame’s priority on the bus as well as the frame’s relevance to receivers, based on which each receiver decides to accept or ignore the frame. When multiple nodes attempt to send to the bus simultaneously, the lowest ID indicates the highest priority and wins the arbitration. To support prioritization of frames, the CAN specification defines dominant and recessive bits, denoted by “0” and “1”, respectively. Whenever a dominant and a recessive bit are sent at the same time by different nodes, the dominant bit will dominate the bus. This mechanism allows CAN to resolve real-time conflicts during arbitration and transmission. CAN Error Handling. Error handling in CAN allows nodes to autonomously detect and resolve transmission errors without third-party intervention. It also supports fault confinement and the containment of defective nodes. There are five types of errors in CAN: bit, ACK, stuff, form, and CRC errors. Each ECU is responsible for detecting and keeping track of both transmission and receiving errors with two error counters, transmit error counter (TEC) and receive error

156

A. Humayed et al.

counter (REC). Depending on the role of the ECU, one of the counters will increase when an error is detected or decrease after a successful transmission or reception. The ECU determines its error state according to the values of both counters, as shown in Fig. 1 (b). The transitions between error states allow nodes to treat temporary errors and permanent failures differently. In particular, when a node encounters persistent transmission errors that may affect other nodes, it switches to bus-off state, resulting in its complete isolation from the CAN bus.

Fig. 1. The CAN Bus: (a) Format of a CAN data frame; (b) CAN error state transition. *: RESET or reception of 128 occurrences of 11 recessive bits.

2.2

Existing CAN Attacks

In this paper, we focus on denial and spoofing attacks against CAN. Denial attacks are further classified into bus, ECU, and arbitration denial. Bus Denial. A naive approach is to flood the bus with a stream of 0 × 0 IDs so the bus is always occupied with this highest priority ID (BD1 ) [12,27]. Another approach is to occupy the bus by transmitting a stream of dominant bits and prevent any ECU from transmitting. However, this bit-stream does not conform to CAN protocol, hence, it cannot be dispatched by CAN controllers in regular mode. [12] exploited the Test Mode in some CAN controllers to flood the bus with dominant bits (BD2 ). [36] built a malicious ECU without the CAN controller to allow them to launch the dominant bit-stream attack (BD3 ). ECU Denial. The attacker attempts to force an ECU to a bus-off state (TEC>255) and eliminates all the functionalities managed by the target ECU. Three types of ECU denial attacks have been proposed in the literature: (1) CAN Controller Abuse (ED1 ). [12] exploited the ID Ready Interrupt feature and the Test Mode on some ECUs to inject dominant bits when the target ID attempts to transmit. This would trigger transmission bit errors at the target ECU, as it detects the discrepancies between what it sends and what it sees on the bus. Repeating this attack would gradually increase the transmitter’s TEC, and eventually force it to bus-off. (2) Malicious Frames (ED2 ). An adversarial ECU may send a frame with identical contents to the targeted ECU’s, except replacing a recessive bit with a dominant one, to trigger a bit error at the transmitting ECU [3,12]. (3) Bypassed CAN Controller (ED3 ). Adversaries directly connect malicious/compromised ECUs to CAN transceivers to inject arbitrary

CANSentry: Securing CAN-Based Cyber-Physical Systems

157

bit streams to the bus. [23,36,42] implemented this attack to overwrite recessive bits from the target ECU by dominant ones to trigger bit errors at the transmitting ECU. Arbitration Denial. The adversarial ECU attempts to inject frames with the lowest possible ID (0 × 0) to win the arbitration over any other CAN ID and prevent all non-0 × 0 IDs from sending to the bus, so that the bus becomes completely non-functional [12]. It could also target on a selected ID by transmitting an ID of higher priority whenever that ID starts transmission [22]. We categorize both cases as AD1. [36] monitored the bus with an MCU connected through a CAN transceiver (without CAN controller) to detect the target ID and then inject a dominant bit that replaces a recessive bit in the ID field (AD2 ). The target ID loses arbitration and is unaware of the attack, however, the attack results in an incomplete frame that causes a form error. In Sect. 4, we propose a stealthier version so that no error frame is generated (AD3 ). Table 1. Summary of CAN security control mechanisms: features and effectiveness against attacks discussed in Sect. 2.2. Control

Features

Effectiveness against attacks

Inj.

Aper.

RT

Cost

BD1

BD2

BD3

ED1

ED2

ED3

AD1

AD2

AD3

Spoof

C1









D

D

D

D

D

D

D

D

D

D

C2









D







D



D





D

C3









D







D



D





D

C4















P

P

P

P

P

P

P

C5









P







P



P





P

C6









P







D



D





P

C7









P







D



P





P

CANSentry     P P P P P P P P P P Features: Inj.: preventing injection of incomplete frames or random bits, Aper.: handling aperiodic attacks, RT: real-time defense; Cost: low cost. Effectiveness: D: Detect, P: Prevent, -: No protection Controls: C1: Anomaly-based IDS [13, 19, 37, 38, 51, 52]; C2: Voltage-based IDS [4–7, 10, 35]; C3: Time-based IDS [4, 15]; C4: CAN-ID Obfuscation [17, 22, 24, 29, 58, 59]; C5: Counterattacking [8, 28, 30, 48]; C6: Authentication [18, 41, 44, 53, 54]; C7: Application-level Firewalls [26, 45].

Spoofing Attacks. Due to the lack of authentication in CAN and the broadcast nature of the bus, a compromised ECU could easily send CAN frames with any ID, including IDs that belong to other legitimate/critical ECUs. In [27, 32], attackers compromised an ECU through a remote channel, and sent CAN frames to unlock doors, stall the engine, or control the steering wheel. In the masquerade attack, the attacker first disables the target ECU and then transmits its frames [40]. In the conquest attack, the target ECU is fully compromised to transmit legitimate frames but with malicious intentions [40]. Lastly, [31] sent spoofed diagnostic frames to force a target ECU into diagnostic session, which would stop transmitting frames until the diagnostic session is terminated.

158

2.3

A. Humayed et al.

Existing Controls and Limitations

IDS for CAN. Due to the high predictability of CAN traffic, anomaly detection becomes a viable solution. For example, [13,19,37,38,51,52] monitor CAN traffic and detect frame injection attacks based on the analysis of traffic pattern and packet contents. Meanwhile, due to the lack of sender authentication in CAN, a number of fingerprinting approaches are proposed to identify senders based on physical layer properties. For example, ECUs could be profiled and identified with time/clock features [4,15] or electric/voltage features [5–7,10,25,35]. These approaches associate frames with their legitimate senders. When a mismatch is detected, an attack is assumed. They are effective against the spoofing attacks and some of the denial attacks, as shown in Table 1. The IDS mechanisms do not provide real-time detection except [10], which detects a malicious frame before it completes transmission. In addition, they all assume that packets conform to CAN protocol. Therefore, they cannot recognize many random bit or incomplete frame injection attacks discussed in Sect. 2.2. CAN-ID Obfuscation. Many attacks discussed in Sect. 2.2 target specific CAN IDs. Several solutions have been proposed to obfuscate CAN IDs to confuse or deter the attackers, e.g., ID-Hopping [22,59], ID Randomization [17,24], ID Obfuscation [29], and ID Shuffling [58]. The effectiveness of these solutions rely on the secrecy of the obfuscation mechanism, i.e., the assumption that the obfuscated CAN IDs are known to the legitimate ECUs but not the attackers. This assumption may not be practical in real world applications. Counterattacking. The counterattack mechanisms attempt to stop CAN intrusions by attacking the source. In [30], owners of CAN IDs (legitimate ECUs) would detect spoofed frames on the bus, and interrupt the sender by sending an error frame. [8] proposed a similar mechanism that induced collisions with the spoofed frames until the attacking ECU was forced into bus-off. [28] proposed a counterattack technique with a central monitoring node, which shared secret keys with legitimate ECUs for authentication and spoof detection. [48] launched a bus-off attack against the adversarial ECU of the attacks proposed in [3]. The counterattack approaches need to add detection and counterattacking capacity to all ECUs, so that they can protect their own CAN IDs. This is not cost-effective, and may be impractical for some mission-critical ECUs. Authentication. Cryptography-based solutions have been proposed to enable node (ECU) authentication, and to support data confidentiality and integrity [34, 54,55,60]. In addition, controls against attacks from external devices have been introduced [9, 16,47,57]. They require encryption, key management, and collaboration between ECUs, which implies significant changes to the existing CAN infrastructure. Additional communication overhead and delays are also expected, which is undesired in real-time environments such as CAN. Firewalls. The idea of an in-vehicle firewall was first suggested in [56] to enforce authentication and authorization of CAN nodes, but it did not articulate any technical details behind the concept. Application-level firewalls have been introduced in [26,45], which require intensive modification at the CAN node (in both

CANSentry: Securing CAN-Based Cyber-Physical Systems

159

software and hardware) and system-wide cryptographic capabilities to CAN. This is impractical in real world CAN systems. Finally, the attacks performed by the skipper model [36,42] cannot be prevented using such solutions. In Table 1, we compare different CAN defense approaches with our CANSentry solution in terms of features and capabilities of detecting or preventing the attacks discussed in Sect. 2.2. We also like to note that existing countermeasures do not consider cases where the attacker abuses or bypasses the CAN controller to send arbitrary bit streams or incomplete frames, e.g. [12,23,36]. Meanwhile, controls that transmit frames to the bus, e.g., counterattacks, are also vulnerable to denial attacks that would nullify their effectiveness. Lastly, many existing solutions require overhead such as renovating the protocol, upgrading all or many ECUs, employing statistical learning for IDS, etc., which makes them less cost-efficient and impractical.

3

The Threat Model

Attackers’ Objectives. In this paper, we study two attacks, denial and spoofing. In denial attacks, the attacker aims to nullify certain functionality of an ECU or the CAN bus, such as forcing an ECU into bus-off, stopping a CAN ID from sending to the bus, or occupying the bus. A stealthy denial is to achieve the goal without being noticed by the target ECU or the bus. In spoofing, the attacker aims to spoof a selected ID to send frames to the bus and deceive the receiver, which can trigger undesired consequences ranging from displaying incorrect information to disabling the brakes or the steering wheel [27,32,39]. Attacker’s Capabilities and Limitations. Attackers with local or remote access have been demonstrated in the literature [2,11,27,32]. We consider two attacker models with different capabilities, based on which, different attacks could be realized that vary in impact and sophistication. In both cases, we assume that the attacker knows the design and specifications of the targeted system (automobile) model through open documentation or reverse engineering. For example, the attacker knows the IDs and functionalities of the ECUs. 1. CAN Abusers. The attacker has a local or remote access to an ECU. The attacker obtains full control of all software components of the ECU but cannot alter the hardware. The attacker is assumed to be able to sniff the bus and inject frames to abuse one or more of the CAN’s basic functionalities: arbitration and error handling. Note that the integrity of the CAN controller is preserved, so that all the injected frames conform to the CAN protocol. When an attacker transmits a frame with a high priority ID, the frame wins arbitration over legitimate low priority IDs. On the other hand, the attacker can transmit an almost identical frame with a small difference to trigger error handling and eventually force a target ECU to bus-off state. The attacker may also abuse a feature provided by some CAN controllers, e.g., the “Test Mode”, to inject dominant bits [12]. 2. The Skippers. An attacker can sniff and inject arbitrary CAN traffic by skipping the CAN controller. This allows her to inject bits at any time without having

160

A. Humayed et al.

to be limited by the rules enforced by CAN controllers. This could be achieved through direct or remote access: (1) Direct Access: the attacker could connect a malicious MCU and a CAN transceiver to the OBD-II port (e.g., [36,42]), or connect the MCU through an aftermarket OBD-II adapter (e.g. [11]). This approach requires brief physical access to the vehicle. (2) Remote Access: the attacker may first compromise an ECU (usually one with remote access capabilities such as WiFi or cellular), and then exploit certain hardware design vulnerabilities to directly connect to the bus (e.g. [32]). In particular, [12] showed that 78% of the MCUs deployed in vehicles have built-in CAN controllers, which connect to CAN transceivers through GPIO pins. A compromised MCU could change GPIO configurations to disconnect the CAN controller, and then directly connect itself to the transceiver. In both cases, the attacker “skips” the CAN controller, analyzes the traffic in a bit-by-bit fashion in a process known as “Bit Banging”, and injects arbitrary bit-streams to the bus. We assume that the attacker does not have unlimited and uncontrolled access to alter or to break the integrity of CAN hardware. The attacker could access the bus through any legitimate access point, or through existing hardware/software vulnerabilities, but she cannot alter the hardware to create new vulnerabilities or access points. That is, the attacker may have brief access to the car to connect a malicious ECU to the OBD-II port. However, she cannot disassemble the system to weld a new port or a new attacking device directly to the bus.

4

The Stealthy Selective Arbitration Denial Attack

In this section, we improve the design of the arbitration denial attack presented in [36] with two added features: stealthiness and inexpensive hardware implementation. We present the implementation of the new attack and evaluate it with an Instrument Panel Cluster (IPC) from a used passenger car. Attack Objectives. The objective of the attack is to stealthily prevent frames with specific IDs from being sent to the bus. The attack is expected to have the following features: (1) Selective: only specific IDs are denied, while other ECUs/frames all function as expected. (2) Stealthy: the attack should conform to CAN standard and should not trigger any error on the bus. Hence, the attacker controls the damage and extends the length of the attack. And (3) Practical: the attack should not require expensive hardware or extended access to the bus. Adversary Model. The attacker needs to bypass CAN controllers’ restrictions to perform bit-by-bit analysis/manipulation on the bus. Therefore, the attacker connects directly to the CAN transceiver, i.e., the Skipper model in Sect. 3. Challenges. First, most of existing CAN tools (e.g., CANoe, VehicleSpy, SocketCAN only work with full CAN frames. However, we need to monitor the bus at bit-level and inject bits at any arbitrary time. Next, the attack requires a high degree of precision to the bit-level timing. Last, since the skipper model operates without a CAN controller, any unexpected operation delay, premature

CANSentry: Securing CAN-Based Cyber-Physical Systems

161

injection, or malformed CAN frame will result in bus errors, which may render the attack unsuccessful or detectable. The Attack. The proposed attack passively monitors the bus to detect a specific CAN ID in the arbitration phase. The attacker waits for the last recessive bit in the target ID, and overwrites it with a dominant one, to beat the targeted ECU in arbitration. The attacker completes the transmission with a fake frame, so that it would not trigger any error flag (as in [36]). Hence, the malfunctioning on the bus cannot be detected, and the attacking ECU cannot be identified. Attack Implementation and Evaluation. Most of the current CAN research that require precise timing use automotive-grade micro-controllers or other expensive tools. We use an open-source tool, CANT [1], which facilitates the synchronization with the bus and bit-level analysis/manipulation on the bus. We connect the following to the CAN bus: (1) attacker: an STM32 Nucleo-144 board (connected through OBD-II) running CANT; (2) victim: the IPC of a used 2014 passenger car; (3) other nodes: simulated by BeagleBone Black (BBB) microcontrollers. In the experiments, when we launch the proposed attack against ID 0 × 9A (turn signals), the turn signals become unresponsive regardless of the status of the turn signal lever, since their control messages are blocked. We compare the stealthiness of the proposed attack with the other denial attacks discussed in Sect. 2.2: ECU denial [12,42] and arbitration denial[12,36]. We assign a BBB microcontroller as the victim ECU and capture its errors (TEC, REC and error state). The ECU denial attack (Fig. 2 (a)) causes a sharp increase of TEC at the victim ECU, to force it into bus-off state. It also increases the REC counters on other ECUs. The arbitration denial attack (Fig. 2 (b)) interrupts the victim ECU and causes a form error, which also drives all ECUs into error passive mode, since incomplete frames are detected on the bus. Finally, our attack (Fig. 2 (c)) is stealthy as it achieves the arbitration denial without causing any error on any ECU.

Fig. 2. Comparison of the stealthiness of CAN denial attacks: (a) ECU denial; (b) arbitration denial; (c) stealthy selective arbitration denial.

5

CANSentry: A Firewall for the CAN Bus

Next, we present CANSentry, an efficient and low-cost firewall for the CAN bus to defend against the attacks discussed in Sects. 2.2 and 4.

162

5.1

A. Humayed et al.

The Architecture of the CANSentry Firewall

The objective of CANSentry is to prevent an attacker node (either a CAN abuser or a skipper) from sending malicious frames onto the CAN bus without introducing any practical delay. This requires monitoring and filtering all the messages in real time, since we cannot block any ECU from accessing the bus before an abnormal activity is detected. To address this challenge, we propose a segmentationbased approach to separate high-risk CAN nodes, i.e., a few ECUs with interfaces to the external network, from the rest of the bus, using a firewall. As shown in Fig. 3(a), CANSentry is deployed between each high-risk node and the main bus, which logically divides the original CAN bus into two segments, the internal bus CANint and the external bus CANext . A high-risk ECU directly connects to CANext , which further connects to CANint through the CANSentry firewall. Both segments send and receive messages following the original CAN specifications, where the bi-directional firewall functions mainly as a relay to transmit legitimate messages between CANint and CANext without causing any collision. Therefore, CANSentry monitors the current transmission state of the main bus and decides to forward or block the messages from CANext . Introducing a firewall component for network segmentation enforces the necessary security control, which is missing in the current CAN standard, to regulate the activities of the high-risk, potentially vulnerable nodes. This defense mechanism could fundamentally overcome the vulnerabilities caused by the broadcast nature and lack of authentication/authorization capacities of the CAN bus by preventing the attacker node from broadcasting unauthorized frames. Note that CANSentry does not demand any modifications to the CAN standard, or require any changes to the existing ECUs, which makes it easily adoptable.

Fig. 3. The CANSentry approach: (a) firewall architecture; (b) implementation.

CANSentry: Securing CAN-Based Cyber-Physical Systems

5.2

163

Firewall Principle and Rules

The CANSentry firewall monitors the states of the internal and external buses. Similar to other network firewalls, it checks bi-directional traffic against a set of rules and enforces the forwarding or blocking actions. However, the design of the firewall rules is not straightforward. We have to leverage the features of CAN denial and spoofing attacks to derive appropriate firewall rules for prevention. States and State Transitions of CAN Bus and Nodes. We examine the relationship between the states of a CAN node and the corresponding states of the CAN bus, as seen by the firewall. As shown in Fig. 4(a), each CAN node transits between four legitimate states. In particular, the Receive state denotes the node is listening to the bus when another node is transmitting, while the Idle state denotes the node is listening to the bus and waiting for other nodes to transmit. We combine them as the node takes the same action in both states. Moreover, the Arbitration state denotes the transmission of CAN ID bits and the Transmit state denotes the transmission of all other data and control bits. Correspondingly, the CAN bus also transits between four legitimate states as shown in Fig. 4(b). For the internal bus, we further define Transmitint and Transmitext states to distinguish the transmission due to an (directly connected) internal node or an (firewalled) external node, respectively. From the aspect of bus state transitions, we can interpret the attacks discussed in Sect. 2.2 as exploiting malicious messages to compromise specific bus states. In particular, in the spoofing and arbitration denial attacks, the adversary node (e.g., a malicious abuser or skipper ECU) compromises the bus Arbitration state (denoted as “A1” in Fig. 4(b)) by injecting a partial or complete fake frame with a fraudulent CAN ID to falsely win the arbitration. In the ECU denial attacks, the attacker node compromises the bus Transmit state by injecting messages with deliberately crafted data bits to falsely trigger the target node into transmission errors. Finally, the bus denial attack is a special case of compromising either Arbitration or Transmit states with a stream of dominant bits to take over the entire bus.

Fig. 4. State transitions of a CAN node and the CAN bus.

Firewall Rules. The fundamental principle of the firewall is to ensure that at any time high-risk nodes on the external bus operate in a state consistent with the state of the internal bus. So, for each bus state, we derive the corresponding consistent node states based on the CAN protocol. In particular, (i) for the bus

164

A. Humayed et al.

Idle state, the consistent node states are Idle/Receive, Arbitration and Transmit; (ii) for the bus Arbitration state, the consistent node state are Idle/Receive and Arbitration; and (iii) for the bus Transmit or ErrorFlag states, the consistent node state is only the Idle/Receive state. Finally, we derive a set of firewall rules following the above principle. Similar to network firewall enforcement, the rules will be executed in order such that the traffic blocked by an upper rule will not be evaluated by the lower rules. R1: When the internal bus is in either Transmitint or ErrorFlag state, the firewall always forwards the traffic (bits or frames) from CANint to CANext and blocks the traffic from external to internal, regardless of high-risk node’s state. R2: When the internal bus is in either Idle or Arbitration state, the firewall forwards all the traffic from CANint to CANext . It also forwards traffic that has a CAN ID in the arbitration whitelist and conforms to CAN specifications from CANext to CANint . It blocks all other traffic from external to internal. R3: When the internal bus is in the Transmitext , the firewall forwards the traffic from CANext to CANint that conforms CAN specifications, and blocks all traffic from CANint to CANext , except for error flags. Discussions. First, we assume that the traffic on the internal bus conforms to CAN specifications, because all the nodes behind the firewalls are considered low risk and more trustworthy than external nodes. Regulated by CAN controllers, their activities should not deviate from the CAN protocol. Meanwhile, in R1, the firewall blocks all the traffic from CANext to CANint so that it may block legitimate error flags generated by a benign external node. This may mistakenly block all error flags originated in CANext , since we cannot distinguish if it comes from a benign or compromised external ECU. However, the impact of this blocking to both external and internal nodes is negligible: it does not affect the error counters of the external node, meantime, the internal node causing the error can still receive enough reliable error flags from internal nodes. Lastly, when the internal bus is in Transmitext (i.e., an external node won the arbitration and is sending to the bus), no legitimate traffic should be sent by any other node on the internal bus. So, we block the CANint -to-CANext circuit in R3 as a protective measure. It is worth noting that error flags are handled as a special case – they are collected by an error buffer in the firewall, and sent to CANext when identified. For example, when an internal node detects any error and sends out an error flag, it will trigger R2 to block the CANext -to-CANint circuit and forward the error flag to the transmission ECU on CANext . Error flags will arrive at CANext with a 5-bit delay, which would not cause any issue in error handling. Rule Enforcement. Each firewall monitors both internal and external buses and detects the current internal bus state and external bus state (almost equivalent to the external node state since there is only one node on CANext ). Since R1 and R3 only involve two firewall actions, forward and block, the implementation is straightforward. However, R2 requires the firewall to check the CAN ID originated from the external node against a pre-determined whitelist during the

CANSentry: Securing CAN-Based Cyber-Physical Systems

165

arbitration. The key challenge is to detect the malicious bit as soon as possible before a spoofed ID wins the arbitration. Therefore, we construct a deterministic finite automaton (DFA) to enforce this bit-by-bit ID matching.

Fig. 5. An example DFA for CAN ID matching.

Formally, a DFA is defined as a 5-tuple (Q, q0 , Σ, δ, F ), where Q is a finite set of states, q0 ∈ Q denotes the initial state, Σ is a finite set of input symbols, known as the input alphabet, δ : Q × Σ − → Q is a transition function, and F ⊆ Q denotes the accept states. To start the arbitration, a node always issues a Start-of-Frame (SOF) for hard synchronization, so the input of q0 is always 0. Σ = {0, 1} since the input is a bit stream. Since each firewall is in charge of one high-risk node on the external bus, it maintains a set of CAN IDs to which the external ECU is allowed to send messages. Based on the ID set, we derive the transition function δ and the accepted states F = {CAN IDs}. For example, Fig. 5 shows the DFA of a firewall whose whitelist contains four CAN IDs: “0 × 123”, “0 × 456”, “0 × 789” and “0 × 7AB”. For a spoofed ID “0 × 481”, its fifth bit (i.e., “1”) will be rejected by the state S3 of the DFA, then the firewall will block all the remaining bits of this ID. In practice, we cannot wait until an Arbitration ID reaches an accept state to disseminate it to the bus, since the arbitration phase is expected to be precisely synchronized bit-by-bit among all competing ECUs. To be precise, the proposed CAN firewall implements a Moore machine [33], which is a DFA that generates an output at each state. At each DFA state, an output bit will be disseminated to CANint instantly so that it competes with other nodes on the bus. Lastly, when a spoofed ID is rejected by the DFA, the prefix bits have already been sent to CANint . We will further discuss this and its impact to security in Sect. 6. 5.3

Implementation and Evaluation

To enable an efficient bit-by-bit monitoring and manipulation of bit streams transmitted over CAN, CANSentry is constructed with one MCU and two CAN transceivers, where one transceiver interfaces between the firewall and CANint and the other interfaces between the firewall and CANext , as shown in Fig. 3(a). As the main component of the firewall, the MCU has a filter module, which implements firewall rules and the DFA for CAN ID matching, and then enforces the forwarding or blocking decisions for CAN traffic.

166

A. Humayed et al.

Hardware and Costs. We use the open-source tool CANT [1] on an STM32 Nucleo-144 development board to implement the proof-of-concept CANSentry. The MCU board is connected to CANint and CANext through two designated transceivers. The hardware cost is about $20 to us, which could be significantly lowered in mass production. Meanwhile, we only need to deploy firewalls to external-facing ECUs in a vehicle, which is very limited in number. Transmission Delay. We evaluate the transmission delay introduced by CANSentry. Figure 6 depicts the received bit streams on the CANint and CANext buses when a frame is relayed by the firewall from internal to external (i.e., Fig. 6(a)) and vice versa (i.e., Fig. 6(b)). In Fig. 6(b), the firewall processing includes DFA-based ID matching. Obviously, in both cases, there is no noticeable delay that could cause any bit error or synchronization error.

Fig. 6. Traffic between two buses: (a) CANint to CANext ; (b) CANext to CANint .

Effectiveness Against Attacks. As shown in Fig. 3(b), CANSentry is connected to two CAN transceivers interfacing CANext and CANint , respectively. We use another STM32 Nucleo-144 board to simulate the attacks, which is connected to CANext through ODB-II. We further select an ECU from the Instrument Panel Cluster (IPC) from a used 2014 passenger car as the victim ECU. The IPC is connected to CANint . We simulate ten attacks discussed in Sect. 2.2 (Table 1) to evaluate the performance of CANSentry. In all the attacks, the attacker attempts to inject illegal bits that violate the CAN protocol (e.g., continuous dominant bits in bus-denial attack and arbitration-denial attack), a spoofed frame (e.g., spoofing attack and bus-denial attack), or a spoofed CAN ID (e.g., selective arbitration-denial attack). To evaluate the effectiveness of CANSentry, we monitor the bit stream on the Tx pin of the attacker ECU (i.e., the injected bits from the attacker), and the ones on the internal bus CANint , which shows the traffic on the main bus after the attacks, as shown in Fig. 7. In the experiments, CANSentry was able to block all attack attempts from the adversary ECU, while not interfering with the normal operations on the internal bus CANint . Due to space limit, we only demonstrate the results of three attacks. Figure 7(a) shows a bus-off attack, when a skipper-type attacker injects arbitrary bits (denoted in the block with dashed-line) to trigger a receive bit error at the victim ECU, this attempt is blocked by CANSentry (i.e., no

CANSentry: Securing CAN-Based Cyber-Physical Systems

167

change is observed on the internal bus). Meanwhile, when the attacker injects a dominant bit “0” to win the arbitration and block the transmission attempt of the victim ECU (Fig. 7(b)), or even covers his trace with a complete frame (from a spoofed CAN ID) and a valid CRC (Fig. 7(c)), the malicious actions are prevented by CANSentry and thus have no effect on the internal bus. Finally, it is worth pointing out that although we evaluate CANSentry in the in-vehicle network setting, it can be adopted to secure any CAN-based system.

Fig. 7. Evaluation of firewall performance under: (a) bus-off attack; (b) arbitration denial attack; (c) the proposed stealthy selective arbitration denial attack.

6

Security Analysis and Discussions

Now we analyze the security guarantees of CANSentry and its effectiveness against threats introduced in Sect. 3, and discuss the remaining attack surfaces. Security Analysis of Arbitration. A Deterministic Finite-state Automaton (DFA) is used to filter arbitration IDs (in binary strings). Theorem 1. Let L(M ) be the list of arbitration IDs on the whitelist, which is used to generate DFA M . When M is correctly generated, no CAN frames carrying an ID field that is not in L(M ) shall be disseminated to the bus. Theorem 1 roots on automata theory – a binary string I is accepted by M if and only if I ∈ L(M ). The firewall allows the transmission of the rest of the CAN frame only when I is accepted by M , and I wins arbitration. Theorem 1 ensures that CAN spoofing attacks are effectively blocked: when a malicious/compromised ECU tries to send CAN frames with IDs not in the whitelist, the attempt is rejected in the arbitration phase. Theorem 2. Let Im be the lowest arbitration ID accepted by DFA M , any (partial) ID output from M cannot win arbitration against a target ID It that has higher priority than Im (i.e., It < Im ). During the arbitration phase, the DFA generates an intermediate output bit at each state and transmits the output to CANint . It is possible that the prefix of a spoofed ID (Is ∈ L(M )) being sent to CANint . For instance, if the firewall only

168

A. Humayed et al.

allows 0b1010110011, the ID 0b1010001011 will be denied at bit 5. However, its prefix 0b1010 will be sent to CANint . Theorem 2 shows that adversaries cannot exploit this mechanism to launch arbitration denial attacks against high-priority IDs. Please refer to Appendix A for the proof of this theorem. Security Analysis of Data Frame Transmission. The external node is authorized to transmit its frame to the bus only when it wins arbitration on CANint . The firewall enforces R3 to forward traffic from CANext to CANint . For the simplicity of the design and the portability of the firewalls, we do not further audit the validity of the data frame. When CANext loses arbitration, the firewall moves to enforce R1, which blocks traffic from CANext to CANint . Therefore, an adversary node cannot inject dominate bit(s) to the bus to interrupt CAN frame from other nodes, which may eventually drive other nodes into bus-off. Such ECU denial and bus denial attacks are denied at the firewall. Security of CANSentry. The physical and software security of CANSentry relies on the following factors: (1) CANSentry nodes are deployed in a physically isolated environment, i.e., inside the car. It is difficult for an adversary to bypass/alter it unless he has extended time to physically modify the in-vehicle components. (2) CANSentry only has two network interfaces: CANext and CANint . The limited communication channel and the simplicity of CAN makes it impractical to compromise the operations of the firewall from CANext . And (3) the simplicity of the firewall makes it unlikely to have significant software faults. Updates. We do not consider remote updates to CANSentry in this paper, to keep the simplicity of CANSentry and to avoid adding a new attack vector. In case a CANSentry-protected ECU is remotely updated to send CAN packets with new IDs, such packets will be blocked. In practice, we are not aware of any existing remote update that adds new CAN IDs to the high-risk (e.g., remotely accessible) ECUs. Meanwhile, when the vehicle is updated in the shop, the corresponding CANSentry could be easily updated to add new CAN IDs to the whitelist. Remaining Attack Surfaces. An adversary node may send to CANint using IDs on the whitelist, which may potentially block frames from higher CAN IDs.If the adversary node keeps sending at high frequency, it becomes arbitration denial attacks to nodes with lower priority. CANSentry cannot detect or block this attack. Fortunately, this may not be a severe problem, since: (1) the attack paths from known attacks (e.g. [27,32,39] are all attacking from low priority nodes (i.e., components with wireless interfaces and externally-facing vulnerabilities) to high priority nodes. Attacks from the other direction appear to be meaningless. (2) The high-risk nodes, i.e., nodes that have shown to be compromisable by remote adversaries, mostly belong to low priority nodes (entertainment units, navigation, etc), which are only allowed to send CAN frames with high arbitration IDs. (3) Arbitration denial attacks from such nodes are detectable with a simple IDS, which runs on the same chip as the firewall, monitors traffic from CANint , and informs the firewall to take actions when attacks are detected. When a spoofing attempt is blocked by CANSentry, a partial frame, which only contains the first few bits of the arbitration ID, will be observed on the

CANSentry: Securing CAN-Based Cyber-Physical Systems

169

bus. An adversary node could then intentionally inject malformed frames to CANint . Other receiving nodes on the bus will detect form errors and raise error flags. This will increase RECs at the receiving nodes, and may drive them into error passive mode, in the worst case. This is not a severe problem since: (1) To force a node into error passive mode, REC>127 is needed. This requires continuous attack attempts from a adversary node (assuming that the node manages to avoid increasing its own TEC). (2) Such persistent attacks could be easily detected by a simple counter or IDS embedded in CANSentry. And (3) nodes in error passive mode still perform their functions (with a performance penalty), and they can recover from error passive mode when frames are correctly received. In the attack model, internal ECUs are considered low risk. While we are unaware of any attack that compromises an internal ECU with only brief physical access, we cannot completely rule out this possibility. Lastly, a small number of units in modern automobiles, such as GM’s OnStar, are both remotely compromisable and capable of sending high priority frames, e.g., stopping the vehicle by shutting down fuel injection. When such systems are compromised, they may take over control of the cars even with the existence of CANSentry. From the CAN bus perspective, they are fully authorized to send such frames. Defending against such attacks is outside of the scope of CANSentry.

7

Conclusion

In this paper, we first summarize existing DoS and spoofing attacks on the CAN bus, then implement and evaluate a stealthy selective arbitration denial attack. We present a novel CAN firewall, CANSentry, to be deployed between each high-risk ECU (external-facing ECUs with remote attack vectors) and the internal bus. It defends against attacks that violate CAN standards or abuse the error-handling mechanism to achieve malicious goals. To the best of our knowledge, CANSentry is the first mechanism to not only detect but also prevent denial attacks at data link layer that have not been addressed yet, in addition to preventing the traditional spoofing and denial attacks. Despite the simple yet powerful design, CANSentry effectively prevents the attacks with no noticeable overhead at a low cost with no need to modify CAN standard nor existing ECUs. With hardware implementation and evaluation, we demonstrate the effectiveness of the firewall against the bus/ECU denial attacks and ECU spoofing attacks. Acknowledgements. We would like to thank the anonymous reviewers for their constructive comments. Fengjun Li and Bo Luo were sponsored in part by NSF CNS1422206, DGE-1565570, NSA Science of Security Initiative H98230-18-D-0009, and the Ripple University Blockchain Research Initiative. Jingqiang Lin was partially supported by National Natural Science Foundation of China (No. 61772518) and Cyber Security Program of National Key RD Plan of China (2017YFB0802100).

170

A

A. Humayed et al.

Proof of Theorem 2

Theorem 2. Let Im be the lowest arbitration ID accepted by DFA M , any (partial) ID output from M cannot win arbitration against a target ID It that has higher priority than Im (i.e., It < Im ). Proof. Assume that the adversary attempts to spoof arbitration ID Is (Is < It ) to win against It . Let Bit(S, i) be the i-th bit of string S. If the longest common prefix of Im , Is , and It is an i-bit string P , then the first i bits of Is would be accepted in M (since P ref ix(Is , i) = P ref ix(Im , i)) and sent to CAN accordingly. It and Is would tie in the first i bits of arbitration (since P ref ix(Is , i) = P ref ix(It , i)). At bit i + 1, we have the following three conditions: • If Bit(Is , i + 1) < Bit(Im , i + 1), Is will be rejected by M , since: (1) Is cannot be accepted by the DFA branch that contains Im , since Bit(Is , i+1) = Bit(Im , i+1); (2) if there exist another DFA branch (with ID In ) that accepts Bit(Is , i + 1), then we have In < Im (since they are identical in the first i bits and In < Im at bit i + 1). This violates our assumption that Im be the lowest arbitration ID accepted by M . Therefore, such In and the corresponding DFA branch does not exist. Is will be rejected by M , and It wins arbitration against Is . • If Bit(Is , i+1) > Bit(Im , i+1), then we have Is > Im , since they are identical in the first i bits and Is > Im at bit i + 1. This violates our assumption that Is < It < Im . • If Bit(Is , i + 1) = Bit(Im , i + 1), then Bit(Is , i + 1) will be sent to the bus. Meanwhile, we need Bit(Is , i + 1) = Bit(It , i + 1), otherwise P ref ix(Is , i + 1) is a longer common prefix than P . In case Bit(Is , i + 1) > Bit(It , i + 1), Is would lose arbitration against It . In case Bit(Is , i + 1) < Bit(It , i + 1), then we have Im = Is < It . This violates our assumption that It < Im . In summary, with the existence of M , any (partial) output generated by   Is < Im cannot win arbitration against a higher priority ID It < Im .

References 1. Grimm co. cant. https://github.com/bitbane/CANT 2. Checkoway, S., et al.: Comprehensive experimental analyses of automotive attack surfaces. In: USENIX Security Symposium (2011) 3. Cho, K.-T., Shin, K.G.: Error handling of in-vehicle networks makes them vulnerable. In: ACM CCS, pp. 1044–1055. ACM (2016) 4. Cho, K.-T., Shin, K.G.: Fingerprinting electronic control units for vehicle intrusion detection. In: USENIX Security Symposium (2016) 5. Cho, K.-T., Shin, K.G.: Viden: attacker identification on in-vehicle networks. In: ACM CCS (2017) 6. Choi, W., Jo, H.J., Woo, S., Chun, J.Y., Park, J., Lee, D.H.: Identifying ECUS using inimitable characteristics of signals in controller area networks. IEEE Trans. Veh. Tech. 67(6), 4757–4770 (2018)

CANSentry: Securing CAN-Based Cyber-Physical Systems

171

7. Choi, W., Joo, K., Jo, H.J., Park, M.C., Lee, D.H.: Voltageids: low-level communication characteristics for automotive intrusion detection system. IEEE TIFS 13(8), 2114–2129 (2018) 8. Dagan, T., Wool, A.: Parrot, a software-only anti-spoofing defense system for the can bus. ESCAR EUROPE (2016) 9. Dardanelli, A., et al.: A security layer for smartphone-to-vehicle communication over bluetooth. IEEE Embed. Syst. Lett. 5(3), 34–37 (2013) 10. Foruhandeh, M., Man, Y., Gerdes, R., Li, M., Chantem, T.: Simple: single-frame based physical layer identification for intrusion detection and prevention on invehicle networks. In: ACSAC, pp. 229–244 (2019) 11. Foster, I., Prudhomme, A., Koscher, K., Savage, S.: A story of telematic failures. In: USENIX WOOT, Fast and Vulnerable (2015) 12. Fr¨ oschle, S., St¨ uhring, A.: Analyzing the capabilities of the CAN attacker. In: Foley, S.N., Gollmann, D., Snekkenes, E. (eds.) ESORICS 2017. LNCS, vol. 10492, pp. 464–482. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66402-6 27 13. Gmiden, M., Gmiden, M.H., Trabelsi, H.: An intrusion detection method for securing in-vehicle can bus. In: IEEE STA (2016) 14. Gupta, R.A., Chow, M.-Y.: Networked control system: overview and research trends. IEEE Trans. Ind. Electron. 57(7), 2527–2535 (2010) 15. Halder, S., Conti, M., Das, S.K.: COIDS: a clock offset based intrusion detection system for controller area networks. In: ICDCN (2020) 16. Han, K., Potluri, S.D., Shin, K.G.: On authentication in a connected vehicle: secure integration of mobile devices with vehicular networks. In: ACM/IEEE ICCPS, pp. 160–169 (2013) 17. Han, K., Weimerskirch, A., Shin, K.G.: A practical solution to achieve real-time performance in the automotive network by randomizing frame identifier. In: Proceedings of Europe Embedded Security Cars (ESCAR), pp. 13–29 (2015) 18. Hartkopp, O., Schilling, R.M.: Message authenticated can. In: Escar Conference, Berlin, Germany (2012) 19. Hoppe, T., Kiltz, S., Dittmann, J.: Security threats to automotive CAN networks – practical examples and selected short-term countermeasures. In: Harrison, M.D., Sujan, M.-A. (eds.) SAFECOMP 2008. LNCS, vol. 5219, pp. 235–248. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87698-4 21 20. Humayed, A., Lin, J., Li, F., Luo, B.: Cyber-physical systems security a survey. IEEE IoT J. 4(6), 1802–1831 (2017) 21. Humayed, A., Luo, B.: Cyber-physical security for smart cars: taxonomy of vulnerabilities, threats, and attacks. In: ACM/IEEE ICCPS (2015) 22. Humayed, A., Luo, B.: Using ID-hopping to defend against targeted DOS on CAN. In: SCAV Workshop (2017) 23. Iehira, K., Inoue, H., Ishida, K.: Spoofing attack using bus-off attacks against a specific ECU of the can bus. In: IEEE CCNC (2018) 24. Karray, K., Danger, J.-L., Guilley, S., Elaabid, M.A.: Identifier randomization: an efficient protection against CAN-bus attacks. In: Ko¸c, C ¸ .K. (ed.) Cyber-Physical Systems Security, pp. 219–254. Springer, Cham (2018). https://doi.org/10.1007/ 978-3-319-98935-8 11 25. Kneib, M., Huth, C.: Scission: signal characteristic-based sender identification and intrusion detection in automotive networks. In: ACM CCS (2018) 26. Kornaros, G., Tomoutzoglou, O., Coppola, M.: Hardware-assisted security in electronic control units: secure automotive communications by utilizing one-timeprogrammable network on chip and firewalls. IEEE Micro 38(5), 63–74 (2018)

172

A. Humayed et al.

27. Koscher, K., et al.: Experimental security analysis of a modern automobile. In: IEEE S&P (2010) 28. Kurachi, R., Matsubara, Y., Takada, H., Adachi, N., Miyashita, Y., Horihata, S.: Cacan-centralized authentication system in can (controller area network). In: International Conference on ESCAR (2014) 29. Lukasiewycz, M., Mundhenk, P., Steinhorst, S.: Security-aware obfuscated priority assignment for automotive can platforms. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 21(2), 32 (2016) 30. Matsumoto, T., Hata, M., Tanabe, M., Yoshioka, K., Oishi, K.: A method of preventing unauthorized data transmission in controller area network. In: IEEE VTC (2012) 31. Miller, C., Valasek, C.: Adventures in automotive networks and control units. Def Con 21, 260–264 (2013) 32. Miller, C., Valasek, C.: Remote exploitation of an unaltered passenger vehicle. Black Hat USA 2015, 91 (2015) 33. Moore, E.F.: Gedanken-experiments on sequential machines. Automata Stud. 34, 129–153 (1956) 34. Mundhenk, P., et al.: Security in automotive networks: lightweight authentication and authorization. ACM TODAES 22(2), 1–27 (2017) 35. Murvay, P.-S., Groza, B.: Source identification using signal characteristics in controller area networks. IEEE Signal Process. Lett. 21(4), 395–399 (2014) 36. Murvay, P.-S., Groza, B.: Dos attacks on controller area networks by fault injections from the software layer. In: ARES. ACM (2017) 37. M¨ uter, M., Asaj, N.: Entropy-based anomaly detection for in-vehicle networks. In: IEEE Intelligent Vehicles Symposium (2011) 38. Narayanan, S.N., Mittal, S., Joshi, A.: Obd securealert: an anomaly detection system for vehicles. In: IEEE SMARTCOMP (2016) 39. Nie, S., Liu, L., Yuefeng, D.: Free-fall: hacking tesla from wireless to can bus. Brief. Black Hat USA 25, 1–16 (2017) 40. Nowdehi, N., Aoudi, W., Almgren, M., Olovsson, T.: CASAD: can-aware stealthyattack detection for in-vehicle networks. arXiv:1909.08407 (2019) 41. N¨ urnberger, S., Rossow, C.: – vatiCAN – vetted, authenticated CAN bus. In: Gierlichs, B., Poschmann, A.Y. (eds.) CHES 2016. LNCS, vol. 9813, pp. 106–124. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53140-2 6 42. Palanca, A., Evenchick, E., Maggi, F., Zanero, S.: A stealth, selective, link-layer denial-of-service attack against automotive networks. In: Polychronakis, M., Meier, M. (eds.) DIMVA 2017. LNCS, vol. 10327, pp. 185–206. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60876-1 9 43. Petit, J., Shladover, S.E.: Potential cyberattacks on automated vehicles. IEEE Trans. Intell. Transp. Syst. 16(2), 546–556 (2015) 44. Radu, A.-I., Garcia, F.D.: LeiA: a lightweight authentication protocol for CAN. In: Askoxylakis, I., Ioannidis, S., Katsikas, S., Meadows, C. (eds.) ESORICS 2016. LNCS, vol. 9879, pp. 283–300. Springer, Cham (2016). https://doi.org/10.1007/ 978-3-319-45741-3 15 45. Rizvi, S., Willet, J., Perino, D., Marasco, S., Condo, C.: A threat to vehicular cyber security and the urgency for correction. Procedia Comput. Sci. 114, 100–105 (2017) 46. Rouf, I., et al.: Security and privacy vulnerabilities of in-car wireless networks: a tire pressure monitoring system case study. In: USENIX Security Symposium (2010) 47. Sagstetter, F., et al.: Security challenges in automotive hardware/software architecture design. In: DATE. IEEE (2013)

CANSentry: Securing CAN-Based Cyber-Physical Systems

173

48. Souma, D., Mori, A., Yamamoto, H., Hata, Y.: Counter attacks for bus-off attacks. In: Gallina, B., Skavhaug, A., Schoitsch, E., Bitsch, F. (eds.) SAFECOMP 2018. LNCS, vol. 11094, pp. 319–330. Springer, Cham (2018). https://doi.org/10.1007/ 978-3-319-99229-7 27 49. Studnia, I., Nicomette, V., Alata, E., Deswarte, Y., Kaaniche, M., Laarouchi, Y.: Survey on security threats and protection mechanisms in embedded automotive networks. In: IEEE/IFIP DSN (2013) 50. Taylor, A., Leblanc, S., Japkowicz, N.: Anomaly detection in automobile control network data with long short-term memory networks. In: IEEE DSAA (2016) 51. Theissler, A.: Detecting known and unknown faults in automotive systems using ensemble-based anomaly detection. Knowl.-Based Syst. 123, 163–173 (2017) 52. Tian, D., et al.: an intrusion detection system based on machine learning for CANBus. In: Chen, Y., Duong, T.Q. (eds.) INISCOM 2017. LNICST, vol. 221, pp. 285–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-74176-5 25 53. Van Bulck, J., M¨ uhlberg, J.T., Piessens, F.: Vulcan: efficient component authentication and software isolation for automotive control networks. In: ACSAC, pp. 225–237 (2017) 54. Van Herrewege, A., Singelee, D., Verbauwhede, I.: Canauth-a simple, backward compatible broadcast authentication protocol for can bus. In: ECRYPT Workshop on Lightweight Cryptography, vol. 2011 (2011) 55. Wang, Q., Sawhney, S.: Vecure: a practical security framework to protect the can bus of vehicles. In: IEEE International Conference on IOT (2014) 56. Wolf, M., Weimerskirch, A., Paar, C.: Security in automotive bus systems. In: Workshop on Embedded Security in Cars (2004) 57. Woo, S., Jo, H.J., Lee, D.H.: A practical wireless attack on the connected car and security protocol for in-vehicle can. IEEE Trans. Intell. Transp. Syst. 16(2), 993–1006 (2014) 58. Woo, S., Moon, D., Youn, T.-Y., Lee, Y., Kim, Y.: Can ID shuffling technique (CIST): moving target defense strategy for protecting in-vehicle can. IEEE Access 7, 15521–15536 (2019) 59. Wu, W., et al.: IDH-CAN: a hardware-based ID hopping can mechanism with enhanced security for automotive real-time applications. IEEE Access 6, 54607– 54623 (2018) 60. Ziermann, T., Wildermann, S., Teich, J.: Can+: a new backward-compatible controller area network (can) protocol with up to 16× higher data rates. In: DATE. IEEE (2009)

Distributed Detection of APTs: Consensus vs. Clustering Juan E. Rubio, Cristina Alcaraz(B) , Ruben Rios, Rodrigo Roman, and Javier Lopez Computer Science Department, University of Malaga, Campus de Teatinos s/n, 29071 Malaga, Spain {rubio,alcaraz,ruben,jlm}lcc.uma.es

Abstract. Advanced persistent threats (APTs) demand for sophisticated traceability solutions capable of providing deep insight into the movements of the attacker through the victim’s network at all times. However, traditional intrusion detection systems (IDSs) cannot attain this level of sophistication and more advanced solutions are necessary to cope with these threats. A promising approach in this regard is Opinion Dynamics, which has proven to work effectively both theoretically and in realistic scenarios. On this basis, we revisit this consensus-based approach in an attempt to generalize a detection framework for the traceability of APTs under a realistic attacker model. Once the framework is defined, we use it to develop a distributed detection technique based on clustering, which contrasts with the consensus technique applied by Opinion Dynamics and interestingly returns comparable results. Keywords: Clustering · Consensus · Opinion dynamics detection · Traceability · Advanced persistent threat

1

· Distributed

Introduction

In recent years, there has been a growing interest for advanced event management systems in the industrial cyber-security community for two main reasons: (i) the integration of cutting-edge technologies (e.g., Big Data, Internet of Things) into traditionally isolated environments, which adds complexity to data collection and processing [1]; and (ii) the emergence of the new attack vectors as a result of the Industry 4.0 evolution, which have not been properly studied in context and may form part of an Advanced Persistent Threat (APT) [2]. APTs consist of sophisticated attacks perpetrated by resourceful adversaries which cost millions every year to diverse industrial sectors [3]. The main concern with these threats is that they are especially difficult to detect and trace. In this context, traditional Intrusion Detection Systems (IDS) only pose a first line of defense in an attempt to identify anomalous behaviours in very precise points of the infrastructure [4], and they are tailored to specific types of communication c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 174–192, 2020. https://doi.org/10.1007/978-3-030-58951-6_9

Distributed Detection of APTs: Consensus vs. Clustering

175

standards or types of data, which is not sufficient to track the wide range of attack vectors that might be used by an APT. It is then necessary to fill this gap between classic security mechanisms and APTs. The premise is to find proper mechanisms capable of monitoring all the devices (whether physical or logical) that are interconnected within the organization, retrieve data about the production chain at all levels (e.g., alarms, network logs, raw traffic) and correlate events to trace the attack stages throughout its entire life-cycle. These measures would provide the ability to holistically detect and anticipate attacks as well as failures in a timely and autonomous way, so as to deter the attack propagation and minimize its impact. To cope with this cyber-security scenario, novel candidate solutions such as the Opinion Dynamics approach emerge [5]. These alternatives propose to apply advanced correlation algorithms that analyze an industrial network from a holistic point of view, leveraging data mining and machine learning mechanisms in a distributed fashion. In this paper, we formalize a framework that enables the design and practical integration of such distributed mechanisms for the traceability of APTs, while also comparing the features of the aforementioned solutions according to the cyber-security needs of the industry nowadays, both qualitatively and experimentally. Altogether, we can summarize our contributions as: – Characterization of the context in terms of security requirements and available solutions; – Definition of a framework for developing solutions that enable the distributed correlation of APT events, based on these security needs and a new attacker model; – Identification of effective techniques and algorithms for the traceability of APTs that satisfy the proposed framework; – Qualitative and quantitative comparison of approaches in an Industry 4.0 scenario. The remainder of the paper is organized as follows. Section 2 presents the state of the art of intrusion detection and anomaly correlation mechanisms, as well as the preliminary concepts involved in the studio. Then, Sect. 3 presents the security and detection requirements, whereas Sect. 4 defines the framework for developing solutions that fulfill them. Based on such framework, the studied solutions are addressed in Sect. 5, and experimentally analysed in Sect. 6. Finally, extracted conclusions are discussed in Sect. 7.

2

Background and Preliminaries

At present, there is a plethora of intrusion detection approaches tailored for traditional industrial scenarios (cf. [6]) and Industry 4.0 networks (cf. [7]). This includes specification-based IDSs, which compare the current state of the network with a model that describes its legitimate behaviour [8]; physics-based modeling systems, which simulates the effect of commands over the physical dynamics of the operations [9]; and other more traditional strategies such as

176

J. E. Rubio et al.

signature and anomaly detection systems. Most of these detection approaches focus on the analysis of certain aspects of industrial control systems, such as the communication patterns, the behaviour of sensors and actuators, and others. Still, industrial technologies are becoming more heterogeneous and attacks are extremely localized, which makes crucial to monitor all elements and evidences. Therefore, it is important for industrial ecosystems to set up more than one detection solution to ensure the maximum detection coverage [10]. Moreover, all solutions should coexist with advanced detection platforms that take the infrastructure from a holistic perspective, correlate all events and track all threats throughout their entire life-cycle [11]. This holistic perspective is even more necessary in light of the existence of APTs: sophisticated attacks comprised of several complex phases – from network infiltration and propagation to exfiltration and/or service disruption [3,12]. These advanced detection platforms have been explored in traditional Information Technology (IT) environments through forensic investigation solutions, using proactive (that analyse evidences as incidents occur) or reactive techniques (where evidences are processed once the events occur). Examples of these include flow-based analysis of traffic in real time [13] or the correlation of multiple IDS outputs to highlight and predict the movements of APTs, using information flow tracking [14] or machine learning [15]. Still, most of them are limited to a restricted set of attacks and are not applied to a real setup. In turn, the progress in the Industry 4.0 has not been significant with respect to actual APT traceability solutions. In this sense, the Opinion Dynamics approach [5] paves the way for a new generation of solutions based on the deployment of distributed detection agents across the network. The anomalies reported by these agents are correlated to extract conclusions about the sequence of actions performed by the adversary, and also to identify the more affected areas of the infrastructure. Such assessment can be conducted in a centralized entity or using a distributed architecture of peers [16]. At the same time, it is open to integrate external IDS to examine anomalies in the vicinity of nodes, as well as the abstraction of diverse parameters such as the criticality of resources or the persistence of attacks. Despite the many capabilities of this solution (explained in Sect. 5.1), it is necessary to define a more general detection model to lay the base for the precise application of more APT traceability solutions in the Industry 4.0 paradigm. The reason is that the Opinion Dynamics capabilities can be implemented modularly, they can be integrated into other correlation algorithms and each one has a different effect on many security, detection, deployment and efficiency constraints. These points will be addressed in the next section, where we define the security and detection requirements involved, to latter present the traceability framework.

3

Security and Detection Requirements

Based on the state of the art presented in Sect. 2, this section enumerates the requirements for the development of advanced solutions and systems that

Distributed Detection of APTs: Consensus vs. Clustering

177

provide a holistic perspective on industrial ecosystems. According to [7], we should consider the following detection requirements: (D1) Coverage. APTs make use of an extensive set of attack vectors that jeopardize organizations at all levels. Therefore, the system must be able to assimilate traffic and data from heterogeneous devices and sections of the network, while also incorporating the input of external detection systems. (D2) Holism. In order to identify anomalous behaviors, the system must be able to process all the interactions between users, processes and outputs generated, as well as logs. This allows to generate anomaly and traceability reports at multiple levels (e.g., per application, device or portion of the network, as well as global health indicators). (D3) Intelligence. Beyond merely detecting anomalous events within the network in a timely manner, the system must infer knowledge by correlating current events with past stages and anticipate future movements of the attacker. Similarly, it should provide mechanisms to integrate information from external sources – that is, cyber threat intelligence [17]. (D4) Symbiosis. The system should have the capability to offer its detection feedback to other Industry 4.0 services, by means of well-defined interfaces. This includes access control mechanisms (to adapt the authorization policies depending on the security state of the resources) or virtualization services (that permit to simulate response techniques under different scenarios without interfering the real setup), among others. On the other hand, we can also establish the following security requirements with regards to the deployment of the detection solution over the network: (S1) Distributed data recollection. It is necessary to find distributed mechanisms – such as local agents collaborating in a peer-to-peer fashion – that allow the collection and analysis of information as close as possible to field devices. The ultimate aim is to make the detection system completely autonomous and resistant to targeted attacks. (S2) Immutability. The devised solution must be resistant to modifications of the detection data at all levels, including the reliability and veracity of data exchanged between agents (e.g., through trust levels that weigh the received security information), and the storage of such data (e.g., through unalterable storage mediums and data replication mechanisms such as immutable databases or distributed ledgers). (S3) Data confidentiality. Apart from the protection against data modification, it is mandatory that the system provides authorization and cryptographic mechanisms to control the access to the information generated by the detection platform and all the interactions monitored. (S4) Survivability. Not only the system must properly function even with the presence of accidental or deliberate faults in the industrial infrastructure, but also the system itself cannot be used as a point of attack. To achieve this, the detection mechanisms must be deployed in a separated network that can only retrieve information from the industrial infrastructure.

178

J. E. Rubio et al.

(S5) Real-time performance. The system must not introduce operational delays on the industrial infrastructure, and its algorithms should not impose a high complexity to ensure the generation of real-time detection information. Network segmentation procedures and separate computation nodes (e.g., Fog/Edge Computing nodes) can be used for this purpose.

4

APT Traceability Framework for the Industry 4.0

After defining the detection and security requirements that a conceptual APT traceability solution must fulfill, we now describe the guidelines for the design and construction of its deployment architecture, the algorithms to be used, and the attacker model under consideration. 4.1

Network Architecture and Information Acquisition

The industrial network topology is modelled with an acyclic graph G(V, E), where V represents the devices and E is the set of communication links between them. This way, V can be assigned with parameters to represent, for instance, their criticality, vulnerability level or the degree of infection; whereas the elements in E can be associated with Quality of Service (QoS) parameters (e.g., bandwidth, delays), or compromise states that help to prioritize certain paths when running resilient routing algorithms. For the interest of theoretical analysis, these networks are frequently generated using random distributions that model the architecture of real industrial systems. Also, the topology can be subdivided into multiple network segments with different distributions [5]. This is useful, for example, to study the effects of the attack and detection mechanisms over the corporate section (containing IT elements) and the operational section (OT, containing pure industrial assets), which can be connected by firewalls, so that V = VIT ∪ VOT ∪ VF W . Regardless of the topology configuration, the detection approach must acquire information from the whole set of nodes V to fulfill requirement D1 (Coverage, c.f. Sect. 3), by using agents that are in charge of monitoring such devices, complying with S1. Each of these agents can be either mapped to a individual device (following a 1:1 relationship), which would be ideal for S1, or aggregate the data from a set of physical devices belonging to the infrastructure. In either case, we can assume they are able to retrieve as much data as possible from their assigned devices, which encompasses network-related parameters (e.g., links with other nodes, number packets exchanged, delays), host-based data (e.g., storage, computational usage) or communication information (e.g., low-level commands issued by supervision protocols). These data items are aimed to feed the correlation algorithm with inputs in the form of an anomaly value for every device audited, which is formalized by vector x. This way, xi represents the anomaly value sensed by the corresponding agent on device i, for all i ∈ 1, 2, ..., |V |. Such value can be calculated by each agent autonomously (e.g., applying some machine learning to determine deviations in every data item

Distributed Detection of APTs: Consensus vs. Clustering

179

analyzed) or leveraging an external IDS that is configured to retrieve the raw data as input, thereby conforming to requirement D3. From a deployment perspective, this leads to the question of where to locate the computation of anomalies and their subsequent correlation. As for the former, agents are implemented logically, since it is not always feasible to physically integrate monitoring devices into the industrial assets due to computational limitations. Consequently, these processes may have to run in separate computational nodes. However, we still want to achieve a close connection to field devices while avoiding a centralized implementation such as the one presented in [5]. The solution is then to introduce an intermediate approach based on the concept of distributed data brokers. These components collect the data from a set of individual devices via port-mirroring or network tapping, using data diodes to decouple agents from actual systems and ensure that data transmission is restricted to one direction, thereby shielding the industrial assets from outside access and complying with requirements S3 and S4. These data brokers can also convey the detection reports (i.e., the anomalies sensed by its logical agents over the area where it is deployed) to other brokers in order to execute the correlation in a collaborative way. In consequence, they must be strategically deployed in a separate network such that there is at least one path between every two brokers. Due to this distributed nature, the correlation algorithm can make use of two data models: Replicated database, which assumes that every agent has complete information of the whole network (e.g., through distributed ledgers), and Distributed data endpoints, where the information is fully compartmentalized and the cross-correlation is conducted at a local level. Both approaches have their advantages and disadvantages. The replicated database provides all agents with a vision of the network, although it imposes some overhead. As for the distributed data endpoints, they reduce the number of messages exchanged, yet the algorithm must deal with partial information coming from neighbour brokers. Altogether, the ultimate election of the algorithm, data model and architectural design of the agents responds to performance and overhead restrictions. This composes the detection mechanism at a physical layer, while at an abstract level, it must also return a set of security insights that are based on the attacker model, described in the following. 4.2

Attacker Model

To analyze the parametrization of traceability algorithms, we need to characterize the chain of actions performed by an APT over the network. We will use the formalization described in [5] as a starting point. That paper reviews the most relevant APTs reported in the last decade and extracts a standard representation of an APT in the form of a finite succession of attacks stages: initial intrusion, node compromise, lateral movement, and data exfiltration or destruction. These stages cause different anomalies that are potentially inspected by the detection agents, according to an ordered set of probabilities [5].

180

J. E. Rubio et al.

Algorithm 1. Attacker model - anomaly calculation input: attackSetk , representing the chain of actions of AP T k, 1 ≤ k ≤ numOf Apts local: Graph G(V, E) representing the network output: xi representing the anomaly value sensed by each agent i at the end of the AP T network, where xi ∈ (0, 1) x ← zeros(|V |) (initial opinion vector) while attackSetk = ∀k ∈ {1, .., numOf Apts} do for k ← 1 to numOf Apts do if |attackSetk | > 0 then {attack ← next attack f rom attackSetk } if attack == initialIntrusion( IT, OT, F W ) then attackedN odek ← random v ∈ V(IT ,OT ,F W ) x(attackedN odek ) ← xattackedN odek + θ3 else if attack == compromise then xattackedN odek ← xattackedN odek + θ2 for neighbour in neighbours(attackedN odek ) do xattackedN odek ← xattackedN odek + θ5 end for else if type(attack) == LateralM ovement then previousAttackedN ode ← attackedN odek attackedN odek ← SelectNextNode(G, attackedN odek ) xpreviousAttackedN odek ← xpreviousAttackedN odek + θ5 xattackedN odek ← xattackedN odek + θ3,4 else if attack == exf iltration then xattackedN odek ← xattackedN odek + θ4 else if attack == destruction then xattackedN odek ← xattackedN odek + θ1 end if x ← AttentauteOldAnomalies(x) attackSetk ← attackSetk \ attack end if end for end while

Using these stages, APTs can be represented as a finite sequence of precise events. However, the original authors did not consider the possibility of parallel APT traces being executed simultaneously across the network, hence generating cross-related events. We refer to them as concurrent routes followed by the APT in the attack chain or different APTs taking place (that may eventually collaborate). We extend the aforementioned attacker model with this novel feature and formalize it in Algorithm 1. In the algorithm, the effect of multiple APTs are implemented as a succession of updates on vector x, both in the concerned node (i.e., the anomaly sensed by its agent) and its neighbours when certain attack stages (e.g., a compromise stage) are involved. The routine continues until all attack stages have been executed for the entire set of APTs considered. On the other hand, the theta values refer to the ordered set of probabilities presented in [5]: the lower value the index of theta, the higher anomaly is reported by the agent after that phase. That paper also illustrates the attenuation function, which decreases the values of vector x over time based on the persistence of attacks, the criticality of the resources affected and their influence over posterior APT phases. During the execution of these iterations, the correlation algorithm can be executed at any time to gain knowledge of the actual APT movements, by using

Distributed Detection of APTs: Consensus vs. Clustering

181

the anomalies as input. The complete explanation of inputs and outputs for the traceability solution is further explained in next subsection. 4.3

Inputs and Outputs of the Traceability Solution

After introducing how the information from physical devices is collected by agents in practice and how the anomalies can be calculated in theoretical terms for our simulations, we summarize the set of inputs for traceability solutions as: (I1) Quantitative input. expressed with vector x to assign every industrial asset with an anomaly value prior to conducting the correlation. As previously mentioned, it can be calculated by each associated agent or using external detection mechanisms integrated with the data broker by taking an extensive set of data inputs to comply with D1 and D2. In our simulations, this value is given by the attack phases executed on the network in a probabilistic way, without the detection mechanism having any knowledge about the actual stages. (I2) Qualitative input. the previous values need to be enriched with information to correlate events in nearby devices and infer the presence of related attack stages, according to Sect. 4.2. At the same time, we also need to prioritize attacks that report a higher anomaly values. We assume that the resulting knowledge can be reflected in form of a weight wij , which is assigned by every agent i to each of its neighbours and represents the level of trust given to their anomaly indications when performing the correlation (fulfilling S2). This parameter can be subject to a threshold ε, which defines when two events should be correlated depending on the similarity of their anomalies. Further criteria could be introduced to associate anomalies from different agents. With respect to the outputs of the traceability solutions, they should include, but are not limited to the following items: (O1) Local result to determine whether the agent is generating an anomaly due to whether the actual infection of the associated node, as a result of a security threat in a neighbour device or a false positive. (O2) Information at global level, to determine the degree of affection in the network and the nodes that have been previously taken over, filtered by zones. (O3) Contextual information that permits to correlate past events and visualize the evolution of the threat, while anticipating the resources that are prone to be compromised (D3 & D4). This comprehensive analysis of the requirements and techniques defines a framework for the development of distributed detection solutions for APTs in industrial scenarios, as depicted in Fig. 1. The following section presents some of the candidate solutions that implement them and hence achieve the APT traceability goals proposed so far.

182

J. E. Rubio et al.

Fig. 1. Distributed detection framework

5

Distributed Traceability Solutions

After explaining the proposed framework, we look for feasible solutions that can effectively accommodate all of its statements. More specifically, we revisit the original Opinion Dynamics approach and compare it with two alternative mechanisms . The former is based on the concept of consensus and the two latter are based on clustering. 5.1

Opinion Dynamics

Despite its novelty, the Opinion Dynamics approach has faced a number of improvements since its inception in [18]. It was originally defined as a mechanism to address the anomalies caused by a theoretical APT, whose attacker model was better formalized in [19]. The event correlation and traceability capabilities were updated in [16], to latter show its implementation on an industrial testbed in [5]. Additionally, it has been studied its application to the Smart Grid [20] or the Industrial Internet of Things [21]. Compared to these, the aim of the framework is to ease the design of alternative solutions with results equal to or better than those of Opinion Dynamics. The correlation approach of Opinion Dynamics is based on an iterative algorithm that takes the anomalies of individual agents as input, and generates their resulting ‘opinions’ based on this formula: xi (t + 1) =

n  j=1

wij xj (t)

Distributed Detection of APTs: Consensus vs. Clustering

183

Where xi (t) stands for the opinion of the agent (i ∈ {1, ..., |V |}) at iteration t, such that xi (0) contains the initial anomaly sensed in vector x prior to execute the correlation (I1), as stated in Sect. 4.3. As for wij , it represents the weight given by each agent i to the opinion of each neighbour j in G(V, E), as to model the influence between them (I2). At this point, the original paper defined a ε = 0.3 threshold to hold the maximum difference in opinions between every pair of agents i and j, to associate a weight wij > 0. This way, the weight given by an agent i is equally divided and assigned to each other neighbour agent k n that complies with ε (including itself), having k=1 wik = 1. In [16], the authors provide a methodology to include further security factors and other metrics (e.g., QoS in communication links) when calculating these weight values. Altogether, the correlation is performed by every agent as a weighted sum of the closest opinions, and such calculation can be performed by solely using the information from neighbouring agents, thereby adapting to the distributed architecture based on data brokers (either replicating data or not). When executing this algorithm with a high number of iterations, the outputs of all agents are distributed into different groups that expose the same anomaly value, which correspond to related attacks. As a result, the network is polarized based on their opinions, hence satisfying O1. From these values, it is also possible to study the degree of affection in different network zones and extract global security indicators (O2). Likewise, the evolution of a sequential APT can also be visualized if we account for the agents opinions over time (O3), as described in [5]. 5.2

Distributed Anomaly Clustering Solutions

Opinion Dynamics belongs to a set of dynamic decision models in complex networks whose aim is to obtain a fragmentation of patterns within a group of interacting agents by means of consensus. This fragmentation process is locally regulated by the opinions and weights of the nodes, that altogether abstracts the APT dynamics and its effects on the underlying network. This ultimately enables to take snapshots of the current state of the network and highlight the most affected nodes, thereby tracing APT movements from anomaly events. This rationale can also be applied to define different mechanisms with similar results. We propose to adapt clustering algorithms as an alternative for the correlation of events that fulfill the defined framework. These have been traditionally used as an unsupervised method for data analysis, where a set of instances are grouped according to some criteria of similarity. In our case, we have devices that are affected by correlated attacks (see Sect. 4.2) and show similar anomalies, which results in the devices being grouped together. Classical clustering methods [22], such as K-means, partition a dataset by initially selecting k cluster centroids and assigning each element to its closest centroid. Centroids are repeatedly updated until the algorithm converges to a stable solution. In our case, the anomalies detected by the agents (denoted by the vector x) play the role of the data instances to be grouped into clusters. However, the parametrization of this kind of algorithms impose two main challenges to properly comply with the inputs and outputs of an APT traceability solution:

184

J. E. Rubio et al.

– The election of k. It is one classical drawback of the K-means, since that value has to be specified from the beginning and it is not usually known in advance, as in this case. Numerous works in the literature have proposed methods for selecting the number of clusters [23], including the use of statistical measures with assumptions about the underlying data distribution [24] or its determination by visualization [25]. It is also common to study the results of a set of values instead of a single k, which should be significantly smaller than the number of instances. The aim is to apply different evaluation criterion to find the optimal k, such as the Calinski and Harabasz score (also known as the Variance Ratio Criterion) [26], that minimizes the within-cluster dispersion and maximizes the between-cluster dispersion. – Representation of topological and security constraints. By applying K-means, we assume the dataset consists of a set of multi-dimensional points. However, here we have an one-dimensional vector of anomalies in the range [0,1]. Also, the clusterization of these values is subject to the topology and the security correlation criteria which might determine that, for example, two data points should not be grouped in the same cluster despite having a similar anomaly value. Therefore, it becomes necessary to provide this knowledge to the algorithm and reflect these environmental conditions as inputs (I1 and I2) to the correlation. In this sense, some works have proposed a constrained K-means clustering [27], and specific schemes have been developed to divide a graph into clusters using Spanning Trees or highly connected components [28]. As for the first challenge, we can assume that the value of k is defined by the different classes of nodes within the network depending on their affection degree, which corresponds to the number of consensus between agents that Opinion Dynamics automatically finds. Here we can adopt two methodologies: (1) a static approach where we consider a fixed set of labels (e.g., ‘low’, ‘medium’, ‘high’ and ‘critical’ condition) to classify each agent; or (2) a dynamic approach where k is automatically determined based on the number and typology of attacks. In this case, we can study the Variance Ration Criterion in a range of k values (e.g., k = {1–5}) to extract the optimal value with the presence of an APT. This procedure needs further improvements to make the solution fully distributed, so that each agent is in charge of locally deciding its own level of security based on the surrounding state, instead of adopting a global approach for all nodes. This bring us to the second challenge. A first naive solution would be to introduce additional dimensions to the data instances representing the coordinates of every node, together with the anomalies in vector x. We call this approach location-based clustering. However, this approach still needs to figure out an optimal value of k, and does not take into account the presence of actual links interconnecting nodes in G(V, E). To circumvent this issue while also adopting an automatic determination of the number of clusters, we propose an accumulative anomaly clustering scheme, which is formalized in Algorithm 2. This algorithm begins by selecting the most affected node within the network and subsequently applies the influence of their surrounding nodes. This is represented by adding an entire value to the anomalies

Distributed Detection of APTs: Consensus vs. Clustering

185

Algorithm 2. Accumulative anomaly clustering input: xi representing the initial anomaly value sensed by each agent i within the network, where xi ∈ (0, 1) output: zi representing the agents O1 output of each agent i af ter clustering local: Graph G(V, E) representing the network, where V = VIT ∪ VOT ∪ VF W max ← |V |, k ← 0 y ← x, x ← x sorted in descending order for all i ∈ x do anyN eighbourF ound ← F alse for all j ∈ neighbours(i, G) do if yj ≤ 1 AN D |yi − yj | ≤  then yj ← yj + max ∗ 10 anyN eighbourF ound ← T rue end if end for yi = yi + max ∗ 10 if anyN eighbourF ound then k ←k+1 end if max ← max − 1 end for clusters, centroid ← kmeans(y, k) for all vi ∈ V do c ← clusters(vi ) Zi ← IntegerP art(centroid(c)) end for

of such agents (initially from 0 to 1), which is proportional to the anomaly of the influencing node (see max in the algorithm). This addition is performed as long as the difference between both anomalies (i.e., the influencing and influenced node) does not surpasses a defined threshold ε, similar to the Opinion Dynamics approach in order to comply with I2. Then, the algorithm continues by selecting the next one in the list of nodes inversely ordered by the anomaly value, until all nodes have been influenced or have influenced others. At that point, k is automatically assigned with the number of influencing nodes, and K-means is ready to be executed with the modified data instances. The resulting values of each agent corresponds to the decimal part of their associated centroid. This is comparable to the ‘opinions’ in the Opinion Dynamics approach. The intuition behind this model of influence between anomalies (which can be enriched to include extra security factors to specify I2) assumes that successive attacks raise a similar anomaly value in closest agents, as Opinion Dynamics suggests. At the same time, it addresses the issue of selecting k and including topological information to the clusterization. It is validated from a theoretical point of view in Appendix A. In the following, the accuracy of these correlation approaches are compared under different attack and network configurations.

6

Experiments and Discussions

After presenting some alternative solutions to Opinion Dynamics that fulfill the distributed detection framework presented in Sect. 4, this section aims to put these approaches to the test. More specifically, we consider the attacker model explained in Sect. 4.2, which is applied against a network formalized by G(V, E),

186

J. E. Rubio et al.

following the structure introduced in Sect. 4.1. These theoretical APTs generate a set of anomalies that serve as input to compare the traceability capabilities of each correlation approach: – Location-based clustering: as presented earlier, it consists of the Kmeans algorithm taking the anomalies and coordinates of each node as data instances. These are grouped in a number of clusters, k, which is selected in the range from 1 to 5 according to the Variance Ratio Criterion. – Accumulative clustering: as previously presented, it allows to distributedly locate the infection while automatically determining the optimal k. – Opinion Dynamics: is the approach that serves as inspiration for our framework and serves for comparison with the novel detection methods introduced above. These traceability solutions are simulated under different network and attack configurations, as explained next. We start by running a brief attack test-case that illustrates the features of each approach in a simple network scenario. Based on Algorithm 1, Fig. 2 shows the detection outputs (O1 and O3) of the three approaches when correlating the anomalies of an APT perpetrated against a simple infrastructure. This network is modelled according to the concepts introduced in Sect. 4.1, to include an IT and OT section of nodes connected by a firewall. Concretely, the figure shows an snapshot of the detection state after the adversary has performed a lateral movement from IT node 2 to compromise the firewall. The numeric value assigned to each node represents O1, which will attenuate over time to highlight the most recent anomaly, according to O3. As noted in the figure, location-based clustering fails to accurately determine where the threat is located and selects a wide affection area instead, which is composed by IT1, IT2 and FW1 nodes (i.e., grouped in the same cluster due to the average anomaly in such zone). On the other hand, the accumulative clustering and Opinion Dynamics show a similar result, and successfully identify both IT2 and FW1 as the affected nodes in this scenario. As for the rest of nodes, they agree on a subtle affection value due to the noise present in the network and the anomalies sensed in the vicinity of the attacked nodes. As previously stated, this is modelled in a probabilistic way [5]. We now execute these solutions with a more complex network and APT model in order to study their accuracy . In the context of cluster analysis, the ‘purity’ is an evaluation criteria of the cluster quality that is applicable in this particular scenario. It holds the percentage of the total number of data points that are classified correctly after executing the clustering algorithm, in the range [0,1]. It is calculated according to the following equation: P urity =

k 1  max|ci ∩ tj | N i=1

(1)

where N is the number of nodes, k is the number of clusters, ci is a cluster in C and tj is the classification that has the max count for cluster ci . In our case, by

Distributed Detection of APTs: Consensus vs. Clustering IT2:66

IT4:2

IT1:66

IT5:2

IT2:85

IT3:2

IT4:10

IT1:10

FW1:66

OT5:5

OT2:5

(a)

IT2:85

IT3:4

IT4:2

IT1:27

FW1:85

OT1:5

OT3:5

IT5:10

OT5:7

OT4:5

Location-based clustering

OT2:7

(b)

IT5:2

IT3:2

FW1:85

OT1:6

OT3:6

187

OT5:14

OT4:0

Accumulative clustering

OT2:0

(c)

OT1:11

OT3:0

OT4:0

Opinion Dynamics

Fig. 2. Network topology used in the test case

‘correct classification’ we mean that a cluster ci has identified a group of nodes that have actually been compromised, which is determined in the simulations (but not known by the traceability solutions). This value can be calculated after a single execution of these three approaches to study how the results of the initial test-case escalate to larger networks and more challenging APTs. Specifically, we run 10 different APTs on randomly generated network topologies of 50, 100 and 150 nodes, respectively. For simplicity, we start by executing an individual instance of the Stuxnet APT [5] according to the attacker model established in Sect. 4.2. This attack can be formally defined by the following succession of stages: attackSetStuxnet = {initialIntrusionIT , LateralM ovementF W , LateralM ovementOT , destruction} At this point, it is worth mentioning that the lateral movement in the OT section is performed three times to model the real behavior of this APT and its successive anomalies, as explained in [5]. The purity value is then calculated after every attack stage of each of the ten APTs, to ultimately compute its average with respect to the number of nodes that have been successfully detected and grouped in the cluster with highest value of affection. Figure 3(a) represents these average values in the form of boxplots, where each box represents the quartiles of each detection approach given the different network configurations. As it can be noted, the Opinion Dynamics stands out as the most accurate solution, closely followed by the accumulative clustering approach. The purity of the location-based clustering falls behind, and the three of them increase their value as the network grows in size due to the higher number of nodes that are successfully deemed as healthy and hence not mixed with those that are indeed affected by the APT. Similar results are obtained when we execute two APT attacks in parallel over the same network configurations, as shown in Fig. 3(b). In this case, the former APT is coupled with another attack, which can be assumed to be part

188

J. E. Rubio et al.

(a) Single APT

(b) Two APTs

Fig. 3. Purity average for the three test cases

of Stuxnet or a completely different attack trace within the network, composed by the following stages: attackSetAnotherAP T = {initialIntrusionOT , LateralM ovementF W , LateralM ovementIT , destruction} The second APT is located in a different area of the network so that it begins by sneaking into the OT section to subsequently propagate towards the IT portion of the infrastructure. This causes the spread of anomalies throughout the network hence putting location-based clustering to the test. Despite a subtle decline in the purity of the solutions (especially in the location approach due to the anomaly dispersion), they still output an appreciable accuracy. On the other hand, the superiority of Opinion Dynamics and accumulative clustering over the first approach is also evident with the study of additional accuracy indicators, such as the Rand Index. It penalizes both false positive (FP) and false negative (FN) labeling of affected nodes during clustering, with respect to true positive (TP) and true negative (TN) decisions, according to the following formula: Rand Index =

TP + TN TP + FP + FN + TN

(2)

Figure 4 shows the Rand Index value after each of the ten APTs in the previous experiment (each one composed of two parallel attack traces), for the largest network size (150 nodes). The plot clearly shows a steady accuracy of the two latter approaches (close to 1), contrasting with a lower value in the locationbased approach, which faces a lack of precision when it comes to correctly locate the affection areas, for the same reasons discussed before. Despite the promising results of our clustering approach in terms of accuracy, we are currently in the process of assessing its performance (compliance with S5 requirement).

Distributed Detection of APTs: Consensus vs. Clustering

189

1

Rand Index

0.9

0.8

Location-based clustering Accumulative Clustering Opinion Dynamics

0.7

1

2

3

4

5

6

7

8

9

10

APT #

Fig. 4. Evolution of the Rand Index for 10 APTs and 150 nodes

7

Conclusions

The irruption of APTs is demanding for the development of innovative solutions capable of detecting, analyzing and protecting current and upcoming critical infrastructures. After an exhaustive analysis of the security and detection requirements for these solutions, we come up with a framework for developing APT traceability systems in Industry 4.0 scenarios, inspired by a promising approach called Opinion Dynamics. This framework considers various network architectures, types of attack and data acquisition models to later define the inputs and outputs that traceability solutions should include to support the aforementioned requirements. This lays the base for the development and comparison of novel solutions in this context. As a means to validate the proposed framework, we define two novel protection mechanisms based on clustering, which feature comparable results to the Opinion Dynamics (based on consensus). According to our experiments, the proposed clustering mechanism also presents optimal traceability of events in a distributed setting. The roadmap of this research now leads to further validation and possible extensions of the proposed framework, as well as to the application of these techniques to diverse real-world industrial scenarios. Acknowledgments. This work has been partially supported by the EU H2020SU-ICT-03-2018 Project No. 830929 CyberSec4Europe (cybersec4europe.eu), the EU H2020-MSCA-RISE-2017 Project No. 777996 (SealedGRID), and by a 2019 Leonardo Grant for Researchers and Cultural Creators of the BBVA Foundation. The first author has been partially financed by the Spanish Ministry of Education under the FPU program (FPU15/03213) and R. Rios by the ‘Captacion del Talento Investigador’ fellowship from the University of Malaga.

A

Correctness Proof of the Clustering Detection Approach

This section presents the correctness proof of the consensus-based detection, both the location and accumulative approach. This problem is solved when these conditions are met:

190

J. E. Rubio et al.

1. The attacker is able to find an IT/OT device to compromise within the infrastructure. 2. The traceability solution is able to identify an affected node, thanks to the clustering mechanism and fulfilling O1. 3. The detection can continuously track the evolution of the APT and properly finish in a finite time (termination condition), complying with O2 and O3. The first requirement is satisfied under the assumption that the attacker breaks into the network and then moves throughout the topology following a finite path, according to the model explained in Sect. 4.2. Thus, an APT is defined as at least one sequence of attack stages against the network defined by G(V, E). If we study each of these traces independently, and based on the distribution of G, the attacker can either compromise the current node vi in the chain (as well as performing a data exfiltration or destruction) or propagate to another vj ∈ V , whose graph is connected by the means of firewalls, according to the interconnection methodology illustrated in [5] and summarized in Sect. 4.1. As for the second requirement, it is met with the correlation of anomalies generated by agents in each attack phase. As presented with the attacker model, the value of these anomalies are determined in a probabilistic manner, depending on two possible causes: (1) the severity of the attack suffered and the criticality of the concerned resource; or (2) an indirect effect caused by another attack in the vicinity of the monitored node. Either way, the O1 correlation helps to actually determine whether the attack has been effectively perpetrated against that node, or it belongs to another APT stage in its surroundings. This information is deduced from the combination of I2 (the contextual information) together with these anomalies (i.e., I1), by using K-means to group these nodes and associate them with actual attacks. We can easily demonstrate the third requirement (i.e., the termination of the approach) through induction. To do so, we specify the initial and final conditions as well as the base case: Precondition: we assume the attacker models an APT against the network defined by graph G(V, E) where V = , following the behaviour explained in Algorithm 1. On the other hand, the detection solution based on clustering can firstly sense the individual anomalies in every distributed agent, hence computing I1 and I2. Postcondition: the attacker reaches at least one node in G(V, E) and continues to execute all stages until attackSet =  in Algorithm 1. Over these steps, it is possible to visualize the threat evolution across the infrastructure, following the procedure described in Algorithm 2 in the case of accumulative clustering, and running K-means with both I1 and spatial information, in the case of location-based clustering. Case 1: the adversary intrudes the network and takes control of the first node vi ∈ V , and both clustering approaches cope with the scenario of grouping healthy nodes apart from the attacked node. This is calculated by the Kmeans algorithm within a finite time, by iteratively assigning data items to clusters and recomputing the centroids.

Distributed Detection of APTs: Consensus vs. Clustering

191

Case 2: the adversary propagates from a device node vi to another vj , so that there exist (vi , vj ) ∈ E. In this case, the correlation with K-means aims to group both affected nodes within the same cluster, which can be visualized graphically. As explained before, this is influenced by the attack notoriety and the closeness in the anomalies sensed by their respective agents (i.e., the threshold  in Algorithm 2), as well as extra information given by I2. Induction: if we assume the presence of k ≥ 1 APTs in the network, each one will consider Case 1 at the beginning and will separately consider Case 2 until attackSet =  for all k, ensuring the traceability of the threat and complying with the postcondition. Eventually, these APTs could affect the same subset of related nodes in G, which is addressed by the K-means to correlate the distribution of anomalies (again, attempting to distinguish between attacked nodes and devices that may sense side effects), in a finite time. This way, we demonstrate the validity of the approach, since it finishes and it is able to trace the threats accordingly.

References 1. Khan, A., Turowski, K.: A survey of current challenges in manufacturing industry and preparation for industry 4.0. In: Proceedings of the First International Scientific Conference “Intelligent Information Technologies for Industry” (IITI 2016), pp. 15–26. Springer (2016). https://doi.org/10.1007/978-3-319-33609-1 2 2. Singh, S., Sharma, P.K., Moon, S.Y., Moon, D., Park, J.H.: A comprehensive study on APT attacks and countermeasures for future networks and communications: challenges and solutions. J. Supercomput. 75(8), 4543–4574 (2016). https://doi. org/10.1007/s11227-016-1850-4 3. Lemay, A., Calvet, J., Menet, F., Fernandez, J.M.: Survey of publicly available reports on advanced persistent threat actors. Comput. Secur. 72, 26–59 (2018) 4. Mitchell, R., Chen, I.-R.: A survey of intrusion detection techniques for cyberphysical systems. ACM Comput. Surv. (CSUR) 46(4), 55 (2014) 5. Rubio, J.E., Roman, R., Alcaraz, C., Zhang, Y.: Tracking APTs in industrial ecosystems: a proof of concept. J. Comput. Secur. 27(5), 521–546 (2019) 6. Zeng, P., Zhou, P.: Intrusion detection in SCADA system: a survey. In: Li, K., Fei, M., Du, D., Yang, Z., Yang, D. (eds.) ICSEE/IMIOT -2018. CCIS, vol. 924, pp. 342–351. Springer, Singapore (2018). https://doi.org/10.1007/978-98113-2384-3 32 7. Rubio J.E., Roman R., Lopez J.: Analysis of cybersecurity threats in industry 4.0: the case of intrusion detection. In: The 12th International Conference on Critical Information Infrastructures Security, volume Lecture Notes in Computer Science, vol. 10707, pp. 119–130. Springer, August 2018. https://doi.org/10.1007/978-3319-99843-5 11 8. Sekar, R., et al.: Specification-based anomaly detection: a new approach for detecting network intrusions. In: Proceedings of the 9th ACM Conference on Computer and Communications Security, pp. 265–274. ACM (2002) 9. Lin, H., Slagell, A., Kalbarczyk, Z., Sauer, P.W., Iyer, R.K.: Semantic security analysis of SCADA networks to detect malicious control commands in power grids. In: Proceedings of the First ACM Workshop on Smart Energy Grid Security, pp. 29–34. ACM (2013)

192

J. E. Rubio et al.

10. Rubio, J.E., Alcaraz, C., Roman, R., Lopez, J.: Current cyber-defense trends in industrial control systems. Comput. Secur. J. 87, 101561 (2019) 11. Moustafa, N., Adi, E., Turnbull, B., Hu, J.: A new threat intelligence scheme for safeguarding industry 4.0 systems. IEEE Access 6, 32910–32924 (2018) 12. Chhetri, S.R., Rashid, N., Faezi, S., Al Faruque, M.A.: Security trends and advances in manufacturing systems in the era of industry 4.0. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1039–1046. IEEE (2017) 13. Vance, A.: Flow based analysis of advanced persistent threats detecting targeted attacks in cloud computing. In: 2014 First International Scientific-Practical Conference Problems of Infocommunications Science and Technology, pp. 173–176. IEEE (2014) 14. Brogi, G., Tong, V.V.T.: Terminaptor: highlighting advanced persistent threats through information flow tracking. In: 2016 8th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pp. 1–5. IEEE (2016) 15. Ghafir, I., et al.: Detection of advanced persistent threat using machine-learning correlation analysis. Future Gener. Comput. Syst. 89, 349–359 (2018) 16. Rubio, J.E., Manulis, M., Alcaraz, C., Lopez, J.: Enhancing security and dependability of industrial networks with opinion dynamics. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11736, pp. 263–280. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29962-0 13 17. Lee, S., Shon, T.: Open source intelligence base cyber threat inspection framework for critical infrastructures. In: 2016 Future Technologies Conference (FTC), pp. 1030–1033. IEEE (2016) 18. Rubio, J.E., Alcaraz, C., Lopez, J.: Preventing advanced persistent threats in complex control networks. In: Foley, S.N., Gollmann, D., Snekkenes, E. (eds.) ESORICS 2017. LNCS, vol. 10493, pp. 402–418. Springer, Cham (2017). https://doi.org/10. 1007/978-3-319-66399-9 22 19. Rubio, J.E., Roman, R., Alcaraz, C., Zhang, Y.: Tracking advanced persistent threats in critical infrastructures through opinion dynamics. In: Lopez, J., Zhou, J., Soriano, M. (eds.) ESORICS 2018. LNCS, vol. 11098, pp. 555–574. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99073-6 27 20. Lopez, J., Rubio, J.E., Alcaraz, C.: A resilient architecture for the smart grid. IEEE Trans. Ind. Inform. 14, 3745–3753 (2018) 21. Rubio, J.E., Roman, R., Lopez, J.: Integration of a threat traceability solution in the industrial Internet of Things. IEEE Trans. Ind. Inform. (2020). In Press 22. Rui, X., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005) 23. Pham, D.T., Dimov, S.S., Nguyen, C.D.: Selection of k in k-means clustering. Proc. Inst. Mech. Eng. Part C: J. Mech. Eng. Sci. 219(1), 103–119 (2005) 24. Pelleg, D., Moore, A.W., et al.: X-means: extending k-means with efficient estimation of the number of clusters. In: Icml, vol. 1, pp. 727–734 (2000) 25. Bilmes, J., Vahdat, A., Hsu, W., Im, E.J.: Empirical observations of probabilistic heuristics for the clustering problem. Technical Report TR-97-018, International Computer Science Institute (1997) 26. Cali´ nski, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat.Theory Methods 3(1), 1–27 (1974) 27. Wagstaff, K., Cardie, C., Rogers, S., Schr¨ odl, S., et al.: Constrained k-means clustering with background knowledge. Icml 1, 577–584 (2001) 28. Schaeffer, S.E.: Graph clustering. Comput. Sci. Rev. 1(1), 27–64 (2007)

Designing Reverse Firewalls for the Real World Ang`ele Bossuat1(B) , Xavier Bultel4 , Pierre-Alain Fouque1 , Cristina Onete2 , and Thyla van der Merwe3 1

4

Univ Rennes, CNRS, IRISA, Rennes, France [email protected] 2 University of Limoges/XLIM/CNRS, Limoges, France 3 Mozilla, London, UK LIFO, INSA Centre Val de Loire, Universit´e d’Orl´eans, Bourges, France

Abstract. Reverse firewalls (RFs) were introduced by Mironov and Stephens-Davidowitz to address algorithm-substitution attacks (ASAs) in which an adversary subverts the implementation of a provably-secure cryptographic primitive to make it insecure. This concept was applied by Dodis et al. in the context of secure key exchange (handshake phase), where the adversary wants to exfiltrate sensitive information by using a subverted client implementation. RFs are used as a means of “sanitizing” the client-side protocol in order to prevent this exfiltration. In this paper, we propose a new security model for both the handshake and record layers, a.k.a. secure channel. We present a signed, Diffie-Hellman based secure channel protocol, and show how to design a provably-secure reverse firewall for it. Our model is stronger since the adversary has a larger surface of attacks, which makes the construction challenging. Our construction uses classical and off-the-shelf cryptography.

1

Introduction

In 2013, Snowden revealed thousands of classified NSA documents indicating evidence of widespread mass-surveillance. In his essay on the moral character of cryptographic work [21], Rogaway suggests that the most important effect of the Snowden revelations was the realization of the existence of a new kind of adversary with much greater powers than ever imagined, aiming to enable masssurveillance by eavesdropping on unsecured communications, and by negating the protection afforded by cryptography. It has massive computational resources at its disposal to mount conventional attacks against cryptography, and also tries to subvert the use of cryptographic primitives by introducing backdoors into cryptographic standards [6,7] or, equally insidiously, by using Algorithm Substitution Attacks (ASA). The latter class of attacks was formalised by Bellare et al. [1] in the context of symmetric encryption, but actually goes back to much Full version available at eprint.iacr.org/2020/854. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 193–213, 2020. https://doi.org/10.1007/978-3-030-58951-6_10

194

A. Bossuat et al.

earlier work on Kleptography [22]. Bellare et al. [1] envisage a scenario where the adversary substitutes a legitimate and secure implementation of a protocol with one that leaks cryptographic keys to an adversary in an undetectable way. Sadly, they showed that all major secure communication protocols are vulnerable to ASAs because of their intrinsic reliance on randomness. Reverse Firewalls (RF). Mironov and Stephens-Davidowitz [18] define 3 properties RF should satisfy: (1) security preserving: regardless of the client’s behaviour, the RF will guarantee the same protocol security; (2) functionalitymaintaining: if the user implementation is working correctly, the RF will not break the functionality of the underlying protocol; and (3) exfiltration resistance: regardless of the client’s behaviour, the RF prevents the client from leaking information. In the key-exchange security game of RF, the adversary first corrupts the client by changing its code. Then, during the communication step, there is no more “direct” exchange, and the adversary has to exfiltrate information from the messages circulating between the RF and the server. Dodis et al. [9] described an ASA allowing an adversary to break the security of many authenticated key exchanges (AKE). The attack is in the spirit of [1]: substitute an implementation of a (provably-secure) AKE protocol with a weaker one, in an undetected way for the servers, then exfiltrate information to a passively observing adversary. Next, they introduce a RF, used by either of the two endpoints, to prevent any exfiltration during the AKE protocol. The RF acts like a special entity in charge of enforcing the protocol’s requirements (e.g., checking signatures, adding randomness) to ensure the robustness of the exchanged keys even for a misbehaving client implementation. Delegating all sensitive steps to the RF would be simpler, but goes against the strong advocated model. RF should preserve security, not provide it, as security must hold even with a malicious RF. Moreover, for practical reasons, both endpoints should still be able to communicate even if the RF is not responding, e.g., due to too many connections. Dodis et al.’s protocol is a simple signed Diffie-Hellman key exchange modified to accommodate an RF. They prove that their new scheme is still a secure 2-party AKE protocol in the absence of an RF, and that it additionally provides exfiltration resistance when an RF is added even if the endpoints are corrupted. Furthermore, the RF learns no information about the session keys established by the endpoints. Finally, they show that the resulting AKE protocol can be composed with a secure messaging protocol and still provide a certain degree of exfiltration resistance for specific weak client implementations. To this end, they rerandomize the Diffie-Hellman inputs and perform verification tasks. Since the two parties no longer see the same tuple of DH elements, the signatures cannot be made on the transcripts directly: a bilinear pairing is cleverly used to sign a deterministic function of the transcripts. There are many possibilities for a client implementation to become weak: (1) server authentication can be avoided and the adversary can play the role of a malicious server, (2) by using weak randomness or pre-determined “randoms”, the adversary can predict the client’s ephemeral DH secret and recover

Designing Reverse Firewalls for the Real World

195

the common key exchange and (3) the client can skip the authenticated encryption (AEAD) security during the record phase. We point out that for channel security, because of weakness (3), we cannot just compose a key exchange protocol secure in this model with a secure channel without looking at the security of the channel. There is a subtle theoretic problem in the exfiltration resistance game: since the client knows the key, he can choose messages that depend on the key, which is not the case in traditional encryption security games. Other Approaches. RFs have been extended to other contexts such as malleable smooth projective hash functions [8], and attribute-based encryption [17]. While these results have no real link to our work, they show that RFs are relatively versatile and a promising solution to ASAs. They also superficially resemble middleboxes, which have been studied in the context of TLS [12,19,20]. However, RFs fundamentally differ from middleboxes: the former are meant to preserve the confidentiality, integrity, and authenticity of the secure channel, while the latter break it in controlled ways. Comparison Between Previous Attacker Model and Our Guarantees. Dodis et al. [9] consider three attack scenarios. The first is the security of the primitive in the absence of a firewall (but where the client and server honestly follow the protocol). The second entails that the primitive should still be secure even in the presence of a malicious firewall. The third, and most important security notion is exfiltration resistance, where a malicious implementation of the client tries to exfiltrate information to a network adversary able to monitor the channel between the (honest) RF and the server. For this last definition, the adversary also controls the server. This has three consequences: (1) exfiltration resistance may at most be guaranteed for the handshake: if the RF is not present in the basic 2-party protocol, there is no security check at the record layer and the firewall cannot prevent a client from exfiltrating information to a malicious server at the record layer; (2) [9] have to restrict their malicious implementations to behave in a very particular way, called functionality preserving: in this model, the malicious client must follow protocol, although it may use weak parameters or bypass verification steps; (3) if, within a protocol, the client sends the first key-exchange element, then a commitment phase from the server must precede that message, essentially preventing the server from adaptively choosing a weak DH element for the key-exchange. A more in-depth comparison is given in Appendix A. In contrast to [9] we consider security for both the key-agreement and the record-layer protocols. As mentioned above, this is a non-trivial extension with respect to exfiltration resistance, since the channel keys do not imply exfiltration resistance at the record layer. Our results are proven in the presence of a semi-honest server who follows the protocol specification. We argue that this restriction is necessary since nothing can prevent a malicious server – not monitored by a firewall – from forwarding all the data it receives to the adversary. We focus on guaranteeing exfiltration resistance (in the handshake and in the record layer) with respect to a MitM situated between the firewall and the server.

196

A. Bossuat et al.

Adding RF with Exfiltration Resistance at the Record Layer. Protecting the record layer is more challenging than the key exchange, and means considering these two stages as a whole. Our key observation is that an RF cannot prevent exfiltration at this layer if it does not know the key used for the encryption: nothing prevents the adversary from choosing messages which, for the computed key, leak information to the adversary. Another difficulty is that, contrary to the key exchange which essentially involves public-key primitives (usually quite malleable), the record layer relies on symmetric key primitives, less tolerant to modifications. Our approach differs from [9] by introducing a new functionality preserving definition: we do not restrict the client’s behaviour beyond requiring that a semi-honest server accepts the communication. Our solution is to allow the RF to contribute to securing messages exchanged during the record layer, without compromising the end-to-end security that the channel should provide. As in [9], the main task of our RF is to rerandomize some elements and verify the validity of the signatures. The main difference we introduce is that this key exchange will now generate two keys: kcs and kcfs . The former is very similar to the one generated in [9] and will be only known to both endpoints. It will be used to encrypt messages at the record layer, ensuring security even against a malicious RF. The key kcfs will be known to the endpoints, but also to the RF, allowing it to preserve security by adding a second layer of encryption. To accomplish this, our RF will have a public key, which was not the case in [9]. This is non-trivial in practice: for transparency the RF must not modify the messages’ format, and the endpoints must be oblivious to the RF’s action. Indeed, if the RF was offline or not capable of protecting all of its clients, this must not be detectable by a corrupt implementation. Finally, we cannot rely on standard security properties for encryption to prove our exfiltration-resistance. This is related to the adversarial strategy of choosing messages whose ciphertexts have a distinctive pattern. We provide more details in Sect. 3.3 but, intuitively, the ciphertexts meant to be distinguished by the adversary are encryptions of messages chosen by a corrupt implementation that knows the channel key. Using key-dependent messages schemes [3], we design a very efficient solution based on hash functions and MAC schemes to prevent exfiltration at the record layer.

2 2.1

Security Model and Definitions The Adversary Model

The adversary A is a MitM, interacting with honest parties (one client C , one server S , and sometimes an RF FW ) via instances associated with unique labels. We assume that the client is always the initiator of each execution, and the server is the responder. The server and the firewall have an identity associated with a pair of private-public keys. Additionally, we require a setup phase for the firewall, run either by the challenger if the firewall is honest, or by the adversary if it is malicious. The Setup(1λ ) algorithms takes as input a security parameter 1λ and generates secret parameters sparam (including the firewall’s private credentials) and public parameters pparam (including FW ).

Designing Reverse Firewalls for the Real World

197

To each instance label, there is an associated set of values: a type type in {C , S , FW }, the entity of which label is an instance, a session identifier sid, and two types of session keys, denoted kcfs and kcs , respectively. Both keys are initially set to ⊥. An authentication-acceptance bit accept, set to ⊥ at the beginning of the instance, which may turn to either 1 (if the authentication of the partner succeeds) or to 0 otherwise. Only client and firewall instances have a non-⊥ accept bit (since only the server authenticates itself). A pair of revealed bits revealedcfs and revealedcs , both initially set to 0. A corrupt bit corrupted initially set to 0, and a test bit b drawn at random at the initialization of each new instance label. We use the notation “label.attribute” to refer to a value attribute associated with the instance label. Partnering. We define partnering in terms of type and session identifiers. Two instances, associated with labels label and label , are partnered if and only if label.sid = label .sid and label.type = label .type. As such, a client may be partnered with a server, or a server and a firewall. The scenarios we consider are formally defined through security experiments, in which the adversary A has access to some (or all) of the following oracles. – NewInstance(U ): on input U ∈ {C , S , FW }, this oracle outputs an instance of party U labeled label. The adversary is given label. – Send(label, m): on input an existing client/RF/server instance label and a message m, this oracle simulates sending m to label, adds label to a list Ls (initialized as an empty list) and outputs its corresponding reply m . – Reveal(label, kt): on input an existing label label and a string kt ∈ {cs, cfs} denoting the key type to reveal, this oracle adds label to a list Lkt (initialized as an empty list), and outputs the values stored in label.kkt . The corresponding bit label.revealedkt is also set to 1. – CorruptS(): this oracle yields the server’s private long-term key. Upon corruption, all client and firewall instances with label.accept = 1 change their corrupted values to 1. Moreover, any client or firewall instance generated after this CorruptS query will have corrupted initialized to 1. – Testkt (label): with index a string kt ∈ {cs, cfs}, on input an instance label, this oracle first verifies that label.kkt = ⊥ (otherwise, it outputs ⊥). Then, depending on the test instance it responds as follows. • if label is a client instance, it outputs either the key stored in label.kkt (if label.b = 1), or a randomly-chosen key of the same length (if label.b = 0); • if label is a server or firewall instance, it returns ⊥ if it has no partnered instance label’ of type C . Otherwise, it returns Testkt (label ). – TestSend(label, m): on input an instance label and a message m, if F [label] =⊥ (F is used to keep track of the firewalls), the oracle creates an RF instance labeled F [label]: • if label.b = 1, this oracle acts as Send(label, m) except that it does not update Ls ; • else, this oracle acts as Send(F [label], Send(label, Send(F [label], m))) except that it does not update Ls (without loss of generality, we assume that the firewall just forward the session initialization message);

198

A. Bossuat et al.

This oracle adds label to a list L∗ (initialized as an empty list). As we only consider unilateral authentication, Testkt is defined asymmetrically, depending on which party the tested instance belongs to. Our last oracle TestSend either sends the unmodified message m, as in a Send query, or forces the RF to be active on this transmission. It will be used to show that the actions of our RF go unnoticed and so will only be used in the obliviousness and transparency experiments. Forward Secrecy. Whenever CorruptS is queried, any ongoing instance label (i.e., any instance such that label.accept = 1) has their corrupt bit set to 1. However, completed, accepting instances are excluded from this and can still be tested. This models forward secrecy. We will include the CorruptS query in all our security definitions, thus capturing forward secrecy by default. Reveal and State Reveal. Krawczyk and Wee [16] also consider an oracle that reveals parts of the ephemeral state of a given party. In multi-stage AKE protocols, revealing state distinguishes inputs that the keys depend on from inputs whose knowledge does not affect the security of the channel. For simplicity, we omit state reveal queries in this paper, since we focus on exfiltration resistance and security in the presence of malicious firewalls, for which state reveals are less interesting. Freshness. To rule out trivial attacks, we introduce the notion of freshness, which determines the instances that A is allowed to attack. Definition 1 (Freshness). Let kt ∈ {cs, cfs}. An instance label is kkt -fresh if: – label.accept = 1 (for a client or firewall instance); and label.corrupted = 0 – label.revealedkt = 0, and label .revealedkt = 0 for any partner label of label; Channel Security. Whenever the session keys are indistinguishable from random for the attacker, one can implicitly rely on existing work on constructing secure channels by composition, e.g. [4,14] to prove the security of the established channel. However, in the case of exfiltration resistance the hypothesis does not hold: the corrupt implementation of the client does have access to the keys. 2.2

The Security of kcs

We look at the AKE security of the key kcs when the firewall is malicious as it clearly implies security with an absent or an honest firewall. Our definition of AKE does not demand explicit server-authentication (see [2]). Since the adversary knows kcfs , the default testing mode is Testcs (·). The adversary first runs the Setup algorithm, outputting the public parameters to the challenger (since the server credentials are not included, the adversary does not learn them). Definition 2 ( CS-AKE security – kcs ). A cs-AKE protocol Π is secure if (λ) = |2 · Pr[1 ← ExpCS-AKE (λ)] − 1| is for all PPT adversaries A , AdvCS-AKE Π,A Π,A CS-AKE negligible in the security parameter λ, where ExpΠ,A (λ) is given in Fig. 1.

Designing Reverse Firewalls for the Real World

199

CS-AKE ExpΠ,A (λ):

ExpCFS-AKE (λ): Π,A

1. (sparam, pparam) ← A Setup(·) (1λ ) 2. Q ← {NewInstance, Send, Reveal, CorruptS, Testcs (·)} 3. (label, b∗ ) ← A Q (sparam, pparam) 4. if (label is kcs -fresh) ∧ (label.b = b∗ ): return 1 5. else: return a random bit ExpExf Π,A (λ):

1. (sparam, pparam) ← Setup(·)(1λ ) 2. Q ← {NewInstance, Send, Reveal, CorruptS, Testcfs (·)} 3. (label, b∗ ) ← A Q (pparam) 4. if (label is kcfs -fresh) ∧ (label.b = b∗ ): return 1 5. else: return a random bit ExpOBL Π,A (λ):

(sparam, pparam) ← Setup(1λ ) (P 0 , P 1 ) ← A (pparam) Q ← {NewInstance, Send, Reveal} (labelFW , b∗ ) ← A Q (pparam, P 0 , P 1 ) if (labelFW is exfiltration-fresh) ∧ (labelFW .b = b∗ ): return 1 6. else: return a random bit

1. (sparam, pparam) ← A Setup(·) (1λ ) 2. Q ← {NewInstance, Send, Reveal, CorruptS, TestSend} 3. (label, b∗ ) ← A Q (sparam, pparam) 4. if (label is Send-fresh)∧ (label.b = b∗ ): return 1 5. else: return a random bit

1. 2. 3. 4. 5.

Fig. 1. Experiments for the CS-AKE, CFS-AKE, exfiltration and obliviousness games.

2.3

The Security of kcfs

The adversary can no longer control the RF (which is able to compute kcfs ) and simply acts as a Man-in-the-Middle between the client and the firewall and/or the firewall and the server. The adversary first runs the Setup algorithm, outputting the parameters to the challenger. We focus on the first “layer” of keys, i.e., on kcfs , thus, the default mode for the test oracle is Testcfs (·). Definition 3 (AKE security – kcfs ). A cfs-AKE protocol Π is kcfs -secure if (λ) = |2 · Pr[1 ← ExpCFS-AKE (λ)] − 1| is for all PPT adversaries A , AdvCFS-AKE Π,A Π,A CFS-AKE (λ) is given in Fig. 1. negligible in λ, where ExpΠ,A 2.4

The Malicious Client Scenario

The malicious client scenario is the core motivation behind this work. The firewall and server are both honest, but the adversary can tamper with the client-side implementation. The RF must guarantee the security of the session keys and ensure exfiltrate resistance. In our model, we also require this guarantee to hold for record-layer transmissions. Client-Adversary Interaction. We consider a MitM adversary A situated between the firewall and the server1 . The attacker has substituted the client-side implementation but this implementation has no direct or covert communication channel with the adversary. The game starts with an honest Setup phase, then after being given the public parameters, the adversary A creates two client programs P 0 , P 1 using 1

Exfiltration resistance would obviously be unachievable elsewhere.

200

A. Bossuat et al.

some suitable (unspecified) encoding, and outputs them to the challenger. The adversary will query NewInstance(FW ) to create new firewall instance(s), which will (unbeknownst to A ) interact with one of the two client programs P 0 or P 1 , provided by A . In this case the firewall instance’s test bit b ∈ {0, 1} indicates which of the two adversarial programs it interacts with. In the malicious client scenario, the adversary interacts directly with the server and the firewall instances through the Send queries, but only indirectly (through the firewall) with the client programs. We note that A ’s interaction with the server and the firewall is still arbitrary, i.e., A still actively controls the messages sent between them. The goal of the adversary is to guess which of the supplied client programs P 0 , P 1 the firewall has been interacting with. Formally, A outputs a tuple consisting of a firewall instance labelFW and a bit d, representing its guess of labelFW .b. Trivial Attacks. It is impossible to prove this “left-or-right” exfiltration resistance for arbitrary programs P 0 , P 1 : A could output a program P 0 which produces illegal messages, whereas P 1 emulates the protocol perfectly. The firewall will then abort for labelFW .b = 0, but not for labelFW .b = 1, allowing A to distinguish the two cases. Similarly, A can also trivially win if P 0 and P 1 output messages of different lengths. Informally, we require that the externally observable behavior of the two programs P 0 , P 1 , i.e., what the RF and server do in response to the programs’ messages, should be similarly structured: the number of messages generated should be the same and these messages should pairwise have the same length. However, we do not restrict the semantics of these messages. This restriction applies only to the programs selected for the target instance; all other instances can behave in any way. Formally, we consider a program P , a firewall FW , and a server S . We sent between the denote by τ[labelFW (P )↔labelS ] the ordered transcript of messages   firewall (interacting with P ) and the server. Let L = τ[labelFW (P )↔labelS ]  denote the length of the transcript. For i ∈ {1, . . . , L}, let τ[labelS ↔label  FW (P )] [i] denote the i-th message in the transcript, and let τ[labelS ↔labelFW (P )] [i] denote its length. Definition 4 (Transcript equivalence). Two programs P 0 and P 1 have equivalent transcripts in an execution with the firewall FW and the server S if the following conditions hold:      1. τ[labelFW (P 0 )↔labelS ]  = τ[label  FW (P 1 )↔labelS ] .  2. τ[labelFW (P 0 )↔labelS ] [i] = τ[labelFW (P 1 )↔labelS ] [i] for each i ∈ {1, . . . , L}. Only allowing client programs that generate equivalent transcripts rules out the trivial attacks mentioned above. Interestingly, it is arguably much easier to decide whether two programs have equivalent transcripts (as it is only based on measurable quantities) than to determine if a corrupt program is “functionalitymaintaining” in the sense of [9]. It allows realistic attacks, e.g., selection of messages whose ciphertexts have very specific features, contrarily to [9].

Designing Reverse Firewalls for the Real World

201

Some of our requirements are fullfilled by using specific cryptographic tools, such as length-hiding authenticated encryption [13,15] which would, for example, make the second condition easily satisfied. Regarding the first condition, differences in the number of exchanged messages are likely to imply an unusual behaviour of the client (e.g., a large number of aborted connections) and so can potentially be detected by an external security event management system. Definition 5 (Exfiltration freshness). A firewall labelFW is exfiltrationfresh with respect to programs P 0 , P 1 if: – labelFW .accept = 1; for any label, label.revealedcfs = 0; – the programs P 0 and P 1 generate equivalent transcripts in their execution with the firewall instance labelFW and its partnering server instance labelS . Definition 6 (Exfiltration). An AKE protocol Π is exfiltration secure if for Exf all PPT adversaries A , AdvExf Π,A (λ) = 2 · | Pr[1 ← ExpΠ,A (λ)] − 1| is negligible Exf in λ, where ExpΠ,A (λ) is given in Fig. 1. 2.5

Obliviousness

Dodis et al. take the obliviousness into account for their protocols: the client (resp. the server) cannot distinguish2 whether it interacts with the firewall or the server (resp. the client). To discard trivial attacks, we define the sending freshness. A label is sending-fresh when it is never queried in Send and its related firewall F [label] is never an input of the Send and Reveal oracles. Definition 7 (Sending freshness). A firewall instance label is send-fresh if / Ls . F [label] ∈ / Ls ∪ Lcfs and label ∈ Definition 8 (Obliviousness). An AKE protocol Π is oblivious if for all PPT OBL adversaries A , AdvOBL Π (A ) = |2 Pr[ExpΠ,A (λ) ⇒ 1] − 1| is negligible in λ, where ExpOBL Π,A (λ) is given in Fig. 1.

3

Our Reverse Firewall

We now describe a message-transmission protocol compatible with a reverse firewall. We consider a setting of unilateral authentication, i.e., only the server authenticates itself, using digital signatures. As in [9], we start with a simple DH key exchange protocol between a client and a server, modified to accommodate an RF. In particular, the goal of this new key exchange is for the client and the server to compute two keys: kcs known only to them, and kcfs which will also be known to the firewall. We manage to avoid the use of pairings and only use standard cryptographic tools (such as CPA-secure public key encryption) along with some tricks to ensure the obliviousness of our protocol. 2

The related definition of transparency is given in Appendix B.

202

A. Bossuat et al.

However, we must keep in mind that a shared key is just a tool to protect further communication, and we thus need to explain how the RF can proceed to preserve security of future messages sent by a corrupt client, while only having access to kcfs . This is particularly difficult as the record-later only involves symmetric key primitives, and moreover, other subtleties prevent us from using more “natural” solutions. For the sake of clarity, we describe our protocol step by step, first presenting a key agreement with a passive firewall in Sect. 3.1 and then explaining in Sect. 3.2 how the latter can act to preserve security even with a corrupt client. Finally, we will consider in Sect. 3.3 the record-layer, describing how the keys generated during the key agreement should be used to prevent exfiltration at this stage. 3.1

Signed Diffie-Hellman with Passive/No Firewall

Setup. Before the protocol runs there is a one-time setup phase where the RF chooses a public/private key pair (pkFW , skFW ) for some public encryption scheme Enc, such as the one proposed by El Gamal [11], and sends pkFW to the client. In case the client does not have an RF installed (and hence no pkFW value), the client draws a new random element from the key space in every protocol run and uses this as a substitute for pkFW . Client (pkFW , pkS ) $

x, c ← Zp X = gx , C = gc e = EncpkFW (c)

If σ verifies: kcs ← Y x·β1 kcfs ← Dc·β2 Else: abort.

Firewall (skFW , pkFW , pkS )

Server (skS , pkS ) $

y, d, β1 , β2 ← Zp Y = gy , D = gd (X, C, e)

c = DecskFW (e)

(σ, Y, D, β1 , β2 )

(X, C, e) (σ, Y, D, β1 , β2 )

kcfs ← Dc·β2

σ = SignskS (Y, D, X β1 , C β2 ) kcs ← X y·β1 kcfs ← C d·β2

Fig. 2. A key agreement protocol between the client, a passive firewall, and the server.

The Protocol. Figure 2 depicts how the protocol runs in the presence of a passive firewall. The client begins by choosing two ephemeral DH shares (X, C) ← (g x , g c ) in some cyclic group G generated by g, encrypting c with the firewall public key pkFW to get e = EncpkFW (c), and sending (X, C, e) to the server forwarded unmodified by the firewall. The latter decrypts e with skFW to get c. The server in turn chooses two ephemeral DH shares (Y, D) ← (g y , g d ) and two random elements (β1 , β2 ) from Zp , and sends these, together with a signature of (Y, D, X β1 , C β2 ) to the client (the firewall just forwards this message). The server also computes the keys kcs ← X y·β1 and kcfs ← C d·β2 . Provided the

Designing Reverse Firewalls for the Real World

203

signature is valid, the client computes the keys kcs ← Y x·β1 and kcfs ← Dc·β2 . Finally, the RF computes kcfs ← Dc·β2 . The scheme is correct since X y = Y x = g x·y and C d = Dc = g c·d . Security. Intuitively, only the parties knowing x or y can recover kcs , and only the parties knowing d or c can recover kcfs – which has no actual use in case the RF is passive or absent. The cs-AKE security of this protocol is implied by that of the following protocol, where we consider an active, potentially malicious, adversary. Client (pkFW , pkS )

Firewall (skFW , pkFW , pkS )

Server (skS , pkS )

x, c ← Zp X = gx , C = gc e = EncpkFW (c)

α1 , α2 ← Zp

y, d, β1 , β2 ← Zp Y = gy , D = gd

$

(X, C, e)

$

$

c = DecskFW (e) If X ∈ / G or C ∈ / G or g c = C: abort Else: e˜ = EncpkFW (c · α2 ) ˜ C, ˜ e˜) (X, ˜ = C α2 ˜ = X α1 , C X (σ, Y, D, β1 , β2 )

˜ β1 , C ˜ β2 ) σ = SignskS (Y, D, X

If (Y, D) ∈ / G2 or invalid σ: abort. Else: γ1 = α1 · β1 , γ2 = α2 · β2 (σ, Y, D, γ1 , γ2 ) If σ verifies: kcs ← Y x·γ1 kcfs ← Dc·γ2 kcfs ← Dc·γ2 Else: abort.

˜ y·β1 kcs ← X ˜ d·β2 kcfs ← C

Fig. 3. An active reverse firewall for the protocol in Fig. 2, using the same notations.

3.2

Signed Diffie-Hellman with an Active Firewall

Next, we construct an active RF for the signed DH protocol described in Sect. 3.1. Note that the protocol in Fig. 2 is compatible with the rerandomization proposed by Dodis et al. [9]: the RF can rerandomize the keyshares without impacting the correctness of the resulting protocol. In Fig. 3 we describe the active RF for the protocol of Fig. 2. The RF rerandomizes the two DH elements X and C with exponents α1 and α2 , respectively, ˜ and C. ˜ It also decrypts e to obtain c, and encrypts c·α2 (with its own to obtain X 3 ˜ C, ˜ e˜) to the receiving endpoint. public key pkFW ) to get e˜, before forwarding (X, Upon reception, the server proceeds as before. The rerandomization made by the RF impacts the input to the server’s signature; the firewall must therefore include its random values to the subsequent response to the client: the server sends (σ, Y, D, β1 , β2 ), and the firewall will forward (σ, Y, D, γ1 = α1 · β1 , γ2 = α2 · β2 ). 3

For transparency, the message sent by the RF must have the same format as (X, C, e).

204

A. Bossuat et al.

Apart from rerandomizing the transcript, the RF verifies that the received elements are from the correct groups and that they are not the neutral element of those groups. In addition, it checks the validity of the server’s signature for the ˜ β1 , C˜ β2 ). rerandomized transcript, i.e., it checks that the server signed (Y, D, X This protocol ensures obliviousness, as stated in Theorem 4. This comes from the fact that the transcripts of the honestly-run protocol (in Fig. 2) and the ones from the protocol in Fig. 3 are identically distributed from the point of view of the endpoints, and a tampered client implementation cannot distinguish whether it is being monitored or not. The security of this protocol (cs- and cfs-AKE security) is formally stated in Theorem 1 and Theorem 2, for which we give proof sketches in Sect. 4, and complete proofs in the full version. Theorem 1. The protocol given in Fig. 3 is cfs-AKE secure (Definition 3) under the DDH assumption, and assuming Sig is EUF-CMA and Enc is IND-CPA.   (λ) ≤ nC · nS · AdvEUF-CMA (λ) + AdvIND-CPA (λ) + AdvDDH (λ) AdvCFS-AKE Π Sig Enc G Theorem 2. The protocol given in Fig. 3 is cs-AKE secure (Definition 2) under the DDH assumption, and assuming Sig is EUF-CMA.   EUF-CMA DDH (λ) ≤ n · n · Adv (λ) + Adv (λ) AdvCS-AKE C S Π Sig G We discuss solutions to “stack” several RFs, and to adapt our RF to a real-world TLS-like protocol in the full version. 3.3

Record-Layer Firewall

Theorem 2 and Theorem 1 prove that our protocol is a secure AKE protocol, with or without an RF. What is still missing is a proof of exfiltration resistance, as per Definition 6. We do not prove this property directly for the protocol when viewed as an AKE, as our goal is to also cover exfiltration resistance when the AKE is combined with a subsequent encryption scheme, and we thus prove exfiltration resistance for the whole channel establishment protocol (AKE + encryption scheme). We explain how our RF can prevent exfiltration at the record-layer. Double CPA Encryption Fails. When the client’s implementation cannot be trusted, the security of the encryption algorithm used at the record layer becomes irrelevant. Even the RF cannot ensure the security of ciphertexts formed with kcs because it does not have this key. We cannot simply add an independent, external encryption layer, taking kcfs as the key, known to the RF, hoping it could preserve security by first decrypting this external layer and then re-encrypting it; even if it seems that no matter how the client implementation behaves, the adversary would only see a valid ciphertext generated using kcfs , which it does not know. It might therefore be

Designing Reverse Firewalls for the Real World Client (kcfs , kcs ) C ← AE.Enc(kcs , , he, M, stE )

Firewall (kcfs )

r ← {0, 1}λ , k1 ← H1 (r||kcfs ) k2 ← H2 (r||kcfs ),

˜1 ← H1 (˜ r˜ ← {0, 1}λ , k r||kcfs ) ˜2 ← H2 (˜ k r||kcfs ),

$

s ← k1 ⊕ C, t = MACk2 (r||s)

205

Server (kcfs , kcs )

$

(r, s, t)

If |s| = : abort k1 ← H1 (r||kcfs ), k2 ← H2 (r||kcfs ), ˜ ← k1 ⊕ s C (˜ r, s˜, t˜) ˜1 ⊕ C, ˜1 ← H1 (˜ ˜ t˜ = MAC˜ (˜ s˜ ← k s) r||kcfs ) k k2 r ||˜ ˜2 ← H2 (˜ r||kcfs ) k ˜ ˜1 ⊕ s˜ ˜←k C

˜ ˜ ← AE.Dec(kcs , he, C, ˜ stD ) M ˜ = ⊥ or t˜ invalid: If M abort

Fig. 4. An active reverse firewall for the secure transmission of a message M , using an stLHAE scheme, where H1 : {0, 1}∗ → {0, 1} and H2 {0, 1}∗ → K (where K is the key set of the MAC) are hash functions modelled as random oracles in the security analysis. The encryption/decryption states used for the inner layer is (stE , stD ). A passive firewall simply forwards (r, s, t).

tempting to conclude that this protocol is exfiltration resistant, assuming INDCPA security of the encryption scheme. However, the IND-CPA security of an encryption scheme ensures that no adversary can decide if a given ciphertext encrypts a message m of its choice. Our adversary is much more powerful here: the client, controlled by the adversary, selects the messages to send while knowing the secret keys, which is not allowed in the IND-CPA security game. Moreover, the corrupt implementation could simply select messages whose ciphertexts contain some specific patterns that could be used as a distinguisher. KDM Encryption Fails. Allowing the adversary to select messages while knowing the key is reminiscent of the key-dependent message model from [3]. Such an encryption scheme remains secure even if the adversary receives the encryption of some messages m = f (sk), where f is chosen by the adversary and sk is the secret key. Unfortunately, this does not exactly match our scenario: here, we have two adversaries, one (the corrupt client) who selects the messages to be encrypted and knows the key, and one (the adversary between the RF and the server) who does not know the key and must distinguish the encryption of these messages. Our model is stronger than KDM, and we therefore cannot fully rely on this property. Our Solution. Nonetheless, the construction proposed in [3] does provide an answer to our problem. In our protocol, the client first encrypts the plaintext message into a ciphertext C using a length-hiding authenticated encryption and the secret key kcs . The client then picks r and computes C  = H(r kcfs ) ⊕ C, which is similar to the technique in [3]. It sends (r, C  ) to the firewall which r kcfs ) ⊕ C using a fresh decrypts C  and re-encrypts it by computing C˜  = H(˜ random r˜. We already showed that the key kcfs is indistinguishable from ran-

206

A. Bossuat et al.

dom. Thus, as long as the adversary does not guess kcfs , the value H(˜ r kcfs ) is indistinguishable from a random one (in the ROM) and acts as a one-time-pad on C. This means that C˜  leaks no information to the adversary. Finally, since a one-time-pad does not prevent malleability, we need to additionally compute a MAC on (˜ r, C˜  ) by using a key derived from kcfs . The resulting protocol is exfiltration resistant as stated in Theorem 3, and is also oblivious in both phases, as stated in Theorem 4. Proof sketches are given in Sect. 4 and Appendix C, respectively, with the complete proof of Theorem 3 given in the full version. Theorem 3. Protocol Π is exfiltration resistant (Definition 6) in the ROM under the CDH, and assuming that Sig is EUF-CMA and Enc is IND-CPA. IND−CPA 2 λ AdvExf (λ) + ns · nf · (q1 + q2 ) · AdvCDG (λ) Π (λ) ≤qm /2 + AdvEnc G

+ nm · qs · AdvEUF−CMA (λ) MAC Theorem 4. Protocol Π is unconditionally oblivious (Definition 8).

4

Security Proofs

We give a sketch of the proofs of Theorems 1 to 3, while the sketch of Theorems 4 is in Appendix C. Complete proofs are in the full version. For each security proof, we define the sid of an instance label (either at the firewall FW or the server S ) to be its input to the signature scheme, i.e., referring to Fig. 3, we have ˜ β1 , C˜ β2 ). sid = (Y, D, X Proof of Theorem 1. Let A be an adversary against the CFS-AKE security of protocol Π. In the following sequence of games, let Si denote the event that A is successful in Game i, and let i = Pr[Si ] − 12 . Thus, Si denotes the event that A correctly guesses the bit labelC .b of a kcfs -fresh client instance labelC . We denote by labeliT the ith label of type T created during the experiment. (λ). Game 0. This game is the original experiment, thus: 0 = AdvCFS-AKE Π,A Game 1. Let nC be the number of client labels created during the experiment. This game is the same as Game 0, except that the challenger picks $ i ← {0, . . . , nC } at the beginning of the game. If A does not return (labeliC , d) for some bit d, the challenger returns a random bit. We have 0 ≤ 1 · nC . Game 2. Let nS be the number of server labels created during the experiment. This game is the same as Game 1, except that the challenger picks $ j ← {0, . . . , nS } at the beginning of the game. If labeliC and labeljS are not partners, the challenger returns a random bit. We have 1 ≤ 2 · nS . Game 3. This is the same as Game 2 except that the challenger aborts, sets abort3 = 1 and returns a random bit if: – A makes a Send(label, (σ, Y, D, γ1 , γ2 )) query before A makes any query to the oracle CorruptS,

Designing Reverse Firewalls for the Real World

207

– Send(label, init) previously output a triplet (X, C, e) such that σ is a valid signature on (Y, D, X γ1 , C γ1 ), and – the server did not previously output (σ, Y, D, β1 , β2 ) for some β1 and β2 . We claim that | Pr[S2 ] − Pr[S3 ]| ≤ Pr[abort3 ] ≤ AdvEUF-CMA (λ). We show Sign that, using a PPT algorithm A that, with non-negligible probability, makes a Send(label, (σ, Y, D, γ1 , γ2 )) query, an adversary B can efficiently break the EUF-CMA experiment by outputting σ. Game 4. This is the same as Game 3 except that when the challenger should encrypt a message c using the public key of the firewall pkFW , the challenger picks a random value and encrypts it. Enc denotes the public key encryption scheme we use. (λ). We show by reduction We claim that | Pr[S3 ] − Pr[S4 ]| ≤ AdvIND-CPA Enc that distinguishing Games 3 and 4 is equivalent to distinguishing whether the ciphertexts c contain the valid messages or some random values, which breaks the IND-CPA security of Enc. Game 5. This is the same as Game 4 except the challenger replaces kcfs with random for the client label labeliC . (λ). We prove this claim by reducWe claim that | Pr[S4 ] − Pr[S5 ]| ≤ AdvDDH G tion using an adversary B receiving a DDH challenge that it embeds into the game it simulates for A , an adversary on Game 4. B can distinguish Game 4 from Game 5 depending on the behavior of A . By the changes in the games, we have that the adversary’s Testcfs query will always be answered with a random key, thus 5 = 0, which concludes the proof. Proof of Theorem 2. Let A be an adversary against the CS-AKE security of protocol Π. We use the same notation as for Theorem 1. (λ) experiment. Games Game 0, 1, 2 and 3 Game 0 is the original ExpCS-AKE Π,A 1, 2 and 3 are defined as in the proof of Theorem 1, with similar reductions. Game 4. This is the same as Game 3 except that the challenger replaces kcs with random for the client label labeliC . (λ). We prove this claim in a similar We claim that | Pr[S3 ]−Pr[S4 ]| ≤ AdvDDH G way as for Game 4 of Theorem 1, except that B sets X = A for labeliC , Y = B for labeljS , and Z for building the shared key kcs . By the changes in the games, we have that the adversary’s Testcs query will always be answered with a random key, thus 4 = 0, which concludes the proof. Proof of Theorem 3. Recall that Π is the protocol that first runs the serverauthenticated AKE protocol given in Fig. 3 to obtain the keys kcs and kcfs , which are then used in the stLHAE protocol given in Fig. 4. Let A be an adversary against the exfiltration resistance of protocol Π. Let P 0 and P 1 be the two programs output by A after the setup phase in experiment ExpExf Π,A (λ). We use the same notations as in the proof of Theorem 2. Game 0. This is the original experiment, hence 0 = AdvExf Π,A (λ). Game 1. At the ith message sent by P b to the firewall, the challenger simu$ lates the firewall by picking a random element denoted r˜i ← {0, 1}λ . This game proceeds as in Game 0, except that if the challenger picks r˜i such that there

208

A. Bossuat et al.

exists j < i with r˜i = r˜j , then the challenger aborts, sets abort1 = 1, and returns a random bit. Let qm be the number of messages sent by P b , we have 2 /2λ . | Pr[S0 ] − Pr[S1 ]| ≤ Pr[abort1 ] ≤ qm Game 2. This is the same as Game 1 except that the challenger aborts, sets abort2 = 1, and returns a random bit if A makes a Send(labelFW , (σ, Y, β1 , β2 )) query such that σ is a valid signature on (Y, X β1 , C β2 ), and a (X, C, ·) was output by Send(labelFW , init), and the server did not previously output (σ, Y, β1 , β2 ). We claim that | Pr[S1 ] − Pr[S2 ]| ≤ Pr[abort2 ] ≤ AdvEUF−CMA (λ). We prove this Sign claim as in Game 3 in the proof of Theorem 1. Game 3. This is the same game as Game 2 except that each time the challenger should encrypt a message c using the public key of the firewall pkFW , the challenger picks a random value and encrypts it. Enc denotes the public key encryption scheme used in our protocol. (λ) We prove this claim as in the We claim that: | Pr[S2 ] − Pr[S3 ]| ≤ AdvIND−CPA Enc Game 3 in the proof of Theorem 1. Game 4. We recall that in Game 3, each time P b sends a message (ri , si , ti ) ri kcfs ) for j ∈ {0, 1}. We to the RF, the RF picks r˜i and computes k˜j,i ← Hj (˜ set hi = r˜i kcfs . Game 4 proceeds as in Game 3, but now the challenger sets abort4 = 1, aborts, and returns a random bit if the adversary sends one of the hi to the random oracle that simulates H1 or H2 . We claim that | Pr[S3 ] − Pr[S4 ]| ≤ Pr[abort4 ] ≤ ns · nf · (q1 + q2 ) · AdvCDH G (λ). where qj is the number of queries sent to the random oracle Hj , and nf (resp. ns ) is the number of firewall (resp. server) labels. To prove this, we show how to build an algorithm B that solves the CDH problem from an efficient algorithm A that triggers abort4 with non-negligible probability. We note that, at this step, k2 can be viewed as a MAC key generated at random, independently from any other element of the protocol. Game 5. This is the same as Game 4 except that the challenger aborts, sets abort5 = 1 and returns a random bit if A makes a Send(labelS , (r, s, t)) query such that the server does not abort, and no Send(label, ·) query received (r, s, t) as an answer. Let nm be the number of messages sent by the firewall during the experiment, and qs the number of queries sent to the sending oracle. (B). We claim that | Pr[S4 ] − Pr[S5 ]| ≤ Pr[abort5 ] ≤ nm · qs · AdvEUF−CMA MAC To prove this claim, we show that, using an efficient algorithm A that makes such a Send(labelS , (r, s, t)) query with non-negligible probability, an adversary B can efficiently break the EUF-CMA experiment by outputting t. No value output by the firewall or by the server depends on label.b, which means that 5 = 0, and concludes the proof.

5

Conclusion and Extension

Reverse firewalls aim to limit the damage done by subverted implementations. We revisited the original goals for this primitive in the AKE setting, as stated by Dodis et al. [9]. To thwart much larger classes of malicious implementations, we defined a new security model that is less restrictive, and significantly clearer than the one of [9], based on the notion of functionality-preserving adversaries.

Designing Reverse Firewalls for the Real World

209

Based on our model, we constructed a reverse firewall for communication protocols, taking into account both the key exchange and the record layer. Our construction resists complex strategies by only using simple cryptographic tools, such as hash functions or standard public key encryption. Our solution is thus efficient, only adding a reasonable overhead. Moreover, our RF is remarkably versatile, able to handle the elements of most recent secure communication protocols. It is therefore a truly practical solution, illustrating the benefits of reverse firewalls in the real world. To show this, we implement our reverse firewall in TLS1.3-like protocol, proving its security, compared with the Mint TLS1.3 implementation. The results are in the full version. Acknowledgements. We would like to thank H˚ akon Jacobsen and Olivier Sanders for their contributions to the preliminary versions of this paper, as well as Kenny Paterson for the fruitful discussions on the subject. This work was supported in part by the French ANR, grants 16-CE39-0012 (SafeTLS) and 18-CE39-0019 (MobiS5).

A

Our Security Model Compared to Dodis et al.’s

Reverse firewalls must preserve, not create security. If a secure-channel establishment protocol is run honestly, it should guarantee end-to-end security, no matter what is or isn’t between the endpoints. This requirement is taken into account, both by Dodis et al.’s work [9] and by ours. The use case for reverse firewalls (RFs) is one in which the adversary has tampered with the client implementation to exfiltrate information to a MitM adversary. It is expected that an honest4 RF will preserve security, so the adversary should not be able to distinguish a transcript involving its corrupt implementation (protected by an RF) from one only involving honest parties. This property obviously cannot hold for all tampered implementation and it is thus necessary to place some restrictions on the adversarial behaviour. In [9], this is done by requiring functionality-maintaining implementations, whose definition we adapt and re-orient to fit our criteria. Our Model. Our first goal is to define a model that places fewer restrictions on the adversarial behaviour, while making these restrictions easy to understand and quantify, via the notion of transcript equivalence, defined in Sect. 2.4. As mentioned before, our second goal is to design an RF that preserves security for the key agreement and the record-layer protocols in this new model, which is a non-trivial extension compared to [9], as our model is much more permissive, allowing more realistic attacks, such as key-replacement in the record layer or choosing key-dependent messages. To achieve this, we consider a more general definition of the RF, allowing it to have a public key pkFW . The existence of this public5 key is an important difference compared to [9]: the client is aware 4 5

This requirement is necessary: we cannot ensure any relevant security property when both the client and the RF are corrupt. Actually, only the client protected by the RF needs to know this public key, so we do not need a complex PKI infrastructure.

210

A. Bossuat et al.

of the existence of the RF, which does not seem to be a significant restriction for most use-cases, such as the one of a company network. We stress that we can retain obliviousness and transparency for our protocol, meaning that the client cannot detect the status of the RF. The firewall can, via its secret key, passively derive a shared secret kcfs (also known to both endpoints) from any key agreement involving one of the clients it protects. This key will be used to preserve the security of the record layer even if the client is corrupt. However, since we also want to ensure privacy for the client and the server against the RF (in case the latter is malicious), we allow the endpoints to derive another shared key kcs (unknown to the RF), which is meant to provide end-to-end security (even with respect to the RF). In our protocol, one needs both keys to learn any information about the transmitted messages; yet the strength of only one key suffices to prevent exfiltration of information and provide security with respect to Man-in-the-Middle adversaries. The RF is unable to break the security of the channel, yet tampered implementations cannot manage to exfiltrate information through the firewall – our RF performs its task without being trusted. In particular, our model considers malicious RFs and we prove that security still holds in this case, despite its additional knowledge of kcfs . This additional key may be useful when considering more intricate key agreement protocols such as TLS 1.3, where parts of the handshake are encrypted. Three Attack Scenarios. We first prove security in a context where everyone is honest, but we also need to separately study the case of kcs , only known to the endpoints, and of kcfs , known to the client, the firewall and the server. We also consider the possibility of a malicious client, which again is the main motivation of RFs. This leads to the three following scenarios: 1. Security of kcs . The adversary controls a legitimate RF, situated between two honest endpoints, and its goal is to break the security of the channel, i.e., get some information on kcs . This scenario implies security without an RF, since the adversary can simply force the latter to be passive. It shows that the client server are still able to preserve their privacy even in the presence of a malicious RF. 2. Security of kcfs . The protocol must ensure the security of the kcfs key established between three honest parties following the protocol. Our adversaries are stronger than those of Dodis et al., since they can reveal all but the testsession keys6 and since they may choose the target instance adaptively. In this scenario the adversary tries to get some information on kcfs . 3. Malicious client. For exfiltration resistance, the adversary provides a malicious client implementation, which will then interact with the firewall and the server. The goal of the adversary is to obtain information about the data sent by the tampered implementation during the executions of the key agreement and secure messaging protocol. The server is honest. 6

This is not the case for Dodis et al., in particular with respect to their security against active adversaries.

Designing Reverse Firewalls for the Real World

211

For clarity, we choose to treat each case separately. Our modelling approach follows the models of Bellare and Rogaway [2] and Canetti and Krawczyk [5], although we also use ideas from the multi-stage model of Dowling et al. [10].

B

Transparency

While obliviousness allows the adversary to obtain auxiliary information, such as revealed keys, transparency is solely based on the indistinguishability of messages coming from the endpoints, and those modified by the firewall. Therefore, it is entirely linked to the content of those messages. By definition, transparency is a sub-goal of obliviousness: it is necessary, but not sufficient. Being able to tell whether a message has been modified implies being able to tell whether a firewall is involved or not. If a protocol with a reverse firewall does not achieve transparency, i.e., if there exists an PPT adversary who can distinguish the message originally issued by the client from a modified one with non-negligible probability, this protocol does not achieve obliviousness. Definition 9 (Transparency). An AKE protocol Π is transparent if for all TRS 1 PPT adversaries A , AdvTRS Π (A ) = Pr[ExpΠ,A (λ) ⇒ 1] − 2 is negligible in λ: TRS ExpΠ,A (λ): 1. (sparam, pparam) ← A Setup(·) (1λ ) 2. Q ← {NewInstance, Send, TestSend} 3. (label, d) ← A Q (sparam, pparam) 4. if (label is Send-fresh) ∧ (label.b = d): return 1 5. else: return a random bit

C

Proof of Theorem 4

We will show that no adversary can distinguish a message sending by an entity (client or server) and a message randomized by the firewall: Client key exchange initialization: the client sends a message (g x , g c , e = Encpkf (c)) where x and c was picked at random. The firewall sends a message   (g x , g c , e = Encpkf (c )) where where x = x · α1 and c = c · α2 , such that α1 and α2 was picked at random. Since α1 and α2 perfectly randomizes x and y,   (g x , g c , e = Encpkf (c)) and (g x , g c , e = Encpkf (c )) follow the same distribution. Server response: if the server receives (g x , g c , e = Encpkf (c)) from the client, it sends a message (σ, Y, D, β1 , β2 ) where β1 and β2 are randomly chosen in Zp , and such that σ is a valid signature on (Y, D, g x·β1 , g c·β2 ). if the server receives   (g x , g c , e = Encpkf (c )) from the firewall, it sends a message (σ, Y, D, β1 , β2 ) where β1 and β2 are randomly chosen in Zp , and such that sigma is a valid   signature on (Y, D, g x ·β1 , g c ·β2 ). The firewall then returns (σ, Y, D, γ1 , γ2 ) where γi = αi · βi , so σ is a valid signature on (Y, D, g x·γ1 , g c·γ2 ), and (σ, Y, D, β1 , β2 ) and (σ, Y, D, γ1 , γ2 ) follow the same distribution. Transmission of a message: the client picks r at random an sends (r, s, t) where s ← H1 (r kcfs ) ⊕ C, and t ← MACH1 (rkcfs ) (r s). The firewall picks r˜

212

A. Bossuat et al.

at random an sends the message (˜ r, s˜, t˜) where s˜ ← H1 (˜ r kcfs ) ⊕ C, and t˜ ← r ˜ s), so (r, s, t) and (˜ r, s˜, t˜) follow the same distribution. MACH1 (˜rkcfs ) (˜

References 1. Bellare, M., Paterson, K.G., Rogaway, P.: Security of symmetric encryption against mass surveillance. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8616, pp. 1–19. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-66244371-2 1 2. Bellare, M., Rogaway, P.: Entity authentication and key distribution. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 232–249. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48329-2 21 3. Black, J., Rogaway, P., Shrimpton, T.: Encryption-scheme security in the presence of key-dependent messages. In: Nyberg, K., Heys, H. (eds.) SAC 2002. LNCS, vol. 2595, pp. 62–75. Springer, Heidelberg (2003). https://doi.org/10.1007/3-54036492-7 6 4. Brzuska, C., Jacobsen, H., Stebila, D.: Safely exporting keys from secure channels. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 670–698. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-498903 26 5. Canetti, R., Krawczyk, H.: Universally composable notions of key exchange and secure channels. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 337–351. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-46035-7 22 6. Checkoway, S., et al.: A systematic analysis of the juniper dual EC incident. In: ACM SIGSAC 2016, pp. 468–479 (2016) 7. Checkoway, S., et al.: On the practical exploitability of dual EC in TLS implementations. In: USENIX 2014, pp. 319–335 (2014) 8. Chen, R., Mu, Y., Yang, G., Susilo, W., Guo, F., Zhang, M.: Cryptographic reverse firewall via malleable smooth projective hash functions. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016. LNCS, vol. 10031, pp. 844–876. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53887-6 31 9. Dodis, Y., Mironov, I., Stephens-Davidowitz, N.: Message transmission with reverse firewalls—secure communication on corrupted machines. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9814, pp. 341–372. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53018-4 13 10. Dowling, B., Fischlin, M., G¨ unther, F., Stebila, D.: A cryptographic analysis of the TLS 1.3 handshake protocol candidates. In: ACM CCS 2015, pp. 1197–1210 (2015) 11. Gamal, T.E.: A public key cryptosystem and a signature scheme based on discrete logarithms. In: CRYPTO 1984, pp. 10–18 (1984) 12. Grahm, R.: Extracting the SuperFish certificate (2015). http://blog.erratasec.com/ 2015/02/extracting-superfish-certificate.html 13. Jager, T., Kohlar, F., Sch¨ age, S., Schwenk, J.: On the security of TLS-DHE in the standard model. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 273–293. Springer, Heidelberg (2012). https://doi.org/10.1007/9783-642-32009-5 17 14. Kohlweiss, M., Maurer, U., Onete, C., Tackmann, B., Venturi, D.: (De)constructing TLS 1.3. In: Biryukov, A., Goyal, V. (eds.) INDOCRYPT 2015. LNCS, vol. 9462, pp. 85–102. Springer, Cham (2015). https://doi.org/10.1007/ 978-3-319-26617-6 5

Designing Reverse Firewalls for the Real World

213

15. Krawczyk, H., Paterson, K.G., Wee, H.: On the security of the TLS protocol: a systematic analysis. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 429–448. Springer, Heidelberg (2013). https://doi.org/10.1007/9783-642-40041-4 24 16. Krawczyk, H., Wee, H.: The OPTLS protocol and TLS 1.3. In: Proceedings of Euro S&P, pp. 81–96 (2016) 17. Ma, H., Zhang, R., Yang, G., Song, Z., Sun, S., Xiao, Y.: Concessive online/offline attribute based encryption with cryptographic reverse firewalls—secure and efficient fine-grained access control on corrupted machines. In: Lopez, J., Zhou, J., Soriano, M. (eds.) ESORICS 2018. LNCS, vol. 11099, pp. 507–526. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98989-1 25 18. Mironov, I., Stephens-Davidowitz, N.: Cryptographic reverse firewalls. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9057, pp. 657–686. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46803-6 22 19. Naylor, D., et al.: Multi-context TLS (mcTLS) enabling secure in-network functionality in TLS. In: SIGCOMM 2015, pp. 199–212 (2015) 20. O’Neill, M., Ruoti, S., Seamons, K., Zappala, D.: TLS proxies: friend or foe. In: IMC 2016, pp. 551–557 (2016) 21. Rogaway, P.: The moral character of cryptographic work (2015). http://web.cs. ucdavis.edu/rogaway/papers/moral-fn.pdf 22. Young, A., Yung, M.: Kleptography: using cryptography against cryptography. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 62–74. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-69053-0 6

Software Security

Follow the Blue Bird: A Study on Threat Data Published on Twitter Fernando Alves1(B) , Ambrose Andongabo2 , Ilir Gashi2 , Pedro M. Ferreira1 , and Alysson Bessani1 1

LASIGE, Faculdade de Ciˆencias, Universidade de Lisboa, Lisbon, Portugal [email protected] 2 Centre for Software Reliability, City, University of London, London, UK

Abstract. Open Source Intelligence (OSINT) has taken the interest of cybersecurity practitioners due to its completeness and timeliness. In particular, Twitter has proven to be a discussion hub regarding the latest vulnerabilities and exploits. In this paper, we present a study comparing vulnerability databases between themselves and against Twitter. Although there is evidence of OSINT advantages, no methodological studies have addressed the quality and benefits of the sources available. We compare the publishing dates of more than nine-thousand vulnerabilities in the sources considered. We show that NVD is not the most timely or the most complete vulnerability database, that Twitter provides timely and impactful security alerts, that using diverse OSINT sources provides better completeness and timeliness of vulnerabilities, and provide insights on how to capture cybersecurity-relevant tweets.

Keywords: OSINT

1

· Twitter · Vulnerabilities

Introduction

Cybersecurity has remained a hot research topic due to the increased number of vulnerabilities indexed and to the severe damage caused by recent attacks, from ransomware (e.g., wannacry) to SCADA systems attacks (e.g., the attacks on the Ukrainian power stations). A growing trend for obtaining cybersecurity news is to collect Open Source Intelligence (OSINT) from the Internet [57]. OSINT sources include vulnerability databases (e.g., the National Vulnerability Database—NVD), online forums (e.g., Reddit), social networks (e.g., Twitter), and scientific literature. Although more technical, exploit databases (e.g., ExploitDB) are a useful OSINT source providing code excerpts known as Proofs of Concept (PoC) that show how to exploit a vulnerability. PoCs can be analysed by a specialised audience capable of using the exploit’s code to understand and counteract vulnerability exploitation, thereby removing the vulnerability. The research community has shown many different uses for OSINT, from its collection and processing [26,29,36,37,40,43,48,54,59], vulnerability life cycle analysis [28,31,42,50,56], to evaluating vulnerability exploitability [24,31,32, c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 217–236, 2020. https://doi.org/10.1007/978-3-030-58951-6_11

218

F. Alves et al.

38,45,51]. There are two predominant OSINT sources in the literature: NVD (e.g., [24,31,45,51,55]), and Twitter (e.g., [26,40,41,43,59]). The first provides curated vulnerability data, while the latter is more generic, concise, and covers more topics. As Twitter’s usage grew, various information sources began to link their content on Twitter to increase visibility and attract attention. Twitter’s continued growth placed it among the most relevant communication tools used by the vast majority of companies who have a Twitter account to interact with the world. All this activity also caught the attention of the research community. The information flow and interaction graphs meant new research opportunities, such as detecting emerging topics [34,46], or finding events related to a specific topic, such as riots [25], patients experience with cancer treatment drugs [30], or earthquakes [52]. Twitter popularity instigated the development of tools to collect tweets (e.g., Tweet Attacks Pro [20]), APIs for programming languages (e.g., Tweepy [19]), and many OSINT-collecting tools developed specific plugins to collect tweets (e.g., Elastic Stack [21]), including cybersecurity-oriented ones (e.g., SpiderFoot [18]). The cybersecurity field also found opportunities in using Twitter (discussed in Sect. 2). However, to the best of our knowledge, there is no evidence in the literature highlighting one security data source as advantageous over the others. For instance, the following questions are yet to be answered: Why use solely NVD when there are several reputable vulnerability databases? Is NVD the richest (in terms of number of vulnerabilities reported) and timeliest (does it contain the earliest reporting date of a vulnerability) vulnerability database? Why use Twitter to gather cybersecurity OSINT? Does Twitter provide any advantage over vulnerability databases? Is it useful for security practitioners? In this paper, we present an extensive study on OSINT sources, comparing their timeliness and richness. We analysed the vulnerability OSINT sources indexed on vepRisk [27], which aggregates several vulnerability databases, advisory sites, and their relationships. We compared Twitter against these data sources to understand if there are any advantages in using it as a cybersecurity data source. To explore this topic, we formulated three research questions: RQ1: Is NVD the richest and timeliest vulnerability database? RQ2: Does Twitter provide a rich and timely vulnerability coverage? RQ3: How are vulnerabilities discussed on Twitter? Our findings show that: vulnerability databases complement one another in richness and timeliness (i.e., no single source contains all the vulnerabilities; no single sources can be relied on as providing the earliest vulnerability reporting date); Twitter is a rich and timely vulnerability information source; and finally, Twitter complements other OSINT sources. In summary, our contributions are: – A comparison between some of the most reputable and complete vulnerability databases in terms of timeliness and coverage; – An analysis of the coverage and timeliness of Twitter with respect to vulnerability information;

Follow the Blue Bird: A Study on Threat Data Published on Twitter

219

– An analysis about “early alerts” on Twitter, i.e., vulnerabilities disclosed or discussed on Twitter before their inclusion on vulnerability databases; – An analysis on how vulnerabilities are discussed on Twitter; – Insights on how to collect timely tweets; – Insights regarding OSINT (and in particular Twitter’s) usage for cybersecurity threat awareness.

2

Background and Related Work

The following sections present some vulnerability databases and previous research contributions related to this work. 2.1

Vulnerability Databases

MITRE Corporation [11] maintains the Common Vulnerabilities and Exposures list [3] (in short, CVE), a compilation of known vulnerabilities described in a standard format. A global index of known vulnerabilities simplifies complex analyses such as detecting advanced persistent threats. Therefore, indexing known vulnerabilities in CVE became standard practice for all kinds of security practitioners, including software vendors. Each CVE entry has an ID (CVE-ID), a short description, and the creation date. NIST’s National Vulnerability Database [12] mirrors and complements CVE entries on their database. Every hour, NVD contacts CVE to obtain newly disclosed vulnerabilities (we contacted NVD directly to get this information). Each vulnerability indexed in NVD undergoes a thorough analysis, including attributing an impact score based on the Common Vulnerability Scoring System (for both versions 2.0 [5] and 3.0 [4]), and links related to the vulnerability, such as advisory sites or technical discussions. NVD uses the CVE-ID in place of an ID of its own. There is a significant difference between the dates of NVD and CVE entries. In CVE, it is the date when entries became reserved, but not yet public. NVD entries are always public, using the date when they were indexed, even prior to their analysis completion. Thus, in practice, a vulnerability has the same public disclosure date on both CVE and NVD, which is NVD creation date. Therefore, this study considers only the NVD vulnerability disclosure date. Besides CVE and NVD, many online databases compile known vulnerabilities and provide unrestricted use of their contents, such as the Security Database [17] and PacketStorm [15]. The complementary information provided by each database differs, but in general, these provide a description, some analysis of the security issues raised by the vulnerabilities, known exploits, and possible fixes or mitigation actions. 2.2

Cybersecurity-Related OSINT Studies

To the best of our knowledge, Sauerwein et al. performed the most similar study to the one present on this paper [55]. For two years, the authors collected all

220

F. Alves et al.

tweets with a CVE-ID in its text. They show a comparison of the tweet publishing dates with the disclosure dates of those CVEs on NVD. The results show that 6232 vulnerabilities (25.7% of their dataset) were discussed on Twitter before their inclusion on NVD. However, this study falls short in some aspects. Firstly, the NVD is not always the first database to report new vulnerabilities, which changes the vulnerabilities first confirmed report date (see Sect. 4). Secondly, the authors search only for CVE-IDs on Twitter, which will not capture issues that have been disclosed to the public but not (yet) indexed on CVE or NVD. Finally, the analysis is focused solely on the vulnerabilities life cycle and Twitter appearance, overlooking vulnerability characterisation such as their impact. There is some research work providing evidence that relevant and timely cybersecurity data is available on Twitter [33,41,44], i.e., that some vulnerabilities were published on Twitter before their inclusion on vulnerability databases. However, these are case studies concerning a single vulnerability, and compare the tweets referring them solely with NVD. Other Twitter-based contributions include correlating security alerts from tweets with terms found in dark web sources [53], studying the propagation of vulnerabilities on Twitter [58], and finding that exploits are published on Twitter (on average) two days before the corresponding vulnerability is included in NVD [51]. In a similar research line, Rodriguez et al. [49] analysed vulnerability publishing delays on NVD when comparing to other OSINT sources: Security Focus, ExploitDB, Cisco, Wireshark, and Microsoft advisories. The authors report that NVD presents publishing delays (from 1 to more than 300 days) from 33% to 100% of the cases when comparing with those databases, i.e., sometimes it publishes after these databases or it always publishes after these databases. However, the authors consider only the year of 2017. Similarly, the Recorded Future company reports that for 75% of the vulnerabilities NVD presents a 7-day disclosure delay [16]. However, the company does not reveal how it obtained these results. The literature lacks a systematic and thorough analysis regarding the data published on Twitter and on vulnerability databases, including crucial aspects such as coverage, timeliness, and the actionability provided by such OSINT.

3

Methodology

The objective of this study is to compare some aspects of the information present on vulnerability databases with another OSINT source, namely Twitter. Instead of searching, collecting, and parsing a set of databases, we use the vepRisk database [27]. It contains several types of security-related public data, including all entries published on NVD, Security Database, Security Focus, and PacketStorm databases, from their creation until the end of 2018. We chose Twitter as an OSINT source, as it is a known aggregator of content posted by all kinds of users (hackers, security analysts, researchers, etc.), news sites, and blogs, among others who tweet about their content to increase visibility [9]. Thus, Twitter became an information hub for almost any kind of content. Unlike vulnerability databases—that contain only security data— Twitter includes discussions over a vast universe of topics. Since the results of

Follow the Blue Bird: A Study on Threat Data Published on Twitter

221

this study are based on tweets mentioning indexed vulnerabilities, we decided to search for tweets mentioning the vulnerabilities indexed on NVD. Finally, to ensure the validity of our results, we opted to manually match tweets to vulnerabilities. These decisions raised two questions: 1) what part of the vulnerability description are we going to use as a search term? and 2) How to reduce the number of vulnerabilities to manually inspect? The NVD description of some vulnerabilities includes a “colloquial” name for which the vulnerability became known. For example, CVE-2014-0160 is known as the “Heartbleed bug”. These names fall mostly within two categories: a generic description of the vulnerability class (e.g., “Microsoft Search Service Information Disclosure”), or some “creative” designation related to the vulnerability (e.g., “Heartbleed” is a vulnerability on the “heartbeat” TLS packets which can be exploited to leak or “bleed” information). These colloquial names are easily recognisable since they always appear in the NVD vulnerability description after the “aka” acronym (for also known as). Therefore, to guide the search on Twitter, all vulnerabilities with a colloquial name were selected, and the names were used as query terms. This decision also reduced the number of vulnerabilities to analyse to 9,093, an amount of data manually processable. Additionally, vulnerabilities with colloquial names are more likely to be discussed on Twitter since most were “named” due to media attention. The IDs of the 9,093 vulnerabilities with a colloquial name that were used in this study are listed online [1]. We were unable to use the Twitter API to collect the tweets for the study as it only provides access to tweets published in the previous week. However, the Twitter web page allows searching for tweets published at any point in time. To automate the querying process, a library called GetOldTweets [7] was employed. It mimics a web browser performing queries on the Twitter page, enabling fast and programmatic retrieval of any number of tweets from any time. Regarding matching tweets and vulnerabilities, we consider that a tweet t unequivocally refers a specific vulnerability v if and only if (1) t mentions in its text v ’s CVE-ID even if the vulnerability has not yet been disclosed on NVD, or (2) t contains a link mentioned in v ’s NVD description, even if the web page pointed by the link is currently down, or (3) t mentions a security advisory that is also referred by v ’s NVD links about that threat, or (4) t or t’s links mention an ID associated with v. Two assumptions are made: 1) if an ID is present on a tweet, then the advisory has been published; and 2) a security analyst that receives a tweet containing a security advisory ID can search for this advisory, thus having the same result as publishing the advisory link on the tweet. If a vulnerability is mentioned by up to a thousand tweets, all tweets were manually inspected. The colloquial name of some vulnerabilities is also a word commonly used on tweets, such as “CRIME” (CVE-2012-4930) or “RESTLESS” (CVE2018-12907). For those cases, where a search term can return more than 350,000 tweets, the manual inspection was done in two steps. First, the description is analysed to understand the vulnerability characteristics and related terminology. Then, a large set of informed searches were performed on the tweet set in search of tweets potentially referring the vulnerability. In total, about a million tweets

222

F. Alves et al.

Table 1. The number of entries in each database (in bold) and the number of shared entries between database pairs.

Table 2. The number of times one database or a group of databases were the first to disclose a vulnerability.

NVD PS SD SF NVD 110,353 – – – PS 9,290 129,130 – – SD 110,353 9,344 117,098 – SF 60,378 8,597 60,843 98,445

Database(s) NVD PS SD SF NVD, SD PS, SF NVD, SD, SF NVD, PS, SD NVD, PS, SD, SF

# Occurrences % 0 0.00 853 0.77 0 0.00 40,208 36.44 51,238 46.43 1,265 1.15 16,580 15.02 85 0.08 124 0.11

were manually inspected, and any links present in potential matches were also examined to confirm the matches. The data labelling was performed solely by a PhD student with a cybersecurity background, and it took roughly eight months to complete. All potential matches were triple checked to ensure their validity. The time range considered in this study begins on March 2006 (Twitter’s creation date) until the end of 2018. The tweets were collected between early 2017 and the end of 2019. The resulting dataset contains 3,461,098 tweets. The tweets publishing times were adjusted to the day time-scale to match the time granularity provided by the vulnerability databases. Therefore, all time comparisons performed in this study used the publishing day.

4

Vulnerability Database Comparison

As NVD is considered a standard for consulting vulnerability data, many research works use only it as their vulnerability database (e.g., [42,47,50,55]). This is a natural choice since NVD includes multiple resources for further understanding of the issue at hand. However, other reputable vulnerability databases, with their own disclosure procedures and timings, provide useful information for security practitioners. Therefore, it is interesting to understand if there is evidence that supports using only NVD for practice or research work. To investigate this point, we collected data about two different aspects: the number of entries and their publishing date. The first measures the coverage of the database, while the second is related to its timeliness and practical usefulness. Table 1 shows the number of entries in each of vepRisk’s databases: NVD, PacketStorm (PS), Security Database (SD), and Security Focus (SF). It also shows the number of entries shared between each database pair. Tables 2 and 3 are related to timeliness. Table 2 is divided in two blocks. The first shows the number of occurrences where one database was the first to disclose a vulnerability ahead of other databases. The second block shows the number of occurrences

Follow the Blue Bird: A Study on Threat Data Published on Twitter

223

where various groups of databases were simultaneously first to disclose a vulnerability. Table 3 complements the previous table by showing the percentage of time each database was one of the first to disclose a vulnerability. There are five key takeaways obtained Table 3. The percentage of times from analysing the tables: (1) NVD is not each database was one of the first to the most complete vulnerability database, disclose a vulnerability. with the Security Database and Packet- Database # Occurrences % Storm containing more entries; (2) NVD is NVD 68,027 61.64 not the most timely database. Alone, it was Security Database 68,027 61.64 52.72 never the first to publish a vulnerability; Security Focus 58,177 2,327 2.11 (3) No database stands out as the most PacketStorm timely; (4) Security Database contains all of NVD’s entries (this was manually verified); (5) With the exception of NVD, all databases publish different vulnerabilities. Therefore it is important to follow a set of data sources instead of relying solely on one.

5

Twitter Vulnerability Coverage and Timeliness

Coverage. A first validation on using Twitter for cybersecurity is verifying if vulnerability data reaches Twitter. We searched for tweets mentioning each of the CVE-IDs published on NVD after Twitter’s creation. Of the 94,398 CVEIDs searched, 71,850 (76.11%) were mentioned in tweets. However, by analysing Fig. 1 it is possible to observe that since the beginning of 2010, CVEs became regularly discussed on Twitter. In fact, from 2010 forward, the coverage became above 97.5%, validating the hypothesis that vulnerability data reaches Twitter. The drastic increase in tweets mentioning CVEs in 2010 may be connected to the sudden growth Twitter underwent in that period [10]. Nevertheless, the turning point on cybersecurity threat awareness and on the importance of coordinated vulnerability disclosure mechanisms may have been in the beginning of 2010, when Google publicly disclosed that their infrastructure in China was targeted by an advanced persistent threat codenamed “Operation Aurora” [14]. Later on, it was discovered that other major companies were targeted, such as Adobe Systems, Rackspace, Yahoo, and Symantec. This event could have triggered two crucial social phenomena: that companies are attacked and should not be ashamed of it, and should disclose the details of these attacks in a coordinated effort to detect, understand, and prevent them; and that the users prefer transparency in cybersecurity events since when data breaches occur, typically it is the user data that is affected. Timeliness. Regarding timeliness, we performed the analysis only for the 9,093 vulnerabilities that were manually analysed to ensure the correctness of the results. Figure 2 shows, for those vulnerabilities, which source discussed them first: either one of the vulnerability databases considered in this study, Twitter, or Twitter and at least one of the databases, simultaneously. There are also the

224

F. Alves et al.

cases where the vulnerability was not discussed on Twitter, which are predominant before 2010. Although we are not evaluating the whole databases, the figure shows a predominance of same day publishing cases (84.56% when considering 2006–2018, and 93.73% in 2010–2018). We consider that these results validate the hypothesis that Twitter is a timely source of vulnerability data. In the next section we present an in-depth study of the cases where the vulnerabilities were discussed on Twitter ahead of vulnerability databases. Mentioned

Not mentioned

1400

16000

1200 # Vulnerabilities

# Vulnerabilities

14000 12000 10000 8000 6000 4000

1000 800 600 400

2000

200

0

0

8 201 7 201 6 201 5 201 4 201 3 201 2 201 1 201 0 201 9 200 8 200 7 200 6 200

8 201 7 201 6 201 5 201 4 201 3 201 2 201 1 201 0 201 9 200 8 200 7 200 6 200

Year

Year

Fig. 1. Twitter’s CVE coverage.

6

Same date Twitter

Not discussed Vuln. Databases

18000

Fig. 2. A timeliness comparison between vulnerability databases and Twitter.

Early Vulnerability Alerts on Twitter

Of the 9,093 vulnerabilities analysed, 89 were referred by tweets before being published on at least one of the vulnerability databases considered. Even though these vulnerabilities represent a small percentage of the vulnerability sample under study (0.98%) we decided to characterize them to understand if searching for early alerts on Twitter is a worthy endeavour. The most mentioned vendors in the early alerts are the Ethereum blockchain (17 mentions), Microsoft products (5), Debian (5), Oracle (4), Linux (4), and Apple (4), while the most mentioned assets are Javascript (9), SSL/TLS (8), Xen Hypervisor (4), Safari Browser (3), Mercurial version control (3), and Cloud Foundry (3). These mentions provide evidence of the usefulness of these alerts, as both vendors and assets are some of the major players in their respective fields. All vulnerabilities with Twitter early alerts can be found online [1], together with their publishing dates on the vulnerability databases and some extra notes. In the following sections these early alerts are further analysed on their impact and usefulness. We conclude the section with a discussion of the significance of these results. 6.1

Timeliness

Twitter Versus Vulnerability Databases. Figure 3 presents the distribution of early alerts over the years considered. The number of early alerts increased

Follow the Blue Bird: A Study on Threat Data Published on Twitter

225

in the last two years, which matches the increase of vulnerabilities published on databases since the beginning of 2016. Concerning publishing timing, the majority of early alerts were available up to thirty days ahead of vulnerability databases (78.65%–70 cases). Notably, four early alerts appeared between 31 and 50 days ahead, four between 51 and 100 days ahead, nine between 101 and 200 days ahead, and finally, the two cases with the highest antecedence were 371 and 528 days ahead. The number of days Twitter is ahead of vulnerability databases increased continuously since 2008, but only after 2016 we found more than 10 cases. No relevant patterns were discovered in the early alerts due to the small number of occurrences. Twitter Versus Advisory Sites. Besides vulnerability databases and social media, advisory sites are an essential source of vulnerability information. Many companies use websites to announce software patches, along with which vulnerabilities are fixed. Therefore, we compare Twitter with advisory sites as these are specialized OSINT sources directly connected to the software vendors.

Fig. 3. The number of early alerts found per year.

We manually searched for advisory notices for each of the early alerts, obtaining only 33 advisories (approximately of early alerts). Table 4 presents the number of times either Twitter or the advisory was the first publisher, or when both published on the same day. The majority of early alerts are not paired with an advisory, but the tweets referring them contain links that describe these vulnerabilities. This observation reinforces the idea that Twitter is a useful cybersecurity discussion hub by connecting various knowledge resources in a single place.

226

F. Alves et al.

Table 4. The number of times Twitter, Table 5. The CVSS 2.0/3.0 impact of advisory site, or simultaneously both, were the early alert vulnerabilities. the first to publish an advisory notice. (a) CVSS 2.0. (b) CVSS 3.0.

1st publisher # %

CVSS 2.0 #

Twitter

11 12.36

Low

Same date

13 14.61 9 10.11

Advisory site No Advisory

%

CVSS 3.0 #

5

5.62

Medium

64

71.91

High

20

22.47

56 62.92

Low

0.00

Medium

16

17.98

High

37

41.57

Critical N/A

6.2

% 0

5

5.62

31

34.83

Vulnerability Impact

Although the existence of early alerts is relevant by itself, it is essential to assess the impact of the vulnerabilities. Table 5 presents how many early alerts have low, medium, high, and critical (CVSS 3.0 only) CVSS scores according to the CVSS 2.0 and 3.0 scoring systems. As the CVSS 3.0 was released in 2015, 31 early alerts are ranked only according to CVSS 2.0 (the N/A line in Table 5b). Almost all early alerts are ranked by CVSS 2.0 as having a medium or high impact (about 94%). When considering the CVSS 3.0, no alerts are ranked with low impact, and five are graded with a critical score. Despite the small number of early alerts, the CVSS score points out that these are relevant vulnerabilities and should not be disregarded. For example, CVE2016-7089 is a WatchGuard firewall vulnerability that allows privilege escalation via code injection. This vulnerability belongs to the set of issues disclosed by the “Shadow Brokers” [8], and has a public exploit on ExploitDB [23]. Table 6. The exploitation status of the early alerts. Exploitation status Twitter publishing At disclosure # % # % Exploited

6.3

21 23.60

22 24.72

PoC

11 12.36

11 12.36

No data

57 64.04

56 62.92

Vulnerabilities Exploited at Disclosure Time

A vulnerability only has an actual impact once it is exploited. Table 6 shows the exploitation status of the early alert vulnerabilities, both at the Twitter publishing and disclosure dates. The majority of vulnerabilities are not paired with observations of their exploits in the wild (64% or 57). A quarter of these cases (23.6% or 21) are known to be exploited. In a few cases (12% or 11), a PoC was referred by the vulnerability notice, describing how to exploit the

Follow the Blue Bird: A Study on Threat Data Published on Twitter

227

vulnerability. As it is impossible to know if that PoC was used, we categorised these separately from the cases where the exploitation was confirmed. We matched the early alerts with CVE-mentioning exploits present in ExploitDB [6] to complement the previous result. Only one case was found, published before the earliest vulnerability database and after the disclosing tweet. This information was used to update Table 6, adding the “At disclosure” column. Considering vulnerabilities known to have been exploited and those with a PoC, the total amounts to about 34%. Current studies estimate that the percentage of vulnerabilities that are exploited in the wild is 5.5% [38], meaning that these early alerts include many appealing targets for hackers. 6.4

Actionability

Perhaps even more important than knowing the impact or exploitation status of a vulnerability, is to avoid exploitation. This can be achieved by applying patches or configurations to protect the vulnerable system. Table 7 shows which vulnerability mitigation measures can be reached by following the hyperlinks found in early alert tweets. In almost 40% of the cases, the tweet includes a link pointing to a patch that solves the vulnerability. For another 40% of vulnerabilities there is no patch available (“None”), or that information is not clear or the topic is not discussed (“No data”). The unreadable case is due to a page not written in English, where some parts of the text were not clear even after translation. The N/A entries are due to dead links, which blocked the analysis. In the majority of cases (57%), the Table 7. The actionability provided early alerts provided some information on by the early alert tweets. how to protect the vulnerable systems from Action types # % exploitation, either by patch or configuraPatch 34 38.20 5 5.62 Configuration/patch tion. If the cases where we could not get 12 13.48 Configuration more information (the “No data” cases) pro23 25.84 None vided some solution, then the protection rate 13 14.61 No data would increase to more than 70%. There1 1.12 Unreadable 1 1.12 N/A fore, we conclude that besides impact and exploitation relevance, early alerts are also useful due to the actionability they enable, as they inform security practitioners of possible actions to protect their systems.

7

How Vulnerabilities are Discussed on Twitter

In this section we characterize some aspects of how vulnerabilities are discussed on Twitter. By identifying these aspects we provide guidelines for topic detection techniques oriented at capturing cybersecurity events. The following results are based on the analysis of the 9,093 vulnerabilities considered in this study.

228

7.1

F. Alves et al.

Duration and Number of Tweets

Figure 4 (left) presents the discussion duration. We observed that half of the vulnerabilities were discussed during up to eight days. However, it is interesting to see that the other half is middling spread across to up to 2,000 days. In some cases, the discussion can continue to up to almost 3,800 days. Discussion periods are extensive on some vulnerabilities mainly due to three different reasons: being used as comparative examples when discussing new events (e.g., CVE2014-0160, the “Heartbleed” bug); being (partly) reused on new attacks or as a part of a campaign, (e.g., CVE-2017-11882 [13]); as case studies, therefore being remembered by their impact, specificity, or technical details. Figure 4 (middle) presents how many tweets discuss the vulnerabilities. From the graph we are omitting two outliers: one vulnerability discussed by 7,749 tweets, and another discussed by 15,733. Half of these vulnerabilities are discussed by two to thirteen tweets. These results are not surprising considering that most vulnerabilities are uneventful. The large majority of vulnerabilities are described, patched, and forgotten. Also, only a small percentage of vulnerabilities is exploited in the wild [38], which are the ones more likely to attract more attention. Only 351 vulnerabilities were discussed by more than 50 tweets, showing that although this content is posted on social media, relatively few vulnerabilities attract attention. However, taking a closer look at those 351 vulnerabilities, 14 of them have low severity rating according to CVSS 2.0, 124 have medium severity, and 213 have high severity. Although it is not implied that vulnerabilities with medium and high severity are going to be widely discussed, these results indicate that those referred by more tweets tend to have higher severity ratings.

# Vulnerabilities

1000 100 10 1 0 0

1000

2000 3000 # Days

0 4000 1

0 1000 # Tweets

1

1000 # Tweets on peak day

Fig. 4. Vulnerability discussion analysis: vulnerability discussion duration in days on Twitter (left), the total number of tweets discussing a vulnerability (middle), and the peak number of tweets in a single day (right).

Figure 4 (right) presents the daily peak discussion, i.e., the maximum number of tweets discussing the vulnerability in a single day. Three-quarters of the vulnerabilities have daily peaks between one and ten tweets. This is an important factor for topic detection techniques, as most of these identify new trends based on detecting bursts of tweets discussing the same event [60,61].

Follow the Blue Bird: A Study on Threat Data Published on Twitter

7.2

229

Accounts

The tweets discussing security content used in this study were posted by 194,016 different accounts. We performed a quick analysis to understand if any account(s) stand out as sources of cybersecurity tweets. Out of the 194,016, only 5,863 of them published more that one relevant tweet, and only 228 posted more than 5. The highest tweet count for a single account is 73 tweets. Therefore, in this study, we did not find any best accounts to follow for cybersecurity content.

8

How to Find Timely Tweets

The results presented so far on this paper are a forensic analysis of the content posted on vulnerability databases and on Twitter. However, security practitioners are interested in capturing these posts live. Therefore, this section provides insights for data collection methods based on the aforementioned knowledge. Systems that collect threat intelligence are designed to detect relevant news items while discarding non-relevant ones. The various systems proposed in the literature vary in terms of the complexity of the data selection approach, and as it was infeasible to test all of them, we selected three approaches to test against our data. The first is a simple heuristic-based approach. A tweet is considered relevant if it mentions a software element and a threat word from the VERIS [22] or ENISA [22] cybersecurity taxonomies. The second one is equal to the first but also detects the word “CVE”. The third is a more sophisticated approach. We used a convolutional neural network-based approach (CNN) from a previous Twitter-based cyberthreat detection work developed by the authors [36,37]. The test simply measures if the approach correctly detects the target tweets. Table 8 shows the results of these approaches. We distinguish pre- and post2010 periods as the coverage differs significantly. The percentages on first four columns in the table are obtained against the labelled data used in Sects. 6 and 7, respectively. The last two columns are obtained by running the techniques mentioned above against Sect. 5. These are speculative results as the data is not labelled, but should transpose from the relatively large labelled data. Table 8. Percentage of correctly detected tweets according to the various datasets and methods. The header row includes the dataset size between brackets. Early

Early post Colloquial Colloquial post All CVES All CVES post

(89)

2010 (76)

(9101)

2010 (7923)

(94,398)

2010 (77409)

56.18% 63.16%

57.01%

63.98%

71.14%

70.99%

Heuristic + CVE 62.92% 71.05%

88.46%

99.24%

84.56%

93.73%

CNN

89.51%

87.76%

87.74%

87.59%

Heuristic

57.30% 45.68%

Using the simplest heuristic method is the worst method of the three except in one case. This means that the trivial approach works but lacks expressiveness regarding how vulnerabilities are discussed. Adding the word “CVE” to the

230

F. Alves et al.

detection mechanism enables it to detect tweets about already indexed vulnerabilities that do not follow the “software name and threat type” rule, drastically increasing the detection rates. Therefore, this is a suitable detection technique. Finally, the CNN presents rather poor results regarding detecting early alerts but otherwise is consistently close to 90% accuracy. Although sometimes the CNN has a lower accuracy rate than the heuristics, this test does not cover false positive rates, where the CNN is expected to largely outperform heuristics. We performed a follow-up analysis to the early alert tweets in an effort to understand the CNN results. We used BERT [35] to obtain semantic-rich feature vectors from the tweets, and cosine similarity [62] as a similarity measure. The tweets were grouped by similarity, where each tweet was grouped with its most similar peers, as long as each group had an average similarity rating above 0.8. In almost all cases, the CNN gave the same classification for all members of each group. By observing these groups we can assert that the CNN accurately classified as relevant tweets with a more cybersecurity-oriented speech (“New “Lucky Thirteen” attack on TLS CBC. . . ” or “Misfortune Cookie: The Hole in Your Internet Gateway. . . ”), while incorrectly classifying less structured tweets (“Only in the IT world can you say things like “header smuggling” (. . . ) or in regex “Did you escape the caret? ” or “The text for TBE-01-002 references TBE01-004 - which does not seem to be included in the report. Is that intentional? ”). As our CNN was trained mostly using tweets directly discussing cybersecurity, any tweets not conforming to the pattern are likely to be discarded. Thereby we conclude that a diverse training set for neural networks is required towards a complete detection coverage.

9

Summary of Findings

In the following, we summarize the findings associated with each of the research questions formulated in this study. RQ1: Is NVD the Richest and Timeliest Vulnerability Database? The NVD does not stand out as the most complete vulnerability database, as there are others that index more vulnerabilities. Also, NVD is not the most timely database. In fact, it was never the first database to publish a new vulnerability ahead of the others. However, NVD is known to have a strict publishing policy, allowing for consultation and comments from product vendors, which means that vulnerability publication may be delayed. RQ2: Does Twitter Provide a Rich and Timely Vulnerability Coverage? Since the beginning of 2010 Twitter provides a timely and rich coverage of known vulnerabilities. Moreover, there is a small subset of vulnerabilities (less than 1% of those we inspected) that are discussed on Twitter before their inclusion on vulnerability databases. Although these are very few cases, our analysis shows that they are relevant, impactful, and in many cases provide useful security recommendations. Overall, we consider Twitter as a useful cybersecurity news feed that should be taken into account by security practitioners.

Follow the Blue Bird: A Study on Threat Data Published on Twitter

231

RQ3: How are Vulnerabilities Discussed on Twitter? Vulnerability discussion on Twitter is carried out mostly in small bursts of two to thirteen tweets. Most vulnerabilities stopped being discussed within eight days, although tweets about them can appear for several years. Vulnerabilities discussed by a higherthan-usual volume of tweets (more than 50) tend to have higher impacts.

10

Insights for Practical Usage

Beyond the comparative analysis presented in this paper, there are a set of insights that we gathered while analysing the tweets collected. Below we present practical takeaways related to OSINT usage and its advantages. No Vulnerability Database Stands Out as the Best. NVD is an essential OSINT source, especially due to its thorough analysis and important link aggregation, but other reputable databases should be considered as a complement for four main reasons: Timeliness—NVD is not the most timely database (see Sect. 4); Actionability—NVD does not directly provide suggestions to mitigate or avoid vulnerabilities, unlike other databases (e.g., PacketStorm, Security Focus); Known exploits—NVD does not collect information about known exploits, unlike other databases (e.g., PacketStorm, Security Focus); Completeness—NVD is not the most complete database (see Table 1). Therefore security practitioners should use a database ensemble to collect security events. Twitter is Relevant. OSINT is provided by many reputable sources and should be taken seriously. Besides the significant research efforts (e.g., [26,40,41,43,59]), there are companies and tools dedicated to OSINT sharing and enrichment. Sects. 5 and 6 demonstrate that tweets provide timely, relevant, and useful cybersecurity news. Twitter is a Natural Data Aggregator. Another clear advantage of using Twitter to gather information is its natural data aggregation capability. The 89 early alerts mention 73 different products from 59 different vendors. When considering the 9,093 vulnerabilities analysed, these numbers extend to 1,153 products from 346 vendors. Forty-two CVEs are not indexed in the Common Platform Enumeration—a database of standard machine-readable names of IT products and platforms [2] used by CVE. Security Advisories May Not be Provided by Smaller Companies. The majority of vendors mentioned in the 89 early alerts do not provide an advisory site or news blog, while the vendors who provide advisories may not provide an API or a feed subscription. Since advisory sites link their content to Twitter at publication time, one can receive security updates by following the advisory accounts or by accessing Twitter’s stream API and applying appropriate filters. Twitter is Important but not Omniscient. We believe that a plausible trend for OSINT is to use Twitter as a front-end of the latest events. Since tweets have a relatively small size, messages tend to be concise, efficiently summarising the content of the associated links. This is one of Twitter’s characteristics that made

232

F. Alves et al.

it so popular: reading a set of tweets is much faster than inspecting a collection of web sites. Therefore, Twitter naturally provides an almost standardised summary, quick and straightforward to process, which is very attractive for Security Operation Centres. Tweets will not replace the current security publishing mechanisms in place. Once a security-related tweet is received, a visit to the associated site is practically mandatory to understand the issue at hand or to search for patches, among other relevant data. It is also arguable that a similar feed can be obtained by using an RSS feed. However, through Twitter, it is possible to monitor multiple accounts and to gather additional information not provided by RSS, such as timely breakthroughs or further discussion concerning the issue. Collecting OSINT is a Continuous Process. Another takeaway for security practitioners is that it is essential to follow news about all layers of the software stack by including keywords related to network protocols (e.g., SSL, HTTP) or purchased web services (e.g., cloud services, issue tracking services). This may seem obvious to the reader given this paper’s discussion. However, as part of a research work unrelated to this paper, when we asked security analysts of three industrial partners (nation-wide and global companies with dedicated Security Operations Centres) for keywords to describe their infrastructure (to guide our tweet collection), they did not include network protocols or hardware elements. Moreover, beyond receiving updates about selected assets, it is vital to obtain trending security news. It is hard to describe all relevant elements of a large company thoroughly, and maybe not all software in use is indexed or known. By extending the collection elements with (for example) topic detection techniques (e.g., [34,39,46,60]), one is more likely to cover all software in use. As Twitter can provide all these types of news and the research community has studied thoroughly topic detection on this platform, having trend detection might be mandatory for effective OSINT collection. Diverse Sources Complement Each Other. Finally, and to complement the previous insight, it is important to follow a diverse set of accounts to observe the broad universe of software vendors. The early alerts were posted by 53 different accounts (for 89 alerts), demonstrating that diversity of sources is crucial for awareness. Moreover, during this study we collected tweets posted by about 194,000 accounts, reinforcing the idea that Twitter is a cybsersecurity discussion hub. It is also important to discuss critical cases like the exploitation of CVE2017-0144, which became known as “wannacry”. The vulnerability was published on CVE/NVD and Microsoft’s security advisory, and patched a few months before the wannacry crisis. Therefore, by following Microsoft or CVE/NVD one would be aware of the issue and could avoid the ransomware. Once the vulnerability started being exploited, several online discussions suggested a set of configurations that blocked the exploit. Therefore, those that did not patch their systems (and for the Windows versions that were not patched by Microsoft) could benefit from OSINT once the attacks began. The wannacry ransomware generated a massive discussion on Twitter: describing the issue, how to avoid it, and informing about the kill switch that eventually disabled it.

Follow the Blue Bird: A Study on Threat Data Published on Twitter

11

233

Conclusions and Limitations

In this paper we provide an analysis of the richness of coverage of vulnerabilities and timeliness (in terms of reporting dates of vulnerabilities) of some of the most important OSINT sources, namely Twitter and several vulnerability databases. Our key findings are the following: no source could be considered clearly better than others and therefore diverse OSINT sources should be used as they complement each other; when considering only confirmed vulnerabilities, NVD should not be the unique vulnerability database subscribed; since 2010, Twitter provides an almost perfect vulnerability coverage; Twitter discusses vulnerabilities ahead of databases for very few cases (about 1% for the vulnerabilities examined), and is as timely as the vulnerability databases for the remaining cases; and finally, most of the vulnerabilities reported early on Twitter have a high or critical impact, with the tweet leading to usable mitigation measures. Beyond the collected facts, we provide a set of insights for the security practitioner interested in using OSINT for cybersecurity. These insight are based on our experience of manual inspection of almost one million tweets, and analysing many thousands of vulnerabilities. We believe this knowledge should be valuable for security analysis and research both in industry and academia. Limitations. The results presented in this paper are somewhat pessimistic in terms of the number of vulnerabilities that were found to have early alerts on Twitter. There could be more cases with media attention or early alerts that were not captured by our methodology since we cover a reduced amount of vulnerabilities. There is also the possibility of human error, as manual processing of tweets can lead to mistakes and missing some matches—all early alerts were triple checked to avoid false positives. Another factor that we cannot control (since we are performing a forensic analysis) is that some early alert tweets could have been deleted before this study, and thus not captured. Many dead links also invalidated possible matches, especially when the tweet links used some shortening system, such as “dlvr.it”, “hrbt.us”, “url4.eu”, “bit.ly”, or “ow.ly”.

Funding This work was supported by the H2020 European Project DiSIEM (H2020700692) and by the Funda¸c˜ao para a Ciˆencia e a Tecnologia (FCT) through project ThreatAdapt (FCT-FNR/0002/2018) and the LASIGE Research Unit (UIDB/00408/2020 and UIDP/00408/2020).

References 1. Additional paper data. https://github.com/fernandoblalves/Follow-the-BlueBird-Paper-Additional-Data. Accessed 10 Jul 2020 2. Common platform enumeration. https://cpe.mitre.org/about/. Accessed 15 Apr 2020

234

F. Alves et al.

3. Common vulnerabilities and exposures (cve). http://cve.mitre.org/. Accessed 15 Apr 2020 4. Common vulnerability scoring system version 3.0. https://www.first.org/cvss/v30/. Accessed 15 Apr 2020 5. Cvss v2 archive. https://www.first.org/cvss/v2/. Accessed 15 Apr 2020 6. Exploit database. www.exploit-db.com/. Accessed 15 Apr 2020 7. Get old tweets programatically. https://github.com/Jefferson-Henrique/ GetOldTweets-java. Accessed 15 Apr 2020 8. Hackers say they hacked nsa-linked group, want 1 million bitcoins to share more. https://www.vice.com/en us/article/ezpa9p/hackers-hack-nsa-linked-equationgroup. Accessed 15 Apr 2020 9. How people use Twitter in general. https://www.americanpressinstitute. org/publications/reports/survey-research/how-people-use-twitter-in-general/. Accessed 15 Apr 2020 10. How Twitter evolved from 2006 to 2011. https://buffer.com/resources/howtwitter-evolved-from-2006-to-2011. Accessed 15 Apr 2020 11. The mitre corporation. https://www.mitre.org/. Accessed 15 Apr 2020 12. National vulnerability database. https://nvd.nist.gov/. Accessed 15 Apr 2020 13. New targeted attack in the middle east by APT34, a suspected iranian threat group, using cve-2017-11882 exploit. https://www.fireeye.com/blog/threatresearch/2017/12/targeted-attack-in-middle-east-by-apt34.html. Accessed 15 Apr 2020 14. Operation aurora. https://en.wikipedia.org/wiki/Operation Aurora. Accessed 15 Apr 2020 15. Packet storm. https://packetstormsecurity.com/. Accessed 15 Apr 2020 16. The race between security professionals and adversaries. https://www. recordedfuture.com/vulnerability-disclosure-delay/. Accessed 15 Apr 2020 17. Security database. https://www.security-database.com/. Accessed 15 Apr 2020 18. Spiderfoot. https://www.spiderfoot.net/documentation/. Accessed 15 Apr 2020 19. Tweepy. https://www.tweepy.org/. Accessed 15 Apr 2020 20. Tweet attacks pro. http://www.tweetattackspro.com/. Accessed 15 Apr 2020 21. Twitter input plugin. https://www.elastic.co/guide/en/logstash/current/pluginsinputs-twitter.html. Accessed 15 Apr 2020 22. Veris taxonomy. http://veriscommunity.net/enums.html#section-incident desc. Accessed 13 Jun 2018 23. Watchguard firewalls - ‘escalateplowman’ ifconfig privilege escalation. https:// www.exploit-db.com/exploits/40270. Accessed 15 Apr 2020 24. Almukaynizi, M., Nunes, E., Dharaiya, K., Senguttuvan, M., Shakarian, J., Shakarian, P.: Proactive identification of exploits in the wild through vulnerability mentions online. In: 2017 CyCon US (2017) 25. Alsaedi, N., Burnap, P., Rana, O.: Can we predict a riot? Disruptive event detection using Twitter. ACM TOIT 17(2), 1–26 (2017) 26. Alves, F., Bettini, A., Ferreira, P.M., Bessani, A.: Processing tweets for cybersecurity threat awareness. Inf. Syst. 95, 101586 (2020) 27. Andongabo, A., Gashi, I.: vepRisk - a web based analysis tool for public security data. In: 13th EDCC (2017) 28. Arora, A., Krishnan, R., Nandkumar, A., Telang, R., Yang, Y.: Impact of vulnerability disclosure and patch availability-an empirical analysis. In: Third Workshop on the Economics of Information Security (2004)

Follow the Blue Bird: A Study on Threat Data Published on Twitter

235

29. Behzadan, V., Aguirre, C., Bose, A., Hsu, W.: Corpus and deep learning classifier for collection of cyber threat indicators in Twitter stream. In: IEEE Big Data 2018 (2018) 30. Bian, J., Topaloglu, U., Yu, F.: Towards large-scale Twitter mining for drug-related adverse events. In: Proceedings of the SHB (2012) 31. Bozorgi, M., Saul, L.K., Savage, S., Voelker, G.M.: Beyond heuristics: learning to classify vulnerabilities and predict exploits. In: 16th ACM SIGKDD (2010) 32. Bullough, B.L., Yanchenko, A.K., Smith, C.L., Zipkin, J.R.: Predicting exploitation of disclosed software vulnerabilities using open-source data. In: 3rd ACM IWSPA (2017) 33. Campiolo, R., Santos, L.A.F., Batista, D.M., Gerosa, M.A.: Evaluating the utilization of Twitter messages as a source of security alerts. In: SAC 13 (2013) 34. Cataldi, M., Di Caro, L., Schifanella, C.: Emerging topic detection on Twitter based on temporal and social terms evaluation. In: 10th MDM/KDD (2010) 35. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding (2018). arXiv:1810.04805 36. Dion´ısio, N., Alves, F., Ferreira, P.M., Bessani, A.: Cyberthreat detection from Twitter using deep neural networks. In: IJCNN 2019 (2019) 37. Dion´ısio, N., Alves, F., Ferreira, P.M., Bessani, A.: Towards end-to-end cyberthreat detection from Twitter using multi-task learning. In: IJCNN 2020 (2020) 38. Edkrantz, M., Truv´e, S., Said, A.: Predicting vulnerability exploits in the wild. In: 2nd IEEE CSCloud (2015) 39. Fedoryszak, M., Frederick, B., Rajaram, V., Zhong, C.: Real-time event detection on social data streams. In: 25th ACM SIGKDD - KDD ’9 (2019) 40. Le Sceller, Q., Karbab, E.B., Debbabi, M., Iqbal, F.: Sonar: automatic detection of cyber security events over the Twitter stream. In: 12th ARES (2017) 41. McNeil, N., Bridges, R.A., Iannacone, M.D., Czejdo, B., Perez, N., Goodall, J.R.: Pace: pattern accurate computationally efficient bootstrapping for timely discovery of cyber-security concepts. In: 12th ICMLA (2013) 42. McQueen, M.A., McQueen, T.A., Boyer, W.F., Chaffin, M.R.: Empirical estimates and observations of 0day vulnerabilities. In: 42nd HICSS (2009) 43. Mittal, S., Das, P.K., Mulwad, V., Joshi, A., Finin, T.: Cybertwitter: using Twitter to generate alerts for cybersecurity threats and vulnerabilities. In: 2016 ASONAM (2016) 44. Moholth, O.C., Juric, R., McClenaghan, K.M.: Detecting cyber security vulnerabilities through reactive programming. In: HICSS 2019 (2019) 45. Nayak, K., Marino, D., Efstathopoulos, P., Dumitra¸s, T.: Some vulnerabilities are different than others. In: 17th RAID (2014) 46. Petrovi´c, S., Osborne, M., Lavrenko, V.: Streaming first story detection with application to Twitter. In: 11th NAACL HLT (2010) 47. Reinthal, A., Filippakis, E.L., Almgren, M.: Data modelling for predicting exploits. In: NordSec (2018) 48. Ritter, A., Wright, E., Casey, W., Mitchell, T.: Weakly supervised extraction of computer security events from Twitter. In: 24th WWW (2015) 49. Rodriguez, L.G.A., Trazzi, J.S., Fossaluza, V., Campiolo, R., Batista, D.M.: Analysis of vulnerability disclosure delays from the national vulnerability database. In: WSCDC-SBRC 2018 (2018) 50. Roumani, Y., Nwankpa, J.K., Roumani, Y.F.: Time series modeling of vulnerabilities. Comput. Secur. 51, 32–40 (2015)

236

F. Alves et al.

51. Sabottke, C., Suciu, O., Dumitra¸s, T.: Vulnerability disclosure in the age of social media: exploiting Twitter for predicting real-world exploits. In: 24th USENIX Security Symposium (2015) 52. Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: 19th WWW (2010) 53. Sapienza, A., Bessi, A., Damodaran, S., Shakarian, P., Lerman, K., Ferrara, E.: Early warnings of cyber threats in online discussions. In: 2017 ICDMW (2017) 54. Sapienza, A., Ernala, S.K., Bessi, A., Lerman, K., Ferrara, E.: Discover: mining online chatter for emerging cyber threats. In: WWW’18 Companion (2018) 55. Sauerwein, C., Sillaber, C., Huber, M.M., Mussmann, A., Breu, R.: The tweet advantage: an empirical analysis of 0-day vulnerability information shared on Twitter. In: 33rd IFIP SEC (2018) 56. Shahzad, M., Shafiq, M.Z., Liu, A.X.: A large scale exploratory analysis of software vulnerability life cycles. In: 34th ICSE (2012) 57. Steele, R.D.: Open source intelligence: what is it? why is it important to the military. Am. Intell. J. 17(1), 35–41 (1996) 58. Syed, R., Rahafrooz, M., Keisler, J.M.: What it takes to get retweeted: an analysis of software vulnerability messages. Comput. Hum. Behav. 80, 207–215 (2018) 59. Trabelsi, S., Plate, H., Abida, A., Aoun, M.M.B., Zouaoui, A., et al.: Mining social networks for software vulnerabilities monitoring. In: 7th NTMS (2015) 60. Xie, W., Zhu, F., Jiang, J., Lim, E.P., Wang, K.: Topicsketch: real-time bursty topic detection from Twitter. IEEE TKDE 28(8), 2216–2229 (2016) 61. Yan, X., Guo, J., Lan, Y., Xu, J., Cheng, X.: A probabilistic model for bursty topic discovery in microblogs. In: 29th AAAI (2015) 62. Zaki, M.J., et al.: Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, Cambridge (2014)

Dynamic and Secure Memory Transformation in Userspace Robert Lyerly(B) , Xiaoguang Wang(B) , and Binoy Ravindran(B) Virginia Tech, Blacksburg, VA, USA {rlyerly,xiaoguang,binoy}@vt.edu

Abstract. Continuous code re-randomization has been proposed as a way to prevent advanced code reuse attacks. However, recent research shows the possibility of exploiting the runtime stack even when performing integrity checks or code re-randomization protections. Additionally, existing re-randomization frameworks do not achieve strong isolation, transparency and efficiency when securing the vulnerable application. In this paper we present Chameleon, a userspace framework for dynamic and secure application memory transformation. Chameleon is an out-ofband system, meaning it leverages standard userspace primitives to monitor and transform the target application memory from an entirely separate process. We present the design and implementation of Chameleon to dynamically re-randomize the application stack slot layout, defeating recent attacks on stack object exploitation. The evaluation shows Chameleon significantly raises the bar of stack object related attacks with only a 1.1% overhead when re-randomizing every 50 ms.

1

Introduction

Memory corruption is still one of the biggest threats to software security [43]. Attackers use memory corruption as a starting point to directly hijack program control flow [9,18,22], modify control data [24,25], or steal secrets in memory [20]. Recent works have shown that it is possible to exploit the stack even under new integrity protections designed to combat the latest attacks [23–25,31]. For example, position-independent return-oriented programming (PIROP) [23] leverages a user controlled sequence of function calls and un-erased stack memory left on the stack after returning from functions (e.g., return addresses, initialized local data) to construct a ROP payload. Data-oriented programming (DOP) also heavily relies on user-controlled stack objects to change the execution path in an attacker-intended way [24–26]. Both of these attacks defeat existing code re-randomization mechanisms, which continuously permute the locations of functions [50] or hide function locations to prevent memory disclosure vulnerabilities from constructing gadget chains [14]. In this work, we present Chameleon, a continuous stack randomization framework. Chameleon, like other continuous re-randomization frameworks, periodically permutes the application’s memory layout in order to prevent attackers c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 237–256, 2020. https://doi.org/10.1007/978-3-030-58951-6_12

238

R. Lyerly et al.

from using memory disclosure vulnerabilities to exfiltrate data or construct malicious payloads. Chameleon, however, focuses on randomizing the stack – it randomizes the layout of every function’s stack frame so that attackers cannot rely on the locations of stack data for attacks. In order to correctly reference local variables in the randomized stack layout, Chameleon also rewrites every function’s code, further disrupting code reuse attacks that expect certain instruction sequences (either aligned or unaligned). Chameleon periodically interrupts the application to rewrite the stack and inject new code. In this way, Chameleon can defeat attacks that rely on stack data locations such as PIROP or DOP. Chameleon is also novel in how it implements re-randomization. Existing works build complex runtimes into the application’s address space that add non-trivial performance overhead from code instrumentation [1,14,17,50]. Chameleon is instead an out-of-band framework that executes in userspace in an entirely separate process. Chameleon attaches to the application using standard OS interfaces for observation and re-randomization. This provides strong isolation between Chameleon and application – attackers cannot observe the re-randomization process (e.g., observe random number generator state, dump memory layout information) and Chameleon does not interact with any usercontrolled input. Additionally, cleanly separating Chameleon from the application allows much of the re-randomization process to proceed in parallel. This design adds minimal overhead, as Chameleon only blocks the application when switching between randomized stack layouts. Chameleon can efficiently rerandomize an application’s stack layout with randomization periods in the range of tens of milliseconds. In this paper, we make the following contributions: – We describe Chameleon, a system for continuously re-randomizing application stack layouts, – We detail Chameleon’s stack randomization process that relies on using compiler-generated function metadata and runtime binary reassembly, – We describe how Chameleon uses the standard ptrace and userfaultfd OS interfaces to efficiently transform the application’s stack and inject newlyrewritten code, – We evaluate the security benefits of Chameleon and report its performance overhead when randomizing code on benchmarks from the SPEC CPU 2017 [42] and NPB [4] benchmark suites. Chameleon’s out-of-band architecture allows it to randomize stack slot layout with only 1.1% overhead when changing the layout with a 50 millisecond period, – We describe how Chameleon disrupts a real-world attack against the popular nginx webserver The rest of this paper is organized as follows: in Sect. 2, we describe the background and the threat model. We then present the design and implementation of Chameleon in Sect. 3. We evaluate Chameleon security properties and performance in Sect. 4. We discuss related works in Sect. 5 and conclude the paper in Sect. 6.

Dynamic and Secure Memory Transformation in Userspace

239

Fig. 1. Position Independent ROP. (a) Stack massaging uses code pointers that remain on the stack after the function returns. (b) Code pointer and operand patching write part of the massaged stack memory with relative memory writes to construct the payload.

2

Background and Threat Model

Before describing Chameleon, we first describe how stack object related attacks target vulnerable applications, including detailing a recent presented positionindependent code reuse attack. We then define the threat model of these attacks. 2.1

Background

Traditional code reuse attacks rely on runtime application memory information to construct the malicious payload. Return-oriented programming (ROP) [36,40] chains and executes several short instruction sequences ending with ret instructions, called gadgets, to conduct Turing-complete computation. After carefully constructing the ROP payload of gadget pointers and data operands, the attacker then tricks the victim process into using the ROP payload as stack data. Once the ROP payload is triggered, gadget pointers are loaded into the program counter (which directs control flow to the gadgets) and the operand data is populated into registers to perform the intended operations (e.g., prepare parameters to issue an attacker-intended system call). Modern attacks, such as JIT-ROP [41], utilize a memory disclosure vulnerability to defeat coarse-grained randomization techniques such as ASLR [29] by dynamically discovering gadgets and constructing gadget payloads. Position-independent ROP (PIROP) [23] proposes a novel way to reuse existing pointers on the stack (e.g., function addresses) as well as relative code offsets to construct the ROP payload. PIROP constructs the ROP payload agnostic to the code’s absolute address. It leverages the fact that function call return addresses and local variables may remain on the stack even after the function returns, meaning the next function call may observe stack local variables and

240

R. Lyerly et al.

code pointers from the previous function call. By carefully controlling the application input, the attacker triggers specific call paths and constructs a stack with attacker-controlled code pointers and operand data. This stack construction procedure is called stack massaging (Fig. 1 (a)). The next step modifies some bits of the code pointers to make them point to the intended gadgets (Fig. 1 (b)). This is called code pointer and data operand patching. Since code pointers left from stack massaging point to code pages, it is possible to modify some bits of the pointer using relative memory writes to redirect it to a gadget on the same code page. Fundamentally, PIROP assumes function calls leave their stack slot contents on the stack even after the call returns. By using a temporal sequence of different function calls to write pointers to the stack, PIROP constructs a skeleton of the ROP payload. Very few existing defenses break this assumption. 2.2

Threat Model and Assumptions

The attacker communicates with the target application through typical I/O interfaces such as sockets, giving the attacker the ability to send arbitrary input data to the target. The attacker has the target application binary, thus they are aware of the relative addresses inside any 4 K memory page windows. The attacker can exploit a memory disclosure vulnerability to read arbitrary memory locations and can use PIROP to construct the ROP payload. The application is running using standard memory protection mechanisms such that no page has both write and execute permissions; this means the attacker cannot directly inject code but must instead rely on constructing gadget chains. However, the gadget chains crafted by the attacker can invoke system APIs such as mprotect to create such regions if needed. The attacker knows that the target is running under Chameleon’s control and therefore knows of its randomization capabilities. We assume the system software infrastructure (compiler, kernel) is trusted and therefore the capabilities provided by these systems are correct and sound.

3

Design

Chameleon continuously re-randomizes the code section and stack layout of an application in order to harden it against temporal stack object reuse (i.e. PIROP), stack control data smashing and stack object disclosures. As a result of running under Chameleon, gadget addresses or stack object locations that are leaked by memory disclosures and that help facilitate other attacks (temporal stack object reuse, payload construction) are only useful until the next randomization, after which the attacker must re-discover the new layout and locations of sensitive data. Chameleon continuously randomizes the application (hereafter called the target or child) quickly enough so that it becomes probabilistically impossible for attackers to construct and execute attacks against the target. Similarly to previous re-randomization frameworks [17,50], Chameleon is transparent – the target has no indication that it is being re-randomized. However, Chameleon’s architecture is different from existing frameworks in that it

Dynamic and Secure Memory Transformation in Userspace

241

executes outside the target application’s address space and attaches to the target using standard OS interfaces. This avoids the need for bootstrapping and running randomization machinery inside an application, which adds complexity and high overheads. Chameleon runs all randomization machinery in a separate process, which allows generating the next set of randomization information in parallel with normal target execution. This also strongly isolates Chameleon from the target in order to make it extremely difficult for attackers to observe the randomization process itself. These benefits make Chameleon easier to use and less intrusive versus existing re-randomization systems. 3.1

Requirements

Chameleon needs a description of each function’s stack layout, including location, size and alignment of each stack slot, so that it can randomize each stack slot’s location. Ideally, Chameleon would be able to determine every stack slot’s location, size and alignment by analyzing a function’s machine code. In reality, however, it is impossible to tell from the machine code whether adjacent stack memory locations are separate stack slots (which can be relocated independently) or multiple parts of a single stack slot that must be relocated together (e.g., a struct with several fields)1 . Therefore, Chameleon requires metadata from the compiler describing how it has laid out the stack. While DWARF debugging information [21] can provide some of the required information, it is best-effort and does not capture a complete view of execution state needed for transformation (e.g., unnamed values created during optimization). Instead, Chameleon builds upon existing work [5] that extends LLVM’s stack maps [30] to dump a complete view of function activations. The compiler instruments LLVM bitcode to track live values (stack objects, local variables) by adding stack maps at individual points inside the code. In the backend, stack maps force generation of a per-function record listing stack slot sizes, alignments and offsets. Stack maps also record locations of all live values at the location where the stack map was inserted. Chameleon uses each stack map to reconstruct the frame at that location. The modified LLVM extends stack maps to add extra semantic information for live values, particularly whether a live value is a pointer. This allows Chameleon to detect at runtime if the pointer references the stack, and if so, update the pointer to the stack slot’s randomized location. The metadata also describes each function’s location and size, which Chameleon uses to patch each function to match the randomized layout. All of the metadata is generated at compile time and is lowered into the binary. Chameleon also needs to rewrite stack slot references in code to point to their new locations and must transform existing execution state, namely stack memory and registers, to adhere to the new randomized layout. To switch between different randomized stack layouts (named randomization epochs), Chameleon must be able to pause the target, observe current target execution state, rewrite 1

This information could potentially be inferred heuristically, e.g., from a decompiler.

242

R. Lyerly et al.

the existing state to match the new layout and inject code matching the new layout. Chameleon uses two kernel interfaces, ptrace and userfaultfd, to monitor and transform the target. ptrace [48] is widely used by debuggers to inspect and control the execution of tracees. ptrace allows tracers (e.g., Chameleon) to read and modify tracee state (per-thread register sets, signal masks, virtual memory), intercept events such as signals and system calls, and forcibly interrupt tracee threads. userfaultfd [27] is a Linux kernel mechanism that allows delegating handling of page faults for a memory region to user-space. When accesses to a region of memory attached to a userfaultfd file descriptor cause a page fault, the kernel sends a request to a process reading from the descriptor. The process can then respond with the data for the page by writing a response to the file descriptor. These two interfaces together give Chameleon powerful and flexible process control tools that add minimal overhead to the target. 3.2

Re-Randomization Architecture

Chameleon uses the mechanisms described in Sect. 3.1 to transparently observe the target’s execution state and periodically interrupt the target to switch it to the next randomization epoch. In between randomization epochs, Chameleon executes in parallel with the target to generate the next set of randomized stack layouts and code. Figure 2 shows Chameleon’s system architecture. Users launch the target application by passing the command line arguments to Chameleon. After reading the code and state transformation metadata from the target’s binary, Chameleon forks the target application and attaches to it via ptrace and userfaultfd. From this point on, Chameleon enters a re-randomization loop. At the start of a new randomization cycle, a scrambler thread iterates through every function in the target’s code, randomizing the stack layout as described below. At some point, a re-randomization timer fires, triggering a switch to the next randomization epoch. When the re-randomization event fires, the event handler thread interrupts the target and switches the target to the next randomization epoch by dropping the existing code pages and transforming the target’s execution state (stack, registers) to the new randomized layout produced by the scrambler. After transformation, the event handler writes the execution state back into the target and resumes the child; it then blocks until the next re-randomization event. As the child begins executing, it triggers code page faults by fetching instructions from dropped code pages. A fault handler thread handles these page faults by serving the newly randomized code. In this way the entire re-randomization procedure is transparent to the target and incurs low overheads. We describe each part of the architecture in the following sections. Randomizing stack layouts. Chameleon randomizes function stack layouts by logically permuting stack slot locations and adding padding between the slots. Chameleon also transforms stack memory references in code to point to their randomized locations. When patching the code, Chameleon must work within the space of the code emitted by the compiler. If, for example, Chameleon wanted to change the size of code by inserting arbitrary instructions or changing the operand encoding of existing instructions, Chameleon would need to update all

Dynamic and Secure Memory Transformation in Userspace

243

Fig. 2. Chameleon system architecture. An event handler thread waits for events in a target thread (e.g., signals), interrupts the thread, and reads/writes the thread’s execution state (registers, stack) using ptrace. A scrambler thread concurrently prepares the next set of randomized code for the next re-randomization. A fault handler thread responds to page faults in the target by passing pages from the current code randomization to the kernel through userfaultfd.

code references affected by change in size (e.g., jumps between basic blocks, function calls/returns, etc.). Because finding and updating all code references is known to not be statically solvable [47], previous re-randomization works either leverage dynamic binary instrumentation (DBI) frameworks [7,17,32] or an indirection table [3,50] in order to allow arbitrary code instrumentation. There are problems with both approaches – the former often have large performance costs while the latter does not actually re-randomize the stack layout, instead opting to try and hide code pages from attackers. Chameleon instead applies stack layout randomization without changing the size of code to avoid these problems. In order to facilitate randomizing all elements of the stack, Chameleon modifies the compiler to (1) pad function prologues and epilogues with nop instructions that can be rewritten with other instructions and (2) force 4-byte immediate encodings for all memory operands2 . Chameleon both permutes the ordering and adds random amounts of padding between stack slots; the latter is configurable so users can control how much memory is used versus how much randomness is added between slots. Figure 3 shows how Chameleon randomizes the following stack elements: (1) Callee-saved registers: the compiler saves and restores callee-saved registers through push and pop instructions. Chameleon uses the nop padding emitted by the compiler to rewrite them as mov instructions, allowing the scrambler to place callee-saved registers at arbitrary locations on the stack. Chameleon also randomizes the locations of the return address and saved frame base pointer by inserting mov 2

x86-64 backends typically emit small immediate operands using a 1-byte encoding.

244

R. Lyerly et al.

Fig. 3. Stack slot randomization. Chameleon permutes the ordering and adds random amounts of padding between slots.

intructions in the function’s prologue and epilogue. (2) Local variables: compilers emit references to stack-allocated variables as offsets from the frame base pointer (FBP) or stack pointer (SP). Chameleon randomizes the locations of local variables by rewriting a variable’s offset to point to the randomized location. Chameleon does not currently randomize the locations of stack arguments for called functions as the locations are dictated by the ABI and would require rewriting both the caller and callee with a new parameter passing convention. We plan these transformations as future work. Serving code pages. Chameleon needs a mechanism to transparently and efficiently serve randomized code pages to the child. While Chameleon could use ptrace to directly write the randomized code into the address space of the child application, this would cause large delays when swapping between randomization epochs for applications with large code sections – Chameleon would have to bulk write the entire code section on every re-randomization. Instead Chameleon uses userfaultfd and page faults, which allows quicker switches between epochs by demand paging code into the target application’s address space. At startup, Chameleon attaches a userfaultfd file descriptor to the target’s code memory virtual memory area (VMA). userfaultfd descriptors can only be opened by the process that owns the memory region for which faults should be handled. Chameleon cannot directly open a userfaultfd descriptor for the target but can induce the target application to create a descriptor and pass it to Chameleon over a socket before the target begins normal execution. Chameleon uses compel [15], a library that facilitates implanting parasites into applications controlled via ptrace. Parasites are small binary blobs which

Dynamic and Secure Memory Transformation in Userspace

245

execute code in the context of the target application. For Chameleon, the parasite opens a userfaultfd descriptor and passes it to Chameleon through a socket. To execute the parasite, Chameleon takes a snapshot of the target process’ main thread (registers, signal masks). Then, it finds an executable region of memory in the target and writes the parasite code into the target. Because ptrace allows writing a thread’s registers, Chameleon redirects the target thread’s program counter to the parasite and begins execution. The parasite opens a control socket, initializes a userfaultfd descriptor, passes the descriptor to Chameleon, and exits at well-known location. Chameleon intercepts the thread at the exit point, restores the thread’s registers and signal mask to their original values and restores the code clobbered by the parasite. After receiving the userfaultfd descriptor, Chameleon must prepare the target’s code region for attaching (userfaultfd descriptors can only attach to anonymous VMAs [44]). Chameleon executes an mmap system call inside the target to remap the code section as anonymous and then registers the code section with the userfaultfd descriptor. The controller then starts the fault handler thread, which serves code pages through the userfaultfd descriptor from the scrambler thread’s code buffer as the target accesses unmapped pages. Switching Between Randomization Epochs. The event handler begins switching the target to the new set of randomized code when interrupted by the re-randomization alarm. The event handler interrupts the target, converts existing execution state (registers, stack memory) to the next randomization epoch, and drops existing code pages so the target can fetch fresh code pages on-demand. The event handler issues a ptrace interrupt to grab control of the target. At this point Chameleon needs to transform the target’s current stack to match the new stack layout. The compiler-emitted stackmaps only describe the complete stack layout at given points inside of a function, called transformation points. To switch to the next randomization epoch, Chameleon must advance the target to a transformation point. While the thread is interrupted, Chameleon uses ptrace to write trap instructions into the code at transformation points found during initial code disassembly and analysis3 . Chameleon then resumes the target thread and waits for it to reach the trap. When it executes the trap, the kernel interrupts the thread and Chameleon regains control. Chameleon then restores the original instructions and begins state transformation. Chameleon unwinds the stack using stackmaps similarly to other rerandomiz-ation systems [50]. During unwinding, however, Chameleon shuffles stack objects to their new randomized locations using information generated by the scrambler (Fig. 3). For each function, the scrambler creates a mapping between the original and randomized offsets of all stack slots. This mapping is used to move stack data from its current randomized location to the next randomized location. To access the target’s stack, Chameleon reads the target’s register values using ptrace and the target’s stack using /proc//mem4 . After reaching a transformation point, Chameleon reads the thread’s entire stack into a buffer, located using the target’s stack pointer. Chameleon passes the stack pointer, register set and buffer containing stack data to a stack transformation library to transform it to the new randomized layout. The final step in re-randomization is to map the new code into the target’s address space using userfaultfd. Chameleon executes an madvise system call with the MADV DONTNEED flag for the code section in the context of the target, which instructs the kernel to drop all existing code pages and cause fresh page faults upon subsequent execution. The fault handler begins serving page faults from the code buffer for the new randomization epoch and the target is released to continue normal execution. At this point the target is now executing under a new set of randomized code. The event handler thread signals the scrambler thread to begin generating the next randomization epoch. In this way, switching randomization epochs blocks the target only to transform the target thread’s stack and drop the existing code pages. The most expensive work of generating newly randomized code happens in parallel with the target application’s execution, highlighting one of the major benefits of cleanly separating re-randomization into a separate process from the target application. Multi-process Applications. Chameleon supports multi-process applications such as web servers that fork children for handling requests. When the target forks a new child, the kernel informs Chameleon of a fork event. The new process inherits ptrace status from the parent, meaning the event handler also has tracing privileges for the new child. At this point, the controller instantiates a new scrambler, fault handler and event handler for the new child. Chameleon hands off tracer privileges from the parent to the child event handler thread so the new handler can control the new child. In order to do this, Chameleon first redirects the new child to a blocking read on a socket through code installed via parasite. The original event handler thread then detaches from the new child, allowing the new event handler thread to become the tracer for the new child while it is blocked. After attaching, the new event handler restores the new child to the fork location and removes the parasite. In this way, Chameleon always maintains complete control of applications even when they fork new processes. 3.3

A Prototype of Chameleon

Chameleon is implemented in 6092 lines of C++ code, which includes the event handler, scrambler and fault handler. Chameleon extends code from an open source stack transformation framework [5] to generate transformation metadata and transform the stack at runtime. Chameleon uses DynamoRIO [7] disassemble and re-assemble the target’s machine code. Currently Chameleon supports x86-64. Chameleon’s use of ptrace prevents attaching other ptrace-based applications such as GDB. However, it is unlikely users will want to use both, as GDB is most useful during development and testing. However, Chameleon could be 4

This file allows tracers to seek to arbitrary addresses in the target’s address space to read/write ranges of memory.

Dynamic and Secure Memory Transformation in Userspace

247

extended to dump randomization information when the target crashes to allow debugging core dumps in debuggers.

4

Evaluation

In this section we evaluate Chameleon’s capabilities both in terms of security benefits and overheads: – What kinds of security benefits does Chameleon provide? In particular, how much randomization does it inject into stack frame layouts? This includes describing a real-world case study of how Chameleon defeats a web server attack. (Sect. 4.1) – How much overhead does Chameleon impose for these security guarantees, including how expensive are the individual components of Chameleon and how much overhead does it add to the total execution time? (Sect. 4.2) Experimental Setup. Chameleon was evaluated on an x86-64 server containing an Intel Xeon 2620v4 with a clock speed of 2.1 GHz (max boost clock of 3.0 GHz). The Xeon 2620v4 contains 8 physical cores and has 2 hardware threads per core for a total of 16 hardware threads. The server contains 32 GB of DDR4 RAM. Chameleon is run on Debian 8.11 “Jessie” using Linux kernel 4.9. Chameleon was configured to add a maximum padding of 1024 bytes between stack slots. Chameleon was evaluated using benchmarks from the SNU C version of the NPB benchmarks [4,38] and SPEC CPU 2017 [42]. Benchmarks were compiled with all optimizations (-O3) using the previously described compiler, built on clang/LLVM v3.7.1. The single-threaded version of NPB was used. 4.1

Security Analysis

We first analyze both the quality of Chameleon runtime re-randomization in the target and describe the security of the Chameleon framework itself. Because Chameleon, like other approaches [17,45,50], relies on layout randomization to disrupt attackers, it cannot make any guarantees that attacks will not succeed. There is always the possibility that the attacker is lucky and guesses the exact randomization (both stack layout and randomized code) and is able to construct a payload to exploit the application and force it into a malicious execution. However, with sufficient randomization, the probability that such an attack will succeed is so low as to be practically impossible. Target Randomization: Chameleon randomizes the target in two dimensions: randomizing the layout of stack elements and rewriting the code to match the randomized layout. We first evaluated how well randomizing the stack disrupts attacks that utilize known locations of stack elements. When quantifying the randomization quality of a given system, many works use entropy or the number of randomizable states as a measure of randomness. For Chameleon, entropy refers to the number of potential locations a stack element could be placed, i.e., the number of randomizable locations.

248

R. Lyerly et al.

Fig. 4. Average number of bits of entropy for a stack element across all functions within each binary. Bits of entropy quantify in how many possible locations a stack element may be post-randomization – for example, 2 bits of entropy mean the stack element could be in 22 = 4 possible locations with a 14 = 25% chance of guessing the location.

Figure 4 shows the average entropy created by Chameleon for each benchmark. For each application, the y-axis indicates the geometric mean of the number of bits of entropy across all stack slots in all functions. Chameleon provides a geometric mean 9.17 bits of entropy for SPEC and 9.03 bits for NPB. Functions with more stack elements have higher entropy as there are a larger number of permutations. SPEC’s benchmarks tend to have higher entropy because they have more stack slots. While an attacker may be able to guess the location of a single stack element with 9 bits of entropy (probability of 219 = 0.00195), the attacker must chain together knowledge of multiple stack locations to make a successful attack. For an attack that must corrupt three stack slots, the attacker has a 0.001953 = 7.45 ∗ 10−9 probability of correctly guessing the stack locations, therefore making successful attacks probabilistically impossible. It is also important to note that the amount of entropy can be increased arbitrarily by increasing the amount of padding between stack slots, which necessarily creates much larger stacks. We conclude that Chameleon makes it infeasible for attackers to guess stack locations needed in exploits. Next, we evaluated how Chameleon’s code patching disturbs gadget chains. Attackers construct malicious executions by chaining together gadgets that perform a very basic and low-level operation. Gadgets, and therefore gadget chains, are very frail – slight disruptions to a gadget’s behavior can disrupt the entire intended functionality of the chain. As part of the re-randomization process, Chameleon rewrites the application’s code to match the randomized stack layout. A side effect of this is that gadgets may be disrupted – Chameleon may overwrite part or all of a gadget, changing its functionality and disrupting the gadget chain. To analyze how many gadgets are disrupted, we searched for gadgets in the benchmark binaries and cross-referenced gadget addresses with instructions rewritten by Chameleon. We used Ropper5 , a gadget finder tool, to find all ROP gadgets (those that end in a return) and JOP gadgets (those that end in a call or jump) in the application binaries. We searched for gadgets of 6 instructions 5

https://github.com/sashs/Ropper.

Dynamic and Secure Memory Transformation in Userspace

249

Fig. 5. Percent of gadgets disrupted by Chameleon’s code randomization

or less, as longer gadgets become increasingly hard to use due to unintended gadget side effects (e.g., clobbering registers). Figure 5 shows the percent of gadgets disrupted as part of Chameleon’s stack randomization process. When searching the binary, Ropper may return singleinstruction gadgets that only perform control flow. We term these “trivial” gadgets and provide results with and without trivial gadgets. Chameleon disrupted a geometric mean of 55.81% gadgets or 76.32% of non-trivial gadgets. While Chameleon did not disrupt all gadgets, it disrupted enough that attackers will have a hard time chaining together functionality without having to use one of the gadgets altered by Chameleon. To better understand the attacker’s dilemma, previous work by Cheng et al. [12] mentions that the shortest useful ROP attack produced by the Q ROP compiler [37] consisted of 13 gadgets. Assuming gadgets are chosen with a uniform random possibility from the set of all available gadgets, attackers would have a probability of (1 − 0.5582)13 = 2.44 × 10−5 of being able to construct an unaltered gadget chain, or a (1 − 0.7632)13 = 7.36 × 10−9 probability if using non-trivial gadgets. Therefore, probabilistically speaking it is very unlikely that the attacker will be able to construct gadget chains that have not been altered by Chameleon. Defeating Real Attacks. To better understand how re-randomization can help protect target applications from attackers, we used Chameleon to disrupt a flaw found in a real application. Nginx [35] is a lightweight and fast open-source webserver used by a large number of popular websites. CVE-2013-2028 [16] is a vulnerability affecting nginx v1.3.9/1.4.0 in which a carefully crafted set of requests can lead to a stack buffer overflow. When parsing an HTTP request, the Nginx worker process allocates a page-sized buffer on the stack and calls recv to read the request body. By using a “chunked” transfer encoding and triggering a certain sequence of HTTP parse operations through specifically-sized messages, the attacker can underflow the size variable used in the recv operation on the stack buffer and allow the attacker to send an arbitrarily large payload. VNSecurity published a proof-of-concept attack [46] that uses this buffer overflow to build a ROP gadget chain that remaps a piece of memory with read, write and execute permissions. After creating a buffer for injecting code, the ROP chain copies instruction bytes from the payload to the buffer and “returns” to

250

R. Lyerly et al.

the payload by placing the address of the buffer as the final return address on the stack. The instructions in the buffer set up arguments and call the system syscall to spawn a shell on the server. The attacker can then remotely connect to the shell and gain privileged access to the machine. Chameleon randomizes both the stack buffer and the return address targeted by this attack. There are four stack slots in the associated function, meaning the vulnerable stack buffer can be in one of four locations in the final ordering. Using a maximum slot padding size of 1024, Chameleon will insert anywhere between 0 and 1024 bytes of padding between slots. The slot has an alignment restriction of 16, meaning there are 1024 16 = 64 possible amounts of padding that can be added between the vulnerable buffer and the preceding stack slot. Therefore, Chameleon can place the buffer at 4 ∗ 64 = 256 possible locations within the frame for 8 bits of entropy. Thus, an attacker has a probability of 218 = 0.0039 of guessing the correct buffer location. Additionally, the attacker must guess the location of the return address, which could be at 4∗ 1024 8 = 512 possible locations to initiate the attack, meaning the attacker will have probability of 7.62 ∗ 10−6 of correctly placing data to start the attack. Attacking Chameleon. We also analyzed how secure Chameleon is itself from attackers. Chameleon is most vulnerable when setting up the target as Chameleon communicates with the parasite over Unix domain sockets. However, these sockets are short lived, only available to local processes (not over the network) and only pass control flags and the userfaultfd file descriptor – Chameleon can easily validate the correctness of these messages. After the initial application setup, Chameleon only interacts with the outside world through ptrace and ioctl (for userfaultfd). The only avenue that attackers could potentially use to hijack Chameleon would be through corrupting state in the target binary/application which is then subsequently read during one of the rerandomization periods. Although it is conceivable that attackers could corrupt memory in such a way as to trigger a flaw in Chameleon, it is unlikely that they would be able to gain enough control to perform useful functionality; the most likely outcomes of such an attack are null pointer exceptions caused by Chameleon following erroneous pointers when transforming the target’s stack. Additionally, because Chameleon is a small codebase, it could potentially be instrumented with safeguards and even formally verified. This is a large benefit of Chameleon’s strong isolation – it is much simpler to verify its correctness. Thus, we argue that Chameleon’s system architecture is safe for enhancing the security of target applications. 4.2

Performance

We next evaluated the performance of target applications executing under Chameleon’s control. As mentioned in Sect. 3.2, Chameleon must perform a number of duties to continuously re-randomize applications. In particular, Chameleon runs a scrambler thread to generate a new set of randomized code, runs a fault handler to respond to code page faults with the current set of randomized code, and periodically switches the target application between randomization epochs.

Dynamic and Secure Memory Transformation in Userspace

251

Fig. 6. Overhead when running applications under Chameleon versus execution without Chameleon. Overheads rise with smaller re-randomization periods, but are negligible in most cases.

Figure 6 shows the slowdown of each benchmark when re-randomizing the application every 100 ms, 50 ms and 10 ms versus execution without Chameleon. More frequent randomizations makes it harder for attackers to discover and exploit the current target application’s layout at the cost of increased overhead. For SPEC, Chameleon re-randomizes target applications with a geometric mean 1.19% and 1.88% overhead with a 100 ms and 50 ms period, respectively. For NPB, the geometric means are 0.53% and 0.77%, respectively. Re-randomizing with a 10 ms period raises the overhead to 18.8% for SPEC and 4.18% for NPB. This is due to the time it takes the scrambler thread to randomize all stack layouts and rewrite the code to match – with a 10 ms period, the event handler thread must wait for the scrambler to finish generating the next randomization epoch before switching the target. With 100 ms and 50 ms periods, the scrambler’s code randomization latency is completely hidden.

Fig. 7. Time to switch the target between randomization epochs, including advancing to a transformation point, transforming the stack and dropping existing code pages.

We also analyzed how long it took Chameleon to switch between randomization epochs as described in Sect. 3.2. Figure 7 shows the average switching cost for each benchmark. Switching between randomization epochs is an inexpensive process. For both 100 ms and 50 ms periods, it takes a geometric mean of 335 µs

252

R. Lyerly et al.

for SPEC and 250 µs for NPB to perform the entire procedure transformation. For these two re-randomization periods, only deepsjeng and LU take longer than 600 µs. This is due to large on-stack variables (e.g., LU allocates a 400 KB stack buffer) that must be copied between randomized locations. Nevertheless, as a percentage of the re-randomization period, transformations are inexpensive: 0.2% of the 100 ms re-randomization period and 0.5% of the 50 ms period. We also measured page fault overhead of 5.06 µs per fault. While Chameleon causes page faults throughout the lifetime of the target to bring in new randomized pages, we measured that this usually added less than 0.1% overhead to applications. There are several performance outliers for the 10 ms transformation period. deepsjeng, nab and UA’s overheads increase drastically due to code randomization overhead. When the event handler thread receives a signal to start a rerandomization, it advances the target to a transformation point and blocks until the scrambler thread signals it has finished re-randomizing the code. Because these applications have higher code randomization costs, the event handler thread is blocked waiting for a significant amount of time. We conclude that Chameleon is able to inject significant amounts of entropy in target applications while adding minimal overheads.

5

Related Works

Stack object-based attacks were proposed a long time ago but are regaining popularity due to the recent data oriented attacks and position-independent code reuse attacks [23–26]. Traditional “stack smashing” attacks overflow the stack local buffer and modify the return address on the stack so that upon returning from the vulnerable function, the application jumps to the malicious payload [2]. There are a number of techniques proposed to prevent the return address from being corrupted, such as stack canaries and shadow stacks [8,13,49]. Stack canaries place a random value in between the function return address and the stack local buffer and re-checks the value before function returns. The program executes the warning code and terminates if the canary value is changed [13]. Shadow stacks further enforce backward control flow integrity by storing the function return values in a separate space [8,49]. Both approaches focus on protecting direct control data on stack without protecting other stack objects. Recent works have shown that stack objects other than function return addresses could also be used to generate exploits. G¨oktas et al. proposed using function return addresses and the initialized data that function calls left on the stack to construct position-independent ROP payloads. This way of legally using function calls to construct the malicious payload on the stack is named “stack massaging” [23]. Similarly, attackers can also manipulate other non-control data on the stack to fully control the target. Hu et al. proposed a general approach to automatically synthesize such data-oriented attacks, named data-oriented programming (DOP) [24,25]. They used the fact that non-control data corruption could potentially be used to modify the program’s control flow and implement

Dynamic and Secure Memory Transformation in Userspace

253

memory loads and stores. By using a gadget dispatcher (normally a loop and a selector), the attacker could keep the program executing data-oriented gadgets. Note that both of these attacks leverage non-control data on the stack, bypassing existing control flow integrity checks. Strict boundary checking could be a solution to preventing memory exploits. Such boundary checking could be either software-based [28,33,39] or hardwarebased [19,34]. For example, Intel MPX introduces new bounds registers and an instruction set for boundary checking [34]. Besides the relatively large performance overhead introduced by strict boundary checks, the integrity-based approaches cannot defeat the stack object manipulation caused by temporal function calls [23]. StackArmor statically instruments the binary and randomly allocates discontinuous stack pages [10]. Although StackArmor can break the linear stack address space into discrete pages, the function call locality allows position-independent code reuse to succeed within a stack page size [23]. Timely code randomization breaks the constant locations used in the program layout, making it hard for attackers to reuse existing code to chain gadgets [6,11,50]. However, these approaches transform the code layout but not the stack slot layout, giving attackers the ability to exploit stack objects. Chameleon is designed to disrupt these kinds of attacks by continuously randomizing both the stack layout and code. By changing the stack layout, Chameleon makes it more difficult for attackers to corrupt specific stack elements.

6

Conclusion

We have presented the design, implementation and evaluation of Chameleon, a practical system for continuous stack re-randomization. Chameleon continually generates randomized stack layouts for all functions in the application, rewriting each function’s code to match. Chameleon periodically interrupts the target to rewrite its existing execution state to a new randomized stack layout and injects matching code. Chameleon controls target applications from a separate address space using the widely available ptrace and userfaultfd kernel primitives, maintaining strong isolation between Chameleon and the target. The evaluation showed that Chameleon’s lightweight user-level page fault handling and code transformation significantly raises the bar for stack exploitation with minimal overhead to target application. The source code of Chameleon is publicly available as part of the Popcorn Linux project at http://popcornlinux.org. Acknowledgments. This work is supported in part by the US Office of Naval Research (ONR) under grants N00014-18-1-2022 and N00014-16-1-2711, and by NAVSEA/NEEC under grant N00174-16-C-0018.

References 1. Aga, M.T., Austin, T.: Smokestack: thwarting DOP attacks with runtime stack layout randomization. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 26–36. IEEE (2019)

254

R. Lyerly et al.

2. Aleph, O.: Smashing the stack for fun and profit (1996). http://www.shmoo.com/ phrack/Phrack49/p49-14 3. Backes, M., N¨ urnberger, S.: Oxymoron: making fine-grained memory randomization practical by allowing code sharing. In: Proceedings of the 23rd USENIX Security Symposium, pp. 433–447 (2014) 4. Bailey, D.H., et al.: The NAS parallel benchmarks summary and preliminary results. In Supercomputing 1991: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, pp. 158–165. IEEE (1991) 5. Barbalace, A., et al.: Breaking the boundaries in heterogeneous-ISA datacenters. In: ACM SIGPLAN Notices, vol. 52, pp. 645–659. ACM (2017) 6. Bigelow, D., Hobson, T., Rudd, R., Streilein, W., Okhravi, H.: Timely rerandomization for mitigating memory disclosures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 268–279. ACM (2015) 7. Bruening, D.: Efficient, transparent, and comprehensive runtime code manipulation. Ph.D thesis, Massachusetts Institute of Technology, September 2004 8. Burow, N., Zhang, X., Payer, M.: Shining light on shadow stacks (2018). arXiv preprint arXiv:1811.03165 9. Carlini, N., Barresi, A., Payer, M., Wagner, D., Gross, T.R.: Control-flow bending: on the effectiveness of control-flow integrity. In: 24th USENIX Security Symposium (USENIX Security 15), pp. 161–176 (2015) 10. Chen, X., Slowinska, A., Andriesse, D., Bos, H., Giuffrida, C.: Stackarmor: comprehensive protection from stack-based memory error vulnerabilities for binaries. In: NDSS. Citeseer (2015) 11. Chen, Y., Wang, Z., Whalley, D., Lu, L.: Remix: on-demand live randomization. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 50–61. ACM (2016) 12. Cheng, Y., Zhou, Z., Miao, Y., Ding, X., Deng, R.H.: ROPecker: a generic and practical approach for defending against ROP attacks. In: Symposium on Network and Distributed System Security (NDSS) (2014) 13. Cowan, C., et al.: StackGuard: automatic adaptive detection and prevention of buffer-overflow attacks. In: Proceedings of the 7th USENIX Security Symposium, August 1998 14. Crane, S., et al.: Readactor: practical code randomization resilient to memory disclosure. In: 36th IEEE Symposium on Security and Privacy (Oakland), May 2015 15. CRIU. CRIU Compel. https://criu.org/Compel. Accessed 14 Apr 2019 16. CVE-2013-2028. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-20132028. Accessed 14 Apr 2019 17. Davi, L., Liebchen, C., Sadeghi, A.R., Snow, K.Z., Monrose, F.: Isomeron: code randomization resilient to (just-in-time) return-oriented programming. In: Proceedings of the 22nd Network and Distributed Systems Security Symposium (NDSS) (2015) 18. Davi, L., Sadeghi, A.R., Lehmann, D., Monrose, F.: Stitching the gadgets: on the ineffectiveness of coarse-grained control-flow integrity protection. In: Proceedings of the 23rd USENIX Conference on Security, SEC 2014 (2014) 19. Devietti, J., Blundell, C., Martin, M.M.K., Zdancewic, S.: Hardbound: architectural support for spatial safety of the C programming language. In: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (2008) 20. Durumeric, Z., et al.: The matter of heartbleed. In: Proceedings of the 2014 Conference on Internet Measurement Conference, pp. 475–488. ACM (2014)

Dynamic and Secure Memory Transformation in Userspace

255

21. DWARF Standards Committee. The DWARF Debugging Standard, February 2017 22. G¨ oktas, E., Athanasopoulos, E., Bos, H., Portokalidis, G.: Out of control: overcoming control-flow integrity. In: Proceedings of the 2014 IEEE Symposium on Security and Privacy SP 2014 (2014) 23. G¨ oktas, E., et al.: Position-independent code reuse: On the effectiveness of ASLR in the absence of information disclosure. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 227–242. IEEE (2018) 24. Hu, H., Chua, Z.L., Adrian, S., Saxena, P., Liang, Z.: Automatic generation of data-oriented exploits. In: 24th USENIX Security Symposium (USENIX Security 15), pp. 177–192 (2015) 25. Hu, H., Shinde, S., Adrian, S., Chua, Z.L., Saxena, P., Liang, Z.: Data-oriented programming: on the expressiveness of non-control data attacks. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 969–986. IEEE (2016) 26. Ispoglou, K.K., AlBassam, B., Jaeger, T., Payer, M.: Block oriented programming: automating data-only attacks. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 1868–1882. ACM (2018) 27. kernel.org. Userfaultfd. https://www.kernel.org/doc/Documentation/vm/userfaul tfd.txt. Accessed 14 Apr 2019 28. Kroes, T., Koning, K., van der Kouwe, E., Bos, H., Giuffrida, C.: Delta pointers: buffer overflow checks without the checks. In: Proceedings of the Thirteenth EuroSys Conference, p. 22. ACM (2018) 29. Linux Kernel Address Space Layout Randomization. http://lwn.net/Articles/ 569635/. Accessed 14 Apr 2019 30. LLVM Compiler Infrastructure. Stack maps and patch points in LLVM. https:// llvm.org/docs/StackMaps.html. Accessed 14 Apr 2019 31. Lu, K., Walter, M.T., Pfaff, D., N¨ umberger, S., Lee, W., Backes, M.: Unleashing use-before-initialization vulnerabilities in the linux kernel using targeted stack spraying. In: NDSS (2017) 32. Luk, C.K., et al.: Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol. 40, pp. 190–200. ACM (2005) 33. Nagarakatte, S., Zhao, J., Martin, M.M.K., Zdancewic, S.: SoftBound: highly compatible and complete spatial memory safety for C. In: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009 (2009) 34. Oleksenko, O., Kuvaiskii, D., Bhatotia, P., Felber, P., Fetzer, C.: Intel MPX explained: a cross-layer analysis of the intel MPX system stack. Proc. ACM Measur. Anal. Comput. Syst. 2(2), 28 (2018) 35. Reese, W.: Nginx: the high-performance web server and reverse proxy. Linux J. 2008(173), 2 (2008) 36. Roemer, R., Buchanan, E., Shacham, H., Savage, S.: Return-oriented programming: systems, languages, and applications. ACM Trans. Inf. Syst. Secur. (TISSEC) 15(1), 2 (2012) 37. Schwartz, E.J., Avgerinos, T., Brumley, D.: Q: exploit hardening made easy. In: USENIX Security Symposium, pp. 25–41 (2011) 38. Seo, S., Jo, G., Lee, J.: Performance characterization of the NAS parallel benchmarks in openCL. In: 2011 IEEE international symposium on workload characterization (IISWC), pp. 137–148. IEEE (2011) 39. Serebryany, K., Bruening, D., Potapenko, A., Vyukov, D.: AddressSanitizer: a fast address sanity checker. In: Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), pp. 309–318 (2012)

256

R. Lyerly et al.

40. Shacham, H.: The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86). In: Proceedings of the 14th ACM Conference on Computer and Communications Security, October 2007 41. Snow, K.Z., Monrose, F., Davi, L., Dmitrienko, A., Liebchen, C., Sadeghi, A.R.: Just-in-time Code Reuse: on the effectiveness of fine-grained address space layout randomization. In: 2013 IEEE Symposium on Security and Privacy (SP), pp. 574– 588. IEEE (2013) 42. Standard Performance Evaluation Corporation. SPEC CPU 2017. https://www. spec.org/cpu2017. Accessed 14 Apr 2019 43. Szekeres, L., Payer, M., Wei, T., Song, D.: Sok: eternal war in memory. In: 2013 IEEE Symposium on Security and Privacy (SP), pp. 48–62. IEEE (2013) 44. The Linux man-pages project. mmap(2) - Linux manual page, April 2020. http:// man7.org/linux/man-pages/man2/mmap.2.html 45. Venkat, A., Shamasunder, S., Shacham, H., Tullsen, D.M.: Hipstr: heterogeneousISA program state relocation. In: ACM SIGARCH Computer Architecture News, vol. 44, pp. 727–741. ACM (2016) 46. Analysis of nginx 1.3.9/1.4.0 stack buffer overflow and x64 exploitation (CVE2013-2028). https://www.vnsecurity.net/research/2013/05/21/analysis-of-nginxcve-2013-2028.html. Accessed 14 Apr 2019 47. Wang, R., et al.: Ramblr: making reassembly great again. In: Proceedings of the 2017 Network and Distributed System Security Symposium (2017) 48. Wikipedia. Ptrace. http://en.wikipedia.org/wiki/Ptrace. Accessed 14 Apr 2019 49. Wikipedia. Shadow stack. https://en.wikipedia.org/wiki/Shadow stack. Accessed 14 Apr 2019 50. Williams-King, D., et al.: Shuffler: fast and deployable continuous code rerandomization. In: OSDI, pp. 367–382 (2016)

Understanding the Security Risks of Docker Hub Peiyu Liu1 , Shouling Ji1(B) , Lirong Fu1 , Kangjie Lu2 , Xuhong Zhang1 , Wei-Han Lee3 , Tao Lu1 , Wenzhi Chen1(B) , and Raheem Beyah4 1 Zhejiang University, Hangzhou, China {liupeiyu,sji,fulirong007,lutao,chenwz}@zju.edu.cn, [email protected] 2 University of Minnesota Twin Cities, Minneapolis, USA [email protected] 3 IBM Research, Yorktown Heights, USA [email protected] 4 Georgia Institute of Technology, Atlanta, USA [email protected]

Abstract. Docker has become increasingly popular because it provides efficient containers that are directly run by the host kernel. Docker Hub is one of the most popular Docker image repositories. Millions of images have been downloaded from Docker Hub billions of times. However, in the past several years, a number of high-profile attacks that exploit this key channel of image distribution have been reported. It is still unclear what security risks the new ecosystem brings. In this paper, we reveal, characterize, and understand the security issues with Docker Hub by performing the first large-scale analysis. First, we uncover multiple securitycritical aspects of Docker images with an empirical but comprehensive analysis, covering sensitive parameters in run-commands, the executed programs in Docker images, and vulnerabilities in contained software. Second, we conduct a large-scale and in-depth security analysis against Docker images. We collect 2,227,244 Docker images and the associated meta-information from Docker Hub. This dataset enables us to discover many insightful findings. (1) run-commands with sensitive parameters expose disastrous harm to users and the host, such as the leakage of host files and display, and denial-of-service attacks to the host. (2) We uncover 42 malicious images that can cause attacks such as remote code execution and malicious cryptomining. (3) Vulnerability patching of software in Docker images is significantly delayed or even ignored. We believe that our measurement and analysis serves as an important first-step study on the security issues with Docker Hub, which calls for future efforts on the protection of the new Docker ecosystem.

1

Introduction

Docker has become more and more popular because it automates the deployment of applications inside containers by launching Docker images. Docker Hub, one c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 257–276, 2020. https://doi.org/10.1007/978-3-030-58951-6_13

258

P. Liu et al.

of the most popular Docker image registries, provides a centralized market for users to obtain Docker images released by developers [4,31,35]. Docker has been widely used in many security-critical tasks. For instance, Solita uses Docker to handle the various applications and systems associated with their management of the Finnish National Railway Service [9]. Amazon ECS allows users to easily run applications on a managed cluster of Amazon EC2 instances in Docker containers [1]. In addition, millions of Docker images have been downloaded from Docker Hub for billion times by users for data management, website deployment, and other personal or business tasks. The popularity of Docker Hub however brings many high-profile attacks. For instance, on June 13, 2018, a research institute reported that seventeen malicious Docker images on Docker Hub earned cryptomining criminals $90,000 in 30 days [8]. These images have been downloaded collectively for 5 million times in the past year. The report also explained the danger of utilizing unchecked images on Docker Hub: “For ordinary users, just pulling a Docker image from the Docker Hub is like pulling arbitrary binary data from somewhere, executing it, and hoping for the best without really knowing what’s in it”. Therefore, a comprehensive and in-depth security study of Docker Hub is demanded to help users understand the potential security risks. The study of Docker Hub differs from the ones of other ecosystems such as App store and virtual-machine image repositories [15,20,28,30,32] in the following aspects. (1) Docker images are started through run-commands. They are executed through special instructions called run-commands which are securitycritical to the created containers. (2) The structures of Docker images are more complex than traditional applications. A single image may contain a large number of programs, environment variables, and configuration files; it is hard for a traditional analysis to scale to scan all images. (3) Docker images can bring new risks to not only the container itself but also the host because the lightweight virtualization technology leveraged in containers allows the sharing of the kernel. (4) The vulnerability-patching process of Docker images is significantly delayed because the programs are decoupled from the mainstream ones, and developers are less incentivized to update programs in Docker images. All the aforementioned differences require a new study for the security of Docker Hub. The unique characteristics of Docker Hub call for an urgent study of its new security issues. However, a comprehensive and in-depth study entails overcoming multiple challenges. (1) It is not clear how to analyze the security impacts of various categories of information on Docker Hub. For example, run-commands are security-critical to Docker containers while a method for measuring the security impacts of run-commands is still missing. This requires significant empirical analysis and manual effort. (2) Obtaining and analyzing Docker images and the associated meta-information in a scalable manner is non-trivial. For example, it is difficult to perform a uniform analysis on Docker images, since a Docker image contains a large number of files in a broad range of types (e.g., ELF, JAR, and Shell Scripts).

Understanding the Security Risks of Docker Hub

259

In this paper, we perform the first security analysis against Docker Hub. Based on the unique characteristics of Docker Hub, we first empirically identify three major security risks, namely sensitive parameters in run-commands, malicious docker images, and unpatched vulnerabilities. We then conduct a largescale and in-depth study against the three security risks. (1) Run-commands. We carefully analyze the parameters in run-commands to discover sensitive parameters that may pose threats to users. Moreover, we develop multiple new attacks (e.g., obtaining user files in the host and the host display) in Docker images to demonstrate the security risks of sensitive parameters in practice. We also conduct a user study to show that users, in general, are unaware of the risks from sensitive parameters. (2) Malicious executed programs. To study malicious Docker images efficiently, we narrow down our analysis to only the executed programs. We implement a framework to automatically locate, collect, and analyze executed programs. By leveraging this framework, we scan more than 20,000 Docker images to discover malicious executed programs. (3) CVE-assigned vulnerabilities. We provide a definition of the life cycle of vulnerability in Docker images and manually analyze the length of the time window of vulnerabilities in Docker images. To enable the analysis, we collect a large number of images and their meta-information from Docker Hub. Our collected dataset contains all the public information of 2,227,244 images from 975,858 repositories on Docker Hub. The comprehensive analysis enables us to have multiple insightful findings. First of all, we find that the run-commands with sensitive parameters presented in Docker Hub may introduce serious security risks, including the suffering of denial-of-service attack and the leakage of user files in the host and the host display. Moreover, we observe that each recommended run-command in the repository description contains one sensitive parameter on average. Unfortunately, our user study reveals that users are not aware of the threats from sensitive parameters—they will directly execute run-commands specified by developers without checking and understanding them. Second, our analysis shows that malicious images are hidden among common ones on Docker Hub. Using our analysis framework, we have discovered 42 malicious images. The malicious behaviors include remote execution control and malicious cryptomining. Finally, we observe that the vulnerability patching for the software in Docker images is significantly delayed or even ignored. In particular, almost all the images on Docker Hub suffer from unpatched software vulnerabilities. In extreme cases, a single image may contain up to 7,500 vulnerabilities. In addition, vulnerabilities in the software of Docker images tend to have a much longer life cycle due to the lack of image updates. More critically, we find that the in-Docker vulnerabilities can even cause harms to the host machine through Docker, e.g., crashing the host. Our analysis and findings reveal that the Docker ecosystem brings new security threats to users, contained software, and the host machine as well. To mitigate these threats, we suggest multiple potential solutions (see Sect. 7) such as automatically fixing vulnerabilities in images, detecting malicious images in Docker Hub, etc. We have reported all the security issues uncovered in this paper to Docker Hub and they are investigating to confirm these issues.

260

P. Liu et al.

In summary, our work makes the following contributions. – We empirically identify three major sources of security risks in Docker Hub, namely sensitive parameters in run-commands, malicious docker images, and unpatched vulnerabilities. We then conduct a large-scale and in-depth study against the three security risks based on all the public information of 2,227,244 images collected from 975,858 repositories on Docker Hub. We have opensourced this dataset to support reproducibility and motivate future work in Docker security analysis [13]. – We uncover many new security risks on Docker Hub. 1) Sensitive parameters in run-commands can expose disastrous harm to users and the host, such as the leakage of host files and display, and denial-of-service attacks to the host. 2) We uncover 42 malicious images that can cause attacks such as remote code execution and malicious cryptomining. 3) Vulnerability patching of software in Docker images is significantly delayed by 422 days on average. – Our analysis calls for attention to the security threats posed by the new Docker ecosystem. The threats should be addressed collectively by Docker image registry platforms, image developers, users, and researchers. Moreover, we have reported all the security issues uncovered in this paper to Docker Hub and suggest multiple mitigation approaches.

2

Background and Threat Model

In this section, we provide a brief introduction of Docker Hub and its critical risk resources. Then, we describe the threat model of our security analysis. 2.1

Critical Risk Sources in Docker Hub

Docker Hub is the world’s largest registry of container images [5]. Images on Docker Hub are organized into repositories, which can be divided into official repositories and community repositories. For each image in a Docker Hub repository, besides the image itself, meta-information is also available to the users, such as repository description and history, Dockerfile [5], the number of stars and pulls of each repository, and developer information. To perform a risk analysis against Docker Hub, we first need to identify potential risk sources. We empirically identify risk sources based on which major components control the behaviors of a Docker image. Run-Command and Sensitive Parameter. In order to run a Docker container, users need to execute an instruction called run-command. A runcommand mainly specifies the image and parameters used to start a container. For instance, a developer may specify a recommended run-command on Docker Hub, such as “Start container with: docker run --name flaviostutz-opencv2 -privileged -p 2222:22 flaviostutz/opencv-x86”. For users who have never used the image before, the recommended run-commands can be helpful for deploying their containers. However, it is unclear to what extent users should trust

Understanding the Security Risks of Docker Hub

261

the run-commands posted by the developers, who can publish run-commands without any obstruction because Docker Hub does not screen these content. In addition, a run-command may contain a variety of parameters that can affect the behavior of the container [6]. Some of these parameters are sensitive since they control the degree of isolation of networks, storage, or other underlying subsystems between a container and its host machine or other containers. For example, when users run an image with the parameter of --privileged, the container will get the root access to the host. Clearly, the misuse of run-commands containing sensitive parameters may lead to disastrous consequences on the container as well as the host (see Sect. 4). Executed Programs. Previous work already shows that a large amount of software in Docker images is redundant [31]. Hence, when analyzing the content of a Docker image, we should focus on the executed programs that are bound up with the security of the image. Based on our empirical analysis, we find that the entry-file (an executable file in Docker images, specified by a configure file or runcommands) is always the first software triggered when a container starts. Besides, the entry-file can automatically trigger other files during execution. Therefore, the executed programs (the entry-file and subsequently triggered files) are key factors that directly affect the safety of a container. Furthermore, in general, it is less common for users to run software other than executed programs [4]. Therefore, in the current study, we choose to analyze executed programs to check malicious images for measurement purposes. Vulnerabilities in Contained Software. A Docker image is composed of a large number of software packages, vulnerabilities in these software packages bring critical security risks for the following reasons. Vulnerabilities can be exploited by attackers to cause security impacts such as data leakage. Additionally, Docker software programs are often duplicated from original ones, and Docker developers lack incentives to timely fix vulnerabilities in the duplicated programs. As a result, the security risks with vulnerabilities are elevated in Docker because vulnerabilities take a much longer time to be fixed in Docker images. 2.2

Threat Model

Normal Image

Malicious

Command Malicious As shown in Fig. 1, there are two Image Sensitive Command different categories of threats Attacker Vulnerable Image that Docker Hub may face. (1) Vulnerable images. Developers DockerHub upload their images and the associated meta-information to Docker Hub, which may contain vulnerabilities. If users Developer Attacker User download and run the vulnerable images, they are likely to Fig. 1. The threat model of our security analysis.

262

P. Liu et al.

become the targets of attackers who exploit vulnerabilities. Additionally, the runcommands announced by developers may contain sensitive parameters, which may bring more security concerns such as giving containers root access to the host (see Sect. 4). (2) Malicious images. Attackers may upload their malicious images to Docker Hub, and sometimes together with malicious run-commands in the description of their images. Due to the weak surveillance of Docker Hub, malicious images and metadata can easily hide themselves among benign ones (see Sect. 5) [8]. Once users download a malicious image and run it, they may suffer from attacks such as cryptomining. On the other hand, malicious runcommands can also lead to attacks such as host file leakage. In this study, since it is mainly for measurement purposes, we assume that attackers are not aware of the techniques we employ to analyze Docker images. Otherwise, they can hide the malicious code by bypassing the analysis, leading to an arms race.

3

Analysis Framework and Data Collection

In this section, we provide an overview of our analysis framework. Then we introduce our methods of data collection and provide a summary for the collected dataset. 3.1

Overview of Our Analysis Framework

In our security analysis, we focus on the key risk sources (runRepository commands, sensitive Name Docker Hub Description parameters, entry-files Developer and vulnerabilities) in Data Information Collection Docker Hub, as uncov··· Images Crawler Meta Information Downloader ered in Sect. 2.1. Our analysis framework for Run-Command and Novel the study is outlined in Sensitive Parameter Attributes Fig. 2. We first collect Executed Program Extraction Website Extractor Critical Sources a large-scale dataset, including the images Code and all the public inforSecurity Analysis mation of a Docker Data image such as image Analyzer Result name, repository description, and developer inforFig. 2. The framework of our analysis. mation, from Docker 1 ). 2 After data Hub (, collection, the extractor utilizes a set of customized tools to obtain several previously-ignored essentials such as sensitive parameters and executed pro3 After obtaining the set of essentials, we perform grams from the raw data ().

Understanding the Security Risks of Docker Hub

263

systematic security analysis on Docker Hub from various aspects by leveraging Anchore [2], VirusTotal intelligence API [11], and a variety of customized 4 tools (). 3.2

Data Collection and Extraction

The abundant information (described in Sect. 2.1) of Docker images on Docker Hub is important in understanding the security of Docker Hub. However, most of this information has never been collected before, leading to incomprehensive analysis. Hence, we implement a customized web crawler that leverages Docker Hub API described in [3] to collect the Docker images and their associated metainformation from Docker Hub. All the information we obtain from Docker Hub is publicly available to anyone and it is legal to perform analysis on this dataset. Summary of Our ColTable 1. Data collected in our work. lected Data. Our dataset, No. repositories No. images No. developers as shown in Table 1, conOfficial 147 1,384 1 tains all the public inforCommunity 975,711 2,225,860 349,860 mation of the top 975,858 Total 975,858 2,227,244 349,861 repositories on Docker Hub. For each repository, the dataset contains the image files and the meta-information described in Sect. 2.1. Furthermore, to support in-depth analysis, we further extract the following data and code from the collected raw data. Collecting Run-Command and Sensitive Parameter. As discussed in Sect. 2.1, run-commands and sensitive parameters can make a great impact on the behavior of a container. In order to perform security analysis on them, we collect run-commands by extracting text contents that start with ‘‘docker run’’ from the repository descriptions and further obtain sensitive parameters from the run-commands through string matching. Collecting Executed Program. As discussed in Sect. 2.1, the executed program is a key factor that directly affects the security of a container. Therefore, we develop an automatic parser to locate and extract the executed program. For each image, our parser first locates the entry-file according to the Dockerfile or the manifests file. Once obtaining the entry-file, the parser scans the entry-file to locate the files triggered by entry-file. Then, the parser scans the triggered file iteratively to extract all executed programs in the image. For now, our parser can analyze ELF files and shell scripts by leveraging strings [10] and a customized script interpreter, respectively.

4

Sensitive Parameters

In this section, we (1) identify sensitive parameters; (2) investigate the user awareness of sensitive parameters; (3) propose novel attacks exploiting sensitive parameters; (4) study the distribution of sensitive parameters on Docker Hub.

264

4.1

P. Liu et al.

Identifying Sensitive Parameters

As described in Sect. 2.1, the parameters in run-commands can affect the behaviors of containers. However, among the over 100 parameters provided by Docker, it is unknown which parameters can cause security consequences. Therefore, we first obtain all the parameters and their corresponding descriptions from the documentation provided by Docker. Then, we manually identify the sensitive parameters by examining if they satisfy any of the following four proposed criteria: (1) Violate the isolation of file systems; (2) Violate the isolation of networking; (3) Break the separation of processes; (4) Escalate runtime privileges. Next, we explain why we choose these criteria. (1) From the security perspective, each Docker container maintains its own file system isolated from the host file system. If this isolation is broken, a container can gain access to files on the host, which may lead to the leakage of host data. (2) By default, each Docker container has its own network stack and interfaces. If this isolation is broken, a container can have access to the host’s network interfaces for sending/receiving network messages, which, for example, may cause denial of service attacks. (3) Generally, each Docker container has its own process-tree separated from the host. If this isolation is broken, a container can see and affect the processes on the host, which may allow containers to spy the defense mechanism on the host. (4) Most potentially dangerous Linux capabilities, such as loading kernel modules, are dropped in Docker containers. If a container obtains these capabilities, it may affect the host. For example, it is able to execute arbitrary hostile code on the host. 4.2

User Awareness of Sensitive Parameters

We find that nearly all the default container isolation and restrictions enforced by Docker can be broken by sensitive parameters in run-commands, such as --privileged, -v, --pid, and so on. We describe the impact of these sensitive parameters in Sect. 4.3. However, the real security impacts of these sensitive parameters on the users of Docker Hub in practice is not clear. Therefore, the first question we aim to answer is “are users aware of the sensitive parameters when the parameters in run-commands are visible to them?” To answer this question, we conduct a user study to characterize the behaviors of users of Docker Hub, which allows us to understand user preferences and the corresponding risks. Specifically, we survey 106 users including 68 security researchers and 38 software engineers from both academia and industry fields. For all the 106 users, 97% of them only focus on the functionality of images and have never raised doubts about the descriptions, e.g., the developer identification, the run-commands, on Docker Hub. It is worth noting that, even for 68 users who have a background on security research, 95% of them trust the information provided on Docker Hub. 90% of security experts run an image by exactly following the directions provided by image developers. Only 10% of security experts indicate that they prefer to figure out what the run-commands would do. Indeed, the study can be biased due to issues such as the limited number of the investigated users, imbalanced

Understanding the Security Risks of Docker Hub

265

gender and age distribution, and so on. However, our user study reveals that users, even the ones with security-research experience, do not realize the threats of sensitive parameters in general. Nearly 90% of users exactly execute runcommands specified by developers without checking and understanding them. Hence, we conclude that sensitive parameters are an overlooked risk source for Docker users. More details of the user study are deferred to Appendix A.1. 4.3

Novel Attacks Exploiting Sensitive Parameters

To demonstrate the security risks of sensitive parameters in practice, we develop a set of new attacks in Docker images that do not contain any malicious software packages. Our attacks rely on only run-commands with sensitive parameters to attack the host. Note that we successfully uploaded these images with our “malicious” run-commands to Docker Hub without any obstruction, confirming that Docker Hub does not carefully screen run-commands. However, to avoid harm to the community, we immediately removed the sensitive parameters in the run-commands after the uploading, and performed the attacks in our local lab machines only. The Leakage of Host Files. 1 g i t c l o n e h t t p s : // A t t a c k e r / d a t a . g i t As described in Fig. 3, we show 2 cd d a t a how to leak user files in the host 3 g i t p u l l 4 cp −r u s r −d a t a . using sensitive parameters. Specifi- 5 g i t add . cally, --volume or -v is used to mount 6 c u r d a t e =‘ date ‘ 7 g i t commit −m ” $ c u r d a t e ” a volume on the host to the con- 8 g i t push tainer. If the operator uses parameter, -v src:dest, the container will Fig. 3. Code example to implement the gain access to src which is a volume leakage of host files. on the host. Exploiting this parameter, attackers can maliciously upload user data saved in the host volume to their online repository, such as GitHub. It is important to note that this attack can be user-insensitive, i.e., the attacker can prepare a configure file in the malicious image to bypass the manual authentication. Other attacks are delayed to Appendix A.2. Overall, these novel attacks we proposed in this paper demonstrate that sensitive parameters can expose disastrous harm to the container as well as the host. 4.4

Distribution of Sensitive Parameters

Next, we study the distribution of sensitive parameters used in real images on Docker Hub. We observe that 86,204 (8.8%) repositories contain the recommended run-commands in their descriptions on Docker Hub. Moreover, as shown in Table 2, there are 81,294 sensitive parameters in these run-commands—on average, each run-command contains one sensitive parameter. Given the common usage of sensitive parameters and their critical security impacts, it is urgent to improve users’ awareness of the potential security risks brought by sensitive parameters and propose effective vetting mechanisms to detect these risks.

266

P. Liu et al. Table 2. Distribution of sensitive parameters. Criteria

No. sensitive parameters

Violate the isolation of file systems 33,951 Violate the isolation of networking 43,278 Break the separation of processes Escalate runtime privileges Total

5

56 4,009 81,294

Malicious Images

As reported in [8], high-profile attacks seriously damaged the profit of users. These attacks originate from the launching of malicious images such as electronic coin miners. In order to detect malicious images, manual security analysis on each software contained in a Docker image yields accurate results, which is however extremely slow and does not scale well for large and heterogeneous software packages. On the other hand, a dynamic analysis also becomes impractical since it is even more time-consuming to trace system calls, APIs, and network for millions of Docker images. Compared to the two methods above, a static analysis seems to be a potential solution to overcome these challenges. However, the number of software packages contained in each image varies from hundreds to thousands. It is still challenging to analyze billions of software packages using static analysis. Fortunately, we found that a majority of software packages in Docker images are redundant [31] which thus can be filtered out for efficient analysis. However, it is unclear which software deserves attention from security researchers. As stated in Sect. 2.1, we propose to focus on the executed programs. As long as a malicious executed program is detected, the corresponding image can be confirmed as malicious. Considering that our research goal is to characterize malware for measurement purposes instead of actually detecting them in practice, we focus on the executed programs only rather than other software for discovering malicious images in this study. We will discuss how to further improve the detection in Sect. 8. 5.1

Malicious Executed Programs

Detecting Malicious Executed Programs. By leveraging the parser proposed in Sect. 3.2, we can obtain all the executed programs in the tested images. We observe that the file types of the extracted executed programs could be JAR written by JAVA, ELF implemented by C++, Shell Script, etc. It is quite challenging to analyze many kinds of software at the same time, since generating and confirming fingerprints for malware are both difficult and time-consuming. Therefore, we turn to online malware analysis tools for help. In particular, VirusTotal [11] is a highly comprehensive malware detection platform that incorporates various signature-based and anomaly-based detection methods employed

Understanding the Security Risks of Docker Hub

267

by over 50 anti-virus (AV) companies such as Kaspersky, Symantec. Therefore, it can detect various kinds of malware, including Trojan, Backdoor, and BitcoinMiner. As such, we employ VirusTotal to perform a primary screening. However, prior works have shown that VirusTotal may falsely label a benign program as malicious [14,22,29]. To migrate false positives, most of the prior works consider a program malicious if at least a threshold of AV companies detects it. In fact, there is no agreement in the threshold value. Existing detection has chosen two [14], four [22], or five [29] as the threshold. In this paper, to more precisely detect malicious programs, we consider a program as malicious only if at least five of the AV companies detect it. This procedure ensures that the tested programs are (almost) correctly split into benign and malicious ones. The results of the primary screening only report the type of each malware provided by anti-virus companies. It is hard to demonstrate the accuracy of primary screening results, let along understanding the behavior of each malware. Therefore, after we obtain a list of potentially malicious files from the primary screening, a second screening is necessary to confirm the detection results and analyze the behavior of these files. Specifically, we dynamically run the potentially malicious files in a container and collect the logs of system call, network, and so on to expose the security violations of such files. Finally, we implement a framework to finish the above pipeline. First, our framework utilizes our parser to locate and extract executed programs from the Docker images. Second, it leverages ViusTotal API to detect potentially malicious files. Third, we implement a container which contains a variety of tools such as strace and tcpdump for security analysis. Our framework leverages this container as a sandbox for automatically running and tracing the potentially malicious files to generate informative system logs. Since most benign images are filtered out by the primary screening, system logs are generated for only a few potentially malicious images. This framework greatly saves manual efforts and helps detect malicious images rapidly. 5.2

Distribution of Malicious Images

We first study the executed programs in the latest images in 147 official repositories, and we find that there are no malicious executed programs in these official images. Then, to facilitate in-depth analyses on community images, we extract the following subsets from the collected dataset for studying the executed programs. 1) The latest images in the top 10,000 community repositories ranked by popularity. 2) According to the popularity ranking, we divide the rest community repositories into 100 groups and randomly select 100 latest images from each group. In this way, we obtain 10,000 community images. Results. On average, our parser proposed in Sect. 3.2 takes 5 and 0.15 seconds in analyzing one image and one file, respectively. The parser locates 693,757 executed programs from the tested images. After deduplication, we get 36,584 unique executed programs, in which there exist 13 malicious programs identified by our framework. The 13 malicious programs appear in 17 images. Moreover, we

268

P. Liu et al.

notice that all the malicious programs are entry-files in these malicious images. This observation indicates that it is common for malicious images to perform attacks by directly utilizing a malicious entry-files, instead of subsequently triggered files. Intuitively, the developer of a malicious image may release other malicious images on Docker Hub. Therefore, we propose to check the images that are related to malicious images. By leveraging the metadata collected in Sect. 3.2, once we detect a malicious image, we can investigate two kinds of related images: (1) the latest 10 images in the same repository and (2) the latest images in the most popular 10 repositories created by the same developer. We obtain 48 and 84 of these two types of related images respectively, in which there are 27 images contain the same malicious file found in the previously-detected malicious images. After analyzing all the related images by leveraging the framework developed in Sect. 5.1, we further obtain 186 new executed programs, in which there are 20 new malicious programs in 25 images. This insightful finding indicates that heuristic approaches, such as analyzing related images proposed in this work, are helpful in discovering malicious images and programs effectively. We hope that the malicious images and insights discovered in this paper can serve as an indicator for future works. The case study of the detected malicious images is deferred to Appendix A.3.

6

CVE Vulnerabilities

In this section, we evaluate the vulnerabilities in Docker images which are identified through Common Vulnerabilities and Exposures (CVE) IDs because the information of CVEs is public, expansive, detailed, and well-formed [34]. First, we leverage Anchore [2] to perform vulnerability detection for each image and study the distribution of the discovered vulnerabilities in Docker images. The results demonstrate that both official and community images suffer from serious software vulnerabilities. More analysis about vulnerabilities in images are deferred to Appendix A.4. Then, we investigate the extra window of vulnerability in Docker images. Defining of the Extra Window of Vulnerability. To understand the timeline of the life cycle of a vulnerability in Docker images, accurately determining the discovery and patch time of a vulnerability, the release and update time of an image is vital. However, several challenges exist in determining different times. For instance, a vulnerability may be patched multiple times. In addition, different vendors might release different discovery times. Hence, we propose to first define these times motivated by existing research [34]. – Discovery-time is the earliest reported date of a software vulnerability being discovered and recognized to pose a security risk to the public. – Patch-time is the latest reported date that the vendor, or the originator of the software releases a fix, workaround, or a patch that provides protection against the exploitation of the vulnerability. If the vulnerability is fixed by

Understanding the Security Risks of Docker Hub

269

the upgrade of the software and the patch is not publicly available, we record the date of the upgrade of the software instead. – For a vulnerability, release-time is the date that the developer releases an image that first brings in this vulnerability. – For a vulnerability, upgrade-time is the date that the developer releases a new edition of the image, which fixes this vulnerability contained in the previous edition. Suppose that a vulWindow of Vulnerability in Software nerability of software S Window of Vulnerability in Image is discovered at Td . Then Extra Window of Vulnerability in Image after a period of time, Patch-time Discovery-time the developer of S fixes Tp Td Tr Tu this vulnerability at Tp . Time Release-time Upgrade-time Image I first brings in Discovery-time Patch-time this vulnerability at Tr . Tp T r T d Tu It takes extra time for Time Release-time Upgrade-time the developer of I to Discovery-time Patch-time fix the vulnerability and Tp Tr update this image at Tu . Time Td Tu Release-time Upgrade-time The developer of Image I is supposed to immediately fix the vulnerability Fig. 4. Extra window of vulnerabilities in Docker images. once the patch is publicly available. However, it usually takes a long time before developers actually fix the vulnerability in an image. Therefore, we define We , the extra window of vulnerability in images, to measure how long it takes from the earliest time the vulnerability could be fixed to the time the vulnerability is actually fixed. Figure 4 presents three different cases of the extra window of vulnerability. In the first two cases, We is spreading from Tp to Tu . In the last case, even though the patch of a vulnerability is available, image I still brings in the vulnerability, so the We starts at Tr and ends at Tu . For all the cases, there is always an extra time window of vulnerability in image I before the developer updates I. Obtaining Different Times. We choose the latest five editions of the 15 most popular images and randomly sample other 15 Docker images to investigate the extra window of vulnerabilities in images. By leveraging Anchore [2], we obtain 5,608 CVEs from these 30 images. After removing duplication, 3,023 CVEs remain, from which we randomly sample 1,000 CVEs for analysis. Then, we implement a tool to automatically collect discovery-, patch-time, and image release-, updatetime for each CVE by leveraging the public information released on NVD Metrics [12]. We aim to obtain all the vital time of 1,000 CVEs from public information. However, the discovery- and patch-time of vulnerabilities are not always released in public. Therefore, we only obtain the complete information of vital time of 334 CVEs. It is worth noting that in some complex cases (e.g., there are multiple discovery-time for a CVE), we manually collect and confirm the complete information by reviewing the external references associated with CVEs.

270

P. Liu et al.

Results. After analyzing the vital time of the 334 CVEs, we observe that it takes 181 days on average for common software to fix a vulnerability. However, the extra window of vulnerabilities in images is 422 days on average while the longest extra window of vulnerabilities could be up to 1,685 days, which allows sufficient time for attackers to craft corresponding exploitations of the vulnerabilities in Docker images.

7

Mitigating Docker Threats

In this section, we propose several possible methods to mitigate the threats uncovered in this paper. Sensitive Parameters. To mitigate attacks abusing sensitive parameters, one possible method is to design a framework which automatically identifies sensitive parameters and alerts users on the webpages of repository descriptions on Docker Hub. First of all, it is necessary to maintain a comprehensive list of sensitive parameters by manual analysis. After that, sensitive parameters in the descriptions of Docker images can be identified easily by leveraging string matching. Docker Hub should be responsible for displaying the detection results in the image description webpage and prompting the users about the possible risks of the parameters in run-commands. The above framework can be implemented as a backend of the website of Docker Hub or a browser plug-in. Additionally, runtime alerts, as adopted by iOS and existing techniques [33,38], can warn users of potential risks before executing a run-command with sensitive parameters, which will be an effective mechanism to mitigate the abusing of sensitive parameters. Malicious Images. To detect malicious images, traditional static and dynamic analysis, e.g., signature-based method, system call tracing can be certainly helpful. However, many challenges do exist. For example, if the redundant files of an image cannot be removed accurately, it will be extraordinarily time-consuming to analyze all the files in an image by traditional methods. The framework proposed in Sect. 5.1 can be utilized to solve this problem. Furthermore, heuristic approaches, such as analyzing related images proposed in Sect. 5.2, could also be beneficial in discovering malicious images. Vulnerabilities. Motivated by previous research [18], we believe that automatic updating is an effective way to mitigate the security risks from vulnerable images. However, software update becomes challenging in Docker. Because the dependencies among a large quantity of software are complicated and arbitrary update may cause a broken image. We propose multiple possible solutions to automated updating for vulnerable software packages. First, various categories of existing tools such as Anchore [2] can be employed to obtain vulnerability description information including the CVE ID of the vulnerability, the edition of the corresponding vulnerable software packages that bring in the vulnerability, the edition of software packages that repairs this vulnerability. Second, package management tools, e.g., apt, yum, can be helpful to resolve the dependency relationship among software packages. After we obtain the above information, we may safely update vulnerable software and the related software.

Understanding the Security Risks of Docker Hub

8

271

Discussion

In this section, we discuss the limitations of our approach and propose several directions for future works. Automatically Identifying Malicious Parameters. We perform the first analysis on sensitive parameters and show that these parameters can lead to disastrous security consequences in Sect. 4. However, the sensitive parameters we discuss in this paper are recognized by manual analysis. Obviously, there are other sensitive parameters in the field that still need to be discovered in the future. Additionally, even though we discover many sensitive parameters on Docker Hub, it is hard to identify which parameters are published for malicious attempts automatically. Hence, one future direction to improve our work is to automatically identify malicious parameters in repository descriptions. One possible method is gathering images with the similar functionalities and employ statistical analysis to detect the deviating uses of sensitive parameters—deviations are likely suspicious cases, given that most images are legitimate. Improving Accuracy for Detecting Malicious Images. In Sect. 5, we narrow down the analysis of malicious images to the executed programs. Although this method achieves the goal of discovering security risks in Docker Hub, it still has shortcomings for detecting malicious images accurately. For example, malware detection is a well-known arms-race issue [17]. First of all, VirusTotal, the detector we used in the primary screening may miss malware. Additionally, we cannot detect malicious images that perform attacks with other files rather than their executed programs. For example, our approach cannot detect images that download malware during execution, since our parser performs static analysis. Furthermore, after we obtain the system log of potential malicious images, we conduct manual analysis on these log files. Motivated by [21], we plan to include more automatic techniques to parse and analyzing logs as future work. Moreover, we plan to analyze more images in the future. Polymorphic Malware. Malware can change their behaviors according to different attack scenarios and environments [17,25]. Since our research goal is to characterize malware for measurement purposes instead of actually detecting them in practice, we employ existing techniques to find and analyze malware. However, it is valuable to understand the unique behaviors of Docker Malware. For instance, Docker has new fingerprints for malware to detect its running environments. How to emulate the Docker fingerprints to expose malicious behaviors can be an interesting future topic.

9

Related Work

Vulnerabilities in Docker Images. Many prior works focus on the vulnerabilities in Docker images [16,23,27,36,39]. For instance, Shu et al. proposed the Docker Image Vulnerability Analysis (DIVA) framework to automatically discover, download, and analyze Docker images for security vulnerabilities [36].

272

P. Liu et al.

However, these studies only investigate the distribution of vulnerabilities in Docker images, while our work uniquely conducts in-depth analysis on new risks brought by image vulnerabilities, such as the extra window of vulnerability. Security Reinforcement and Defense. Several security mechanisms have been proposed to ensure the safety of Docker containers [26,35,37]. For instance, Shalev et al. proposed WatchIT, a strategy that constrains IT personnel’s view of the system and monitors the actions of containers for anomaly detection [35]. There also exist other Docker security works that focus on defenses for specific attacks [19,24]. For instance, Gao et al. discuss the root causes of the containers’ information leakage and propose a two-stage defense approach [19]. However, these studies are limited to specific attack scenarios, which are not sufficient for a complete understanding of the security state of Docker ecosystems as studied in this work. Registry Security. Researchers have conducted a variety of works on analyzing the code quality and security in third-party code store and application registries, such as GitHub and App Store [15,20,28,30,32]. For instance, Bugiel et al. introduced the security issues of VM image repository [15]. Duc et al. investigated Google Play and the relationship between the end-user reviews and the security changes in apps [30]. However, Docker image registries such as Docker Hub have not been fully investigated before. This work is the first attempt to fill the gap according to our best knowledge.

10

Conclusion

In this paper, we perform the first comprehensive study of Docker Hub ecosystem. We identified three major sources for the new security risks in Docker hub. We collected a large-scale dataset containing both images and the associated meta-information. This dataset allows us to discover novel security risks in Docker Hub, including the risks of sensitive parameters in repository descriptions, malicious images, and the failure of fixing vulnerabilities in time. We developed new attacks to demonstrate the security issues, such as leaking user files and the host display. As the first systematic investigation on this topic, the insights presented in this paper are of great significance to understanding the state of Docker Hub security. Furthermore, our results make a call for actions to improve the security of the Docker ecosystem in the future. We believe that the dataset and the findings of this paper can serve as a key enabler for improvements of the security of Docker Hub. Acknowledgements. This work was partly supported by the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars under No. LR19F020003, the National Key Research and Development Program of China under No. 2018YFB0804102, NSFC under No. 61772466, U1936215, and U1836202, the Zhejiang Provincial Key R&D Program under No. 2019C01055, and the Ant Financial Research Funding.

Understanding the Security Risks of Docker Hub

A A.1

273

Appendix User Study on Sensitive Parameters

We design an online questionnaire that contains questions including “Do you try to fully understand every parameter of the run-commands provided on the Docker Hub website before running those commands?”, “Do you make a security analysis of the compose.yml file before running the image?”, etc. Our questionnaire was sent to our colleagues and classmates, and further spread by them. In order to ensure the authenticity and objectivity of the investigation results, we did not tell any respondents the purpose of this survey. We plan to conduct the user study in the official community of Docker Hub in the future. Finally, we collected 106 feedback offered by 106 users from various cities in different countries. All of them have benefited from Docker Hub, i.e., they have experiences in using images from Docker Hub. Besides, they are from a broad range from both academia and industry fields, including students and researchers from various universities, software developers and DevOps engineers from different companies, etc. As described in Sect. 4.2, the results of our user study show that 97% of users only care about if they can successfully run the image while ignoring how the images run, not to mention the sensitivity parameters in run-command and docker-compose.yml file. Even for 68 users who have a background in security research, only 10% of them indicate that they prefer to figure out the meaning of the parameters in run-commands. A.2

Novel Attacks Exploiting Sensitive Parameters

Obtaining the Display of the Host. --privileged is one of the most powerful parameters provided by Docker, which may pose a serious threat to users. When the operator uses command --privileged, the container will gain access to all the devices on the host. Under this scenario, the container can do almost anything with no restriction, which is extremely dangerous to the security of users. More specifically, --privileged allows a container to mount a partition on the host. By taking a step further, the attacker can access all the user files stored on this partition. In addition to accessing user files, we design an attack to obtain the display of a user’s desktop. In fact, with --privileged, a one-line code, cp/dev/fb0 user desktop.txt, is sufficient for attackers to access user display data. Furthermore, by leveraging simple image processing software [7], attackers can see the user’s desktop as if they were sitting in front of the user’s monitor. Spying the Process Information on the Host. --pid is a parameter related to namespaces. Providing --pid=host allows a container to share the host’s PID namespace. In this case, if the container is under the control of an attacker, all the programs running on the user’s host will become visible to the attacker inside the container. Then, the attacker can utilize these exposed information such as the PID, the owner, the path of the corresponding executable file and the execution parameters of the programs, to conduct effective attacks.

274

P. Liu et al.

A.3

Case Study of Malicious Images

We manually conduct analysis on detected malicious images. For instance, the image mitradomining/ffmpeg on Docker Hub is detected as malicious by our framework. The entry-file of this image is /opt/ffmpeg [7]. According to the name and entry-file of the image, the functionality of this image should be image and video processing. However, our framework detects that the real functionality of the entry-file is mining Bit-coins. By leveraging the syscall log reported by our framework, we determine that the real identity of this image is a Bit-coin miner. Thus, once users run the image, their machines will become slaves for cryptomining. A.4

Distribution of Vulnerabilities

We investigate the distribution of vulnerabilities in the Negligible Negligible Critical Critical 2 19286 2038 1573316 latest version of all official 0.01% 60.94% 0.05% 38.15% images. First, we observe that Unknown Medium Unknown Medium 1069 6777 32329 1527037 the latest official images con3.38% 21.42% 0.79% 37.03% tain 30,000 CVE vulnerabilHigh Low High Low 1704 2808 334054 654868 5.38% 8.87% 8.1% 15.88% ities. Figure 5(a) categorizes (a) Official images. (b) Community images. these CVE vulnerabilities into 6 groups according to the severFig. 5. Vulnerabilities existing in the latest ity levels assessed by the latimages. est CVSSv3 scoring system [12]. Although only 6% of vulnerabilities are highly/critically severe, they exist in almost 30% of the latest official images. Furthermore, we conduct a similar analysis on the latest images in the 10,000 most popular community repositories. As shown in Fig. 5(b), the ratios of vulnerabilities with medium and high severity increase to over 37% and 8%, respectively, which are higher than those of official images. In addition, it is quite alarming that more than 64% of community images are affected by highly/critically severe vulnerabilities such as the denial of service and memory overflow. These results demonstrate that both official and community images suffer from serious software vulnerabilities. Additionally, community images contain more vulnerabilities with higher severity. Hence, we propose that software vulnerability is an urgent problem which seriously affects the security of Docker images.

References 1. Amazon Elastic Container Servicen, August 2019. https://aws.amazon.com/ getting-started/tutorials/deploy-docker-containers 2. Anchore, August 2019., https://anchore.com/engine/ 3. API to get Top Docker Hub images, August 2019. https://stackoverflow.com/ questions/38070798/where-is-the-new-docker-hub-api-documentation 4. Docker, August 2019. https://www.docker.com/resources/what-container

Understanding the Security Risks of Docker Hub

275

5. Docker Hub Documents, August 2019. https://docs.docker.com/glossary/? term=Docker%20Hub 6. Docker Security Best-Practices, August 2019. https://dev.to/petermbenjamin/ docker-security-best-practices-45ih 7. FFmpeg, August 2019. http://ffmpeg.org 8. Malicious Docker Containers Earn Cryptomining Criminals $90K, August 2019. https://kromtech.com/blog/security-center/cryptojacking-invades-cloud-howmodern-containerization-trend-is-exploited-by-attackers 9. Running Docker in Production, August 2019. https://ghost.kontena.io/docker-inproduction-good-bad-ugly 10. strings(1) - Linux man page, August 2019. https://linux.die.net/man/1/strings 11. Virustotal Api, August 2019. https://pypi.org/project/virustotal-api/ 12. Vulnerability Metrics, August 2019. https://nvd.nist.gov/vuln-metrics/cvss 13. Understanding the Security Risks of Docker Hub, July 2020. https://github.com/ decentL/Understanding-the-Security-Risks-of-Docker-Hub 14. Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.: DREBIN: effective and explainable detection of Android malware in your pocket. In: NDSS, vol. 14, pp. 23–26 (2014) 15. Bugiel, S., N¨ urnberger, S., P¨ oppelmann, T., Sadeghi, A.R., Schneider, T.: Amazonia: when elasticity snaps back. In: Proceedings of the 18th ACM Conference on Computer and Communications Security, pp. 389–400. ACM (2011) 16. Combe, T., Martin, A., Di Pietro, R.: To docker or not to docker: a security perspective. IEEE Cloud Comput. 3(5), 54–62 (2016) 17. Cozzi, E., Graziano, M., Fratantonio, Y., Balzarotti, D.: Understanding Linux malware. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 161–175. IEEE (2018) 18. Duan, R., et al.: Automating patching of vulnerable open-source software versions in application binaries. In: NDSS (2019) 19. Gao, X., Gu, Z., Kayaalp, M., Pendarakis, D., Wang, H.: ContainerLeaks: emerging security threats of information leakages in container clouds. In: 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 237–248. IEEE (2017) 20. Gorla, A., Tavecchia, I., Gross, F., Zeller, A.: Checking app behavior against app descriptions. In: Proceedings of the 36th International Conference on Software Engineering, pp. 1025–1035. ACM (2014) 21. He, P., Zhu, J., He, S., Li, J., Lyu, M.R.: Towards automated log parsing for largescale log data analysis. IEEE Trans. Dependable Secure Comput. 15(6), 931–944 (2017) 22. Kotzias, P., Matic, S., Rivera, R., Caballero, J.: Certified PUP: abuse in authenticode code signing. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 465–478 (2015) 23. Tak, B., Kim, H., Suneja, S., Isci, C., Kudva, P.: Security analysis of container images using cloud analytics framework. In: Jin, H., Wang, Q., Zhang, L.-J. (eds.) ICWS 2018. LNCS, vol. 10966, pp. 116–133. Springer, Cham (2018). https://doi. org/10.1007/978-3-319-94289-6 8 24. Lin, X., Lei, L., Wang, Y., Jing, J., Sun, K., Zhou, Q.: A measurement study on Linux container security: attacks and countermeasures. In: Proceedings of the 34th Annual Computer Security Applications Conference, pp. 418–429. ACM (2018) 25. Liu, B., Zhou, W., Gao, L., Zhou, H., Luan, T.H., Wen, S.: Malware propagations in wireless ad hoc networks. IEEE Trans. Dependable Secure Comput. 15(6), 1016– 1026 (2016)

276

P. Liu et al.

26. Loukidis-Andreou, F., Giannakopoulos, I., Doka, K., Koziris, N.: Docker-Sec: a fully automated container security enhancement mechanism. In: 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 1561– 1564. IEEE (2018) 27. Martin, A., Raponi, S., Combe, T., Di Pietro, R.: Docker ecosystem-vulnerability analysis. Comput. Commun. 122, 30–43 (2018) 28. Martin, W., Sarro, F., Yue, J., Zhang, Y., Harman, M.: A survey of app store analysis for software engineering. IEEE Trans. Softw. Eng. 43(9), 817–847 (2017) 29. Miller, B., et al.: Reviewer integration and performance measurement for malware detection. In: Caballero, J., Zurutuza, U., Rodr´ıguez, R.J. (eds.) DIMVA 2016. LNCS, vol. 9721, pp. 122–141. Springer, Cham (2016). https://doi.org/10.1007/ 978-3-319-40667-1 7 30. Nguyen, D., Derr, E., Backes, M., Bugiel, S.: Short text, large effect: measuring the impact of user reviews on Android app security and privacy. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 155–169. IEEE (2019) 31. Rastogi, V., Davidson, D., Carli, L.D., Jha, S., Mcdaniel, P.: Cimplifier: automatically debloating containers. In: Joint Meeting on Foundations of Software Engineering (2017) 32. Ray, B., Posnett, D., Filkov, V., Devanbu, P.: A large scale study of programming languages and code quality in GitHub. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 155–165. ACM (2014) 33. Ringer, T., Grossman, D., Roesner, F.: Audacious: user-driven access control with unmodified operating systems. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 204–216. ACM (2016) 34. Shahzad, M., Shafiq, M.Z., Liu, A.X.: A large scale exploratory analysis of software vulnerability life cycles. In: 2012 34th International Conference on Software Engineering (ICSE), pp. 771–781. IEEE (2012) 35. Shalev, N., Keidar, I., Weinsberg, Y., Moatti, Y., Ben-Yehuda, E.: WatchIT: who watches your IT guy? In: Proceedings of the 26th Symposium on Operating Systems Principles, pp. 515–530. ACM (2017) 36. Shu, R., Gu, X., Enck, W.: A study of security vulnerabilities on docker hub. In: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pp. 269–280. ACM (2017) 37. Sun, Y., Safford, D., Zohar, M., Pendarakis, D., Gu, Z., Jaeger, T.: Security namespace: making Linux security frameworks available to containers. In: 27th USENIX Security Symposium (USENIX Security 2018), pp. 1423–1439 (2018) 38. Wijesekera, P., et al.: The feasibility of dynamically granted permissions: aligning mobile privacy with user preferences. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 1077–1093. IEEE (2017) 39. Zerouali, A., Mens, T., Robles, G., Gonzalez-Barahona, J.M.: On the relation between outdated docker containers, severity vulnerabilities, and bugs, pp. 491–501 (2019)

DE-auth of the Blue! Transparent De-authentication Using Bluetooth Low Energy Beacon Mauro Conti1 , Pier Paolo Tricomi1(B) , and Gene Tsudik2 1 University of Padua, Padua, Italy [email protected], [email protected] 2 University of California, Irvine, USA [email protected]

Abstract. While user authentication (e.g., via passwords and/or biometrics) is considered important, the need for de-authentication is often underestimated. The so-called “lunchtime attack”, whereby a nearby attacker gains access to the casually departed user’s active log-in session, is a serious security risk that stems from lack of proper deauthentication. Although there have been several proposals for automatic de-authentication, all of them have certain drawbacks, ranging from user burden to deployment costs and high rate of false positives. In this paper we propose DE-auth of the Blue (DEB) – a cheap, unobtrusive, fast and reliable system based on the impact of the human body on wireless signal propagation. In DEB, the wireless signal emanates from a Bluetooth Low Energy Beacon, the only additional equipment needed. The user is not required to wear or to be continuously interacting with any device. DEB can be easily deployed at a very low cost. It uses physical properties of wireless signals that cannot be trivially manipulated by an attacker. DEB recognizes when the user physically steps away from the workstation, and transparently de-authenticates her in less than three seconds. We implemented DEB and conducted extensive experiments, showing a very high success rate, with a low risk of false positives when two beacons are used. Keywords: De-authentication Information security

1

· Bluetooth beacon · Wireless signals ·

Introduction

In many environments, such as workplaces, schools and residences, computers are often used by multiple users, or multiple users’ computers are co-located. To prevent unauthorized access, any secure system mandates authentication and de-authentication mechanisms. Most of the time, the user authenticates, e.g., The second author’s work was done in part while visiting University of California, Irvine. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 277–294, 2020. https://doi.org/10.1007/978-3-030-58951-6_14

278

M. Conti et al.

by demonstrating the possession of secrets, once, at the beginning of the log-in session, and it is her own responsibility to explicitly log out when she leaves the session. Unfortunately, since users are often too lazy or careless, they neglect to terminate the log-in session by locking their workstation or logging out. Users who take a short (coffee or bathroom) break, often do not logout, since logging in again is annoying and time consuming. This behavior opens the door for so-called Lunchtime Attacks [9]. Such an attack occurs whenever an authenticated user, for whatever reason leaves the workplace, thus allowing the (typically insider) attacker to gain access to the current session. The research community put a lot of effort into authentication techniques, producing several widely used and effective methods [1]. The most common technique, password-based authentication, remains popular despite predictions of its imminent demise. However, more and more alternative or supplemental techniques, (especially, biometric-based), are becoming popular. They reduce error rates and improve the overall security. Meanwhile, de-authentication is a relatively recent concern, arising mainly due to the threat of aforementioned lunchtime attacks. Many proposed methods are inaccurate, expensive, obtrusive and not transparent. Even continuous authentication (where the user is authenticated constantly after initial login) suffers from the same problems. It is clear that better de-authentication techniques are needed. In this paper, we propose “DE-auth of the Blue” (DEB), a new deauthentication method that uses a Bluetooth Low Energy Beacon as the main tool to determine whether the user is still present. BLE Beacons are wireless sensors that continuously transmit Bluetooth Low Energy signals to allow mobile technologies to communicate in proximity. DEB takes advantage of the shadowing effect caused by the human body on high-frequency wireless signals. In particular, when a person stands on the Line-of-Sight (LoS) path between a transmitter and a receiver, the water in the human body affects signal strength, thus reducing the Received Signal Strength Indicator (RSSI) [22]. Therefore, by instrumenting a chair with a BLE Beacon, which spreads wireless signal, and monitoring the RSSI on the user’s computer, we can infer the presence of a person, since it breaks the Line-of-Sight and lowers the RSSI. The power loss through human body is usually modeled as a time function due to random people’s mobility [14,21]. However, in DEB the user is always sitting on a chair (i.e., not random), thus we can determine the shadowing loss by empirical measurements and use it as a threshold to determine whether there is human presence. Contributions. Our contribution is twofold: (1) we present DEB (De-auth of the Blue), a new reliable, fast and inexpensive user de-authentication method, and (2) we give a comparative overview of the current de-authentication methods, discussing their benefits and flaws, w.r.t. DEB. Organization: Sect. 2 overviews related work. Section 3 describes DEB and the adversary model. Section 4 discusses collected data, while experimental results and other de-authentication and continuous authentication techniques are treated in Sect. 5. Section 6 explores future work and we conclude in Sect. 7.

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

2

279

Related Work

Authentication and de-authentication techniques have been extensively studied in the last years. Meanwhile, usage of Bluetooth Low Energy (BLE) Beacons is quite recent: they are mainly used in the proximity context, for indoor localization and by retailers for managing the relationships with customers. Faragher & Harle showed in [10] how to perform indoor localization with BLE fingerprinting, using 19 beacons in an area of ∼600 m2 to detect the position of a consumer device. This demonstrated the advantage of BLE over WiFi for positioning. In [16] Palimbo et al. used a stigmergic approach, a mechanism of indirect coordination, to create an on-line probability map identifying the user’s position in an indoor environment. In [13], authors conducted experiments with indoor positioning using a BLE, achieving over 95% correct estimation rate and a fully accurate (100%) determination whether a device is in a given room. BLE beacons also make it possible to create smart offices and save energy, e.g., [6] Choi et al. created a smart office where BLE beacons and a mobile app determine whether a user enters or exits the office to adjust the power saving mode of the user’s PCs, monitors, and lights. De-authentication is required only after the initial authentication process. Classical authentication methods require one-time input of the user’s secret(s). After, the user must either de-authenticate explicitly (e.g., by locking the screen or logging out) or be de-authenticated via some automated technique, in order to prevent lunchtime attacks. Recent techniques, which are mostly biometric, provide continuous authentication: the user is continuously monitored while interacting with the system. When this interactions stop, the user is automatically de-authenticated—with clear usability drawbacks, e.g., when the user is at her desk, just reading a document. The most popular user authentication is based on passwords. They are intuitive, the user effort (burden) is low and no special equipment is needed. However, they also present some drawbacks. First, users often choose easy-to-remember passwords, which leads to the high probability of cracking [4]. Second, there is no way to automatically de-authenticate a user based on passwords: another task is required, such as clicking or pressing a button, which is often forgotten or deliberately ignored. Even if sophisticated techniques, such as fingerprint of iris recognition could address the first issue, passwords still are not amenable to automatic de-authentication. An inactivity time-out can address the second problem. However, this is ineffective, since interval length is difficult decide upon due to wide range of user (in)activity [19]. Technology is now moving towards continuous authentication based on biometrics. One popular method is keystroke dynamics [3], which creates a user model based on typing style, collecting features such as speed and strength. Though easy to deploy and not requiring any special equipment, it has two drawbacks. First, it is effective only against the zero-effort attack, whereby the attacker claims another user’s identity without investing any effort. As shown in [20], a brief training is enough to learn how to reproduce a given user’s typing style. Second, it does not allow for an inactivity period. The user can be

280

M. Conti et al.

continuously authenticated only while typing, whereas, in a common scenario where the user is present while not typing (e.g., reading emails or watching a video) the system is ineffective. Kaczmarek et al. proposed Assentication [12], a new hybrid biometric based on user’s seated posture pattern. The system captures the combination of physiological and behavioral traits, creating a unique model. It involves instrumenting an office chair with 16 tiny pressure sensors. However, though the system exhibits very low false positive and false negative, its cost is not negligible (∼150$ per chair), and it has low permanence, i.e., the biometric does not remain consistent over long term. More complex systems such as gaze tracking [9] and pulse response [18] have been proposed. They are based on user’s eye movements and human body impedance, respectively. Though highly accurate, both require expensive extra equipment which inhibits their large-scale adoption. Moreover, gaze tracking technique forces the user to always look in roughly the same direction. An inadvertent head movement can cause a false positive. Proximity can be also as a basis for de-authentication. A typical technique of this sort required the user to carry a token, which communicates over a secure channel to a workstation, to continually authenticate the user [8]. The main drawback of this authentication is that the user must carry the token, and a serious risk can occur if it is stolen. Another similar proposal is ZEBRA [15]. In it, a user wears a bracelet on the dominant wrist. The bracelet is equipped with a gyroscope, accelerometer and radio. When the user interacts with a computer keyboard, the bracelet records the wrist movement, processes it, and sends it to the terminal. After a comparison between received information and keyboard activity (made by the user who wears the bracelet), user’s continued presence is confirmed only if they are correlated. There are two flaws here: (1) the user must wear a bracelet (thus, sacrificing unobtrusiveness), and (2) as shown in [11], attacker can, with reasonable probability, remain logged as the victim user in by opportunistically choosing when and how to interact. There was one prior attempt to use attenuation of wireless signals caused by human body for de-authentication purposes. FADEWICH [7] involves an office setting instrumented with nine sensors that detect a user’s position, measuring variations in signal strength. The system reaches 90% accuracy within four seconds, and 100% accuracy within six seconds. However, every office must be instrumented uniquely to determine where to place the sensors. Also, the deployment phase is not trivial and presence of other moving people nearby can cause false positives.

3

System and Adversary Model

We now present the system and adversary model we consider throughout this paper. In particular, Sect. 3.1 describes expected usage of the proposed system, while Sect. 3.2 discusses some attack scenarios.

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

3.1

281

System Model

We realized DEB using a BLE beacon and one antenna to capture the former’s signals. The de-authentication process should be used in a common office, instrumented with chairs and workstations. DEB is completely unobtrusive. The user arrives to the office, sits down on the chair, logs into the computer (PC or a docked laptop) via usual authentication means (such as passwords), uses the computer and finally gets up, leaving her workspace. Right afterwards, the computer automatically locks her out, thus protecting the current session from unauthorized access. To accomplish this goal, every chair of the office is instrumented with a BLE beacon. A BLE antenna, usually integrated with laptops, collects beacon signals and sends them to our application. The application analyzes signals to decide whether to de-authenticate. In contrast with other de-authentication or continuous authentication methods, DEB does not require the user to actually use the computer after initial login. As part of routine activities, a typical user might make or receive phonecalls, read a document or email, look at the smartphone, watch a video, etc. In all of these activities, the keyboard and mouse are not being used, yet the user remains present. In such cases, de-authentication is both unnecessary and annoying. 3.2

Adversary Model

The main anticipated adversary can be anyone with physical access to the office, such as a co-worker, a customer or a supervisor, with the goal of gaining access to the workstation and the active session of another the victim user. The adversary does not know the user’s credentials. We assume that the adversary has access to the victim’s inside office. There are two ways in which the adversary can try to subvert the system: (1) sit down on the victim’s chair thus restoring a similar RSSI before DEB de-authenticates the user, or (2) alter the RSSI by attenuating the signal, e.g., putting an object in the line of sight, which yields a shadowing effect similar to the human body presence. Another threat with a somewhat less impact is the user herself who might be annoyed by automatic de-authentication during a short break away from the workspace (e.g., to get a coffee, use the restroom). Re-logging in is generally viewed as tedious, thus the legitimate user might try to inhibit the system. A selective attack on BLE advertisement has been demonstrated in [5], which makes packets unreadable. Other attacks annihilate the signal [17]. However, both the attacks cannot interfere with DEB. In fact, even if packets are unreadable, the RSSI is not changed, and if the receiver cannot detect the beacon, DEB automatically logs out the user, since it cannot provide security anymore. To succeed, the attacker must manipulate signals, changing the RSSI when the user leaves the workstation, reproducing the previous value. In this case, he should know this prior value, and it is very hard to alter the signal of one beacon when multiple beacons are located in the same room (e.g., in a common workplace scenario); thus this attack is easy to detect.

282

4

M. Conti et al.

Proposed Method

We now present DEB – a transparent de-authentication method based on BLE beacons and the shadowing effect caused by the presence of the human body in Line of Sight. Section 4.1 overviews DEB. The correct positioning of the beacon is studied in Sect. 4.2. A two-beacon version of DEB is shown in Sect. 4.3. 4.1

DEB Overview

DEB is a low-cost and easy-to-deploy method for fast and trasparent user deauthentication. The only extra equipment is a BLE beacon placed on the underside of the user’s chair. The computer continuously measures the RSSI between itself and the beacon. Since the body of the sitting user is directly in the LoS, it affects the signal power, causing the shadowing effect. The resulting RSSI is lower than the one in an unobstructed path. By monitoring the RSSI, DEB determines whether the user is present. DEB overview is shown in Fig. 1.

Fig. 1. Overview of the system.

The beacon located underneath the user’s chair automatically spreads wireless signals at a fixed frequency. The user’s PC has a BLE antenna which collects beacon’s advertisements, measuring the RSSI. As mentioned in Sect. 3, laptops usually have a built-in BLE antenna, while a BLE USB dongle can be used for Desktops. In the deployment phase, it is sufficient to attach the beacon to the chair and give the ID of the beacon to the app running on the PC. This app is installed on the computer and runs in the background, continuously collecting BLE signals. At authentication time, the user is assumed to be sitting on the chair, and the measured RSSI becomes the reference value. When the user leaves, the RSSI increases, since human body impedance disappears. Noticing an RSSI increase that exceeds a pre-specified threshold (the expected impedance of the human body), DEB automatically de-authenticates the user. How to determine the optimal position for the beacon (in order to optimize the RSSI variation and the threshold value) is discussed below.

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

4.2

283

Beacon Positioning and Data Collection

The beacon and the receiver must be positioned to have LoS interrupted by the sitting user. Since we suppose that the computer is always on top of the desk, we need to decide where to attach the beacon. For this purpose, we considered 12 different positions: six on the back, and six on the seat of, a common office chair. The corresponding positions are shown in Fig. 2. We tried three different positions for the chair back: top, middle, bottom, as well as for the seat itself: behind, middle, front. Each position was tested on each side of the chair – behind and in front of the back, under and above the seat.

Fig. 2. Different beacon positions.

For the measurements, we used a RadBeacon, shown in Fig. 3. We chose this device due to its small dimensions (35 mm × 35 mm × 15 mm), coin cell powersupply, configurable transmission power and frequency, ranging from +3 dBm to −18 dBm, and one to ten times per second, respectively. After some tries, we set the transmission power at 0 dBm and the advertisement rate at three times per second, which we view as a good compromise between efficiency and energy savings. Due to these settings, transmission range is about five meters (the minimum achievable) and the battery is estimated to last about 220 days. Its small dimensions allow the beacon to be attached to many surfaces of the chair without being cumbersome or attracting attention. To determine the best beacon position, we did a preliminary measurement using two average-sized people. We also considered the potential attack of putting an object into LoS, which could cause signal attenuation similar to that of a human body. Table 1 shows common objects with their level of interference, according to Apple documentation [2].

284

M. Conti et al.

Fig. 3. Radbeacon used in measurements. Table 1. Different materials and level of interference. Type of barrier Interference potential Wood

Low

Synthetic material

Low

Glass

Low

Water

Medium

Bricks

Medium

Marble

Medium

Plaster

High

Concrete

High

Bulletproof glass

High

Metal

Very high

Since the human body attenuation is mainly introduced by water, that has a medium interference potential, to reproduce the same interference, intuitively, the attacker should use a medium-sized object made from medium-barrier material, or a small object with high interference. We believe that using a large object with low interference is infeasible. Since the object should be stealthily placed onto the chair, it would be hard to hide and manage. For the same reason, we believe that objects made from marble, plaster, concrete, bulletproof glass, and brick, would be noticed and are hard to use in limited time when the user is away. Thus, we experimented with two objects commonly found in a typical office: a bottle of water and a metal card with the form factor of a credit card. We conducted RSSI measuring experiments from a laptop (Acer Aspire 7). In the initial phase, the user sat on the chair, breaking LoS between the beacon and the antenna. Eventually she (or the object) steps away from the workspace. An example graph of in Fig. 4 shows that – in the sitting period (from 0 to 5 seconds approximately) – the average RSSI is significantly lower than the one in free space propagation (around 25 dBm less from second 6 to 12 approximately). Also, changes occur almost instantaneously – within one second, the system decides whether the user left.

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

285

Fig. 4. Example of RSSI changes. Beacon located above the seat, in the front position.

Results of our study are shown in Tables 2 and 3. Table 2 refers to measurements conducted with the beacon under the seat, while in Table 3 the beacon on the seatback. The values represent the difference in RSSI when the user was absent and present. After the preliminary study, we chose the two best positions: Under-Middle for the seat and Behind-Top for the back. We chose these since they provide very good RSSI change and a noticeable difference between the human body and inanimate objects. Table 2. RSSI changes in dBm - Beacon on the seat. Above the seat

Under the seat

Front

Middle

Behind

Front

Middle

Behind

Human body 1

25

25

15

10

7

11

Human body 2

25

25

10

5

5

10

Water bottle

22

15

20

0

0

6

Metal card

12

0

5

5

0

0

Obviously, since the chair is not in LoS between the user and the beacon, results of above-the-seat and in-front-of-the-back are better. However, we want to create system which is easy to deploy, and those positions require us to open the chair in order to install the beacon. This would be laborious and uncomfortable for the user. Therefore, those results are shown only to see that DEB could work even better if it were possible to place the beacon into the chair. Whereas, we decided to simply attach the beacon to the chair.

286

M. Conti et al. Table 3. RSSI changes in dBm - Beacon on the back. In front of the Back

Behind the Back

Top

Middle

Bottom

Top

Middle

Bottom

Human body 1

10

20

20

16

10

12

Human body 2

13

15

15

11

10

12

Water bottle

13

20

5

5

7

2

Metal card

8

7

0

3

7

5

In the final phase of the preliminar experiment, we used three average-sized people, the water bottle and the metal card. Tables 4 and 5 show results for Under-the-seat-Middle and Behind-the-back-Top, respectively. The average of measurements is reported for the human body. The experiment was conducted with the chair in front the desk (0◦ ) and turning it in three different positions: 90◦ left, 90◦ right and 180◦ . Negative values indicate that the RSSI was higher than the initial one, measured with no one in the chair. This happens because the beacon is closer to the receiver in those positions. 4.3

Two-Beacon System

Efficacy of DEB depends on the quality of measurements. It can be affected by multiple factors, such as quality of transmitter and receiver, their synchronization and (obviously) presence of objects in LoS. Since the two devices are not paired, the advertising rate and the scanning frequency cannot be perfectly synchronized, which can lead to loss of some signals and lower performances. Also, if the user moves the chair, our assumptions might not hold any longer, thus causing some false positives. Consequently, in the later stage of the study, we decided to increase performances of DEB and decrease the false positive rate by introducing the second beacon. By using two different positions on the chair, we can gather more information and make more assumptions about the user’s position, with a negligible difference in the deployment cost. Furthermore, fooling two beacons is more challenging for the attacker. The two-beacon version of DEB is discussed in Sect. 5.3. Table 4. RSSI changes in dBm - Beacon under the seat in the middle. 0◦

90◦ Left

180◦

90◦ Right

Human body

7

5

7

6

Water bottle

2

0

0

-3

Metal card

0

-2

2

0

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

287

Table 5. RSSI changes in dBm - Beacon behind the back on the top.

5

0◦

90◦ Left

180◦

90◦ Right

Human body

18

15

-7

-4

Water bottle

2

6

11

13

Metal card

5

5

-6

10

Evaluations

After preliminary experiments, we conducted a user study to assess feasibility of DEB. In Sect. 5.1 we present the procedure of the user study and the algorithm for determining when the user leaves. In Sect. 5.2, measurements are performed using a single beacon, while Sect. 5.3 reports on the two-beacon version. Section 5.4 compares DEB with other de-authentication methods. 5.1

Data Collection Procedure and Algorithm

We collected data from 15 people (10 men and 5 women), with each person using a swivel chair and a normal (stationary) chair. For the latter, we asked participants to sit down for about five seconds and then stand up and leave. The beacon was positioned on the chair while the PC was on the desk. We tried two different positions, according to preliminary results: Behind-the-backTop and Under-the-seat-Middle. For the swivel chair, since it can rotate, we asked subjects to sit down and then stand up and leave as before; in addition, we asked them to rotate the chair 90◦ left, 90◦ right and 180◦ . We decided to include rotational measurements to assess DEB resistance to false positives. More details are below. To understand when the user leaves, we have to detect a significant increase of the RSSI. We compute the average of all measurements in a sliding window of three seconds, every second. The window of three seconds is the maximum amount of time we need to de-authenticate. Then, we compare the average of the current sliding window with those of its predecessors (in a limited range of five seconds to forget about history and remove the risk of false positives), to see whether the difference exceeds a fixed threshold. When this happens, we can infer the user left and, by using the sliding window approach, we can determine that in at most three seconds. The threshold was decided based on preliminary experiments, while a small variation of the algorithm was used in the two-beacon version (see below). 5.2

Single-Beacon System Results

Looking at preliminary results, we decided to use the position Behind-the-backTop as the best for placing the beacon. While Under-the-seat-Middle is more

288

M. Conti et al.

stable, its value of 6–7 dBm is acceptable, though not very high. Also, the difference between the human body and the water bottle is not large. On the other hand, RSSI difference in Behind-the-back-Top is great, and it is very easy to distinguish between a human and a water bottle, or a metal card. Since the user is supposed to work in front of the desk most of the time, we could consider the 0◦ data only. In our particular case, with the swivel chair, the problems are present only in 90◦ right and 180◦ , since the beacon is closer to the receiver (i.e., in our PC the antenna was on the left side, so when the user rotates towards right the beacon goes towards left, closer to the receiver). However, in both cases, the value is higher than the one in the free LoS. This can be used to determine that the beacon is now closer; thus, the chair was rotated and the user might be sitting still. In 90◦ left the beacon is farther, so the RSSI would not be higher. A better solution to this problem is presented in Sect. 5.3. Based on collected data, we fixed the threshold at ∼10 dBm. When the RSSI exceeds this threshold, the user is de-authenticated. With this threshold, DEB avoids false positives due to signal fluctuation, and none of the aforementioned objects mimic the same RSSI change as the one made by the human body. We believe that the attacker would not be able to interfere with DEB in any meaningful manner. With the frequency of three times per second, and the signal captured multiple times, we can compute an accurate average and notice the change within two seconds. To check for a false positive, we can give a grace period of one second, with a successful automatic de-authentication after three seconds. In this small amount of time, the attacker cannot sit down on the victim’s chair or place an object in LoS without being seen. As mentioned in Sect. 3, another threat could emanate from the actual user. First, the user can place an object in LoS, while leaving. This is doable, however, the user should build a proper object to mimic the RSSI change, which is not a trivial task. Furthermore, re-entering a password is easier and less annoying than finding the right object and properly positioning it every time when leaves the workspace. The user could also try to remove the beacon from the chair, and position it at another location to keep the constant RSSI. To do so, the user must find a position that mimics the RSSI reduced by the human presence. The user has two ways to succeed. One way is to place the beacon further than the normal position. However, if someone or something breaks LoS, the RSSI may become too low to be detected or to be credible and DEB will log out the user. Another way is to place the beacon nearby where nothing can interfere with LoS (i.e., on the desk) and reduce the RSSI by covering the beacon with an object. In this case, reproducing the desired RSSI is more difficult, and the fluctuation of the signal will be too regular to be produced by the human body. DEB would detect this attack. Figure 5 shows sample RSSI changes and algorithm detection for one user. Dashed lines represent the averages of the sliding window. At second 6, the average is higher than its predecessors over the threshold; so, DEB detects that the user left. With all the 15 participants, we achieved 100% success with the beacon positioned Behind-the-back-Top and a threshold of 10 dBm, on the normal

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

289

chair and on the swivel chair, without rotation. When the beacon was positioned Under-the-seat-Middle, we set the threshold to 6 dBm according to results from the preliminary study, again achieving 100% of success for both chairs.

Fig. 5. RSSI changes: user leaves. Beacon in Behind-the-back-Top position.

5.3

Two-Beacon System Results

Positioning the beacon behind the back of the chair might cause a false positive, i.e., erroneous de-authentication, if the user rotates the chair. The RSSI might increase (if the beacon is closer to the receiver) even if the user is still sitting, which is clearly undesirable. Previous results in Sect. 4 show that the RSSI of the signal spread by a beacon positioned Under-the-seat-Middle is almost constant, even if the chair is rotated. Based on this observation, we decided to use another beacon to improve DEB. Even if the RSSI change of a beacon located under the chair is not wide enough to distinguish between a human and another object, it is constant and high enough to be used as a double-check. The idea is to use the beacon Behind-the-back-Top as the first main means of deciding if the user is leaving. If the RSSI of its signal increases, we examine the signal of the second beacon to see whether this is a false positive. In the algorithm, we monitor averages using a sliding window, as before. When we detect a possible hit, we check the second beacon. If its average is also above its predecessor’s more than the second threshold (appropriate for that position), the algorithm de-authenticates; otherwise, this is considered a rotation and no de-authentication takes place. Two-beacon DEB is more difficult to attack, since the adversary needs to fool two beacons instead of one. Figure 6 shows a false positive, wherein the RSSI of the first beacon (light green diamonds - Behind-the-back-Top) is increasing, while the signal of the second (dark green circles - Under-the-seat-Middle) remains almost constant. Whereas, in Fig. 7, we see a true positive, where signals are increasing, since the user is leaving.

290

M. Conti et al.

Fig. 6. RSSI changes: user chair rotation. (Color figure online)

Fig. 7. RSSI changes: user leaves. (Color figure online)

From our user-study, all the 15 participants were correctly de-authenticated within three seconds. During the rotations, the false positive were probably to occur at 90◦ toward right and 180◦ , since the Behind the back; Top was closer to the PC’s antenna. In the experiment, all the rotations were correctly classified. While at 90◦ toward left the signals RSSI were almost the same, in 90◦ toward right and 180◦ the Behind the back; Top showed possible de-authentication cases. However, the Under the seat; Middle never reached the threshold in these situations, avoiding false positives.

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

291

Inactivity Allowed

Non Obtrusive

Difficult to Circumvent

Free from Enrollment















Low (Free)

ZEBRA















Medium ($100)

5.4

Price

Non-Cont. Auth.

Timeout

Nothing to Carry

Non Biometric

Table 6. Comparison between de-authentication techniques.

Gaze Track.















High ($2-5k)

Popa















Medium ($150)

Pulse resp.















Unknown

Keystr. Dyn.















Low (Free)

FADEWICH















Unknown

BLE Beacon















Low ($10)

Comparison with Other De-authentication Techniques

We now compare DEB with other de-authentication techniques. Table 6 shows key features of prominent current methods. First, a good method should not force the user to carry anything extra. In fact, it could be annoying, uncomfortable, unaesthetic and result in a serious problem if stolen or forgotten. Most of the techniques satisfy this objective, only ZEBRA with the bracelet and Pulse Response with electrodes violate it. The presented techniques can be divided into biometric and non-biometric. The former is desirable since it is more precise, and typically difficult to subvert. Also, they usually provide continuous authentication – de-authentication can be automatically triggered when the user can no longer verify identity. Even though biometric and continuous authentication methods are more precise, they usually have two issues. First, they require an enrollment phase which can be annoying and time-consuming. Second, biometric hardware can be very expensive. This is true especially for Gaze-Tracking, which costs $2–5K per unit. Other methods,, such as ZEBRA and Popa, are cheaper than Gaze-Tracking. However, the cost per unit is still over $100, which can be quite expensive if instrumenting a big office. Also, ZEBRA is not a purely biometric method; it has been shown to be less secure than others. Pulse Response and Keystroke Dynamics are both biometric-based and continually authenticate the user. Though the cost of Pulse Response is unclear, we believe that the hardware is not cheap. On the other hand, Keystroke Dynamics is completely free. DEB, similar to Timeout and FADEWICH, is not biometric-based and does not provide continuous authentication. Nonetheless, its effectiveness has been demonstrated. Furthermore, it requires no enrollment phase and its cost is very low. Keystroke Dynamics might seem to be the best choice. However, as mentioned earlier, it has an important drawback in that it does not allow the user to be present-but-inactive. Another important property is user burden or unobtrusiveness. All considered methods are transparent, except for ZEBRA. DEB is

292

M. Conti et al.

completely unobtrusive since the user need not be aware and does not need to interact with it. Finally, difficulty of circumvention is also an important property. Inactivity time-outs is the least prone: a too-long time-out value can be too long to de-authenticate the user right after he leaves. ZEBRA was shown to be vulnerable, and FADEWICH can be circumvented if the office is crowded. We demonstrated that DEB is secure: its de-authentication is very fast, at under three seconds.

6

Future Work

Thus far, we tested DEB with 15 people in a communal workspace. In the future, a bigger study should be conducted to better evaluate DEB and discover new useful features, where machine learning could be used to improve efficacy. So far, we found a strong correlation between the RSSI and user actions, demonstrating that the human body attenuation does not vary a lot among different users, and can be used to de-authenticate. Moreover, we used a BLE beacon as the means to send signals. However, perhaps another more suitable device can be used. Since the beacon cannot be paired with the receiver, the RSSI value is affected by fluctuations that sometimes make results inaccurate. On the other hand, a connection between paired device is more stable. In this direction, we could try to find another cheap device that can work better than a BLE beacon. A stable signal will provide both more accuracy and more difficulties for an adversary. Finally, we imagine a similar system where the transmitter and the receiver are both on the user’s chair. In this situation, no false positive are likely due to common actions, such as chair rotation.

7

Conclusion

This paper proposed a new system, DEB, to automatically de-authenticate users. DEB was implemented and tested with encouraging results. Although de-authentication is not a new topic, current methods have some limitations, such as extra equipment for users to carry, lack of inactivity permission, or imposition of expensive hardware to verify user presence. In contrast, for under $10, DEB de-authenticate a user in under three seconds. This speed makes literally hard for the attacker to gain access to the victim’s session. Though DEB is not biometric-based and is not a continuous authentication technique, it is completely unobtrusive, free from the tedious enrollment phase and is quite reliable. Finally, using a second beacon as an extra security factor, we can lower the rate of false positives.

References 1. Al Abdulwahid, A., Clarke, N., Stengel, I., Furnell, S., Reich, C.: A survey of continuous and transparent multibiometric authentication systems. In: European Conference on Cyber Warfare and Security, pp. 1–10 (2015)

DEB – Transparent De-authentication using Bluetooth Low Energy Beacon

293

2. Apple: Potential sources of wi-fi and bluetooth interference (2017). https:// support.apple.com/en-us/HT201542. Accessed 5 July 2018 3. Banerjee, S.P., Woodard, D.L.: Biometric authentication and identification using keystroke dynamics: a survey. J. Pattern Recogn. Res. 7(1), 116–139 (2012) 4. Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million passwords. In: 2012 IEEE Symposium on Security and Privacy, pp. 538–552. IEEE (2012) 5. Brauer, S., Zubow, A., Zehl, S., Roshandel, M., Mashhadi-Sohi, S.: On practical selective jamming of Bluetooth low energy advertising. In: 2016 IEEE Conference on Standards for Communications and Networking (CSCN), pp. 1–6. IEEE (2016) 6. Choi, M., Park, W.K., Lee, I.: Smart office energy management system using bluetooth low energy based beacons and a mobile app. In: 2015 IEEE International Conference on Consumer Electronics (ICCE), pp. 501–502. IEEE (2015) 7. Conti, M., Lovisotto, G., Martinovic, I., Tsudik, G.: Fadewich: fast deauthentication over the wireless channel. In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 2294–2301. IEEE (2017) 8. Corner, M.D., Noble, B.D.: Zero-interaction authentication. In: Proceedings of the 8th Annual International Conference on Mobile Computing and Networking, pp. 1–11. ACM (2002) 9. Eberz, S., Rasmussen, K., Lenders, V., Martinovic, I.: Preventing lunchtime attacks: fighting insider threats with eye movement biometrics (2015) 10. Faragher, R., Harle, R.: Location fingerprinting with Bluetooth low energy beacons. IEEE J. Sel. Areas Commun. 33(11), 2418–2428 (2015) 11. Huhta, O., Shrestha, P., Udar, S., Juuti, M., Saxena, N., Asokan, N.: Pitfalls in designing zero-effort deauthentication: opportunistic human observation attacks. In: Network and Distributed System Security Symposium (NDSS), February 2016 12. Kaczmarek, T., Ozturk, E., Tsudik, G.: Assentication: user de-authentication and lunchtime attack mitigation with seated posture biometric. In: Preneel, B., Vercauteren, F. (eds.) ACNS 2018. LNCS, vol. 10892, pp. 616–633. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93387-0 32 13. Kajioka, S., Mori, T., Uchiya, T., Takumi, I., Matsuo, H.: Experiment of indoor position presumption based on RSSI of Bluetooth le beacon. In: 2014 IEEE 3rd Global Conference on Consumer Electronics (GCCE), pp. 337–339. IEEE (2014) 14. Kili¸c, Y., Ali, A.J., Meijerink, A., Bentum, M.J., Scanlon, W.G.: The effect of human-body shadowing on indoor UWB TOA-based ranging systems. In: 2012 9th Workshop on Positioning Navigation and Communication (WPNC), pp. 126–130. IEEE (2012) 15. Mare, S., Markham, A.M., Cornelius, C., Peterson, R., Kotz, D.: Zebra: zeroeffort bilateral recurring authentication. In: 2014 IEEE Symposium on Security and Privacy (SP), pp. 705–720. IEEE (2014) 16. Palumbo, F., Barsocchi, P., Chessa, S., Augusto, J.C.: A stigmergic approach to indoor localization using Bluetooth low energy beacons. In: 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6. IEEE (2015) 17. P¨ opper, C., Tippenhauer, N.O., Danev, B., Capkun, S.: Investigation of signal and message manipulations on the wireless channel. In: Atluri, V., Diaz, C. (eds.) ESORICS 2011. LNCS, vol. 6879, pp. 40–59. Springer, Heidelberg (2011). https:// doi.org/10.1007/978-3-642-23822-2 3 18. Rasmussen, K.B., Roeschlin, M., Martinovic, I., Tsudik, G.: Authentication using pulse- response biometrics. In: The Network and Distributed System Security Symposium (NDSS) (2014)

294

M. Conti et al.

19. Sinclair, S., Smith, S.W.: Preventative directions for insider threat mitigation via access control. In: Stolfo, S.J., Bellovin, S.M., Keromytis, A.D., Hershkop, S., Smith, S.W., Sinclair, S. (eds.) Insider Attack and Cyber Security, pp. 165–194. Springer, Heidelberg (2008). https://doi.org/10.1007/978-0-387-77322-3 10 20. Tey, C.M., Gupta, P., Gao, D.: I can be you: questioning the use of keystroke dynamics as biometrics (2013) 21. Yoo, S.K., Cotton, S.L., Sofotasios, P.C., Freear, S.: Shadowed fading in indoor off-body communication channels: a statistical characterization using the (ku)/Gamma composite fading model. IEEE Trans. Wireless Commun. 15(8), 5231– 5244 (2016) 22. Zhao, Y., Patwari, N., Phillips, J.M., Venkatasubramanian, S.: Radio tomographic imaging and tracking of stationary and moving people via kernel distance. In: 2013 ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 229–240. IEEE (2013)

Similarity of Binaries Across Optimization Levels and Obfuscation Jianguo Jiang1,2 , Gengwang Li1,2 , Min Yu1,2(B) , Gang Li3 , Chao Liu1 , Zhiqiang Lv1 , Bin Lv1 , and Weiqing Huang1 1

2

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China [email protected] School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China 3 School of Information Technology, Deakin University, Melbourne, VIC, Australia

Abstract. Binary code similarity evaluation has been widely applied in security. Unfortunately, the compiler optimization and obfuscation techniques exert challenges that have not been well addressed by existing approaches. In this paper, we propose a prototype, ImOpt, for reoptimizing code to boost similarity evaluation. The key contribution is an immediate SSA (static single-assignment) transforming algorithm to provide a very fast pointer analysis for re-optimizing more thoroughly. The algorithm transforms variables and even pointers into SSA form on the fly, so that the information on def-use and reachability can be maintained promptly. By utilizing the immediate SSA transforming algorithm, ImOpt canonicalizes and eliminates junk code to alleviate the perturbation from optimization and obfuscation. We illustrate that ImOpt can improve the accuracy of a state-of-theart approach on similarity evaluation by 22.7%. Our experiment results demonstrate that the bottleneck part of our SSA transforming algorithm runs 15.7x faster than one of the best similar methods. Furthermore, we show that ImOpt is robust to many obfuscation techniques that based on data dependency.

Keywords: Binary code similarity engineering · Program analysis

1

· SSA transforming · Reverse

Introduction

Binary similarity evaluation is a general technology with wide applications in security, including malware lineage inference [19,21,22,27], known bugs or vulnerabilities searching [11,12,24] in released binaries. However, the compiler optimization or obfuscation can remove or insert significant instructions respectively, which will cause much unpredictable damage to the similarity evaluation. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 295–315, 2020. https://doi.org/10.1007/978-3-030-58951-6_15

296

J. Jiang et al.

Fig. 1. Compiler optimizer may change the outputs of the code. It has been tested with modern compilers g++8 and clang++9.

Unfortunately, existing approaches to binary similarity evaluation cannot well address the perturbation brought by optimization and obfuscation. Static approaches usually count statistic features based on syntax sequence or instruction categories, which hardly reveal the semantic information. Besides, most of dynamic approaches tend to break the code into smaller fragments for overcoming the problems of path coverage and emulation. But once the junk code, which is inserted by obfuscation or should be eliminated but not, is broken into smaller fragments, these trivial fragments will be taken as a part of the function signature, which further affects the accuracy of similarity evaluation. In this paper, we propose a prototype, ImOpt, to boost similarity evaluation. ImOpt tries to re-optimize lifted binary code to mitigate the impacts from optimization and obfuscation. Nevertheless, there are two challenges to do so. Challenge 1: optimizing binary code demands a precise pointer analysis, which is lacked in compiler optimizers. The irreversible compilation has erased variable information so that much routine processes of compiler optimizer are interrupted. The lack of pointer analysis also leads to the vulnerability that even compiler optimizers can be confounded by a few pointer dependencies, as shown in Fig. 1. Challenge 2: traditional precise pointer analysis methods are costly. A fast pointer analysis normally outputs an unsound result, while a precise pointer analysis always costs too much due to the iterative process [20]. Hasti et al. [15] accommodate to use SSA form for acceleration, while most SSA transforming algorithms only process variables but do not support pointer analysis. To address Challenge 1, we build a framework by integrating a precise pointer analysis for optimizing the code lifted from binaries. The pointer analysis helps to resume the intercepted optimizations by pointers. Among the many optimizations, only those related with canonicalization and elimination are implemented because they influence the similarity evaluation most. Canonicalization tries to transform the logically equivalent expressions into a unified form. Besides, elimination clean out either useless or unreachable code according to the information

Similarity of Binaries Across Optimization Levels and Obfuscation

297

on def-use chain and reachability. Thus, the framework is supposed to be resilient to the code transformation that based on data-dependency. For Challenge 2, we propose an efficient algorithm to support a precise pointer analysis. This algorithm is named as immediate SSA transforming algorithm for satisfying the immediacy property (Sect. 2.2). The algorithm transforms variables and even pointers into SSA form on the fly to link each usage with its dominating definition. The processing order of code blocks is scheduled carefully to defer and reduce redundant re-computations so that only one pass of dense analysis is needed. Moreover, because of the immediacy property, the informa√ tion on dominating definition can be maintained in O(1) instead of O ( n) in the state-of-the-art [20], where n is the number of instructions in the code. In summary, our contributions include: – An algorithm called immediate SSA transforming algorithm to provide a fast and precise pointer analysis. By satisfying the immediacy property, it maintains the information on dominating definition in constant time (Sect. 3). – A framework equipped with pointer analysis for re-optimizing binary code to boost binary similarity evaluation. It is supposed to be resilient to datadependency-based code transformation (Sect. 4). – An obfuscation method that confounds the compiler optimizer to eliminate any specified code blocks. This obfuscation illustrates it is vulnerable to reoptimize binary code with compiler optimizer directly (Sect. 5.2). – We implement our designs in a prototype, ImOpt, and demonstrate that ImOpt can significantly improve the accuracy of a state-of-the-art approach [9] to similarity evaluation against optimization and obfuscation. We also show that the bottleneck module of ImOpt runs 15.7x faster than one of the best similar methods [20] (Sect. 6). The rest of this paper is organized as follows. Section 2 introduces the terminologies. Section 3 details the immediate SSA transforming algorithm. In Sect. 4, the binary optimization framework is constructed. Experiment settings and results are provided in Sect. 5 and Sect. 6. Section 7 lists related works. Finally, Sect. 8 concludes the paper and envisages the future work.

2

Preliminaries

In this section, we present some terminologies related to pointer analysis and briefly recap the immediacy property. 2.1

Notations and Terminology

We assume readers have a basic familiarity with SSA, dominating, dominator, dominance frontier and φ-function from [5]. Here, we recap some key concepts together with their notations. Symbols B, D, S, V are used to represent the set of blocks, definitions, statements and variables. The lowercase letters b, d, s and v denotes the elements

298

J. Jiang et al.

in these sets respectively. The subscripts in bi , di and si are used to stand for the number in the sequence while the subscript in vk always refers to the SSA version number. A definition d for a variable v is reachable to a block b, denoted as d → b, if and only if there is a path from the define-site to block b and that path contains no further definition for variable v. Such a definition is a reachable definition to the block b. If there is only one definition that reaches to the usage, then this definition is a dominating definition of that usage. Moreover, three new concepts are introduced by us: Backward/Forward block. An edge in control-flow graph (CFG) is a back edge if the source-end block is visited after the target-end block in reverse post-order. A block is a backward block if and only if there is a back edge pointing to that block. Otherwise, it is a forward block. Backward/Forward definition. A definition d for variable v is a forward → definition of block b, denoted as d − rc b, if there is such a path p from b to b that no back edge or other definition of v on it. On the contrary a definition − b, if it can reach b but d is a backward definition of block b, denoted as d ← rc is not a forward definition. Backward reachable graph. Given a CFG G(B, E), where B is the set of all the basic blocks and E is the set of all the jump edges, we define the backward reachable graph (BRG) of a backward block bi as the graph G(B  , E  ), where − b } and the edges are E  := {∀e ∈ rc the nodes are B  := {∀bj ∈ B | bj ← i −     ← E |∀b ∈ B , e ∈ p : b rc b}. The BRG of block b is denoted as GBR (b). Note that only the backward block has a BRG. 2.2

The Immediacy Property

The property of immediacy is the key to transform code into SSA form on the fly and to maintain the information on the dominating definition efficiently. Property 1 (Immediacy). An SSA transforming algorithm satisfies the immediacy property if it holds the following two invariants during processing: Global Invariant. Before processing a block, for each free variable v in the block: a) if there is only one definition of it can reach the current block, then the version number of this definition should have been recorded and there is no φ-function for v in the front of the block; b) otherwise, a φ-function for v should be in the front of the block and the variable defined in the φ-function should have a different version number with other definitions of v. Local Invariant. After processing a block, each variable v used in the block has only one definition can reach the usage and the used variable has been bound with the same version number as this definition. For a block under processing, the global invariant ensures every incoming definition is recorded and the necessary φ-functions are inserted before processing. After processing, the local invariant guarantees that each usage is linked with its dominating definition. These two invariants together preserve a correct SSA form. Besides, here we present the deductive condition that will be referred frequently in the following sections:

Similarity of Binaries Across Optimization Levels and Obfuscation

299

Condition 1 (Deductive Condition). Suppose that traversing the CFG G in reverse post-order generates a block sequence, {bn }. For the block bi under processing, suppose our algorithm hold the invariants in Property 1 for every visited block bj , where j < i. 2.3

Useful Properties About Forward and Backward Blocks

The lemmas listed below describe some properties about forward and backward blocks. These properties inspire us on how to design an algorithm that satisfies Property 1 so that the information on dominating definition can be maintained in constant time. The proofs of these lemmas are presented in Appendix B. Reverse post-order is a recommended alternative of the topological order for program analysis. It forms a good order for dominance: Lemma 1. The dominator block is ahead of the dominated one in reverse postorder traversal. Lemma 2. A forward block has only forward definitions reaching itself if all the blocks except bi on the path to the block are in SSA form. Lemma 3. Given Condition 1, if block bi is a forward block, we have: ∀b ∈ B, ∃ d ∈ b , d → bi ⇒ b is ahead of bi in reverse post-order

(1)

The backward definition can reach the backward block along the iterative dominance frontier (IDF) chain of the block b where that definition is defined: Lemma 4. Given Condition 1, if block bi is a backward block, then: − b ⇒ b ∈ G (b ) ∧ b ∈ DF + (b ) rc ∀b ∈ B, ∃ d ∈ b , d ← i BR i i

3

(2)

Immediate SSA Transforming Algorithm

In this section, we detail the immediate SSA transforming algorithm, which is proposed to transform code into SSA form on the fly for a prompt and thorough pointer analysis. 3.1

Main Function

The immediate SSA algorithm is implemented as the ImSsa procedure. Several components are maintained for the invariants during the processing: – – – –

DU ⊆ V × B × S is the set of def-use edges; R ⊆ B × B is the set of reachable edges; E : V → int traces the dominating definitions for each variable; C : V → int maintains each variable’s maximum SSA number, which represents the unallocated SSA version number for every variable.

300

J. Jiang et al.

The ImSsa procedure traverses the CFG in reverse post-order. Due to the different properties as discussed in Sect. 2, the backward blocks are distinguished from forward ones to process separately. 1 2 3 4 5 6

3.2

ImSsa(L) where L is the block list sorted in reverse post-order for bi ∈ L do if bi is visited then continue if bi is forward block then ProcessForwardBlock(bi , Ei ) else ProcessBackwardBlock(bi , Ei ) Forward Block Analysis

Lemma 2 and Lemma 3 together reveal that for a forward block, all the definitions reaching it can be recorded before analyzing it. Therefore in theory, it is possible to transform the code into SSA form on the fly. The ProcessForwardBlock procedure transforms the statements into SSA form on the fly (line 5). 1 2 3 4 5 6 7 8

ProcessForwardBlock(bi ∈ B) if there is no reaching edge ( , bi ) in R then continue Vdef , Ei ← {} , EDOM (i) foreach statement s ∈ bi do ProcessStatement(null, s, bi , Ei , Vdef ) Propagate(bi , Vdef , Ei ) PropagateReachability(bi , Sinvalid ) Mark(bi , visited)

The table Ei , which records the dominating definitions, is inherited from the immediate dominator of bi . As for the entry block, E is initially empty. The ProcessStatement procedure links each variable usage with its dominating definition (line 2). When a new variable is found, it will be assigned with a unique SSA version number to distinguish with other definitions of the defined variable (line 6). 1 2 3 4 5 6 7

3.3

ProcessStatement(vk ∈ V, s ∈ S, b ∈ B, E, Vdef ) SsaUse(s, E) ClientAnalyse(s) if v = (u ← Lhs(s)) then vk ← uE(u) SsaDef(s, b, E, Vdef ) return vk Fetching Dominating Definitions in Constant Time

With Property 1, an algorithm can trace the dominating definitions in constant time. It is based on an observation of SSA-form code: the definition outgoing from

Similarity of Binaries Across Optimization Levels and Obfuscation

301

Fig. 2. Maintaining dominating definitions. Block 4 is under processing right now. The green(or blue) markers indicate the operations on C(or E). The middle sub-figure shows to number the usages(or definition) in statement’s right(or left)-hand side by fetching from C(or E). And the right part shows to update E by copying from C and to update C by monotonic increment. (Color figure online)

the immediate dominator will be propagated to the current block eventually. If not, the statement that intercepts the definition will introduce a φ-function into the front of the current block. For the example in Fig. 2, the definition A.1 outgoing from immediate dominator block 1 propagates into the current block 4. As for the definitions B.1 and C.3 that is intercepted by new definitions, the φ-functions for variable B and C are introduced into block 4. Lemma 1 guarantees the dominator of a block can be visited in advance and Property 1 ensures the φ-functions are inserted before processing. Therefore, we can inherit the dominating definition table from immediate dominator to the current block and then correct that table by processing the φ-functions. The SsaUse procedure transforms each used variable into SSA form: 1 2 3 4 5 6 7 8

SsaUse(s ∈ S, bi ∈ B, E) if s is φ-function then foreach reaching edge (bj , bi ) ∈ R do if bj ∈ / s and bDOM (i) dominates bj then set (bj , Ssa(v), true) in s else foreach variable v used in Rhs(s) do replace usage of v by usage of Ssa(v)

What the Ssa(v) procedure returns actually is vE(v) . But, if a definition coming from the immediate dominator to the current block is partially intercepted, a branch for it would be missing from the φ-function in the current block. Look at Fig. 2, definition C.1 is partially intercepted by C.2 and still can reach block 4. However, the φ-function in block 4 has no branch for C.1 since block 4 is not block 1’s IDF. To fix it, the missing branch is inserted (line 5).

302

J. Jiang et al.

The SsaDef procedure dispatches an unallocated number C(v) to the defined variable v (line 4) to distinguish it from the other definitions. After that, the def-use pair is recorded and the definition is collected for later propagation. 1 2 3 4 5 6 7

SsaDef(s ∈ S, b ∈ B, E, Vdef ) v ← Lhs(s) E(v) ← C(v) + + replace v by Ssa(v) as Lhs(s) foreach variable v used in Rhs(s) do add def-use pair (v, b, s) into DU add defined variable v into Vdef

The invariants of C and E are still held after updating. C(v) increases monotonously (line 3) hence it still records the maximum SSA number which is unallocated. Meantime, E(v) is assigned with the new SSA number C(v) thus E(v) keeps the dominating definition for variable v. 3.4

Propagating Definition and Reachability

The Propagate procedure introduces φ-functions into the direct dominance frontiers of block b for each outgoing definition (line 4). These φ-functions are tentatively marked f alse, which means they are not reachable yet. The PropagateReachability procedure is responsible to enable these φ-functions. 1 2 3 4

Propagate(b ∈ B, Vdef , E) foreach frontier edge (b , b ) of block b do foreach v ∈ Vdef do add branch (b , Ssa(v), f alse) into φ(v) in the front of b

The goal of the PropagateReachability procedure is to extend the reachability from the current block to its successors, to which the jump condition c is not always false. At this point, the disabled branches in the φ-functions that introduced by the Propagate procedure are enabled. 1 2 3 4 5 6 7

PropagateReachability(b ∈ B, E) foreach jump edge (cond, b ) of block b do if cond is always false then continue foreach φ(v) ∈ b do if (b, v, f alse) ∈ φ(v) then set branch (b, Ssa(v), true) in φ(v) in the front of b add reaching edge (b, b ) into R

Obviously, the φ-function is inserted when processing the dominator b. It is not enabled until the dominated block b is processed. Here comes a question: is it possible that the dominance frontier b is visited before its predecessor b which causes the reachability information is missed? That would not happen when b is a forward block since a forward block has no back edge. If b is a backward block, Sect. 3.5 shows how to treat the back edges as forward ones.

Similarity of Binaries Across Optimization Levels and Obfuscation

303

Fig. 3. An example for showing how to handle the backward block and reduce the re-computations. (a) is a CFG starts with a backward block A, whose BRG is in blue. (b) flips the back edges by mirroring a fake block A (in dash circle) at the bottom of the graph so that only forward blocks remains. (c) is the process sequence of our method which does not re-compute until the end of graph. (d) is the sequence of SAPR [20], which re-computes once a processed node is invalidated (after B and E). (Color figure online)

3.5

Backward Block Analysis

Lemma 4 implies that for a backward block bi , all the backward definitions that will reach it can be propagated back into itself along the IDF chain in GBR (bi ). Remember that for forward block, we propagate the definitions along its IDF chain. Hence, the basic idea to handle backward block is to convert its BRG into a graph which contains only forward blocks so that we can handle it with the procedures already presented in previous sections. Figure 3 gives an example. Therefore, the ProcessBackwardBlock procedure processes the BRG of backward block bi by taking bi as a forward block (line 5). 1 2 3 4 5 6

ProcessBackwardBlock(bi ∈ B, Ei , Sinvalid = {}) G ← BackwardReachSubGraph(Gcf , bi ) L ← ReversePostOrder(G ) do ImSsaSparse(L , Sinvalid , bi ) while Sinvalid (L ) is not empty

The BackwardReachSubGraph procedure creates the BRG for the given backward block bi . It scans the CFG in depth-first inversely by starting from the back edges of bi . This process is very cheap and can be computed previously. The ImSsaSparse procedure is quite similar to the ImSsa procedure, except it only recomputes the invalid statements instead of the whole block. The implementation of ImSsaSparse can be found in Appendix A.

304

3.6

J. Jiang et al.

Satisfying Immediacy Property

The ImSsa procedure requires Property 1, the immediacy property, to output correct SSA-form code efficiently, as discussed in Sect. 3.3. It can be proved the ImSsa procedure meets Property 1. Theorem 1. The ImSsa procedure satisfies Property 1. The proof for this theorem is consigned to Appendix B.2. It should be emphasized that no algorithm can exactly hold Property 1 for backward block because it would require a topological sort for the cycling CFG, which is impossible. Here we say Property 1 can be held for backward block, we actually mean the property can be held at the last time of calling ImSsaSparse. Nevertheless, it is sufficient to preserve what we declaimed on the ImSsa procedure.

4

Binary Optimization Framework

The framework tries to canonicalize and eliminate the code for alleviating the impacts from optimization and obfuscation. Since the framework is driven by a pointer analysis, it is supposed to be robust to the code transformation that based on data-dependency. 4.1

Integration with Pointer Analysis

The framework is implemented in the ClientAnalyse procedure for utilizing the pointer analysis provided by the ImSsa procedure. Besides E and C, two more components are maintained: D : V → E is the table mapping variables into definition expressions and A : E → V is the table mapping address expressions into recovered variables. Each variable usage, including the recovered one, is linked with its dominating definition (by Ssa) and then replaced by the corresponding expression (by Propagate). Thus, Canonicalize will have sufficient information to transform the expression correctly. After that, Recover recovers variables from the addresses of the memory-access instructions (such as STORE). 1 2 3 4 5 6 7 8 9 10 11 12

ClientAnalysis(e, E, C, D, A) match e with case VAR (v) do enew ← Propagate(Ssa(v), D) case EXP (op, e · · · ) do for e ∈ e · · · do ClientAnalysis(e , E, C, D, A) enew ← Canonicalize(op, e · · · ) if op is STORE or LOAD then v ← Recover(enew , E, C, A) enew ← Propagate(Ssa(v), D) replace e by enew

Similarity of Binaries Across Optimization Levels and Obfuscation

305

Propagate substitutes the given SSA-form usage with the usage’s definition expression. It complements the original expression with sufficient information to support more precise and thorough canonicalization. The validation of propagation can be guaranteed because the ImSsa procedure ensures each definition that reaches the usage is analyzed and recorded in advance. Besides, the expressions recorded in D consist of only free variables and constants. That is, after propagation, the statement is transformed into such a concrete form that no element in it can be propagated anymore. No doubt we can restrict the maximum expression size to avoid too costly reduction. 4.2

Canonicalization and Elimination

The canonicalization transforms the equivalent but dissimilar expressions into a unified form to preserve the similarity. It refines the pointer analysis in turn by providing normalized expressions. Moreover, it allows many cascade elimination techniques, such as common sub-expression elimination, sparse conditional constant elimination, etc. After that, the elimination runs to remove the junk code which can never be executed or has not effect the program results but will affect the similarity evaluation heavily. For Canonicalization, we first collect about 44 basic algebraic reduction rules that applied in compilers and BAP [2]. Considering that most obfuscation of obfuscator O-LLVM works by complicating expressions or constants, such as substitution and bogus control-flow, obfuscation patterns are reversed into nearly 10 reduction rules. Though some obfuscation patterns may be overwritten or broken by the subsequent optimization, most of them still can be captured by patching a few rules. To match more patterns, we also enumerate a set of reduction rules with the size of 40 for the small size expressions. Some of these rules are like: (x&c) ⊕ (x|c) ⇔ x ⊕ c, (x&c)|(x&!c) ⇔ x, etc. Once canonicalized, the unified code will reveal the reachability that depends on the obfuscated constant expression. That is the reason why the framework could be resilient to these obfuscations that based on data-dependency. To eliminate the junk code, we follow Cytron et al.’s method [5] by substituting their flow analysis with ours. Initially, all the variables, except the ones assumed to affect program outputs, are tentatively marked dead. Next, the live marks propagate along the def-use chain and the reachability resulted from the pointer analysis phase. That is, the variables on which the live ones depend are also marked live. At last, the statements that define dead variables can be eliminated safely. It is clear that the correctness of the elimination relies on the validation of the information on def-use chain and reachability, which we have discussed in Sect. 3.3 and Sect. 3.4. Here we only focus on two types of dead code, which are enough to show good results in our experiments. Nevertheless, our framework can be extended to handle other types by appending other kinds of initial live marks. We can also propagate the dependency across functions to deal with global dead code.

306

5

J. Jiang et al.

Implementation

Our designs are implemented on the analysis platform of BAP, because of BAP’s rich suite of utilities and libraries for binary analysis. We lift the compiled binaries into BIR (BAP’s Intermediate Representation) code by using BAP. Since ImOpt will be compared to compiler LLVM’s optimizer opt in Sect. 6.3, we also lift binaries to LLVM-IR code by virtue of the prominent framework Mcsema1 . After optimized by opt, the LLVM-IR statements are then re-compiled into binaries with Mcsema for being converted into BIR code later. Our designs are language and platform-independent. In this paper, we implement our binary optimization framework into a prototype called ImOpt as a plugin on the analysis platform of BAP. 5.1

Evaluation Module

It is obvious ImOpt optimizes code but does not contain a evaluation module. ImOpt is supposed to be used as a preprocess module. To evaluate the effect of ImOpt, we reproduce Asm2Vec [9], a state-of-the-art of binary similarity evaluation, as the backend for the evaluation. Asm2Vec builds an NLP model to generate features for binary functions. Considering that Asm2Vec is not designed to process the BIR code, we decompose long BIR statements into a sequence of triples along the abstract syntax tree so that the model can digest. 5.2

Pointer-Based Obfuscation

We also construct a pointer-based obfuscation to demonstrate the vulnerability of optimizing binary code with opt. The obfuscation is similar to Fig. 1b but is implemented for LLVM-IR code. We implement it as an LLVM pass to obfuscate the query function. After obfuscation, the query function works just as before obfuscation. However, once someone wants to utilize opt to re-optimize the obfuscated procedure, this function will be processed such that only the junk code remains. opt’s vulnerability shown in Fig. 1b is caused by early optimization, which runs before processing the IR code. The confound example will not happen when using opt to optimize the LLVM-IR code lifted by Mcsema. But, if we recover variables without pointer analysis to boost the performance of opt, that vulnerability will be reproduced.

6

Experiment

6.1

Experiment Settings

The experiments are performed in the system of Ubuntu 16.04 which is running on an Intel Core i7 @ 3.20 GHz CPU with 16G DDR4-RAM. 1

https://www.trailofbits.com/research-and-development/mcsema/.

Similarity of Binaries Across Optimization Levels and Obfuscation

307

Table 1. Comparison across optimization. @K represents the Top-K accuracy. Programs ImOpt + Simulated Asm2Vec Simulated Asm2Vec O0 vs O3 O2 vs O3 O0 vs O3 O2 vs O3 @1 @5 @10 @1 @5 @10 @1 @5 @10 @1 @5 @10 busybox sqlite lua putty curl openssl

.606 .496 .523 .542 .700 .710

.703 .622 .623 .657 .812 .787

.737 .662 .674 .709 .848 .810

.854 .820 .800 .860 .927 .975

.891 .875 .857 .896 .956 .982

.905 .893 .890 .912 .964 .985

.408 .111 .283 .389 .567 .457

.505 .145 .345 .484 .658 .539

.546 .166 .383 .522 .693 .573

.816 .797 .809 .874 .948 .926

.868 .861 .869 .913 .962 .938

.882 .875 .902 .921 .967 .942

Average

.596 .701 .740 .872 .909 .925 .369 .446 .480 .862 .902 .915

Fig. 4. The ROC of the evaluation across optimization levels.

We build the dataset from six real-world open-source programs: busybox, curl, lua, putty, openssl and sqlite. Binary objects are built by compiler GCC 5.4 from these open-source programs with different optimizations. To evaluate the robustness against the obfuscation, some of these programs are also compiled with the widely-used obfuscator O-LLVM 4.02 . Four kinds of obfuscation provided by O-LLVM are applied: substitution (SUB), bogus control-flow (BCF), flatten (FLAT) and basic block split (BBS). BBS therein is frequently used to improve FLAT, so we combine FLAT with BBS as one obfuscation. These obfuscation are configured to be run three times in our experiment. Through the experiments, we are going to answer three research questions. We have argued these propositions, but they need to be supported empirically. Q1: How can ImOpt boost evaluation against different optimization levels? Q2: What is the performance of ImOpt against different obfuscation? Q3: Is the SSA transforming of ImOpt more efficient than the state-of-the-art? 2

https://github.com/obfuscator-llvm/obfuscator/tree/llvm-4.0.

308

J. Jiang et al.

Fig. 5. Comparing ImOpt with optimizer opt against different obfuscation.

6.2

Evaluation Across Optimization Levels

To answer Q1, we simulate Asm2Vec [9] with no optimizer as the baseline and compare it to one equipped with ImOpt. Only case O0-vs-O3 and case O2-vs-O3 are focused in our experiment. The original SSA algorithm is not compared here because it is implied in optimizer opt, which will be evaluated in next section. Table 1 shows that ImOpt improves the Top-1/5/10 accuracy in O0-vs-O3 case by no less than 13.3%/15.4%/15.5% for each program and promotes around 22.7%/25.5%/26.0% on average. Though Asm2Vec claims that it could capture semantic information, the re-optimization still helps to promote the performance. It is noted that the simulated Asm2Vec did not perform as well as the original model under case O0-vs-O3, and this is accordant with the findings in [16]. As for case O2-vs-O3, these two models are comparable in the Top-K accuracy. The reason why re-optimization has little influence in case O2-vs-O3 is probably because the optimization options enabled only by O3 are mostly about code structure, such as loop optimization, which is rarely related with the function of ImOpt. One might have noticed that ImOpt degraded in some programs, one big reason is the optimization magnified the impact of compilation-related constants for short functions. Figure 4a shows a significant improvement brought by ImOpt in case O0-vsO3 from another perspective. As for case O2-vs-O3, though simulated Asm2Vec has comparable Top-K accuracy to the one with ImOpt, the ROC in Fig. 4b says the latter will performs slightly better than the former after a very small false positive rate (less than 2.0%). Also, the good performance indicates that ImOpt has very little negative effect on similarity evaluation. 6.3

Comparison Against Obfuscation

To address Q2, we evaluate ImOpt on the O0-vs-O3 task against four kinds of obfuscation: SUB, BCF, FLAT (with BBS) and the pointer-based obfuscation

Similarity of Binaries Across Optimization Levels and Obfuscation

309

Table 2. Overhead on maintaining dominating definition. The % column indicates the proportion of the overhead in the cost of ImSsa. Programs Overhead on Dominating Definition (ms) Name Functions % SDA ImSsa Fetch Update Fetch Update Clone lua putty curl sqlite busybox openssl Total

827 1,050 1,036 3,133 4,279 9,812 20,137

45.5% 81.1% 80.1% 81.0% 64.3% 75.3%

658 4,795 20,159 17,608 14,719 64,021

87 231 576 1,177 1,332 2,201

46 117 207 594 590 1,175

22 50 95 259 281 508

24 176 771 586 552 2,053

121,960

5,604

2,729

1,215

4,162

(Sect. 5.2). In this experiment, simulated Asm2Vec without optimizer continues to be used as the baseline. To explain the motivation for designing ImOpt, we further equip simulated Asm2Vec with three optimizers respectively: 1) the LLVM compiler’s optimizer opt; 2) opt with variable recovery; and 3) ImOpt. The second control group which combines opt with variable recovery is constructed to live up to the potential of optimizer opt. Figure 5 shows the evaluation results. ImOpt performs stably against SUB, BCF and pointer-based obfuscation, but has a bad performance against FLAT. As for FLAT which deals with structure instead of data, ImOpt has no special effect on it. The evaluation of ImOpt against SUB has a few more false positives. It is because compiler’s early optimization may reform a few obfuscation rules so that the rules become too complex to be matched. ImOpt shows resilience to the obfuscation that based on data dependency. Optimizing by optimizer opt without variable recovery has the similar performance as the NONE case. The main reason is that without variable recovery most optimizations of opt have been intercepted. By enabling variable recovery, optimizer opt archives comparable accuracy with ImOpt in the non-obfuscation case. Unfortunately, the performance of optimizer opt with variable recovery degraded into 0% when the pointer-based obfuscation is applied. This confirms that it is insufficient to optimize binary code by using optimizer opt directly. 6.4

Efficiency

Regarding Q3, we demonstrate the efficiency of the ImSsa procedure of ImOpt by comparing it to SDA [20], which is the most efficient approach that supports both pointer and reachability analysis. SDA leverages Quadtree to speed up the analysis. we mainly compare the overhead on dominating definition, since it takes up a large proportion of SSA transforming, as shown in Table 2. The experiment result shows that ImOpt costs only 8.1 s to maintain the information over 20, 137 functions while SDA need 127.5 s.

310

J. Jiang et al.

The copies of table E between blocks may be the implicit cost of ImSsa. The Clone column in Table 2 shows that this extra overhead is very small and has a approximately linear relationship with the Update of ImOpt. For the most costly Fetch operation, ImSsa outperforms SDA by a factor of 14.3x to 97.4x as the average number of statements grows. All in all, ImSsa has a 15.7x speed-up on average for maintaining the information about dominating definition.

7

Related Works

Binary Code Similarity Evaluation. Most traditional static approaches deal with the statistic features based on syntax sequence or structure, such as the frequency of operators and operands [18], the categories of instructions [11,12], and the editing distance of code fragments [8]. Some methods [1,25] extract features based on the isomorphism of CFG. These statistic features rarely capture semantic information so that none of them can strip the junk code. Dynamic approaches based on the fact that the compilation does not change the execution result. To model the similarity, Zsh [6] and BinHunt [13] leverage theorem prover to verify the logical equivalence; and Blanket [10], CACompare [17], BinGo [3] analyse the input-output mappings of binary functions after emulating the executions. However, despite several trite questions existing in code emulation, they are not resilient to the elimination related optimizations. It is because most of them will split the binary function into smaller pieces (blocks [13], strands [6], tracelets [3], graphlets [28], blankets [10], etc.), where the junk code still cannot be distinguished. It is worth noting that GitZ [7] leverages opt to re-optimize IR code, but there are several defects for GitZ : 1) it only enables options -early-cse and -instcombine, which is not able to cover most of the optimization effect as ours; 2) it optimizes code strands; 3) as we have shown, compiler optimizer is bad at optimizing the reversed IR code. Pointer Analysis. Flow-sensitive pointer analysis is demanded for a complete and precise data-flow analysis. Traditional iterative data-flow analysis [4] methods are too slow to be practical because of too much unnecessary re-analyzing. Pre-analysis based methods [14,23] sacrifice analysis precision for efficiency by constructing a sound approximation of the memory accessing addresses at the previous auxiliary analysis stage. To accelerate the analysis while retaining the precision, Tok et al. [26] and Madsen et al. [20] propose to compute the def-use information √ on the fly. But, the former is still an iterative method. And the latter costs O ( n) to find dominating definitions, which only needs O(1) with ours.

8

Conclusion

In this paper, we present a binary optimization framework to re-optimize lifted code for alleviating the affects by optimization. We have shown that optimizer opt is insufficient to re-optimize the lifted binary code. To support our framework

Similarity of Binaries Across Optimization Levels and Obfuscation

311

with an efficient and precise pointer analysis, we proposed the immediate SSA transforming algorithm. The way we maintain the dominating definitions is faster than Madsen et al.’s method [20] by a factor of 15.7x on average. Furthermore, we implement our designs into a prototype called ImOpt. The experiments confirm that ImOpt can be used to boost the performance of a state-of-the-art [9] with an average accuracy improvement of 22.7%. It is also demonstrated that ImOpt is robust to data-dependency-based obfuscation, such as O-LLVM obfuscation and pointer-based obfuscation. Of course, the proposed approach is not without limitations. The algebraic reduction rules equipped by ImOpt need to be manually constructed since the problem of algebraic reduction is NP-hard. That is, ImOpt cannot guarantee bug-free analysis though it is difficult to be defrauded. However, it is convenient to fix ImOpt by extending the reduction rule set. Acknowledgments. This work is supported by National Natural Science Foundation of China (No. 61572469).

A

Appendix 1: Sparse Analysis for Backward Block

The ImSsaSparse procedure is quite similar to the ImSsa procedure, except it sparsely recomputes the visited but invalid block (line 4) to avoid unnecessary re-computation: 1 2 3 4 5 6 7

ImSsaSparse(L, G, Sinvalid , bexcept ) for bi ∈ L do if bi is visited then SparseRecompute(bi , Ei , Sinvalid ) else if bi is forward block or bi == bexcept then ProcessForwardBlock(bi , Ei ) else ProcessBackwardBlock(G, bi , Ei , Sinvalid )

The SparseRecompute procedure is similar to the ProcessStatement procedure, except the former only process invalid statements. Another difference is SparseRecompute needs to propagate invalidation through the def-use chain.

B B.1

Appendix 2: Supplementary Proofs Appended Proofs for Section 2.3

Proof (Proof of Lemma 2). Let us assume that there is a backward definition d ∈ b reaching the forward block bi , and all the blocks except bi on any path p : b → bi are in the SSA form. Then, there should be a back edge e : bj → bk on the path p : b → bi . Note the block bk can not be the block bi since a forward block has no back edge pointing to itself. So, the block bk should be in SSA form. However, in SSA-form code, a back edge will introduce φ-functions into

312

J. Jiang et al.

the block bk that is at the target end of the edge for every definition that passes through the block bk . These φ-functions will prevent definition d reaching to the successor block bi , which contradicts the assumption. Hence, a forward block has only forward definitions reaching itself in SSA-form code. Proof (Proof of Lemma 3). According to Condition 1, the first i − 1 blocks have already been in SSA form. Then following Lemma 2, only forward definitions can reach block bi . Recall that if a forward definition d can reach a block bi , then → there is no back edge on the reaching path p : d − rc bi , which means that the  block b which defines d is ahead of block bi in reverse post-order. Proof (Proof of Lemma 4). According to the definition of the BRG, it is trivial that b ∈ GBR (bi ). If there is a path p : b → bi , then bi is either an IDF of block b or dominated by an IDF. Besides, there should be a back edge on path − b . If the back edge is in the middle of path p, then the φ-function p since d ← rc i introduced by it will intercept definition d. That contradicts with the fact that definition d can reach block b. Hence, the back edge can only be the last edge on path p. It means the backward block b cannot be dominated by any block on path p. Therefore, the backward block b is an IDF of block b . B.2

Appended Proofs for Section 3.6

To prove Lemma 1, we first show that given Condition 1 the property is satisfied for the i-th block, no matter the block is a forward or backward one. Lemma 5. Given Condition 1, if the i-th block bi is a forward block, then the ImSsa procedure also satisfies the invariants in Property 1 for block bi . Proof (Proof Sketch of Lemma 5). Based on Lemma 3, all the reachable definitions can be visited before visiting the current block. Global invariant: the variables with dominating definition are linked by EDOM (i) directly, whose validation is confirmed by Condition 1; the other variables, which have multiple reachable definitions, have the φ-functions being inserted in the front of the block when processing the blocks that define these definitions; Hence, the global invariant is held. Local invariant: E and C are well-defined at the beginning of local analysis since the global invariant is held; moreover, their invariants are preserved during the running of ProcessForwardBlock procedure. In conclusion, the local invariant is held. Lemma 6. If the CFG contains only forward block, the ImSsa procedure satisfies the invariants in Property 1 for all the blocks in that graph. Proof. This lemma simply follows Lemma 5. Lemma 7. Given Condition 1, if the i-th block bi is a backward block, the ImSsa procedure also satisfies the invariants in Property 1. Furthermore, for each block in the BRG, the invariants in Property 1 are also satisfied.

Similarity of Binaries Across Optimization Levels and Obfuscation

313

Proof (Proof Sketch of Lemma 7). Assume that all blocks except bi in the BRG are forward blocks. According to Lemma 4 and Lemma 6, the ImSsa procedure can propagate all the backward definitions back into block bi along with the IDF chain by processing GBR (bi ). Additionally, the following SparseRecompute procedure will reveal any new definitions caused by the re-computation and propagate them back, too. Hence, the global invariant is held. Recall that the invariants of E and C are also maintained during the sparse re-computations, therefore, the local invariant is also held. Furthermore, since the BRG GBR (bi ) contains only forward blocks, according to Lemma 6, these invariants are also satisfied for each block in the BRG. As for the case where the BRG contains backward blocks, it can be deductively reasoned from the previous case. It follows that the ImSsa procedure satisfies Property 1: Proof (Proof Sketch of Lemma 1). It is reasonable to assume a forward entry block since we can always insert an empty forward block at the beginning of the CFG. Therefore, the first block satisfies the global invariant because of no reachable definition. Meanwhile, the local invariant is also satisfied according to Lemma 5. That is, the ImSsa procedure satisfies the invariants for the first block. Then the conclusion can be deductively reasoned from Lemma 5 and Lemma 7.

References 1. Bourquin, M., King, A., Robbins, E.: BinSlayer: accurate comparison of binary executables. In: Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop (2013) 2. Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: a binary analysis platform. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 463–469. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-64222110-1 37 3. Chandramohan, M., Xue, Y., Xu, Z., Liu, Y., Cho, C.Y., Tan, H.B.K.: Bingo: crossarchitecture cross-OS binary search. In: International Symposium on Foundations of Software Engineering, pp. 678–689. ACM (2016) 4. Chase, D.R., Wegman, M., Zadeck, F.K.: Analysis of pointers and structures. ACM SIGPLAN Not. 25(6), 296–310 (1990) 5. Cytron, R., Ferrante, J., Rosen, B.K., Wegman, M.N., Zadeck, F.K.: Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13(4), 451–490 (1991). https://doi.org/10.1145/ 115372.115320 6. David, Y., Partush, N., Yahav, E.: Statistical similarity of binaries. ACM SIGPLAN Not. 51, 266–280 (2016) 7. David, Y., Partush, N., Yahav, E.: Similarity of binaries through re-optimization. ACM SIGPLAN Not. 52, 79–94 (2017). https://doi.org/10.1145/3140587.3062387 8. David, Y., Yahav, E.: Tracelet-based code search in executables. ACM SIGPLAN Not. 49, 349–360 (2014) 9. Ding, S.H., Fung, B.C., Charland, P.: Asm2Vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In: 2019 IEEE Symposium on Security and Privacy (SP). IEEE (2019)

314

J. Jiang et al.

10. Egele, M., Woo, M., Chapman, P., Brumley, D.: Blanket execution: dynamic similarity testing for program binaries and components. In: USENIX Security Symposium, pp. 303–317 (2014) 11. Eschweiler, S., Yakdan, K., Gerhards-Padilla, E.: discovRE: efficient crossarchitecture identification of bugs in binary code. In: NDSS (2016) 12. Feng, Q., Zhou, R., Xu, C., Cheng, Y., Testa, B., Yin, H.: Scalable graph-based bug search for firmware images. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 480–491 (2016) 13. Gao, D., Reiter, M.K., Song, D.: BinHunt: automatically finding semantic differences in binary programs. In: Chen, L., Ryan, M.D., Wang, G. (eds.) ICICS 2008. LNCS, vol. 5308, pp. 238–255. Springer, Heidelberg (2008). https://doi.org/10. 1007/978-3-540-88625-9 16 14. Hardekopf, B., Lin, C.: Flow-sensitive pointer analysis for millions of lines of code. In: International Symposium on Code Generation and Optimization, pp. 289–298. IEEE (2011) 15. Hasti, R., Horwitz, S.: Using static single assignment form to improve flowinsensitive pointer analysis. ACM SIGPLAN Not. 33(5), 97–105 (1998) 16. Hu, Y., Wang, H., Zhang, Y., Li, B., Gu, D.: A semantics-based hybrid approach on binary code similarity comparison. IEEE Trans. Softw. Eng. (2019) 17. Hu, Y., Zhang, Y., Li, J., Gu, D.: Binary code clone detection across architectures and compiling configurations. In: Proceedings of the 25th International Conference on Program Comprehension, pp. 88–98. IEEE Press (2017) 18. Jang, J., Woo, M., Brumley, D.: Towards automatic software lineage inference. In: Proceedings of the 22nd USENIX Conference on Security (2013) 19. Lindorfer, M., Di Federico, A., Maggi, F., Comparetti, P.M., Zanero, S.: Lines of malicious code: insights into the malicious software industry. In: Proceedings of the 28th Annual Computer Security Applications Conference, pp. 349–358 (2012) 20. Madsen, M., Møller, A.: Sparse dataflow analysis with pointers and reachability. In: M¨ uller-Olm, M., Seidl, H. (eds.) SAS 2014. LNCS, vol. 8723, pp. 201–218. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10936-7 13 21. Ming, J., Xu, D., Jiang, Y., Wu, D.: BinSim: trace-based semantic binary diffing via system call sliced segment equivalence checking. In: USENIX Security Symposium, pp. 253–270 (2017) 22. Ming, J., Xu, D., Wu, D.: Memoized semantics-based binary diffing with application to malware lineage inference. In: Federrath, H., Gollmann, D. (eds.) SEC 2015. IAICT, vol. 455, pp. 416–430. Springer, Cham (2015). https://doi.org/10. 1007/978-3-319-18467-8 28 23. Oh, H., Heo, K., Lee, W., Lee, W., Yi, K.: Design and implementation of sparse global analyses for C-like languages. 47(6), 229–238 (2012) 24. Pewny, J., Schuster, F., Bernhard, L., Holz, T., Rossow, C.: Leveraging semantic signatures for bug search in binary programs. In: Proceedings of the 30th Annual Computer Security Applications Conference, pp. 406–415 (2014) 25. Shirani, P., Wang, L., Debbabi, M.: BinShape: scalable and robust binary library function identification using function shape. In: Polychronakis, M., Meier, M. (eds.) DIMVA 2017. LNCS, vol. 10327, pp. 301–324. Springer, Cham (2017). https://doi. org/10.1007/978-3-319-60876-1 14 26. Tok, T.B., Guyer, S.Z., Lin, C.: Efficient flow-sensitive interprocedural data-flow analysis in the presence of pointers. In: Mycroft, A., Zeller, A. (eds.) CC 2006. LNCS, vol. 3923, pp. 17–31. Springer, Heidelberg (2006). https://doi.org/10.1007/ 11688839 3

Similarity of Binaries Across Optimization Levels and Obfuscation

315

27. Walenstein, A., Lakhotia, A.: The software similarity problem in malware analysis. In: Dagstuhl Seminar Proceedings. Schloss Dagstuhl-Leibniz-Zentrum f¨ ur Informatik (2007) 28. Wei, M.K., Mycroft, A., Anderson, R.: Rendezvous: a search engine for binary code. In: 2013 10th IEEE Working Conference on Mining Software Repositories (MSR) (2013)

HART: Hardware-Assisted Kernel Module Tracing on Arm Yunlan Du1 , Zhenyu Ning2 , Jun Xu3 , Zhilong Wang4 , Yueh-Hsun Lin5 , Fengwei Zhang2(B) , Xinyu Xing4 , and Bing Mao1 2

1 Nanjing University, Nanjing, China Southern University of Science and Technology, Shenzhen, China [email protected] 3 Stevens Institute of Technology, Hoboken, USA 4 Pennsylvania State University, State College, USA 5 JD Silicon Valley R&D Center, Mountain View, USA

Abstract. While the usage of kernel modules has become more prevalent from mobile to IoT devices, it poses an increased threat to computer systems since the modules enjoy high privileges as the main kernel but lack the matching robustness and security. In this work, we propose HART, a modular and dynamic tracing framework enabled by the Embedded Trace Macrocell (ETM) debugging feature in Arm processors. Powered by even the minimum supports of ETM, HART can trace binary-only modules without any modification to the main kernel efficiently, and plug and play on any module at any time. Besides, HART provides convenient interfaces for users to further build tracing-based security solutions, such as the modular AddressSanitizer HASAN we demonstrated. Our evaluation shows that HART and HASAN incur the average overhead of 5% and 6% on 6 widely-used benchmarks, and HASAN detects all vulnerabilities in various types, proving their efficiency and effectiveness. Keywords: Kernel module

1

· Dynamic tracing · ETM · Arm

Introduction

To enhance the extensibility and maintainability of the kernel, Linux allows third parties to design their own functions (e.g., hardware drivers) to access kernel with loadable kernel modules. Unlike the main kernels developed by experienced experts with high code quality, the kernel modules developed by third parties lack the code correctness and testing rigorousness that happens with the core, resulting in more security flaws lying in the kernel modules. Some kernel vulnerability analyses [25,40] show that CVE patches to the Linux repository involving kernel drivers comprise roughly 19% of commits from 2005 to 2017. The conditions only worsen in complex ecosystems like Android and smart IoT devices. In Y. Du and Z. Ning—These authors contributed equally to this work. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 316–337, 2020. https://doi.org/10.1007/978-3-030-58951-6_16

HART: Hardware-Assisted Kernel Module Tracing on Arm

317

2017, a study on vulnerabilities in the Android ecosystem [39] shows that 41% of 660 collected bugs came from kernel components, most of which were device drivers. To mitigate the threats of vulnerable kernel modules, the literature brings three categories of solutions. As summarized in Table 1, all these solutions carry conditions that limit their use in practice. The first category of solution is to detect illegal memory access [5,6,11,16] at run-time, aiming to achieve memory protection in the kernel. As such solutions require to instrument memory accesses and perform execution-time validation, they need source code and incur high performance overhead. These properties limit solutions in this category for only debugging and testing. The second category of approaches explores to ensure the integrity of the kernel [23,26,53,60,64], including control flow, data flow and code integrity. However, these strategies introduce significant computation overhead and often require extensive modification to the main kernel. The last category of solutions isolates the modules from the core components [22,24,31, 48,51,54,58,61,65], so as to confine the potentially corrupted drivers running out of the kernel. By design, the isolation often requires source code and comes with significant instrumentation to the kernel and the driver. Table 1. Comparison between existing kernel protection works and ours. Non-intrusive represents whether the tools have system-level modification of kernel, user space libraries, etc. (✓ = yes, ✗ = no, ✱ = partially supported). Approach category

Binary-support Non-intrusive Low overhead Representative works (in the order of time)

Memory Debugger







Slub debug [5], Kmemleak [16], Kmemcheck [11], KASAN [6]

Integrity Protection







KOP [23], HyperSafe [60], HUKO [64], KCoFI [26], DFI for kernel [53]

Kernel Isolation







Nooks [54], SUD [24], Livewire [22], SafeDrive [65], SecVisor [51]

Our method ✓





HASAN

To overcome these limitations, we propose HART, a high performance tracing framework on Arm. The motivation of HART is to dynamically monitor the execution of the third-party Linux kernel modules even without module source code. HART utilizes the ETM debugging component [9] for light-weight tracing of both control flow and data access. Combing hardware configurations and software hooks, HART further restricts the tracing to selectively work on specific module(s). With the support of the open interfaces provided by HART, various security solutions against module vulnerabilities can be established. We

318

Y. Du et al.

then demonstrate a modular AddressSanitizer named HASAN. Compared with the previous solutions, our approach is much less intrusive as it requires zero modification to other kernel components, and is fully compatible with any commercial kernel module since it requires no access to any source code. Besides, our empirical evaluation shows that HASAN can detect the occurrence of memory corruptions in binary-only modules with a negligible performance overhead. The design of HART overcomes several barriers. First, for generality, we need to support ETM with the minimum configuration, particularly the size of Embedded Trace Buffer (ETB) that stores the trace. Take Freescale i.MX53 Quick Start Board as an example. Its ETB has a size of 4 KB and can get fully occupied very frequently. The second barrier is that many implementations of ETB raise no interrupt when the buffer is full, thereby the trace can be frequently overwritten. To tackle the above two challenges and retrieve the entire trace successfully, we leverage the Performance Monitoring Unit (PMU) to throw an interrupt after the CPU executes a certain number of instructions, ensuring that the trace size will never exceed the capacity of ETB. Finally, to optimize the performance of HART, we design an elastic scheduling to assure the decoding thread of rapid parsing while avoiding the waste of CPU cycles. Under this scheduling, the thread will occupy the CPU longer while the trace grows faster. Otherwise, the thread will yield the control of CPU more frequently. Our main contributions are summarized as follows. – We design HART, an ETM-assisted tracing framework for kernel modules. With the minimal hardware requirements of ETM, it achieves low intrusiveness, high efficiency, and binary-compatibility for kernel module protections. – We implement HART with Freescale i.MX53 Quick Start Board running Linux 32-bit systems. Our evaluation with 6 widely used module benchmarks shows that HART incurs an average overhead of 5%. – We build HASAN on the top of HART to detect memory vulnerabilities triggered in the target module(s). We evaluate HASAN on 6 existing vulnerabilities ported from real-world CVE cases and the above benchmarks. All the vulnerabilities are detected with an average overhead of only 6%.

2

Background and Problem Scope

In this section, we first introduce the background of kernel module and ETM, and then define our problem scope. 2.1

Loadable Kernel Module

In practice, the third-party modules are mostly developed as loadable kernel modules (LKM). An LKM is an object file that contains code to extend the main kernel, such as building support for new hardware, installing new file systems, etc. For the ease of practical use, an LKM can be loaded at anytime on-demand and unloaded once the functionality is no longer required. To support accesses

HART: Hardware-Assisted Kernel Module Tracing on Arm

319

Fig. 1. Interactions between an LKM and the users space as well as kernel space.

Fig. 2. A general model of ETM.

Fig. 3. Architecture of HART.

from the user space, LKM usually registers a group of its functions as callbacks to system calls in the kernel (e.g., ioctl). In Fig. 1, a client (e.g., user space application) can execute those callbacks via the system calls. Sitting in the kernel space, an LKM can also freely invoke functions exported by the main kernel. 2.2

Embedded Tracing Microcell

ETM is a hardware debugging feature in Arm processors since two decades ago. It works by tracing the control flow and memory accesses of a software’s execution. Being aided by hardware, ETM incurs less than 0.1% performance overhead [43], barely decelerating the normal execution. While ETMs on different chips have varied properties, it generally follows the model in Fig. 2. When CPU executes instructions, ETM collects information about control flows and memory accesses. Such information can be compressed and stored into an on-chip memory called Embedded Trace Buffer (ETB), which is accessible through mapped I/O or debugging interfaces such as JTAG or Serial Wire interface. Trace generated by ETM follows standard formats. A Sync packet is produced at the start of execution, containing information about the entry point and the process ID (CID); a P-header packet indicates the number of sequential instructions and not-taken instructions executed since the last packet; a Data packet records the memory address to be accessed by the instruction. We can fully reconstruct the control flow and memory accesses by decoding such packets. 2.3

Problem Scope

We consider the generality of our work from two aspects. On the one hand, the framework should have compatibility with the majority (if not all) of ETM. We investigate the ETM features on several early and low-end SoCs in Table 2,

320

Y. Du et al. Table 2. ETM and ETB on early SoCs.

SoC

Devices

ETB size ETM feature DATa ARFb

Qualcomm Snapdragon 200 [18] Xperia E1, Moto E

≥4 KB

×



Samsung Exynos 3110 [17]

Galaxy S, Nexus S

≥4 KB





Apple A4 [7]

iPhone 4

≥4 KB





Huawei Kirin 920 [36]

Mate 7, Honor 6

≥4 KB

×



NXP i.MX53 [49]

iMX53 Quick Start Board 4 KB





Juno r0 Board

×



Arm Juno [4] a: Data Address Trace b: Address Range Filter

64 KB

and summarize the following most preliminary ETM configuration for HART to work with: ① ETM supports tracing of both control flow and memory accesses. ② ETM supports filtering by address range. ③ ETB is available for trace storage, yet with a limited buffer size - 4 KB. ④ ETB raises no interrupts when it gets full. On the other hand, the framework should not impose restrictions on the target modules. We then accordingly make a minimal group of assumptions about the target module: ① Source code of the target module is unavailable. ② The target module and other kernel components shall be intact without any modification. ③ The target module is not intentionally malicious.

3

HART Design and Implementation

In this section, we concentrate on HART, our selective tracing framework for kernel modules. We first give an overview of HART in Fig. 3. HART is designed as a standalone kernel driver. At the high-level, it manages ETM (and ETB) to trace the execution of target modules, and PMU to raise interrupts to address the problem of overwritten trace in ETB, and a Trace Decoder to parse the ETM trace. HART also hooks entrances/exits to/from the target modules, to coordinate with PMU and provide open interfaces for further usage. Following the workflow, we then delve into the details of our design and implementation. 3.1

HART Design

Initialization for Tracing. Before starting a module, the Linux kernel performs a set of initialization, including loading data and code from the module, relocating the references to external and internal symbols, and running the init function of the module. HART intercepts the initialization to prepare for tracing. Details are as follows. First, HART builds a profile for the module, with some necessary information about the loading address, the init function, and the kernel data structure that manages this module, for later usage. Since such information will be available

HART: Hardware-Assisted Kernel Module Tracing on Arm

321

after the kernel has finished loading the module, we capture the initialization at this point of time with a callback registered through the trace-point in the load module kernel function, without intrusion to the kernel. Then, HART hooks the entrances to the module and the exits from the module, which is the base of PMU management and open interfaces for users of HART. As Fig. 1 depicted before, the modules are mainly entered and exited either through the callbacks registered to the kernel, or the external functions invoked by such callbacks. We deal with different cases as follows. In the first case, those callbacks are typically functions within the module, waiting for the invoking from the kernel. They are usually registered through code pointers stored in the .data segment. In the course of module loading, the code pointers will be relocated to the run-time addresses during the initialization, so that we can intercept the relocation and alter these pointers to target the wrappers defined by HART. The comparison before and after the HART’s alteration in the code pointers’ relocation can be observed from Fig. 4a, presented with concrete examples. Thus, our wrappers are successfully registered as the callbacks to be invoked by the kernel, and can perform a series of HARTdesigned handlers, while ensuring the normal usage of original functions. In the other case that callbacks invoke external functions, such external symbols (e.g., kmalloc) also need a similar relocation to fix their references to the run-time addresses. Likewise, HART intercepts such relocation and directs the external symbols to the HART-designed wrappers, as is shown in Fig. 4b. So far, as we consider both conditions, all of the entrances and exits are under the control of HART via the adjusted relocation to our wrappers. Finally, HART gets ETM and ETB online for tracing. To be specific, HART allocates a large continuous buffer hart buf, for the sake of backing up ETB data. In our design, its buffer size is configured as 4 MB. Meanwhile, HART spawns a child thread, which monitors data in hart buf and does continuous decoding. We will cover more details about the decoding later in Sect. 3.1. Then, HART enables ETM with ETB, and configures ETM to only trace the range that stores the module. So far, HART finishes the initialization and starts the module by invoking the module’s init function.

Fig. 4. Relocation of external calls and internal function references in modules.

322

Y. Du et al.

Table 3. Wrapper for func1 in Fig. 4a. 1 2 3 4 5 6 7 8 9 10 11 12

ENTRY(func1_hart): SAVE_CTX //save context MOV R0,LR //pass and save LR in HART BL resume_pmu //call resume_pmu MOV LR,R0 //return addr of ori. func1 RESTORE_CTX //restore context BL LR //call original func1 SAVE_CTX //save context BL stop_pmu //call stop_pmu MOV LR,R0 //return LR from HART RESTORE_CTX //restore context BL LR //return back

Table 4. Wrapper for 1 2 3 4 5 6 7 8 9 10 11 12 13 14

kmalloc.

void * hart__kmalloc(size_t size, gfp_t flags){ /*stop PMU*/ cur_cnt = stop_pmu(); /*instrumentation hooks*/ pre_instrumention(); /*original kmalloc*/ addr = __kmalloc(size, flags); /*instrumentation hooks*/ post_instrumentation(); /*resume PMU*/ reset_pmu(cur_cnt); return addr; }

Continuous and Selective Tracing. After the above initialization, ETM will trace execution of the module and store its data in ETB. Under the assumption that the ETB hardware raises no interrupt to alert HART to a full ETB, the remaining challenge is to timely interrupt the module before ETB is filled up. In this work, HART leverages the instruction counter shipped with PMU. Specifically, we start PMU on counting the instructions executed by the CPU. Once the number of instructions hits a threshold, PMU will raise an interrupt, during which HART can copy ETB data out to hart buf. Generally, an Arm instruction can lead to at most 6 bytes of trace data1 . This means a 4 KB ETB can support the execution of at least 680 instructions. Also considering that the PMU interrupts often come with a skid (less than 10 instructions [43]), we set up 680-10 as the threshold for PMU. In our evaluation with real-world benchmarks, we observe the threshold is consistently safe for avoiding overflow in ETB. Though PMU aids HART in preventing trace loss, it causes an extra issue. PMU counts every instruction executed on the CPU, regardless of the context. This means PMU will interrupt not only the target module, but also the other kernel components, leading to high-frequency interruptions and tremendous down-gradation in the entire system. To restrict PMU to count inside the selective module(s), we rely on the hooks HART plants during initialization. When the kernel enters the module via a callback, the execution will first enter our wrapper function. In the wrapper, we resume PMU to count instructions, invoke the original callback, and stop PMU after the callback returns in order, as is shown in Table 3. Our wrapper avoids any modification to the stack and preserves all needed registers when stop/resume PMU, so as to keep the context to call the original init and maintain the context created by init back to the kernel. As aforementioned, the target module may also call for external functions, which are also hooked. In those hooks, HART first stops PMU, then invokes the 1

Some instructions (e.g., LDM) carry more than 6 bytes of trace data, but appear in a relatively few cases. We also take into account the data generated by data access in ETB. Actually, the information in the raw trace output is highly compressed, and a single byte in the trace output can represent up to 16 instructions in some cases.

HART: Hardware-Assisted Kernel Module Tracing on Arm

323

Fig. 5. Parallel trace data saving and decoding.

intended external function, and finally resumes PMU. Table 4 demonstrates the wrapper function of kmalloc when it is called by the target module. The last challenge derives from handling PMU interrupts. By intuition, we can simply register a handler to the kernel, which stops PMU, backs up ETB data, and then resumes PMU. However, this idea barely works in practice. Actually, on occurrence of an interrupt, the handling starts with a general interface tzic handle irq. This function retrieves the interrupt number, identifies the handler registered by users, and finally invokes that handler. After the user handler returns, tzic handle irq will check for further interrupts and handle them. Oftentimes, when we handle our PMU interrupts, new interrupts would come. If we resume PMU inside our handler, PMU would count the operations of tzic handle irq handling the newly arrived interrupts. Following the tzic handle irq handling process, it would trigger another PMU interrupt handler, eventually resulting in endless interrupt handling loop. To address the above problem, we hook the tzic handle irq function with hart handle irq. And right before it exits to the target module, we resume PMU. When our PMU interrupts come, the handler hart pmu handler regisTable 5. Algorithm for backing up ETB data inside PMU interrupt handler and the parallel decoding thread for parsing trace with an elastic scheduling scheme. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

void retrieve_trace_data(){ /*get trace size*/ trace_sz = bytes_in_etb(); /*copy trace*/ cpy_trace(trace_buf, etb, trace_sz); /*update write offset and rounds*/ wr_off += trace_sz; if(wr_off >= trace_buf_sz){ wr_off %= trace_buf_sz; wr_rnd ++; } /*overwrite is possible, pause the target module*/ if(wr_rnd > rd_rnd && rd_off - wr_off < 4KB) SIGSTP(target_pid); }

(a) ETB data backup.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

void parse_trace(){ while(true){ /*calculate valid trace size*/ if(rd_rnd == wr_rnd) trace_sz = wr_off - rd_off; else trace_sz = trace_buf_sz rd_off + wr_off; /*decode trace*/ decode(trace_buf, rd_off, trace_sz); /*update rd_off and rd_rnd*/ rd_off = wr_off; if(rd_rnd < wr_rnd) rd_rnd ++; if(target_paused(target_pid)) SIGCONT(target_pid) /*yield CPU based on the workload of decoding*/ msleep(100 - CONS * trace_sz/trace_buf_sz); yield(); } }

(b) Parallel decoding thread.

324

Y. Du et al.

tered by us will back up the ETB data. In this way, we prevent the above endless interrupt handling loop while obtaining continuous and selective ETM trace. Elastic Decoding. With the help of PMU, HART can timely interrupt the target module and back up the ETB data. To reduce time cost of backup, in the interrupt handler, HART simply copies the ETB data to the previously allocated hart buf. Meanwhile, HART maintains wr off and wr rnd, respectively indicating the writing position to hart buf and how many rounds HART has filled up this buffer. To decode the trace efficiently, HART spawns a decoding thread for each target module. This thread monitors and keeps decoding the remaining trace data, where rd off and rd rnd indicate the current parsing position and how many rounds it has read through the entire buffer. This design of parallel trace backup and decoding, as shown in Fig. 5, however, involves two issues. We describe the details and explain how we address them as follows. First, an over-writing problem is likely to happen when the backup function runs beyond the decoder by more than one lap, leading to the error of losing valid data. A similar over-reading error comes due to a similar lack of synchronization. In order to maintain the correctness of the concurrency, we design algorithms for both the backup function and the decoder with checks of over-writing/reading possibilities. On the one hand, as the backup algorithm in Table 5a presents, if wr rnd is larger than rd rnd while rd off is less than 4 KB ahead of wr off, a new operation of backing up 4 KB data will potentially overwrite the previous unprocessed trace. In case of this, HART will pause the task that is using the target module, and restart via the decoder later when it is safe to continue. On the other hand, as the decoding algorithm in Table 5b depicts, if the decoder and the backup function arrive at the same location, the decoding thread will pause and yield the CPU due to no trace left to parse. Second, since the generation of trace is oftentimes slow, a persistent query for any new trace will bring CPU with a waste of cycles. Such frequent occupation of CPU has a negative impact on other normal works, incurring heavy overhead in the performance and efficiency especially for the system equipped with fewer CPU cores. Thus, we introduce an elastic scheduling scheme to the decoding thread as a remedy for the waste, shown in Table 5. During a round of checking and decoding new trace, we calculate the size of the valid trace to represent the generation rate of the trace. Then, the sleep time for the decoding thread is set to inversely proportional to the computed growth rate. That is, the slower the trace grows, the longer the thread sleeps, and vice versa. In this way, HART leaves more resources for CPU while the decoding thread is not busy, but also assures the decoder of high efficiency when it owns a heavy workload. Supports of Concurrency. We consider two types of concurrency, and support both conditions. First, multiple modules might be registered for protection under HART at the same time. In our design, the context of each module is maintained separately, and HART switches the context at module switch time.

HART: Hardware-Assisted Kernel Module Tracing on Arm

325

The enter and exit events of a kernel module are captured by HART, and context of ETM tracing and PMU counting related to the module is saved at exiting and restored at reentering. Since traces pertaining to different modules are distinguishable based on the context-ID2 packets, our design reuses the same hart buf for different modules for the sake of memory efficiency. Second, multiple user space clients may access the module(s) concurrently, which is common on multi-core SoCs. To handle this type of concurrency, we allocate multiple hart buf on a device with many cores, with each of hart buf uniquely serving one CPU core. This prevents race on trace storage among CPU cores. As both ETM and PMU are core-specific, their configurations and uses are independent across different cores. It is also worth noting that the ETB is shared by the cores. This does not interrupt our system. First, ETM traces from different cores are sequentially pushed to ETB, avoiding race on trace saving. Second, ETM attaches a context-ID whenever a task is being traced. This enables the decoder to separate the trace for individual tasks. Finally, we signal to pause the target module(s) on other cores while one core is doing ETB backup, so as to avoid losing trace data. Other than the trace buffer, we assign a separated decoding thread for each module to prevent confusion in the decoding process. Open Interfaces. To facilitate the further establishment of various trace-based security applications, we provide users of HART with a comprehensive set of interfaces, including (1) Initialization and Exiting interfaces to help users register to the initialization and exiting process of HART, (2) Decoding interfaces to pass the parsed ETM trace data to users, and (3) Instrumentation interfaces to help hook external functions as shown in line 6 and 10 in Table 4. 3.2

HART Implementation

We implement the prototype of HART on Linux system with kernel 4.4.145armv7-x16, running on a Freescale i.MX53 Quick Start Board [14] with ARMv7 Cortex-A8 processor [20]. As the processor only supports 32-bit mode, HART so far only runs with 32-bit systems. We believe there is no significant difference between the 32-bit and 64-bit implementations, and the only differences might lie in the driver for the involved hardware (e.g., PMU, ETM, and ETB) and some configurations (e.g., the instruction threshold for the trace buffer). In total, the HART system contains about 3,200 lines of C code. To decode the trace data, we reuse an open-source decoder [57], which supports both ETM-V2 and ETM-V3. We optimize the decoding process by directly indexing the packet handler with the packet type. Thus HART avoids the cost of handler looking up and will achieve better efficiency. After loading HART into the kernel as a standalone driver, we need to customize the native module utilities, insmod and rmmod, to inform HART of the installation and the removal of the target module to be protected. Before 2

The context-ID of a task is identical to the process ID. With the context-ID, the processor and ETM can identify which task triggers the execution of the module.

326

Y. Du et al.

installing the module, our insmod will send the notice to the virtual device registered by HART, which actually reminds HART of standby for the module protection. Our rmmod is customized similarly.

HASAN Design and Implementation

4

We also build an AddressSanitizer HASAN with HART’s open interfaces to demonstrate the extended utilization of HART. The design of HASAN reuses the scheme of AddressSanitizer [50] and follows the implementation of the Kernel Address Sanitizer (KASAN) [15]. Instead of sanitizing the whole kernel, HASAN focuses on specific kernel modules. Without source code, we cannot locate local and global arrays, which prevents us from wrapping those objects with redzones. Thus, HASAN only focuses on memory objects on the heap. Note that if the module source code is available, HASAN can achieve the same level protection with KASAN. However, if the module source code is out of reach, HASAN can still achieve heap protection while KASAN will not work at all. Table 6. Memory management interfaces HASAN hooked. Category

Allocation

De-allocation

Kmem cache

kmem cache alloc kmem cache create kmem cache free kmem cache destroy

Kmalloc

kmalloc krealloc kzalloc kcalloc

Page operations alloc pages

get free pages

kfree free pages

free pages

Design and Implementation: As HASAN aims to protect kernel memory, it only allocates shadow memory for the kernel space. As the kernel space ranges from 0xbf000000 to 0xffffffff in our current system, HASAN allocates a 130 MB continuous virtual space for the shadow memory with each shadow memory bit mapping to a byte. Given an address addr, HASAN maps it to the shadow memory location using the following function, where ADDR OF SHM indicates the beginning of our virtual space. 1 2 3 4

void * addr_to_hasanaddr(addr){ return ((u32)addr >> 3) + (addr - 0xbf000000) + ADDR_OF_SHM); }

To wrap heap objects with redzones, HASAN hooks the slab interfaces for memory management3 . As aforementioned, we achieve the hooking through the interfaces that HART plants to the wrappers of external calls. The interfaces that HASAN currently supports are listed in three categories in Table 6, with their handling methods explained as follows. ① When kmem cache create is invoked to creates a memory cache with size bytes, we first enlarge this size so as to reserve space for redzones around the 3

We are yet to add supports for slub.

HART: Hardware-Assisted Kernel Module Tracing on Arm

327

actual object. After kmem cache alloc has allocated the object, we mark the rest space as redzones. If kmem cache alloc requires memory from caches created by other kernel components (in many cases it will happen), HASAN will redirect the allocation to caches that are pre-configured in HASAN. These pre-configured caches will be destroyed when the target module is removed. ② In the second category, we first align the requested size by 8 bytes and then add another 16 bytes reserving for redzones. Then we call stock kmalloc and return to users with the address of the 9th byte in the allocated buffer. The first 8 bytes together with the remaining space at the end will be poisoned. De-allocations in both categories above follow similar strategies to KASAN. HASAN delays such operations (e.g., kfree) and marks the freed regions as poisoned. These memories will be released until the size of delayed memory reaches a threshold or the module is unloaded. ③ In the third category, however, we only delay the corresponding deallocation. For allocation, increasing the allocated size in those interfaces incurs a tremendous cost, as the size in their request indicates the order of pages to be allocated. Table 7. Utility kernel functions for memory operations. Category Function name

Category Function name

memory

memcpy memccpy memmove memset

print

string

strcpy strncpy strcat strncat strdup kernel

sprintf vsprintf copy from user copy to user

Since our design excludes the main kernel for tracing, it may miss to capture some vulnerable memory operations inside the kernel functions that listed in Table 7. To handle this issue, we make copies of those functions and load them as a new code section in HASAN. When a target module invokes those functions, we redirect the execution to our copies, enable ETM to trace on them and integrate this trace into ETB. To avoid switching hardware configurations on entering and leaving those functions, our implementation adds the memory holding our copies as an additional range to trace. The detection of HASAN is straightforward and efficient. We register a call back to HART’s decoder. On the arrival of a normal data packet, the decoder will parse the memory address for HASAN. Then, HASAN maps the address to the shadow memory and checks the poison in it. To avoid unsynchronized cases such as decoding of valid accesses after memory de-allocation, we ensure that the decoding synchronizes with the execution in each interface as presented in Table 6. Fine-Grained Synchronization: As our decoding proceeds after the execution, there is a delay from the occurrence of memory corruption to the detection of violations, which may be exploited to bypass HASAN. To reduce this attack surface, we force the decoder to synchronize with the execution whenever a

328

Y. Du et al.

callback is invoked. Specifically, our wrapper of that callback will pause the execution until the decoder consumes all the trace. Since a system call reaching the target module will invoke at least one callback, we ensure one or more synchronization(s) between successive system calls. This follows the existing research [21] that relies on system-call level synchronization to ensure security. Another rationale is that, even the state-of-art kernel exploits would require multiple system calls to compromise the execution [62,63].

5

Evaluation

In this section, to demonstrate the utility of our work, we sequentially test whether our work can achieve continuous tracing and measure their efficiency and effectiveness. 5.1

Setup

To understand the correctness and efficiency of our work, we run 6 widely-used kernel modules and come with standard benchmarks, detailing in Table 8. To further understand the non-intrusiveness in our work, we run the lmbench benchmarks. The evaluation is conducted on the aforementioned Freescale i.MX53 Quick Start Board [14] with 1 GB of RAM. As we aim to demonstrate the generality of HART with the minimal hardware requirements of the trace functionality of ETM and a small ETB, we consider i.MX53 QSB a perfect match. We do not measure the two types of concurrency described in Sect. 3.1, because the device has only one core, so concurrent tests would produce same results as running the tests sequentially. For comparison, we also test under the state-ofart solution KASAN [6]. Since the i.MX53 board we use can only support 32-bit systems while KASAN is only available on 64-bit systems, we run the KASAN based experiments in a Raspberry Pi 3+ configured with the same computing power and memory. To verify the effectiveness of detecting memory corruption, we collect 6 known memory vulnerabilities which cover different types and lie in different modules with various categories and complexities. Note we only cover heaprelated buffer overflows because HASAN focuses on heap related memory errors. 5.2

Continuous Tracing

In Table 9, we summarize the frequency of HART backing up the data from the ETB, the size of data retrieved each time. Most importantly, the last column in Table 9 shows ETB has never been filled up. Besides, the maximal size of trace is uniformly less than 2K, not even reaching the middle of ETB. Both facts indicate that overflow in ETB never happened and all the trace data has been successfully backed-up and decoded. Thus, HART and HASAN achieve continuous tracing, which is the precondition for the correctness of the solution.

HART: Hardware-Assisted Kernel Module Tracing on Arm

329

Table 8. Performance evaluation. The results are normalized using the tests on Native img + Native module as baseline 1. A larger number indicates a larger data transmission rate and a lower deceleration on performance. Module Type

Benchmark Name

Name

Result Setting

Native img +

KASAN img +

HART HASAN Native

Network

iperf [29]

Local Comm.

1.00

1.00

0.29

0.28

TCPW [2]

iperf [29]

Local Comm.

0.92

0.91

0.28

0.28

H-TCP [12]

iperf [29]

Local Comm.

0.94

0.94

0.26

0.25

IOZONE [28] Wr/fs=4048K/reclen=64

UDF [19]

Avg.

module module

HSTCP [30]

File system HFS+ [13]

Driver

KASAN

module module

1.00

1.00

0.96

0.95

Wr/fs=4048K/reclen=512 0.88

0.87

0.96

0.94

Rd/fs=4048K/reclen=64

0.92

0.89

0.98

0.92

Rd/fs=4048K/reclen=512 0.90

0.89

0.99

0.99

IOZONE [28] Wr/fs=4048K/reclen=64

USB STORAGE [8] dd [45]





0.95

0.93

0.99

0.97

Wr/fs=4048K/reclen=512 0.97

0.97

1.00

0.92

Rd/fs=4048K/reclen=64

0.98

0.97

0.99

0.98

Rd/fs=4048K/reclen=512 0.97

0.96

1.00

0.98

Wr/bs=1M/count=1024

1.00

1.00

1.00

0.43

Wr/bs=4M/count=256

1.00

1.00

0.99

0.43

Rd/bs=1M/count=1024

0.99

0.99

0.99

0.75

Rd/bs=4M/count=256

1.00

1.00

1.00

0.76



0.95

0.94

0.85

0.72

Table 9. Tracing evaluation of HART and HASAN. Module Type

Retrieving times Max size(Byte) Min size(Byte) Average size(Byte) Full ETB HART HASAN HART HASAN HART HASAN

HART HASAN

Network HSTCP

4243

3964

1100

1196

20

20

988

1056

0

0

TCP-W

3728

3584

1460

1456

20

20

1128

1088

0

0

H-TCP

3577

3595

1292

1304

20

20

1176

1168

0

0

30505 30278

1652

1756

20

20

144

148

0

0

File

Name

HFS+

system UDF Driver

HART HASAN

17360 20899

USB STORAGE 9316

9325

2424

2848

20

20

240

232

0

0

1544

1692

20

20

448

448

0

0

We also observe that HART and HASAN have some similarities and differences in their tracing behaviours. Overall, HART and HASAN present the similar but non-identical results in the same set of module. The slight deviation mainly derives from the padding packets randomly inserted by ETM and the random skids of PMU. The differences lie across different families of modules. HART and HASAN back up ETB more frequently in the two file systems (nearly 2X more than that in the driver module and 3-7X more than the network modules), while they produce the least trace in each ETB backup with

330

Y. Du et al.

the file systems. This indicates that file systems execute more instructions yet they usually carry less information about control flow and data accesses. Due to such increase in instruction number, PMU has to raise more interrupts in the testing on file systems and contributes to the more frequent ETB backups. The statistics above are consistent with the following performance evaluation – a higher frequency of backing up incurs more trace management and thus, introduces higher performance overhead. 5.3

Performance Evaluation

For performance measurement, we record (1) the bandwidths of the server for network modules and (2) the read/write rates for file-systems as well as driver modules. The testing results are summarized in Table 8. Across all three types of drivers, HART and HASAN barely affects the performance. On average, HART introduces an overhead of 5%, while HASAN introduces 6%. In the worst-case scenario, HART brings a maximum of 12% for (reclen=64) when testing on HFS+, and HASAN brings at most 13% in the same case. These results well support that HART and HASAN are in general efficient. The most important reason for this efficiency, we believe, is caused by employing ETM to capture control flow and data flow with high efficiency. In general, the hardware-based tracing involves negligible performance overhead, and almost all the overhead introduced by HART is caused by decoding and analyzing the trace output. On contrast, since KASAN cannot be deployed to the modules without KASANenabled kernel, we analyze the impacts of only enforcing KASAN through the differences between the tests with and without KASAN-enabled modules to some extent: ① The test performed on the network modules indicates KASAN introduces nearly 4X slowdown. The reason is that, during a networking event (receiving and sending a message), the kernel performs large amounts of computations due to the complicated operations on the network stack, even though the IO through the network port is cheap as we are only doing local communication. As such, the overhead introduced by KASAN is not diluted by IO accesses. Furthermore, though the difference between two tests on KASAN is insignificant, it does not imply that KASAN will be efficient if we re-engineer it to protect the target module only. In a network activity, those drivers only take up a small fraction of computation in the kernel space. Overhead on such computation is covered up by the significant slowdown in the main kernel. ② In the two file systems, KASAN hardly affects the performance of the kernel or the target module. Again, this does not necessarily imply that KASAN is most efficient, because in a file system operation, most of the time is spent on disk access and the computation in the kernel is quite simple and quick. Therefore, the average overhead is insignificant. ③ In the test on USB STORAGE driver, most of the computation in the kernel space is performed by the target module, and takes more time than the IO access. Thus, the cost of adding KASAN on the module is significant (about 2X). Besides reducing the overhead on the target module, we also verify our point of the non-intrusiveness of HART and HASAN. Running them without the target module, we find they introduce zero overhead to the main kernel, because

HART: Hardware-Assisted Kernel Module Tracing on Arm

331

Table 10. Performance evaluation on KASAN with lmbench. HART and HASAN introduce no overhead to the main kernel, so the results are omitted here. Func.

Setting

Native KASAN Overhead Func.

Setting

Processes

stat

3.08

16.4

5.3

File & VM

0K File Create

44.0

136.1

3.1

(ms)

open clos 8.33

36.7

4.4

system

0K File Delete

35.2

227.1

6.5

sig hndl

20.4

3.4

latency (ms) 10K File Create 99.9

370.2

3.7

fork proc 472

1940

4.1

10K File Delete 64.2

204.7

Local

Pipe

18.9

45.8

2.4

Mmap Latency 188000 385000

2.0

Comm.

AF UNIX 26.6

6.06

Native KASAN Overhead

3.2

97.9

3.7

Prot Fault

0.5

0.5

1.0

latency (ms) UDP

41.4

127.6

3.1

Page Fault

1.5

2.3

1.5

TCP

53.4

176.4

3.3

100fd selct

6.6

13.7

2.1

they notice nothing to protect and are always sleeping. Differing from our works, KASAN places a significant burden on the main kernel (Table 10). This well supports the advantage of our works in terms of modification to the main kernel. Table 11. Effectiveness evaluation on HASAN. For the result of Detection, “Y” means detected. Vulnerability

Detection

CVE-ID

Type

CVE-2016-0728

Use-after-free REFCOUNT overflow [56] Y

Y

CVE-2016-6187

Out-of-bound Heap off-by-one [42]

Y

Y

CVE-2017-7184

Out-of-bound xfrm replay verify len [52] Y

Y

CVE-2017-8824

Use-after-free dccp disconnect [33]

Y

Y

CVE-2017-2636

Double-free

Y

Y

CVE-2018-12929 Use-after-free ntfs read locked inode [46] Y

Y

5.4

PoC

n hdlc [44]

HASAN KASAN

Effectiveness Evaluation

We present the results of our effectiveness evaluation in Table 11. As shown in this table, both HASAN and KASAN can detect all 6 cases. Since the selected cases cover different types of heap-related issues, including buffer overflow, offby-one, use-after-free, and double free, we demonstrate that HASAN can cover a variety of vulnerabilities with a comparable security performance to the stateof-the-art solution in terms of the heap.

6

Discussion

Support of Other Architectures. As hardware tracing brings a significant improvement in the performance overhead during the debugging, it becomes a

332

Y. Du et al.

common feature in major architectures. While HART achieves tracing efficiency with the support of ETM on Arm platforms, we can easily extend our design to other architectures with such feature. For example, since Processor Tracing (PT) [3], supports instruction tracing and will be enhanced with PTWRITE [1] to support efficient data tracing, we can extend our design by replacing the hardware set-up and configuration (given a target instrumented with PTWRITE). Support of Statically Linked Modules. HART mainly targets LKM, because most of the third-party modules are released as LKM for the ease of use. Our current design cannot work with statically linked modules, mainly because we require to hook the callbacks and calls to kernel functions, which can be achieved by adjusting the relocation. However, statically linked modules are compiled and built as port of the main kernel and hence, we have no access to place the hooks. To extend support for statically linked modules, we need the kernel builder to instrument the target modules and place the required hooks. Breakage of Perf Tools. As perf tools have already included ETM tracing capabilities, enabling tracing functionality from the user-space, HART may lead to the breakage of perf tools. We think the breakage depends on the scenario. If HART and perf are tracing/profiling different tasks, we consider perf won’t be affected since HART will save the context of the hardware at the entry and exit of the traced module. However, if HART and perf are tracing/profiling the same task, perf might be affected. One easy solution is to run HART and perf side by side instead of running them concurrently.

7 7.1

Related Work Kernel Protection

Kernel Debugger. Kmemcheck [11] can dynamically check uninitialized memory usage in x86 kernel space, but its single-stepping debugging feature drags down the speed of program execution to a large extent. KASAN [6] is the kernel version of address sanitizer. However, the heavy overhead makes it impossible to be a real-time protection mechanism, while the low overhead of HASAN show us the potential. Furthermore, KASAN requires the source code for implementation. If a vulnerability is hidden in a close-source third-party kernel module, which is common, HASAN has the potential to detect it but KASAN would not. Ftrace [10] is an software-based tracer for the kernel. Comparing with Ftrace, HART is noninvasive since we neither modifies the target kernel module nor requires the source code of the kernel module. In regard to the trace granularity, Ftrace is used to monitor function calls and kernel events with predefined tracepoints, but HART can be used to trace executed instructions and data used in these instructions. Ftrace also introduces heavy performance overhead on some event tracing, and cannot be adopted as a real-time protection as well. Kernel Integrity Protection. KCoFI [26] enforces the CFI policy by providing a complete implementation for event handling based on Secure Virtual

HART: Hardware-Assisted Kernel Module Tracing on Arm

333

Architecture (SVA) [27]. However, all software including OS should be compiled to the virtual instruction set that SVA provides. The solution also incurs high computation overhead. With slight modification to the modern architectures, Song et al. [53] enforces DFI over both of control and non-control data that can affect security checks. Despite the aids from the re-modelled hardware, the DFI still introduces significant latency to various system calls. Isolation and Introspection. BGI [22] enforces isolation through an additional interposition layer between kernel and the target module. However, it requires instrumentation over source code of the target modules. Similar ideas are followed by many other works [48,58,65]. Ninja [43] is a transparent debugging and tracing framework on Arm platform, and it can also be used to introspect the OS kernel from TrustZone. However, bridging the semantic gaps between Linux kernel and TrustZone would increase the performance overhead significantly. 7.2

Hardware Feature Assisted Security Solutions

Hardware Performance Counters. HDROP [66] and SIGDROP [59] leverage PMU to count for the abnormal increase of events and seek patterns of ROP at runtime. Morpheus [38] proposes a benchmarking tool to evaluate computational signatures based mobile malware detection, in which HPCs help to create runtime traces from applications for the ease of the signature comparison. Although these solutions [37,55] achieve significant overhead reduction with the assistance of HPCs, none of them applies this hardware feature to kernel module protection. Hardware-Based Tracing. With the availability of Program Trace Interface, Kargos [41] monitors the execution and memory access events to detect code injection attacks. However, it brings extra performance overhead due to the modification to the kernel. Since the introduction in Intel’s Broadwell microarchitecture, PT has been broadly applied in solutions for software security [32,34,35,47]. All the PT-based solutions are control flow specific, differing from HART that traces both data flow and control flow for security protection.

8

Conclusion

Kernel modules demand as much security protection as the main kernel. However, the current solutions are actually limited by the requirement of source code, significant intrusiveness and heavy overhead. We present HART as a general ETM-powered tracing framework specifically for kernel modules, with the most preliminary hardware support and the most compatible method. HART can trace the selective work on the binary-only module(s) continuously by combining hardware configurations and software hooks, and decode the trace efficiently following an elastic scheduling. Based on the framework, we then build a modular security solution, HASAN, to effectively detect memory corruptions without the aforementioned limitations. Testing on a set of benchmarks in different modules, both HART and HASAN perform significantly superior to the

334

Y. Du et al.

state-of-the-art KASAN, introducing the average overhead of 5% and 6%. Moreover, HASAN identifies all of the 6 vulnerabilities in different categories and modules, indicating its comparable effectiveness to the state-of-the-art solutions. Acknowledgements. We sincerely thank our shepherd Prof. Dave Jing Tian and reviewers for their comments and feedback. This work was supported in part by grants from the Chinese National Natural Science Foundation (NSFC 61272078, NSFC 61073027).

References 1. PTWRITE - write data to a processor trace packet. https://hjlebbink.github.io/ x86doc/html/PTWRITE.html 2. TCP westwood+ congestion control (2003). https://tools.ietf.org/html/rfc3649 3. Processor tracing (2013). https://software.intel.com/en-us/blogs/2013/09/18/ processor-tracing 4. Juno ARM Development Platform SoC Technical Reference Manual (2014) 5. slub (2017). https://www.kernel.org/doc/Documentation/vm/slub.txt 6. Home Google/Kasan Wiki (2018). https://github.com/google/kasan/wiki 7. Apple A4 (2019). https://www.apple.com/newsroom/2010/06/07Apple-PresentsiPhone-4/ 8. Config usb storage: USB mass storage support (2019). https://cateee.net/lkddb/ web-lkddb/USB STORAGE.html 9. Embedded trace macrocell architecture specification (2019). http://infocenter.arm. com/help/index.jsp?topic=/com.arm.doc.ihi0014q/index.html 10. ftrace - function tracer (2019). https://www.kernel.org/doc/Documentation/ trace/ftrace.txt 11. Getting started with kmemcheck - the Linux kernel documentation (2019). https:// www.kernel.org/doc/html/v4.14/dev-tools/kmemcheck.html 12. H-TCP - congestion control for high delay-bandwidth product networks (2019). http://www.hamilton.ie/net/htcp.htm 13. HFS plus (2019). https://www.forensicswiki.org/wiki/HFS%2B 14. i.MX53 quick start board—NXP (2019). https://www.nxp.com/products/powermanagement/pmics/power-management-for-i.mx-application-processors/i.mx53quick-start-board:IMX53QSB 15. The kernel address sanitizer (Kasan) - the Linux kernel documentation (2019). https://www.kernel.org/doc/html/v4.14/dev-tools/kasan.html 16. Kmemleak (2019). https://www.kernel.org/doc/html/v4.14/dev-tools/kmemleak. html 17. Samsung Exynos 3110 (2019). https://www.samsung.com/semiconductor/ minisite/exynos/products/mobileprocessor/exynos-3-single-3110/ 18. Snapdragon 200 series (2019). https://www.qualcomm.com/snapdragon/ processors/200 19. Universal disk format (2019). https://docs.oracle.com/cd/E19683-01/806-4073/ fsoverview-8/index.html 20. ARM: Cortex-A8 Technical Reference Manual (2014) 21. Bigelow, D., Hobson, T., Rudd, R., Streilein, W., Okhravi, H.: Timely rerandomization for mitigating memory disclosures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 268–279. ACM (2015)

HART: Hardware-Assisted Kernel Module Tracing on Arm

335

22. Boyd-Wickizer, S., Zeldovich, N.: Tolerating malicious device drivers in Linux. In: USENIX Annual Technical Conference, Boston (2010) 23. Carbone, M., Cui, W., Lu, L., Lee, W., Peinado, M., Jiang, X.: Mapping kernel objects to enable systematic integrity checking. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 555–565. ACM (2009) 24. Castro, M., et al.: Fast byte-granularity software fault isolation. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 45–58. ACM (2009) 25. Chen, H., Mao, Y., Wang, X., Zhou, D., Zeldovich, N., Kaashoek, M.F.: Linux kernel vulnerabilities: state-of-the-art defenses and open problems. In: Proceedings of the Second Asia-Pacific Workshop on Systems, p. 5. ACM (2011) 26. Criswell, J., Dautenhahn, N., Adve, V.: KCoFI: complete control-flow integrity for commodity operating system kernels. In: 2014 IEEE Symposium on Security and Privacy (SP), pp. 292–307. IEEE (2014) 27. Criswell, J., Lenharth, A., Dhurjati, D., Adve, V.: Secure virtual architecture: a safe execution environment for commodity operating systems. In: Proceedings of 21st ACM SIGOPS Symposium on Operating Systems Principles, SOSP 2007, pp. 351–366. ACM (2007) 28. Don, C., Capps, C., Sawyer, D., Lohr, J., Dowding, G., et al.: IOzone filesystem benchmark (2016). http://www.iozone.org/ 29. Dugan, J., Elliott, S., Mah, B.A., Poskanzer, J., Prabhu, K., et al.: iPerf - the ultimate speed test tool for TCP, UDP and SCTP (2018). https://iperf.fr/ 30. Floyd, S.: Highspeed TCP for large congestion windows (2003). https://tools.ietf. org/html/rfc3649 31. Garfinkel, T., Rosenblum, M., et al.: A virtual machine introspection based architecture for intrusion detection. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2003, pp. 191–206 (2003) 32. Ge, X., Cui, W., Jaeger, T.: GRIFFIN: guarding control flows using intel processor trace. In: Proceedings of the 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2017) 33. Ghannam, M.: CVE-2017-8824 Linux: use-after-free in DCCP code (2017). https:// www.openwall.com/lists/oss-security/2017/12/05/1 34. Gu, Y., Zhao, Q., Zhang, Y., Lin, Z.: PT-CFI: transparent backward-edge control flow violation detection using intel processor trace. In: Proceedings of the 7th ACM International Conference on Data and Application Security and Privacy (CODASPY) (2017) 35. Hertz, J., Newsham, T.: Project triforce: run AFL on everything! (2016). https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2016/ june/project-triforce-run-afl-on-everything/ 36. Hinum, K.: Hisilicon kirin 920 (2017). https://www.notebookcheck.net/HiSiliconKirin-920-SoC-Benchmarks-and-Specs.240088.0.html 37. Iyer, R.K.: An OS-level framework for providing application-aware reliability. In: 12th Pacific Rim International Symposium on Dependable Computing, PRDC 2006 (2007) 38. Kazdagli, M., Ling, H., Reddi, V., Tiwari, M.: Morpheus: benchmarking computational diversity in mobile malware. In: Proceedings of Hardware and Architectural Support for Security and Privacy (2014) 39. Linares-V´ asquez, M., Bavota, G., Escobar-Vel´ asquez, C.: An empirical study on android-related vulnerabilities. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp. 2–13. IEEE (2017)

336

Y. Du et al.

40. Machiry, A., Spensky, C., Corina, J., Stephens, N., Kruegel, C., Vigna, G.: Dr. Checker: a soundy analysis for Linux kernel drivers. In: 26th USENIX Security Symposium (USENIX Security 2017), pp. 1007–1024. USENIX Association (2017) 41. Moon, H., Lee, J., Hwang, D., Jung, S., Seo, J., Paek, Y.: Architectural supports to protect OS kernels from code-injection attacks. In: Proceedings of Hardware and Architectural Support for Security and Privacy (2016) 42. Nikolenko, V.: Heap off-by-one POC (2016). http://cyseclabs.com/exploits/ matreshka.c 43. Ning, Z., Zhang, F.: Ninja: towards transparent tracing and debugging on arm. In: 26th USENIX Security Symposium (USENIX Security 2017), pp. 33–49 (2017) 44. Popov, A.: CVE-2017-2636: exploit the race condition in the n hdlc Linux kernel driver bypassing SMEP (2017). https://a13xp0p0v.github.io/2017/03/24/CVE2017-2636.html 45. Rubin, P., MacKenzie, D., Kemp, S.: dd - convert and copy a file (2019). http:// man7.org/linux/man-pages/man1/dd.1.html 46. Schumilo, S.: Multiple memory corruption issues in ntfs.ko (Linux 4.15.0-15.16) (2018). https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763403 47. Schumilo, S., Aschermann, C., Gawlik, R., Schinzel, S., Holz, T.: kAFL: hardwareassisted feedback fuzzing for OS kernels. In: Proceedings of the 26th Security Symposium (USENIX Security) (2017) 48. Sehr, D., et al.: Adapting software fault isolation to contemporary CPU architectures. In: 19th USENIX Security Symposium (USENIX Security 2010), pp. 1–12 (2010) 49. Freescale Semiconductor: i.MX53 Multimedia Applications Processor Reference Manual (2012) 50. Serebryany, K., Bruening, D., Potapenko, A., Vyukov, D.: AddressSanitizer: a fast address sanity checker. In: USENIX Annual Technical Conference, pp. 309–318 (2012) 51. Seshadri, A., Luk, M., Qu, N., Perrig, A.: SecVisor: a tiny hypervisor to provide lifetime kernel code integrity for commodity OSes. In: ACM SIGOPS Operating Systems Review, pp. 335–350. ACM (2007) 52. snorez: Exploit of CVE-2017-7184 (2017). https://raw.githubusercontent.com/ snorez/exploits/master/cve-2017-7184/exp.c 53. Song, C., Lee, B., Lu, K., Harris, W., Kim, T., Lee, W.: Enforcing kernel security invariants with data flow integrity. In: NDSS (2016) 54. Swift, M.M., Martin, S., Levy, H.M., Eggers, S.J.: Nooks: an architecture for reliable device drivers. In: Proceedings of the 10th Workshop on ACM SIGOPS European Workshop, pp. 102–107. ACM (2002) 55. Tang, A., Sethumadhavan, S., Stolfo, S.J.: Unsupervised anomaly-based malware detection using hardware features. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 109–129. Springer, Cham (2014). https://doi. org/10.1007/978-3-319-11379-1 6 56. Perception Point Team: Refcount overflow exploit (2017). https://github.com/ SecWiki/linux-kernel-exploits/blob/master/2016/CVE-2016-0728/cve-2016-0728. c 57. virtuoso: virtuoso/etm2human: Arm’s ETM v3 decoder (2009). https://github. com/virtuoso/etm2human 58. Wahbe, R., Lucco, S., Anderson, T.E., Graham, S.L.: Efficient software-based fault isolation. In: ACM SIGOPS Operating Systems Review, pp. 203–216. ACM (1994) 59. Wang, X., Backer, J.: SIGDROP: signature-based ROP detection using hardware performance counters. arXiv preprint arXiv:1609.02667 (2016)

HART: Hardware-Assisted Kernel Module Tracing on Arm

337

60. Wang, Z., Jiang, X.: HyperSafe: a lightweight approach to provide lifetime hypervisor control-flow integrity. In: 2010 IEEE Symposium on Security and Privacy (SP), pp. 380–395. IEEE (2010) 61. Wang, Z., Jiang, X., Cui, W., Ning, P.: Countering kernel rootkits with lightweight hook protection. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 545–554. ACM (2009) 62. Wu, W., Chen, Y., Xing, X., Zou, W.: Kepler: facilitating control-flow hijacking primitive evaluation for Linux kernel vulnerabilities. In: 28th USENIX Security Symposium (USENIX Security 2019), pp. 1187–1204 (2019) 63. Wu, W., Chen, Y., Xu, J., Xing, X., Gong, X., Zou, W.: Fuze: towards facilitating exploit generation for kernel use-after-free vulnerabilities. In: 27th USENIX Security Symposium (USENIX Security 2018), pp. 781–797 (2018) 64. Xiong, X., Tian, D., Liu, P., et al.: Practical protection of kernel integrity for commodity OS from untrusted extensions. In: NDSS, vol. 11 (2011) 65. Zhou, F., et al.: SafeDrive: safe and recoverable extensions using language-based techniques. In: Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pp. 45–60. USENIX Association (2006) 66. Zhou, H.W., Wu, X., Shi, W.C., Yuan, J.H., Liang, B.: HDROP: detecting ROP attacks using performance monitoring counters. In: Huang, X., Zhou, J. (eds.) ISPEC 2014. LNCS, vol. 8434, pp. 172–186. Springer, Cham (2014). https://doi. org/10.1007/978-3-319-06320-1 14

Zipper Stack: Shadow Stacks Without Shadow Jinfeng Li1,2 , Liwei Chen1,2(B) , Qizhen Xu1,2 , Linan Tian1,2 , Gang Shi1,2 , Kai Chen1,2 , and Dan Meng1,2 1

2

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China {lijinfeng,chenliwei,xuqizhen,tianlinan, shigang,chenkai,mengdan}@iie.ac.cn School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

Abstract. Return-Oriented Programming (ROP) is a typical attack technique that exploits return addresses to abuse existing code repeatedly. Most of the current return address protecting mechanisms (also known as the Backward-Edge Control-Flow Integrity) work only in limited threat models. For example, the attacker cannot break memory isolation, or the attacker has no knowledge of a secret key or random values. This paper presents a novel, lightweight mechanism protecting return addresses, Zipper Stack, which authenticates all return addresses by a chain structure using cryptographic message authentication codes (MACs). This innovative design can defend against the most powerful attackers who have full control over the program’s memory and even know the secret key of the MAC function. This threat model is stronger than the one used in related work. At the same time, it produces lowperformance overhead. We implemented Zipper Stack by extending the RISC-V instruction set architecture, and the evaluation on FPGA shows that the performance overhead of Zipper Stack is only 1.86%. Thus, we think Zipper Stack is suitable for actual deployment.

Keywords: Intrusion detection

1

· Control Flow Integrity

Introduction

In the exploitation of memory corruption bugs, the return address is one of the most widely exploited vulnerable points. On the one hand, code-reuse attacks (CRAs), such as ROP [5] and ret2libc [26], perform malicious behavior by chaining short sequences of instructions which end with a return via corrupted return This work is partially supported by the National Natural Science Foundation of China (No. 61602469, U1836211), and the Fundamental theory and cutting edge technology Research Program of Institute of Information Engineering, CAS (Grant No. Y7Z0411105). c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 338–358, 2020. https://doi.org/10.1007/978-3-030-58951-6_17

Zipper Stack: Shadow Stacks Without Shadow

339

addresses. These attacks require no code injection so they can bypass nonexecutable memory protection. On the other hand, the most widely exploited memory vulnerability, stack overflow, is also exploited by overwriting the return address. Both CRAs and stack smashing attacks are based on tampering with the return addresses. In order to protect the return addresses, quite a few methods were presented, such as Stack Protector (also known as Stack Canary) [8,38], Address Space Layout Randomization (ASLR) [32], Shadow Stacks [6,9,23,29], Control Flow Integrity (CFI) [1,2], and Cryptography-based CFI [21,24]. However, they have encountered various problems in the actual deployment. Stack Protector and ASLR rely on secret random values (cookies or memory layout). Both methods are widely deployed now. However, if there is a memory leak, both methods will fail [30]: the attacker can read the cookie and hold it unchanged while overwriting the stack to bypass the Stack Protector, and derandomize ASLR to bypass ASLR. Some works have proved that they can be reliably bypassed in some circumstances [16,20,27], such as BROP [4]. Even if there is no information leaking, some approaches can still bypass ASLR and perform CRAs [25]. Shadow Stack is a direct mechanism that records all return addresses in a protected stack and checks them when returns occur. It has been implemented via both compiler-based and instrumentation-based approaches [9,10]. In recent years, commercial hardware support has also emerged [17]. But Shadow Stack relies heavily on the security of the memory isolation, which is difficult to guarantee in actual deployment. Some designs of Shadow Stack utilize ASLR to protect Shadow Stack. However, they cannot thwart the attacks contains any information disclosure [7,9]. Since most methods that bypass ASLR [20,27] are effective against this type of defense. Other designs use page attributes to protect Shadow Stack, for instance, CET [17]. However, defenses that rely on page attributes, such as NX (no-execute bit), have been bypassed by various technologies in actual deployment [18]: a single corrupted code pointer to the function in the library (via a JOP/COOP attack) may change the page attribute and disable the protection. In light of these findings, we think that mechanisms that do not rely on memory isolation are more reliable and imperative. Cryptography-based protection mechanisms based on MAC authentication have also been proposed. These methods calculate the MACs of the return addresses and authenticate them before returns [21,24,40]. They do not rely on memory isolation because the attacker cannot generate new correct MACs without the secret key. But they face other problems: replay attacks (which reuse the existing code pointers and the MACs of them), high-performance overhead, and security risk of keys. A simple authentication of MAC is vulnerable facing replay attacks, so some designs introduce extra complexity to mitigate the problem (such as adding a nonce or adding stack pointer into the MAC input). Besides, these methods’ performance overhead is also huge, since they use a cryptographic MAC and then save the result into memory. Both operations consume massive runtime overhead. The security of the secret key is also crucial. Theses

340

J. Li et al.

works assume that there is no hardware attack or secret leak from the kernel. However, in the real world, the secret key can be leaked by a side-channel attack or an attack on context switching in the kernel. Therefore, we should go further over these assumptions, and consider a more powerful threat model. In this paper, we propose a novel method to protect return addresses, Zipper Stack, which uses a chain of MACs to protect the return addresses and all the MACs. In the MAC chain, the newest MAC is generated from the latest return address and the previous MAC. The previous MAC is generated from the return address and MAC before the previous one, as Fig. 1 shows. So the newest MAC is calculated from all the former return addresses in the stack, although it is generated by computing the latest address and the previous MAC. Zipper Stack minimizes the amount of state requiring direct protection: only the newest MAC needs to be protected from tampering. Without tampering the newest MAC, an attacker cannot tamper with any return address because he cannot tamper the whole chain and keep the relation.

Fig. 1. Core of Zipper Stack

Zipper Stack avoids the problems of Shadow Stack and Cryptography-based protection mechanisms. Compared to Shadow Stack, it does not rely on memory isolation. Consequently, the attacks that modify both the shadow stack and the main stack cannot work in Zipper Stack. Compared with other cryptographybased mechanisms, Zipper Stack can resist against replay attacks itself and will not fail even if the secret key is leaked (see Sect. 4). In terms of efficiency, Zipper Stack performs even better. Our design is more suitable for parallel processing in the CPU pipeline, which avoids most performance overhead caused by the MAC calculation and memory access. The performance overhead of Zipper Stack with hardware support based on Rocket Core [35] (RISC-V CPU [36]) is only 1.86% based on our experiments. The design of Zipper Stack solved three challenges: First, it avoids the significant runtime overhead that most cryptography-based mechanisms suffer since the newest MAC is updated in parallel. In our hardware implementation, most

Zipper Stack: Shadow Stacks Without Shadow

341

instructions that contain a MAC generation/authentication take only one cycle (See Sect. 6). Second, it utilized the LIFO order of return addresses to minimize the amount of state requiring direct protection. In general, a trust root authenticating all the data can help us defend against replay attacks or attacks containing secret key leaks (which most current return address protection method cannot). While in Zipper Stack, the authentication uses the newest MAC, at the same time, the MAC is a dynamic trust root itself. So it gets better security without extra overhead. Third, previous methods protect each return address separately, so any one been attacked may cause attacks. Zipper Stack, however, connect all the return addresses, leverage the prior information to increase the bar for attackers. In order to demonstrate the design and evaluate the performance, we implement Zipper Stack in three deployments corresponding to three situations: a) Hardware approach, which is suitable when hardware support of Zipper Stack is available. b) Customized compiler approach, which is suitable when hardware support is not available, but we can recompile the applications. c) Customized ISA approach, which is suitable when we cannot recompile the programs, but we can alter the function of CALL/RET instructions. Ideally, the hardware approach is the best - it costs the lowest runtime overhead. The other two approaches, however, are suitable in some compromised situation. In the hardware approach, we instantiated Zipper Stack with a customized Rocket Core (a RISC-V CPU) on the Xilinx Zynq VC707 evaluation board (and hardware-based Shadow Stack as a comparison). In customized compiler approach, we implemented Zipper Stack in LLVM. In customized ISA approach, we use Qemu to simulate the modified ISA. Contributions. In summary, this paper makes the following contributions: 1. Design: We present a novel, concise, efficient return address protection mechanism, called Zipper Stack, which protect return addresses against the attackers have full control of all the memory and know the secret keys, with no significant runtime overhead. Consequently, we analyze the security of our mechanism. 2. Implementation: To demonstrate the benefits of Zipper Stack, we implemented Zipper Stack on the FPGA board, and a hardware-based Shadow Stack as a comparison. To illustrate the potential of Zipper Stack to be further implemented, we also implemented it in LLVM and Qemu. 3. Evaluation: We quantitatively evaluated the runtime performance overhead of Zipper Stack, which is better than existing mechanisms.

2 2.1

Background and Related Work ROP Attacks

Return Oriented Programming (ROP) [5,26] is the major form of code reuse attacks. ROP makes use of existing code snippets ending with return instructions

342

J. Li et al.

called gadgets to perform malicious acts. In ROP attacks, the attackers link different gadgets by tampering with a series of return addresses. An ROP attack is usually made up of multiple gadgets. At the end of each gadget, a return instruction links the next gadget via the next address in the stack. The defenses against ROP mainly prevent return instructions from using corrupted return addresses or randomize the layout of the codes. 2.2

Shadow Stack and SafeStack

Shadow Stack is a typical technique to protect return addresses. Shadow Stack saves the return addresses in a separate memory area and checks the return addresses in the main stack when returns. It has been implemented in both compiler-based and instrumentation-based approaches [6,9,12,22,29]. SafeStack [19] is a similar way, which moves all the return addresses into a separated stack instead of backs up the return addresses. SafeStack is now implemented in LLVM as a component of CPI [33]. An isolated stack mainly brings about two problems: One is that memory isolation costs more memory overhead and implementation complexity. Another one is that security relies on the security of memory isolation, which is impractical. As the structure of the shadow stack is simply the copies of return addresses, it is fragile once the attacker can modify its memory area. For example, in Intel CET, the shadow stack’s protection is provided by a new page attribute. But a similar approach in DEP is easily bypassed by a variety of methods modifying the page attribute in real-world attacks [18]. ASLR is also bypassed in real-world attacks that other implementations rely on. The previous work [7] constructed a variety of attacks on Shadow Stack. 2.3

CFI and Crypto-Based CFI

Control Flow Integrity (CFI), which first introduced by Abadi et al. [1,2], has been recognized as an important low-level security property. In CFI, runtime checks are added to enforce that jumps and calls/rets land only to valid locations that have been analyzed and determined ahead of execution [34,37,39]. The security and performance overhead of different implementations differ. Fine-grained CFI approaches will introduce significant overhead. However, coarse-grained CFI has lower performance overhead but enforces weaker restrictions, which is not secure enough [7]. Besides, Control-Flow Graphs (CFGs), which fine-grained CFI bases on, are constructed by analyzing either the disassembled binary or source code. CFG cannot be both sound and complete, so even if efficiency losses are not mainly considered, CFI is not a panacea for code reuse attacks [11]. Due to the above reasons, CFI is not widely deployed on real systems now. To improve CFI, some implementations also introduce cryptography methods to solve problems such as inaccurate static analysis: CCFI [21], RAGuard [40], Pointer-Authentication [24]. Most of these methods are based on MAC: the MACs of protected key pointers, including return addresses, are generated, whereafter, authenticated before use. But all of these methods rely heavily on

Zipper Stack: Shadow Stacks Without Shadow

343

secret keys and cost tremendous performance overhead. Another problem of these mechanisms is replay attacks. The attackers can perform replay attacks by reusing the existing values in memory.

3

Threat Model

In this paper, we assume that a powerful attacker has the ability to read and overwrite arbitrary areas of memory. He tries to perform ROP (or ret2lib) attacks. This situation is widespread - for example, a controllable pointer out of bounds can help the attacker acquire the capability. Reasonably, the attacker cannot alter the value in the dedicated registers (called Top and Key registers in our design), since these registers cannot be accessed by general instructions. The attacker in our assumption is more powerful than all previous works. Shadow Stacks assume that the attackers cannot locate or overwrite the shadow stack, which is part of the memory. In our work, we do not need that assumption, which means it can defend against more powerful attacks.

4

Design

In this section, we elaborate on the design of Zipper Stack in detail. Here, we take the hardware approach as an example. The design in other approaches is equivalent.

Fig. 2. Overview of Zipper Stack (hardware approach)

4.1

Overview

In the hardware approach, we need hardware support and the modification of memory layout. Figure 2 shows the overview of the hardware in Zipper Stack: Zipper Stack needs two dedicated registers and a MAC module in the CPU, but it requires no hardware modification of the memory. The registers include the Top register holding a MAC (N m bits) and the Key register holding a secret key (N s bits). Both the registers are initialized as random numbers at the beginning of a process, and they cannot be read nor rewritten by attackers. The secret key will not be altered in the same process. Therefore, we temporarily ignored this register for the sake of simplicity in the following. Assuming that the width of

344

J. Li et al.

Fig. 3. Layout in Zipper Stack: The dotted rectangles in the figure indicate the input of the MAC function, and the solid lines indicate the storage location of the MAC.

return addresses is N a, the MAC module should perform a cryptographic MAC function with an input bit width of N a + N m and an output bit width of N m. We now turn to the memory layout of Zipper Stack. In Zipper Stack, all return addresses are bound to a MAC, as shown in Fig. 3. The novelty is, the MAC is not generated from the address bound with itself, but from the previous return address with the MAC bound with that address. This connection keeps all return addresses and MAC in a chain. To maintain the structure, the top one, namely the last return address pushed into the stack, is handled together with the previous MAC and the new MAC is saved into the Top register; while the bottom one, i.e., the first return address pushed into the stack is bound to a random number (exactly the initial value of Top register when the program begins). 4.2

Operations

Next, we describe how Zipper Stack works with return addresses in the runtime, i.e., how to handle the Call instructions and the Return instructions. As Fig. 4 shows. Call: In general, the Call instructions perform two operations. First, push the address of the next instruction into the stack. Then, set the PC to the call destination. While in Zipper Stack, the Call instructions become slightly more complicated and need three steps:

Zipper Stack: Shadow Stacks Without Shadow

345

Fig. 4. The stack layout before/after a call/return. Previous MAC is generated from previous return address and the MAC with that return address. SP stands for stack pointer.

1. Push the Top register along with the return address into the main stack; 2. Calculate a new MAC of the Top register and the return address and save the new MAC into the Top register; 3. Set the PC to the call destination. Return: In general, the Return instructions also perform two operations. First, pop the return address from the stack; second, set the PC to this address. Correspondingly, in Zipper Stack, Returns also become a little more complicated, including three steps. 1. Pop the return address and the previous MAC from the main stack and calculate the MAC of them. Then check whether it matches the Top register. If not, raise an exception (which means an attack). 2. Save the MAC poped from the stack into the Top register. 3. Set the PC to the return address. Figure 4 shows the process of CALL and RET in Zipper Stack. We omit the normal operations about the PC and return addresses. The core idea of Zipper Stack is to use a chain structure to link all return addresses together. Based on this structure, we only need to focus on the protection and verification of the top of the chain instead of protecting the entire structure. Just like a zipper, only the slider is active, and when the zipper is pulled up, the following structure automatically bites up. Obviously, protecting a MAC from tampering is much easier than protecting a series of MAC from tampering: Adding a special register in the CPU is enough, and there is no need to protect a special memory area.

346

4.3

J. Li et al.

Setjump/Longjump and C++ Exceptions

In most cases, the return addresses are used in a LIFO order (last in, first out). But there are exceptions, such as setjump/longjump and C++ exceptions. Consequently, most mechanisms protecting return addresses suffer from the Setjump/Longjump and C++ Exceptions, some papers even think the blockchaining like algorithm cannot work with exceptional control-flows [12]. However, Zipper Stack can accommodate both Setjump/Longjump and C++ Exceptions. Both Setjump/Longjump and C++ Exceptions mainly save and restore the context between different functions. The main task of them is stack unwind. The return addresses in the stack will not encounter any problem since Zipper Stack does not alter either the value nor the position of return addresses. The only problem is how to restore the Top register. The solution is quite simple: backup the Top register just like backup the stack pointer or other registers (in Setjump/Longjump save them in the jump buffer, similar in C++ Exceptions). When we need to Longjump or handle an exception, restore the Top register just as restoring other registers. The chain structure will remain tight. However, the jump buffer is in memory, so this solution exposes the Top register (since the attacker can write on arbitrary areas of memory) and leaves an opportunity to overwrite the Top register. So additional protection is a must. In our implementation, we use a MAC to authenticate the jump buffer (or context record in C++ exception). In this way, we can reuse the MAC module.

5 5.1

Implementations Hardware Approach

We introduce the hardware approach first. In hardware approach, we implemented a prototype of Zipper Stack by modifying the Rocket Chip Generator [35] and customized the RISC-V instruction set accordingly. We also added a MAC module, several registers, and several instructions into the core. Whereafter, we modified the toolchain, including the compiler and the library glibc. Besides, we implemented a similar Shadow Stack for comparison. Core and New Instructions. In Rocket core, we added a Top register and a Key register, which correspond to those designed in the algorithm. These two registers cannot be loaded/stored via normal load/store instructions. At the beginning of a program, the Key and Top register are initialized by random values. In RISC-V architecture, a CALL instruction will store the next PC, i.e., return address, to the ra register, and a RET instruction will read the address in the ra register and jump to the address. Consequently, two instructions were added in our prototype: ZIP (after call), UNZIP (before return). They will perform as a Zipper Stack’s CALL/RET together with a normal CALL/RET. For the sake of simplicity, we use a compressed structure. In RISC-V architecture, the return address in register ra, so we put the return address and the

Zipper Stack: Shadow Stacks Without Shadow

347

MAC together into ra. In the current Rocket core, only lower 40 bits are used to store the address. Therefore, we use the upper 24 bits to hold the MAC. Correspondingly, our Top register is 24 bits. And our Key register is 64 bits. Our new instructions will update a MAC (ZIP after a CALL) or check and restore a MAC (UNZIP before a RET). When a ZIP instruction is executed, the address in ra (only lower 40 bits) along with the old MAC in the Top register will be calculated into a new MAC. The new MAC is stored in the Top register. The old MAC is stored to the higher 24 bits in the ra register (the lower 40 bits remain unchanged). Correspondingly, when an UNZIP instruction is executed, the ra register (including the MAC and address) is calculated and compared with the Top register. If the values match, the MAC in ra (higher 24 bits) is restored into the Top register, and the higher 24 bits in ra is restored to zero. If the values do not match, an exception will be raised (which means an attack). MAC Module. Next, we added a MAC module in the Rocket Core. Here, we use Keccak [3] (Secure Hash Algorithm 3 (SHA-3) in one special case of Keccak) as the MAC function. In our hardware implementation, the arguments of Keccak are as follows: l = 4, r = 256, c = 144 The main difference between our implementation and SHA-3 is that we use a smaller l: in SHA-3, l = 6. This module will take 20 cycles for one MAC calculation normally, and it costs 793 LUTs and 432 flip flops.

Fig. 5. Pipeline of ZIP and UNZIP

Pipeline. The pipeline in Rocket Core is a 5-stage single-issue in-order pipeline. To reduce performance overhead, the calculations are processed in parallel. If the next instruction that uses a MAC calculation arrives after the previous one

348

J. Li et al.

finished, the pipeline will not stall. Figure 5 is a pipeline diagram of two instructions. In Fig. 5, IF, ID, EX, MEM, WB stand for fetch, decode, execute, memory, and write back stages. UD/CK stands for updating Top register/checking MACs. As the figure shows, if the next ZIP/UNZIP instruction arrives after the MAC calculation, only one extra cycle is added to the pipeline, which is equivalent to inserting a nop. It is worth noting that the WB stage in ZIP/UNZIP does not rely on the finish of the UD/CK stage: in the write back stage, it only writes the ra register, and the value does not rely on the current calculation. Fortunately, most functions require more cycles than 20. So in most cases, the ZIP/UNZIP instruction takes only one cycle. Customized Compiler. To make use of the new instructions, we also customized the riscv-gcc. The modification on riscv-gcc is quite simple: Whenever we store a return address onto the stack, we add a ZIP instruction; whenever we pop a return address from the stack, we add a UNZIP instruction. It is noteworthy that, if a function will not call any function, ra register will not be spilled to the stack. So we only add the new instructions when the ra register is saved into/restored from the stack (rather than all calls and returns). Setjmp/Longjmp Support. To support Setjmp/Longjmp, we also modified the glibc in the RISC-V tool chain. We have only modified two points: 1. Declaration of the Jump Buffer: Add additional space for the Top register and MAC. 2. Setjmp/Longjmp: Store/restore the Top register; Authenticate the data. Our changes perfectly support Setjmp/Longjmp, which is verified in some benchmarks in SPEC2000, such as perlbmk. These benchmarks will not pass without Setjmp/Longjmp support. Optimization. In order to further reduce the runtime overhead, we also optimized the MAC module. We added a small cache (with a size of 4) to cache the recent results. If a new request can be found in the cache, the calculation will take only one cycle. This optimization slightly increased the complexity of the hardware, but significantly reduced runtime overhead (by around 30%, see Sect. 6). A Comparable Hardware Based Shadow Stack. In order to compare with Shadow Stacks, we also implemented a hardware supported Shadow Stack on Rocket Core. We tried to be consistent as much as possible: We added two instructions that back up or check the return address, and a pointer pointing the shadow stack. The compiler with Shadow Stacks inserts the instructions just like the way in Zipper Stack.

Zipper Stack: Shadow Stacks Without Shadow

5.2

349

LLVM

In customized compiler approach, we implemented Zipper Stack algorithm based on LLVM 8.0. First we allocate some registers: we set the lower bits of XMM15 as the Top register, and several XMM registers as the Key register (we use different size of Key, corresponding to different numbers of rounds of AES-NI, see next part). We modified the backend of the LLVM so as to forbid these registers to be used by anything else. Therefore, the Key and Top register will not be leaked. At the beginning of a program, the Key and Top register are initialized by random numbers. We also modified the libraries so that the libraries will not use these registers. MAC Function. Our implementation leverages the AES-NI instructions [13] on the Inter x86-64 architecture to minimize the performance impact of MAC calculation. We use the Key register as the round key of AES-NI, and use one 128-bit AES block as the input (64-bit address and 64-bit MAC). The 128-bit result is truncated into 64-bit in order to fit our design. We use different rounds of AES-NI to test the performance overhead: a) standard AES, performs 10 rounds of an AES encryption flow, we mark it as full ; b) 5 rounds, marked as half ; c) one round, marked as single. Obviously, the full MAC function is the most secure one, and the single is the fastest one. Operations. In each function, we insert a prologue at the entry, and an epilogue before the return. In the prologue, the old Top register is saved onto stack, and updated to the new MAC of the current return address and the old Top register value. In the epilogue, the MAC in the stack and return address are authenticated using the Top register. If it doesn’t match, an exception will be raised. Just as we introduced before. 5.3

Qemu

To figure out whether it is possible to run the existing binaries if we change the logic of calls and returns, we customized the x86-64 instruction set and simulated it with Qemu. All the simulation is in the User Mode of Qemu 2.7.0. The modification is quite concise: we add two registers and change the logic of Call and Ret instructions. As the algorithm designed, the Call instruction will push the address and the Top register, update the Top register with a new MAC, while Ret instruction will pop the return address and check the MAC. Since Qemu uses lower 39 bits to address the memory, we use the upper 25 bits to store the MAC. Correspondingly, the width of the Top register is also 25 bits. The Key register here is 64 bits. We used SHA-3 as the MAC function in this implementation. Since we did not change the stack structure, this implementation has good binary compatibility. Therefore it can help us to evaluate the security with real x86-64 attacks.

350

J. Li et al. Table 1. Security comparison

Adversary

Stack Protector

ASLR

Shadow Stack

MAC/Encryption

Zipper Stack

Stack Overflow

safe

safe

safe

safe

safe

Arbitrary Write

unsafe

safe

safe

safe

safe

Memory Leak & Arbitrary Write

unsafe

unsafe

unsafe

safe

safe

Secret Leak & Arbitrary Write

N/A

N/A

N/A

unsafe

safe

Replay Attack

unsafe

unsafe

N/A

unsafe

safe

Brute-force Attempts

N/A

N/A

N/A

2N s−1

2N s−1 + N ∗ 2N m−1 *

*: Under a probability of 1 − (1 − 1/e)N , the valid collision of a certain attack does not exist. N: The number of gadgets in an attack; Nm: Bit width of MAC/ciphertext; Ns: Bit width of secret. The Brute-force attack already contains the memory leak, so the ALSR is noneffective and not considered.

6

Evaluation and Analysis

We evaluated Zipper Stack in two aspects: Runtime Performance and Security. We evaluated the performance overhead with the SPEC CPU 2000 on the FPGA and SPEC CPU 2006 in LLVM approach. Besides, we also tested the compatibility of modified x86-64 ISA with existing applications. 6.1

Security

We first analyze the security of Zipper Stack, then show the attack test results, and finally, compare the security of different implementations. Security Analysis. The challenge for the attacker is clear: how to tamper with the memory to make the fake return address be used and bypass our check? We list the defense effect of different methods of protecting return addresses in the face of different attackers in Table 1. The table shows that Zipper Stack has higher security than Shadow Stack and other cryptography-based protection mechanisms. Direct Overwrite. First, we consider direct overwrite attacks. In the previous cryptography-based methods, the adversary cannot know the key and calculate the correct MAC, so it is secure. But we go further that the adversary may steal the key and calculate the correct MAC. As Fig. 6 shows, to tamper with any return address structure and bypass the check (let’s say, Return Address N), the attacker must bypass the pre-use check. Even if the attacker has stolen the key, the attacker needs to tamper with the MAC, which is used to check the return address, i.e., the MAC stored beside Return Address N+1. Since the MAC and the return address are authenticated together, the attacker has to modify the MAC stored beside Return Address N+2 to tamper with the MAC bound to Return Address N+1. And so on, the attacker has to alter the MAC at the top,

Zipper Stack: Shadow Stacks Without Shadow

351

which is in the Top register. As we have assumed, the register is secure against tampering. As a result, an overwrite attack won’t work.

Fig. 6. Direct Overwrite Attack: The solid lines indicate the protect relation of the MACs, and the dotted lines show the order that the attacker should overwrite in.

Replay Attack. Next, we discuss replay attacks: since we have assumed that the attacker can read all the memory, is the attacker able to utilize the protected addresses and their MACs in memory to play a replay attack? In Zipper Stack, it is not feasible because once the call path has changed, the MAC will be updated immediately. The MACs in an old call path cannot work in a new call path, even if the address is the same. This is an advantage over previous cryptography-based methods: Zipper Stack resist replay attacks naturally. Brute-Force Attack. Then we discuss the security of Zipper Stack in the face of brute force. Here we consider the attacks which read all related data in memory, guess the secret constantly, and finally construct the attack. In the cryptographybased approaches, security is closely related to the entropy of the secret. Here in Zipper Stack, the entropy of the secret is the bit width of Key register (Ns), which means the attacker needs to guess the correct Key register in a space of size 2N s . Once the attacker gets the correct key, the attack becomes equivalent to an attack with a secret leak. Attack with Secret Leak. The difference between Zipper Stack and other cryptography-based approaches is, other approaches will fail to protect control flow once the attacker knows the secret, but Zipper Stack will not. Because even if the attacker knows the secret key, the Top register cannot be altered. If an attack contains N gadgets, the attacker needs to find N collisions whose input contains the ROP (or ret2lib) gadget addresses to use the gadgets and bypass the check1 . Considering an ideal MAC function, one collision will take about 2N m−1 1

The same gadget addresses won’t share the same collision, because the MACs bound to them differ.

352

J. Li et al.

times of guesses on average2 . So an attack with N gadgets will take N ∗ 2N m−1 times of guesses on average even if the Key register is leaked. The total number of guesses is (guessing Key value and the collisions) 2N s−1 + N ∗ 2N m−1 . And more unfortunate for the attackers: under a certain probability (1 − (1 − 1/e)N for an attack contains N gadgets), the valid collision does not even exist. Take an attack that contains five gadgets as an example, the possibility that the collision does not exist is around 90%, and the possibility grows as the N grows. Attack Tests. We tested some vulnerabilities and the corresponding attacks to evaluate the security of Zipper Stack. In these tests, we use Qemu implementation, because most attacks are very sensitive to the stack layout, a customized compiler (or just a compiler in a different version) may lead to failures. Using Qemu simulation can keep the stack layout unchanged, avoid the illusion that the defense works, which is actually because of the stack layout changes. We wrote a test suite contains 18 attacks and the corresponding vulnerable programs. Each attack contains at least one exploit on return addresses, including stack overflow, ROP gadget, or ret2lib gadget. All attacks are detected and stopped (all of them will alter the MAC and cannot pass the check during return). These tests show that Zipper Stack is reliable. Table 2 shows the difference between Zipper Stack and other defense methods. Table 2. Attack test against defenses Defence

# Applied # Secured # Bypassed

DEP

20

2

18

ASLR

8

2

6

Stack Canary

6

2

4

Coarse-Grained CFI 16

10

6

2

0

2

20

20

0

Shadow Stacks Zipper Stack

Attacks Against Shadow Stacks. The above attacks can also be stopped by Shadow Stacks. To further prove the Zipper Stack’s security advantages compared to Shadow Stacks, we also wrote two proof-of-concept attacks (corresponding to two common types of shadow stack) that can bypass the shadow stack. As introduced in previous work [7], Shadow Stack implementations have various 2

The MAC function is a (N m + N a)bit-to-N mbit function, so choosing a gadget address will determine Na bits of input, which means there are 2N m optional values. On average, there is one collision in the values, since every 2N m+N a /2N m = 2N a inputs share the same MAC value. Because of our special algorithm, this is not a birthday attack nor an ordinary second preimage attack, but a limited second preimage attack.

Zipper Stack: Shadow Stacks Without Shadow

353

Table 3. Result of SPEC 2000 in hardware approach Benchmark

Baseline (seconds)

Shadow Stack (seconds/slowdown)

Zipper Stack (seconds/slowdown)

Zipper Stack (optimized) (seconds/slowdown)

164.gzip

10923.10

10961.65 (0.35%)

10960.60 (0.34%)

10948.88 (0.24%)

175.vpr

7442.48

7528.06 (1.15%)

7490.49 (0.65%)

7485.40 (0.58%)

176.gcc

8227.93

8318.83 (1.10%)

8348.99 (1.47%)

8317.34 (1.09%)

181.mcf

11128.67

11153.01 (0.22%)

11183.31 (0.49%)

11168.93 (0.36%)

186.crafty

10574.27

10942.89 (3.49%)

10689.74 (1.09%)

10692.53 (1.12%)

197.parser

8318.16

8577.67 (3.12%)

8658.89 (4.10%)

8544.72 (2.72%)

14467.81

15111.99 (4.45%)

15519.98 (7.27%)

15040.26 (3.96%)

252.eon 253.perlbmk

7058.96

7310.78 (3.57%)

7388.20 (4.66%)

7342.20 (4.01%)

254.gap

7728.56

7850.32 (1.58%)

7926.10 (2.56%)

7817.52 (1.15%) 14644.70 (6.48%)

255.vortex

13753.47

14738.06 (7.16%)

14748.70 (7.24%)

256.bzip2

6829.01

6893.50 (0.94%)

6954.81 (1.84%)

6865.27 (0.53%)

300.twolf

11904.25

12044.16 (1.18%)

11974.22 (0.59%)

11917.37 (0.11%)

2.36%

2.69%

1.86%

Average

flaws and can be attacked via different vulnerabilities. We constructed similar attacks. In both PoC attacks, we corrupt the shadow stack before we overwrite the main stack and perform ROP attacks. In the first example, the shadow stack is parallel to the regular stack, as introduced in [9]. The layout of the shadow stack is easy to obtain because its offset to the main stack is fixed. In the second example, the shadow stack is compact (only stores the return address). The offset of this shadow stack to the main stack is not fixed, so we used a memory leak vulnerability to locate the shadow stack and the current offset. In both cases, Shadow Stack can not stop the attacks, but Zipper Stack can. We also added both attacks to Table 2. Security of Different Approaches. The security of Zipper Stack is mainly affected by two aspects: the security of Top register and the length of MAC, corresponding to the risk of tamper of MAC and the risk of brute-force attack. From the Top register’s perspective, the protection in the hardware and Qemu approach is complete. In both implementations, the program cannot access the Top register except the call and return, so there is no risk of tampering. In the LLVM approach, an XMM register is used as the Top register, so it faces the possibility of being accessed. We banned the use of XMM15 in the compiler and recompiled the libraries to prevent access to the Top register, but there are still risks. For example, there may be unintended instructions in the program that can access the XMM15 register. From the perspective of MAC length, MAC in LLVM approach is 64-bit wide and therefore has the highest security. MAC in hardware is 24-bit wide and Qemu is 25-bit wide, so they have comparable security but still lower than LLVM approach.

354

6.2

J. Li et al.

Performance Overhead

We evaluated the performance overhead of different approaches. The cost of the hardware approach is significantly lower than that of the LLVM approach. It is worth noting that QEMU is not designed to reflect the guest’s actual performance, it only guarantees the correctness of logic3 . Thus, we do not evaluate the performance overhead of the QEMU approach. SPEC CINT 2000 in Hardware Approach. To evaluate the performance of Zipper Stack on RISC-V, we instantiated it on the Xilinx Zynq VC707 evaluation board and ran the SPEC CINT 2000 [14] benchmark suite (due to the limited computing power of the Rocket Chip on FPGA, we chose the SPEC 2000 instead of SPEC 2006). The OS kernel is Linux 4.15.0 with support for the RISC-V architecture. The hardware and GNU tool-chain are based on freedom (commit cd9a525) [28]. All the benchmarks are compiled with GCC version 7.2.0 and -O2 optimization level. We ran each benchmark for 3 times. Table 3 shows the results of Zipper Stack and Shadow Stacks. The result shows that without optimization, Zipper Stack is slightly slower than Shadow Stacks (2.69% vs 2.36%); while with optimization (the cache), Zipper Stack is much faster than Shadow Stacks (1.86% vs 2.36%). To sum up, the runtime overhead of Zipper Stack is satisfactory (1.86%). Table 4. Result of SPEC 2006 in LLVM approach Benchmark

Baseline (seconds)

Full (seconds/slowdown)

Half (seconds/slowdown)

Single (seconds/slowdown)

401.bzip2

337.69

385.98 (14.30%)

358.32 (5.76%)

336.33 (−0.40%)

403.gcc

240.38

341.06 (41.89%)

305.94 (21.43%)

272.42 (13.33%)

445.gobmk

340.28

493.47 (45.02%)

427.90 (20.48%)

384.91 (13.12%)

456.hmmer

269.62

270.50 (0.33%)

268.81 (−0.30%)

268.74 (−0.33%)

458.sjeng

403.38

460.58 (14.18%)

435.94 (7.47%)

421.69 (4.54%)

462.libquantum 251.50 464.h264ref

338.29

473.astar

285.58

Average

265.50 (5.57%)

261.17 (3.70%)

256.72 (2.07%)

410.56 (21.37%)

381.11 (11.24%)

355.97 (5.23%)

362.00 (50.11%)

333.91 (14.47%)

311.26 (8.99%)

21.13%

8.96%

4.75%

Performance Overhead in LLVM Approach. To evaluate the performance of Zipper Stack on customized compiler, we run the SPEC CPU 2006 [15] compiled by our customized LLVM (with optimization -O2). 3

For example, QEMU will optimize basic blocks. Some redundant instructions (e.g. a long nop slide) may be eliminated directly, which cannot reflect real execution performance.

Zipper Stack: Shadow Stacks Without Shadow

355

Table 4 shows the performance overhead of Zipper Stack in LLVM approach4 . Shadow Stack is reported to cost about 2.5–5% [9,31]. Our approach costs 4.75% ∼21.13% accroding to the results, depending on how many rounds we perform in MAC function. It does not cost too much overhead if we use only one round of AES-NI, although it is still slower than hardware approach. Furthermore, it costs 21.13% when we perform a standard AES encryption, which is faster than CCFI [21], since we encrypt less pointers than CCFI. 6.3

Compatibility Test

Here, we test the binary compatibility of Zipper Stack. This test is only valid for Qemu implementation. In the other two implementations, due to we have modified the compiler, we can use Zipper Stack as long as we recompile the source code, so there is no compatibility issue. The purpose of this test is: If we only modify the Call and Ret instructions in the x86-64 ISA, and use the compression structure to maintain the stack layout, is it possible to maintain binary compatibility directly? We randomly chose 50 programs in Ubuntu (under the path /usr/bin) to test the compatibility in Qemu. 42 out of 50 programs are compatible with our mechanism. Most failures are due to the Setjmp/Longjmp, which we have not supported in Qemu yet. So we think although some issues need to be solved (such as the setjmp/longjump), Zipper Stack can be used directly on most existing x86-64 binaries.

7

Conclusion

In this paper, we proposed Zipper Stack, a novel algorithm of return address protection, which authenticates all return addresses by a chain structure using MAC. It minimizes the amount of state requiring direct protection and costs low performance overhead. Through our analysis, Zipper Stack is an ideal way to protect return addresses, and we think it is a better alternative to Shadow Stack. We discussed various possible attackers and attacks in detail, concluding that an attacker cannot bypass Zipper Stack and then counterfeit the return addresses. In most cases, Zipper Stack is more secure than existing methods. The simulation of attacks on Qemu also corroborates the security of Zipper Stack. Our experiment also evaluated the runtime performance of Zipper Stack, and the results have shown that the performance loss of Zipper Stack is very low. The performance overhead with hardware support based on Rocket Core is only 1.86% on average (versus a hardware-based Shadow Stack costs 2.36%). Thus, the proposed design is suitable for actual deployment.

4

The performance gain is due to memory caching artifacts and fluctuations.

356

J. Li et al.

References 1. Abadi, M., Budiu, M., Erlingsson, U., Ligatti, J.: Control-flow integrity. In: Proceedings of the 12th ACM Conference on Computer and Communications Security, CCS 2005, pp. 340–353. ACM, New York (2005). https://doi.org/10.1145/1102120. 1102165. http://doi.acm.org/10.1145/1102120.1102165 2. Abadi, M., Budiu, M., Erlingsson, U., Ligatti, J.: Control-flow integrity principles, implementations, and applications. ACM Trans. Inf. Syst. Secur. 13(1), 4:1–4:40 (2009). https://doi.org/10.1145/1609956.1609960. http://doi.acm.org/10. 1145/1609956.1609960 3. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Keccak sponge function family main document. Submission to NIST (Round 2) 3(30) (2009) 4. Bittau, A., Belay, A., Mashtizadeh, A., Mazi`eres, D., Boneh, D.: Hacking blind. In: IEEE Symposium on Security and Privacy (SP), pp. 227–242. IEEE (2014) 5. Buchanan, E., Roemer, R., Shacham, H., Savage, S.: When good instructions go bad: generalizing return-oriented programming to RISC. In: Proceedings of the 15th ACM Conference on Computer and Communications Security, CCS 2008, pp. 27–38. ACM, New York (2008). https://doi.org/10.1145/1455770.1455776. http:// doi.acm.org/10.1145/1455770.1455776 6. Chiueh, T.C., Hsu, F.H.: RAD: a compile-time solution to buffer overflow attacks. In: Proceedings 21st International Conference on Distributed Computing Systems, pp. 409–417, April 2001. https://doi.org/10.1109/ICDSC.2001.918971 7. Conti, M., et al.: Losing control: on the effectiveness of control-flow integrity under stack attacks. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 952–963 (2015) 8. Cowan, C., et al.: StackGuard: automatic adaptive detection and prevention of buffer-overflow attacks. In: USENIX Security Symposium, San Antonio, TX, vol. 98, pp. 63–78 (1998) 9. Dang, T.H., Maniatis, P., Wagner, D.: The performance cost of shadow stacks and stack canaries. In: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, ASIA CCS 2015, pp. 555–566. ACM, New York (2015). https://doi.org/10.1145/2714576.2714635. http://doi.acm.org/ 10.1145/2714576.2714635 10. Davi, L., Sadeghi, A.R., Winandy, M.: ROPdefender: a detection tool to defend against return-oriented programming attacks. In: Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security, ASIACCS 2011, pp. 40–51. ACM, New York (2011). https://doi.org/10.1145/1966913. 1966920. http://doi.acm.org/10.1145/1966913.1966920 11. Evans, I., et al.: Control jujutsu: on the weaknesses of fine-grained control flow integrity. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 901–913. ACM, New York (2015). https://doi.org/10.1145/2810103.2813646. http://doi.acm.org/ 10.1145/2810103.2813646 12. Frantzen, M., Shuey, M.: StackGhost: hardware facilitated stack protection. In: USENIX Security Symposium, vol. 112 (2001) 13. Gueron, S.: Intel® advanced encryption standard (AES) new instructions set. Intel Corporation (2010) 14. Henning, J.L.: SPEC CPU2000: measuring CPU performance in the new millennium. Computer 33(7), 28–35 (2000)

Zipper Stack: Shadow Stacks Without Shadow

357

15. Henning, J.L.: SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News 34(4), 1–17 (2006). https://doi.org/10.1145/1186736.1186737. http://doi.acm.org/10.1145/1186736.1186737 16. Hund, R., Willems, C., Holz, T.: Practical timing side channel attacks against kernel space ASLR. In: 2013 IEEE Symposium on Security and Privacy (SP), pp. 191–205, May 2013. https://doi.org/10.1109/SP.2013.23. http://doi. ieeecomputersociety.org/10.1109/SP.2013.23 17. Intel: Control-flow enforcement technology preview (2016). https://software.intel. com/sites/default/files/managed/4d/2a/control-flow-enforcement-technologypreview.pdf 18. Katoch, V.: Whitepaper on bypassing ASLR/DEP (2011). https://www.exploitdb.com/docs/english/17914-bypassing-aslrdep.pdf 19. Kuznetsov, V., Szekeres, L., Payer, M., Candea, G., Sekar, R., Song, D.: Codepointer integrity. In: OSDI, vol. 14 (2014) 20. Marco-Gisbert, H., Ripoll-Ripoll, I.: Exploiting Linux and PaX ASLR’s weaknesses on 32-and 64-bit systems (2016) 21. Mashtizadeh, A.J., Bittau, A., Boneh, D., Mazi`eres, D.: CCFI: cryptographically enforced control flow integrity. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 941–951. ACM, New York (2015). https://doi.org/10.1145/2810103.2813676. http://doi.acm.org/ 10.1145/2810103.2813676 22. Ozdoganoglu, H., Vijaykumar, T., Brodley, C.E., Kuperman, B.A., Jalote, A.: SmashGuard: a hardware solution to prevent security attacks on the function return address. IEEE Trans. Comput. 55(10), 1271–1285 (2006) 23. Prasad, M., Chiueh, T.C.: A binary rewriting defense against stack based buffer overflow attacks. In: USENIX Annual Technical Conference, General Track, pp. 211–224 (2003) 24. Qualcomm Technologies Inc.: Pointer authentication on armv8.3 (2017). https:// www.qualcomm.com/media/documents/files/whitepaper-pointer-authenticationon-armv8-3.pdf 25. Seibert, J., Okhravi, H., S¨ oderstr¨ om, E.: Information leaks without memory disclosures: remote side channel attacks on diversified code. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014, pp. 54–65. ACM, New York (2014). https://doi.org/10.1145/2660267.2660309. http://doi.acm.org/10.1145/2660267.2660309 26. Shacham, H.: The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86). In: Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS 2007, pp. 552–561. ACM, New York (2007). https://doi.org/10.1145/1315245.1315313. http://doi.acm.org/ 10.1145/1315245.1315313 27. Shacham, H., Page, M., Pfaff, B., Goh, E.J., Modadugu, N., Boneh, D.: On the effectiveness of address-space randomization. In: Proceedings of the 11th ACM Conference on Computer and Communications Security, pp. 298–307. ACM (2004) 28. SiFive: Sifive’s freedom platforms (2015). https://github.com/sifive/freedom 29. Sinnadurai, S., Zhao, Q., Wong, W.-F.: Transparent runtime shadow stack: protection against malicious return address modifications (2008) 30. Strackx, R., Younan, Y., Philippaerts, P., Piessens, F., Lachmund, S., Walter, T.: Breaking the memory secrecy assumption. In: Proceedings of the Second European Workshop on System Security, EUROSEC 2009, pp. 1–8. ACM, New York (2009). https://doi.org/10.1145/1519144.1519145. http://doi.acm.org/ 10.1145/1519144.1519145

358

J. Li et al.

31. Szekeres, L., Payer, M., Wei, T., Song, D.: SoK: eternal war in memory. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 48–62 (2013) 32. Pax Team: PaX address space layout randomization (ASLR) (2003) 33. The Clang Team: Clang 3.8 documentation safestack (2015). http://clang.llvm. org/docs/SafeStack.html 34. Tice, C., et al.: Enforcing forward-edge control-flow integrity in GCC & LLVM. In: USENIX Security, vol. 26, pp. 27–40 (2014) 35. UC Berkeley Architecture Research: Rocket chip generator (2012). https://github. com/freechipsproject/rocket-chip 36. UC Berkeley Architecture Research: The RISC-V instruction set architecture (2015). https://riscv.org/ 37. van der Veen, V., et al.: A tough call: mitigating advanced code-reuse attacks at the binary level. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 934–953. IEEE (2016) 38. Wagle, P., Cowan, C., et al.: StackGuard: simple stack smash protection for GCC. In: Proceedings of the GCC Developers Summit, pp. 243–255. Citeseer (2003) 39. Zhang, C., et al.: Practical control flow integrity and randomization for binary executables. In: 2013 IEEE Symposium on Security and Privacy (SP), pp. 559– 573. IEEE (2013) 40. Zhang, J., Hou, R., Fan, J., Liu, K., Zhang, L., McKee, S.A.: RAGuard: a hardware based mechanism for backward-edge control-flow integrity. In: Proceedings of the Computing Frontiers Conference, CF 2017, pp. 27–34. ACM, New York (2017). https://doi.org/10.1145/3075564.3075570. http://doi.acm.org/ 10.1145/3075564.3075570

Restructured Cloning Vulnerability Detection Based on Function Semantic Reserving and Reiteration Screening Weipeng Jiang1,2

, Bin Wu1,2(B) , Xingxin Yu1,2 and Zhengmin Yu1,2

, Rui Xue1,2

,

1 State Key Laboratory of Information Security, Institute of Information Engineering, Chinese

Academy of Sciences, Beijing, China {jiangweipeng,wubin,yuxingxin,xuerui1,yuzhengmin}@iie.ac.cn 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

Abstract. Although code cloning may speed up the process of software development, it could be detrimental to the software security as undiscovered vulnerabilities can be easily propagated through code clones. Even worse, since developers tend not to simply clone the original code fragments, but also add variable and debug statements, detecting propagated vulnerable code clone is challenging. A few approaches have been proposed to detect such vulnerability- named as restructured cloning vulnerability; However, they usually cannot effectively obtain the vulnerability context and related semantic information. To address this limitation, we propose in this paper a novel approach, called RCVD++, for detecting restructured cloning vulnerabilities, which introduces a new feature extraction for vulnerable code based on program slicing and optimizes the code abstraction and detection granularity. Our approach further features reiteration screening to compensate for the lack of retroactive detection of fingerprint matching. Compared with our previous work RCVD, RCVD++ innovatively utilizes two granularities including line and function, allowing additional detection for exact and renamed clones. Besides, it retains more semantics by identifying library functions and reduces the false positives by screening the detection results. The experimental results on three different datasets indicate that RCVD++ performs better than other detection tools for restructured cloning vulnerability detection. Keywords: Code vulnerability · Clone detection · Program slicing

1 Introduction Given the diversity and accessibility of abundant open source code, code reuse occurs frequently during software development for the sake of convenience. Considering two similar software products PA and PB , if PA ’s source code has already been open-accessed, then PB ’s developers can quickly develop a similar function with the help of PA ’s source code. At the same time, vulnerabilities in PB ’s source code would also be introduced into PB , which are the so called clone-caused vulnerabilities [1]. Although PA ’s open source © Springer Nature Switzerland AG 2020 L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 359–376, 2020. https://doi.org/10.1007/978-3-030-58951-6_18

360

W. Jiang et al.

code may be patched after the vulnerability is disclosed, a large number of previous research work [2–9] shows that PB ’s developers rarely upgrade and maintain the cloned codes, leaving a great number of security threats. When cloning occurs, developers often modify the cloned code to meet specific functional requirements. Consider again the above two software PA and PB . After cloning, PB ’s developers may add a bunch of assert statements for debugging or delete statements that are not related to its functional requirements. These modifications may or may not affect vulnerabilities in the code, making the vulnerability detection more difficult. For example, VUDDY [2], the most accurate cloning vulnerability detection tool, has been proven to be completely unable to detect such cloning vulnerabilities by experiments in literature [11]. If the cloned code still has vulnerabilities after the statement is modified, then this kind of vulnerability is called restructured cloning vulnerability. At present, detection tools for such cloning vulnerabilities include Deckard [10], ReDeBug [5], RCVD [11], etc., but they are less effective in retaining semantics of vulnerabilities, and the extracted vulnerability codes are not accurate enough. In addition, these tools also lack the retrospective filtering of the detection results, leading to incorrect identification of well-patched code. In order to overcome the shortcomings of current detection tools in detecting restructured cloning vulnerability, we propose an approach that introduces semantic reserving function abstraction and reiteration screening. The proposed abstraction method can retain more semantic information, so it can obtain more accurate vulnerability fingerprints and improve the accuracy of detection. The reiteration screening method can effectively identify the patched codes in the detection results, and significantly reduce the detection false positives. The main contributions of this paper are as follows: Semantic Retention of C/C++ Library Functions. The semantic information related to functions is lost during tokenization and abstraction process of existing cloning vulnerability detection. Therefore, we propose a semantic-reserved code abstraction method, which could reduce semantic missing by retaining C/C++ library function names. Screening of Detection Results Based on Retroactive Detection. The current detection methods cannot effectively identify similar codes before and after patching, which will result in a large number of patched codes still being mistakenly identified as vulnerabilities. Therefore, our method traces the detection results back to the hit fingerprints and looks for hash of official-patch fingerprint corresponding with hit fingerprints to furtherly eliminate this kind of false alarm results. Accurate Cloning Vulnerabilities Detector. We implement our detection tool called RCVD++, which improves the abstraction method in fingerprint generation, and adds the features of retrospective filtering of the detection results, retaining more vulnerability semantics and reducing false positives. Compared to RCVD and ReDeBug, RCVD++ performs better. In comparison with Deckard and CloneWorks [12], the best restructured clone detection tools available today, RCVD++ outperforms their detection results. The rest of the paper is organized as follows: Sect. 2 gives a brief introduction to clone detection and vulnerable code extraction. Then we propose the system architecture

Restructured Cloning Vulnerability Detection Based on Function

361

of our method in Sect. 3 and introduce the details of our method in Sect. 4 and Sect. 5. Moreover, we evaluate our method and compare it with other detection methods in Sect. 6, and introduce the related work about cloning vulnerable code detection in Sect. 7. At last, we conclude this paper in Sect. 8.

2 Background In this section, we introduce some common concepts and describe the process of our proposed code clone detection, followed by the demonstration of program slicing. 2.1 Clone Detection Clone Type. A code fragment is recognized as a cloned one if it satisfies several given definitions of similarity [13]. Currently, code clone mainly includes four types [2], which is widely accepted: Type-1: Exact Clones. The code is copied directly without any modifications. Type-2: Renamed Clones. These are syntactically identical clones except for the modification of identifiers, literals, types, whitespace, layout and comments. Type-3: Restructured Clones. Based on the cloning of Type-2, copied code fragments are further modified such as added, deleted or modified statements. The proposed method can detect Type-1, Type-2 and Type-3 clones. Type-4: Semantic Clones. The two code fragments implement the same function and have the same semantics, but their syntax is different. Now, most of the research focuses on the first three types, while a few other efforts focus on detecting Type-4 cloning vulnerability. This is because vulnerabilities are sensitive to grammars; two pieces of code with the same functionality may not have the same vulnerabilities. Although Yamaguchi, et al. [14] propose a method to detect Type-4 clones, their method relies heavily on costly operations, and the detection accuracy is not precisely shown in their analysis. Detection Granularity. Different detection methods apply different granularities, which influence the accuracy of detection result. The detection granularity is the smallest unit of comparison in the detection process. At present, the code clone detection is mainly composed of five different granularities [2]. Token: This is the smallest meaningful unit that makes up a program. For example, in the statement ‘int x;’ three tokens exist: ‘int’, ‘x’ and ‘;’. Line: A sequence of tokens delimited by a new-line character. Function: A collection of consecutive lines that perform a specific task. File: A file contains a set of functions. Program: A collection of files.

362

W. Jiang et al.

2.2 Vulnerable Code Extraction When detecting the cloning vulnerable code, it is necessary to extract the corresponding vulnerable code from the function to characterize the vulnerability. The methods currently used mainly include program slicing, sliding windows and AST representation. The latter two methods are relatively simple, but the semantic information obtained is not as much as in the program slicing method, so we also adopt program slicing in our method. Program Slicing Program slicing is a technique for extracting code snippets which would affect specific data from a target program. The specific data may consist of variables, processes, objects or anything that users are interested in. Typically, program slicing is based on a slicing criterion, which consists of a pair , where p is a program point and V is a subset of program variables [17]. In addition, program slicing can be performed in a backward or forward fashion. A backward slice consists of all statements which could affect the computation of the criterion, while a forward slice contains all statements which could be affected by the criterion.

3 System Architecture The main idea of RCVD++ is to use code patches and program slicing to obtain code fragments related to vulnerabilities, abstract the code fragments into precise fingerprint with semantic reserving function abstraction and retroactively screen the detection result to reduce false alarm. 3.1 Overall Structure There are two stages in RCVD++ as shown in Fig. 1: fingerprint database generation and cloning vulnerability detection. The first stage includes preprocessing, Type-3 fingerprint and Type-1&Type-2 fingerprint generation. The second stage includes hash lookup, greedy-based fingerprint matching and reiteration screening. 3.2 System Flow In the first stage, fingerprints of vulnerable code will be generated. We use two granularities, Line and Function to generate fingerprints, where Line granularity is used to generate Type-3 fingerprints, and Function granularity is used to generate Type-1&Type-2 fingerprints and official-patch fingerprints. The official-patch fingerprints will be linked with the Type-3 fingerprints for screening. Our method uses vulnerability-related keywords to retrieve vulnerable functions, and generates corresponding fingerprints. For Type-3 fingerprints, we first use program slicing to obtain discontinuous code fragments related to the vulnerability from the vulnerable function, and then apply the abstraction method of function semantic reservation to generate the corresponding fingerprints. In order to reduce the detection false positives caused by code patch, our method also proposes to retrospectively filter the detection results by linking Type-3 fingerprints with official-patch

Restructured Cloning Vulnerability Detection Based on Function

363

Stage1: Fingerprint Database Generation Preprocessing Code Crawling & Vulnerable Code Extraction

Vulnerable Code

Type-3 Fingerprint Generation

Type-1&Type-2 Fingerprint Generation

Program Slicing Abstraction(Line) & Hash Retroactive Link

Abstraction(Function) & Hash

Type-3 Fingerprint Database

Type-1&Type-2 Fingerprint Database

Stage2: Cloning Vulnerability Detection Hash Looking Up Target Code

Comparing with Type1&Type-2 Fingerprint

Greedy-based Matching

Comparing with Type-3 Fingerprint

Reiteration Screening

Retroactive to Type-3 Fingerprint

Fingerprint Databases

Fig. 1. Overall structure of RCVD++

fingerprints. For Type-1&Type-2 fingerprints, we take the entire vulnerable function as a whole and apply abstraction to generate the fingerprints. In the second stage, the target code is converted into the same form as the fingerprint for detection. Specifically, there are three sub steps in the detection. First, we will use hash lookup to detect Type-1 and Type-2 clones; if the detection is successful, the detection process will end considering that if a clone is a Type-1 or Type-2 clone, then it cannot be a Type-3 clone. Otherwise, the greedy-based matching algorithm is used to compare the target code with Type-3 fingerprints. Once the detection is successful, we will further trace back to determine whether the code has been patched.

4 Establishment of Fingerprint Databases In order to obtain the best detection result for different types of clones, our method utilizes two granularities to generate fingerprints including Line and Function. Type3 fingerprints are generated by the granularity of Line. The official-patch fingerprints are generated by the granularity of Function and are linked to corresponding Type-3 fingerprints. Type-1&Type-2 fingerprints are generated by the granularity of Function. As shown in Fig. 2, the vulnerable code will be abstracted in the granularity of Line and Function. 4.1 Preprocessing First, the source code database is formed by collecting open source projects. In order to retrieve the information about vulnerabilities more conveniently, the codes are crawled

364

Code

W. Jiang et al. Vulnerable Code Extraction

Program Slicing

Vulnerable Code

Vulnerable Line

Function Abstraction Towards Semantic Reserving(Function Granularity)) & Hash

Function Abstraction Towards Semantic Reserving(Line Granularity) & Hash Retroactive Link

Official-patch Fingerprints

Type-1&Type-2 Fingerprint Database

Type-3 Fingerprint Database

Fingerprint Databases

Fig. 2. Fingerprint databases generation in RCVD++

from open source projects hosted on GitHub, which also makes it easier to get patch information from the commit history. To extract vulnerability related code, we leverage various information embedded within the commit history. Specifically, we look for vulnerability related keywords such as use-after-free, double free, heap-based buffer overflow, stack-based buffer overflow, integer overflow, OOB, out-of -bounds read, out-of -bounds write and CVE to retrieve vulnerable code. After that, the vulnerable codes, patches and the patched codes are saved in the vulnerable code database. 4.2 Program Slicing Based on Patches Obviously, it is impossible to detect restructured clones with granularity of Function or higher. The granularity of Token will discard a lot of information about program, thus it is not suitable for detecting clone-caused vulnerabilities. Therefore, we take Line as granularity and use program slicing [18] to get the discontinuous code fragments, which would preserve the vulnerability related codes better. In order to detect the exact and renamed clones, we also take Function as the granularity to generate corresponding Type-1&Type-2 fingerprint. Besides, we also link the official-patch fingerprint to Type-3 fingerprint to implement the retroactive detection. To determine the slicing criterion, the most intuitive idea is that the codes added or deleted in the patches are related to the vulnerability. However, the lines of added code in the patches do not exist in the original codes; we can only get the variables that exist in the original codes from added codes. If we use the statements in which these variables reside as vulnerable codes, we may introduce codes unrelated to vulnerability, which would greatly increase false positive. Therefore, we only take the deleted code as the slicing criterion. Note that other existing approaches such as VulDeePecker [1] and SySeVR [19] also adopt a similar strategy. Table 1 lists an example of program slicing based on patches. We take the deleted free(a); statement in the patch as slicing criterion to get forward slice and backward slice. These sliced codes contain the key statements that variable a is freed twice, and there are no statements unrelated to vulnerability.

Restructured Cloning Vulnerability Detection Based on Function

365

This example demonstrates that patches-based program slicing can effectively extract the critical code lines related to vulnerability. Table 1. Snippet of the code with double free, corresponding patch and the result of patch-based program slicing Commit char *a = malloc(100); int b; char c; scanf("%s",a); if(strlen(a)>=100){ printf("err"); free(a); } else{ scanf("%d",&b); if (b == 0xffff){ printf("useless"); } } if(strlen(a)=100){ FUNCCALL(LVAR); } if(FUNCCALL(LVAR)=100){ free(LVAR); } if(strlen(LVAR)=100){ free(a); } free(a);

Function Abstraction towards Semantic Reserving

Fingerprint

DTYPE*LVAR=malloc(100); scanf("%s",LVAR); if(strlen(LVAR)>=100){ free(LVAR); } if(strlen(LVAR) r. Invoking Lemma 1 gives that the probability of solving the X of rank R given XX T is ≤ (R!)−1 < (r!)−1 . Then we combine the party knowledge XW (X ∈ Rm×n , and W ∈ Rn×k ). From Theorem 1, XX T is the only information required by SIM with respect to I1 with the knowledge of XW . In other words, combining the XW gives no more information other than XX T . Therefore, given an unknown n, and the probability of solving the −1 . X < (r!)−1 , we have the probability of solving W is also < (r!) Then we continue to combine the party knowledge of { X l W l |l∈S\I ,  l l T X (X ) |l∈S\I }. They are the sum of real number matrices and the sum of non-negative real number matrices respectively. Given the adversary threshold t < s − 1, which means the number of honest local nodes (i.e. |l|l∈S\I ) is ≥ 2, we have a negligible probability to derive any X l W l or X l (X l )T from their sum. In addition, dl (the number of columns of X l ) is also unknown to I. Therefore, with the probability of solving the X < (r!)−1 , the probability of solving either X l or W l is also < (r!)−1 .  l |l ∈I }. At last, we combine the party knowledge of {X l | l ∈I , W l First, they are the input of I, which are independent of { X W l |l∈S\I ,  l l T X (X ) |l∈S\I }. Next, we combine them into X(X)T , which will give the adversary the problem of solving X H (X H )T . As each of dl |l∈S\I is unknown to I, we have the number of column of X H is also unknown to I. Thus, with Lemma 1, we have the probability of solving the X H of rank r is ≤ (r!)−1 . Sim  ilarly, combing {X l |l ∈I , W l |l ∈I } to XW will also give the probability of H −1 solving W ≤ (r!) . Thus, PrivColl preserves ε-privacy against adversary parties I on X, and ε ≤ (r!)−1 . With Theorem 2, we demonstrate that, with a sufficient rank of the training dataset of honest parties, e.g. ≥ 35, which is common in real-world datasets, PrivColl achieves 10−40 -privacy.

PrivColl: Practical Privacy-Preserving Collaborative Machine Learning

5

409

Correctness Analysis and Case Study

In this section, we first prove the correctness of PrivColl when distributing learning algorithms that are based on gradient descent optimization. Then we use a recurrent neural network as a case study to illustrate the collaborative learning process in PrivColl. 5.1

Correctness of PrivColl’s Gradient Descent Optimization

The following theorem demonstrates that if a non-distributed gradient descent optimization algorithm taking X as input, denoted by FGD (X), converges to a local/global minima η, then executing PrivColl with the same hyper settings (such as cost function, step size, and model structure) on X 1 , ..., X s , denoted by  ({(Sl , X l ), Agg}Sl ∈S ), also converges to η. FGD Theorem 3. PrivColl’s distributed algorithm of solving gradient descent opti ({(Sl , X l ), Agg}Sl ∈S ) is correct. mization FGD Proof. Let FGD (X) = η denote the convergence of FGD (X) to the local/global minima η. Let Wi denote the model parameters of FGD at ith training iteration. Let Wi = |{Wil }|l∈{1,..,s} denote the vertical concatenation on {Wil }l∈{1,..,s} ,  at ith training iteration. i.e., the model parameters of FGD th In FGD , the i (i ≥ 1) training iteration update Wi such that ∂XWi−1 ∂J ∂J = Wi−1 − α ∂W ∂(XWi−1 ) ∂Wi−1 ∂J X. = Wi−1 − α ∂(XWi−1 )

Wi = Wi−1 − α

(9)

 In FGD , each node Sl updates its local Wil in l l Wil = Wi−1 − αΔX l = Wi−1 −α l = Wi−1 −α

l ∂X l Wi−1 ∂J  ) l ∂(XWi−1 ∂Wi−1

∂J l  )X . ∂(XWi−1

 Since Wi = |{Wil }|l∈{1,..,s} , and X = |{X l }|l∈{1,..,s} , we have in FGD that l −α Wi = |{(Wi−1

∂J ∂J l   ) X )}|l∈{1,..,s} = Wi−1 − α ∂(XW  ) X. ∂(XWi−1 i−1

(10)

Comparing Eqs. 9 and 10, we can find that Wi and Wi are updated using the same equation. Therefore, with FGD (X) = η, the gradient descent guarantees  ({(Sl , X l ), Agg}Sl ∈S ) also converges to η. FGD Thus, F  GD ({(Sl , X l ), Agg}Sl ∈S ) = FGD (X).

410

5.2

Y. Zhang et al.

Case Study

Figure 2 shows an example of a two-layer feed-forward recurrent neural network (RNN). Every neural layer is attached with a time subscript c. The weight matrix W maps the input vector X (c) to the hidden layer h(c) . The weight matrix V propagates the hidden layer to the output layer yˆ(c) . The weight matrix U maps the previous hidden layer to the current one.

Fig. 2. An example of recurrent neural network

Original Algorithm. Recall that the original non-distributed version of the RNN is divided into the forward propagation and backward propagation through time. First, in the forward propagation, the output of the hidden layer propagated from the input layer is calculated as (c)

Zh = X (c) W + U h(c−1) + bh (c) h(c) = σ1 (Zh )

(11)

The output of the output layer propagated from hidden layer is calculated as (c)

Zy = V h(c) + by (c) yˆ(c) = σ2 (Zy ) Then, the cost function

T 

(12)

J(ˆ y (c) , y (c) ), and the coefficient matrices W, U, V are

c=0

updated using the backward propagation through time. The gradients of J (c) with respect to V is calculated as (c) ∂Z (c) y ∂J (c) ∂ yˆ (c) (c) ∂V ∂ yˆ ∂Zy

. We let

(c)

∂J ∂ yˆ(c)

=

(c) δloss ,

(c) δyˆ

=

(c) σ2 (Zy ).

∂J (c) ∂V (c)

The gradient of J

= with

respect to V can be written as: ∂J (c) (c) (c) = [δloss ◦ δyˆ ](h(c) )T . ∂V The gradients of J (c) with respect to U is calculated as (c) ∂J (c) ∂ yˆ ∂h(c) , ∂ yˆ(c) ∂h(c) ∂U

where

(c) ∂J (c) ∂ yˆ ∂ yˆ(c) ∂h(c)

=V

T

(c) [δloss



(c) δyˆ ],

(13) ∂J (c) ∂U

and

c c c      ∂h(c) ∂ + h(k) ∂h(c) (i) (k) = = σ1 (Zh )U σ1 (Zh )h(k−1) . (k) ∂U ∂U ∂h k=0

k=0

i=k+1

=

PrivColl: Practical Privacy-Preserving Collaborative Machine Learning (c)

411

(c)

Let δh = σ1 (Zh ), then the gradient of J (c) with respect to U is   c c   ∂J (c) (c) (c) (i)  (k) = V T δloss ◦ δyˆ δh U δh h(k−1) . ∂U k=0

(14)

i=k+1

Similarly, the gradients of J (c) with respect to W is calculated as:   c c   ∂J (c) (c) (i)  (k) (k) T (c) = V δloss ◦ δyˆ δh U δh X . ∂W k=0

(15)

i=k+1

Algorithm 1. Privacy Preserving Collaborative Recurrent Neural Network 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14:

Input: Local training data X l(c) (l = [1, ..., s], c = [0, ..., T ]), learning rate α Output: Model parameters W l (l = [1, ..., s]), U, V Initialize: Randomize W l (l = [1, ..., s]), U, V repeat for all Sl ∈ S do in parallel Sl : X l(c) W l , c = [0, ..., T ] Sl : srt(c) ← X l(c) W l , c = [0, ..., T ] (c) {(Sl , Esrt )}Sl ∈S ← Shr(srt(c) , S) end for (c) X (c) W ← Rec({(Sl , Esrt )}Sl ∈S ) (c) (c) (c) A : Zh ← X W + U h(c−1) + bh , h(c) ← σ1 (Zh ) (c) (c) A : Zy ← V h(c) + by , yˆ(c) = σ2 (Zy ) (c) (c) (c) (c) (c) (c) , δyˆ ← σ2 (Zy ), δh = σ1 (Zh ) A : δloss ← ∂J ∂y ˆ(c)

15:

A:

∂J (c) ∂V

16:

A:

∂J (c) ∂U

17: 18: 19: 20: 21:

(c)

(c)

← [δloss ◦ δyˆ ](h(c) )T  c  c   (c) (c) (i)  (k) ← V T δloss ◦ δyˆ δh U δh h(k−1)

A:V ←V −α A:U ←U −α

T  c=0 T  c=0

k=0

i=k+1

∂J (c) ∂V ∂J (c) ∂U

for all Sl ∈ S do in parallel  c  c   (c) (i)  (k) l(k) T (c) ∂J (c) = V δ ◦ δ δ U δ X loss y ˆ h h ∂W l l

l

W ←W −α

22: end for 23: until Convergence

T  c=0

k=0

i=k+1

∂J (c) ∂W l

PrivColl Algorithm. In PrivColl, each local node keeps X l(c) , c = [0, ..., T ], and maintains the coefficient matrix W l locally. The aggregation node maintains

412

Y. Zhang et al.

coefficient matrices U, V . Similar to its original non-distributed counterpart, the training process is divided into the forward propagation and backward propagation through time. Below we briefly outline these steps and the detailed algorithm is given by Algorithm 1. In the forward propagation, local nodes compute X l(c) W l , c = [0, ..., T ] s  X l(c) W l respectively (line 7) (line number in Algorithm 1), and X (c) W = l=1

is calculated using the secret-sharing scheme (line 8 to line 11). The aggregation (c) (c) node then computes Zh , h(c) , Zy , yˆ(c) using Eqs. 11 and 12 (line 12 and line 13). In the backward propagation through time, for each J (c) at time t, the aggre(c) (c) (k) (c) (c) (k) gation node computes δloss , δyˆ , δh , k = [0, ..., t], and sends δloss , δyˆ , δh to local nodes (line 14). Then the aggregation node computes the gradients of J (c) with respect to V, U using Eqs. 13 and 14 (line 15 and line 16), and updates U, V (line 17 and line 18). The local nodes compute the gradients of J (c) with respect to W l using Eq. 16, and update W l respectively (line 19 to line 22).   c c   ∂J (c) (c) (i)  (k) l(k) T (c) = V δloss ◦ δyˆ δh U δh X . ∂W l k=0

6

(16)

i=k+1

Performance Evaluation

We implement PrivColl in C++. It uses the Eigen library [17] to handle matrix operations, and uses ZeroMQ library [21] to implement the distributed messaging. The experiments are executed on four Amazon EC2 c4.8xlarge machines with 60GB of RAM each, three of which act as local nodes and the other acts as the aggregation node. To simulate the real-world scenarios, we execute PrivColl on both LAN and WAN network settings. In the LAN setting, machines are hosted in a same region, and the average network bandwidth is 1GB/s. In the WAN setting, we host these machines in different continents. The average network latency (one-way) is 137.7 ms, and the average network throughput is 9.27 MB/s. We collect 10 runs for each data point in the results and report the average. We use the MNIST dataset [27], and duplicate its samples when its size is less than the sample size m (m ≥ 60, 000). We take non-private machine learning which trains on the concatenated dataset as the baseline, and compare with MZ17 [31], which is the state-of-theart cryptographic solution for privacy preserving machine learning. It is based on oblivious transfer (MZ17-OT) and linearly homomorphic encryption (MZ17LHE). As shown in Fig. 3, PrivColl achieves significant efficiency improvement over MZ17, and due to parallelization in the computing of the local nodes, PrivColl also outperforms the non-private baselines in the LAN network setting. Linear Regression and Logistic Regression. We use mini-batch stochastic gradient descent (SGD) for training the linear regression and logistic regression.

PrivColl: Practical Privacy-Preserving Collaborative Machine Learning

413

We set the batch size |B| = 40 with 4 sample sizes (1, 000-100, 000) in the linear regression and logistic regression. In the LAN setting, PrivColl achieves around 45x faster than MZ17-OT. It takes 13.11 s for linear regression (Fig. 3a) and 12.73 s for logistic regression (Fig. 3b) with sample size m = 100, 000, while in MZ17-OT, 594.95 s and 605.95 s are reported respectively. PrivColl is also faster than the baseline, which takes 17.20 s and 17.42 s for linear/logistic regression respectively. In the WAN setting, PrivColl is around 9x faster than MZ17-LHE. It takes 1408.75 s for linear regression (Fig. 3c) and 1424.94 s for logistic regression (Fig. 3d) with sample size m = 100, 000, while in MZ17-LHE, it takes 12841.2 s and 13441.2 s respectively with the same sample size. It is worth mentioning that in MZ17, an MPCfriendly alternative function is specifically designed to replace non-linear sigmoid functions for training logistic regression, while in our framework, the non-linear function is used as usual. To further break down the overhead to computation and communication, we summarize the results of linear regression and logistic regression on other sample sizes in Table 1.

Fig. 3. Efficiency comparison. a-c) the natural logarithm of running time(s) as the sample size increases. d-f) running time(s) as the sample size increases.

Neural Network. We implement a fully connected neural network in PrivColl. It has two hidden layers with 128 neurons in each layer (same as MZ17) and takes a sigmoid function as the activation function. For training the neural network, we set the batch size |B| = 150 with 4 sample sizes (1, 000–60, 000). In the LAN network setting, PrivColl achieves 1352.87 s (around 22.5 min) (Fig. 3c) with sample size m = 60, 000, while in MZ17, it takes 294, 239.7 s (more

414

Y. Zhang et al. Table 1. PrivColl’s overhead breakdown for linear/logistic regression Linear regression

Logistic regression

Computation Communication Total

Computation Communication Total

LAN

WAN

LAN

WAN

LAN

WAN

LAN

WAN

m = 1,000

0.445 s

0.016 s 103.28 s

0.461 s 103.73 s

0.467 s

0.583 s 109.57 s

1.050 s 110.04 s

m = 10,000

1.293 s

0.978 s 561.53 s

2.271 s 562.83 s

1.368 s

0.812 s 506.64 s

2.180 s 508.01 s

m = 60,000

6.155 s

1.578 s 887.63 s

7.733 s 893.79 s

6.271 s

1.683 s 829.69 s

7.954 s 835.96 s

m = 100,000 10.07 s

3.037 s 1398.67 s 13.11 s 1408.75 s 10.29 s

2.434 s 1414.64 s 12.73 s 1424.94 s

than 81 h) with the same sample size. PrivColl also outperforms the nonprivate baseline which takes 2, 127.87 s. In the WAN setting, PrivColl achieves 18, 367.88 s (around 5.1 h) with sample size m = 60, 000 (Fig. 3f), while in MZ17, it is not yet practical for training neural networks in WAN setting due to the high number of interactions and high communication. Note that, in Fig. 3f, we still plot the MZ17-OT-LAN result (294, 239.7 s), showing that even when running our framework in the WAN setting, it is still much more efficient compared to the MPC solutions in MZ17 run in the LAN setting. The overhead breakdown on computation and communication is summarized in Table 2. Table 2. PrivColl’s overhead breakdown for neural network Neural network Computation Communication LAN WAN

7

Total LAN

WAN

m = 1,000

30.14 s

8.729 s 795.27 s

38.87 s

825.42 s

m = 10,000

223.08 s

4.683 s 2662.58 s

227.76 s

2885.66 s

m = 60,000

1320.77 s

32.10 s 17047.10 s 1352.87 s 18367.88 s

m = 100,000 2180.74 s

47.99 s 19364.73 s 2228.73 s 21545.47 s

Related Work

The studies most related to PrivColl are [23,49]. Zheng et al. [49] employ the lightweight additive secret sharing scheme for secure outsourcing of the decision tree algorithm for classification. Hu et al. [23] propose FDML, which is a collaborative machine learning framework for distributed features, and the model parameters are protected by additive noise mechanism within the framework of differential privacy. There also have been some previous research efforts which have explored collaborative learning without exposing their trained models [24,33,46,47]. For example, Papernot et al [33] make use of transfer learning in combination with differential privacy to learn an ensemble of teacher models on data partitions, and then use these models to train a private student model.

PrivColl: Practical Privacy-Preserving Collaborative Machine Learning

415

In addition, there are more works on generic privacy-preserving machine learning frameworks via HE/MPC solutions [9,14,20,25,26,34,35,41,43,45] or differential privacy mechanism [1,2,18,39]. Recent studies [11,12] propose a hybrid multi-party computation protocol for securely computing a linear regression model. In [28], an approach is proposed for transforming an existing neural network to an oblivious neural network supporting privacy-preserving predictions. In [43], a secure protocol is presented to calculate the delta function in the back-propagation training. In [31], a MPC-friendly alternative function is specifically designed to replace non-linear sigmoid and softmax functions, as the division and the exponentiation in these function are expensive to compute on shared values.

8

Conclusion

We have presented PrivColl, a practical privacy-preserving collaborative machine learning framework. PrivColl guarantees privacy preservation for both local training data and models trained on them, against an honest-butcurious adversary. It also ensures the correctness of a wide range of machine/deep learning algorithms, such as linear regression, logistic regression, and a variety of neural networks. Meanwhile, PrivColl achieves a practical applicability. It is much more efficient compared to other state-of-art solutions.

References 1. Abadi, M., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318 (2016) 2. Abuadbba, S., et al.: Can we use split learning on 1D cnn models for privacy preserving training? arXiv preprint arXiv:2003.12365 (2020) 3. Albrecht, M., et al.: Homomorphic encryption security standard. Technical report, HomomorphicEncryption.org (2018) 4. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: a framework for fast privacypreserving computations. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 192–206. Springer, Heidelberg (2008). https://doi.org/10.1007/9783-540-88313-5_13 5. Bonawitz, K., et al.: Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191. ACM (2017) 6. Lindell, Y. (ed.): TCC 2014. LNCS, vol. 8349. Springer, Heidelberg (2014). https:// doi.org/10.1007/978-3-642-54242-8 7. Chen, Y.R., Rezapour, A., Tzeng, W.G.: Privacy-preserving ridge regression on distributed data. Inf. Sci. 451, 34–49 (2018) 8. Dwork, C., Roth, A., et al.: The algorithmic foundations of differential privacy. R Theor. Comput. Sci. 9(3–4), 211–407 (2014) Found. Trends 9. Esposito, C., Su, X., Aljawarneh, S.A., Choi, C.: Securing collaborative deep learning in industrial applications within adversarial scenarios. IEEE Trans. Ind. Inf. 14(11), 4972–4981 (2018)

416

Y. Zhang et al.

10. Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333 (2015) 11. Gascón, A., et al.: Secure linear regression on vertically partitioned datasets. IACR Cryptology ePrint Archive 2016/892 (2016) 12. Gascón, A., et al.: Privacy-preserving distributed linear regression on highdimensional data. Proc. Priv. Enhancing Technol. 2017(4), 345–364 (2017) 13. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pp. 169–178 (2009) 14. Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016) 15. Goldreich, O., Micali, S., Wigderson, A.: How to play any mental game, or a completeness theorem for protocols with honest majority. In: Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali, pp. 307–328 (2019) 16. Golub, G., Van Loan, C.: Matrix Computations, 3rd edn. The John Hopkins University Press, Baltimore (1996) 17. Guennebaud, G., Jacob, B., et al.: Eigen v3 (2010). http://eigen.tuxfamily.org 18. Gupta, O., Raskar, R.: Distributed learning of deep neural network over multiple agents. J. Netw. Comput. Appl. 116, 1–8 (2018) 19. Hagestedt, I., et al.: Mbeacon: privacy-preserving beacons for DNA methylation data. In: NDSS (2019) 20. Hardy, S., et al.: Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677 (2017) 21. Hintjens, P.: ZeroMQ: Messaging for Many Applications. O’Reilly Media, Inc. (2013) 22. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (2012) 23. Hu, Y., Niu, D., Yang, J., Zhou, S.: FDML: a collaborative machine learning framework for distributed features. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2232–2240 (2019) 24. Jia, Q., Guo, L., Jin, Z., Fang, Y.: Privacy-preserving data classification and similarity evaluation for distributed systems. In: 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), pp. 690–699. IEEE (2016) 25. Ko, R.K.L., et al.: STRATUS: towards returning data control to cloud users. In: Wang, G., Zomaya, A., Perez, G.M., Li, K. (eds.) ICA3PP 2015. LNCS, vol. 9532, pp. 57–70. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27161-3_6 26. Kwabena, O.A., Qin, Z., Zhuang, T., Qin, Z.: MSCryptoNet: multi-scheme privacypreserving deep learning in cloud computing. IEEE Access 7, 29344–29354 (2019) 27. LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010). http://yann. lecun.com/exdb/mnist/ 28. Liu, J., Juuti, M., Lu, Y., Asokan, N.: Oblivious neural network predictions via MiniONN transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631 (2017)

PrivColl: Practical Privacy-Preserving Collaborative Machine Learning

417

29. Marc, T., Stopar, M., Hartman, J., Bizjak, M., Modic, J.: Privacy-enhanced machine learning with functional encryption. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 3–21. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29959-0_1 30. Melis, L., Song, C., De Cristofaro, E., Shmatikov, V.: Exploiting unintended feature leakage in collaborative learning. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706. IEEE (2019) 31. Mohassel, P., Zhang, Y.: SecureML: a system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. IEEE (2017) 32. Papernot, N., McDaniel, P., Sinha, A., Wellman, M.: Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814 (2016) 33. Papernot, N., Song, S., Mironov, I., Raghunathan, A., Talwar, K., Erlingsson, Ú.: Scalable private learning with pate. arXiv preprint arXiv:1802.08908 (2018) 34. Ryffel, T., et al.: A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017 (2018) 35. Sadat, M.N., Aziz, M.M.A., Mohammed, N., Chen, F., Wang, S., Jiang, X.: SAFETY: secure gwAs in federated environment through a hybrid solution with intel SGX and homomorphic encryption. arXiv preprint arXiv:1703.02577 (2017) 36. Sharma, S., Chen, K.: Confidential boosting with random linear classifiers for outsourced user-generated data. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 41–65. Springer, Cham (2019). https://doi. org/10.1007/978-3-030-29959-0_3 37. Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321 (2015) 38. Song, S., Chaudhuri, K., Sarwate, A.D.: Stochastic gradient descent with differentially private updates. In: 2013 IEEE Global Conference on Signal and Information Processing, pp. 245–248. IEEE (2013) 39. Vepakomma, P., Gupta, O., Swedish, T., Raskar, R.: Split learning for health: distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564 (2018) 40. Wang, S., Pi, A., Zhou, X.: Scalable distributed DL training: batching communication and computation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5289–5296 (2019) 41. Will, M.A., Nicholson, B., Tiehuis, M., Ko, R.K.: Secure voting in the cloud using homomorphic encryption and mobile agents. In: 2015 International Conference on Cloud Computing Research and Innovation (ICCCRI), pp. 173–184. IEEE (2015) 42. Yao, A.C.C.: How to generate and exchange secrets. In: 27th Annual Symposium on Foundations of Computer Science (SFCS 1986), pp. 162–167. IEEE (1986) 43. Yuan, J., Yu, S.: Privacy preserving back-propagation neural network learning made practical with cloud computing. IEEE Trans. Parallel Distrib. Syst. 25(1), 212–221 (2014) 44. Zhang, J., Chen, B., Yu, S., Deng, H.: PEFL: a privacy-enhanced federated learning scheme for big data analytics. In: 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. IEEE (2019) 45. Zhang, X., Ji, S., Wang, H., Wang, T.: Private, yet practical, multiparty deep learning. In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 1442–1452. IEEE (2017) 46. Zhang, Y., Bai, G., Zhong, M., Li, X., Ko, R.: Differentially private collaborative coupling learning for recommender systems. IEEE Intelligent Systems (2020)

418

Y. Zhang et al.

47. Zhang, Y., Zhao, X., Li, X., Zhong, M., Curtis, C., Chen, C.: Enabling privacypreserving sharing of genomic data for GWASs in decentralized networks. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 204–212. ACM (2019) 48. Zheng, H., Ye, Q., Hu, H., Fang, C., Shi, J.: BDPL: a boundary differentially private layer against machine learning model extraction attacks. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 66–83. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29959-0_4 49. Zheng, Y., Duan, H., Wang, C.: Towards secure and efficient outsourcing of machine learning classification. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 22–40. Springer, Cham (2019). https://doi.org/10. 1007/978-3-030-29959-0_2

An Efficient 3-Party Framework for Privacy-Preserving Neural Network Inference Liyan Shen1,2(B) , Xiaojun Chen1 , Jinqiao Shi3 , Ye Dong1,2 , and Binxing Fang4 1

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China {shenliyan,chenxiaojun,dongye}@iie.ac.cn 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China 3 Beijing University of Posts and Telecommunications, Beijing, China [email protected] 4 Institute of Electronic and Information Engineering of UESTC in Guangdong, Dongguan, China [email protected]

Abstract. In the era of big data, users pay more attention to data privacy issues in many application fields, such as healthcare, finance, and so on. However, in the current application scenarios of machine learning as a service, service providers require users’ private inputs to complete neural network inference tasks. Previous works have shown that some cryptographic tools can be used to achieve the secure neural network inference, but the performance gap is still existed to make those techniques practical. In this paper, we focus on the efficiency problem of privacy-preserving neural network inference and propose novel 3-party secure protocols to implement amounts of nonlinear activation functions such as ReLU and Sigmod, etc. Experiments on five popular neural network models demonstrate that our protocols achieve about 1.2×–11.8× and 1.08×–4.8× performance improvement than the state-of-the-art 3-party protocols (SecureNN [28]) in terms of computation and communication overhead. Furthermore, we are the first to implement the privacy-preserving inference of graph convolutional networks.

Keywords: Privacy-preserving computation inference · Secret sharing

1

· Neural network

Introduction

Machine learning (ML) has been widely used in many applications such as medical diagnosis, credit risk assessment [4], facial recognition. With the rapid development of ML, Machine Learning as a Service (MLaaS) has been a prevalent business model. Many technology companies such as Google, Microsoft, and c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 419–439, 2020. https://doi.org/10.1007/978-3-030-58951-6_21

420

L. Shen et al.

Amazon are providing MLaaS which helps clients benefit from ML without the cognate cost, time, and so on. The most common scenario of using MLaaS is neural network predictions that customers upload their input data to the service provider for inference or classification tasks. However, the data on which the inference is performed involves medical, genomic, financial and other types of sensitive information. For example, the patients would like to predict the probability of heart disease using Google’s pre-trained disease diagnosis model. It will require patients to send personal electronic medical records to service providers for accessing inference services. More seriously, due to the restrictive laws and regulations [1,2], the MLaaS service would be prohibited. One naive solution is to let the consumers acquire the model and perform the inference task on locally trusted platforms. However, this solution is not feasible for two main reasons. First, these models were trained using massive amounts of private data, therefore these models have commercial value and the service providers should want them confidential to preserve companies’ competitive advantage. Second, revealing the model may compromise the privacy of the training data containing sensitive information [22]. Recent advances in research provide many cryptographic tools like secure multiparty computation (MPC) and fully homomorphic encryption (FHE), to help us address some of these concerns. They can ensure that during the inference, neither the customers’ data nor the service providers’ models will be revealed by the others except themselves. The only information available for the customers is the inference or classification label, and for the service providers is nothing. Using these tools, privacy-preserving neural network inference could be achieved [15,18,22–24,27,28]. Concretely, the proposed privacy-preserving neural network computation methods mainly use cryptographic protocols to implement each layer of the neural network. Most of the works use mixed-protocols other than a single protocol for better performance. For the linear layers such as fully connected or convolution layers, they are usually represented as arithmetic circuits and evaluated using homomorphic encryption. And for the non-linear layers such as ReLU or Sigmoid functions, they are usually described as a boolean circuit and evaluated using Yao’s garbled circuit [30]. There have been many secure computation protocols [6,9,13] used to solve the private computation problems under different circuit representations. These protocols for the linear layers are practical enough, due to the use of lightweight cryptographic primitives, such as multiplication triples [8] generated by the three-parties [27,28] and homomorphic encryption with the SIMD batch processing technique [18,22]. However, for the non-linear layers, the protocols based on garbled circuits still have performance bottlenecks. Because garbled circuits incur a multiplicative communication overhead proportional to the security parameter. As noted by prior work [22], 216 invocations of ReLU and Sigmoid could lead communication overhead about 103 MB and 104 MB respectively. This is not feasible in real applications with complex neural networks.

An Efficient 3-Party Framework

421

One solution to alleviate the overhead of non-linear functions is three-party secure protocols. SecureNN [28] constructs new and efficient protocols with the help of a third-party and results in a significant reduction in computation and communication overhead. They implement a secure protocol of derivative computation of ReLU function (DReLU) based on a series of sub-protocols such as private compare(PC), share convert(SC) and multiplication (MUL) etc. And then they use DReLU as a basic building block to implement the computation of non-linear layers such as ReLU and Maxpool. The main problem of DReLU protocol in SecureNN is the complexity and it has a long dependence chain of other sub-protocols. Our work is motivated by SecureNN and attempt to implement a more efficient secure computation of non-linear functions with less complexity. Concretely, we propose a novel and more efficient three-party protocol for DReLU, which is several times faster than the version of SecureNN and only depends on the multiplication sub-protocol. And based on DReLU, we implement not only the activation functions ReLU and Maxpool that are done in SecureNN, but also the more complex activation function Sigmoid. In known secure protocols, Sigmoid is usually approximated by a set of piecewise continuous polynomials and implemented using garbled circuit [22]. Our Sigmoid implementation with a third-party can alleviate the overheads about 360×. Besides, current works have been applied in the inference of many neural networks, such as linear regression, multi-layer perceptron (MLP), deep neural networks (DNN) and convolutional neural networks (CNN), we implement a privacy inference on graph convolutional networks (GCN) [29], which are popular when processing non-Euclidean data structures such as social network, finance transactions, etc. To the best of our knowledge, we are the first to implement the privacy-preserving inference on GCN. 1.1

Contribution and Roadmap

In this paper, we propose an efficient 3-party framework for privacy-preserving neural network inference. In detail, our contributions are described as follows: • We propose a novel protocol of the DReLU which is much faster than that in [28]. Based on the protocol, our 3-party framework could efficiently implement the commonly used non-linear activation functions in neural networks. • We are the first work to implement the privacy-preserving inference scheme of GCN on the MNIST dataset with an accuracy of more than 99%. And the GCN model is more complex than those in [28] in terms of the invocation numbers of linear and non-linear functions. • We give a detailed proof of security and correctness. And we provide the same security as in SecureNN. Concretely, our protocols are secure against one single semi-honest corruption and against one malicious corruption under the privacy notion in Araki et al. [5]. • Experiments demonstrate that our novel protocol can obtain better performance in computation and communication overhead. Concretely, our proto-

422

L. Shen et al.

cols outperform prior 3-party work SecureNN by 1.2×–11.8×. And our protocol for the secure computation of sigmoid is about 360× faster than that in MiniONN [22]. The remainder of this paper is organized as follows. In Sect. 2 we give the definition of symbols and primitives related to neural networks and cryptographic building blocks. An introduction of the general framework for the privacypreserving inference of neural networks is given in Sect. 3. In Sect. 4 we propose our efficient 3-party protocol constructions. Then, we give detailed correctness and security analysis of our protocols in Sect. 5. In Sect. 6 we give the implementation of our protocols. The related work is presented in Sect. 7. Finally, we conclude this paper in Sect. 8.

2 2.1

Preliminary Definitions

Firstly, we define the symbols used in the article in Table 1. Table 1. Table for notations. S

Server

C

Client

P

Semi-honest Third Party

a

Lowercase bold letter denotes vector

A

Uppercase bold letter denotes matrix

a[i]

Denotes the ith elements of a

PRG Pseudo Random Generator ∈R

Uniformly random sampling from a distribution

∈R D Sampling according to a probability distribution D s

2.2

The statistical security parameter

Neural Networks

The neural networks usually have similar structures with many layers stacked on top of each other, the output of the previous layer serves as the input of the next layer. And the layers are usually classified into linear layers and non-linear layers. Linear Layers: The main operations in linear layers are matrix multiplications and additions. Concretely,

An Efficient 3-Party Framework

423

• The Fully Connected (FC) layer used in many neural networks can be formulated as y = W x + b, where x is the input vector, y is the output of FC layer, W is the weight matrix and b is the bias vector. • Convolution layer used in CNN networks can be converted into matrix multiplication and addition as noted in [11] and can be formulated as Y = W X + B. • The graph convolution layer used in GCN networks two conK−1 is essentially ˜ secutive matrix multiplication operations Y = k=0 Tk (L)XW as we have described in Appendix A. • The mean pooling operation used in CNN or GCN networks is the average of a region of the proceeding layer. It can be formulated as y = mean(x). Non-linear Layers: Operations in the non-linear layer are usually comparison, exponent, and so on. We identify three common categories: • Piecewise linear functions [22]. e.g. Rectified Linear Units (ReLU): f (y) = [max(0, yi )]; the max pooling operation: y = max(x). 1 • Smooth activation functions. e.g. Sigmoid: f (y) = [ ]. 1 + e−yi −yi e • Softmax: f (y) = [  −yj ]. It is usually applied to the last layer and can je be replaced by argmax operation. As in the prediction phase, it is orderpreserving and will not change the prediction result. 2.3

Additive Secret Sharing

In our protocols, all values are secret-shared between the server and the client. Consider a 2-out-of-2 secret sharing protocol, a value is shared between two parties P0 and P1 such that the addition of two shares yields the true value. Suppose the value x is shared additively in the ring Z2 (integers modulo 2 ), the two shares of x are denoted as x0 and x1 . • The algorithm Share(x) generates the two shares of x over Z2 by choosing r ∈R Z2 and sets x0 = r, x1 = x − r mod 2 . • The algorithm Reconst(x0 , x1 ) reconstructs the value x = x0 + x1 mod 2 using x0 and x1 as the two shares. Suppose in two-party applications, both P0 and P1 share their inputs x,y with each other. Then, the two parties run secure computation protocols on the input shares. • For addition operation, z = x + y: Pi locally computes zi = xi + yi . • For multiplication operation, z = x · y: it will be performed using precomputed Multiplication Triples (MTs) [8] of the form c = a · b. Based on the shares of a MT, multiplication will be performed as follows: (1) Pi computes ei = xi − ai and f i = yi − bi (2) Pi performs Reconst(e0 , e1 ) and Reconst(f0 , f1 ) (3) Pi sets its output share of the multiplication as zi = −i × e × f + f × xi + e × yi + ci

424

L. Shen et al.

The secret sharing scheme could also be used for matrix X by sharing the elements of X component-wise. 2.4

Threat Model and Security

The parties involved in our protocols are service provider S, the customer C and the third server P. In our protocols, we consider an adversary who can corrupt only one of the three parties under semi-honest and malicious security. Semi-honest Security: A semi-honest (passive) adversary is an adversary who corrupts parties but follows the protocol specification. That is, the corrupt parties run the protocol honestly but try to learn additional information from the received messages. We follow the security proof method in the real-ideal paradigm [10]. Let π be a protocol running in the real interaction and F be the ideal functionality completed by a trusted party. The ideal-world adversary is referred to as a simulator Sim. We define the two interactions as follows: • Realπ (k, C; x1 , ..., xn ) run the protocol π with security parameter k, where each party Pi ’s input is xi and the set of corrupted parties is C. Output {Vi , i ∈ C}, (y1 , ..., yn ). The final view and final output of Pi are Vi and yi respectively. • IdealF ,Sim (k, C; x1 , ..., xn ) compute (y1 , ..., yn ) ← F(x1 , ..., xn ) Output Sim(C, {(xi , yi ), i ∈ C}), (y1 , ..., yn ). In the semi-honest model, a protocol π is secure, it must be possible that the ideal world adversary’s view is indistinguishable from the real world adversary’s view. Privacy Against Malicious Adversary: A malicious (active) adversary may cause the corrupted parties to deviate from the protocol. Araki et al. [5] formalized the notion of privacy against malicious adversaries using an indistinguishability based argument, which is weaker than the full simulation-based malicious security. Nevertheless, it guarantees that privacy is not violated even if one of the parties behaves maliciously. The indistinguishability-based malicious security is that for any two inputs of the honest parties, the view of the adversary in the protocol is indistinguishable.

3

The General Framework for the Secure Inference of Neural Networks

The two-party mixed privacy-preserving inference protocol of neural networks is based on additive secret sharing. S and C run the interactive secure computation protocol for each layer on the input shares, during which they do not learn any intermediate information. And the outputs of each layer also be secretly shared among the two parties. Suppose the neural network is defined as: y = (W L−1 · fL−2 (...f0 (W 0 · X)...)), the corresponding computation process of the

An Efficient 3-Party Framework

425

model is presented in Fig. 1. Specifically, for each layer, S and C will each hold a share of Y i such that Reconst of the shares are equal to the input/output to that layer which is performed in the version of plaintext computation of neural networks. The output values will be used as inputs for the next layer. Finally, S sends the output shares y0 to C who can reconstruct the output predictions.

Fig. 1. The flow chart of the neural network computation process.

Most network weights and input/output features are represented as floatingpoint numbers, while in cryptographic protocols, they are typically encoded into integer form. In our paper, all values are secretly shared in the ring ZL , where L = 264 . And we follow the work in [28] using fixed-point arithmetic to perform all computations with α and β bits (as in [28], we set β = 13) for integer and fraction parts respectively. Each value is represented in two’s complement format, of which the most significant bit (MSB) is the sign bit. It works straightforwardly for the addition and subtraction of two fixed-point decimal numbers. While for multiplication, it needs to truncate the last β bits of the product.

4

Protocol Constructions

The idea of the 3-party protocol in our design is the same as the 2-party protocol as we described above, except that it needs a third-party P to assist in the computation for each layer protocol of the neural network. Concretely, for the linear layer, we utilize the protocol [27,28] based on the Beaver MTs which should be generated by the complex cryptographic protocol. We use P to provide the relevant randomness in the MTs, thereby it can significantly reduce the overhead. and The corresponding ideal functionality F is Z = W X, where W ∈ Zm×n L . The detail of the protocol π is described in Algorithm 1. The X ∈ Zn×v Mul L parties generate the MTs in step 1. S and C compute the matrix multiplication in step 2–4, with each party obtain one share of Z.

426

L. Shen et al.

Algorithm 1. Matrix Multiplication. πMul ({S, C}, P) Input: S: W 0 , X 0 , C: W 1 , X 1 Output: S: W X 0 , C: W X 1 1: P generates random matrices A ∈R Zm×n and B ∈R Zn×v using PRG, and sends L L (A0 , B0 , C0 ), (A1 , B1 , C1 ) to S and C respectively, where C = AB. 2: S computes E0 = W 0 − A0 and F 0 = X 0 − B0 , C computes E1 = W 1 − A1 and F 1 = X 1 − B1 . 3: S and C run algorithm E=Reconst(E0 , E1 ) and F =Reconst(F 0 , F 1 ) by exchanging shares. 4: S outputs W 0 F + EX 0 + C0 , C outputs W 1 F + EX 1 + C1 − EF .

For the non-linear layer, we design efficient protocol with the help of P. The detailed description is as follows. We consider the non-linear functions in two categories. The first kind includes ReLU and Max-pooling operation. The second category includes some complex smooth activation functions such as Sigmoid. The secure computation protocols for the non-linear activation functions all are based on the computation of derivatives of ReLU. Protocol for DReLU: The formula of DReLU is ReLU (x). It is 1 if x ≥ 0 and 0 otherwise. It can be computed by the sign function of x, ReLU (x) = 1−sign(x). Our protocol implements the computation of sign function with the help of P. Concretely, for ∀x, sign(x) = sign(x · r) with r > 0. S and C need to generate random positive numbers r0 ∈R ZL and r1 ∈R ZL in advance respectively. It must satisfy that r=Reconst(r0 , r1 ) be positive number in ZL . Then S and C perform multiplication protocol based on Beaver MTs and the outputs of two parties are y0 = x · r0 and y1 = x · r1 respectively. Since r is randomly generated, the value of x · r is also a random number in ZL . S and C send y0 and y1 to P who performs Reconst(y0 , y1 ). P computes the sign function of the random number z = sign(y), if z = 1, P generates secret shares of zero and sends each share to S and C; if z = 0, P generates secret shares of one and sends to S and C. We could also cut the communication of the last step to half by generating the output shares using a shared PRG key between P and one of the parties. The protocol is described in Algorithm 2. It should be noted that after the multiplication protocol, it needs to truncate the last β bits of the product. Because for the multiplication of two fixed-point values, the rightmost 2β bits of the product now corresponds to the fraction part instead of β bits. While in this case, r is a random positive number in ZL without scaling, the product of x · r does not need to truncate.

An Efficient 3-Party Framework

427

Algorithm 2. ReLU (x). πDReLU ({S, C}, P) Input: S: x0 , C: x1 Output: S: ReLU (x)0 , C: ReLU (x)1 1: S and C generate random positive numbers r0 ∈R ZL and r1 ∈R ZL resp., s.t. r=Reconst(r0 , r1 ) be positive number in ZL . P generates shares of zero ui = 0i and shares of one vi = 1i , i ∈ {0, 1}. 2: S, C and P perform πMul ({S, C}, P) protocol with input (xi ,ri ) and output yi , i ∈ {0, 1} of S, C resp. 3: S and C send the shares of yi to P who performs Reconst(y0 , y1 ). 4: P computes the sign function of the random number z = sign(y), if z = 1, P sends u0 , u1 to S and C resp, S outputs u0 , C outputs u1 ; if z = 0, P sends v0 , v1 to S and C resp, S outputs v0 , C outputs v1 .

Protocol for ReLU: The formula of ReLU is ReLU(x) = x·ReLU (x). It equals the multiplication of x and ReLU (x), the protocol is described in Algorithm 3. Similarly, the product of x·ReLU (x) does not need to truncate. Algorithm 3. ReLU. πReLU ({S, C}, P) Input: S: x0 , C: x1 Output: S: ReLU(x)0 , C: ReLU(x)1 1: S, C and P perform πDReLU ({S, C}, P) protocol with input xi and output yi , i ∈ {0, 1} of S, C resp. 2: S, C and P perform πMul ({S, C}, P) protocol with input (xi , yi ) and output ci , i ∈ {0, 1} of S, C resp.

Protocol for Maxpool: The Maxpool protocol is the same as that in [28] except that we replace the DReLU module with our novel protocol πDReLU . We give a brief description of the protocol in Appendix B due to space limits. And we point the reader to [28] for further details on the Maxpool protocol. Protocol for Sigmoid: We adapt the method in [22] to approximate the smooth activation functions that can be efficiently computed and incurs negligible accuracy loss. The activation function f () is split into m + 1 intervals using m knots which are switchover positions for polynomials expressions. For simplicity, the author uses 1-degree polynomial to approximate sigmoid function: ⎧ 0 x < x1 ⎪ ⎪ ⎪ ⎪ a x + b , x1 ≤ x < x2 ⎨ 1 1 (1) f¯(x) = ... ⎪ ⎪ x + b x ≤ x < x a ⎪ m−1 m−1 m−1 m ⎪ ⎩ 1 x ≥ xm However, the author uses garbled circuit to implement the approximate activation functions which results in a large amount of overhead. Instead, we propose

428

L. Shen et al.

a 3-party protocol for the approximate sigmoid function f¯(x). It is clear that f¯(x) should be public to all parties. According formula 2, S and C will be able to determine the range of x by the subtraction of x and knot xi . ⎧ ∀i, i ∈ {1, ..., m}, x − xi < 0 ⎨0 (2) f¯(x) = ai x + bi ∃i, i ∈ {1, ..., m − 1}, x − xi+1 < 0, x − xi ≥ 0 ⎩ 1 ∀i, i ∈ {1, ..., m}, x − xi ≥ 0 S, C and P can perform a variation of the πDReLU protocol to determine the range of x, which is described in Algorithm 4. S and C compute the subtraction of x and knot xi in step 1. With the help of P, the parties determine the range of x in step 2–5. The outputs of S and C are shares of ai and bi parameters in f¯(). πDReLU can be seen as a special form of πVoDReLU , which only has one knot x1 = 0. Algorithm 4. VoDReLU. πVoDReLU ({S, C}, P) Input: S: x0 , C: x1 Output: S: ai 0 , bi 0 , C: ai 1 , bi 1 suppose x in the (i + 1)th interval for i ∈ {0, ..., m} 1: S and C compute the subtraction of x and xi locally, for i ∈ {0, ..., m − 1}. x, with x[i] = x0 , x ˆ[i] = xi+1 . S: y0 =x-ˆ C: y1 =x, with x[i] = x1 . s.t. y[i] = x − xi+1 . m 2: S and C generate random positive numbers r0 ∈R Zm L and r1 ∈R ZL resp, s.t. r=Reconst(r0 , r1 ), r[i] be positive number in ZL . P generates shares of ai and bi for i ∈ {0, ..., m} and a0 = b0 = am = 0, bm = 1. 3: S, C and P perform πMul ({S, C}, P) protocol element-wise with input (yj ,rj ) and output zj , j ∈ {0, 1} of S, C resp. 4: S sends z0 and C sends z1 to P who performs Reconst(z0 , z1 ). 5: P computes the sign function of the random number c = sign(z) if ∃i ∈ {0, ..., m − 2}, c[i] = 0 and c[i + 1] = 1, P sends uj = ai j , vj = bi j to S and C resp, for j ∈ {0, 1}; if ∀i ∈ {0, ..., m − 1}, c[i] = 1, P sends uj = a0 j , vj = b0 j to S and C resp, for j ∈ {0, 1}; if ∀i ∈ {0, ..., m − 1}, c[i] = 0, P sends uj = am j , vj = bm j to S and C resp, for j ∈ {0, 1}; S outputs u0 , v0 and C outputs u1 , v1 ;

The protocol of Sigmoid is described in Algorithm 5. In order to reduce the overhead, the outputs of πVoDReLU could also be the value of i which indicates x in the i + 1th interval. Then S and C could perform multiplication locally, that is ai x = ai (x0 + x1 ), since all ai , bi are public. The dependence of these protocols is presented in Fig. 2. It can be seen that the protocol for ReLU only depends on the DReLU and Mul subprotocol. Similarly, the protocol dependency of Sigmoid and Maxpool is simple.

An Efficient 3-Party Framework

429

Algorithm 5. Sigmoid. πSigmoid ({S, C}, P) Input: S: x0 , C: x1 Output: S: f¯(x)0 , C: f¯(x)1 1: S, C and P perform πVoDReLU ({S, C}, P) protocol with input xi and output ui , vi i ∈ {0, 1} of S, C resp. 2: S, C and P perform πMul ({S, C}, P) protocol with input (xi , ui ) and output ci , i ∈ {0, 1} of S, C resp. 3: S outputs c0 + v0 , C outputs c1 + v1 .

Fig. 2. Protocol dependence of the non-linear activation function.

5 5.1

Correctness and Security Analysis Correctness Analysis

The protocols are correct as long as the core dependency module of the framework πDReLU is correct. There are two conditions that must be satisfied. 1. As we described above, πDReLU relies on the computation of sign function and it is obvious that sign(x) = sign(x · r) as long as ∀r > 0. 2. All the values in the protocol are in the ring ZL , in order to make sure the correctness of the protocol, all absolute value of any (intermediate) results will not exceed L2 . Therefore, we let the bit length of random value r ∈R ZL be relatively small, such as smaller than 32. Then r will be positive value in the ring and the probability that the result x · r exceeds L2 will be ignored. 5.2

Security Analysis

Our protocols provide security against one single corrupted party under the semi-honest and malicious model. The universal composability framework [10]

430

L. Shen et al.

guarantees the security of arbitrary composition of different protocols. Therefore, we only need to prove the security of individual protocols. we first proof the security under semi-honest model in the real-ideal paradigm. Semi-honest Security Security for πDReLU : The output of the simulator Sim which simulates for corrupted S is one share generated by the algorithm Share which is uniformly random chosen from ZL and S’s view in the real execution is also random share which is indistinguishable with that in ideal execution. It is the same for the simulation of C. The output of the simulator Sim for P is a random value in ZL that is indistinguishable with x · r generated in real-world execution. Security for πReLU : P learns no information from the protocol, for both πDReLU ({S, C}, P) and πMul ({S, C}, P) provide outputs only to S and C. S’s and C’s view in the real execution is random share, which is indistinguishable with that in ideal execution. Security for πMaxpool : The Maxpool protocol in [28] has been proven secure, the only difference between us is that we have replaced the sub-protocol DReLU. πDReLU has been proven secure as mentioned above. Due to the universal composability, the Maxpool protocol we use is secure under the semi-honest model. Security for πVoDReLU : The views of S and C are random shares in ZL , and the same as the views in πDReLU . The view of P is a set of random shares x ◦ r (◦ denotes hadamard product) rather than one in πDReLU , and it is indistinguishable with that in ideal execution. In order to reduce the overhead, we also provide an alternative solution. The outputs of S and C are plaintext values i indicating the interval information. Based on i, S and C could infer the range of input x and output f¯(x). However, the floating-point number corresponding to x in an interval is infinite. The only case that S could infer the classification result of C is that S knows exactly the output f¯(x) of hidden layer. We will prove that this situation doesn’t exist. Suppose the input dimension of the sigmoid function is p, where the number of input in range (x1 , xm ) is q, and the number of input in range (−∞, x1 ), (xm , ∞) is p − q. Then there will be α = (η · 2β )q possible values of f¯(x), where x ∈ ZpL , η is the interval size between two continuous knots. Our protocol could be secure when the statistical security parameter s is 40 by choosing appropriate value of α. For example, we set η = 2 (e.g. the sigmoid function is approximated in the set of knots [−20, −18, ..., 18, 20]) and β = 13, then as long as q ≥ 3, statistical security can be satisfied. In practical applications, the nodes in a hidden layer are usually highdimensional vector, so our alternative scheme is secure under the semi-honest model. Security for πSigmoid : P learns no information from the protocol, for both πVoDReLU ({S, C}, P) and πMul ({S, C}, P) provide outputs only to S and C. S’s and C’s view in the real execution is random share, which is indistinguishable with that in ideal execution.

An Efficient 3-Party Framework

431

Malicious Security The same as [28], our protocols provide privacy against a malicious S or P in the indistinguishability paradigm [5]. This holds because all the incoming messages to S or P are random shares generated in ZL . For any two inputs of the honest C, the view of the adversary is indistinguishable. Collusion Analysis Compared with SecureNN, our protocols for the non-linear functions can resist against collusion attack, as long as the underlying multiplication module is collusion resistant. For example, we can replace the multiplication protocol based on the 3-party with a traditional 2-party protocol based on HE or other techniques. Because the random number ri owned by S and C is generated separately, it can be regarded as one-time-pad, thereby protecting the private information of each party. However, the protocols in SecureNN will reveal computation results at each layer, as long as the third-party collude with S or C even if the underlying multiplication protocol is collusion resistant.

6

Experimental Results

6.1

Experimental Enviroment

Our experiments are executed on a server with an Intel Xeon E5-2650 CPU(2.30 GHz) and 126GB RAM in two environments, respectively modeling LAN and WAN settings. The network bandwidth and latency are simulated using Linux Traffic Tools(tc) command. We consider the WAN setting with 40MB/s and 50 ms RTT (round-trip time). All our protocols and that in SecureNN has been implemented and executed more than 10 times on our server (The source code of SecureNN is obtained from Github1 and we use their code directly in our comparison experiment). We take the average of experimental results for comparison and analysis. 6.2

Neural Networks

We evaluate the experiments in five different neural networks denoted by A, B, C, D and E over the MNIST dataset. More details about those neural networks can be found in Appendix C. Network A is a 3-layer deep neural network comprising of three FC layers with a ReLU activation function next to every FC layer (Fig. 3). Network B is a 3-layer convolutional neural network comprising of one convolutional layer and two full connection layers. Every layer is followed by a ReLU activation function (Fig. 4). Network C is 4-layer convolutional neural networks, the first two layers of which are convolutional layers followed by ReLU and a 2×2 Maxpool. The last two layers are FC layers followed by the ReLU activation function (Fig. 5). 1

https://github.com/snwagh/securenn-public.

432

L. Shen et al.

Network D is 4-layer convolutional neural networks similar to Network C, and is a larger version with more output channels and more weights (Fig. 5). Network E is a graph convolutional neural network on MNIST datasets. The network consists of one graph convolutional layer with 32 output feature maps, followed by the ReLU activation layer and one FC layer (Fig. 6). Network A, B, C and D were also evaluated in SecureNN, and graph convolutional neural network E is implemented for private inference for the first time. For network E, we construct an 8-nearest neighbor graph of the 2D grid with |V| = 784 nodes. Each image x is transformed to a 784 dimension vector and the pixel value of an image xi serves as the 1-dimensional feature on vertex vi . We simplify the model in [12]. Network E just have one graph convolutional layer without pooling operation rather than the original model with two graph convolutional layer followed by a pooling operation separately, and the order of filter is set K = 5 rather than K = 25. The simplified network structure can faster the secure computation and also achieve an accuracy of more than 99% on MINST classification task. 6.3

Inference Results

Compared with previous two-party protocols, such as SecureML [24] and Minionn [22], the performance gains for 3-party protocols SecureNN come from using the third-party to generate Beaver MTs for the linear matrix multiplication and avoiding the use of garbled circuits for the non-linear activation functions. However, SecureNN still depends on a series of subprotocol for the secure computation of non-linear activation functions, such as DReLU protocol, it first needs to convert values that are shared over ZL into shares over ZL−1 and then computes the MSB of this value based on private compare and multiplication subprotocol and finally convert results that are shared over ZL−1 into shares over ZL for the computation of the next linear layer. Our protocol of the DReLU only depends on the third-party multiplication subprotocol which is obviously faster than SecureNN. The experimental inference results of the five neural network are described in Table 2. We run the privacy-preserving inference for 1 prediction and batch of 128 predictions in both the LAN and WAN settings. It can be found that our protocol is about 1.2×–4.7× faster than SecureNN for 1 prediction, about 2.5×– 11.8× faster for 128 predictions and the communication cost is about 1.08×–4.8× fewer than SecureNN. The neural network E is more complex than C or D, for they have 3211264, 1323520 and 1948160 invocations of ReLU respectively, and the invocations for multiplication module is also larger than that of network C and D. And the result of SecureNN for network E is performed based on their source code. The experimental microbenchmark results for various protocols are described in Table 3. It is about 2×–40× faster than SecureNN and the communication cost of the DReLU, ReLU and Maxpool protocols are about 9.2×, 6.3× and 4.9× fewer than those in SecureNN respectively.

An Efficient 3-Party Framework

433

Table 2. Inference results for batch size 1 vs 128 on Networks A-E over MNIST. Batch size

LAN (s) 1 128

WAN (s) 1 128

Comm (MB) 1 128

A SecureNN 0.036 Us 0.02

0.52 0.11

6.16 7.39 3.09 3.29

2.1 1.94

29 8.28

B SecureNN 0.076 Us 0.03

4.24 0.68

7.76 25.86 3.87 7.43

4.05 2.28

317.7 90.45

C SecureNN 0.14 Us 0.03

14.204 9.34 64.37 8.86 1.2 4.86 15.35 2.73

1066 281.65

D SecureNN 0.24 Us 0.07

19.96 2.25

1550 395.21

10.3 89.05 18.94 5.11 19.84 9.92

E SecureNN 1.917 32.993 7.57 128.48 61.76 2398.83 Us 1.605 9.071 6.06 29.25 46.91 497.76 Table 3. Microbenchmarks in LAN/WAN settings. Protocol Dimension

LAN (s) SecureNN Us

DReLU 64 × 16 128 × 128 576 × 20

15.66 204.54 134.05

1.61 464.21 7.09 1058.5 3.28 874.36

190.79 0.68 200.146 10.88 195.9 7.65

0.074 1.18 0.83

64 × 16 128 × 128 576 × 20

15.71 210.2 135.8

1.65 513.25 8.5 1115.48 5.6 921.76

233.98 0.72 281.543 11.534 266.91 8.11

0.115 1.835 1.29

ReLU

Maxpool 8 × 8 × 50,4 × 4 64.9 24 × 24 × 16,2 × 2 85.6 24 × 24 × 20,2 × 2 98.4

WAN (s) SecureNN Us

13.59 7641.7 5.79 1724.7 6.82 1763.9

Comm (MB) SecureNN Us

3812.79 802.02 804.98

2.23 5.14 6.43

0.46 1.05 1.31

Table 4. Overhead of secure computation of the sigmoid function. 216 220 LAN (ms) Comm (MB) LAN (ms) Comm (MB) MiniONN 105

104





Us

88.08

5470.12

1409.29

278.039

We perform our sigmoid approximation protocol with the ranges as [x1 , xm ] = [−11, 12] and 25 pieces, the experimental results are described in Table 4. Our protocol achieves about 360× improvement, and the communication cost is about 114× fewer than that in MiniONN when there are 216 invocations of the sigmoid function. We also implement 220 invocations of the sigmoid function with acceptable overhead.

434

7

L. Shen et al.

Related Work

As a very active research field in the literature, privacy-preserving inference of neural networks has been extensively considered in recent years. And existing work mainly adopts different security protocols to realize the secure computation of neural networks. Early work in the area can be traced back to the work by Barni et al. [7,25], which is based on homomorphic encryption. Gilad-Bachrach et al. [15] proposed CryptoNets based on leveled homomorphic encryption (LHE) which supports additions and multiplications and only allows a limited number of two operations. Thus, only low-degree polynomial functions can be computed in a straightforward way. Due to the limitations LHE, the author proposed several alternatives to the activation functions and pooling operations used in the CNN layers, for example, they used square activation function instead of ReLU and mean pooling instead of max pooling. It will affect the accuracy of the model. Instead of purely relying on HE, most of the work uses multiple secure protocols. SecureML presented protocols for privacy-preserving machine learning for linear regression, logistic regression and neural network training. The data owners distributed their data among two non-colluding servers to train various models using secure two-party computation (2PC). The author performed multiplications based on additive secret sharing and offline-generated multiplication triplets. And the non-linear activation function is approximated using a piecewise linear function, which is processed using garbled circuits. MiniONN further optimized the matrix multiplication protocols. It performed leveled homomorphic encryption operations with the single instruction multiple data (SIMD) batch processing technique to complete the linear transformation in offline. The author used polynomial splines to approximate widelyused nonlinear functions (e.g. sigmoid and tanh) with negligible loss in accuracy. And the author used secret sharing and garbled circuits to complete the activation functions in online. Chameleon [27] used the same technique as in [24] to complete the matrix multiplication operations. The difference is that [24] completed the multiplication triplets based on oblivious transfer [17,20] or HE, while Chameleon completed it with a third-party that can generate correlated randomness used in multiplication triplets. Besides, the oblivious transfer used in garbled circuits also completed based on the third-party. Almost all of the heavy cryptographic operations are precomputed in an offline phase which substantially reduces the computation and communication overhead. However, some information could be revealed if the third-party colluded with either party. Gazelle [18] is state-of-the-art 2PC privacy-preserving inference framework of neural networks. It performed multiplications based on homomorphic encryption and used specialized packing schemes for the Brakerski-Fan-Vercauteren (BFV) [14] scheme. However, in order to achieve circuit privacy, the experiment result is about 3-3.6 times slow down for the HE component of Gazelle [26]. Similarly, it used garbled circuits for non-linear activations.

An Efficient 3-Party Framework

435

All prior works [22,24,27] use a secure computation protocol for Yao’s garbled circuits to compute the non-linear activation functions. However, it has been noted [5] that garbled circuits incur a multiplicative overhead proportional to the security parameter in communication and are a major computational bottleneck. SecureNN constructed a novel protocol for non-linear functions such as ReLU and maxpool that completely avoid the use of garbled circuits. It is developed with the help of a third-party and is the state-of-the-art protocol in secure training and inference of CNNs. Besides, there are some other works combine with quantization / binary neural networks [3,26] or some other constructions based on three-party protocols [23] (the three parties both have shares of the private input in the model).

8

Conclusions

In this paper, we propose novel and efficient 3-party protocols for privacypreserving neural network inference. And we are the first work to implement the privacy-preserving inference scheme of the graph convolutional network. We conducted a detailed analysis of the correctness and security of this scheme. Finally, we give an implementation for the private prediction of five different neural networks, it has better performance than the previous 3-party work. Acknowledgment. This work was supported by the Strategic Priority Research Program of Chinese Academy of Sciences, Grant (No.XDC02040400), DongGuan Innovative Research Team Program (Grant No.201636000100038).

A

Graph Convolutional Network

For spectral-based GCN, the goal of these models is to learn a function of signals/features on a graph G = (V, E, A), where V is a finite set of |V| = n vertices, E is a set of edges and A ∈ Rn×n is a representative description of the graph structure typically in the form of an adjacency matrix. It will be computed based on the training data by the server and the graph structure is identical for all signals [16]. x ∈ Rn is a signal, each row xv is a scalar for node v. An essential operator in spectral graph analysis is the graph Laplacian L, and the normalized definition is L = In − D −1/2 AD −1/2 , where In is the identity matrix and D is the diagonal node degree matrix of A. Defferrard et al. [12] proposed to approximate the graph filters by a truncated expansion in terms of Chebyshev polynomials. The Chebyshev polynomials are recursively defined as Tk (x) = 2xTk−1 (x)−Tk−2 (x), with T0 (x) = 1 and T1 (x) = x. And filtering x with a K-localized filter gθ can be performed using: K−1 of a signal ˜ ˜ = 2 L − In . λmax is the largest eigenvalue with L gθ ∗ x = k=0 θk Tk (L)x, λmax of L. θ ∈ RK is a vector of Chebyshev coefficients.

436

L. Shen et al.

The definition generalized to a signal matrix X ∈ Rn×c with c dimensional vector for each node and f feature maps is as follows: Y = K−1 feature n×f ˜ T ( L)XΘ , Θk ∈ Rc×f and the total number of traink k where Y ∈ R k=0 able parameters per layer is c×f ×K (Θ ∈ RK×c×f ). Through graph convolution layer, GCN can capture feature vector information of all neighboring nodes. It preserves both network topology structure and node feature information. However, the privacy-preserving inference of GCN is not suitable for node classification tasks, in which test nodes (without labels) are included in GCN training. It could not quickly generate embeddings and make predictions for unseen nodes [19].

B

Maxpool Protocol

Algorithm 6. Maxpool. πMP ({S, C}, P) Input: S: {xi 0 }i∈[n] , C: {xi 1 }i∈[n] Output: S: y0 , C: y1 , s.t. y = Max({xi }i∈[n] ) 1: S sets max1 0 = x1 0 and C sets max1 1 = x1 1 . 2: for i = {2, ..., n} do 3: S sets wi 0 = xi 0 − maxi−1 0 and C sets wi 1 = xi 1 − maxi−1 1 . 4: S, C and P perform πDReLU ({S, C}, P) protocol with input (wi j ) and output yi j , j ∈ {0, 1} of S, C resp. 5: S, C and P perform πSS ({S, C}, P) protocol with input (yi j , maxi−1 j ,xi j ,) and output maxi j , j ∈ {0, 1} of S, C resp. 6: end for 7: S outputs maxn 0 and C outputs maxn 1 .

Algorithm 7. SelectShare. πSS ({S, C}, P) Input: S: (α0 , x0 , y0 ), C: (α1 , x1 , y1 ) Output: S: z0 , C: z1 , s.t. z = (1 − α)x + αy 1: S sets w0 = y0 − x0 and C sets w1 = y1 − x1 . 2: S, C and P perform πMul ({S, C}, P) protocol with input (αj , wj ) and output cj , j ∈ {0, 1} of S, C resp. 3: S outputs z0 = x0 + c0 and C outputs z1 = x1 + c1 .

An Efficient 3-Party Framework

C

Neural Network Structure

Fig. 3. The neural network A presented in SecureML [24]

Fig. 4. The neural network B presented in Chameleon [27]

Fig. 5. The neural network C/D presented in MiniONN [22] and [21] resp.

437

438

L. Shen et al.

Fig. 6. Graph convolutional neural network trained from the MNIST dataset

References 1. The health insurance portability and accountability act of 1996 (hipaa). https:// www.hhs.gov/hipaa/index.html 2. Regulation (eu) 2016/679 of the European parliament and of the council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/ec (gdpr). https://gdpr-info.eu/ 3. Agrawal, N., Shahin Shamsabadi, A., Kusner, M.J., Gasc´ on, A.: Quotient: twoparty secure neural network training and prediction. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1231– 1247 (2019) 4. Angelini, E., di Tollo, G., Roli, A.: A neural network approach for credit risk evaluation. Q. Rev. Econ. Finan. 48(4), 733–755 (2008) 5. Araki, T., Furukawa, J., Lindell, Y., Nof, A., Ohara, K.: High-throughput semihonest secure three-party computation with an honest majority. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 805–817 (2016) 6. Barni, M., Failla, P., Kolesnikov, V., Lazzeretti, R., Sadeghi, A.-R., Schneider, T.: Secure evaluation of private linear branching programs with medical applications. In: Backes, M., Ning, P. (eds.) ESORICS 2009. LNCS, vol. 5789, pp. 424–439. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04444-1 26 7. Barni, M., Orlandi, C., Piva, A.: A privacy-preserving protocol for neural-networkbased computation. In: Proceedings of the 8th workshop on Multimedia and security, pp. 146–151 (2006) 8. Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 420–432. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-46766-1 34 9. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: a framework for fast privacypreserving computations. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 192–206. Springer, Heidelberg (2008). https://doi.org/10.1007/9783-540-88313-5 13 10. Canetti, R.: Universally composable security: a new paradigm for cryptographic protocols. In: Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pp. 136–145. IEEE (2001) 11. Chellapilla, K., Puri, S., Simard, P.: High performance convolutional neural networks for document processing (2006) 12. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems, pp. 3844–3852 (2016) 13. Demmler, D., Schneider, T., Zohner, M.: Aby-a framework for efficient mixedprotocol secure two-party computation. In: NDSS (2015)

An Efficient 3-Party Framework

439

14. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012) 15. Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In: International Conference on Machine Learning, pp. 201–210 (2016) 16. Henaff, M., Bruna, J., LeCun, Y.: Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163 (2015) 17. Ishai, Y., Kilian, J., Nissim, K., Petrank, E.: Extending oblivious transfers efficiently. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 145–161. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4 9 18. Juvekar, C., Vaikuntanathan, V., Chandrakasan, A.: {GAZELLE}: a low latency framework for secure neural network inference. In: 27th {USENIX} Security Symposium ({USENIX} Security 18), pp. 1651–1669 (2018) 19. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016) 20. Kolesnikov, V., Kumaresan, R.: Improved OT extension for transferring short secrets. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 54–70. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40084-1 4 21. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998) 22. Liu, J., Juuti, M., Lu, Y., Asokan, N.: Oblivious neural network predictions via minionn transformations. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631 (2017) 23. Mohassel, P., Rindal, P.: Aby3: a mixed protocol framework for machine learning. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 35–52 (2018) 24. Mohassel, P., Zhang, Y.: Secureml: a system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. IEEE (2017) 25. Orlandi, C., Piva, A., Barni, M.: Oblivious neural network computing via homomorphic encryption. EURASIP J. Inf. Secur. 2007, 1–11 (2007). https://doi.org/ 10.1155/2007/37343 26. Riazi, M.S., Samragh, M., Chen, H., Laine, K., Lauter, K., Koushanfar, F.: {XONN}: Xnor-based oblivious deep neural network inference. In: 28th {USENIX} Security Symposium ({USENIX} Security 19), pp. 1501–1518 (2019) 27. Riazi, M.S., Weinert, C., Tkachenko, O., Songhori, E.M., Schneider, T., Koushanfar, F.: Chameleon: a hybrid secure computation framework for machine learning applications. In: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 707–721 (2018) 28. Wagh, S., Gupta, D., Chandran, N.: Securenn: 3-party secure computation for neural network training. Proc. Priv. Enhanc. Technol. 2019(3), 26–49 (2019) 29. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596 (2019) 30. Yao, A.C.C.: How to generate and exchange secrets. In: 27th Annual Symposium on Foundations of Computer Science (sfcs 1986), pp. 162–167. IEEE (1986)

Deep Learning Side-Channel Analysis on Large-Scale Traces A Case Study on a Polymorphic AES Lo¨ıc Masure1,3(B) , Nicolas Belleville2 , Eleonora Cagli1 , Marie-Angela Corn´elie1 , Damien Courouss´e2 , C´ecile Dumas1 , and Laurent Maingault1 1

Univ. Grenoble Alpes, CEA, LETI, DSYS, CESTI, 38000 Grenoble, France [email protected] 2 Univ. Grenoble Alpes, CEA, List, 38000 Grenoble, France 3 Sorbonne Universit´e, UPMC Univ Paris 06, POLSYS, UMR 7606, LIP6, 75005 Paris, France

Abstract. Code polymorphism is an approach to efficiently address the challenge of automatically applying the hiding of sensitive information leakage, as a way to protect cryptographic primitives against side-channel attacks (SCA) involving layman adversaries. Yet, recent improvements in SCA, involving more powerful threat models, e.g., using deep learning, emphasized the weaknesses of some hiding counter-measures. This raises two questions. On the one hand, the security of code polymorphism against more powerful attackers, which has never been addressed so far, might be affected. On the other hand, using deep learning SCA on code polymorphism would require to scale the state-of-the-art models to much larger traces than considered so far in the literature. Such a case typically occurs with code polymorphism due to the unknown precise location of the leakage from one execution to another. We tackle those questions through the evaluation of two polymorphic implementations of AES, similar to the ones used in a recent paper published in TACO 2019 [6]. We show on our analysis how to efficiently adapt deep learning models used in SCA to scale on traces 32 folds larger than what has been done so far in the literature. Our results show that the targeted polymorphic implementations are broken within 20 queries with the most powerful threat models involving deep learning, whereas 100, 000 queries would not be sufficient to succeed the attacks previously investigated against code polymorphism. As a consequence, this paper pushes towards the search of new polymorphic implementations secured against state-of-the-art attacks, which currently remains to be found.

1 1.1

Introduction Context

Side-channel analysis (SCA) is a class of attacks against cryptographic primitives that exploit weaknesses of their physical implementation. During the execution c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 440–460, 2020. https://doi.org/10.1007/978-3-030-58951-6_22

Deep Learning Side-Channel Analysis on Large-Scale Traces

441

of the latter implementation, some sensitive variables are indeed processed that depend on both a piece of public data (e.g. a plain-text) and on some chunk of a secret value (e.g. a key). Hence, combining information about a sensitive variable with the knowledge of the public data enables an attacker to reduce the secret chunk search space. By repeating this attack several times, implementations of secure cryptographic algorithms such as the Advanced Encryption Standard (AES) [31] can then be defeated by recovering each byte of the secret key separately thanks to a divide-and-conquer strategy, thereby breaking the high complexity usually required to defeat such an algorithm. The information on sensitive variables is usually gathered by acquiring time series (a.k.a. traces) of physical measurements such as the power consumption or the electromagnetic emanations measured on the target device (e.g. a smart card). Nowadays, SCA are considered as one of the most effective threats against cryptographic implementations. To protect against SCA, many counter-measures have been developed and have been shown to be practically effective so that their use in industrial implementations is today common. Their effects may be twofold. On the one hand it may force an attacker to require more traces to recover the secret data. In other words it may require more queries to the cryptographic primitive, possibly beyond the duration of a session key. On the other hand, it may increase the computational complexity of side-channel analysis, making the use of simple statistical tools and attacks harder. Countermeasures against SCA can be divided into two main families: masking and hiding. Masking, a.k.a. secret-sharing, consists in replacing any sensitive secret-dependent variable in the implementation by a (d + 1)−sharing, so that each share subset is statistically independent of the sensitive variable. This counter-measure is known to be theoretically sound since it amplifies the noise exponentially with the number of shares d, though at the cost of a quadratic growth in the performance overheads [33,35]. This drawback, combined with the difficulty of properly implementing the countermeasure, makes masking still challenging for a non-expert developer. Hiding covers many techniques that aim at reducing the signal-to-noise ratio in the traces. Asynchronous logic [34] or dual rail logic [41] in hardware, shuffling independent computations [43], or injecting temporal de-synchronization in the traces [13] in software, are typical examples of hiding counter-measures that have been practically effective against SCA. Hence, in practice, masking and hiding are combined to ensure that a secured product meets security standards. However, due to the skyrocketing production of IoT, there is a need for the automated application of protections to improve products’ resistance against SCA while keeping the performance overhead sufficiently low. In this context, a recent work proposed a compiler toolchain to automatically apply a software hiding counter-measure called code polymorphism [6]. The working principle of the counter-measure relies on the execution of many variants of the machine code of the protected software component, produced by a runtime code generator. The successive execution of many variants aims at producing variable side-channel traces in order to increase the difficulty to realize SCA. We emphasize on the

442

L. Masure et al.

fact that, if code polymorphism is the only counter-measure applied to the target component, information leakage is still present in the side-channel traces. Yet, several works have shown the ability of code polymorphism and similar software mechanisms to be effective in practice against vertical SCA [2,14], i.e. up to the point that the Test Vector Leakage Assessment (TVLA) methodology [5], highly discriminating in the detection of side-channel leakage, would not be able to detect information leakage in the traces [3,6]. 1.2

Problem Addressed in This Paper

Yet, though very promising, these results cannot draw an exhaustive guarantee concerning the security level against SCA, since other realistic scenarios have not been investigated. Indeed, on the one hand the SCA literature proposes other ways to outperform vertical attacks when facing hiding counter-measures. Re-synchronization techniques might annihilate the misalignment effect occurred by code polymorphism, since it is successfully applied on hardware devices prone to jitter [16,29,44]. Likewise, Convolutional Neural Networks (CNN) can circumvent some software and hardware de-synchronization counter-measures, in a sense similar to code polymorphism [10,23]. It is therefore of great interest to use those techniques to assess the security provided by some code polymorphism configurations against more elaborated attackers. On the other hand, until now the literature has only demonstrated the relevance of CNN attacks on restricted traces whose size does not exceed 5, 000 samples [7,10,23,42,45], which is small, e.g., regarding the size of the raw traces in the public datasets of software AES implementations used in those papers [7,12,30]. This requires to restrict the acquired traces to a tight window where the attacker is confident that the relevant leakage occurs. Unfortunately, this is not possible since code polymorphism applies hiding in a systematic and pervasive way in the implementation. Likewise, other dimensionality reduction techniques like dedicated variants of Principal Component Analysis (PCA) [38] might be considered prior to the use of CNNs. Unfortunately, they do not theoretically provide any guarantee that relevant features will be extracted, especially for data prone to misalignment. As a consequence, attacking a polymorphic implementation necessarily requires to deal with large-scale traces. This generally spans serious issues for machine learning problems known under the name of curse of dimensionality [36]. That is why it currently remains an open question whether CNN attacks can scale on larger traces, or whether it represents a technical issue that some configurations of code polymorphism might benefit against these attacks. Hence, both problems, namely evaluating code polymorphism and addressing large-scale traces SCA, are closely connected. 1.3

Contribution

In this paper, we tackle the two problems presented so far by extending the security evaluation provided by Belleville et al. [6]. The evaluation aims to assess

Deep Learning Side-Channel Analysis on Large-Scale Traces

443

the security of the highest code polymorphism configuration they used, on same implementations, against stronger attackers. Our evaluation considers a wide spectrum of threat models, ranging from automated attacks affordable by a layman attacker, to state-of-the-art techniques. The whole evaluation setup is detailed in Sect. 2. In particular, we propose to adapt the architectures used in the literature of CNN attacks, in order to handle the technical challenge of large scale traces. This is presented in Subsect. 2.6. Our results show that vertical attacks fail due to the code polymorphism counter-measure, but that more elaborated scenarios lead to successful attacks. Compared to the ones conducted in [6], and depending on the different attack powers considered hereafter, the number of required queries is lowered by up to several orders of magnitude. In a worst case scenario, our trained CNNs are able to recover every secret key byte in less than 20 traces of dimensionality 160, 000, which is 32 times higher than the traces used so far in CNN attacks. This therefore illustrates that large scale traces are not necessarily a technical challenge for deep-learning based SCA. Those results are presented in Sect. 3. Thus, this study claims that though code polymorphism is a promising tool to increase the hardness of SCA against embedded devices, a sound polymorphic configuration, eventually coupled with other counter-measures, is yet to be found, in order to protect against state-of-the-art SCA. This is discussed in Sect. 4. However, the toolchain used by Belleville et al. [6] allows to explore many configurations beside the one considered here, the exploration of the securing capabilities of the toolchain is then beyond the scope of this paper, and left as an open question for further works.

2

Evaluation Methodology

This section presents all the settings necessary to proceed with the evaluation. Subsection 2.1 briefly presents the principle of the code polymorphism countermeasure, and precises the aim of the different code transformations included in the evaluated configuration. Subsection 2.2 presents the target device and the two AES implementations that have been used for this evaluation. Subsection 2.3 details the acquisition settings, while Subsect. 2.4 describes the shape of the acquired traces. Finally, Subsect. 2.5 presents the different threat models considered hereafter, and Subsect. 2.6 precises how the CNN attacks have been conducted. 2.1

Description of the Target Counter-Measure

We briefly describe the code polymorphism counter-measure applied by the toolchain used by Belleville et al. [6]. The compiler applies the counter-measure to selected critical parts of an unprotected source code: it inserts, in the target program, machine code generators, called SGPC s (Specialized Generators of Polymorphic Code), that can produce so-called polymorphic instances, i.e., many

444

L. Masure et al.

different but functionally-equivalent implementations of the protected components. At run-time, SGPCs are regularly executed to produce new machine code instances of the polymorphic components. Thus, the device will behave differently after each code generation but the results of the computations are not altered. The toolchain supports several polymorphic code transformations, which can be selected separately in the toolchain, and most of them offer a set of configuration parameters. A developer can then set the level and the nature of polymorphic transformations, hence the amount of behavioral variability. Hereafter, we detail the specific code transformations that have been activated for the evaluation: – Register shuffling: the index of the general purpose callee saved registers are randomly permuted. – Instruction shuffling: the independent instructions are randomly permuted. – Semantic variants: some instructions are randomly replaced by another semantically equivalent (sequence of) instruction(s). For example, variants of arithmetic instructions (e.g. eor, sub), remains arithmetically equivalent to the original instruction. – Noise instructions: a random number of dummy instructions is added between the useful instructions in order to break the alignment of the leakage in the traces. Noise instructions are interleaved with useful instructions by the instruction shuffling transformation. We emphasize on the fact that the sensitive variables (e.g., the AES key) are only manipulated by the polymorphic instances (i.e., the generated machine code), and not by the SGPCs themselves. SGPCs are specialized code generators, and their only input is a source of random data (a PRNG internal to the code generation runtime) that drives code generation. Hence, SGPCs only manipulate instruction and register encodings, and never manipulate secret data. Thus, performing a SCA on side-channel traces of executions of SGPCs cannot reveal a secret nor an information leakage. However, SGPCs manipulate data that are related to the contents of the buffer instances, e.g., the structure of the generated code, the nature of the generated machine instructions (useful and noise instructions), etc. SCA performed on SGPC traces could possibly be helpful to reveal sensitive information about the code used by the polymorphic instances, but to the best of our knowledge, there is no such work in the literature. As such, this research question is out of the scope of this paper. 2.2

Target of Evaluation

In order to make a fair comparison with the results presented by Belleville et al. [6], we consider two out of the 15 implementations used in their benchmarks, namely the AES 8-bit and the mbedTLS that we briefly describe hereafter. mbedTLS. This 32-bit implementation of AES from the ARM mbedTLS library [4] follows the so-called T-table technique [15]: the 16-byte state of AES

Deep Learning Side-Channel Analysis on Large-Scale Traces

445

is encoded into four uint32 t variables, each representing a column of the state. Each round of the AES is done by applying four different constant look-up tables, that are stored in flash memory. AES 8-bit. This is a simple software unprotected implementation of AES written in C, and manipulating only variables of type uint8 t, similar to [1]. The SubBytes operation is computed byte-wise thanks to a look-up table, stored in RAM. This reduces information leakage on memory accesses, as compared to the use of the same look-up table stored in flash memory. Target Device. We ran the different AES implementations on an evaluation board embedding an Arm Cortex-M4 32-bit core [40]. This device does not provide any hardware security mechanisms against side-channel attacks. This core originally operates at 72 MHz, but the core frequency was reduced to 8 MHz for the purpose of side-channel measurements. The target is similar to the one used by Belleville et al. [6], who considered a Cortex-M3 core running at 24 MHz. These two micro-controllers have an in-order pipeline architecture, but with a different pipeline organization. Thus, we cannot expect those two platforms to exhibit the same side-channel characteristics. However, our experience indicates that these two experimental setups would lead to similar conclusions with regards to the attacker models considered in our study. Similar findings on similar targets have also been reported by Heuser et al. [20]. Therefore, we assume in this study that the differences of side-channel characteristics between the two targets should not induce important differences in the results of such side-channel analysis. Configuration of the Code Polymorphism Counter-Measure. For each evaluated implementation, the code polymorphism counter-measure is applied with a level corresponding to the configuration high described by Belleville et al. [6]: all the polymorphic code transformations are activated, the number of inserted noise instructions follows a probability distribution based on a truncated geometric law. The dynamic noise is activated and SGPCs produce a new polymorphic instance of the protected code for each execution (the regeneration period is set to 1). 2.3

Acquisition Setup

We measured side-channel traces corresponding to EM emanations with an EM probe RF-B 0.3-3 from Langer, equipped with a pre-amplifier, and a Rohde & Schwarz RTO 2044 oscilloscope with a 4 GHz bandwidth and a vertical resolution of 8 bits. We set the sampling rate to 200 MS/sec., with the acquisition mode peak-detect that collects the minimum and the maximum voltage values over each sampling period. We first verify that our acquisition setup is properly set. This is done by acquiring several traces where the code polymorphism was de-activated. Thus, we can verify that those traces are synchronized. Then, computing the Welch’s T-test [28] enables to assess whether the probe is correctly positioned

446

L. Masure et al.

Fig. 1. Acquisitions on the mbedTLS implementation. Top: two traces containing the first AES round. Bottom: SNR computed on the 100, 000 profiling traces.

and the sampling rate is high enough. Then, 100, 000 profiling traces are acquired for each target implementation. Each acquisition campaign lasts about 12 h. 2.4

Preliminary Analysis of the Traces

Once the traces are acquired, and before we investigate the attacks that will be presented in Subsect. 2.5, we detail here a preliminary analysis of the component. The aim is to restrict as much as possible the target region acquired to a window covering the entire first AES round. Therefore there would not be any loss of informative leakage about the sensitive intermediate variable targeted in those experiments. In addition to that, a univariate leakage assessment, by computing the Signal-to-Noise Ratio (SNR) [27],1 is provided hereafter in order to verify that there is no trivial leakage that could be exploited by a weak, automatized attacker (see Subsect. 2.5). mbedTLS. We ran some preliminary acquisitions on 107 samples, in order to visualize all the execution. We could clearly distinguish the AES execution with sparse EM peaks, from the call to the SGPC with more frequent EM peaks. This enabled to focus on the first 106 samples of the traces corresponding to the AES execution. Actually, the traces restricted to the AES execution seem to remain globally the same between each other, up to local elastic deformations along the time axis. This is in line with the effect of code polymorphism, since it involves transformations at the assembly level. Likewise, 10 patterns could be distinguished on each traces, which were clues to expect that it would correspond to the 10 rounds of AES. That is how we could restrict again our target window to the first round, up to comfortable margins because of the misalignment effect of code polymorphism. This represents 80, 000 samples. An illustration of two traces restricted to the first round is given in Fig. 1 (top). 1

A short description of this test is provided in Appendix B for the non-expert reader.

Deep Learning Side-Channel Analysis on Large-Scale Traces

447

Fig. 2. The 16 SNR of the acquired traces from the mbedTLS implementation, one for each targeted byte, after re-aligned pattern extraction.

The SNR denoting here the potential univariate leakage of the first output byte of the SubBytes operation is computed based on the 100, 000 acquired profiling traces, and is plotted in Fig. 1 (bottom). No distinguishing peak can be observed, which confirms the soundness of code polymorphism against vertical attacks on raw traces. However, we observe in each trace approximately 16 EM peaks, which corresponds to the number of memory accesses to the look-up tables per encryption round. These memory accesses are known to carry sensitive information. This suggests that a trace re-alignment on EM peaks might be relevant to achieve successful vertical attacks. We proceed with such a re-alignment that we describe hereafter. For each 80, 000-dimensional trace, the clock cycles corresponding to the region between two EM peaks are identified according to a thresholding on falling edges. Since the EM peak pattern delimiting two identified clock cycles may be spread over a different number of samples, from one pattern to another, we only keep the minimum and maximum points. Likewise, since the number of identified clock cycles can also differ from one trace to another, the extracted samples are eventually zero-padded to be of dimension D = 50.

Fig. 3. Acquisitions on the AES 8-bit implementation. Top: two traces containing the first AES round. Bottom: SNR computed on the 100, 000 profiling traces.

448

L. Masure et al.

Based on this re-alignment, a new SNR is computed on Fig. 2. Contrary to the SNR computed on raw traces in Fig. 1, some leakages are clearly distinguishable though the amplitude of the SNR peaks vary from 5.10−2 to 5.10−1 , according to the targeted byte. AES 8-bit. We proceed in the same way for the evaluation of AES 8-bit implementation. As with the mbedTLS traces, we could identify 10 successive patterns likely to correspond to the 10 AES rounds. Therefore we reduce the target window at the oscilloscope to the first AES round, which represents 160, 000dimensional traces plotted in Fig. 3 (top). This growth in the size of the traces is expected, since the naive AES 8-bit implementation is not optimized to be fast, contrary to the mbedTLS one. Yet, here the peaks are hardly distinguishable from the level of noise. This is expected, due to the look-up tables being moved from the flash memory to the RAM, as mentioned in Subsect. 2.2. As a consequence, the memory accesses being less remarkable. That is why the re-alignment technique described for the mbedTLS implementation is not relevant here. Finally, Fig. 3 (bottom) shows the SNR on the raw traces, ensuring once again that no trivial leakage can be exploited to recover the secret key. 2.5

Threat Models

We propose several threat models that we distinguish according to two powers that we precise hereafter. First, the attacker may have the possibility to get a so-called open sample, which is a clone device behaving similarly to the actual target, in which he has a full access and control to the secret variables, e.g. the secret key. Having an open sample enables to run a profiled attack by building the exact leakage model of the target device. This is a necessary condition in order to evaluate the worstcase scenario from developer’s point of view [21]. Though often seen as strong, this assumption can be considered realistic in our context: Here both the chip and the source codes used for the evaluation are publicly available. Second, the attacker may eventually incorporate human expertise to improve attacks initially fully automatized. Here, this will concern either the preliminary task of trace re-alignment (see the procedure described in Subsect. 2.4), or the capacity to properly design a CNN architecture for the deep learning based SCA (see Subsect. 2.6). Hence, we consider the following attack scenarios: – Aauto : considers a fully automatized attack (i.e. without any human expertise), without access to an open sample. – ACPA : considers an attack without access to an open sample, but with human expertise to re-align the traces (see Subsect. 2.4). It results in doing a CPA targeting the output of the SubBytes operation, assuming the Hamming weight leakage model [8]. – AgT : considers the same attack as ACPA , i.e. targeting re-aligned traces, along with access to an open sample in addition. The profiling is done thanks to

Deep Learning Side-Channel Analysis on Large-Scale Traces

449

Gaussian templates with pooled covariance matrices [11]. No dimensionality reduction technique is used here, beside the implicit reduction done through the re-alignment detailed in Subsect. 2.4. – ACNN : considers an attack with access to an open sample and human expertise to build a CNN for the profiled attack. This attack scenario is considered the most effective against de-synchronized traces with first-order leakage [7,10, 23]. Therefore, we do not assume ACNN to need to get access to re-aligned traces. In addition, no preliminary dimensionality reduction is done here. We observed in the preliminary analysis conducted in Subsect. 2.4 that the SNR of the raw traces, computed on 100, 000 traces, did not emphasize any peak. Instead, the same SNR based on the re-aligned traces emphasized some peaks. Thanks to the works of Mangard et al. [26,27] and Oswald et al. [28], we can already draw the following conclusions: Aauto will not succeed with less than Na = 100, 000 queries, whereas ACPA , AgT and ACNN are likely to succeed within the same amount of queries. 2.6

CNN-Based Profiling Attacks

As mentioned in Subsect. 2.5, CNN attacks may require some human expertise to properly set the model architecture. This section is devoted at describing the whole settings used to train the CNNs used in the attack scenario ACNN , in order to tackle the challenge of large-scale traces. We quickly review the guidelines in the SCA literature, and argue why they are not suited to our traces. We then present the used architecture, and we detail the training parameters. The Literature Guidelines. Though numerous papers have proposed CNN architectures [7,10,25], the state-of-the-art CNNs are currently given by Kim et al. [23] and Zaid et al. [45]. Their common point is to rely on the so-called VGG-like architecture [37]: s ◦ λ ◦ [σ ◦ λ]n1 ◦ [δp ◦ σ ◦ μ ◦ γw,k ]n2 ◦ μ ,

(1)

where γw,k denotes a convolutional layer made of k filters of size w, μ denotes a batch-normalization layer [22], σ denotes an activation function i.e. a nonlinear function, called ReLU, applied element-wise [18], δp denotes an average pooling layer of size p, λ denotes a dense layer, and s denotes the softmax layer.2 Furthermore, the composition [δp ◦ σ ◦ μ ◦ γw,k ] is denoted as a convolutional block. Likewise, [σ ◦ λ] denotes a dense block. We note n1 (resp. n2 ) the number of dense blocks (resp. convolutional blocks). An intuitive approach would be to directly set the parameters or our architecture to the ones used by Kim et al. or by Zaid et al. Unfortunately, we argue in both case that such a transposition is not possible. 2

The softmax outputs a normalized vector of positive scalars. Hence, it is similar to a discrete probability distribution that we want to fit with the actual leakage model.

450

L. Masure et al.

Kim et al. propose particular guidelines to set the architecture [23]. They recommend to fix w = 3, p = 2, i.e., the minimal possible values, and to set n2 so that the time dimensionality at the output of the last block is reduced to one. Since each pooling divides the time dimensionality by p, n2 ≤ logp (D).3 Meanwhile they double the number of filters for each new block compared to the previous one, without exceeding 256. Unfortunately, using these guidelines is likely to increase n2 from 10 in Kim et al.’s work to at least 17 in our context. As explained by He et al. [19], stacking such a number of layers is likely to make the training harder to tweak the learning parameters to optimal values. That is why an alternative architecture called Resnet has been introduced [19], and starts to be used in SCA as well [17,47]. This possibility will be discussed in Sect. 4. Second, due to the doubling number of filters at each new block, the number of learning parameters would be around 1.8 M, i.e. 10 folds more than the number of traces. In such a configuration, over-fitting is likely to happen [36], which degrades the performance of CNNs. In order to improve the Kim et al.’s architecture, Zaid et al. [45] proposed thumb rules to set the filter size in the convolutional and the pooling layers depending on the maximum temporal amplitude of the de-synchronization. Unfortunately, it assumes to know the maximum amplitude of the desynchronization, which is not possible here since it is hard to guess how many times those transformations are applied in the polymorphic instance. Our Architecture. The drawbacks of Kim et al.’s and Zaid et al.’s guidelines in our particular context justify why we do not directly use them. Instead, we propose to take the Kim et al.’s architecture as a baseline, on which we modify some of the parameters as follows. First, we set the number of filters in this first block to k0 = 10, we decrease the maximal number of filters from 256 to kmax = 100, and we slightly change the way the number of filters is computed in the intermediate convolutional layers, according to Table 2 in Sect. A. Likewise, we remove the dense block (i.e. n1 = 0). This limits the number of learning parameters, and thus avoids over-fitting. Second, we increase the pooling size to p = 5. This mechanically allows to decrease the minimal number of convolutional blocks n2 from 17 to 6 for the mbedTLS traces and to 7 for the AES 8-bit ones. To be sure that this growth in p does not imply any loss of information in the pooling layers, we set the filter size to w = 2p + 1 = 11. Eventually, since with such numbers of convolutional blocks the output dimensionality is not equal to one yet, we add a global average pooling δG at the top of the convolutional blocks, which will force the reduction without adding any extra learning parameter [46]. Such an architecture would represent 177, 500 learning parameters, when targeting the mbedTLS implementation and 287, 500 for the AES 8-bit. As a comparison, we would expect at least 1.8M parameters if we apply the Kim et al.’s guidelines to our context. 3

We recall that D denotes the dimensionality of the traces, i.e. D = 80, 000 for the mbedTLS implementation (D = 50 for the re-aligned data) and D = 160, 000 for the AES 8-bit.

Deep Learning Side-Channel Analysis on Large-Scale Traces

451

First Convolutional Block. To decrease further the number of learning parameters, an attacker may even tweak the first convolutional block, by exploiting the properties of the input signal. Figure 4 sketches an EM trace chunk of about one clock cycle. We make the underlying assumption that the relevant information to extract from the traces is contained in the patterns that occur around each clock, mostly due to the change of states in the memory registers storing the sensitive variables [27].4 Moreover, no additional relevant pattern is assumed to be contained in the trace until the next clock cycle, appearing T = 50 samples later.5 By carefully setting w0 to the size of the EM patterns, and p0 such that w0 ≤ p0 ≤ T , we optimally extract the relevant information from the patterns while avoiding the entanglement between two of them. In our experiments, we arbitrarily set p0 = 25. This tweaked first convolutional block has then the same receptive field than one would have with two normal blocks of parameters (w = 11, p = 5). Therefore, we spare one block (i.e. the last one), which decreases the number of learning parameters: our architecture now represents 84, 380 (resp. 177, 500) learning parameters for the model attacking the mbedTLS implementation (resp. AES 8-bit). Table 2 in Sect. A provides a synthesis of the description of our architecture, along with a comparison with the Kim et al. and Zaid et al.’s works. Training Settings. The source code is implemented in Python thanks to the Pytorch [32] library and is run on a workstation with a Nvidia Quadro M4000 GP-GPU with 8 GB memory and 1664 cores. For each experiment, the whole dataset is split into a training and a test subsets, containing respectively 95, 000 and 5, 000 traces. The latter ones are used to simulate a key recov- Fig. 4. Two EM patterns separated by one clock cycle. ery based on the scores attributed to each hypothetical value of the sensitive target variable by the trained model. Moreover, the SH100 data augmentation method is applied to the training traces, following the description given in [10]: each trace is randomly shifted of maximum 100 points, which represents 2 clock cycles.6 The training is done by minimizing the Negative Log Likelihood (NLL) loss quantifying a dissimilarity between the output discrete probability distribution given by the softmax layer and the labels generated by the output of the SubBytes operation, with the Adam optimizer [24] during 200 epochs7 which approx4 5 6 7

This assumption has somewhat already been used for the pattern extraction realignment in Subsect. 2.4. We recall that despite the effect of code polymorphism, and in absence of hardware jitter, the duration of the clock period, in terms of samples, is roughly constant. This data augmentation is not applied on the attack traces. One epoch corresponds to the number of steps necessary to pass the whole training data-set through the optimization algorithm once.

452

L. Masure et al.

imately represents a 16-hour long training for each targeted byte. The learning rate of the optimizer is always set to 10−5 . 2.7

Performance Metrics

This section explains how the performance metrics are computed in order to give a fair comparison between the different attack scenarios considered in this evaluation. Based on an (eventually partially) trained model, we proceed a key recovery, by aggregating the output scores given by the softmax layer, computed from a given set of attack traces coming from the 5, 000 test traces into a maximum likelihood distinguisher [21]. This outputs a scalar score for each key hypothesis. We evaluate the performance of a model during and after the training by computing its Success Rate (SR), namely the probability that the key recovery outputs the highest score to the right key hypothesis. In the following, if the success rate is at least β = 90%, we will say that the attack was successful. Eventually, Na will denote the required number of queries to the cryptographic primitive (i.e., the number of traces) in order to obtain a successful attack. 8

3

Results

Once the different threat models and their corresponding parameterization have been introduced in Sect. 2, we can now present the results of each attack, also summarized in Table 1.

Fig. 5. Success Rate with respect to the number of attack traces. Vertical attacks on mbedTLS. The different colors denote the different targeted bytes.

As argued in Subsect. 2.4, we can directly conclude from the SNRs given by Fig. 1 (bottom) and Fig. 3 (bottom) that the fully automatized attack Aauto 8

More details are given in Appendix C.

Deep Learning Side-Channel Analysis on Large-Scale Traces

453

cannot succeed within the maximum amount of collected traces, i.e., Na > 105 , for both implementations. Figure 5a depicts the performances of ACPA against the mbedTLS implementation, on each state byte at the output of the SubBytes operation. It can be seen that the re-alignment enables a first-order CPA to succeed within Na ≈ 103 for the byte 1, and Na ∈ 104 , 105  for the others.9 Those results are in line with the rule of thumb stating that the higher the SNR on Fig. 2, the faster the success rate convergence towards 1 on Fig. 5a [27]. Since we argued in Subsect. 2.4 that the proposed re-alignment technique was not relevant on the AES 8-bit traces, we conclude that ACPA would require more that 105 queries on those traces.10 Figure 5b summarizes the outcomes of the attack AgT . One can remark that the attack is successful for all the target bytes within 2, 000 queries, which represents an improvement by one order of magnitude as compared to the scenario ACPA . In other words, the access to an open sample provides a substantial advantage to AgT compared to ACPA . As for the latter one, and for the same reasons, we conclude that the attack AgT would fail with 105 traces of the AES 8-bit implementation.

Fig. 6. Evolution of Na with respect to the number of training epochs during the open sample profiling by the CNN (attack ACNN ).

Figure 6 presents the results of the CNN attack ACNN . In particular, Fig. 6a shows that training the CNN for 200 epochs allows to recover a secret byte in less than 20 traces in the case of the mbedTLS implementation. Likewise, Fig. 6b shows that a successful attack can be done in less than 10 traces on the AES 8-bit implementation. Moreover, both curves in Fig. 6 show that the latter observations can be generalized for each byte targeted in the attack ACNN .11 Finally, one can remark that training the CNNs during a lower number 9 10 11

Targeting the output of the AddRoundKey operation instead of the output of the SubBytes operation has also been considered without giving better results. Section 4 discusses the possibility of relevant re-alignment techniques for the AES 8-bit implementation. Additional experiments realized on a setup close to ACNN confirm that the results can be generalized to any of the 16 state bytes.

454

L. Masure et al.

of epochs (e.g., 100 for mbedTLS, 50 for AES 8-bit), still leads to the same order of magnitude for Na . Based on these observations, one can make the following interpretations. First, the attack ACNN leads to the best attack among the tested ones, by one or several orders of magnitude. Second, such attacks are reliable, since the results do not differ from one implementation to another, and from one targeted byte to another. Third, the training time for CNNs, currently set to roughly 16 hours for each byte (see Subsect. 2.6), can be halved or even quartered without requiring too much more queries to succeed the attack. Since in this scenario the profiling phase is here the critical (i.e. the longest) task, it might be interesting to find a trade-off between the training time and the resulting Na , depending on the attacker’s abilities. Table 1. Minimal number Na of required queries to recover the target key bytes. Scenario mbedTLS Aauto ACPA AgT ACNN

4

AES 8-bit

5

>105

>10

3

5

3.10 − 10 20 − 103 >20

>105 >105 >10

Discussion

So far Sect. 3 has presented and summarized the results of the attacks, depending on the threat models defined in Subsect. 2.5, against two implementations of a cryptographic primitive, protected by a given configuration of code polymorphism. This section proposes to discuss these results, the underlying assumptions behind the attacks, and eventually the consequences of them. Choice of Our CNN Architecture. The small amount of queries to succeed the attack ACNN , conducted on both implementations, shows the relevance of our choice of CNN architecture. This illustrates that an end-to-end attack with CNNs is possible when targeting large scale traces, without necessarily requiring very deep architectures. We emphasize that there may be other choices of parameters for the convolutional giving relevant results as well, if not better. Yet, we do not find necessary to further investigate this way here. The obtained minimal number of queries Na was low enough so that any improvement in the CNN performances is not likely to change our interpretations of the vulnerability of the targets against ACNN . In particular, the advantage of Resnets [19] broadly used in image recognition typically relies on the necessity to use deep convolutional architectures in this

Deep Learning Side-Channel Analysis on Large-Scale Traces

455

field [37] and promising results have been obtained over the past few months with Resnets in SCA [17,47]. However, we demonstrate that we can take advantage of the distinctive features of side-channel traces to constrain the depth of our model and avoid typical issues related to deep architectures (i.e. vanishing gradient). On the Re-alignment Technique. The re-alignment technique used in this work is based on the detection of leakage instants by thresholding. Other realignment techniques [16,29,44] may be used. Therefore, the results of ACPA and AgT might be improved. However, none of the re-alignment techniques in the literature provides strong theoretical guarantees of optimality, especially regarding the use of code polymorphism. On the Security of Code Polymorphism. Our study exhibits the attacks Aauto and ACPA with high Na enough to enable a key refreshing period reasonably high, without compromising the confidentiality of the key. Unfortunately, this is not possible in presence of a stronger attacker that may have access to an open sample, as emphasized by attacks AgT and ACNN , where a secret key can be recovered within the typical duration of a session key. This may be critical at first sight since massive IoT applications often rely on Commercially available Of The Shelf (COTS) devices, which implies that open samples may be easily accessible to any adversary. More generally, this issue can be viewed from the perspective of the problem discussed by Bronchain et al. about the difficulty to prevent side-channel attacks in COTS devices, even with sophisticated counter-measures [9]. First, our experimental target is intrinsically highly vulnerable to SCA. Second, the use of software implementations of cryptographic primitives offers a large attack surface, which remains highly difficult to protect especially with a hiding countermeasure alone. This underlines the fact that a component may need to use hiding in combination with other counter-measures, e.g., masking, to be secured against a strong side-channel attacker model.

5

Conclusion

So far, this paper answers two questions that may help both developers of secure implementations, and evaluators. From a developer’s point of view, this paper has studied the effect of two implementations of a code polymorphism counter-measure against several side channel attack scenarios, covering a wide range of potential attackers. In a nutshell, code polymorphism as an automated tool, is able to provide a strong protection against threat models considering automated and layman attackers, as the evaluated implementations were secure enough against our first attacker models. Yet, the implementations evaluated are not sound against stronger attacker models. The soundness of software hiding countermeasures, if used alone, remains to be demonstrated against state-of-the-art attacks, for example by using other configurations of the code polymorphism toolchain, or by

456

L. Masure et al.

proposing new code transformations. All in all, our results underline again, if need be, the necessity to combine the hiding and masking protection principles in a secured implementation. From an evaluator’s point of view, this paper illustrates how to leverage CNN architectures to tackle the problem of large-scale side-channel traces, thereby narrowing the gap between SCA literature and concrete evaluations of secure devices where pattern detection and re-alignment are not always possible. The idea lies in slight adaptations of the CNN architectures already used in SCA, eventually by exploiting the signal properties of the SCA traces. Surprisingly, our results emphasize that, though the use of more complex CNN architectures have been shown to be sound to succeed SCAs, their use might not be a necessary condition in a SCA context. Acknowledgements. This work was partially funded thanks to the French national program “Programme d’Investissement d’Avenir IRT Nanoelec” ANR-10-AIRT-05. The authors would like to thank Emmanuel Prouff and Pierre-Alain Mo¨ellic for their fruitful feedbacks and discussions on this work.

A

Summary of the CNN Architecture Parameters

In the Zaid et al.’s methodology, T denotes the maximum assumed amount of random shift in the traces, and I denotes the assumed number of leakage temporal points in the traces. Table 2. Our architecture and the recommendations from the literature. Kim et al. [23]

This paper

Zaid et al. [45]

1

0

2

n2

logp (D)

logp (D)

3

p

2

5(p0 = 25)

2,

w

3

11(w0 = 10)

1,

kn

min(k0 × 2n , kmax ) 10, 20, 40, 40, 80, 100(100) k0 × 2

k0

8

n1

kmax 256

B

T 2 T 2

, ,

D I D I n

10

8, 32

100



Signal-to-Noise Ratio (SNR)

To assess whether there is trivial leakage in the traces, one may compute a Signal-to-Noise Ratio (SNR) [27]. For each time sample t, it is estimated by the following statistics:   V E [X[t]|Z = z] , (2) SNR[t]  Z V (X[t])

Deep Learning Side-Channel Analysis on Large-Scale Traces

457

where X[t] is the random continuous variable denoting the EM emanation measured at date t, Z is the random discrete variable denoting the sensitive target variable, E denotes the expected value and V denotes the variance. When the sample t does not carry any informative leakage, X[t] does not depend on Z so the numerator is zero, and inversely.

C

Performance Metric Computation

Let (pi )i≤Na the plaintexts of the Na traces of an attack set, encrypted with varying keys (ki )i≤Na , and let (yi )i≤Nv be the corresponding outputs of the CNN softmax. Let kˆ be in the key chunk space. A score for the value kˆ is defined by the Na ˆ log (yi [zi ]) where zi = maximum likelihood distinguisher [21] as: dNa [k] = i=1 ˆ The key is considered as recovered within Na queries if the SubBytes[pi ⊕ ki ⊕ k]. distinguisher dNa outputs the highest score for kˆ = 0. To assess the difficulty of attacking a target device with profiling attacks (which is assumed to be the worst-case scenario for the attacked device), it has initially been suggested to measure or estimate the minimum number of traces required to get a successful key recovery [27]. Observing that many random factors may be involved during the attack, the latter measure has been refined to study the probability that the right key is ranked first according to the scores. This metric is called the S uccess Rate [39]: SR(Na ) =  ˆ Pr argmaxˆ dN [k] = 0 . k∈K

a

In practice, to estimate SR(Na ), sampling many attack sets may be very prohibitive in an evaluation context, especially if we need to reproduce the estimations for many values of Na until we find the smallest value Na such that for Na ≥ Na the success rate is higher than a given threshold β. One solution to circumvent this problem is, given a validation set of Nv traces, to sample some attack sets by permuting the order of the traces into the validation set (e.g. 500 times in our experiments). dNa can then be computed with a cumulative sum to get a score for each Na ∈ 1, Nv . For each value of Na , the success rate is ˆ = 0”. dNa [k] estimated by the occurrence frequency of the event “argmaxk∈K ˆ While this trick gives good estimations for Na  Nv , one has to keep in mind that the estimates become biased when Na → Nv . Hopefully, in our experiments, the validation set size remains much higher than Na afterwards: Nv = 100, 000 for Aauto , ACPA , Nv = 20, 000 for AgT , and Nv = 5, 000 for ACNN .

References 1. Small portable AES128 in C (2020). https://github.com/kokke/tiny-AES-c 2. Agosta, G., Barenghi, A., Pelosi, G.: A code morphing methodology to automate power analysis countermeasures. In: DAC (2012). https://doi.org/10.1145/ 2228360.2228376 3. Agosta, G., Barenghi, A., Pelosi, G., Scandale, M.: The MEET approach: securing cryptographic embedded software against side channel attacks. TCAD (2015). https://doi.org/10.1109/TCAD.2015.2430320

458

L. Masure et al.

4. ARMmbed: 32-bit t-table implementation of aes for mbed tls (2019). https:// github.com/ARMmbed/mbedtls/blob/master/library/aes.c 5. Becker, G., Cooper, J., De Mulder, E., Goodwill, G., Jaffe, J., Kenworthy, G., et al.: Test vector leakage assessment (TVLA) derived test requirements (DTR) with AES. In: International Cryptographic Module Conference (2013) 6. Belleville, N., Courouss´e, D., Heydemann, K., Charles, H.: Automated software protection for the masses against side-channel attacks. TACO (2019). https://doi. org/10.1145/3281662 7. Benadjila, R., Prouff, E., Strullu, R., Cagli, E., Dumas, C.: Deep learning for sidechannel analysis and introduction to ASCAD database. J. Cryptogr. Eng. 10(2), 163–188 (2019). https://doi.org/10.1007/s13389-019-00220-8 8. Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28632-5 2 9. Bronchain, O., Standaert, F.X.: Side-channel countermeasures’ dissection and the limits of closed source security evaluations. IACR TCHES 2020(2) (2020). https:// doi.org/10.13154/tches.v2020.i2.1-25 10. Cagli, E., Dumas, C., Prouff, E.: Convolutional neural networks with data augmentation against jitter-based countermeasures. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 45–68. Springer, Cham (2017). https://doi.org/ 10.1007/978-3-319-66787-4 3 11. Choudary, O., Kuhn, M.G.: Efficient template attacks. In: Francillon, A., Rohatgi, P. (eds.) CARDIS 2013. LNCS, vol. 8419, pp. 253–270. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08302-5 17 12. Coron, J.-S., Kizhvatov, I.: An efficient method for random delay generation in embedded software. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 156–170. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-041389 12 13. Coron, J.-S., Kizhvatov, I.: Analysis and improvement of the random delay countermeasure of CHES 2009. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 95–109. Springer, Heidelberg (2010). https://doi.org/10.1007/ 978-3-642-15031-9 7 14. Courouss´e, D., Barry, T., Robisson, B., Jaillon, P., Potin, O., Lanet, J.-L.: Runtime code polymorphism as a protection against side channel attacks. In: Foresti, S., Lopez, J. (eds.) WISTP 2016. LNCS, vol. 9895, pp. 136–152. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45931-8 9 15. Daemen, J., Rijmen, V.: AES and the wide trail design strategy. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 108–109. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-46035-7 7 16. Durvaux, F., Renauld, M., Standaert, F.-X., van Oldeneel tot Oldenzeel, L., Veyrat-Charvillon, N.: Efficient removal of random delays from embedded software implementations using hidden Markov models. In: Mangard, S. (ed.) CARDIS 2012. LNCS, vol. 7771, pp. 123–140. Springer, Heidelberg (2013). https://doi.org/ 10.1007/978-3-642-37288-9 9 17. Gohr, A., Jacob, S., Schindler, W.: Efficient solutions of the CHES 2018 AES challenge using deep residual neural networks and knowledge distillation on adversarial examples. IACR Cryptology ePrint Archive 2020, 165 (2020). https://eprint.iacr. org/2020/165 18. Goodfellow, I.J., Bengio, Y., Courville, A.C.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org/

Deep Learning Side-Channel Analysis on Large-Scale Traces

459

19. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016). https://doi.org/10.1109/CVPR.2016.90 20. Heuser, A., Genevey-Metat, C., Gerard, B.: Physical side-channel analysis on stm32f0, 1, 2, 3, 4 (2020). https://silm.inria.fr/silm-seminar 21. Heuser, A., Rioul, O., Guilley, S.: Good is not good enough - deriving optimal distinguishers from communication theory. In: Batina, L., Robshaw, M. (eds.) CHES 2014. LNCS, vol. 8731, pp. 55–74. Springer, Heidelberg (2014). https://doi.org/10. 1007/978-3-662-44709-3 4 22. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015). http://jmlr.org/proceedings/ papers/v37/ioffe15.html 23. Kim, J., Picek, S., Heuser, A., Bhasin, S., Hanjalic, A.: Make some noise. Unleashing the power of convolutional neural networks for profiled side-channel analysis. IACR TCHES 2019(3) (2019). https://doi.org/10.13154/tches.v2019.i3.148-179 24. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015). http://arxiv.org/abs/1412.6980 25. Maghrebi, H., Portigliatti, T., Prouff, E.: Breaking cryptographic implementations using deep learning techniques. In: Carlet, C., Hasan, M.A., Saraswat, V. (eds.) SPACE 2016. LNCS, vol. 10076, pp. 3–26. Springer, Cham (2016). https://doi. org/10.1007/978-3-319-49445-6 1 26. Mangard, S.: Hardware countermeasures against DPA – a statistical analysis of their effectiveness. In: Okamoto, T. (ed.) CT-RSA 2004. LNCS, vol. 2964, pp. 222–235. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-246602 18 27. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks - Revealing the Secrets of Smart Cards. Springer, Heidelberg (2007). https://doi.org/10.1007/978-0-38738162-6 28. Mather, L., Oswald, E., Bandenburg, J., W´ ojcik, M.: Does my device leak information? An a priori statistical power analysis of leakage detection tests. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8269, pp. 486–505. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42033-7 25 29. Nagashima, S., Homma, N., Imai, Y., Aoki, T., Satoh, A.: DPA using phase-based waveform matching against random-delay countermeasure. In: ISCAS (2007). https://doi.org/10.1109/ISCAS.2007.378024 30. Nassar, M., Souissi, Y., Guilley, S., Danger, J.: RSM: a small and fast countermeasure for AES, secure against 1st and 2nd-order zero-offset SCAs. In: DATE (2012). https://doi.org/10.1109/DATE.2012.6176671 31. National Institute of Standards and Technology: Advanced encryption standard (AES). https://doi.org/10.6028/NIST.FIPS.197 32. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., et al.: Pytorch: an imperative style, high-performance deep learning library. In: NeurIPS (2019) 33. Prouff, E., Rivain, M.: Masking against side-channel attacks: a formal security proof. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 142–159. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3642-38348-9 9 34. Renaudin, M.: Asynchronous circuits and systems: a promising design alternative. Microelectron. Eng. 54(1), 133–149 (2000) 35. Rivain, M., Prouff, E.: Provably secure higher-order masking of AES. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 413–427. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15031-9 28

460

L. Masure et al.

36. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014). https://doi.org/10. 1017/CBO9781107298019 37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015). http://arxiv.org/abs/1409.1556 38. Standaert, F.-X., Archambeau, C.: Using subspace-based template attacks to compare and combine power and electromagnetic information leakages. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 411–425. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85053-3 26 39. Standaert, F.-X., Malkin, T.G., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009). https://doi.org/10.1007/9783-642-01001-9 26 40. STMicroelectronics: NUCLEO-F303RE. https://www.st.com/content/st com/en/ products/evaluation-tools/product-evaluation-tools/mcu-mpu-eval-tools/stm32mcu-mpu-eval-tools/stm32-nucleo-boards/nucleo-f303re.html 41. Suzuki, D., Saeki, M.: Security evaluation of DPA countermeasures using dualrail pre-charge logic style. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 255–269. Springer, Heidelberg (2006). https://doi.org/10.1007/ 11894063 21 42. Timon, B.: Non-profiled deep learning-based side-channel attacks with sensitivity analysis. IACR TCHES 2019(2) (2019). https://doi.org/10.13154/tches.v2019.i2. 107-131 43. Veyrat-Charvillon, N., Medwed, M., Kerckhof, S., Standaert, F.-X.: Shuffling against side-channel attacks: a comprehensive study with cautionary note. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 740–757. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34961-4 44 44. van Woudenberg, J.G.J., Witteman, M.F., Bakker, B.: Improving differential power analysis by elastic alignment. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 104–119. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3642-19074-2 8 45. Zaid, G., Bossuet, L., Habrard, A., Venelli, A.: Methodology for efficient CNN architectures in profiling attacks. IACR TCHES 2020(1) (2019). https://doi.org/ 10.13154/tches.v2020.i1.1-36 ` Oliva, A., Torralba, A.: Learning deep features 46. Zhou, B., Khosla, A., Lapedriza, A., for discriminative localization. In: CVPR (2016). https://doi.org/10.1109/CVPR. 2016.319 47. Zhou, Y., Standaert, F.-X.: Deep learning mitigates but does not annihilate the need of aligned traces and a generalized ResNet model for side-channel attacks. J. Cryptogr. Eng. 10(1), 85–95 (2019). https://doi.org/10.1007/s13389-019-00209-3

Towards Poisoning the Neural Collaborative Filtering-Based Recommender Systems Yihe Zhang1 , Jiadong Lou1 , Li Chen1 , Xu Yuan1(B) , Jin Li2 , Tom Johnsten3 , and Nian-Feng Tzeng1 1 University of Louisiana at Lafayette, Lafayette, LA 70503, USA {yihe.zhang1,jiadong.lou1,li.chen,xu.yuan,nianfeng.tzeng}@louisiana.edu 2 Guangzhou University, Guangzhou 510006, Guangdong, China [email protected] 3 University of South Alabama, Mobile, AL 36688, USA [email protected]

Abstract. In this paper, we conduct a systematic study for the very first time on the poisoning attack to neural collaborative filtering-based recommender systems, exploring both availability and target attacks with their respective goals of distorting recommended results and promoting specific targets. The key challenge arises on how to perform effective poisoning attacks by an attacker with limited manipulations to reduce expense, while achieving the maximum attack objectives. With an extensive study for exploring the characteristics of neural collaborative filterings, we develop a rigorous model for specifying the constraints of attacks, and then define different objective functions to capture the essential goals for availability attack and target attack. Formulated into optimization problems which are in the complex forms of non-convex programming, these attack models are effectively solved by our delicately designed algorithms. Our proposed poisoning attack solutions are evaluated on datasets from different web platforms, e.g., Amazon, Twitter, and MovieLens. Experimental results have demonstrated that both of them are effective, soundly outperforming the baseline methods.

1

Introduction

The recommender systems become prevalent in various E-commerce systems, social networks, and others, for promoting products or services to users of interest. The objective of a recommender system is to mine the intrinsic correlations of users’ behavioral data so as to predict the relevant objects that may attract users’ interest for promotion. Many traditional solutions leveraging the matrix factorization [16], association rule [6,8,22], and graph structure [10] techniques have been proposed by exploring such correlations to implement the recommender systems. Recently, the rapid advances in neural network (NN) techniques and their successful applications to diverse fields have inspired service c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 461–479, 2020. https://doi.org/10.1007/978-3-030-58951-6_23

462

Y. Zhang et al.

providers to leverage such emerging solutions to deeply learn the intrinsic correlations of historical data for far more effective prediction, i.e., recommendation, to improve users’ experiences in using their web services [5,7,14,23,27]. A survey in [25] has summarized a series of NN-based recommender systems for objects recommendation. The neural collaborative filtering-based recommender system is among these emerging systems that have been proposed or envisioned for use in Youtube [7], Netflix [20], MovieLen [4], Airbnb [12], Amazon [18], and others. However, existing studies have demonstrated that the traditional recommender systems are vulnerable to the poisoning attack [9,19]. That is, an attacker can inject some fake users and operations on the network objects to disrupt the correlation relationships among objects. Such disrupted correlations can lead the recommender system to make wrong recommendation or promote attacker’s specified objects, instead of original recommendation results that achieve its goal. For example, the poisoning attack solutions proposed in [19,24], and [9] have been demonstrated with high efficiency in the matrix factorization-based, association rule-based, and graph-based recommender systems, respectively. To date, the poisoning attacks on the NN-based recommender systems have yet to be explored. It is still unsettled if this attack is effective to NN-based recommender systems. Typically, such a recommender system takes all users’ M latest operations to form a historical training dataset, which is used to train the weight matrices existing between two adjacent layers in neural networks. With the well trained weight matrices, the NN can recommend a set of objects that have the high correlations (similarity probabilities) within historical data. The goal of this paper is to perform poisoning attacks in neural collaborative filtering-based recommender systems and verify its effectiveness. The general idea of our poisoning attack approach is to inject a set of specified fake users and data. Once the recommender system takes the latest historical data for training, the injected data can be selected and act effectively to change the trained weight matrices, thus leading to wrong recommendation results. Specifically, two categories of attacks are studied, i.e., availability attacks and target attacks. The first one aims to demote the recommended results by injecting poisoned data into the NN so as to change its output, i.e., each object’s recommendation probability. This category of attack refers to the scenario that some malicious users or service providers target to destroy the performance of other web applications’ recommender systems. The second one aims to not only distort the recommended results but also promote a target set of objects to users. This category of attack connotes the application scenario that some users or business operators aim at promoting their target products, against the web servers’ original recommendation. In both attacks, we assume an attacker will inject as few operations as possible to minimize the expense of an attack while yielding the maximum attack outcomes. This objective is due to the fact that an attack’s available resource may be limited and its detection should be avoided as best as possible. By defining the effective objective functions and modeling the resource constraints for each attack, we formulate each corresponding scenario into an optimization problem.

Poisoning Attack

463

Even though the formulated problems are in the complex form of non-linear and non-convex programming, we design effective algorithms based on the gradient descent technique to solve them efficiently. To validate the effectiveness of our proposed two attack mechanisms, we take the real-world datasets from Amazon, Twitter, and MovieLens as the application inputs for evaluation. Experimental results show that our availability and target attacks both substantially outperform their baseline counterparts. Specifically, our availability attack mechanism lowers the accuracy by at least 60%, 17.7%, and 58%, in Amazon, Twitter, and MovieLens, respectively, given the malicious actions of only 5% users. Main contributions of this work are summarized as follows: – We are the first to propose poisoning attack frameworks on the NN-based recommender systems. Through experiments, we have successfully demonstrated that the neural collaborative filtering-based recommender systems are also vulnerable to the poisoning attacks. This calls for the service providers to take into account the potential impact of poisoning attacks in the design of their NN-based recommender systems. – We present two types of poisoning attacks, i.e., availability attack and target attack, with the aim to demote recommendation results and promote the target objects, respectively. Through rigorous modeling and objective function creation, both types of attacks are formulated as the optimization problems with the goal of maximizing an attacker’s profits while minimizing its involved operational cost (i.e., the amount of injected data). Two effective algorithms are designed respectively to solve the two optimization problems. – We implement our attack mechanisms in various real-world scenarios. Experimental results demonstrate that our poisoning attack mechanisms are effective in demoting recommended results and promoting target objects in the neural collaborative filtering-based recommender system.

2

Problem Statement

In this paper, we aim to design effective strategies for poisoning attacks on a general category of recommender systems based on neural collaborative filtering. Specifically, our goal is to achieve twofold distortions—demoting the recommended results and promoting the target objects—through availability attack and target attack, respectively. 2.1

Problem Setting

Given N users in the set N and D objects in the set D, a recommender system is designed to recommend a small subset of objects to each user based on an estimate of the user’s interest. In a neural collaborative-based recommender system, an NN is trained via historical data from each user to learn the probabilities of recommending objects to a given user, based on which the top-K objects will be selected for recommendation.

464

Y. Zhang et al.

The historical operation behaviors (explicit [3] or implicit [15]) of a user can be learnt by a neural collaborative-based recommender system, such as marking, reviewing, clicking, watching, or purchasing. In practice, it typically truncates the M latest objects operated from each user n in the historical data, denoted as un = (u1n , u2n , ..., uM n ), for learning. Let U denotes the entire dataset from all users, i.e., U = {un |n ∈ N }. The neural collaborative filtering model can be modeled as a function of C K (F (N , D)) = P K , which takes N and D as the input, and outputs the recommendation probability of each object j, i.e., pn = {pn (j)|j ∈ D}, for each user n. Here, F represents the neural collaborative filtering model and pn (j) satisfies 0 ≤ pn (j) ≤ 1 and pn (1) + pn (2) + ... + pn (D) = 1 for n ∈ N . C K (·) function indicates the top-K objects selected according to the recommendation probabilities. For each user 1 2 K n, top-K recommended objects are denoted as rK n = (rn , rn , ..., rn ), and their K corresponding scores (i.e., probabilities) are represented as pn . The recommendation results of all users are denoted by R = {rK n |n ∈ N }. We perform poisoning attacks by injecting fake users into the network and enabling them to perform certain operations to distort the recommendation ˆ injected fake users with each perresults for normal users. Assume there are N u1n , u ˆ2n , ..., u ˆM forming at most M operations. Denote u ˆ n = (ˆ n ) as the objects ˆ } as the complete set of objects operated by a fake user n and Uˆ = {ˆ un |n ∈ N ˆ fake users. As the fake users disguise themselves over the normal users, of N the recommender system takes both normal users and fake users operations ˆ which naturally as the historical data for training, denoted by U˜ = U ∪ U, 1 2 = (˜ r ˜n , ..., r˜nK ) as the impacts the recommendation results. We denote ˜ rK n n, r top-K recommended objects for a normal user n after poisoning attacks and ˜ = {˜ denote R rK n |n ∈ N } as the recommended results for all users. The attacker will control the amount of poisoned data operations (i.e., fake users and their operations) to achieve its goal of demoting originally recommended objects or promoting target objects. 2.2

Attacks as Optimization Problems

We consider two types of poisoning attacks, i.e., availability attack and target attack, with different profit requirements. In what follows, we will formulate them as optimization problems with the objective of optimizing their respective profits constrained by the available resources. Availability Attack. The goal of availability attack is to demote the original recommendation results by poisoning data into the training dataset to change the NN output for the objects having top K highest probabilities. The profit of an attacker depends on the degree of distortion of the recommendation results. For its profit maximization, the attacker tries to achieve the largest discrepancy between recommendation results before and after the attack. We formulate an optimization problem in the following general form: ˜ OPT-A: min S(R, R) m ˆ ˆn ∈ {c1 , ..., cd }, s.t.U 0 ≤ B, u

(1)

Poisoning Attack

465

˜ represents the accuracy metric of the recommendation results where S(R, R) ˜ after the attack, as compared with those before the attack (R). B rep(R) resents the maximum limit of the poisoned data. With the L0-norm [21], the resource constraints address both limited fake users and their operations. The set of {c1 , ..., cd } represents a fake user’s available operations. For example, in Ebay, this set is {0, 1}, representing whether a user clicks one product or not; In MovieLens, this set is {0, 1, 2, 3, 4, 5}, representing the rating score of one movie. Target Attack. The goal of target attack is to not only distort the originally recommended results but also promote the target objects to users. Let T = {t1 , t2 , ..., tK } denote the K target objects that an attacker wishes to promote. We define a metric named the successful score, expressed as HT (·), to measure the fraction of normal users whose top-K recommendation results include the target objects after the attack. To maximize the successful score, an attacker not only promotes target objects to as many users as possible but also promote as many target objects as possible for each user. This two-faceted consideration will be incorporated in the mathematical expression of HT (·), to be discussed later. Such a problem is formulated in the following general form: ˜ OPT-T: max HT (R) m ˆ ˆn ∈ {c1 , ..., cd }. s.t.U 0 ≤ B, u

(2)

The difference between the problem formulations of the availability attack and the target attack (i.e., OPT-A and OPT-T) lies in their objective functions. We hope to design a general attack strategy, which can flexibly achieve different goals by switching the objective function in a lightweight manner.

3

Availability Attack in Recommender System

In this section, we present our strategy for performing the availability attack in neural collaborative filtering-based recommender systems. Assume that an attacker can create or operate a set of fake users and inject bogus data into the recommender system and distort the recommended results. The structure of our attack model is illustrated in Fig. 1, where the flow with solid arrow lines illustrates the general framework of a neural collaborative filtering model and the flow with the dashed arrow lines represents our attack framework. The NN ¯ j as inputs to calculate model takes a user vector xn and an object vector x the score pn (j), denoting the probability of recommending object j ∈ D to user n ∈ U. To perform the attack, an attacker collects a set of recommended results and train an attack table A, used to serve as the guideline for the attacker to determine the number of fake users and their operations. Details of the attack process are elaborated in the sequence text. 3.1

Attack Model Design

¯ j indicates one-hot encoding vectors that keep 1 in their In Fig. 1, xn and x corresponding categorical entries and leave all other entries as 0. W1 and W2 are

466

Y. Zhang et al.

Fig. 1. Our attack framework.

two weight matrices that encode user vectors and object vectors into respective dense vectors, as follows: ¯j , ¯ j = W2 x vn = W1 xn , v

(3)

¯ j indicate the user n’s encoding vector and the object j’s encoding where vn and v vector, respectively. Here, both user and object encoding vectors are set to the same size. We assume the attacker knows the neural collaborative filtering-based recommender system used, but its detailed design is in black-box. According to characteristic of this system, we calculate the similarity of two encoding vectors from a user n and an object j by taking their dot product as follows: yn (j) = ¯ j · vnT . Then, a Softmax layer is leveraged to calculate the probability of the v object j by considering all the objects in the dataset: T

ev¯ j vn . pn (j) = |D| T ¯ i vn v i=1 e

(4)

To train this model to get W1 and W2 , we employ the binary log likelihood [26] as follows: ¯j ) = Ljn log(σ(¯ vj vnT )) + (1 − Ljn ) log(σ(−¯ vj vnT )), G(vn , v

(5)

where Ljn = 1 when a user n has operated on an object j in the training dataset (positive sample) and Ljn = 0, otherwise (negative sample) [11]. Sigmoid function σ [13] is leveraged to calculate the similarity of two encoded vectors and Log likelihood is employed here for easy gradient calculation. Given the historical data U, we can express the total likelihood function as:     Ljn log(σ(¯ G(U) = vj vnT )) + (1 − Ljn ) log σ(−¯ vj vnT )) , (6) ¯n n∈N v ¯ j ∈V

Poisoning Attack

467

¯ n indicates the union of the positive and negative samples of a user n, where V which is sampled for efficiency consideration. To maximize G(U), the stochastic gradient decent method is leveraged to solve it iteratively. Based on our notations in Sect. 2, the number of normal users is N before the attack and the historical data of user operations is U = {un |1 ≤ n ≤ N }. ˆ users, with the whole set of operations Assume the attacker injects a total of N ˆ }. With both historical and newly injected datasets, a ˆ = {un |1 ≤ n ≤ N being U recommender system will maximize the following objective in the training phase:     ˜ U ˜) = G( Ljn log(σ(¯ vj vnT )) + (1 − Ljn ) log σ(−¯ vj vnT )) ¯n n∈N v ¯ j ∈V

+

  

¯n ˆ v ¯ j ∈V n∈N

 Ljn log(σ(¯ vj vnT )) + (1 − Ljn ) log σ(−¯ vj vnT )) ,

(7)

where the first summation term deals with the historical data of normal users while the second summation term is for the injected bogus data. To distort the recommended results through polluting NN training in Eq. (7), we first expand on the objective function of an availability attack presented in Sect. 2.2 and then model the constraints of injected fake users and their operations. With mathematical expressions of the goal and constraints, our availability attack is formulated and solved as an optimization problem, elaborated next. 3.2

Attack Objective Function

To distort the recommender system, the objective function of availability attack should be able to measure the discrepancy of recommended results before and ˜ As the recommender system typically recommends the after attack, i.e., S(R, R). top-K objects to a user, it is sufficient to consider only the top-K recommendations. Given the probabilities of all objects generated by the neural network, the ˜ will store the objects that have the top K highest recommended results R and R probabilities for each user before and after the attack, respectively. Before the attack, each object in the top-K recommendation list has a higher probability than any object after the K-th one. After the attack, if successful, at least one of the originally recommended objects, assuming object rnk ∈ R for a user n, will ˜ in the new ranking list, so have a lower probability than the K-th object r˜nK ∈ R that the object rnk will have a low chance to be recommended to the user. Thus, for each user n, we can define the following function to model the discrepancy of recommended results before and after attack: ⎧ rnK ); ⎨ 1, p˜n (rnk ) < p˜n (˜ k K K k k rnK ); sgn[(pn (rn ) − pn (rn )) · (˜ pn (˜ rn ) − p˜n (rn ))] = 0, p˜n (rn ) = p˜n (˜ ⎩ k −1, p˜n (rn ) > p˜n (˜ rnK ). where pn (rnk ) and pn (rnK ) represent the probabilities of object rnk and the KrnK ) th object, respectively, in the recommendation list R, while p˜n (rnk ) and p˜n (˜ k stand for the probabilities of an object rn and the Kth object in the recommen˜ after the attack. Note that pn (rk ) and pn (rK ) are known (i.e., dation results R n n

468

Y. Zhang et al.

constant) while p˜n (rnk ) and p˜n (˜ rnK ) are variables, resulted from the poisoning attack. The intuition of this equation is explained as follows. After the attack, if an object rnk that is previously in the recommendation list R has a lower ˜ rnk will not appear in the top-K recprobability than the K-th object r˜nK ∈ R, ommendation list, and thus the function sgn(·) returns 1, indicating a successful attack. Otherwise, sgn(·) returns 0 or −1, meaning an unsuccessful attack. Furthermore, considering all the users, the discrepancy of recommended ˜ is expressed as follows: results, i.e., S(R, R), ˜ = S(R, R)

N  K  1 n=1 k=1

2

{1 − sgn[(pn (rnk ) − pn (rnK )) · (˜ pn (˜ rnK ) − p˜n (rnk ))]},

(8)

where 12 (1 − sgn(·)) is used to map the inner part over a value between 0 and ˜ implies minimizing the recommendation accuracy after 1. Minimizing S(R, R) poisoning attack, and it is equivalent to maximizing the discrepancy of recommended results before and after the attack. However, Eq. (8) is not continuous, unable to be solved directly. Based on the characteristics of 12 (1−sgn(x)), we use 1 to approximate it by setting θ to an approximate value. After refor1 − exp(−θx) ˜ mulating Eq. (8), we obtain the following objective function for min S(R, R): min

K N  

{1 −

n=1 k=1

1 }, 1 + exp[−θ1 (pn (rnk ) − pn (rnK )) · (˜ pn (˜ rnK ) − p˜n (rnk ))]

(9)

where θ1 is a positive parameter with its value adjusted into a suitable range. 3.3

Attack Constraints

We present the resource constraints for the injected fake users and operations. Budget Constraints. To meet the resource limitation, the amount of injected fake users and operations is constrained by a budget B, expressed as follows: Uˆ0 ≤ B ,

(10)

ˆ 0 represents where Uˆ denotes the set of fake users and their operations while U the L0 -norm of their operations. Operation Constraints. The operations of fake users are reflected in the attack table A as shown in Fig. 1, which is used by an attacker to determine the operations of each fake user on each object. Attack table A is defined with the dimenˆ × D, with rows and columns representing fake users and objects. sions of N Each entry δnj is a weight value in the range of [0, 1], indicating the likelihood that a fake user n interacts with object j. By adjusting the weight values, the attacker can change the strategies of operating its fake users, thus bending the recommendation results. Each entry can be approximated as follows: δnj =

1 (tanh(anj ) + 1) , 2

(11)

Poisoning Attack

469

where tanh(·) represents the hyperbolic tangent function. We let each fake user n operate on at most M objects, i.e., u ˆn = ˆ2n , ..., u ˆM ). For a given fake user’s dense vector v and an object’s vec(ˆ u1n , u n n ¯ j , we can define our expected output of object j for a fake user n as a tor v reference function y˜n (j): ¯ j vnT ) + (1 − δnj ) · σ(−θn v ¯ j vnT ), y˜n (j) = δnj · σ(θp v

(12)

where θp and θn are shape parameters which adjust the sensitivity of userobject relationship, with the value of 0.5 as a threshold. If δnj is larger than this threshold, the attacker activates the operation. Since no attacker can manipulate normal users, δnj is a variable only related to fake users. For normal users, δnj is set to 1 if user i has relationship with object j, or set to 0, otherwise. 3.4

Solving the Optimization Problem

Based on our discussion, the availability attack can be reformulated as follows: ˜ min S(R, R) ˜ U) ˜ s.t.{W1 , W2 } ← argmax G( ˆ ˆ 0 ≤ B, u ˆm ˆm U n ∈ {c1 , ..., cd }, u n ∈ U.

(13)

˜ and then solve the inner objecWe first solve the outer objective of min S(R, R) ˜ U˜). tive of argmax G( ¯k ) to represent the inner part of Eq. (9), i.e., We use Ea (vn , v ¯k ) = 1 − Ea (vn , v

1 . 1 + exp[−θ1 (pn (rnk ) − pn (rnK )) · (˜ pn (˜ rnK ) − p˜n (rnk ))]

(14)

Here, pn (rnk ) and pn (rnK ) are constants and can be obtained from Eq. (12) by rnK ) are variables, derived via Eq. (12) by setting δnj to 0 or 1. p˜n (rnk ) and p˜n (˜ updating matrix A. To reduce the computation cost, p˜n (˜ rnK ) will be updated only after we have finished one round of calculation on all objects and users. If our attack is successful, E will be close to 0. To model the goal of taking as few operations as possible, the L1 norm is added to the inner part of objective function Eq. (14), resulting in a new function, ¯ k ): denoted as lossa (vn , v ¯ k ) = Ea (vn , v lossa (vn , v ¯k ) + c|δnk |,

(15)

where c is a constant that adjusts the importance of two objective terms. Now, we use two phases to solve Eq. (15). Phase I: This phase aims to update δ values in A, by minimizing the loss function lossa , with a two-step iterative procedure stated as follows: Step 1: We initialize the random value for each a in Eq. (11) and employ the gradient descent method to update a iteratively, as follows: aik := aik − ηa ∇a lossa (vn , vk ),

(16)

470

Y. Zhang et al.

where ηa is the step size of updating a. Step 2: We fix the attack table A and update the ranking of reference recommendation scores of all objects using Eq. (12). According to the new ranking, we can find the object r˜nK∗ . The two steps repeat until the convergence criterion is satisfied, i.e., the gradient difference between two consecutive iterations is less than a threshold. Phase II: This phase is to add the fake users and update the vector representation of injected objects. Again, a two-step iterative procedure is used: Step 3: Assume the attacker will control one fake user n and select S objects to perform the fake operations, we have: S = min{ NB , M }, where N  is the total number of fake users. Step 4: After adding a fake user and its operations to the training dataset, we ¯j train new W1 and W2 by solving the inner optimization problem. Let vn and v denote the user vector and object vector, respectively. For an easy expression, we ¯ j ) and then calculate denote the terms inside the summation in Eq. (7) as L(vn , v ¯ j ) with respect to vn and v ¯ j , i.e., the derivative of L(vn , v ¯j ) = ∇j L(vn , v

¯j ) ∂L(vn , v = (Ljn − σ(¯ vj vnT ))vnT . ∂¯ vj

(17)

¯ j ), where ηj is the Then, we update vj as follows: v ¯j := v ¯j + ηj · ∇j L(vn , v vj ) n ,¯ ¯ j ) = ∂L(v update step. With respect to vn , we have: ∇n L(vn , v = (Ljn − ∂¯ vj T σ(¯ vj vn ))¯ vj . Then, for each vi employed to aggregate vn , we have:  ¯ j )/(M − 1) , vnT := vnT + ηn · ∇n L(vn , v (18) j

where ηn is the step size to update vn . Step 4 repeats until the convergence criterion is satisfied, i.e., the object vector expression result changes negligibly. Phase 1 and Phase 2 will repeat until the budget constraints are met or the attack purpose is achieved.

4

Target Attack

The goal of target attack is to promote specific objects to the normal users. The key challenge lies in designing an expression to measure the successful score of a targeted object. Denote T = {t1 , t2 , ..., tK } as the target objects that an attacker wishes to promote to normal users. An attacker aims to promote the target objects in T to the top-K recommendation list by making them ranked higher than the highest recommended item rn1 , formulated as follows: ⎧ ⎨ −1, p˜n (rnt ) > p˜n (rn1 ); t 1 t 1 pn (ˆ rn ) − p˜n (rn ))] = 0, p˜n (rnt ) = p˜n (rn1 ); sgn[(pn (rn ) − pn (rn )) · (˜ (19) ⎩ 1, p˜n (rnt ) < p˜n (rn1 ).

Poisoning Attack

471

In this function, rnt and rn1 are constant, which are known for a recommender system before attack. After performing poisoning attack, a success results in r˜nt ranked higher than r˜n1 and sgn(x) returning −1; otherwise, the sgn(x) returns 1 or 0. Its corresponding successful score can be expressed by: ˜ = HT (R)

N  K  1 n=1 t=1

2

{1 − sgn[(pn (rnt ) − pn (rn1 )) · (˜ pn (ˆ rnt ) − p˜n (rn1 ))]}.

(20)

To make this problem solvable, the successful score is approximated as follows: max

K N  

{1 −

n=1 t=1

1 }. 1 + exp[−θ1 (pn (rnt ) − pn (rn1 )) · (˜ pn (ˆ rnt ) − p˜n (rn1 ))]

(21)

The attack constraints illustrated in Sect. 3.3, i.e., budget and operation constraints, can also apply to the target attack. Thus, target attack can be formulated as follows: ˜ max HT (R)

(22)

˜ U ˜) s.t. {W1 , W2 } = argmax G( m ˆ 0 ≤ B, u ˆ U ˆn ∈ {c1 , ..., cd }, u ˆm n ∈ U. Like Eq. (14), we have: ¯ t ) = Et (vn , v ¯t ) + c|δnt |, losst (vn , v

(23)

¯t ) is the inner part of Eq. (21). where Et (vn , v We can follow the similar procedure as in Phases 1 and 2 of Sect. 3.4 to solve this problem iteratively.

5

Experiments

We implement the proposed attack frameworks and evaluate them by using three real-world datasets from Amazon, Twitter and MoiveLens, as outlined next. 5.1

Datasets

Amazon [2]. This dataset contains the item-to-item relationships, e.g., a customer who bought one product X also purchased another product Y . The historical purchasing records are used by a recommender system to recommend product to a user. In our experiments, we take the data in the Beauty category as our dataset, which includes 1000 users and 5100 items. Twitter [17]. The social closeness of users, represented by friend or follower relationships, is provided in this dataset. Specifically, we use 1000 users and 3457 friendships in our evaluation. MoiveLens [1]. This dataset is from a non-commercial and personalized movie recommender system collected by GroupLens, consisting of 1000 users’ ratings on 1700 movies. The rating scores range from 0 to 5, indicating the preference of users for movies.

472

5.2

Y. Zhang et al.

Comparison

The state-of-the-art poisoning attacks target specific categories of recommender systems: [9,19,24] aiming to the matrix factorization based, association rule based, and graph based recommender systems, respectively. There is no straightforward method to apply them to the neural network-based recommender systems, making it infeasible to compare our solution with them. As far as we know, there is no existing poisoning attack solution for neural collaborative filteringbased recommender systems. Therefore, we propose to compare our solutions with the baseline methods, as described next, to show their performance gains. Baseline Availability Attack. An attacker randomly selects M objects, with the number of injected fake users ranging from 1% to 30% of the users in the historical dataset. Each fake user operates on these M objects to poison the historical dataset. To be specific, in Amazon, each fake user purchases M selected products; in Twitter, each fake user follows M other users; in MovieLens, each user chooses M movies and rates them with random scores. Baseline Target Attack. An attacker has a set of target objects T . Each fake user randomly selects a target object from T and then selects another M − 1 popular objects in the network to operate on. The purpose of such a selection is to build close correlation of the target object and the popular objects so as to have the chance of being recommended. With respect to the selection of target objects in this attack, we consider two strategies: 1) selecting objects randomly and 2) selecting unpopular objects. 5.3

Performance of Availability Attack

We implement the neural collaborative filtering-based recommender systems for the Amazon, Twitter and MovieLens datasets and then perform both our poisoning attack solutions and the baseline availability attack to distort the recommended results. We set K = 30, i.e., recommending the top 30 objects.

Fig. 2. Comparison of recommendation accuracy of our availability attack and the baseline method in Amazon with fake users portions varying from 0 to 30%.

Fig. 3. The recommendation accuracy of our availability attack for the Amazon dataset under different amounts of users in test dataset.

Poisoning Attack

473

Amazon. There are 1000 users selected, each with the latest M = 8 purchased products as the historical dataset U. The neural collaborative filtering-based recommender system takes the historical dataset for training and then makes recommendations. We first randomly select 10 users for testing. Figure 2 shows the recommendation accuracy after performing our availability attack and baseline attack with the portion (the percentage of inject fake users over the total normal users in the historical data) of injected fake users increasing from 1% to 30%, i.e., from 10 to 300 users. From this figure, it is obvious that our availability attack significantly outperforms the baseline method in terms of reducing the recommendation accuracy. Especially, the accuracy of recommendation drops by 60%, 80% and 90% when the percentages of fake users are 5%, 15% and 30%, respectively, in our availability attack. However, the recommendation accuracy drops only by 0.1%, 2% and 9%, respectively, in the baseline attack. We now exam our solutions on various amounts of testing users, including 10, 100 and 1000. Figure 3 illustrates the recommendation accuracy outcomes, given different amounts of testing users when injecting the fake user count ranging from 1% to 30% in our attack. Intuitively, to achieve the same attack goal (in terms of the accuracy drop rate), more fake users need to be injected to change the correlation relationships of more testing users. However, our result is seen to be highly effective, i.e., recommendation accuracy drops less when the amount of testing users rises. Specifically, to achieve a 60% accuracy decrease, our solution only needs to inject 8% and 10% fake users respectively under 100 and 1000 users in the test set. This demonstrates that an attacker can achieve an excellent attack goal by resorting to only a small number of fake users. Twitter. We select 1000 users and their corresponding relationships with other users sampled from [17] as the training dataset, while another 1000 users and their relationships are selected for testing. As it is hard to identify the latest friends in Twitter, we let each user randomly select 8 friends. 10 users are randomly selected from the test dataset as the targets to perform our proposed attack and the baseline availability attack. Figure 4 demonstrates the recommendation accuracy after injecting the fake user count in the range of 1% to 30% into the Twitter dataset. From this figure, the recommendation accuracy from our attack is found to drop by 17.7%, 45.2% and 61.8%, when the portions of fake users are 5%, 15% and 30%, respectively. Compared to the accuracy results attained by the baseline availability attack, which drops by only 0.07%, 1.5% and 5.7%, respectively, we conclude that our attack is starkly more effective. To show the effectiveness of our attack on different types of users, we select 100 popular users that are followed by most users and 100 unpopular users that do not have any follower. Besides, we randomly select another 100 users as the third group of target users. Figure 5 shows the results after our attack on the three groups of users: the recommendation accuracy drops slower from the popular dataset than from both random and unpopular datasets, with the accuracy from the unpopular dataset dropping fastest. The reason is that the popular users have more knitted relationships with other users, thus requiring an attacker to invoke more operations to change such correlated friendships.

474

Y. Zhang et al.

However, for an unpopular dataset, the friendships relationship is much lighter, thus making it far easier for our attack to change its correlation with others. MovieLens. We select 1000 users and their rating scores in a total of 1700 movies sampled from [1] as the historical data and use other 10 users as the testing data. The portion of fake users increases from 0% to 5%, 10% and 30% when conducting our attack while K value increases from 1 to 30. Figure 6 depicts the recommendation accuracy of our attack with an increase in the K value under various portions of fake users. In this figure, we can see the recommendation accuracy drops less when K increases. The reason is that with a higher K value (more recommendation results), the fake users need to change more correlations among movies, thus lowering the successful rate. But with the portion of fake users rising, the recommendation accuracy can drop more as the attacker invokes more efforts to change the correlations among movies. Specifically, with 5% injected fake users, the recommendation accuracy drops by more than 50%. This demonstrates the superior effectiveness of our attack on the MovieLens. Figure 7 shows the recommendation accuracy of baseline availability attack. It is seen the recommendation accuracy drops less than that from our attack (Fig. 6). Even when the portion of fake users rises to 30%, it drops less than 14%, which is still worse than our attack with only 1% fake users injected. Thus, the baseline solution is clearly outperformed by our attack. 5.4

Performance of Target Attack

We define a metric hit ratio to indicate the fraction of normal users whose top K recommendations contain the target items, after the attack. Amazon. We use the same historical data in Amazon as in Sect. 5.3 and select 1000 other users for testing. 30 products outside the original top 30 recommendation list are randomly selected as our targets. Figure 8 compares hit ratios of our

Fig. 4. Comparison of recommendation accuracy of our availability attack and the baseline method in Twitter under a range of fake users portions.

Fig. 5. Recommendation accuracy of availability attack in Twitter with respect to the popular, random and unpopular user datasets.

Poisoning Attack

475

target attack and those of the baseline method for various amounts of injected fake users, ranging from 1% to 30%. It is seen that our target attack significantly outperforms the baseline approach. Specifically, our attack can achieve the hit ratios of 4.83%, 18.00% and 40.20%, respectively, under the injected fake user count of 5%, 15% and 30%. However, the baseline method can only attain the hit ratios of 0.00%, 0.7% and 5.36%, respectively. We then fix the amount of fake users to 5% of all normal users in historical data (i.e., 50 users) and change the amounts of both desired recommended results and target products. We randomly select the amount of target products from 1, 5, 10, to 30, while varying K from 1 to 5, 10 and 30. Table 1 lists the hit ratios from our attack and the baseline method. The hit ratios of both our attack and the baseline method are found to increase with a larger K or more target products. However, our attack always achieves a much higher hit ratio than the baseline counterpart. For example, when the amount of target products is 10, the hit ratios increase from 0.06% to 1.85% in our attack and from 0 to 0.09% in the baseline method, respectively, when K varies from 1 to 30. Twitter. We use the same sampled training and test datasets from Sect. 5.3 to perform the target attack. The portion of fake users is fixed to 10%, i.e., 100 users, and K ranges from 1 to 5, 10 and 30. The target users are randomly selected, with the user number varying from 1 to 5, 10 and 30. Table 2 provides the hit ratios of our attack and the baseline method. It is seen that the hit ratios increase with a larger K or more target users in both our attack and the baseline counterpart. But our attack significantly outperforms the baseline method. Especially, when K = 30, our attack can achieve the hit ratios of 0.39%, 1.02%, 2.21% and 4.62% with the number of target users varying from 1 to 5, 10 and 30, respectively, while the baseline method reaches the respective hit ratios of 0.05%, 0.08%, 0.09 and 0.18%.

Fig. 6. Recommendation accuracy of availability attack in MovieLens with different portions of fake users and various K values.

Fig. 7. Recommendation accuracy of baseline availability attack in MovieLens with different portions of fake users and various K values.

476

Y. Zhang et al.

Fig. 8. Comparison of our target attack and the baseline method in Amazon under a range of fake users portions in the dataset. Table 1. Hit ratios of our target attack and the baseline method in the Amazon dataset under a range of K values for a range of target products Hit ratio (%)

K 1

5

10

30

# of target products (Our attack)

1 5 10 30

0.01 0.03 0.06 0.08

0.03 0.10 0.15 0.27

0.08 0.54 1.17 1.32

0.31 1.02 1.85 2.17

# of target products (Baseline)

1 5 10 30

0.00 0.00 0.00 0.00

0.00 0.01 0.02 0.03

0.01 0.02 0.04 0.05

0.03 0.04 0.09 0.11

Table 2. Hit ratios of our target attack and the baseline method in Twitter dataset under a range of K values for a range of target products Hit ratio (%)

K 1

5

10

30

# of target users (Our attack)

1 5 10 30

0.02 0.04 0.07 0.11

0.03 0.12 0.20 0.49

0.011 0.66 1.72 2.01

0.39 1.02 2.21 4.62

# of target users (Baseline)

1 5 10 30

0.01 0.01 0.01 0.02

0.01 0.01 0.02 0.04

0.03 0.04 0.05 0.07

0.05 0.08 0.09 0.18

Poisoning Attack

477

MovieLens. We use the same training dataset from Sect. 5.3 and sample three categories of test dataset: random, unpopular, and low-ranking movies. For each category, we randomly select 10 movies as the target set. For recommendation, we consider the top 30 ones. Figure 9(a) shows the hit ratios of our attack under the three categories of target movie set. From this figure, we can see the hit ratios of the unpopular and low ranking movie sets are lower than those from the random set. The reason is that the unpopular and low ranking targets have fewer correlations with others, thus making it much harder to promote them for recommendations when compared with the random target set. Still, our attack achieves 23% and 31% hit ratios by injecting 30% fake users. This demonstrates the advantages of our proposed attack on promoting products. Figure 9(b) depicts the hit ratios of baseline target attack. Compared to Fig. 9(a), the hit ratios under baseline attack are much lower than those from our attack. Even in the random target movie set, the baseline has its hit ratio of only 4.8% with the injection of 30% fake users.

Fig. 9. Comparisons of hit ratios of baseline and our target attack in MovieLens on the random, unpopular, low-ranking target sets for a range of fake users portions.

6

Conclusion

This paper proposes the first poisoning attack framework targeting the neural collaborative filtering-based recommender system. We have studied two types of poisoning attacks: the availability attack and the target attack, with the goals of demoting recommendation results and promoting specific target objects, respectively. By developing the mathematical models based on the resource constraints and establishing objective functions according to attacker’s goals, we formulated these two types of attacks into optimization problems. Through algorithm designs, we solved the proposed optimization problems effectively. We have implemented the proposed solutions and conducted our attacks on the realworld datasets from Amazon, Twitter and MovieLens. Experimental results have demonstrated that the proposed availability attack and target attack are highly

478

Y. Zhang et al.

effective in demoting recommendation results and promoting specific targets, respectively, in the neural collaborative filtering-based recommender systems. Acknowledgement. This work was supported in part by NSF under Grants 1948374, 1763620, and 1652107, and in part by Louisiana Board of Regents under Contract Numbers LEQSF(2018-21)-RD-A-24 and LEQSF(2019-22)-RD-A-21. Any opinion and findings expressed in the paper are those of the authors and do not necessarily reflect the view of funding agency.

References 1. grouplens.org (2019). https://grouplens.org/datasets/movielens/100k/ 2. Mcauley, J.: (2019). http://jmcauley.ucsd.edu/data/amazon/ 3. Amatriain, X., Pujol, J.M., Tintarev, N., Oliver, N.: Rate it again: increasing recommendation accuracy by user re-rating. In: Proceedings of the Third ACM Conference on Recommender Systems, pp. 173–180 (2009) 4. Bai, T., Wen, J.-R., Zhang, J., Zhao, W.X.: A neural collaborative filtering model with interaction-based neighborhood. In: Proceedings of the ACM Conference on Information and Knowledge Management, pp. 1979–1982 (2017) 5. Barkan, O., Koenigstein, N.: Item2vec: neural item embedding for collaborative filtering. In: Proceedings of the 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6 (2016) 6. Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43–52 (1998) 7. Covington, P., Adams, J., Sargin, E.: Deep neural networks for YouTube recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198 (2016) 8. Davidson, J., et al.: The YouTube video recommendation system. In: Proceedings of the Fourth ACM Conference on Recommender Systems, pp. 293–296 (2010) 9. Fang, M., Yang, G., Gong, N.Z., Liu, J.: Poisoning attacks to graph-based recommender systems. In: Proceedings of the 34th Annual Computer Security Applications Conference (ACSAC), pp. 381–392 (2018) 10. Fouss, F., Pirotte, A., Renders, J.-M., Saerens, M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355–369 (2007) 11. Goldberg, Y., Levy, O.: Word2vec explained: deriving Mikolov et al’.s negativesampling word-embedding method (2014) 12. Grbovic, M., Cheng, H.: Real-time personalization using embeddings for search ranking at airbnb. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 311–320 (2018) 13. Han, J., Moraga, C.: The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Mira, J., Sandoval, F. (eds.) IWANN 1995. LNCS, vol. 930, pp. 195–201. Springer, Heidelberg (1995). https://doi.org/10.1007/3-54059497-3 175 14. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.-S.: Neural collaborative filtering. In: Proceedings of the 26th International Conference on World Wide Web (WWW), pp. 173–182 (2017)

Poisoning Attack

479

15. Hu, Y., Koren, Y., Volinsky, C.: Collaborative filtering for implicit feedback datasets. In: Proceedings of the Eighth IEEE International Conference on Data Mining, pp. 263–272. IEEE (2008) 16. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. IEEE Comput. 8, 30–37 (2009) 17. Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web (WWW), pp. 591–600 (2010) 18. Larry, H.: The history of amazon’s recommendation algorithm. https://www. amazon.science/the-history-of-amazons-recommendation-algorithm (2020) 19. Li, B., Wang, Y., Singh, A., Vorobeychik, Y.: Data poisoning attacks on factorization-based collaborative filtering. In: Proceedings of the Advances in Neural Information Processing Systems (NIPS), pp. 1885–1893 (2016) 20. Liang, D., Krishnan, R.G., Hoffman, M.D., Jebara, T.: Variational autoencoders for collaborative filtering. In: Proceedings of the 27th World Wide Web Conference, pp. 689–698 (2018) 21. Mancera, L., Portilla, J.: L0-norm-based sparse representation through alternate projections. In: Proceedings of the IEEE International Conference on Image Processing, pp. 2089–2092 (2006) 22. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web (WWW), pp. 285–295 (2001) 23. Wang, H., Wang, N., Yeung, D.-Y.: Collaborative deep learning for recommender systems. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244 (2015) 24. Yang, G., Gong, N.Z., Cai, Y.: Fake co-visitation injection attacks to recommender systems. In: Proceedings of the 24th Annual Network and Distributed System Security Symposium (NDSS) (2017) 25. Zhang, S., Yao, L., Sun, A., Tay, Y.: Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. (CSUR) 52(1), 5 (2019) 26. Zhang, Z., Sabuncu, M.: Generalized cross entropy loss for training deep neural networks with noisy labels. In: Proceedings of the Advances in Neural Information Processing Systems (NIPS), pp. 8778–8788 (2018) 27. Zhou, C., et al.: Atrank: an attention-based user behavior modeling framework for recommendation. In: Proceedings of the 31th AAAI Conference on Artificial Intelligence (2018)

Data Poisoning Attacks Against Federated Learning Systems Vale Tolpegin, Stacey Truex(B) , Mehmet Emre Gursoy, and Ling Liu Georgia Institute of Technology, Atlanta, GA 30332, USA {vtolpegin3,staceytruex,memregursoy}@gatech.edu, [email protected]

Abstract. Federated learning (FL) is an emerging paradigm for distributed training of large-scale deep neural networks in which participants’ data remains on their own devices with only model updates being shared with a central server. However, the distributed nature of FL gives rise to new threats caused by potentially malicious participants. In this paper, we study targeted data poisoning attacks against FL systems in which a malicious subset of the participants aim to poison the global model by sending model updates derived from mislabeled data. We first demonstrate that such data poisoning attacks can cause substantial drops in classification accuracy and recall, even with a small percentage of malicious participants. We additionally show that the attacks can be targeted, i.e., they have a large negative impact only on classes that are under attack. We also study attack longevity in early/late round training, the impact of malicious participant availability, and the relationships between the two. Finally, we propose a defense strategy that can help identify malicious participants in FL to circumvent poisoning attacks, and demonstrate its effectiveness. Keywords: Federated learning · Adversarial machine learning flipping · Data poisoning · Deep learning

1

· Label

Introduction

Machine learning (ML) has become ubiquitous in today’s society as a range of industries deploy predictive models into their daily workflows. This environment has not only put a premium on the ML model training and hosting technologies but also on the rich data that companies are collecting about their users to train and inform such models. Companies and users alike are consequently faced with 2 fundamental questions in this reality of ML: (1) How can privacy concerns around such pervasive data collection be moderated without sacrificing the efficacy of ML models? and (2) How can ML models be trusted as accurate predictors? Federated ML has seen increased adoption in recent years [9,17,40] in response to the growing legislative demand to address user privacy [1,26,38]. Federated learning (FL) allows data to remain at the edge with only model parameters being shared with a central server. Specifically, there is no centralized data curator who collects and verifies an aggregate dataset. Instead, each c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 480–501, 2020. https://doi.org/10.1007/978-3-030-58951-6_24

Data Poisoning Attacks Against Federated Learning Systems

481

data holder (participant) is responsible for conducting training on their local data. In regular intervals participants are then send model parameter values to a central parameter server or aggregator where a global model is created through aggregation of the individual updates. A global model can thus be trained over all participants’ data without any individual participant needing to share their private raw data. While FL systems allow participants to keep their raw data local, a significant vulnerability is introduced at the heart of question (2). Consider the scenario wherein a subset of participants are either malicious or have been compromised by some adversary. This can lead to these participants having mislabeled or poisonous samples in their local training data. With no central authority able to validate data, these malicious participants can consequently poison the trained global model. For example, consider Microsoft’s AI chat bot Tay. Tay was released on Twitter with the underlying natural language processing model set to learn from the Twitter users it interacted with. Thanks to malicious users, Tay was quickly manipulated to learn offensive and racist language [41]. In this paper, we study the vulnerability of FL systems to malicious participants seeking to poison the globally trained model. We make minimal assumptions on the capability of a malicious FL participant – each can only manipulate the raw training data on their device. This allows for non-expert malicious participants to achieve poisoning with no knowledge of model type, parameters, and FL process. Under this set of assumptions, label flipping attacks become a feasible strategy to implement data poisoning, attacks which have been shown to be effective against traditional, centralized ML models [5,44,50,52]. We investigate their application to FL systems using complex deep neural network models. We demonstrate our FL poisoning attacks using two popular image classification datasets: CIFAR-10 and Fashion-MNIST. Our results yield several interesting findings. First, we show that attack effectiveness (decrease in model utility) depends on the percentage of malicious users and the attack is effective even when this percentage is small. Second, we show that attacks can be targeted, i.e., they have large negative impact on the subset of classes that are under attack, but have little to no impact on remaining classes. This is desirable for adversaries who wish to poison a subset of classes while not completely corrupting the global model to avoid easy detection. Third, we evaluate the impact of attack timing (poisoning in early or late rounds of FL training) and the impact of malicious participant availability (whether malicious participants can increase their availability and selection rate to increase effectiveness). Motivated by our finding that the global model may still converge accurately after early-round poisoning stops, we conclude that largest poisoning impact can be achieved if malicious users participate in later rounds and with high availability. Given the highly effective poisoning threat to FL systems, we then propose a defense strategy for the FL aggregator to identify malicious participants using their model updates. Our defense is based on the insight that updates sent from malicious participants have unique characteristics compared to honest participants’ updates. Our defense extracts relevant parameters from the

482

V. Tolpegin et al.

high-dimensional update vectors and applies PCA for dimensionality reduction. Results on CIFAR-10 and Fashion-MNIST across varying malicious participant rates (2–20%) show that the aggregator can obtain clear separation between malicious and honest participants’ respective updates using our defense strategy. This enables the FL aggregator to identify and block malicious participants. The rest of this paper is organized as follows. In Sect. 2, we introduce the FL setting, threat model, attack strategy, and attack evaluation metrics. In Sect. 3, we demonstrate the effectiveness of FL poisoning attacks and analyze their impact with respect to malicious participant percentage, choice of classes under attack, attack timing, and malicious participant availability. In Sect. 4, we describe and empirically demonstrate our defense strategy. We discuss related work in Sect. 5 and conclude in Sect. 6. Our source code is available1 .

2

Preliminaries and Attack Formulation

2.1

Federated Machine Learning

FL systems allow global model training without the sharing of raw private data. Instead, individual participants only share model parameter updates. Consider a deep neural network (DNN) model. DNNs consist of multiple layers of nodes where each node is a basic functional unit with a corresponding set of parameters. Nodes receive input from the immediately preceding layer and send output to the following layer; with the first layer nodes receiving input from the training data and the final layer nodes generating the predictive result. In a traditional DNN learning scenario, there exists a training dataset D = (x1 , ..., xn ) and a loss function L. Each xi ∈ D is defined as a set of features fi and a class label ci ∈ C where C is the set of all possible class values. The final layer of a DNN architecture for such a dataset will consequently contain |C| nodes, each corresponding to a different nclass in C. The loss of this DNN given parameters θ on D is denoted: L = n1 i L(θ, xi ). When fi is fed through the DNN with model parameters θ, the output is a set of predicted probabilities pi . Each value pc,i ∈ pi is the predicted probability that xi has a class value c, and pi contains a probability pc,i for each class value c ∈ C. Each predicted probability pc,i is computed by a node nc in the final layer of the DNN architecture using input received from the preceding layer and nc ’s corresponding parameters in θ. The predicted class for instance xi given a model M with parameters θ then becomes Mθ (xi ) = argmaxc∈C pc,i . Given a cross entropy loss  function, the loss on xi can consequently can be calculated as L(θ, xi ) = − c∈C yc,i log(pc,i ) where yc,i = 1 if c = ci and 0 otherwise. The goal of training a DNN model then becomes to find the parameter values for θ which minimize the chosen loss function L. The process of minimizing this loss is typically done through an iterative process called stochastic gradient descent (SGD). At each step, the SGD algorithm 1

https://github.com/git-disl/DataPoisoning FL.

Data Poisoning Attacks Against Federated Learning Systems

483

(1) selects a batch of samples B ⊆ D, (2) computes the corresponding gradi 1 ent gB = |B| x∈B ∇θ L(θ, x), and (3) then updates θ in the direction −gB . In practice, D is shuffled and then evenly divided into |B| sized batches such that each sample occurs in exactly one batch. Applying SGD iteratively to each of the pre-determined batches is then referred to as one epoch. In FL environments however, the training dataset D is not wholly available at the aggregator. Instead, N participants P each hold their own private training dataset D1 , ..., DN . Rather than sharing their private raw data, participants instead execute the SGD training algorithm locally and then upload updated parameters to a centralized server (aggregator). Specifically, in the initialization phase (i.e., round 0), the aggregator generates a DNN architecture with parameters θ0 which is advertised to all participants. At each global training round r, a subset Pr consisting of k ≤ N participants is selected based on availability. Each participant Pi ∈ Pr executes one epoch of SGD locally on Di to obtain updated parameters θr,i , which  are sent to the aggregator. The aggregator sets the global parameters θr = k1 i θr,i ∀i where Pi ∈ Pr . The global parameters θr are then advertised to all N participants. These global parameters at the end of round r are used in the next training round r + 1. After R total global training rounds, the model M is finalized with parameters θR . 2.2

Threat and Adversary Model

Threat Model: We consider the scenario in which a subset of FL participants are malicious or are controlled by a malicious adversary. We denote the percentage of malicious participants among all participants P as m%. Malicious participants may be injected to the system by adding adversary-controlled devices, compromising m% of the benign participants’ devices, or incentivizing (bribing) m% of benign participants to poison the global model for a certain number of FL rounds. We consider the aggregator to be honest and not compromised. Adversarial Goal: The goal of the adversary is to manipulate the learned parameters such that the final global model M has high errors for particular classes (a subset of C). The adversary is thereby conducting a targeted poisoning attack. This differs from untargeted attacks which instead seek indiscriminate high global model errors across all classes [6,14,51]. Targeted attacks have the desirable property that they decrease the possibility of the poisoning attack being detected by minimizing influence on non-targeted classes. Adversary Knowledge and Capability: We consider a realistic adversary model with the following constraints. Each malicious participant can manipulate the training data Di on their own device, but cannot access or manipulate other participants’ data or the model learning process, e.g., SGD implementation, loss function, or server aggregation process. The attack is not specific to the DNN architecture, loss function or optimization function being used. It requires training data to be corrupted, but the learning algorithm remains unaltered.

484

2.3

V. Tolpegin et al.

Label Flipping Attacks in Federated Learning

We use a label flipping attack to implement targeted data poisoning in FL. Given a source class csrc and a target class ctarget from C, each malicious participant Pi modifies their dataset Di as follows: For all instances in Di whose class is csrc , change their class to ctarget . We denote this attack by csrc → ctarget . For example, in CIFAR-10 image classification, airplane → bird denotes that images whose original class labels are airplane will be poisoned by malicious participants by changing their class to bird. The goal of the attack is to make the final global model M more likely to misclassify airplane images as bird images at test time. Label flipping is a well-known attack in centralized ML [43,44,50,52]. It is also suitable for the FL scenario given the adversarial goal and capabilities above. Unlike other types of poisoning attacks, label flipping does not require the adversary to know the global distribution of D, the DNN architecture, loss function L, etc. It is time and energy-efficient, an attractive feature considering FL is often executed on edge devices . It is also easy to carry out for nonexperts and does not require modification or tampering with participant-side FL software. Table 1. Notations used throughout the paper. M , MN P

Model, model trained with no poisoning

k

Number of FL participants in each round

R

Total number of rounds of FL training

Pr

FL participants queried at round r, r ∈ [1, R]

θr , θr,i

Global model parameters after round r and local model parameters at participant Pi after round r

m%

Percentage of malicious participants

csrc , ctarget

Source and target class in label flipping attack

M acc

Global model accuracy

crecall i

Class recall for class ci

m cntij

Baseline misclassification count from class ci to class cj

Attack Evaluation Metrics: At the end of R rounds of FL, the model M is finalized with parameters θR . Let Dtest denote the test dataset used in evaluating M , where Dtest ∩ Di = ∅ for all participant datasets Di . In the next sections, we provide a thorough analysis of label flipping attacks in FL. To do so, we use a number of evaluation metrics.

Data Poisoning Attacks Against Federated Learning Systems

485

Global Model Accuracy (M acc ): The global model accuracy is the percentage of instances x ∈ Dtest where the global model M with final parameters θR predicts MθR (x) = ci and ci is indeed the true class label of x. Class Recall (crecall ): For any class ci ∈ C, its class recall is the percentage i T Pi · 100% where T Pi is the number of instances x ∈ Dtest where MθR (x) = T Pi +F Ni ci and ci is the true class label of x; and F Ni is the number of instances x ∈ Dtest where MθR (x) = ci and the true class label of x is ci . Baseline Misclassification Count (m cntij ): Let MN P be a global model trained for R rounds using FL without any malicious attack. For classes ci = cj , the baseline misclassification count from ci to cj , denoted m cntij , is defined as the number of instances x ∈ Dtest where MN P (x) = cj and the true class of x is ci . Table 1 provides a summary of the notation used in the rest of this paper.

3 3.1

Analysis of Label Flipping Attacks in FL Experimental Setup

Datasets and DNN Architectures: We conduct our attacks using two popular image classification datasets: CIFAR-10 [22] and Fashion-MNIST [49]. CIFAR-10 consists of 60,000 color images in 10 object classes such as deer, airplane, and dog with 6,000 images included per class. The complete dataset is pre-divided into 50,000 training images and 10,000 test images. Fashion-MNIST consists of a training set of 60,000 images and a test set of 10,000 images. Each image in Fashion-MNIST is gray-scale and associated with one of 10 classes of clothing such as pullover, ankle boot, or bag. In experiments with CIFAR-10, we use a convolutional neural network with six convolutional layers, batch normalization, and two fully connected dense layers. This DNN architecture achieves a test accuracy of 79.90% in the centralized learning scenario, i.e. N = 1, without poisoning. In experiments with Fashion-MNIST, we use a two layer convolutional neural network with batch normalization, an architecture which achieves 91.75% test accuracy in the centralized scenario without poisoning. Further details of the datasets and DNN model architectures can be found in Appendix A. Federated Learning Setup: We implement FL in Python using the PyTorch [35] library. By default, we have N = 50 participants, one central aggregator, and k = 5. We use an independent and identically distributed (iid ) data distribution, i.e., we assume the total training dataset is uniformly randomly distributed among all participants with each participant receiving a unique subset of the training data. The testing data is used for model evaluation only and is therefore not included in any participant Pi ’s train dataset Di . Observing that both DNN models converge after fewer than 200 training rounds, we set our FL experiments to run for R = 200 rounds total. Label Flipping Process: In order to simulate the label flipping attack in a FL system with N participants of which m% are malicious, at the start of each experiment we randomly designate N × m% of the participants from P

486

V. Tolpegin et al.

as malicious. The rest are honest. To address the impact of random selection of malicious participants, by default we repeat each experiment 10 times and report the average results. Unless otherwise stated, we use m = 10%. For both datasets we consider three label flipping attack settings representing a diverse set of conditions in which to base adversarial attacks. These conditions include (1) a source class → target class pairing whose source class was very frequently misclassified as the target class in federated, non-poisoned training, (2) a pairing where the source class was very infrequently misclassified as the target class, and (3) a pairing between these two extremes. Specifically, for CIFAR-10 we test (1) 5: dog → 3: cat, (2) 0: airplane → 2: bird, and (3) 1: automobile → 9: truck. For Fashion-MNIST we experiment with (1) 6: shirt → 0: t-shirt/top, (2) 1: trouser → 3: dress, and (3) 4: coat → 6: shirt. 3.2

Label Flipping Attack Feasibility

We start by investigating the feasibility of poisoning FL systems using label flipping attacks. Figure 1 outlines the global model accuracy and source class recall in scenarios with malicious participant percentage m ranging from 2% to 50%. Results demonstrate that as the malicious participant percentage, increases the global model utility (test accuracy) decreases. Even with small m, we observe a decrease in model accuracy compared to a non-poisoned model (denoted by MN P in the graphs), and there is an even larger decrease in source class recall. In experiments with CIFAR-10, once m reaches 40%, the recall of the source class decreases to 0% and the global model accuracy decreases from 78.3% in the non-poisoned setting to 74.4% in the poisoned setting. Experiments conducted on Fashion-MNIST show a similar pattern of utility loss. With m = 4% source class recall drops by ∼ 10% and with m = 10% it drops by ∼ 20%. It is therefore clear that an adversary who controls even a minor proportion of the total participant population is capable of significantly impacting global model utility.

Fig. 1. Evaluation of attack feasibility and impact of malicious participant percentage on attack effectiveness. CIFAR-10 experiments are for the 5 → 3 setting while FashionMNIST experiments are for the 4 → 6 setting. Results are averaged from 10 runs for each setting of m%. The black bars are mean over the 10 runs and the green error bars denote standard deviation. (Color figure online)

Data Poisoning Attacks Against Federated Learning Systems

487

While both datasets are vulnerable to label flipping attacks, the degree of vulnerability varies between datasets with CIFAR-10 demonstrating more vulnerability than Fashion-MNIST. For example, consider the 30% malicious scenario, Figure 1b shows the source class recall for the CIFAR-10 dataset drops to 19.7% while Fig. 1d shows a much lower decrease for the Fashion-MNIST dataset with 58.2% source class recall under the same experimental settings. Table 2. Loss in source class recall for three source → target class settings with differing baseline misclassification counts in CIFAR-10 and Fashion-MNIST. Loss averaged from 10 runs. Highlighted bold entries are highest loss in each. csrc → ctarget m cntsrc target

Percentage of Malicious Participants (m%) 2

4

10

20

30

40

50

73%

70.5%

CIFAR-10 0→2

16

1.42% 2.93% 10.2% 14.1% 48.3%

1→9

56

0.69% 3.75% 6.04%

5→3

200

15%

36.3% 49.2% 54.7%

3.21% 7.92% 25.4% 49.5% 69.2% 69.2%

0%

Fashion-MNIST 1→3

18

0.12% 0.42% 2.27% 2.41% 40.3% 45.4%

4→6

51

0.61% 7.16%

6→0

118

−1%

16%

42%

29.2% 28.7% 37.1% 58.9%

2.19% 7.34% 9.81% 19.9%

39%

43.4%

On the other hand, vulnerability variation based on source and target class settings is less clear. In Table 2, we report the results of three different combinations of source → target attacks for each dataset. Consider the two extreme settings for the CIFAR-10 dataset: on the low end the 0 → 2 setting has a baseline misclassification count of 16 while the high end count is 200 for the 5 → 3 setting. Because of the DNN’s relative challenge in differentiating class 5 from class 3 in the non-poisoned setting, it could be anticipated that conducting a label flipping attack within the 5 → 3 setting would result in the greatest impact on source class recall. However, this was not the case. Table 2 shows that in only two out of the six experimental scenarios did 5 → 3 record the largest drop in source class recall. In fact, four scenarios’ results show the 0 → 2 setting, the setting with the lowest baseline misclassification count, as the most effective option for the adversary. Experiments with Fashion-MNIST show a similar trend, with label flipping attacks conducted in the 4 → 6 setting being the most successful rather than the 6 → 0 setting which has more than 2× the number of baseline misclassifications. These results indicate that identifying the most vulnerable source and target class combination may be a non-trivial task for the adversary, and that there is not necessarily a correlation between non-poisoned misclassification performance and attack effectiveness. We additionally study a desirable feature of the label flipping attack: they appear to be targeted. Specifically, Table 3 reports the following quantities for each source → target flipping scenario: loss in source class recall, loss in target

488

V. Tolpegin et al.

Table 3. Changes due to poisoning in source class recall, target class recall, and total recall for all remaining classes (non-source, non-target). Results are averaged from 10 runs in each setting. The maximum standard deviation observed was 1.45% in source class recall and 1.13% in target class recall. csrc → ctarget Δ crecall Δ crecall src target



all other Δ crecall

CIFAR-10 0→2 1→9 5→3

−6.28% −6.22% −6.12%

1.58% 2.28% 3.00%

0.34% 0.16% 0.17%

Fashion-MNIST 1→3 4→6 6→0

−2.23% −9.96% −8.87%

0.25% 2.40% 2.59%

0.01% 0.09% 0.20%

Fig. 2. Relationship between global model accuracy and source class recall across changing percentages of malicious participants for CIFAR-10 and Fashion-MNIST. As is 1:10. each dataset has 10 classes, the scale for M acc vs crecall src

class recall, and loss in recall of all remaining classes. We observe that the attack causes substantial change in source class recall (> 6% drop in most cases) and target class recall. However, the attack impact on the recall of remaining classes is an order of magnitude smaller. CIFAR-10 experiments show a maximum of 0.34% change in class recalls attributable to non-source and non-target classes and Fashion-MNIST experiments similarly show a maximum change of 0.2% attributable to non-source and non-target classes, both of which are relatively minor compared to source and target classes. Thus, the attack is causing the global model to misclassify instances belonging to csrc as ctarget at test time while other classes remain relatively unimpacted, demonstrating its targeted nature towards csrc and ctarget . Considering the large impact of the attack on source class recall, changes in source class recall therefore make up the vast majority of the decreases in global model accuracy caused by label flipping attacks in FL

Data Poisoning Attacks Against Federated Learning Systems

489

systems. This observation can also be seen in Fig. 2 where the change in global model accuracy closely follows the change in source class recall. The targeted nature of the label flipping attack allows for adversaries to remain under the radar in many FL systems. Consider systems where the data contain 100 classes or more, as is the case in CIFAR-100 [22] and ImageNet [13]. In such cases, targeted attacks become much more stealthy due to their limited impact to classes other than source and target. 3.3

Attack Timing in Label Flipping Attacks

While label flipping attacks can occur at any point in the learning process and last for arbitrary lengths, it is important to understand the capabilities of adversaries who are available for only part of the training process. For instance, Google’s Gboard application of FL requires all participant devices be plugged into power and connected to the internet via WiFi [9]. Such requirements create cyclic conditions where many participants are not available during the day, when phones are not plugged in and are actively in use. Adversaries can take advantage of this design choice, making themselves available at times when honest participants are unable to. We consider two scenarios in which the adversary is restricted in the time in which they are able to make malicious participants available: one in which the adversary makes malicious participants available only before the 75th training round, and one in which malicious participants are available only after the 75th training round. As the rate of global model accuracy improvement decreases with both datasets by training round 75, we choose this point to highlight how pre-established model stability may effect an adversary’s ability to launch an effective label flipping attack. Results for the first scenario are given in Fig. 3 whereas the results for the second scenario are given in Fig. 4. In Figure 3, we compare source class recall in a non-poisoned setting versus with poisoning only before round 75. Results on both CIFAR-10 and FashionMNIST show that while there are observable drops in source class recall during the rounds with poisoning (1–75), the global model is able to recover quickly after poisoning finishes (after round 75). Furthermore, the final convergence of the models (towards the end of training) are not impacted, given the models with and without poisoning are converge with roughly the same recall values. We do note that some CIFAR-10 experiments exhibited delayed convergence by an additional 50–100 training rounds, but these circumstances were rare and still eventually achieved the accuracy and recall levels of a non-poisoned model despite delayed convergence.

490

V. Tolpegin et al.

Fig. 3. Source class recall by round for experiments with “early round poisoning”, i.e., malicious participation only in the first 75 rounds (r < 75). The blue line indicates the round at which malicious participation is no longer allowed.

Fig. 4. Source class recall by round for experiments with “late round poisoning”, i.e., malicious participation only after round 75 (r ≥ 75). The blue line indicates the round at which malicious participation starts.

In Fig. 4, we compare source class recall in a non-poisoned setting versus with poisoning limited to the 75th and later training rounds. These results show the impact of such late poisoning demonstrating limited longevity; a phenomena which can be seen in the quick and dramatic changes in source class recall. Specifically, source class recall quickly returns to baseline levels once fewer malicious participants are selected in a training round even immediately

Table 4. Final source class recall when at least one malicious party participates in the final round R versus when all participants in round R are non-malicious. Results averaged for 10 runs for each experimental setting. Source Class Recall (crecall ) src m% ∈ PR > 0 m% ∈ PR = 0 CIFAR-10 73.90% 82.45% 77.30% 89.40% 57.50% 73.10% Fashion-MNIST 84.32% 96.25% 51.50% 89.60% 49.80% 73.15%

csrc → ctarget 0 1 5

→ → →

2 9 3

1 4 6

→ → →

3 6 0

Data Poisoning Attacks Against Federated Learning Systems

491

Fig. 5. Evaluation of impact from malicious participants’ availability α on source class recall. Results are averaged from 3 runs for each setting.

following a round with a large number of malicious participants having caused a dramatic drop. However, the final poisoned model in the late-round poisoning scenario may show substantial difference in accuracy or recall compared to a non-poisoned model. This is evidenced by the CIFAR-10 experiment in Fig. 4, in which the source recall of the poisoned model is ∼10% lower compared to non-poisoned. Furthermore, we observe that model convergence on both datasets is negatively impacted, as evidenced by the large variances in recall values between consecutive rounds. Consider Table 4 where results are compared when either (1) at least one malicious participant is selected for PR or (2) PR is made entirely of honest participants. When at least one malicious participant is selected, the final source class recall is, on average, 12.08% lower with the CIFAR-10 dataset and 24.46% lower with the Fashion-MNIST dataset. The utility impact from the label flipping attack is therefore predominantly tied to the number of malicious participants selected in the last few rounds of training. 3.4

Malicious Participant Availability

Given the impact of malicious participation in late training rounds on attack effectiveness, we now introduce a malicious participant availability parameter α. By varying α we can simulate the adversary’s ability to control compromised participants’ availability (i.e. ensuring connectivity or power access) at various points in training. Specifically, α represents malicious participants’ availability and therefore likeliness to be selected relative to honest participants. For example, if α = 0.6, when selecting each participant Pi ∈ Pr for round r, there is a 0.6 probability that Pi will be one of the malicious participants. Larger α implies higher likeliness of malicious participation. In cases where k > N × m%, the number of malicious participants in Pr is bounded by N × m%. Figure 5 reports results for varying values of α in late round poisoning, i.e., malicious participation is limited to rounds r ≥ 75. Specifically, we are interested in studying those scenarios where an adversary boosts the availability of

492

V. Tolpegin et al.

Fig. 6. Source class recall by round when malicious participants’ availability is close to that of honest participants (α = 0.6) vs significantly increased (α = 0.9). The blue line indicates the round in which attack starts.

the malicious participants enough that their selection becomes more likely than the non-malicious participants, hence in Fig. 5 we use α ≥ 0.6. The reported source class recalls in Fig. 5 are averaged over the last 125 rounds (total 200 rounds minus first 75 rounds) to remove the impact of individual round variability; further, each experiment setting is repeated 3 times and results are averaged. The results show that, when the adversary maintains sufficient representation in the participant pool (i.e. m ≥ 10%), manipulating the availability of malicious participants can yield significantly higher impact on the global model utility with source class recall losses in excess of 20%. On both datasets with m ≥ 10%, the negative impact on source class recall is highest with α = 0.9, which is followed by α = 0.8, α = 0.7 and α = 0.6, i.e., in decreasing order of malicious participant availability. Thus, in order to mount an impactful attack, it is in the best interests of the adversary to perform the attack with highest malicious participant availability in late rounds. We note that when k is significantly larger than N × m%, increasing availability (α) will be insufficient for meaningfully increasing malicious participant selection in individual training rounds. Therefore, experiments where m < 10% show little variation despite changes in α. To more acutely demonstrate the impact of α, Fig. 6 reports source class recall by round when α = 0.6 and α = 0.9 for both the CIFAR-10 and FashionMNIST datasets. In both datasets, when malicious participants are available more frequently, the source class recall is effectively shifted lower in the graph, i.e., source class recall values with α = 0.9 are often much smaller than those with α = 0.6. We note that the high round-by-round variance in both graphs is due to the probabilistic variability in number of malicious participants in individual training rounds. When fewer malicious participants are selected in one training round relative to the previous round, source recall increases. When more malicious participants are selected in an individual round relative to the previous round, source recall falls. We further explore and illustrate our last remark with respect to the impact of malicious parties’ participation in consecutive rounds in Fig. 7. In this figure, the x-axis represents the change in the number of malicious clients participating

Data Poisoning Attacks Against Federated Learning Systems

493

Fig. 7. Relationship between change in source class recall in consecutive rounds versus change in number of malicious participants in consecutive rounds. Specifically, ∀r ≥ 75 @ round r) - (crecall @ round r − 1) while the x-axis the y-axis represents (crecall src src represents (# of malicious ∈ Pr ) - (# of malicious ∈ Pr−1 ).

in consecutive rounds, i.e., (# of malicious ∈ Pr ) – (# of malicious ∈ Pr−1 ). The y-axis represents the change in source class recall between these consecutive @ round r) – (crecall @ round r −1). The reported results are rounds, i.e., (crecall src src then averaged across multiple runs of FL and all cases in which each participation difference was observed. The results confirm our intuition that, when Pr contains more malicious participants than Pr−1 , there is a substantial drop in source class recall. For large differences (such as +3 or +4), the drop could be as high as 40% or 60%. In contrast, when Pr contains fewer malicious participants than Pr−1 , there is a substantial increase in source class recall, which can be as high as 60% or 40% when the difference is −4 or −3. Altogether, this demonstrates the possibility that the DNN could recover significantly even in few rounds of FL training, if a large enough decrease in malicious participation could be achieved.

4

Defending Against Label Flipping Attacks

Given a highly effective adversary, how can a FL system defend against the label flipping attacks discussed thus far? To that end, we propose a defense which enables the aggregator to identify malicious participants. After identifying malicious participants, the aggregator may blacklist them or ignore their updates θr,i in future rounds. We showed in Sects. 3.3 and 3.4 that high-utility model convergence can be eventually achieved after eliminating malicious participation. The feasibility of such a recovery from early round attacks supports use of the proposed identification approach as a defense strategy. Our defense is based on the following insight: The parameter updates sent from malicious participants have unique characteristics compared to honest participants’ updates for a subset of the parameter space. However, since DNNs have many parameters (i.e., θr,i is extremely high dimensional) it is non-trivial to analyze parameter updates by hand. Thus, we propose an automated strategy for identifying the relevant parameter subset and for studying participant updates using dimensionality reduction (PCA).

494

V. Tolpegin et al.

Algorithm 1: Identifying Malicious Model Updates in FL def evaluate updates(R : set of vulnerable train rounds, P : participant set): U =∅ for r ∈ R do Pr ← participants ∈ P queried in training round r θr−1 ← global model parameters after training round r − 1 for Pi ∈ Pr do θr,i ← updated parameters after train DNN(θr−1 , Di ) θΔ,i ← θr,i − θr src θΔ,i ← parameters ∈ θΔ,i connected to source class output node src to U Add θΔ,i  U ← standardize(U) U  ← PCA(U  , components=2 ) plot(U  )

Fig. 8. PCA plots with 2 components demonstrating the ability of Algorithm 1 to identify updates originating from a malicious versus honest participant. Plots represent relevant gradients collected from all training rounds r > 10. Blue Xs represent gradients from malicious participants while yellow Os represent gradients from honest participants. (Color figure online)

The description of our defense strategy is given in Algorithm 1. Let R denote the set of vulnerable FL training rounds and csrc be the class that is suspected to be the source class of a poisoning attack. We note that if csrc is unknown, the aggregator can defend against potential attacks such that csrc = c ∀c ∈ C. We also note that for a given csrc , Algorithm 1 considers label flipping for all possible ctarget . An aggregator therefore will conduct |C| independent iterations of Algorithm 1, which can be conducted in parallel. For each round r ∈ R and participant Pi ∈ Pr , the aggregator computes the delta in participant’s model update compared to the global model, i.e., θΔ,i ← θr,i − θr . Recall from Sect. 2.1 that a predicted probability for any given class c is computed by a specific node

Data Poisoning Attacks Against Federated Learning Systems

495

nc in the final layer DNN architecture. Given the aggregator’s goal of defending against the label flipping attack from csrc , only the subset of the parameters in θΔ,i corresponding to ncsrc is extracted. The outcome of the extraction is src and added to a global list U built by the aggregator. After U denoted by θΔ,i is constructed across multiple rounds and participant deltas, it is standardized by removing the mean and scaling to unit variance. The standardized list U  is fed into Principal Component Analysis (PCA), which is a popular ML technique used for dimensionality reduction and pattern visualization. For ease of visualization, we use and plot results with two dimensions (two components). In Fig. 8, we show the results of Algorithm 1 on CIFAR-10 and FashionMNIST across varying malicious participation rate m, with R = [10, 200]. Even in scenarios with low m, as is shown in Figs. 8a and 8e, our defense is capable of differentiating between malicious and honest participants. In all graphs, the PCA outcome shows that malicious participants’ updates belong to a visibly different cluster compared to honest participants’ updates which form their own cluster. Another interesting observation is that our defense does not suffer from the “gradient drift” problem. Gradient drift is a potential challenge in designing a robust defense, since changes in model updates may be caused both by actual DNN learning and convergence (which is desirable) or malicious poisoning attempt (which our defense is trying to identify and prevent). Our results show that, even though the defense is tested with a long period of rounds (190 training rounds since R = [10, 200]), it remains capable of separating malicious and honest participants, demonstrating its robustness to gradient drift. A FL system aggregator can therefore effectively identify malicious participants, and consequently restrict their participation in mobile training, by conducting such gradient clustering prior to aggregating parameter updates at each round. Clustering model gradients for malicious participant identification presents a strong defense as it does not require access to any public validation dataset, as is required in [3], which is not necessarily possible to acquire.

5

Related Work

Poisoning attacks are highly relevant in domains such as spam filtering [10,32], malware and network anomaly detection [11,24,39], disease diagnosis [29], computer vision [34], and recommender systems [15,54]. Several poisoning attacks were developed for popular ML models including SVM [6,12,44,45,50,52], regression [19], dimensionality reduction [51], linear classifiers [12,23,57], unsupervised learning [7], and more recently, neural networks [12,30,42,45,53,58]. However, most of the existing work is concerned with poisoning ML models in the traditional setting where training data is first collected by a centralized party. In contrast, our work studies poisoning attacks in the context of FL. As a result, many of the poisoning attacks and defenses that were designed for traditional ML are not suitable to FL. For example, attacks that rely on crafting optimal poison instances by observing the training data distribution are inapplicable since the

496

V. Tolpegin et al.

malicious FL participant may only access and modify the training data s/he holds. Similarly, server-side defenses that rely on filtering and eliminating poison instances through anomaly detection or k-NN [36,37] are inapplicable to FL since the server only observes parameter updates from FL participants, not their individual instances. The rising popularity of FL has led to the investigation of different attacks in the context of FL, such as backdoor attacks [2,46], gradient leakage attacks [18,27,59] and membership inference attacks [31,47,48]. Most closely related to our work are poisoning attacks in FL. There are two types of poisoning attacks in FL: data poisoning and model poisoning. Our work falls under the data poisoning category. In data poisoning, a malicious FL participant manipulates their training data, e.g., by adding poison instances or adversarially changing existing instances [16,43]. The local learning process is otherwise not modified. In model poisoning, the malicious FL participant modifies its learning process in order to create adversarial gradients and parameter updates. [4] and [14] demonstrated the possibility of causing high model error rates through targeted and untargeted model poisoning attacks. While model poisoning is also effective, data poisoning may be preferable or more convenient in certain scenarios, since it does not require adversarial tampering of model learning software on participant devices, it is efficient, and it allows for non-expert poisoning participants. Finally, FL poisoning attacks have connections to the concept of Byzantine threats, in which one or more participants in a distributed system fail or misbehave. In FL, Byzantine behavior was shown to lead to sub-optimal models or non-convergence [8,20]. This has spurred a line of work on Byzantineresilient aggregation for distributed learning, such as Krum [8], Bulyan [28], trimmed mean, and coordinate-wise median [55]. While model poisoning may remain successful despite Byzantine-resilient aggregation [4,14,20], it is unclear whether optimal data poisoning attacks can be found to circumvent an individual Byzantine-resilient scheme, or whether one data poisoning attack may circumvent multiple Byzantine-resilient schemes. We plan to investigate these issues in future work.

6

Conclusion

In this paper we studied data poisoning attacks against FL systems. We demonstrated that FL systems are vulnerable to label flipping poisoning attacks and that these attacks can significantly negatively impact the global model. We also showed that the negative impact on the global model increases as the proportion of malicious participants increases, and that it is possible to achieve targeted poisoning impact. Further, we demonstrated that adversaries can enhance attack effectiveness by increasing the availability of malicious participants in later rounds. Finally, we proposed a defense which helps an FL aggregator separate malicious from honest participants. We showed that our defense is capable of identifying malicious participants and it is robust to gradient drift.

Data Poisoning Attacks Against Federated Learning Systems

497

As poisoning attacks against FL systems continue to emerge as important research topics in the security and ML communities [4,14,21,33,56], we plan to continue our work in several ways. First, we will study the impacts of the attack and defense on diverse FL scenarios differing in terms of data size, distribution among FL participants (iid vs non-iid), data type, total number of instances available per class, etc. Second, we will study more complex adversarial behaviors such as each malicious participant changing the labels of only a small portion of source samples or using more sophisticated poisoning strategies to avoid being detected . Third, while we designed and tested our defense against the label flipping attack, we hypothesize the defense will be useful against model poisoning attacks since malicious participants’ gradients are often dissimilar to those of honest participants. Since our defense identifies dissimilar or anomalous gradients, we expect the defense to be effective against other types of FL attacks that cause dissimilar or anomalous gradients. In future work, we will study the applicability of our defense against such other FL attacks including model poisoning, untargeted poisoning, and backdoor attacks. Acknowledgements. This research is partially sponsored by NSF CISE SaTC 1564097 and 2038029. The second author acknowledges an IBM PhD Fellowship Award and the support from the Enterprise AI, Systems & Solutions division led by Sandeep Gopisetty at IBM Almaden Research Center. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding agencies and companies mentioned above.

A

DNN Architectures and Configuration

All NNs were trained using PyTorch version 1.2.0 with random weight initialization. Training and testing was completed using a NVIDIA 980 Ti GPUaccelerator. When necessary, all CUDA tensors were mapped to CPU tensors before exporting to Numpy arrays. Default drivers provided by Ubuntu 19.04 and built-in GPU support in PyTorch was used to accelerate training. Details can be found in our repository: https://github.com/git-disl/DataPoisoning FL. Fashion-MNIST: We do not conduct data pre-processing. We use a Convolutional Neural Network with the architecture described in Table 6. In the table, Conv = Convolutional Layer, and Batch Norm = Batch Normalization. CIFAR-10: We conduct data pre-processing prior to training. Data is normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]. Values reflect mean and standard deviation of the ImageNet dataset [13] and are commonplace, even expected when using Torchvision [25] models. We additionally perform data augmentation with random horizontal flipping, random cropping with size 32, and default padding. Our CNN is detailed in Table 5.

498

V. Tolpegin et al. Table 5. CIFAR-10 CNN. Layer type

Table 6. Fashion-MNIST CNN. Size

Conv + ReLu + Batch Norm 3 × 3 × 32 Conv + ReLu + Batch Norm 3 × 32 × 32 Max Pooling 2×2 Conv + ReLu + Batch Norm 3 × 32 × 64 Conv + ReLu + Batch Norm 3 × 64 × 64 Max Pooling 2×2 Conv + ReLu + Batch Norm 3 × 64 × 128 Conv + ReLu + Batch Norm 3 × 128 × 128 Max Pooling 2×2 Fully Connected 2048 Fully Connected + Softmax 128/10

Layer type

Size

Conv + ReLu + Batch Norm 5 × 1 × 16 Max Pooling

2×2

Conv + ReLu + Batch Norm 5 × 16 × 32 Max Pooling

2×2

Fully Connected

1568/10

References 1. An Act: Health insurance portability and accountability act of 1996. Public Law 104-191 (1996) 2. Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., Shmatikov, V.: How to backdoor federated learning. arXiv preprint arXiv:1807.00459 (2018) 3. Baracaldo, N., Chen, B., Ludwig, H., Safavi, J.A.: Mitigating poisoning attacks on machine learning models: a data provenance based approach. In: 10th ACM Workshop on Artificial Intelligence and Security, pp. 103–110 (2017) 4. Bhagoji, A.N., Chakraborty, S., Mittal, P., Calo, S.: Analyzing federated learning through an adversarial lens. In: International Conference on Machine Learning, pp. 634–643 (2019) 5. Biggio, B., Nelson, B., Laskov, P.: Support vector machines under adversarial label noise. In: Asian Conference on Machine Learning, pp. 97–112 (2011) 6. Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Proceedings of the 29th International Conference on International Conference on Machine Learning, pp. 1467–1474 (2012) 7. Biggio, B., Pillai, I., Rota Bul` o, S., Ariu, D., Pelillo, M., Roli, F.: Is data clustering in adversarial settings secure? In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pp. 87–98 (2013) 8. Blanchard, P., Guerraoui, R., Stainer, J., et al.: Machine learning with adversaries: byzantine tolerant gradient descent. In: NeurIPS, pp. 119–129 (2017) 9. Bonawitz, K., et al.: Towards federated learning at scale: System design. In: SysML 2019 (2019, to appear). https://arxiv.org/abs/1902.01046 10. Bursztein, E.: Attacks against machine learning - an overview (2018). https://elie. net/blog/ai/attacks-against-machine-learning-an-overview/ 11. Chen, S., et al.: Automated poisoning attacks and defenses in malware detection systems: an adversarial machine learning approach. Comput. Secur. 73, 326–344 (2018) 12. Demontis, A., et al.: Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In: 28th USENIX Security Symposium, pp. 321–338 (2019) 13. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

Data Poisoning Attacks Against Federated Learning Systems

499

14. Fang, M., Cao, X., Jia, J., Gong, N.Z.: Local model poisoning attacks to byzantinerobust federated learning. In: USENIX Security Symposium (2020, to appear) 15. Fang, M., Yang, G., Gong, N.Z., Liu, J.: Poisoning attacks to graph-based recommender systems. In: Proceedings of the 34th Annual Computer Security Applications Conference, pp. 381–392 (2018) 16. Fung, C., Yoon, C.J., Beschastnikh, I.: Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866 (2018) 17. Hard, A., et al.: Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604 (2018) 18. Hitaj, B., Ateniese, G., Perez-Cruz, F.: Deep models under the GAN: information leakage from collaborative deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618 (2017) 19. Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manipulating machine learning: poisoning attacks and countermeasures for regression learning. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 19–35. IEEE (2018) 20. Kairouz, P., et al.: Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977 (2019) 21. Khazbak, Y., Tan, T., Cao, G.: MLGuard: mitigating poisoning attacks in privacy preserving distributed collaborative learning (2020) 22. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009) 23. Liu, C., Li, B., Vorobeychik, Y., Oprea, A.: Robust linear regression against training data poisoning. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 91–102 (2017) 24. Maiorca, D., Biggio, B., Giacinto, G.: Towards adversarial malware detection: lessons learned from pdf-based attacks. ACM Comput. Surv. (CSUR) 52(4), 1– 36 (2019) 25. Marcel, S., Rodriguez, Y.: Torchvision the machine-vision package of torch. In: 18th ACM International Conference on Multimedia, pp. 1485–1488 (2010) 26. Mathews, K., Bowman, C.: The California consumer privacy act of 2018 (2018) 27. Melis, L., Song, C., De Cristofaro, E., Shmatikov, V.: Exploiting unintended feature leakage in collaborative learning. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706. IEEE (2019) 28. Mhamdi, E.M.E., Guerraoui, R., Rouault, S.: The hidden vulnerability of distributed learning in byzantium. arXiv preprint arXiv:1802.07927 (2018) 29. Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.K.: Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inform. 19(6), 1893–1905 (2014) 30. Mu˜ noz-Gonz´ alez, L., et al.: Towards poisoning of deep learning algorithms with back-gradient optimization. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 27–38 (2017) 31. Nasr, M., Shokri, R., Houmansadr, A.: Comprehensive privacy analysis of deep learning: passive and active white-box inference attacks against centralized and federated learning. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 739–753. IEEE (2019) 32. Nelson, B., et al.: Exploiting machine learning to subvert your spam filter. LEET 8, 1–9 (2008) 33. Nguyen, T.D., Rieger, P., Miettinen, M., Sadeghi, A.R.: Poisoning attacks on federated learning-based IoT intrusion detection system (2020)

500

V. Tolpegin et al.

34. Papernot, N., McDaniel, P., Sinha, A., Wellman, M.P.: SoK: security and privacy in machine learning. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 399–414. IEEE (2018) 35. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp. 8024–8035 (2019) 36. Paudice, A., Mu˜ noz-Gonz´ alez, L., Gyorgy, A., Lupu, E.C.: Detection of adversarial training examples in poisoning attacks through anomaly detection. arXiv preprint arXiv:1802.03041 (2018) 37. Paudice, A., Mu˜ noz-Gonz´ alez, L., Lupu, E.C.: Label sanitization against label flipping poisoning attacks. In: Alzate, C., et al. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11329, pp. 5–15. Springer, Cham (2019). https://doi.org/10.1007/ 978-3-030-13453-2 1 38. General Data Protection Regulation: Regulation (EU) 2016/679 of the European parliament and of the council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46. Off. J. Eur. Union (OJ) 59(1–88), 294 (2016) 39. Rubinstein, B.I., et al.: Antidote: understanding and defending against poisoning of anomaly detectors. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pp. 1–14 (2009) 40. Ryffel, T., et al.: A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017 (2018) 41. Schlesinger, A., O’Hara, K.P., Taylor, A.S.: Let’s talk about race: identity, chatbots, and AI. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–14 (2018) 42. Shafahi, A., et al.: Poison frogs! Targeted clean-label poisoning attacks on neural networks. In: Advances in Neural Information Processing Systems, pp. 6103–6113 (2018) 43. Shen, S., Tople, S., Saxena, P.: Auror: Defending against poisoning attacks in collaborative deep learning systems. In: Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 508–519 (2016) 44. Steinhardt, J., Koh, P.W.W., Liang, P.S.: Certified defenses for data poisoning attacks. In: NeurIPS, pp. 3517–3529 (2017) 45. Suciu, O., Marginean, R., Kaya, Y., Daume III, H., Dumitras, T.: When does machine learning fail? Generalized transferability for evasion and poisoning attacks. In: 27th USENIX Security Symposium, pp. 1299–1316 (2018) 46. Sun, Z., Kairouz, P., Suresh, A.T., McMahan, H.B.: Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963 (2019) 47. Truex, S., Liu, L., Gursoy, M.E., Yu, L., Wei, W.: Towards demystifying membership inference attacks. arXiv preprint arXiv:1807.09173 (2018) 48. Truex, S., Liu, L., Gursoy, M.E., Yu, L., Wei, W.: Demystifying membership inference attacks in machine learning as a service. IEEE Trans. Serv. Comput. (2019) 49. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017) 50. Xiao, H., Xiao, H., Eckert, C.: Adversarial label flips attack on support vector machines. In: ECAI, pp. 870–875 (2012) 51. Xiao, H., Biggio, B., Brown, G., Fumera, G., Eckert, C., Roli, F.: Is feature selection secure against training data poisoning? In: International Conference on Machine Learning, pp. 1689–1698 (2015) 52. Xiao, H., Biggio, B., Nelson, B., Xiao, H., Eckert, C., Roli, F.: Support vector machines under adversarial label contamination. Neurocomputing 160, 53–62 (2015)

Data Poisoning Attacks Against Federated Learning Systems

501

53. Yang, C., Wu, Q., Li, H., Chen, Y.: Generative poisoning attack method against neural networks. arXiv preprint arXiv:1703.01340 (2017) 54. Yang, G., Gong, N.Z., Cai, Y.: Fake co-visitation injection attacks to recommender systems. In: NDSS (2017) 55. Yin, D., Chen, Y., Kannan, R., Bartlett, P.: Byzantine-robust distributed learning: towards optimal statistical rates. In: International Conference on Machine Learning, pp. 5650–5659 (2018) 56. Zhao, L., et al.: Shielding collaborative learning: mitigating poisoning attacks through client-side detection. IEEE Trans. Dependable Secure Comput. (2020) 57. Zhao, M., An, B., Gao, W., Zhang, T.: Efficient label contamination attacks against black-box learning models. In: IJCAI, pp. 3945–3951 (2017) 58. Zhu, C., Huang, W.R., Li, H., Taylor, G., Studer, C., Goldstein, T.: Transferable clean-label poisoning attacks on deep neural nets. In: International Conference on Machine Learning, pp. 7614–7623 (2019) 59. Zhu, L., Liu, Z., Han, S.: Deep leakage from gradients. In: Advances in Neural Information Processing Systems, pp. 14747–14756 (2019)

Interpretable Probabilistic Password Strength Meters via Deep Learning Dario Pasquini1,2,3(B) , Giuseppe Ateniese1 , and Massimo Bernaschi3 1

3

Stevens Institute of Technology, Hoboken, USA {dpasquin,gatenies}@stevens.edu 2 Sapienza University of Rome, Rome, Italy Institute of Applied Computing, CNR, Rome, Italy [email protected]

Abstract. Probabilistic password strength meters have been proved to be the most accurate tools to measure password strength. Unfortunately, by construction, they are limited to solely produce an opaque security estimation that fails to fully support the user during the password composition. In the present work, we move the first steps towards cracking the intelligibility barrier of this compelling class of meters. We show that probabilistic password meters inherently own the capability to describe the latent relation between password strength and password structure. In our approach, the security contribution of each character composing a password is disentangled and used to provide explicit fine-grained feedback for the user. Furthermore, unlike existing heuristic constructions, our method is free from any human bias, and, more importantly, its feedback has a clear probabilistic interpretation. In our contribution: (1) we formulate the theoretical foundations of interpretable probabilistic password strength meters; (2) we describe how they can be implemented via an efficient and lightweight deep learning framework suitable for client-side operability.

Keywords: Password security

1

· Strength meters · Deep learning

Introduction

Accurately measuring password strength is essential to guarantee the security of password-based authentication systems. Even more critical, however, is training users to select secure passwords in the first place. One common approach is to rely on password policies that list a series of requirements for a strong password. This approach is limited or even harmful [10]. Alternatively, Passwords Strength Meters (PSMs) have been shown to be useful and are witnessing increasing adoption in commercial solutions [15,26]. The first instantiations of PSMs were based on simple heuristic constructions. Password strength was estimated via either handcrafted features such as LUDS (which counts lower and uppercase letters, digits, and symbols) or heuristic c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 502–522, 2020. https://doi.org/10.1007/978-3-030-58951-6_25

Interpretable Probabilistic Password Strength Meters via Deep Learning

503

Change this

i

a m s e c u

r

e

!

i

a M s e c u

Change this

r

e

i

a M s e c u

!

Change this

r

E

!

i

0 M s e c u

Change this

r

E

!

i

0 M s e c $

r

E

!

Fig. 1. Example of the character-level feedback mechanism and password composition process induced by our meter. In the figure, “iamsecure! ” is the password initially chosen by the user. Colors indicate the estimated character security: red (insecure) → green (secure). (Color figure online)

entropy definitions. Unavoidably, given their heuristic nature, this class of PSMs failed to accurately measure password security [11,30]. More recently, thanks to an active academic interest, PSMs based on more sound constructions and rigorous security definitions have been proposed. In the last decade, indeed, a considerable research effort gave rise to more precise meters capable of accurately measuring password strength [9,20,28]. However, meters have also become proportionally more opaque and inherently hard to interpret due to the increasing complexity of the employed approaches. State-of-art solutions base their estimates on blackbox parametric probabilistic models [9,20] that leave no room for interpretation of the evaluated passwords; they do not provide any feedback to users on what is wrong with their password or how to improve it. We advocate for explainable approaches in password meters, where users receive additional insights and become cognizant of which parts of their passwords could straightforwardly improve. This makes the password selection process less painful since users can keep their passwords of choice mostly unchanged while ensuring they are secure. In the present work, we show that the same rigorous probabilistic framework capable of accurately measuring password strength can also fundamentally describe the relation between password security and password structure. By rethinking the underlying mass estimation process, we create the first interpretable probabilistic password strength meter. Here, the password probability measured by our meter can be decomposed and used to estimate further the strength of every single character of the password. This explainable approach allows us to assign a security score to each atomic component of the password and determine its contribution to the overall security strength. This evaluation is, in turn, returned to the user who can tweak a few “weak” characters and consistently improve the password strength against guessing attacks. Figure 1 illustrates the selection process. In devising the proposed mass estimation process, we found it ideally suited for being implemented via a deep learning architecture. In the paper, we show how that can be cast as an efficient client-side meter employing deep convolutional neural networks. Our work’s major contributions are: (i) We formulate a novel password probability estimation framework

504

D. Pasquini et al.

based on undirected probabilistic models. (ii) We show that such a framework can be used to build a precise and sound password feedback mechanism. (iii) We implement the proposed meter via an efficient and lightweight deep learning framework ideally suited for client-side operability.

2

Background and Preliminaries

In this section, we offer an overview of the fundamental concepts that are important to understand our contribution. Section 2.1 covers Probabilistic Password Strength Meters. Next, in Sect. 2.2, we cover structured probabilistic models that will be fundamental in the interpretation of our approach. Finally, Sect. 2.3 briefly discusses relevant previous works within the PSMs context. 2.1

Probabilistic Password Strength Meters (PPSMs)

Probabilistic password strength meters are PSMs that base their strength measure on an explicit estimate of password probability. In the process, they resort to probabilistic models to approximate the probability distribution behind a set of known passwords, typically, instances of a password leak. Having an approximation of the mass function, strength estimation is then derived by leveraging adversarial reasoning. Here, password robustness is estimated in consideration of an attacker who knows the underlying password distribution, and that aims at minimizing the guess entropy [19] of her/his guessing attack. To that purpose, the attacker performs an optimal guessing attack, where guesses are issued in decreasing probability order (i.e., high-probability passwords first). More formally, given a probability mass function P (x) defined on the key-space X, the attacker creates an ordering XP (x) of X such that: XP (x) = [x0 , x1 , . . . , xn ]

where

∀i∈[0,n] : P (xi ) ≥ P (xi+1 ).

(1)

During the attack, the adversary produces guesses by traversing the list XP (x) . Under this adversarial model, passwords with high probability are considered weak, as they will be quickly guessed. Low-probability passwords, instead, are assessed as secure, as they will be matched by the attacker only after a considerable, possibly not feasible, number of guesses. 2.2

Structured Probabilistic Models

Generally, the probabilistic models used by PPSMs are probabilistic structured models (even known as graphical models). These describe password distributions by leveraging a graph notation to illustrate the dependency properties among a set of random variables. Here, a random variable xi is depicted as a vertex, and an edge between xi and xj exists whether xi and xj are statistically dependent. Structured probabilistic models are classified according to the orientation of edges. A direct acyclic graph (DAG) defines a directed graphical

Interpretable Probabilistic Password Strength Meters via Deep Learning

505

model (or Bayesian Network). In this formalism, an edge asserts a cause-effect relationship between two variables; that is, the state assumed from the variable xi is intended as a direct consequence of those assumed by its parents par(xi ) in the graph. Under such a description, a topological ordering among all the random variables can be asserted and used to factorize the joint probability distribution of the random variables effectively. On the other hand, an undirected graph defines an undirected graphical model, also known as Markov Random Field (MRF). In this description, the causality interpretation of edges is relaxed, and connected variables influence each other symmetrically. Undirected models permit the formalization of stochastic processes where causality among factors cannot be asserted. However, this comes at the cost of giving up to any simple form of factorization of the joint distribution. 2.3

Related Works

Here, we briefly review early approaches to the definition of PSMs. We focus on the most influential works as well as to the ones most related to ours. Probabilistic PSMs: Originally thought for guessing attacks [21], Markov model approaches have found natural application in the password strength estimation context. Castelluccia et al. [9] use a stationary, finite-state Markov chain as a direct password mass estimator. Their model computes the joint probability by separately measuring the conditional probability of each pair of n-grams in the observed passwords. Melicher et al. [20] extended the Markov model approach by leveraging a character/token level Recurrent Neural Network (RNN) for modeling the probability of passwords. As discussed in the introduction, pure probabilistic approaches are not capable of any natural form of feedback. In order to partially cope with this shortcoming, a hybrid approach has been investigated in [24]. Here, the model of Melicher et al. [20] is aggregated with a series of 21 heuristic, hand-crafted feedback mechanisms such as detection of leeting behaviors or common tokens (e.g., keyboard walks). Even if harnessing a consistently different form of feedback, our framework merges these solutions into a single and jointly learned model. Additionally, in contrast with [24], our feedback has a concrete probabilistic interpretation as well as complete freedom from any form of human bias. Interestingly enough, our model autonomously learns some of the heuristics hardwired in [24]. For instance, our model learned that capitalizing characters in the middle of the string could consistently improve password strength. Token Look-Up PSMs: Another relevant class of meters is that based on the token look-up approach. Generally speaking, these are non-parametric solutions that base their strength estimation on collections of sorted lists of tokens like leaked passwords and word dictionaries. Here, a password is modeled as a combination of tokens, and the relative security score is derived from the ranking of the tokens in the known dictionaries. Unlike probabilistic solutions, token-based PSMs are able to return feedback to the user, such as an explanation for the

506 f

D. Pasquini et al. y x 1 2 3 z v w

e q y

(a)

j

! @ # $ %

Common tokens.

q w S e e c u r

rt

e y 1 !

l

e

t m e

q s w E e c u !

r

et 1 y

l

e

t m e 2

(b)

Capitalize first/inner.

(c)

i

n 2

q w e

r

t

y

!

i

q w e

!

r

t

y

n

Numeric last/inner.

(d)

Special last/inner.

Fig. 2. In Panel (a), the model automatically highlights the presence of weak substrings by assigning high probabilities to the characters composing them. Panels (b), (c), and (d) are examples of self-learned weak/strong password composition patterns. In panel (b), the model assigns a high probability to the capitalization of the first letter (a common practice), whereas it assigns low probability when the capitalization is performed on inner characters. Panel (c) and (d) report similar results for numeric and special characters. (Color figure online)

weakness of a password relying on the semantic attributed to the tokens composing the password. A leading member of token look-up meters is zxcvbn [31], which assumes a password as a combination of tokens such as token, reversed, sequence repeat, keyboard, and date. This meter scores passwords according to a heuristic characterization of the guess-number [19]. Such score is described as the number of combinations of tokens necessary to match the tested password by traversing the sorted tokens lists. zxcvbn is capable of feedback. For instance, if one of the password components is identified as “repeat”, zxcvbn will recommend the user to avoid the use of repeated characters in the password. Naturally, this kind of feedback mechanism inherently lacks generality and addresses just a few human-chosen scenarios. As discussed by the authors themselves, zxcvbn suffers from various limitations. By assumption, it is unable to model the relationships among different patterns occurring in the same passwords. Additionally, like other token look-up based approaches, it fails to coherently model unobserved patterns and tokens.

3

Meter Foundations

In this section, we introduce the theoretical foundations of the proposed estimation process. First, in Sect. 3.1, we introduce and motivate the probabilistic character-level feedback mechanism. Later, in Sect. 3.2, we describe how that mechanism can be obtained using undirected probabilistic models. 3.1

Character-Level Strength Estimation via Probabilistic Reasoning

As introduced in Sect. 2.1, PPSMs employ probabilistic models to approximate the probability mass function of an observed password distribution, say P (x). Estimating P (x), however, could be particularly challenging, and suitable estimation techniques must be adopted to make the process feasible. In this direction, a general solution is to factorize the domain of the mass function (i.e., the key-space); that is, passwords are modeled as a concatenation of smaller factors, typically, decomposed at the character level. Afterward, password distribution

Interpretable Probabilistic Password Strength Meters via Deep Learning

507

is estimated by modeling stochastic interactions among these simpler components. More formally, every password is assumed as a realization x = [x1 , . . . , x ] of a random vector of the kind x = [x1 , . . . , x ], where each disjoint random variable xi represents the character at position i in the string. Then, P (x) is described through structured probabilistic models that formalize the relations among those random variables, eventually defining a joint probability distribution. In the process, every random variable is associated with a local conditional probability distribution (here, referred to as Q) that describes the stochastic behavior of xi in consideration of the conditional independence properties asserted from the underlying structured model i.e., Q(xi )=P (xi | par(xi )). Eventually, the joint measurement of probability is derived from the aggregation of the marginalized  local conditional probability distributions, typically under the form P (x)= i=1 Q(xi =xi ). As introduced in Sect. 2.1, the joint probability can be employed as a good representative for password strength. However, such a global assessment unavoidably hides much fine-grained information that can be extremely valuable to a password meter. In particular, the joint probability offers us an atomic interpretation of the password strength, but it fails at disentangling the relation between password strength and password structure. That is, it does not clarify which factors of an evaluated password are making that password insecure. However, as widely demonstrated by non-probabilistic approaches [17,24,31], users benefit from the awareness of which part of the chosen password is easily predictable and which is not. In this direction, we argue that the local conditional probabilities that naturally appear in the estimation of the joint one, if correctly shaped, can offer detailed insights into the strength or the weakness of each factor of a password. Such character-level probability assignments are an explicit interpretation of the relation between the structure of a password and its security. The main intuition here is that: high values of Q(xi ) tell us that xi (i.e., the character at position i in the string) has a high impact on increasing the password probability and must be changed to make the password stronger. Instead, characters with low conditional probability are pushing the password to have low probability and must be maintained unchanged. Figure 2 reports some visual representations of such probabilistic reasoning. Each segment’s background color renders the value of the local conditional probability of the character. Red describes high probability values, whereas green describes low probability assignments. Such a mechanism can naturally discover weak passwords components and explicitly guide the user to explore alternatives. For instance, local conditional probabilities can spot the presence of predictable tokens in the password without the explicit use of dictionaries (Fig. 2a). These measurements are able to automatically describe common password patterns like those manually modeled from other approaches [24], see Figs. 2b, 2c and 2d. More importantly, they can potentially describe latent composition patterns that have never been observed and modeled by human beings. In doing this, neither supervision nor human-reasoning is required.

508

D. Pasquini et al. x1 x1

x2

x3 (a)

x4

x3

x2

x4 (b)

Fig. 3. Two graphical models describing different interpretations of the generative probability distribution for passwords of length four. Graph (a) represents a Bayesian network. Scheme (b) depicts a Markov Random Field.

Unfortunately, existing PPSMs, by construction, leverage arbitrary designed structured probabilistic models that fail to produce the required estimates. Those assume independence properties and causality relations among characters that are not strictly verified in practice. As a result, their conditional probability measurements fail to model correctly a coherent character-level estimation that can be used to provide the required feedback mechanism. Hereafter, we show that relaxing these biases from the mass estimation process will allow us to implement the feedback mechanism described above. To that purpose, we have to build a new probabilistic estimation framework based on complete, undirected models. 3.2

An Undirected Description of Password Distribution

To simplify our method’s understanding, we start with a description of the probabilistic reasoning of previous approaches. Then, we fully motivate our solution by comparison with them. In particular, we chose the state-of-the-art neural approach proposed in [20] (henceforth, referred to as FLA) as a representative instance, since it is the least biased as well as the most accurate among the existing PPSMs. FLA uses a recurrent neural network (RNN) to estimate password mass function at the character level. That model assumes a stochastic process represented by a Bayesian network like the one depicted in Fig. 3a. As previously anticipated, such a density estimation process bears the burden of bold assumptions on its formalization. Namely, the description derived from a Bayesian Network implies a causality order among password characters. This assumption asserts that the causality flows in a single and specific direction in the generative process i.e., from the start of the string to its end. Therefore, characters influence their probabilities only asymmetrically; that is, the probability of xi+1 is conditioned by xi but not vice versa. In practice, this implies that the observation of the value assumed from xi+1 does not affect our belief in the value expected from xi , yet the opposite does. Consequently, every character xi is assumed independent from the characters that follow it in the string, i.e., [xi+1 , . . . , x ]. Assuming this property verified for the underlying stochastic process eventually simplifies the probability estimation; the local conditional probability of each character can be easily computed as Q(xi ) = P (xi | x1 , . . . , xi−1 ), where Q(xi ) explicates that the i’th character solely depends on the characters that precede it in the string.

Interpretable Probabilistic Password Strength Meters via Deep Learning

509

Just as easily,  factorizes in:  the joint probability Q(x ) = P (x ) P (x) = i 1 i=1 i=2 P (xi | x1 , . . . xi−1 ) by chain rule. Unfortunately, although those assumptions do simplify the estimation process, the conditional probability Q(xi ), per se, does not provide a direct and coherent estimation of the security contribution of the single character xi in the password. This is particularly true for characters in the first positions of the string, even more so for the first character x1 , which is assumed to be independent of any other symbol in the password; its probability is the same for any possible configuration of the remaining random variables [x2 , . . . , x ]. Nevertheless, in the context of a sound character-level feedback mechanism, the symbol xi must be defined as “weak” or “strong” according to the complete context defined by the entire string. For instance, given two passwords y = “aaaaaaa” and z = “a######”, the probability Q(x1 = ‘a’) should be different if measured on y or z. More precisely, we expect Q(x1 = ‘a’|y) to be much higher than Q(x1 = ‘a’|z), as observing y2,7 = “aaaaaa” drastically changes our expectations about the possible values assumed from the first character in the string. On the other hand, observing z2,7 = “######” tells us little about the event x1 = ‘a’. Yet, this interaction cannot be described through the Bayesian network reported in Fig. 3a, where Q(x1 = ‘a’|y) eventually results equal to Q(x1 = ‘a’|z). The same reasoning applies to trickier cases, as for the password x = “(password)”. Here, arguably, the security contribution of the first character ‘(’ strongly depends from the presence or absence of the last character1 , i.e., x7 = ‘)’. The symbol x1 = ‘(’, indeed, can be either a good choice (as it introduces entropy in the password) or a poor one (as it implies a predictable template in the password), but this solely depends on the value assumed from another character in the string (the last one in this example). It should be apparent that the assumed structural independence prevents the resulting local conditional probabilities from being sound descriptors of the real character probability as well as of their security contribution. Consequently, such measures cannot be used to build the feedback mechanism suggested in Sect. 3.1. The same conclusion applies to other classes of PPSMs [9,29] which add even more structural biases on top of those illustrated by the model in Fig. 3a. Under a broader view, directed models are intended to be used in contexts where the causality relationships among random variables can be fully understood. Unfortunately, even if passwords are physically written character after character by the users, it is impossible to assert neither independence nor causeeffect relations among the symbols that compose them. Differently from plain dictionary words, passwords are built on top of much more complex structures and articulated interactions among characters that cannot be adequately described without relaxing many of the assumptions leveraged by existing PPSMs. In the act of relaxing such assumptions, we base our estimation on an undirected and complete2 graphical model, 1 2

Even if not so common, strings enclosed among brackets or other special characters often appear in password leaks. “Complete” in the graph-theory sense.

510

D. Pasquini et al.

(a)

(b)

Fig. 4. Estimated local conditional probabilities for two pairs of passwords. The numbers depicted above the strings report the Q(xi ) value for each character (rounding applied).

as this represents the most general description of the password generative distribution. Figure 3b depicts the respective Markov Random Field (MRF) for passwords of length four. According to that description, the probability of the character xi directly depends on any other character in the string, i.e., the full context. In other words, we model each variable xi as a stochastic function of all the others. This intuition is better captured from the evaluation of local conditional probability (Eq. 2). ⎧ ⎪ i=1 ⎨P (xi | xi+1 , . . . x ) Q(xi ) = P (xi | x1 , . . . xi−1 ) (2) i= ⎪ ⎩ P (xi | x1 , . . . , xi−1 , xi+1 , . . . x ) 1 < i < . Henceforth, we use the notation Q(xi ) to refer to the local conditional distribution of the i’th character within the password x. When x is not clear from the context, we write Q(xi | x) to make it explicit. The notation Q(xi =s) or Q(s), instead, refers to the marginalization of the distribution according to the symbol s. Eventually, such undirected formalization intrinsically weeds out all the limitations observed for the previous estimation process (i.e., the Bayesian network in Fig. 3a). Now, every local measurement is computed within the context offered by any other symbol in the string. Therefore, relations that were previously assessed as impossible can now be naturally described. In the example, y =“aaaaaaa” / z =“a######”, indeed, the local conditional probability of the first character can be now backward-influenced from the context offered from the subsequent part of the string. This is clearly observable from the output of an instance of our meter reported in Fig. 4a, where the value of Q(x1 =‘a’) drastically varies between the two cases, i.e., y and z. As expected, we have Q(x1 =‘a’|y)  Q(x1 =‘a’|z) verified in the example. A similar intuitive result is reported in Fig. 4b, where the example x = “(password)” is considered. Here, the meter first scores the string x = “(password”, then it scores the complete password x = “(password)”. In this case, we expect that the presence of the last character ‘)’ would consistently influence the conditional measurement of the first bracket in the string. Such expectation is perfectly captured from the reported output, where appending at the end of the string the symbol ‘)’ increases the probability of the first bracket of a factor ∼15. However, obtaining these improvements does not come for free. Indeed, under the MRF construction, the productory over the local conditional prob-

Interpretable Probabilistic Password Strength Meters via Deep Learning

511

abilities (better defined as potential functions or factors within this context) does not provide the exact joint probability distribution  of x. Instead, such product results in a unnormalized version of it: P (x) ∝ i=1 Q(xi )= P˜ (x) with ˜

P (x) = P Z(x) . In the equation, Z is the partition function. This result follows from the Hammersley–Clifford theorem [16]. Nevertheless, the unnormalized joint distribution preserves the core properties needed to the meter functionality. Most importantly, we have that: ∀x, x : P (x) ≥ P (x ) ⇔ P˜ (x) ≥ P˜ (x ).

(3)

That is, if we sort a list of passwords according to the true joint P (x) or according to the unnormalized version P˜ (x), we obtain the same identical ordering. Consequently, no deviation from the adversarial interpretation of PPSMs described in Sect. 2.1 is implied. Indeed, we have XP (x) = XP˜ (x) for every password distribution, key-space, and suitable sorting function. Furthermore, the joint probability distribution, if needed, can be approximated using suitable approximation methods, as discussed in Appendix B. Appendix A, instead, reports a more detailed description of the feedback mechanism. It is important to highlight that, although we present our approach under the character-level description, our method can be directly applied to n-grams or tokens without any modification. In summary, in this section, we presented and motivated an estimation process able to unravel the feedback mechanism described in Sect. 3.1. Maintaining a purely theoretical focus, no information about the implementation of such methodology has been offered to the reader. Next, in Sect. 4, we describe how such a meter can be shaped via an efficient deep learning framework.

4

Meter Implementation

In this section, we present a deep-learning-based implementation of the estimation process introduced in Sect. 3.2. Here, we describe the model and its training process. Then, we explain how the trained network can be used as a building block for the proposed password meter. Model Training. From the discussion in Sect. 3.2, our procedure requires the parametrization of an exponentially large number of interactions among random variables. Thus, any tabular approach, such as the one used from Markov Chains or PCFG [29], is a priori excluded for any real-world case. To make such a meter feasible, we reformulate the underlying estimation process so that it can be approximated with a neural network. In our approach, we simulate the Markov Random Field described in Sect. 3.2 using a deep convolutional neural network trained to compute Q(xi ) (Eq. 2) for each possible configuration of the structured model. In doing so, we train our network to solve an inpainting-like task defined over the textual domain. Broadly speaking, inpainting is the task of reconstructing missing information from mangled inputs, mostly images with missing or damaged patches [32]. Under the probabilistic perspective, the

512

D. Pasquini et al. x = “love1” f (“•ove1”)

f (“l•ve1”)

P (x1 | “•ove1”) P (x2 | “l•ve1”)

f (“lo•e1”)

f (“lov•1”)

f (“love•”)

P (x3 | “lo•e1”)

P (x4 | “lov•1”)

P (x5 | “love•”)

Fig. 5. Graphical depiction of the complete inference process for the password x = “love1”. The function f refers to the trained autoencoder and the symbol ‘•’ refers to the deleted character.

model is asked to return a probability distribution over all the unobserved elements of x, explicitly measuring the conditional probability of those concerning the observable context. Therefore, the network has to disentangle and model the semantic relation among all the factors describing the data (e.g., characters in a string) to reconstruct input instances correctly. Generally, the architecture and the training process used for inpainting tasks resemble an auto-encoding structure [8]. In the general case, these models are trained to revert self-induced damage carried out on instances of a train-set X. At each training step, an instance x ∈ X is artificially mangled with an information-destructive transformation to create a mangled variation x ˜. Then, the network, receiving x ˜ as input, is optimized to produce an output that most resembles the original x; that is, the network is trained to reconstruct x from x ˜. In our approach, we train a network to infer missing characters in a mangled password. In particular, we iterate over a password leak (i.e., our train-set) by creating mangled passwords and train the network to recover them. The mangling operation is performed by removing a randomly selected character from the string. For example, the train-set entry x = “iloveyou” is transformed in x ˜ =“ilov•you” if the 5’th character is selected for deletion, where the symbol ‘•’ represents the “empty character”. A compatible proxy-task has been previously used in [22] to learn a suitable password representation for guessing attacks. We chose to model our network with a deep residual structure arranged to create an autoencoder. The network follows the same general Context Encoder [23] architecture defined in [22] with some modifications. To create an information bottleneck, the encoder connects with the decoder through a latent space junction obtained through two fully connected layers. We observed that enforcing a latent space, and a prior on that, consistently increases the meter effectiveness. For that reason, we maintained the same regularization proposed in [22]; a maximum mean discrepancy regularization that forces a standard normal distributed latent space. The final loss function of our model is reported in Eq. 4. In the equation, Enc and Dec refer to the encoder and decoder network respectively, s is the softmax function applied row-wise3 , the distance function d is the cross-entropy, and mmd refers to the maximum mean discrepancy. Ex,˜x [d(x, s(Dec(Enc(˜ x)))] + αEz∼N (0,I) [mmd(z, Enc(˜ x))] 3

(4)

The Decoder outputs  estimations; one for each input character. Therefore, we apply the softmax function separately on each of those to create  probability distributions.

Interpretable Probabilistic Password Strength Meters via Deep Learning

513

Henceforth, we refer to the composition of the encoder and the decoder as f (x) = s(Dec(Enc(x))). We train the model on the widely adopted RockYou leak [7] considering an 80/20 train-test split. From it, we filter passwords presenting fewer than 5 characters. We train different networks considering different maximum password lengths, namely, 16, 20, and 30. In our experiments, we report results obtained with the model trained on a maximum length equal to 16, as no substantial performance variation has been observed among the different networks. Eventually, we produce three neural nets with different architectures; a large network requiring 36 MB of disk space, a medium-size model requiring 18 MB, and a smaller version of the second that requires 6.6 MB. These models can be potentially further compressed using the same quantization and compression techniques harnessed in [20].4 Model Inference Process. Once the model is trained, we can use it to compute the conditional probability Q(xi ) (Eq. 2) for each i and each possible configuration of the MRF. This is done by querying the network f using the same mangling trick performed during the training. The procedure used to compute Q(xi ) for x is summarized in the following steps: 1. We substitute the i’th character of x with the empty character ‘•’, obtaining a mangled password x ˜. 2. Then, we feed x ˜ to a network that outputs a probability distribution over Σ of the unobserved random variable xi i.e., Q(xi ). 3. Given Q(xi ), we marginalize out xi , obtaining the probability: ˜). Q(xi ) = P (xi =xi | x For instance, if we want to compute the local conditional probability of the character ‘e’ in the password x = “iloveyou”, we first create x ˜ =“ilov•you” and use it as input for the net, obtaining Q(x5 ), then we marginalize that (i.e., Q(x5 =‘e’)) ˜). From the probabilistic point of view, this getting the probability P (x5 = ‘e’| x process is equivalent to fixing the observable variables in the MRF and querying the model for an estimation of the single unobserved character. At this point, to cast both the feedback mechanism defined in Sect. 3.1 and the unnormalized joint probability of the string, we have to measure Q(xi ) for each character xi of the tested password. This is easily achieved by repeating the inference operation described above for each character comprising the input string. A graphical representation of this process is depicted in Fig. 5. It is important to highlight that the  required inferences are independent, and their evaluation can be performed in parallel (i.e., batch level parallelism), introducing almost negligible overhead over the single inference. Additionally, with the use of a feed-forward network, we avoid the sequential computation that is intrinsic in recurrent networks (e.g., the issue afflicting [20]), and that can be excessive for a reactive client-side implementation. Furthermore, the convolutional structure enables the construction of very deep neural nets with a limited memory footprint. 4

The code, pre-trained models, and other materials related to our work are publicly available at: https://github.com/pasquini-dario/InterpretablePPSM.

514

D. Pasquini et al.

In conclusion, leveraging the trained neural network, we can compute the potential of each factor/vertex in the Markov Random Field (defined as local conditional probabilities in our construction). As a consequence, we are now able to cast a PPSM featuring the character-level feedback mechanism discussed in Sect. 3.1. Finally, in Sect. 5, we empirically evaluate the soundness of the proposed meter.

5

Evaluation

In this section, we empirically validate the proposed estimation process as well as its deep learning implementation. First, in Sect. 5.1, we evaluate the capability of the meter of accurately assessing password strength at string-level. Next, in Sect. 5.2, we demonstrate the intrinsic ability of the local conditional probabilities of being sound descriptors of password strength at character-level. 5.1

Measuring Meter Accuracy

In this section, we evaluate the accuracy of the proposed meter at estimating password probabilities. To that purpose, following the adversarial reasoning introduced in Sect. 2.1, we compare the password ordering derived from the meter with the one from the ground-truth password distribution. In doing so, we rely on the guidelines defined in [15] for our evaluation. In particular, given a testset (i.e., a password leak), we consider a weighted rank correlation coefficient between ground-truth ordering and that derived from the meter. The groundtruth ordering is obtained by sorting the unique entries of the test-set according to the frequency of the password observed in the leak. In the process, we compare our solution with other fully probabilistic meters. A detailed description of the evaluation process follows. Test-Set. For modeling the ground-truth password distribution, we rely on the password leak discovered by 4iQ in the Dark Web [1] on 5th December 2017. It consists of the aggregation of ∼250 leaks, consisting of 1.4 billion passwords in total. In the cleaning process, we collect passwords with length in the interval 5–16, obtaining a set of ∼ 4 · 108 unique passwords that we sort in decreasing frequency order. Following the approach of [15], we filter out all the passwords with a frequency lower than 10 from the test-set. Finally, we obtain a test-set composed of 107 unique passwords that we refer to as XBC . Given both the large number of entries and the heterogeneity of sources composing it, we consider XBC an accurate description of real-world passwords distribution. Tested Meters. In the evaluation process, we compare our approach with other probabilistic meters. In particular: – The Markov model [13] implemented in [3] (the same used in [15]). We investigate different n-grams configurations, namely, 2-g, 3-g and 4-g that we refer to as MM2 , MM3 and MM4 , respectively. For their training, we employ the same train-set used for our meter.

Interpretable Probabilistic Password Strength Meters via Deep Learning

515

– The neural approach of Melicher et al. [20]. We use the implementation available at [4] to train the main architecture advocated in [20], i.e., an RNN composed of three LSTM layers of 1000 cells each, and two fully connected layers. The training is carried out on the same train-set used for our meter. We refer to the model as FLA. Metrics. We follow the guidelines defined by Golla and D¨ urmuth [15] for evaluating the meters. We use the weighted Spearman correlation coefficient (ws) to measure the accuracy of the orderings produced by the tested meters, as this has been demonstrated to be the most reliable correlation metrics within this context [15]. Unlike [15], given the large cardinality and diversity of this leak, we directly use the ranking derived from the password frequencies in XBC as ground-truth. Here, passwords with the same frequency value have received the same rank in the computation of the correlation metric. Results. Table 1 reports the measured correlation coefficient for each tested meter. In the table, we also report the required storage as auxiliary metric. Our meters, even the smallest, achieve higher or comparable score than the most performant Markov Model, i.e., MM4 . On the other hand, our largest model cannot directly exceed the accuracy of the state-of-the-art estimator FLA, obtaining only comparable results. However, FLA requires more disk space than ours. Indeed, interestingly, our convolutional implementation permits the creation of remarkably lightweight meters. As a matter of fact, our smallest network shows a comparable result with MM4 requiring more than a magnitude less disk space. In conclusion, the results confirm that the probability estimation process defined in Sect. 3.2 is indeed sound and capable of accurately assessing password mass at string-level. The proposed meter shows comparable effectiveness with the state-of-the-art [20], whereas, in the large setup, it outperforms standard approaches such as Markov Chains. Nevertheless, we believe that even more accurate estimation can be achieved by investigating deeper architectures and/or by performing hyper-parameters tuning over the model. 5.2

Analysis of the Relation Between Local Conditional Probabilities and Password Strength

In this section, we test the capability of the proposed meter to correctly model the relation between password structure and password strength. In particular, we investigate the ability of the measured local conditional probabilities of determining the tested passwords’ insecure components. Our evaluation procedure follows three main steps. Starting from a set of weak passwords X: (1) We perform a guessing attack on X in order to estimate the guess-number of each entry of the set.

516

D. Pasquini et al.

Table 1. Rank correlation coefficient computed between XBC and the tested meters.

Baseline (PNP) ours (small) Semi-Meter (PNP) 0.154 0.170 0.193 0.199 Fully-Meter (PNP) 1.1MB 94MB 8.8GB 60MB 36MB 18MB 6.6MB Baseline (AGI) Semi-Meter (AGI) Fully-Meter (AGI) SemiMeter/Baseline (AGI) FullyMeter/Baseline (AGI) MM2

Weighted Spearman ↑ Required Disk Space ↓

Table 2. Strength improvement induced by different perturbations.

MM3 MM4

FLA

ours ours (large) (mid) 0.217 0.207 0.203

n = 1 0.022 0.036

n = 2 0.351 0.501

n = 3 0.549 0.674

0.066

0.755

0.884

3.0 · 1010 3.6 · 1011 5.6 · 1011 4.6 · 1010 5.1 · 1011 6.8 · 1011 8.2 · 1010 7.7 · 1011 8.9 · 1011 1.530

1.413

1.222

2.768

2.110

1.588

(2) For each password x ∈ X, we substitute n characters of x according to the estimated local conditional probabilities (i.e., we substitute the characters ˜. with highest Q(xi )), producing a perturbed password x (3) We repeat the guessing attack on the set of perturbed passwords and measure the variation in the attributed guess-numbers. Hereafter, we provide a detailed description of the evaluation procedure. Passwords Sets. The evaluation is carried out considering a set of weak passwords. In particular, we consider the first 104 most frequent passwords of the XBC set. Password Perturbations. In the evaluation, we consider three types of password perturbation: (1) The first acts as a baseline and consists of the substitution of random positioned characters in the passwords with randomly selected symbols. Such a general strategy is used in [24] and [14] to improve the user’s password at composition time. The perturbation is applied by randomly selecting n characters from x and substituting them with symbols sampled from a predefined character pool. In our simulations, the pool consists of the 25 most frequent symbols in XBC (i.e., mainly lowercase letters and digits). Forcing this character-pool aims at preventing the tested perturbation procedures to create artificially complex passwords such as strings containing extremely uncommon unicode symbols. We refer to this perturbation procedure as Baseline. (2) The second perturbation partially leverages the local conditional probabilities induced by our meter. Given a password x, we compute the conditional probability Q(xi ) for each character in the string. Then, we select and substitute the character with maximum probability, i.e., arg maxxi Q(xi ). The symbol we use in the substitution is randomly selected from the same pool used for the baseline perturbation (i.e., top-25 frequent symbols). When n is

Interpretable Probabilistic Password Strength Meters via Deep Learning

517

greater than one, the procedure is repeated sequentially using the perturbed password obtained from the previous iteration as input for the next step. We refer to this procedure as Semi-Meter. (3) The third perturbation extends the second one by exploiting the local conditional distributions. Here, as in the Semi-Meter-based, we substitute the character in x with the highest probability. However, rather than choosing a substitute symbol in the pool at random, we select that according to the distribution Q(xi ), where i is the position of the character to be substituted. In particular, we choose the symbol the minimize Q(xi ), i.e., arg mins∈Σ  Q(xi =s), where Σ  is the allowed pool of symbols. We refer to this method as Fully-Meter. Guessing Attack. We evaluate password strength using the min-auto strategy advocated in [27]. The ensemble we employ is composed of HashCat [2], PCFG [6,29] and the Markov chain approach implemented in [5,13]. We limit each tool to produce 1010 guesses. The total size of the generated guesses is ∼3 TB. Metrics. In the evaluation, we are interested in measuring the increment of password strength caused by an applied perturbation. We estimate that value by considering the Average Guess-number Increment (henceforth, referred to as AGI); that is, the average delta between the guess-number of the original password and the guess-number of the perturbed password: |X|

1  [g(˜ xi ) − g(xi )] AGI(X) = |X| i=0 where g is the guess-number, and x ˜i refers to the perturbed version of the i’th password in the test set. During the computation of the guess-numbers, it is possible that we fail to guess a password. In such a case, we attribute an artificial guess-number equals to 1012 to the un-guessed passwords. Additionally, we consider the average number of un-guessed passwords as an ancillary metrics; we refer to it with the name of Percentage Non-Guessed Passwords (PNP) and compute it as: PNP(X) =

1 |{xi | g(xi ) = ⊥ ∧ g(˜ xi ) = ⊥}|, |X|

where g(x) = ⊥ when x is not guessed during the guessing attack. Results. We perform the tests over three values of n (i.e., the number of perturbed characters), namely, 1, 2, and 3. Results are summarized in Table 2. The AGI caused by the two meter-based solutions is always greater than that produced by random perturbations. On average, that is twice more effective with respect to the Fully-Meter baseline and about 35% greater for the Semi-Meter. The largest relative benefit is observable when n = 1, i.e., a single character is modified. Focusing on the Fully-Meter approach, indeed, the guidance of the local conditional probabilities permits a guess-number increment 2.7 times bigger than the one caused by a random substitution in the string. This advantage

518

D. Pasquini et al.

drops to ∼1.5 when n = 3, since, after two perturbations, passwords tend to be already out of the dense zone of the distribution. Indeed, at n = 3 about 88% of the passwords perturbed with the Fully-Meter approach cannot be guessed during the guessing attack (i.e., PNP). This value is only ∼55% for the baseline. More interestingly, the results tell us that substituting two (n = 2) characters following the guide of the local conditional probabilities causes a guess-number increment greater than the one obtained from three (n = 3) random perturbations. In the end, these results confirm that the local conditional distributions are indeed sound descriptors of password security at the structural level. Limitations. Since the goal of our evaluation was mainly to validate the soundness of the proposed estimation process, we did not perform user studies and we did not evaluate human-related factors such as password memorability although we recognize their importance.

6

Conclusion

In this paper, we showed that it is possible to construct interpretable probabilistic password meters by fundamentally rethinking the underlying password mass estimation. We presented an undirected probabilistic interpretation of the password generative process that can be used to build precise and sound password feedback mechanisms. Moreover, we demonstrated that such an estimation process could be instantiated via a lightweight deep learning implementation. We validated our undirected description and deep learning solution by showing that our meter achieves comparable accuracy with other existing approaches while introducing a unique character-level feedback mechanism that generalizes any heuristic construction.

Appendix A

Details on the Password Feedback Mechanism

Joint probability can be understood as a compatibility score assigned to a specific configuration of the MRF; it tells us the likelihood of observing a sequence of characters during the interaction with the password generative process. On a smaller scale, a local conditional probability measures the impact that a single character has in the final security score. Namely, it indicates how much the character contributes to the probability of observing a certain password x. Within this interpretation, low-probabilities characters push the joint probability of x to be closer to zero (secure), whereas high-probability characters (i.e., Q(x1 )  1) make no significant contribution to lowering the password probability (insecure). Therefore, users can strengthen their candidate passwords by substituting highprobability characters with suitable lower-probability ones (e.g., Fig. 1). Unfortunately, users’ perception of password security has been shown to be generally erroneous [25], and, without explicit guidelines, it would be difficult

Interpretable Probabilistic Password Strength Meters via Deep Learning

519

for them to select suitable lower-probability substitutes. To address this limitation, one could adopt our approach based on local conditional distributions as an effective mechanism to help users select secure substitute symbols. Indeed, ∀i Q(xi ) are able to clarify which symbol is a secure substitute and which is not for each character xi of x. In particular, a distribution Q(xi ), defined on the whole alphabet Σ, assigns a probability to every symbol s that the character xi can potentially assume. For a symbol s ∈ Σ, the probability Q(xi =s) measures how much the event xi =s is probable given all the observable characters in x. Under this interpretation, a candidate, secure substitution of xi is a symbol with very low Q(xi =s) (as this will lower the joint probability of x). In particular, every symbol s s.t. Q(xi =s) < Q(xi =xi ) given x is a secure substitution for xi . Table 3 better depicts this intuition. The Table reports the alphabet sorted by Q(xi ) for each xi in the example password x = “P aSsW 0rD!”. The bold symbols between parenthesis indicate xi . Within this representation, all the symbols below the respective xi for each xi are suitable substitutions that improve password strength. This intuition is empirically proven in Sect. 5.2. It is important to note that the suggestion mechanism must be randomized to avoid any bias in the final password distribution.5 To this end, one can provide the user with k random symbols among the pool of secure substitutions, i.e., {s | Q(xi =s) < Q(xi =xi )}. Table 3. First seven entries of the ordering imposed on Σ from the local conditional distribution for each character of the password x = “P aSsW 0rD!” Rank •aSsW0rD! P•SsW0rD! Pa•sW0rD! PaS•W0rD! PaSs•0rD! PaSsW•rD! PaSsW0•D! PaSsW0r•! PaSsW0rD•

B

0

{P}

A

s

S

w

O

R

d

1

1

S

{a}

{S}

{s}

{W}

o

{r}

{D}

S

2

p

@

c

A

#

{0}

N

t

2

3

B

3

n

T

f

I

0

m

s

4

C

4

t

E

k

i

L

l

3

5

M

I

d

H

1

#

D

k

{!}

6

1

1

r

O

F

A

n

e

5

7

c

5

x

$

3

@

X

r

9

8

s

0

$

I

0

)

S

f

4

. . .

. . .

. . .

. . .

. . .

. . .

. . .

. . .

. . .

. . .

Estimating Guess-Numbers

Within the context of PPSMs, a common solution to approximate guess-numbers [19] is using the Monte Carlo method proposed in [12]. With a few adjustments, the same approach can be applied to our meter. In particular, we have to derive an approximation of the partition function Z. This can be done by leveraging the Monte Carlo method as follows: Z N · Ex [P (x)] 5

(5)

That is, if weak passwords are always perturbed in the same way, they will be easily guessed.

520

D. Pasquini et al.

where N is the number of possible configurations of the MRF (i.e., the cardinality of the key-space), and x is a sample from the posterior distribution of the model. Samples from the model can be obtained in three ways: (1) sampling from the latent space of the autoencoder (as done in [22]), (2) performing Gibbs sampling from the autoencoder, or (3) using a dataset of passwords that follow the same distribution of the model. Once we have an approximation of Z, we can use it to ˜ normalize every joint probability, i.e., P (x) = P Z(x) and then apply the method in [12]. Alternatively, we could adopt a more articulate solution as in [18]. In any event, the estimation of the partition function Z is performed only once and can be done offline.

References 1. 2. 3. 4. 5. 6. 7. 8.

9. 10.

11. 12.

13.

14.

15.

1.4 Billion Clear Text Credentials Discovered. https://tinyurl.com/t8jp5h7 hashcat GitHub. https://github.com/hashcat NEMO Markov Model GitHub. https://github.com/RUB-SysSec/NEMO Neural Cracking GitHub. https://github.com/cupslab/neural network cracking OMEN GitHub. https://github.com/RUB-SysSec/OMEN PCFG GitHub. https://github.com/lakiw/pcfg cracker RockYou Leak. https://downloads.skullsecurity.org/passwords/rockyou.txt.bz2 Baldi, P.: Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, volume 27 of Proceedings of Machine Learning Research, pp. 37–49. PMLR, Bellevue, 02 July 2012 Castelluccia, C., D¨ urmuth, M., Perito, D.: Adaptive password-strength meters from Markov models. In: NDSS (2012) Clair, L.S., Johansen, L., Enck, W., Pirretti, M., Traynor, P., McDaniel, P., Jaeger, T.: Password exhaustion: predicting the end of password usefulness. In: Bagchi, A., Atluri, V. (eds.) ICISS 2006. LNCS, vol. 4332, pp. 37–55. Springer, Heidelberg (2006). https://doi.org/10.1007/11961635 3 Dell’ Amico, M., Michiardi, P., Roudier, Y.: Password strength: an empirical analysis. In: 2010 Proceedings IEEE INFOCOM, pp. 1–9, March 2010 Dell’Amico, M., Filippone, M.: Monte Carlo strength evaluation: fast and reliable password checking. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 158–169. Association for Computing Machinery, New York (2015) D¨ urmuth, M., Angelstorf, F., Castelluccia, C., Perito, D., Chaabane, A.: OMEN: faster password guessing using an ordered Markov enumerator. In: Piessens, F., Caballero, J., Bielova, N. (eds.) ESSoS 2015. LNCS, vol. 8978, pp. 119–132. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15618-7 10 Forget, A., Chiasson, S., van Oorschot, P.C., Biddle, R.: Improving text passwords through persuasion. In: Proceedings of the 4th Symposium on Usable Privacy and Security, SOUPS 2008, pp. 1–12. Association for Computing Machinery, New York (2008) Golla, M., D¨ urmuth, M.: On the accuracy of password strength meters. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, pp. 1567–1582. Association for Computing Machinery, New York (2018)

Interpretable Probabilistic Password Strength Meters via Deep Learning

521

16. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning. The MIT Press, Cambridge (2009) 17. Komanduri, S., Shay, R., Cranor, L.F., Herley, C., Schechter, S.: Telepathwords: preventing weak passwords by reading users’ minds. In: 23rd USENIX Security Symposium (USENIX Security 2014), pp. 591–606. USENIX Association, San Diego, August 2014 18. Ma, J., Peng, J., Wang, S., Xu, J.: Estimating the partition function of graphical models using Langevin importance sampling. In: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, volume 31 of Proceedings of Machine Learning Research, pp. 433–441. PMLR, Scottsdale, 29 April– 01 May 2013 19. Massey, J.L.: Guessing and entropy. In: Proceedings of 1994 IEEE International Symposium on Information Theory, p. 204, June 1994 20. Melicher, W., et al.: Fast, lean, and accurate: modeling password guessability using neural networks. In: 25th USENIX Security Symposium (USENIX Security 2016), pp. 175–191. USENIX Association, Austin, August 2016 21. Narayanan, A., Shmatikov, V.: Fast dictionary attacks on passwords using timespace tradeoff. In: Proceedings of the 12th ACM Conference on Computer and Communications Security, CCS 2005, pp. 364–372. Association for Computing Machinery, New York (2005) 22. Pasquini, D., Gangwal, A., Ateniese, G., Bernaschi, M., Conti, M.: Improving password guessing via representation learning. In 2021 42th IEEE Symposium on Security and Privacy, May 2021 23. Pathak, D., Kr¨ ahenb¨ uhl, P., Donahue, J., Darrell, T., Efros, A.: Context encoders: feature learning by inpainting. In: Computer Vision and Pattern Recognition (CVPR) (2016) 24. Ur, B., et al.: Design and evaluation of a data-driven password meter. In: CHI 2017 (2017) 25. Ur, B., Bees, J., Segreti, S.M., Bauer, L., Christin, N., Cranor, L.F.: Do users’ perceptions of password security match reality? In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI 2016, pp. 3748–3760. Association for Computing Machinery, New York (2016) 26. Ur, B., et al.: How does your password measure up? The effect of strength meters on password creation. In: Presented as part of the 21st USENIX Security Symposium (USENIX Security 2012), pp. 65–80. USENIX, Bellevue (2012) 27. Ur, B., et al.: Measuring real-world accuracies and biases in modeling password guessability. In: 24th USENIX Security Symposium (USENIX Security 2015), pp. 463–481. USENIX Association, Washington, D.C., August 2015 28. Wang, D., He, D., Cheng, H., Wang, P.: fuzzyPSM: a new password strength meter using fuzzy probabilistic context-free grammars. In: 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 595– 606, June 2016 29. Weir, M., Aggarwal, S., Medeiros, B.D., Glodek, B.: Password cracking using probabilistic context-free grammars. In: 2009 30th IEEE Symposium on Security and Privacy, pp. 391–405, May 2009 30. Weir, M., Aggarwal, S., Collins, M., Stern, H.: Testing metrics for password creation policies by attacking large sets of revealed passwords. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS 2010, pp. 162–175. Association for Computing Machinery, New York (2010)

522

D. Pasquini et al.

31. Wheeler, D.L.: zxcvbn: low-budget password strength estimation. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 157–173. USENIX Association, Austin, August 2016 32. Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Advances in Neural Information Processing Systems 25, pp. 341–349. Curran Associates Inc. (2012)

Polisma - A Framework for Learning Attribute-Based Access Control Policies Amani Abu Jabal1(B) , Elisa Bertino1 , Jorge Lobo2 , Mark Law3 , Alessandra Russo3 , Seraphin Calo4 , and Dinesh Verma4 1

4

Purdue University, West Lafayette, IN, USA {aabujaba,bertino}@purdue.edu 2 ICREA - Universitat Pompeo Fabra, Barcelona, Spain [email protected] 3 Imperial College London, London, UK {mark.law09,a.russo}@imperial.ac.uk IBM TJ Watson Research Center, Yorktown Heights, NY, USA {scalo,dverma}@us.ibm.com

Abstract. Attribute-based access control (ABAC) is being widely adopted due to its flexibility and universality in capturing authorizations in terms of the properties (attributes) of users and resources. However, specifying ABAC policies is a complex task due to the variety of such attributes. Moreover, migrating an access control system adopting a low-level model to ABAC can be challenging. An approach for generating ABAC policies is to learn them from data, namely from logs of historical access requests and their corresponding decisions. This paper proposes a novel framework for learning ABAC policies from data. The framework, referred to as Polisma, combines data mining, statistical, and machine learning techniques, capitalizing on potential context information obtained from external sources (e.g., LDAP directories) to enhance the learning process. The approach is evaluated empirically using two datasets (real and synthetic). Experimental results show that Polisma is able to generate ABAC policies that accurately control access requests and outperforms existing approaches. Keywords: Authorization rules

1

· Policy mining · Policy generalization

Introduction

Most modern access control systems are based on the attribute-based access control model (ABAC) [3]. In ABAC, user requests to protected resources are granted or denied based on discretionary attributes of users, resources, and environmental conditions [7]. ABAC has several advantages. It allows one to specify access control policies in terms of domain-meaningful properties of users, resources, and environments. It also simplifies access control policy administration by allowing access decisions to change between requests by simply changing attribute values, without changing the user/resource relationships underlying the c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 523–544, 2020. https://doi.org/10.1007/978-3-030-58951-6_26

524

A. Abu Jabal et al.

rule sets [7]. As a result, access control decisions automatically adapt to changes in environments, and in user and resource populations. Because of its relevancy for enterprise security, ABAC has been standardized by NIST and an XML-based specification, known as XACML, has been developed by OASIS [20]. There are several XACML enforcement engines, some of which are publicly available (e.g., AuthZForce [18] and Balana [19]). Recently a JSON profile for XACML has been proposed to address the verbosity of the XML notation. However, a major challenge in using an ABAC model is the manual specification of the ABAC policies that represent one of the inputs for the enforcement engine. Such a specification requires detailed knowledge about properties of users, resources, actions, and environments in the domain of interest [13,22]. One approach to address this challenge is to take advantage of the many data sources that are today available in organizations, such as user directories (e.g., LDAP directories), organizational charts, workflows, security tracking’s logs (e.g., SIEM), and use machine learning techniques to automatically learn ABAC policies from data. Suitable data for learning ABAC policies could be access requests and corresponding access control responses (i.e., access control decisions) that are often collected in different ways and forms, depending on the real-world situation. For example, an organization may log past decisions taken by human administrators [17], or may have automated access control mechanisms based on low-level models, e.g., models that do not support ABAC. If interested in adopting a richer access control model, such an organization could, in principle, use these data to automatically generate access control policies, e.g., logs of past decisions taken by the low-level mechanism could be used as labeled examples for a supervised machine learning algorithm1 . Learned ABAC policies, however, must satisfy a few key requirements. They must be correct and complete. Informally, an ABAC policy set is correct if it is able to make the correct decision for any access request. It is complete if there are no access requests for which the policy set is not able to make a decision. Such a case may happen when the attributes provided with a request do not satisfy the conditions of any policy in the set. To meet these requirements, the following issues need to be taken into account when learning ABAC policies: – Noisy examples. The log of examples might contain decisions which are erroneous or inconsistent. The learning process needs to be robust to noise to avoid learning incorrect policies. – Overfitting. This is a problem associated with machine learning [10] which happens when the learned outcomes are good only at explaining data given as examples. In this case, learned ABAC policies would be appropriate only 1

Learning policies from logs of access requests does not necessarily mean that there is an existing access control system or that policy learning is aimed to reproduce existing policies or validate them. Such logs may consist of examples of access control decisions provided by a human expert, and learning may be used for example in a coalition environment where a coalition member can get logs from another coalition member to learn policies for similar missions.

Polisma - A Framework for Learning Attribute-Based Access Control Policies

525

for access requests observed during the learning process and fail to control any other potential access request, so causing the learned policy set to be incomplete. The learning process needs to generalize from past decisions. – Unsafe generalization. Generalization is critical to address overfitting. But at the same time generalization should be safe, that is, it should not result in learning policies that may have unintended consequences, thus leading to learned policies that are unsound. The learning process has to balance the trade-off between overfitting and safe generalization. This paper investigates the problem of learning ABAC policies, and proposes a learning framework that addresses the above issues. Our framework learns from logs of access requests and corresponding access control decisions and, when available, context information provided by external sources (e.g., LDAP directories). We refer to our framework as Polisma to indicate that our ABAC policy learner uses mining, statistical, and machine learning techniques. The use of multiple techniques enables extracting different types of knowledge that complement each other to improve the learning process. One technique captures data patterns by considering the frequency and another one exploits statistics and context information. Furthermore, another technique exploits data similarity. The assembly of these techniques in our framework enables better learning for ABAC policies compared to the other state-of-the-art approaches. Polisma consists of four steps. In the first step, a data mining technique is used to infer associations between users and resources included in the set of decision examples and based on these associations a set of rules is generated. In the second step, each constructed rule is generalized based on statistically significant attributes and context information. In the third step, authorization domains for users and resources (e.g., which resources were accessed using which operations by a specific user) are considered in order to augment the set of generalized rules with “restriction rules” for restricting access of users to resources by taking into account their authorization domain. Policies learned by those three stages are safe generalizations with limited overfitting. To improve the completeness of the learned set, Polisma applies a machine learning (ML) classifier on requests not covered by the learned set of policies and uses the result of the classification to label these data and generate additional rules in an “ad-hoc” manner.

2

Background and Problem Description

In what follows, we introduce background definitions for ABAC policies and access request examples, and formulate the problem of learning ABAC policies. 2.1

ABAC Policies

Attribute-based access control (ABAC) policies are specified as Boolean combinations of conditions on attributes of users and protected resources. The following definition is adapted from Xu et al. [24]:

526

A. Abu Jabal et al.

Definition 1 (cf. ABAC Model [24]). An ABAC model consists of the following components: – U, R, O, P refer, respectively, to finite sets of users, resources, operations, and rules. – AU refers to a finite set of user attributes. The value of an attribute a ∈ AU for a user u ∈ U is represented by a function dU (u, a). The range of dU for an attribute a ∈ AU is denoted by VU (a). – AR refers to a finite set of resource attributes. The value of an attribute a ∈ AR for a resource r ∈ R is represented by a function dR (r, a). The range of dR for an attribute a ∈ AR is denote – A user attribute expression eU defines a function that maps every attribute a ∈ AU , to a value in its range or ⊥, eU (a) ∈ VU (a) ∪ {⊥}. Specifically, eU can be expressed as the set of attribute/value pairs eU = {ai , vi  | ai ∈ AU ∧ f (ai ) = vi ∈ VU (ai )}. A user ui satisfies eU (i.e., it belongs to the set defined by eU ) iff for every user attribute a not mapped to ⊥, a, dU (ui , a) ∈ eU . Similarly, a resource si can be defined by a resource attribute expression eR . – A rule ρ ∈ P is a tuple eU , eR , O, d where ρ.eU is a user attribute expression, eR is a resource attribute expression, O ⊆ O is a set of operations, and d is the decision of the rule (d ∈ {permit, deny})2 . The original definition of an ABAC rule does not include the notion of “signed rules” (i.e., rules that specify positive or negative authorizations). By default, d = permit, and an access request u, r, o for which there exist at least a rule ρ = eU , eR , O, d in P such that u satisfies eU (denoted by u |= eU ), r satisfies eR (denoted by r |= eR ), and o ∈ O, is permitted. Otherwise, the access request is assumed to be denied. Even though negative authorizations are the default in access control lists, mixed rules are useful when dealing with large sets of resources organized according to hierarchies, and have been widely investigated [21]. They are used in commercial access control systems (e.g., the access control model of SQL Servers provides negative authorizations by means of the DENY authorization command), and they are part of the XACML standard. 2.2

Access Control Decision Examples

An access control decision example (referred to as a decision example (l)) is composed of an access request and its corresponding decision. Formally, Definition 2 (Access Control Decisions and Examples (l) ). An access control decision is a tuple u, r, o, d where u is the user who initiated the access request, r is the resource target of the access request, o is the operation requested on the resource, and d is the decision taken for the access request. A decision example l is a tuple u, eU , r, eR , o, d where eU and eR are a user attribute expression, and a resource attribute expression such that u |= eU , and r |= eR , and the other arguments are interpreted as in an access control decision. 2

Throughout the paper, we will use the dot notation to refer to a component of an entity (e.g., ρ.d refers to the decision of the rule ρ).

Polisma - A Framework for Learning Attribute-Based Access Control Policies

527

There exists an unknown set F of all access control decisions that should be allowed in the system, but for which we only have partial knowledge through examples. In an example l, the corresponding eU and eR can collect a few attributes (e.g., role, age, country of origin, manager, department, experience) depending on the available log information. Note that an access control decision is an example where eU and eR are the constant functions that assign to any attribute ⊥. We say that an example u, eU , r, eR , o, d belongs to an access control decision set S iff u, r, o, d ∈ S. Logs of past decisions are used to create an access control decision example dataset (see Definition 3). We say that a set of ABAC policies P controls an access control request u, r, o iff there is a rule ρ = eU , eR , O, d ∈ P that satisfies the access request u, r, o. Similarly, we say that the request u, r, o is covered by F iff u, r, o, d ∈ F, for some decision d. Therefore, P can be seen as defining the set of all access decisions for access requests controlled by P and, hence, be compared directly with F - and with some overloading in the notation, we want decisions in P (u, r, o, d ∈ P) to be decisions in F (u, r, o, d ∈ F). Definition 3 (Access Control Decision Example Dataset (D)). An access control decision example dataset is a finite set of decision examples (i.e., D = {l1 , . . . , ln }). D is expected to be mostly, but not necessarily, a subset of F. D is considered to be noisy if D ⊆ F. 2.3

Problem Definition

Using a set of decision examples, extracted from a system history of access requests and their authorized/denied decisions, together with some context information (e.g., metadata obtained from LDAP directories), we want to learn ABAC policies (see Definition 4). The context information provides user and resource identifications, as well as user and resource attributes and their functions needed for an ABAC model. Additionally, it might also provide complementary semantic information about the system in the form of a collection of typed binary relations, T = {t1 , . . . , tn }, such that each relation ti relates pairs of attribute values, i.e., ti ⊆ VX (a1 )×VY (a2 ), with X, Y ∈ {U, R}, and a1 ∈ AX , and a2 ∈ AY . For example, one of these relations can represent the organizational chart (department names are related in an is member or is part hierarchical relation) of the enterprise or institution where the ABAC access control system is deployed. Definition 4 (Learning ABAC Policies by Examples and Context (LAP EC)). Given an access control decision example dataset D, and context information comprising the sets U, R, O, the sets AU and AR , one of which could be empty, the functions assigning values to the attributes of the users and resources, and a possibly empty set of typed binary relations, T = {t1 , . . . , tn }, LAP EC aims at generating a set of ABAC rules (i.e., P = {ρ1 . . . ρm }) that are able to control all access requests in F.

528

2.4

A. Abu Jabal et al.

Policy Generation Assessment

ABAC policies generated by LAP EC are assessed by evaluation of two quality requirements: correctness, which refers to the capability of the policies to assign a correct decision to any access request (see Definition 5), and completeness, which refers to ensuring that all actions, executed in the domain controlled by an access control system, are covered by the policies (see Definition 6). This assessment can be done against example datasets as an approximation of F. Since datasets can be noisy, it is possible that two different decision examples for the same request are in the set. Validation will be done only against consistent datasets as we assume that the available set of access control examples is noisefree. Definition 5 (Correctness). A set of ABAC policies P is correct with respect to a consistent set of access control decisions D if and only if for every request u, r, o covered by D, u, r, o, d ∈ P → u, r, o, d ∈ D. Definition 6 (Completeness). A set of ABAC policies P is complete with respect to a consistent set of access control decisions D if and only if, for every request u, r, o, u, r, o covered by D → u, r, o is controlled by P. These definitions allow P to control requests outside D. The aim is twofold. First, when we learn P from an example dataset D, we want P to be correct and complete with respect to D. Second, beyond D, we want to minimize incorrect decisions while maximizing completeness with respect to F. Outside D, we evaluate correctness statistically through cross validation with example datasets which are subsets of F, and calculating Precision, Recall, F1 score, and accuracy of the decisions made by P. We quantify completeness by the Percentage of Controlled Requests (PCR) which is the ratio between the number of decisions made by P among the requests covered by an example dataset and the total number of requests covered by the dataset. Definition 7 (Percentage of Controlled Requests (PCR)). Given a subset of access control decision examples N ⊆ F, and a policy set P, the percentage of controlled requests is defined as follows: P CR =

3

|{u, r, o | u, r, o is covered by N and u, r, o is controlled by P}| |{u, r, o | u, r, o is covered by N }|

The Learning Framework

Our framework comprises four steps, see Fig. 1. For a running example see Appendix B. 3.1

Rules Mining

D provides a source of examples of user accesses to the available resources in an organization. In this step, we use association rule mining (ARM) [1] to analyze the association between users and resources.

Polisma - A Framework for Learning Attribute-Based Access Control Policies

529

Thus, rules having high association metrics (i.e., support and confidence scores) are kept to generate access control rules. ARM is used for capturing the frequency and data patterns and formalize them in rules. Polisma uses Apriori [2] (one of the ARM algoFig. 1. The architecture of Polisma rithms) to generate rules that are correct and complete with respect to D. This step uses only the examples in D to mine a first set of rules, referred to as ground rules. These ground rules potentially are overfitted (e.g., ρ1 in Fig. 8 in Appendix B) or unsafelygeneralized (e.g., ρ2 in Fig. 8 in Appendix B). Overfitted rules impact completeness over F while unsafely-generalized rules impact correctness. Therefore, Polisma post-processes the ground rules in the next step. 3.2

Rules Generalization

To address overfitting and unsafe generalization associated with the ground rules, this step utilizes the set of user and resource attributes AU , AR provided by either D, external context information sources, or both. Straightforwardly, eU (i.e., user expression) or eR (i.e., resource expression) of a rule can be adapted to expand or reduce the scope of each expression when a rule is overfitted or is unsafely generalized, respectively. In particular, each rule is post-processed based on its corresponding user and resource by analyzing AU and AR to statistically “choose” the most appropriate attribute expression that captures the subsets of users and resources having the same permission authorized by the rule according to D. In this way, this step searches for an attribute expression that minimally generalizes the rules. Polisma provides two strategies for post-processing ground rules. One strategy, referred to as Brute-force Strategy BS, uses only the attributes associated with users and resources appearing in D. The other strategy assumes that attribute relationship metadata (T ) is also available. Hence, the second strategy, referred to as Structure-based Strategy SS, exploits both attributes and their relationships. In what follows, we describe each strategy. Brute-Force Strategy (BS). To post-process a ground rule, this strategy searches heuristically for attribute expressions that cover the user and the resource of interest. Each attribute expression is weighted statistically to assess its “safety” level for capturing the authorization defined by the ground rule of interest. The safety level of a user attribute expression eU for a ground rule ρ is estimated by the proportion of the sizes of two subsets: a) the subset of decision

530

A. Abu Jabal et al.

Algorithm 1. Brute-force Strategy (BS-U -S) for Generalizing Rules Require: ρi : a ground rule, ui : the user corresponding to ρi , D, AU . 1: Define wmax = -∞, aselected = ⊥. 2: for ai ∈ AU do 3: wi = Wuav (ui , ai , dU (ui , ai ), D) 4: if wi > wmax then 5: aselected = ai , wmax = wai 6: end if 7: end for 8: eU ←  aselected , dU (ui , aselected )  9: Create Rule ρi = eU , ρi .eR , ρi .O, ρi .d 10: return ρi

Algorithm 2. Structure-based Strategy (SS) for Generalizing Rules Require: ρi : a ground rule to be Generalized, ui : the user corresponding to ρi , ri : the resource corresponding to ρi , T , AU , AR . 1: gi = G(ui , ri , AU , AR , T ) 2: x ∈ AU , y ∈ AR  = First-Common-Attribute-Value(gi ) 3: eU ←  x, dU (ui , x) ; eR ←  y, dR (ri , y)  4: Create Rule ρi = eU , eR , ρi .O, ρi .d 5: return ρi

examples in D such that the access requests issued by users satisfy eU while the remaining parts of the decision example (i.e., resource, operation, and decision) match the corresponding ones in ρ; and b) the subset of users who satisfy eU . The safety level of a resource attribute expression is estimated similarly. Thereafter, the attribute expression with the highest safety level is selected to be used for post-processing the ground rule of interest. Formally: Definition 8 (User Attribute Expression Weight Wuav (ui , aj , dU (ui , aj ), D)). For a ground rule ρ and its corresponding user ui ∈ U , the weight of a D| user attribute expression aj , dU (ui , aj ) is defined by the formula: Wuav = |U |UC | , where UD = {l ∈ D | dU (ui , aj ) = dU (l.u, aj ) ∧ l.r ∈ ρ.eR ∧ l.o ∈ ρ.O ∧ ρ.d = l.d} and UC = {uk ∈ U | dU (ui , aj ) = dU (uk , aj )}. Different strategies for selecting attribute expressions are possible. They can be based on a single attribute or a combination of attributes, and they can consider either user attributes, resource attributes, or both. Strategies using only user attributes are referred to as BS-U -S “for a single attribute” and BS-U -C “for a combination of attributes”. Similarly, BS-R-S and BS-R-C are strategies for a single or a combination of resources attributes. The strategies using the attributes of both users and resources are referred to as BS-U R-S and BSU R-C. Due to space limitation, we show only the algorithm using the selection strategy BS-U -S (see Algorithm 1). Structure-Based Strategy (SS). The BS strategy works without prior knowledge of T . When T is available, it can be analyzed to gather information about “hidden” correlations between common attributes of a user and a resource. Such attributes can potentially enhance rule generalization. For example, users working in a department ti can potentially access the resources owned

Polisma - A Framework for Learning Attribute-Based Access Control Policies

531

by ti . The binary relations in T are combined to form a directed graph, referred to as Attribute-relationship Graph G (see Definition 9). Traversing this graph starting from the lowest level in the hierarchy to the highest one allows one to find the common attributes values between users and resources. Among the common attribute values, the one with the least hierarchical level, referred to as first common attribute value, has heuristically the highest safety level for generalization because the generalization using such attribute value supports the least level of generalization. Subsequently, to post-process a ground rule, this strategy uses T along with AU , AR of both the user and resource of interest to build G. Thereafter, G is used to find the first common attribute value between the user and resource of that ground rule to be used for post-processing the ground rule of interest (as described in Algorithm 2). Definition 9 (Attribute-relationship Graph G). Given AU , AR , ρ and T , the attribute-relationship graph (G) of ρ is a directed graph composed of vertices V and edges E where V = {v | ∀ui ∈ ρ.eU , ∀ai ∈ AU , v = dU (ui , ai )} ∪{v | ∀ri ∈ ρ.eR , ∀ai ∈ AR , v = dR (ri , ai )}, and E = {(v1 , v2 ) | ∃ti ∈ T ∧ (v1 , v2 ) ∈ ti ∧ v1 , v2 ∈ V }. Proposition 1. Algorithms 1 and 2 output generalized rules that are correct and complete with respect to D. As discussed earlier, the first step generates ground rules that are either overfitted or unsafely-generalized. This second step post-processes unsafely-generalized rules into safely-generalized ones; hence, improving correctness. It may also postprocess overfitted rules into safely-generalized ones; hence, improving completeness. However, completeness can be further improved as described in the next subsections. 3.3

Rules Augmentation Using Domain-Based Restrictions

“Safe” generalization of ground rules is one approach towards fulfilling completeness. Another approach is to analyze the authorization domains of users and resources appearing in D to augment restriction rules, referred to as domainbased restriction rules 3 . Such restrictions potentially increase safety by avoiding erroneous accesses. Basically, the goal of these restriction rules is to augment negative authorization. One straightforward approach for creating domain-based restriction rules is to analyze the authorization domain for each individual user and resource. However, such an approach leads to the creation of restrictions suffering from overfitting. Alternatively, the analysis can be based on groups of users or resources. 3

This approach also implicitly improves correctness.

532

A. Abu Jabal et al.

Identifying such groups requires pre-processing AU and AR . Therefore, this step searches heuristically for preferable attributes to use for partitioning the user and resource sets. The heuristic prediction for preferable attributes is based on selecting an attribute that partitions the corresponding set evenly. Hence, estimating the attribute distribution of users or resources allows one to measure the ability of the attribute of interest for creating even partitions4 . One method for capturing the attribute distribution is to compute the attribute entropy as defined in Eq. 1. The attribute entropy is maximized when the users are distributed evenly among the user partitions using the attribute of interest (i.e., sizes of the subsets in GaUi are almost equal). Thereafter, the attribute with the highest entropy is the preferred one. Given the preferred attributes, this step constructs user groups as described in Definition5 10. These groups can be analyzed with respect to O as described in Algorithm 3 Consequently, user groups GaUi and resource groups GaRi along with O comprise two types of authorization domains: operations performed by each user group, and users’ accesses to each resource group. Subsequently, the algorithm augments restrictions based on these access domains as follows. – It generates deny restrictions based on user groups (similar to the example in Fig. 11 in Appendix B). These restriction rules deny any access request from a specific user group using a specific operation to access any resource. – It generates deny restrictions based on both groups of users and resources. These restriction rules deny any access request from a specific user group using a specific operation to access a specific resource group. On the other hand, another strategy assumes prior knowledge about preferred attributes to use for partitioning the user and resource sets (referred to as Semantic Strategy (SS)). Definition 10 (Attribute-based User Groups GaUi ). Given U and ai ∈ AU , U is divided into is a set of user groups GaUi {g1ai , . . . , gkai } where (giai = {u1 , . . . , um } | ∀u , u ∈ gi , dU (u , ai ) = dU (u , ai ) , m ≤| U |) ∧ (gi ∩ gj = φ | i = j) ∧ (k =| VU (ai ) |). a

Entropy(GUi , ai ) = (

−1 ∗ ln m

m  a j=1,gj ∈GUi

a

pj ∗ ln pj ), where m = |GUi |, pj = m

|gj | a

l=1,gl ∈GUi

|gl |

(1) Proposition 2. Step 3 outputs generalized rules that are correct and complete with respect to D.

4

5

Even distribution tends to generate smaller groups. Each group potentially has a similar set of permissions. A large group of an uneven partition potentially includes a diverse set of users or resource; hence hindering observing restricted permissions. The definition of constructing resource groups is analogous to that of user groups.

Polisma - A Framework for Learning Attribute-Based Access Control Policies

3.4

533

Rules Augmentation Using Machine Learning

The rules generated from the previous steps are generalized using domain knowledge and data statistics extracted from D. However, these steps do not consider generalization based on the similarity of access requests. Thus, these rules cannot control a new request that is similar to an old one (i.e., a new request has a similar pattern to one of the old requests, but the attribute values of the new request do not match the old ones). A possible prediction approach is to use an ML classifier that builds a model based on the attributes provided by D and context information. The generated model implicitly creates patterns of accesses and their decisions, and will be able to predict the decision for any access request based on its similarity to the ones controlled by the rules generated by the previous steps. Thus, this step creates new rules based on these predictions. Once these new rules are created, Polisma repeats Step 2 to safely generalize the ML-based augmented rules. Notice that ML classification algorithms might introduce some inaccuracy when performing the prediction. Hence, we utilize this step only for access requests that are not covered by the rules generated by the previous three steps. Thus, the inaccuracy effect associated with the ML classifier is minimized but the correctness and completeness are preserved. Note that Polisma is used as a one-time learning. However, if changes arise in terms of regulations, security policies and/or organizational structure of the organization, the learning can be executed again (i.e., on-demand learning).

4

Evaluation

In this section, we summarize the experimental methodology and report the evaluation results of Polisma. 4.1

Experimental Methodology

Datasets. To evaluate Polisma, we conducted several experiments using two datasets: one is a synthetic dataset (referred to as project management (P M ) [24]), and the other is a real one (referred to as Amazon 6 ). The P M dataset has been generated by the Reliable Systems Laboratory at Stony Brook University based on a probability distribution in which the ratio is 25 for rules, 25 for resources, 3 for users, and 3 for operations. In addition to decision examples, P M is tagged with context information (e.g., attribute relationships and attribute semantics). We used such a synthetic dataset (i.e., P M ) to show the impact of the availability of such semantic information on the learning process and the quality of the generated rules. Regarding the Amazon dataset, it is a historical dataset collected in 2010 and 2011. This dataset is an anonymized sample of access provisioned within the Amazon company. One characteristic of 6

http://archive.ics.uci.edu/ml/datasets/Amazon+Access+Samples.

534

A. Abu Jabal et al. Table 1. Overview of datasets Datasets Project Management Amazon ( P M) ( AZ) # # # # # # # #

of of of of of of of of

decision examples users resources operations examples with a “Permit” decision examples with a “Deny” decision User Attributes Resource Attributes

550 150 292 7 505 50 5 6

1000 480 271 1 981 19 12 1

the Amazon dataset is that it is sparse (i.e., less than 10% of the attributes are used for each sample). Furthermore, since the Amazon dataset is large (around 700K decision examples), we selected a subset of the Amazon dataset (referred to as AZ). The subset was selected randomly based on a statistical analysis [11] of the size of the Amazon dataset where the size of the selected samples suffices a confidence score of 99% and a margin error score of 4%. Table 1 shows statistics about the P M and AZ datasets. Comparative Evaluation. We compared Polisma with three approaches. We developed a Na¨ıve ML approach which utilizes an ML classifier to generate rules. A classification model is trained using D. Thereafter, the trained model is used to generate rules without using the generalization strategies which are used in Polisma. The rules generated from this approach are evaluated using another set of new decision examples (referred to as N )7 . Moreover, we compared Polisma with two recently published state-of-the-art approaches for learning ABAC policies from logs: a) Xu & Stoller Miner [24], and b) Rhapsody by Cotrini et al. [5]. Evaluation and Settings. On each dataset, Polisma is evaluated by splitting the dataset into training D (70%) and testing N (30%) subsets. We performed a 10-fold cross-validation on D for each dataset. As the ML classifiers used for the fourth step of Polisma and the na¨ıve approach, we used a combined classification technique based on majority voting where the underlying classifiers are Random Forest and kNN8 . All experiments were performed on a 3.6 GHz Intel Core i7 machine with 12 GB memory running on 64-bit Windows 7.

7

8

Datasets are assumed to be noise-free, that is, (N ⊂ F ) ∧ (D ⊂ F ) ∧ (N ∩ D = φ). Note that F is the complete set of control decisions which we will never have in a real system. Other algorithms can be used. We used Random Forest and kNN classifiers since they showed better results compared to SVM and Adaboost.

Polisma - A Framework for Learning Attribute-Based Access Control Policies

535

Evaluation Metrics. The correctness of the generated ABAC rules is evaluated using the standard metrics for the classification task in the field of machine learning. Basically, predicted decisions for a set of access requests are compared with their ground-truth decisions. Our problem is analogous to a two-class classification task because the decision in an access control system is either “permit” or “deny”. Therefore, for each type of the decisions, a confusion matrix is prepared to enable calculating the values of accuracy, precision, recall, and F1 score as outlined in Eqs. 2–39 . TP TP , Recall = (2) TP + FP TP + FN TP + TN P recision · Recall , Accuracy = (3) F 1 Score = 2 · P recision + Recall TP + TN + FP + FN Regarding the completeness of the rules, they are measured using PCR (see Definition 7). P recision =

Fig. 2. Comparison of Na¨ıve, Xu & Stoller Miner, and Polisma Using the P M Dataset

(a) Polisma (BS-HS-ML)

Fig. 3. Comparison of Na¨ıve, Rhapsody, and Polisma Using the AZ Dataset

(b) Polisma (SS-SS-ML)

Fig. 4. Evaluation of Polisma using the P M dataset.

4.2

Experimental Results

Polisma vs. Other Learning Approaches. Here, Polisma uses the default strategies in each step (i.e., the BS-U R-C strategy in the second step and the HS strategy in the third step10 ) and the impact of the other strategies in Polisma 9 10

F1 Score is the harmonic mean of precision and recall. Since the AZ dataset does not contain resource attributes, BS-R-C (instead of BSU R-C) is executed in the second step and the execution of the third step is skipped.

536

A. Abu Jabal et al.

Fig. 5. Comparison between the variants of the brute-force strategy (Step 2) using the P M dataset.

Fig. 6. Polisma Evaluation on the Amazon Dataset (a sample subset and the whole set).

Fig. 7. Polisma Evaluation on a sample subset of Amazon Dataset for only positive authorizations.

is evaluated next. The results in Fig. 2 show that Polisma achieves better results compared to the na¨ıve and Xu & Stoller Miner using the P M dataset. In this comparison, we did not include Rhapsody because the default parameters of Rhapsody leads to generating no rule. With respect to Xu & Stoller Miner, Polisma’s F1 score (PCR) improves by a factor of 2.6(2.4). This shows the importance of integrating multiple techniques (i.e., data mining, statistical, and ML techniques) in Polisma to enhance the policy learning outcome. Moreover, Xu & Stoller Miner requires prior knowledge about the structure of context information while Polisma benefits from such information when available. Xu & Stoller Miner [24] generates rules only for positive authorization (i.e., no negative authorization). This might increase the possibility of vulnerabilities and encountering incorrect denials or authorizations. Moreover, Xu & Stoller in their paper [24] used different metrics from the classical metrics for data mining that are used in our paper. In their work [24], they used a similarity measure between the generated policy and the real policy which given based on some distance function. However, such a metric does not necessarily reflect the correctness of the generated policy (i.e., two policies can be very similar, but their decisions are different). In summary, this experiment shows that the level of correctness (as indicated by the precision, recall and F1 score) and completeness (as indicated by the PCR) of the policies that are generated by Polisma is higher than for the policies generated by Xu & Stoller Miner and the na¨ıve approach due to the reasons explained earlier (i.e., the lack of negative authorization rules), as well as generating generalized rules without considering the safety of generalization. Furthermore, as shown in Fig. 3, Polisma also outperforms the na¨ıve

Polisma - A Framework for Learning Attribute-Based Access Control Policies

537

approach and Rhapsody using the AZ dataset. Rhapsody [5] only considers positive authorization (similar to the problem discussed above about Xu & Stoller Miner [24]); hence increasing the chances of vulnerabilities. In this comparison, we have excluded Xu & Stoller Miner because of the shortage of available resource attributes information in the AZ dataset. Concerning the comparison with the na¨ıve approach, on the P M dataset, the F1 score (PCR) of Polisma improves by a factor of 1.2 (4.1) compared to that of the na¨ıve. On the AZ dataset, Polisma’s F1 score (PCR) improves by a factor of 2.7 (3.3) compared to the na¨ıve. Both Polisma and the na¨ıve approach use ML classifiers. However, Polisma uses an ML classifier for generating only some of the rules. This implies that among Polisma steps which improve the results of the generated rules (i.e., Steps 2 and Step 4), the rule generalization step is essential for enhancing the final outcome. In summary, the reported results show that the correctness level of the rules generated by Polisma is better than that of the ones generated by Rhapsody and the na¨ıve approach (as indicated by the difference between the precision, recall, and F1 scores metrics). Meanwhile, the difference between the completeness level (indicated by PCR) of the generated rules using Polisma and that of Rhapsody is not large. The difference in terms of completeness and correctness is a result of Rhapsody missing the negative authorization rules. Steps Evaluation. Figure 4 shows the evaluation results for the four steps of Polisma in terms of precision, recall, F1 score, accuracy, and PCR. In general, the quality of the rules improves gradually for the two datasets, even when using different strategies in each step. The two plots show that the strategies that exploit additional knowledge about user and resource attributes (e.g., T ) produce better results when compared to the ones that require less prior knowledge. Moreover, the quality of the generated rules is significantly enhanced after the second and the fourth steps. This shows the advantage of using safe generalization and similarity-based techniques. On the other hand, the third step shows only a marginal enhancement on the quality of rules. However, the rules generated from this step can be more beneficial when a large portion of the access requests in the logs are not allowed by the access control system. We also conducted an evaluation on the whole Amazon dataset; the results are shown in Fig. 6. On the whole dataset, Polisma achieves significantly improved scores compared to those when using the AZ dataset because Polisma was able to utilize a larger set of decision examples. Nonetheless, given that the size of AZ was small compared to that of the whole Amazon dataset, Polisma outcomes using AZ 11 are promising. In summary, the increase of recall can be interpreted as reducing denials of authorized users (i.e., smaller number of falsenegatives). A better precision value is interpreted as being more accurate with the decisions (i.e., smaller number of false-positives). Figures 4a–4b show that most of the reduction of incorrect denials is achieved in Steps 1 and 2, whereas the other steps improve precision which implies more correct decisions (i.e., decreas11

We also experimented samples of different sizes (i.e., 2k–5k), the learning results using these sample sizes showed slight improvement of scores.

538

A. Abu Jabal et al.

ing the number of false-positives and increasing the number of true-positives). As Fig. 7 shows, the results improve significantly when considering only positive authorizations. This is due to the fact that the negative examples are few and so they are not sufficient in improving the learning of negative authorizations. As shown in the figure, precision, recall, and F1 score increase indicating that the learned policies are correct. Precision is considered the most effective metric especially for only positive authorizations to indicate the correctness of the generated policies since higher precision implies less incorrect authorizations. Variants of the Brute-Force Strategy. As the results in Fig. 5 show, using resource attributes for rules generalization is better than using the user attributes. The reason is that the domains of the resource attributes (i.e., VR (ai ) | ai ∈ AR ) is larger than that of the user attributes (i.e., VU (aj ) | aj ∈ AU ) in the P M dataset. Thus, the selection space of attribute expressions is expanded; hence, it potentially increases the possibility of finding better attribute expression for safe generalization. In particular, using resources attributes achieves 23% and 19% improvement F1 score and PCR when compared to that of using users attributes producing a better quality of the generated policies in terms of correctness (as indicated by the values of precision, recall, and F1 score) and completeness (as indicated by the PCR value). Similarly, performing the generalization using both user and resource attributes is better than that of using only user attribute or resource attributes because of the larger domains which can be exploited to find the best attribute expression for safe generalization. In particular, BS-U R-C shows a significant improvement compared to BS-U -C (22% for F1 score (which reflects a better quality of policies in terms of correctness), and 25% for PCR (which reflects a better quality of policies in terms of completeness)). In conclusion, the best variant of the BS strategy is the one that enables exploring the largest set of possible attribute values to choose the attribute expression which has the highest safety level.

5

Related Work

Policy mining has been widely investigated. However, the focus has been on RBAC role mining that aims at finding roles from existing permission data [16, 23]. More recent work has focused on extending such mining-based approaches to ABAC [5,8,14,15,24]. Xu and Stoller [24] proposed the first approach for mining ABAC policies from logs. Their approach iterates over a log of decision examples greedily and constructs candidate rules; it then generalizes these rules by utilizing merging and refinement techniques. Medvet et al. [14] proposed an evolutionary approach which incrementally learns a single rule per group of decision examples utilizing a divide-and-conquer algorithm. Cotrini et al. [5] proposed another rule mining approach, called Rhapsody, based on APRIORI-SD (a machine-learning algorithm for subgroup discovery) [9]. It is incomplete, only mining rules covering a significant number of decision examples. The mined rules are then filtered to remove any refinements. All of those approaches [5,14,24] mainly rely on decision

Polisma - A Framework for Learning Attribute-Based Access Control Policies

539

logs assuming that they are sufficient for rule generalization. However, logs, typically being very sparse, might not contain sufficient information for mining rules of good quality. Thus, those approaches may not be always applicable. Moreover, neither of those approaches is able to generate negative authorization rules. Mocanu et al. [15] proposed a deep learning model trained on logs to learn a Restricted Boltzmann Machine (RBM). Then, the learned model is used to generate candidate rules. Their proposed system is still under development and further testing is needed. Karimi and Joshi [8] proposed to apply clustering algorithms over the decision examples to predict rules, but they don’t support rules generalization. To the best of our knowledge, Polisma is the first generic framework that incorporates context information, when available, along with decisions logs to increase accuracy. Another major difference of Polisma with respect to existing approaches is that Polisma can use ML techniques, such as statistical ML techniques, for policy mining while at the same time being able to generate ABAC rules expressed in propositional logics.

6

Conclusion and Future Work

In this paper, we have proposed Polisma, a framework for learning ABAC policies from examples and context information. Polisma comprises four steps using various techniques, namely data mining, statistical, and machine learning. Our evaluations, carried out on a real-world decision log and a synthetic log, show that Polisma is able to learn policies that are both complete and correct. The rules generated by Polisma using the synthetic dataset achieve an F1 score of 0.80 and PCR of 0.95; awhen using the real dataset, the generated rules achieve an F1 score of 0.98 and PCR of 1.0. Furthermore, by using the semantic information available with the synthetic dataset, Polisma improves the F1 score to reach 0.86. As part of future work, we plan to extend our framework by integrating an inductive learner [12] and a probabilistic learner [6]. We also plan to extend our framework to support policy transfer and policy learning explainability. In addition, we plan to extend Polisma with of conflict resolution techniques to deal with input noisy data. Another direction is to integrate in Polisma different ML algorithms, such as neural networks. Preliminary experiments about using neural networks for learning access control policies have been reported [4]. However, those results show that neural networks are not able to accurately learn negative authorizations and thus work is required to enhance them. Acknowledgment. This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-163-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Jorge Lobo was also supported

540

A. Abu Jabal et al.

by the Spanish Ministry of Economy and Competitiveness under Grant Numbers TIN201681032P, MDM20150502, and the U.S. Army Research Office under Agreement Number W911NF1910432.

A

Additional Algorithms

Algorithm 3 outlines the third step of Polisma for augmenting rules using domain-based restrictions. Algorithm 3. Rules Augmentation using Domain-based Restrictions Require: D, U , R, x: a preferable attribute of users, y: a preferable attribute of resources, O. ai 1: GU = GUai (U , x); GR = GR (R, y) 2: ∀ gi ∈ GU , Ogui → {}; ∀ gi ∈ GR , Ogri → {}; ∀ gi ∈ GR , Ugri → {} 3: for li ∈ D do 4: g  ← gi ∈ GU | li .u ∈ gi ∧ li .d = P ermit; Ogu → Ogu ∪ li .o 5: Ogr → Ogr ∪ li .o; Ugr → Ugr ∪ li .u 6: end for 7: P  → {} 8: for gi ∈ GU do 9: ρi = x, dU (ui , x), ∗, oi , Deny | ui ∈ gi ∧ oi ∈ (O \ Ogu ); P  → P  ∪ ρi i 10: end for 11: for gi ∈ GR do 12: ρi = x, dU (ui , x), y, dR (ri , y), ∗, Deny | ri ∈ gi ∧ ui ∈ (U \ Ugr ); P  → P  ∪ ρi i 13: end for  14: return P

B

Running Example of Policy Learning Using Polisma

Consider a system including users and resources both associated with projects. User attributes, resource attributes, operations, and possible values for two selected attributes are shown in Table 2. Assume that a log of access control decision examples is given. Table 2. Details about A Project Management System User Attributes (AU )

{id, role, department, project, technical area}

Resource Attributes (AR ) {id, type, department, project, technical area} Operations List (O)

{request, read, write, setStatus, setSchedule, approve, setCost}

VU (role)

{planner, contractor, auditor, accountant, department manager, project leader}

VR (type)

{budget, schedule, and task}

Polisma - A Framework for Learning Attribute-Based Access Control Policies

541

Fig. 8. Examples of ground rules generated from rule mining based on the specifications of the running example

B.1

Rules Generalization

Brute-Force Strategy (BS). Assume that BS-U R-C is used to generalize ρ2 defined in Fig. 8. AU is {id, role, department, project, technical area} and AR is {id, type, department, project, technical area}. Moreover, ρ2 is able to control the access for the user whose ID is “acct-4’ when accessing a resource whose type is task. The attribute values of the user and resources controlled by ρ2 are analyzed. To generalize ρ2 using BS-U R-C, each attribute value is weighted as shown in Fig. 9. For weighting each user/resource attribute value, the proportion of the sizes of two user/resource subsets is calculated according to Definition 8. In particular, for the value of the “department” attribute corresponding to the user referred by ρ2 (i.e., “d1”) (Fig. 9b), two user subsets are first found: a) the subset of the users belonging to department “d1”; and b) the subset of the users belonging to department “d1” and having a permission to perform the “setCost” operation on a resource of type “task” based on D. Then, the ratio of the sizes of these subsets is considered as the weight for the attribute value“d1”. The weights for the other user and resource attributes values are calculated similarly. Thereafter, the user attribute value and resource attribute value with the highest wights are chosen to perform the generalization. Assume that the value of the “department” user attribute is associated with the highest weight and the “project” resource attribute is associated with the highest weight. ρ2 is generalized as: ρ 2 = user(department: d1), resource (type: task, project: d1-p1), setCost, permit

(a) w.r.t. dU (uid= acct−4 , “role”)

(b) w.r.t. dU (uid= acct−4 , “department”)

(c) w.r.t. dU (uid= acc−4 , “project”)

(d) w.r.t. dR (rtype= task , “project”)

Fig. 9. Generalization of ρ2 defined in Fig. 8 using the Brute Force Strategy (BS-U R-C)

542

A. Abu Jabal et al.

Fig. 10. Generalization of ρ2 defined in Fig. 8 using Structure-based Strategy: An example of Attribute-relationship Graph

Structure-Based Strategy (SS). Assume that SS is used to generalize ρ2 , defined in Fig. 8. Also, suppose that Polisma is given the following information: – The subset of resources R satisfying ρ2 .eR has two values for the project attribute (i.e., “d1-p1”, “d1-p2”). – The user “acct-4” belongs to the project “d1-p1”. – R and “acct-4” belong to the department “d1”. – T = {(“d1-p1”, “d1”), (“d1-p2”, “d1”)}. G is constructed as shown in Fig. 10. Using G, the two common attributes for “acct-4” and R are “d1-p1” and “d1” and the first common attribute is “d1-p1”. Therefore,ρ2 is generalized as follows: ρ 2 = user(project: d1-p1), resource (type: task, project: d1-p1), setCost, permit B.2

Rules Augmentation Using Domain-Based Restrictions

Assume that we decide to analyze the authorization domain by grouping users based on the “role” user attribute. As shown in the top part of Fig. 11, the authorization domains of the user groups having distinct values for the “role” attribute are identified using the access requests examples of D. These authorization domains allow one to recognize the set of operations authorized per user group. Thereafter, a set of negative authorizations are generated to restrict users having a specific role from performing specific operations on resources.

Polisma - A Framework for Learning Attribute-Based Access Control Policies

543

Fig. 11. Rules augmentation using domain-based restrictions

B.3

Rules Augmentation Using Machine Learning

Assume that D includes a decision example ( li ) for a user (id: “pl-1”) accessing a resource (id: “sc-1”) where both of them belong to a department “d1”. Assuming Polisma generated a rule based on li as follows: ρi =  user(role: planner, department: d1), resource (type: schedule, department: d1), read, permit  Such a rule cannot control a new request by a user (id: “pl-5” for accessing a resource (id: “sc-5”) where both of them belong to another department “d5”). Such a is similar to li . Therefore, a prediction-based approach is required to enable generating another rule.

References 1. Agrawal, R., Imieli´ nski, T., Swami, A.: Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol. 22, pp. 207–216. ACM (1993) 2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. VLDB 1215, 487–499 (1994) 3. Bertino, E., Ghinita, G., Kamra, A.: Access control for databases: concepts and systems. Found. Trends® Databases 3(1–2), 1–148 (2011) 4. Cappelletti, L., Valtolina, S., Valentini, G., Mesiti, M., Bertino, E.: On the quality of classification models for inferring ABAC policies from access logs. In: Big Data, pp. 4000–4007. IEEE (2019) 5. Cotrini, C., Weghorn, T., Basin, D.: Mining ABAC rules from sparse logs. In: EuroS&P, pp. 31–46. IEEE (2018) 6. De Raedt, L., Dries, A., Thon, I., Van den Broeck, G., Verbeke, M.: Inducing probabilistic relational rules from probabilistic examples. In: IJCAI (2015) 7. Hu, V., et al.: Guide to attribute based access control (ABAC) definition and considerations (2017). https://nvlpubs.nist.gov/nistpubs/specialpublications/nist. sp.800-162.pdf 8. Karimi, L., Joshi, J.: An unsupervised learning based approach for mining attribute based access control policies. In: Big Data, pp. 1427–1436. IEEE (2018)

544

A. Abu Jabal et al.

9. Kavˇsek, B., Lavraˇc, N.: APRIORI-SD: adapting association rule learning to subgroup discovery. Appl. Artif. Intell. 20(7), 543–583 (2006) 10. Kohavi, R., Sommerfield, D.: Feature subset selection using the wrapper method: overfitting and dynamic search space topology. In: KDD, pp. 192–197 (1995) 11. Krejcie, R.V., Morgan, D.W.: Determining sample size for research activities. Educ. Psychol. Measur. 30(3), 607–610 (1970) 12. Law, M., Russo, A., Elisa, B., Krysia, B., Jorge, L.: Representing and learning grammars in answer set programming. In: AAAI (2019) 13. Maxion, R.A., Reeder, R.W.: Improving user-interface dependability through mitigation of human error. Int. J. Hum.-Comput. Stud. 63(1–2), 25–50 (2005) 14. Medvet, E., Bartoli, A., Carminati, B., Ferrari, E.: Evolutionary inference of attribute-based access control policies. In: Gaspar-Cunha, A., Henggeler Antunes, C., Coello, C.C. (eds.) EMO 2015. LNCS, vol. 9018, pp. 351–365. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15934-8 24 15. Mocanu, D., Turkmen, F., Liotta, A.: Towards ABAC policy mining from logs with deep learning. In: IS, pp. 124–128 (2015) 16. Molloy, I., et al.: Mining roles with semantic meanings. In: SACMAT, pp. 21–30. ACM (2008) 17. Ni, Q., Lobo, J., Calo, S., Rohatgi, P., Bertino, E.: Automating role-based provisioning by learning from examples. In: SACMAT, pp. 75–84. ACM (2009) 18. AuthZForce. https://authzforce.ow2.org/ 19. Balana. https://github.com/wso2/balana 20. OASIS eXtensible Access Control Markup Language (XACML) TC. https://www. oasis-open.org/committees/tc home.php?wg abbrev=xacml 21. Rabitti, F., Bertino, E., Kim, W., Woelk, D.: A model of authorization for nextgeneration database systems. TODS 16(1), 88–131 (1991) 22. Sadeh, N., et al.: Understanding and capturing people’s privacy policies in a mobile social networking application. Pers. Ubiquitous Comput. 13(6), 401–412 (2009) 23. Xu, Z., Stoller, S.D.: Algorithms for mining meaningful roles. In: SACMAT, pp. 57–66. ACM (2012) 24. Xu, Z., Stoller, S.D.: Mining attribute-based access control policies from logs. In: Atluri, V., Pernul, G. (eds.) DBSec 2014. LNCS, vol. 8566, pp. 276–291. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43936-4 18

A Framework for Evaluating Client Privacy Leakages in Federated Learning Wenqi Wei(B) , Ling Liu, Margaret Loper, Ka-Ho Chow, Mehmet Emre Gursoy, Stacey Truex, and Yanzhao Wu Georgia Institute of Technology, Atlanta, GA 30332, USA {wenqiwei,khchow,memregursoy,staceytruex,yanzhaowu}@gatech.edu, [email protected], [email protected]

Abstract. Federated learning (FL) is an emerging distributed machine learning framework for collaborative model training with a network of clients (edge devices). FL offers default client privacy by allowing clients to keep their sensitive data on local devices and to only share local training parameter updates with the federated server. However, recent studies have shown that even sharing local parameter updates from a client to the federated server may be susceptible to gradient leakage attacks and intrude the client privacy regarding its training data. In this paper, we present a principled framework for evaluating and comparing different forms of client privacy leakage attacks. We first provide formal and experimental analysis to show how adversaries can reconstruct the private local training data by simply analyzing the shared parameter update from local training (e.g., local gradient or weight update vector). We then analyze how different hyperparameter configurations in federated learning and different settings of the attack algorithm may impact on both attack effectiveness and attack cost. Our framework also measures, evaluates, and analyzes the effectiveness of client privacy leakage attacks under different gradient compression ratios when using communication efficient FL protocols. Our experiments additionally include some preliminary mitigation strategies to highlight the importance of providing a systematic attack evaluation framework towards an in-depth understanding of the various forms of client privacy leakage threats in federated learning and developing theoretical foundations for attack mitigation. Keywords: Privacy leakage attacks evaluation framework

1

· Federated learning · Attack

Introduction

Federated learning enables the training of a high-quality ML model in a decentralized manner over a network of devices with unreliable and intermittent network connections [5,14,20,26,29,33]. In contrast to the scenario of prediction on edge devices, in which an ML model is first trained in a highly controlled Cloud environment and then downloaded to mobile devices for performing predictions, c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 545–566, 2020. https://doi.org/10.1007/978-3-030-58951-6_27

546

W. Wei et al.

federated learning brings model training to the devices while supporting continuous learning on device. A unique feature of federated learning is to decouple the ability of conducting machine learning from the need of storing all training data in a centralized location [13]. Although federated learning by design provides the default privacy of allowing thousands of clients (e.g., mobile devices) to keep their original data on their own devices, while jointly learn a model by sharing only local training parameters with the server. Several recent research efforts have shown that the default privacy in FL is insufficient for protecting the underlaying training data from privacy leakage attacks by gradient-based reconstruction [9,32,34]. By intercepting the local gradient update shared by a client to the FL server before performing federated averaging [13,19,22], the adversary can reconstruct the local training data with high reconstruction accuracy, and hence intrudes the client privacy and deceives the FL system by sneaking into client confidential training data illegally and silently, making the FL system vulnerable to client privacy leakage attacks (see Sect. 2.2 on Threat Model for detail). In this paper, we present a principled framework for evaluating and comparing different forms of client privacy leakage attacks. Through attack characterization, we present an in-depth understanding of different attack mechanisms and attack surfaces that an adversary may leverage to reconstruct the private local training data by simply analyzing the shared parameter updates (e.g., local gradient or weight update vector). Through introducing multi-dimensional evaluation metrics and developing evaluation testbed, we provide a measurement study and quantitative and qualitative analysis on how different configurations of federated learning and different settings of the attack algorithm may impact the success rate and the cost of the client privacy leakage attacks. Inspired by the attack effect analysis, we present some mitigation strategies with preliminary results to highlight the importance of providing a systematic evaluation framework for comprehensive analysis of the client privacy leakage threats and effective mitigation methods in federated learning.

2 2.1

Problem Formulation Federated Learning

In federated learning, the machine learning task is decoupled from the centralized server to a set of N client nodes. Given the unstable client availability [21], for each round of federated learning, only a small subset of Kt < clients out of all N participants will be chosen to participate in the joint learning. Local Training at a Client: Upon notification of being selected at round t, a client will download the global state w(t) from the server, perform a local training computation on its local dataset and the global state, i.e., wk (t + 1) = wk (t) − η∇wk (t), where wk (t) is the local model parameter update at round t and ∇w is the gradient of the trainable network parameters. Clients can decide its training batch size Bt and the number of local iterations before sharing.

Federated Learning Privacy Leakage Evaluation Framework

547

Update Aggregation at FL Server: Upon receiving the local updates from all Kt clients, the server incorporates these updates and update its global state, and initiates the next round of federated learning. Given that local updates can be in the form of either gradient or model weight update, thus two update aggregation implementations are the most representative. In distributed SGD [17,18,29,30], each client uploads the local gradients to the FL server at each round and the server iteratively aggregates the gradients from all Kt clients into the local Kt nt global model: w(t + 1) = w(t) − η k=1 n ∇wk (t), where η is the global learning rate and nnt is the weight of client k. nk is the number of data points at client k and n indicates the amount of total data from all participating clients at round t. In federated averaging [3,20], each client uploads the local training parameter update to the FL server and the server iteratively performs a weighted average the received weight parameters to update the global model: Kt of nt w w(t + 1) = k=1 n k (t + 1), where Δwk (t) denote the difference between the model parameter update before the iteration of training and the model parameter update after the training for client k. 2.2

Threat Model

In an FL system, clients are the most vulnerable attack surface for the client privacy leakage (CPL) attack. To focus on client gradient leakage vulnerability due to sharing its local gradient update, we assume that clients may be partially compromised: (i) an adversary may intercept the local parameter update prior to sending it via the client-to-server communication channel to the FL server and (ii) an adversary may gain access to the saved local model executable (checkpoint data) on the compromised client with no training data visible and launch whitebox gradient leakage attacks to steal the private training data. Note that local data and model may not be compromised at the same time for many reasons, e.g., separated storage of data and model, or dynamic training data. We also assume that the federated server is honest and will truthfully perform the aggregation of local parameter updates and manage the iteration rounds for jointly learning. However, the FL server may be curious and may analyze the periodic updates from certain clients to perform client privacy leakage attacks and gain access to the private training data of the victim clients. Given that the gradient-based aggregation and model weight-based aggregation are mathematically equivalent, one can obtain the local model parameter difference from the local gradient and the local training learning rate. In this paper, gradient-based aggregation is used without loss of generality. 2.3

The Client Privacy Leakage (CPL) Attack: An Overview

The client privacy leakage attack is a gradient-based feature reconstruction attack, in which the attacker can design a gradient-based reconstruction learning algorithm that will take the gradient update at round t, say ∇wk (t), to be shared by the client, to reconstruct the private data used in the local training

548

W. Wei et al.

computation. For federated learning on images or video clips, the reconstruction algorithm will start by using a dummy image of the same resolution as its attack initialization seed, and run a test of this attack seed on the intermediate local model, to compute a gradient loss using a vector distance loss function between the gradient of this attack seed and the actual gradient from the client local training. The goal of this reconstruction attack is to iteratively add crafted small noises to the attack seed such that the generated gradient from this reconstructed attack seed will approximate the actual gradient computed on the local training data. The reconstruction attack terminates when the gradient of the attack seed reconstructed from the dummy initial data converges to the gradient of the training data. When the gradient-based reconstruction loss function is minimized, the reconstructed attack data will also converge to the training data with high reconstruction confidence.

Algorithm 1. Gradient-based Reconstruction Attack 1: Inputs: f (x; w(t)): Differentiable learning model, ∇wk (t): gradients produced by the local training on private training data (x; y) at client k, w(t), w(t+1): model parameters before and after the current local training on (x; y), ηk learning rate of local training Attack configurations: IN IT (x.type): attack initialization method, T: attack termination condition, η  : attack optimization method, α: regularizer ratio. 2: Output: reconstructed training data (xrec ; yrec ) 3: Attack procedure 4: if wk (t + 1): then 5: Δwk (t) ← wk (t + 1) − w(t) 6: ∇wk (t) ← Δwηkk(t) 7: end if 8: x0rec ← INIT(x.type) 9: yrec ← arg mini (∇i wk (t)) 10: for τ in T do ∂loss(f (xτ τ rec ,w(t)),yrec ) (t) ← 11: ∇watt ∂w(t) τ τ 12: D ← ||∇watt (t) − ∇wk (t)||2 + α||f (xτrec , w(t)) − yrec ||2 +1 ∂D τ 13: xτrec ← xτrec − η  ∂x τ rec 14: end for

Algorithm 1 gives a sketch of the client privacy leakage attack method. Line 4– 6 convert the weight update to gradient when the weight update is shared between the FL server and the client. The learning rate ηk for local training is assumed to be identical across all clients in our prototype system. Line 8 invokes the dummy attack seed initialization, which will be elaborated in Sect. 3.1. Line 9 is to get the label from the actual gradient shared from the local training. Since the local training update towards the ground-truth label of the training input data should be the most aggressive compared to other labels, the sign of gradient for the ground-truth label of the private training data will be different than other classes, and its absolute value is usually the largest. Line 10–14 presents the

Federated Learning Privacy Leakage Evaluation Framework

549

iterative reconstruction process that produces the reconstructed private training data based on the client gradient update. If the reconstruction algorithm converges, then the client privacy leakage attack is successful, or else the CPL attack is failed. Line 12–13 show that when the L2 distance between the gradients of the attack reconstructed data and the actual gradient from the private training data is minimized, the reconstructed attack data from the dummy seed converges to the private local training data, leading to the client privacy leakage. In Line 12, a label-based regularizer is utilized to improve the stability of the attack optimization. An alternative way to reconstruct the label of the private training data is to initialize a dummy label and feed it into the iterative approximation algorithm for attack optimization [34], in a similar way as the content reconstruction optimization.

Fig. 1. Example illustration of the Client Privacy Leakage attack

Figure 1 provides a visualization of four illustrative example attacks over four datasets: MNIST [16], CIFAR10 [15], CIFAR100 [15], and LFW [12]. We make two interesting observations from Algorithm 1. First, multiple factors in the attack method could impact the attack success rate (ASR) of the client privacy leakage attack, such as the dummy attack seed data initialization method (Line 8), the attack iteration termination condition (T), the selection of the gradient loss (distance) function (Line 12), the attack optimization method (Line 13). Second, the configuration of some hyperparameters in federated learning may also impact the effectiveness and cost of the CPL attack, including batch size, training data resolution, choice of activation function, and whether the gradient update is uploaded to the FL server using baseline communication protocol or a communication efficient method. In the next section, we present the design of our evaluation framework to further characterize the client privacy leakage attack of different forms and introduce cost-effect metrics to measure and analyze the adverse effect and cost of CPL attacks. By utilizing this framework, we provide a comprehensive study on how different attack parameter configurations and federated learning hyperparameter configurations may impact the effectiveness of the client privacy leakage attacks.

550

3 3.1

W. Wei et al.

Evaluation Framework Attack Parameter Configuration

Attack Initialization: We first study how different methods for generating the dummy attack seed data may influence the effectiveness of a CPL attack in terms of reconstruction quality or confidence as well as reconstruction time and convergence rate. A straightforward dummy data initialization method is to use a random distribution in the shape of dummy data type and we call this baseline the random initialization (CPL-random). Although random initialization is also used in [9,32,34], our work is, to the best of our knowledge, the first study on variations of attack initiation methods. To understand the role of random seed in the CPL attack, it is important to understand the difference of the attack reconstruction learning from the normal deep neural network (DNN) training. In a DNN training, it takes as the training input both the fixed data-label pairs and the initialization of the learnable model parameters, and iteratively learn the model parameters until the training converges, which minimizes the loss with respect to the ground truth labels. In contrast, the CPL attack performs reconstruction attack by taking a dummy attack seed input data, a fixed set of model parameters, such as the actual gradient updates of a client local training, and the gradient derived label as the reconstructed label yrec , its attack algorithm will iteratively reconstruct the local training data used to generate the gradient, ∇wk (t), by updating the dummy synthesized seed data, following the attack iteration termination condition T, denoted by {x0rec , x1rec , ...xs T } ∈ Rd , such that the loss between the gradient of the reconstructed data xirec and the actual gradient ∇wk (t) is minimized. Here x0rec denotes the initial dummy seed. Theorem 1 (CPL Attack Convergence Theorem). Let x∗rec be the optimal synthesized data for f (x) and attack iteration t ∈ {0, 1, 2, ...T }. Given the convexity and Lipschitz-smoothness assumption, the convergence of the gradient-based reconstruction attack is guaranteed with: f (xTrec ) − f (x∗rec ) ≤

2L||x0rec − x∗rec ||2 . T

(1)

The above CPL Attack Convergence theorem is derived from the well-established Convergence Theorem of Gradient Descent. Due to the limitation of space, the formal proof of Theorem 1 is provided in the Appendix. Note that the convexity assumption is generally true since the d-dimension trainable synthesized data can be seen as a one-hidden-layer network with no activation function. The fixed model parameters are considered as the input with optimization of the least square estimation problem as stated in Line 12 of Algorithm 1. According to Theorem 1, the convergence of the CPL attack is closely related to the initialization of the dummy data x0rec . This motivates us to investigate different ways to generate dummy attack seed data. Concretely, we argue that different random seeds may have different impacts on both reconstruction learning efficiency (confidence) and reconstruction learning convergence (time or the

Federated Learning Privacy Leakage Evaluation Framework

551

number of iteration steps). Furthermore, using geometrical initialization as those introduced in [24] not only can speed up the convergence but also ensure attack stability. Consider a single layer of the neural network: g(x) = σ(wx + b), a geometrical initialization considers the form g(x) = σ(w∗ (x − b∗ ) instead of directly initialing w and b with random distribution. For example, according to [31], the following partial derivative of the geometrical initialization. ∂g = σ  (w∗ (x − b∗ ))(x − b∗ ), ∂w∗ is more independent of translation of the input space than and is therefore more stable.

(2) ∂g ∂w

= σ  (wx + b)x,

Fig. 2. Visualization of different initialization

Figure 2 provides a visualization of five different initialization methods and their impact on the CPL attack in terms of reconstruction quality and convergence (#iterations). In addition to CPL-random, CPL-patterned is a method that uses patterned random initialization. We initialize a small portion of the dummy data with a random seed and duplicate it to the entire feature space. An example of the portion can be 1/4 of the feature space. CPL-dark/light is to use a dark (or light) seed of the input type (size), whereas CPL-R.G.B. is to use red or green or blue solid color seed of the input type (size). CPL-optimal refers to the theoretical optimal initialization method, which uses an example from the same class as the private training data that the CPL attack aims to reconstruct. We observe from Fig. 2 that CPL-patterned, CPL-R.G.B., and CPL-dark/light can outperform CPL-random with faster attack convergence and more effective reconstruction confidence. We also include CPL-optimal to show that different CPL initializations can effectively approximate the optimal way of initialization in terms of reconstruction effectiveness.

552

W. Wei et al.

LFW

CIFAR100

Fig. 3. Effect of different random seed

Figure 3 shows that the CPL attacks are highly sensitive to the choice of random seeds. We conduct this set of experiments on LFW and CIFAR100, and both confirm consistently our observations: different random seeds lead to diverse convergence processes with different reconstruction quality and confidence. From Fig. 3, we observe that even with the same random seed, attack with patterned initialization is much more efficient and stable than the CPL-random. Moreover, there are situations where the private label of the client training data is successfully reconstructed but the private content reconstruction fails (see the last row for both LFW and CIFAR100). Attack Termination Condition: The effectiveness of a CPL attack is also sensitive to the setting of the attack termination condition. Recall Sect. 2.3 and Algorithm 1, there are two decision factors for termination. One is the maximum attack iteration [34] and the other is the L2 -distance threshold of the gradient loss [32], i.e., the difference between the gradient from the reconstructed data and the actual gradient from the local training using the private local data. Other gradient loss functions, e.g., cosine similarity, entropy, can be used to replace the L2 function. Table 1 compares six different settings of attack iterations for configuring termination condition. A small attack iteration cap may cause the reconstruction attack to abort before it can succeed the attack, whereas a larger attack iteration cap may increase the cost of attack. The result in Table 1 confirms that choosing the cap for attack iterations may have an impact on both attack effectiveness and cost. Table 1. Effect of termination condition maximum attack iteration 10 20 LFW CIFAR10

30

50

CPL-patterned 0

0.34 0.98 1

CPL-random

0

0

CPL-patterned 0 CPL-random

0

100

300

1

1

0.562 0.823 0.857

0.47 0.93 0.973 0.973 0.973

0

0

0

CIFAR100 CPL-patterned 0

0

0.12 0.85

0.981 0.981

0

0

0.23

CPL-random

0

0 0

0.356 0.754 0.85

Fig. 4. Effect of attack optimization

Federated Learning Privacy Leakage Evaluation Framework

553

Attack Optimization: Optimization methods, such as Stochastic Gradient descent, Adam, and Adagrad can be used to iteratively update the dummy data during the reconstruction of a CPL attack. While the first-order optimization techniques are easy to compute and less time consuming, the second-order techniques are better in escaping the slow convergence paths around the saddle points [4]. Figure 4 shows a comparison of L-BFGS [6] and Adam and their effects on the CPL-patterned attack for LFW dataset. It takes fewer attack iterations to get high reconstruction quality using the L-BFGS compared to using Adam as the optimizer in the reconstruction attack, confirming that choosing an appropriate optimizer may impact on attack effectiveness. 3.2

Hyperparameter Configurations in Federated Learning

Batch Size: Given that all forms of CPL attack methods are reconstruction learning algorithms that iteratively learn to reconstruct the private training data by inferencing over the actual gradient to perform iterative updates on the dummy attack seed data, it is obvious that a CPL attack is most effective when working with the gradient generated from the local training data of batch size 1. Furthermore, when the input data examples in a batch of size B belongs to only one class, which is often the case for mobile devices and the non-i.i.d distribution of the training data [33], the CPL attacks can effectively reconstruct the training data of the entire batch. This is especially true when the dataset has low inter-class variation, e.g., face and digit recognition. Figure 5 shows the visualization of performing a CPL-patterned attack on the LFW dataset with four different batch sizes. With a large batch size, the detail of a single input is being neutralized, making the attack reconstruction captures more of the shared features of the entire batch rather than specific to a single image.

Fig. 5. Effect of batch size in CPL-patterned attacks on LFW

554

W. Wei et al.

Training Data Resolution: In contrast to the early work [34] that fails to attack images of resolution higher than 64×64, we argue that the effectiveness of the CPL attack is mainly attributed to the model overfitting to the training data. In order to handle higher resolution training data, we double the number of filters in all convolutional layers to build a more overfitted model. Figure 6 shows the scaling results of CPL attack on the LFW dataset with input data size of 32×32, 64 × 64, and 128 × 128. CPL-random requires a much larger number of attack iterations to succeed the attack with high reconstruction performance. CPLpatterned is a significantly more effective attack for all three different resolutions with 3 to 4× reduction in the attack iterations compared to CPL-random. We also provide an example of attacking the 512 × 512 Indiana University Chest X-Rays image of very high resolution in Fig. 7.

Fig. 6. Attack iteration of the training data scaling using LFW dataset

Fig. 7. Extreme scaling case of attacking 512 × 512 Chest X-ray image

Activation Function: The next hyperparameter of FL is the activation function used in model training. We show that the performance of the CPL attacks is highly related to the choice of the activation function. Figure 8 compares the attack iterations and attack success rate of CPL-patterned attack with three different activation functions: Sigmoid, Tanh, and LeakReLU. Due to space constraint, the results on MNIST is omitted. We observe that ReLU naturally prevents the full reconstruction of the training data using gradient because the gradient of the negative part of ReLU will be 0, namely, that part of the trainable parameters will stop responding to variations in error and will not get adjusted during optimization. This dying ReLU problem takes out the gradient information needed for CPL attacks. In comparison, both Sigmoid and Tanh are differentiable bijective and can pass the gradient from layer to layer in an almost lossless manner. LeakyReLU sets a slightly inclined line for the negative part of ReLU to mitigate the issue of dying ReLU and thus is vulnerable to CPL attacks.

Federated Learning Privacy Leakage Evaluation Framework

555

Fig. 8. Effect of activation function on the CPL attack.

Motivated by the impact of activation function, we argue that any model components that discontinue the integrity and uniqueness of gradients can hamper CPL attacks. We observe from our experiments that an added dropout structure τ (t) elusive and unable to enables different gradient in every query, making ∇watt converge to the uploaded gradients. By contrast, pooling cannot prevent CPL attacks since pooling layers do not have parameters.

Fig. 9. Illustration of the CPL attack under communication-efficient update

Baseline Protocol v.s. Communication-Efficient Protocol: In the baseline communication protocol, the client sends a full vector of local training parameter update to the FL server in each round. For federated training of large models on complex data, this step is known to be the bottleneck of Federated Learning. Communication-efficient FL protocols were proposed [14,20] to improve the communication efficiency of parameter update and sharing by employing high precision vector compression mechanisms, such as structured updates and sketched updates. As more FL systems utilize a communicationefficient protocol to replace the baseline protocol, it is important to study the impact of using a communication efficient protocol on the performance of the CPL attacks, especially compared to the baseline client-to-server communication protocol. In this set of experiments, we measure the performance

556

W. Wei et al.

of CPL attacks under varying gradient compression percentage θ, i.e., θ percentage of the gradient update will be discarded in this round of gradient upload. We employ the compression method in [17] as it provides a good tradeoff between communication-efficiency and model training accuracy. It leverages sparse updates and sends only the important gradients, i.e., the gradients whose magnitude larger than a threshold, and further measures are taken to avoid losing information. Locally, the client will accumulate small gradients and only send them when the accumulation is large enough. Figure 9 shows the visualization of the comparison on MNIST and CIFAR10. We observe that compared to the baseline protocol with full gradient upload, using the communication efficient protocol with θ up to 40%, the CPL attack remains to be effective for CIFAR10. Section 4.2 provides a more detailed experimental study on CPL attacks under communication-efficient FL protocol. 3.3

Attack Effect and Cost Metrics

Our framework evaluates the adverse effect and cost of CPL attacks using the following metrics. For data-specific metrics, we average the evaluation results over all successful reconstructions. Attack Success Rate (ASR) is the percentage of successfully reconstructed training data over the number of training data being attacked. We use ASRc and ASRl to refer to the attack success rate on content and label respectively. MSE uses the root mean square deviation to measure the Msimilarity between2 1 reconstructed input xrec and ground-truth input x: M i=1 (x(i) − xrec (i)) when the reconstruction is successful. M denotes total number of features in the input. MSE can be used on all data format, e.g. attributes and text. A smaller MSE means the more similar reconstructed data to the private ground truth. SSIM measures the structural similarity between two images based on a perception-based model [28] that considers image degradation as perceived change. Attack Iteration measures the number of attack iterations required for reconstruction learning to converge and thus succeed the attack, e.g., L2 distance of the gradients between the reconstructed data and the local private training data is smaller than a pre-set threshold.

4 4.1

Experiments and Results Experiment Setup

We evaluate CPL attacks on four benchmark datasets: MNIST, LFW, CIFAR10, CIFAR100. MNIST consists of 70000 grey-scale hand-written digits images of size 28 × 28. The 60000:10000 split is used for training and testing data. Labeled Faces in the Wild (LFW) people dataset has 13233 images from 5749 classes.

Federated Learning Privacy Leakage Evaluation Framework

557

The original image size is 250 × 250 and we slice it to 32 × 32 and extract the ‘interesting’ part. Our experiments only consider 106 classes, each with more than 14 images. For a total number of 3735 eligible LFW data, a 3:1 train-test ratio is applied. CIFAR10 and CIFAR100 both have 60000 color images of size 32 × 32 with 10 and 100 classes respectively. The 50000:10000 split is used for training and testing. We perform CPL attacks with the following configurations as the default unless otherwise stated. The initialization method is patterned, the maximum attack iterations are 300, the optimization method is L-BFGS with attack learning rate 1. The attack is performed with full gradient communication. For each dataset, the attack is performed on 100 images with 10 different random seeds. For MNIST and LFW, we use a LeNet model with 0.9568 benign accuracy on MNIST and 0.695 on LFW. For CIFAR10 and CIFAR100, we apply a ResNet20 with benign accuracy of 0.863 on CIFAR10 and CIFAR100. We use 100 clients as the total client population and at each communication round, 10% of clients will be selected randomly to participate in federated learning. 4.2

Gradient Leakage Attack Evaluation

Comparison with Other Gradient Leakage Attacks. We first conduct a set of experiments to compare the CPL-patterned attack with two existing gradient leakage attacks: the deep gradient attack [34], and the gradient inverting attack [9], which replaces the L2 distance function with cosine similarity and performs the optimization on the sign of the gradient. We measure the attack results on the four benchmark image datasets in Table 2. For all four datasets, the CPL attack is a much faster and more efficient attack with the highest attack success rate (ASR) and the lowest attack iterations on both content and label reconstruction. Meanwhile, the high SSIM and low MSE for CPL attack indicate the quality of the reconstructed data is almost identical to the private training data. We also observe that gradient inverting attack [9] can achieve a high ASR compared to deep gradient attack [34] but at a great cost of attack iterations. Note that the CPL attack offers slightly higher ASR compared to [9] but at much lower attack cost in terms of the required attack iteration. Table 2. Comparison of different gradient leakage attacks CIFAR10 CPL attack iter28.3

CIFAR100

LFW

MNIST

[34]

[9]

CPL

[34]

[9]

CPL

[34]

[9]

CPL

[34]

[9]

114.5

6725

61.8

125

6813

25

69.2

4527

11.5

18.4

3265 0.784

ASRc

0.973

0.754

0.958

0.981

0.85

0.978

1

0.857

0.974

1

0.686

ASRl

1

0.965

1

1

0.94

1

1

0.951

1

1

0.951

1

0.959

0.953

0.958

0.998

0.997

0.9978 0.99

0.985

0.989

SSIM MSE

0.9985 0.9982 0.9984

2.2E-042.5E-042.2E-045.4E-046.5E-045.4E-042.2E-042.9E-042.3E-041.5E-051.7E-051.6E-05

558

W. Wei et al. Table 3. Comparison of different geometrical initialization in CPL attacks

Variation Study: Geometrical Initialization. This set of experiments measure and compare the four types of geometrical initialization methods: patterned random, dark/light, R.G.B., and optimal. For optimal initialization, we feed a piece of data that is randomly picked from the training set. This assumption is reasonable when different clients hold part of the information about one data item. Table 3 shows the result. We observe that the performance of all four geometrical initializations is always better than the random initialization (see the bold highlight in Table 3). Note that the optimal initialization is used in this experiment as a reference point as it assumes the background information about the data distribution. Furthermore, the performance of geometrical initializations is also dataset-dependent. CPL attack on CIFAR100 requires a longer time and more iterations to succeed than CPL on CIFAR10 and LFW. Variation Study: Batch Size and Iterations. Motivated by the batch size visualization in Fig. 5, we study the impact of hyperparameters used in local training, such as batch size and the number of local training iterations, on the performance of CPL attack. Table 4a shows the results of the CPL attack on the LFW dataset with five different batch sizes. We observe that the ASR of CPL attack is decreased to 96%, 89%, 76%, and 13% as the batch size increases to 2, 4, 8, and 16. The CPL attacks at different batch sizes are successful at the attack iterations around 25 to 26 as we measure attack iterations only on successfully reconstructed instances. Table 4b shows the results of the CPL attack under five different settings of local iterations before sharing the gradient updates. We show that as more iterations are performed at local training before sharing the gradient update, the ASR of the CPL attack is decreasing with 97%, 85%, and 39% for iterations of 5, 7, and 9 respectively. This result confirms our analysis in Sect. 3.2 that a larger batch size for local training prior to sharing the local gradient update may help mitigate the CPL attack because the shared gradient data capture more shared features among the images in the batch instead of more specific to an individual image.

Federated Learning Privacy Leakage Evaluation Framework

559

Table 4. Effect of local training hyperparameters on CPL attack (LFW)

Variation Study: Leakage in Communication Efficient Sharing. This set of experiments measures and compares the gradient leakage in CPL under baseline protocol (full gradient sharing) and communication-efficient protocol (significant gradient sharing with low-rank filer). Table 5 shows the result. To illustrate the comparison results, we provide the accuracy of the baseline protocol and the communication-efficient protocol of varying compression percentages on all four benchmark datasets in Table 5(a). We make two interesting observations. (1) CPL attack can generate high confidence reconstructions (high ASR, high SSIM, low MSE) for MNIST and CIFAR10 at compression rate 40%, and for Table 5. Effect of CPL attack under communication-efficient FL protocols

560

W. Wei et al. Table 6. Mitigation with Gaussian noise and Laplace noise

CIFAR100 and LFW at the compression rate of 90%. Second, as the compression percentage increases, the number of attack iterations to succeed the CPL attack decreases. This is because a larger portion of the gradients are low significance and are set to 0 by compression. When the attack fails, it indicates that the reconstruction cannot be done even with the infinite(∞) attack iterations, but we measure SSIM and MSE of the failed attacks at the maximum attack iterations of 300. (2) CPL attacks are more severe with more training labels in the federated learning task. A possible explanation is that informative gradients are more concentrated when there are more classes. 4.3

Mitigation Strategies

Gradient Perturbation with Additive Noise. We consider Gaussian noise and Laplace noise with zero means and different magnitude of variance in this set of experiments. Table 6 provides the mitigation results on CIFAR100 and LFW. In both cases, the client privacy leakage attack is largely mitigated at some cost of accuracy if we add sufficient Gaussian noise (G-10e-2) or Laplace noise (L10e-2). Figure 10 illustrates the effect of the additive noise using four examples. The large additive noise we use to obfuscate the gradient while performing the reconstruction, the small SSIM and the large MSE indicate the more dissimilar between the gradient of the reconstructed data and the gradient of the original sensitive input, leading to poor quality of the CPL attack. Gradient Squeezing with Controlled Local Training Iterations. Instead of sharing the gradient from the local training computation at each round t, we schedule and control the sharing of the gradient only after M iterations of local training. Table 7 shows the results of varying M from 1 to 10 with step 1. It shows that as M increases, the ASR of CPL attack starts to decrease, with 97.1%, 83.5%, 50% and 29.2% for M = 3, 5, 8 and 10 respectively for CIFAR10, and with 100%, 97%, 78% and 7% for M = 3, 5, 8 and 10 respectively for LFW. An

Federated Learning Privacy Leakage Evaluation Framework

Fig. 10. Effect of additive noise

561

Fig. 11. Effect of local training

example of gradient squeezing with controlled local training iterations is provided in Fig. 11. This preliminary mitigation study shows that clients in federated learning may adopt some attack resilient optimizations when configuring their local training hyperparameters.

5

Related Work

Privacy in federated learning has been studied in two contexts: training-phase privacy attacks and prediction-phase privacy attacks. Gradient leakage attacks, formulated in CPL of different forms, or those in literature [9,32,34], are one type of privacy exploits in the training phase. In addition, Aono et al [1,2] proposed a privacy attack, which partially recovers private data based on the proportionality between the training data and the gradient updates in multi-layer perceptron models. However, their attack is not suitable for convolutional neural networks because the size of the features is far larger than the size of convolution weights. Hitaj et al [11] poisoned the shared model by introducing mislabeled samples into the training. In comparison, gradient leakage attacks are more aggressive since client privacy leakage attacks make no assumption on direct access to the training data as those in training poisoning attacks and yet can compromise the private training data by reconstruction attack based on only the local parameter updates to be shared with the federated server. Privacy exploits at the prediction phase include model inversion, membership inference, and GAN-based reconstruction attack [10,11,27]. Fredrikson et al. [7] proposed the model inversion attack to exploit confidence values revealed along with predictions. Ganju et al. [8] inferred the global properties of the training data from a trained white-box fully connected neural network. Membership attacks [23,25] exploited the statistical differences between the model prediction on its training set and the prediction on unseen data to infer the membership of training data.

562

W. Wei et al. Table 7. Mitigation with controlled local training iterations

6

Conclusion

We have presented a principled framework for evaluating and comparing different forms of client privacy leakage attacks. This framework showed that an effective mitigation method against client gradient leakage attacks should meet the two important criteria: (1) the defense can mitigate the gradient leakage attacks no matter how the attacker configures his reconstruction algorithm to launch the attack; and (2) the defense can mitigate the attacks no matter how the FL system is configured for joint training. Extensive experiments on four benchmark datasets highlighted the importance of providing a systematic evaluation framework for an in-depth understanding of the various forms of client privacy leakage threats in federated learning and for developing and evaluating different mitigation strategies. Acknowledgements. The authors acknowledge the partial support from NSF CISE SaTC 1564097, NSF 2038029 and an IBM Faculty Award.

7 7.1

Appendices Proof of Theorem 1

Assumption 1 (Convexity). we say f (x) is convex if f (αx + (1 − α)x ) ≤ αf (x) + (1 − α)f (x ), where x, x are data point in Rd , and α ∈ [0, 1].

(3)

Federated Learning Privacy Leakage Evaluation Framework

563

Lemma 1. If a convex f (x) is differentiable, we have: f (x ) − f (x) ≥ ∇f (x), x − x.

Proof. Equation 3 can be rewritten as: α → 0, we complete the proof.

f (x +α(x−x ))−f (x ) α

(4)

≤ f (x) − f (y). When

Assumption 2 (Lipschitz Smoothness). With Lipschitz continuous on the differentiable function f (x) and Lipschitz constant L, we have: ||∇f (x) − ∇f (x ) ≤ L||x − x ||,

(5)

Lemma 2. If f (x) is Lipschitz-smooth, we have: f (xt+1 ) − f (xt ) ≤ −

1 ||∇f (xT )||22 2L

(6)

Proof. Using the Taylor expansion of f (x) and the uniform bound over Hessian matrix, we have f (x ) ≤ f (x) + ∇f (x), x − x + By inserting x = x − f (x −

1 L ∇f (x)

L  ||x − x||22 . 2

(7)

into Eq. 5 and Eq. 7, we have:

1 1 1 L 1 ∇f (x)) − f (x) ≤ − ∇f (x), ∇f (x) + || ∇f (x)||22 = − ||∇f (x)||22 L L 2 L 2L

Lemma 3 (Co-coercivity). A convex and Lipschitz-smooth f (x) satisfies: ∇f (x ) − ∇f (x), x − x ≥

1 ||∇f (x ) − ∇f (x)|| L

(8)

Proof. Due to Eq. 5, ∇f (x ) − ∇f (x), x − x ≥ ∇f (x ) − ∇f (x),

1 1 (∇f (x ) − ∇f (x)) = ||∇f (x ) − ∇f (x)|| L L

Then we can proof the attack convergence theorem: f (xT ) − f (x∗ ) ≤ 2L||x0 −x∗ ||2 . T Proof. Let f (x) be convex and Lipschitz-smooth. It follow that 1 ∇f (xt )||22 L 1 1 = ||xt − x∗ ||22 − 2 xt − x∗ , ∇f (xt ) + 2 ||∇f (xt )||22 L L 1 t ∗ 2 t 2 ≤ ||x − x ||2 − 2 ||∇f (x )||2 L

||xt+1 − x∗ ||22 = ||xt − x∗ −

(9)

564

W. Wei et al.

Equation 9 holds due to Eq. 8 in Lemma 3. Recall Eq. 6 in Lemma 2, we have: f (xt+1 ) − f (x∗ ) ≤ f (xt ) − f (x∗ ) −

1 ||∇f (xt )||22 . 2L

(10)

By applying convexity, f (xt ) − f (x∗ ) ≤ ∇f (xt ), xt − x∗  ≤ ||∇f (xt )||2 ||xt − x∗ || ≤ ||∇f (xt )||2 ||x1 − x∗ ||.

(11)

Then we insert Eq. 11 into Eq. 10: 1 1 (f (xt ) − f (x∗ ))2 2L ||x1 − x∗ ||2 1 f (xt ) − f (x∗ ) 1 ≤ − β ⇒ f (xt ) − f (x∗ ) f (xt+1 ) − f (x∗ ) f (xt+1 ) − f (x∗ ) 1 1 ⇒ ≤ −β f (xt ) − f (x∗ ) f (xt+1 ) − f (x∗ ) 1 1 ⇒β≤ − , t+1 ∗ t f (x ) − f (x ) f (x ) − f (x∗ )

f (xt+1 ) − f (x∗ ) ≤ f (xt ) − f (x∗ ) −

(12) (13) (14)

1 1 t+1 where β = 2L )− ||x1 −x∗ ||2 . Equation 12 is done by divide both side with (f (x ∗ t ∗ t+1 ∗ t ∗ f (x ))(f (x ) − f (x )) and Eq. 13 utilizes f (x ) − f (x ) ≤ f (x ) − f (x ). Then, following by induction over t = 0, 1, 2, ..T − 1 and telescopic cancellation, we have 1 1 1 − ≤ . Tβ ≤ f (xT ) − f (x∗ ) f (x0 ) − f (x∗ ) f (xT ) − f (x∗ )

1 1 1 − ≤ ∗ 0 ∗ T − f (x ) f (x ) − f (x ) f (x ) − f (x∗ ) 1 T 1 ⇒ ≤ 2L ||x1 − x∗ ||2 f (xT ) − f (x∗ ) 2L||x0 − x∗ ||2 . ⇒ f (xT ) − f (x∗ ) ≤ T

Tβ ≤

f (xT )

(15) (16) (17)

Thus complete the proof.

References 1. Phong, L.T., Aono, Y., Hayashi, T., Wang, L., Moriai, S.: Privacy-preserving deep learning: revisited and enhanced. In: Batten, L., Kim, D.S., Zhang, X., Li, G. (eds.) ATIS 2017. CCIS, vol. 719, pp. 100–110. Springer, Singapore (2017). https://doi. org/10.1007/978-981-10-5421-1 9 2. Aono, Y., Hayashi, T., Wang, L., Moriai, S., et al.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 13(5), 1333–1345 (2017)

Federated Learning Privacy Leakage Evaluation Framework

565

3. Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., Shmatikov, V.: How to backdoor federated learning. arXiv preprint arXiv:1807.00459 (2018) 4. Battiti, R.: First-and second-order methods for learning: between steepest descent and Newton’s method. Neural Comput. 4(2), 141–166 (1992) 5. Bonawitz, K., et al.: Towards federated learning at scale: System design. In: Proceedings of the 2nd SysML Conference, pp. 619–633 (2018) 6. Fletcher, R.: Practical Methods of Optimization. Wiley, Hoboken (2013) 7. Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333 (2015) 8. Ganju, K., Wang, Q., Yang, W., Gunter, C.A., Borisov, N.: Property inference attacks on fully connected neural networks using permutation invariant representations. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–633 (2018) 9. Geiping, J., Bauermeister, H., Dr¨ oge, H., Moeller, M.: Inverting gradients-how easy is it to break privacy in federated learning? arXiv preprint arXiv:2003.14053 (2020) 10. Hayes, J., Melis, L., Danezis, G., De Cristofaro, E.: Logan: evaluating privacy leakage of generative models using generative adversarial networks. arXiv preprint arXiv:1705.07663 (2017) 11. Hitaj, B., Ateniese, G., Perez-Cruz, F.: Deep models under the GAN: information leakage from collaborative deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618 (2017) 12. Huang, G.B., Mattar, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Technical report (2008) 13. Kamp, M., et al.: Efficient decentralized deep learning by dynamic model averaging. In: Berlingerio, M., Bonchi, F., G¨ artner, T., Hurley, N., Ifrim, G. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11051, pp. 393–409. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10925-7 24 14. Koneˇcn` y, J., McMahan, H.B., Yu, F.X., Richt´ arik, P., Suresh, A.T., Bacon, D.: Federated learning: strategies for improving communication efficiency. In: NIPS Workshop on Private Multi-Party Machine Learning (2016) 15. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. In: Technical report (2009) 16. LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist10, 34 (1998) 17. Lin, Y., Han, S., Mao, H., Wang, Y., Dally, W.J.: Deep gradient compression: reducing the communication bandwidth for distributed training. In: International Conference on Learning Representations (2018) 18. Liu, W., Chen, L., Chen, Y., Zhang, W.: Accelerating federated learning via momentum gradient descent. IEEE Trans. Parallel Distrib. Syst. (2020) 19. Ma, C., et al.: Adding vs. averaging in distributed primal-dual optimization. In: Proceedings of the 32nd International Conference on Machine Learning, vol. 37, pp. 1973–1982 (2015) 20. McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communicationefficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282 (2017) 21. McMahan, B., Ramage, D.: Federated learning: Collaborative machine learning without centralized training data. Google Res. Blog 3 (2017)

566

W. Wei et al.

22. McMahan, H.B., Moore, E., Ramage, D., Arcas, B.A.: Federated learning of deep networks using model averaging. corr abs/1602.05629 (2016). arXiv preprint arXiv:1602.05629 (2016) 23. Melis, L., Song, C., De Cristofaro, E., Shmatikov, V.: Exploiting unintended feature leakage in collaborative learning. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706. IEEE (2019) 24. Rossi, F., G´egout, C.: Geometrical initialization, parametrization and control of multilayer perceptrons: application to function approximation. In: Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN 1994), vol. 1, pp. 546–550. IEEE (1994) 25. Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017) 26. Vanhaesebrouck, P., Bellet, A., Tommasi, M.: Decentralized collaborative learning of personalized models over networks. In: Artificial Intelligence and Statistics (2017) 27. Wang, Z., Song, M., Zhang, Z., Song, Y., Wang, Q., Qi, H.: Beyond inferring class representatives: user-level privacy leakage from federated learning. In: IEEE INFOCOM 2019-IEEE Conference on Computer Communications, pp. 2512–2520. IEEE (2019) 28. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004) 29. Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–19 (2019) 30. Yao, X., Huang, T., Zhang, R.X., Li, R., Sun, L.: Federated learning with unbiased gradient aggregation and controllable meta updating. arXiv preprint arXiv:1910.08234 (2019) 31. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Netw. 3(6), 889–898 (1992) 32. Zhao, B., Mopuri, K.R., Bilen, H.: iDLG: Improved deep leakage from gradients. arXiv preprint arXiv:2001.02610 (2020) 33. Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with non-IID data. arXiv preprint arXiv:1806.00582 (2018) 34. Zhu, L., Liu, Z., Han, S.: Deep leakage from gradients. In: Advances in Neural Information Processing Systems, pp. 14747–14756 (2019)

Network Security II

An Accountable Access Control Scheme for Hierarchical Content in Named Data Networks with Revocation Nazatul Haque Sultan1(B) , Vijay Varadharajan1 , Seyit Camtepe2 , and Surya Nepal2 1

University of Newcastle, Callaghan, Australia {Nazatul.Sultan,Vijay.Varadharajan}@newcastle.edu.au 2 CSIRO Data61, Marsfield, NSW 2122, Australia {seyit.camtepe,surya.nepal}@data61.csiro.au

Abstract. This paper presents a novel encryption-based access control scheme to address the access control issues in Named Data Networking (NDN). Though there have been several recent works proposing access control schemes, they are not suitable for many large scale real-world applications where content is often organized in a hierarchical manner (such as movies in Netflix) for efficient service provision. This paper uses a cryptographic technique, referred to as Role-Based Encryption, to introduce inheritance property for achieving access control over hierarchical contents. The proposed scheme encrypts the hierarchical content in such a way that any consumer who pays a higher level of subscription and is able to access (decrypt) contents in the higher part of the hierarchy is also able to access (decrypt) the content in the lower part of the hierarchy using their decryption keys. Additionally, our scheme provides many essential features such as authentication of the consumers at the very beginning before forwarding their requests into the network, accountability of the Internet Service Provider, consumers’ privilege revocations, etc. In addition, we present a formal security analysis of the proposed scheme showing that the scheme is provably secure against Chosen Plaintext Attack. Moreover, we describe the performance analysis showing that our scheme achieves better results than existing schemes in terms of functionality, computation, storage, and communication overhead. Our network simulations show that the main delay in our scheme is due to cryptographic operations, which are more efficient and hence our scheme is better than the existing schemes. Keywords: Named Data Networking · Access control · Accountability · Revocation · Encryption · Authentication security

1

· Provable

Introduction

Recent development in Named Data Networking (NDN) has given a new direction to build the future Internet. Unlike the traditional host-centric IP-based c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 569–590, 2020. https://doi.org/10.1007/978-3-030-58951-6_28

570

N. H. Sultan et al.

networking architecture, the emphasis in NDN is shifted from where the data is located to what the data is. NDN is built by keeping data (or content) as its pivotal point, which helps to cope with the bandwidth insufficiency issue of the traditional IP-based networking architecture [1]. Recently, Cisco predicted that 82% of the world business traffic will be constituted by Internet video by the year 2022 [2]. Cisco also predicted that there will be 5.3 billion total Internet users (66 percent of global population) by 2023 [3]. As such, the current host-tohost communication network (i.e., IP-based networking architecture) has become inadequate for such content and user intensive services [4]. On the other hand, the services of NDN such as in-network cache, support of mobility, multicast and in-built security make it an ideal architecture for today’s ever growing content and user intensive services in the Internet [5]. In NDN, each content or data is represented using unique content name. A consumer (or user) sends interest packet containing the desired content name to the Content Provider. Once the interest packet reaches the Content Provider, it sends the requested content in a data packet. While forwarding the data packet, NDN routers keep a copy of the data packet in its cache, so that if any further interest packets arrive (either from the same consumer or from other consumers) for the same content, the routers can satisfy the requests by sending the data packets without contacting the actual Content Provider. This enables NDN to utilize efficiently the network resources such as network bandwidth, routers’ cache space, and lower latency content delivery to consumers. Although NDN provides the aforementioned benefits, it also brings new security challenges among which access control is an important one [5]. In NDN, as routers to keep copies of the data packets in their in-network caches, unauthorized consumers can easily obtain their desired content from the network without Content Providers’ permission. This has a negative impact on Content Providers’ business. Therefore, an effective access control mechanism is critical for the successful deployment of NDN. In the recent years, several research works have addressed the access control problem in data centric networks. In general, these solutions can be categorised into two groups, namely authentication-based solutions and encryption-based solutions. We will describe the existing solutions in the Related Work section. The main rationale behind our proposed scheme is that most Content Providers arrange content in hierarchical structure providing better consumer choice as well as offering better flexibility for providing different subscription options for different categories based on consumers’ preferences. Furthermore, hierarchical representation allows the inheritance property to be taken into account in system design. For instance, if a consumer is subscribing to a higherlevel content (in the hierarchy) by paying a higher subscription fee, the consumer can also get access to the content at the lower level in the hierarchy. We have previously proposed a new encryption scheme called Role Based Encryption (RBE) [6–9], which takes into account of hierarchies in roles. Furthermore, RBE, by design, takes into account the inheritance property, which helps to simplify the key management problem. In this paper, we have used RBE to develop a novel

An Accountable Access Control Scheme for Hierarchical Content

571

encryption based approach for a secure access control scheme for hierarchically organized content in NDN. As far as we are aware, our proposed scheme is the first one that considers the design of a secure access control scheme for hierarchically organized encrypted content in NDNs. Note that our previous RBE schemes [6–9] addressed secure access control on encrypted data stored in the cloud. As there are fundamental differences between NDN and host-centric IP-based network cloud architectures, our work described in this paper is different from our previous works on RBE for cloud data security. Furthermore, certain mechanisms such as partial decryption computations done by cloud service providers on behalf of users are not suitable for the NDN environment. Another important characteristic of our scheme is that of ISP (Internet Service Provider) accountability. Due to caches in routers, the ISP is able to satisfy the consumer requests for content. This makes it hard for the Content Provider to know the exact service amount that the ISP has provided. Our scheme enables the ISP to provide unforgeable evidence to the Content Provider on the number of consumers’ requests serviced by the ISP. This makes it easier for the Content Provider to pay the ISP based on usage of content by its consumers. It can also be useful to have feedback information about consumers’ preferences and content popularity to help the Content Provider improve its content services.

All contents

TV Series

Low quality

Drama

AcƟon

...

Movies

Standard quality

High DefiniƟon

Low quality

Standard quality

...

...

...

...

AnimaƟon

...

Drama

High DefiniƟon

AcƟon

...

AnimaƟon

Free Contents

Fig. 1. Sample content hierarchy

1.1

Related Work

In the recent years, several research works have been proposed to address the access control problem in data centric networks, which are broadly categorised into two main groups, namely authentication and encryption [10]. The main idea of the authentication-based access control schemes is to verify authenticity of the consumers’ content requests either at the routers (edge and internal routers) or Content Provider (CP) side before sending the contents. In [11], an access control scheme is proposed using a signature mechanism, where each router verifies integrity and authenticity of the encrypted contents before caching, and later send the cached contents only to the authorized consumers. In [12] and [13], CP requires to authenticate the consumers before sending any contents to them.

572

N. H. Sultan et al.

Moreover, [12] requires a secure-channel for sending the contents. In [14], the routers verify authenticity of the consumers using tags1 , which are assigned to them by the CP, before sending the contents to the consumers. In [15], an access control framework is proposed using the traditional Kerberos 2 mechanism. The scheme uses two separate online entities for authentication and authorization of the consumers. A shared key is established between the CP and the consumer for content encryption. In [16], CP verifies the authenticity of a consumer before establishing a session key for sending the encrypted contents. It is observed that all the existing authentication-based access control schemes: increases overhead at the routers [11,14], limits the utilization of NDN resources3 [12–14,16] or needs online authority [12,13,15,16], which are not desirable for NDN [5]. On the hand, the main idea of the encryption-based access control schemes is to encrypt the contents that allows only the authorized consumers to decrypt. In the literature, different encryption techniques (e.g., Attribute-Based Encryption (ABE) [17], Broadcast Encryption (BE) [10,18–20], Identity-Based Encryption (IBE) [21], Proxy Re-encryption (PRE) [22], etc.) or a combination of those techniques have been used [23]. In [17], an ABE scheme for access control is proposed which allows any consumer having a qualified decryption key (i.e., a qualified set of attributes) to decrypt the ciphertexts. In [18], a BE scheme is proposed to share contents among a set of n consumers and can revoke t number of consumers from the system. In [19], another broadcast encryption technique is proposed, which integrates time tokens in the encrypted contents, and allows only the authorized consumers satisfying the time limitation to decrypt the ciphertexts. In [10] and [20], two BE based access control schemes have been proposed, which enables the edge routers to block interest requests of unauthorized consumers from entering the network using a signature-based authentication mechanism. [10] and [20] also address accountability issue of the ISP. In [21], a third-party entity (“Online Shop”) uses an IBE based access control mechanism to share encrypted contents with the authorized consumers. In [23], another IBE based access control scheme is proposed, which also uses proxy re-encryption and broadcast encryption techniques, that embeds consumers’ identities along with their subscription time in the encrypted content itself. This enables the consumers to access the contents within their respective subscription time. In [22], an in-device proxy re-encryption model is proposed that delegates the functionality of a proxy server to the consumers. However, all existing encryption-based works are not suitable for platforms like content services (such as Netflix (www. netflix.com) and Disney+ (www.disneyplus.com)), where contents are organized in hierarchies, such as genre, geographic region or video quality. Figure 1 shows an example of such a content hierarchy based video streaming service. To share the contents with the authorized consumers, the existing schemes would require issuing separate secret keys per category of content, which could be cumbersome. 1 2 3

Each tag contains a signature, validity period, etc. www.tools.ietf.org/html/rfc4120. An encrypted content can only be shared with a particular consumer. As such, the other authorized consumers cannot take benefit of the cached contents.

An Accountable Access Control Scheme for Hierarchical Content

573

Further, most of the schemes either do not support immediate revocation4 , e.g., [10,17,19], etc. (which is one of the essential requirements for NDN) or provides an inefficient revocation mechanism [18]. Moreover, authentication of the interest requests of the consumers before forwarding into the network at very beginning5 is only provided by a few schemes such as [10,19,20]. However, [10,19,20] are vulnerable to collusion attack between a revoked consumer and ISP (i.e., edge routers). 1.2

Contributions

This paper proposes a novel encryption-based access control scheme for content organized in a hierarchical structure in NDN. A practical insight in this scheme is to reflect the real-world environment of the organization of content into hierarchical categories. Each category of content is then associated with a group of consumers who have subscribed to it and assigned appropriate unique decryption keys. The contents in a category are encrypted in such a way that any consumer associated with that category and ancestor categories of the content hierarchy can decrypt the content using their decryption key. This is where the idea from our previous work on RBE has been used. Content hierarchy is discussed further in Sect. 3.1 and 4.1. Our secure access scheme for NDN also achieves accountability enabling the ISP to provide unforgeable evidence (on the number of consumers’ requests that it has serviced) to content providers. Furthermore, our scheme offers an efficient mechanism for revocation of privileges, which is a major stumbling block for the real-world deployment of NDN in content distribution. The major contributions of this paper are as follows: – Proposed a novel encryption based access control scheme for hierarchical content in NDN – Designed an efficient signature mechanism, which enables the edge routers to authenticate the consumers’ data requests before forwarding them to the Content Provider – Proposed an accountability mechanism enabling the Content Provider to verify ISP’s services to its consumers, thereby allowing the Content Provider to pay the correct amount to the ISP for its services – Proposed a batch signature verification mechanism to verify correctly and efficiently the ISP’s service – Introduced revocation mechanisms, enabling the Content Provider to revoke the privileges of consumers from accessing its contents in an efficient manner. We introduce both expiration based and immediate revocation mechanisms. The former mechanism automatically revokes consumers after expiry of their subscription or validity periods. The latter mechanism enables the 4 5

Privilege revocation of one or more consumers at any time. This is another essential requirement for NDN, which helps to prevent entering bogus interest requests into the network. This, in turn, prevents DoS attacks.

574

N. H. Sultan et al.

CP to revoke one or more consumers at anytime from accessing contents of a category. – Provided a formal security analysis of the proposed scheme to demonstrate that the scheme is provably secure against Chosen Plaintext Attack (CPA) – Demonstrated that the proposed scheme performs better than the existing works in terms of functionality, computation and communication overhead This paper is organized as follows: Sect. 2 gives proposed model which includes system model, adversary model, and security model. Sect. 3 describes some preliminaries including content hierarchy, bilinear map, and complexity assumption, which are essential to understand our proposed scheme. Section 4 presents our proposed scheme in details. Analysis of the proposed scheme is provided in Sect. 5 and finally, Sect. 6 concludes this paper. NETFLIX Content Provider Internet Service Provider

Disney+ Content Provider

Cloud

User

User Intermediate Router

Edge Router

Fig. 2. System model

2 2.1

Proposed Model System Model

Figure 2 shows NDN architecture with many Content Providers, an Internet Service Provider, and a large number of Consumers. Content Providers (CPs) – They are the content producers like Netflix, Disney+ and Prime Video (www.primevideo.com). Each CP maintains a content hierarchy. A consumer may subscribe to one or more categories of content in the content hierarchy. For each subscription, CP issues access tokens in the form of secret keys to the consumers. Moreover, CP encrypts its contents before broadcasting them to the consumers. Internet Service Provider (ISP) – It is responsible for managing the NDN network. It mainly consists of two types of routers, namely Edge Routers (ERs) and Intermediate Routers (IRs). Both ERs and IRs are responsible for forwarding consumers’ interest packets to the CPs. The ERs and IRs are cache-enabled and hence can satisfy the consumers’ interests without contacting the CPs if the content consumers’ have the requested is in their caches. Additionally, ERs are responsible for authenticating the consumers before forwarding the interest packets into the network.

An Accountable Access Control Scheme for Hierarchical Content

575

Consumers – They are the users of the content. Each consumer must register with a CP and subscribes for one or more categories of content. 2.2

Adversary Model

Our adversary model consists of two types of adversaries– ISP and consumers. Honest-but-Curious and Greedy ISP – ISP is considered to be honest-butcurious. As a business, the ISP’s aim is to increase its revenue from the CPs. As such, it honestly performs all the tasks assigned to it thereby providing incentives for the CPs to use its services. At the same time, it could be interested in learning as much as possible about the plaintext content. Moreover, ISPs can exaggerate or even cheat the CP, when it comes to the services provided to the consumers for gaining more financial benefit. In this sense, we regard the ISP to be greedy. Malicious Consumers – The consumers are considered to be malicious in that they may collude among themselves or even with the ISP to gain access to content beyond what they are authorised to access. Note that, CPs are assumed as trusted entities, as they are the actual producers of the content and hence have a vested interest in protecting their content. 2.3

Security Model

The security model of the proposed scheme is defined by the Semantic Security against Chosen Plaintext Attacks (IND-CPA). IND-CPA is illustrated using a security game between a challenger C and an adversary A, which is presented in Appendix A.

1

2

4

2

3

5

1′

1

6

7

(a) Content Hierarchy 1.

4

3

5

6

7

(b) Content Hierarchy 2.

Fig. 3. Sample content hierarchies.

576

3 3.1

N. H. Sultan et al.

Preliminaries Content Hierarchy

In the proposed scheme, content is organized in a hierarchy, where consumers having access rights for ancestor category of content can inherit access rights of the descendant categories of content. For simplicity, from now onward, we may represent a category as a “role” and vise versa, as the content hierarchy is analogous to the role hierarchy. That is, consumers having access rights for the ancestor category or role inherits access rights to its descendent categories or roles. Figure 3a and Fig. 3b show two sample content hierarchies, namely Content Hierarchy 1 and Content Hierarchy 2 respectively. Let us consider the Content Hierarchy 1 as an example to define the notations that we will be using in a content hierarchy. – rr : root role of a content hierarchy. We assume that in any content hierarchy there can be only one root role. – Ψ : set of all roles in the content hierarchy. For example, Ψ = {rr , r1 , r2 , r3 , r4 , r5 , r6 , r7 } – Ari : ancestor set of the role ri . For example, Ar1 = {r1 , rr }, Ar3 = {rr , r1 , r3 } and Ar7 = {rr , r1 , r2 , r4 , r6 , r7 }. – Dri : descendent set of the role ri . For example, Dr1 = {r2 , r3 , r4 , r5 , r6 , r7 }, Dr3 = {r6 , r7 } and Dr7 = ∅ 3.2

Bilinear Map

Let G1 and GT be two cyclic multiplicative groups. Let g be a generator of G1 . The bilinear map eˆ : G1 × G1 → GT should fulfil the following properties: – Bilinear– eˆ(g a , g b ) = eˆ(g, g)ab , ∀g ∈ G1 , ∀(a, b) ∈ Z∗q – Non-degenerate– eˆ(g, g) = 1 – Computable– There exist an efficiently computable algorithm to compute eˆ(g, g), ∀g ∈ G1 3.3

Complexity Assumption

Our proposed scheme is based on the following complexity assumption. Decisional Bilinear Diffie-Hellman (DBDH) Assumption. If (a, b, c, z) ∈ Z∗q are chosen randomly, for any polynomial time adversary A, then the ability for that adversary to distinguish the tuples g, g a , g b , g c , Z = eˆ(g, g)a·b·c  and g, g a , g b , g c , Z = eˆ(g, g)z  is negligible.

4

Proposed Scheme

This section first presents an overview of our proposed scheme followed by its main construction. In the overview section, we briefly describe main idea of our proposed encryption technique.

An Accountable Access Control Scheme for Hierarchical Content

...

... .

577

...

...

Fig. 4. A Sample Content Key Hierarchy (CKH)

4.1

Overview

In our proposed scheme, each CP maintains a content hierarchy CH, which consists of different categories of contents organized in a hierarchical structure. Each CP also maintains a Content Key Hierarchy (CKH) associated with the CH. A sample CKH is shown in Fig. 4. Each node in the CKH represents a category of the content. In the CKH, each node is associated with a public key, which is used to encrypt the content belonging to that category. Moreover, each node is also associated with a set of consumers who have subscribed to the associated category of content. For each subscription, the consumers are assigned unique secret keys (i.e., SKrIDi u ), so that they can access any content encrypted using the public key associated with that node (or category) as well as the content encrypted using the public keys associated with any descendent nodes (or sub-categories). For instance, if CP encrypts the content using the public key associated with r2 category, then all consumers associated with r1 and r2 categories can decrypt the content using their respective secret keys. Similarly, all the ciphertexts associated with r5 and r7 can be decrypted by the consumers associated with r1 , r2 , r3 , r5 and r1 , r2 , r3 , r4 , r5 , r6 , r7 respectively. Note that, CP keeps the root node rr as an unused category. Our proposed scheme also provides a signature scheme that enables the edge routers to authenticate content requests (or interest packets) of the consumers before forwarding them to the networks. In the signature, each consumer includes his/her validity or subscription period, thereby enabling the edge router to prevent any consumer having expired subscription period from accessing the content and network resources. This achieves our time expiration-based revocation of consumers. Moreover, our proposed scheme introduces a novel immediate revocation scheme using the concept of bilinear-map accumulator. More details are given in Sect. 4.3.

578

4.2

N. H. Sultan et al.

Construction

Our proposed scheme comprises mainly 8 phases– Content Provider Setup, Consumer Registration, Content Publication, Content Request and Consumer Authentication, Content Re-encryption, Decryption, ISP Service Accountability and Immediate Revocation. The details of these phases are presented next. Content Provider Setup. CP initiates this phase to generate system parameter SP and master secret key MSK. CP chooses a content hierarchy Ψ , two cyclic multiplicative groups G1 and GT of order q, where q is a large prime number. It also chooses a generator g ∈ G1 , three collusion resistant hash functions H : {0, 1}∗ → G1 , H1 : {0, 1}∗ → Z∗q , H2 : GT → Z∗q , a bilinear map eˆ : G1 × G1 → GT , and random numbers (σ, δ, ρ, {tri }∀ri ∈Ψ ) ∈ Z∗q . It then computes P, Q. It also computes public key PKri = PKri , Ari , ri , proxy re-encryption keys {PKeyrrwi }∀rw ∈Ari \{ri } and accumulator Accri for each category ri ∈ (Ψ \ {rr }), where σ





δ

∀rj ∈Ar \{rr } i

tr j tr



P =ˆ e(g, g) ; Q = eˆ(g, g) ; PKri = g     trj sk rw PKeyri = ; Accri = eˆ (H(ri ), g) ∀IDu ∈Uri IDu tr ∀rj ∈Ari \{rw ,rr }

Uri represents the set of all the consumers who  possess category ri . The system parameter SP and master secret key MSK are SP = q, G1 , GT , eˆ, g, H, H1 , H2 , P, Q,    {PKri }∀ri ∈Ψ and MSK = σ, δ, ρ, {tri , Accri , {PKeyrrwi }∀rw ∈Ari \{ri } }∀ri ∈Ψ \{rr } respectively. The system parameter SP is made public by keeping them in a public bulletin board, whereas the master secret key MSK is kept in a secure place. Note that, the proxy re-encryption keys are not generated for the root role. Further, CP securely sends the proxy re-encryption keys to the NDN routers when requested. We assume that CP uses a secure public key encryption technique such as the one in [24] for transmission of the proxy re-encryption keys. Consumer Registration. This phase is initiated by the CP when a consumer wants to subscribe for its contents. Suppose a consumer, say IDu (IDu is the unique identity of the consumer), subscribes to the category rx for a period of VPIDu . The CP issues a set of secret keys SKrIDx u and public keys PubrIDx u to the subscribed consumer IDu . Moreover, the CP generates an updated accumulator v associated with the category rx . CP chooses a random number skIDu ∈ Z∗q Accver rx (termed as private key) and computes SKrIDx u = skIDu , RK1IDu ,rx , RK2IDu ,rx , PubrIDx u = v Pub1rIDx u , Pub2rIDx u , PubWitrIDx u , Accver rx , where RK1IDu ,rx = g

σ·skIDu +δ trx

; RK2IDu ,rx = g

σ·skIDu +δ   tr  j ∀rj ∈(Arx \{rr }) tr ρ

1

Pub1rIDx u = H(rx ) skIDu +ρ·H1 (VPIDu ) ; Pub2rIDx u = eˆ (H(rx ), g) skIDu +ρ·H1 (VPIDu ) PubWitrIDx u = eˆ (H(rx ), g)



∀IDj ∈(Urx \{IDu })

skIDj

v ; Accver = (Accverv−1 )skIDu rx rx

An Accountable Access Control Scheme for Hierarchical Content

579

v Accver and Accverv−1 represent current and previous the accumulator associated rx rx with category rx . The secret key SKIDu is sent to the consumer using a secure public key encryption mechanism, as described in Content Provider Setup phase, and the public keys are kept in a public bulletin board. Note that, we shall refer to Pub1rIDx u , Pub2rIDx u as public keys and PubWitrIDx u as witness public key. Also, the subscription or validity period VPIDu can be of the format dd– mm– yyyy.

Content Publication. CP initiates this phase when it wants to broadcast a content to the authorized (subscribed) consumers. Suppose CP wants to broadcast a content M for the consumers having access rights for the category ri . CP first encrypts the content using a random symmetric key K ∈ GT with a symmetric key encryption mechanism. It then encrypts the random symmetric key K using following encryption process. CP chooses two random numbers (y1 , y2 ) ∈ Z∗q and computes C1 , C2 , C3 , Cri , where C1 = K · P y1 = K · eˆ(g, g)σ·y1 ; C2 = Q(y1 +y2 ) = eˆ(g, g)δ·(y1 +y2 ) C3 = P

y2

σ·y2

= eˆ(g, g)

(y1 +y2 )

; Cri = (PKri )

=g

(y1 +y2 )·

 ∀rj ∈Ar \{rr } i



tr j tr



Final, ciphertext is CT = Enck (M), C1 , C2 , C3 , Cri , where Enck (M) is the symmetric key encryption component of the actual content. Note that, the Advanced Encryption Standard (AES) scheme can be used for symmetric key encryption. Also note that, our proposed scheme uses a hierarchical content naming scheme similar to the one defined in [4]. For instance, a drama TV series name can be represented as “/com/example/all-contents/tvseries/quality/drama/xyz.mp4/chunk 1 ”, where /com/example/ represents the CP’s domain name (e.g., example.com), /all-contents/tv-series/quality/drama represents category of the content in the CH (or CKH)6 , and /xyz.mp4 represents file name and /Chunk 1 specifies the contents in the file. This naming scheme helps the routers to identify the category of the requested content along with the ancestor categories in the CH. Content Request and Consumer Authentication. This phase is divided into two sub-phases, namely Content Request and Consumer Authentication. Content Request. A consumer initiates this sub-phase when he/she wants to access a content. Suppose, the consumer IDu , possessing access rights for the category rx , wants to access a content of category ri . The consumer IDu chooses random number v ∈ Z∗q , timestamp ts, content name CN and computes a signature Sig = S1 , S2 , S3 , where  v v·ρ S1 = [skIDu + H1 (ts||CN )] · v; S2 = Pub2rIDx u = eˆ (H(rx ), g) skIDu +ρ·H1 (VPIDu ) ; S3 = g v

The consumer IDu sends the interest packet with the signature Sig, current timestamp ts and validity period VPIDu to the nearest ER. 6

It also shows the ancestor categories in the CH.

580

N. H. Sultan et al.

Consumer Authentication. The ER initiates this sub-phase when it receives an interest packet from a consumer. In this phase, the ER verifies authenticity of the requested consumer before forwarding or satisfying his/her interest. ER first checks the timestamp ts and validity period VPIDu of the consumer with the current time ts . If they are invalid (i.e., ts − ts < Δt and VPIDu < ts , where Δt is a predefined value), the ER ignores the request. Otherwise, it searches the database for an entry belonging to the consumer IDu . If there is no entry for the consumer, it sends a request to the CP asking for the public keys Pub1rIDx u , Pub2rIDx u . After receiving the public keys Pub1rIDx u , Pub2rIDx u , the ER verifies authenticity of the consumer IDu possessing the access rights for the category rx using the following process: ER computes V1 , V2 and V3 ,   S1 V1 = eˆ Pub1rIDx u , g · (S2 )H1 (VPIDu ) = eˆ (H(rx ), g)

[skIDu +ρ·H1 (VPIDu ]·v+H1 (ts||CN )·v skIDu +ρ·H1 (VPIDu )

H1 (ts||CN )·v

v

= eˆ (H(rx ), g) · eˆ (H(rx ), g) skIDu +ρ·H1 (VPIDu ) v

V2 = eˆ (H(rx ), S3 ) = eˆ (H(rx ), g) 

H1 (ts||CN )·v H1 (ts||CN ) V3 = eˆ Pub1rIDx u , S3 = eˆ (H(rx ), g) skIDu +ρ·H1 (VPIDu ) The ER compares V1 and V2 · V3 . If V1 == V2 · V3 , the ER successfully authenticates the consumer and it forwards the interest packet to the network if the data packets are not in its cache. Otherwise, the ER ignores the request. Note that, a consumer having expired subscription or validity period cannot pass through Consumer Authentication sub-phase. As such, this achieves our expiration-based consumer revocation. Content Re-encryption. In this phase, a router (either ER or IR) re-encrypts a ciphertext. Suppose a router (either ER or IR) receives an interest request for ri category of content from the consumer IDu having access rights for the category rx . Let, CTEncK (M), C1 , C2 , C3 , Cr i  be the requested ciphertext. If rx == ri , the router forwards the data packet to the consumer. Otherwise, if rx = ri and rx ∈ Ari , the router performs a re-encryption operation over the ciphertext. The routers recomputes Cri component of the ciphertext and generates a reencrypted ciphertext CT , which is sent to the consumer IDu . To recomputes Cri , the router requires the proxy re-encryption key PKeyrrxi . If the router does not have the proxy re-encryption key PKeyrrxi in its database, it sends a request for it to the CP as described in Content Provider Setup phase. For the re-encryption, the router computes Cr i , where Cr i

= (Cri )

1 r PKeyrx i



=g

 ∀rj ∈(Ar \{rr }) i



tr j tr

 ·

1   tr  j ∀rj ∈Ar \{rx ,rr } tr i

= g y·trx

The final re-encrypted ciphertext is CT = EncK (M), C1 , C2 , C3 , Cr i . Note that, after receiving the proxy re-encryption key, the router can keep it in its database for future use.

An Accountable Access Control Scheme for Hierarchical Content

581

Content Decryption. A consumer initiates this phase to decrypt a ciphertext. The consumer IDu first recovers the random symmetric key K from the ciphertext using his/her secret key SKrIDx u = skIDu , RK1IDu ,rx , RK2IDu ,rx , and then decrypts the symmetric key encryption component of the ciphertext using K to get the plaintext content M. If the consumer IDu receives the ciphertext CT = EncK (M), C1 , C2 , C3 , Cri , i.e., rx = ri , then he/she computes D1 , where   D1 = eˆ RK2IDu ,rx , Cri ⎞ ⎛ σ·sk +δ ⎜ = eˆ ⎝g

IDu   tr  j ∀rj ∈(Arx \{rr }) tr

= eˆ (g, g)

[σ·skIDu +δ]·(y1 +y2 )

,g

(y1 +y2 )·





∀rj ∈(Ar \{rr }) i

= eˆ (g, g)

[σ·skIDu ]·(y1 +y2 )

tr j tr



⎟ ⎠

· eˆ(g, g)δ·(y1 +y2 )

If the consumer IDu receives the re-encrypted ciphertext CT EncK (M), C1 , C2 , C3 , Cr i , i.e., rx ∈ Ari , then he/she computes D1 , where   D1 = eˆ RK1IDu ,rx , Cr i  σ·sk +δ  IDu = eˆ g trx , g (y1 +y2 )·trx = eˆ (g, g)

[σ·skIDu +δ]·(y1 +y2 )

[σ·skIDu ]·(y1 +y2 )

= eˆ (g, g)

=

· eˆ(g, g)δ·(y1 +y2 )

Then the consumer IDu computes D2 , D3 and recovers the secret key K as follows: D1 σ·sk ·(y +y ) σ·sk ·y σ·sk ·y = eˆ (g, g) IDu 1 2 = eˆ (g, g) IDu 1 · eˆ (g, g) IDu 2 C2 σ·sk ·y σ·sk ·y D2 eˆ (g, g) IDu 1 · eˆ (g, g) IDu 2 σ·sk ·y D3 = = = eˆ (g, g) IDu 1 σ·sk ·y (C3 )skIDu eˆ (g, g) IDu 2 D2 =

K =

C1 (D3 )

1 skIDu

σ·y

=

K · eˆ (g, g) 1 σ·y eˆ (g, g) 1

Finally, the consumer IDu decrypts the symmetric key encryption component EncK (M) using K and gets the plaintext content M. ISP Service Accountability. This phase is initiated by the CP when it wants to evaluate the service information provided by the ISP (i.e., ERs). In our scheme, the ERs periodically send consumers’ service information ts, CN, Sig, IDu  to the respective CP, where ts is the time at which the consumer IDu sent the interest packet to the ER, CN is the requested content name, Sig is the signature of the consumer IDu . After receiving the service information, the CP first compares timestamp ts with the validity period VPIDu of the consumer IDu . If ts ≤ VPIDu , then only the CP needs to verify signature Sig of the consumer IDu . Otherwise, it ignores that service information of the consumer IDu , as the ISP has provided services to a consumer whose validity period has already expired. In this case, the CP can also take punitive actions against

582

N. H. Sultan et al.

the ISP. Afterward, the CP employs a batch verification technique to verify the consumers’ signatures. It computes V1 , V2 and V3 using the signatures of the consumers who had access rights for the same category of contents in the content hierarchy. Let’s assume that the selected consumers’ had access rights for the rx category of contents in the content hierarchy and Srx is the set of those consumers. CP computes ⎛ V1 = eˆ ⎝ ⎛

⎞   S Pub1rIDx u 1 , g ⎠ ·

 ∀IDu ∈Srx



= eˆ ⎝



(S2 )H1 (VPIDu )

∀IDu ∈Srx



H(rx )

S1 skIDu +ρ·H1 (VPIDu )

∀IDu ∈Srx

=





, g⎠ ·

v·ρ·H1 (VPIDu )

eˆ (H(rx ), g) skIDu +ρ·H1 (VPIDu )

∀IDu ∈Srx

eˆ (H(rx ), g)

skIDu +H1 (ts||CN )·v+v·ρ·H1 (VPIDu ) skIDu +ρ·H1 (VPIDu )

∀IDu ∈Srx 

= eˆ (H(rx ), g) ⎛ V2 = eˆ ⎝H(rx ), ⎛ V3 = eˆ ⎝H(rx ),

∀IDu ∈Srx





v

· eˆ (H(rx ), g) ⎞



S3 ⎠ = eˆ (H(rx ), g)

∀IDu ∈Srx



H1 (ts||CN )·v ∀IDu ∈Srx skID +ρ·H1 (VPID ) u u

⎞ (S3 )

H1 (ts||CN ) skIDu +ρ·H1 (VPIDu )

∀IDu ∈Srx

v



⎠ = eˆ (H(rx ), g)

H1 (ts||CN )·v ∀IDu ∈Srx skID +ρ·H1 (VPID ) u u

∀IDu ∈Srx

Note that the CP knows the private key skIDu and the validity period VPIDu of each consumer in Srx . It also knows ρ. As such, the CP can compute skIDu + ρ · H1 (VPIDu ). If V1 = V2 · V3 , then the verification process is successful, which demonstrates that the consumers’ service information are legitimate. 4.3

Immediate Revocation

Our immediate revocation mechanism is designed using the concept of bilinearmap accumulator [25]. When the CP wants to prevent one or more consumers from accessing the contents of a category, this phase is initiated. We need a small modification in our proposed scheme to revoke consumers. Suppose, the CP wants to revoke a set of consumers having access rights for the category ri . Let Rri be the set of revoked consumers possessing category ri . The CP updates the system parameters such as public keys of the categories associated non-revoked consumers and with (Dri ∪ {ri }), public witness keys  of all the  verv−1 H2 (kv )  verv accumulator associated with ri , where PKrj = PKrj ∀rj ∈ (Dri ∪{ri })

v   ∀ID ∈Rv skID   sk r ,ver v v v−1 j ri j PubWitrIDx ,ver = PubWitIDx u v−1 ∀IDj ∈Rri IDj ; Accver = Accver ri ri u

k v ∈ GT , v ∈ Z∗q are random numbers, and verv and verv−1 represent current and previous version respectively.

An Accountable Access Control Scheme for Hierarchical Content

583

Table 1. Functionality comparison

v CP also computes an additional component {Hdrrj = Chver = kv · r j

v Accver rj }∀rj ∈(Dri ∪{ri }) , which is sent along with each newly encrypted ciphertexts. This additional component Hdrrj is mainly used to share the secret k v with all the non-revoked authorized consumers of rj ∈ (Dri ∪ {ri }), so that they can use it to update their secret keys for decryption of the newly encrypted contents. Any non-revoked consumer having access rights for rj can recover k v from v v Hdrrj and updates his/her secret keys (RK1IDu ,rj )H2 (k ) , (RK2IDu ,rj )H2 (k ) , where

v

k =

5 5.1

verv Ch r

j rj ,verv sk (PubWitIDu ) IDu

=

kv ·ˆ e(H(r j ),g) v

(ˆ e(H(rj ),g)

v

 ∀IDk ∈Ur

j

skID

k

∀IDk ∈Ur \{IDu } skIDk sk j ) IDu

.

Analysis Security Analysis

Chosen Plaintext Attack security of the proposed scheme can be defined by the following theorem and proof. Theorem 1. If a probabilistic polynomial time (PPT) adversary A can win the CPA security game defined in Sect. 2.3 with a non-negligible advantage , then a PPT simulator B can be constructed to break DBDH assumption with nonnegligible advantage 2 . Proof. In this proof, we construct a PPT simulator B to break our proposed scheme with an advantage 2 with the help of adversary A. More details are given in Appendix B.

5.2

Performance Analysis

In this section, we evaluate the comprehensive performance of our proposed scheme in terms of functionality, computation, storage and communication overhead. We consider the same security level of all the studied schemes for the comparison purpose.

584

N. H. Sultan et al. Table 2. Computation, communication and storage overhead comparison

Comprehensive Performance Analysis. We have compared our scheme with other existing schemes in two parts, namely Functionality Comparison and Computation, Communication and Storage Overhead Comparison, which are presented next. Functionality Comparison. Table 1 shows the functionality comparison of our proposed scheme with the following other notable works [10,11,14–16,18–23]. One can see that our proposed scheme supports several features that are essential, which the other existing schemes do not support. (i) Most notably, access control over hierarchical content is supported only by our proposed scheme, which has been the primary motivation behind our contribution. (ii) Our scheme also prevent unauthorized consumers from sending interest packets to the network through an authentication process, which helps to mitigate DoS attack. In addition, our scheme is resistant against collusion attack between the ISP (i.e., routers) and revoked consumers, which the existing schemes do not provide. (iii) Our scheme is the only existing scheme that supports both expiry-based (enforced at the ERs) and immediate consumer revocation (enforced by the CP). (iv) ISP’s accountability is also supported by our scheme which enables the CP to compensate the ISP only for the actual services rendered to the consumers. (v) Moreover, as the consumers do not need the CP for accessing content, the CP can be offline for much of the time, which can give to rise to practical benefits. Computation, Communication and Storage Overhead Comparison. In this comparison, we consider [10,20] as these two existing schemes support many of the features that our scheme has. Table 2 shows the computation, communication and storage overhead comparison between our scheme and [10,20]. The comparison is shown in asymptotic upper bound in the worst cases. We consider the most

An Accountable Access Control Scheme for Hierarchical Content

585

frequently operated phases in the comparison. The computation cost comparison is done in terms of number of cryptographic operations such as exponentiation (i.e., Tg1 , Tgt ), pairing (i.e., Tp ), hash (i.e., Th ), group element multiplication (i.e., Tmulg1 , Tmulgt ) operations. In the comparison of storage and communication overhead, the sizes of ciphertext, signature, additional ciphertext component (i.e., Hdrri ), and secret keys of consumers have been considered, which are measured in terms of group element size (i.e., |G1 |, |GT |, |Z∗q |) and size of a symmetric key cipher (i.e., |Enck |). From Table 2, one can see that signature generation cost of our proposed scheme is less than that of [10], while it is comparable with [20]. On the other hand, our proposed scheme takes less cost to verify a signature compared with [10,20]. Our scheme also takes less cost than [10,20] to generate secret keys of the consumers; while content encryption and decryption costs are comparable with [10,20]. The additional proxy re-encryption operation in our scheme takes only one exponentiation operation to re-encrypt a ciphertext. As such, it only introduces a minimal overhead at the routers. Unlike [10,20], our proposed scheme supports immediate revocation, where most of the expensive operations are performed by the CP. A consumer needs to compute three exponentiation, one group multiplication, and one hash operations to update his/her secret keys. It can be also observed that our batch verification operations takes less cost than [10]. It is to be noted that [20] does not provide batch verification operation. From Table 2, it can also be observed that ciphertext size our scheme is comparable with that of [10,20], while the size of the signature is considerably less than that of [10,20]. Hence our scheme incurs less communication burden. However, our scheme adds one additional GT element per ciphertext during immediate revocation event, which increase only minimal overhead in the system. Moreover, a consumer in our scheme requires to maintain less number of secret keys compared to [10,20].

6

Concluding Remarks

In this paper, we have proposed a novel encryption-based access control scheme for NDN while organizing the contents into a hierarchy based on different categories. The proposed scheme employs the concept of Role-Based Encryption that introduces inheritance property, thereby enabling the consumers having access rights for ancestor categories of contents to access contents of the descendent categories. In addition, our proposed scheme enables the ERs to verify authenticity of a consumer before forwarding his/her interest request into the NDN network, which prevents from entering bogus requests into the network. Moreover, CP can revoke consumers any time without introducing heavy overhead into the system. Furthermore, our scheme provides ISP’s accountability which is an important property as it enables the CPs to know reliably the amount of content sent to the consumers by the ISP. If a dispute arises between the ISP and the CP over the amount of service rendered to the consumers, then the information provided to the CP can be used in dispute resolution. We have also provided

586

N. H. Sultan et al.

a formal security analysis of our scheme demonstrating that our scheme is provably secure against CPA. A comprehensive performance analysis also shows that our scheme outperforms other notable works in terms of functionality, storage, communication and computation overhead. We have also been carrying out the simulation of our scheme using MiniNDN. The simulation involves the CP publishing regularly new content, and the consumers request either the new or old content. Our main objective has been to measure the average retrieval time for the consumer and the delay introduced by our scheme. In our experiments, we have varied the number of routers in the ISP as well as the bandwidth of each link and the cache in each of the routers. Our results show that the delay in our scheme between a user sending out an interest packet and receiving the corresponding data packet has been mainly due to the cryptographic operations (which we have shown to be superior to the existing schemes). We will be discussing in detail the simulation results in our technical report describing the implementation [26].

Appendix A

Security Model

The security model of the proposed scheme is defined by the Semantic Security against Chosen Plaintext Attacks (IND-CPA) under Selective-ID model 7 . INDCPA is illustrated using the following security game between a challenger C and an adversary A. – Init Adversary A submits a challenged category ri and identity IDu to the challenger C. – Setup Challenger C runs Content Provider Setup phase to generate system parameters and master secret. It also generates proxy re-encryption keys and a private key for the adversary A. Challenger C sends the system parameters, proxy re-encryption keys and private key to the adversary A. / Ari and a validity – Phase 1 Adversary A sends an identity of a category rx∗ ∈ period VP∗IDu . Challenger C runs Consumer Registration phase and generates r∗

r∗

secret key SKIDx u and public key PubIDx u . Challenger C sends these keys to the adversary A. Adversary A can ask for the secret and public keys for polynomially many times. – Challenge After the Phase 1 is over, the adversary A sends two equal length messages K0 and K1 to the challenger C. The challenger C flips a random coin μ ∈ {0, 1} and encrypts Kμ by initiating Content Publication phase. Challenger C sends the ciphertext of Kμ to the adversary A. – Phase 2 Same as Phase 1. – Guess Adversary A outputs a guess μ of μ. The advantage of wining the game for the adversary A is AdvIND−CPA = |P r[μ = μ] − 12 |. 7

In the Selective-ID security model, the adversary must submit a challenged category before starting the security game. This is essential in our security proof to set up the system parameters.

An Accountable Access Control Scheme for Hierarchical Content

587

Definition 1. The proposed scheme is semantically secure against Chosen Plaintext Attack if AdvIND−CPA is negligible for any polynomial time adversary A. Remark 1. In Phase 1, the adversary A is also allowed to send queries for reencryption of the ciphertexts and signature generation. In our security game, the simulator B gives all the proxy re-encryption keys to the adversary A in the Setup phase. As such, the adversary A can answer the re-encryption queries. Similarly, as the private key is also given to the adversary A, the adversary can also answer all the signature generation queries. Therefore, we do not include re-encryption and signature generation oracles in Phase 1. We observe that the adversary A has at least the same capability as the NDN routers.

Appendix B

Security Proof

Proof. In this proof, we construct a PPT simulator B to break our proposed scheme with an advantage 2 with the help of an adversary A. The DBDH challenger C chooses random numbers (a, b, c, z) ∈ Z∗q and computes A = g a , B = g b , C = g c . It flips a random coin l ∈ {0, 1}. If l = 0, the challenger C computes Z = eˆ(g, g)abc ; otherwise it computes Z = eˆ(g, g)z . The challenger C sends the tuple g, A, B, C, Z to a simulator B and asks the simulator B to output l. Now the simulator acts as the challenger in the rest of the security game. Init– Adversary A submits a challenged category ri and identity IDu to the simulator B.  Setup– Simulator B chooses random numbers (x, sk, , [ski ]∀i∈Uj ,  trj ∀rj ∈Ψ ) ∈ Z∗q . The simulator sets η = x − ab · sk and computes the following parameters:    ∀ry ∈Ar \{rr } t ry j eˆ(g, g)ab = eˆ(A, B); eˆ(g, g)η = eˆ(g, g)x · eˆ(A, B)−sk ; PKrj = C 



PKrj = g

∀ry ∈Ar \{rr } j

t ry  ∀rj ∈(Ψ \Ari )

;



PKeyrrwj



=

try

∀ry ∈Arj \{rw ,rr }



∀rj ∈Ari

∀rj ∈Ψ

   ski Moreover, the simulator B computes Accrj = eˆ (H(rx∗ ), g) ∀i∈Urj ∀rj ∈Ari    sk ∀i∈Ur ski j and Accrj = eˆ (H(rx∗ ), g) . Simulator B sends the tuple ∀rj ∈(Ψ \Ari )   ab η q, G, GT , eˆ, g, H, H1 , eˆ(g, g) , eˆ(g, g) , PKrj = {PKrj , Arj , rj }∀rj ∈Ψ , proxy reencryption keys {PKeyrrwj }∀rj ∈Ψ , and also sk (i.e., private key) to the adversary A. It keeps other parameters by itself. / Ari and validity Phase 1– Adversary A submits a category rx∗ such that rx∗ ∈ period VP∗IDu to the simulator B. The simulator B computes a pair of secret keys r∗

r∗

RK1IDu ,r∗x , RK2IDu ,r∗x , another pair of public keys Pub1IDx u , Pub2IDx u and a witness public r∗

key PubWitIDx u , where

588

N. H. Sultan et al.

RK1IDu ,r∗x

=g

ab·sk+η tr∗ x

r∗ x

Pub1IDu =H(rx∗ )

=g

x tr∗ x

; RK2IDu ,r∗x

1 sk+·H1 (VP∗ IDu )

r∗

PubWitIDx u = eˆ (H(rx∗ , g))



=g

ab·sk+η tr  j ∀rj ∈Ar ∗ \{rr } tr r x

=g

x tr  j ∀rj ∈Ar ∗ \{rr } tr r x



r∗ x



; Pub2IDu = eˆ (H(rx∗ ), g) sk+·H1 (VPIDu )

∀i∈Ur ∗ x

ski

Simulator B sends the pair of secret keys RK1IDu ,r∗x , RK2IDu ,r∗x , public keys r∗

r∗

r∗

Pub1IDx u , Pub2IDx u and public witness key PubWitIDx u to the adversary A. Challenge– When the adversary A decides that Phase 1 is over, it submits two equal length messages K0 and K1 to the simulator B. The simulator chooses a random number r ∈ Z∗q and flips a random binary coin μ. It then encrypts Kμ using the challenged category ri and generates a ciphertext CTμ = C1 , C2 , C3 , Cri , where C1 = Kμ · Z C2 = eˆ(g, g)η·(c+r) = eˆ(g, g)[x−ab·sk ](c+r) = eˆ(g, g)x(c+r) · eˆ(g, g)−ab(c+r)·sk = eˆ(g, C)x · eˆ(g, g)x·r · Z −sk · eˆ(A, B)−r·sk C3 = eˆ(g, g)abr = eˆ(A, B)r ; Cri = g

 (c+r)· ∀r ∈Ar \{rr } t rj j i



= (C · g r )

∀rj ∈Ar \{rr } i

t rj

Phase 2– Same as Phase 1. Guess – The adversary A guesses a bit μ and sends to the simulator B. If μ = μ then the adversary A wins CPA game; otherwise it fails. If μ = μ, simulator B answers “DBDH” in the game (i.e. outputs l = 0); otherwise B answers “random” (i.e. outputs l = 1). If Z = eˆ(g, g)z ; then C1 is completely random from the view of the adversary A. So, the received ciphertext CTμ is not compliant to the game (i.e. invalid ciphertext). Therefore, the adversary A chooses μ randomly. Hence, probability of the adversary A for outputting μ = μ is 12 . If Z = eˆ(g, g)abc , then adversary A receives a valid ciphertext. The adversary A wins the CPA game with non-negligible advantage  (according to Theorem 1). So, the probability of outputting μ = μ for the adversary A is 12 + , where probability  is for guessing that the received ciphertext is valid and probability 1 2 is for guessing whether the valid ciphertext CTμ is related to K0 or K1 . Therefore, the overall advantage AdvIND−CPA of the simulator B is 12 ( 12 +  + 1 1  2) − 2 = 2.

References 1. Jacobson, V., et al.: Networking named content. In: Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies, CoNEXT 2009, pp. 1–12 (2009) 2. Cisco. 2020 Global Networking Trends Report. Accessed 5 July 2020

An Accountable Access Control Scheme for Hierarchical Content

589

3. Cisco. Cisco Annual Internet Report (2018–2023), White paper. 2020. Accessed 5 July 2020 4. Zhang, L., et al.: Named data networking. SIGCOMM Comput. Commun. Rev. 44(3), 66–73 (2014) 5. Tourani, R., Misra, S., Mick, T., Panwar, G.: Security, privacy, and access control in information-centric networking: a survey. IEEE Commun. Surv. Tutor. 20(1), 566–600 (2018) 6. Zhou, L., Varadharajan, V., Hitchens, M.: Achieving secure role-based access control on encrypted data in cloud storage. IEEE Trans. Inf. Forensics Secur. 8(12), 1947–1960 (2013) 7. Zhou, L., Varadharajan, V., Hitchens, M.: Trust enhanced cryptographic role-based access control for secure cloud data storage. IEEE Trans. Inf. Forensics Secur. 10(11), 2381–2395 (2015) 8. Sultan, N.H., Varadharajan, V., Zhou, L., Barbhuiya, F.A.: A role-based encryption scheme for securing outsourced cloud data in a multi-organization context (2020). https://arxiv.org/abs/2004.05419 9. Sultan, N.H., Laurent, M., Varadharajan, V.: Securing organization’s data: a rolebased authorized keyword search scheme with efficient decryption (2020). https:// arxiv.org/abs/2004.10952 10. Xue, K., et al.: A secure, efficient, and accountable edge-based access control framework for information centric networks. IEEE/ACM Trans. Netw. 27(3), 1220–1233 (2019) 11. Li, Q., Zhang, X., Zheng, Q., Sandhu, R., Fu, X.: LIVE: lightweight integrity verification and content access control for named data networking. IEEE Trans. Inf. Forensics Secur. 10(2), 308–320 (2015) 12. Fotiou, N., Polyzos, G.C.: Securing content sharing over ICN. In: Proceedings of the 3rd ACM Conference on Information-Centric Networking, ACM-ICN 2016, pp. 176–185 (2016) 13. AbdAllah, E.G., Zulkernine, M., Hassanein, H.S.: DACPI: a decentralized access control protocol for information centric networking. In: 2016 IEEE International Conference on Communications (ICC), pp. 1–6, May 2016 14. Tourani, R., Stubbs, R., Misra, S.: TACTIC: tag-based access control framework for the information-centric wireless edge networks. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 456–466, July 2018 15. Nunes, I.O., Tsudik, G.: KRB-CCN: lightweight authentication and access control for private content-centric networks. In: Preneel, B., Vercauteren, F. (eds.) ACNS 2018. LNCS, vol. 10892, pp. 598–615. Springer, Cham (2018). https://doi.org/10. 1007/978-3-319-93387-0 31 16. Bilal, M., Pack, S.: Secure distribution of protected content in information-centric networking. IEEE Syst. J. 14(2), 1–12 (2019) 17. Li, B., et al.: Attribute-based access control for ICN naming scheme. IEEE Trans. Dependable Secure Comput. 15(2), 194–206 (2018) 18. Misra, S., et al.: AccConF: an access control framework for leveraging in-network cached data in the ICN-enabled wireless edge. IEEE Trans. Dependable Secure Comput. 16(1), 5–17 (2019) 19. Xia, Q., et al.: TSLS: time sensitive, lightweight and secure access control for information centric networking. In: IEEE Global Communications Conference (GLOBECOM), pp. 1–6, December 2019 20. He, P., et al.: LASA: lightweight, auditable and secure access control in ICN with limitation of access times. In: IEEE International Conference on Communications (ICC), pp. 1–6, May 2018

590

N. H. Sultan et al.

21. Tseng, Y., Fan, C., Wu, C.: FGAC-NDN: fine-grained access control for named data networks. IEEE Trans. Netw. Serv. Manag. 16(1), 143–152 (2019) 22. Suksomboon, K., et al.: In-device proxy re-encryption service for informationcentric networking access control. In: IEEE 43rd Conference on Local Computer Networks (LCN), pp. 303–306, October 2018 23. Zhu, L., et al.: T-CAM: time-based content access control mechanism for ICN subscription systems. Future Gener. Comput. Syst. 106, 607–621 (2020) 24. Boneh, D., Franklin, M.K.: Identity-based encryption from the Weil pairing. In: Proceedings of the 21st Annual International Cryptology Conference on Advances in Cryptology, CRYPTO 2001, pp. 213–229 (2001) 25. Au, M.H., Tsang, P.P., Susilo, W., Mu, Y.: Dynamic universal accumulators for DDH groups and their application to attribute-based anonymous credential systems. In: Fischlin, M. (ed.) CT-RSA 2009. LNCS, vol. 5473, pp. 295–308. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00862-7 20 26. Sultan, N.H., Varadharajan, V.: On the design and implementation of a RBE based access control scheme for data centric model for content sharing. ACSRC Technical report, The University of Newcastle, Australia, April 2020

PGC: Decentralized Confidential Payment System with Auditability Yu Chen1,2,3,4 , Xuecheng Ma5,6 , Cong Tang7 , and Man Ho Au8(B) 1

2

School of Cyber Science and Technology, Shandong University, Qingdao 266237, China State Key Laboratory of Cryptology, P.O. Box 5159, Beijing 100878, China 3 Key Laboratory of Cryptologic Technology and Information Security, Ministry of Education, Shandong University, Qingdao 266237, China 4 Shandong Institute of Blockchain, Jinan, China [email protected] 5 State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China 6 School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China [email protected] 7 Beijing Temi Co., Ltd., pgc.info, Beijing, China [email protected] 8 Department of Computer Science, The University of Hong Kong, Pok Fu Lam, China [email protected]

Abstract. Many existing cryptocurrencies fail to provide transaction anonymity and confidentiality. As the privacy concerns grow, a number of works have sought to enhance privacy by leveraging cryptographic tools. Though strong privacy is appealing, it might be abused in some cases. In decentralized payment systems, anonymity poses great challenges to system’s auditability, which is a crucial property for scenarios that require regulatory compliance and dispute arbitration guarantee. Aiming for a middle ground between privacy and auditability, we introduce the notion of decentralized confidential payment (DCP) system with auditability. In addition to offering confidentiality, DCP supports privacy-preserving audit in which an external party can specify a set of transactions and then request the participant to prove their compliance with a large class of policies. We present a generic construction of auditable DCP system from integrated signature and encryption scheme and non-interactive zero-knowledge proof systems. We then instantiate our generic construction by carefully designing the underlying building blocks, yielding a standalone cryptocurrency called PGC. In PGC, the setup is transparent, transactions are less than 1.3 KB and take under 38ms to generate and 15 ms to verify. At the core of PGC is an additively homomorphic public-key encryption scheme that we newly introduce, twisted ElGamal, which is not only as secure as standard exponential ElGamal, but also friendly to c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 591–610, 2020. https://doi.org/10.1007/978-3-030-58951-6_29

592

Y. Chen et al. Sigma protocols and Bulletproofs. This enables us to easily devise zeroknowledge proofs for basic correctness of transactions as well as various application-dependent policies in a modular fashion. Keywords: Cryptocurrencies · Decentralized payment system Confidential transactions · Auditable · Twisted ElGamal

1

·

Introduction

Cryptocurrencies such as Bitcoin [Nak08] and Ethereum [Woo14] realize decentralized peer-to-peer payment by maintaining an append-only public ledger known as blockchain, which is globally distributed and synchronized by consensus protocols. In blockchain-based payment systems, correctness of each transaction must be verified by the miners publicly before being packed into a block. To enable efficient validation, major cryptocurrencies such as Bitcoin and Ethereum simply expose all transaction details (sender, receiver and transfer amount) to the public. Following the terminology in [BBB+18], privacy for transactions consists of two aspects: (1) anonymity, hiding the identities of sender and receiver in a transaction and (2) confidentiality, hiding the transfer amount. While Bitcoinlike and Ethereum-like cryptocurrencies provide some weak anonymity through unlinkability of account addresses to real world identities, they lack confidentiality, which is of paramount importance. 1.1

Motivation

Auditability is a crucial property in all financial systems. Informally, it means that an auditor can not only check that if transactions satisfy some pre-fixed policies before-the-fact, but also request participants to prove that his/her transactions comply with some policies after-the-fact. In centralized payment system where there exists a trusted center, such as bank, Paypal or Alipay, auditability is a built-in property since the center knows details of all transactions and thus be able to conduct audit. However, it is challenging to provide the same level of auditability in decentralized payment systems with strong privacy guarantee. In our point of view, strong privacy is a double-edged sword. While confidentiality is arguably the primary concern of privacy for any payment system, anonymity might be abused or even prohibitive for applications that require auditability, because anonymity provides plausible deniability [FMMO19], which allows participants to deny their involvements in given transactions. Particularly, it seems that anonymity denies the feasibility of after-the-fact auditing, due to an auditor is unable to determine who is involved in which transaction. We exemplify this dilemma via the following three typical cases. Case 1: To prevent criminals from money laundering in a decentralized payment system, an auditor must be able to examine the details of suspicious transactions. However, if the system is anonymous, locating the participants of suspicious

PGC: Decentralized Confidential Payment System with Auditability

593

transactions and then force them to reveal transaction details would be very challenging. Case 2: Consider an employer pay salaries to employees via a decentralized payment system with strong privacy. If the employee did not receive the correct amount, how can he support his complaint? Likewise, how can the employer protect himself by demonstrating that he has indeed completed payment with the correct amount. Case 3: Consider the employees also pay the income tax via the same payment system. How can he convince the tax office that he has paid the tax according to the correct tax rate. Similar scenarios are ubiquitous in monetary systems that require after-thefact auditing and in e-commerce systems that require dispute arbitration mechanism. 1.2

Our Contributions

The above discussions suggest that it might be impossible to construct a decentralized payment system offering the same level of auditability as centralized payment system while offering confidentiality and anonymity simultaneously without introducing some degree of centralization or trust assumption. In this work, we stick to confidentiality, but trade anonymity for auditability. We summarize our contributions as follows. Decentralized Confidential Payment System with Auditability. We introduce the notion of decentralized confidential payment (DCP) system with auditability, and formalize a security model. For the sake of simplicity, we take an account-based approach and focus only on the transaction layer, and treat the network/consensus-level protocols as black-box and ignore attacks against them. Briefly, DCP should satisfy authenticity, confidentiality and soundness. The first security notion stipulates that only the owner of an account can spend his coin. The second security notion requires the account balance and transfer amount are hidden. The last security notion captures that no one is able to make validation nodes accept an incorrect transaction. In addition, we require the audit to be privacy-preserving, namely, the participant is unable to fool the auditor and the auditing results do not impact the confidentiality of other transactions. We then present a generic construction of auditable DCP system from two building blocks, namely, integrated signature and encryption (ISE) schemes and non-interactive zero-knowledge (NIZK) proof systems. In our generic DCP construction, ISE plays an important role. First, it guarantees that each account can safely use a single keypair for both encryption and signing. This feature greatly simplifies the overall design from both conceptual and practical aspects. Second, the encryption component of ISE ensures that the resulting DCP system is complete, meaning that as soon as a transaction is recorded on the blockchain, the payment is finalized and takes effect immediately – receiver’s balance increases with the same amount that sender’s balance decreases, and the receiver can spend

594

Y. Chen et al.

his coin on his will. This is in contrast to many commitment-based cryptocurrencies that require out-of-band transfer, which are thus not complete. NIZK not only enables a sender to prove transactions satisfy the most basic correctness policy, but also allows a user to prove that any set of confidential transactions he participated satisfy a large class of application-dependent policies, such as limit on total transfer amounts, paying tax rightly according the tax rate, and transaction opens to some exact value. In summary, our generic DCP construction is complete, and supports flexible audit. PGC: A Simple and Efficient Instantiation. While the generic DCP construction is relatively simple and intuitive, an efficient instantiation is more technically involved. We realize our generic DCP construction by designing a new ISE and carefully devising suitable NIZK proof system. We refer to the resulting cryptocurrency as PGC (stands for Pretty Good Confidentiality). Notably, in addition to the advantages inherited from the generic construction, PGC admits transparent setup, and its security is based solely on the widely-used discrete logarithm assumption. To demonstrate the efficiency and usability of PGC, we implement it as a standalone cryptocurrency, and also deploy it as smart contracts. We report the experimental results in Sect. 5. 1.3

Technical Overview

We discuss our design choice in the generic construction of DCP, followed by techniques and tools towards a secure and efficient instantiation, namely, PGC. 1.3.1

Design Choice in Generic Construction of Auditable DCP

Pseudonymity vs. Anonymity. Early blockchain-based cryptocurrencies offers pseudonymity, that is, addresses are assumed to be unlinkable to their real world identities. However, a variety of de-anonymization attacks [RS13,BKP14] falsified this assumption. On the other hand, a number of cryptocurrencies such as Monero and Zcash sought to provide strong privacy guarantee, including both anonymity and confidentiality. In this work, we aim for finding a sweet balance between privacy and auditability. As indicated in [GGM16], identity is crucial to any regulatory system. Therefore, we choose to offer privacy in terms of confidentiality, and still stick to pseudonymous system. Interestingly, we view pseudonymity as a feature rather than a weakness, assuming that an auditor is able to link account addresses to real world identities. This opens the possibility to conduct after-the-fact audit. We believe this is the most promising avenue for real deployment of DCP that requires auditability. PKE vs. Commitment. A common approach to achieve transaction confidentiality is to commit the balance and transfer amount using a global homomorphic commitment scheme (e.g., the Pedersen commitment [Ped91]), then derive a secret from blinding randomnesses to prove correctness of transaction and authorize transfer. The seminal DCP systems [Max,Poe] follow this approach.

PGC: Decentralized Confidential Payment System with Auditability

595

Nevertheless, commitment-based approach suffers from several drawbacks. First, the resulting DCP systems are not complete. Due to lack of decryption capability, senders are required to honestly transmit the openings of outgoing commitments (includes randomness and amount) to receivers in an out-of-band manner. This issue makes the system much more complicated, as it must be assured that the out-of-band transfer is correct and secure. Second, users must be stateful since they have to keep track of the randomness and amount of each incoming commitment. Otherwise, failure to open a single incoming commitment will render an account totally unusable, due to either incapable of creating the NIZK proofs (lack of witness), or generating the signature (lack of signing key). This incurs extra security burden to the design of wallet (guarantee the openings must be kept in a safe and reliable way). Observe that homomorphic PKE can be viewed as a computationally hiding and perfectly binding commitment, in which the secret key serves as a natural trapdoor to recover message. With these factors in mind, our design equips each user with a PKE keypair rather than making all users share a global commitment. Integrated Signature and Encryption vs. SIG+PKE. Intuitively, to secure a DCP system, we need a PKE scheme to provide confidentiality, and a signature scheme to provide authenticity. If we follow the principle of key separation, i.e., use different keypairs for encryption and signing operations respectively, the overall design would be complicated. In that case, each account will be associated with two keypairs, and consequently deriving account address turns out to be very tricky. If we derive the address from one public key, which one should be chosen? If we derive the address from the two public keys, then additional mechanism is needed to link the two public keys together. A better solution is to adopt key reuse strategy, i.e., use the same keypair for both encryption and signing. This will greatly simplify the design of overall system. However, reusing keypairs may create new security problems. As pointed out in [PSST11], the two uses may interact with one another badly, in such a way as to undermine the security of one or both of the primitives, e.g., the case of textbook RSA encryption and signature. In this work, we propose to use integrated signature and encryption (ISE) scheme with joint security, wherein a single keypair is used for both signature and encryption components in a secure manner, to replace the naive combination of signature and encryption. To the best of our knowledge, this is the first time that ISE is used in DCP system to ensure provable security. We remark that the existing proposal Zether [BAZB20] essentially adopts the key reuse strategy, employing a signature scheme and an encryption scheme with same keypair. Nevertheless, they do not explicitly identify key reuse strategy and formally address joint security. 1.3.2

Overview of Our Generic Auditable DCP

DCP from ISE and NIZK. We present a generic construction of DCP system from ISE and NIZK. We choose the account-based model for simplicity and usability. In our generic DCP construction, user creates an account by generating

596

Y. Chen et al.

a keypair of ISE, in which the public key is used as account address and secret key is used to control the account. The state of an account consists of a serial number (a counter that increments with every outgoing transaction) and an encrypted balance (encryption of plaintext balance under account public key). State changes are triggered by transactions from one account to another. A blockchain tracks the state of every account. Let C˜s and C˜r be the encrypted balances of two accounts controlled by Alice and Bob respectively. Suppose Alice wishes to transfer v coins to Bob. She constructs a confidential transaction via the following steps. First, she encrypts v under her public key pks and Bob’s public key pkr respectively to obtain Cs and Cr , and sets memo = (pks , pkr , Cs , Cr ). Then, she produces a NIZK proof πcorrect for the most basic correctness policy of transaction: (i) Cs and Cr are two encryptions of the same transfer amount under pks and pkr ; (ii) the transfer amount lies in a right range; (iii) her remaining balance is still positive. Finally, she signs serial number sn together with memo and πcorrect under her secret key, obtaining a signature σ. The entire transaction is of the form (sn, memo, πcorrect , σ). In this way, validity of a transaction is publicly verifiable by checking the signature and NIZK proof. If the transaction is valid, it will be recorded on the blockchain. Accordingly, Alice’s balance (resp. Bob’s balance) will be updated as C˜s = C˜s − Cs (resp. C˜r = C˜r + Cr ), and Alice’s serial number increments. Such balance update operation implicitly requires that the underlying PKE scheme satisfies additive homomorphism. In summary, the signature component is used to provide authenticity (proving ownership of an account), the encryption component is used to hide the balance and transfer amount, while zero-knowledge proofs are used to prove the correctness of transactions in a privacy-preserving manner. Auditing Policies. A trivial solution to achieve auditability is to make the participants reveal their secret keys. However, this approach will expose all the related transactions to the auditor, which is not privacy-preserving. Note that a plaintext and its ciphertext can be expressed as an N P relation. Auditability can thus be easily achieved by leveraging NIZK. Moreover, note that the structure of memo is symmetric. This allows either the sender or the receiver can prove transactions comply with a variety of application-dependent policies. We provide examples of several useful policies as below: – Anti-money laundering: the sum of a collection of outgoing/incoming transactions from/to a particular account is limited. – Tax payment: a user pays the tax according to the tax rate. – Selectively disclosure: the transfer amount of some transaction is indeed some value. 1.3.3 PGC: A Secure and Efficient Instantiation A secure and efficient instantiation of the above DCP construction turns out to be more technical involved. Before proceeding, it is instructive to list the desirable features in mind:

PGC: Decentralized Confidential Payment System with Auditability

597

1. transparent setup – do not require a trusted setup. This property is of utmost importance in the setting of cryptocurrencies. 2. efficient – only employ lightweight cryptographic schemes based on wellstudied assumptions. 3. modular – build the whole scheme from reusable gadgets. The above desirable features suggest us to devise efficient NIZK that admits transparent setup, rather than resorting to general-purpose zk-SNARKs, which are either heavyweight or require trusted setup. Besides, the encryption component of ISE should be zero-knowledge proofs friendly. We begin with the instantiation of ISE. A common choice is to select Schnorr signature [Sch91] as the signature component and exponential ElGamal [CGS97] as the encryption component.1 This choice brings us at least three benefits: (i) Schnorr signature and ElGamal PKE share the same discrete-logarithm (DL) keypair (i.e., pk = g sk ∈ G, sk ∈ Zp ), thus we can simply use the common public key as account address. (ii) The signing operation of Schnorr’s signature is largely independent to the decryption operation of ElGamal PKE, which indicates that the resulting ISE scheme could be jointly secure. (iii) ElGamal PKE is additively homomorphic. Efficient Sigma protocols can thus be employed to prove linear relations on algebraically encoded-values. For instance, proving plaintext equality: Cs and Cr encrypt the same message under pks and pkr respectively. It remains to design efficient NIZK proofs for the basic correctness policies and various application-dependent policies. Obstacle of Working with Bulletproof. Besides using Sigma protocols to prove linear relations over encrypted values, we also need range proofs to prove the encrypted values lie in the right interval. In more details, we need to prove that the value v encrypted in Cs and the value encrypted in C˜s − Cs (the current balance subtracts v) lies in the right range. State-of-the-art range proof is Bulletproof [BBB+18], which enjoys efficient proof generation/verification, logarithmic proof size, and transparent setup. As per the desirable features of our instantiation, Bulletproof is an obvious choice. Recall that Bulletproof only accepts statements of the form of Pedersen commitment g r hv . To guarantee soundness, the DL relation between commitment key (g, h) must be unknown to the prover. Note that an ElGamal ciphertext C of v under pk is of the form (g r , pk r g v ), in which the second part ciphertext can be viewed as a commitment of v under commitment key (pk, g). To prove v lies in the right range, it seems that we can simply run the Bulletproof on pk r g v . However, this usage is insecure since the prover owns an obvious trapdoor, say sk, of such commitment key. There are two approaches to circumvent this obstacle. The first approach is to commit v with randomness r under commitment key (g, h), then prove (v, r) in g v hr is consistent with that in the ciphertext (g r , pk r g v ). Similar idea was used in Quisquis [FMMO19]. The drawback of this approach is that it brings extra overhead of proof size as well as proof generation/verification. The second 1

In the remainder of this paper, we simply refer to the exponential ElGamal PKE as ElGamal PKE for ease of exposition.

598

Y. Chen et al.

approach is due to B¨ unz et al. [BAZB20] used in Zether. They extend Bulletproof to Σ-Bullets, which enables the interoperability between Sigma protocols and Bulletproof. Though Σ-Bullets is flexible, it requires custom design and analysis from scratch for each new Sigma protocols. Twisted ElGamal - Our Customized Solution. We are motivated to directly use Bulletproof in a black-box manner, without introducing Pedersen commitment as a bridge or dissecting Bulletproof. Our idea is to twist standard ElGamal, yielding the twisted ElGamal. We sketch twisted ElGamal below. The setup algorithm picks two random generators (g, h) of G as global parameters, while the key generation algorithm is same as that of standard ElGamal. To R − Zp , encrypt a message m ∈ Zp under pk, the encryption algorithm picks r ← r r m then computes ciphertext as C = (X = pk , Y = g h ). The crucial difference to standard ElGamal is that the roles of key encapsulation and session key are switched and the message m is lifted on a new generator h.2 Twisted ElGamal retains additive homomorphism, and is as efficient and secure as the standard exponential ElGamal. More importantly, it is zero-knowledge friendly. Note that the second part of twisted ElGamal ciphertext (even encrypted under different public keys) can be viewed as Pedersen commitment under the same commitment key (g, h), whose DL relation is unknown to all users. Such structure makes twisted ElGamal compatible with all zero-knowledge proofs whose statement is of the form Pedersen commitment. In particular, one can directly employ Bulletproof to generate range proofs for values encrypted by twisted ElGamal in a black-box manner, and these proofs can be easily aggregated. Next, we abstract two distinguished cases. Prover Knows the Randomness. This case generalizes scenarios in which the prover is the producer of ciphertexts. Concretely, consider a twisted ElGamal ciphertext C = (X = pk r , Y = g r hv ), to prove m lies in the right range, the prover first executes a Sigma protocol on C to prove the knowledge of (r, v), then invokes a Bulletproof on Y to prove v lies in the right range. Prover Knows the Secret Key. This case generalizes scenarios where the prover is the recipient of ciphertexts. Due to lack of randomness as witness, the prover cannot directly invoke Bulletproof. We solve this problem by developing ciphertext refreshing approach. The prover first decrypts C˜s − Cs to v using sk, then generates a new ciphertext Cs∗ of v under fresh randomness r∗ , and proves that C˜s − Cs and Cs∗ do encrypt the same message under his public key. Now, the prover is able to prove that v encrypted by Cs∗ lies in the right range by composing a Sigma protocol and a Bulletproof, via the same way as the first case. The above range proofs provide two specialized “proof gadgets” for proving encrypted values lie in the right range. We refer to them as Gadget-1 and Gadget2 hereafter. As we will see, all the NIZK proofs used in this work can be built from these two gadgets and simple Sigma protocols. Such modular design helps 2

As indicated in [CZ14], the essence of ElGamal is Fsk (g r ) = pkr forms a publicly evaluable pseudorandom functions over G. The key insight of switching is that Fsk is in fact a permutation.

PGC: Decentralized Confidential Payment System with Auditability

599

to reduce the footprint of overall cryptographic code, and have the potential to admit parallel proof generation/verification. We highlight the two “gadgets” are interesting on their own right as privacy-preserving tools, which may find applications in other domains as well, e.g., secure machine learning. Enforcing more Auditing Policies. In the above, we have discussed how to employ Sigma protocols and Bulletproof to enforce the basic correctness policy for transactions. In fact, we are able to enforce a variety of polices that can be expressed as linear constraints over transfer amounts. In Sect. 4.2, we show how to enforce limit, rate and open policies using Sigma protocols and the two gadgets we have developed. 1.4

Related Work

The seminal cryptocurrencies such as Bitcoin and Ethereum do not provide sufficient level of privacy. In the past years privacy-enhancements have been developed along several lines of research. We provide a brief overview below. The first direction aims to provide confidentiality. Maxwell [Max] initiates the study of confidential transaction. He proposes a DCP system by employing Pedersen commitment to hide transfer amount and using range proofs to prove correctness of transaction. Mimblewimble/Grin [Poe,Gri] further improve Maxwell’s construction by reducing the cost of signatures. The second direction aims to enhance anonymity. A large body of works enhance anonymity via utilizing mixing mechanisms. For instance, CoinShuffle [RMK14], Dash [Das], and Mixcoin [BNM+14] for Bitcoin and M¨ obius [MM] for Ethereum. The third direction aims to attain both confidentiality and anonymity. Monero [Noe15] achieves confidentiality via similar techniques used in Maxwell’s DCP system, and provides anonymity by employing linkable ring signature and stealth address. Zcash [ZCa] achieves strong privacy by leveraging key-private PKE and zk-SNARK. Despite great progress in privacy-enhancement, the aforementioned schemes are not without their limitations. In terms of reliability, most of them require out-of-band transfer and thus are not complete. In terms of efficiency, some of them suffer from slow transaction generation due to the use of heavy advanced cryptographic tools. In terms of security, some of them are not proven secure based on well-studied assumption, or rely on trusted setup. Another related work is zkLedger [NVV18], which offers strong privacy and privacy-preserving auditing. To attain anonymity, zkLedger uses a novel tablebased ledger. Consequently, the size of transactions and audit efficiency are linear in the total number of participants in the system, and the scale of participants is fixed at the setup stage of system. Concurrent and Independent Work. Fauzi et al. [FMMO19] put forward a new design of cryptocurrency with strong privacy called Quisquis in the UTXO model. They employ updatable public keys to achieve anonymity and a slight variant of standard ElGamal encryption to achieve confidentiality. B¨ unz et al. [BAZB20] propose a confidential payment system called Zether, which is compatible with Ethereum-like smart contract platforms. They use standard

600

Y. Chen et al.

ElGamal to hide the balance and transfer amount, and use signature derived from NIZK to authenticate transactions. They also sketch how to acquire anonymity for Zether via a similar approach as zkLedger [NVV18]. Both Quisquis and Zether design accompanying zero-knowledge proofs from Sigma protocols and Bulletproof, but take different approaches to tackle the incompatibility between ElGamal encryption and Bulletproof. Quisquis introduces ElGamal commitment to bridge ElGamal encryption, and uses Sigma protocol to prove consistency of bridging. Finally, Quisquis invokes Bulletproof on the second part of ElGamal commitment, which is exactly a Pedersen commitment. Zether develops a custom ZKP called Σ-Bullets, which is a dedicated integration of Bulletproof and Sigma protocol. Given an arithmetic circuit, a Σ-Bullets ensures that a public linear combination of the circuit’s wires is equal to some witness of a Sigma protocol. This enhancement in turn enables proofs on algebraically-encoded values such as ElGamal encryptions or Pedersen commitments in different groups or using different commitment keys. We highlight the following crucial differences of our work to Quisquis and Zether: (i) We focus on confidentiality, and trade anonymity for auditability. (ii) We use jointly secure ISE, rather than ad-hoc combination of signature and encryption, to build DCP system in a provably secure way. (iii) As for instantiation, PGC employs our newly introduced twisted ElGamal rather than standard ElGamal to hide balance and transfer amount. The nice structure of twisted ElGamal enables the sender to prove the transfer amount lies in the right range by directly invoking Bulletproof in a black-box manner, without any extra bridging cost as Quisquis. The final proof for correctness of transactions in PGC is obtained by assembling small “proof gadgets” together in a simple and modular fashion, which is flexible and reusable. This is opposed to Zether’s approach, in which zero-knowledge proof is produced by a Σ-Bullets as a whole, while building case-tailored Σ-Bullets requires to dissect Bulletproof and skillfully design its interface to Sigma protocol. Note. Due to space limit, we refer to the full version of this work [CMTA19] for definitions of standard cryptographic primitives and all security proofs.

2

Definition of DCP System

We formalize the notion of decentralized confidential payment system in account model, adapting the notion of decentralized anonymous payment system [ZCa]. 2.1

Data Structures

We begin by describing the data structures used by a DCP system. Blockchain. A DCP system operates on top of a blockchain B. The blockchain is publicly accessible, i.e., at any given time t, all users have access to Bt , the ledger at time t, which is a sequence of transactions. The blockchain is also append-only, i.e., t < t implies that Bt is a prefix of Bt .

PGC: Decentralized Confidential Payment System with Auditability

601

Public Parameters. A trusted party generate public parameters pp at the setup time of system, which is used by system’s algorithms. We assume that pp always include an integer vmax , which specifies the maximum possible number of coins that the system can handle. Any balance and transfer below must lie in the integer interval V = [0, vmax ]. Account. Each account is associated with a keypair (pk, sk), an encoded balance C˜ (which encodes plaintext balance m), ˜ as well as an incremental serial number ˜ and pk are made public. The sn (used to prevent replay attacks). Both sn, C, public key pk serves as account address, which is used to receive transactions from other accounts. The secret key sk is kept privately, which is used to direct transactions to other accounts and decodes encoded balance. Confidential Transaction. A confidential transaction ctx consists of three parts, i.e., sn, memo and aux. Here, sn is the current serial number of sender account pks , memo = (pks , pkr , C) records basic information of a transaction from sender account pks to receiver account pkr , where C is the encoding of transfer amount, and aux denotes application-dependent auxiliary information. 2.2

Decentralized Confidential Payment System with Auditability

An auditable DCP system consists of the following polynomial-time algorithms: – Setup(λ): on input a security parameter λ, output public parameters pp. A trusted party executes this algorithm once-for-all to setup the whole system. – CreateAccount(m, ˜ sn): on input an initial balance m ˜ and a serial number sn, ˜ A user runs this algooutput a keypair (pk, sk) and an encoded balance C. rithm to create an account. ˜ on input a secret key sk and an encoded balance C, ˜ – RevealBalance(sk, C): output the balance m. ˜ A user runs this algorithm to reveal the balance. – CreateCTx(sks , pks , pkr , v): on input a keypair (sks , pks ) of sender account, a receiver account address pkr , and a transfer amount v, output a confidential transaction ctx. A user runs this algorithm to transfer v coins from account pks to account pkr . – VerifyCTx(ctx): on input a confidential transaction ctx, output “0” denotes valid and “1” denotes invalid. Miners run this algorithm to check the validity of proposed confidential transaction ctx. If ctx is valid, it will be recorded on the blockchain B. Otherwise, it is discarded. – UpdateCTx(ctx): for each fresh ctx on the blockchain B, sender and receiver update their encoded balances to reflect the change, i.e., the sender account decreases with v coins while the receiver account increases with v coins. – JustifyCTx(pk, sk, {ctx}, f ): on input a user’s keypair (pk, sk), a set of confidential transactions he participated and a policy f , output a proof π for f (pk, {ctx}) = 1. A user runs this algorithm to generate a proof for auditing. – AuditCTx(pk, {ctx}, f, π): on input a user’s public key, a set of confidential transactions he participated, a policy f and a proof, output “0” denotes accept and “1” denotes reject. An auditor runs this algorithm to check if f (pk, {ctx}) = 1.

602

2.3

Y. Chen et al.

Correctness and Security Model

Correctness of basic DCP functionality requires that a valid ctx will always be accepted and recorded on the blockchain, and the states of associated accounts will be updated properly, i.e., the balance of sender account decreases the same amount as the balance of receiver account increases. Correctness of auditing functionality requires honestly generated auditing proofs for transactions complying with policies will always be accept. As to security, we focus solely on the transaction layer of a cryptocurrency, and assume network-level or consensus-level attacks are out of scope. Intuitively, a DCP system should provide authenticity, confidentiality and soundness. Authenticity requires that the sender can only be the owner of an account, nobody else (who does not know the secret key) is able to make a transfer from this account. Confidentiality requires that other than the sender and receiver (who does not know the secret keys of sender and receiver), no one can learn the value hidden in a confidential transaction. While the former two notions address security against outsider adversary, soundness addresses security against insider adversary (e.g. the sender himself). See the full version [CMTA19] for the details of security model.

3

A Generic Construction of Auditable DCP from ISE and NIZK

We present a generic construction of auditable DCP from ISE and NIZK. In a nutshell, we use homomorphic PKE to encode the balance and transfer amount, use NIZK to enforce senders to build confidential transactions honestly and make correctness publicly verifiable, and use digital signature to authenticate transactions. Let ISE = (Setup, KeyGen, Sign, Verify, Enc, Dec) be an ISE scheme whose PKE component is additively homomorphic on message space Zp . Let NIZK = (Setup, CRSGen, Prove, Verify)3 be a NIZK proof system for Lcorrect (which will be specified later). The construction is as below. – Setup(1λ ): runs ppise ← ISE.Setup(1λ ), ppnizk ← NIZK.Setup(1λ ), crs ← NIZK.CRSGen(ppnizk ), outputs pp = (ppise , ppnizk , crs). – CreateAccount(m, ˜ sn): on input an initial balance m ˜ ∈ Zp and a serial number sn ∈ {0, 1}n (e.g., n = 256), runs ISE.KeyGen(ppise ) to generate a keypair (pk, sk), computes C˜ ← ISE.Enc(pk, m; ˜ r) as the initial encrypted balance, sets sn as the initial serial number4 , outputs public key pk and secret key sk. Fix the public parameters, the KeyGen algorithm naturally induces an N P relation Rkey = {(pk, sk) : ∃r s.t. (pk, sk) = KeyGen(r)}. 3

4

We describe our generic DCP construction using NIZK in the CRS model. The construction and security proof carries out naturally if using NIZK in the random oracle model instead. By default, m ˜ and sn should be zero, r should be a fixed and publicly known randomness, say the zero string 0λ . This settlement guarantees that the initial account state is publicly auditable. Here, we do not make it as an enforcement for flexibility.

PGC: Decentralized Confidential Payment System with Auditability

603

˜ on input secret key sk and encrypted balance C, ˜ out– RevealBalance(sk, C): ˜ puts m ˜ ← ISE.Dec(sk, C). – CreateCTx(sks , pks , v, pkr ): on input sender’s keypair (pks , sks ), the transfer ˜ s −v) ∈ amount v, and receiver’s public key pkr , the algorithm first checks if (m V and v ∈ V (here m ˜ s is the current balance of sender account pks ). If not, returns ⊥. Otherwise, it creates ctx via the following steps: 1. compute Cs ← ISE.Enc(pks , v; r1 ), Cr ← ISE.Enc(pkr , v; r2 ), set memo = (pks , pkr , Cs , Cr ), here (Cs , Cr ) serve as the encoding of transfer amount; 2. run NIZK.Prove with witness (sks , r1 , r2 , v) to generate a proof πcorrect for memo = (pks , pkr , Cs , Cr ) ∈ Lcorrect , where Lcorrect is defined as: {∃sks , r1 , r2 , v s.t. Cs = Enc(pks , v; r1 ) ∧ Cr = Enc(pkr , v; r2 ) ∧ v ∈ V ∧ (pks , sks ) ∈ Rkey ∧ Dec(sks , C˜s − Cs ) ∈ V} Lcorrect can be decomposed as Lequal ∧Lright ∧Lsolvent , where Lequal proves the consistency of two ciphertexts, Lright proves that the transfer amount lies in the right range, and Lsolvent proves that the sender account is solvent. 3. run σ ← ISE.Sign(sks , (sn, memo)), sn is the serial number of pks ; 4. output ctx = (sn, memo, aux), where aux = (πcorrect , σ). – VerifyCTx(ctx): parses ctx = (sn, memo, aux), memo = (pks , pkr , Cs , Cr ), aux = (πcorrect , σ), then checks its validity via the following steps: 1. check if sn is a fresh serial number of pks ; 2. check if ISE.Verify(pks , (sn, memo), σ) = 1; 3. check if NIZK.Verify(crs, memo, πcorrect ) = 1. If all the above tests pass, outputs “1”, miners confirm that ctx is valid and record it on the blockchain, sender updates his balance as C˜s = C˜s − Cs and increments the serial number, and receiver updates his balance as C˜r = C˜r + Cr . Else, outputs “0” and miners discard ctx. We further describe how to enforce a variety of useful policies. – JustifyCTx(pks , sks , {ctxi }ni=1 , flimit ): on input pks , sks , {ctxi }ni=1 and flimit , parses ctxi = (sni , memoi = (pks , pkri , Cs,i , Cri ), auxi ), then runs NIZK.Prove with witness sks to generate πlimit for (pks , {Cs,i }1≤i≤n , amax ) ∈ Llimit : n {∃sk s.t. (pk, sk) ∈ Rkey ∧ vi = ISE.Dec(sk, Ci ) ∧ i=1 vi ≤ amax } A user runs this algorithm to prove compliance with limit policy, i.e., the sum of vi sent from account pks is less than amax . The same algorithm can be used to proving the sum of vi sent to the same account is less than amax . – AuditCTx(pks , {ctxi }ni=1 , πlimit , flimit ): on input pks , {ctxi }ni=1 , πlimit and flimit , first parses ctxi = (sni , memoi = (pks , pkri , Cs,i , Cri ), auxi ), then outputs NIZK.Verify(crs, (pks , {Cs,i }1≤i≤n , amax ), πlimit ). The auditor runs this algorithm to check compliance with limit policy.

604

Y. Chen et al.

– JustifyCTx(pku , sku , {ctx}2i=1 , frate ): on input pku , sku , {ctxi }2i=1 and frate , the algorithm parses ctx1 = (sn1 , memo1 = (pk1 , pku , C1 , Cu,1 ), aux1 ) and ctx2 = (sn2 , memo2 = (pku , pk2 , Cu,2 , C2 ), aux2 ), then runs NIZK.Prove with witness sku to generate πrate for the statement (pku , Cu,1 , Cu,2 , ρ) ∈ Lrate : {∃sk s.t. (pk, sk) ∈ Rkey ∧ vi = ISE.Dec(sk, Ci ) ∧ v1 /v2 = ρ} A user runs this algorithm to demonstrate compliance with tax rule, i.e., proving v1 /v2 = ρ. – AuditCTx(pku , {ctxi }2i=1 , πrate , frate ): on input pku , {ctxi }2i=1 , πrate and frate , parses ctx1 = (sn1 , memo1 = (pk1 , pku , C1 , Cu,1 ), aux1 ), ctx2 = (sn2 , memo2 = (pku , pk2 , Cu,2 , C2 ), aux2 ), outputs NIZK.Verify(crs, (pku , Cu,1 , Cu,2 , ρ), πrate ). An auditor runs this algorithm to check compliance with rate policy. – JustifyCTx(sku , ctx, fopen ): on input pku , sku , ctx and fopen , parses ctx = (sn, pks , pkr , Cs , Cr , aux), then runs NIZK.Prove with witness sku (where the subscript u could be either s or r) to generate πopen for (pku , Cu , v ∗ ) ∈ Lopen : {∃sk s.t. (pk, sk) ∈ Rkey ∧ v ∗ = ISE.Dec(sk, C)} A user runs this algorithm to demonstrate compliance with open policy. – AuditCTx(pku , ctx, πopen , fopen ): on input pku , ctx, πopen and fopen , parses ctx = (sn, pks , pkr , Cs , Cr , aux), outputs NIZK.Verify(crs, (pku , Cu , v), πopen ), where the subscript u could be either s or r. An auditor runs this algorithm to check compliance with open policy.

4

PGC: An Efficient Instantiation

We now present an efficient realization of our generic DCP construction. We first instantiate ISE from our newly introduced twisted ElGamal PKE and Schnorr signature, then devise NIZK proofs from Sigma protocols and Bulletproof. 4.1

Instantiating ISE

We instantiate ISE from our newly introduced twisted ElGamal PKE and the classical Schnorr signature. Twisted ElGamal. We propose twisted ElGamal encryption as the PKE component. Formally, twisted ElGamal consists of four algorithms as below: R

– Setup(1λ ): run (G, g, p) ← GroupGen(1λ ), pick h ← − G∗ , set pp = (G, g, h, p) as global public parameters. The randomness and message spaces are Zp . R – KeyGen(pp): on input pp, choose sk ← − Zp , set pk = g sk . – Enc(pk, m; r): compute X = pk r , Y = g r hm , output C = (X, Y ). −1 – Dec(sk, C): parse C = (X, Y ), compute hm = Y /X sk , recover m from hm .

PGC: Decentralized Confidential Payment System with Auditability

605

Remark 1. As with the standard exponential ElGamal, decryption can only be efficiently done when the message is small. However, it suffices to instantiate our generic DCP framework with small message space (say, 32-bits). In this setting, our implementation shows that decryption can be very efficient. Correctness and additive homomorphism are obvious. The standard INDCPA security can be proved in standard model based on the DDH assumption. Theorem 1. Twisted ElGamal is IND-CPA secure in the 1-plaintext/2recipient setting based on the DDH assumption. ISE from Schnorr Signature and Twisted ElGamal Encryption. We choose Schnorr signature [Sch91] as the signature component. By merging the Setup and KeyGen algorithms of twisted ElGamal encryption and Schnorr signature, we obtain the ISE scheme, whose joint security is captured by the following theorem. Theorem 2. The obtained ISE scheme is jointly secure if the twisted ElGamal is IND-CPA secure (1-plaintext/2-recipient) and the Schnorr signature is EUFCMA secure. 4.2

Instantiating NIZK

Now, we design efficient NIZK proof systems for basic correctness policy (Lcorrect ) and more extended policies (Llimit , Lrate , Lopen ). As stated in Sect. 3, Lcorrect can be decomposed as Lequal ∧ Lright ∧ Lsolvent . Let Πequal , Πright , Πsolvent be NIZK for Lequal , Lright , Lsolvent respectively, and let Πcorrect := Πequal ◦ Πright ◦ Πsolvent , where ◦ denotes sequential composition.5 By the property of NIZK for conjunctive statements [Gol06], Πcorrect is a NIZK proof system for Lcorrect . Now, the task breaks down to design Πequal , Πright , Πsolvent . We describe them one by one as below. NIZK for Lequal . Recall that Lequal is defined as: {(pk1 , X1 , Y1 , pk2 , X2 , Y2 ) | ∃r1 , r2 , v s.t. Xi = pkiri ∧ Yi = g ri hv for i = 1, 2}. For twisted ElGamal, randomness can be safely reused in the 1-plaintext/2recipient setting. Lequal can thus be simplified to: {(pk1 , pk2 , X1 , X2 , Y ) | ∃r, v s.t. Y = g r hv ∧ Xi = pkir for i = 1, 2}. Sigma protocol for Lequal . To obtain a NIZK for Lequal , we first design a Sigma protocol Σequal = (Setup, P, V ) for Lequal . The Setup algorithm of Σequal is same as that of the twisted ElGamal. On statement (pk1 , pk2 , X1 , X2 , Y ), P and V interact as below: 5

In the non-interactive setting, there is no distinction between sequential and parallel composition.

606

Y. Chen et al. R

1. P picks a, b ← − Zp , sends A1 = pk1a , A2 = pk2a , B = g a hb to V . R 2. V picks e ← − Zp and sends it to P as the challenge. 3. P computes z1 = a + er, z2 = b + ev using witness w = (r, v), then sends (z1 , z2 ) to V . V accepts iff the following three equations hold simultaneously: pk1z1 = A1 X1e ∧ pk2z1 = A2 X2e ∧ g z1 hz2 = BY e Lemma 1. Σequal is a public-coin SHVZK proof of knowledge for Lequal . Applying Fiat-Shamir transform to Σequal , we obtain Πequal , which is actually a NIZKPoK for Lequal . NIZK for Lright . Recall that Lright is defined as: {(pk, X, Y ) | ∃r, v s.t. X = pk r ∧ Y = g r hv ∧ v ∈ V}. For ease of analysis, we additionally define Lenc and Lrange as below: Lenc = {(pk, X, Y ) | ∃r, v s.t. X = pk r ∧ Y = g r hv } Lrange = {Y | ∃r, v s.t. Y = g r hv ∧ v ∈ V} It is straightforward to verify that Lright ⊂ Lenc ∧ Lrange . Observing that each instance (pk, X, Y ) ∈ Lright has a unique witness, while the last component Y can be viewed as a Pedersen commitment of value v under commitment key (g, h), whose discrete logarithm logg h is unknown to any users. To prove (pk, X, Y ) ∈ Lright , we first prove (pk, X, Y ) ∈ Lenc with witness (r, v) via a Sigma protocol Σenc = (Setup, P1 , V1 ), then prove Y ∈ Lrange with witness (r, v) via a Bulletproof Λbullet = (Setup, P2 , V2 ). Sigma protocol for Lenc . We begin with a Sigma protocol Σenc = (Setup, P, V ) for Lenc . The Setup algorithm is same as that of twisted ElGamal. On statement x = (pk, X, Y ), P and V interact as below: R

− Zp , sends A = pk a and B = g a hb to V . 1. P picks a, b ← R 2. V picks e ← − Zp and sends it to P as the challenge. 3. P computes z1 = a + er, z2 = b + ev using witness w = (r, v), then sends (z1 , z2 ) to V . V accepts iff the following two equations hold simultaneously: pk z1 = AX e ∧ g z1 hz2 = BY e Lemma 2. Σenc is a public-coin SHVZK proof of knowledge for Lenc . Bulletproofs for Lrange . We employ the logarithmic size Bulletproof Λbullet = (Setup, P, V ) to prove Lrange . To avoid repetition, we refer to [BBB+18, Section 4.2] for the details of the interaction between P and V . Lemma 3 [BBB+18, Theorem 3]. Assuming the hardness of discrete logarithm problem, Λbullet is a public-coin SHVZK argument of knowledge for Lrange .

PGC: Decentralized Confidential Payment System with Auditability

607

Sequential Composition. Let Γright = Σenc ◦ Λbullet be the sequential composition of Σenc and Λbullet . The Setup algorithm of Γright is a merge of that of Σenc and Λbullet . For range V = [0, 2 − 1], it first generates a group G of prime order p together with two random generators g and h, then picks independent generators g, h ∈ G . Let P1 = Σenc .P , V1 = Σenc .V , P2 = Λbullet .P , V2 = Λbullet .V . We have Γright .P = (P1 , P2 ), Γright .V = (V1 , V2 ). Lemma 4. Assuming the discrete logarithm assumption, Γright = (Setup, P, V ) is a public-coin SHVZK argument of knowledge for Lright . Applying Fiat-Shamir transform to Γright , we obtain Πright , which is actually a NIZKAoK for Lright . NIZK for LSolvent . Recall that Lsolvent is defined as: ˜ C) | ∃sk s.t. (pk, sk) ∈ Rkey ∧ ISE.Dec(sk, C˜ − C) ∈ V}. {(pk, C, ˜ ˜ = pk r˜, Y˜ = g r˜hm ) is the encryption of current balance In the above, C˜ = (X m ˜ of account pk under randomness r˜, C = (X = pk r , Y = g r hv ) encrypts the    transfer amount v under randomness r. Let C  = (X  = pk r , Y  = g r hm ) = C˜ − C, Lsolvent can be rewritten as:

{(pk, C  ) | ∃r , m s.t. C  = ISE.Enc(pk, m ; r ) ∧ m ∈ V}. We note that while the sender (playing the role of prover) learns m ˜ (by decrypting C˜ with sk), v and r, it generally does not know the randomness r˜. This is because C˜ is the sum of all the incoming and outgoing transactions of pk, whereas the randomness behind incoming transactions is unknown. The consequence is that r (the first part of witness) is unknown, which renders prover unable to directly invoke the Bulletproof on instance Y  . ˜ − v) under a fresh randomness r∗ to obtain Our trick is encrypting m = (m ∗ ∗  ∗ ∗ ∗ a new ciphertext C = (X , Y ), where X ∗ = pk r , Y ∗ = g r hm . C ∗ could be viewed as a refreshment of C  . Thus, we can express Lsolvent as Lequal ∧ Lright , i.e., (pk, C  ) ∈ Lsolvent ⇐⇒ (pk, C  , C ∗ ) ∈ Lequal ∧ (pk, C ∗ ) ∈ Lright . To prove C  and C ∗ encrypting the same value under public key pk, we cannot simply use a Sigma protocol like Protocol for Lequal , in which the prover uses the message and randomness as witness, as the randomness r behind C  is typically unknown. Luckily, we are able to prove this more efficiently by using the secret key as witness. Generally, Lequal can be written as: {(pk, C1 , C2 ) | ∃sk s.t. (pk, sk) ∈ Rkey ∧ ISE.Dec(sk, C1 ) = ISE.Dec(sk, C2 )} When instantiated with twisted ElGamal, C1 = (X1 = pk r1 , Y1 = g r1 hm ) and C2 = (X2 = pk r2 , Y2 = g r2 hm ) are encrypted by the same public key pk, then proving membership of Lequal is equivalent to proving logY1 /Y2 X1 /X2 equals logg pk. This can be efficiently done by utilizing the Sigma protocol Σddh = (Setup, P, V ) for discrete logarithm equality due to Chaum and Pedersen [CP92]. Applying Fiat-Shamir transform to Σddh , we obtain a NIZKPoK Πddh for Lddh . We then prove (pk, C ∗ = (X ∗ , Y ∗ )) ∈ Lright using the NIZKPoK Πright as

608

Y. Chen et al.

we described before. Let Πsolvent = Πddh ◦ Πright , we conclude that Πsolvent is a NIZKPoK for Lsolvent by the properties of AND-proofs. Putting all the sub-protocols described above, we obtain Πcorrect = Πequal ◦ Πright ◦ Πsolvent . Theorem 3. Πcorrect is a NIZKAoK for Lcorrect = Lequal ∧ Lright ∧ Lsolvent . We then show the designs of NIZK for typical auditing policies. NIZK for Llimit . Recall that Llimit = {pk, {Ci }1≤i≤n , amax } is defined as: {∃sk s.t. (pk, sk) ∈ Rkey ∧ vi = ISE.Dec(sk, Ci ) ∧

n 

vi ≤ amax }

i=1

additive homomorphism of twisted ElGaLet Ci = (Xi = pk ri , Yi = g ri hvi ). By  n r r v mal,  the prover first computes C = i=1 Ci = (X = pk , Y = g h ), where n n r = i=1 ri , v = i=1 vi . As aforementioned, in PGC users are not required to maintain history state, which means that users may forget the random coins when implementing after-the-fact auditing. Nevertheless, this is not a problem. It is equivalent to prove (pk, C) ∈ Lsolvent , which can be done by using Gadget-2. NIZK for Lrate . Recall that Lrate is defined as: {(pk, C1 , C2 , ρ) | ∃sk s.t. (pk, sk) ∈ Rkey ∧ vi = ISE.Dec(sk, Ci ) ∧ v1 /v2 = ρ} Without much loss of generality, we assume ρ = α/β, where α and β are two positive integers that are much smaller than p. Let C1 = (pk r1 , g r1 hv1 ), C2 = (pk r2 , g r2 hv2 ). By additive homomorphism of twisted ElGamal, we compute C1 = β · C1 = (X1 = pk βr1 , Y1 = g βr1 hβv1 ), C2 = α · C2 = (X2 = pk αr2 , Y2 = g αr2 hαv2 ). Note that v1 /v2 = ρ = α/β if and only if hβv1 = hαv2 ,6 Lrate is equivalent to (Y1 /Y2 , X1 /X2 , g, pk) ∈ Lddh , which in turn can be efficiently proved via Πddh for discrete logarithm equality using sk as witness, as already described in Protocol for Lsolvent . NIZK for Lopen . Recall that Lopen is defined as: {(pk, C = (X, Y ), v) | ∃sk s.t. X = (Y /hv )sk ∧ pk = g sk } The above language is equivalent to (Y /hv , X, g, pk) ∈ Lddh , which in turn can be efficiently proved via Πddh for discrete logarithm equality.

5

Performance

We first give a prototype implementation of PGC as a standalone cryptocurrency in C++ based on OpenSSL, and collect the benchmarks on a MacBook Pro with an Intel i7-4870HQ CPU (2.5 GHz) and 16 GB of RAM. The source code of PGC is publicly available at Github [lib]. For demo purpose only, we only focus on the transaction layer and do not explore any optimizations.7 The experimental results are described in the table below. 6 7

Since both v1 , v2 , α, β are much smaller than p, no overflow will happen. We expect at least 2× speedup after optimizations.

PGC: Decentralized Confidential Payment System with Auditability

609

Table 1. The computation and communication complexity of PGC. PGC

ctx size big-O

Transaction cost (ms) Bytes Generation Verify

Confidential transaction (2 log2 () + 20)|G| + 10|Zp | 1310

40

Auditing policies

Auditing cost (ms)

Proof size

14

big-O

Bytes Generation Verify

Limit policy

(2 log2 () + 4)|G| + 5|Zp |

622

21.5

7.5

Rate policy

2|G| + 1|Zp |

98

0.55

0.69

Open policy

2|G| + 1|Zp |

98

0.26

0.42

Here we set the maximum number of coins as vmax = 2 − 1, where  = 32. PGC operates elliptic curve prime256v1, which has 128 bit security. The elliptic points are expressed in their compressed form. Each G element is stored as 33 bytes, and Zp element is stored as 32 bytes.

Acknowledgments. We thank Benny Pinkas and Jonathan Bootle for clarifications on Sigma protocols and Bulletproofs in the early stages of this research. We particularly thank Shuai Han for many enlightening discussions. Yu Chen is supported by National Natural Science Foundation of China (Grant No. 61772522, No. 61932019). Man Ho Au is supported by National Natural Science Foundation of China (Grant No. 61972332).

References [BAZB20] B¨ unz, B., Agrawal, S., Zamani, M., Boneh, D.: Zether: towards privacy in a smart contract world. In: Bonneau, J., Heninger, N. (eds.) FC 2020. LNCS, vol. 12059, pp. 423–443. Springer, Cham (2020). https://doi.org/ 10.1007/978-3-030-51280-4 23 [BBB+18] B¨ unz, B., Bootle, J., Boneh, D., Poelstra, A., Wuille, P., Maxwell, G.: Bulletproofs: short proofs for confidential transactions and more. In: 2018 IEEE Symposium on Security and Privacy, SP 2018, pp. 315–334 (2018) [BKP14] Biryukov, A., Khovratovich, D., Pustogarov, I.: Deanonymisation of clients in bitcoin P2P network. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS 2014, pp. 15–29 (2014) [BNM+14] Bonneau, J., Narayanan, A., Miller, A., Clark, J., Kroll, J.A., Felten, E.W.: Mixcoin: anonymity for bitcoin with accountable mixes. In: Christin, N., Safavi-Naini, R. (eds.) FC 2014. LNCS, vol. 8437, pp. 486–504. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45472-5 31 [CGS97] Cramer, R., Gennaro, R., Schoenmakers, B.: A secure and optimally efficient multi-authority election scheme. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 103–118. Springer, Heidelberg (1997). https:// doi.org/10.1007/3-540-69053-0 9 [CMTA19] Chen, Y., Ma, X., Tang, C., Au, M.H.: PGC: pretty good confidential transaction system with auditability. Cryptology ePrint Archive, Report 2019/319 (2019). https://eprint.iacr.org/2019/319 [CP92] Chaum, D., Pedersen, T.P.: Wallet databases with observers. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 89–105. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-48071-4 7

610

Y. Chen et al.

[CZ14] Chen, Yu., Zhang, Z.: Publicly evaluable pseudorandom functions and their applications. In: Abdalla, M., De Prisco, R. (eds.) SCN 2014. LNCS, vol. 8642, pp. 115–134. Springer, Cham (2014). https://doi.org/10.1007/9783-319-10879-7 8 [Das] Dash. https://www.dash.org [FMMO19] Fauzi, P., Meiklejohn, S., Mercer, R., Orlandi, C.: Quisquis: a new design for anonymous cryptocurrencies. In: Galbraith, S.D., Moriai, S. (eds.) ASIACRYPT 2019. LNCS, vol. 11921, pp. 649–678. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34578-5 23 [GGM16] Garman, C., Green, M., Miers, I.: Accountable privacy for decentralized anonymous payments. In: Grossklags, J., Preneel, B. (eds.) FC 2016. LNCS, vol. 9603, pp. 81–98. Springer, Heidelberg (2017). https://doi.org/ 10.1007/978-3-662-54970-4 5 [Gol06] Goldreich, O.: Foundations of Cryptography, vol. 1. Cambridge University Press, New York (2006) [Gri] Grin. https://grin-tech.org/ [lib] libPGC. https://github.com/yuchen1024/libPGC [Max] Maxwell, G.: Confidential transactions (2016). https://people.xiph.org/ ∼greg/confidential values.txt [MM] Meiklejohn, S., Mercer, R.: M¨ obius: trustless tumbling for transaction privacy. PoPETs 2, 105–121 (2018) [Nak08] Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008). https://bitcoin.org/bitcoin.pdf [Noe15] Noether, S.: Ring signature confidential transactions for monero (2015). https://eprint.iacr.org/2015/1098 [NVV18] Narula, N., Vasquez, W., Virza, M.: zkLedger: privacy-preserving auditing for distributed ledgers. In: 15th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2018, pp. 65–80 (2018) [Ped91] Pedersen, T.P.: Non-interactive and information-theoretic secure verifiable secret sharing. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 129–140. Springer, Heidelberg (1992). https://doi.org/10.1007/3-54046766-1 9 [Poe] Poelstra, A.: Mimblewimble. https://download.wpsoftware.net/bitcoin/ wizardry/mimblewimble.pdf [PSST11] Paterson, K.G., Schuldt, J.C.N., Stam, M., Thomson, S.: On the joint security of encryption and signature, revisited. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 161–178. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25385-0 9 [RMK14] Ruffing, T., Moreno-Sanchez, P., Kate, A.: CoinShuffle: practical decentralized coin mixing for bitcoin. In: Kutylowski, M., Vaidya, J. (eds.) ESORICS 2014. LNCS, vol. 8713, pp. 345–364. Springer, Cham (2014). https://doi. org/10.1007/978-3-319-11212-1 20 [RS13] Ron, D., Shamir, A.: Quantitative analysis of the full bitcoin transaction graph. In: Sadeghi, A.-R. (ed.) FC 2013. LNCS, vol. 7859, pp. 6–24. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39884-1 2 [Sch91] Schnorr, C.P.: Efficient signature generation by smart cards. J. Cryptol. 4(3), 161–174 (1991). https://doi.org/10.1007/BF00196725 [Woo14] Wood, G.: Ethereum: a secure decentralized transaction ledger (2014). http://gavwood.com/paper.pdf, https://www.ethereum.org/ [ZCa] Zcash: privacy-protecting digital currency. https://z.cash/

Secure Cloud Auditing with Efficient Ownership Transfer Jun Shen1,2 , Fuchun Guo3 , Xiaofeng Chen1,2 , and Willy Susilo3(B) 1

3

State Key Laboratory of Integrated Service Networks (ISN), Xidian University, Xi’an 710071, China demon [email protected], [email protected] 2 State Key Laboratory of Cryptology, P. O. Box 5159, Beijing 100878, China Institute of Cybersecurity and Cryptology, School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia {fuchun,wsusilo}@uow.edu.au

Abstract. Cloud auditing with ownership transfer is a provable data possession scheme meeting verifiability and transferability simultaneously. In particular, not only cloud data can be transferred to other cloud clients, but also tags for integrity verification can be transferred to new data owners. More concretely, it requires that tags belonging to the old owner can be transformed into that of the new owner by replacing the secret key for tag generation while verifiability still remains. We found that existing solutions are less efficient due to the huge communication overhead linear with the number of tags. In this paper, we propose a secure auditing protocol with efficient ownership transfer for cloud data. Specifically, we sharply reduce the communication overhead produced by ownership transfer to be independent of the number of tags, making it with a constant size. Meanwhile, the computational cost during this process on both transfer parties is constant as well.

Keywords: Cloud storage

1

· Integrity auditing · Ownership transfer

Introduction

As cloud computing has developed rapidly, outsourcing data to cloud servers for remote storage has become an attractive trend [4,11,16]. However, when cloud clients store their data in the cloud, the integrity of cloud data would be threatened due to accidental corruptions or purposive attacks caused by a semi-trusted cloud server. Hence, cloud auditing was proposed as a significant security technology for integrity verification, and has been widely researched [12,17,18,20]. Concretely, a certain cloud client with secret and public key pair (sk1 , pk1 ) generates tags σi = T (mi , sk1 ) based on sk1 for data blocks mi , and uploads {(mi , σi )} to the cloud for remote storage. Once audited with chal, the cloud responds with a constant-size proof of possession P generated from (mi , σi , chal), rather than sending back the challenged cloud data for checking c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 611–631, 2020. https://doi.org/10.1007/978-3-030-58951-6_30

612

J. Shen et al.

directly. The validity of P is checked via pk1 and thereby the integrity of cloud data [5,6,17]. Ownership transfer requires that the owner of cloud data is changeable among cloud clients, instead of always being the original uploader. The ownership is usually indicated by signatures. When ownership transfer occurs, signatures of the old owner should be transformed to that of the new one with the old secret key contained in signatures replaced. Consequently, the new owner obtains the ownership and accesses these data without the approval from the old owner. Cloud auditing with ownership transfer is a provable data possession scheme meeting verifiability and transferability simultaneously. In particular, not only data can be transferred to other cloud clients, but also tags for integrity verification can be transferred to new data owners. Specifically, when the ownership of some data is transferred to a new owner with key pair (sk2 , pk2 ), verifiable tags σi should be σi = T (mi , sk2 ) instead of σi = T (mi , sk1 ). Thus, the transferred data belong to the new owner and are verifiable via pk2 . A trivial solution to achieve cloud auditing with ownership transfer follows the “download-upload” mechanism. Specifically, the new owner downloads the cloud data approved and uploads new generated verifiable tags. It is obvious that such a way is not an advisable one for the huge computational overhead of tag generation and the communication cost even up to twice the size of transferred data. In order to relieve the computational burden, Wang et al. [25] considered to outsource computations of transfer for the first time, and proposed the concept of “Provable Data Possession with Outsourced Data Transfer” (DT-PDP). However, the communication overhead in their transfer protocol is still linear with the number of tags. To the best of our knowledge, there is no better way to further reduce the communication cost produced during ownership transfer. 1.1

Our Contribution

Inspired by the communication overhead, in this work, we are devoted to designing a secure cloud auditing protocol with efficient ownership transfer. Our contributions are summarized as follows. – We focus on the efficiency of ownership transfer, reducing communications produced by transfer to be with a constant size and independent of the number of tags. Meanwhile, the majority of computational cost during ownership transfer is delegated to the computing server, leaving computations on both transfer parties constant as well. – We analyze the security of the proposed protocol, demonstrating that it is provably secure under the k-CEIDH assumption in the random oracle model. The protocol achieves properties of correctness, soundness, unforgeability and detectability. In addition, when ownership transfer occurs, the protocol is secure against collusion attacks, making the data untransferred protected.

Secure Cloud Auditing with Efficient Ownership Transfer

1.2

613

Related Work

Extensive researches have been conducted on the integrity verification for cloud storage. Such researches focus on various aspects, including but not limited to dynamic operations, privacy preservation, key-exposure resistance, etc. In 2007, Ateniese et al. [1] proposed the first “Provable Data Possession” (PDP) scheme, making trusted third parties enabled to execute public verifications. The scheme employs random sampling and homomorphic authenticators to achieve the public auditing property. Almost at the same time, Juels and Kaliski [10] put forward the concept of “Proofs of Retrievability” (PoRs), which is slightly different from PDP but with the similar purpose of checking the remotely stored data. These data are encoded by error correcting codes to enable retrievability and with several sentinels inserted for data possession verification. Since such sentinels should be responded a few each time when audited, the number of challenge executions is extremely limited. In 2008, Shacham and Waters [19] first combined coding technology and PDP together. In this design, the data to be outsourced are divided twice as sectors for the first time, where sectors are equivalent to blocks in previous schemes and several sectors share one data tag, decreasing the size of the processed data storing in the cloud. Subsequently, more researches were conducted on cloud auditing. In 2008, Ateniese et al. [2] first considered data auditing supporting dynamic operations and proposed a PDP scheme with partial dynamics, failing to support data insertion. To solve this problem, Erway et al. [7] designed the first PDP scheme with fully dynamic storage, while it suffers from high computational and communication overheads. Later, Wang et al. [28] employed the Merkle Hash Tree to establish another dynamic PDP scheme, which is much simpler. This topic is also researched in [9,22]. Besides, Wang et al. [24] focused on the study of privacy preservation in cloud auditing, combining homomorphic linear authenticators and random masking technology together. Later, Worku et al. [30] noted that the scheme in [24] leaks the identity privacy of data owners, and utilized ring signature to improve privacy preservation. What’s more, key-exposure resistance is also considered in cloud auditing. Yu et al. [33] first considered the security problem caused by secret key exposure and employed the Merkle Hash Tree to give a solution, of which the efficiency is unsatisfying. Then, they released heavy computations in [32] and enhanced key-exposure resilience in [34]. Other aspects have also been studied these years. Auditing schemes for shared cloud data are researched in [8,21,31]. Data sharing with user revocation is achieved in [23] through using proxy re-signature to renew tags generated by the revoked user and in [15] via Shamir’s Secret Sharing. To simplify certificate management, identity-based auditing schemes are proposed [14,26,36]. Besides, integrity auditing supporting data deduplication [13,27,35] has also been studied. In 2019, it is Wang et al. [25] that considered the integrity checking for the remote purchased data with computations of ownership transfer outsourced for the first time, and proposed the concept of DT-PDP. In such a scheme, the computational and communication costs are reduced. Unfortunately, commu-

614

J. Shen et al.

nications are still huge for its linearity with the number of tags. Besides, this scheme only achieves partial and one-time ownership transfer. In contrast, we explore how to achieve secure cloud auditing enabling thorough ownership transfer with high efficiency. 1.3

Organization

The rest of this paper is organized as follows. Section 2 introduces some preliminaries. Section 3 describes the system model and definition, as well as the security model. Section 4 presents details of the proposed secure cloud auditing protocol with efficient ownership transfer. Section 5 demonstrates the correctness and security of our design. Section 6 shows the performance analysis. Finally, conclusions are drawn in Sect. 7.

2

Preliminaries

To facilitate understandings, we present preliminaries including bilinear pairings, intractable problems in cyclic groups, and homomorphic authenticators. 2.1

Bilinear Pairings

Let G and GT be cyclic groups of prime order p, and g is a generator of G. The pairing e : G × G → GT is a bilinear one iff the following conditions are satisfied:   ab – Bilinear: e ua , v b = e (u, v) holds for ∀a, b ∈ Zp and ∀u, v ∈ G; – Non-degenerate: e (g, g) = 1GT ; – Computable: e (u, v) is efficiently computable for ∀u, v ∈ G. The following are two intractable problems in G. Definition 1 (Computational Diffie-Hellman problem). The Computational Diffie-Hellman (CDH) problem in G is described as follows: given a tuple (g, g x , g y ) for any x, y ∈R Zp as input, output g xy . Define that the CDH assumption holds in G if for any PPT adversary A,     Pr A 1λ , g, g x , g y = g xy ≤ negl(λ) holds for arbitrary security parameter λ, where negl(·) is a negligible function. Definition 2 (k-Computational Exponent Inverse Diffie-Hellman problem). The k-Computational Exponent Inverse Diffie-Hellman problem (k-CEIDH problem) in G is defined as  follows: given a (2k+2)-tuple  1 1 1 e1 , e2 , · · · , ek , g, g b , g a+e1 , g a+e2 , · · · , g a+ek as input, where k is a non  b negative integer, g is a generator of G and b, e1 , · · · , ek ∈R Zp , output ei , g a+ei for any ei ∈ {e1 , · · · , ek }. We define that the k-CEIDH assumption holds in G if for all PPT adversaries and arbitrary security parameter λ, there exists a negligible function negl(·) s.t.    1 b 1 1 Pr A 1λ , e1 , e2 , · · · , ek , g, g b , g a+e1 , g a+e2 , · · · , g a+ek = g a+ei ≤ negl(λ).

Secure Cloud Auditing with Efficient Ownership Transfer

615

  1 When k = 1, the k-CEIDH problem e1 , g, g b , g a+e1 can be regarded as a CDH instance, which is computationally infeasible for A to give the solution   1 b 1 a+e1 . When k > 1, the additional elements e2 , · · · , ek , g a+e2 , · · · , g a+ek seem g to give no assistance to A in solving the above problem. Thus, we can assume that the k-CEIDH problem is as difficult as the CDH problem. 2.2

Homomorphic Authenticators

The homomorphic authenticator is a homomorphic verifiable signature, allowing public verifiers to check the integrity of remote storage without specific data blocks, which is a significant building block employed in public cloud auditing mechanisms [3,13,19,20,22]. Given a bilinear pairing e : G × G → GT and a data file F = (m1 , m2 , · · · , mi , · · · , mn ), where mi ∈ Zp . Let the signer with key pair a (sk = a, pk = g a ) generate signatures σi = (umi ) on data blocks mi for i ∈ [1, n], where a ∈R Zp , g is a generator of G and u ∈R G. The signature is a homomorphic authenticator iff the following properties are met: – Blockless verifiability: the validity of σi is able to be batch authenticated via

m r i i ∈ Zp instead of blocks {mi }i∈I , where I is an integer set and i∈I mi ri ∈ Zp . Essentially, the verification equation could be written as      ? e σi ri , g = e u i∈I mi ri , pk . i∈I

– Non-malleability: given σi for mi and σj for mj , the signature on m = ami + bmj can not be derived directly from the combination of σi and σj , where ami , bmj ∈ Zp .

3 3.1

System Models and Definitions The System Model

Similar to but slightly different from the system model in [25], our model additionally involves a third party auditor since we achieve public verification. Specifically, such a model involves four entities: the cloud server (CS ), the old data owner (PO), the new owner (NO) and the third party auditor (TPA). – The cloud server CS is an entity with seemingly inexhaustible resources. It is responsible for data storage and computing delegations from cloud clients. – The old data owner PO is the transferred party, being individuals or some organizations. It uploads data files to the cloud for remote storage to release local management burden. – The new data owner NO is a cloud client being the target of data ownership transfer, i.e., the transferring party. Once ownership is transferred, NO inherits the jurisdiction on these data from PO.

616

J. Shen et al.

– The third party auditor TPA is a trustworthy entity with professional knowledge of integrity verification. It is able to provide convincing results for auditing delegations from cloud clients. We describe the system architecture in brief here. On the one hand, PO outsources its data to the CS and has on-demand access to these data. The integrity of such data is checked by the TPA with delegations from PO. Once receiving auditing challenges from the TPA, the CS generates corresponding proofs of intact storage. With such proofs, the TPA figures out and sends auditing results to PO. On the other hand, when PO decides to transfer the ownership of some data to NO, it generates and sends auxiliaries to NO. After processing these auxiliaries, NO completes the transfer with the computing service of the CS. Consequently, NO inherits the rights from PO and becomes the data owner currently. 3.2

Cloud Auditing with Ownership Transfer

Inspired by the DT-PDP scheme defined in [25], the formal definition for cloud auditing with ownership transfer is given as follows: Definition 3. The cloud auditing protocol with ownership transfer consists of five algorithms defined below.   – SysGen 1λ → param: This algorithm is a probabilistic one run by the system. With the security parameter λ as input, it outputs the system parameter param. – KeyGen (param) → (SK, P K): This algorithm is a probabilistic one run by cloud clients. Input param, the client generates secret and public key pair (SK, P K), where SK is kept secret for tag generation and P K is distributed public. – TagGen (F, n, mi , SK, param) → (σi , t): This algorithm is a deterministic one run by the data owner with SK. Input the file abstract F , number of blocks n, data blocks mi , and param, it outputs the file tag t and homomorphic verifiable tags σi corresponding to mi . – Audit (Q, t, mi , σi , P K, param) → {0, 1}: This algorithm is a probabilistic one run by the CS and the TPA in two steps: i) Input a challenge set Q from the TPA and file tag t along with data block and tag pairs {(mi , σi )}i∈Q , the CS sends the proof P to the TPA; ii) With P , the data owner’s P K and param as inputs, the TPA verifies the validity of P and outputs “ 1” if succeed and “ 0” otherwise. – TagTrans (F, SKa , P Ka , SKb , P Kb , σi , param) → (σi , t ): This algorithm is a deterministic one run by the CS, PO and NO in three steps: i) Input the file abstract F , PO’s SKa , it sends auxiliaries Aux to NO; ii) With Aux, PO’s P Ka and NO’s SKb as inputs, NO sends Aux to the CS; iii) With Aux , NO’s P Kb , PO’s tags σi and param as inputs, it outputs the renewed file tag t and block tags σi for NO.

Secure Cloud Auditing with Efficient Ownership Transfer

617

The correctness of the protocol is defined as follows: Definition 4 (Correctness). The cloud auditing with ownership transfer is correct iff the following conditions are satisfied: – If PO and the CS execute honestly, then for any challenged data blocks mi , there is always Audit (Q, t, mi , σi , P Ka , param) → 1. – If PO, NO and the CS are honest, then σi for NO generated by TagTrans (F, SKa , P Ka , SKb , P Kb , σi , param) → (σi , t ) matches exactly with mi uploaded by PO. Furthermore, for any auditing task from NO, the output of Audit (Q, t, mi , σi , P Kb , param) → {0, 1} is always “ 1”. 3.3

The Security Model

A secure cloud auditing protocol with efficient ownership transfer should satisfy properties of soundness, unforgeability, secure transferability and detectability, which are defined as follows. Definition 5 (Soundness). The cloud auditing with ownership transfer is sound if it is infeasible for the CS to provide a valid proof to pass the integrity verification, when the challenged data mi is corrupted to be mi = mi , i.e., for any security parameter λ and negligible function negl(·), there is  Pr AOsign (SK,·) (P K, param) → (σi , t ) ∧ mi = mi ∧ (mi , σi ) ∈ / {(m, σ)} ∧Audit ({i}, t , mi , σi , P K, param) → 1 :

  (SK, P K) ← KeyGen (param) , param ← SysGen 1λ ≤ negl(λ), where Osign (·, ·) is the oracle for signature queries, and {(m, σ)} is the set of pairs that A had queried, the same below. Definition 6 (Unforgeability). The cloud auditing with ownership transfer is unforgeable if the ownership of any block ms is infeasible to be forged for all adversaries, i.e., for any security parameter λ and negligible function negl(·),  Pr AOsign (SK,·) (P K, param) → (σs , t ) ∧ (ms , σs ) ∈ / {(m, σ)} ∧Audit ({s}, t , ms , σs , P K, param) → 1 :

  (SK, P K) ← KeyGen (param) , param ← SysGen 1λ ≤ negl(λ). Definition 7 (Secure transferability). The cloud auditing protocol is with secure ownership transfer iff the following conditions are satisfied: – If P O is honest, it is resistant to the collusion attack launched by N O and the CS. In another word, such colluding adversaries can not produce any valid

618

J. Shen et al.

tags on P O’s behalf. Suppose that the key pair of P O is (SKa , P Ka ). We define that for all PPT adversaries and arbitrary security parameter λ,  Pr AOsign (SKa ,·),Oaux (·) (P Ka , SKb , P Kb , param) → (σs , t ) / {(m, σ)} ∧ Audit ({s}, t , ms , σs , P Ka , param) → 1 : ∧(ms , σs ) ∈   (SK, P K) ← KeyGen (param) , param ← SysGen 1λ ≤ negl(λ), where negl(·) is a negligible function and Oaux (·) is for auxiliary queries in ownership transfer. – If N O is honest, it is protected from a colluding CS and P O. That is, even though with the combined ability of the CS and P O, tags belonging to NO cannot be produced. Suppose that the key pair of N O is (SKb , P Kb ). We define that for any PPT adversary and security parameter λ,  Pr AOsign (SKb ,·),Oaux (·) (P Kb , SKa , P Ka , param) → (σs , t ) / {(m, σ)} ∧ Audit ({s}, t , ms , σs , P Kb , param) → 1 : ∧(ms , σs ) ∈   (SK, P K) ← KeyGen (param) , param ← SysGen 1λ ≤ negl(λ). Definition 8 (Detectability). The cloud auditing protocol with ownership transfer is (ρ, δ)-detectable, where 0 ≤ ρ, δ ≤ 1, if the probability that integrity corruptions of the transferred cloud data can be detected is no less than δ when there are a fraction ρ of corrupted data blocks.

4

The Proposed Auditing Protocol

We begin by the overview of the proposed secure auditing protocol with efficient ownership transfer. Then we present it in more details. 4.1

Overview

To achieve cloud auditing with ownership transfer, we argue that it is impractical for the large costs produced, if the new owner downloads cloud data and uploads new generated tags for these data. On the other hand, if we delegate ownership transfer to a third party to reduce computations and improve efficiency, secrets of transfer parties may suffer from collusion attacks launched by the other transfer party and the third party employed. In such a delegation, the communication cost is still a primary concern. To simultaneously improve the security and efficiency, we construct a novel tag structure. With such a structure, ownership transfer is securely outsourced to relieve computations on transfer parties. Apart from that, auditing of the transferred data is consequently executed using the public key of the new owner. Last and most important, the communication cost produced by ownership transfer is constant and independent of the number of tags.

Secure Cloud Auditing with Efficient Ownership Transfer

4.2

619

The Cloud Auditing with Efficient Ownership Transfer

The encoded file with abstract F to be uploaded by PO for remote storage is divided into n blocks, appearing as {m1 , m2 , · · · , mn }, where mi ∈ Zp for i ∈ [1, n]. Similar to [22,24,29], S = Kgen,Sig,Vrf is a signature scheme and S.Sig (·)ssk is a signature under the secret signing key ssk, where (ssk, spk) ← S.Kgen 1λ . The procedure of the protocol execution is as follows:   – SysGen 1λ → param. On input the security parameter λ, the system executes as follows: 1. Select two cyclic multiplicative groups G and GT with prime order p and a bilinear pairing e : G × G → GT . 2. Pick two independent generators g, u ∈R G. 3. Choose two cryptographic hash functions H1 : {0, 1}∗ → G and H2 : {0, 1}∗ → Zp . 4. Return the system parameter param = {G, GT , p, g, u, e, H1 , H2 , S}. – KeyGen (param) → (SK, P K). On input the system parameter param, PO and NO generate key pairs as follows: 1. PO selects ska = α ∈R Zp and computes pka = g α . 2. NO selects skb = β ∈R Zp and computes pkb = g β . 3. PO runs S.Kgen to generate a signing key pair (sska , spka ). 4. NO runs S.Kgen to generate a signing key pair (sskb , spkb ). 5. PO obtains key pair: (SKa , P Ka ) = ((ska , sska ) , (pka , spka )), NO obtains key pair: (SKb , P Kb ) = ((skb , sskb ) , (pkb , spkb )). – TagGen (F, n, mi , SK, param) → (σi , t). On input file abstract F , data blocks mi and their amount n, and param, PO with SKa executes as follows: 1. Compute r = H2 (α||F ), h = g r , and h1 = hα . 2. Calculate tags for {mi }i∈[1,n] as r

1

σi = H1 (F ||i) · (umi ) α+H2 (F ||h) . 3. Compute t = (F ||n||h||h1 ) ||S.Sig(F ||n||h||h1 )sska as the file tag.

4. Upload {(mi , σi )}i∈[1,n] , t to the cloud for remote storage and delete the local copy. – Audit (Q, t, mi , σi , P K, param) → {0, 1}. On input the file tag t, a c-element challenge set Q = {(i, γi ∈R Zp )} chosen by the TPA, data blocks mi along with tags σi , P Ka of PO, and param, the TPA interacts with the CS as follows: 1. The CS calculates the data proof and tag proof as  mi γi , P σ = σiγi . PM = (i,γi )∈Q

(i,γi )∈Q

2. The CS sends the proof P = (P M, P σ, t) to the TPA as the response.

620

J. Shen et al.

3. The TPA obtains F, h, h1 if S.Vrf(F ||n||h||h1 )spka = 1, and then checks the validity of P via the verification equation ⎞ ⎛     ? γ e P σ, pka g H2 (F ||h) = e ⎝ H1 (F ||i) i , h1 hH2 (F ||h) ⎠ · e uP M , g . (i,γi )∈Q

4. The TPA outputs “1” iff the equation holds, indicating that these data are correctly and completely stored. Otherwise, at least one of these challenged data blocks must have been corrupted in the cloud. – TagTrans (F, SKa , P Ka , SKb , P Kb , σi , param) → (σi , t ). On input the file abstract F , data tags σi generated by PO, and param, PO with key pair (SKa , P Ka ) and NO with (SKb , P Kb ) interact with the CS as follows:  1. NO computes r = H2 (β||F ), h = g r , and h1 = hβ . 2. PO selects x ∈R Zp , computes auxiliaries r = H2 (α||F ) , aux = −

1 − x, v = ux , α + H2 (F ||h)

and sends Aux = (r||aux||v) ||S.Sig(r||aux||v)sska to NO. 3. NO parses Aux and recovers r, aux, v if S.Vrf(r||aux||v)spka = 1; Otherwise, drops it and aborts. 4. NO picks x ∈R Zp , computes auxiliaries for tag recomputation: R = r − r, aux =

 1 − x + aux, V = vv  = vux , β + H2 (F ||h )

and sends Aux = (R||aux ||V ) ||S.Sig(R||aux ||V )sskb along with the new file tag t = (F ||n||h ||h1 ) ||S.Sig(F ||n||h ||h1 )sskb to the CS. 5. The CS parses Aux and obtains R, aux , V iff S.Vrf(R||aux ||V )spkb = 1, and stores t iff S.Vrf(F ||n||h ||h1 )spkb = 1. 6. The CS computes the new tags belonging to NO as R

σi = σi · H1 (F ||i) · (umi )

aux

r

1

· V mi = H1 (F ||i) · (umi ) β+H2 (F ||h ) .

This completes the description of the cloud auditing with efficient ownership transfer, where data tags before and after transfer are with the same structure. Such a fact enables ownership to be transferred to another cloud client by following algorithm TagTrans (F, SKa , P Ka , SKb , P Kb , σi , param) → (σi , t ). Note that Aux for regenerating tags under the same file is with constant size and independent of the number of tags.

5

Correctness and Security Analysis

We give proofs of the following several theorems to demonstrate achievements of correctness, soundness, unforgeability, secure transferability and detectability defined.

Secure Cloud Auditing with Efficient Ownership Transfer

621

Theorem 1. The proposed protocol is correct. Concretely, if PO uploads its data honestly and the CS preserves them well, then the proof responded by the CS is valid with overwhelming probability. Proof. We demonstrate the correctness of the proposed protocol by proving the equality of the verification equation, since the equation only holds for valid proofs. The correctness of the equation is derived as follows:   e P σ, pka · g H2 (F ||h)   γi α H2 (F ||h) =e σ ,g · g (i,γi )∈Q i   1 rγi mi α+H2 (F ||h) γi α H2 (F ||h) =e H1 (F ||i) · (u ) ,g g (i,γi )∈Q (i,γi )∈Q      m γ γ =e H1 (F ||i) i , h1 · hH2 (F ||h) · e u (i,γi )∈Q i i , g (i,γi )∈Q     γi H2 (F ||h) H1 (F ||i) , h1 · h · e uP M , g . =e (i,γi )∈Q

If PO and NO calculate and output auxiliaries honestly and the CS stores data honestly as well as renews tags correctly, then the proof for the challenged data belonging to the new owner NO is valid. Similarly, the correctness is derivable from the equality of verification equation, of which the process is omitted here, since structures of verification equations are identical for the same tag structure before and after ownership transfer. Theorem 2. The proposed protocol is sound. Concretely, if the k-CEIDH assumption holds, no adversary can cause the TPA to accept proofs generated from some corrupted data with non-negligible probability in the random oracle model. Proof. Suppose that there exists a PPT adversary A who can break the soundness of the protocol. We construct a simulator B to break the k-CEIDH assumption and collision resistance of hash functions by Given as input a k-CEIDH problem instance  interacting with A. 1 1 1 b α+e1 e1 , e2 , · · · , ek , g, g , g , g α+e2 , · · · , g α+ek , B controls random oracles and runs A. Let the file with abstract Fl and blocks {ml1 , ml2 , · · · , mln } along with tags {σl1 , σl2 , · · · , σln } be challenged. The query set causing the challenger to abort is Q = {(i, γi )} with |Q| = c, and the proof from A is Pl∗ = (P Ml∗ , P σl∗ , t∗l ). Let the acceptable proof from the honest prover be  γi mli γi , P σl = σli , tl ), Pl = (P Ml = (i,γi )∈Q

where hl = g rl = g H2 (α||Fl ) and h1l = hα l in tl .

(i,γi )∈Q

622

J. Shen et al.

According to Theorem 1, it is required that the expected proof perfectly satisfies the verification equation, i.e.,       H (F ||h ) γ H1 (Fl ||i) i , h1l hl 2 l l e uP Ml , g . e P σl , pka g H2 (Fl ||hl ) = e (i,γi )∈Q

Since A broke the soundness of the protocol, it is obvious that Pl = Pl∗ and that    ∗ e P σl∗ , pka g H2 (Fl ||hl ) = e

(i,γi )∈Q

H1 (Fl ||i)γi , h∗1l h∗l

H2 (Fl ||h∗ l)

   ∗ e uP Ml , g .

First, we demonstrate that h∗l = hl and h∗1l = h1l if hash functions are r rl α = (g α ) l = pka rl , collision-free. Since hl = g rl = g H2 (α||Fl ) and h1l = hα l = g ∗ ∗ where g, pka are public parameters, hl = hl and h1l = h1l indicate the equality of exponents rl∗ and rl , implying a collision of H2 occurring with negligible probability 1/p. Second, we show that (P Ml∗ , P σl∗ ) = (P Ml , P σl ) if the assumption of kCEIDH problem holds. The following are the details: – Setup. Let H1 : {0, 1}∗ → G and H2 : {0, 1}∗ → Zp be random oracles controlled by the simulator. B sets u = g b , where b ∈R Zp . – H-query. This phase is for hash queries of H1 and H2 . Times of queries to H2 (Fl ||hl ) is q21 , to H1 (Fl ||i) is q1 and to H2 (α||Fl ) is q22 . Such query and response pairs are recorded in empty tables TH21 , TH1 and TH22 generated by B. For queries of the ith block in file with abstract Fl , if they are searchable in tables, B returns these recorded values; otherwise, it executes as follows: • For query (Fl , hl ), B chooses el ∈R Zp , sets H2 (Fl ||hl ) = el , records (l, Fl ||hl , el , H2 (Fl ||hl ) , A) in TH21 , and responds the hash query with H2 (Fl ||hl ). • For query (Fl , i), B chooses xli ∈R Zp , sets H1 (Fl ||i) = g xli /umli , records (l, i, Fl ||i, xli , H1 (Fl ||i) , A) in TH1 , and responds the hash query with H1 (Fl ||i). 1 , • For query (α, Fl ), B chooses rl ∈R Zp , sets rl = H2 (α||Fl ) = rl + α+e l  records (l, α||Fl , rl , H2 (α||Fl ) , A) in TH22 , and responds the hash query with H2 (α||Fl ). – S-query. This phase is for signature queries, which is conducted for qs times. Let the ith block of file with abstract Fl be queried, we have H1 (Fl ||i) =

g xli 1 , H2 (Fl ||hl ) = el , rl = rl + . umli α + el

Then, B computes σli by  σli =

g xli umli

1 rl + α+e

l

u

mli α+el

=



1 xli α+e



1 mli α+e l

g xli rl g

umli rl u

σli is the response to signature query on mli .

l

u

mli α+el



1

g xli rl g α+el =  umli rl

xli

.

Secure Cloud Auditing with Efficient Ownership Transfer

623

– Forgery. Eventually, A forges a proof containing the corrupted i∗ th block with m∗li∗ = mli∗ . The verification equation is rearranged as       γi α+el α+el ? =e H1 (Fl ||i) , hl · e u P Ml , g . e P σl , g (i,γi )∈Q

Divide the verification equation for (P σl∗ , P Ml∗ ) by that for (P σl , P Ml ), i.e.,     ∗ α+el e (P σl∗ /P σl ) , g =e uP Ml /uP Ml , g . Let ΔP Ml = P Ml∗ −P Ml = γi∗ (m∗li∗ − mli∗ ), the division yields the solution to the k-CEIDH problem: 1

1

u α+el = (P σl∗ /P σl ) ΔP Ml . This completes the simulation and solution. The correctness is shown below. The simulation is indistinguishable from the real attack because randomnesses including b in setup, el , xli , xli∗ , rl in hash responses and i∗ in signature generation are randomly chosen and independent in the view of A. Different randomnesses in signature and hash queries ensure the success of simulation. For the same file, such randomnesses vary in H1 (Fl ||i) merely with probability of 1 − qp1 , making the probability of successful simulation and useful attack be  qs 1 − qp1 . Suppose A (t, qs , q1 , q21 , q22 , )-breaks the protocol. With A’s abilqs  ity, B solves the k-CEIDH problem with probability of 1 − qp1 · ≈ . The time cost of simulation is TS = O (q1 + q21 + q22 + qs ). Therefore, B solves the k-CEIDH problem with (t + TS , ). Theorem 3. The proposed protocol is unforgeable. Concretely, if the k-CEIDH assumption holds, it is computationally infeasible for all adversaries to forge provably valid tags for any data with non-negligible probability in the random oracle model. Proof. Suppose there is an adversary A who can break the unforgeability of the proposed protocol. We construct a simulator B to break the k-CEIDH assumption by interacting  with A. With the input of  1 1 1 e1 , e2 , · · · , ek , g, g b , g α+e1 , g α+e2 , · · · , g α+ek , the k-CEIDH adversary B simulates the security game for A below. – Setup. Hash functions H1 and H2 are random oracles controlled by the simulator. B sets u = g b , where b ∈R Zp . – H-query. Hash queries are made in this phase, which are the same with that described in the proof of Theorem 2. – S-query. Signature queries are made in this phase by A for qs times. For a query on the ith block of file with abstract Fl , we have H1 (Fl ||i) = g xli /umli , 1 . B computes σli for mli by H2 (Fl ||hl ) = el , and rl = rl + α+e l  σli =

g xli umli

1 rl + α+e

l

mli

u α+el =



1

g xli rl g α+el  umli rl

xli

.

624

J. Shen et al.

– Forgery. In this phase, A aims to return a forged tag σli∗ of the i∗th block mli∗ that has never been queried in file with abstract Fl . The corresponding hash responses from B are H1 (Fl ||i∗ ) = g xli∗ /umli∗ , H2 (Fl ||hl ) = el , and H2 (α||Fl ) = rl . Then there is 1

r

σli∗ = H1 (Fl ||i∗ ) l (umli∗ ) α+el =



g xli∗ umli∗

rl

1

(umli∗ ) α+el =

1 g rl xli∗ mli∗ α+e l . u url mli∗

Now we have found the solution to the k-CEIDH problem: 1

1

u α+el = (σli∗ url mli∗ /g rl xli∗ ) mli∗ . This completes the simulation and solution. The correctness is shown below. The simulation is indistinguishable from the real attack because randomnesses including b in setup, el , xli , xli∗ , rl in hash responses and i∗ in signature generation are randomly chosen. For the same file, such randomnesses vary in H1 (Fl ||i) merely with probability of 1 − qp1 , making the probqs  . Suppose A ability of successful simulation and useful attack be 1 − qp1 (t, qs , q1 , q21 , q22 , )-breaks theprotocol.  With A’s ability, B solves the k-CEIDH qs

problem with probability of 1 − qp1 · ≈ . The time cost of simulation is TS = O (q1 + q21 + q22 +qs ). Therefore, B solves the k-CEIDH problem with (t + TS , ).

Theorem 4. The proposed protocol is with secure ownership transfer. Concretely, if the k-CEIDH assumption holds, it is computationally infeasible for any colluding adversary to forge provably valid tags on behalf of others with nonnegligible probability in the random oracle model. Proof. First, we demonstrate that P O is secure against collusion attack launched by the CS and N O. Assume that there is a PPT adversary A who can break the transfer security of P O. Then, we construct a simulator B to break the kCEIDH assumption by interacting with A. Given as input a problem instance   1 1 1 e1 , e2 , · · · , ek , g, g b , g α+e1 , g α+e2 , · · · , g α+ek , B controls random oracles, runs A and works as follows. – Setup. Since the CS and N O collude, B must be able to provide A with the secret key β of N O. Set u = g b , where b ∈R Zp . – H-query. Hash queries concerning data file belonging to PO are made as the same as that in Theorem 2. For queries of the ith block in file with abstract Fl , B sets H2 (Fl ||hl ) = el , H1 (Fl ||i) = g xli /umli and rl = H2 (α||Fl ) = 1 , where el , xli , rl ∈R Zp . rl + α+e l – S-query. Signature queries for tags belonging to P O are as the same as that in the proof of Theorem 2. – Aux-query. Queries for auxiliaries to regenerate tags are made in this phase 1 1 for PO is from H-query, and auxl = − α+e −xl and for qa times. rl = rl + α+e l l

Secure Cloud Auditing with Efficient Ownership Transfer

625 N

rl vl = uxl , where xl is chosen by B. NO computes rlN = H2 (β||Fl ), hN , l =g Nβ N N h1l = hl , and picks xl ∈R Zp , then there are:

R = rlN − rl , auxN l =

1 xN l . − xN l + auxl , Vl = vl u N β + el

– Forgery. Eventually, A forges a valid tag for a block ml∗ i in file with abstract Fl∗ of PO that has not been queried. With hash responses H1 (Fl∗ ||i) = g xl∗ i /uml∗ i and H2 (Fl∗ ||hl∗ ) = el∗ , along with rl∗ from Aux-query, there is σl∗ i = H1 (Fl∗ ||i)

rl∗

1

(uml∗ i ) α+el∗ =

g rl∗ xl∗ i ml∗ i α+e1 ∗ l . u u rl∗ ml∗ i

Hence, we figure out the solution to the k-CEIDH problem: 1

1

u α+el∗ = (σl∗ i url∗ ml∗ i /g rl∗ xl∗ i ) ml∗ i . This completes the simulation and solution. The correctness is shown below. The simulation is indistinguishable from the real attack because randomnesses including b in setup, el , el∗ ,xli , xl∗ i , rl in hash responses, xl , rl∗ in aux queries and l∗ in signature generation are randomly chosen. For different files, such randomnesses vary in H1 (Fl ||i) with 1 − qp1 , H2 (Fl ||hl ) with 1 − qp21 and H2 (α||Fl ) with 1 − qp22 , making the probability of successful simulation and useqs  ful attack be (1 − qp1 )(1 − qp21 )(1 − qp22 ) . Suppose A (t, qs , q1 , q21 , q22 , qa , )breaks theprotocol. With A’s ability, B solves the k-CEIDH problem with probqs ability of (1 − qp1 )(1 − qp21 )(1 − qp22 ) ≈ . The time cost of simulation is TS = O (q1 + q21 + q22 + qs + qa ). Therefore, B solves the k-CEIDH problem with (t + TS , ). Second, we show that N O is secure against collusion attack launched by the CS and P O. We assume that there is a PPT adversary A who can break the transfer security of N O. Then, we construct a simulator B to break the k-CEIDH assumption by interacting with A. On input a k-CEIDH problem  1 1 1 instance e1 , e2 , · · · , ek , g, g b , g β+e1 , g β+e2 , · · · , g β+ek , B controls random oracles, runs A and works as follows. – Setup. Since the CS and P O collude, B must be able to provide A with the secret key α of P O. Set u = g b , where b ∈R Zp . – H-query. Hash queries are made similar to the above, only with the difference 1 , in that for query (β, Fl ), B chooses rl ∈R Zp , sets rl = H2 (β||Fl ) = rl + β+e l  and records (l, β||Fl , rl , H2 (β||Fl ), A) in TH22 . – S-query. Signature queries for tags of N O are similar to that in the proof of Theorem 2, with the difference that the query is for tag belonging to N O. Hence, the tag response for mli is:  σli =

g xli umli

1 rl + β+e

l

mli

u β+el =



1

g xli rl g β+el  umli rl

xli

.

626

J. Shen et al.

– Aux-query. Queries for auxiliaries   to regenerate tags are made in this phase. 1 1  xl , , aux = − − x , v = u From NO, there is Aux = rl = rl + β+e l l l β+el l where rl is from H-query and xl is chosen by B. PO computes rlP = H2 (α||Fl ), rlP Pα P hP , hP l =g 1l = hl , and picks xl ∈R Zp , then there are: 1 xP l . − xP l + auxl , Vl = vl u N α + el – Forgery. Eventually, A returns a valid tag for a block ml∗ i in file with abstract Fl∗ belonging to NO that has not been queried. With H1 (Fl∗ ||i) = g xl∗ i /uml∗ i , H2 (Fl∗ ||hl∗ ) = el∗ , and rl∗ , B computes R = rlP − rl , auxP l =

σl∗ i = H1 (Fl∗ ||i)

rl∗

1

(uml∗ i ) β+el∗ =

Then, the solution to k-CEIDH problem is:

g rl∗ xl∗ i ml∗ i β+e1 ∗ l . u u rl∗ ml∗ i

1

1

u β+el∗ = (σl∗ i url∗ ml∗ i /g rl∗ xl∗ i ) ml∗ i . This completes the simulation and solution. The correctness is analyzed similar to that in the first part, which is omitted here. Consequently, B solves the k-CEIDH problem with (t + TS , ). c

Theorem 5. The proposed protocol is (fc , 1 − (1 − fc ) )-detectable, where fc is the ratio of corrupted data blocks and c is the number of challenged blocks. Proof. Given a fraction fc of corrupted data after ownership transfer, the probc ability of corruption detectability is no less than 1 − (1 − fc ) . Table 1. Property and communication cost comparisons Protocols

Protocol [25] Our protocol

Outsourced computation of ownership transfer Yes

Yes

Public verifiability

No

Yes

Thorough transferability

No

Yes

Unlimited times of ownership transfer

No

Yes

Data auditing

2|q| + log2 n 4|q|

c(log2 n + |p|) 4|p|

Challenges Responses

Ownership transfer The transferred side n|q| The transferring side |q| The computing server side (n + 1)|q|

3|p| 3|p| 0

It is obvious that a block is regarded to be intact iff it is chosen from the c fraction 1 − fc of the whole. (1 − fc ) is the probability that all the challenged c blocks are considered to be well-preserved, when the protocol is not detectable. Since the protocol is detectable, at least one of them should be in the fc part, c ruling out the case with the probability of (1 − fc ) . Hence, the probability that c our protocol is detectable is at least 1 − (1 − fc ) .

Secure Cloud Auditing with Efficient Ownership Transfer

6

627

Performance Analysis

We show the efficiency of our design through numerical analysis and experimental results compared with the state of the art. Following that, we discuss the performance of ownership transfer among multiple cloud clients. 6.1

Efficiency Analysis

Simulation experiments are conducted on the Ubuntu 12.04.5 (1 GB memory) VMware 10.0 in a laptop running Windows 10 with Intel(R) Core(TM) i5-8250U @ 1.6 GHZ and 8 GB RAM. The codes are written using C programming language with GMP Library (GMP-6.1.2) and PBC Library (pbc-0.5.14), of which the data results are used to draw figures in MATLAB 2019. The property and communication cost comparisons are presented in Table 1, where n is the number of data blocks. Let the number of sectors in [25] set to be 1, and q equals to p appearing in our protocol, which is the prime order of G. According to Table 1, both protocols enable computation of transfer outsourced, while we provide thorough transferability rather than only the partial one because the secret key of the old owner no longer remains in tags for new owners. In addition, we achieve public verifiability of data and unlimited times of ownership transfer among cloud clients. As for the communication overhead, though [25] costs less in data auditing, it trades for more computations in proof generation in turn. In ownership transfer, the communication cost in our protocol is with a smaller and constant size. Table 2. Computational overhead comparison Protocols

Protocol [25]

Our protocol

Tag generation

(2n + 2)E + 2nM

(2n + 3)E + nM + I

Proof generation

cE + (4c − 1)M

cE + (2c − 1)M

Proof verification

4P + (c + 1)E + 4cM 3P + (c + 3)E + (c + 1)M

Aux generation on transferred side

nE + nM + nI

Aux generation on transferring side E + I Tag recomputation on server side

E + 2nM

E+I E+M +I 4nE + 3nM

The computational overhead analysis is shown in Table 2, where c is the number of challenged blocks. For the sake of simplicity, we denote modular exponentiation as E, point multiplication as M , bilinear pairing as P and inverse as I. We ignore hash functions in comparison, for the fact that its cost is negligible when compared with that of the pre-mentioned operations. According to Table 2, the computational cost of [25] in data auditing concerning the first three entries is larger than ours. The experimental results are shown in Fig. 1, where n ranges from 500 to 5000 in Fig. 1(a), and n is set to be 1000 in Fig. 1(b)(c). Costs in ownership transfer are described in Fig. 2 with n from 500 to 5000. According

628

J. Shen et al.

100

1.4

60

40

20

4 Our protocol Protocol [25]

1.2

time for proof verification (s)

time for proof generation (s)

time for tag generation (s)

Our protocol Protocol [25]

80

1 0.8 0.6 0.4 0.2

0

3 2.5 2 1.5 1 0.5

0 0

1000

2000

3000

4000

5000

Our protocol Protocol [25]

3.5

0 0

50

number of data blocks

100

150

200

250

300

0

50

number of challenged blocks

(a) tag generation

100

150

200

250

300

number of challenged blocks

(b) proof generation

(c) proof verification

Fig. 1. Time cost of data outsourcing and auditing

70 Our protocol Protocol [25]

60 50 40 30 20 10 0 0

1000

2000

3000

4000

10-3

6

Our protocol Protocol [25]

5 4 3 2 1 0

5000

0

1000

number of data blocks

3000

4000

5000

70 Our protocol Protocol [25]

60 50 40 30 20 10 0 0

1000

2000

3000

4000

5000

number of data blocks

number of data blocks



(a) Aux generation

(b) Aux generation

(c) recomputation on server

100

70

total time for ownership transfer (s)

time for computation couldn't be outsourced (s)

2000

time for transfer on computing server side (s)

time for transfer on the transferring side (s)

time for transfer on the transferred side (s)

to Fig. 2, though computation on the transferring side is more expensive in our protocol, the cost in [25] that could not be outsourced to the computing server is significantly higher than that of a constant size in ours. Besides, the total cost of ownership transfer is lower in our design. Note that, the more computational overhead concerning the last entry in our protocol seems like nothing, since the server is equipped with professional computing power.

Our protocol Protocol [25]

60 50 40 30 20 10 0

Our protocol Protocol [25]

80

60

40

20

0 0

1000

2000

3000

4000

5000

number of data blocks

(d) computations could not be outsourced

0

1000

2000

3000

4000

5000

number of data blocks

(e) the total cost for transfer

Fig. 2. Time cost of data ownership transfer

6.2

Ownership Transfer Discussion

Recall that in the core part of Sect. 4, we have introduced the ownership transfer of cloud data from PO to NO. Due to the well-designed novel data tags in our

Secure Cloud Auditing with Efficient Ownership Transfer

629

protocol, the regenerated tag for NO is with the same structure with that for PO, except that the secret key embedded in now belongs to NO rather than PO. With such a tag structure, the transferred data looks just like the data originally possessed by NO and can be transferred continuously to other cloud clients. More generally speaking, our protocol supports ownership transfer of any piece of data among multiple clients for numerous times.

7

Conclusion

We propose a secure auditing protocol with efficient ownership transfer for cloud data, satisfying verifiability and transferability simultaneously. Compared with the state of the art, the communication cost produced by ownership transfer is constant rather than dependent of the number of tags, and the computational overhead during transfer on both transfer parties is constant as well. In addition, the protocol satisfies properties of correctness, soundness, unforgeability and detectability, which also protects the untransferred data from collusion attacks when ownership transfer occurs. Acknowledgments. This work is supported by the National Nature Science Foundation of China (No. 61960206014) and the National Cryptography Development Fund (No. MMJJ20180110).

References 1. Ateniese, G., et al.: Provable data possession at untrusted stores. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 598–609. ACM (2007) 2. Ateniese, G., Di Pietro, R., Mancini, L.V., Tsudik, G.: Scalable and efficient provable data possession. In: Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, pp. 1–10. ACM (2008) 3. Chen, X., Lee, B., Kim, K.: Receipt-free electronic auction schemes using homomorphic encryption. In: Lim, J.-I., Lee, D.-H. (eds.) ICISC 2003. LNCS, vol. 2971, pp. 259–273. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-246916 20 4. Chen, X., Jin, L., Jian, W., Ma, J., Lou, W.: Verifiable computation over large database with incremental updates. In: Proceedings of European Symposium on Research in Computer Security, pp. 148–162 (2014) 5. Chen, X., Li, J., Huang, X., Ma, J., Lou, W.: New publicly verifiable databases with efficient updates. IEEE Trans. Dependable Secure Comput. 12(5), 546–556 (2014) 6. Dillon, T., Wu, C., Chang, E.: Cloud computing: issues and challenges. In: Proceedings of the 24th IEEE International Conference on Advanced Information Networking and Applications, pp. 27–33. IEEE (2010) 7. Erway, C., K¨ up¸cu ¨, A., Papamanthou, C., Tamassia, R.: Dynamic provable data possession. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 213–222. ACM (2009)

630

J. Shen et al.

8. Fu, A., Yu, S., Zhang, Y., Wang, H., Huang, C.: NPP: a new privacy-aware public auditing scheme for cloud data sharing with group users. IEEE Trans. Big Data (2017) 9. Jin, H., Jiang, H., Zhou, K.: Dynamic and public auditing with fair arbitration for cloud data. IEEE Trans. Cloud Comput. 6(3), 680–693 (2016) 10. Juels, A., Kaliski Jr., B.S.: PORs: proofs of retrievability for large files. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 584–597. ACM (2007) 11. Kamara, S., Lauter, K.: Cryptographic cloud storage. In: Sion, R., et al. (eds.) FC 2010. LNCS, vol. 6054, pp. 136–149. Springer, Heidelberg (2010). https://doi.org/ 10.1007/978-3-642-14992-4 13 12. Li, J., Tan, X., Chen, X., Wong, D.S., Xhafa, F.: OPoR: enabling proof of retrievability in cloud computing with resource-constrained devices. IEEE Trans. Cloud Comput. 3(2), 195–205 (2014) 13. Li, J., Li, J., Xie, D., Cai, Z.: Secure auditing and deduplicating data in cloud. IEEE Trans. Comput. 65(8), 2386–2396 (2015) 14. Li, Y., Yu, Y., Min, G., Susilo, W., Ni, J., Choo, K.K.R.: Fuzzy identity-based data integrity auditing for reliable cloud storage systems. IEEE Trans. Dependable Secure Comput. 16(1), 72–83 (2017) 15. Luo, Y., Xu, M., Fu, S., Wang, D., Deng, J.: Efficient integrity auditing for shared data in the cloud with secure user revocation. In: Proceedings of 2015 IEEE Trustcom/BigDataSE/ISPA, vol. 1, pp. 434–442. IEEE (2015) 16. Mell, P., Grance, T., et al.: The NIST definition of cloud computing (2011) 17. Ramachandra, G., Iftikhar, M., Khan, F.A.: A comprehensive survey on security in cloud computing. Procedia Comput. Sci. 110, 465–472 (2017) 18. Rittinghouse, J.W., Ransome, J.F.: Cloud Computing: Implementation, Management, and Security. CRC Press, Boca Raton (2017) 19. Shacham, H., Waters, B.: Compact proofs of retrievability. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 90–107. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89255-7 7 20. Shen, J., Shen, J., Chen, X., Huang, X., Susilo, W.: An efficient public auditing protocol with novel dynamic structure for cloud data. IEEE Trans. Inf. Forensics Secur. 12(10), 2402–2415 (2017) 21. Shen, J., Zhou, T., Chen, X., Li, J., Susilo, W.: Anonymous and traceable group data sharing in cloud computing. IEEE Trans. Inf. Forensics Secur. 13(4), 912–925 (2017) 22. Tian, H., et al.: Dynamic-hash-table based public auditing for secure cloud storage. IEEE Trans. Serv. Comput. 10(5), 701–714 (2015) 23. Wang, B., Li, B., Li, H.: Panda: public auditing for shared data with efficient user revocation in the cloud. IEEE Trans. Serv. Comput. 8(1), 92–106 (2013) 24. Wang, C., Chow, S.S., Wang, Q., Ren, K., Lou, W.: Privacy-preserving public auditing for secure cloud storage. IEEE Trans. Comput. 62(2), 362–375 (2011) 25. Wang, H., He, D., Fu, A., Li, Q., Wang, Q.: Provable data possession with outsourced data transfer. IEEE Trans. Serv. Comput. (2019) 26. Wang, H., He, D., Tang, S.: Identity-based proxy-oriented data uploading and remote data integrity checking in public cloud. IEEE Trans. Inf. Forensics Secur. 11(6), 1165–1176 (2016) 27. Wang, J., Chen, X., Li, J., Kluczniak, K., Kutylowski, M.: TrDup: enhancing secure data deduplication with user traceability in cloud computing. Int. J. Web Grid Serv. 13(3), 270–289 (2017)

Secure Cloud Auditing with Efficient Ownership Transfer

631

28. Wang, Q., Wang, C., Ren, K., Lou, W., Li, J.: Enabling public auditability and data dynamics for storage security in cloud computing. IEEE Trans. Parallel Distrib. Syst. 22(5), 847–859 (2010) 29. Wang, Y., Wu, Q., Bo, Q., Shi, W., Deng, R.H., Hu, J.: Identity-based data outsourcing with comprehensive auditing in clouds. IEEE Trans. Inf. Forensics Secur. 12(4), 940–952 (2017) 30. Worku, S.G., Xu, C., Zhao, J., He, X.: Secure and efficient privacy-preserving public auditing scheme for cloud storage. Comput. Electr. Eng. 40(5), 1703–1713 (2014) 31. Yang, G., Yu, J., Shen, W., Su, Q., Fu, Z., Hao, R.: Enabling public auditing for shared data in cloud storage supporting identity privacy and traceability. J. Syst. Softw. 113, 130–139 (2016) 32. Yu, J., Ren, K., Wang, C.: Enabling cloud storage auditing with verifiable outsourcing of key updates. IEEE Trans. Inf. Forensics Secur. 11(6), 1362–1375 (2016) 33. Yu, J., Ren, K., Wang, C., Varadharajan, V.: Enabling cloud storage auditing with key-exposure resistance. IEEE Trans. Inf. Forensics Secur. 10(6), 1167–1179 (2015) 34. Yu, J., Wang, H.: Strong key-exposure resilient auditing for secure cloud storage. IEEE Trans. Inf. Forensics Secur. 12(8), 1931–1940 (2017) 35. Yuan, H., Chen, X., Jiang, T., Zhang, X., Yan, Z., Xiang, Y.: DedupDUM: secure and scalable data deduplication with dynamic user management. Inf. Sci. 456, 159–173 (2018) 36. Zhang, Y., Yu, J., Hao, R., Wang, C., Ren, K.: Enabling efficient user revocation in identity-based cloud storage auditing for shared big data. IEEE Trans. Dependable Secure Comput. 17(3), 608–619 (2020)

Privacy

Encrypt-to-Self: Securely Outsourcing Storage Jeroen Pijnenburg1(B) and Bertram Poettering2 1

Royal Holloway, University of London, Egham, UK [email protected] 2 IBM Research – Zurich, Rüschlikon, Switzerland [email protected]

Abstract. We put forward a symmetric encryption primitive tailored towards a specific application: outsourced storage. The setting assumes a memory-bounded computing device that inflates the amount of volatile or permanent memory available to it by letting other (untrusted) devices hold encryptions of information that they return on request. For instance, web servers typically hold for each of the client connections they manage a multitude of data, ranging from user preferences to technical information like database credentials. If the amount of data per session is considerable, busy servers sooner or later run out of memory. One admissible solution to this is to let the server encrypt the session data to itself and to let the client store the ciphertext, with the agreement that the client reproduce the ciphertext in each subsequent request (e.g., via a cookie) so that the session data can be recovered when required. In this article we develop the cryptographic mechanism that should be used to achieve confidential and authentic data storage in the encryptto-self setting, i.e., where encryptor and decryptor coincide and constitute the only entity holding keys. We argue that standard authenticated encryption represents only a suboptimal solution for preserving confidentiality, as much as message authentication codes are suboptimal for preserving authenticity. The crucial observation is that such schemes instantaneously give up on all security promises in the moment the key is compromised. In contrast, data protected with our new primitive remains fully integrity protected and unmalleable. In the course of this paper we develop a formal model for encrypt-to-self systems, show that it solves the outsourced storage problem, propose surprisingly efficient provably secure constructions, and report on our implementations.

1

Introduction

We explore techniques that enable a computing device to securely outsource the storage of data. We start with motivating this area of research by describing three application scenarios where outsourcing storage might prove crucial. The full version of this article is available at https://ia.cr/2020/847. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 635–654, 2020. https://doi.org/10.1007/978-3-030-58951-6_31

636

J. Pijnenburg and B. Poettering

Web Server. We come back to the example considered in the abstract, giving more details. While it is difficult to make general statements about the setup of a web server back-end, it is fair to say that the processing of HTTP requests routinely also includes extracting a session identifier from the HTTP header and fetching basic session-related information (e.g., the user’s password, the date and time of the last login, the number of failed login attempts, but also other kinds of data not related to security) from a possibly remote SQL database. To avoid the inherent bottleneck induced by the transmission and processing of the database query, such data can be cached on the web server, the limits of this depending only on the amount of available working memory (RAM). For some types of web applications and a large number of web sessions served simultaneously, these memory-imposed limits might represent a serious restriction to efficiency. This article scouts techniques that allow the web server to securely outsource the storage of session information to the (untrusted) web clients. Hardware Security Module. An HSM is a computing device that performs cryptographic and other security-related operations on behalf of the owning user. While such devices are internally built from off-the-shelf CPUs and memory chips, a key concept of HSMs is that they are specially encapsulated to protect them against physical attacks, including various kinds of side channel analysis. One consequence of this tamper-proof shielding is that the memory capacity of an HSM can never be physically extended—unlike it would be the case for desktop computers—so that the amount of available working memory might constitute a relevant obstacle when the HSM is deployed in applications with requirements that increase over time (e.g., due to a growing user base). This article scouts techniques that allow the HSM to securely outsource the storage of any kind of valuable information to the (untrusted) embedding host system. Smartcard. A smartcard, most prominently recognized in the form of a payment card or a mobile phone security token, is effectively a tiny computing device. While fairly potent configurations exist (with 32-bit CPUs and a couple of 100 KBs of memory), as the costs associated with producing a smartcard scales roughly linearly with the amount of implemented physical memory, in order to be cost effective, mass-produced cards tend to come with only a small amount of memory. This article scouts techniques that allow smartcards to securely outsource the storage of valuable information to the infrastructure they connect to, e.g., a banking or mobile phone backbone, or a smartphone. Trusted Platform Module. A TPM is a discrete security chip that is embedded into virtually all laptops and desktop PCs produced in the past decade. A TPM supports its host system by offering trusted cryptographic services and is typically relied upon by boot loaders and operating systems. TPMs are located conceptually between HSMs and smartcards, and as much as these they benefit from a secure option to outsource storage. Outsourced Storage Based on Symmetric Cryptography. If a computing device has access to some kind of external storage facility (a memory chip wired

Encrypt-to-Self: Securely Outsourcing Storage

637

to it, a connected hard drive, cloud storage, etc.), then, intuitively, it can virtually extend the amount of memory available to it by outsourcing storage, i.e., by serializing data objects and communicating them to the storage facility which will reproduce them on request. In this article we focus on the case where neither the external storage facility nor the connection to it is considered trustworthy. More concretely, we assume that all infrastructure outside of the computing device itself is under control of an adversary that aims at reading or changing the data that is to be externally stored.1 As a first approximation one might conclude that standard tools from the domain of symmetric encryption are sufficient to achieve security in this setting. Consider for instance the following approach based on authenticated encryption (AE, [13]): The computing device samples a fresh symmetric key; whenever it wants to store internal data on the outsourced storage, it encrypts and authenticates the data by invoking the AE encryption algorithm with its key and hands the resulting ciphertext over to the storage facility; to retrieve the data, it requests a copy of the ciphertext, and decrypts and verifies it. While this simple solution requires further tweaking to thwart replay attacks, as long as the AE key remains private it can be used to protect confidentiality and integrity as expected. Our Contribution: Secure Outsourced Storage w/Key Leakage. While we confirm that standard cryptographic methods will securely solve the storage outsourcing problem if the used key material remains private, we argue that satisfactory solutions should go a step further by providing as much security as possible even if the latter assumption (that keys remain private) is not met. Indeed, different attacks against practical systems that lead to partial or full memory leakage continue to regularly emerge (including different types of side channel analysis against embedded systems, cold-boot attacks against memory chips, Meltdown/Spectre-like attacks against modern CPUs, etc.), and it is commonly understood that the corruption model considered for cryptographic primitives should always be as strong as possible and affordable. For two-party symmetric encryption (e.g., AE) this strongest model necessarily excludes any type of user corruption2 as the keys of both parties are identical: Once any party is corrupted, any past or future ciphertext can be decrypted and ciphertexts can be forged for any message, i.e., no form of confidentiality or authenticity remains. We point out, however, that for outsourced storage a stronger corruption model is both feasible and preferable. Clearly, like in the AE case, if the adversary obtains a copy of the used key material then all confidentiality guarantees are lost (the adversary can decrypt what the device can decrypt, that is, everything), but a 1

2

Certainly, the storage device can always decide to “fail” by not returning any data previously stored into it, leading to an attack on the availability of the computing device. We hence consider environments where this is either not a problem or where such an attack cannot be prevented anyway (independently of the storage technique). Note that this assumption holds for our three motivating scenarios. We use the terms ‘key leakage’, ‘user corruption’, and ‘state corruption’ synonymously.

638

J. Pijnenburg and B. Poettering

similar reasoning with respect to integrity protection cannot be made. To see this, consider the encrypt-then-hash (EtH) solution where the computing device encrypts the outsourced data as described above, but in addition to having the ciphertext stored externally it internally registers a hash of it (computed with, say, SHA256). When the device decides to recover externally stored data, it requests a copy of the ciphertext, recomputes its hash value, and decrypts only if the hash value is consistent with the internally registered value. Note that even if the device is corrupted and its keys became public, all successfully decrypted ciphertexts are necessarily authentic. The example just given shows that while no solution for secure storage outsourcing can do much about protecting data confidentiality against key leakage attacks, solutions can fully protect the integrity of the stored data in any case. Naive AE-based schemes do not provide this type of security, and the contribution of our work is to fill this gap and to explore corresponding constructions. Precisely, this article provides the following: (1) We identify the new encryptto-self (ETS) primitive as the right cryptographic tool to solve the outsourced storage problem and formalize its syntax and security properties. (2) We formalize notions more directly related to the outsourced storage problem and provably confirm that secure solutions based on ETS are indeed immediate. (3) We design provably secure constructions of ETS from established cryptographic primitives.3 (4) We develop open-source implementations of our constructions that are optimized with respect to security and efficiency. Related Work. While we are not aware of any former systematic treatment of the encrypt-to-self (ETS) primitive, a number of similar primitives or ad hoc constructions partially overlap with our results. We discuss these in the following, but emphasize that none of them provides general solutions to the ETS problem. Memory Encryption in Modern CPUs. Recent desktop and server CPUs offer dedicated infrastructure for memory encryption, with the main applications in cloud computing and Trusted Execution Environments (TEEs). Prominent TEE examples include Intel SGX and ARM TrustZone in which every memory access of the processes that are executed within a TEE (aka ‘enclave’) is conducted through a memory encryption engine (MEE). This effectively implements outsourced data storage, but with quite different access rules and patterns than in the ETS case. While we consider the (stateless) encryption of a message to a ciphertext and then a decryption of a ciphertext back to a message, MEEs are stateful systems that consider the protected physical memory area a single ciphertext that is constantly locally modified with each write operation [8]. Password Managers. A password manager can be seen as a database that stores security credentials in an encrypted form and requires e.g., a master password to be unlocked. Also this can be seen as an ETS instance, but the cryp3

The above encrypt-then-hash (EtH) solution is secure in our models but requires two passes over the data. Our solutions are more efficient, getting along with just one pass.

Encrypt-to-Self: Securely Outsourcing Storage

639

tographic design of password managers has a different focus than general outsourced storage. More concretely, the central challenge solved by good password managers is the password-based key derivation, which typically involves invoking a time-expensive derivation function like PBKDF2 [9] or a memory-hard derivation function like ARGON2 [5]. Password-based key derivation is not considered in our treatment of the ETS primitive (we instead assume uniform keys). Encryptment. A symmetric encryption option that recently emerged as a proposal to protect messages in instant messaging is Encryptment [7]. Its features go beyond regular authenticated encryption in that the tags contained in ciphertexts act as (cryptographically strongly binding) commitments to the encoded messages. This committing feature was deemed helpful for the public resolution of cyber harassment cases by allowing affected parties to appeal to a judging authority by opening their ciphertexts by releasing their keys. On first sight this has nothing to do with our ETS setting (in which only one party holds a key, this key would never be deliberately shared, and a necessity of provably releasing message contents to anybody else is not considered). Interestingly, however, our constructions of ETS are very similar to those of [7]. The intuitive reason for this is that the ETS setting requires that ciphertexts remain unforgeable under key leakage, which somewhat aligns with the committing property of encryptment that is required to survive disclosing keys to a judge. Ultimately, however, the applications and thus security models of ETS and encryptment differ, and our constructions are actually more efficient than those in [7].4 Technical Approach. In addition to formalizing the security of the encryptto-self (ETS) primitive, in the course of this article we also propose efficient provably-secure constructions from standardized building blocks. As discussed above, the authenticity promises of ETS shall withstand adversaries that have knowledge of the key material. In this setting one cannot hope that standard secret-key authentication building blocks like MACs or universal hash functions will be of help, as generically they lose all security when the key is leaked. We instead employ, as they manifest unkeyed authentication primitives, cryptographic hash functions like SHA256. A first candidate construction, already hinted at above, would be the encrypt-then-hash (EtH) approach where the message is first encrypted (using any secret key scheme, e.g., AES-CTR) and the ciphertext is then hashed. Our constructions are more efficient than this by exploiting the structure of Merkle–Damgård (MD) hash functions and dual-use leveraging on the properties of their inner building block: the compression function (CF). Intuitively, for authentication we build on the collision resistance of the CF, and for confidentiality we build on a PRF-like property of the CF. More precisely, our message schedule for the CF is such that each invocation provides 4

This is the case for at least two reasons: (1) The ETS primitive does not need to be committing to the key, which is the case for encryptment. (2) Our message padding is more sophisticated than that of [7] and does not require the processing of a length field.

640

J. Pijnenburg and B. Poettering

both confidentiality and integrity for the processed block. This effectively halves the computational costs in comparison to the EtH approach. We believe that a cryptographic analysis is not complete without also implementing the construction under consideration. This is because only implementing a scheme will enforce making conscious decisions about all its details and building blocks, and these decisions may crucially affect the obtained security and efficiency. We thus realized three ready-to-use instances of the ETS primitive, based on the CFs of the top performing hash functions SHA256, SHA512, and BLAKE2. In fact, observations from implementing the schemes led to considerable feedback to the theoretical design which was updated correspondingly. One example for this is connected to memory alignment: Computations on modern CPUs experience noticeable efficiency penalties if memory accesses are not aligned to specific boundaries. Our constructions reflect this at two different levels: at the register level and at the cache level (64 bit alignment for registeroriented operations, and 256 bit alignment for bulk memory transfers5 ).

2

Preliminaries

Notation. All algorithms considered in this article may be randomized. We let N = {0, 1, . . .} and N+ = {1, 2, . . .}. For the Boolean constants True and False we either write T and F, respectively, or 1 and 0, respectively, depending on the context. An alphabet Σ is any finite set of symbols or characters. We denote with Σ n the set of strings of length n and with Σ ≤n the strings of length up to (and including) n. In the practical parts of this article we assume that |Σ| = 256, i.e., that all strings are byte strings. We denote string concatenation  with . If var is a string variable and exp evaluates to a string, we write var ← exp shorthand for var ← var  exp. Further, if exp evaluates to a string, we write var  var  ←n exp to denote splitting exp such that we assign the first n characters from exp to var and assign the remainder to var  . When we do not need the remainder, we write var ←n exp shorthand for var  dummy ←n exp and discard dummy. In pseudocode, if S is a finite set, expression $(S) stands for picking an element of S uniformly at random. Associative arrays implement the ‘dictionary’ data structure: Once the instruction A[·] ← exp initialized all items of array A to the default value exp, with A[idx ] ← exp and var ← A[idx ] individual items indexed by expression idx can be updated or extracted. 2.1

Security Games

Security games are parameterized by an adversary, and consist of a main game body plus zero or more oracle specifications. The execution of a game starts with the main game body and terminates when a ‘Stop with exp’ instruction is reached, where the value of expression exp is taken as the outcome of the game. The adversary can query all oracles specified by the game, in any order 5

The value 256 stems from the size of the cache lines of 1st level cache.

Encrypt-to-Self: Securely Outsourcing Storage

641

and any number of times. If the outcome of a game G is Boolean, we write Pr[G(A)] for the probability that an execution of G with adversary A results in True, where the probability is over the random coins drawn by the game and the adversary. We define macros for specific combinations of game-ending instructions: We write ‘Win’ for ‘Stop with T’ and ‘Lose’ for ‘Stop with F’, and further ‘Reward cond ’ for ‘If cond : Win’, ‘Promise cond ’ for ‘If ¬cond : Win’, ‘Require cond ’ for ‘If ¬cond : Lose’. These macros emphasize the specific semantics of game termination conditions. For instance, a game may terminate with ‘Reward cond ’ in cases where the adversary arranged for a situation— indicated by cond resolving to True—that should be awarded a win (e.g., the crafting of a forgery in an authenticity game). 2.2

Handling of Algorithm Failures

Regarding the algorithms of cryptographic schemes, we assume that any such algorithm can fail. Here, by failure we mean that an algorithm doesn’t generate output according to its syntax specification, but instead outputs some kind of error indicator (e.g., an AE decryption algorithm that rejects an unauthentic ciphertext or a randomized signature algorithm that doesn’t have sufficiently many random bits to its disposal). Instead of encoding this explicitly in syntactical constraints which would clutter the notation, we assume that if an algorithm invokes another algorithm as a subroutine, and the latter fails, then also the former immediately fails.6 We assume the same for game oracles: If an invoked scheme algorithm fails, then the oracle immediately aborts as well. Further, we assume that the adversary learns about this failure, i.e., the oracle will return the error indicator when it aborts. Note that this implies that if a scheme’s algorithms leak vital information through error messages, then the scheme will not be secure in our models. (That is, our models are particularly robust.) We believe that our way to handle errors implicitly rather than explicitly contributes to obtaining definitions with clean and clear semantics. 2.3

Memory Alignment

For n a power of 2, we say an address of computer memory is n-byte aligned if it is a multiple of n bytes. We further say that a piece of data is n-byte aligned if the address of its first byte is n-byte aligned. A modern CPU accesses a single (aligned) word in memory at a time. Therefore, the CPU performs reads and writes to memory most efficiently when the data is aligned. For example, on a 64-bit machine, 8 bytes of data can be read or written with a single memory 6

This approach to handling algorithm failures is taken from [12] and borrows from how modern programming languages handle ‘exceptions’, where any algorithm can raise (or ‘throw’) an exception, and if the caller does not explicitly ‘catch’ it, the caller is terminated as well and the exception is passed on to the next level. See Wikipedia: Exception_handling_syntax for exception handling syntaxes of many different programming languages.

642

J. Pijnenburg and B. Poettering

access if the first byte lies on an 8-byte boundary. However, if the data does not lie within one word in memory, the processor would need to access two memory words, which is considerably less efficient. Our scheme algorithms are designed such that when they need to move around data, they exclusively do this for aligned addresses. In practice, the preferred alignment value depends on the hardware used, so for generality in this article we refer to it abstractly as the memory alignment value mav. (A typical value would be mav = 8.) 2.4

Tweaking the Compression Functions of Hash Functions

The main NIST hash functions of the SHA2 family (FIPS 180-4, [10]) accomplish their task of hashing a message into a short string by strictly following the Merkle–Damgård framework: All inputs to their core building block—the compression function—are either directly taken from the message or from the chaining state. It has been recognized, however, that options to further contextualize or domain-separate the inputs of compression functions can be of advantage. Indeed, compression functions that are designed according to the alternative, more recent HAIFA framework [4] have a number of additional inputs, for instance an explicit salt input, that allow for weaving some extra bits of context information into the bulk hash operations. A concrete example for this is the compression function of the popular BLAKE2 hash function ([2,14], a HAIFA design), which takes as an additional input a Boolean finalization flag that is to be set specifically when processing the very last (padded) block of a hash computation. The idea behind making the last invocation “special” is that this effectively thwarts length extension attacks: While conducting extension attacks against the SHA2 hash functions, where the compression functions do not natively support any such marking mechanism, is quite immediate,7 similar attacks against BLAKE2 are impossible [6]. We note that, generally speaking, an ad hoc way of augmenting the input of a compression function by an additional small number of bits is to XOR predefined constants into the hashing state (e.g., before or while the compression function is executed), with the choice of constants depending on the added bits. For instance, if the finalization flag is set, the BLAKE2 compression function will flip all bits of one of its inputs, but beyond that operate as normal. While textbook SHA2 does not support contextualizing compression function invocations via additional inputs, we observe that NIST, in order to solve an emerging domain-separation problem in the definition of their FIPS 180-4 standard, employed ad hoc modifications of some SHA2 functions that can be seen as (implicitly) retrofitting a one-bit additional input into the compression function. Concretely, the SHA512/t functions [10], that intuitively represent plain SHA512 truncated to 0 < t < 512 bits, are carefully designed such that for any t1 = t2 the functions SHA512/t1 and SHA512/t2 are independent of each 7

For instance, an adversary who doesn’t know a value x but instead the values H(x) and y, can compute H(x  y) by just continuing the iterative MD computation from chain value H(x) on. Note this does not require inverting the compression function.

Encrypt-to-Self: Securely Outsourcing Storage

643

other.8 The separation of the individual SHA512/t versions works as follows [10, Sect. 5.3.6]: First compute the SHA512 hash value of the string "SHA512/xxx" (where placeholder xxx is replaced by the decimal encoding of t), then XOR the byte value 0xa5 (binary: 0b10100101) into every byte of the resulting chain state, then continue with regular SHA512 steps from that state on, truncating the final hash value to t bits. While the XORing step is ad hoc, it arguably represents a fairly robust domain separation method for SHA2. Our constructions of the encrypt-to-self primitive rely on compression functions that are tweaked with a single bit, that is, that support one bit as an additional input. When we implement this based on SHA2 compression functions, we employ precisely the mechanism scouted by NIST: When the additional tweak bit is set, we XOR constant 0xa5 into all state bytes and continue operation as normal. Our BLAKE2 based construction, on the other hand, uses the already existing finalization bit.

3

Foundations of Encrypt-to-Self

The overall goal of this article is to provide a secure solution for outsourced storage. We identified the novel encrypt-to-self (ETS) primitive, which provides one-time secure encryption with authenticity guarantees that hold beyond key compromise, as the right tool to construct outsourced storage.9 In this section we first formalize and study ETS, then formalize outsourced storage, and finally show how the former immediately implies the latter. This allows us to leave the outsourced storage topic aside in the remaining part of the paper and lets us instead fully focus on constructing and implementing ETS. 3.1

Syntax and Security of ETS

ETS consists of an encryption and a decryption algorithm, where the former translates a message to a binding tag and a ciphertext, and the latter recovers the message from the tag-ciphertext pair. For versatility the two operations further support the processing of an associated-data input [13] which has to be identical for a successful decryption. The task of the binding tag is to prevent forgery attacks: A user that holds an authentic copy of the binding tag will never accept any ciphertext they did not generate themselves, even if all their secrets become public. Note that while standard authenticated encryption (AE) does not provide this type of authentication, the encrypt-then-hash construction suggested in Sect. 1 does. In Sect. 4 we provide a considerably more efficient construction that uses a hash function’s compression function as its core building block. Here, we define the generic syntax of ETS and formalize its security requirements. 8 9

In particular, for instance, SHA512/128("a") is not a prefix of SHA512/192("a"). While ETS is novel, note that prior work explored the quite similar Encryptment primitive [7]. Encryptment is stronger than ETS, and less efficient to construct.

644

J. Pijnenburg and B. Poettering

Fig. 1. Games for ETS. For the values ad  , c provided by the adversary we require / M, we encode suppressed messages with ⊥. that ad  ∈ AD, c ∈ C. Assuming ⊥ ∈ For the meaning of instructions Stop with, Lose, Promise, Reward, and Require see Sect. 2.1.

Definition 1. Let AD be an associated data space and let M be a message space. An encrypt-to-self (ETS) scheme for AD and M consists of algorithms enc, dec, a key space K, a binding-tag space Bt, and a ciphertext space C. The encryption algorithm enc takes a key k ∈ K, associated data ad ∈ AD and a message m ∈ M, and returns a binding tag bt ∈ Bt and a ciphertext c ∈ C. The decryption algorithm dec takes a key k ∈ K, a binding tag bt ∈ Bt, associated data ad ∈ AD and a ciphertext c ∈ C, and returns a message m ∈ M. A shortcut notation for this API is K × AD × M → enc → Bt × C

K × Bt × AD × C → dec → M.

Correctness and Security. We require of an ETS scheme that if a message m is processed to a tag-ciphertext pair with associated data ad , and a message m is recovered from this pair using the same associated data ad , then the messages m, m shall be identical. This is formalized via the SAFE game in Fig. 1.10 In particular, observe that if the adversary queries Dec(ad , c) (for the authentic ad and c that it receives in line 02) and the dec procedure produces output m , the game promises that m = m (lines 05,06). Recall from Sect. 2.1 that this means the game stops with output T if m = m. Intuitively, the scheme is safe if we can rely on m = m, that is, if the maximum advantage Advsafe (A) := maxad∈AD,m∈M Pr[SAFE(ad , m, A)] that can be attained by realistic adversaries A is negligible. The scheme is perfectly safe if Advsafe (A) = 0 for all A. We remark that the universal quantification over all pairs (ad , m) makes our advantage definition particularly robust. 10

The SAFETY term borrows from the Distributed Computing community. SAFETY should not be confused with a notion of security. Informally, safety properties require that “bad things” will not happen. (In the case of encryption, it would be a bad thing if the decryption of an encryption yielded the wrong message.) For an initial overview we refer to Wikipedia: Safety_property and for the details to [1].

Encrypt-to-Self: Securely Outsourcing Storage

645

Our security notions demand that the integrity of ciphertexts be protected (INT-CTXT), and that encryptions be indistinguishable in the presence of chosen-ciphertext attacks (IND-CCA). The notions are formalized via the INT and IND0 , IND1 games in Fig. 1, where the latter two depend on some equivalence relation ≡ ⊆ M × M on the message space.11 For consistency, in lines 07, 15, 24 we suppress the message in all games if the adversary queries Dec(ad , c). This is crucial in the INDb games, as otherwise the adversary would trivially learn which message was encrypted, but does not harm in the other games as the adversary already knows m. Recall from Sect. 2.2 that all algorithms can fail, and if they do, then the oracles immediately abort. This property is crucial in the INT game where the dec algorithm must fail for unauthentic input such that the oracle immediately aborts. Otherwise, the game will reward the adversary, that is the game stops with T (line 14). We say that a scheme provides integrity if the maximum advantage Advint (A) := maxad∈AD,m∈M Pr[INT(ad , m, A)] that can be attained by realistic adversaries A is negligible, and that it provides indistinguishability if the same holds for the advantage Advind (A) := maxad∈AD,m0 ,m1 ∈M |Pr[IND1 (ad , m0 , m1 , A)] − Pr[IND0 (ad , m0 , m1 , A)]|. 3.2

Sufficiency of ETS for Outsourced Storage

We define the syntax of an outsourced storage scheme. We model such a scheme as a set of stateful algorithms, where algorithm write is invoked to store data and algorithm read is invoked to retrieve it. We indicate the statefulness of the algorithms by appending the term st to their names, where st is the state variable. Definition 2. Let M be a message space. A storage outsourcing scheme for M consists of algorithms gen, write, read, a state space ST , and a ciphertext space C. The state generation algorithm gen takes no input and outputs an (initial) state st ∈ ST . The storage algorithm write takes a state st ∈ ST and a message m ∈ M, and outputs an (updated) state st ∈ ST and a ciphertext c ∈ C. The retrieval algorithm read takes a state st ∈ ST and a ciphertext c ∈ C, and outputs an updated state st ∈ ST and a message m ∈ M. A shortcut notation for this API is gen → ST

M → writeST → C

C → readST → M .

Correctness and Security. We require of a storage outsourcing scheme that if a message m is processed to a ciphertext, and subsequently a message m is recovered from this ciphertext, then the messages m, m shall be identical. This is formalized via the SAFE game in Fig. 2. Observe boolean flag is (‘in-sync’) tracks 11

We use relation ≡ (in line 18 of INDb ) to deal with certain restrictions that practical ETS schemes may feature. Concretely, our construction does not take effort to hide the length of encrypted messages, implying that indistinguishability is necessarily limited to same-length messages. In our formalization such a technical restriction can be expressed by defining ≡ such that m0 ≡ m1 ⇔ |m0 | = |m1 |.

646

J. Pijnenburg and B. Poettering

whether the attack is active or passive. Initially is = T, i.e., the attack is passive; however, once the adversary requests the reading of a ciphertext that is not the last created one, the game sets is ← F to flag the attack as active (line 11). For passive attacks the game promises that any m returned by the read procedure is the last one that was processed by the write procedure (line 13). Intuitively, the scheme is safe if the maximum advantage Advsafe (A) := Pr[SAFE(A)] that can be attained by realistic adversaries A is negligible. The scheme is perfectly safe if Advsafe (A) = 0 for all A. Our security notions demand that the integrity of ciphertexts be protected (INT-CTXT), and that encryptions be indistinguishable in the presence of chosen-ciphertext attacks (IND-CCA). The notions are formalized via the INT and IND0 , IND1 games in Fig. 2, where the latter two depend on some equivalence relation ≡ ⊆ M × M on the message space (see also Footnote 13). Recall from Sect. 2.2 that all algorithms can fail, and if they do, the oracles immediately abort. This property is crucial in the INT game where the read algorithm must fail for unauthentic input such that the adversary is not rewarded in the subsequent line in the Read oracle. For consistency we suppress the message in the Read oracle for passive attacks in all games if the adversary queries Dec(ad , c). This is crucial in the INDb games, as otherwise the adversary would trivially learn which message was encrypted, but does not harm in the other games as the adversary already knows m for passive attacks. Furthermore, we remark the adversary is only allowed to query the Corrupt oracle if M contains at most 1 message, i.e., the ChWrite oracle was queried for m0 = m1 . Otherwise, the adversary would be able to run the read procedure and trivially learn m. We say that a scheme provides integrity if the maximum advantage Advint (A) := Pr[INT(A)] that can be attained by realistic adversaries A is negligible, and that it provides indistinguishability if the same holds for the advantage Advind (A) := |Pr[IND1 (A)] − Pr[IND0 (A)]|. Construction from ETS. Constructing secure outsourced storage from ETS is immediate: The write procedure samples a uniformly random key and runs the enc procedure of ETS to obtain a binding tag and ciphertext. It stores the binding tag (and key) in the state and returns the ciphertext. The read procedure gets the key and binding tag from the state, runs the dec procedure of ETS and returns the message. The details of this construction are in Fig. 3. The security argument is obvious.

4

Construction of Encrypt-to-Self

We mentioned in Sect. 1 that a generic construction of ETS can be realized by combining standard symmetric encryption with a cryptographic hash function: one encrypts the message and computes the binding tag as the hash of the ciphertext. Here we provide a more efficient construction that builds on the compression function of a Merkle–Damgård hash function. To be more precise, our construction uses a tweakable compression function with tweak space T =

Encrypt-to-Self: Securely Outsourcing Storage

647

Fig. 2. Games for outsourced storage. For all values m, m0 , m1 , c provided by the adver/ M, we encode supsary we require that m, m0 , m1 ∈ M and c ∈ C. Assuming ⊥ ∈ pressed messages with ⊥. Boolean flag is (‘in-sync’) tracks whether the attack is active or passive. For the meaning of instructions Stop with, Lose, Promise, and Require see Sect. 2.1.

{0, 1}, i.e., the domain of the compression function is extended by one bit (see Sect. 2.4). We provide a general definition below. Definition 3. For Σ an alphabet, c, d ∈ N+ with c ≤ d, and a tweak space T , we define a tweakable compression function to be a function F : Σ d × T × Σ c → Σ c that takes as input a block B ∈ Σ d from the data domain, a tweak t ∈ T from the tweak space, and a string C ∈ Σ c from the chain domain, and outputs a string C  ∈ Σ c in the chain domain. We will write Ft (B, C) as shorthand notation for F (B, t, C). For practical tweakable compression functions the memory alignment value mav (see Sect. 2.3) will divide both c and d. When constructing an ETS scheme from F , because the compression function only takes fixed-size input, we need to map the (ad , m) input to a series of block–tweak pairs (B, t). We will refer to this mapping as the input encoding. We take a modular approach by fixing the encoding independently of the encryption engine, and detail the former in Sect. 4.1 and the latter in Sect. 4.2. Together they form an efficient construction of ETS. We first convey a rough overview of our ETS construction. In Fig. 4 we consider an example with block size d double the chaining value size c. We assume

648

J. Pijnenburg and B. Poettering

Fig. 3. Construction for outsourced storage from ETS. If in line 07 the condition is not met, the read algorithm aborts with some error indicator. Recall from Sect. 2.2 that the read algorithm also aborts if the dec invocation in line 09 fails.

that key k is padded to size d. The first block B1 only contains associated data and we XOR B1 with the key k before we feed it into the compression function. From the second block, we start processing message data. We fill the first half of the block with message data m1 and the second half with associated data ad 3 , and XOR with the key. We also XOR m1 with the current chaining value C1 , to generate a partial ciphertext ct 1 . The same happens in the third block and we append ct 2 to the ciphertext. If there is associated data left after processing all message data we can load the entire block with associated data, which occurs in the fourth block. Note, we no longer need to XOR the key into the block after we have processed all message data, because at this point the input to the compression function will already be independent of the message m. After processing all blocks, we XOR an offset ω ∈ {ω0 , ω1 } with the chaining value, where ω0 , ω1 are two distinct constants. The binding tag will be (a truncation of) the last chaining value C ∗ . Therefore, it will be crucial to fix ω0 , ω1 such that they remain distinct after truncation. Note that the task of the encoding is not only to partition ad and m into blocks B1 , B2 , . . . as described, but also to derive tweak values t1 , t2 , . . . and the choice of the final offset ω in such a way that the overall encoding is injective. 4.1

Message Block Encoding

We turn to the technical component of our ETS construction that encodes the (ad , m) input into a series of output pairs (B, t) and the final offset value ω. For

Fig. 4. Example for enc(k, ad , m) where d = 2c and ad = ad 1  . . .  ad 6 and m = m1  m2 with |ad | = 6c and |m| = 2c. For clarity we have made the blocks Bi , as they are output by the encoding function, explicit. Inspiration for this figure is drawn from https://www.iacr.org/authors/tikz/.

Encrypt-to-Self: Securely Outsourcing Storage

649

authenticity we require that the encoding is injective. For efficiency we require that the encoding is online (i.e., the input is read only once, left-to-right, and with small state), that the number of output pairs is as small as possible, and that the encoding preserves memory alignment (see Sect. 2.3). Syntactically, for the outputs we require that all B ∈ Σ d , all t ∈ T , and ω ∈ Ω, where quantities c, d are those of the employed compression function, T = {0, 1}, and Ω ⊆ Σ c is any two-element set. (Note that |T | = 2 allows us to use the tweaking approach from Sect. 2.4; further, in our implementations we use Ω = {ω0 , ω1 } where ω0 = 0x00c and ω1 = 0xa5c .) Overall, the task we are facing is the following: Task. Assume |Σ| = 256 and AD = M = Σ ∗ and T = {0, 1} and |Ω| = 2. For c, d ∈ N+ , c < d, find an injective encoding function encode : AD × M → (Σ d ×T )∗ ×Ω that takes as input two finite strings and outputs a finite sequence of pairs (B, t) ∈ Σ d × T and an offset ω ∈ Ω. A detailed specification of our encoding (and decoding) function can be found in Fig. 6, but we present it here in text. Our construction does not use the decoding function, but we provide it anyway to show that the encoding function is indeed injective. Roughly, we encode as follows. We fill the first block with ad and for any subsequent block we load the message in the first part of the block and ad in the second part of the block. When we have processed all the message data, we load the full block with ad again. Clearly, we need to pad ad if it runs out before we have processed all message data. We do this by appending a special termination symbol ∈ Σ to ad and then appending null bytes as needed. Similarly, we need to pad the message if the message length is not a multiple of c. Naturally, one might want to pad the message to a multiple of c. However, this is suboptimal: Consider the scenario where there are d − c + 1 bytes remaining to be processed of associated data and 1 byte of message data. In principle, message and associated data would fit into a single block, but this would not be the case any longer if the message is padded to size c. On the other hand, for efficiency reasons we do not want to misalign all our remaining associated data. If we do not pad at all, when we process the next d bytes of associated data, we can only fit d − 1 bytes in the block and have to put 1 byte into the next block. Hence, we pad m up to a multiple of the memory alignment value mav. Therefore, we prepend a padded message with its length |m|; this will uniquely determine where m stops. We then fill up with null bytes until reaching a multiple of mav. This restricts us to c ≤ 256 bytes such that |m| always can be encoded into a single byte. As far as we are aware, any current practical compression function satisfies this requirement. In Fig. 5, for the artificially small case with c = 1 and d = 2 we provide four examples of what the blocks would look like for different inputs. The top row shows the encoding of 8 bytes of associated data and an empty message. The second row shows the encoding of empty associated data and 3 bytes of message data. The third row shows the encoding of 6 bytes of associated data and 2 bytes of message data. The final row shows the encoding of 3 bytes of associated data and 3 bytes of message data.

650

J. Pijnenburg and B. Poettering

Fig. 5. Example encodings for the case c = 1 and d = 2.

We have two ambiguities remaining. (1) How to tell whether ad was padded or not? Consider the first row in Fig. 5. What distinguishes the case ad = ad 1  . . .  ad 7 from ad = ad 1  . . .  ad 7  ad 8 with ad 8 = ? A similar question applies to the message. (2) How to tell whether a block contains message data or not? Compare e.g., the first row with the third row. This is where the tweaks come into play. First of all, we tweak the first block if and only if the message is empty. This fully separates the authentication-only case from the case where we have message input. Next, if the message is non-empty, we use the tweaks to indicate when we switch from processing message data to ad -only: we tweak when we have consumed all of m, but still have ad left. Note the first block never processes message data, so the earliest block this may tweak is the second block and hence this rule does not interfere with the first rule. Furthermore, observe this rule never tweaks the final block, as by definition of being the final block, we do not have any associated data left to process. Next, we need to distinguish between the cases whether m is padded or not. In fact, as the empty message was already taken care of, we need to do this only if m is at least one byte in size. As in this case the final block does not coincide with the first block, we can exploit that its tweak is still unused; we correspondingly tweak the final block if and only if m is padded. Obviously, this does not interfere with the previous rules. Finally, we need to decide whether ad was padded or not. We do not want to enforce a policy of ‘always pad’, as this could result in an extra block and hence an extra compression function invocation. Instead, we use our offset output. We set the offset ω to ω1 if ad was padded; otherwise we set it to ω0 . This completes our description of the encoding function. The decoding function is a technical exercise carefully unwinding the steps taken in the encoding function, which we perform in Fig. 6. We obtain that for all m ∈ M, ad ∈ AD we have decode(encode(ad , m)) = (ad , m). It immediately follows that our encoding function is injective. For readability we have implemented the core functionality of the encoding in a coroutine called nxt, rather than a subroutine. Instead of generating the entire sequence of (B, t) pairs and returning the result, it will ‘Yield’ one pair and suspend its execution. The next time it is called (e.g., the next step in a for loop), it will resume execution from where it called ‘Yield’,

Encrypt-to-Self: Securely Outsourcing Storage

651

Fig. 6. ETS construction: encoder, decoder, encryptor, and decryptor. (Procedure nxt is a coroutine for encode, enc, and dec, see text.) Using global constants mav, c, d, taglen, and IV.

instead of at the beginning of the function, with all of its state intact. The encode procedure is a simple wrapper that runs the nxt procedure and collects its output, but our authenticated encryption engine described in Sect. 4.2 will call the nxt procedure directly.

652

4.2

J. Pijnenburg and B. Poettering

Encryption Engine

We now turn our focus to the encryption engine. We assume that the associated data and message are present in encoded format, i.e., as a sequence of pairs (B, t), where B ∈ Σ d is a block and t ∈ {0, 1} is a tweak, and additionally an offset ω ∈ {ω0 , ω1 }. To be precise, we will use the nxt procedure that generates the next (B, t) pair on the fly. We specify the encryption and decryption algorithms in Fig. 6 and assume they are provided with a key of length d. As illustrated in Fig. 4, the main idea is to XOR the key with all blocks that are involved with message processing. For the skeleton of the construction, we initialize the chaining value C to IV and loop through the sequence of pairs (B, t) output by the encoding function, each iteration updating the chaining value C ← Ft (B, C). We now describe each iteration of the enc procedure in more detail, where numbers in brackets refer to line numbers in Fig. 6. If the block is empty [66], we are in the final iteration and do not do anything. Otherwise, we check if we are in the first iteration or if we have message data left [68]. In this case we XOR the key into the block [69]. This ensures we start with an unknown input block and that subsequent inputs are statistically independent of the message block. If we only have ad remaining we can use the block directly as input to the compression function. If we have message data left we will encrypt it starting from the second block [70]. To encrypt, we take a chunk of the message, XOR it with the chaining value of equal size and append the result to the ciphertext [71–74]. We only start encrypting from the second iteration as the first chaining value is public. Finally, we call the compression function Ft to update our chaining value [75]. Once we have finished the loop, the last pair (B, t) equals (, ω) by definition. So we XOR the offset ω with the chaining value C and truncate the result to obtain the binding tag [76]. We return the binding tag along with the ciphertext. The dec procedure is similar to the enc procedure but needs to be slightly adapted. Informally, the nxt procedure now outputs a block B = (ct  ad ) [79] instead of B = (m  ad ) [65]. Hence, we XOR with the chaining variable [88, 93] such that the block becomes B = (m  ad ) and the compression function call takes equal input compared to the enc procedure. The case distinction handles the slightly different positioning of ciphertext in the blocks. Finally, there obviously is a check if the computed binding tag is equal to the stored binding tag [96]. In order to prove security, we need further assumptions on our compression function than the standard assumption of preimage resistance and collision resistance. For example, we need F to be difference unpredictable. Roughly, this notion says it is hard to find a pair (x, y) such that F (x) = F (y) ⊕ z for a given difference z. Moreover, we truncate the binding tag, so actually it should be hard to find a tuple such that this equation holds for the first |bt| bits. We note collision resistance of F does not imply collision resistance of a truncated version of F [3]. However, such assumptions can be justified if one views the compression function as a random function. Hence, instead of several ad hoc assumptions, we prove our construction secure directly in the random oracle model.

Encrypt-to-Self: Securely Outsourcing Storage

653

As described in Sect. 2.4 we tweak the SHA2 compression function by modifying the chaining value depending on the tweak. Let F be the tweakable compression function in Fig. 6, we denote with F  the (standard) compression function that will take as input the block and the (modified) chaining value. Let H : Σ d × Σ c → Σ c be a random oracle. We will substitute H for F  in our construction. We remark the BLAKE2b compression function is a tweakable compression function and it can be substituted for a random oracle directly. Hence, we focus on the security proof with a standard compression function, as the case with a tweakable compression function is a simplification of the proof with a standard compression function. In the random oracle model, our ETS construction from Fig. 6 provides integrity and indistinguishability (Theorem 1), assuming sufficiently large tag and key lengths. Here, we only state the theorem and we refer the reader to the full version [11] for the proof. Theorem 1. Let π be the construction given in Fig. 6, H a random oracle replacing the compression function, A an adversary, Advint π (A) the advantage that A has against π in the integrity game of Fig. 1, Advind π (A) the advantage that A has against π in the indistinguishability games of Fig. 1, and q the number of random oracle queries (either directly or indirectly via Dec). We have, −|bt| , Advint π (A) ≤ q · 2

and

5

2 −c + q · 2−|k| + Advint Advind π (A) ≤ q · 2 π (A).

Implementation of Encrypt-to-Self

We implemented three versions of the ETS primitive. Precisely, we implemented the padding scheme and encryption engine from Fig. 6 in C, based on the compression functions of SHA256, SHA512, and BLAKE2 [10,14] and the tweaking approach described in Sect. 2.4. All three functions are particularly good performers in software, where specifically SHA512 and BLAKE2 excel on 64bit platforms. Roughly, we measured that the BLAKE2 version is about 15% faster than the SHA512 version, which in turn is about 15% faster than the SHA256 version.12 We note that our code is written in pure C and in principle would benefit from assembly optimizations. Fortunately, however, all three compression functions are ARX (add–rotate–xor) designs so that the penalty of not hand-optimizing is not too drastic. Further, many freely available assembly implementations of the SHA2 and BLAKE2 core functions exist, e.g., in the 12

We do not provide cycle counts as we measured on outdated hardware (an Intel Core i3-2350M CPU @ 2.30 GHz) and our numbers would not allow for a meaningful efficiency estimation on current CPUs. Note that https://blake2.net/ reports a performance of raw BLAKE2(b) on Skylake of roughly 1 GB/s. Our ETS adds the message encryption step on top of this (a series of memory accesses and XOR operations per message block), so it seems fair to expect an overall performance of more than 800 MB/s.

654

J. Pijnenburg and B. Poettering

OpenSSL package, and we made sure that our API abstractions are compatible with these, allowing for drop in replacements. Acknowledgments. We thank the reviewers of ESORICS’20 for their comments and appreciate the feedback provided by Cristina Onete. The research of Pijnenburg was supported by the EPSRC and the UK government as part of the Centre for Doctoral Training in Cyber Security at Royal Holloway, University of London (EP/P009301/1). The research of Poettering was supported by the European Union’s Horizon 2020 project FutureTPM (779391).

References 1. Alpern, B., Schneider, F.B.: Recognizing safety and liveness. Distrib. Comput. 2(3), 117–126 (1987). https://doi.org/10.1007/BF01782772 2. Aumasson, J.-P., Neves, S., Wilcox-O’Hearn, Z., Winnerlein, C.: BLAKE2: simpler, smaller, fast as MD5. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) ACNS 2013. LNCS, vol. 7954, pp. 119–135. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38980-1_8 3. Biham, E., Chen, R.: Near-collisions of SHA-0. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 290–305. Springer, Heidelberg (2004). https://doi.org/ 10.1007/978-3-540-28628-8_18 4. Biham, E., Dunkelman, O.: A framework for iterative hash functions - HAIFA. Cryptology ePrint archive, report 2007/278 (2007). http://eprint.iacr.org/2007/ 278 5. Biryukov, A., Dinu, D., Khovratovich, D.: Argon2: new generation of memory-hard functions for password hashing and other applications. In: EuroS&P, pp. 292–302. IEEE (2016) 6. Chang, D., Nandi, M., Yung, M.: Indifferentiability of the hash algorithm BLAKE. Cryptology ePrint archive, report 2011/623 (2011). http://eprint.iacr.org/2011/ 623 7. Dodis, Y., Grubbs, P., Ristenpart, T., Woodage, J.: Fast message franking: from invisible salamanders to encryptment. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018. LNCS, vol. 10991, pp. 155–186. Springer, Cham (2018). https:// doi.org/10.1007/978-3-319-96884-1_6 8. Gueron, S.: Memory encryption for general-purpose processors. IEEE Secur. Priv. 14(6), 54–62 (2016) 9. Kaliski, B.: PKCS #5: password-based cryptography specification version 2.0. RFC 2898, September 2000. https://rfc-editor.org/rfc/rfc2898.txt 10. NIST: FIPS 180–4: Secure Hash Standard (SHS). Technical report, NIST (2015). https://doi.org/10.6028/NIST.FIPS.180-4 11. Pijnenburg, J., Poettering, B.: Encrypt-to-self: securely outsourcing storage. Cryptology ePrint archive, report 2020/847 (2020). https://eprint.iacr.org/2020/847 12. Pijnenburg, J., Poettering, B.: Key assignment schemes with authenticated encryption. IACR Trans. Symmetric Cryptol. 2020(2), 40–67 (2020). Revisited 13. Rogaway, P.: Authenticated-encryption with associated-data. In: Atluri, V. (ed.) ACM CCS 2002, pp. 98–107. ACM Press, November 2002 14. Saarinen, M.O., Aumasson, J.: The BLAKE2 cryptographic hash and message authentication code (MAC). RFC 7693 (2015). https://rfc-editor.org/rfc/rfc7693. txt

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph Yang Cao1(B) , Yonghui Xiao2 , Shun Takagi1 , Li Xiong2 , Masatoshi Yoshikawa1 , Yilin Shen3 , Jinfei Liu2 , Hongxia Jin3 , and Xiaofeng Xu2 1

Kyoto University, Kyoto, Japan {yang,yoshikawa}@i.kyoto-u.ac.jp, [email protected] 2 Emory University, Atlanta, Georgia [email protected], {lxiong,jliu253}@emory.edu, [email protected] 3 Samsung Research America, Mountain View, USA {yilin.shen,hongxia.jin}@samsung.com

Abstract. Location privacy has been extensively studied in the literature. However, existing location privacy models are either not rigorous or not customizable, which limits the trade-off between privacy and utility in many real-world applications. To address this issue, we propose a new location privacy notion called PGLP, i.e., Policy Graph based Location Privacy, providing a rich interface to release private locations with customizable and rigorous privacy guarantee. First, we design a rigorous privacy for PGLP by extending differential privacy. Specifically, we formalize location privacy requirements using a location policy graph, which is expressive and customizable. Second, we investigate how to satisfy an arbitrarily given location policy graph under realistic adversarial knowledge, which can be seen as constraints or public knowledge about user’s mobility pattern. We find that a policy graph may not always be viable and may suffer location exposure when the attacker knows the user’s mobility pattern. We propose efficient methods to detect location exposure and repair the policy graph with optimal utility. Third, we design an end-to-end location trace release framework that pipelines the detection of location exposure, policy graph repair, and private location release at each timestamp with customizable and rigorous location privacy. Finally, we conduct experiments on real-world datasets to verify the effectiveness and the efficiency of the proposed algorithms. Keywords: Spatiotemporal data · Location privacy · Trajectory privacy · Differential privacy · Location-based services

This work is partially supported by JSPS KAKENHI Grant No. 17H06099, 18H04093, 19K20269, U.S. National Science Foundation (NSF) under CNS-2027783 and CNS1618932, and Microsoft Research Asia (CORE16). Yang and Yonghui contributed equally to this work. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 655–676, 2020. https://doi.org/10.1007/978-3-030-58951-6_32

656

1

Y. Cao et al.

Introduction

As GPS-enabled devices such as smartphones or wearable gadgets are pervasively used and rapidly developed, location data have been continuously generated, collected, and analyzed. These personal location data connecting the online and offline worlds are precious, because they could be of great value for the society to enable ride sharing, traffic management, emergency planning, and disease outbreak control as in the current covid-19 pandemic via contact tracing, disease spread modeling, traffic and social distancing monitoring [4,19,26,30]. On the other hand, privacy concerns hinder the extensive use of big location data generated by users in the real world. Studies have shown that location data could reveal sensitive personal information such as home and workplace, religious and sexual inclinations [35]. According to a survey [18], 78% smartphone users among 180 participants believe that Apps accessing their location pose privacy threats. As a result, the study of private location release has drawn increasing research interest and many location privacy models have been proposed in the last decades (see survey [34]). However, existing location privacy models for private location releases are either not rigorous or not customizable. Following the seminal paper [22], the early location privacy models were designed based on k-anonymity [37] and adapted to different scenarios such as mobile P2P environments [14], trajectory release [3] and personalized k-anonymity for location privacy [21]. The follow-up studies revealed that k-anonymity might not be rigorous because it syntactically defines privacy as a property of the final “anonymized” dataset [29] and thus suffers many realistic attacks when the adversary has background knowledge about the dataset [28,31]. To this end, the state-of-the-art location privacy models [1,12,38,39] were extended from differential privacy (DP) [15] to private location release since DP is considered a rigorous privacy notion which defines privacy as a property of the algorithm. Although these DP-based location privacy models are rigorously defined, they are not customizable for different scenarios with various requirements on privacy-utility trade-off. Taking an example of Geo-Indistinguishability [1], which is the first DP-based location privacy, the protection level is solely controlled by a parameter  to achieve indistinguishability between any two possible locations (the indistinguishability is scaled to the Euclidean distance between any two possible locations). This one-size-fits-all approach may not fit every application’s requirement on utility-privacy trade-off. Different location-based services (LBS) may have different usage of the data and thus need different location privacy policies to strike the right balance between privacy and utility. For instance, a proper location privacy policy for weather apps could be “allowing the app to access a user’s city-level location but ensuring indistinguishability among locations in each city”, which guarantees both reasonable privacy and high usability for a city-level weather forecast. Similarly, for POI recommendation [2], trajectory mining [32] or crowd monitoring during the pandemic [26], a suitable location privacy policy could be “allowing the app to access the semantic category (e.g., a restaurant or a shop) of a user’s location but ensuring indistinguishability among locations with the

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph (a) Time t

Possible locations Ct

657

(b) Time t+1

Possible locations Ct+1

Fig. 1. An example of location policy graph and the constrained domains (i.e., possible locations) Ct and Ct+1 at time t and t + 1, respectively.

same category”, so that the LBS provider may know the user is at a restaurant or a shop, but not sure which restaurant or which shop. In this work, we study how to release private location with customizable and rigorous privacy. There are three significant challenges to achieve this goal. First, there is a lack of a rigorous and customizable location privacy metric and mechanisms. The closest work regarding customizable privacy is Blowfish privacy [25] for statistical data release, which uses a graph to represent a customizable privacy requirement, in which a node indicates a possible database instance to be protected, and an edge represents indistinguishability between the two possible databases. Blowfish privacy and its mechanisms are not applicable in our setting of private location release. It is because Blowfish privacy is defined on a statistical query over database with multiple users’ data; whereas the input in the scenario of private location release is a single user’s location. The second challenge is how to satisfy an arbitrarily given location privacy policy under realistic adversarial knowledge, which is public knowledge about users’ mobility pattern. In practice, as shown in [39,40], an adversary could take advantage of side information to rule out inadmissible locations1 and reduce a user’s possible locations into a small set, which we call constrained domain. We find that the location privacy policy may not be viable under a constrained domain and the user may suffer location exposure (we will elaborate how this could happen in Sect. 4.1). The third challenge is how to release private locations continuously with high utility. We attempt to provide an end-to-end solution that takes the user’s true location and a predefined location privacy policy as inputs, and outputs private location trace on the fly. We summarize the research questions below. – How to design a rigorous location privacy metric with customizable location privacy policy? (Sect. 3) – How to detect the problematic location privacy policy and repair it with high utility? (Sect. 4) – How to design an end-to-end framework to release private location continuously? (Sect. 5)

1

For example, it is impossible to move from Kyoto to London in a short time.

658

Y. Cao et al.

1.1

Contributions

In this work, we propose a new location privacy metric and mechanisms for releasing private location trace with flexible and rigorous privacy. To the best of our knowledge, this is the first DP-based location privacy notion with customizable privacy. Our contributions are summarized below. First, we formalize Policy Graph based Location Privacy (PGLP), which is a rigorous privacy metric extending differential privacy with a customizable location policy graph. Inspired by the statistical privacy notion of Blowfish privacy [25], we design location policy graph to represent which information needs to be protected and which does not. In a location policy graph (such as the one shown in Fig. 1), the nodes are the user’s possible locations, and the edges indicate the privacy requirements regarding the connected locations: an attacker should not be able to significantly distinguish which location is more probable to be the user’s true location by observing the released location. PGLP is a general location privacy model compared with the prior art of DP-based location privacy notions, such as Geo-Indistinguishability [1] and Location Set Privacy [39]. We prove that they are two instances of PGLP under the specific configurations of the location policy graph. We also design mechanisms for PGLP by adapting the Laplace mechanism and Planar Isotropic Mechanism (PIM) (i.e., the optimal mechanism for Location Set Privacy [39]) w.r.t. a given location policy graph. Second, we design algorithms that examine the feasibility of a given location policy graph under adversarial knowledge about the user’s mobility pattern modeled by Markov Chain. We find that the policy graph may not always be viable. Specially, as shown in Fig. 1, some nodes (locations) in a policy graph may be excluded (e.g., s4 and s6 in Fig. 1 (b)) or disconnected (e.g., s5 in Fig. 1 (b)) due to the limited set of the possible locations. Protecting the excluded nodes is a lost cause, but it is necessary to protect the disconnected nodes since it may lead to location exposure when the user is at such a location. Surprisingly, we find that a disconnected node may not always result in the location exposure, which also depends on the protection strength of the mechanism. Intuitively, this happens when a mechanism “overprotects” a location policy graph by implicitly guaranteeing indistinguishability that is not enforced by the policy. We design an algorithm to detect the disconnected nodes that suffer location exposure , which are named isolated node. We also design a graph repair algorithm to ensure no isolated node in a location policy graph by adding an optimal edge between the isolated node and another node with high utility. Third, we propose an end-to-end private location trace release framework with PGLP that takes inputs of the user’s true location at each time t and outputs private location continuously satisfying a pre-defined location policy graph. The framework pipelines the calculation of constrained domains, isolated node detection, policy graph repair, and private location release mechanism. We also reason about the overall privacy guarantee in multiple releases. Finally, we implement and evaluate the proposed algorithms on real-world datasets, showing that privacy and utility can be better tuned with customizable location policy graphs.

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

2 2.1

659

Preliminaries Location Data Model

Similar to [39,40], we employ two coordinate systems to represent locations for applicability for different application scenarios. A location can be represented by an index of grid coordinates or by a two-dimension vector of longitude-latitude coordinate to indicate any location on the map. Specifically, we partition the map into a grid such that each grid cell corresponds to an area (or a point of interest); any location represented by a longitude-latitude coordinate will also have a grid number or index on the grid coordinate. We denote the location domain as S = {s1 , s2 , · · · , sN } where each si corresponds to a grid cell on the map, 1 ≤ i ≤ N . We use s∗t and zt to denote the user’s true location and perturbed location at time t. We also use t in s∗ and z to refer the locations at a single time when it is clear from the context. Location Query. For the ease of reasoning about privacy and utility, we use a location query f : S → R2 to represent the mapping from locations to the longitude and latitude of the center of the corresponding grid cell. 2.2

Problem Statement

Given a moving user on a map S in a time period {1, 2, · · · , T }, our goal is to release the perturbed locations of the user to untrusted third parties at each timestamp under a pre-defined location privacy policy. We define Indistinguishability as a building block for reasoning about our privacy goal. Definition 1 (-Indistinguishability). Two locations si and sj are indistin-guishable under a randomized mechanism A iff for any output z ⊆ Pr(A(si )=z) ≤ e , where  ≥ 0. Range(A), we have Pr(A(s j )=z) As we exemplified in the introduction, different LBS applications may have different metrics of utility. We aim at providing better utility-privacy trade-off by customizable -Indistinguishability between locations. Adversarial Model. We assume that the attackers know the user’s mobility pattern modeled by Markov chain, which is widely used for modeling user mobility profiles [10,20]. We use matrix M ∈ [0, 1]N ×N to denote the transition probabilities of Markov chain with mij being the probability of moving from location si to location sj . Another adversarial knowledge is the initial probability distribution of the user’s location at t = 1. To generalize the notation, we denote probability distribution of the user’s location at t by a vector pt ∈ [0, 1]1×N , and denote the ith element in pt by pt [i] = Pr(s∗t = si ), where s∗t is the user’s true location at t and si ∈ S. Given the above knowledge, the attackers could infer the user’s possible locations at time t, which is probably smaller than the location domain S, and we call it a constrained domain.

660

Y. Cao et al.

Definition 2 (Constrained domain). We denote Ct = {si | Pr(s∗t = si ) > 0, si ∈ S} as constrained domain, which indicates a set of possible locations at t. We note that the constrained domain can be explained as the requirement of LBS applications. For example, an App could only be used within a certain area, such as a university free shuttle tracker.

3

Policy Graph Based Location Privacy

In this section, we first formalize the privacy requirement using location policy graph in Sect. 3.1. We then design the privacy metric of PGLP in Sect. 3.2. Finally, we propose two mechanisms for PLGP in Sect. 3.3. 3.1

Location Policy Graph

Inspired by Blowfish privacy [25], we use an undirected graph to define what should be protected, i.e., privacy policies. The nodes are secrets, and the edges are the required indistinguishability, which indicates an attacker should not be able to distinguish the input secrets by observing the perturbed output. In our setting, we treat possible locations as nodes and the indistinguishability between the locations as edges. Definition 3 (Location Policy Graph). A location policy graph is an undirected graph G = (S, E) where S denotes all the locations (nodes) and E represents indistinguishability (edges) between these locations. Definition 4 (Distance in Policy Graph). We define the distance between two nodes si and sj in a policy graph as the length of the shortest path (i.e., hops) between them, denoted by dG (si , sj ). If si and sj are disconnected, dG (si , sj ) = ∞. In DP, the two possible database instances with or without a user’s data are called neighboring databases. In our location privacy setting, we define neighbors as two nodes with an edge in a policy graph. Definition 5 (Neighbors). The neighbors of location s, denoted by N (s), is the set of nodes having an edge with s, i.e., N (s) = {s |dG (s, s ) = 1, s ∈ S}. We denote the nodes having a path with s by N P (s), i.e., the nodes in the same connected component with s. In our framework, we assume the policy graph is given and public. In practice, the location privacy policy can be defined application-wise and identical for all users using the same application.

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

3.2

661

Definition of PGLP

We now formalize Policy Graph based Location Privacy (PGLP), which guarantees -indistinguishability in Definition 1 for every pair of neighbors (i.e., for each edge) in a given location policy graph. Definition 6 ({, G}-Location Privacy). A randomized algorithm A satisfies {, G}-location privacy iff for all z ⊆ Range(A) and for all pairs of neighbors s Pr(A(s)=z)  and s in G, we have Pr(A(s  )=z) ≤ e . In PGLP, privacy is rigorously guaranteed through ensuring indistinguishability between any two neighboring locations specified by a customizable location policy graph. The above definition implies the indistinguishability between two nodes that have a path in the policy graph. Lemma 1. An algorithm A satisfies {, G}-location privacy, iff any two nodes si , sj ∈ G are  · dG (si , sj )-indistinguishable. Lemma 1 indicates that, if there is a path between two nodes si , sj in the policy graph, the corresponding indistinguishability is required at a certain degree; if two nodes are disconnected, the indistinguishability is not required (i.e., can be ∞) by the policy. As an extreme case, if a node is disconnected with any other nodes, it is allowed to be released without any perturbation. Comparison with Other Location Privacy Models. We analyze the relation between PGLP and two well-known DP-based location privacy models, i.e., Geo-Indistinguishability (Geo-Ind) [1] and δ-Location Set Privacy [39]. We show that PGLP can represent them under proper configurations of policy graphs. Geo-Ind [1] guarantees a level of indistinguishability between two locations si and sj that is scaled with their Euclidean distance, i.e., ·dE (si , sj )-indistinguishability, where dE (·, ·) denotes Euclidean distance. Note that the unit length used in Geo-Ind scales the level of indistinguishability. We assume that, for any neighbors s and s , the unit length used in Geo-Ind makes dE (s, s ) ≥ 1. Let G1 be a location policy graph that every location has edges with its closest eight locations on the map, as shown in Fig. 2 (a). We can derive Theorem 1 by , sj ) ≤ dE (si , sj ) for any si , sj ∈ G1 (e.g., in Fig. 2(a), Lemma 1 with the fact of dG (si√ dG (s1 , s2 ) = 3 and dE (s1 , s2 ) = 10).

Fig. 2. Two examples of location policy graphs.

662

Y. Cao et al.

Theorem 1. An algorithm satisfying {, G1 }-location privacy also achieves Geo-Indistinguishability. δ-Location Set Privacy [39] extends differential privacy on a subset of possible locations, which is assumed as adversarial knowledge. We note that the constrained domain in Definition 2 can be considered a generalization of δ-location set, whereas we do not specify the calculation of this set in PGLP. δ-Location Set Privacy ensures indistinguishability among any two locations in the δ-location set. Let G2 be a location policy graph that is complete, i.e., fully connected among all locations in the δ-location set as shown in Fig. 2(b). Theorem 2. An algorithm satisfying {, G2 }-location privacy also achieves δLocation Set privacy. We defer the proofs of the theorems to a full version because of space limitation. 3.3

Mechanisms for PGLP

In the following, we show how to transform existing DP mechanisms into one satisfying PGLP using graph-calibrated sensitivity. We temporarily assume the constrained domain C = S and study the effect of C on policy G in Sect. 4. As shown in Sect. 2.1, the problem of private location release can be seen as answering a location query f : S → R2 privately. Then we can adapt the existing DP mechanism for releasing private locations by adding random noises to longitude and latitude independently. We use this approach below to adapt the Laplace mechanism and Planar Isotropic Mechanism (PIM) (i.e., an optimal mechanism for Location Set Privacy [39]) to achieve PGLP. Policy-Based Laplace Mechanism (P-LM). Laplace mechanism is built on the 1 -norm sensitivity [16], defined as the maximum change of the query results due to the difference of neighboring databases. In our setting, we calibrate this sensitivity w.r.t. the neighbors specified in a location policy graph. Definition 7 (Graph-calibrated 1 -norm Sensitivity). For a location s 1 norm of and a query f (s): s → R2 , its 1 -norm sensitivity SfG is the maximum  Δf G where Δf G is a set of points (i.e., two-dimension vectors) of f (si ) − f (sj ) for si , sj ∈ N P (s) (i.e., the nodes with the same connected component of s). We note that, for a true location s, releasing N P (s) does not violate the privacy defined by the policy graph. It is because, for any connected s and s , N P (s) and N P (s ) are the same; while, for any disconnected s and s , the indistinguishability between N P (s) and N P (s ) is not required by Definition 6. Algorithm 1. Policy-based Laplace Mechanism (P-LM) Require: , G, the user’s true locations. 1: Calculate SfG = sup|| f (si ) − f (sj ) ||1 for all neighbors si , sj ∈ N P (s);

2: 3:

Perturb location z = f (s) + [Lap(SfG /), Lap(SfG /)]T ; return a location z ∈ S that is closest to z on the map.

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

663

Theorem 3. P-LM satisfies {, G}-location privacy. Policy-Based Planar Isotropic Mechanism (P-PIM). PIM [39] achieves the low bound of differential privacy on two-dimension space for Location Set Privacy. It adds noises to longitude and latitude using K-norm mechanism [24] with sensitivity hull [39], which extends the convex hull of the sensitivity space in K-norm mechanism. We propose a graph-calibrated sensitivity hull for PGLP. Definition 8 (Graph-calibrated Sensitivity Hull). For a location s and a hull query f (s): s → R2 , the graph-calibrated sensitivity hull K(G) is the convex  of ΔfG where Δf G is a set of points (i.e., two-dimension vectors) of f (si ) − f (sj ) for any si , sj ∈ N P (s) and si , sj are neighbors, i.e., K(G) = Conv(Δf G ). We note that, in Definitions 7 and 8, the sensitivities are independent of the true location s and all the nodes in N (s) have the same sensitivity. Definition 9. (K-norm Mechanism [24]). Given any function f (s): s → Rd and its sensitivity hull K, K-norm mechanism outputs z with probability below. Pr(z) =

1 exp (−||z − f (s)||K ) Γ (d + 1)Vol(K/)

(1)

where Γ (·) is Gamma function and Vol(·) denotes volume.

Algorithm 2. Policy-based Planar Isotropic Mechanism (P-PIM) Require: , G, the user’s true  location s.  1: Calculate K(G) = Conv f (si ) − f (sj ) for all neighbors si , sj ∈ N P (s); 2: z = f (s) + Y where Y is two-dimension noise drawn by Eq.(1) with sensitivity hull K(G); 3: return a location z ∈ S that is closest to z on the map.

Theorem 4. P-PIM satisfies {, G}-location privacy. We can prove Theorems 3 and 4 using Lemma 1. The sensitivity is scaled with the graph-based distance. We note that directly using Laplace mechanism or PIM can satisfy a fully connected policy graph over locations in the constrained domain as shown in Fig. 2(b). Theorem 5. Algorithm 2 has the time complexity O(|C| log(h)+h2 log(h)) where h is number of vertices on the polygon of Conv(Δf G ).

4

Policy Graph Under Constrained Domain

In this section, we investigate and prevent the location exposure of a policy graph under constrained domain in Sect. 4.1 and 4.2, respectively; then we repair the policy graph in Sect. 4.3.

664

4.1

Y. Cao et al.

Location Exposure

As shown in Fig. 1 (right) and introduced in Sect. 1, a given policy graph may not be viable under adversarial knowledge of constrained domain (Definition 2). We illustrate the potential risks due to the constrained domain shown in Fig. 3.

Fig. 3. (a) The constrained domain C = {s2 , s3 , s5 }; (b) The constrained policy graph.

We first examine the immediate consequences of the constrained domain to the policy graph by defining the excluded and disconnected nodes. We then show the disconnected node may lead to location exposure. Definition 9 (Excluded node). Given a location policy graph G = (S, E) and a constrained domain C ⊂ S, if s ∈ S and s ∈ C, s is an excluded node. Definition 10 (Disconnected node). Given a location policy graph G = (S, E) and a constrained domain C ⊂ S, if a node s ∈ C, N (s) = ∅ and N (s) ∩ C = ∅, we call s a disconnected node. Intuitively, the excluded node is outside of the constrained domain C, such as the gray nodes {s1 , s4 , s6 } in Fig. 3; whereas the disconnected node (e.g., s5 in Fig. 3) is inside of C and has neighbors, yet all its neighbors are outside of C. Next, we analyze the feasibility of a location policy graph under a constrained domain. The first problem is that, by the definition of excluded nodes, it is not possible to achieve indistinguishability between the excluded nodes and any other nodes. For example in Fig. 3, the indistinguishability indicated by the gray edges is not feasible because of Pr(A(s4 ) = z) = Pr(A(s6 ) = z) = 0 for any z given the adversarial knowledge of Pr(s4 ) = Pr(s6 ) = 0. Hence, one can only achieve a constrained policy graph, such as the one with nodes {s2 , s3 , s5 } in Fig. 3(b). Definition 11 (Constrained Location Policy Graph). A constrained location policy graph G C is a subgraph of the original location policy graph G under a constrained domain C that only includes the edges inside of C. Formally, G C = (C, E C ) where C ⊆ S and E C ⊆ E. Definition 12 (Location Exposure under constrained domain). Given a policy graph G, constrained domain C and an algorithm A satisfying (, G C )location privacy, for a disconnected node s, if A does not guarantee indistinguish-ability between s and any other nodes in C, we call s an isolated node. The user suffers location exposure when she is at the location of the isolated node.

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

4.2

665

Detecting Isolated Node

An interesting finding is that a disconnected node may not always lead to location exposure, which also depends on the algorithm for PGLP. Intuitively, the indistinguishability between a disconnected node and a node in the constrained domain could be guaranteed implicitly. We design Algorithm 3 to detect the isolated node in a constrained policy graph w.r.t. P-PIM. It could be extended to any other PGLP mechanism. For each disconnected node, we check whether it is indistinguishable with other nodes. The problem is equivalent to checking if there is any node inside the convex body f (si ) + K(G C ), which can be solved by the convexity property (if a point sj is inside a convex hull K, then sj can be expressed by the vertices of K with coefficients in [0, 1]) with complexity O(m3 ). We design a faster method with complexity O(m2 log(m)) by exploiting the definition of convex hull: if sj is inside f (si ) + K(G C ), then the new convex hull of the new graph by adding edge si sj will be the same as K(G C ). We give an example of disconnected but not isolated node in Appendix A. Algorithm 3. Finding Isolated Node Require: G, C, disconnected node si ∈ C. 1: Δf G = ∨ (f (sj ) − f (sk ));

2: 3: 4: 5: 6: 7: 8:

C

sj sk ∈E C

K(G ) ← Conv(Δf ); for all sj ∈ C, sj = si do if Conv(Δf G , f (sj ) − f (si )) == K(G C ) then return false end if end for return true

4.3

 We use ∨ to denote Union operator.

G

 not isolated  isolated

Repairing Location Policy Graph

To prevent location exposure under the constrained domain, we need to make sure that there is no isolated node in a constrained policy graph. A simple way is to modify the policy graph to ensure the indistinguishability of the isolated node by adding an edge between it and another node in the constrained domain. The selection of such a node could depend on the requirement of the application. Without the loss of generality, a baseline method for repairing the policy graph could be choosing an arbitrary node from the constrained domain and adding an edge between it and the isolated node. A natural question is how can we repair the policy graph with better utility. Since different ways of adding edges in the policy graph may lead to distinct graph-based sensitivity, which is propositional to the area of the sensitivity hull (i.e., a polygon on the map), the question is equivalent to adding an edge with the minimum area of sensitivity hull (thus the least noise). We design Algorithm 4 to find the minimum area of the new sensitivity hull, as shown in an example in Fig. 4. The analysis is shown in Appendix B. We note that both Algorithms 3 and 4 are oblivious to the true location, so they do not consume the privacy budget. Additionally, the adversary may

666

Y. Cao et al.

be able to “reverse” the process of graph repair and extract the information about the original location policy graph; however, this does not compromise our privacy notion since the location policy graph is public in our setting.

Fig. 4. An example of graph repair with high utility. (a): if C = {s3 , s4 , s5 , s6 }, then s3 is isolated because f (s3 ) + K(G C ) only contains s3 ; to protect s3 , we can re-connect s3 to one of the valid nodes {s4 , s5 , s6 }. (b) shows the new sensitivity hull after adding f (s4 ) − f (s3 ) to K(G C ); (c) shows the new sensitivity hull after adding f (s6 ) − f (s3 ) to K(G C ); (d) shows the new sensitivity hull after adding f (s5 ) − f (s3 ) to K(G C ). Because (b) has the smallest area of the sensitivity hull, s3 should be connected to s4 .

Algorithm 4. Graph Repair with High Utility Require: G, C, isolated node si 1: G C ← G ∧ C; 2: K ← K(G C ); 3: sk ← ∅; 4: minArea ← ∞; 5: for all sj ∈ C, sj = si do 6: K ← K(G C ∨ si sj ); i=h  7: Area = det(vi , vj ) where vh+1 = v1 ;

 new sensitivity hull in O(mlog(m))  Θ(h) time

i=1,j=i+1

8: if Area < minArea then 9: sk ← sj ; 10: minArea = Area; 11: end if 12: end for 13: G C ← G C ∨ si sk 14: return repaired policy graph

5 5.1

 find minimum area

GC ;

 add edge si sk to the graph

Location Trace Release with PGLP Location Release via Hidden Markov Model

A remaining question for continuously releasing private location with PGLP is how to calculate the adversarial knowledge of constrained domain Ct at each time t. According to our adversary model described in Sect. 2.2, the attacker knows

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

667

the user’s mobility pattern modeled by the Markov chain and the initial probability distribution of the user’s location. The attacker also knows the released mechanisms for PGLP. Hence, the problem of calculating the possible location domain (i.e., locations that Pr(s∗t ) > 0) can be modeled as an inference problem in Hidden Markov Model (HMM) in Fig. 5: the attacker attempts to infer the probability distribution of the true location s∗t , given the PGLP mechanism, the Markov model of s∗t , and the observation of z1 , · · · , zt at the current time t. The constrained domain at each time is derived as the locations in the probability distribution of s∗t with non-zero probability.

Fig. 5. Private location trace release via HMM.

We elaborate the calculation of the probability distribution of s∗t as follows. The probability Pr(zt |s∗t ) denotes the distribution of the released location zt where s∗t is the true location at any timestamp t. At timestamp t, we use p− t and p+ t to denote the prior and posterior probabilities of an adversary about current state before and after observing zt respectively. The prior probability can be derived by the (posterior) probability at previous timestamp t − 1 and + the Markov transition matrix as p− t = pt−1 M. The posterior probability can be computed using Bayesian inference as follows. For each state si : Pr(zt |s∗t = si )p− t [i] ∗ p+ t [i] = Pr(st = si |zt ) =  ∗ = s )p− [j] P r(z |s t t j t j

(2)

Algorithm 5 shows the location trace release algorithm. At each timestamp t, we compute the constrained domain (Line 2). For all disconnected nodes under the constrained domain, we check if they are isolated by Algorithm 3. If so, we derive a minimum protectable graph Gt by Algorithm 4. Next, we use the proposed PGLP mechanisms (i.e., P-LM or P-PIM) to release a perturbed location zt . Then the released zt will also be used to update the posterior probability p+ t (in the equation below) by Eq. (2), which subsequently will be used to compute the prior probability for the next time t + 1. We note that, only Line 9 (invoking PGLP mechanisms) uses the true location s∗t . Algorithms 3 and 4 are independent of the true location, so they do not consume the privacy budget.

668

Y. Cao et al.

Algorithm 5. Location Trace Release Mechanism for PGLP ∗ Require: , G, M, p+ t−1 , st

+ 1: p− t ← pt−1 M; 2: Ct ← {si |p− t [i] > 0}; 3: GtC ← G ∧ Ct ; 4: for all disconnected node si in GtC do 5: if si is isolated then 6: GtC ← Algorithm 4(GtC , Ct , si ); 7: end if 8: end for 9: mechanisms for PGLP with parameters , s∗t , Gt ; 10: Derive p+ t by Equation (2); ∗ 11: return Algorithm 5(, GtC , M, p+ t , st+1 );

 Markov transition  constraint  Definition 12  isolated node detection by Algorithm 3  repair graph Gt by Algorithm 4  Algorithms 1 or 2  inference go to next timestamp

Theorem 6 (Complexity). Algorithm 5 has complexity O(dm2 log(m)) where d is the number of disconnected nodes and m is the number of nodes in GtC . 5.2

Privacy Composition

We analyze the composition of privacy for multiple location releases under PGLP. In Definition 6, we define {, G}-location privacy for single location release, where  can be considered the privacy leakage w.r.t. the privacy policy G. A natural question is what would be the privacy leakage of multiple releases at a single timestamp (i.e., for the same true location) or at multiple timestamps (i.e., for a trajectory). In either case, the privacy guarantee (or the upper bound of privacy leakage) in multiple releases depends on the achievable location policy graphs. Hence, the key is to study the composition of the policy graphs in multiple releases. Let A1 , · · · , AT be T independent random algorithms that takes true locations s∗1 , · · · , s∗T as inputs (note that it is possible s∗1 = · · · = s∗T ) and outputs z1 , · · · , zT , respectively. When the viable policy graphs are the same at each release, we have Lemma 2 as below. Lemma 2. If all A1 , · · · , AT satisfy (, G)-location privacy, the combination of {A1 , · · · , AT } satisfies (T , G)-location privacy. As shown in Sect. 4, the feasibility of achieving a policy graph is affected by the constrained domain, which may change along with the released locations. We denote G1 , · · · , GT as viable policy graphs at each release (for single location or for a trajectory), which could be obtained by algorithms in Sect. 4.2 and Sect. 4.3. We give a more general composition theorem for PGLP below. respecTheorem 7. If A1 , · · · , AT satisfy (1 , G1 ), · · · , (T, GT ), -location privacy,  T tively, the combination of {A1 , · · · , AT } satisfies i=1 i , G1 ∧ · · · ∧ GT -location privacy, where ∧ denotes the intersection between the edges of policy graphs. The above theorem provides a method to reason about the overall privacy in continuous releases using PGLP. We note that the privacy composition does not depend on the adversarial knowledge of Markov model, but replies on the

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

669

soundness of the policy graph and PGLP mechanisms at each t. However, the resulting G1 ∧ · · · ∧ GT may not be the original policy graph. It is an interesting future work to study how to ensure a given policy graph across the timeline.

6

Experiments

6.1

Experimental Setting

We implement the algorithms use Python 3.7. The code is available in github2 . We run the algorithms on a machine with Intel core i7 6770k CPU and 64 GB of memory running Ubuntu 15.10 OS. Datasets. We evaluate the algorithms on three real-world datasets with similar configurations in [39] for comparison purpose. The Markov models were learned from the raw data. For each dataset, we randomly choose 20 users’ location trace with 100 timestamps for testing. – Geolife dataset [41] contains tuples with attributes of user ID, latitude, longitude and timestamp. We extracted all the trajectories within the Fourth Ring of Beijing to learn the Markov model, with the map partitioned into cells of 0.34 × 0.34 km2 . – Gowalla dataset [13] contains 6, 442, 890 check-in locations of 196, 586 users over 20 months. We extracted all the check-ins in Los Angeles to train the Markov model, with the map partitioned into cells of 0.37 × 0.37 km2 . – Peopleflow dataset3 includes 102, 468 locations of 11, 406 users with semantic labels of POI in Tokyo. We partitioned the map into cells of 0.27 × 0.27 km2 . Policy Graphs. We evaluate two types of location privacy policy graphs for different applications as introduced in Sect. 1. One is for the policy of “allowing the app to access a user’s location in which area but ensuring indistinguishability among locations in each area”, represented by Gk9 , Gk16 , Gk25 below. The other is for the policy of “allowing the app to access the semantic label (e.g., a restaurant or a shop) of a user’s location but ensuring indistinguishability among locations with the same category”, represented by Gpoi below. – Gk9 is a policy graph that all locations in each 3 × 3 region (i.e., 9 grid cells using grid coordinates) are fully connected with each other. Similarly, we have Gk16 and Gk25 for region size 4 × 4 and 5 × 5, respectively. – Gpoi : all locations with both the same category and the same 6 × 6 region are fully connected. We test the category of restaurant in Peopleflow dataset. Utility Metrics. We evaluate three types of utility (error) for different applications. We run the mechanisms 200 times and average the results. Note that the lower value of the following metrics, the better utility. 2 3

https://github.com/emory-aims/pglp. http://pflow.csis.u-tokyo.ac.jp/.

670

Y. Cao et al.

Fig. 6. Utility of P-LM vs. P-PIM with respect to Gk9 , Gk16 and Gpoi .

Fig. 7. Utility of different policy graphs.

– The general utility was measured by Euclidean distance (km), i.e., Eeu , between the released location and the true location as defined in Sect. 2.2. – The utility for weather apps or road traffic monitoring, i.e., “whether the released location is in the same region with the true location”. We measure it by Er = ||R(s∗ ), R(z)||0 where R(·) is a region query that returns the index of the region. Here we define the region size as 5 × 5 grid cells. – The utility for POI mining or crowd monitoring during the pandemic, i.e., “whether the released location is the same category with the true location”. We measure it by Epoi = ||C(s∗ ), C(z)||0 where C(·) returns the category of the corresponding location. We evaluated the location category of “restaurant”. 6.2

Results and Analysis

P-LM vs. P-PIM. Figure 6 compares the utility of two proposed mechanisms P-LM and P-PIM for PGLP under the policy graphs Gk9 , Gk16 , Gk25 and Gpoi on Peopleflow dataset. The utility of P-PIM outperforms P-LM for different policy graphs and different  since the sensitivity hull could achieve lower error bound. Utility Gain by Tuning Policy Graphs. Figure 7 demonstrates that the utility of different applications can be boosted with appropriate policy graphs. We evaluate the three types of utility metrics using different policy graphs on Peopleflow dataset. Figure 7 shows that, for utility metrics Eeu , Er and Epoi , the policy graphs with the best utility are Gk9 , Gk25 and Gpoi , respectively. Gk9 has smallest Eeu because of the least sensitivity. When the query is 5 × 5 region query, Gk25 has the full usability (Er =0). When the query is POI query like the one mentioned above, Gpoi leads to full utility (Er =0) since Gpoi allows to

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

671

Fig. 8. Evaluation of graph repair.

disclose the semantic category of the true location while maintaining the indistinguishability among the set of locations with the same category. Note that Epoi is decreasing with larger  for policy graph G9 because the perturbed location has a higher probability to be the true location; while this effect is diminished in larger policy graphs such as G16 or G25 due to their larger sensitivities. We conclude that location policy graphs can be tailored flexibly for better utilityprivacy trade-off. Evaluation of Graph Repair. Figure 8 shows the results of graph repair algorithms. We compare the proposed Algorithm 4 with a baseline method that repairs the problematic policy graph by adding an edge between the isolated node with its nearest node in the constrained domain. It shows that the utility measured by Eeu of Algorithm 4 is always better than the baseline but at the cost of higher runtime. Notably, the utility is decreasing (i.e., higher Eeu ) with larger constrained domains because of larger policy graph (thus higher sensitivity); a larger constrained domain also incurs higher runtime because more isolated nodes need to be processed. Evaluation of Location Trace Release. We demonstrate the utility of private trajectory release with PGLP in Fig. 9 and Fig. 10. In Fig. 9, we show the results of P-LM and P-PIM on the Geolife Dataset. We test 20 users’ trajectories with 100 timestamps and report average Eeu at each timestamp. We can see P-PIM has higher utility than P-PIM, which in line with the results for single location release. The error Eeu is increasing along with timestamps due to the enlarged constrained domain, which is in line with Fig. 8. The average of Eeu across 100 timestamps on different policy graphs, i.e., Gk9 , Gk16 and Gk25 is also in accordance with the single location release in Fig. 7. Gk9 has the least average error of Eeu due to the smallest sensitivity. In Fig. 10, we show the utility of P-PIM with different policy graphs on two different datasets Geolife and Gowalla. The utility of Gk9 is always the best over different timestamps for both datasets. In general, the Gowalla dataset has better utility than the Geolife dataset because the constraint domain of the Gowalla dataset is smaller. The reason is that the Gowalla dataset collects checkin locations that have an apparent mobility pattern, as shown in [13]. While Geolife dataset collects GPS trajectory with diverse transportation modes such as walk, bus, or train; thus, the trained Markov model is less accurate.

672

Y. Cao et al.

Fig. 9. Utility of private trajectory release with P-LM and P-PIM.

Fig. 10. Utility of private trajectory release with different policy graphs.

7

Conclusion

In this paper, we proposed a flexible and rigorous location privacy framework named PGLP, to release private location continuously under the real-world constraints with customized location policy graphs. We design an end-to-end private location trace release algorithm satisfying a pre-defined location privacy policy. For future work, there are several promising directions. One is to study how to use the rich interface of PGLP for the utility-privacy tradeoff in the realworld location-based applications, such as carefully designing location privacy policies for COVID-19 contact tracing [4]. Another exciting direction is to design advanced mechanisms to achieve location privacy policies with less noise.

A

An Example of Isolated Node

Intuition. We examine the privacy guarantee of P-PIM w.r.t. G C in Fig. 3(a). According to K-norm Mechanism [24] in Definition 9, P-PIM guarantees that, for any two neighbors si and sj , their difference is bounded in the convex body K , i.e. f (si ) − f (sj ) ∈ K . Geometrically, for a location s, all other locations in the convex body of K + f (s) are -indistinguishable with s. Example 1 (Disconnected but Not Isolated Node). In Fig. 11, s2 is disconnected under constraint C = {s2 , s4 , s5 , s6 }. However, s2 is not isolated because f (s2 ) + K contains f (s4 ) and f (s5 ). Hence, s2 and other nodes in C are indistinguishable.

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

673

Fig. 11. (a) A policy graph under C = {s2 , s4 , s5 , s6 }; (b) the Δf G of vectors f (si ) − f (sj ); (c) the sensitivity hull K(G C ) covering the Δf G ; (d) the shape f (s2 ) + K(G C ) containing f (s4 ) and f (s5 ). That is to say, s2 is indistinguishable with s4 and s5 .

B

Policy Graph Repair Algorithm

Figure 4(a) shows the graph under constraint C = {s3 , s4 , s5 , s6 }. Then s3 is exposed because f (s3 )+K G contains no other node. To satisfy the PGLP without isolated nodes, we need to connect s3 to another node in C, i.e. s4 , s5 or s6 . If s3 is connected to s4 , then Fig. 4(b) shows the new graph and its sensitivity hull. By adding two new edges {f (s3 ) − f (s4 ), f (s4 ) − f (s3 )} to Δf G , the shaded areas are attached to the sensitivity hull. Similarly, Figs. 4(c) and (d) show the new sensitivity hulls when s3 is connected to s6 and s5 respectively. Because the smallest area of K G is in Fig. 4(b), the repaired graph is G C ∨ s3 s4 , i.e., add edge s3 s4 to the graph G C . Theorem 8. Algorithm 4 takes O(m2 log(m)) time where m is the number of valid nodes (with edge) in the policy graph. Algorithm. Algorithm 4 derives the minimum protectable graph for location data when an isolated node is detected. We can connect the isolated node si to the rest (at most m) nodes, generating at most m convex hulls where m = |V| is the number of valid nodes. In two-dimensional space, it only takes O(mlog(m)) time to find a convex hull. To derive Area of a shape, we exploit the computation of determinant whose intrinsic meaning is the Volume of the column vectors. i=h  det(vi , vj ) to derive the Area of a convex hull with Therefore, we use i=1,j=i+1

clockwise nodes v1 , v2 , · · · , vh where h is the number of vertices and vh+1 = v1 . By comparing the area of these convex hulls, we can find the smallest area in O(m2 log(m)) time where m is the number of valid nodes. Note that Algorithms 3 and 4 can also be combined together to protect any disconnected nodes.

C C.1

Related Works Differential Privacy

While differential privacy [15] has been accepted as a standard notion for privacy protection, the concept of standard differential privacy is not generally

674

Y. Cao et al.

applicable for complicated data types. Many variants of differential privacy have been proposed, such as Pufferfish privacy [27], Geo-Indistinguishability [1] and Voice-Indistinguishability [23] (see Survey [33]). Blowfish privacy [25] is the first generic framework with customizable privacy policy. It defines sensitive information as secrets and known knowledge about the data as constraints. By constructing a policy graph, which should also be consistent with all constraints, Blowfish privacy can be formally defined. Our definition of PGLP is inspired by Blowfish framework. Notably, we find that the policy graph may not be viable under temporal correlations represented by Markov model, which was not considered in the previous work. This is also related to another line of works studying how to achieve differential privacy under temporal correlations [8–10,36]. C.2

Location Privacy

A close line of works focus on extending differential privacy to location setting. The first DP-based location privacy model is Geo-Indistinguishability [1], which scales the sensitivity of two locations to their Euclidean distance. Hence, it is suitable for proximity-based applications. Following by Geo-Indistinguishability, several location privacy notions [38,40] have been proposed based on differential privacy. A recent DP-based location privacy, spatiotemporal event privacy [5–7], proposed a new representation for secrets in spatiotemporal data called spatiotemoral events using Boolean expression. It is essentially different from this work since here we are considering the traditional representation of secrets, i.e., each single location or a sequence of locations. Several works considered Markov models for improving utility of released location traces or web browsing activities [11,17], but did not consider the inference risks when an adversary has the knowledge of the Markov model. Xiao et al. [39] studied how to protect the true location if a user’s movement follows Markov model. The technique can be viewed as a special instantiation of PGLP. In addition, PGLP uses a policy graph to tune the privacy and utility to meet diverse the requirement of location-based applications.

References 1. Andr´es, M.E., Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Geoindistinguishability: differential privacy for location-based systems. In: CCS, pp. 901–914 (2013) 2. Bao, J., Zheng, Yu., Wilkie, D., Mokbel, M.: Recommendations in location-based social networks: a survey. GeoInformatica 19(3), 525–565 (2015). https://doi.org/ 10.1007/s10707-014-0220-8 3. Bettini, C., Wang, X.S., Jajodia, S.: Protecting privacy against location-based personal identification. In: Jonker, W., Petkovi´c, M. (eds.) SDM 2005. LNCS, vol. 3674, pp. 185–199. Springer, Heidelberg (2005). https://doi.org/10.1007/ 11552338 13 4. Cao, Y., Takagi, S., Xiao, Y., Xiong, L., Yoshikawa, M.: PANDA: policy-aware location privacy for epidemic surveillance. In: VLDB Demonstration Track (2020, to appear)

PGLP: Customizable and Rigorous Location Privacy Through Policy Graph

675

5. Cao, Y., Xiao, Y., Xiong, L., Bai, L.: PriSTE: from location privacy to spatiotemporal event privacy. In: 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 1606–1609 (2019) 6. Cao, Y., Xiao, Y., Xiong, L., Bai, L., Yoshikawa, M.: PriSTE: protecting spatiotemporal event privacy in continuous location-based services. Proc. VLDB Endow. 12(12), 1866–1869 (2019) 7. Cao, Y., Xiao, Y., Xiong, L., Bai, L., Yoshikawa, M.: Protecting spatiotemporal event privacy in continuous location-based services. IEEE Trans. Knowl. Data Eng. (2019) 8. Cao, Y., Xiong, L., Yoshikawa, M., Xiao, Y., Zhang, S.: ConTPL: controlling temporal privacy leakage in differentially private continuous data release. VLDB Demonstration Track 11(12), 2090–2093 (2018) 9. Cao, Y., Yoshikawa, M., Xiao, Y., Xiong, L.: Quantifying differential privacy under temporal correlations. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 821–832 (2017) 10. Cao, Y., Yoshikawa, M., Xiao, Y., Xiong, L.: Quantifying differential privacy in continuous data release under temporal correlations. IEEE Trans. Knowl. Data Eng. 31(7), 1281–1295 (2019) 11. Chatzikokolakis, K., Palamidessi, C., Stronati, M.: A predictive differentiallyprivate mechanism for mobility traces. In: De Cristofaro, E., Murdoch, S.J. (eds.) PETS 2014. LNCS, vol. 8555, pp. 21–41. Springer, Cham (2014). https://doi.org/ 10.1007/978-3-319-08506-7 2 12. Chatzikokolakis, K., Palamidessi, C., Stronati, M.: Constructing elastic distinguishability metrics for location privacy. Proc. Priv. Enhancing Technol. 2015(2), 156–170 (2015) 13. Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: KDD, pp. 1082–1090 (2011) 14. Chow, C.-Y., Mokbel, M.F., Liu, X.: Spatial cloaking for anonymous locationbased services in mobile peer-to-peer environments. GeoInformatica 15(2), 351–380 (2011) 15. Dwork, C.: Differential privacy. In: ICALP, pp. 1–12 (2006) 16. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878 14 17. Fan, L., Bonomi, L., Xiong, L., Sunderam, V.: Monitoring web browsing behavior with differential privacy. In: WWW, pp. 177–188 (2014) 18. Fawaz, K., Shin, K.G.: Location privacy protection for smartphone users. In: CCS, pp. 239–250 (2014) 19. Furuhata, M., Dessouky, M., Ord´ on ˜ez, F., Brunet, M.-E., Wang, X., Koenig, S.: Ridesharing: the state-of-the-art and future directions. Transp. Res. Part B: Methodol. 57, 28–46 (2013) 20. Gambs, S., Killijian, M.-O., del Prado Cortez, M.N.: Next place prediction using mobility Markov chains. In: Proceedings of the First Workshop on Measurement, Privacy, and Mobility, pp. 1–6 (2012) 21. Gedik, B., Liu, L.: Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Trans. Mob. Comput. 7(1), 1–18 (2008) 22. Gruteser, M., Grunwald, D.: Anonymous usage of location-based services through spatial and temporal cloaking. In: MobiSys, pp. 31–42 (2003) 23. Han, Y., Li, S., Cao, Y., Ma, Q., Yoshikawa, M.: Voice-indistinguishability: protecting voiceprint in privacy-preserving speech data release. In: IEEE ICME (2020)

676

Y. Cao et al.

24. Hardt, M., Talwar, K.: On the geometry of differential privacy. In: STOC, pp. 705–714 (2010) 25. He, X., Machanavajjhala, A., Ding, B.: Blowfish privacy: tuning privacy-utility trade-offs using policies, pp. 1447–1458 (2014) 26. Ingle, M., et al.: Slowing the spread of infectious diseases using crowdsourced data. IEEE Data Eng. Bull. 12 (2020) 27. Kifer, D., Machanavajjhala, A.: A rigorous and customizable framework for privacy. In: PODS, pp. 77–88 (2012) 28. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: IEEE ICDE, pp. 106–115 (2007) 29. Li, N., Lyu, M., Su, D., Yang, W.: Differential privacy: from theory to practice (2016) 30. Luo, Y., Tang, N., Li, G., Li, W., Zhao, T., Yu, X.: DEEPEYE: a data science system for monitoring and exploring COVID-19 data. IEEE Data Eng. Bull. 12 (2020) 31. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: L-diversity: privacy beyond k-anonymity. In: IEEE ICDE, p. 24 (2006) 32. Parent, C., et al.: Semantic trajectories modeling and analysis. ACM Comput. Surv. 45(4), 42:1–42:32 (2013) 33. Pej´ o, B., Desfontaines, D.: SoK: differential privacies. In: Proceedings on Privacy Enhancing Technologies Symposium (2020) 34. Primault, V., Boutet, A., Mokhtar, S.B., Brunie, L.: The long road to computational location privacy: a survey. IEEE Commun. Surv. Tutor. 21, 2772–2793 (2018) 35. Recabarren, R., Carbunar, B.: What does the crowd say about you? Evaluating aggregation-based location privacy. WPES 2017, 156–176 (2017) 36. Song, S., Wang, Y., Chaudhuri, K.: Pufferfish privacy mechanisms for correlated data. In: SIGMOD, pp. 1291–1306 (2017) 37. Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 557–570 (2002) 38. Takagi, S., Cao, Y., Asano, Y., Yoshikawa, M.: Geo-graph-indistinguishability: protecting location privacy for LBS over road networks. In: Foley, S.N. (ed.) DBSec 2019. LNCS, vol. 11559, pp. 143–163. Springer, Cham (2019). https://doi.org/10. 1007/978-3-030-22479-0 8 39. Xiao, Y., Xiong, L.: Protecting locations with differential privacy under temporal correlations. In: CCS, pp. 1298–1309 (2015) 40. Xiao, Y., Xiong, L., Zhang, S., Cao, Y.: LocLok: location cloaking with differential privacy via hidden Markov model. Proc. VLDB Endow. 10(12), 1901–1904 (2017) 41. Zheng, Y., Chen, Y., Xie, X., Ma, W.-Y.: GeoLife2.0: a location-based social networking service. In: IEEE MDM, pp. 357–358 (2009)

Where Are You Bob? Privacy-Preserving Proximity Testing with a Napping Party Ivan Oleynikov1(B) , Elena Pagnin2 , and Andrei Sabelfeld1 1

Chalmers University, Gothenburg, Sweden {ivanol,andrei}@chalmers.se 2 Lund University, Lund, Sweden [email protected]

Abstract. Location based services (LBS) extensively utilize proximity testing to help people discover nearby friends, devices, and services. Current practices rely on full trust to the service providers: users share their locations with the providers who perform proximity testing on behalf of the users. Unfortunately, location data has been often breached by LBS providers, raising privacy concerns over the current practices. To address these concerns previous research has suggested cryptographic protocols for privacy-preserving location proximity testing. Yet general and precise location proximity testing has been out of reach for the current research. A major roadblock has been the requirement by much of the previous work that for proximity testing between Alice and Bob both must be present online. This requirement is not problematic for one-to-one proximity testing but it does not generalize to one-to-many testing. Indeed, in settings like ridesharing, it is desirable to match against ride preferences of all users, not necessarily ones that are currently online. This paper proposes a novel privacy-preserving proximity testing protocol where, after providing some data about its location, one party can go offline (nap) during the proximity testing execution, without undermining user privacy. We thus break away from the limitation of much of the previous work where the parties must be online and interact directly to each other to retain user privacy. Our basic protocol achieves privacy against semi-honest parties and can be upgraded to full security (against malicious parties) in a straight forward way using advanced cryptographic tools. Finally, we reduce the responding client overhead from quadratic (in the proximity radius parameter) to constant, compared to the previous research. Analysis and performance experiments with an implementation confirm our findings. Keywords: Secure proximity testing based services · MPC

1

· Privacy-preserving location

Introduction

As we digitalize our lives and increasingly rely on smart devices and services, location based services (LBS) are among the most widely employed ones. These c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 677–697, 2020. https://doi.org/10.1007/978-3-030-58951-6_33

678

I. Oleynikov et al.

range from simple automation like “switch my phone to silent mode if my location is office”, to more advanced services that involve interaction with other parties, as in “find nearby coffee shops”, “find nearby friends”, or “find a ride”. Preserving Privacy for Location Based Services. Current practices rely on full trust to the LBS providers: users share their locations with the providers who manipulate location data on behalf of the users. For example, social apps Facebook and Tinder require access to user location in order to check if other users are nearby. Unfortunately, location and user data has been often breached by the LBS providers [27]. The ridesharing app Uber has been reported to violate location privacy of users by stalking journalists, VIPs, and ex-partners [22], as well as ex-filtrating user location information from its competitors [40]. This raises privacy concerns over the current practices. Privacy-Preserving Proximity Testing. To address these concerns previous research has suggested cryptographic protocols for privacy-preserving location services. The focus of this paper is on the problem of proximity testing, the problem of determining if two parties are nearby without revealing any other information about their location. Proximity testing is a useful ingredient for many LBS. For example, ridesharing services are often based on determining the proximity of ride endpoints [18]. There is extensive literature (discussed in Sect. 6) on the problem of proximity testing [11,20,21,24,29,30,35,36,38,41,42,44]. Generality and Precision of Proximity Testing. Yet general and precise location proximity testing has been out of reach for the current research. A major roadblock has been the requirement that proximity testing between Alice and Bob is only possible in a pairwise fashion and both must be present online. As a consequence, Alice cannot have a single query answered with respect to multiple Bobs, and nor can she is able to check proximity with respect to Bob’s preferences unless Bob is online. The popular ridesharing service BlaBlaCar [4] (currently implemented as a full-trust service) is an excellent fit to illustrate our goals. This service targets intercity rides which users plan in advance. It is an important requirement that users might go off-line after submitting their preferences. The goal is to find rides that start and end at about the same location. Bob (there can be many Bobs) submits the endpoints of a desired ride to the service and goes offline (napping). At a later point Alice queries the service for proximity testing of her endpoints with Bob’s. A key requirement is that Alice should be able to perform a one-to-many proximity query, against all Bobs, and compute answer even if Bob is offline. Unfortunately, the vast majority of the previous work [11,20,21, 24,30,35,36,38,41,42,44] fall short of addressing this requirement. Another key requirement for our work is precision. A large body of prior approaches [11,26,29–31, 41,42,44] resort to grid-based approximations where the proximity problem is reduced to the problem of checking whether the parties are located in the same cell on the grid. Unfortunately, grid-based proximity

Privacy-Preserving Proximity Testing with a Napping Party

679

suffers from both false positives and negatives and can be exploited when crossing cell boundaries [9]. In contrast, our work targets precise proximity testing. This paper addresses privacy-preserving proximity testing with respect to napping parties. Beyond the described offline functionality and precision we summarize the requirements for our solution as follows: (1) security, in the sense that Alice may not learn anything else about Bob’s location other than the proximity; Bob should not learn anything about Alice’s location; and the service provider should not learn anything about Alice’s or Bob’s locations; (2) generality, in the sense that the protocol should allow for one-to-many matching without demanding all users to be online; (3) precision, in the sense of a reliable matching method, not an approximate one; (2) lightweight client computation, in the sense of offloading the bulk of work to intermediate servers. We further articulate on these goals in Sect. 2. Contributions. This paper proposes OLIC (OffLine Inner-Circle), a novel protocol for proximity testing (Sect. 4). We break away from the limitation of much of the previous work where the parties must be online. Drawing on Hallgren et al.’s twoparty protocol InnerCircle [20] we propose a novel protocol for proximity testing that utilizes two non-colluding servers. One server is used to blind Bob’s location in such a way that the other server can unblind it for any Alice. Once they have uploaded their locations users in our protocol can go offline and retrieve the match outcome the next time they are online. In line with our goals, we guarantee security with respect to semi-honest parties, proving that the only location information leaked by the protocol is the result of the proximity test revealed to Alice (Sect. 4.2). We then show how to generically mitigate malicious (yet non-colluding) servers by means of zero knowledge proofs and multi-key homomorphic signatures (Sect. 4.3). Generality in the number of users follows from the fact that users do not need to be online in a pairwise fashion, as a single user can query proximity against the encrypted preferences of the other users. We leverage InnerCircle to preserve the precision, avoiding to approximate proximity information by grids or introducing noise. Finally, OLIC offloads the bulk of work from Bob to the servers, thus reducing Bob’s computation and communication costs from quadratic (in the proximity radius parameter) to constant. We note, that while InnerCircle can also be trivially augmented with an extra server to offload Bob’s computations to, this will add extra security assumptions and make InnerCircle less applicable in practice. OLIC, on the other hand, already requires the servers for Bob to submit his data to, and we get offloading for free. On Alice’s side, the computation and communication costs stay unchanged. We develop a proof of concept implementation of OLIC and compare it with InnerCircle. Our performance experiments confirm the aforementioned gains (Sect. 5). On the 2 Non-Colluding Servers Assumption. We consider this assumption to be realistic, in the sense that it significantly improves on the current practices of a single full-trust server as in BlaBlaCar, while at the same time being compatible with common assumptions in practical cryptographic privacy-preserving

680

I. Oleynikov et al.

systems. For example, Sharemind [39] requires three non-colluding servers for its multi-party computation system based on 3-party additive secret sharing. To the best of our knowledge, OLIC represents the first 2-server solution to perform proximity testing against napping users in ridesharing scenarios, where privacy, precision and efficiency are all mandatory goals. Notably, achieving privacy using a single server is known to be impossible [17]. Indeed, if Bob was to submit his data to only one server, the latter could query itself on this data and learn Bob’s location via trilateration attack.

2

Modeling Private Proximity Testing Using Two Servers

The goal of private proximity testing is to enable one entity, that we will call Alice, to find out whether another entity, Bob, is within a certain distance of her. Note that the functionality is asymmetric: only Alice learns whether Bob lies in her proximity. The proximity test should be performed without revealing any additional information regarding the precise locations to one another, or to any third party. Our Setting. We consider a multi party computation protocol for four parties (Alice, Bob, Server1 , and Server2 ) that computes the proximity of Alice’s and Bob’s inputs in a private way and satisfies the following three constraints: (C-1) Alice does not need to know Bob’s identity before starting the test, nor the parties need to share any common secret; (C-2) Bob needs to be online only to update his location data. In particular, Bob can ‘nap’ during the actual protocol execution. (C-3) The protocol is executed with the help of two servers. In detail, constraint (C-1) ensures that Alice can look for a match in the database without necessarily targeting a specific user. This may be relevant in situations where one wants to check the availability of a ride ‘near by’, instead of searching if a specific cab is at reach and aligns with our generality goal towards oneto-many matching. Constraint (C-2) is rarely considered in the literature. The two most common settings in the literature are either to have Alice and Bob communicate directly to one another (which implies that either the two parties need to be online at the same time) [20], or to rely on a single server, which may lead either to ‘hiccup’ executions (lagging until the other party rejoins online) [30] or to the need for a trusted server. In order to ensure a smooth executions even with a napping Bob and to reduce the trust in one single server, we make use of two servers to store Bob’s contribution to the proximity test, that is constraint (C-3). This aligns with our goal of lightweight client computation. We remark that, for a napping Bob, privacy is lost if we use a single server [17]. Finally, unlike [30] we do not require the existence of a social network among system users, to determine who trusts whom, nor do we rely on shared secret keys among users.

Privacy-Preserving Proximity Testing with a Napping Party

681

Formalizing ‘Napping’. We formalize the requirement that ‘Bob may be napping’ during the protocol execution in the following way. It is possible to split the protocol in two sequential phases. In the first phase, Bob is required to be online and to upload data to the two servers. In the second phase Alice comes online and perform her proximity test query to one server, that we call Server1 . The servers communicate with one another to run the protocol execution, and finally Server2 returns the result to Alice. Ideal Functionality. We adopt an ideal functionality that is very close to the one in [20]: if Alice and Bob are within a distance r2 of each other, the protocol outputs 1 (to Alice), otherwise it outputs 0 (to Alice). Figure 1 depicts this behavior. Alice and Bob are the only parties giving inputs to the protocol. Server1 and Server2 do not give any input nor receive any output. This approach aligns with our goal towards precision and a reliable matching method, and brakes away from approximate approaches. For simplicity, we assume the threshold value r2 to be set a priori, but our protocol accommodates for Alice (or Bob) to choose this value.

Fig. 1. Figurative representation of the functionality implemented by privacy-preserving location proximity.

Practical Efficiency. Finally, we are interested in solutions that are efficient and run in reasonable time on commodity devices (e.g., laptop computers, smartphones). Concretely, we aim at reducing the computational burden of the clients—Alice and Bob—so that their algorithms can run in just a few seconds, and can in the worst case match the performance of InnerCircle. Attacker Model. We make the following assumptions: (A-1) Server1 , Server2 are not colluding; (A-2) All parties (Alice, Bob, Server1 , and Server2 ) are honest but curious, i.e., they meticulously follow the protocol but may log all messages and attempt to infer some further knowledge on the data they see. In our model any party could be an attacker that tries to extract un-authorized information from the protocol execution. However, we mainly consider attackers that do so without deviating from their expected execution. In Sect. 4.3, we show that it is possible to relax assumption (A-2) and guarantee users’ privacy against 2 malicious servers. In this case, the servers may misbehave and output a different result than what expected from the protocol execution. However, we do not let the two server collude and intentionally share information. While we can handle malicious servers, we cannot tolerate a malicious Alice or Bob. Detecting attacks such as Alice or Bob providing fake coordinates, or Alice trilaterating Bob is outside our scope and should be addressed by different means. We regard such attacks as orthogonal to our contribution and suggest to mitigate them by employing tamper-resistant location devices or location tags techniques [30].

682

I. Oleynikov et al.

Privacy. Our definition of privacy goes along the lines of [30]. Briefly, a protocol is private if it reveals no information other than the output of the computation, i.e., it has the same behavior as the ideal functionality. To show the privacy of a protocol we adopt the standard definition of simulation based security for deterministic functionalities [28]. Concretely, we will argue the indistinguishably between a real world and an ideal world execution of our protocol, assuming that the building blocks are secure, there is no collusion among any set of parties and all parties are honest but curious. General Limitations. Proximity testing can be done in a private way only for a limited amount of requests [20,30] in order to avoid trilateration attacks [43]. Practical techniques to limit the number of requests to proximity services are available [34]. Asymmetric proximity testing is suitable for checking for nearby people, devices and services. It is known [44] that the asymmetric setting not directly generalize to mutual location proximity testing.

3

Recap of the InnerCircle Protocol

We begin by giving a high-level overview of the InnerCircle protocol by Hallgren et al. [20] and then describe its relation to our work. The InnerCircle protocol in [20] allows two parties, Alice and Bob, to check whether the Euclidean distance between them is no greater than a chosen value r. The protocol is privacy preserving, i.e., Alice only learns a bit stating whether the Bob lies closer that r or not, while it does not reveal anything to Bob or to any external party.

Fig. 2. Diagram of the InnerCircle protocol.

Privacy-Preserving Proximity Testing with a Napping Party

683

More formally, let (xA , yA ) and (xB , yB ) respectively denote Alice’s and Bob’s locations. Throughout the paper, we will use the shorthand D = (xA − xB )2 + (yA − yB )2 to denote the squared Euclidean distance between Alice and Bob and δ is an encryption of D. For for any fixed non-negative integer r, the functionality implemented by the InnerCircle protocol is defined by: FInnerCircle ((xA , yA ), (xB , yB )) = (z, ε), where  1, if D ≤ r2 (D = (xA − xB )2 + (yA − yB )2 ) z= 0, otherwise. InnerCircle is a single round protocol where the first message is from Alice to Bob and the second one is from Bob to Alice (see Fig. 2). InnerCircle has three major steps (detailed in Fig. 3): (1) Alice sends to Bob encryptions of her coordinates together with her public key pk; (2) Bob computes δ (an encryption of D) using Alice’s data and his own coordinates, after that he runs LessThan to produce a list L of ciphertexts which encode whether D ≤ r2 (without knowing D himself) and sends L to Alice; (3) Alice decrypts the list using CheckLessThan and learns whether D ≤ r2 or not. Figure 3 contains the definitions of the three procedures used in InnerCircle that will carry on to our OLIC protocol. Below we describe the major ones (LessThan and CheckLessThan) in more detail. The procedure LessThan (Fig. 3b) produces a set of ciphertexts from which Alice can deduce whether D ≤ r2 without disclosing the actual value D. The main idea behind this algorithm is that given δ = Encpk (D) Bob can “encode” whether D = i in the value x ← δ ⊕ Encpk (−i), and then “mask” it by multiplying it by a random element l ← x  r. Observe that if D − i = 0 then Decsk (l) = 0; otherwise if D −i is some invertible element of M then Decsk (l) is uniformly distributed on M \ {0}. (When we instantiate our OLIC protocol for specific cryptosystem, we will ensure that D − i is either 0 or an invertible element, hence we do not consider here the third case when neither of these holds.) The LessThan Fig. 3. The core algorithms in the procedure terminates with a shuffling of the InnerCircle protocol.

684

I. Oleynikov et al.

list, i.e., the elements in L are reorganized in a random order so that the entry index no longer correspond to the value i. The procedure CheckLessThan (called inProx in [20]) depicted in Fig. 3c. This procedure takes as input the list L output by LessThan and decrypts one by one all of its components. The algorithm returns 1 if and only if there is a list element that decrypts to 0. This tells Alice whether D = i for some i < r2 . We remark that Alice cannot infer any additional information, in particular she cannot extract the exact value i for which the equality holds. In the LessThan procedure Bob computes such “encodings” l for all i ∈ {0 . . . r2 } and accumulates them in a list. So if D ∈ {0 . . . r2 −1} is the case, then one of the list elements will decrypt to zero and others will decrypt to uniformly random M \ {0} elements, otherwise all of the list elements will decrypt to random nonzero elements. If the cryptosystem allowed multiplying encrypted values, Bob could compute the product of the “encodings” instead of collecting them into a list and send only a single plaintext, but the cryptosystem used here is only additively homomorphic and does not allow multiplications.

4

Private Location Proximity Testing with Napping Bob

Notation. We denote an empty string by ε, and a list of values by [·] and the set of plaintexts (of an encryption scheme) by M. We order parties alphabetically, protocols’ inputs and outputs follow the order of the parties. Finally, we denote c the computationally indistinguishablity of two distributions D1 , D2 by D1 ≡ D2 . 4.1

OLIC: Description of the Protocol

In what follows, we describe our protocol for privacy-preserving location proximity testing with a napping Bob. We name this protocol OLIC (for OffLine InnerCircle) to stress its connection with the InnerCircle protocol and the fact that it can run while Bob is offline. More specifically, instead of exchanging messages directly with one another (as in InnerCircle), in OLIC Alice and Bob communicate with (and through) two servers. The Ideal Functionality of OLIC. At a high level, OLIC takes as input two locations (xA , yA ) and (xB , yB ), and a radius value r; and returns 1 if the Euclidean distance between the locations is less than or equal to r, and 0 otherwise. We perform this test with inputs provided by two parties, Alice and Bob, and the help of two servers, Server1 and Server2 . Formally, our ideal functionality has the same input and output behavior as InnerCircle for Alice and Bob, but it additionally has two servers whose inputs and outputs are empty strings ε. FOLIC ((xA , yA ), (xB , yB ), ε, ε) = (res, ε, ε, ε),  1, if (xA − xB )2 + (yA − yB )2 ≤ r2 where res = 0, otherwise.

(1)

Privacy-Preserving Proximity Testing with a Napping Party

685

In our protocol we restrict the input/output spaces of the ideal functionality of Eq. 1 to values that are meaningful to our primitives. In other words, we require that the values xA , yA , xB , yB are admissible plaintext (for the encryption scheme employed in OLIC), and that the following values are invertible (in the ring of plaintext) xB , yB , D − i for i ∈ {0, . . . , r2 − 1} and D = (xA − xB )2 + (yA − yB )2 . We will denote the set of all suitable inputs as Sλ ⊆ M. Figure 4 depicts the flow of the OLIC protocol. Figure 5 contains the detailed description of the procedures called in Fig. 4.

Fig. 4. Overview of the message flow of our OLIC protocol. The message exchanges are grouped to reduce vertical space; in a real execution of the protocol Bob may submit at any time before CompTerms is run.

Step-0: Alice, Server1 , and Server2 independently generate their keys. Alice sends her pk to Server1 and Server2 ; Server1 and Server2 send their respective public keys pk1 , pk2 to Bob. Step-1: At any point in time, Bob encodes his location using the BStart algorithm (Fig. 5a), and sends its blinded coordinates to Server1 , and the corresponding blinding factors to Server2 . Step-2: At any point in time Alice runs AStart (Fig. 3a) and sends her ciphertexts to Server1 . Step-3: once Server1 collects Alice’s and Bob’s data it can proceed with computing 3 addends useful to obtain the (squared) Euclidean distance between

686

I. Oleynikov et al.

their locations. To do so, Server1 runs CompTerms (Fig. 5b), and obtains: 2 2 c1 = Encpk (x2A + yA + (x2B + yB + r1 )) c2 = Encpk (2xA xB r2 ) c3 = Encpk (2yA yB r3 )

Server1 sends the above three ciphertexts to Server2 . Step-4: Server2 runs UnBlind (Fig. 5c) to remove Bob’s blinding factors from c1 , c2 , c3 and obtains the encrypted (squared) Euclidean distance between Alice and Bob: δ = c1  Encpk (−r1 )  c2  Encpk (r2−1 )  c3  Encpk (r3−1 ) = Encpk ((xA − xB )2 + (yA − yB )2 ) Then Server2 uses δ and the radius value r to run LessThan (as in InnerCircle, Fig. 3b), obtains the list of ciphertexts L and returns L to Alice. Step-5: Alice runs CheckLessThan (Fig. 3c) to learn whether D ≤ r2 in the same way as done in InnerCircle. The correctness of OLIC follows from the correctness of InnerCircle and a straightforward computation of the input-output of BStart, CompTerms, and UnBlind. 4.2

OLIC: Privacy of the Protocol

We prove that our OLIC provides privacy against semi-honest adversaries. We do so by showing an efficient simulator for each party involved in the protocol and then arguing that each simulator’s output is indistinguishable from the view of respective party. More details on the cryptographic primitives and their security model can be found in [13,28]. Theorem 1. If the homomorphic encryption scheme HE = (Gen, Enc, Dec, Eval) used in OLIC is IND-CPA secure and function private then OLIC securely realizes the privacy-preserving location proximity testing functionality (in Eq. 1) against semi-honest adversaries. At a high level, to prove Theorem 1 we construct one simulator per party and prove that each simulator’s output is computationally indistinguishable (by nonuniform algorithms) from the views of each single party in a real protocol execution. The simulators for Bob, Server1 , and Server2 are trivial — all the messages in their views are either uniformly distributed or encrypted, both of which is easy to simulate (any encrypted value can be simulated as an encryption of 0). The simulator for Alice is slightly more involved, it needs to generate a list of ciphertexts L whose distribution is defined by the output bit res: when res = 0 the L is a list of encryptions of random nonzero elements of M, otherwise one of its elements (the position of which is chosen uniformly at random) contains an encryption of zero. Given res this can also be done efficiently. Due to space restriction the detailed proof of Theorem 1 can be found in the extended version of this paper [32].

Privacy-Preserving Proximity Testing with a Napping Party

687

Fig. 5. The new subroutines in OLIC.

Privacy Remark on Bob. In case the two servers collude, Bob looses privacy, in the sense that Server1 and Server2 together can recover Bob’s location (by unblinding the tmpi in CompTerms, Fig. 5b). However Alice retains her privacy even in case of colluding servers (thanks to the security of the encryption scheme). 4.3

Security Against Malicious Servers

We describe a generic way to turn OLIC into a protocol secure against malicious, but non-colluding, servers. To do so we employ a suitable non-interactive zero knowledge proof system (NIZK) and a fairly novel cryptographic primitive called multi-key homomorphic signatures (MKHS) [1,10]. The proposed maliciously secure version of OLIC is currently more of a feasibility result rather than a concrete instantiation: to the best of our knowledge there is no combination of MKHS and NIZK that would fit our needs.

688

I. Oleynikov et al.

Homomorphic signatures [5,14] enable a signer to authenticate their data in such a way that any third party can homomorphically compute on it and obtain (1) the result of the computation, and (2) a signature vouching for the correctness of the latter result. In addition to what we just described, MKHS make it possible to authenticate computation on data signed by multiple sources. Notably, homomorphic signatures and MKHS can be used to efficiently verify that a computation has been carried out in the correct way on the desired data without need to access the original data [10]. This property is what we leverage to make OLIC secure against malicious servers. At a high level, our proposed solution to mitigate malicious servers works as follows. Alice and Bob hold distinct secret signing keys, skA and skB respectively, and sign their ciphertexts before uploading them to the servers. In detail, using the notation in Fig. 4, Alice sends to Server1 the three messages (ciphertexts) a1 , a2 , a3 along with their respective signatures σiA ← MKHS.Sign(skA , ai , i ), where i is an opportune label.1 Bob acts similarly. Server1 computes fi (the circuit corresponding to the computation in CompTerms on the i-th input) on each ciphertext and each signature, i.e., ci ← f (ai , bi ) and σi ← MKHS.Eval(f, {pkA , pkB }, σiA , σiB ). The unforgeability of MKHS ensures that each multi-key signature σi acts as a proof that the respective ciphertext ci has been computed correctly (i.e., using f on the desired inputs2 ). Server2 can be convinced of this fact by checking whether MKHS.Verif(f, {j }, {pkA , pkB }, ci , σi ) returns 1. If so, Server2 proceeds and computes the value δ (and its multi-key signature σ) by evaluating the circuit g corresponding to the function UnBlind. As remarked in Sect. 4.2, privacy demands that the random coefficients, xi :s, involved in the LessThan procedure are not leaked to Alice. However, without the xi :s Alice cannot run the MKHS verification (as this randomness should be hardwired in the circuit h corresponding to the function LessThan). To overcome this obstacle we propose to employ a zero knowledge proof system. In this way Server2 can state that it knows undisclosed values xi :s such that the output data (the list L ← [l1 , . . . , lr2 −1 ]) passes the MKHS verification on the computations dependent on xi :s. This can be achieved by interpreting the MKHS verification algorithm as a circuit v with inputs xi (and li ). Security Considerations. To guarantee security we need the MKHS to be context hiding (e.g., [37], to prevent leakage of information between Server1 and Server2 ); unforgeable (in the homomorphic sense [10]); and the final proof to be zero knowledge. Implementing this solution would equip OLIC with a quite advanced and heavy machinery that enables Alice to detect malicious behaviors from the servers. A Caveat on Labels. There is final caveat on data labels [10] needed for the MKHS schemes. We propose to set labels as a string containing the public information: day, time, identity of the user producing the location data, and data 1 2

More details on labels at the end of the section. The ‘desired’ inputs are indicated by the labels, as we discuss momentarily.

Privacy-Preserving Proximity Testing with a Napping Party

689

type identifier (e.g., (1, 3) to identify the ciphertext b3 , sent by Bob to Server1 , (2, 1) to identify the ciphertext b1 , sent by Bob to Server2 , and (0, 2) to identify Alice’s ciphertext a2 —Alice only sends data to Server1 ). We remark that such label information would be retrievable by InnerCircle as well, as in that case Alice knows the identity of her interlocutor (Bob), and the moment (day and time) in which the protocol is run.

5

Evaluation

In this section we evaluate the performance of our proposal in three different ways: first we provide asymptotic bounds on time complexity of the algorithms in OLIC; second, we provide bounds on the total communication cost of the protocol; finally we develop a proof-of-concept implementation of OLIC to test its performance (running time and communication cost) and compare it against the InnerCircle protocol. Recall that InnerCircle and OLIC implement the same functionality (privacy-preserving proximity testing), however the former requires Bob to be online during the whole protocol execution while the latter does not. Parameters. As described in Sect. 4.1, OLIC requires Alice to use an additive homomorphic encryption scheme. However, no special property is needed by the ciphertexts from Bob to the servers and between servers. Our implementation employs the ElGamal cryptosystem over a safe-prime order group for ciphertexts to and from Alice, while for the other messages it uses the Paillier cryptosystem. We refer to this implementation as (non-EC), as it does not rely on any elliptic curive cryptograpy (ECC). In order to provide a fair comparison with its predecessor (InnerCircle), we additionally instantiate OLIC using only ECC cryptosystems (EC), namely Elliptic Curve ElGamal. We note that additive homomorphic ElGamal relies on the face that a plaintext value m is encoded into a group element as g m (where g denotes the group generator). In this way, the multiplication of g m1 · g m2 returns an encoding of the addition of the corresponding plaintext values m1 + m2 . In order to ‘fully’ decrypt a ciphertext, one would have to solve the discrete logarithm problem and recover m from g m , which should be unfeasible, as this corresponds to the security assumption of the encryption scheme. Fortunately, this limitation is not crucial for our protocol. In OLIC, indeed, Alice only needs to check whether a ciphertext is an encryption of zero or not (in the CheckLessThan procedure), and this can be done efficiently without fully decrypting the ciphertext. However, we need to keep it into account when implementing OLIC with ECC. Indeed, in OLIC the servers need to actually fully decrypt Bob’s ciphertexts in order to work with the corresponding (blinded) plaintexts. We do so by employing a non-homomorphic version of ElGamal based on the curve M383 in (EC) and Paillier cryptosystem in (non-EC). In our evaluation, we ignore the computational cost of initial key-generation as, in real-life scenarios, this process is a one-time set up and is not needed at every run of the protocol.

690

5.1

I. Oleynikov et al.

Asymptotic Complexity

Table 1 shows both the concrete numbers of cryptographic operations and our asymptotic bounds on the time complexity of the algorithms executed by each party involved in OLIC. We assume that ElGamal and Paillier encryption, decryption and arithmetic operations, are performed in time O(λm) (assuming binary exponentiation), where λ here is the security parameter and m is the cost of λ-bit interger multiplication — depends on the specific multiplication algorithm used, e.g., m = O(n log n log log n) for Sch¨ onhage–Strassen algorithm. This applies to both (EC) and (non-EC), although (EC) in practice is used with a lot smaller values of λ. Table 1. Concrete number of operations and asymptotic time complexity for each party in OLIC (m is the cost of modular multiplication, λ the security parameter and r the radius). In our implementation Alice decryption is actually an (efficient) zero testing. Party

Cryptosystem operations

Time bound

Alice

3 encryptions r2 decryptions

O(r2 λm)

Bob

6 encryptions

O(λm)

Server1 3 decryptions, 3 homomorphic operations,

O(λm)

Server2 3 decryptions, 2r2 + 5 arithmetic operations, r2 encryptions

O(r2 λm)

Communication Cost. The data transmitted among parties during a whole protocol execution amounts to r2 + 6 ciphertexts between Alice and the servers and 6 ciphertexts between Bob and servers. Each ciphertext consists of 4λ bits in case of (EC), and 2λ bits in case of (non-EC) (for both ElGamal and Paillier). Asymptotically, both cases require O(λr2 ) bit of communication—although (EC) is used with a lot smaller λ value in implementation. 5.2

Implementation

We developed a prototype implementation of OLIC in Python (available at [32]). The implementation contains all of the procedures shown in pseudocode in Figs. 3 and 5. To ensure a fair compare between OLIC and InnerCircle, we implemented latter in the same environment (following the nomenclature in Fig. 2). Our benchmarks for the total communication cost of the OLIC protocol are reported in Fig. 6. We measured the total execution time of each procedure of both InnerCircle and OLIC. Figure 7 shows the outcome of our measurements for values of the proximity radius parameter r ranging from 0 to 100. Detailed values can be found in extended version of this paper [32].

Privacy-Preserving Proximity Testing with a Napping Party

691

Setup. We used cryptosystems from the cryptographic library bettertimes [19], which was used for benchmarking of the original InnerCircle protocol. The benchmarks were run on Intel Core i7-8700 CPU running at a frequency of 4.4 GHz. For (non-EC) we used 2048-bit keys for ElGamal (the same as in InnerCircle [20]), and 2148-bit keys for Paillier to accommodate a modulus larger than the ElGamal modulus (so that any possible message of Bob fits into the Paillier message space). Fig. 6. Total communication cost of For (EC) we used curves Curve25519 OLIC. for additive homomorphic encryption and M383 for the ciphertexts exchanged among Bob and the servers from the ecc-pycrypto library [7]. We picked these curves because they were available in the library and also because M383 uses a larger modulus and allows us to encrypt the big values from the plaintexts field of Curve25519. The plaintext ring for (non-EC) ElGamal-2048 has at least |M| ≥ 22047 elements, which allows Alice and Bob’s points to lie on the grid {1, 2 . . . 21023 }2 ensuring that the seqared distance between them never exceeds 22047 . The corresponding plaintext ring size for (EC) is |M| ≥ 2251 (the group size of Curve25519), and the grid is {1, 2 . . . 2125 }2 . Since Earth equator is ≈ 226 meters long, either of the two grids is more than enough to cover any location on Earth with 1 meter precision. Optimizations. In InnerCircle [20], the authors perform three types of optimizations on the LessThan procedure: (O-1) Iterating only through those values of i ∈ {0, . . . , r2 − 1} which can be represented as a sum of two squares, i.e., such i that ∃ a, b : i = a2 + b2 . (O-2) Precomputing the ElGamal ciphertexts Encpk (−i), and only for those i described in (O-1). (O-3) Running the procedure in parallel, using 8 threads. We adopt the optimizations (O-1) and (O-2) in our implementation as well. Note that (O-1) reduces the length of list L (Fig. 4), as well as the total communication cost. We disregard optimization (O-3) since we are interested in the total amount of computations a party needs to do (thus this optimization is not present in our implementations of OLIC and InnerCircle). 5.3

Performance Evaluation

Figure 7 shows a comparison of the total running time of each party in OLIC versus InnerCircle. One significant advantage of OLIC is that it offloads the execution of the LessThan procedure from Bob to the servers. This is reflected in

692

I. Oleynikov et al.

Fig. 7. Running times of each party in InnerCircle and OLIC for both (non-EC) and (EC) instantiations. (Reported times are obtained as average of 40 executions.)

Fig. 7, where the running time of Bob is flat in OLIC, in contrast to quadratic as in InnerCircle. Indeed, the combined running time of the two servers in OLIC almost matches the running time of Bob in InnerCircle. Note that the servers have to spend total 10–12 s on computations when r = 100, which is quite a reasonable amount of time, given that servers are usually not as resource-constrained as clients, and that the most time-consuming procedure LessThan consists of a single loop which can easily be executed in parallel to achieve even better speed. Finally, we remark that the amount of data being sent (Fig. 6) between the parties in OLIC is quite moderate. In case Alice wants to be matched with multiple Bobs, say, with k of them, the amount of computations that she and the servers perform will grow linearly with k. The same applies to the communication cost. Therefore, one can obtain the counterparts of Figs. 7 and 6 for multiple Bobs by simply multiplying all the plots (except Bob’s computations) by k.

6

Related Work

Zhong et al. [44] present the Louis, Lester and Pierre protocols for location proximity. The Louis protocol uses additively homomorphic encryption to compute the distance between Alice and Bob while it relies on a third party to perform the proximity test. Bob needs to be present online to perform the protocol. The Lester protocol does not use a third party but rather than performing proximity testing computes the actual distance between Alice and Bob. The Pierre protocol resorts to grids and leaks Bob’s grid cell distance to Alice.

Privacy-Preserving Proximity Testing with a Napping Party

693

Hide&Crypt by Freni et al. [11] splits proximity in two steps. Filtering is done between a third party and the initiating principal. The two principals then execute computation to achieve finer granularity. In both steps, the granule which a principal is located is sent to the other party. C-Hide&Hash by Mascetti et al. [29] is a centralized protocol, where the principals do not need to communicate pairwise but otherwise share many aspects with Hide&Crypt. FriendLocator by ˇ snys et al. [42] presents a centralized protocol where clients map their position Sikˇ to different granularities, similarly to Hide&Crypt, but instead of refining via the second principal each iteration is done via the third party. VicinityLocator also ˇ snys et al. [41] is an extension of FriendLocator, which allows the proximity by Sikˇ of a principal to be represented not only in terms of any shape. Narayanan et al. [30] present protocols for proximity testing. They cast the proximity testing problem as equality testing on a grid system of hexagons. One of the protocol utilizes an oblivious server. Parties in this protocol use symmetric encryption, which leads to better performance. However, this requires to have preshared keys among parties, which is less amenable to one-to-many proximity testing. Saldamli et al. [36] build on the protocol with the oblivious server and suggest optimizations based on properties from geometry and linear algebra. Nielsen et al. [31] and Kotzanikolaou et al. [26] also propose grid-based solutions. ˇ enka and Gasti [38] homomorphically compute distances using the UTM Sedˇ projection, ECEF (Earth-Centered Earth-Fixed) coordinates, and the Haversine formula that make it possible to consider the curvature of the Earth. Hallgren et al. [20] introduce InnerCircle for parallizable decentralized proximity testing, using additively homomorphic encryption between two parties that must be online. The MaxPace [21] protocol builds on speed constraints of an InnerCirclestyle protocol as to limit the effects of trilateration attacks. Polakis [34] study different distance and proximity disclosure strategies employed in the wild and experiment with practical effects of trilateration. Sakib and Huang [35] explore proximity testing using elliptic curves. They require Alice and Bob to be online to be able to run the protocol. J¨ arvinen et al. [24] design efficient schemes for Euclidean distance-based privacy-preserving location proximity. They demonstrate performance improvements over InnerCircle. Yet the requirement of the two parties being online applies to their setting as well. Hallgren et al. [18] how to leverage proximity testing for endpoint-based ridesharing, building on the InnerCircle protocol, and compare this method with a method of matching trajectories. The computational bottle neck of privacy-preserving proximity testing is the input comparison process. Similarly to [20,33], we rely on homomorphic encryption to compare a private input (the distance between the submitted locations) with a public value (the threshold). Other possible approaches require the use of the arithmetic black box model [6], garbled circuits [25], generic two party computations [12], or oblivious transfer extensions [8]. To summarize, the vast majority [11,20,21,24,30,35,36, 38,41,42,44] of the existing approaches to proximity testings require both parties to be online, thus not being suitable for one-to-many matching. A notable exception to the work

694

I. Oleynikov et al.

above is the C-Hide&Hash protocol by Mascetti et al. [29], which allows oneto-many testing, yet at the price of not computing the precise proximity result but its grid-based approximation. Generally, a large number of approaches [11, 26,29–31,41,42,44] resort to grid-based approximations, thus loosing precision of proximity tests. There is a number of existing works, which consider the problem of computing generic functions in the setting, where clients are not online during the whole execution. Hallevi et al. [17] consider a one-server scenario and show that the notion of security agains semi-honest adversary (which we prove for our protocol) is impossible to achive with one server. Additionally, the model from [17] lets all the parties know each other’s public keys, i.e., the clients know all the other clients who supply inputs for the protocol—this does not allow one-tomany matching, which we achive in our work. Further works [2,3,15,16,23] also consider one-server scenarios.

7

Conclusions

We have presented OLIC, a protocol for privacy-preserving proximity testing with a napping party. In line with our goals, (1) we achieve privacy with respect to semi-honest parties; (2) we enable matching against offline users which is needed in scenarios like ridesharing; (3) we retain precision, not resorting to grid-based approximations, and (4) we reduce the responding client overhead from quadratic (in the proximity radius parameter) to constant. Future work avenues include developing a fully-fledged ridesharing system based on our approach, experimenting with scalability, and examining the security and performance in the light of practical security risks for LBS services. Acknowledgments. This work was partially supported by the Swedish Foundation for Strategic Research (SSF), the Swedish Research Council (VR) and the European Research Council (ERC) under the European Unions’s Horizon 2020 research and innovation program under grant agreement No 669255 (MPCPRO). The authors would also like to thank Carlo Brunetta for insightful discussion during the initial development of the work.

References 1. Aranha, D.F., Pagnin, E.: The simplest multi-key linearly homomorphic signature scheme. In: Schwabe, P., Th´eriault, N. (eds.) LATINCRYPT 2019. LNCS, vol. 11774, pp. 280–300. Springer, Cham (2019). https://doi.org/10.1007/978-3-03030530-7 14 2. Beimel, A., Gabizon, A., Ishai, Y., Kushilevitz, E., Meldgaard, S., PaskinCherniavsky, A.: Non-interactive secure multiparty computation. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8617, pp. 387–404. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44381-1 22

Privacy-Preserving Proximity Testing with a Napping Party

695

3. Benhamouda, F., Krawczyk, H., Rabin, T.: Robust non-interactive multiparty computation against constant-size collusion. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. LNCS, vol. 10401, pp. 391–419. Springer, Cham (2017). https://doi.org/10. 1007/978-3-319-63688-7 13 4. BlaBlaCar - Trusted carpooling. https://www.blablacar.com/ 5. Boneh, D., Freeman, D.M.: Homomorphic signatures for polynomial functions. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 149–168. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20465-4 10 6. Catrina, O., de Hoogh, S.: Improved primitives for secure multiparty integer computation. In: Garay, J.A., De Prisco, R. (eds.) SCN 2010. LNCS, vol. 6280, pp. 182–199. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-153174 13 7. ChangLiu: Ecc-pycrypto Library (2019). https://github.com/lc6chang/eccpycrypto. Accessed 14 Apr 2020 8. Couteau, G.: New protocols for secure equality test and comparison. In: Preneel, B., Vercauteren, F. (eds.) ACNS 2018. LNCS, vol. 10892, pp. 303–320. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93387-0 16 9. Cu´ellar, J., Ochoa, M., Rios, R.: Indistinguishable regions in geographic privacy. In: SAC, pp. 1463–1469 (2012) 10. Fiore, D., Mitrokotsa, A., Nizzardo, L., Pagnin, E.: Multi-key homomorphic authenticators. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016. LNCS, vol. 10032, pp. 499–530. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3662-53890-6 17 11. Freni, D., Vicente, C.R., Mascetti, S., Bettini, C., Jensen, C.S.: Preserving location and absence privacy in geo-social networks. In: CIKM, pp. 309–318 (2010) 12. Garay, J., Schoenmakers, B., Villegas, J.: Practical and secure solutions for integer comparison. In: Okamoto, T., Wang, X. (eds.) PKC 2007. LNCS, vol. 4450, pp. 330–342. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-716778 22 13. Gentry, C., Halevi, S., Vaikuntanathan, V.: i-hop homomorphic encryption and rerandomizable Yao circuits. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 155–172. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3642-14623-7 9 14. Gorbunov, S., Vaikuntanathan, V., Wichs, D.: Leveled fully homomorphic signatures from standard lattices. In: STOC, pp. 469–477. ACM (2015) 15. Gordon, S.D., Malkin, T., Rosulek, M., Wee, H.: Multi-party computation of polynomials and branching programs without simultaneous interaction. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 575–591. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9 34 16. Halevi, S., Ishai, Y., Jain, A., Komargodski, I., Sahai, A., Yogev, E.: Noninteractive multiparty computation without correlated randomness. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10626, pp. 181–211. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70700-6 7 17. Halevi, S., Lindell, Y., Pinkas, B.: Secure computation on the web: computing without simultaneous interaction. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 132–150. Springer, Heidelberg (2011). https://doi.org/10.1007/9783-642-22792-9 8 18. Hallgren, P., Orlandi, C., Sabelfeld, A.: PrivatePool: privacy-preserving ridesharing. In: CSF, pp. 276–291 (2017) 19. Hallgren, P.: BetterTimes Python Library (2017). https://bitbucket.org/hallgrep/ bettertimes/. Accessed 22 Jan 2020

696

I. Oleynikov et al.

20. Hallgren, P.A., Ochoa, M., Sabelfeld, A.: InnerCircle: a parallelizable decentralized privacy-preserving location proximity protocol. In: PST, pp. 1–6 (2015) 21. Hallgren, P.A., Ochoa, M., Sabelfeld, A.: MaxPace: speed-constrained location queries. In: CNS (2016) 22. Hern, A.: Uber employees ‘spied on ex-partners, politicians and Beyonc´e’ (2016). https://www.theguardian.com/technology/2016/dec/13/uber-employeesspying-ex-partners-politicians-beyonce 23. Jarrous, A., Pinkas, B.: Canon-MPC, a system for casual non-interactive secure multi-party computation using native client. In: Sadeghi, A., Foresti, S. (eds.) WPES, pp. 155–166. ACM (2013) ´ Schneider, T., Tkachenko, O., Yang, Z.: Faster privacy24. J¨ arvinen, K., Kiss, A., preserving location proximity schemes. In: Camenisch, J., Papadimitratos, P. (eds.) CANS 2018. LNCS, vol. 11124, pp. 3–22. Springer, Cham (2018). https://doi.org/ 10.1007/978-3-030-00434-7 1 25. Kolesnikov, V., Sadeghi, A.-R., Schneider, T.: Improved garbled circuit building blocks and applications to auctions and computing minima. In: Garay, J.A., Miyaji, A., Otsuka, A. (eds.) CANS 2009. LNCS, vol. 5888, pp. 1–20. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10433-6 1 26. Kotzanikolaou, P., Patsakis, C., Magkos, E., Korakakis, M.: Lightweight private proximity testing for geospatial social networks. Comput. Commun. 73, 263–270 (2016) 27. Lee, D.: Uber concealed huge data breach (2017). http://www.bbc.com/news/ technology-42075306 28. Lindell, Y.: How to simulate it – a tutorial on the simulation proof technique. Tutorials on the Foundations of Cryptography. ISC, pp. 277–346. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57048-8 6 29. Mascetti, S., Freni, D., Bettini, C., Wang, X.S., Jajodia, S.: Privacy in geo-social networks: proximity notification with untrusted service providers and curious buddies. VLDB J. 20(4), 541–566 (2011) 30. Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., Boneh, D.: Location privacy via private proximity testing. In: NDSS (2011) 31. Nielsen, J.D., Pagter, J.I., Stausholm, M.B.: Location privacy via actively secure private proximity testing. In: PerCom Workshops, pp. 381–386. IEEE CS (2012) 32. Oleynikov, I., Pagnin, E., Sabelfeld, A.: Where are you Bob? Privacy-preserving proximity testing with a napping party (2020). https://www.cse.chalmers.se/ research/group/security/olic/ 33. Pagnin, E., Gunnarsson, G., Talebi, P., Orlandi, C., Sabelfeld, A.: TOPPool: timeaware optimized privacy-preserving ridesharing. PoPETs 2019(4), 93–111 (2019) 34. Polakis, I., Argyros, G., Petsios, T., Sivakorn, S., Keromytis, A.D.: Where’s wally?: Precise user discovery attacks in location proximity services. In: CCS (2015) 35. Sakib, M.N., Huang, C.: Privacy preserving proximity testing using elliptic curves. In: ITNAC, pp. 121–126. IEEE Computer Society (2016) 36. Saldamli, G., Chow, R., Jin, H., Knijnenburg, B.P.: Private proximity testing with an untrusted server. In: WISEC. pp. 113–118. ACM (2013) 37. Schabh¨ user, L., Butin, D., Buchmann, J.: Context hiding multi-key linearly homomorphic authenticators. In: Matsui, M. (ed.) CT-RSA 2019. LNCS, vol. 11405, pp. 493–513. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12612-4 25 38. Sedenka, J., Gasti, P.: Privacy-preserving distance computation and proximity testing on earth, done right. In: AsiaCCS, pp. 99–110 (2014) 39. Sharemind MPC Platform. https://sharemind.cyber.ee/sharemind-mpc/multiparty-computation/

Privacy-Preserving Proximity Testing with a Napping Party

697

40. Shu, C.: Uber reportedly tracked Lyft drivers using a secret software program named ‘Hell’ (2017). https://techcrunch.com/2017/04/12/hell-o-uber/ 41. Siksnys, L., Thomsen, J.R., Saltenis, S., Yiu, M.L.: Private and flexible proximity detection in mobile social networks. In: MDM, pp. 75–84 (2010) 42. Siksnys, L., Thomsen, J.R., Saltenis, S., Yiu, M.L., Andersen, O.: A location privacy aware friend locator. In: SSTD, pp. 405–410 (2009) 43. Veytsman, M.: How I was able to track the location of any tinder user (2014). http://blog.includesecurity.com/ 44. Zhong, G., Goldberg, I., Hengartner, U.: Louis, Lester and Pierre: three protocols for location privacy. In: PET, pp. 62–76 (2007)

Password and Policy

Distributed PCFG Password Cracking Radek Hranick´ y(B) , Luk´ aˇs Zobal(B) , Ondˇrej Ryˇsav´ y , Duˇsan Kol´ aˇr, and D´ avid Mikuˇs Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic {ihranicky,izobal,rysavy,kolar}@fit.vutbr.cz [email protected]

Abstract. In digital forensics, investigators frequently face cryptographic protection that prevents access to potentially significant evidence. Since users prefer passwords that are easy to remember, they often unwittingly follow a series of common password-creation patterns. A probabilistic context-free grammar is a mathematical model that can describe such patterns and provide a smart alternative for traditional brute-force and dictionary password guessing methods. Because more complex tasks require dividing the workload among multiple nodes, in the paper, we propose a technique for distributed cracking with probabilistic grammars. The idea is to distribute partially-generated sentential forms, which reduces the amount of data necessary to transfer through the network. By performing a series of practical experiments, we compare the technique with a naive solution and show that the proposed method is superior in many use-cases.

Keywords: Distributed

1

· Password · Cracking · Forensics · Grammar

Introduction

With the complexity of today’s algorithms, it is often impossible to crack a hash of a stronger password in an acceptable time. For instance, verifying a single password for a document created in MS Office 2013 or newer requires 100,000 iterations of SHA-512 algorithm. Even with the use of the popular hashcat1 tool and a machine with 11 NVIDIA GTX 1080 Ti2 units, brute-forcing an 8character alphanumeric password may take over 48 years. Even the most critical pieces of forensic evidence lose value over such a time. Over the years, the use of probability and statistics showed the potential for a rapid improvement of attacks against human-created passwords [12,13,19]. Various leaks of credentials from websites and services provide an essential source of knowledge about user password creation habits [2,20], including the use of existing words [4] or reusing the same credentials between multiple services [3]. 1 2

https://hashcat.net/. https://onlinehashcrack.com/tools-benchmark-hashcat-gtx-1080-ti-1070-ti.

c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 701–719, 2020. https://doi.org/10.1007/978-3-030-58951-6_34

702

R. Hranick´ y et al.

Therefore, the ever-present users’ effort to simplify work is also their major weakness. People across the world unwittingly follow common password-creation patterns over and over. Weir et al. showed how probabilistic context-free grammars (PCFG) [19] could describe such patterns. They proposed a technique for automated creation of probabilistic grammars from existing password datasets that serve as training dictionaries. A grammar is a mathematical model that allows representing the structure of a password as a composition of fragments. Each fragment is a finite sequence of letters, digits, or special characters. Then, by derivation using rewriting rules of the grammar, one can not only generate all passwords from the original dictionary but produce many new ones that still respect passwordcreation patterns learned from the dictionary. The rewriting rules of PCFG have probability values assigned accordingly to the occurrence of fragments in the training dictionary. The probability of each possible password equals the product of probabilities of all rewriting rules used to generate it. Generating password guesses from PCFGs is deterministic and is performed in an order defined by their probabilities. Therefore, more probable passwords are generated first, which helps with a precise targeting of an attack. Since today’s challenges in password cracking often require distributed computing [11], it is necessary to find appropriate mechanisms for PCFG-based password generators. From the entire process, the most complex part that needs distribution is the calculation of password hashes. It is always possible to generate all password guesses on a single node and distribute them to others that perform the hash calculation. However, as we detected, the generating node and the interconnecting network may quickly become a bottleneck. Inspired by Weir’s work with preterminal structures [19] and our previously published parallel PCFG cracker [10], we present a distributed solution that only distributes “partially-generated” passwords, and the computing nodes themselves generate the final guesses. 1.1

Contribution

We propose a mechanism for distributed password cracking using probabilistic context-free grammars. The concept uses preterminal structures as basic units for creating work chunks. In the paper, we demonstrate the idea by designing a proof-of-concept tool that also natively supports the deployment of the hashcat tool. Our solution uses adaptive work scheduling to reflect the performance of available computing nodes. We evaluate the technique in a series of experiments by cracking different hash algorithms using different PCFGs, network speeds, and numbers of computing nodes. By comparison with the naive solution, we illustrate the advantages of our concept.

Distributed PCFG Password Cracking

1.2

703

Structure of the Paper

The paper is structured as follows. Section 2 provides a summary of related work. Section 3 describes the design of our distributed PCFG Manager. Section 4 shows experimental results of our work. Finally, Sect. 5 concludes the paper.

2

Background and Related Work

The use of probabilistic methods for computer-based password cracking dates back to 1980. Martin Hellman introduced a time-memory trade-off method for cracking DES cipher [6]. This chosen-plaintext attack used a precomputation of data stored in memory to reduce the time required to find the encryption key. With a method of distinguished points, Ron Rivest reduced the necessary amount of lookup operations [16]. Phillipe Oechslin improved the original concept and invented rainbow tables as a compromise between the brute-force attack and a simple lookup table. For the cost of space and precomputation, the rainbow table attack reduces the cracking time of non-salted hashes dramatically [14]. The origin of passwords provides another hint. Whereas a machine-generated password may be more or less random, human beings follow specific patterns we can describe mathematically. Markov chains are stochastic models frequently used in natural language processing [15]. Narayanan et al. showed the profit of using zero-order and first-order Markovian models based on the phonetical similarity of passwords to existing words [13]. Hashcat cracking tool utilizes this concept in the default configuration of a mask brute-force attack. The password generator uses a Markov model supplied in an external .hcstat file. Moreover, the authors provide a utility for the creation of new models by automated processing of character sequences from existing wordlists. Weir et al. introduced password cracking using probabilistic context-free grammars (PCFG) [19]. The mathematical model is based on classic context-free grammars [5] with the only difference that each rewriting rule has a probability value assigned. Similarly to Markovian models, a grammar can be created automatically by training on an existing password dictionary. For generating passwords guesses from an existing grammar, Weir proposed the Next function together with a proof of its correctness. The idea profits from the existence of pre-terminal structures - sentential forms that produce password guesses with the same probability. By using PCFG on MySpace dataset (split to training and testing part), Weir et al. were able to crack 28% to 128% more passwords in comparison with the default ruleset from John the Ripper (JtR) tool3 using the same number of guesses [19]. The original approach did not distinguish between lowercase and uppercase letter. Thus, Weir extended the original concept by adding special rules for capitalization of letter fragments. Due to high space complexity, the original Next function was later replaced by the Deadbeat dad algorithm [18]. 3

https://www.openwall.com/john/.

704

R. Hranick´ y et al.

Through the following years, Weir’s work inspired other researchers as well. Veras et al. proposed a semantic-based extension of PCFGs that makes the provision of the actual meaning of words that create passwords [17]. Ma et al. performed a large-scale evaluation of different probabilistic password models and proposed the use of normalization and smoothing to improve the success rate [12]. Houshmand et al. extended Weir’s concept by adding an extra set of rules that respect the position of keys on keyboards to reflect frequent patterns that people use. The extension helped improve the success rate by up to 22%. Besides, they advised to use Laplace probability smoothing, and created guidelines for choosing appropriate attack dictionaries [8]. After that, Houshmand et al. also introduced targeted grammars that utilize information about a user who created the password [7]. Last but not least, Agarwall et al. published a well-arranged overview of new technologies in password cracking techniques, including the state-of-the-art improvements in the area of PCFGs and Markovian models [1]. In our previous research, we focused on the practical aspects of grammarbased attacks and identified factors that influence the time of making password guesses. We introduced parallel PCFG cracking and possibilities for additional filtering of an existing grammar. Especially, removing rules that rewrite the starting non-terminal to long base structures lead to a massive speedup of password guessing with nearly no impact on success rate [10]. In 2019, Weir released4 a compiled PCFG password guesser written in pure C to achieve faster cracking and easier interconnection with existing tools like hashcat and JtR. Making the password guessing faster, however, resolves only a part of the problem. Serious cracking tasks often require the use of distributed computing. But how to efficiently deliver the password guesses to different nodes? Weir et al. suggested the possible use of preterminal structures directly in a distributed password cracking trial [19]. To verify the idea, we decided to narrowly analyze the possibilities for distributed PCFG guessing, create a concrete design of intranode communication mechanisms, and experimentally test its usability.

3

Distributed PCFG Manager

For distributed cracking, we assume a network consisting of a server and a set of clients, as illustrated in Fig. 1. The server is responsible for handling client requests and assigning work. Clients represent the cracking stations equipped with one or more OpenCL-compatible devices like GPU, hardware coprocessors, etc. In our proposal, we talk about a client-server architecture since the clients are actively asking for work, whereas the server is offering a “work assignment service.” In PCFG-based attacks, a probabilistic context-free grammar represents the source of all password guesses, also referred to as candidate passwords. Each guess represents a string generated by the grammar, also known as a terminal structure [18,19]. In a distributed environment, we need to deliver the passwords to the cracking nodes somehow. A naive solution is to generate all password 4

https://github.com/lakiw/compiled-pcfg.

Distributed PCFG Password Cracking

705

Fig. 1. An example of a cracking network

candidates on the server and distribute them to clients. However, such a method has high requirements on the network bandwidth, and as we detected in our previous research, also high memory requirements to the server [10]. Another drawback of the naive solution is limited scalability. Since all the passwords are generated on a single node, the server may easily become a bottleneck of the entire network. And thus, we propose a new distributed solution that is inspired by our parallel one [10]. In the following paragraphs, we describe the design and communication protocol of our distributed PCFG Manager. To verify the usability of our concept, we created a proof-of-concept tool that is freely available5 on GitHub. For implementation, we chose Go6 programming language, because of its speed, simplicity, and compilation to machine language. The tool can run either as a server or as a client. The PCFG Manager server should be deployed on the server node. It is responsible for processing the input grammar and distribution of work. The PCFG Manager client running on the client nodes generates the actual password guesses. It either prints the passwords directly to the standard output or passes them to an existing hash cracker. The behavior depends on the mode of operation specified by the user. Details are explained in the following paragraphs. The general idea is to divide the password generation across the computing nodes. The server only generates the preterminal structures (PT), while the 5 6

https://github.com/nesfit/pcfg-manager. https://golang.org/.

706

R. Hranick´ y et al.

terminal structures (T) are produced by the cracking nodes. The work is assigned progressively in smaller pieces called chunks. Each chunk produced by the server contains one or more preterminal structures, from which the clients generate the password guesses. To every created chunk, the server assigns a unique identifier called the sequence number. The keyspace, i.e., the number of possible passwords, of each chunk is calculated adaptively to fit the computational capabilities of a node that will be processing it. Besides that, our design allows direct cracking with hashcat tool. We chose hashcat as a cracking engine for the same reasons as we did for the Fitcrack distributed password cracking system [11], mainly because of its speed and range of supported hash formats. The proposed tool supports two different modes of operation: – Generating mode - the PCFG Manager client generates all possible password guesses and prints them to the standard output. A user can choose to save them into a password dictionary for later use or to pass them to another process on the client-side. – Cracking mode - With each chunk, the PCFG Manager client runs hashcat in stdin mode with the specified hashlist and hash algoritm. By using a pipe, it feeds it with all password guesses generatable from the chunk. Once hashcat processes all possible guesses, the PCFG Manager client returns a result of the cracking process, specifying which hashes were cracked within the chunk and what passwords were found. 3.1

Communication Protocol

The proposed solution uses remote procedure calls with the gRPC7 framework. For describing the structure of transferred data and automated serialization of payload, we use the Protocol buffers8 technology. The server listens on a specified port and handles requests from client nodes. The behavior is similar to the function of Gouroutine M from the parallel version [10] - it generates PT and tailors workunits for client nodes. Each workunit, called chunk, contains one or more PTs. As shown in listing 1.1, the server provides clients an API consisting of four methods. Listing 1.2 shows an overview of input/output messages that are transferred with the calls of API methods. 1 2 3 4 5

6

service PCFG { rpc Connect ( Empty ) returns ( ConnectResponse ) {} rpc Disconnect ( Empty ) returns ( Empty ) ; rpc GetNextItems ( Empty ) returns ( Items ) {} rpc SendResult ( CrackingResponse ) returns (← ResultResponse ) ; } Listing 1.1. Server API

7 8

https://grpc.io/. https://developers.google.com/protocol-buffers.

Distributed PCFG Password Cracking

707

When a client node starts, it connects to the server using the Connect() method. The server responds with the ConnectResponse message containing a PCFG in a compact serialized form. If the desire is to perform an attack on a concrete list of hashes using hashcat, the ConnectResponse message also contains a hashlist (the list of hashes intended to crack) and a number that defines the hash mode 9 , i.e., cryptographic algorithms used. 1 2 3 4 5 6 7 8 9 10 11 12 13 14

message ConnectResponse { Grammar grammar = 1 ; repeated string hashList = 2 ; string hashcatMode = 3 ; } message Items { repeated TreeItem preTerminals = 1 ; } message ResultResponse { bool end = 1 ; } message CrackingResponse { map < string , string > hashes = 1 ; } Listing 1.2. Messages transferred between the server and clients

Once connected, the client asks for a new chunk of preterminal structures using the GetNextItems() method. In response, the server assigns the client a chunk of 1 to N preterminal structures, represented by the Items message. After the client generates and processes all possible passwords from the chunk, using the SendResult() call, it submits the result in the CrackingResponse message. In cracking mode, the message contains a map (an associative array) of cracked hashes together with corresponding plaintext passwords. If no hash is cracked within the chunk or if the PCFG Manager runs in generating mode, the map is empty. With the ResultResponse message, the server then informs the client, if the cracking is over or if the client should ask for a new chunk by calling the GetNextItems() method. The last message is Disconnect() that clients use to indicate the end of their participation, so that the server can react adequately. For instance, if a client had a chunk assigned, but disconnected without calling the SendResult() method, the server may reassign the chunk to a different client. The flow of messages between the server and a client is illustrated in Fig. 2. 3.2

Server

The server represents the controlling point of the computation network. It maintains the following essential data structures: 9

https://hashcat.net/wiki/doku.php?id=example hashes.

708

R. Hranick´ y et al.

Fig. 2. The proposed communication protocol

– Priority queue - the queue is used by the Deadbeat dad algorithm [18] for generating preterminal structures. – Buffered channel - the channel represents a memory buffer for storing already generated PTs from which the server creates chunks of work. – Client information - for each connected client, the server maintains its IP address, current performance, the total number of password guesses performed by the client, and information about the last chunk that the client completed: its keyspace and timestamps describing when the processing started and ended. If the client has a chunk assigned, the server also stores its sequence number, PTs, and the keyspace of the chunk. – List of incomplete chunks - the structure is essential for a failure-recovery mechanism we added to the server. If any client with a chunk assigned disconnects before reporting its result, the chunk is added to the list to be reassigned to a different client. – List of non-cracked hashes (cracking mode only) - the list contains all input hashes that have not been cracked yet. – List of cracked hashes (cracking mode only) - The list contains all hashes that have already been cracked, together with corresponding passwords.

Distributed PCFG Password Cracking

709

Once started, the server loads an input grammar in the format used by Weir’s PCFG Trainer10 . Next, it checks the desired mode of operation and other configuration options – the complete description is available via the tool’s help. In cracking mode, the server loads all input hashes to the list of non-cracked hashes. In generating mode, all hashlists remain empty. The server then allocates memory for the buffered channel, where the channel size can be specified by the user. As soon as all necessary data structures get initialized, the server starts to generate PTs using the Deadbeat dad algorithm, and with each PT, it calculates and stores its keyspace. Generated PTs are sent to the buffered channel. The process continues as long as there is free space in the channel, and the grammar allows new PTs to be created. If the buffer infills, generating new PTs is suspended until the positions in the channel get free again. When a client connects, the server adds a new record to the client information structure. In the ConnectResponse message, the client receives the grammar that should be used for generating passwords guesses. In cracking mode, the server also sends the hashlist and hash mode identifying the algorithms that should be used, as illustrated in Fig. 2. Upon receiving the GetNextItems() call, the server pops one or more PTs from the buffered channel and sends them to the client as a new chunk. Besides, the server updates the client information structure to denote what chunk is currently assigned to the client. The number of PTs taken depends on their keyspace. Inspired by our adaptive scheduling algorithm that we Like in Fitcrack [9,11], the system schedules work adaptively to the performance of each client. In our previous research, we introduced an algorithm for adaptive task scheduling. We integrated the algorithm into our proof-of-concept tool, Fitcrack11 - a distributed password cracking system, and showed its benefits [9,11]. Therefore, we decided to use a similar technique and schedule work adaptively to each client’s performance. In our distributed PCFG Manager, the performance of a client (pc ) in passwords per second is calculated from the keyspace (klast ) and computing time (Δtlast ) of the last assigned chunk. The keyspace of a new chunk (knew ) assigned to the client depends on the client’s performance and the chunk duration parameter that the user can specify: pc =

klast , Δtlast

knew = pc ∗ chunk duration.

(1) (2)

The server removes as many PTs from the channel as needed to make the total keyspace of the new chunk at least equal to knew . If the client has not solved any chunk yet, we have no clue to find pc . Therefore, for the very first chunk, the knew is set to a pre-defined constant. 10 11

https://github.com/lakiw/legacy-pcfg/blob/master/python pcfg cracker version3/ pcfg trainer.py. https://fitcrack.fit.vutbr.cz.

710

R. Hranick´ y et al.

An exception occurs if a client with a chunk assigned disconnects before reporting its result. In such a case, the server saves the assignment to the list of incomplete chunks. If the list is not empty, chunks in it have an absolute priority over the newly created ones. And thus, upon the following GetNextItems() call from any client, the server uses the previously stored chunk. Once a client submits a result via the SendResult() call, the server updates the information about the last completed chunk inside the client information structure. For each cracked hash, the server removes it from the list of noncracked hashes and adds it to the list of cracked hashes together with the resulting password. If all hashes are cracked, the server prints each of them with the correct password and ends. In generating mode, the server continues as long as new password guesses are possible. The same happens in the cracking mode in case there is a non-cracked hash. Finally, if a client calls the Disconnect() method, the server removes its record from the client information structure. As described above, if the client had a chunk assigned, the server will save it for later use. 3.3

Client

After calling the Connect() method, a client receives the ConnectResponse message containing a grammar. In cracking mode, the message also include a hashlist and a hash mode. Then it calls the GetNextItems() method to obtain a chunk assigned by the server. Like in our parallel version, the client then subsequently takes one PT after another and uses the generating goroutines to create passwords from them [10]. In the generating mode, all password guesses are printed to the standard output. For the cracking mode, it is necessary to have a compiled executable of hashcat on the client node. The user can define the path using the program parameters. The PCFG Manager then starts hashcat in the dictionary attack mode with the hashlist and hash mode parameters based on the information obtained from the ConnectResponse message. Since no dictionary is specified, hashcat automatically reads passwords from the standard input. And thus, the client creates a pipe with the end connected to the hashcat’s input. All password guesses are sent to the pipe. From each password, hashcat calculates a cryptographic hash and compares it to the hashes in the hashlist. After generating all password guesses within the chunk, the client closes the pipe, waits for hashcat to end, and reads its return value. On success, the client loads the cracked hashes. In the end, the client informs the server using the SendResult() call. If any hashes are cracked, the client adds them to the CrackingResponse message that is sent with the call. The architecture of the PCFG Manager client is displayed in Fig. 3.

4

Experimental Results

We conduct a number of experiments in order to validate several assumptions. First, we want to show the proposed solution results in a higher cracking per-

Distributed PCFG Password Cracking

711

Fig. 3. The architecture of the client side

formance and lower network usage. We also show that while the naive terminal distribution quickly reaches the speed limit by filling the network bandwidth, our solution scales well across multiple nodes. We discuss the differences among different grammars and the impact of scrambling the chunks during the computation. In our experiments, we use up to 16 computing nodes for the cracking tasks and one server node distributing the chunks. All nodes have the following configuration: – CentOS 7.7 operating system, – NVIDIA GeForce GTX1050 Ti GPU, – Intel(R) Core(TM) i5-3570K CPU, and 8 GB RAM. The nodes are in a local area network connected with links of 10, 100, and 1000 Mbps bandwidth. During the experiments, we incrementally change the network speed to observe the changes. Furthermore, we limit the number of generated passwords to 1, 10, and 100 million. With this setup, we conduct a number of cracking tasks on different hash types and grammars. As the hash cracking speed has a significant impact on results, we chose bcrypt with five iterations, a computationally difficult hash type, and SHA3-512, an easier, yet modern hash algorithm. Table 1 displays all chosen grammars with description. The columns cover statistics of the source dictionary: password count (pw-cnt) and average password length (avg-len), as well as statistics of the generated grammar: the number of possible passwords guesses (pw-cnt), the number of base structures (base-cnt), their average length (avg-base-len) and the maximum length of base structures (max-base-len), in nonterminals. One can also notice the enormous number of generated passwords, especially with the myspace grammar. Such a high number is caused only by few base structures with many nonterminals. We discussed the complexity added by long base structure in our previous research [10]. If not stated otherwise, the

712

R. Hranick´ y et al. Table 1. Grammars used in the experiments

Dictionary statistics name

PCFG statistics

pw-cnt avg-len pw-cnt

myspace 37,145

8.59

base-cnt avg-base-len max-base-len

6E+1874 1,788

4.50

600 8

cain

306,707 9.27

3.17E+15 167

2.59

john

3,108

1.32E+09 72

2.14

8

phpbb

184,390 7.54

2.84E+37 3,131

4.11

16

singles

12,235

6.67E+11 227

3.07

8

6.06 7.74

dw17a 10,000 7.26 2.92E+15 106 2.40 12 a https://github.com/danielmiessler/SecLists/blob/master/Passwords/darkweb2017top10000.txt

grammars are generated from password lists found on SkullSecurity wiki page12 . For each combination of described parameters, we run two experiments – first, with the naive terminal distribution (terminal), and second using our solution with the preterminal distribution (preterminals). 4.1

Computation Speedup and Scaling

The primary goal is to show that the proposed solution provides faster password cracking using PCFG. In Fig. 4, one can see the average cracking speed of SHA3-512 hash with myspace grammar, with different task sizes and network bandwidths. Apart from the proposed solution being generally faster, we see a significant difference in speeds with the lower network bandwidths. This is well seen in the detailed graph in Fig. 5 which shows the cracking speed of SHA3-512 in a 10Mbps network with different task sizes. The impact of the network bandwidth limit is expected as the naive terminal distribution requires a significant amount of data in the form of a dictionary to be transmitted. In the naive solution, network links become the main bottleneck that prevents achieving higher cracking performance. In our solution, the preterminal distribution reduces data transfers dramatically, which removes the obstacle and allow for achieving higher cracking speeds. Apart from SHA3-512, we conducted the same set of experiments with SHA1 and MD5 hashes. Results from cracking these hashes are not present in the paper as they are very similar to SHA3. The reason for this is the cracking performance of SHA1 and MD5 is very high, similar to the former. Figure 6 illustrates the network activity using both solutions. The graph compares the data transfered in time in SHA3-512 cracking task with the lowest limit on network bandwidth. With the naive terminal distribution one can notice the transfer speed is limited for the entire experiment with small pauses for generating the terminals. As a result, the total amount of data transferred is more than 12

https://wiki.skullsecurity.org/Passwords.

Distributed PCFG Password Cracking

713

Fig. 4. Average cracking speed with different bandwidths and password count (SHA3512/myspace grammar/4 nodes)

Fig. 5. Detail of 10Mbps network bandwidth experiment (SHA3-512/myspace grammar / 4 nodes)

Fig. 6. Comparison of network activity (SHA3-512/myspace grammar/100 million hashes/10 Mbps network bandwidth/16 nodes)

714

R. Hranick´ y et al.

Fig. 7. Average cracking speed with different bandwidths and password count (bcrypt/myspace grammar)

15 times bigger than with the proposed preterminal distribution. The maximum speed has not reached the 10 Mbit limit due to auto-negotiation problems on the local network. The difference between the two solutions disappears if we crack very complex hash algorithms. Figure 7 shows the results of cracking bcrypt hashes with myspace grammar. The average cracking speeds are multiple times lower than with SHA3. In this case, the two solutions do not differ because the transferred chunks have much lower keyspace since clients can not verify as many hashes as was possible for SHA3. Most of the experiment time is used by hashcat itself, cracking the hashes. In the previous graphs, one could also notice the cracking speed increases with more hashes. This happens since smaller tasks cannot fully leverage the whole distributed network as the smallest task took only several seconds to crack. Figure 8 shows that the average cracking speed is influenced by the number of connected cracking nodes. While for the smallest task, there is almost no difference with the increasing node count, for the largest task, the speed rises even between 8 and 16 nodes. As this task only takes several minutes, we expect larger tasks would visualize this even better. One can also observe the naive solution using terminal distribution does not scale well. Even though we notice a slight speedup up to 4 nodes. The speed is capped after that even in the largest task because of the network bottleneck mentioned above. Figure 9 shows a similar picture but now with the network bandwidth cap increased from 100 Mbps to 1000 Mbps. While the described patterns remain visible, the difference between our and the naive solutions narrows. The reason for this is with the increased bandwidth, it is possible to send greater chunks of pre-generated dictionaries, as described above. With the increasing number of nodes and cracking task length, advantages of the proposed solution become more clear, as seen at the top of the described graph.

Distributed PCFG Password Cracking

715

Fig. 8. Scaling across multiple cracking nodes (SHA3-512/myspace grammar/100 Mbps network bandwidth)

Fig. 9. Scaling across multiple cracking mar/1000 Mbps network bandwidth)

nodes

(SHA3-512/myspace

gram-

716

4.2

R. Hranick´ y et al.

Grammar Differences

Next, we measure how the choice of a grammar influences the cracking speed. In Fig. 10, we can see the differences are significant. While the cracking speed of the naive solution is capped by the network bandwidth, results from the proposed solution show generating passwords using some grammars is slower than with others – a phenomenon that is connected with the base structures lengths [10]. Generating passwords from the Darkweb2017 (dw17) grammar is also very memory demanding because of the long base structures at the beginning of the grammar, and 8 GB RAM is not enough for the largest cracking task using the naive solution. With the proposed preterminal-based solution, we encounter no such problem.

Fig. 10. Differences in cracking speed among grammars

4.3

Workunit Scrambling

The Deadbeat dad algorithm [18] ran on the server ensures the preterminal structures are generated in a probability order. The same holds for passwords, if generated sequentially. However, in a distributed environment, the order of outgoing chunks and incoming results may scramble due to the non-deterministic behavior of the network. Such a property could be removed by adding an extra intra-node synchronization to the proposed protocol. However, we do not consider that necessary if the goal is to verify all generated passwords in the shortest possible time. Moreover, the scrambling does not affect the result as a whole. The goal of generating and verifying n most probable passwords is fulfilled. For example, for an assignment of generating and verifying 1 million most probable passwords from a PCFG, our solution generates and verifies 1 million most probable passwords, despite the incoming results might be received in a different order.

Distributed PCFG Password Cracking

717

Though the scrambling has no impact on the final results, we study the extent of chunk scrambling in our setup. We observe the average difference between the expected and real order of chunks arriving at the server, calling it a scramble factor Sf . In the following equation, n is the number of chunks, and rk is the index where k-th chunk was received: 1 |(k − rk )|. n n

Sf =

(3)

k=1

We identified two key features affecting the scramble factor – the number of the computing nodes in the system and the number of chunks distributed. In Figure 11, one can see that the scramble factor is increasing with the increasing number of chunks and computing nodes. This particular graph represents experiments with bcrypt algorithm, myspace grammar, capped at 10 million hashes. Other experiments resulted in a similar pattern. We conclude that the scrambling is relatively low in comparison with the number of chunks and nodes.

Fig. 11. Average scramble factor, bcrypt myspace grammar capped at 10 milion hashes

5

Conclusion

We proposed a method and a protocol for distributed password cracking using probabilistic context-free grammars. The design was experimentally verified using our proof-of-concept tool with a native hashcat support. We showed that distributing the preterminal structures instead of the final passwords reduces the bandwith requirements dramatically, and in many cases, brings a significant speedup. This fact confirms the hypothesis of Weir et al., who suggested preterminal distribution may be helpful in a distributed password cracking trial [19].

718

R. Hranick´ y et al.

Moreover, we showed how different parameters, such as the hash algorithm, network bandwidth, or the choice of concrete grammar, affect the cracking process. In the future, we would like to redesign the tool to become an envelope over Weir’s compiled version of the password cracker in order to stay up to date with the new features that are added over time. The C-based version could figure as a module for our solution. Acknowledgements. The research presented in this paper is supported by Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II) project “IT4Innovations excellence in science” LQ1602.

References 1. Aggarwal, S., Houshmand, S., Weir, M.: New technologies in password cracking techniques. In: Lehto, M., Neittaanm¨ aki, P. (eds.) Cyber Security: Power and Technology. ISCASE, vol. 93, pp. 179–198. Springer, Cham (2018). https://doi.org/10. 1007/978-3-319-75307-2 11 2. Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million passwords. In: 2012 IEEE Symposium on Security and Privacy, pp. 538–552, May 2012. https://doi.org/10.1109/SP.2012.49 3. Das, A., Bonneau, J., Caesar, M., Borisov, N., Wang, X.: The tangled web of password reuse. NDSS 14, 23–26 (2014) 4. Florencio, D., Herley, C.: A large-scale study of web password habits. In: Proceedings of the 16th International Conference on World Wide Web WWW 2007, pp. 657–666. ACM, New York (2007). https://doi.org/10.1145/1242572.1242661 5. Ginsburg, S.: The Mathematical Theory of Context Free Languages. McGrawHillBook Company (1966) 6. Hellman, M.: A cryptanalytic time-memory trade-off. IEEE Trans. Inf. Theory 26(4), 401–406 (1980) 7. Houshmand, S., Aggarwal, S.: Using personal information in targeted grammarbased probabilistic password attacks. DigitalForensics 2017. IAICT, vol. 511, pp. 285–303. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67208-3 16 8. Houshmand, S., Aggarwal, S., Flood, R.: Next gen PCFG password cracking. IEEE Trans. Inf. Forensics Secur. 10(8), 1776–1791 (2015) 9. Hranick´ y, R., Holkoviˇc, M., Matouˇsek, P., Ryˇsav´ y, O.: On efficiency of distributed password recovery. J. Digit. Forensics Secur. Law 11(2), 79–96 (2016). http:// www.fit.vutbr.cz/research/viewpub.php.cs?id=11276 10. Hranick´ y, R., Liˇstiak, F., Mikuˇs, D., Ryˇsav´ y, O.: On practical aspects of PCFG password cracking. In: Foley, S.N. (ed.) DBSec 2019. LNCS, vol. 11559, pp. 43–60. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22479-0 3 11. Hranick´ y, R., Zobal, L., Ryˇsav´ y, O., Kol´ aˇr, D.: Distributed password cracking with boinc and hashcat. Digit. Invest. 2019(30), 161–172 (2019). https://doi.org/ 10.1016/j.diin.2019.08.001. https://www.fit.vut.cz/research/publication/11961 12. Ma, J., Yang, W., Luo, M., Li, N.: A study of probabilistic password models. In: 2014 IEEE Symposium on Security and Privacy, pp. 689–704 (May 2014). https:// doi.org/10.1109/SP.2014.50 13. Narayanan, A., Shmatikov, V.: Fast dictionary attacks on passwords using timespace tradeoff. In: Proceedings of the 12th ACM Conference on Computer and Communications Security, pp. 364–372. CCS 2005. ACM, New York (2005). https://doi.org/10.1145/1102120.1102168

Distributed PCFG Password Cracking

719

14. Oechslin, P.: Making a faster cryptanalytic time-memory trade-off. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 617–630. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4 36 15. Rabiner, L.R.: A tutorial on hidden markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989). https://doi.org/10.1109/5. 18626 16. Robling Denning, D.E.: Cryptography and Data Security. Addison-Wesley Longman Publishing Co., Inc. (1982) 17. Veras, R., Collins, C., Thorpe, J.: On semantic patterns of passwords and their security impact. In: NDSS (2014) 18. Weir, C.M.: Using probabilistic techniques to aid in password cracking attacks. Ph.D. thesis, Florida State University (2010) 19. Weir, M., Aggarwal, S., De Medeiros, B., Glodek, B.: Password cracking using probabilistic context-free grammars. In: 2009 30th IEEE Symposium on Security and Privacy, pp. 391–405, May 2009. https://doi.org/10.1109/SP.2009.8 20. Weir, M., Aggarwal, S., Collins, M., Stern, H.: Testing metrics for password creation policies by attacking large sets of revealed passwords. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, pp. 162–175 (2010)

Your PIN Sounds Good! Augmentation of PIN Guessing Strategies via Audio Leakage Matteo Cardaioli1,2(B) , Mauro Conti1,4 , Kiran Balagani3 , and Paolo Gasti3 1

3

University of Padua, Padua, Italy [email protected] 2 GFT Italy, Milan, Italy New York Institute of Technology, Old Westbury, USA 4 University of Washington, Seattle, USA

Abstract. Personal Identification Numbers (PINs) are widely used as the primary authentication method for Automated Teller Machines (ATMs) and Point of Sale (PoS). ATM and PoS typically mitigate attacks including shoulder-surfing by displaying dots on their screen rather than PIN digits, and by obstructing the view of the keypad. In this paper, we explore several sources of information leakage from common ATM and PoS installations that the adversary can leverage to reduce the number of attempts necessary to guess a PIN. Specifically, we evaluate how the adversary can leverage audio feedback generated by a standard ATM keypad to infer accurate inter-keystroke timing information, and how these timings can be used to improve attacks based on the observation of the user’s typing behavior, partial PIN information, and attacks based on thermal cameras. Our results show that inter-keystroke timings can be extracted from audio feedback far more accurately than from previously explored sources (e.g., videos). In our experiments, this increase in accuracy translated to a meaningful increase in guessing performance. Further, various combinations of these sources of information allowed us to guess between 44% and 89% of the PINs within 5 attempts. Finally, we observed that based on the type of information available to the adversary, and contrary to common knowledge, uniform PIN selection is not necessarily the best strategy. We consider these results relevant and important, as they highlight a real threat to any authentication system that relies on PINs.

1

Introduction

Authentication via Personal Identification Numbers (PINs) dates back to the mid-sixties [5]. The first devices to use PINs were automatic dispensers and control systems at gas stations, while the first applications in the banking sector appeared in 1967 with cash machines [7]. PINs have found widespread use over the years in devices with numeric keypads rather than full keyboards [22]. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 720–735, 2020. https://doi.org/10.1007/978-3-030-58951-6_35

Your PIN Sounds Good!

721

In the context of financial services, ISO 9564-1 [10] specifies basic security principles for PINs and PIN entry devices (e.g., PIN pads). For instance, to mitigate shoulder surfing attacks [12,13,17], ISO 9564-1 indicates that PIN digits must not be displayed on a screen, or identified using different sounds or sound duration for each key. As a compromise between security and usability, PIN entry systems display a fixed symbol (e.g., a dot) to represent a key being pressed, and provide the same audio feedback (i.e., same tone, same duration) for all keys. While previous work has demonstrated that observing the dots as they appear on screen as a result of a key press reduces the search space for a PIN [4], to our knowledge no work has targeted the use of audio feedback to recover PINs. In this paper, we evaluate how the adversary can reduce PIN search space using audio feedback, with (and without) using observable information such as PIN typing behavior (one- or two-handed), knowledge of one digit of the PIN, and knowledge of which keys have been pressed. We compare our attacks with an attack based on the knowledge of PIN distribution. Exploiting audio feedback has several advantages compared to observing the user or the screen during PIN entry. First, sound is typically easier to collect. The adversary might not be able to observe the ATM’s screen directly, and might risk being exposed when video-recording an ATM in a public space. In contrast, it is easy to record audio covertly, e.g., by casually holding a smartphone while pretending to stand in a line behind other ATM users. The sound emitted by ATMs is quite distinctive and can be easily isolated even in noisy environments. Second, sound enables higher time resolution compared to video. Conventional video cameras and smartphones record video between 24 and 120 frames per second. In contrast, audio can be recorded with a sampling rate between 44.1 kHz and 192 kHz, thus potentially allowing at least two orders of magnitude higher resolution. Contributions. In this paper, we analyze several novel side channels associated with PIN entry. In particular: 1. We show that it is possible to retrieve accurate inter-keystroke timing information from audio feedback. In our experiments, we were able to correctly detect 98% of the keystroke feedback sounds with an average error of 1.8ms. Furthermore, 75% of inter-keystroke timings extracted by the software had absolute error under 15 ms. Our experiments also demonstrate that interkeystroke timings extracted from audio can be more accurate than the same extracted from video recordings of PIN entry as done in [3,4]. 2. We analyze how the behavior of the user affects the adversary’s ability to guess PINs. Our results show that users who type PINs with one finger are more vulnerable to PIN guessing from inter-keystroke timings compared to users that enter their PIN using at least two fingers. In particular, the combining inter-keystroke timing with the knowledge that the user is a single-finger typist leads to 34-fold improvement over random guessing when the adversary is allowed to perform up to 5 guessing attempts.

722

M. Cardaioli et al.

3. We combine inter-keystroke timing information with knowledge of one key in the PIN (i.e., the adversary was able to see either the first or the last key pressed by the user), and with knowledge of which keys have been pressed by the user. The latter information is available, as shown in this paper as well as in recent work [1,11,16,24] when the adversary is able to capture a thermal image of the PIN pad after the user has typed her PIN. Our experiments show that inter-keystroke timing significantly improves performance for both attacks. For example, by combining inter-keystroke timing with a thermal attack, we were able to guess 15% of the PINs at the first attempt, reaching a four-fold improvement in performance. By combining multiple attacks, we were also able to drastically reduce the number of attempts required to guess a PIN. Specifically, we were able to guess 72% of the PINs within the first 3 attempts. 4. Finally, we show that uniform PIN selection might not be the best strategy against an adversary with access to one or more of the side-channel information discussed in this paper. Organization. Section 2 reviews related work on password and PIN guessing. Section 3 presents our adversary model. We present our algorithms for interkeystroke timing extraction in Sect. 4.1. In Sect. 4, we present the results of our experiments, while in Sect. 5 we analyze how different side-channels affect the guessing probability of individual PINs. We conclude in Sect. 6.

2

Related Work

Non-acoustic Side-channels. Vuagnoux and Pasini [20] demonstrated that it is possible to recover keystrokes by analyzing electromagnetic emanations from electronic components in wired and wireless keyboards. Marquardt et al. [15] showed that it is possible to recover key presses by recoding vibrations generated by a keyboard using an accelerometer. Other attacks focus on keystroke inference via motion detection from embedded sensors on wearable devices. For example, Sarkisyan et al. [18] and Wang et al. [21] infer smartphone PINs using movement data recorded by a smartwatch. Those attacks require that the adversary is able to monitor the user’s activity while the user is typing. However, there are attacks that allow the adversary to exploit information available several seconds after the user has typed her password. For instance, one such attack is based the observation that when a user presses a key, the heat from her finger is transferred to the keypad, and can be later be measured using a thermal camera [24]. Depending on the material of the keyboard, thermal residues have different dissipation rates [16], thus affecting the time window in which the attacks are effective. Abdelrahman et al. [1] evaluated how different PINs and unlock patterns on smartphones on can influence thermal attack performance. Kaczmarek et al. [11] demonstrated how a thermal attack can recover precise information about a password up to 30 s after it was typed, and partial information within 60 s.

Your PIN Sounds Good!

723

Acoustic Side-channels. Asonov and Agrawal showed that each key on a keyboard emits a characteristic sound, and that this sound can be used to infer individual keys [2]. Subsequent work further demonstrated the effectiveness of sound emanation for text reconstruction. Berger et al. [6] combined keyboard acoustic emanation with a dictionary attack to reconstruct words, while Halevi and Saxena [9] analyzed keyboard acoustic emanations to eavesdrop over random password. Because ISO 9564-1 [10] specifications require that each key emits the same sound, those attacks do not apply to common keypads, including those on ATMs. Another type of acoustic attack is based on time difference of arrivals (TDoA) [14,23,25]. These attacks rely on multiple microphones to triangulate the position of the keys pressed. Although this attacks typically result in good accuracies, they are difficult to instantiate in realistic environments. Song et al. [19] presented an attack based on latency between key presses measured by snooping encrypted SSH traffic. Their experiments show that information about inter-keystroke timing can be used to narrow the password search space substantially. A similar approach was used by Balagani et al. [4], who reconstructed inter-keystroke timing from the time of appearance of the masking symbols (e.g., “dots”) while a user types her password. Similarly, Balagani et al. [3] demonstrated that precise inter-keystroke timing information recovered from videos drastically reduces the number of attempts required to guess a PIN. The main limitation of [3,4] is that they require the adversary to video-record the ATM screen while the user is typing her PIN. Depending on the location and the ATM, this might not be feasible. Further, this reduces the set of vulnerable ATMs and payment systems to those that display on-screen feedback. To our knowledge, this is the first paper to combine inter-keystroke timing information deduced from sound recording with observable information from other sources, and thereby drastically reduce the attempts to guess a PIN compared to prior work. Our attacks are applicable to a multitude of realistic scenarios. This poses an immediate and severe threat to current ATMs or PoS.

3

Adversary Model

In this section we evaluate four classes of information that the adversary can exploit to infer PINs. These classes are: (1) Key-stroke timing information extracted from audio recordings; (2) Knowledge of whether the user is a singleor multi-finger typist; (3) Information about the first or the last digit of the PIN; and (4) Information about which keys have been pressed, but not their order. Next, we briefly review how each of these classes of information can be collected by the adversary. Class 1: Keystroke Timing. Keystroke timing measures the distance between consecutive keystroke events (e.g., the time between two key presses, or between the release of the key and the subsequent keypress). Collecting keystroke timing by compromising the software of an ATM located in a public space, or physically tampering with the ATM (e.g., by modifying the ATM’s keyboard) is

724

M. Cardaioli et al.

Fig. 1. Different typing strategies. Left: one finger; center: multiple fingers of one hand; right: multiple fingers of two hands.

not practical in most cases. However, as shown in [3], the adversary can infer keystroke timings without tampering with the ATM by using video recordings of the “dots” that appear on the screens when the user types her PIN. In this paper, we leverage audio signals to infer precise inter-keystroke timings. Class 2: Single- or Multi-finger Typists. The adversary can typically directly observe whether the user is typing with one or more fingers. While the number of fingers used to enter a PIN does not reveal information about the PIN itself, it might be a useful constraint when evaluating other sources of information leakage. Figure 1 shows users typing using a different number of fingers. Class 3: Information about the first or the last digit of the PIN. As users move their hands while typing their PIN, the adversary might briefly have visibility of the keypad, and might be able to see one of the keys as it is pressed (see Fig. 1). We model this information by disclosing either the first or the last digit of the PIN to the adversary. Class 4: Which Keys Have Been Pressed. This information can be collected using various techniques. For instance, the adversary can use a thermal camera to determine which keys are warmer, thus learning which digits compose the PIN (see, e.g., Fig. 2). As an alternative, the adversary can place UV-sensitive powder on the keys before the user enters her PIN, and then check which keys had the powder removed by the users using a UV light. While these attacks do not reveal the order in which the keys were pressed (except when the PIN is composed of one repeated digit), they significantly restrict the search space. Although this attack can be typically performed covertly, it requires specialized equipment.

Your PIN Sounds Good!

(a) Thermal trace after 2 seconds.

(b) Thermal trace after 7 seconds.

(c) Thermal trace after 10 seconds.

(d) Thermal trace after 15 seconds.

725

Fig. 2. Thermal image of a metallic PIN pad after applying a transparent plastic cover for PIN 2200.

4

Experiment Results

We extracted keystroke sounds using the dataset from [4]. This dataset was collected from 22 subjects, who typed several 4-digit PINS on a simulated ATM (see Fig. 3). Nineteen subjects completed three data collection sessions, while three subjects completed only one session. In each session, subjects entered a total of 180 PINs as follows: each subject was shown a 4-digit PIN. The PIN remained on the screen for 10 s, during which the subject was encouraged to type the PIN multiple times. After 10 s, the PIN disappeared from the screen. At this point, the subject was asked to type the PIN 4 times from memory. In case of incorrect entry, the PIN was briefly displayed again on the screen, and the subject was allowed to re-enter it. This procedure was repeated in three batches of 15 PINs. As a result, each PIN was typed 12 times per session. Each time a subject pressed a key, the ATM simulator emitted an audio feedback and logged the corresponding timestamp with millisecond resolution. Users were asked to type 44 different 4-digit PINs which represented all the Euclidean distances between keys on the keypad. Sessions were recorded in a

726

M. Cardaioli et al.

Fig. 3. Left: user typing a PIN using the ATM simulator. Right: close up view of the ATM simulator’s keypad.

relatively noisy indoor public space (SNR −15 dB) using a Sony FDR-AX53 camera located approximately 1.5 m away from the PIN pad. The audio signal was recorded with a sampling frequency of 48 kHz. 4.1

Extraction of Keystroke Timings from Keypad Sound

To evaluate the accuracy of timing extraction from keystroke sounds, we first linearly normalized the audio recording amplitude in the interval [−1, 1]. We applied a 16-order Butterworth band-pass filter [8] centered at 5.6 kHz to isolate the characteristic frequency window of the keypad feedback sound. Finally, to isolate the signal from room noise, we processed the audio recording to “mute” all samples with an amplitude below a set threshold (0.01 in our experiments). We then calculated the maximum amplitude across nearby values in a sliding window of 1,200 samples (consecutive windows had 1199 overlapping samples), corresponding to 25 ms of audio recording. We determined the length of the window by evaluating the distance between consecutive timestamps logged by the ATM simulator (ground truth), which was at least 100 ms for 99.9% of the keypairs. Figure 4 shows the result of this process. We then extracted the timestamps of the peaks of the processed signal and compared them to the ground truth. Our results show that this algorithm can accurately estimate inter-keystroke timing information. We were able to correctly detect 98% of feedback sound with a mean error of 1.8 ms. Extracting timings from audio led to more accurate time estimation than using video [4]. With the latter, 75% of the extracted keystroke timings had errors of up to 37 ms. In contrast, using audio we were able to extract 75% of the keystroke with errors below 15 ms. Similarly, using video, 50% of the estimated keystrokes timings had errors of up to 22 ms, compared to less than 7 ms with audio. Figure 5 shows the errors distribution for timings extracted from video and audio recordings.

Your PIN Sounds Good!

4.2

727

PIN Inference from Keystroke Timing (Class 1)

This attack ranks PINs based on the estimated Euclidean distance between subsequent keys in each PIN. In particular, we calculated an inter-key Euclidean distance vector from a sequence of inter-keystroke timings inferred from audio feedback. As an example, the distance vector associated with PIN 5566 is [0, 1, 0], where the first ‘0’ is the distance between keys 5 and 5, ‘1’ between keys 5 and 6, and ‘0’ between 6 and 6. Any four-digit PIN is associated with one distance vector of size three. Each element of the distance vector can be 0, 1, 2, 3, diagonal distance 1 (e.g., 1–3), diagonal distance 2 (e.g., 3–7), short diagonal distance (e.g., 2–9), or long diagonal distance (e.g., 3–0). Different PINs might be associated with the same distance vector (e.g., 1234 and 4567). The goal of this attack is to reduce the search space by considering only PINs that match the estimated distance vector. 0.06 Original Sound Filtered Signal Window Peak

0.04

Amplitude

0.02

0

-0.02

-0.04

-0.06

12

12.5

13

13.5

14

Time (s)

20%

20%

15%

15%

Frequency

Frequency

Fig. 4. Comparison between the original sound signal, filtered sound signal, windowed signal, and extracted peaks.

10%

5%

0%

10%

5%

-50

0

50

Sound Inter-key Time Error [ms]

0%

-50

0

50

Video Inter-key Time Error [ms]

Fig. 5. Error distribution of estimated inter-keystroke timings. Left: timing errors from audio. Right: timing errors from video.

728

M. Cardaioli et al. Logged

Sound

Video

100% 90% 80%

PINs Recovered

70% 60% 10%

50% 8%

40%

6%

30%

4%

20%

2%

10% 0%

0

0

2000

4000

10

20

6000

30

40

8000

50

0%

10000

Number of Guesses

Fig. 6. CDF showing the percentage of PINs recovered using keystroke timing information derived from the ground truth (logged), sound feedback, and video.

For evaluation, we split our keystroke dataset into two sets. The first (training set) consists of 5195 PINs, typed by 11 subjects. The second (test set) consists of 5135 PINs, typed by a separate set of 11 subjects. This models the lack of knowledge of the adversary of the specific typing patterns of the victim user. To estimate the Euclidean distances between consequent keys, we modeled a set of gamma function on the inter-keystroke timing distribution, one for each distance. We then applied the algorithm from [3] to infer PINs from estimated distances. With this strategy, we were able to guess 4% of PINs within 20 attempts—a 20-fold improvement compared to random guessing. Figure 6 shows how timings extracted from audio and video feedback affect the number of PIN guessed by the algorithm compared to ground truth. Timings extracted from audio feedback exhibit a smaller decrease in guessing performance compared to timings extracted from video. 4.3

PIN Inference from Keystroke Timing and Typing Behavior (Class 2)

This attack improves on the keystroke timing attack by leveraging knowledge of whether the user is a single- or multi-finger typist. This additional information allows the adversary to better contextualize the timings between consecutive keys. For single-finger typists, the Euclidean distance between keys 1 and 0 is the largest (see Fig. 3), and therefore we expect the corresponding inter-keystroke timing to be the largest. However, if the user is a two-finger typist, then 1 might be typed with the right hand index finger, and 0 with the left hand index finger.

Your PIN Sounds Good!

729

As a result, the inter-keystroke time might not be representative of the Euclidean distance between the two keys. To systematically study typing behavior, we analyzed 61 videos from the 22 subjects. 70% of the subject were single-finger typists; 92% of them entered PINs using the index finger, and 8% with the thumb. We divided multi-finger typists into three subclasses: (1) PINs entered using fingers from two hands (38% of the PINs typed with more than one finger); (2) PINs entered with at least two fingers of the same hand (34% of the PINs typed with more than one finger); and (3) PINs that we were not able to classify with certainty due obfuscation of the PIN pad in the video recording (28% of the PINs typed with more than one finger). In our experiments, subjects’ typing behavior was quite consistent across PINs and sessions. Users that were predominantly single-finger typists entered 11% of their PINs using more than one finger, while multi-finger typists entered 23% of the PINs using one finger. We evaluated guessing performance of timing information inferred from audio feedback on single-finger PINs and multi-finger PINs separately. We were able to guess a substantially higher number of PINs for each number of attempts for users single-finger typists (see Fig. 7) compared to multi-finger typists. In particular, the percentage of PINs recovered within 5 attempts was twice as high for PINs entered with one finger compared to PINs entered with multiple fingers. Further, the guessing rate within the first 5 attempts was 34 times higher compared to random guessing when using timing information on single-finger PINs. However, our ability to guess multi-finger PINs using timing information was only slightly better than random. This strongly suggests that the correlation Keystroke timing

Keystroke timing single-finger typist

Keystroke timing multi-finger typist

100%

PINs Recovered

80%

10%

60%

8% 6%

40%

4% 2%

20% 0

0%

0

2000

4000

10

20

6000

30

40

8000

50

0%

10000

Number of Guesses

Fig. 7. CDF showing the percentage of PINs recovered using only keystroke timing information from audio feedback, compared to timing information for single- or multifinger typists.

730

M. Cardaioli et al.

between inter-keystroke timing and Euclidean distance identified in [4] holds only quite strongly for PINs entered using a single finger, and only marginally for PINs entered with two or more fingers. 4.4

Knowledge of the First or the Last Digit of the PIN (Class 3)

In this section, we examine how information on the first or last digit of the PIN reduces the search space when combined with keystroke timings. Knowledge of one digit alone reduces the search space by a factor of 10 regardless of the digit’s position, because the adversary needs to guess only the remaining three digits. (As a result, the expected number of attempts to guess a random PIN provided no additional information is 500.) To determine how knowledge of the first or the last digit impacts PIN guessing based on keystroke timing, we applied the same procedure described in Sect. 4.2: for each PIN in the testing set, we associated a list of triplets of distances sorted by probability. We then pruned the set of PINs associated with those distance triplets to match the knowledge of the √ first or last PIN. For instance, given only the estimated distances 3, 0, and 2, the associated PINs are 0007, 0009, 2224, and 2226. If we know that the first digit of the correct PIN is 2, then our guesses are reduced to 2224 and 2226. Information about the first or last digit of the PIN boosted the guessing performance of the keystroke-timing attack substantially, as shown in Fig. 8. In particular, guessing accuracy increased by 15–19 times within 3 attempts (4.36% guessing rate when the first digit was known, and 5.57% when the last digit was known), 7 times within 5 attempts, and about 4 times within 10 attempts, Keystroke timing

Knowledge of one digit

Knowledge of first digit and keystroke timing

Knowledge of last digit and keystroke timing

100%

PINs Recovered

80%

25%

60%

20% 15%

40%

10% 5%

20% 0

0%

0

2000

4000

10

20

6000

30

40

8000

50

0%

10000

Number of Guesses

Fig. 8. CDF showing the percentage of PINs recovered using keystroke timings inferred from audio, random guessing over 3 digits of the PIN, and using inferred keystroke timings and knowledge of the first or the last digit of the PIN.

Your PIN Sounds Good!

731

compared to timing information alone. In all three cases, timing information substantially outperformed knowledge of one of the digits in terms of guessing rate. 4.5

Knowledge of Which Keys Have Been Pressed (Class 4)

In this section, we evaluate how knowledge of which digits compose a PIN, but not their order, restricts the PIN search space, in conjunction with information about keystroke timings. The adversary can acquire this knowledge, for instance, by observing the keypad using a thermal camera shortly after the user has typed her PIN [11], or by placing UV-sensitive powder on the keys before the user enters her PIN, and then checking which keys were touched using a UV light. Information on which digits compose a PIN can be divided as follows: 1. The user pressed only one key. In this case, the user must have entered the same digit 4 times. No additional information is required to recover the PIN. 2. The user pressed two distinct keys, and therefore each digit of the PIN might be repeated between one and three times, and might be in any position of the PIN. In this case, the number of possible PINs is 24 − 2 = 14, i.e., the number of combinations of two values in four position, except for the combinations where only one of the two digits appears. 3. The user pressed three distinct keys. The number of possible PINs is equal to the combinations of three digits in four positions, i.e., 4 · 3 · 3 = 36 4. The user pressed four distinct keys. The number of possible PINs is 4! = 24. We evaluated how many PINs the adversary could recover given keystroke timings and the set of keys pressed by the user while entering the PIN. Our Knowledge of PIN digits and inter-keystroke timings

Knowledge of PIN digits

100%

PINs Recovered

80%

40%

60%

30%

40%

20% 10%

20% 1

0%

0

5

10

15

2

20

25

3

30

0%

35

Number of Guesses

Fig. 9. CDF showing the percentage of PINs recovered with the knowledge of which keys have been pressed with and without inter-keystroke timing information.

732

M. Cardaioli et al.

results, presented in Fig. 9, show that combining these two sources of information leads to a high PIN recovery rate. Specifically, within the first three attempts, knowing only which keys were pressed led to the recovery of about 11% of the PINs. Adding timing information increased this value to over 33%. 4.6

Combining Multiple Classes of Information

In this section we examine how combining multiple classes of information leads to an improvement in the probability of correctly guessing a PIN. First, we investigated how guessing probability increases when the adversary knows one of the digits of the PIN (first or last), the typing behavior (singlefinger typist), and is able to infer inter-keystroke timing information from audio feedback. We used 3461 PINs typed by 11 subjects containing only single-finger PINs. In our experiments, we were able to guess 8.73% of the PINs within 5 attempts, compared to 6.97% with timing information and knowledge of one digit. We then considered knowledge of the values composing the PIN, typing behavior, and inferred timing information. In this case, we successfully guessed 50.74% of the PINs within 5 attempts, and 71.39% within 10 attempts. Finally, when we considered the values composing the PIN, one of the PIN’s digits, and inferred timing information, we were able to guess 86.76% of the PINs in 5 attempts, and effectively all of them (98.99%) within 10 attempts. All our results are summarized in Table 1.

5

PINs and Their Guessing Probability Distribution

In this section, we evaluate whether the classes of information identified in this paper make some of the PINs easier to guess than others, and thus intrinsically less secure. With respect to estimated inter-keystroke timings, different timing vectors identify a different number of PINs. For instance, vector [0,0,0] corresponds to 10 distinct PINs (0000, 1111, . . .), while vector [1,1,1] corresponds to 216 PINs (0258, 4569, . . .). This indicates that, against adversaries who are able to infer inter-keystroke timing information, choosing PINs uniformly at random from the entire PIN space is not the best strategy. The adversary’s knowledge of which digits compose the PIN has a similar effect of the guessing probability of individual PINs. In this case, PINs composed of three different digits are the hardest to guess, with a probability of 1/36, compared to PINs composed of a single digit, which can always be guessed at the first attempt. The adversary’s knowledge of one digit of the PIN and/or the typing behavior do not affect the guessing probability of individual PINs.

Your PIN Sounds Good!

733

Table 1. Results from all combinations of attacks considered in this paper, sorting by guessing rate after 5 attempts. Because single finger reduces the PIN search space only in conjunction with inter-keystroke timings, we do not present results for single finger alone. Information

PINs guessed within attempt

Keystroke timing Single finger First digit PIN digits 1

2

3

5

10

0.01%

0.02%

0.03%

0.05%

0.10%

0.10%

0.20%

0.30%

0.50%

1.00%

0.02%

0.31%

0.70%

1.05%

2.51%

0.03%

0.52%

0.91%

1.30%

3.38%

o

3.02%

3.72%

4.36%

6.97%

11.04%

o

3.73%

4.13%

5.43%

8.73%

14.01%

o

3.76%

7.52%

11.28% 18.80% 37.60%

o

15.54% 27.79% 33.63% 44.25% 65.57%

o o o

o

o o

o

o o

o

o o

6

o

o

19.04% 34.01% 40.60% 50.74% 71.31%

o

o

13.27% 26.62% 39.88% 66.40% 92.80%

o

o

35.27% 53.46% 66.84% 86.76% 98.99%

o

o

40.86% 60.24% 71.77% 89.19% 99.28%

Conclusion

In this paper, we showed that inter-keystroke timing inferred from audio feedback emitted by a PIN pad compliant with ISO 9564-1 [10] can be effectively used to reduce the attempts to guess a PIN. Compared to prior sources of keystroke timing information, audio feedback is easier to collect, and leads to more accurate timing estimates (in our experiments, the average reconstruction error was 1.8 ms). Due to this increase in accuracy, we were able to reduce the number of attempts needed to guess a PIN compared to timing information extracted from videos. We then analyzed how using inter-keystroke timing increases guessing performance of other sources of information readily available to the adversary. When the adversary was able to observe the first or the last digit of a PIN, inter-keystroke timings further increased the number of PINs guessed within 5 attempts by 14 times. If the adversaries was capable of observing which keys were pressed to enter a PIN (e.g., using a thermal camera), adding inter-keystroke timing information allowed the adversary to guess 15% of the PINs with a single attempt. This corresponds to a 4 times reduction in the number of attempts compared to knowing only which keys were pressed. We evaluated how typing behavior affects guessing probabilities. Our results show that there is a strong correlation between Euclidean distance between keys and inter-keystroke timings when the user enters her PIN using one finger. However, this correlation was substantially weaker when users typed with more than one finger. We then showed that the combination of multiple attacks can dramatically reduce attempts to guess the PIN. In particular, we were able to guess 72% of the

734

M. Cardaioli et al.

PINs within the first 3 attempts, and about 90% of the PINs within 5 attempts, by combining all the sources of information considered in this paper. Finally, we observed that different adversaries require different PIN selection strategies. While normally PINs should be selected uniformly at random from the entire PIN space, this is not true when the adversary has access to inter-keystroke timings or thermal images. In this case, some classes of PINs (e.g., those composed of a single digit) are substantially easier to guess than other classes (e.g., those composed of three different digits). As a result, uniform selection from appropriate subsets of the entire PIN space leads to harder-toguess PINs against those adversaries. We believe that our results highlight a real threat to PIN authentication systems. The feasibility of these attacks and their immediate applicability in real scenarios poses a considerable security threat for ATMs, PoS-s, and similar devices.

References 1. Abdelrahman, Y., Khamis, M., Schneegass, S., Alt, F.: Stay cool! understanding thermal attacks on mobile-based user authentication. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 3751–3763. ACM (2017) 2. Asonov, D., Agrawal, R.: Keyboard acoustic emanations. In: IEEE S&P (2004) 3. Balagani, K., et al.: Pilot: password and pin information leakage from obfuscated typing videos. J. Comput. Secur. 27(4), 405–425 (2019) 4. Balagani, K.S., Conti, M., Gasti, P., Georgiev, M., Gurtler, T., Lain, D., Miller, C., Molas, K., Samarin, N., Saraci, E., Tsudik, G., Wu, L.: SILK-TV : secret information leakage from keystroke timing videos. In: Lopez, J., Zhou, J., Soriano, M. (eds.) ESORICS 2018. LNCS, vol. 11098, pp. 263–280. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99073-6 13 5. B´ atiz-Lazo, B., Reid, R.: The development of cash-dispensing technology in the UK. IEEE Ann. Hist. Comput. 33(3), 32–45 (2011) 6. Berger, Y., Wool, A., Yeredor, A.: Dictionary attacks using keyboard acoustic emanations. In: Proceedings of the 13th ACM conference on Computer and communications security, pp. 245–254. ACM (2006) 7. Bonneau, J., Preibusch, S., Anderson, R.: A birthday present every eleven wallets? the security of customer-chosen banking PINs. In: Keromytis, A.D. (ed.) FC 2012. LNCS, vol. 7397, pp. 25–40. Springer, Heidelberg (2012). https://doi.org/10.1007/ 978-3-642-32946-3 3 8. Butterworth, S.: On the theory of filter amplifiers. Wireless Eng. 7(6), 536–541 (1930) 9. Halevi, T., Saxena, N.: A closer look at keyboard acoustic emanations: random passwords, typing styles and decoding techniques. In: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, pp. 89–90. ACM (2012) 10. ISO: Financial services - personal identification number (pin) management and security - part 1: Basic principles and requirements for pins in card-based systems (2017). https://www.iso.org/standard/68669.html

Your PIN Sounds Good!

735

11. Kaczmarek, T., Ozturk, E., Tsudik, G.: Thermanator: thermal residue-based post factum attacks on keyboard password entry. arXiv preprint arXiv:1806.10189 (2018) 12. Kumar, M., Garfinkel, T., Boneh, D., Winograd, T.: Reducing shoulder-surfing by using gaze-based password entry. In: Proceedings of the 3rd symposium on Usable privacy and security, pp. 13–19. ACM (2007) 13. Kwon, T., Hong, J.: Analysis and improvement of a pin-entry method resilient to shoulder-surfing and recording attacks. IEEE Trans. Inf. Forensics Secur. 10(2), 278–292 (2015) 14. Liu, J., Wang, Y., Kar, G., Chen, Y., Yang, J., Gruteser, M.: Snooping keystrokes with mm-level audio ranging on a single phone. In: Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, pp. 142–154. ACM (2015) 15. Marquardt, P., Verma, A., Carter, H., Traynor, P.: (sp) iPhone: decoding vibrations from nearby keyboards using mobile phone accelerometers. In: Proceedings of the 18th ACM conference on Computer and communications security, pp. 551–562. ACM (2011) 16. Mowery, K., Meiklejohn, S., Savage, S.: Heat of the moment: characterizing the efficacy of thermal camera-based attacks. In: Proceedings of the 5th USENIX conference on Offensive technologies, p. 6. USENIX Association (2011) 17. Roth, V., Richter, K., Freidinger, R.: A pin-entry method resilient against shoulder surfing. In: Proceedings of the 11th ACM conference on Computer and communications security, pp. 236–245. ACM (2004) 18. Sarkisyan, A., Debbiny, R., Nahapetian, A.: Wristsnoop: smartphone pins prediction using smartwatch motion sensors. In: 2015 IEEE international workshop on information forensics and security (WIFS), pp. 1–6. IEEE (2015) 19. Song, D.X., Wagner, D., Tian, X.: Timing analysis of keystrokes and timing attacks on SSH. In: USENIX Security Symposium (2001) 20. Vuagnoux, M., Pasini, S.: Compromising electromagnetic emanations of wired and wireless keyboards. In: USENIX security symposium, pp. 1–16 (2009) 21. Wang, C., Guo, X., Wang, Y., Chen, Y., Liu, B.: Friend or foe?: your wearable devices reveal your personal pin. In: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 189–200. ACM (2016) 22. Wang, D., Gu, Q., Huang, X., Wang, P.: Understanding human-chosen pins: characteristics, distribution and security. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 372–385. ACM (2017) 23. Wang, J., Zhao, K., Zhang, X., Peng, C.: Ubiquitous keyboard for small mobile devices: harnessing multipath fading for fine-grained keystroke localization. In: Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, pp. 14–27. ACM (2014) 24. Zalewski, M.: Cracking safes with thermal imaging. ser (2005). http://lcamtuf. coredump.cx/tsafe 25. Zhu, T., Ma, Q., Zhang, S., Liu, Y.: Context-free attacks using keyboard acoustic emanations. In: ACM CCS (2014)

GDPR – Challenges for Reconciling Legal Rules with Technical Reality Miroslaw Kutylowski1(B) , Anna Lauks-Dutka1 , and Moti Yung2,3 1

Department of Fundamentals of Computer Science, Wroclaw University of Science and Technology, Wroclaw, Poland {miroslaw.kutylowski,anna.lauks}@pwr.edu.pl 2 Columbia University, New York, USA [email protected] 3 Google LLC, New York City, NY, USA

Abstract. The main real impact of the GDPR regulation of the EU should be improving the protection of data concerning physical persons. The sharp GDPR rules have to create a controllable information environment, and to prevent misuse of personal data. The general legal norms of GDPR may, indeed, be regarded as justified and well motivated by the existing threats, however, substantial problems emerge when we attempt to implement GDPR in a real information processing systems setting. This paper aims at bringing attention to some critical challenges related to the GDPR regulation from this technical implementation perspective. Our goal is to alert the community that due to incompatibility between the legal concepts (as understood by a layman) and the technical state-of-the-art, a literal implementation of the GDPR may, in fact, lead to a decrease in the attainable real security level, thus hurting privacy. Further, this situation may create barriers to information processing environments – including in critical evolving areas which are very important for citizens’ security and safety. Demonstrating the problem, we provide a (possibly incomplete) list of concrete major clashes between the legal concepts of GDPR and security technologies. We also discuss possible solutions to these problems (from a technology perspective), and review related activities. We hope that this work will encourage people to seek improvements and reforms of GDPR based on realistic privacy needs and computing goals, rather than the current situation where people involved in IT projects, merely attempt to only do things that are justified (and perhaps severely restricted) by GDPR.

Keywords: GDPR

· Compliance · Privacy · Security

M. Yung—The opinions in this work are personal and do not represent the employers of the authors. The work of the first author has been initiated within the project 2014/15/B/ST6/02837 of Polish National Science Centre. c Springer Nature Switzerland AG 2020  L. Chen et al. (Eds.): ESORICS 2020, LNCS 12308, pp. 736–755, 2020. https://doi.org/10.1007/978-3-030-58951-6_36

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

1

737

Introduction

Since mid 2018, processing personal data in Europe as well as in some cases also outside Europe has to comply with the General Data Protection Regulation (GDPR) [19] of the European Union. In many other countries (including USA, P.R.C., Japan, Korea, Australia, . . . ) heavy personal data protection laws have been enforced – however there might be profound differences among these, both in terms of the overall concept and in term of the fine details. In theory, harsh penalties for not adjusting to GDPR rules on the one hand, and an increased awareness of protection necessity as well as technological advances, on the other hand, should guarantee a swift transition to a world governed by the GDPR rules. However, the emerging reality is quite different: First, the GDPR may be regarded as a paper tiger that is not guarding against personal data acquisition (Edward Snowden, 2019), while on the other hand it has created substantial costs for running information systems. Moreover, even in Europe real adjustment to the fundamental principles is far from being the reality. In some areas the situation seems to be alarming (see Sect. 4.3 for some examples: – the situation of AI in Europe and epidemic information processing). Nevertheless, there is almost no discussion about GDPR itself and the necessity for corrections given the experience gathered. As the EU Commission is to present a review on GDPR in 2020, now it is, perhaps, the last moment to express concerns. Paper Organization. Due to the lack of space we do not provide an introductory guide about the GDPR Regulation – there is an abundant literature on this topic. In Sect. 2 we present challenges for realization of the GPDR rules. These problems are not merely related to ambiguity of a legal text – most of them are due to overlooking the full complexity of issues in the information society. As the devil is in the details, we attempt to provide points that can be described as legal norms that would be appropriate from the technology point of view. In Sect. 3 we present a few general concepts de lege ferenda. In Sect. 4 we report some observed initiatives (or their lack) aimed at adjusting GDPR after two years of experience.

2 2.1

GDPR Challenges Classification as “Personal Data”

The GDPR regulation says that data falls into the category of personal data if and only if it concerns an identifiable person. It does not specify the decision context. To make the problem harder, in many practical cases the decision must be automated. Of course, there are evident cases, where the data contain explicit identifiers of data subjects. The hard case is to guarantee that a data piece D does not fall into the category of personal data. A procedure F classifying data as personal data or non-personal data may take into account the following context as its input:

738

M. Kutylowski et al.

Option 1: just the data D, by itself, Option 2: D in the context of all other existing datasets, Option 3 (hybrid solution): D and a piece of information about the other datasets available to the data controller. Properties of Option 1: While it is easy to implement, classification as notpersonal data may turn out to be obviously wrong taking into account the purpose and plausible interpretation of the GDPR. Adopting option 1 would enable easy circumvention of the GDPR principles. Properties of Option 2: This option is infeasible in practice. Even if access to all other datasets is given, the analysis itself might be extremely complex and infeasible to automate. Definitely, it is very unlikely to get a decision in a real time. Last but not least, in general there is no free access to all other datasets. But even if the access is granted, the permissions are usually granted for a “fair use”. The analysis being, in fact, a deanonymization attempt will not necessarily be treated as a “fair use”. It can even be regarded as an offense against GDPR. Properties of Option 3: This option is more realistic for implementation than option 2 (however it is still potentially hard and costly). Unfortunately, it may deliver misleading results from the point of view of GDPR goals. As we see, every option leads to a kind of deadlock. The main problem is that the attribute personal data is regarded as a general property unrelated to the party processing this data. A different approach would be to determine this attribute in the context of the party processing the data. Thereby, the classification of the same dataset D might be different for different parties. Indeed, for instance, if data in D are encrypted homomorphically, then a party P1 holding D but not the decryption key could perform nontrivial operations on ciphertexts from D without regarding them as personal data. At the same time a party P2 holding the decryption key should regard D as a personal data (so even an unauthorized read operation executed by P2 would be a GDPR violation). An immediate consequence of such an approach is that one cannot permanently classify a given data, the burden to classify data as personal or non-personal is on the party processing the data. (After all: Data is not information on papers which live in a static form forever, data in computations is processed within a context by different parties!) Rule 1 (personal data as an attribute of data and processing party). A data shall be considered a personal data by a party processing it, if and only if this party can identify a physical person related to these data. Let us note that Rule 1 has to be used with caution. While for internal processing Rule 1 can be easily applied, it does not automatically enable a data processor P to transfer a non-personal data D to other parties. In this case P has to make sure that D has still the attribute non-personal for the receiving party. If this is not the case, then the GDPR rules apply in full for the transfer. These precautions are necessary in particular if P aims to publish D. Then P has

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

739

to make sure that by publishing D it will grant an unlawful access to personal data. We discuss these problems in detail in Sect. 2.2. The proposed approach would have deep consequences for security practice. Assume for instance that party Y has technical capabilities to link the pseudonyms in an anonymous credentials system back to real identities of the users (e.g. thanks to a trapdoor information like the opening authority in group signature schemes which are used for multi-use anonymous credentials, or just thanks to extraordinary computational resources). The system of anonymous credentials can be used as usual, and the regular participants may safely assume the data processed do not fall into the category of personal data. On the other hand, party Y should keep hands off all data from the anonymous credential system, as any operation on these data would mean a violation of the GDPR principles. While Y would have technical possibility to access personal data, they would have no lawful way to take advantage of this knowledge. In fact, the above situation would be uncomfortable for Y , as it would be necessary to collect a convincing evidence that it has not used its extra knowledge. Indeed, sooner or later it may be revealed that Y had such capabilities and Y might be accused of using it for own advantage. Therefore, for its own interest party Y should create a verifiable system guarding the use of the trapdoor information – e.g. via secret sharing and storing the shares in different locations controlled by different bodies, or an automatic audit of processing activities, and so on. A good (but still academic) example of a scheme that automatically guards against misusing knowledge advantage are fail-stop signatures: even if a powerful party Y can compute discrete logarithms, it cannot use this capability for forging signatures without creating a strong evidence of a forgery. 2.2

Consequences of Processing Non-personal Data

If data categorization depends on the processing party, the following problem arises: Problem 2 (conversion to personal data). It may happen that a party X processes data D, which is non-personal from the point of view of X,and creates an output that converts Dto personal data from the point of view of a party Z. We propose the following rule: Rule 3 (impact of processing data). A party X processing non-personal data is responsible for all consequences of that processing from the point of view of GDPR. It can be argued that X can freely process data D, as literally taken these activities are not covered by GDPR. However, indirectly they may have a significant impact of giving access to personal data for other parties: Example 1. A dataset D with no personal data from the point of view of party A is transmitted to party B in a country where personal data protection rules are not obligatory. This is not directly prohibited by GDPR as the regulation’s

740

M. Kutylowski et al.

scope are exclusively personal data. On the other hand, at the same time it may happen that for B or its partner C the data D falls into the category of personal data. Assume now that B or C de-anonymizes the data D and publishes the resulting data D . Neither B nor C can be accused of GDPR violations assuming that D does not concern offering goods or services by B or C. Party A also may not be accused directly due to the fact that it has not performed any transfer operation of a personal data. In order to avoid the problems one may adopt the following detailed interpretation of GDPR and the proposed Rule 3: Rule 4 (Admissibility of data transfer). A may send data D to B iff A can reasonably assume that either: – D is not-personal data for B or any potential partner of B, or – B complies with the GDPR obligation and has the right to keep this data. Note that according to Rule 4 it is legitimate to transfer data to party B that complies with the Privacy Shield framework. 2.3

Abandoned Data

The case so far ignored by GDPR is the situation when personal data is processed by a party P , but for some reason P becomes inactive (P may become inactive or even cease to exist). The data themselves may be hosted in a system maintained by a third party provider H. What should happen with these data? In an analogous case of a certification authority P , there are detailed rules how to manage data related to the qualified certificates if P terminates its services. The host H has a dilemma of what to do with the personal data left by P : deleting the data is not allowed, as protection against destroying data is one of the main targets of GDPR. Erasure is allowed only if H is entitled by law or by the data subjects concerned. Keeping the data as they are, is also processing personal data in the sense of GDPR and requires appropriate legal grounds. As long as the data cannot be classified as personal data by H, then the situation is slightly less problematic for H – there are no provisions in GDPR that would prohibit erasure of non-personal data. Therefore, in any kind of cloud services it is important for H to apply effective means of pseudonymization. The usual argument in favor of such approach is to guard against misuse of the private data by H. From the point of view of the provider H, the legal safety against supervisory authorities might be regarded as even more important. Over time it may turn out that the used data protection mechanism becomes ineffective – e.g. due to advances in cryptanalysis. In this case H may be accused of GDPR violations when no countermeasures have been implemented. Note that H may be charged not only when key leakage is its direct fault – GDPR requires to implement appropriate safeguards against third party attacks as well.

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

2.4

741

Semantically Neutral Pseudonymization

GDPR attempts to be technologically neutral, however it frequently refers to techniques such as pseudonymization. One may have an impression of pseudonymization being the Holy Grail of personal data protection. While pseudonymization might be extremely effective and useful in practice, we present an example showing that nontrivial semantic problems may arise. One may hope that pseudonymization or anonymization is a process that, apart from hiding the identity of the data subject, does not change the meaning of the data being processed. This is frequently the case, but not always: Example 2. Consider in a medical dataset D containing the following note: Alice suffers the same symptoms as her brother Bob. There are the following options for pseudonymization of Bob’s identity in the medical record of Alice (note that we do not aim to pseudonymize Alice, as she is a patient and her real ID must be available for her physician). – Replacing Bob by a pseudonym X does not change the fact that X can be identified as Bob (provided that Alice has a single brother), so X becomes an identifiable person and the data record Alice suf f ers the same symptoms as her brother X still falls into the scope of personal data of data subject Bob. The original goal of hiding Bob’s identity has not been achieved. – In order to prevent immediate linking, we replace her brother Bob by X: Alice suf f ers the same symptoms as X This removes the link to Bob entirely, but semantically this is a different sentence. It merely says Alice is not the only person with these symptoms, while the original data record may indicate possibility of a genetic background. – One can apply full pseudonymization and store the record Y suf f ers the same symptoms as her brother X. however this record is useless for the medical treatment of Alice. Of course, the right for health protection of Alice may overwrite the right of Bob’s privacy and this follows directly from GDPR. So, there is an excuse for not applying pseudonymization in medical records. Nevertheless, Example 2 has to show that replacing identifiers by pseudonyms cannot be regarded as a universal tool solving all problems of privacy protection: Problem 5 (ineffective pseudonymization). There are data records for which pseudonymization either does not hide the identity of a data subject or changes semantic meaning of the data. In both cases pseudonymization fails its purpose.

742

2.5

M. Kutylowski et al.

Shared Personal Data

GDPR silently assumes that there is a link between a data record and at most one data subject. However, how should the rules apply, if a record D concerns more than one data subject? Problem 6 (multiple data subjects problem). Assume that a data record D concerns identifiable data subjects A and B. Then, a consent of which party is required according to GDPR to process D: – a consent of a single party (either A or B), or – a consent of both A and B? Example 3. A data record D stored by a controller M concerns data subjects A and B. Then A requests to store D, while B requests to remove D. What should be the action of M in this situation? If a consent/request of one party suffices to process D, then M gets into troubles: – if D is erased by M following the request of party B, then what about the right of A for protection of her data from erasure? – if D is kept by M following the request of A, then what about the right-tobe-forgotten of B? So whatever M decides to do, it can be accused of violating the obligations formulated in GDPR. If a consent of all data subjects is required, then again M is in a deadlock situation: – M cannot continue to store D as the consent of all data subjects is missing, – M cannot erase D as the consent of all data subjects is missing. Problem 7 (data with multiple data subjects). It is necessary to agree upon an interpretation or extension of GDPR so that the obligations of the data controller are defined explicitly in case of multiple data subjects of the data. An idea for solving this problem might be the following GDPR extension: Rule 8 (Progressive/regressive data processing). Each data record should have a field or multiple fields data subject. The content of this field can be determined manually at the data creation time and can be changed during any editing operation. The operations on personal data should be classified as progressive and regressive. A progressive operation is an operation that creates a new information contents. A regressive operation is an operation strictly limited to erasing information contents. For progressive operations a consent of all data subjects is necessary (or a corresponding legal reason replacing the consent). For regressive operations a request/withdrawal of the consent by just one data subject is enough to legitimize the operation. If processing personal data complies with Rule 8, then the following invariant condition holds: if D is stored in a data set then we may assume that all data subjects of D gave their consent to store D in this form.

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

2.6

743

Linked Data and Local Categorization

GDPR silently assumes that data D storing personal data of a single party A can be published iff there is a consent of A or a legal reason for this. However, it may happen that there is no legal reason to publish D, A has not given a consent to publish it, but nevertheless the data gets effectively published: Example 4. – Alice and Bob gave their consent to publish the following record D: “Alice and Bob earn together x EUR” in a public dataset M1 . – Later Alice gives her consent to insert the record D : “Alice earns y EUR” in a public dataset M2 – a cloud space provided by P2 . Problem 9 (implicit personal data processing). In the situation from Example 4, can the provider P2 of M2 store D following the request of Alice? There are arguments that P2 should follow the request as well as deny it: – The data subject of D is exclusively Alice, so according to GDPR (and a civil contract with Alice) P2 may be obliged to store D . – On the other hand, publishing D is equivalent to publication of a record D : “Bob earns x−y EUR.” This happens without a consent of Bob. Moreover, the only data subject for record D is Bob. Problem 10 (semantic analysis). Is a data processor obliged to perform a semantical analysis of the request concerning data processing having in mind violations of personal data protection of people other than the explicit data subject? A positive answer to Problem 10 would raise the question what is the necessary scope of the analysis and how much is the data controller responsible for a misclassification? In practice any analysis may fail. For instance, the access to M1 might be restricted (e.g. if this is a payed service or a service for a closed set of users). Moreover, P2 may be even unaware of the existence of the data record D. Even if free access to all datasets that may contain relevant data records D is given, performing necessary analysis may enormously increase the cost and dramatically decrease efficiency of massive data processing. Indeed, an operation on a single data record would require parsing all datasets that may be related to that record (note that the records implying personal information may not be as simple as in the example, but can be, in fact, a complicated analysis of records about subset of people). The result would not be available in real time. The only solution to this legal deadlock seems to be adoption of the following rule: Rule 11 (extended context of a consent). A consent of a party X concerning a data record D in a dataset M should be understood as the right to process D in M regardless of the context that may emerge outside M . In Example 4, given Rule 11, the right of Alice to publish D should be unrestricted. Concurrently, when Bob gives his consent to publish D, he must keep in mind that Alice if free to disclose her personal data.

744

M. Kutylowski et al.

2.7

Data Aggregation

Assume that party P holds a dataset D containing personal data collected according to GDPR (e.g. the data necessary to run the contracts with the customers of P ). Assume that P aggregates data from D: for instance, P may compute the average amount of money spent by the clients of P . Computing such characteristics might be commercially useful for P , but not always are the data processed for the original purpose (realization of a contract or fulfilling legal obligations). Problem 12 (data aggregation). Does the result of an aggregation operation fall into the category of personal data? Does aggregation fall into the category of processing personal data – based on the fact that its inputs are personal data? Answering the above on the grounds of GDPR is uneasy: – The aggregated data does not concern a single person, but in the mathematical sense it brings some information about each data subject: e.g. the probability distribution for the salary of Alice might be different than the probability distribution of her salary given the median salary. Note that it may happen that the median belongs to Alice – and the exact amount is revealed. This might be a sensitive information. For instance, the income declared in the previous years is used as an authentication key for a current tax declarations in Poland. It follows that a large anonymity set is not enough to claim that the aggregation result can be revealed without violating cybersecurity conditions (regardless whether we regard it as a violation of GDPR). – At which moment the aggregated data looses its attribute personal data? The legal system only considers a Boolean answer, while in reality there are plenty of possibilities in between. There might be efforts to find simple shortcuts like: data concerning a group of more than 10 persons is not personal data, but it is so easy to misuse such rules in order to bypass the intended personal data protection. The concepts like differential privacy are attractive in theory, but using it in a standard practice could be a nightmare. For instance, what  should be used in the context of -differential privacy? Or, since differential privacy techniques are based on probabilistic noise, what about cases where the exact sum of private data items has to be calculated for accounting or other commercial purposes? There is the following legal dilemma: Problem 13 (data processing classification). In order to classify a process as processing personal data: – is it enough to find that personal data are used as input in the process? or – the personal data must appear (explicitly or implicitly) in the output of the process? A possible pragmatic solution to Problem 13 might be the following:

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

745

Rule 14 (narrow definition of data processing). A data processing P where personal data are included in the input of P shall not be understood as processing of personal data, if the output of P (explicit and implicit) does not contain personal data. Rule 14 would be very useful for big data and AI computations like learning models based on private inputs which do not retain private properties. Adopting Rule 14 would also provide a positive answer to the following question: Problem 15 (right to anonymize). Is it legal to create a dataset Anon(D) by anonymization of all data records of D? A positive answer to Question 15 would solve a lot of problems concerning usage of data that are initially contaminated with personal data. This issue has been recognized – for this reason in the current version of GDPR there are exceptions from the general restrictions to process personal data. This touches, in particular, the issue of processing for research purposes (under the provision of respecting the fundamental rights of the data subjects). On the dark side, current exceptions from the general personal data protection rules make room for potential misuses and privacy violations. A rogue party may relatively easily masquerade personal data processing as a research activity. Adopting the right to anonymize data would eliminate the need for the GDPR exceptions in most of the cases occurring in practice. Thereby, one could eliminate the necessity of many exceptions without endangering the research targets. Let us note that the current GDPR interpretation of European Data Protection Supervisor is not in line with Rule 14 (see Sect. 4.1). 2.8

Quorum Systems

The authors of GDPR seem to have in mind traditional data processing techniques originating from pre-electronic era: it has been silently assumed that each data record is stored in a single physical piece and there is a single party controlling the physical medium used. In such a situation, whenever a data record has to be modified, there is a corresponding physical operation executed on the corresponding physical memory location. GDPR allows a situation of more than one controller (joint controllers). In such a case the roles and responsibilities of the controllers have to be strictly defined. In a quorum system this is not the case. Moreover, a quorum system can be run by a number of independent parties (in fact this is a preferred solution making a user independent from a particular service provider). In a quorum system a data operation (read, write, erase) is initiated by separate requests to the servers of the system. The key property of the system is that some requests may result in a failure. A quorum system is immune to such failures, as long as the number of failures is limited. According to GDPR, a data processor has strict obligations concerning data integrity. These obligations are not fulfilled by an individual member of a quorum

746

M. Kutylowski et al.

system but by the system as a whole. On the other hand, a user has the right to exercise its own rights against any of joint controllers. Thereby, the only safe solution would be to run a quorum system by a single organization. However, this is just what we aim to avoid – a technical dependency on a single organization. 2.9

Secret Sharing

Assume that a personal data D is stored using a secret sharing scheme (e.g. a threshold system k-out-of-n) and that each share is kept by a different party. (This is a favored solution if we have to ensure that no single party can reconstruct the data). Problem 16. How does the GDPR regulation apply to a party holding a share of a personal data according to a secret sharing scheme? From the information theoretic point of view the data contained in a share might be purely random (e.g. for schemes based on Lagrange interpolation of polynomials), so one can argue that a share is not a data concerning a data subject. On the other hand, this data collectively with shares from other parties enables reconstruction of the personal data. So which obligations for a data processor apply in this case? The discussion here is not purely academic – if we adopt the interpretation that a party holding only a share is not processing personal data, then among others the following problems may arise: First, a party keeping a share is no longer obliged by GDPR to protect the share from erasure or modification – indeed, these obligations concern only personal data. Second, after splitting a personal data into shares one could freely store them without a consent of a data subject – as long as each share is stored by a different party. Indeed, no consent is required by GDPR for processing non-personal data. Thereby, it would be very easy to circumvent the personal data protection requirements of GDPR. On the other hand, adopting the interpretation that storing a share falls into the scope of GDPR also creates problems: as long as at least k shares are available in a k-out-of-n threshold system, no complaints are likely. However, in case when less than k shares are intact, then who bears responsibility of the data loss? It is likely that the data subject would sue each share holder that has failed to keep it share. The legal ground would be violation of the GDPR requirement for protecting stored personal data. 2.10

Communication

A standard way of transmitting confidential data over public networks is to send the data encrypted with the key shared by the sender and the receiver (endto-end encryption). In reality, the communication protocol may involve many operations that are not limited to forwarding the ciphertext to the destination point: the packets may be duplicated, dropped, sent over multiple paths, stored in the spooling systems, additional encryption layers may be imposed, etc. In case

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

747

of a standard copyright law, these operations are not regarded as an exploitation of authors’ copyright as long as they serve the original purpose of message delivery. Such an exempt does not exist in the case of GDPR: these operations are regarded as processing personal data, as long as the data themselves can be regarded as related to an identifiable person. In almost all communication protocols at least the destination address is explicitly given (the notable exception are the systems like TOR that aim to hide the identity of the communicating parties). In this case the data sent explicitly involves the receiving party. If this is a physical person, then the data fall into the category of personal data. It can be argued that due to encryption the data are unreadable and therefore in some sense “erased.” However, even in this case one can argue that encryption is a form of secret sharing (one share is a ciphertext or ciphertexts and one is a key or keys). Consequently, the same concerns apply. Last but not least, there is always some personal data leakage - the communication volume, which reveals the maximal Kolmogorov complexity of the plaintext data transmitted. Just like in the case of secret sharing, creating a legal framework well capturing existing communication techniques and not creating obstacles for technical improvements is a challenging task. While there are no reports about supervisory authorities putting their hands on existing communication protocols, it cannot be excluded that in the future innovative technologies will be blocked due to such formal reasons. The supervisory authorities may for instance take a position that resilience against traffic analysis is one of GDPR requirements. Anyway, it is relatively likely that provisions of GDPR and the corresponding personal data protection acts can be used as an excuse for protectionist practices in the market of emerging communication networks (5G and beyond). 2.11

P2P Systems

Apart from other problems with distributed systems, there might be problems specific for P2P systems. One of the trouble sources is that the destination server for a given data is not known in advance – it is determined by P2P assignment of logical addresses to the servers. This, in turn, depends on the current state of the system. The owner of the data can be even unaware of the identity of the server storing his data. This prevents using a P2P system by a data controller even for storing encrypted records, as there must be a contract between the data controller and the data processor. For keeping the data by the data subject himself there are similar problems: the party processing the data has, according to GDPR, certain obligations to inform the data subject. However, in many P2P schemes the address of the data subject is not automatically available to the destination server. Problem 17. Certain rules of GDPR are hard or infeasible for P2P systems. The problems are caused, among others, by lack of an explicit linking between the party inserting the data and the party storing the data.

748

M. Kutylowski et al.

2.12

Blockchain and Append-Only Data Systems

Recently, append-only data systems are gaining popularity, with the Blockchain as perhaps the most prominent example. The most important feature of such systems is that after a data entry is inserted into the system, it cannot be altered or erased. This is fundamental not only for cryptocurrencies, but also for achieving undeniability of transactions in the classical financial systems. GDPR related problems start, if such a system admits storing personal data. The source of the troubles is the right-to-be-forgotten. No civil agreement may overwrite this right. A party P running the system may defend itself by storing only data that fall into the category of non-personal data. However, this does not guarantee that the problems will be avoided. For instance, a user submitting a signed data s may publish elsewhere a certificate with the public key for signature verification and own real ID. At this moment s becomes personal data, and consequently the user may exercise his right-to-be-forgotten and demand erasure of s. However, this should be technically infeasible and the system provider is trapped to violate GDPR rules.

3

De Lege Ferenda

Apart from solutions suggested in Sect. 2, we present here a few general concepts for privacy protection that would ease deploying pragmatic technical solutions while on the other hand imply high standards of personal data protection. Processing Personal Data Versus Use of Personal Data. In the current legal situation there are strict rules for processing personal data. On the other hand, the processing itself might be unintentional and/or unconscious. For instance, as noted in Sect. 2.1 it might be non-trivial to determine whether processing of personal data takes place. In the gray area where it is unclear whether the regulation applies, a safe solution is to take precautionary steps and assume that the GDPR rules apply in full. However, such a strategy leads to severe limitations of data processing with profound negative consequences. For instance, it may hinder detection and prevention of financial criminality. Another important field of this kind is controlling the spread of infectious diseases, where adjustment to the rules of GDPR may decrease efficiency of identifying infection chains. Most negative side effects would be avoided, if the regulation were concentrated on preventing negative consequences of a misconduct. The following limitation involves administrative fines which could serve this goal: Rule 18. Administrative fines may be imposed on a party P processing personal data without due diligence, if this results in: (a) a profit for P while the rights and freedoms of a data subject are violated, or (b) a violation of the data subject’s rights and freedoms by a third party.

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

749

Implementing Rule 18 would prevent any administrative fines when noncompliance with GDPR brings no profit to the data processor while the violations have strictly internal consequences. The scope of Rule 18 should be understood broadly. It should apply even if the violation addressed by point (b) takes place outside the territorial scope of the GDPR. (Of course, the legal interpretation of our technical desire is in place). Such an approach would solve some problems, e.g. in the area of data analysis. Even if big data analysis does not fully comply with the GDPR, there will be no sanctions as long as the data processor does not make profit based on a violation. Moreover, the data processor would be able to concentrate on making sure that the rights and freedoms of data subjects are respected, rather than on compliance issues. Last but not least, there might be the cases where the rights and freedoms of data subjects are overridden by other rights and higher values (the common good involving emergencies, is an example). Responsibility. One of the sources of ineffectiveness of GDPR is focusing on administrative fines payed to the state supervisory authorities. This may result in a situation where the fines secure the state income and not the interests of the potential victims. An alternative option would be a mechanism of an automatic compensation: Rule 19. On demand of a victim, a party getting a profit resulting from a violation of personal data protection rules has to pay a compensation proportional to this profit, independently of who is responsible for the original violation. For Rule 19, a default amount could be determined in order to ease risk analysis and eliminate legal disputes. The practice will determine the value of private information based on what a person loss is. Due to Rule 19, a practical effect on deferring unlawful processing would be more predictable, while the penalties would lose its ad hoc character. In order to avoid penalties, any commercial activity should be supported by procedures that are transparent and provide a self-evident proof of lawful processing. A promising technique in this direction is the SPKI public key infrastructure [7]. What SPKI really provides is a user-centric framework of access control. The SPKI system of delegating access rights would provide a clear and efficient way of determining the right to process personal data.

4

State of Discussion and the Previous Work

While there are many activities regarding the GDPR regulation, there is not much focus on rethinking the general paradigms of GDPR. An implicit assumption is almost always that GDPR is not a subject for discussion and that with some (substantial) efforts one can comply with its requirements. Typically, only narrow aspects of personal data protection in selected IT systems are concerned.

750

M. Kutylowski et al.

The situation is somewhat surprising taking into account that GDPR itself did not declare itself sacred, and, rather, stipulates regular reviews on GDPR practice by European Commission, taking into account “the positions and findings of the European Parliament, of the Council, and of other relevant bodies or sources”. So far, with a few exceptions mentioned below, there are not many such other sources, while the first review is scheduled for 2020. Below we review some ongoing and prior activities; our focus is technical though some related legal issues naturally enter into the discussion. 4.1

Activities of Authorities

European Data Protection Supervisor. Among others, the role of the EDPS is to evaluate regulatory initiatives and provide guidance to the interested parties. So one could expect that the problems of the GDPR implementation are reported in Annual Reports of EDPS. Indeed, there is such a case in the 2019 report [9]: We received a complaint from a member of staff at an EU institution. He wanted access to his personal data relating to a harassment complaint, submitted against him by a colleague, which had been declared inadmissible. The part easy to resolve was the invalid ground of data access refusal – protection of a potential victim while the complaint was found inadmissible. A non-trivial issue is that the accusation involved at the same time two other persons in the same institution. Finally, fulfilling the request of the complainant would include the personal data of all accused persons as well as the alleged victim. The position of the EDPS was: We requested that the EU institution take all reasonable steps to ensure that the complainant’s right of access to his personal data was granted. Literally taken, this opinion indicates that the right to access the data by the complainant overrides the privacy rights of the victim and of the other accused persons. On the other hand, EDPS says that the EU institution should balance the rights and freedoms of the complainant and of the alleged victim. The problem is that such “balancing” is infeasible in a massive processing environment, and creates enormous costs when done manually. Another interesting issue reported in [9] is a temporary ban on the production of social media monitoring (SMM) reports by EASO providing news on the latest shifts in asylum and migration routes and smuggling offers, as well as an overview of conversations in the social media community relating to key issues. In this case, the personal data were processed by EASO without a proper legal ground, but the result did not contain personal data, while EASO declared that they took excess measures to ensure that no personal data was ever stored. The ban shows that GDPR may block data processing even if the result is not violating privacy and freedoms of data subjects, while there is a clear public interest for processing. So, the current practice does not follow the proposed Rule 14. EDPS attempts to provide guidance on interpretation of the privacy law. Quite useful are the guidelines proposing a framework for evaluation such aspects as necessity and proportionality of processing. However, they focus on privacy

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

751

protection by the EU institutions and do not cover the problems arising outside the public administration. ePrivacy. Closely related to GDPR is the ePrivacy regulation proposal [8] aiming at setting barriers for misusing personal data by providers of electronic communication. The general idea of ePrivacy is that a provider of electronic communication cannot use the data of the subscribers except for direct service maintaining. The current process of reaching an agreement upon ePrivacy is slow, with major disputes in the EU. Quite interesting from the point of view of the problems discussed in Sect. 2.5 is admitting processing user’s data by the provider for the purpose of the provision of an explicitly requested service by an end-user for purely individual use if the requesting end-user has given consent and where such requested processing does not adversely affect fundamental rights and interests of another person concerned and does not exceed the duration necessary . . . . This particular condition is a special case of the proposed Rule 8. US CLOUD Act. Personal data protection is an area of conflict between the EU and the US authorities. The recent US CLOUD Act gives the US law enforcement authorities the right to access personal data processed by US providers overseas. On the other hand, the EU data protection authorities indicate that there is no common understanding regarding law enforcement and the scope of unlawful activities. This creates a very hard situation for some companies – it might be impossible to comply with data access rules of GDPR and CLOUD. 4.2

Academic Research

The academic IT community has devoted a substantial effort to ease implementation of GDPR. Usually, even if the authors point to certain difficulties, some kind of solution within the scope of the current regulation is proposed. Most of the papers present solutions addressed for very specific application areas. GDPR is a legal concept focusing on essential legal principles to be achieved. Unfortunately, what is observed is that there is a significant conceptual gap between legal and mathematical thinking around data privacy (see e.g. [4]). Compliance. Many researchers provide an evidence that implementation of GDPR creates high organizational costs. Just understanding the requirements by SME is already a substantial problem. Some works provide tools enabling translating the requirements into a form that can be addressed in an algorithmic way [11]. For bigger organizations achieving compliance becomes a very complex task due to its scale. There have been many efforts to ease this process by, say, semi-automated audits (see e.g. [1]).

752

M. Kutylowski et al.

Privacy Trade-offs. There are examples of substantial problems on the technical side. Among others, the authors point to metadata explosion and conflict with the previous efficiency goals in the area of database design [18], or log size explosion (a read operation has to be followed by a write operation) [16]. In case of technologies like distributed ledger, substantial problems must be solved in order to comply with the right-to-be-forgotten (RtbF) principle [10]. In some cases – like persuasive systems, where a user takes advice by looking at experience of others [17] – the GDPR requirements may block development. It has also been observed that the procedures introduced due to GDPR may themselves create attack opportunities, despite the original intention. [12] shows such attacks based on abusing the right to access data by the data subject. New Concepts. A interesting paradigm of the right-not-to-be-deceived has been proposed in [14]. It addresses the problem of user manipulation by personalization of the information contents provided to him. This could concern the cases when the user is actually willing to accept the biased information. Enforcing the right-not-to-be-deceived would dramatically change the current information processing landscape. 4.3

Industry and Independent Institutions

The controversies about GDPR have started already before it was adopted by the EU. In 2014, the decision by the Court of Justice of the European Union (CJEU) in the Google versus Spain case said that a search engine operator is responsible for processing personal information originating from web pages published by third parties. This decision has been later reflected by the right-to-be-forgotten and the right to data rectification – the most fundamental principles of GDPR. The controversy is that the decision creates an obligation to evaluate data and remove links not only by the parties holding the source data, but also by the parties processing already published data. Feasibility of the right-to-be-forgotten (RtbF) has been in focus of, both, industry and the academic community [13]. One of the interesting legal concepts was reputation bankruptcy – a very controversial one, as the rights and freedoms of a bankrupt person are in conflict with the rights and conflicts of other persons. Balancing these rights or a selective application of RtbF in information systems is a procedural nightmare creating substantial legal risks. RtbF creates also severe technical problems. It has been pointed out that physical removal of data may take a long time (like 180 days!). Fortunately, the GDPR regulation does not specify a concrete time limit, but refers to an undue delay. Nevertheless, the understanding of an undue delay may be different for a data controller and a supervision authority. Responding to the demand, some technical concepts have been developed. However, already an overview from 2012 by ENISA [6] has warned about discrepancy between the expectations and the technical reality. Thus far, the technological situation has not changed enough to withdraw this warning.

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

753

Another heavily discussed issue was article 22 and the right for a human review of an automated decision. While the regulation could sound reasonable for early IT systems, it ignores complexity of the modern state-of-the-art. For instance, if the decision is based on statistical correlation retrieved from big data, how one can review the outcome and explain the grounds for a data subject [15]? The AI industry in Europe warns that GDPR creates severe problems for AI computing. Apparently, it has been overlooked by the legislators, as the immediate consequence is that the European industry is losing the competition with USA and China1 . Article [20] points to the following critical issues: 1. Requiring companies to manually review significant algorithmic decisions raises the overall cost of AI. 2. The right to explanation could reduce AI accuracy. 3. The right to erasure could damage AI systems. 4. The prohibition on re-purposing data will constrain AI innovation. 5. Vague rules could deter companies from using de-identified data. 6. The GDPR’s complexity will raise the cost of using AI. 7. The GDPR increases regulatory risks for firms using AI. 8. Data-localization requirements raise AI costs. A deep reform of GDPR in the context of AI has been proposed in [2]. The goals proposed are expanding authorized uses of AI in the public interest, allowing re-purposing of data that poses only minimal risk, not penalizing automated decision-making, permitting basic explanations of automated decisions, and making fines proportional to harm. The problems for data processing has been also detected by the European authorities (see e.g. [5] for B2B communication problems). The data processing may involve very dynamic manipulation of data, while many GDPR concepts are focused on static data, which is not the emerging reality of modern information systems. During the corona virus epidemics, it became evident that the current strict protection of personal data and lack of efficacy to make exempts from these rules for the vital public interests, makes GDPR one of the severe obstacles in the fight against corona virus [3]. Facing the catastrophic situation, Italy introduced extraordinary rules for collection and processing on medical data. However, [3] compares such rules to sticking plaster on a wooden leg and advocates for an international uniform framework. Definitely, the current situation proves that GDPR has been designed having in mind a standard situation. However, the most important case for evaluating efficacy of a security and safety framework are exceptional and critical situations. For example, while the Google-Apple mechanism for contact tracking carefully attempts to avoid any personal data processing, what about a request of an identifiable person (e.g. a well-known actress) to exercise the RtbF in relation to the BLE signals sent from her smart phone? Obviously, such a request when compared to the common good is a total absurd! 1

Also in P.R.C. there are opinions pointing to the conflict between the recent cybersecurity law and its data protection chapter on one hand and feasibility of AI data processing.

754

5

M. Kutylowski et al.

Conclusions

We have presented and exemplified a number of areas and situation which demonstrate shortcomings in the current GDPR rules. Primarily, big data constraints, dynamic and evolving data manipulation and processing, cryptographic tools and techniques, evolving communication and distributed computing configurations, pose new ways by which private data is treated, and pause challenges to GDPR. We also proposed some suggestions (to be considered as initial attempts at remedy), and reviewed current related activities.

References 1. Arfelt, E., Basin, D., Debois, S.: Monitoring the GDPR. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 681–699. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29959-0 33 2. Castro, D., Chivot, E.: The EU needs to reform the GDPR to remain competitive in the algorithmic economy. Center for Data Innovation (2019). https:// www.datainnovation.org/2019/05/the-eu-needs-to-reform-the-gdpr-to-remaincompetitive-in-the//-algorithmic-economy/ 3. Chivot, E.: COVID-19 crisis shows limits of EU data protection rules and AI readiness. Center for Data Innovation (2020). https://www.datainnovation.org/2020/ 03/covid-19-crisis-shows-limits-of-eu-data-protection-rules-and//-ai-readiness/ 4. Cohen, A., Nissim, K.: Towards formalizing the GDPR’s notion of singling out. CoRR abs/1904.06009 (2019). http://arxiv.org/abs/1904.06009 5. Directorate-General for Communications Networks: Study on data sharing between companies in Europe. The European Commission (2018). https:// publications.europa.eu/en/publication-detail/-/publication/8b8776ff-4834-11e8be1d-01aa75ed71a1/language-en 6. Druschel, P., Backes, M., Tirtea, R.: The right to be forgotten - between expectations and practice. ENISA (2012). https://www.enisa.europa.eu/publications/theright-to-be-forgotten/at download/fullReport 7. Ellison, C.M.: SPKI requirements. RFC 2692, 1–14 (1999). https://doi.org/10. 17487/RFC2692 8. EU Presidency: Proposal for a Regulation of the European Parliament and of the Council concerning the respect for private life and the protection of personal data in electronic communications and repealing Directive 2002/58/EC (Regulation on Privacy and Electronic Communications) (amendments) (2020). https:// privacyblogfullservice.huntonwilliamsblogs.com/wp-content/uploads/sites/28/ 2020/02/CONSIL ST 5979 2020 INIT EN TXT.pdf 9. European Data Protection Supervisor: Annual report 2019 (2019). https://edps. europa.eu/sites/edp/files/publication/2020-03-17 annual report 2020 en.pdf 10. Farshid, S., Reitz, A., Roßbach, P.: Design of a forgetting blockchain: A possible way to accomplish GDPR compatibility. In: Bui, T. (ed.) 52nd Hawaii International Conference on System Sciences, HICSS 2019, Grand Wailea, Maui, Hawaii, USA, 8–11 January 2019, pp. 1–9. ScholarSpace/AIS Electronic Library (AISeL) (2019). http://hdl.handle.net/10125/60145

GDPR – Challenges for Reconciling Legal Rules with Technical Reality

755

11. Labadie, C., Legner, C.: Understanding data protection regulations from a data management perspective: a capability-based approach to EU-GDPR. In: Ludwig, T., Pipek, V. (eds.) Human Practice. Digital Ecologies. Our Future. 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), 24–27 February 2019, Siegen, Germany, pp. 1292–1306. University of Siegen, Germany/AISeL (2019). https:// aisel.aisnet.org/wi2019/track11/papers/3 12. Martino, M.D., Robyns, P., Weyts, W., Quax, P., Lamotte, W., Andries, K.: Personal information leakage by abusing the GDPR ‘right of access’. In: Lipford, H.R. (ed.) Fifteenth Symposium on Usable Privacy and Security, SOUPS 2019, Santa Clara, CA, USA, 11–13 August 2019. USENIX Association (2019). https://www. usenix.org/conference/soups2019/presentation/dimartino 13. Politou, E.A., Alepis, E., Patsakis, C.: Forgetting personal data and revoking consent under the GDPR: challenges and proposed solutions. J. Cybersecur. 4(1), 1–20 (2018). https://doi.org/10.1093/cybsec/tyy001 14. Reviglio, U.: Towards a right not to be deceived? An interdisciplinary analysis of media personalization in the light of the GDPR. In: Pappas, I.O., Mikalef, P., Dwivedi, Y.K., Jaccheri, L., Krogstie, J., M¨ antym¨ aki, M. (eds.) I3E 2019. IAICT, vol. 573, pp. 47–59. Springer, Cham (2020). https://doi.org/10.1007/978-3-03039634-3 5 15. Roig, A.: Safeguards for the right not to be subject to a decision based solely on automated processing (article 22 GDPR). Eur. J. Law Technol. 8(3) (2017). http://ejlt.org/article/view/570 16. Shah, A., Banakar, V., Shastri, S., Wasserman, M., Chidambaram, V.: Analyzing the impact of GDPR on storage systems. In: Peek, D., Yadgar, G. (eds.) 11th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2019, Renton, WA, USA, 8–9 July 2019. USENIX Association (2019). https://www. usenix.org/conference/hotstorage19/presentation/banakar 17. Shao, X., Oinas-Kukkonen, H.: How does GDPR (General Data Protection Regulation) affect persuasive system design: design requirements and cost implications. In: Oinas-Kukkonen, H., Win, K.T., Karapanos, E., Karppinen, P., Kyza, E. (eds.) PERSUASIVE 2019. LNCS, vol. 11433, pp. 168–173. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17287-9 14 18. Shastri, S., Banakar, V., Wasserman, M., Kumar, A., Chidambaram, V.: Understanding and benchmarking the impact of GDPR on database systems. PVLDB 13(7), 1064–1077 (2020). http://www.vldb.org/pvldb/vol13/p1064-shastri.pdf 19. The European Parliament and the Council of the European Union: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/ec (General Data Protection Regulation). Off. J. Eur. Union 119(1) (2016) 20. Wallace, N., Castro, D.: The impact of the EU’s new data protection regulation on AI. Center for Data Innovation (2018). http://www2.datainnovation.org/2018impact-gdpr-ai.pdf

Author Index

Abu Jabal, Amani I-523 Agrawal, Shweta I-42 Aiolli, Fabio II-165 Akkaya, Kemal II-734 Alcaraz, Cristina I-174 Alves, Fernando I-217 Andongabo, Ambrose I-217 Aparicio-Sánchez, Damián II-230 Ateniese, Giuseppe I-502 Au, Man Ho I-591 Ayday, Erman I-110 Bai, Guangdong I-399 Balagani, Kiran I-720 Belleville, Nicolas I-440 Bernaschi, Massimo I-502 Bertino, Elisa I-523 Bessani, Alysson I-217 Beyah, Raheem I-257 Bossuat, Angèle I-193 Boyen, Xavier II-336 Brasser, Ferdinand I-377 Bultel, Xavier I-193 Cachin, Christian I-133 Cagli, Eleonora I-440 Calo, Seraphin I-523 Calzavara, Stefano I-23, II-421 Camtepe, Seyit I-569 Cao, Yang I-655 Cardaioli, Matteo I-720 Cebe, Mumin II-734 Chen, Bo I-67 Chen, Chen I-399 Chen, Kai I-338 Chen, Li I-461 Chen, Liwei I-338 Chen, Wenzhi I-257 Chen, Xiaofeng I-611 Chen, Xiaojun I-419 Chen, Xihui II-185 Chen, Yu I-591 Chiang, Yu-Hsi II-251

Chow, Ka-Ho I-545, II-460 Ciampi, Michele II-590 Cohen, Dvir I-88 Collins, Daniel I-133 Compagna, Luca I-23 Conti, Mauro I-277, I-720, II-165 Cornélie, Marie-Angela I-440 Cortier, Véronique II-3 Costache, Anamaria II-546 Couroussé, Damien I-440 Crain, Tyler I-133 Curtis, Caitlin I-399 Dalskov, Anders II-654 Delaune, Stéphanie II-3 Deng, Robert H. I-3 Dmitrienko, Alexandra I-377 Dong, Ye I-419 Dreier, Jannik II-3 Du, Yunlan I-316 Dumas, Cécile I-440 Duong, Dung Hoang II-107 Duong, Tuyet II-697 Elovici, Yuval I-88 Erdin, Enes II-734 Erkin, Zekeriya II-378 Ersoy, Oğuzhan II-378 Escobar, Santiago II-230 Esgin, Muhammed F. II-378 Fan, Lei II-697 Fang, Binxing I-419 Ferrara, Pietro II-421 Ferreira, Pedro M. I-217 Fouque, Pierre-Alain I-193 Fu, Lirong I-257 Garg, Rachit I-42 Gashi, Ilir I-217 Gasti, Paolo I-720 Gilboa-Markevich, Nimrod Gramoli, Vincent I-133

II-42

758

Author Index

Große-Kampmann, Matteo II-272 Guo, Fuchun I-611 Gursoy, Mehmet Emre I-480, I-545, II-460 Gutiérrez, Raúl II-230 Haines, Thomas II-336 Halimi, Anisa I-110 Hanaoka, Goichiro II-65, II-674 Hayata, Junichiro II-674 Holz, Marco II-401 Holz, Thorsten II-272 Hou, Botao II-131 Hou, Y. Thomas II-610 Hranický, Radek I-701 Hsiao, Hsu-Chun II-251 Huang, Weiqing I-295 Huang, Xinyi I-3 Humayed, Abdulmalik I-153 Jancar, Jan II-209 Janovsky, Adam II-505 Ji, Shouling I-257 Jiang, Jianguo I-295 Jiang, Weipeng I-359 Jin, Hongxia I-655 Jing, Jiwu I-67 Johnsten, Tom I-461 Kamp, Manuel I-88 Karayannidis, Nikos II-590 Katz, Jonathan II-697 Keller, Marcel II-654 Këpuska, Ema II-185 Kiayias, Aggelos II-590 Kim, Tiffany Hyun-Jin II-251 Kiss, Ágnes II-401 Ko, Ryan K. L. I-399 Koisser, David I-377 Kolář, Dušan I-701 Komatsu, Misaki II-65 Küennemann, Robert II-525 Kumar, Nishant I-42 Kurt, Ahmet II-734 Kutyłowski, Mirosław I-736 Laine, Kim II-546 Lauks-Dutka, Anna I-736

Law, Mark I-523 Le, Huy Quoc II-107 Lee, Wei-Han I-257 Li, Fengjun I-153 Li, Gang I-295 Li, Gengwang I-295 Li, Jin I-461, II-610 Li, Jinfeng I-338 Li, Weiping II-146 Li, Xue I-399 Li, Yongqiang II-131 Lin, Jingqiang I-153 Lin, Yueh-Hsun I-316 Liu, Chao I-295 Liu, Jinfei I-655 Liu, Limin I-67 Liu, Ling I-480, I-545, II-460 Liu, Peiyu I-257 Liu, Wenling II-357 Liu, Zhen II-357 Lobo, Jorge I-523 Loh, Jia-Chng I-3 Loper, Margaret I-545 Lopez, Javier I-174 Lou, Jiadong I-461 Lou, Wenjing II-610 Lu, Kangjie I-257 Lu, Tao I-257 Lu, Yuan II-713 Lucchese, Claudio II-421 Luo, Bo I-153 Lv, Bin I-295 Lv, Zhiqiang I-295 Lyerly, Robert I-237 Ma, Xuecheng I-591 Maingault, Laurent I-440 Mao, Bing I-316 Martin, Tobias I-88 Masure, Loïc I-440 Matsuura, Kanta II-674 Matyas, Vashek II-505 Mauw, Sjouke II-185 McMurtry, Eleanor II-23 Meng, Dan I-338 Mikuš, Dávid I-701 Mirsky, Yisroel I-88

Author Index

Mo, Tong II-146 Mosli, Rayan II-439 Müller, Johannes II-336 Nemati, Hamed II-525 Nemec, Matus II-505 Nepal, Surya I-569 Neureither, Jens I-377 Nguyen, Khoa II-357 Ning, Jianting I-3 Ning, Zhenyu I-316 Ohara, Kazuma Oleynikov, Ivan Onete, Cristina Orlandi, Claudio

II-65 I-677 I-193 II-654

Pagnin, Elena I-677 Pan, Jiaxin II-485 Pan, Yin II-439 Park, Jeongeun II-86 Pasquini, Dario I-502 Paul, Sebastian II-295 Pereira, Olivier II-23 Picek, Stjepan II-165 Pieprzyk, Josef II-107 Pijnenburg, Jeroen I-635 Player, Rachel II-546 Poettering, Bertram I-635 Poh, Geong Sen I-3 Pohlmann, Norbert II-272 Polato, Mirko II-165 Prabhakaran, Manoj I-42 Puzis, Rami I-88 Ramírez-Cruz, Yunior II-185 Rathee, Deevashwer II-401 Ravindran, Binoy I-237 Ringerud, Magnus II-485 Rios, Ruben I-174 Roman, Rodrigo I-174 Ruan, Na II-569 Rubio, Juan E. I-174 Russo, Alessandra I-523 Ryšavý, Ondřej I-701 Sabelfeld, Andrei I-677 Sadeghi, Ahmad-Reza I-377 Sakai, Yusuke II-65

Sapiña, Julia II-230 Scheible, Patrik II-295 Schneider, Thomas II-401 Schuldt, Jacob C. N. II-674 Sedlacek, Vladimir II-209 Sekan, Peter II-505 Shabtai, Asaf I-88 Shen, Jun I-611 Shen, Liyan I-419 Shen, Yilin I-655 Shi, Gang I-338 Shi, Jinqiao I-419 Shrishak, Kris II-654 Shulman, Haya II-654 Su, Chunhua II-569 Sultan, Nazatul Haque I-569 Sun, Hanyi II-569 Susilo, Willy I-611, II-107 Svenda, Petr II-209, II-505 Takagi, Shun I-655 Tang, Cong I-591 Tang, Qiang II-713 Tatang, Dennis II-272 Teague, Vanessa II-23 Thai, Phuc II-697 Tian, Haibo II-317 Tian, Linan I-338 Tibouchi, Mehdi II-86 Tolpegin, Vale I-480 Tricomi, Pier Paolo I-277 Truex, Stacey I-480, I-545, II-460 Tsudik, Gene I-277 Tzeng, Nian-Feng I-461 Uluagac, A. Selcuk II-734 Urban, Tobias II-272 van der Merwe, Thyla I-193 Varadharajan, Vijay I-569 Verma, Dinesh I-523 Veronese, Lorenzo I-23 Wang, Guiling II-713 Wang, Xiaoguang I-237 Wang, Zhilong I-316 Wei, Wenqi I-545, II-460 Weng, Jian I-3 Wool, Avishai II-42

759

760

Author Index

Wright, Matthew II-439 Wu, Bin I-359, II-131 Wu, Bo II-146 Wu, Yanzhao I-545, II-460 Xiao, Yang II-610 Xiao, Yonghui I-655 Xing, Xinyu I-316 Xiong, Li I-655 Xu, Jun I-316 Xu, Qizhen I-338 Xu, Shengmin I-3 Xu, Xiaofeng I-655 Xue, Rui I-359 Yamada, Shota II-65 Yang, Guomin II-357 Yoshikawa, Masatoshi I-655 You, Weijing I-67 Yu, Chia-Mu II-251 Yu, Min I-295

Yu, Xingxin I-359 Yu, Yu II-357 Yu, Zhengmin I-359 Yuan, Bo II-439 Yuan, Xu I-461 Yung, Moti I-736 Zhang, Fangguo II-317 Zhang, Fengwei I-316 Zhang, Ning II-610 Zhang, Rong II-146 Zhang, Xuhong I-257 Zhang, Yanjun I-399 Zhang, Yihe I-461 Zhang, Zhuoran II-317 Zhao, Haoyue II-131 Zhao, Yunlei II-633 Zhou, Hong-Sheng II-697 Zhou, Qifei II-146 Zindros, Dionysis II-590 Zobal, Lukáš I-701