Synthesis Lectures on Computer Architecture
Rashmi Agrawal · Ajay Joshi
On Architecting Fully Homomorphic Encryption-based Computing Systems
Synthesis Lectures on Computer Architecture Series Editor Natalie Enright Jerger, University of Toronto, Toronto, Canada
This series covers topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. The scope will largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and ASPLOS.
Rashmi Agrawal Boston University Boston, MA, USA
Ajay Joshi Boston University Boston, MA, USA
ISSN 1935-3235  ISSN 1935-3243 (electronic)
Synthesis Lectures on Computer Architecture
ISBN 978-3-031-31753-8  ISBN 978-3-031-31754-5 (eBook)
https://doi.org/10.1007/978-3-031-31754-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
We live in an increasingly data-driven world, where both individuals and business entities commonly use third-party cloud services to process their data. Third-party cloud services provide flexibility and scalability for data processing. However, the use of these third-party cloud services also creates the possibility of data leakage. Maintaining data privacy in the cloud while providing flexibility and scalability is a challenge. At one extreme, one can uphold the confidentiality of private data by imposing a strict process where data is only seen by a trusted human expert, but that comes at the cost of making the data processing limited, slow, and expensive. At the other extreme, one can use compute models that provide excellent flexibility and scalability, but at the expense of requiring data to be in clear text for processing. Effectively, there exists a trade-off between data privacy and data processing efficiency. What we need is a balanced solution that provides the best of both worlds.

Over the years, various privacy-preserving techniques such as differential privacy, secure multi-party computation (SMPC), and fully homomorphic encryption (FHE) have emerged that allow us to reconcile privacy and data processing efficiency. In this book, we focus on the FHE-based privacy-preserving technique. FHE allows outsourcing of computations on private data to third-party cloud servers by enabling operations on ciphertexts. Effectively, a user encrypts his/her private data and sends it to the cloud. The cloud server does not need to decrypt the data while computing on it and generates an encrypted result. Only the user can decrypt this encrypted result.

We focus on one of the lattice-based FHE schemes, i.e., the Cheon–Kim–Kim–Song (CKKS) scheme, which allows operating on encrypted real numbers and thereby enables broader usage of FHE. We provide a detailed description of all the operations in the CKKS FHE scheme and explain how to design an end-to-end CKKS-based application. Out of all the CKKS operations, the bootstrapping operation is the most expensive in terms of both compute and memory. Using an analysis of the bootstrapping operation on various hardware backends (CPU, GPU, FPGA, and ASIC), we provide insights on the challenges associated with performing practical FHE-based computing on different hardware backends.
Organization

This book is organized as follows:

• Chapter 1 introduces the notion of privacy-preserving computing (PPC) along with a brief description of various cryptographic and non-cryptographic PPC techniques. This chapter also introduces the FHE-based PPC technique and provides its history and background.
• As the focus of this book is on the CKKS FHE scheme, Chapter 2 describes in detail all the primitive operations in the CKKS FHE scheme by explicitly writing out algorithms for each of these operations. Chapter 2 also describes the bootstrapping operation and then presents a complete FHE-based computing example.
• Chapter 3 first describes the key performance bottlenecks in the CKKS FHE scheme and then presents the profiling results (based on arithmetic intensity analysis) of individual primitive operations. This chapter also describes the bootstrapping operation designed using these primitive operations, followed by its arithmetic intensity analysis. Finally, this chapter discusses the arithmetic intensity analysis of an example CKKS-based logistic regression training application. Based on these arithmetic intensity analyses, this chapter concludes that the bootstrapping operation and the encompassing CKKS-based applications are memory bound.
• Chapter 4 presents an analysis of CKKS-based computing on example CPU, GPU, ASIC, and FPGA platforms. This analysis reveals that when executing CKKS-based applications, both CPUs and GPUs are limited in performance by the available main memory bandwidth, ASICs outperform CPUs and GPUs but are expensive, and FPGAs provide an affordable and practical solution. This chapter describes several trade-offs that we need to consider when using each of these compute platforms for CKKS-based computing.
• Chapter 5 concludes the book with a description of some of the emerging architectures and approaches that could be used for accelerating FHE-based applications.

Boston, USA
August 2022
Rashmi Agrawal Ajay Joshi
Acknowledgements
The authors are grateful to Natalie Enright Jerger (series editor) and to Charles Glaser and Ambrose Berkumans (Springer Nature) for their support throughout the entire process of preparing this book. This book would not have been possible without their patience and encouragement. Many thanks to our collaborators at BU, MIT, NEU, KAIST, UCAM, and Analog Devices for the FHE-based computing research and the many technical discussions; the book draws upon a couple of years of research with these collaborators and countless discussions. We also greatly appreciate the feedback from the anonymous reviewers of this book. The authors are grateful for the research sponsorship from Red Hat. Finally, the authors deeply appreciate their family members for their unconditional love and support.

Boston, USA
August 2022
Rashmi Agrawal Ajay Joshi
Contents
1 Introduction
  1.1 Privacy-Preserving Computing
  1.2 Homomorphic Encryption
  1.3 A Brief History of HE
    1.3.1 First-Generation HE Schemes
    1.3.2 Second-Generation HE Schemes
    1.3.3 Third-Generation HE Schemes
  1.4 Alternate Cryptographic Technique for PPC
    1.4.1 Multi-party Computation
    1.4.2 HE Versus MPC
  1.5 Non-cryptographic Techniques for PPC
    1.5.1 Anonymization
    1.5.2 Differential Privacy
    1.5.3 HE Versus Non-cryptographic PPC Techniques
  1.6 Summary
  References

2 The CKKS FHE Scheme
  2.1 CKKS Parameters
  2.2 Basic Compute Optimizations in CKKS Scheme
    2.2.1 Residue Number System (RNS)
    2.2.2 Modular Arithmetic
    2.2.3 NTT/iNTT
  2.3 Key Generation
    2.3.1 Public Keys
    2.3.2 Switching Keys
  2.4 Client-Side Operations
    2.4.1 Encoding
    2.4.2 Encryption
    2.4.3 Decryption
    2.4.4 Decoding
    2.4.5 Example of Encode and Decode Operation
  2.5 Server-Side Operations
    2.5.1 Addition
    2.5.2 Multiplication
    2.5.3 Rescale
    2.5.4 Rotate
    2.5.5 Conjugate
    2.5.6 Key Switching
    2.5.7 Noise Growth
    2.5.8 Bootstrapping
  2.6 Example of an FHE-Based Computing with CKKS Scheme
  2.7 Summary
  References

3 Architectural Analysis of CKKS FHE Scheme
  3.1 Compute Bottlenecks
    3.1.1 Modular Arithmetic Operations
    3.1.2 NTT/iNTT Operations
    3.1.3 Choice of Parameters
  3.2 Memory Bottlenecks
    3.2.1 On-chip Memory Size
    3.2.2 Main Memory
    3.2.3 Changing Data Access Pattern
    3.2.4 Parameters
  3.3 Arithmetic Intensity Analysis of Basic Operations
  3.4 Arithmetic Intensity Analysis of Bootstrapping
  3.5 Arithmetic Intensity Analysis of LR Model Training
  3.6 Summary
  References

4 Designing Computing Systems for CKKS FHE Scheme
  4.1 CPU-Based Designs
    4.1.1 Compute Versus Memory Trade-off
  4.2 GPU-Based Designs
    4.2.1 Compute Versus Memory Trade-off
  4.3 FPGA-Based Designs
    4.3.1 Compute Versus Memory Trade-off
  4.4 ASIC-Based Designs
    4.4.1 Compute Versus Memory Trade-off
  4.5 Summary
  References

5 Summary
  5.1 Future Perspectives
    5.1.1 Increase Main Memory Bandwidth with Improved Utilization
    5.1.2 Use In-Memory/Near-Memory Computing/Wafer-Scale Systems
    5.1.3 Future Improvements to FHE Schemes
  References
1 Introduction
Recent decades have seen a surge in the amount and diversity of data collected and processed by computing systems. Our online transactions, search queries, web browsing history, movie preferences, the videos we watch, and fitness goals are just a few examples of the data that is collected and processed on a daily basis. Several data privacy regulations, such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA), restrict access to this personal and sensitive data. Organizations typically upload this data to a centralized location, i.e., a third-party cloud server, to utilize the wide variety of cloud services. However, following the guidelines of the data privacy regulations, the sensitive parts of the data are encrypted and then uploaded to the cloud server, while the rest of the data is uploaded in plaintext form. Cloud service providers decrypt this sensitive data within a trusted execution environment such as Intel Software Guard Extensions (SGX) (Costan and Devadas 2016) and then operate on it within a secure enclave. However, it is possible to extract information about the private data sets even if the data is decrypted only within a secure enclave (Kocher et al. 2019; Lipp et al. 2018; Murdock et al. 2020; Götzfried et al. 2017; Biondo et al. 2018; Schwarz et al. 2019). Moreover, working with secure enclaves requires an efficient key storage and management technique so that the secret key used for data decryption (within the secure enclave) is itself secure (van Bulck et al. 2018). Effectively, while cloud service providers impart the convenience of a variety of services and large-scale compute capabilities, they cannot always guarantee the confidentiality of user data (Popović et al. 2010) because the compute nodes in the cloud operate on the data in plaintext (unencrypted) form.

The private data is susceptible to both insider and outsider attacks, not only within the organization collecting the data but also within the third-party cloud service provider that processes the data. To prevent insider attacks,
both the data-collecting organizations and the cloud service providers deploy role-based access control that grants access to private data only to authorized users. To overcome the issue of outsider attacks, the data-collecting organizations can set up a private cloud environment themselves instead of sharing the data with a third-party cloud service provider. However, it has been observed that even this private cloud environment is susceptible to data leakage (Kaur et al. 2017), just like the third-party cloud service provider environment. Consequently, organizations need to deploy appropriate privacy-enhancing techniques that enable the required functionalities while maintaining data privacy. Below we describe some of the existing privacy-preserving computing techniques along with some concrete examples. To be self-contained, we discuss both relevant non-cryptographic and cryptographic solutions that can be used to efficiently protect data uploaded to the cloud while enabling a variety of cloud services.
1.1 Privacy-Preserving Computing
Privacy-preserving computing (PPC) techniques enable secure computation and analysis of data without revealing the content of the data. Thus, PPC can be leveraged in a wide variety of application domains. In the medical domain, PPC can be used for genome analysis for precision medicine, medical imaging-based diagnosis, lifestyle tracking, and the study of rare diseases. In marketing analysis, the purchase history of a consumer can be analyzed for the purposes of personalized advertising. Similarly, in the finance domain, financial transaction data of users can be used to predict fraudulent transactions, detect credit card fraud, predict stock prices, guide investments, and so on. Capabilities to process survey results privately can help in building anonymous voting systems or gathering product ratings/reviews. Even energy companies can secure their smart grids by adding privacy-preserving compute capabilities to the grid. Government organizations can utilize PPC for crime prevention, tax-evasion detection, police investigations, judicial systems, and more. PPC can also be utilized in many other applications like secure multi-keyword search, private information retrieval (PIR), and homomorphic spam filters. By enabling large-scale data sharing between organizations, PPC can also contribute towards federated machine learning applications.

Over the years, both academia and industry have developed a variety of PPC methods, including anonymization, differential privacy, HE, and multi-party computation. To choose a PPC method, we need to answer the following questions:

• How many parties are involved in the process?
• Who is allowed to access the data in plaintext form?
• What is the size of the raw data and the size of the processed data corresponding to each of the PPC techniques?
• What is the data storage overhead for each of the PPC techniques?
• If the data is used for privacy-preserving machine learning, where should the training and prediction take place? Does the training involve data from multiple users, i.e., are we doing federated machine learning?
• How much computation time and capacity can one afford? What compute platform will balance the compute and the data transfers?
• How much communication cost can one afford?
• What level of security needs to be provided?

Typically, all PPC techniques involve multiple parties: one or more input parties (IP) that contribute the data, one or more independent computing parties (CP) that can individually or jointly operate on the data following some protocol, and one or more result parties (RP) that receive the results of the computation. Depending upon the PPC technique, an organization may undertake the roles of all three parties.

PPC techniques can be broadly classified into two categories based on whether they use a non-cryptographic or a cryptographic approach. Non-cryptographic approaches such as anonymization and differential privacy rely on simple randomization or anonymization techniques and are not math-heavy. However, these non-cryptographic approaches may need to depend on a trusted third party to provide any strong security guarantees. Cryptographic approaches such as homomorphic encryption and multi-party computation allow an individual or a group of participants to securely compute any function of their joint inputs using encryption schemes. The use of cryptography removes the need for a trusted third party in these PPC techniques. Below we first provide an overview of the homomorphic encryption-based PPC technique, and then briefly discuss the alternate PPC techniques, both cryptographic and non-cryptographic.

To describe the various PPC techniques over the next few sections, we use an example privacy-preserving machine learning application that works with private user data. We choose a machine learning application because machine learning is commonly used in healthcare applications where data privacy is a major concern. For example, machine learning plays a central role in the development of precision medicine, whereby treatment is tailored to the clinical or genetic features of the patient using the patient's private genome data. However, this requires collecting and sharing a patient's genome data with several parties, which leads to privacy concerns. We consider two use cases that require sharing of a patient's genome data.

The first use case involves an individual patient with personal genome data stored on his/her local machine. This patient's genome data can be used for training an existing machine learning model to improve the model's accuracy, or the data can be used as input to the machine learning model to infer whether the patient has a disease. Here, the machine learning model can be in unencrypted or encrypted form. In both the inference and the training scenarios, the patient has to transmit the private genome data to the cloud via a hospital, and the cloud will send a response back. In a naive implementation, the patient might share the data with the cloud in plaintext form and the cloud will send a response back in plaintext form as well. This scenario is depicted in Figure 1.1.

[Fig. 1.1 A naive implementation where a client shares his/her personal genome data, in plaintext form, with a third-party cloud server, and gets a response back also in plaintext form]

The second use case requires data from more than one individual. For example, a hospital or a group of hospitals wants to develop a machine learning model using private genome data from multiple patients. Figure 1.2 shows such a scenario in a simplistic way, where the private genome data from two patients is received by an individual hospital. The hospital, with limited compute capabilities, sends this data to a third-party cloud server for training a machine learning model (which may or may not be in encrypted form). Both use cases pose serious threats to patients' private genome data while the data is stored, in transit, or being processed.

[Fig. 1.2 Two patients share their personal genome data (in plaintext form) with a hospital, which sends it to a third-party cloud server for federated machine learning model training]
1.2 Homomorphic Encryption
Homomorphic encryption (HE) is an enhanced form of encryption that supports arbitrary computations on encrypted data. For example, as shown in Figure 1.3, a client can encrypt his/her data and then send it to a third-party cloud for processing. The third-party cloud can perform computations on this encrypted data and send the encrypted results back to the client. Only the client, who owns the secret key that was used for encryption, can decrypt the result. The decrypted result of this homomorphic evaluation is the same as the result that the client would have obtained had the cloud performed the same operations on the plaintext data. In an HE-based computing model, given that no encryption or decryption happens on the cloud side, the client does not need to share his/her secret keys with the cloud, and thus the data is never in its plaintext form on the cloud. Effectively, the user data remains private and is not accessible to anyone (including a super user in the cloud).

[Fig. 1.3 Secure computations over the third-party cloud using HE]

Rivest et al. (1978) proposed the fundamental idea of HE-based computing in 1978. As encryption schemes kept developing, HE operations were tried using each encryption scheme. However, each encryption scheme could support only partial homomorphic operations. For example, the Paillier cryptosystem (Paillier 2005) supported only additive homomorphism, while the ElGamal (ElGamal 1985) and RSA (Rivest et al. 1983) encryption schemes supported only multiplicative homomorphism. Therefore, the homomorphic operations supported by these encryption schemes are known as partial homomorphic operations and have limited practical applications. It was only in 2009 that Gentry's seminal work (Gentry et al. 2009) demonstrated that lattice-based encryption schemes (Lyubashevsky et al. 2010) can be leveraged to perform both homomorphic addition and multiplication operations. Using Gentry's work as the foundation, researchers have developed a variety of lattice-based HE schemes such as Gentry–Sahai–Waters (GSW) (Gentry et al. 2013), Brakerski–Gentry–Vaikuntanathan (BGV) (Brakerski et al. 2012), Fan–Vercauteren (FV) (Fan et al. 2012), Cheon–Kim–Kim–Song (CKKS) (Cheon et al. 2017), Fastest Homomorphic Encryption in the West (FHEW) (Ducas and Micciancio 2015), and Fully Homomorphic Encryption over the Torus (TFHE) (Chillotti et al. 2020). These schemes are post-quantum secure (Bernstein and Lange 2017) as they rely on lattice-based cryptography, which uses complex mathematical problems that remain NP-hard even for quantum computers.

The HE schemes mentioned above only allow computation on ciphertexts that are decryptable under the same secret key and are suitable for the first use case mentioned above. In contrast, a multi-key HE (MKHE) scheme (López-Alt et al. 2017) is a cryptographic primitive supporting arithmetic operations on ciphertexts that are not necessarily decryptable with the same secret key. In addition to addressing this limitation of HE, MKHE can also be used to design multi-party computation (MPC) protocols with a minimal number of rounds, incurring low communication cost (Mukherjee and Wichs 2016). Moreover, an MPC protocol built from MKHE satisfies the on-the-fly MPC (López-Alt et al. 2012) property, where the circuit to be evaluated can be dynamically decided after the data providers upload their encrypted data. For the second use case mentioned above, as shown in Figure 1.4, using an MKHE-based computing approach (Chen et al. 2019), a hospital can perform training/inference for a machine learning model using the encrypted genome data received from multiple patients. Here, each patient encrypts his/her private data using their own secret key and then shares it with the hospital. The hospital can then use this encrypted data to perform machine learning training/inference in its own cloud or share this encrypted data with a third-party cloud service provider who will perform the machine learning inference.

[Fig. 1.4 Secure computations over the third-party cloud using MKHE]
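To make the notion of partial homomorphism concrete, the following toy Python sketch (ours, not from the book; it uses insecure textbook parameters purely for illustration) shows the multiplicative homomorphism of unpadded RSA mentioned above: multiplying two ciphertexts yields a valid encryption of the product of the two messages, but no homomorphic addition is possible.

```python
# Textbook (unpadded) RSA with toy parameters -- multiplicatively homomorphic.
# WARNING: illustrative only; real RSA uses large primes and padding.
p, q = 61, 53
n = p * q                              # public modulus (3233)
e = 17                                 # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (requires Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 6
c_prod = (enc(m1) * enc(m2)) % n       # multiply ciphertexts only
print(dec(c_prod))                      # prints 42 == m1 * m2 (valid while m1*m2 < n)
```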
1.3 A Brief History of HE
HE schemes include multiple types of encryption schemes that can perform different classes of computations over encrypted data. These computations are represented either as Boolean or as arithmetic (including integer and real numbers) circuits.¹ Based on the type of HE operation (addition or multiplication) that is supported and the depth of the circuit that can be evaluated, the common classes of HE schemes are partially homomorphic encryption, somewhat homomorphic encryption (SHE), leveled fully homomorphic encryption (leveled FHE), and fully homomorphic encryption (FHE). Partially homomorphic encryption schemes support the evaluation of circuits consisting of only one type of operation/gate, e.g., addition or multiplication. Partially additive HE schemes are known as PAHE schemes and partially multiplicative HE schemes are known as PMHE schemes. SHE schemes can compute many additions and a small number of multiplications on ciphertexts. Thus, SHE schemes can evaluate both types of gates, but only for a subset of circuits. For example, SHE can accomplish tasks like computing the sum of given numbers (requires no multiplications), computing the standard deviation (requires only one multiplication), and predictive analysis such as logistic regression (requires a few multiplications depending on the required precision). Leveled FHE schemes support the evaluation of arbitrary circuits composed of multiple types of gates of bounded (or pre-determined) depth. So, even though both SHE and leveled FHE schemes handle a restricted family of circuits, with leveled FHE one can plan in advance the type of functions (up to what depth) one wants to compute, whereas with SHE the bound on the depth is fixed by the scheme and does not scale well with depth. Therefore, leveled FHE makes sense in practice when one has a specific FHE function/application in mind. FHE schemes allow the evaluation of arbitrary circuits composed of multiple types of gates of unbounded depth and represent the strongest notion of HE. For the majority of the existing homomorphic encryption schemes, the multiplicative depth² of circuits is the main practical limitation in performing computations over encrypted data. Figure 1.5 shows the timeline of the development of HE over the last 40 years or so.
¹ The HE community borrowed the "circuits" terminology from the digital logic circuits world, as the initial HE schemes could evaluate only a Boolean circuit consisting of two types of gates, i.e., XOR (for addition) and AND (for multiplication). Current HE schemes provide more functionality, but the community still continues to use the "circuit" terminology.

² The multiplicative depth is the maximal number of sequential homomorphic multiplications that can be performed on fresh ciphertexts such that, once decrypted, we retrieve the result of these multiplications correctly.
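As a rough illustration of the bounded-depth idea, the hedged Python toy below (ours; the Ct class and level counter are a made-up abstraction, not part of any real scheme) tracks a per-ciphertext depth budget that each homomorphic multiplication consumes. In a leveled FHE scheme the parameters fix this budget in advance; once it is exhausted, a full FHE scheme would instead bootstrap to continue computing.

```python
class Ct:
    """Toy stand-in for a ciphertext that tracks its remaining multiplicative depth."""
    def __init__(self, value, levels):
        self.value = value      # plaintext stand-in (no real encryption here)
        self.levels = levels    # remaining multiplicative depth budget

def hmul(a, b):
    # each (sequential) homomorphic multiplication consumes one level
    levels = min(a.levels, b.levels) - 1
    if levels < 0:
        raise RuntimeError("multiplicative depth exhausted: bootstrap or choose deeper parameters")
    return Ct(a.value * b.value, levels)

x = Ct(3, levels=2)          # parameters chosen for a depth-2 circuit
y = hmul(hmul(x, x), x)      # x**3 needs two sequential multiplications: OK
try:
    z = hmul(y, x)           # a third sequential multiplication exceeds the budget
except RuntimeError as err:
    print(err)
```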
[Fig. 1.5 HE development timeline: privacy homomorphisms (1978); roughly 30 years of partial HE; Gentry's seminal work (2009); DGHV (2010, "over the integers" branch); BGV (2011), BFV (2012), and CKKS (2016) on the LWE/RLWE leveled-schemes branch; LTV (2012, NTRU branch); GSW (2013, LWE-based); FHEW (2014) and TFHE (2016) on the fast-bootstrapping branch]

1.3.1 First-Generation HE Schemes
Using lattice-based cryptography, Craig Gentry described the first feasible construction of an FHE scheme by enabling both addition and multiplication operations on ciphertexts, which made it possible to construct circuits for performing arbitrary computations. Gentry's initial proposal was an SHE scheme and was limited to evaluating low-degree polynomials over encrypted data. This limitation is imposed by the noise in a ciphertext generated using a lattice-based encryption scheme. The noise within the ciphertexts grows as we perform additions and multiplications on the ciphertexts. When this noise grows beyond a critical threshold level, the result of the homomorphic computation gets destroyed and cannot be recovered after decryption. By adding a bootstrapping procedure, Gentry demonstrated that an existing SHE scheme can be modified to obtain an FHE scheme. The bootstrapping step evaluates the scheme's own decryption circuit homomorphically to refresh a ciphertext and thus enables further computation on the ciphertext. After bootstrapping, we obtain a new ciphertext that encrypts the same value as before but with a reduced noise level. By bootstrapping the ciphertext whenever the noise grows too large, it is possible to compute an arbitrary number of additions and multiplications without destroying the underlying plaintext result. Gentry based the security of his scheme on the assumed hardness of two problems: certain worst-case problems over ideal lattices, and the sparse (or low-weight) subset sum problem.

In 2010, Marten van Dijk, Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan presented another FHE scheme, namely DGHV (van Dijk et al. 2010), that uses Gentry's basic FHE construction but is not based on problems over ideal lattices. The DGHV scheme is instead
constructed using integers, resulting in a very simple SHE scheme with similar properties with regard to homomorphic operations and efficiency. The somewhat homomorphic component in the work of van Dijk et al. is similar to an encryption scheme proposed by Levieil and Naccache in 2008 (Levieil and Naccache 2008). Many refinements and optimizations of the scheme of van Dijk et al. were proposed by many researchers in a sequence of works (Cheon et al. 2013; Coron et al. 2011, 2012, 2014).
1.3.2 Second-Generation HE Schemes
The HE schemes of this generation are derived from techniques that were developed starting in 2011–2012 by Zvika Brakerski, Craig Gentry, Vinod Vaikuntanathan, and others. This led to the development of much more efficient S/FHE schemes, including:

• The Brakerski–Gentry–Vaikuntanathan (BGV) scheme (Brakerski et al. 2012), building on techniques of Brakerski and Vaikuntanathan (2014a) using standard lattice problems;
• The Brakerski/Fan–Vercauteren (BFV) scheme (Fan et al. 2012), building on Brakerski's scale-invariant cryptosystem (Brakerski 2012);
• The NTRU-based scheme by Lopez-Alt, Tromer, and Vaikuntanathan (LTV) (López-Alt et al. 2012);
• The NTRU-based scheme by Bos, Lauter, Loftus, and Naehrig (BLLN) (Bos et al. 2013), building on LTV and Brakerski's scale-invariant cryptosystem;
• The Cheon–Kim–Kim–Song (CKKS) scheme (Cheon et al. 2017), an approximate homomorphic encryption scheme.

The security of the LTV and BLLN schemes relies on an overstretched (Albrecht et al. 2016) variant of the NTRU computational problem. This NTRU variant was found to be vulnerable to sub-field lattice attacks (Cheon et al. 2016), which is why these two schemes are no longer used in practice. The security of the other schemes (BGV, BFV, CKKS) is based on the hardness of the Ring Learning With Errors (RLWE) problem.³ Both the BGV and BFV schemes support HE operations on integer data. The CKKS scheme supports HE operations on real numbers and includes an efficient rescaling operation that scales down an encrypted message after multiplication. For comparison, such rescaling requires bootstrapping in both the BGV and BFV schemes. The rescaling operation makes the CKKS scheme the most efficient method for evaluating polynomial approximations and the preferred approach for implementing privacy-preserving machine learning applications. The scheme introduces several approximation errors (both deterministic and non-deterministic) that require special handling in practice (Kim et al. 2020).

All the second-generation schemes follow the basic blueprint of Gentry's original construction, namely, they first construct a somewhat homomorphic cryptosystem and then convert it to a fully homomorphic cryptosystem using bootstrapping. However, a distinguishing characteristic of the second-generation schemes is that they all feature much slower noise growth during homomorphic computations when compared to first-generation schemes. Moreover, second-generation schemes even enable homomorphic evaluation of many real-world applications without invoking bootstrapping at all, instead operating in the leveled FHE mode. Additional optimizations by Craig Gentry, Shai Halevi, and Nigel Smart resulted in schemes with nearly optimal asymptotic complexity (Gentry et al. 2012a, b, c). These optimizations were built using the Smart–Vercauteren techniques (Smart and Vercauteren 2014) that enabled the packing of many plaintext values into a single ciphertext and operating on all these plaintext values in a Single Instruction Multiple Data (SIMD) style. Many of these optimizations to the second-generation schemes were also ported to the cryptosystem over the integers (the DGHV scheme).

³ RLWE is a computational problem that serves as the basis for homomorphic encryption and also as the foundation of new quantum-safe cryptographic algorithms. RLWE specializes the LWE problem to polynomial rings over finite fields. The solution to the RLWE problem can be used to solve the NP-hard shortest vector problem (SVP) in a lattice, which is an important feature for basing cryptography on the RLWE problem. Readers not familiar with the RLWE problem should refer to Lyubashevsky et al. (2010) for a detailed explanation.
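To give a feel for why the rescaling operation mentioned above matters in CKKS, here is a hedged, plaintext-only Python sketch (ours; no encryption, no polynomial rings, and the scale DELTA and helper names are arbitrary choices). Real numbers are encoded as scaled integers; multiplication squares the scale, and rescaling divides it back down so that the scale does not blow up over a sequence of multiplications.

```python
DELTA = 2**20  # scaling factor (assumed value for this sketch)

def encode(x):
    # encode a real number as a scaled integer (scale = DELTA)
    return round(x * DELTA)

def decode(c, scale=DELTA):
    return c / scale

a, b = 3.25, -1.5
ea, eb = encode(a), encode(b)

prod = ea * eb                    # the product now carries scale DELTA**2
rescaled = round(prod / DELTA)    # "rescale": bring the result back to scale DELTA

print(decode(rescaled))           # ~ -4.875 == a * b (up to a small rounding error)
```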
1.3.3 Third-Generation HE Schemes
To get rid of an expensive "relinearization" operation in homomorphic multiplication, in 2013 Craig Gentry, Amit Sahai, and Brent Waters proposed a new technique for building FHE schemes. Using this new technique, they proposed a new FHE scheme, commonly known as the GSW scheme (Gentry et al. 2013), featuring an even slower noise growth for certain types of circuits, and hence better efficiency and stronger security (Brakerski and Vaikuntanathan 2014b). Based on this observation, Jacob Alperin-Sheriff and Chris Peikert described a very efficient bootstrapping technique (Alperin-Sheriff and Peikert 2014) for the GSW scheme. Further improvements to this bootstrapping technique led to the development of very efficient ring variants of the GSW FHE scheme, i.e., FHEW (Ducas and Micciancio 2015) in 2014 and TFHE (Chillotti et al. 2016) in 2016. In fact, the FHEW scheme was the first HE construction to demonstrate that, by refreshing the ciphertexts after every single operation, it is possible to reduce the bootstrapping time to a fraction of a second. The FHEW scheme also introduced a new method to compute Boolean gates on encrypted data with a highly simplified bootstrapping procedure, a variant of the bootstrapping procedure proposed by Alperin-Sheriff and Peikert for the GSW scheme. The TFHE scheme further improved the efficiency of the FHEW scheme by implementing a ring variant of the bootstrapping procedure (Gama et al. 2016) while using a similar approach as
the FHEW scheme. Table 1.1 summarizes the FHE schemes from all three generations by listing a few key properties of these schemes.

Table 1.1 Summary of generations of lattice-based HE schemes

Gen. | Schemes        | Noise growth | Plaintext datatype     | Encrypt level | Public key | Compute support  | Operation accuracy | Operation type
1    | Gentry         | Rapid        | Bit                    | Lattice-based | Largest    | Add, Mult        | –                  | Any
2    | BGV, BFV, CKKS | Slower       | Integer, Integer, Real | Integer       | Large      | Predefined ops   | Approximate        | Statistical
3    | FHEW, TFHE     | Slowest      | Bit, Integer           | Bit, Integer  | Small      | Any arbitrary op | Exact              | –
1.4 Alternate Cryptographic Technique for PPC

1.4.1 Multi-party Computation
Multi-party computation (MPC) is a well-researched sub-field of cryptography that makes use of cryptographic protocols to preserve data privacy. With MPC, the data contributed by the participating parties is kept private during the data contribution process, and computation happens on this private data without revealing anything to the involved parties. All parties involved in MPC can assume the roles of input, compute, and result parties. In an MPC protocol, all computing parties interact with each other such that the output is correctly computed and nothing but the output is learned. The protocol guarantees this property even if one or more parties cheat or behave as adversaries. Typically, MPC protocols make use of Yao's garbled circuits (Yao 1986) (limited to two parties) and secret sharing schemes (Cramer et al. 2000) (two or more parties).

MPC applies to the second use case mentioned above, wherein more than one patient can jointly compute a function by contributing their private data. If we want to compute a logistic regression model over the data collected from two different patients using the secret sharing approach, as shown in Figure 1.6, each patient acts as a computation party and the hospital acts as the result party. Each patient computes two shares of their input data (based on the number of computation parties, which is two here) and sends one share to the other patient. Note that each computation party receives an equal number of X_i and Y_i shares, and must keep the received shares aligned with the corresponding variables in the order in which they are received. Finally, both patients (computation parties) send their computed shares of the logistic regression coefficients to the hospital (result party), and the hospital then simply sums these shares together to compute the final result.

[Fig. 1.6 Secure multi-party computation using secret sharing: patient A holds shares Xa1, Xa2, Ya1, Ya2 and patient B holds shares Xb1, Xb2, Yb1, Yb2; A computes the LR function using Xa1+Xb1 and Ya1+Yb1, B computes it using Xa2+Xb2 and Ya2+Yb2, and the final result is generated by adding the result shares]
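The minimal Python sketch below (ours, not the authors'; it computes a simple sum rather than a logistic regression model, and the modulus P is an arbitrary choice) illustrates the additive secret sharing idea behind Figure 1.6: each patient splits a private value into two random shares, each computation party adds only the shares it holds, and the result party recovers just the aggregate.

```python
import secrets

P = 2**61 - 1  # prime modulus for the shares (assumption for this sketch)

def share(x):
    """Split x into two additive shares that sum to x modulo P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

# Patients A and B each secret-share their private values
xa, xb = 42, 58
a1, a2 = share(xa)   # a1 stays with A, a2 is sent to B
b1, b2 = share(xb)   # b1 is sent to A, b2 stays with B

# Each computation party adds the shares it holds (purely local work)
partial_A = (a1 + b1) % P
partial_B = (a2 + b2) % P

# The result party (the hospital) combines only the partial results
print((partial_A + partial_B) % P)   # prints 100 == xa + xb
```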
1.4.2 HE Versus MPC
Table 1.2 provides a head-to-head comparison of HE and MPC. Typically, HE-based computing is dominated by the compute cost, and its applications are memory bound due to the large amount of data, while MPC is dominated by the communication cost due to the frequent interaction between the parties involved. Consequently, when picking a privacy-preserving computing solution for an application, one needs to keep in mind not only the required security model but also the compute platform that is available for the protocol execution. Proposing a solution at random may lead to insecure models and poorly performing applications. MPC-based privacy-preserving computing techniques perform well when accomplishing simple tasks, such as the two millionaires' problem, on low-resource devices. However, to accomplish more sophisticated and complex tasks such as machine learning training and inference, we need to use HE-based techniques.
Table 1.2 HE versus MPC

                       | HE                                          | MPC
Performance            | Compute-bound, Memory-bound                 | Network-bound
Privacy                | Encryption                                  | Encryption/Non-collusion
Interactive            | No                                          | Yes
Cryptographic security | Yes                                         | Yes
Communication channel  | Insecure; encrypted data secure in transit  | Encrypted data secure in transit/Data transmitted over perfectly secure channels
Supported operations   | Linear                                      | Linear and non-linear

1.5 Non-cryptographic Techniques for PPC

1.5.1 Anonymization
Anonymization is a non-cryptographic approach whereby the data owner/manager removes personal identifiers before releasing the data to the public. For example, Netflix released anonymized movie rating data to aid contestants in its $1M prize competition to build better movie recommender systems (Bennett et al. 2007). Unfortunately, despite the anonymization, researchers were able to combine this dataset with IMDB background knowledge to identify the Netflix records of known users and to further deduce the users' apparent movie preferences (Narayanan and Shmatikov 2006). This incident clearly demonstrated that anonymization cannot reliably protect the privacy of individuals in the face of strong adversaries.
1.5.2 Differential Privacy
To address the shortcomings of data anonymization, researchers introduced differential privacy mechanisms (Dwork 2006; Dwork et al. 2014; Wood et al. 2018). Differential privacy is a mathematical technique of adding a controlled amount of randomness to a dataset to prevent a malicious user from obtaining information about individuals in the dataset. The resulting noisy dataset is still accurate enough to generate aggregate insights while maintaining the privacy of the individual participants. The major advantage of differential privacy is that it enables businesses to share their data with other organizations and collaborate with them without risking their customers' privacy. Differential privacy can be implemented locally or globally. In local differential privacy, noise is added to individual data before it is centralized in a database. In global differential privacy, noise is added to the raw data after it is collected from many individuals.

There are multiple real-world applications where differential privacy is used. For example, Google introduced a differential privacy tool called Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR) (Erlingsson et al. 2014) to Chrome browsers in 2014. It helps Google analyze and draw insights from browser usage while preventing sensitive information from being traced. Similarly, Apple uses differential privacy in iOS and macOS devices to analyze personal data such as emojis, search queries, and health information (Apple Differential Privacy Team 2017). In addition, differential privacy is used in various other privacy-preserving applications involving artificial intelligence, such as federated learning and synthetic data generation.
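As a concrete (though hedged) illustration of adding controlled randomness, the Python snippet below sketches the classic Laplace mechanism for a counting query; the sensitivity of 1 and the choice of epsilon are assumptions of this toy example rather than anything prescribed by the discussion above.

```python
import random

def laplace_noise(scale):
    # the difference of two i.i.d. exponential variates is Laplace(0, scale)-distributed
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(true_count, epsilon):
    # a counting query changes by at most 1 when one individual is added or removed,
    # so the Laplace mechanism uses noise with scale = sensitivity / epsilon = 1 / epsilon
    return true_count + laplace_noise(1.0 / epsilon)

# e.g., release the number of patients with a given condition under epsilon = 0.5
print(dp_count(1234, epsilon=0.5))
```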
1.5.3 HE Versus Non-cryptographic PPC Techniques
Traditional anonymization techniques, such as removing columns containing personally identifiable information or data masking, can be susceptible to re-identification, thereby providing weaker security guarantees than HE. Moreover, anonymization can be permanent, meaning that once the data has been anonymized there is no way to link it back to an individual; the results of such anonymized data may therefore be of limited use. In contrast, with HE, one can use the encryption key to retrieve identifiable information when needed to link the data back to an individual. Differential privacy is not applicable to the first use case discussed above; it only works well with the second use case, which is collaborative in nature. In contrast, HE is equally applicable to both use cases. For large datasets, the inaccuracies introduced by differential privacy can be ignored, but that is not the case for small ones: for a small dataset, the noise added by differential privacy can seriously impact any analysis based on it. HE does not modify the underlying dataset, and thus, irrespective of the dataset size, HE works well.
1.6 Summary
Privacy-preserving computing is necessary for all the applications where we need to maintain user data privacy. Over the years, researchers have developed a variety of cryptographic and non-cryptographic techniques to enable privacy-preserving computing. The non-cryptographic techniques include trusted execution environments, anonymization, and
differential privacy, while the cryptographic techniques include multi-party computation and homomorphic encryption. This book focuses on homomorphic encryption. Homomorphic encryption maintains data privacy by enabling computations on encrypted data. Here, the data owner encrypts his/her data before sending it to a third-party cloud server; the cloud server is able to perform computations on this encrypted data because of the homomorphic properties of the underlying encryption scheme. The data owner gets the encrypted result and decrypts it with his/her own key to obtain the result of the computation. Gentry's seminal work (Gentry et al. 2009) laid the foundation for the development of a variety of lattice-based homomorphic encryption schemes such as GSW, BGV, B/FV, CKKS, FHEW, and TFHE. All of these schemes are post-quantum secure (Bernstein and Lange 2017) as they rely on lattice-based cryptography, which uses complex mathematical problems that remain NP-hard even for quantum computers. In the next chapter, we describe the CKKS FHE scheme in detail, as it allows operating on encrypted real numbers and lends itself naturally to privacy-preserving applications in a variety of domains.
References Albrecht M, Bai S, Ducas L (2016) A subfield lattice attack on overstretched ntru assumptions. In: Annual international cryptology conference. Springer, pp 153–178 Alperin-Sheriff J, Peikert C (2014) Faster bootstrapping with polynomial error. In: Annual cryptology conference. Springer, pp 297–314 Apple Differential Privacy Team (2017) Learning with privacy at scale. https://docs-assets.developer. apple.com/ml-research/papers/learning-with-privacy-at-scale.pdf Bennett J, Lanning S et al (2007) The netflix prize. In: Proceedings of KDD cup and workshop, vol 2007. Citeseer, p 35 Bernstein DJ, Lange T (2017) Post-quantum cryptography. Nature 549(7671):188–194 Biondo A, Conti M, Davi L, Frassetto T, Sadeghi A-R (2018) The guard’s dilemma: efficient {Code-Reuse} attacks against intel {SGX}. In: 27th USENIX security symposium (USENIX Security 18), pp 1213–1227 Bos JW, Lauter K, Loftus J, Naehrig M (2013) Improved security for a ring-based fully homomorphic encryption scheme. In: IMA international conference on cryptography and coding. Springer, pp 45–64 Brakerski Z (2012) Fully homomorphic encryption without modulus switching from classical gapsvp. In: Annual cryptology conference. Springer, pp 868–886 Brakerski Z, Vaikuntanathan V (2014a) Efficient fully homomorphic encryption from (standard) lwe. SIAM J Comput 43(2):831–871 Brakerski Z, Vaikuntanathan V (2014b) Lattice-based fhe as secure as pke. In: Proceedings of the 5th conference on innovations in theoretical computer science, pp 1–12 Brakerski Z, Vaikuntanathan V, Gentry C (2012) Fully homomorphic encryption without bootstrapping. In: Innovations in theoretical computer science Chen H, Dai W, Kim M, Song Y (2019) Efficient multi-key homomorphic encryption with packed ciphertexts with application to oblivious neural network inference. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, pp 395–412
16
1 Introduction
Cheon JH, Coron J-S, Kim J, Lee MS, Lepoint T, Tibouchi M, Yun A (2013) Batch fully homomorphic encryption over the integers. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 315–335 Cheon JH et al (2017) Homomorphic encryption for arithmetic of approximate numbers. In: International conference on the theory and application of cryptology and Information Security Cheon JH, Jeong J, Lee C (2016) An algorithm for ntru problems and cryptanalysis of the ggh multilinear map without a low-level encoding of zero. LMS J Comput Math 19(A):255–266 Chillotti I, Gama N, Georgieva M, Izabachène M (2020) Tfhe: fast fully homomorphic encryption over the torus. J Cryptol 33(1):34–91 Chillotti I, Gama N, Georgieva M, Izabachene M (2016) Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. In: International conference on the theory and application of cryptology and information security. Springer, pp 3–33 Coron J-S, Lepoint T, Tibouchi M (2014) Scale-invariant fully homomorphic encryption over the integers. In: International workshop on public key cryptography. Springer, pp 311–328 Coron J-S, Mandal A, Naccache D, Tibouchi M (2011) Fully homomorphic encryption over the integers with shorter public keys. In: Annual cryptology conference. Springer, pp 487–504 Coron J-S, Naccache D, Tibouchi M (2012) Public key compression and modulus switching for fully homomorphic encryption over the integers. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 446–464 Costan V, Devadas S (2016) Intel sgx explained. Cryptology ePrint Archive Cramer R, Damgård I, Maurer U (2000) General secure multi-party computation from any linear secret-sharing scheme. In: International conference on the theory and applications of cryptographic techniques. Springer, pp 316–334 Ducas L, Micciancio D (2015) Fhew: bootstrapping homomorphic encryption in less than a second. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 617–640 Dwork C (2006) Differential privacy, automata, languages and programming-icalp 2006, lncs 4052 Dwork C, Roth A et al (2014) The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci 9(3–4):211–407 ElGamal T (1985) A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans Inf Theory 31(4):469–472 Erlingsson Ú, Pihur V, Korolova A (2014) Rappor: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp 1054–1067 Fan J et al (2012) Somewhat practical fully homomorphic encryption. IACR Cryptol ePrint Archive 2012:144 Gama N, Izabachene M, Nguyen PQ, Xie X (2016) Structural lattice reduction: generalized worst-case to average-case reductions and homomorphic cryptosystems. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 528–558 Gentry C et al (2009) Fully homomorphic encryption using ideal lattices. Stoc 9:169–178 Gentry C, Halevi S, Smart NP (2012a) Better bootstrapping in fully homomorphic encryption. In: International workshop on public key cryptography. Springer, pp 1–16 Gentry C, Halevi S, Smart NP (2012b) Fully homomorphic encryption with polylog overhead. In: Annual international conference on the theory and applications of cryptographic techniques. 
Springer, pp 465–482 Gentry C, Halevi S, Smart NP (2012c) Homomorphic evaluation of the aes circuit. In: Annual cryptology conference. Springer, pp 850–867
Gentry C, Sahai A, Waters B (2013) Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Annual cryptology conference. Springer, pp 75–92
Götzfried J, Eckert M, Schinzel S, Müller T (2017) Cache attacks on Intel SGX. In: Proceedings of the 10th European workshop on systems security, pp 1–6
Kaur K, Gupta I, Singh AK et al (2017) A comparative evaluation of data leakage/loss prevention systems (DLPs). In: Proceedings of the 4th international conference on computer science & information technology (CS & IT-CSCP), pp 87–95
Kim A, Papadimitriou A, Polyakov Y (2020) Approximate homomorphic encryption with reduced approximation error. Cryptology ePrint Archive
Kocher P, Horn J, Fogh A, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T et al (2019) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP). IEEE, pp 1–19
Levieil E, Naccache D (2008) Cryptographic test correction. In: International workshop on public key cryptography. Springer, pp 85–100
Lipp M, Schwarz M, Gruss D, Prescher T, Haas W, Fogh A, Horn J, Mangard S, Kocher P, Genkin D et al (2018) Meltdown: reading kernel memory from user space. In: 27th USENIX security symposium (USENIX Security 18), pp 973–990
López-Alt A, Tromer E, Vaikuntanathan V (2017) Multikey fully homomorphic encryption and applications. SIAM J Comput 46(6):1827–1892
López-Alt A, Tromer E, Vaikuntanathan V (2012) On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Proceedings of the forty-fourth annual ACM symposium on theory of computing, pp 1219–1234
Lyubashevsky V, Peikert C, Regev O (2010) On ideal lattices and learning with errors over rings. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 1–23
Mukherjee P, Wichs D (2016) Two round multiparty computation via multi-key FHE. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 735–763
Murdock K, Oswald D, Garcia FL, Van Bulck J, Gruss D, Piessens F (2020) Plundervolt: software-based fault injection attacks against Intel SGX. In: 2020 IEEE symposium on security and privacy (SP). IEEE, pp 1466–1482
Narayanan A, Shmatikov V (2006) How to break anonymity of the Netflix Prize dataset. arXiv:cs/0610105
Paillier P (2005) Paillier encryption and signature schemes
Popović K et al (2010) Cloud computing security issues and challenges. In: The 33rd international convention MIPRO. IEEE, pp 344–349
Rivest RL, Adleman L, Dertouzos ML et al (1978) On data banks and privacy homomorphisms. Found Secure Comput 4(11):169–180
Rivest RL, Shamir A, Adleman L (1983) A method for obtaining digital signatures and public-key cryptosystems. Commun ACM 26(1):96–99
Schwarz M, Weiser S, Gruss D (2019) Practical enclave malware with Intel SGX. In: International conference on detection of intrusions and malware, and vulnerability assessment. Springer, pp 177–196
Smart NP, Vercauteren F (2014) Fully homomorphic SIMD operations. Designs Codes Cryptogr 71(1):57–81
van Dijk M, Gentry C, Halevi S, Vaikuntanathan V (2010) Fully homomorphic encryption over the integers. In: Annual international conference on the theory and applications of cryptographic techniques. Springer, pp 24–43
van Bulck J, Minkin M, Weisse O, Genkin D, Kasikci B, Piessens F, Silberstein M, Wenisch TF, Yarom Y, Strackx R (2018) Foreshadow: extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In: 27th USENIX security symposium (USENIX Security 18), pp 991–1008
Wood A, Altman M, Bembenek A, Bun M, Gaboardi M, Honaker J, Nissim K, O'Brien DR, Steinke T, Vadhan S (2018) Differential privacy: a primer for a non-technical audience. Vand J Ent & Tech L 21:209
Yao AC-C (1986) How to generate and exchange secrets. In: 27th annual symposium on foundations of computer science (SFCS 1986). IEEE, pp 162–167
2 The CKKS FHE Scheme
In this chapter, we describe the CKKS FHE scheme in detail. The CKKS scheme supports operations on real numbers and thus is the preferred scheme for implementing privacy-preserving machine learning applications. We would like to note that the original CKKS scheme (Cheon et al. 2017) supports homomorphic computations over the real numbers, but its implementation could not employ core optimization techniques based on the Residue Number System (RNS) decomposition (see Section 2.2.1) and the Number Theoretic Transform (NTT) (see Section 2.2.3). Consequently, Cheon et al. (2019) proposed a full RNS variant of the CKKS FHE scheme that is optimal for implementation on a standard computer system. They introduced a new structure of the ciphertext modulus, which allows using both the RNS decomposition of cyclotomic polynomials and the NTT conversion on each of the RNS components. Therefore, in this chapter, we focus on the full RNS variant of the CKKS FHE scheme. Figure 2.1 illustrates the data flow and the operations involved in a typical application based on the CKKS FHE scheme. The operations can be broadly classified into two sets, i.e., client-side operations and server-side operations. The client (a patient in this case) is responsible for performing the client-side operations, starting with encoding the personal genome data and then encrypting it to send the resulting ciphertext to the cloud. The third-party cloud server performs the server-side operations such as addition, multiplication, rotation, conjugation, and bootstrapping on this encrypted data and sends the encrypted result back to the client. Below, we first provide a detailed discussion of all the client-side and server-side operations. We then explain the use of the CKKS FHE scheme to homomorphically evaluate an example application, i.e., logistic regression (LR) model training.
Fig. 2.1 The data flow of an end-to-end encrypted computation based on the CKKS FHE scheme. The client (patient) encodes the personal genome data and encrypts it with the public key; the third-party cloud server with HE compute capabilities performs FHE operations like Add, Mult, Rotate, KeySwitch, and Bootstrapping using the KeySwitch keys; the client then decrypts the returned result with the private key and decodes it to obtain the private output
2.1 CKKS Parameters
The homomorphism¹ in the CKKS FHE scheme is based on the Ring Learning With Errors (RLWE) (Lyubashevsky et al. 2010) encryption scheme. All the parameters along with the notations that are used in this scheme are summarized in Table 2.1. For a given security parameter λ, we need to choose the following ring parameters.

• N, a power-of-two integer.
• The cyclotomic polynomial² Φ_N(X) = X^N + 1 with N = 2^n.
• A ciphertext modulus Q.
• A special modulus P that is co-prime to Q.
• A secret key distribution χ_key. This distribution can be binary, ternary, or Gaussian.
• An error distribution χ_err. This distribution is typically Gaussian with a standard deviation of 8/√(2π) ≈ 3.2.
Using these parameters, we can define a polynomial ring as R = Z[X]/(X^N + 1), whose elements have degree at most N − 1 since X^N = −1 in R. We can further extend our polynomial ring R to R_Q, i.e., R modulo an integer Q, whose elements have coefficients in [−(Q − 1)/2, Q/2] ∩ Z. For example, if Q = 5, the coefficients are chosen from {−2, −1, 0, 1, 2}. However, when performing actual computations, we represent the coefficients in [0, Q − 1] ∩ Z. So with Q = 5, for actual computations the coefficients are represented by {0, 1, 2, 3, 4}.
1 A map f : R → S between rings is called a ring homomorphism if f (x + y) = f (x) + f (y) and
f (x y) = f (x) f (y) for all x, y ∈ R.
2 For any positive integer n, the nth cyclotomic polynomial is the unique irreducible polynomial (with integer coefficients) that is a divisor of X^n − 1 and is not a divisor of X^k − 1 for any k < n. Its roots are all the nth primitive roots of unity e^{2iπk/n} with k coprime to n.
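To make the ring arithmetic concrete, the following short Python sketch (our own illustration with toy parameters, not code from any CKKS library) adds and multiplies two elements of R_Q = Z_Q[X]/(X^N + 1) using schoolbook convolution and the wrap-around rule X^N = −1:

# Toy arithmetic in R_Q = Z_Q[X]/(X^N + 1); coefficients stored in [0, Q-1].
N, Q = 8, 17  # illustrative toy parameters, far too small for any real security

def poly_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def poly_mul(a, b):
    # Schoolbook negacyclic convolution: X^N wraps around as -1.
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                res[k] = (res[k] + ai * bj) % Q
            else:
                res[k - N] = (res[k - N] - ai * bj) % Q
    return res

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [2, 0, 0, 0, 0, 0, 0, 1]   # the element 2 + X^7
print(poly_add(a, b))
print(poly_mul(a, b))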
Table 2.1 CKKS FHE parameters and their description
λ: Desired security level of the scheme. Typically, set to 128
η: Number of precision bits in plaintext elements
N: Degree of the polynomial in the ciphertext ring. Typically, a power-of-two integer
n: Number of plaintext elements in a single ciphertext. Can range from 1 to N/2
Q: Full modulus of a coefficient in ciphertext
q: Typically, a machine word sized prime modulus and a limb of Q
Δ: Scale factor used for scaling plaintext
P: Product of the extension limbs added for the raised modulus
p: A limb of P and is a machine word sized prime modulus
L: Maximum number of limbs in a ciphertext
ℓ: Current number of limbs in a ciphertext
dnum: Number of digits in the switching key
α: ⌈(L + 1)/dnum⌉. Number of limbs that comprise a single digit in the key-switching decomposition. This value is fixed throughout the computation
β: ⌈(ℓ + 1)/α⌉. An ℓ-limb polynomial is split into this number of digits during base decomposition
d: Multiplicative depth of a circuit to be evaluated
2.2 Basic Compute Optimizations in CKKS Scheme
In practice, to evaluate a depth-d circuit homomorphically with η-bit precision³, the parameter log Q needs to be larger than (d · log Δ) + η. Say, if we want to evaluate a circuit with d = 10, η = 40 (30-bit precision for the decimal part and 10-bit precision for the integer part), and log Δ = 30, then we will need log Q > 340. For many real-world applications such as machine learning training, the depth of the circuit can go beyond 100. Consequently, we need the log Q value to be much larger than 3,400 bits. However, for a fixed value of N, the larger the value of log Q, the less secure the FHE scheme is. So to maintain the same level of security, as the value of log Q increases, we need to choose a larger N value even if we do not need additional slots. Figure 2.2 shows the relationship between log Q and log N values for some of the common security levels. The plot is based on the parameter values suggested by the homomorphic encryption standard white paper (HE Standard 2018).
3 Interested readers can refer to the work of Costache et al. (2022) that presents a detailed analysis
on precision loss in the CKKS scheme.
Fig. 2.2 Values of the parameter log N and log Q for different security levels
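As a quick sanity check of this sizing rule, the short Python sketch below (our own illustration; the numbers are simply the ones used in the running example above, not values mandated by the scheme) evaluates the bound log Q > (d · log Δ) + η:

def required_logQ(d, log_delta, eta):
    # Modulus budget needed to evaluate a depth-d circuit with eta bits of precision.
    return d * log_delta + eta

print(required_logQ(10, 30, 40))    # 340, as in the example above
print(required_logQ(100, 30, 40))   # already over 3,000 bits once the depth exceeds 100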
2.2.1 Residue Number System (RNS)
As discussed above, the polynomial coefficients are extremely large, often on the order of thousands of bits. No known existing compute platforms have multipliers and adders that can handle operands that are thousands of bits wide. Moreover, arithmetic operations modulo a large integer are very compute-intensive. Therefore, to compute on such large numbers, we use the residue number system (RNS) (Navi et al. 2010), also popularly known as the Chinese remainder theorem (CRT) representation. Using RNS, we represent these large numbers modulo Q = ∏_{j=0}^{L} q_j, where each q_j is a prime number that fits in a standard machine word (typically less than 64 bits). We define the RNS basis as a set B := {q_0, ..., q_L} consisting of L + 1 moduli, and refer to each q_j as a limb of Q. Using the RNS optimization, we can perform operations over values in Z_Q without any native support for multi-precision arithmetic. We can represent an integer x ∈ Z_Q as a length-(L + 1) vector of scalars [x]_B = (x_0, x_1, ..., x_L), where x_j ≡ x (mod q_j) is a limb of x. To add two values x, y ∈ Z_Q, we have x_j + y_j ≡ x + y (mod q_j). Similarly, to multiply two values x, y ∈ Z_Q, we have x_j · y_j ≡ x · y (mod q_j). This allows us to perform addition and multiplication over Z_Q while only operating over standard machine words.
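The limb-wise arithmetic can be illustrated with a few lines of Python (our own toy example, with small primes standing in for machine-word-sized moduli):

from math import prod

primes = [97, 101, 103]            # toy RNS basis B = {q_0, q_1, q_2}
Q = prod(primes)

def to_rns(x):
    return [x % q for q in primes]

def from_rns(limbs):
    # CRT reconstruction: x = sum_j limbs[j] * Q_j * (Q_j^{-1} mod q_j) mod Q
    x = 0
    for r, q in zip(limbs, primes):
        Qj = Q // q
        x += r * Qj * pow(Qj, -1, q)
    return x % Q

x, y = 123456, 654321
xs, ys = to_rns(x), to_rns(y)
sum_limbs  = [(a + b) % q for a, b, q in zip(xs, ys, primes)]
prod_limbs = [(a * b) % q for a, b, q in zip(xs, ys, primes)]
assert from_rns(sum_limbs)  == (x + y) % Q
assert from_rns(prod_limbs) == (x * y) % Q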
2.2.2 Modular Arithmetic
All FHE operations (on the ciphertext polynomials) boil down to scalar modular additions and scalar modular multiplications. Current commodity compute systems such as CPUs and GPUs implement integer arithmetic. Consequently, modular arithmetic is implemented via emulation, wherein a single modular arithmetic operation translates into multiple arithmetic instructions. Thus, performing modular arithmetic operations significantly increases the amount of compute that we need to perform. Therefore, optimizing modular arithmetic is critical to optimizing FHE computation.
When we add two operands that are log q-bit wide, the resulting value can be up to (log q + 1)-bit wide. However, the coefficients of the polynomial are restricted to Z_q. So we need to perform a modular reduction operation after every integer addition operation. To perform modular addition without having to perform any expensive division operations, one can use the standard approach of conditional subtraction whenever the addition overflows the modulus. When computing back-to-back sums of many scalars, one can avoid performing a modular reduction until the end of the summation, as long as the unreduced sum fits in a machine word. Similar to addition, when we multiply two operands that are log q-bit wide, the resulting value can be up to (2 log q)-bit wide. To perform generic modular multiplications, one can use the well-known Barrett reduction technique (Barrett 1987) or Montgomery multiplication (Koc et al. 1996). To perform multiplications with a constant or with elements whose values are known in advance, such as the roots of unity, one can use Shoup's technique (Shoup et al. 2001). With Shoup's technique, when computing x · y (mod q) where x and q are known in advance, we can precompute a value x_s such that evaluating ModMulShoup(x, y, x_s, q) = x · y (mod q) is much faster than directly computing x · y (mod q).
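The Python sketch below is our own illustration of these ideas (conditional subtraction and a Shoup-style multiplication with a precomputed factor); it assumes a 64-bit word for the precomputation and uses toy operands rather than real CKKS moduli:

W = 64  # assumed machine word size for the Shoup precomputation

def mod_add(x, y, q):
    # Conditional subtraction instead of an expensive division.
    s = x + y
    return s - q if s >= q else s

def mod_mul_shoup(x, y, x_shoup, q):
    # x and q are known in advance; x_shoup = floor(x * 2^W / q) is precomputed.
    t = (x_shoup * y) >> W          # estimate of floor(x*y/q)
    r = x * y - t * q               # remainder candidate, guaranteed to lie in [0, 2q)
    return r - q if r >= q else r

q = (1 << 61) - 1                   # a toy 61-bit modulus
x = 1234567891011                   # known in advance (e.g., a twiddle factor)
x_shoup = (x << W) // q
y = 987654321987
assert mod_add(x, y, q) == (x + y) % q
assert mod_mul_shoup(x, y, x_shoup, q) == (x * y) % q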
2.2.3 NTT/iNTT
To perform the multiplication of two polynomials in R_Q, we first reduce the two polynomials modulo X^N + 1 in R and then further reduce them modulo Q in R_Q. In order to enable fast polynomial multiplication, we represent all polynomials by default as a series of N evaluations at fixed roots of unity. This allows polynomial multiplication to occur in O(N) time⁴ instead of O(N²) time. We refer to this representation as the evaluation representation. Certain server-side operations such as Rescale (see Section 2.5.3) and KeySwitch (see Section 2.5.6) operate over the polynomial's coefficient representation, which is simply a vector of its coefficients. Addition of two polynomials and multiplication of a polynomial by a scalar are O(N) in both the coefficient and the evaluation representation. Effectively, we need efficient mechanisms for switching between the two representations. We can use a number theoretic transform (NTT) or inverse NTT (iNTT) (Kim et al. 2020), which is the finite field version of the fast Fourier transform (FFT) (Nussbaumer 1981), to switch between these two representations. NTT/iNTT takes O(N log N) time and O(N) space for a degree-(N − 1) polynomial. A number theoretic transform (NTT) is a generalization of the FFT over a finite ring R_Q instead of the complex number field C. The NTT equation (Agrawal et al. 2019) is given as follows:
4 In CKKS scheme, the default representation of ciphertext or plaintext is in NTT format, and thus,
the operational complexity of polynomial multiplication is O(N ) and not O(N log N ).
X_i = Σ_{k=0}^{N−1} x_k · ω^{ik}   (2.1)
where ω is the Nth root of unity in the corresponding field of the polynomial. For a ring R_Q, the Nth root of unity ω satisfies two conditions: first, ω^N = 1 (mod Q) and second, the period of ω^i is exactly N. The powers of this Nth root of unity ω are also known as twiddle factors. For iNTT, we replace ω with ω^{−1} in Equation 2.1, where ω^{−1} = ω^{N−1} (mod Q). The iNTT computation also requires computing the inverse of N, which can be computed as N^{−1} · N = 1 (mod Q).
Example: There exist multiple FFT algorithms (Swarztrauber 1984) that can be leveraged to perform an NTT and iNTT operation. Here, we present an example NTT computation (see Figure 2.3) using the Cooley–Tukey FFT algorithm (Mersereau and Speake 1981), one of the most popular FFT algorithms. For this example, we assume N = 8, Q = 17, ω = 9, and the input vector a = [1, 2, 3, 7, 5, 4, 1, 2]. We compute the powers of ω and represent them as a vector ω⃗ = [1, 9, 13, 15, 16, 8, 4, 2]. As N = 8, log N = 3, and so our NTT computation will be done over 3 stages. In addition, the Cooley–Tukey algorithm requires the input to be re-arranged in the bit-reversed order of its index. Therefore, as shown in Figure 2.3, we begin with this bit-reversed order permutation of the input vector and then perform the computation in the various NTT stages. Every stage computes N/2 modular multiplication, addition, and subtraction operations through butterfly operations. Hence, an NTT operation has an operational complexity of O((N/2) log N). One can perform the iNTT operation (with the Cooley–Tukey algorithm) using the exact same steps as shown in Figure 2.3, with the only differences that ω is replaced by ω^{−1} (as noted above) and the output of the last stage is scaled by N^{−1}.
Fig. 2.3 NTT example using the Cooley–Tukey algorithm
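The radix-2 Cooley–Tukey recursion for this example can be written in a few lines of Python (our own sketch, written for clarity rather than performance; the bit-reversed ordering of the in-place variant in Figure 2.3 is implicit in the even/odd split of the recursion):

def ntt(a, omega, q):
    # Computes X_i = sum_k a_k * omega^(i*k) mod q for a power-of-two length.
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd  = ntt(a[1::2], omega * omega % q, q)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % q                 # butterfly: one mul, one add, one sub per pair
        out[k]          = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out

def intt(a, omega, q):
    n = len(a)
    out = ntt(a, pow(omega, -1, q), q)     # run the NTT with omega^{-1} ...
    n_inv = pow(n, -1, q)
    return [x * n_inv % q for x in out]    # ... and scale the last stage by N^{-1}

a = [1, 2, 3, 7, 5, 4, 1, 2]
A = ntt(a, 9, 17)
assert intt(A, 9, 17) == a                 # round-trip check for the example parameters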
2.3 Key Generation
A public-private key pair is required to perform the encryption and decryption operations (see Figure 2.1) at the client (who is the data owner), while the switching keys are required to perform the KeySwitch operation on the server side. The key generation operation can be carried out either by the client or by a trusted third party on behalf of the client.
2.3.1 Public Keys
To generate the public key, we first sample the secret/private key s from the key distribution χ_key, i.e., s ← χ_key, and set the secret key as sk = (1, s). Next, we sample the L + 1 limbs of the first public key, represented by a, from a uniform sampler, where each limb is a vector of elements that are upper bounded by the corresponding prime modulus q_j. Mathematically, we represent this sampling as (a^{(0)}, ..., a^{(L)}) ← U(∏_{j=0}^{L} R_{q_j}). We also need to sample the error polynomial e from the error distribution χ_err, i.e., e ← χ_err. We can now compute the second public key b as follows:

b^{(j)} = −a^{(j)} · s + e (mod q_j)   (2.2)

We set the public key pk as pk^{(j)} = (b^{(j)}, a^{(j)}) ∈ R²_{q_j} for 0 ≤ j ≤ L.
2.3.2 Switching Keys
The KeySwitch procedure (described in detail in Section 2.5.6) uses switching keys ksk (also known as evaluation keys) to convert a ciphertext with the secret key s1 into a ciphertext encrypting the same message with the secret key s2. For the two secret key polynomials s1 and s2, we sample uniform elements (a^{(0)}, ..., a^{(k+L)}) ← U(∏_{i=0}^{k−1} R_{p_i} × ∏_{j=0}^{L} R_{q_j}), and an error polynomial e ← χ_err. The switching keys are then computed as follows:

b^{(i)} ← −a^{(i)} · s2 + e (mod p_i), 0 ≤ i < k   (2.3)

b^{(k+j)} ← −a^{(k+j)} · s2 + [P]_{q_j} · s1 + e (mod q_j), 0 ≤ j ≤ L   (2.4)
Technically, the switching key can be seen as the RNS representation of (b′, a′) ∈ R²_{P·Q} for b′ = −a′ · s2 + P · s1 + e (mod P · Q). For example, after a homomorphic multiplication operation, the switching keys can be used to perform the relinearization operation with s1 set to s² and s2 set to the original secret key s. Note that we can publicly share the switching keys ksk with the party performing the computation, such as the cloud server, because the secret will be hard to extract based on the RLWE problem.
2.4 Client-Side Operations
As discussed earlier, the encoding/decoding and encryption/decryption operations are performed on the client side. These operations are typically not considered computationally expensive and thus are not implemented by the existing state-of-the-art FHE-based implementations (Jung et al. 2021; Kim et al. 2022a, b; Samardzic et al. 2021, 2022).
2.4.1 Encoding
The basic plaintext data type in the CKKS scheme is an element chosen from C, the field of complex numbers. In order to enable SIMD-style operations on the server side, the client can encode many numbers in a single ciphertext. Thus, a plaintext can be either a single element or a vector of n elements, where each element is chosen from the field of complex numbers. Moreover, these non-integer elements have a finite precision η in the CKKS scheme, and so they are scaled by a scale factor Δ before encoding so as to reduce the precision loss from encryption noise. This scale factor is usually the size of one of the limbs of the ciphertext, which is slightly less than a machine word. The encoding operation, denoted by Encode, takes as input this n-dimensional vector z, runs a complex inverse FFT on this vector, and returns an integer polynomial m(X) ∈ Z[X]/(X^N + 1). The mapping in the encoding operation is given as follows:

z = (z_j)_{j∈S} ∈ Z[i]^{N/2} → Z(X) = σ^{−1} ∘ π^{−1}(z) ∈ R[X]/(Φ_N(X)) → m(X) = ⌊Δ · Z(X)⌉ ∈ Z[X]/(Φ_N(X))   (2.5)

Here, σ is the canonical embedding map C[X]/(X^N + 1) → C^N defined by a(X) → (a(ζ^j))_{j∈Z_N}, ζ = exp(−πi/N), and S ≤ Z_N^* is a subgroup such that Z_N^*/S = {±1}. Note that σ defines an isomorphism, which means it is a bijective homomorphism, so any vector will be uniquely encoded into its corresponding polynomial, and vice versa. For rounding coefficients to integers, we use a technique called coordinate-wise random rounding, defined in the toolkit for RLWE cryptography (Lyubashevsky et al. 2013). This rounding technique allows us to round a real x either to ⌊x⌋ or ⌊x⌋ + 1, with a probability that is higher the closer x is to ⌊x⌋ or ⌊x⌋ + 1, respectively.
Let us simplify Equation 2.5 to get a better understanding of the encoding operation. Encoding a vector z ∈ C^{N/2} into the corresponding polynomial requires computing σ^{−1}. Therefore, the problem reduces to finding a polynomial m(X) = Σ_{i=0}^{N−1} c_i X^i ∈ C[X]/(X^N + 1), given a vector z ∈ C^{N/2}, such that σ(m) = (m(ζ), m(ζ³), ..., m(ζ^{2n−1})) = (z_1, z_2, ..., z_n). Replacing X with ζ, we get Σ_{j=0}^{n−1} c_j (ζ^{2i+1})^j = z_i for i = 1, ..., n.
This can be viewed as a linear equation Ac = z, with A being a Vandermonde matrix⁵ of the (ζ^{2i+1}), c the vector of the polynomial coefficients, and z the vector we want to encode. Therefore, to find the coefficients of the polynomial, we need to compute c = A^{−1}z, either by solving a system of linear equations or by running a complex inverse FFT on z using the Vandermonde matrix. The inverse FFT approach is especially beneficial for large values of N, where the Vandermonde matrix becomes large.
Data Size: The Encode operation converts a log m-bit element or a vector of n log m-bit elements into a polynomial having N coefficients of log(m · Δ) = (log m + log Δ) bits each. Here, n = N/2.
⁵ In linear algebra, a Vandermonde matrix is a matrix with the terms of a geometric progression in each row.
2.4.2 Encryption
The encryption operation can be performed using either a symmetric or an asymmetric key. In either case, we denote the encryption of an integer polynomial m(X) (that is, an encoded length-n vector) by Encrypt(m) = ⟨m⟩. Encryption of the polynomial m(X) under the public key pk generates a ciphertext ct = (ct^{(j)})_{0≤j≤L} ∈ ∏_{j=0}^{L} R²_{q_j} by computing the following equation:

ct^{(j)} = μ · pk^{(j)} + (m + e_0, e_1) (mod q_j), 0 ≤ j ≤ L   (2.6)

Here, μ is a polynomial sampled uniformly from the distribution χ_enc, and e_0 and e_1 are two noise polynomials sampled from the error distribution χ_err. We denote the encryption as ⟨m⟩ = (c_0, c_1), where c_0 and c_1 are the two polynomials that comprise the ciphertext. The coefficients in both the ciphertext polynomials are elements of Z_Q, where Q has L + 1 limbs. Thus, in total, the size of a ciphertext is 2N(L + 1) machine words. Note that the ciphertexts are known as "packed" ciphertexts when they encrypt a vector of n = N/2 elements in a single ciphertext. For encryption under the secret key, we can replace pk by s in Equation 2.6 and compute the polynomial c_0. The polynomial c_1 in the ciphertext is simply μ.
Data Size: The Encrypt operation converts a polynomial having N coefficients of (log m + log Δ) bits each into two polynomials having N coefficients of log Q bits each. If log m = 10, log Δ = 32, and N = 2^16, then the size of the plaintext polynomial is ∼344 KB. This plaintext polynomial is encrypted into a ciphertext of size ∼28.3 MB, where log Q = 1728 and λ = 128-bit security.
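The sizes quoted in these Data Size notes follow directly from the parameters; the small Python sketch below (our own back-of-the-envelope calculation, not part of any library) reproduces the numbers used in the text:

def mb(bits):
    return bits / 8 / 1e6          # sizes in the text are reported in decimal megabytes

N, logQ, logq = 2**16, 1728, 54
log_m, log_delta = 10, 32

plaintext_bits  = N * (log_m + log_delta)        # one encoded polynomial
ciphertext_bits = 2 * N * logQ                   # two polynomials of N logQ-bit coefficients
ksk_bits        = 2 * (logQ // logq) * logq * N  # dnum = 1: 2 x 32 limbs of 54-bit coefficients

print(round(plaintext_bits / 8 / 1e3), "KB")     # ~344 KB
print(round(mb(ciphertext_bits), 1), "MB")       # ~28.3 MB
print(round(mb(ksk_bits), 1), "MB")              # ~28.3 MB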
2.4.3 Decryption
The decryption operation returns an approximate value m̂ = m + e of the original message m, where the magnitude of e is small enough to not destroy the significand of m. We denote
the decryption operation as Decrypt(ct). Decrypt(ct) takes as input a ciphertext ct = (ct^{(j)})_{0≤j≤ℓ} and outputs m̂ = ⟨ct^{(0)}, sk⟩ (mod q_0). The decryption equation is as follows:

m̂ = ct[0]^{(0)} + ct[1]^{(0)} · s (mod q_0)   (2.7)

Here, ct[0] = c_0 and ct[1] = c_1 are the two polynomials in the ciphertext. To recover a correct value after decryption, the encrypted plaintext should satisfy the condition ‖m‖ ≤ q_0/2.
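The way the noise terms cancel across Equations 2.2, 2.6, and 2.7 can be checked end to end with a deliberately tiny and insecure toy example (single modulus, no RNS, no CKKS encoding); the Python sketch below is our own illustration of this structure, not the scheme's reference code:

import random

N, q, delta = 8, 2**40, 2**20      # toy parameters, chosen only for illustration

def polymul(a, b):                  # negacyclic multiplication mod (X^N + 1, q)
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k, sgn = (i + j) % N, 1 if i + j < N else -1
            res[k] = (res[k] + sgn * a[i] * b[j]) % q
    return res

def add(a, b):
    return [(x + y) % q for x, y in zip(a, b)]

def small():                        # stand-in for the ternary / small-error samplers
    return [random.randint(-1, 1) % q for _ in range(N)]

# Key generation (Eq. 2.2): b = -a*s + e
s = small()
a = [random.randrange(q) for _ in range(N)]
b = add([(-x) % q for x in polymul(a, s)], small())

# Encryption of a scaled message under pk = (b, a) (Eq. 2.6, with mu, e0, e1 small)
m = [(delta * v) % q for v in [3, 1, 4, 1, 5, 9, 2, 6]]
mu, e0, e1 = small(), small(), small()
c0 = add(add(polymul(mu, b), m), e0)
c1 = add(polymul(mu, a), e1)

# Decryption (Eq. 2.7): c0 + c1*s = delta*m + small noise
dec = add(c0, polymul(c1, s))
centered = [x - q if x > q // 2 else x for x in dec]
print([round(x / delta) for x in centered])   # recovers [3, 1, 4, 1, 5, 9, 2, 6]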
2.4.4 Decoding
Decoding takes the approximate message polynomial from the decryption operation and maps the underlying plaintext to the field of complex numbers. We denote this decoding operation by Decode(m̂). This operation can be viewed as the inverse of the encoding operation, as it runs an FFT on m̂ and then scales it down by the scale factor Δ. The mapping in the decoding operation is given as follows:

m(X) ∈ Z[X]/(Φ_M(X)) → m⃗ = (m(ζ^j))_{j∈S} ∈ C^{N/2} → z⃗ = Δ^{−1} · m⃗ ∈ Z[i]^{N/2}   (2.8)

Decoding m(X) into a vector z simply evaluates the polynomial on certain values that are the roots of the cyclotomic polynomial Φ_M(X), and those N roots are ζ, ζ³, ..., ζ^{N−1}. Recall that we multiplied by Δ > 0 during encoding, as the rounding might destroy some significant digits, so now we divide by Δ during decoding to keep a precision of 1/Δ. To understand how this works, imagine that you want to round x = 2.6, and you do not want to round it to the closest integer but to the closest multiple of 0.167 to keep some precision. Then, you set the scale Δ = 6, which gives a precision of 1/Δ ≈ 0.167. Indeed, we now compute ⌊Δ · x⌉ = ⌊6 · 2.6⌉ = ⌊15.6⌉ = 16. Once we divide it by the same scale, we get 2.67, which is indeed the closest multiple of 0.167 to x = 2.6.
2.4.5 Example of Encode and Decode Operation
Let us now discuss an example to better understand what we have discussed so far in the client-side operations. Let N = 8, n = N/2 = 4, Φ_N(X) = X⁴ + 1, and ζ = e^{2πi/8} = e^{πi/4}. Our goal is to encode the vector z = [3 + 4i, 2 − i] and then decode it back. First, we expand z to [3 + 4i, 2 − i, 2 + i, 3 − 4i] by including the complex conjugates of the vector elements. Encode and Decode then happen as follows:

• Compute ζ = e^{πi/4} = (1 + i)/√2.
• Compute the powers of ζ, i.e., ζ = (1 + i)/√2, ζ³ = (−1 + i)/√2, ζ⁵ = (−1 − i)/√2, ζ⁷ = (1 − i)/√2.
• Generate the Vandermonde matrix
  A = [ 1  ζ   ζ²  ζ³
        1  ζ³  ζ⁶  ζ
        1  ζ⁵  ζ²  ζ⁷
        1  ζ⁷  ζ⁶  ζ⁵ ]
• Compute A^{−1} = (1/4) Ā^T = (1/4) [ 1   1   1   1
                                       ζ⁷  ζ⁵  ζ³  ζ
                                       ζ⁶  ζ²  ζ⁶  ζ²
                                       ζ⁵  ζ⁷  ζ   ζ³ ]
• Solve c = A^{−1} z = (1/4)(10, 4√2, 10, 2√2)^T = (2.5, √2, 2.5, √2/2)^T.
• Multiply by the scale factor Δ = 64 and round off to the nearest integer to get the encoded polynomial m(X) = 160 + 91X + 160X² + 45X³.
• Now, to Decode, we just need to evaluate m(X) at ζ and ζ³ after scaling m(X) down by Δ:
  – m′(X) = m(X)/Δ = (160 + 91X + 160X² + 45X³)/64 = 2.5 + 1.421875X + 2.5X² + 0.703125X³
  – m′(ζ) ≈ 3.007 + 4.0026i and m′(ζ³) ≈ 1.9918 − 0.9988i, which recover the original vector z = [3 + 4i, 2 − i] up to a small rounding error.
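The worked example can be reproduced numerically; the NumPy sketch below (our own verification script, using ζ = (1 + i)/√2 as listed in the bullets above) builds the Vandermonde matrix, encodes, rounds, and decodes:

import numpy as np

N, delta = 8, 64
zeta = np.exp(1j * np.pi / 4)                       # (1 + i)/sqrt(2), as in the example
roots = np.array([zeta**k for k in (1, 3, 5, 7)])

z = np.array([3 + 4j, 2 - 1j, 2 + 1j, 3 - 4j])      # input plus complex conjugates
A = np.vander(roots, 4, increasing=True)            # rows (1, w, w^2, w^3)
c = np.linalg.solve(A, z)                           # c = A^{-1} z = (2.5, sqrt(2), 2.5, sqrt(2)/2)
m = np.round(delta * np.real(c)).astype(int)        # encoded polynomial coefficients
print(m)                                            # [160  91 160  45]

decoded = np.polyval((m / delta)[::-1], roots[:2])  # evaluate m(X)/delta at zeta and zeta^3
print(np.round(decoded, 4))                         # approx [3.008+4.003j, 1.992-0.997j], close to the text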
2.5 Server-Side Operations
As discussed earlier, there are many server-side homomorphic operations such as addition, multiplication, rotation, conjugation, and bootstrapping. There are two different use cases for the operations requiring two operands, like addition and multiplication.
In the first use case, the client sends encrypted personal data and the server provides the other input in the form of a plaintext. For example, in a privacy-preserving machine learning inference, the server can be a model owner and the client may want to encrypt an image to quickly check for a disease. In this case, the client will need to encrypt the image to be HIPAA (HHS 2021) compliant; however, the server need not encrypt the model weights, as the weights of this trained model are not going to leave the server. In this use case, the operations will occur between a plaintext and a ciphertext to begin with and will incur a lower computational cost.
In the second use case, the client sends encrypted personal data and the server also provides the other input in encrypted form. For the same privacy-preserving machine learning inference example as earlier, now the server is no longer a model owner; instead, the trained model is owned by a hospital. In this case, the server will receive the encrypted model weights from the hospital to just run the inference, and the client will also send an encrypted image to quickly check for a disease. Therefore, in this use case, the operations will occur between two ciphertexts from beginning to end and will incur a higher computational cost than in the first use case.
2.5.1 Addition
We denote the addition between a plaintext and a ciphertext as PtAdd. Thus, PtAdd adds a plaintext vector y to an encrypted vector ⟨x⟩, resulting in an encrypted vector:

PtAdd(⟨x⟩, y) = ⟨x + y⟩   (2.9)

All arithmetic operations on plaintexts are component-wise; the entries of the vector x + y (resp. x · y) are the component-wise sums (resp. products) of the entries of x with the corresponding entries of y. Similarly, we denote addition between two ciphertexts as Add. Add adds an encrypted vector ⟨y⟩ to an encrypted vector ⟨x⟩, resulting in an encrypted vector:

Add(⟨x⟩, ⟨y⟩) = ⟨x + y⟩   (2.10)

Data size: PtAdd adds a plaintext polynomial to the c_0 polynomial of the ciphertext. If log m = 10, log Δ = 32, and N = 2^16, then the size of the plaintext polynomial is ∼344 KB. For log Q = 1728 and N = 2^16, the size of the c_0 polynomial in the ciphertext is ∼14.15 MB, implying ∼14.5 MB of total data. The addition operation (Add) adds two ciphertexts, each having two polynomials c_0 and c_1, where each polynomial has N coefficients of log Q bits. If log Q = 1728 and N = 2^16, then the size of each ciphertext is ∼28.3 MB, implying ∼56.6 MB of total data.
2.5.2 Multiplication
We denote multiplication between a plaintext and a ciphertext as PtMult. Thus, PtMult multiplies a plaintext vector y with an encrypted vector ⟨x⟩, resulting in an encrypted vector. The exact sequence of operations is listed in Algorithm 2.1. Similarly, we denote multiplication between two ciphertexts as Mult. Mult multiplies an encrypted vector ⟨y⟩ with an encrypted vector ⟨x⟩, also resulting in an encrypted vector. However, Mult has more complex lower-level operations than PtMult. The exact sequence of operations is listed in Algorithm 2.2.

Algorithm 2.1 PtMult(⟨m⟩, m′) = ⟨m · m′⟩
1: (c_0, c_1) := ⟨m⟩
2: (u, v) := (c_0 · (Δ · m′), c_1 · (Δ · m′))
3: return (ModDown_{B,1}(u), ModDown_{B,1}(v))   ▷ Rescale
When performing PtMult or Mult operations, the underlying scale factor in the messages also gets multiplied together, thus the scale factor grows as well. The scale factor must be
shrunk down in order to avoid overflowing the ciphertext coefficient modulus. We discuss how this procedure works in Section 2.5.3. Other than Rescale, Mult requires a KeySwitch operation after the multiplication. This is because the multiplication results in a ciphertext encrypted under the secret key s², and the KeySwitch operation helps convert it into a ciphertext encrypting the multiplication result under the original secret key s. This KeySwitch operation within Mult is more commonly known as a relinearization operation (Fan et al. 2012).

Algorithm 2.2 Mult(⟨m1⟩_s, ⟨m2⟩_s, ksk_{s²→s}) = ⟨m1 · m2⟩_s
1: (c_{01}, c_{11}) := ⟨m1⟩_s
2: (c_{02}, c_{12}) := ⟨m2⟩_s
3: (c_{03}, c_{13}, c_{23}) := (c_{01} c_{02}, c_{01} c_{12} + c_{02} c_{11}, c_{11} c_{12})
4: c⃗ := Decomp_β(c_{03})
5: ĉ[i] := ModUp(c⃗[i]) for 1 ≤ i ≤ β
6: (û, v̂) := KSKIP(ksk_{s²→s}, ĉ)
7: (u, v) := (ModDown(û), ModDown(v̂))
8: (c′_0, c′_1) := (c_{13} + u, c_{23} + v)
9: return (ModDown_{B,1}(c′_0), ModDown_{B,1}(c′_1))   ▷ Rescale
To understand relinearization in a simplistic way, the output of Mult is a ciphertext of dimension three (see line 3 of Algorithm 2.2), implying that the ciphertext has three polynomials instead of two. If we do nothing to get rid of the third polynomial, a following Mult operation will result in five polynomials in the ciphertext, then nine, and so on. Therefore, the size of the ciphertext will grow exponentially, and it will not be usable in practice if we were to define ciphertext-ciphertext multiplication in such a fashion. We need to find a way to do multiplication without increasing the ciphertext size at each step, and that is where relinearization comes in. Relinearization (by performing a KeySwitch) allows us to have a ciphertext of dimension two, and not three, such that once it is decrypted using the regular decryption circuit (which only needs the secret key s and not its square s²), we get the multiplication of the two underlying plaintext messages. One more thing to note about the Mult operation is the resulting reduction in the bit precision of the underlying plaintext message. Given encryptions of d messages with η bits of precision, a circuit of depth log d computes their product with (η − log d − 1) bits of precision in d multiplications. This is very close to plaintext floating-point multiplication, which can compute a significand with (η − log d) bits of precision.
Data size: PtMult multiplies a plaintext polynomial with the c_0 and c_1 polynomials of the ciphertext. If log m = 10, log Δ = 32, and N = 2^16, then the size of the plaintext polynomial is ∼344 KB. For log Q = 1728 and N = 2^16, the size of the ciphertext is ∼28.3 MB. For Rescale (at the end of PtMult), twiddle factors are required for the NTT and iNTT operations, whose size
is equivalent to one polynomial of the ciphertext, i.e., 14.15 MB. Therefore, the total data required to perform a PtMult operation is ∼42.8 MB. Unlike PtMult, which multiplies a plaintext with a ciphertext, Mult multiplies two ciphertexts. For log Q = 1728 and N = 2^16, the size of a single ciphertext is ∼28.3 MB, implying ∼56.6 MB of total ciphertext data. In addition, Mult needs a ksk for the KeySwitch operation, whose size depends on the dnum parameter (explained in detail later in Section 2.5.6). Considering dnum = 1 and log q = 54, the modulus Q will have 32 limbs, which means that the size of our ksk will be 2 · 32 · 54 · 2^16 = 28.3 MB. Moreover, for Rescale, twiddle factors will add another 14.15 MB of data. Thus, the total data required to perform a Mult operation is ∼99 MB. Note that the output of both PtMult and Mult will be a ciphertext with one limb less than the original number of limbs, meaning that if there were 32 limbs to begin with, the resulting ciphertext will have only 31 limbs due to the Rescale operation. Consequently, the size of the resultant ciphertext will be 27.43 MB instead of 28.3 MB. Also note that as the value of dnum increases, the size of the ksk will also increase.
2.5.3 Rescale
As mentioned in Section 2.4, all encoded messages in CKKS must have a scale factor Δ. In both the PtMult and Mult implementations, the multiplication of the encoded messages results in the product having a scaling factor of Δ². Before these operations can complete, we must shrink the scaling factor back down to Δ (or at least a value very close to Δ). If this operation is neglected, the scaling factor will eventually grow to overflow the ciphertext modulus, resulting in decryption failure.
( j)
( j)
1: (c0 , c1 ) := {m1 · m2 s } Lj=0 ˆ := (iNTT(c(L) ), iNTT(c(L) )) 2: (ˆa, b) 0 1 3: for j from 0 to L − 1 do ˆ (mod q j ) 4: (x, y) ← (NTT(ˆa), NTT(b)) 5: (u( j) , v( j) ) ← (Add(−x, a( j) ), Add(−y, b( j) )) (mod q j ) 6: (uˆ ( j) , vˆ ( j) ) ← q L−1 · (u( j) , v( j) ) (mod q j ) 7: end for 8: return ({uˆ ( j) , vˆ ( j) } L−1 j=0 )
limb-wise
To shrink the scale factor, we divide the ciphertext by Δ (or a value that is close to Δ) and round the result to the nearest integer. This operation, called ModDown, keeps the scale factor of the ciphertext roughly the same throughout the computation. We sometimes refer
to a ModDown instruction that occurs at the end of an operation as Rescale. Algorithm 2.3 shows the exact sequence of operations that happen within the Rescale operation. For a ciphertext ct encrypting a plaintext m, the rescaling algorithm returns an encryption of ⌊q_ℓ^{−1} · m⌉ ≈ q_ℓ^{−1} · m at level (ℓ − 1). The output ciphertext contains an additional error from the approximation of Δ by q_ℓ and from the rounding of the input ciphertext. After each PtMult (Algorithm 2.1) and Mult (Algorithm 2.2) operation, the ciphertext modulus shrinks. This occurs in the Rescale operation at the end of these functions. If a ciphertext begins with L limbs, we can only compute a circuit with multiplicative depth L − 1, since the ciphertext modulus shrinks by a number of limbs equal to the multiplicative depth of the circuit being homomorphically evaluated. This foreshadows Section 2.5.8, where we present an operation called bootstrapping (Gentry et al. 2009) that increases the ciphertext modulus.
Data size: For log Q = 1728 and N = 2^16, the size of the input ciphertext is ∼28.3 MB. On line 2 of Algorithm 2.3, we need twiddle factors for the iNTT operation on the last limb (with log q = 54) of the ciphertext, whose size is equivalent to 54 · 2^16 = 0.442 MB. Then, on line 4, we need twiddle factors to perform the NTT for L limbs, whose size is equivalent to 31 · 54 · 2^16 = 27.43 MB. Therefore, the total data required to perform a Rescale operation is ∼56.6 MB, and the size of the output ciphertext is 27.43 MB.
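At the limb level, the effect of Rescale can be illustrated on scalars; the Python sketch below (our own simplified illustration of ModDown on a single RNS value, ignoring the NTT/iNTT conversions of Algorithm 2.3) drops the last limb and applies the per-limb correction:

primes = [97, 101, 103, 109]          # toy RNS basis; the last prime plays the role of q_L
qL = primes[-1]

def rescale(limbs):
    # Approximate division by q_L: c'_j = q_L^{-1} * (c_j - c_L) mod q_j
    cL = limbs[-1]
    return [pow(qL, -1, qj) * (cj - cL) % qj for cj, qj in zip(limbs[:-1], primes[:-1])]

x = 9_876_543                          # a value smaller than the product of the toy primes
limbs = [x % q for q in primes]
rescaled = rescale(limbs)

# This variant computes floor(x / q_L); the centered-remainder variant rounds instead.
assert rescaled == [(x // qL) % q for q in primes[:-1]]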
2.5.4 Rotate
The Rotate operation rotates a vector by k positions. To do so, the Rotate operation takes in an encryption of a vector x of length n and an integer 0 ≤ k < n, and outputs an encryption of a rotation of the vector x by k positions. More formally, we denote the Rotate operation as Rotate(⟨x⟩, k) = ⟨φ_k(x)⟩. As an example, when k = 1, the rotation φ_1(x) is defined as follows:

x = (x_0, x_1, ..., x_{n−2}, x_{n−1})
φ_1(x) = (x_{n−1}, x_0, ..., x_{n−3}, x_{n−2})

The Rotate operation is necessary for computations that operate on data residing in different slots of the encrypted vectors. For example, if you want to perform a matrix-vector multiplication, the result of the multiplication between a matrix row and the vector ends up within the slots of a single ciphertext. There is no homomorphic operation that directly adds up the data within the slots of a single ciphertext. So, to perform this inner product, we need to generate rotated copies of the ciphertext so as to align the slots and perform the addition. This approach to inner product computation is more commonly known as the rotate-and-sum approach, illustrated in the sketch below. Further details on the usage of the Rotate operation in a real-world application are provided in Section 2.6.
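The rotate-and-sum access pattern itself is easy to see on plaintext vectors; the Python sketch below (our own illustration of the slot manipulation only, with plain lists standing in for ciphertexts) sums all n slots using log2(n) rotations and additions:

def rotate(v, k):
    # Rotate the slots k positions to the left (the homomorphic Rotate applies an analogous cyclic shift).
    return v[k:] + v[:k]

def rotate_and_sum(v):
    n = len(v)                      # assumed to be a power of two
    acc, step = v[:], 1
    while step < n:
        acc = [a + b for a, b in zip(acc, rotate(acc, step))]
        step *= 2
    return acc                      # every slot now holds the sum of all original slots

print(rotate_and_sum([1, 2, 3, 4, 5, 6, 7, 8]))   # [36, 36, 36, 36, 36, 36, 36, 36]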
Algorithm 2.4 lists the exact sequence of operations that happen within a Rotate operation. Step 2 of the algorithm performs an Automorph operation following Equation 2.11:

new_index_k(i) = 5^k · i (mod N)   (2.11)

Thus, any original slot indexed by i in a ciphertext maps to the rotated slot (indexed by new_index_k(i)) following this automorphism equation. Note that this permutation is an automorphism, which is not simply a rotation; intuitively, the permutation ψ_k of an encoded message will result in the decoded value being permuted by the natural rotation φ_k. As Automorph permutes the slots of the ciphertext, after Automorph the ciphertext is encrypted under a rotated secret key ψ_k(s). Consequently, once the permutation is done, we need to perform a KeySwitch operation to have the ciphertext encrypted under the original secret key s.

Algorithm 2.4 Rotate(⟨m⟩_s, k, ksk_{ψ_k(s)→s}) = ⟨φ_k(m)⟩_s
1: (c_0, c_1) := ⟨m⟩_s
2: (a_rot, b_rot) := (Automorph(c_0, k), Automorph(c_1, k))
3: a⃗_rot := Decomp_β(a_rot)   ▷ β digits
4: â[i] := ModUp(a⃗_rot[i]) for 1 ≤ i ≤ β
5: (û, v̂) := KSKIP(ksk_{ψ_k(s)→s}, â)
6: (u, v) := (ModDown(û), ModDown(v̂))
7: return (u, v + b_rot)
Algorithm 2.5 lists the optimized steps in a batched rotation operation, denoted by HRotate (short for hoisted Rotate). HRotate computes many rotations on the same ciphertext faster than applying Rotate independently several times. This helps reduce the computational complexity of Rotate by reducing the number of ModUp operations (one of the most expensive operations in KeySwitch) by hoisting ModUp outside the for loop. Data size: For log Q = 1728 and N = 2^16, the size of the input ciphertext is ∼28.3 MB. Considering dnum = 1 and log q = 54, the modulus Q will have 32 limbs, which means that the size of our ksk will be 2 · 32 · 54 · 2^16 = 28.3 MB. Therefore, the total data required to perform a Rotate operation is ∼56.6 MB. Note that for Algorithm 2.5, every rotation with a different value of k will require a unique ksk for the KeySwitch operation.
Algorithm 2.5 HRotate(⟨m⟩_s, {k_i, ksk_{ψ_{k_i}(s)→s}}_{i=1}^{r}) = {⟨φ_{k_i}(m)⟩_s}_{i=1}^{r}
1: (c_0, c_1) := ⟨m⟩_s
2: a⃗ := Decomp_β(c_0)   ▷ β digits
3: â[j] := ModUp(a⃗[j]) for 1 ≤ j ≤ β
4: for i from 1 to r do
5:   â_rot := Automorph(â, k_i) for 1 ≤ j ≤ β
6:   (û, v̂) := KSKIP(ksk_{ψ_{k_i}(s)→s}, â_rot)
7:   (u, v) := (ModDown(û), ModDown(v̂))
8:   b_rot := Automorph(c_1, k_i)
9:   ⟨φ_{k_i}(m)⟩_s := (u, v + b_rot)
10: end for
11: return {⟨φ_{k_i}(m)⟩_s}_{i=1}^{r}
2.5.5 Conjugate
The Conjugate operation outputs an encryption of the complex conjugate of the encrypted input vector. Formally, it is written as Conjugate(⟨x⟩) = ⟨x̄⟩. The Conjugate operation implementation is identical to the Rotate implementation with only two minor differences. First, during the permutation phase, the rotation of the vector happens by n positions, and second, after this permutation and before the KeySwitch happens, the vector is negated. Algorithm 2.6 shows the exact sequence of operations that happen within a Conjugate operation.

Algorithm 2.6 Conjugate(⟨m⟩_s, n, ksk_{ψ_n(s)→s}) = ⟨φ_n(m)⟩_s
1: (c_0, c_1) := ⟨m⟩_s
2: (a_rot, b_rot) := −(Automorph(c_0, n), Automorph(c_1, n))
3: a⃗_rot := Decomp_β(a_rot)   ▷ β digits
4: â[i] := ModUp(a⃗_rot[i]) for 1 ≤ i ≤ β
5: (û, v̂) := KSKIP(ksk_{ψ_n(s)→s}, â)
6: (u, v) := (ModDown(û), ModDown(v̂))
7: return (u, v + b_rot)
Data size: The total data size is the same as for a single Rotate operation.
2.5.6 Key Switching
In both the Mult and Rotate implementations, there is an intermediate ciphertext with a decryption key that differs from the decryption key of the input ciphertext. In order to change this new decryption key back to the original decryption key, we perform a KeySwitch (Brakerski and Vaikuntanathan 2011) operation. This operation takes in a switching key ksk_{s′→s} and a ciphertext ⟨m⟩_{s′} that is decryptable under a secret key s′. The output of the KeySwitch operation is a ciphertext ⟨m⟩_s that encrypts the same message but is decryptable under a different key s. Since the KeySwitch operation differs between Mult and Rotate, we do not define it separately. Instead, we go a level deeper and define the subroutines necessary to implement KeySwitch for each of these operations. In addition to the ModDown operation, we use the ModUp operation, which allows us to add primes to our RNS basis. We follow the structure of the switching key in the work of Han and Ki (2020), where the switching key is a 2 × dnum matrix of polynomials and is parameterized by a length dnum. Here, dnum defines the number of decomposed digits for each limb.

ksk = ( a_1 a_2 ... a_dnum
        b_1 b_2 ... b_dnum )   (2.12)

The KeySwitch operation requires that a ciphertext polynomial be split into dnum "digits," then multiplied with the switching key. We define the function Decomp that splits a polynomial into dnum digits, as well as a KSKIP operation to multiply the dnum digits by the switching key. Decomp_β(x) takes in a polynomial x and a parameter dnum⁶ and splits x into dnum digits {x^{(1)}, ..., x^{(β)}}. If x has L limbs, each digit of x has roughly α := ⌈(L + 1)/dnum⌉ limbs. Decomp consists of the following two sub-operations:

a_j^{(i)} = a^{(jα+i)} · [Q′]_{q_{jα+i}} if jα + i ≤ ℓ, and 0 otherwise   (2.13)

for 0 ≤ i < α, 0 ≤ j < β, β = ⌈(ℓ + 1)/α⌉, and Q′ = ∏_{i=ℓ+1}^{αβ−1} q_i.

a_j^{(i)} ← a_j^{(i)} · [Q̂_j^{−1}]_{q_{jα+i}}   (2.14)

for 0 ≤ i < α, 0 ≤ j < β with jα + i ≤ ℓ, and Q̂_j = ∏_{i≠j} Q_i. Here, Equation 2.13 performs the first sub-operation, i.e., the zero-padding and split, and Equation 2.14 performs the second sub-operation, i.e., the RNS-decompose step in Decomp. This approach to KeySwitch (which performs a decomposition before performing a modulus switch) is known as the hybrid approach and allows us to control the noise growth while performing KeySwitch. In principle, P should be much bigger than Q to effectively reduce the size of the noise added through the KeySwitch operation. However, using a larger P reduces the total number of levels supported by the scheme. Therefore, instead of using a larger P, the noise can be controlled using temporary moduli. Thus, through Decomp in the hybrid KeySwitch approach, instead of using {q_i}_{0≤i≤L}, we use the partial products Q_j = ∏_{i=jα}^{(j+1)α−1} q_i, 0 ≤ j