Blockchain-Based Data Security in Heterogeneous Communications Networks 9783031524769, 9783031524776

This book investigates data security approaches in Heterogeneous Communications Networks (HCN). First, the book discusses the necessity of a decentralized, blockchain-based data management architecture for HCN.


English · Pages: 203 [200] · Year: 2024


Table of contents:
Preface
Contents
Acronyms
1 Introduction
1.1 Heterogeneous Communications Networks (HCN)
1.2 Emerging Architecture of Mobile Network
1.3 AI-Assisted Network Management in HCN
1.3.1 Challenges of Network Management
1.3.2 Artificial Intelligence for Networking
1.4 Data Management in HCN
1.4.1 Data Lifecycle in HCN
1.4.2 Centralized Data Management Approach
1.4.3 Blockchain-Based Data Management Approach
1.4.4 Balancing Efficiency, Privacy, and Fairness in Blockchain-Based DM
1.5 Blockchain-Based Data Security Approaches
1.5.1 Reliable Data Provenance
1.5.1.1 Use Case
1.5.1.2 Design Challenges
1.5.2 Transparent Data Query
1.5.2.1 Use Case
1.5.2.2 Design Challenges
1.5.3 Fair Data Marketing
1.5.3.1 Use Case
1.5.3.2 Design Challenges
1.6 Aim of the Monograph
References
2 Fundamental Data Security Technologies
2.1 Basic Crypto Technologies
2.1.1 Notations
2.1.2 Digital Signature
2.1.3 Data Encryption
2.1.3.1 Symmetric Key-Based Encryption
2.1.3.2 Public Key-Based Encryption
2.1.4 Hash Function
2.2 Basic Blockchain Technologies
2.2.1 Data Structures
2.2.2 Identity and Transaction Management
2.2.3 Consensus Protocol and Reward Mechanism
2.2.4 Smart Contract
2.2.5 Channel in Hyperledger Fabric
2.2.6 Performance Metrics
2.2.7 Testing Network
2.3 Privacy-Enhancing Technologies for Blockchain
2.3.1 Cryptographic Commitment
2.3.1.1 Pedersen Commitment
2.3.1.2 Polynomial Commitment
2.3.1.3 Vector Commitment
2.3.1.4 Merkle Tree
2.3.2 Zero-Knowledge Proof
2.3.2.1 Sigma Protocol
2.3.2.2 Fiat-Shamir Heuristic
2.3.2.3 ZKP for Algebraic Relations
2.3.3 zk-SNARK
2.3.3.1 Workflow of zk-SNARK
2.3.3.2 Quadratic Arithmetic Program (QAP)
2.3.3.3 Non-Universal zk-SNARK
2.3.3.4 Universal zk-SNARK
2.3.3.5 Open-Source Implementations
2.3.4 Commit-and-Prove ZKP
2.3.4.1 Application Scenario
2.3.4.2 Constructions
2.3.5 Anonymous Credential
2.3.5.1 Definitions
2.3.5.2 Representative Constructions
2.4 On/off-chain Computation Model for Blockchain
2.4.1 SNARK-Based Approach
2.4.2 Trusted Execution Environment-Based Approach
2.4.2.1 Useful Mechanisms
2.4.2.2 Integrating Blockchain with SGX
2.4.2.3 Implementations
2.5 Summary
References
3 Reliable Data Provenance in HCN
3.1 Motivations and Applications
3.2 Application Requirements
3.2.1 Provenance Trustworthiness
3.2.2 Provenance Privacy
3.2.3 Provenance Query
3.3 State-of-the-Art Data Provenance Approaches
3.3.1 Non-Blockchain-Based Approach
3.3.2 Blockchain-Based Approach
3.3.3 Decentralization and Efficiency Dilemma
3.4 Use Case: Distributed Network Provenance
3.4.1 Network Provenance Model
3.4.1.1 Graph-Based Network Provenance
3.4.1.2 Distributed Network Provenance Model
3.4.2 Defining Archiving Security
3.4.2.1 Security Model
3.4.2.2 Design Goals
3.4.3 Building Blocks
3.4.3.1 Cryptographic Primitives
3.4.3.2 Pinocchio-Based VC
3.4.4 Representative Constructions
3.4.4.1 System Setup by TA
3.4.4.2 On-chain Digest Construction by Administrators
3.4.4.3 Cross-Domain Provenance Query
3.4.4.4 Verification of Provenance Query
3.4.5 Security Analysis
3.4.5.1 Security Assumptions
3.4.5.2 Blockchain Security
3.4.5.3 VC Security
3.4.5.4 Security of Merkle Proof
3.4.5.5 Archiving Security
3.4.6 Performance Evaluation
3.4.6.1 Digest Performance Analysis
3.4.6.2 Off-chain Performance
3.4.6.3 On-chain Performance Analysis
3.4.6.4 Multi-Level Query Strategy
3.5 Summary and Discussions
References
4 Transparent Data Query in HCN
4.1 Motivations and Applications
4.2 Application Requirements
4.2.1 Privacy
4.2.2 Trustworthiness
4.2.3 Efficiency
4.3 State-of-the-Art Data Query Approaches
4.3.1 Cloud-Based Data Query
4.3.2 Blockchain-Based Data Query
4.3.3 Decentralization and Efficiency Dilemma
4.4 Use Case: Blockchain-Based VNF Query
4.4.1 VNF Query in HCN
4.4.2 Threat Model and Design Goals
4.4.3 Building Blocks
4.4.3.1 Cryptographic Notations
4.4.3.2 Commitment Schemes
4.4.3.3 SNARG
4.4.4 Representative Constructions
4.4.4.1 System Setup
4.4.4.2 Design of Pruning Function
4.4.4.3 VNF Listing
4.4.4.4 VNF Query Construction
4.4.4.5 VNF Query Processing
4.4.4.6 VNF Query Verification
4.4.5 Security Analysis
4.4.5.1 Security of SNARG
4.4.5.2 Security of Commitments
4.4.5.3 Dictionary Pruning Security
4.4.5.4 Verifiable VNF Query
4.4.6 Performance Evaluation
4.4.6.1 Implementation Overview
4.4.6.2 Off-Chain Benchmarks
4.4.6.3 Performance Gain by Dictionary Pruning
4.4.6.4 Overheads for Dictionary Pruning
4.4.6.5 On-Chain Benchmarks
4.5 Summary and Discussions
References
5 Fair Data Marketing in HCN
5.1 Motivations and Applications
5.2 Application Requirements
5.2.1 Regulation Compliance
5.2.2 Identity Privacy
5.2.3 Data Marketing Fairness
5.3 State-of-the-Art Data Marketing Approaches
5.3.1 Centralized Data Marketing
5.3.2 Decentralized Data Marketing
5.3.2.1 On-Chain Model
5.3.2.2 On/off-Chain Model
5.3.3 Decentralization and Fairness Dilemma
5.4 Use Case: Blockchain–Cloud Fair Data Marketing
5.4.1 Blockchain–Cloud Data Marketing Model
5.4.2 Security Model and Goals
5.4.3 Design Goals
5.4.4 Building Blocks
5.4.4.1 Cryptographic Notations
5.4.4.2 ElGamal Encryption
5.4.4.3 Zero-Knowledge Proof
5.4.4.4 Multi-message PS Signature
5.4.4.5 Public Verifiable Secret Sharing (PVSS)
5.4.5 Representative Constructions
5.4.5.1 Setup
5.4.5.2 Registration
5.4.5.3 Data Listing
5.4.5.4 Data Trading
5.4.5.5 Tracing
5.4.6 Security Analysis
5.4.6.1 Blockchain Security
5.4.6.2 Credential Security
5.4.6.3 Consortium Management
5.4.6.4 Marketing Fairness
5.4.7 Performance Evaluation
5.4.7.1 Complexity Analysis
5.4.7.2 Experimental Setup
5.4.7.3 Off-Chain Performance
5.4.7.4 On-Chain Performance
5.5 Summary and Discussions
References
6 Conclusion and Future Works
6.1 Conclusion
6.1.1 Reliable Data Provenance
6.1.2 Transparent Data Query
6.1.3 Fair Data Marketing
6.2 Future Works
6.2.1 On/off-Chain Computation Model with Modular Designs
6.2.2 Multi-party Fair AI Model Sharing
Index

Wireless Networks

Dongxiao Liu
Xuemin (Sherman) Shen

Blockchain-Based Data Security in Heterogeneous Communications Networks

Wireless Networks
Series Editor: Xuemin (Sherman) Shen, University of Waterloo, Waterloo, ON, Canada

The purpose of Springer’s Wireless Networks book series is to establish the state of the art and set the course for future research and development in wireless communication networks. The scope of this series includes not only all aspects of wireless networks (including cellular networks, WiFi, sensor networks, and vehicular networks), but related areas such as cloud computing and big data. The series serves as a central source of references for wireless networks research and development. It aims to publish thorough and cohesive overviews on specific topics in wireless networks, as well as works that are larger in scope than survey articles and that contain more detailed background information. The series also provides coverage of advanced and timely topics worthy of monographs, contributed volumes, textbooks and handbooks.


Dongxiao Liu
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada

Xuemin (Sherman) Shen
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada

ISSN 2366-1186   ISSN 2366-1445 (electronic)
Wireless Networks
ISBN 978-3-031-52476-9   ISBN 978-3-031-52477-6 (eBook)
https://doi.org/10.1007/978-3-031-52477-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.

Preface

The future communications network is envisioned to have a highly heterogeneous architecture with a variety of network stakeholders, including technology vendors, service providers, operators, etc. To enable efficient and automatic network management, artificial intelligence (AI)-assisted methods are playing an increasingly important role, and the wealth of network data is key to the success of such methods. To this end, future heterogeneous communications networks (HCN) require secure management of data lifecycles. However, under the strict requirements of privacy regulations, this becomes a non-trivial issue among distributed network stakeholders.

Blockchain is a promising approach to building a decentralized data management architecture among network stakeholders. By utilizing smart contracts, network stakeholders can maintain a consistent and trusted view of data lifecycles for data sharing and processing. At the same time, the distributed and transparent nature of the blockchain raises new design challenges for data management in HCN. This monograph investigates these design challenges and presents three blockchain-based data security approaches for data management in HCN.

In Chap. 1, we introduce blockchain-based data management for HCN. We discuss the necessity of a decentralized architecture and explore the challenges of balancing the efficiency, privacy, and fairness requirements. In Chap. 2, we present a survey of fundamental data security techniques for blockchain-based applications, ranging from basic crypto primitives to privacy-enhancing techniques for blockchain. In Chap. 3, we investigate reliable data provenance in HCN. We define and realize the notion of archiving security and introduce an on/off-chain computation model to reduce on-chain storage and computation costs. In Chap. 4, we investigate transparent data query in HCN.
We design a dictionary pruning strategy for verifiable data queries, which reduces the query space and improves off-chain proving efficiency. In Chap. 5, we investigate fair data marketing in HCN. We design a hybrid marketing architecture with consortium management to preserve the identity privacy of data owners, together with a fair data marketing protocol with verifiable marketing operations. In Chap. 6, we conclude this monograph and discuss future research directions for blockchain-based data security approaches in HCN.

We would like to thank Prof. Weihua Zhuang, Prof. Xiaodong Lin, Prof. Jianbing Ni, Prof. Jiahui Hou, Cheng Huang, Liang Xue, Rob Sun, Bidi Ying, and all BBCR members for their valuable suggestions and discussions. We would also like to thank the senior editor Mary E. James and Praveena John from Springer Nature for their valuable help with this monograph.

Waterloo, ON, Canada

Dongxiao Liu
Xuemin (Sherman) Shen


Acronyms

3GPP  3rd-Generation Partnership Project
5G  Fifth Generation
AES  Advanced Encryption Standard
AHP  Algebraic Holographic Proofs
AI  Artificial Intelligence
AP  Access Points
AR  Augmented Reality
BFT  Byzantine-Fault Tolerant
CA  Certificate Authority
CP  Commit-and-Prove
CRS  Common Reference String
DM  Data Management
ECDSA  Elliptic Curve Digital Signature Algorithm
EK  Evaluation Key
GDPR  General Data Protection Regulation
HCN  Heterogeneous Communications Networks
JPBC  Java Pairing-Based Cryptography
kNN  k-Nearest Neighbor
LTE  Long-Term Evolution
NFV  Network Function Virtualization
PoS  Proof-of-Stake
PoW  Proof-of-Work
PP  Pre-processing
PRF  Pseudo-Random Function
PVSS  Public Verifiable Secret Sharing
QAP  Quadratic Arithmetic Program
QoE  Quality of Experience
QoS  Quality of Service
RAM  Random Access Memory
SAG  Space-Air-Ground
SAGIN  Space-Air-Ground Integrated Network
SAGVN  Space-Air-Ground Vehicular Network
SGX  Software Guard Extensions
SHA  Secure Hash Algorithm
SNARG  Succinct Non-interactive Argument
SNARK  Succinct Non-interactive Argument of Knowledge
SRS  Structured Reference String
TA  Trusted Authority
TEE  Trusted Execution Environment
UAV  Unmanned Aerial Vehicle
UTXO  Unspent Transaction Output
V2X  Vehicle-to-Everything
VC  Verifiable Computation
VK  Verification Key
VNF  Virtualized Network Function
VR  Virtual Reality
ZKP  Zero-Knowledge Proof

Chapter 1

Introduction

1.1 Heterogeneous Communications Networks (HCN)

The mobile communication network enables human-to-human, human-to-machine, and machine-to-machine wireless communications via mobile devices such as mobile phones and sensors [1]. Nowadays, the mobile communication network plays a vital role in our daily lives by supporting a wide range of mobile services, including entertainment, live meetings, social networking, and smart transportation. According to statistics, there were 6.4 billion mobile network subscriptions worldwide in 2022, a number predicted to increase to 7.7 billion by 2028,1 and mobile users on average spend several hours on their phones every day. The mobile communication network is thus becoming a driving force in economic and societal development, attracting growing attention from researchers, industry, and government officials.

From the first generation (1G) to the currently commercialized fifth generation (5G) [2], the architecture of the mobile communication network has witnessed a significant evolution. In the early 1G and 2G eras, mobile networks mainly supported voice and messaging services. With increasing bandwidth and data rates, 3G and 4G Long-Term Evolution (LTE) can support image and video transmissions for more convenient human-to-human communications. In the 5G era, a huge number of Internet of Things devices are deployed, expected to reach 15 billion worldwide in 2023.2 As a result, the 5G network has evolved to further support massive human-to-machine and machine-to-machine communications that revolutionize the transportation, production, and education sectors.

1 https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/. 2 https://www.statista.com/statistics/1183457/iot-connected-devices-worldwide/.


Future networks will further explore technological advances to provide higher data rates, lower transmission delays, and ultra-reliable connections [3]. While the visions and roadmap of future networks are still under discussion, a common consensus is that future networks will have ultra-complicated and heterogeneous stakeholders to support rich services and human-centric networking:

• Heterogeneous services: Future networks are envisioned to integrate various application scenarios, including vehicle-to-everything (V2X), industrial IoT, virtual reality/augmented reality (VR/AR), etc. With stronger communication capabilities, future networks will provide a fully immersive and intelligent service landscape. Moreover, the services can have highly diversified and dynamic resource requirements [4], which raises new challenges in efficiently managing network resources.
• Heterogeneous network stakeholders: The future networking architecture will integrate stakeholders from all industrial sectors, including technology vendors, mobile operators, cloud/edge servers, application providers, etc. For example, a more open networking architecture can include either 3rd Generation Partnership Project (3GPP) access points (APs) or non-3GPP APs. The stakeholders need to work closely together to enhance user service diversity and resource management efficiency.
• Human-centric networking [5]: Personalized network services that consider user locations, service requirements, and user preferences are playing a dominant role. That is, network stakeholders are shifting their service paradigm from guaranteeing Quality of Service (QoS) to improving Quality of Experience (QoE).

The heterogeneity of future network stakeholders and services, as well as the human-centric networking requirements, puts a significant burden on the current network architecture, where model-based network management mechanisms are widely adopted. Therefore, extensive research and practice efforts are required to design advanced network architectures and management mechanisms.

1.2 Emerging Architecture of Mobile Network

In this section, we discuss an emerging architecture for the mobile network that addresses the heterogeneity challenges. As shown in Fig. 1.1, the space-air-ground integrated network (SAGIN) is regarded as a promising architecture for access networks [6]. Specifically, SAGIN [7] integrates various access points (APs) from either 3GPP mobile operators or non-3GPP private operators, including base stations and drive-through Wi-Fi APs on the ground, unmanned aerial vehicles (UAVs) in the air, and satellites in space. To support diversified services efficiently and effectively, the SAGIN architecture also has the following visions [8]:


[Fig. 1.1 Architecture of SAGVN: a satellite network in space, a UAV network in the air (with UAV trajectories), and a ground network of base stations and non-3GPP APs, supported by cloud and edge caching]

• Extreme connectivity: Ubiquitous connectivity can be achieved with the everyday and everywhere network coverage of SAGIN. Ground APs can support daily network coverage in urban areas, while satellite APs can provide network access in rural areas. At the same time, UAV APs can help increase network capacity during rush hours or emergencies.
• Network function virtualization (NFV) [9]: SAGIN should support virtualized network management. Specifically, network functions, such as wireless access and routing, can be abstracted as virtualized network functions (VNFs), and a network service consists of a chain of VNFs as a network slice. By doing so, efficient and flexible resource sharing and service configuration can be realized to meet the stringent service requirements of future networks.
• Network intelligence: Most importantly, future networks should support native AI, where network slices and network services are automatically configured by AI models. To this end, human-centric networking can be better provisioned and the overall management costs of network resources can be reduced.

Among the above three visions, AI-assisted network management is the key enabling technology to address service heterogeneity in future networks, and it is discussed in detail in the next section.
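The NFV vision above models a network service as an ordered chain of VNFs forming a network slice. A minimal sketch of that data structure follows; the VNF names and resource fields are hypothetical illustrations, not taken from the book:

```python
from dataclasses import dataclass

@dataclass
class VNF:
    """A virtualized network function with illustrative resource demands."""
    name: str
    cpu_cores: int
    mem_gb: int

@dataclass
class NetworkSlice:
    """A network service modeled as an ordered chain of VNFs (a slice)."""
    service: str
    chain: list

    def total_demand(self):
        # Aggregate resource demand of the whole slice.
        return (sum(v.cpu_cores for v in self.chain),
                sum(v.mem_gb for v in self.chain))

# Example: a hypothetical video-streaming slice chaining access, routing, and caching.
slice_ = NetworkSlice("video-streaming",
                      [VNF("wireless-access", 2, 4),
                       VNF("router", 1, 2),
                       VNF("edge-cache", 4, 16)])
print(slice_.total_demand())  # → (7, 22)
```

The chain order matters: traffic traverses the VNFs in sequence, so an orchestrator would place and scale them jointly rather than per function.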


1 Introduction

1.3 AI-Assisted Network Management in HCN

1.3.1 Challenges of Network Management

To cope with the increasing service complexity and user dynamics, the resource management mechanism for future networks should satisfy the following requirements:
• First, the resource management mechanism should be computationally efficient for quick decision-making.
• Second, the outcome of network management should help improve overall resource utilization.
• Third, the management process should be automatic and require less human intervention.

In the emerging network architecture for future networks, existing model-based network management mechanisms face new challenges [8]: First, model-based methods often require that traffic or service requirements follow a known distribution; second, the associated optimization problem can be large in both problem size and number of parameters, making it time-consuming to solve; third, model-based management often requires experienced network engineers. Therefore, a novel and efficient network management mechanism is needed in future networks.

1.3.2 Artificial Intelligence for Networking

In recent years, driven by the growth of computing capability and the wealth of user data, artificial intelligence has made significant technological advances in computer vision and autonomous systems [10]. At the same time, considerable attention has been directed to applying AI to network management. For example, network intelligence is a vision that integrates AI into every aspect of the network, while edge intelligence aims at deploying lightweight models at network edges [11]. Federated learning [12, 13] can also help distributed network stakeholders work together to train an AI model that makes system-level network management decisions. To achieve AI-based network management, it is important to properly manage the lifecycle of AI models in future networks. More specifically, as shown in Fig. 1.2, the lifecycle of AI models for network intelligence consists of four phases:
• Data Management: Network stakeholders collaboratively manage the wealth of user data and system runtime data in terms of data collection, data storage, data sharing, and data usage.
• Model Training: A model provider (e.g., a mobile operator) uses the data to train an AI model to provide network management services.


Fig. 1.2 Lifecycle of AI models (data management, model training, model deployment, and model use)

• Model Deployment: The trained model can be deployed at a network entity to provide services. For example, traffic prediction models can be deployed at mobile operators, while content prediction models can be deployed at edge servers.
• Model Use: After deployment, the model can be used with newly arrived input data to make network resource allocation decisions.

Compared with the traditional model-based approach, the AI-based approach has the following benefits:
• First, the AI-based approach is data-driven and does not require knowledge of prior distributions. In this way, it is more flexible and adapts to various network management tasks.
• Second, the AI-based approach often has an efficient online inference phase, which can make timely decisions with fewer computational resources.
• Third, the AI-based approach can achieve automatic network management that requires less human intervention.

To this end, AI-based network management is envisioned to play a key role in future networks and is thus attracting extensive research efforts. A case study on AI-based UAV trajectory design [14] is discussed to help better understand the lifecycle of AI models. In this case, the goal is to optimize overall file delivery performance to vehicular users while UAVs with cached files travel around an area. Specifically, a model provider can collect historic trajectory data and file requirements of vehicular users. From an optimization model, the model provider can calculate UAV trajectories and cached files for delivery. Using this information, the model trainer can train an AI model that replaces the optimization algorithm for future use. By doing so, quick and effective online UAV trajectory decisions can be made.
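The case study above can be sketched as a toy surrogate-model pipeline. The `expensive_solver` function and the linear training data below are hypothetical stand-ins, not the actual scheme in [14]; they only illustrate the training and use phases of the lifecycle.

```python
# Toy sketch (hypothetical): a surrogate model learns to imitate an
# expensive optimization routine, illustrating the "model training"
# and "model use" phases of the AI lifecycle.

def expensive_solver(demand):
    """Stand-in for a slow optimizer; pretend this took minutes."""
    return 2.0 * demand + 5.0

# Data management + model training: collect (input, solver output)
# pairs, then fit y = a*x + b by closed-form ordinary least squares.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [expensive_solver(x) for x in xs]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

# Model deployment + use: the fitted surrogate answers new queries
# instantly instead of re-running the optimizer.
def surrogate(demand):
    return a * demand + b

print(round(surrogate(10.0), 2))  # 25.0, matching expensive_solver(10.0)
```

In practice the surrogate would be a neural network trained on many solver traces; the closed-form fit here only keeps the sketch self-contained.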


However, to realize the full potential of AI-based network management, data security is critical for provisioning massive, high-quality training data to model providers, which requires collaborative data management among network stakeholders in future networks.

1.4 Data Management in HCN

Data management (DM) in future networks refers to the management of data lifecycle events [15] by network stakeholders. The goal of DM is to establish a trustworthy and reliable platform for network stakeholders to record critical data lifecycle events. Specifically, DM has the following benefits for future networks:
• First, DM can boost collaborations among network stakeholders to reduce data circulation barriers.
• Second, recorded data lifecycle events can also serve as provenance evidence in case of data disputes.
• Third, DM can help regulators enforce data privacy regulation rules and accountability against misbehaving parties.

1.4.1 Data Lifecycle in HCN

As shown in Fig. 1.3, to support AI-assisted network management, network stakeholders need to collaboratively manage the data lifecycle [16], which roughly includes five phases:
• Data Collection: Data from multiple network layers can help train AI models. For example, from the massive use of mobile devices, application providers can collect user behavior data, such as user clicks and trajectories. Mobile operators can collect service demand and traffic data at either the network edge or the core network. Network administrators can collect huge amounts of network log data, such as packet forwarding history.
• Data Storage: Network stakeholders can either store the collected data on their own storage or outsource the data to a data center, such as a cloud server.
• Data Sharing: Network stakeholders often need to share their data for training an AI model. For example, mobile operators need vehicle trajectory data from application providers. Together with network operation data, such as service demand, the mobile operators can train a slice reservation model for allocating network resources to vehicular applications.
• Data Processing: Network stakeholders can use the wealth of collected and stored data to train AI models for AI-assisted network management.
• Data Deletion: After the data are used, they should be deleted to protect user privacy.

Fig. 1.3 Data lifecycle in HCN (HCN stakeholders: vendor, operator, app provider, data center, and user; data lifecycle: collection, storage, sharing, processing, and deletion)

As the data volume keeps increasing and cross-domain data sharing happens frequently, securely managing the data lifecycle is not an easy task. Therefore, a secure and reliable data management approach is required among network stakeholders. However, due to the network heterogeneity in future networks, it still faces non-trivial design challenges [17]:
• Trust: Data lifecycle management should be secure for network stakeholders across trust boundaries.
• Reliability: The recording of data lifecycle events should be transparent and verifiable, resisting attacks from both external and internal adversaries.
• Efficiency: DM should be efficient among distributed stakeholders. Moreover, the operational costs of managing a DM platform should be low so that it does not put an additional burden on the stakeholders.
• Regulation: In recent years, strict privacy regulations for user data have been proposed and enforced. A typical example is the European General Data Protection Regulation (GDPR) [18]. More specifically, GDPR establishes a set of privacy protection principles for data “controllers” and “processors,” such as application providers who collect user data. Failing to comply with the regulations can result in huge financial losses. Without going into details, some key GDPR terms are highlighted below, including data owners’ rights and data controllers’/processors’ obligations.
– Personal data should be collected and processed for specified purposes that are transparent to data owners. Unauthorized access to the data or data processing beyond the data usage agreement should be restricted.


– Proper consent from data owners must be obtained before processing the data. The consent shall be withdrawable by the data owners at any time.
– GDPR grants data owners a set of rights, including the Right-to-be-forgotten, Right-to-object, Right-to-restrict-processing, etc. These rights ensure that data owners are fully informed of the data processing and have full control over the processing of their personal data.
– Data controllers/processors should record detailed data processing activities. When multiple controllers jointly determine the data processing, the respective responsibilities of each controller should be decided transparently.
– Cross-border data transfers are strictly regulated by GDPR.
– Accountability or penalties against misbehaving parties should be enforced by the authority in an effective and proper manner.

To address the above-mentioned challenges, two DM approaches are discussed and compared in the next section.

1.4.2 Centralized Data Management Approach

The first approach relies on a centralized DM platform to record critical data lifecycle events. As shown in Fig. 1.4, network stakeholders can pre-determine management rules and communicate with each other for data sharing/processing via the centralized platform. For example, a data sharing instance can be submitted by an involved party to be archived on the platform. In this way, the platform serves as a management authority and data storage unit for data lifecycle events.

Fig. 1.4 Centralized DM architecture (HCN stakeholders: vendor, operator, user, app provider, and data center; DM by a centralized platform)

The centralized DM platform can be easily implemented with efficient operations. However, the centralized approach has some limitations when applied to HCN:
• Lack of mutual trust: Network stakeholders can come from different trust domains. For example, application providers and mobile operators can be controlled by different business groups. In this case, there is often insufficient mutual trust among the network stakeholders, which makes it difficult to agree on a single trusted authority for data management.
• Lack of management transparency: The data lifecycle is managed by a single DM platform. Without effective regulation mechanisms and management rules, the transparency of the DM platform cannot be guaranteed. As a result, the platform may not always strictly fulfill its responsibilities.
• Single-point failure: The centralized DM platform is vulnerable to both external and internal attacks, which can cause a single-point failure of the platform and reduce platform reliability.

Considering the limitations of the centralized DM platform, a decentralized DM platform can be a more promising approach. Specifically, the blockchain can serve as the underlying architecture for DM, as discussed in the next section.

1.4.3 Blockchain-Based Data Management Approach

Blockchain is a ledger maintained by distributed network nodes [19–21]. More specifically, it consists of blocks of peer-to-peer transactions. With consensus protocols, such as Proof-of-Work (PoW), the blockchain helps ensure a consistent view of the ledger among mutually distrusting network nodes. Blockchain was first proposed in 2008 as the underlying data storage structure of a cryptocurrency, i.e., Bitcoin. Since then, it has found numerous applications beyond cryptocurrency, such as distributed databases and provenance ledgers. Using the blockchain as a trusted machine for program execution, smart contracts [22] can be deployed to manage shared business logic [23]. Specifically, a smart contract can define terms and conditions under which participants change the ledger state. For DM in future networks, smart contracts can help network stakeholders collaboratively manage data lifecycle events [15] over the blockchain for data sharing and processing [24–26].

In contrast to the centralized DM platform, a blockchain-based DM platform is shown in Fig. 1.5. Network stakeholders, including vendors, operators, application providers, etc., work together to maintain a consortium blockchain with efficient consensus protocols and membership management. The blockchain serves as a shared and trusted storage [27] for network stakeholders to record data lifecycle events via interactions with a DM contract. The blockchain-based DM platform has the following benefits:
• Decentralization: There is no reliance on a single trusted platform for data management. Network stakeholders with insufficient mutual trust work collaboratively to maintain a trusted, shared storage.
• Transparency: Blockchain storage and state updates are transparent to all participating nodes. This significantly increases management transparency, which complies with GDPR requirements.


Fig. 1.5 Blockchain-based DM architecture (HCN stakeholders: vendor, operator, user, app provider, and data center; stakeholders interact with a DM contract on the blockchain)

• Immutability: The blockchain storage is immutable and cannot be maliciously modified. Therefore, it can serve as a reliable and secure provenance ledger that archives DM events as digital evidence in case of DM disputes [28].
• No single-point failure: DM events are shared among the multiple network nodes that maintain the blockchain, which increases the robustness of the storage and avoids single-point failure.

There are extensive state-of-the-art works that adopt the blockchain for DM services, including outsourcing computation [29], secure auditing [30], digital forensics [31], network slicing management [32], deep learning [33], data sharing [34], etc.
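The immutability benefit can be illustrated with a minimal hash-chained ledger sketch: each block embeds the hash of its predecessor, so any tampering with a recorded event breaks the chain. This is a toy illustration only, with no consensus protocol or networking.

```python
# Toy sketch: a hash-chained ledger of data lifecycle events,
# showing why on-chain records are tamper-evident.
import hashlib
import json

class Ledger:
    def __init__(self):
        self.blocks = []  # each block: {"event", "prev", "hash"}

    def append(self, event):
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
        h = hashlib.sha256(body.encode()).hexdigest()
        self.blocks.append({"event": event, "prev": prev, "hash": h})

    def verify(self):
        """Recompute every hash; any edit to past events breaks the chain."""
        prev = "0" * 64
        for blk in self.blocks:
            body = json.dumps({"event": blk["event"], "prev": prev},
                              sort_keys=True)
            if blk["prev"] != prev or \
               hashlib.sha256(body.encode()).hexdigest() != blk["hash"]:
                return False
            prev = blk["hash"]
        return True

ledger = Ledger()
ledger.append({"op": "data_sharing", "from": "operator", "to": "app_provider"})
ledger.append({"op": "data_deletion", "owner": "user_42"})
print(ledger.verify())                       # True
ledger.blocks[0]["event"]["to"] = "mallory"  # tamper with history
print(ledger.verify())                       # False: the hash chain breaks
```

In a real blockchain, the same chaining is combined with consensus among distributed nodes, so tampering would additionally require rewriting the ledger copies held by a majority of peers.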

1.4.4 Balancing Efficiency, Privacy, and Fairness in Blockchain-Based DM

While blockchain-based DM can achieve these distinguished benefits, it also has some inherent challenges as a distributed architecture:
• On-chain Efficiency: All blockchain full nodes must maintain a full copy of the ever-increasing ledger, while also running consensus protocols to keep the ledger storage consistent. With on-chain data duplicated at every node and every transaction verified by a majority of the nodes, on-chain storage and computation costs can be significantly higher than off-chain costs [35]. For example, the Ethereum blockchain uses Gas to measure on-chain computation and storage costs, where even a simple operation can cost contract callers a few dollars.
• On-chain Privacy: Data stored on the blockchain are transparent to blockchain nodes. That is, each blockchain node can freely download and access on-chain data, which can cause privacy issues if the stored data are sensitive, such as user profile data. Therefore, additional measures, including encryption and access control, should be provided for on-chain data privacy preservation [36]. For example, the consortium blockchain Hyperledger Fabric uses channels (sub-chains) to achieve data access control, where only members of a specific channel can access the data within that channel.
• Fairness: Blockchain is a distributed architecture with (potentially anonymous) participants from everywhere. Participant behavior can change dramatically and can be unpredictable or malicious. A malicious blockchain node can purposely delay, drop, or modify messages received from neighboring nodes, which can cause fairness issues [37] beyond security issues. For example, the transactions of a specific blockchain node can be purposely delayed from being included in the ledger, and on-chain data trading can be canceled if one party suddenly drops out of the trade. Therefore, efficient countermeasures with accountability enforcement must be provided for blockchain operations.

Fig. 1.6 Balancing privacy, efficiency, and fairness in blockchain-based DM

For practical implementations and large-scale applications, the above-mentioned three challenges must be addressed or mitigated in blockchain-based DM. However, there are often trade-offs between them, as shown in Fig. 1.6:
• Efficiency vs Privacy: Many privacy-preserving techniques, such as data encryption and zero-knowledge proof (ZKP), can be applied to enhance the privacy of on-chain data. However, privacy-preserving techniques usually incur additional computation and storage overheads. For example, homomorphic encryption can increase the size of the original plaintext, and verification of ZKPs can be communication-inefficient; for some ZKP constructions, the proof size grows linearly with the size of the problem.
• Privacy vs Fairness: Although identity privacy and data privacy are essential for blockchain applications, they can sometimes increase the risk of unfair collaborations. Specifically, when data privacy is preserved, traditional inspection or forensic mechanisms cannot be directly applied. For example,


applying inspection rules to encrypted data packets and transmissions is a nontrivial task. As a result, malicious behavior in data operations becomes harder to detect, and the fairness of DM cannot be easily preserved.
• Fairness vs Efficiency: To preserve fairness in multi-party computation protocols, additional measures must be deployed. For example, secret sharing techniques are used to counter drop-out attacks, while hash-locked deposits in smart contracts are applied to penalize dishonest behavior. However, these additional measures bring non-negligible costs, such as the increased communication costs caused by broadcasting messages in secret sharing protocols.

To this end, the decentralization properties of the blockchain are a double-edged sword for DM. While significantly increasing the trust and transparency of DM, the trade-offs among its privacy, efficiency, and fairness challenges must be carefully addressed. In the following, the monograph discusses three security approaches as illustrative cases to investigate how practical designs can strike a balance between these challenges.
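The secret sharing countermeasure mentioned above can be sketched with a toy Shamir scheme over a small prime field. The parameters here are illustrative only; real deployments use large fields and cryptographically secure randomness. Because any t of the n shares reconstruct the secret, a protocol can tolerate up to n − t participants dropping out.

```python
# Toy Shamir secret sharing (illustrative parameters, insecure randomness).
import random

P = 2_147_483_647  # a Mersenne prime used as the field modulus (toy choice)

def split(secret, n, t):
    """Create n shares of `secret`; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):  # evaluate the degree-(t-1) polynomial at x
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456, n=5, t=3)
# Two participants drop out; any three remaining shares still suffice.
print(reconstruct(shares[:3]) == 123456)  # True
print(reconstruct(shares[2:]) == 123456)  # True
```

The communication cost of distributing and broadcasting shares is exactly the fairness-vs-efficiency overhead discussed above.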

1.5 Blockchain-Based Data Security Approaches

In this section, three representative blockchain-based data security approaches for future networks are presented: reliable data provenance, transparent data query, and fair data marketing, each discussed in terms of background, use case, and design challenges. The three approaches are of significant research and application value and can serve as the foundations of many blockchain-based DM schemes.

1.5.1 Reliable Data Provenance

Data provenance refers to the storage and archiving of historic network runtime data, a concept that originates in the database area [38]. Data provenance can be used to answer “why” questions in the network for debugging and diagnosis purposes. For example, why does a packet forwarding instance fail, or why does network congestion happen? Various modeling and analysis techniques, such as provenance graphs that describe dependency relations between network events, can be applied to data provenance in different application scenarios.

In future networks, network stakeholders are becoming extremely heterogeneous and distributed. As a result, provenance data can be stored at network stakeholders in different trust domains, and cross-domain data sharing [39] is often necessary for collective goals. Since the stakeholders cannot easily agree on a centralized authority to manage the provenance data, a blockchain-based decentralized approach is required to collectively manage and analyze the provenance data [40].

1.5.1.1 Use Case

A typical use case is distributed network provenance [41, 42]. Network administrators of different network domains store their network logs on a blockchain. When a global system error happens and cross-domain network diagnosis is required, the network administrators can query and retrieve each other’s network logs on the blockchain to build a provenance graph that represents the relationships between related network events. Then, the network administrators can analyze the constructed provenance graph to identify the root causes of the global error. In this case, the network logs serve as reliable provenance data for the network administrators.
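The provenance-graph construction in this use case can be sketched as follows. The event names and the `caused_by` log field are hypothetical, not taken from [41, 42]; the sketch only shows how dependency edges from cross-domain logs are walked back to root causes.

```python
# Hypothetical sketch: build a provenance graph from cross-domain
# network logs and trace a failed event back to its root causes.
logs = [  # logs retrieved from two administrative domains
    {"event": "link_down@domainA", "caused_by": []},
    {"event": "route_withdrawn@domainA", "caused_by": ["link_down@domainA"]},
    {"event": "fwd_fail@domainB", "caused_by": ["route_withdrawn@domainA"]},
]

# Provenance graph: map each event to the events it depends on.
graph = {log["event"]: log["caused_by"] for log in logs}

def root_causes(event):
    """Walk dependency edges back to events with no further cause."""
    causes = graph.get(event, [])
    if not causes:
        return {event}
    roots = set()
    for c in causes:
        roots |= root_causes(c)
    return roots

print(root_causes("fwd_fail@domainB"))  # {'link_down@domainA'}
```

In the blockchain-based setting, the `logs` list would be assembled from on-chain queries against the other domains' committed records rather than from local memory.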

1.5.1.2 Design Challenges

While the blockchain-based approach can achieve reliable data provenance across network trust domains, it also raises new design challenges:
• For efficiency, due to the large volume of provenance data and the expensive on-chain computation and storage costs, it is prohibitively expensive to directly store and process the data on the blockchain.
• For privacy, provenance data, such as network logs, may contain sensitive user information. Without proper protections, directly storing the data on the blockchain raises a high risk of privacy leakage, which does not comply with GDPR requirements.
• For fairness, effective and efficient misbehavior detection and accountability enforcement should be achieved for fair data provenance. For example, providing incorrect provenance data should be detectable and accountable.

Therefore, an efficient data provenance scheme that reduces on-chain overheads and preserves on-chain data privacy should be designed. This monograph discusses a representative scheme in Chap. 3.

1.5.2 Transparent Data Query

If we regard the blockchain as a distributed database, query functionalities must be supported to lay the foundation for many other advanced applications. For example, the first step of the data provenance approach is to query the provenance logs of another network administrator. More specifically, there are several basic query modules:
• Keyword query: This basic query module checks whether an item contains a given keyword.
• Range query: This query module checks whether a numeric value lies in a given range.


• Membership query: This query module checks whether an item is a member of a set of items.

The query modules can be combined in a conjunctive or disjunctive manner. A general query language, such as SQL, can also be applied to support versatile query types. To query data on the blockchain, developers can use a smart contract [43] to define query functions and return query results via contract calls. By doing so, trustworthy and transparent queries over blockchain data can be achieved. Another approach is to use verifiable query techniques [44] to build on/off-chain computation models.
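The three query modules and their conjunctive combination can be sketched over an in-memory record; the field names below are illustrative, not a fixed schema.

```python
# Sketch of the keyword, range, and membership query modules and
# their conjunctive combination (illustrative field names).
record = {"id": "vnf-17", "type": "firewall", "latency_ms": 12,
          "provider": "opA"}

def keyword_query(rec, field, kw):
    """Does the field's value contain the given keyword?"""
    return kw in str(rec.get(field, ""))

def range_query(rec, field, lo, hi):
    """Does the numeric field lie in [lo, hi]?"""
    return lo <= rec.get(field, float("inf")) <= hi

def membership_query(rec, field, allowed):
    """Is the field's value a member of the allowed set?"""
    return rec.get(field) in allowed

# Conjunctive combination: all sub-queries must hold.
match = (keyword_query(record, "type", "fire")
         and range_query(record, "latency_ms", 0, 20)
         and membership_query(record, "provider", {"opA", "opB"}))
print(match)  # True
```

A disjunctive combination simply replaces `and` with `or`; a smart-contract version of the same predicates would run these checks on-chain and return the result as a contract call.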

1.5.2.1 Use Case

A typical use case of blockchain-based data query is collaborative VNF management in future networks [45]. Future networks are envisioned to adopt a multi-provider NFV paradigm, where multiple network resource providers abstract their services as VNFs and a network slice can consist of VNFs from multiple providers. To this end, a consortium blockchain platform can be built for network resource providers to collaboratively manage VNF information and slice configurations. When a user has a service requirement, a VNF manager can use the requirement to query the VNF information on the blockchain and find suitable VNFs for the service. For example, a user request can include service types, performance metrics, and prices, while a suitable slice can consist of an access function, a firewall function, and a data transmission function.

1.5.2.2 Design Challenges

Due to the limitations of on-chain resources, designing an efficient data query scheme over the blockchain faces the following challenges:
• For privacy, data owners are unwilling to store their data, which may contain sensitive business information, directly on the blockchain for query services. For example, the VNF information of network resource providers can contain VNF location and routing information.
• For computation efficiency, as discussed before, directly conducting data queries on the blockchain can be costly. Therefore, on/off-chain computation models with verifiable computations can offload the expensive on-chain computations to cheap off-chain computations. For example, using a succinct non-interactive argument of knowledge (SNARK) for VNF queries can achieve efficient proof verification on the blockchain. However, the off-chain proof generation cost of the SNARK-based approach increases significantly.
• For storage efficiency, on-chain data structures also need to be carefully designed. Joint commitment scheme design, including Pedersen commitments with multiple indexes, Merkle commitments, and polynomial commitments, is required for different query types.
• For fairness, efficient responses to disputes about data and query results should be guaranteed with effective accountability enforcement.

Transparent data query on the blockchain is the most fundamental security approach for blockchain-based DM. Efficient designs for different query types, inter/intra-chain queries, and on/off-chain computation models need to be achieved for practical implementations. This monograph discusses a representative scheme in Chap. 4.
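As a concrete illustration of a Merkle commitment, the following minimal sketch commits to a list of items with a single root hash and verifies an inclusion proof against that root without revealing the other items. It is a toy, not the joint commitment design discussed above.

```python
# Minimal Merkle commitment sketch: commit to a list with one root
# hash; an inclusion proof is a path of sibling hashes to the root.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:            # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1               # sibling differs only in the last bit
        proof.append((level[sib], index % 2))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, node_is_right in proof:
        node = h(sibling + node) if node_is_right else h(node + sibling)
    return node == root

items = [b"vnf-1", b"vnf-2", b"vnf-3", b"vnf-4"]
root = merkle_root(items)          # the only value stored on-chain
proof = merkle_proof(items, 2)
print(verify(b"vnf-3", proof, root))  # True
print(verify(b"vnf-9", proof, root))  # False
```

Only the constant-size root needs to go on-chain; proofs are logarithmic in the number of items, which is the storage-efficiency property sought above.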

1.5.3 Fair Data Marketing

AI-assisted network management requires a large volume of high-quality data from data owners, such as users or system administrators. As data owners in future networks can be highly heterogeneous and come from different trust domains, data sharing is often required to obtain the data for model training. Moreover, since the data are of great business value, data owners are also willing to trade their data for financial gain, which has led to the recent development of data marketing [46]. However, due to the enforcement of data privacy laws, such as GDPR, data sharing and trading across different entities are under strict regulation. To comply with the regulatory requirements, a transparent and reliable marketing platform can be built, with the blockchain serving as the underlying architecture [47, 48]. More specifically, data owners and data buyers can conduct data listing, data trading, and payments on the blockchain, where critical data marketing operations are recorded for regulation purposes. However, due to the unpredictable behavior of data owners and data buyers, there are fairness issues in data marketing that cannot be overlooked:
• First, data owners should provide clear descriptions of their data, and payments should be guaranteed if the actual data content meets the descriptions.
• Second, data buyers should only pay a data owner if they receive the correct data promised by the data owner.
• Third, dishonest operations, such as purposely delaying data transmission, dropping out of transactions, or providing incorrect data, should be detectable and accountable.
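The second fairness goal can be illustrated with a toy hash-locked escrow: payment is released only when the seller reveals a key whose hash matches a commitment posted at listing time. This is a sketch of the hash-lock idea only, not an actual smart contract; the class and its fields are hypothetical.

```python
# Toy hash-lock sketch: the buyer's locked payment is released only
# by revealing the key committed at listing time.
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class HashLockedEscrow:
    def __init__(self, key_commitment: bytes, deposit: int):
        self.key_commitment = key_commitment  # H(key) posted at listing
        self.deposit = deposit                # buyer's locked payment
        self.paid_out = False

    def claim(self, revealed_key: bytes) -> int:
        """Seller claims the deposit by revealing the committed key."""
        if not self.paid_out and sha256(revealed_key) == self.key_commitment:
            self.paid_out = True
            return self.deposit
        return 0

key = b"data-decryption-key"
escrow = HashLockedEscrow(sha256(key), deposit=100)
print(escrow.claim(b"wrong-key"))  # 0: wrong key, payment stays locked
print(escrow.claim(key))           # 100: correct key releases payment
```

On a real blockchain, revealing the key on-chain simultaneously makes it available to the buyer and releases the payment, which is what ties the payment to delivery of the promised decryption key.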

1.5.3.1 Use Case

We present a use case of blockchain-based data marketing for IoT data [46]. The deployment of IoT devices is increasing worldwide, which leads to the generation and collection of IoT data at a remarkable speed. As IoT devices often lack storage and processing capabilities, they can rely on a third-party server, such as a


cloud server, for data services. With the integration of the blockchain, a hybrid data marketing model can be constructed as follows:
• Data owners would like to sell their IoT data to buyers, such as vendors, for product development.
• The blockchain is a transparent data controller that records data marketing instances for provenance and regulation.
• The cloud server is the data processor responsible for receiving data transfer requests from data owners and transferring the requested data to data buyers.

The hybrid data marketing architecture has the following benefits: First, it relieves the data owners of IoT devices from storing and processing the data by outsourcing the data to a cloud server. Second, by utilizing the blockchain, the architecture achieves reliable data marketing control that complies with the transparency requirement of GDPR. Third, the architecture eliminates the reliance on a centralized data marketing platform, increasing trust among heterogeneous network stakeholders.

1.5.3.2 Design Challenges

The above-mentioned hybrid model still faces the following implementation challenges, considering the limitations of a blockchain architecture:
• For privacy, data should be stored on an off-chain cloud server without being directly exposed on the blockchain. In this case, the outsourced data should also be encrypted on the cloud, which complicates implementing other functionalities, including data query, provenance, etc. In different applications, data privacy preferences and policies can change dramatically [49], which requires fine-grained privacy management and automatic detection tools [50]. The identity privacy of data owners and buyers should be preserved on-chain; however, their true identities should still be recoverable if disputes happen.
• For storage efficiency, succinct commitments of off-chain data should be designed for various provenance and query purposes. At the same time, succinct commitments of the data marketing operations of involved entities should also be designed for regulation.
• For computation efficiency, both on-chain verification and off-chain proof generation should be computationally efficient.
• For fairness, first, the above-mentioned three fairness goals of data marketing should be achieved. Second, with the introduction of the cloud server, additional measures should also be designed to motivate the cloud server to behave in a timely and honest manner.

Fair data marketing with the cloud-blockchain hybrid model is not a trivial task for designers and practitioners. This monograph discusses a representative scheme in Chap. 5.


1.6 Aim of the Monograph

This monograph aims to discuss efficient blockchain-based data security approaches to achieve trustworthy data management across network stakeholders. More specifically, this monograph investigates three data security approaches in blockchain-based DM, including reliable data provenance, transparent data query, and fair data marketing, and presents a set of on/off-chain solutions to strike a balance between privacy, efficiency, and fairness in blockchain-based DM. The organization of the monograph is shown in Fig. 1.7.

Fig. 1.7 Organization of the monograph (Chap. 1 Introduction; Chap. 2 Fundamental Data Security Technologies; Chap. 3 Reliable Data Provenance; Chap. 4 Transparent Data Query; Chap. 5 Fair Data Marketing; Chap. 6 Conclusion and Future Works)

In Chap. 2, we present a comprehensive survey of fundamental data security techniques for blockchain-based applications. We first review basic crypto techniques, including digital signatures, data encryption, and hash functions. Then, we discuss basic blockchain techniques, from data structures, identity and transaction management, and consensus and reward mechanisms to the smart contract, to give readers an overview of the blockchain. We take Hyperledger Fabric as a concrete example of a consortium blockchain to discuss its features. Performance metrics and the testing network of the blockchain are also presented. Furthermore, we investigate privacy-enhancing techniques designed for blockchain-based applications. More specifically, cryptographic commitment schemes, including the Pedersen commitment, vector commitment, and Merkle tree, can help enhance on-chain data privacy for data storage; zero-knowledge proofs (ZKP), including the Sigma protocol and proofs for complex relations, can achieve on-chain privacy for data computations, while zk-SNARK


techniques further enhance on-chain efficiency with succinct verifications for arbitrary relations; on/off-chain computation models for the blockchain can be built either from SNARKs or from trusted execution environments; anonymous credential techniques are critical for realizing on-chain identity privacy. In Chap. 3, to design reliable data provenance, we propose a secure and efficient distributed network provenance scheme based on the blockchain [42]. The proposed scheme builds a distributed storage using blockchain for network domain managers to share network log data for the diagnosis of global errors. To address the on-chain efficiency and privacy issues of directly storing log data on the blockchain, the proposed scheme uses cryptographic commitment to succinctly digest and store network log data. With the integration of the zk-SNARK technique, efficient off-chain provenance proofs can be generated for cross-domain data provenance and efficiently verified on the blockchain. The proposed scheme formalizes a novel security notion, denoted "archiving security," under which network domain managers honestly commit their network log data for future provenance queries. Extensive experiments with real-world SNARK implementations and a thorough security analysis demonstrate the efficiency and security of the proposed scheme. In Chap. 4, to design transparent data query, we propose an authenticated and prunable dictionary for blockchain-based VNF management [51]. First, the proposed scheme designs a blockchain-based VNF management framework among different network resource providers to eliminate reliance on a centralized VNF management entity. Second, the proposed scheme integrates Pedersen commitment with zk-SNARK to succinctly store the VNF dictionary on the blockchain for efficient on-chain VNF query verifications.
To address the random access memory (RAM) issue in off-chain proof generation using SNARK, a two-level SNARK system for verifiable VNF query is proposed, where the first SNARK "prunes" the VNF dictionary to narrow down the query space and the second SNARK completes the query. By doing so, unnecessary memory accesses are avoided when verifiable VNF queries are represented as an arithmetic circuit. Moreover, a dictionary pruning mechanism is designed to securely generate a compact authenticator for the pruned dictionary. Extensive experiments demonstrate that the proposed dictionary pruning strategy achieves a significant performance increase in proof generation compared with the traditional SNARK-based verifiable query scheme. A real-world blockchain testing network based on Hyperledger Fabric is set up to showcase the feasibility of the proposed scheme. In Chap. 5, to design fair data marketing, we propose an efficient and fair data marketing scheme based on the consortium blockchain [46]. The proposed scheme introduces a hybrid data marketing model that complies with GDPR, where the cloud server serves as a data storage unit and the blockchain serves as a control unit. More specifically, data are encrypted before being outsourced to the cloud server, and data decryption keys are traded securely on the blockchain. The proposed scheme designs a set of zero-knowledge proof protocols for commit-and-prove data marketing operations of data owners, data buyers, and the cloud server. With the ZKP protocols and financial incentives, the proposed scheme sets the marketing


workflow carefully to ensure that rational entities honestly conduct marketing operations and that marketing misbehavior can be efficiently detected. Detailed security analysis demonstrates that the proposed scheme achieves fair data marketing, where data owners can only be paid if data buyers get the correct data. Moreover, consortium management of anonymous credentials for data owners is achieved with distributed credential issuance and threshold identity tracing. Extensive experiments are conducted using a real-world testing blockchain network to show the feasibility and efficiency of the proposed scheme. In Chap. 6, we conclude this monograph and discuss future directions of blockchain-based data security approaches in HCN. Three representative blockchain-based data security approaches are investigated in this monograph to shed light on the practical design and analysis of blockchain-based DM in HCN. For AI-assisted data processing, more research efforts should be directed to the design of on/off-chain computation models with modular instantiations and to multi-party fair AI model sharing with efficient verification.

References

1. J. De Vriendt, P. Lainé, C. Lerouge, and X. Xu, "Mobile network evolution: A revolution on the move," IEEE Communications Magazine, vol. 40, no. 4, pp. 104–111, 2002.
2. P. Rost, A. Banchs, I. Berberana, M. Breitbach, M. Doll, H. Droste, C. Mannweiler, M. A. Puente, K. Samdanis, and B. Sayadi, "Mobile network architecture evolution toward 5G," IEEE Communications Magazine, vol. 54, no. 5, pp. 84–91, 2016.
3. X. You, C.-X. Wang, J. Huang, X. Gao, Z. Zhang, M. Wang, Y. Huang, C. Zhang, Y. Jiang, J. Wang et al., "Towards 6G wireless communication networks: Vision, enabling technologies, and new paradigm shifts," Science China Information Sciences, vol. 64, pp. 1–74, 2021.
4. J. Feng, F. R. Yu, Q. Pei, J. Du, and L. Zhu, "Joint optimization of radio and computational resources allocation in blockchain-enabled mobile edge computing systems," IEEE Transactions on Wireless Communications, vol. 19, no. 6, pp. 4321–4334, 2020.
5. S. Dang, O. Amin, B. Shihada, and M.-S. Alouini, "What should 6G be?" Nature Electronics, vol. 3, no. 1, pp. 20–29, 2020.
6. T. Ma, B. Qian, X. Qin, X. Liu, H. Zhou, and L. Zhao, "Satellite-terrestrial integrated 6G: An ultra-dense LEO networking management architecture," IEEE Wireless Communications, 2022.
7. N. Kato, Z. M. Fadlullah, F. Tang, B. Mao, S. Tani, A. Okamura, and J. Liu, "Optimizing space-air-ground integrated networks by artificial intelligence," IEEE Wireless Communications, vol. 26, no. 4, pp. 140–147, 2019.
8. X. Shen, J. Gao, W. Wu, K. Lyu, M. Li, W. Zhuang, X. Li, and J. Rao, "AI-assisted network-slicing based next-generation wireless networks," IEEE Open Journal of Vehicular Technology, vol. 1, pp. 45–66, 2020.
9. W. Zhuang, Q. Ye, F. Lyu, N. Cheng, and J. Ren, "SDN/NFV-empowered future IoV with enhanced communication, computing, and caching," Proceedings of the IEEE, vol. 108, no. 2, pp. 274–291, 2019.
10. V. C. Müller and N. Bostrom, "Future progress in artificial intelligence: A survey of expert opinion," Fundamental Issues of Artificial Intelligence, pp. 555–572, 2016.
11. Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge intelligence: Paving the last mile of artificial intelligence with edge computing," Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.


12. W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, "Federated learning in mobile edge networks: A comprehensive survey," IEEE Communications Surveys & Tutorials, vol. 22, no. 3, pp. 2031–2063, 2020.
13. Z. Yang, M. Chen, K.-K. Wong, H. V. Poor, and S. Cui, "Federated learning for 6G: Applications, challenges, and opportunities," Engineering, vol. 8, pp. 33–41, 2022.
14. H. Wu, F. Lyu, C. Zhou, J. Chen, L. Wang, and X. Shen, "Optimal UAV caching and trajectory in aerial-assisted vehicular networks: A learning-based approach," IEEE Journal on Selected Areas in Communications, vol. 38, no. 12, pp. 2783–2797, 2020.
15. R. Li and H. Asaeda, "A blockchain-based data life cycle protection framework for information-centric networks," IEEE Communications Magazine, vol. 57, no. 6, pp. 20–25, 2019.
16. G. P. Freund, P. B. Fagundes, and D. D. J. de Macedo, "An analysis of blockchain and GDPR under the data lifecycle perspective," Mobile Networks and Applications, vol. 26, pp. 266–276, 2021.
17. X. Shen, D. Liu, C. Huang, L. Xue, H. Yin, W. Zhuang, R. Sun, and B. Ying, "Blockchain for transparent data management toward 6G," Engineering, vol. 8, pp. 74–85, 2022.
18. General Data Protection Regulation (GDPR). https://gdpr-info.eu. Accessed October 2023.
19. S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," Decentralized Business Review, p. 21260, 2008.
20. X. Zheng, R. R. Mukkamala, R. Vatrapu, and J. Ordieres-Mere, "Blockchain-based personal health data sharing system using cloud storage," in Proc. of Healthcom, 2018, pp. 1–6.
21. H.-N. Dai, Z. Zheng, and Y. Zhang, "Blockchain for internet of things: A survey," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8076–8094, 2019.
22. G. Wood, "Ethereum: A secure decentralised generalised transaction ledger, Byzantium version," Ethereum Project Yellow Paper, pp. 1–39, 2018.
23. J. Mendling, I. Weber, W. V. D. Aalst, J. V. Brocke, C. Cabanillas, F. Daniel, S. Debois, C. D. Ciccio, M. Dumas, S. Dustdar et al., "Blockchains for business process management: Challenges and opportunities," ACM Transactions on Management Information Systems (TMIS), vol. 9, no. 1, p. 4, 2018.
24. M. Yuan, Y. Xu, C. Zhang, Y. Tan, Y. Wang, J. Ren, and Y. Zhang, "Trucon: Blockchain-based trusted data sharing with congestion control in internet of vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 3, pp. 3489–3500, 2022.
25. Z. Xiong, Y. Zhang, D. Niyato, P. Wang, and Z. Han, "When mobile blockchain meets edge computing," IEEE Communications Magazine, vol. 56, no. 8, pp. 33–39, 2018.
26. C. Xu, K. Wang, P. Li, S. Guo, J. Luo, B. Ye, and M. Guo, "Making big data open in edges: A resource-efficient blockchain-based approach," IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 4, pp. 870–882, 2018.
27. C. Zhang, M. Zhao, L. Zhu, W. Zhang, T. Wu, and J. Ni, "Fruit: A blockchain-based efficient and privacy-preserving quality-aware incentive scheme," IEEE Journal on Selected Areas in Communications, vol. 40, no. 12, pp. 3343–3357, 2022.
28. N. B. Truong, K. Sun, G. M. Lee, and Y. Guo, "GDPR-compliant personal data management: A blockchain-based solution," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1746–1761, 2020.
29. C. Lin, D. He, X. Huang, and K.-K. R. Choo, "OBFP: Optimized blockchain-based fair payment for outsourcing computations in cloud computing," IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3241–3253, 2021.
30. M. Li, Y. Chen, L. Zhu, Z. Zhang, J. Ni, C. Lal, and M. Conti, "Astraea: Anonymous and secure auditing based on private smart contracts for donation systems," IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3002–3018, 2023.
31. M. Li, J. Weng, J.-N. Liu, X. Lin, and C. Obimbo, "Toward vehicular digital forensics from decentralized trust: An accountable, privacy-preserving, and secure realization," IEEE Internet of Things Journal, vol. 9, no. 9, pp. 7009–7024, 2021.


32. D. B. Rawat, "Fusion of software defined networking, edge computing, and blockchain technology for wireless network virtualization," IEEE Communications Magazine, vol. 57, no. 10, pp. 50–55, 2019.
33. J. Weng, J. Weng, J. Zhang, M. Li, Y. Zhang, and W. Luo, "Deepchain: Auditable and privacy-preserving deep learning with blockchain-based incentive," IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 5, pp. 2438–2455, 2019.
34. Z. Su, Y. Wang, Q. Xu, and N. Zhang, "LVBS: Lightweight vehicular blockchain for secure data sharing in disaster rescue," IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 1, pp. 19–32, 2020.
35. S. Bowe, A. Chiesa, M. Green, I. Miers, P. Mishra, and H. Wu, "Zexe: Enabling decentralized private computation," in Proc. of IEEE S&P, 2020, pp. 947–964.
36. A. Kosba, A. Miller, E. Shi, Z. Wen, and C. Papamanthou, "Hawk: The blockchain model of cryptography and privacy-preserving smart contracts," in Proc. of IEEE S&P, 2016, pp. 839–858.
37. M. Li, J. Weng, A. Yang, J.-N. Liu, and X. Lin, "Towards blockchain-based fair and anonymous ad dissemination in vehicular networks," vol. 68, no. 11, 2019, pp. 11248–11259.
38. A. Chen, Y. Wu, A. Haeberlen, B. T. Loo, and W. Zhou, "Data provenance at internet scale: Architecture, experiences, and the road ahead," in Proc. of CIDR, 2017.
39. L. Liu, J. Feng, Q. Pei, C. Chen, Y. Ming, B. Shang, and M. Dong, "Blockchain-enabled secure data sharing scheme in mobile-edge computing: An asynchronous advantage actor–critic learning approach," IEEE Internet of Things Journal, vol. 8, no. 4, pp. 2342–2353, 2020.
40. X. Liang, S. Shetty, D. Tosh, C. Kamhoua, K. Kwiat, and L. Njilla, "Provchain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability," in Proc. of IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017, pp. 468–477.
41. W. Zhou, Q. Fei, A. Narayan, A. Haeberlen, B. T. Loo, and M. Sherr, "Secure network provenance," in Proc. of ACM Symposium on Operating Systems Principles, 2011, pp. 295–310.
42. D. Liu, J. Ni, C. Huang, X. Lin, and X. Shen, "Secure and efficient distributed network provenance for IoT: A blockchain-based approach," IEEE Internet of Things Journal, vol. 7, no. 8, pp. 7564–7574, 2020.
43. S. Hu, C. Cai, Q. Wang, C. Wang, X. Luo, and K. Ren, "Searching an encrypted cloud meets blockchain: A decentralized, reliable and fair realization," in Proc. of IEEE INFOCOM, 2018, pp. 792–800.
44. H. Wu, Z. Peng, S. Guo, Y. Yang, and B. Xiao, "VQL: Efficient and verifiable cloud query services for blockchain systems," IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 6, pp. 1393–1406, 2021.
45. J. Backman, S. Yrjölä, K. Valtanen, and O. Mämmelä, "Blockchain network slice broker in 5G: Slice leasing in factory of the future use case," in Internet of Things Business Models, Users, and Networks, 2017, pp. 1–8.
46. D. Liu, C. Huang, J. Ni, X. Lin, and X. Shen, "Blockchain-cloud transparent data marketing: Consortium management and fairness," IEEE Transactions on Computers, vol. 71, no. 12, pp. 3322–3335, 2022.
47. M. S. Rahman, A. Al Omar, M. Z. A. Bhuiyan, A. Basu, S. Kiyomoto, and G. Wang, "Accountable cross-border data sharing using blockchain under relaxed trust assumption," IEEE Transactions on Engineering Management, vol. 67, no. 4, pp. 1476–1486, 2020.
48. J. Kang, R. Yu, X. Huang, M. Wu, S. Maharjan, S. Xie, and Y. Zhang, "Blockchain for secure and efficient data sharing in vehicular edge computing and networks," IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4660–4670, 2018.
49. Y. Qu, S. Du, S. Li, Y. Meng, L. Zhang, and H. Zhu, "Automatic permission optimization framework for privacy enhancement of mobile applications," IEEE Internet of Things Journal, vol. 8, no. 9, pp. 7394–7406, 2020.


50. L. Zhou, C. Wei, T. Zhu, G. Chen, X. Zhang, S. Du, H. Cao, and H. Zhu, "Policycomp: Counterpart comparison of privacy policies uncovers overbroad personal data collection practices," in Proc. of USENIX Security, 2023, pp. 1073–1090.
51. D. Liu, C. Huang, L. Xue, J. Hou, X. Shen, W. Zhuang, R. Sun, and B. Ying, "Authenticated and prunable dictionary for blockchain-based VNF management," IEEE Transactions on Wireless Communications, vol. 21, no. 11, pp. 9312–9324, 2022.

Chapter 2

Fundamental Data Security Technologies

2.1 Basic Crypto Technologies

Basic crypto technologies for blockchain data security are presented, including notations, digital signature, data encryption, and hash function.

2.1.1 Notations

Let a set of cyclic groups of prime order $p$ be defined as follows [1, 2]:
$$\mathbb{G} = \{\mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T\}. \tag{2.1}$$
Let $g$ and $\tilde{g}$ be two generators of $\mathbb{G}_1$ and $\mathbb{G}_2$, respectively. An efficient bilinear pairing is computed as follows:
$$e(g^a, \tilde{g}^b) = e(g, \tilde{g})^{ab}, \tag{2.2}$$
where $a, b \in \mathbb{Z}_p$. We consider a Type III pairing:
$$\mathbb{G}_1 \neq \mathbb{G}_2, \tag{2.3}$$
and there is no efficient homomorphism between $\mathbb{G}_1$ and $\mathbb{G}_2$.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 D. Liu, X. (Sherman) Shen, Blockchain-Based Data Security in Heterogeneous Communications Networks, Wireless Networks, https://doi.org/10.1007/978-3-031-52477-6_2


2.1.2 Digital Signature

There are two entities in a digital signature scheme [3]: a message sender and a message receiver. The sender demonstrates to the receiver that a transmitted message is authentic by generating a digital string (signature) for the message. A digital signature scheme usually consists of three algorithms:

• KeyGen: The algorithm generates a public/private key pair for the message sender. It is computationally infeasible to derive the private key from the public key.
• Sign: The algorithm takes a digital message and the private key and outputs a digital signature.
• Verify: The algorithm takes the signature, the message, and the public key and outputs accept or reject.

The basic security property of a digital signature scheme is unforgeability. It requires that an adversary cannot forge a valid signature for a message that it has not queried. Due to its authenticity and non-repudiation properties, the digital signature plays an important role in identifying message sources and serving as provenance evidence in communication systems.
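The KeyGen/Sign/Verify interface above can be illustrated with a toy "hash-then-RSA" signature. This is a sketch for illustration only: the fixed tiny primes and the textbook construction are assumptions we make here, not a scheme used in practice (real systems use vetted libraries with, e.g., ECDSA or 2048-bit-plus RSA keys).

```python
import hashlib

def keygen():
    # Fixed small primes for illustration; a real KeyGen samples large random primes.
    p, q = 61, 53
    n = p * q                      # public modulus
    phi = (p - 1) * (q - 1)
    e = 17                         # public exponent
    d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)
    return (n, e), (n, d)          # (public key, private key)

def sign(message: bytes, priv) -> int:
    n, d = priv
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(h, d, n)            # signature = h^d mod n

def verify(message: bytes, sig: int, pub) -> bool:
    n, e = pub
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(sig, e, n) == h     # accept iff sig^e mod n equals the message hash

pub, priv = keygen()
sig = sign(b"hello", priv)
assert verify(b"hello", sig, pub)  # a valid signature verifies under the public key
```

Note how unforgeability hinges on the one-wayness of recovering d from (n, e): anyone can verify with the public key, but only the private-key holder can produce a signature that verifies.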

2.1.3 Data Encryption

Data encryption is used to preserve data confidentiality in communication networks by encrypting and decrypting messages with digital keys. Anyone without a correct decryption key cannot efficiently decrypt an encrypted message. According to the types of keys, there are two categories of data encryption: symmetric key-based encryption and public key-based encryption.

2.1.3.1 Symmetric Key-Based Encryption

In a symmetric key-based encryption scheme, the encryption key is the same as the decryption key. Block ciphers, such as the Advanced Encryption Standard (AES) [4], are symmetric key-based, where messages are processed in blocks of equal size. As symmetric key-based schemes are fast in encryption and decryption, they are usually used for processing large volumes of data, such as sharing video files or encrypting end-to-end communications.
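As a sketch of the symmetric setting, the toy stream cipher below derives a keystream from SHA-256 in counter mode and XORs it with the message. It illustrates only the defining property that one shared key both encrypts and decrypts; it is not AES, and the construction is an illustrative assumption, not something to deploy.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Expand (key, nonce) into a pseudorandom byte stream of the needed length.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so the same function encrypts and decrypts.
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
ct = xor_crypt(key, nonce, b"end-to-end message")
assert xor_crypt(key, nonce, ct) == b"end-to-end message"  # same key decrypts
```

The nonce must never be reused with the same key; real stream and block cipher modes (e.g., AES-GCM) enforce this discipline and additionally authenticate the ciphertext.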

2.1.3.2 Public Key-Based Encryption

In a public key-based encryption scheme, the encryption key differs from the decryption key [5]. More specifically, messages are encrypted using a public key, while the encrypted messages are decrypted using the corresponding private key.


It should not be computationally feasible to derive the private key from the public key. Compared with symmetric key-based encryption, the computation overhead of public key-based encryption is much higher. As a result, public key-based encryption is usually used to encrypt short messages. A typical example of public key-based encryption is the ElGamal encryption scheme, which consists of the following algorithms:

• $\mathsf{KeyGen}(\mathbb{G}) \rightarrow (sk, pk)$: This algorithm sets a secret key $sk$ as a randomly chosen element from $\mathbb{Z}_p$ and computes a public key as
$$pk = g^{sk}. \tag{2.4}$$
The algorithm outputs a private/public key pair
$$(sk, pk). \tag{2.5}$$
• $\mathsf{Enc}(m, pk) \rightarrow c$: This algorithm takes a message $m \in \mathbb{Z}_p$ and outputs an ElGamal encryption of the message as follows:
$$c = (c_1, c_2) = (g^r, pk^r g^m), \tag{2.6}$$
where $r$ is randomly chosen from $\mathbb{Z}_p$.
• $\mathsf{Dec}(c, sk) \rightarrow g^m$: This algorithm uses the secret key to decrypt a ciphertext $c$ as follows:
$$g^m = c_2/c_1^{sk}. \tag{2.7}$$

Symmetric key-based and public key-based encryption schemes can work together to help establish secure communication channels [6]. For example, key negotiation materials can be encrypted using the public key-based scheme to help derive a symmetric key for encrypting future communication messages.
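The ElGamal algorithms (2.4)–(2.7) can be sketched directly in code. The toy group below (the order-233 subgroup of squares modulo the safe prime 467) is an illustrative assumption; deployments use large standardized groups or elliptic curves. As in Eq. (2.7), decryption recovers $g^m$ rather than $m$ itself.

```python
import secrets

P = 467   # safe prime: P = 2q + 1 (toy size, cryptographically insecure)
q = 233   # prime order of the subgroup of squares modulo P
g = 4     # generator of the order-q subgroup (4 = 2^2)

def keygen():
    sk = secrets.randbelow(q - 1) + 1
    pk = pow(g, sk, P)                        # Eq. (2.4): pk = g^sk
    return sk, pk

def enc(m: int, pk: int):
    r = secrets.randbelow(q - 1) + 1          # fresh randomness per ciphertext
    c1 = pow(g, r, P)                         # Eq. (2.6): c = (g^r, pk^r * g^m)
    c2 = (pow(pk, r, P) * pow(g, m, P)) % P
    return c1, c2

def dec(c, sk: int) -> int:
    c1, c2 = c
    # Eq. (2.7): g^m = c2 / c1^sk; division is multiplication by the inverse,
    # and c1^(P-1-sk) = c1^(-sk) mod P by Fermat's little theorem.
    return (c2 * pow(c1, P - 1 - sk, P)) % P

sk, pk = keygen()
m = 42
assert dec(enc(m, pk), sk) == pow(g, m, P)    # decryption recovers g^m, not m
```

Recovering $m$ from $g^m$ requires solving a discrete logarithm, which is only practical when $m$ is drawn from a small known range; this "lifted" form is nonetheless common in voting and aggregation applications because it makes the scheme additively homomorphic.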

2.1.4 Hash Function

A hash function $H$ [7] is a one-way function
$$H: \{0, 1\}^* \rightarrow \{0, 1\}^l \tag{2.8}$$
that maps a message of arbitrary length to an $l$-bit string. For different hash algorithms, such as the Secure Hash Algorithm (SHA), $l$ can be set to 256, 512, etc. The hash function should have the following properties:

• Collision resistance: It should be computationally infeasible to find two strings that have the same hash value.
• One-wayness: It is easy to compute the hash value of an input. However, there is no efficient method other than brute-force search to find an input for a given hash value.
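A quick illustration of Eq. (2.8) with SHA-256 ($l = 256$), using Python's standard library:

```python
import hashlib

# SHA-256 maps arbitrary-length input to a fixed 256-bit digest.
msg = b"blockchain"
digest = hashlib.sha256(msg).hexdigest()
assert len(digest) == 64                       # 256 bits = 64 hex characters

# In practice a one-bit input change yields an unrelated-looking digest,
# which is what makes brute-force search the only way to invert H.
assert hashlib.sha256(b"blockchaiN").hexdigest() != digest
```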

2.2 Basic Blockchain Technologies

Blockchain is an emerging technology that achieves decentralized trust among peer-to-peer nodes. As its name suggests, the blockchain's data structure is a chain of blocks, and each block consists of dozens of peer-to-peer transactions. Blockchain was first proposed as the underlying architecture for a cryptocurrency, i.e., Bitcoin [8], where any node in the network can join Bitcoin and make transactions with a self-generated identity and a public blockchain address. To ensure the consistency of the transaction records among (potentially malicious) peer nodes, blockchain uses consensus protocols and reward mechanisms. Since 2009, Bitcoin has been a popular decentralized finance system with a rapidly changing price. Beyond cryptocurrency, it was later discovered that blockchain storage can be used to store more than transaction information, e.g., stock prices. At the same time, a successful transaction can change the state of the blockchain. If the blockchain is equipped with a scripting language, it can serve as a trusted program execution environment among distributed nodes. In 2015, Ethereum [9] was proposed to support this programming functionality. The key technology in Ethereum is the smart contract, which has significantly boosted the development of decentralized applications. Bitcoin and Ethereum can be classified as public blockchains, where any node in the network can join with an anonymous identity. As nodes can be malicious, e.g., posting incorrect information or delaying transactions, consensus protocols for public blockchains are designed to achieve ledger consistency, which results in low transaction throughput and long transaction confirmation times. For peer nodes with a certain level of mutual trust, such as industrial partners, a consortium blockchain can be built with more efficient identity management and consensus protocols.

A typical example of the consortium blockchain is Hyperledger Fabric [10], which is widely used in distributed applications such as supply chain management and certificate management. In the following, we use these three blockchains, i.e., Bitcoin, Ethereum, and Hyperledger Fabric, as concrete examples to discuss essential technological aspects of blockchain.


2.2.1 Data Structures

Blockchain is a chain of blocks of peer-to-peer transactions. Each block contains the following essential information:

• Transactions: Transactions are organized into a Merkle tree whose root is stored in the block header.
• Hash of previous block: Blocks are chained together by the hashes of their preceding blocks. Once a block is confirmed on the blockchain, it can no longer be maliciously changed due to the difficulty of finding collisions for hash functions.
• Nonce: It is used in the Bitcoin blockchain as the answer to a hash puzzle.

There is other information in a block, such as block size and block reward, which is not discussed in detail in this chapter.
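The structure above can be sketched minimally as follows. The field names and the hashing of a JSON serialization are illustrative assumptions, not any specific blockchain's wire format, but the sketch captures the two key ideas: a Merkle root summarizes the transactions, and each block commits to its predecessor's hash.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash a canonical (sorted-key) serialization of the block header fields.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def merkle_root(tx_hashes):
    # Pairwise-hash transaction digests level by level up to a single root.
    level = list(tx_hashes) if tx_hashes else [hashlib.sha256(b"").hexdigest()]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate last node on odd levels
        level = [hashlib.sha256((a + b).encode()).hexdigest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]

genesis = {"prev_hash": "0" * 64, "merkle_root": merkle_root(["tx1", "tx2"]), "nonce": 0}
block1 = {"prev_hash": block_hash(genesis), "merkle_root": merkle_root(["tx3"]), "nonce": 0}

# Tampering with any field of genesis changes its hash, so block1's
# prev_hash link would no longer match the altered block.
tampered = dict(genesis, merkle_root=merkle_root(["evil"]))
assert block_hash(tampered) != block1["prev_hash"]
```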

2.2.2 Identity and Transaction Management

There are two kinds of identity and transaction management, for public and consortium blockchains, respectively.

• Public blockchain: Any node in the network can self-generate a public/private key pair of a digital signature scheme, e.g., ECDSA. The node keeps the private key securely in its local storage and publishes the public key along with a blockchain address. Transactions in public blockchains are based on the Unspent Transaction Output (UTXO) model [8]. Generally speaking, the balance of a blockchain account comes from unspent transactions where the account is the recipient. To this end, each transaction references one or more previous unspent transactions and the recipient's public key. To demonstrate ownership of the unspent transactions, the transaction sender signs the transaction with the private key corresponding to the recipient's public key in those unspent transactions.
• Consortium blockchain: Nodes in a consortium blockchain usually have a certain degree of mutual trust and do not necessarily need to be anonymous. As a result, there is an identity management entity in the consortium blockchain, which can deploy existing techniques to assign blockchain node credentials. For example, Hyperledger Fabric [10] adopts an existing certificate-based mechanism where a chain of trust for blockchain certificates is required. At the same time, consortium blockchains mostly use the smart contract to store shared business logic and data among industrial partners.

Another type of transaction management mechanism that has not been widely used is the account-based blockchain. More specifically, all blockchain nodes collectively maintain a vector commitment [11] to manage account and transaction information. Each entry of the vector stores the balance, public key, etc. of a blockchain account. The owner of the account can use its private key to send and receive transactions with (zero-knowledge) membership proofs.


Fig. 2.1 Forks in blockchain: miner 1 and miner 2 each propose a different next block of transactions, resulting in two competing chains

2.2.3 Consensus Protocol and Reward Mechanism

In the Bitcoin blockchain, there are special entities (i.e., miners) that help collect transactions and propose new blocks. More specifically, miners listen to the blockchain network for new transactions, verify the balance and ownership of the transactions, pack dozens of valid transactions into a new block, and propose the block to the blockchain as the next block. In the ideal case, only a single miner proposes a new block at a time. However, as there can be many miners in the blockchain network, more than one miner can propose a different block at the same time. A typical example of miners competing with each other is shown in Fig. 2.1. Miner 1 would like to propose a block of some transactions to the blockchain, while miner 2 would like to propose another, different block. Without proper countermeasures, chain forks [12] can happen in the blockchain, which results in more than one valid chain and breaks ledger consistency on a global scale.


To prevent forks from happening, or to reduce the probability of fork generation, the blockchain needs consensus protocols. In the following, we discuss three typical consensus settings.

• Proof-of-Work (PoW) [8] is the consensus protocol initially adopted in the Bitcoin blockchain. In a PoW blockchain, to propose a new block, miners are required to solve a hash puzzle. More specifically, miners need to find a specific nonce for the proposed block such that, with this nonce and other block information as input, the first few bits of the hash output are all 0. To find the nonce, miners have no better method than searching the whole input space, which is called mining. As mining increases the difficulty of proposing the next block, the probability of fork generation is reduced [12]. However, forks can still happen occasionally even with the mining process. To further reduce forks, PoW requires that a transaction can only be used after k consecutive blocks have been appended. That is, the consistency of the blockchain is guaranteed among peer nodes in the long run.
• Proof-of-Stake (PoS) [13] is another type of consensus protocol. In contrast to PoW, PoS reduces the probability of fork generation by pre-determining block generators for each time slot. Global time is divided into equal slots in PoS, and each of the following two phases occupies a specific number of slots.
  – Leader Selection: This phase selects slot leaders for future block generations. The critical part is to make the leader selection process distributed and random by adopting a publicly verifiable secret sharing technique. At the same time, a node with more "stake," i.e., cryptocurrency, should be more likely to be selected as a slot leader.
  – Block Generation: After the leaders are selected, slot leaders can collect valid transactions and propose new blocks during their slots. A proposed block becomes valid after confirmations from other blockchain nodes.

The leader selection and block generation phases can be executed repeatedly. Compared with PoW, PoS requires far fewer computing resources, since no hash puzzle needs to be solved.
• Consortium blockchain: As there is usually more trust among nodes in a consortium blockchain, more efficient consensus protocols can be deployed. For example, in Hyperledger Fabric, the Raft ordering service or a Byzantine Fault-Tolerant protocol can be implemented.

The consensus protocols consume resources from participating nodes, such as miners and slot leaders. In public blockchains, to motivate nodes to collect transactions and propose blocks, there are mainly the following incentive mechanisms:

• Block Reward: Each block creator obtains a specific amount of cryptocurrency after the proposed block is accepted in the blockchain.
• Transaction Fee: When a node makes a transaction, the node should include a "tip" from its unspent balances. In the smart contract of Ethereum, the tip is called


a gas fee and is mainly dependent on the computation and storage complexity of a contract execution.
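The PoW mining loop described above amounts to a brute-force nonce search. A toy version is sketched below, where requiring a number of leading zero hex digits stands in for "the first few bits are all 0"; the block fields and JSON hashing are illustrative assumptions, not Bitcoin's actual format.

```python
import hashlib
import json

def mine(block: dict, difficulty: int = 4) -> dict:
    # Brute-force search for a nonce so the block hash has `difficulty`
    # leading zero hex digits (i.e., 4*difficulty leading zero bits).
    nonce = 0
    while True:
        candidate = dict(block, nonce=nonce)
        h = hashlib.sha256(json.dumps(candidate, sort_keys=True).encode()).hexdigest()
        if h.startswith("0" * difficulty):
            return candidate
        nonce += 1

mined = mine({"prev_hash": "0" * 64, "merkle_root": "abc"})
h = hashlib.sha256(json.dumps(mined, sort_keys=True).encode()).hexdigest()
assert h.startswith("0000")   # verification is a single hash computation
```

The asymmetry is the point: finding the nonce takes exponentially many hash evaluations in the difficulty, while any node can verify the proposed block with one hash.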

2.2.4 Smart Contract

A smart contract is a computer program stored and executed on a blockchain, such as Ethereum. It specifies the terms and conditions for blockchain state updates. In Ethereum [9], the code of a smart contract is stored on the blockchain. Blockchain nodes can trigger contract execution by sending a transaction that includes application data and the address of the contract, which will be verified by blockchain nodes. If the transaction is confirmed, the state of the blockchain storage, e.g., the stock number of digital goods, is changed. Ethereum provides a scripting language, i.e., Solidity, for programming smart contracts. In Hyperledger Fabric [10], smart contracts are referred to as chaincodes. Instead of providing a new scripting language, Hyperledger Fabric supports Go, Node.js, and Java. Blockchain peer nodes can write chaincodes with data structures and use a set of chaincode functions to access or change the data. Chaincodes are packaged, approved, and committed in the blockchain by peer nodes before being executed.

2.2.5 Channel in Hyperledger Fabric

Compared with the Bitcoin or Ethereum blockchain, Hyperledger Fabric introduces a new concept, the channel, which can also be regarded as a sub-chain. More specifically, each channel in Hyperledger Fabric can have its own chain storage, channel members, and consensus protocols. Blockchain nodes must first agree on the same channel parameters and join the channel before proposing and executing chaincodes. With channels, Hyperledger Fabric provides a more flexible plug-in chain management method for blockchain applications.

2.2.6 Performance Metrics

The following metrics are essential for measuring blockchain performance:

• Transactions per hour: This metric indicates the number of transactions that a blockchain can confirm in an hour, which is used to measure the throughput of a blockchain.
• Block time/transaction confirmation time: Block time indicates the average time required for a new block to be accepted by the blockchain. Transaction confirmation time refers to the average time required for a transaction to be confirmed on the blockchain. The two metrics are used to measure the delay of the blockchain.
• Chain size: It is the total size of all blocks in a blockchain system. Each blockchain full node needs to store all the blocks.
• Ledger robustness: A blockchain should achieve Persistence and Liveness [13]. Informally, Persistence requires that blockchain nodes reach a consistent view of the shared ledger; Liveness requires that a valid transaction eventually becomes "stable" in the blockchain.

Public blockchains (i.e., Bitcoin and Ethereum) have low transaction throughput and long delays. The chain size of public blockchains keeps increasing and had reached several hundred GB by 2023. Consortium blockchains (e.g., Hyperledger Fabric) are set up for industrial applications with flexible plug-in settings and are often more efficient than public blockchains in terms of throughput and delay. Ledger robustness depends on the underlying consensus protocols and the trust assumptions among peer nodes.

2.2.7 Testing Network

There are various testing networks for developing blockchain-based applications, such as Parity Ethereum. Developers can choose different network settings and test the performance metrics discussed above. For smart contracts on public blockchains, developers can use the online Remix Ethereum IDE to write and test code before deploying it on the actual blockchain network. For consortium blockchains, Hyperledger Fabric is an open-source project that supports a wide range of network settings, including channels, consensus protocols, the number of peer nodes, endorsement policy, etc. It also provides a test network with default shell scripts. Developers can test the feasibility of chaincodes and extend the test network to more complex settings.

2.3 Privacy-Enhancing Technologies for Blockchain

Blockchain is a shared ledger among distributed nodes. That is, on-chain data are transparent to all nodes, which can sometimes cause privacy concerns. In this section, we will discuss privacy-enhancing techniques for the blockchain, including cryptographic commitment, zero-knowledge proof, zero-knowledge succinct non-interactive argument of knowledge, and anonymous credential.


2.3.1 Cryptographic Commitment

A cryptographic commitment scheme [14, 15] can "digest" a digital message into a succinct message, i.e., a commitment. The commitment can later be opened directly or in a privacy-preserving manner. Informally, it consists of two algorithms:
• Commit: The algorithm takes a message from a pre-defined message space and generates a commitment to the message.
• Open: The algorithm reveals the original message to a verifier, who can check whether the commitment was correctly generated from the revealed message.
For example, a collision-resistant hash function can be regarded as a special commitment scheme: a message is "committed" by calculating its hash value as the commitment, and anyone can verify the commitment by recomputing the hash. A commitment scheme can have the following security properties:
• Binding: It is not computationally feasible to open a commitment to a different message than the one digested in the commitment.
• Hiding: Computationally bounded adversaries should not be able to distinguish commitments to different messages.
Commitment schemes have a variety of applications on the blockchain:
• Data provenance: Commitments can serve as reliable records of data, which can be used for data provenance and tracing.
• Data privacy: Combined with zero-knowledge proof techniques, commitments can be used to prove certain properties of the original data without revealing the data.
• Succinct storage: Many commitment schemes have constant-size commitments, such as Pedersen commitment. Compared with directly storing the original data on the blockchain, storing commitments is much more storage efficient.
In the following, we will investigate several commitment schemes and present their applications on the blockchain. Note that the notations of bilinear groups are reused.
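The hash-based example above can be sketched in a few lines of Python. One assumption beyond the text: a random nonce r is prepended before hashing so that commitments to low-entropy messages remain hiding; this is a common hardening step rather than part of the scheme described above.

```python
import hashlib
import secrets

def commit(message, r=None):
    """Commit to a message; the nonce r provides hiding."""
    r = secrets.token_bytes(16) if r is None else r
    com = hashlib.sha256(r + message).hexdigest()
    return com, r          # publish com; keep (message, r) to open later

def open_verify(com, message, r):
    """Open: re-hash the revealed message and nonce and compare."""
    return com == hashlib.sha256(r + message).hexdigest()
```

Binding follows from the collision resistance of SHA-256: opening the same commitment to a different (message, nonce) pair would yield a hash collision.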

2.3.1.1 Pedersen Commitment

Pedersen commitment [14] is often written as follows:

Com = g0^r · g1^a,  (2.9)

where g0, g1 ∈ G1 are linearly independent generators, a ∈ Zp is the message to be committed, and r ∈ Zp is a random number.


Pedersen commitment can be used for committing to a single secret, such as the balance of a blockchain account. With a range proof system, one can prove that an account has sufficient balance without revealing the balance.
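A minimal sketch of Eq. 2.9 in Python, using deliberately tiny, insecure toy parameters (p = 2039 = 2q + 1 with q = 1019; g0 = 4 and g1 = 9 lie in the order-q subgroup). A real deployment would use a cryptographically sized group in which the discrete logarithm between g0 and g1 is unknown.

```python
# Toy group parameters (illustration only, far too small to be secure)
p, q = 2039, 1019        # p = 2q + 1 is a safe prime
g0, g1 = 4, 9            # generators of the order-q subgroup

def commit(a, r):
    """Pedersen commitment Com = g0^r * g1^a mod p (Eq. 2.9)."""
    return pow(g0, r, p) * pow(g1, a, p) % p

def open_check(com, a, r):
    """Open by revealing (a, r); the verifier recomputes the commitment."""
    return com == commit(a, r)
```

The scheme is additively homomorphic: commit(a1, r1) · commit(a2, r2) mod p equals commit(a1 + a2, r1 + r2), which is the property range-proof systems build on.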

2.3.1.2 Polynomial Commitment

Polynomial commitment allows a prover to commit to a polynomial:

f(x) = a0 + a1·x + a2·x^2 + … + ad·x^d,  (a0, a1, …, ad) ∈ Zp^{d+1}.  (2.10)

ai is a coefficient of the polynomial, whose degree is bounded by d. Later, the prover can open the commitment at any point:

(xi, f(xi)).  (2.11)

A strawman solution to the polynomial commitment is to commit to each coefficient individually using Pedersen commitment, which results in a commitment size linear in the degree. A representative construction that achieves constant-size polynomial commitment is proposed in [16]. Compared with Pedersen commitment, the construction in [16] relies on a knowledge-based assumption, where a reference string must be set up:

(g, g^s, g^{s^2}, …, g^{s^d}) ∈ G1^{d+1}.  (2.12)

g ∈ G1 is a random generator and s is a trapdoor secret. That is, the setup of the above reference string must be conducted by a trusted authority, and the trapdoor secret must be destroyed after the setup. Another approach is to apply a secure multi-party computation protocol to construct the reference string [17]. With the reference string, a commitment to f(x) is calculated as follows:

Com = ∏_{i=0}^{d} (g^{s^i})^{ai},  (2.13)

where g^{s^i} is from the common reference string and ai is a coefficient of f(x). Opening the commitment at a point is also efficient, by constructing an evaluation witness. Polynomial commitment is useful for constructing privacy-preserving applications on the blockchain:
• Zero-knowledge proof: Polynomial commitments can be combined with ZKP to achieve various functionalities. For example, as many constraint systems are


expressed as polynomial evaluations, such as quadratic arithmetic programs, zero-knowledge polynomial evaluations of the quadratic arithmetic program (QAP) can achieve a succinct zero-knowledge argument of knowledge [18]. By carefully designing the coefficients of a polynomial, the polynomial can encode a vector of integers, i.e., vector commitment, which is an essential building block of emerging stateless account-based blockchains.
• Threshold cryptography: In blockchain, there are often application scenarios where a distributed committee needs to collaboratively share a secret or negotiate secure parameters using threshold cryptography [19]. More specifically, many constructions are based on Shamir secret sharing, which requires each participant to construct a polynomial and share the commitment of the polynomial with others.

2.3.1.3 Vector Commitment

Vector commitment is a commitment to a vector of integers, which can be constructed from the extended Pedersen commitment:

Com = g0^r · ∏_{i=1}^{n} gi^{ai}.  (2.14)

The secrets in the commitment are:

(a1, a2, …, an) ∈ Zp^n.  (2.15)

A set of linearly independent generators is denoted as:

(g0, g1, g2, …, gn) ∈ G1^{n+1}.  (2.16)

r ∈ Zp is a random number. By carefully designing coefficients of a polynomial, vector commitments can also be instantiated from polynomial commitment schemes. Later, a prover can choose to open the vector at a given position or a set of given positions. Vector commitments can be combined with various zero-knowledge proof techniques to achieve verifiable computations on the blockchain.


• Membership proof: Membership proof demonstrates that an element, or a subset of elements, belongs to a set. With vector commitments [11], the set of all elements can be encoded into a vector and committed. Later, a prover can demonstrate membership of an element by opening the vector commitment at the corresponding position.
• Zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK): In preprocessing zk-SNARK constructions [20], I/O values are encoded into an extended Pedersen commitment. The commitment can be used in the verification phase.

Fig. 2.2 Merkle tree (leaf nodes H(m1), …, H(mn); each parent is the hash of its two children, up to a single root)

• Dictionary: Vector commitment can also be used to construct a data dictionary [6] that supports query, deletion, update, etc. By encoding blockchain account information into an entry of the vector, the vector commitment can be used to construct a stateless blockchain [21]. Specifically, all nodes maintain an updatable vector commitment for all blockchain accounts.
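Eq. 2.14 can be sketched over a toy group (p = 2039 = 2q + 1, q = 1019; the generators are squares of small primes, hence all in the order-q subgroup). These parameters are illustrative only; real schemes need independent, cryptographically sized generators.

```python
# Toy parameters (illustration only, not secure)
p, q = 2039, 1019
gens = [4, 9, 25, 49, 121]   # g0, g1, ..., g4 in the order-q subgroup

def vector_commit(vec, r):
    """Com = g0^r * prod_i gi^{a_i} mod p (Eq. 2.14)."""
    assert len(vec) <= len(gens) - 1
    com = pow(gens[0], r, p)
    for g, a in zip(gens[1:], vec):
        com = com * pow(g, a, p) % p
    return com
```

Note that the commitment is a single group element regardless of the vector length, and it is additively homomorphic across vectors committed under the same generators.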

2.3.1.4 Merkle Tree

Merkle tree is a data structure that digests a set of messages as a (balanced binary) tree [22]. The set of messages is denoted as:

(m1, m2, …, mn).  (2.17)

Merkle tree supports membership proof of a committed message. Specifically, it consists of three algorithms:
• Construct: The algorithm encodes hashes of the messages as leaf nodes of a balanced binary tree. From bottom to top, each parent node is the hash of its two child nodes. An illustration of a Merkle tree is shown in Fig. 2.2.
• Prove: Given a single message (e.g., m3) committed in the Merkle tree, the algorithm returns the hash values of the siblings along the path from H(m3) to the root.
• Verify: The algorithm recomputes a root from the proof and checks whether it equals the root of the Merkle tree.


Merkle tree is a commitment scheme for membership proofs, and its security relies on the collision resistance of hash functions. That is, it can prove membership of a message in a set without revealing the other messages. The Merkle tree method is usually computationally efficient, as it only involves hash computations. However, as a proof consists of the values along a tree path, the proof size is logarithmic in the number of committed messages. Merkle tree is widely used in blockchain applications. As discussed before, transactions are organized as a Merkle tree in each block of the Bitcoin blockchain. Combined with the zk-SNARK technique, the verification of a Merkle proof can be made zero-knowledge, which is the technique adopted by privacy-preserving blockchain systems such as Zerocash.
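The Construct/Prove/Verify algorithms above can be sketched with SHA-256. One assumption: when a level has an odd number of nodes, the last node is duplicated (one common convention, used by Bitcoin); other padding rules exist.

```python
import hashlib

def H(data):
    """SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).digest()

def build_tree(messages):
    """Construct: leaves are message hashes; parents hash their children."""
    level = [H(m) for m in messages]
    tree = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level = level + [level[-1]]
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree                            # tree[-1][0] is the root

def prove(tree, index):
    """Prove: collect sibling hashes along the path from leaf to root."""
    proof = []
    for level in tree[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((index % 2, level[index ^ 1]))
        index //= 2
    return proof

def verify(root, message, proof):
    """Verify: recompute the root from the message and sibling hashes."""
    node = H(message)
    for node_is_right, sibling in proof:
        node = H(sibling + node) if node_is_right else H(node + sibling)
    return node == root
```

The proof for one leaf contains one sibling hash per level, hence its logarithmic size.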

2.3.2 Zero-Knowledge Proof

Zero-knowledge proof (ZKP) [14, 15, 23–25] is a cryptographic protocol between two entities: a prover and a verifier. For a public relation R, if a tuple (x, w) is valid, we denote:

R(x, w) = 1.  (2.18)

The goal of ZKP is for the prover, who holds (x, w), to convince the verifier that R(x, w) = 1 without revealing the witness w. More specifically, ZKP usually consists of three algorithms:
• Setup: The algorithm generates a common reference string (CRS) for the ZKP, which is used to compute and verify proofs. Across ZKP systems, common reference strings can be roughly classified into two types:
  – Universal: A CRS is universal if it can support arbitrary relations.
  – Non-universal: A CRS is non-universal if it can only be used for ZKP of a specific relation. For example, QAP-based SNARK with pairing-based constructions requires a relation-dependent CRS [26].
Another important feature of a CRS is whether its setup involves a trapdoor secret, which decides whether the setup algorithm must be run by a trusted authority or a secure multi-party computation protocol.
• Prove: The algorithm takes (x, w) and the CRS as inputs and outputs a proof π.
• Verify: The algorithm takes x, the CRS, and the proof π as inputs. It outputs either accept or reject.
Informally, there are three essential properties of ZKP:
• Honest-verifier completeness: An honest verifier will accept a valid proof.
• Computational soundness: A computationally bounded adversary cannot forge a proof for R(x′, w′) ≠ 1 that passes the verification algorithm.


Fig. 2.3 Sigma protocol (the prover sends a message; the verifier returns a random challenge; the prover answers with a response)

• Zero-knowledge: The proof should reveal only whether R(x, w) = 1 and nothing about the witness w.
For soundness, a formal definition often requires the existence of an extractor that can recover the witness from the transcripts of the ZKP. This property is defined as knowledge soundness, which distinguishes a succinct non-interactive argument (SNARG) from a SNARK [26]. If a ZKP system is used to construct a digital signature scheme, non-malleability of a valid proof is also required. For zero knowledge, a formal definition requires a simulator that can output indistinguishable proofs. In the following, we will discuss some representative ZKP constructions and their applications in blockchain-based data security in HCN, including the Sigma protocol and ZKP for algebraic relations.

2.3.2.1 Sigma Protocol

Sigma protocol [27, 28] is a three-move ZKP protocol for proving relations in the discrete logarithm setting. In a Sigma protocol, witnesses are encoded in an (extended) Pedersen commitment. Statements to be proven range from "I know the secret of the Pedersen commitment" to "I know a witness for a circuit evaluation." To run a Sigma protocol, the prover and verifier first agree on a public relation R with a CRS. Then, the prover and verifier interact with each other as shown in Fig. 2.3.
1. The prover generates the first message and sends the message to the verifier.
2. After receiving the message, the verifier generates a random number as a challenge and sends the challenge to the prover.
3. After receiving the challenge, the prover computes a response and sends the response to the verifier.
Finally, the verifier checks the response to decide whether to accept or reject the proof. A ZKP in the discrete logarithm setting can be written as follows [23]:

ZKP{(a, r) : Y = g1^a · g2^r},  (2.19)

where a, r ∈ Zp are the witnesses and g1, g2 ∈ G1 are random generators. This ZKP demonstrates knowledge of the opening of a Pedersen commitment Y. Another


example is to demonstrate the equality of discrete logarithms between two elements:

ZKP{(a) : Y1 = g1^a ∧ Y2 = g2^a},  (2.20)

where a is the witness and g1, g2 ∈ G1 are random generators. ∧ denotes the "AND" operation. A Sigma protocol is a ZKP with honest-verifier completeness, soundness, and zero knowledge under the discrete logarithm assumption. Sigma protocols can support complex relations across multiple commitments, such as proofs of multiplication/addition [29], where a prover demonstrates that the witness in one commitment is the product/sum of the witnesses in two other commitments. This can be generalized to support proof systems for algebraic relations. However, directly adopting Sigma protocols for algebraic relations can be costly in computation and communication. For blockchain-based applications, Sigma protocols are useful for proving statements about secret information. For example, a blockchain statement can be "I have a blockchain account with a balance larger than 100" or "I know the secret key of a blockchain account." Moreover, as consensus protocols already require costly communication, it is desirable to have non-interactive Sigma protocols, in which the interactive challenge-response messages are avoided. Non-interactivity can be achieved with the Fiat-Shamir heuristic [30].

2.3.2.2 Fiat-Shamir Heuristic

If we model a hash function H as a random oracle, the Fiat-Shamir heuristic [30] can turn the three-move Sigma protocol into a non-interactive ZKP protocol. Specifically, the prover uses the hash function to generate the random challenge from the transcripts of the protocol. We take the ZKP in Eq. 2.19 as an example.
1. The prover first chooses a′, r′ ∈ Zp and computes:

Y′ = g1^{a′} · g2^{r′}.  (2.21)

It should be noted that the random values must differ across ZKP instances. Instead of letting the verifier choose a random challenge and send it to the prover, the prover uses the hash function as a random oracle to compute:

c = H(Y || Y′).  (2.22)


The prover further computes:

z1 = a′ − c·a,
z2 = r′ − c·r  (2.23)

and sends Y, Y′, z1, and z2 to the verifier.
2. Upon receiving the message, the verifier first recomputes:

c = H(Y || Y′)  (2.24)

and checks:

Y^c · g1^{z1} · g2^{z2} = Y′.  (2.25)

If the above equation holds, the verifier accepts the ZKP that the prover knows a and r in Y. The Fiat-Shamir heuristic is secure in the random oracle model, which is instantiated with hash functions in real-world applications. To construct a signature scheme from ZKP, such as the Schnorr signature [27], a signer obtains a private/public key pair:

(a, g^a) ∈ (Zp, G1).  (2.26)

The signer can sign a message m by demonstrating knowledge of a in g^a. m is taken as an input of the hash function when calculating the challenge c, which gives the signature unforgeability in the random oracle model.
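The prover and verifier steps of Eqs. 2.21–2.25 can be sketched in Python over a toy group (p = 2039 = 2q + 1, q = 1019; g1 = 4 and g2 = 9 have order q). The one-time values a′, r′ are passed in explicitly here for determinism; in practice they must be fresh secure randomness and the group must be cryptographically sized.

```python
import hashlib

# Toy group (illustration only): p = 2q + 1 is a safe prime,
# g1 and g2 lie in the order-q subgroup.
p, q = 2039, 1019
g1, g2 = 4, 9

def challenge(Y, Y_prime):
    """c = H(Y || Y') interpreted as an integer mod q (Eq. 2.22)."""
    h = hashlib.sha256(Y.to_bytes(4, "big") + Y_prime.to_bytes(4, "big"))
    return int.from_bytes(h.digest(), "big") % q

def prove(a, r, a_prime, r_prime):
    """Non-interactive proof of knowledge of (a, r) in Y = g1^a * g2^r."""
    Y = pow(g1, a, p) * pow(g2, r, p) % p
    Y_prime = pow(g1, a_prime, p) * pow(g2, r_prime, p) % p   # Eq. 2.21
    c = challenge(Y, Y_prime)
    z1 = (a_prime - c * a) % q                                # Eq. 2.23
    z2 = (r_prime - c * r) % q
    return Y, (Y_prime, z1, z2)

def verify(Y, proof):
    """Check Y^c * g1^z1 * g2^z2 == Y' (Eq. 2.25)."""
    Y_prime, z1, z2 = proof
    c = challenge(Y, Y_prime)
    return pow(Y, c, p) * pow(g1, z1, p) * pow(g2, z2, p) % p == Y_prime
```

Exponent arithmetic is done mod q because the generators have order q; correctness follows since Y^c · g1^{a′−ca} · g2^{r′−cr} = g1^{a′} · g2^{r′} = Y′.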

2.3.2.3 ZKP for Algebraic Relations

We have discussed Sigma protocols for simple relations in the previous section. Sigma protocols can be naturally extended to support complex relations by integrating general ZKP for addition and multiplication with the "AND" operation [28, 31]. However, this often results in prohibitively high communication and computation overheads. In the following, we investigate representative ZKP protocols for complex relations and briefly discuss how their efficiency improves over directly using Sigma protocols.
• Membership proof: Membership proof [31] is used to prove (in zero knowledge) that a data item is a member of a data set. One approach is to commit to each individual data item as a Pedersen commitment and prove knowledge of the witness with an "OR" argument. A more efficient approach is to encode the data set into polynomials and exploit algebraic features of the polynomials to prove membership, such as the vector commitment scheme discussed in the previous section. Merkle tree [22] can also be used to construct membership proofs.


• Range proof: Range proof demonstrates that a commitment contains a secret that lies in a range [x, y], x, y ∈ Zp. It can be written as follows:

ZKP{(a, r) : Y = g1^a · g2^r ∧ a ∈ [x, y]},  (2.27)

where Y is a Pedersen commitment with r as randomness and a as the secret. Range proof can be instantiated from Sigma protocols by committing to each individual bit of a; as a result, communication overheads are linear in the number of bits of the secret. Bulletproof [32] is an interactive proof system (that can be made non-interactive with the Fiat-Shamir transformation) that demonstrates inner-product relations between two committed vectors. By encoding a and the range into vectors, the range proof can be transformed into an inner-product proof with a logarithmic proof size.
• Polynomial relation: Polynomial relations include (zero-knowledge) point evaluations of a committed polynomial, which can be efficiently instantiated from knowledge-based assumptions [16]. By exploiting algebraic features of polynomials, ZKP with unique functionalities can be constructed, for example to demonstrate that a committed vector's Hamming weight is no more than a given value [29].
• Circuit evaluation: This is to demonstrate that a prover knows the witness for a valid assignment of a Boolean or arithmetic circuit evaluation. As many computation problems can be represented by circuit evaluations, this type of proof can support general relations. Recent technological advances have explored various constraint systems, such as QAP, for circuit evaluations, and proposed different instantiations from inner-product arguments [32], batched polynomial evaluations [18], or pairing-based constructions [20]. This will be discussed in detail in the next section.
ZKP for general and complex relations has a wide range of applications in privacy-preserving blockchain systems [33, 34]. For example, to protect account balances and account identities, zero-knowledge range proofs and proofs of knowledge of a private signature can be constructed.

2.3.3 zk-SNARK

A widely deployed ZKP protocol for circuit evaluations is the zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) [20, 35]. The most distinctive feature of zk-SNARK constructions is the succinctness of proof size and proof verification. For example, a notable construction [26] achieves constant proof size (3 group elements) and 4 pairings for non-I/O verifications regardless of the complexity of the relations to be proved.

2.3.3.1 Workflow of zk-SNARK

Denote R as a relation and (x, w) as a valid tuple, where x is the statement and w is the witness. A general workflow of zk-SNARK for R consists of the following components:
• Circuit representation: By type of circuit operations, existing works have explored various circuit representations [36], including Boolean circuits and arithmetic circuits. A Boolean circuit involves Boolean operations over bits, and an arithmetic circuit involves addition/multiplication operations over fields. Circuit designs for specific relations are crucial for the efficiency of a SNARK system, especially when dealing with loops and random memory access within the circuit. In terms of circuit universality, a universal circuit can support various relations bounded by the number of operations in the circuit, while a non-universal circuit can only support a specific relation.
• Constraint system: Circuit representations are translated into constraint systems over polynomials, such as QAP. With constraint systems, circuit evaluations are reduced to polynomial evaluations over I/O values and intermediate values.
• ZKP construction: Evaluations of constraint systems can be instantiated from different ZKP constructions, such as polynomial ZKP [18], sumcheck ZKP [37], or pairing-based ZKP [20].
A typical workflow of the QAP-based SNARK system is shown in Fig. 2.4: a relation represented by an arithmetic circuit is first converted to a QAP, which is then instantiated with pairing-based cryptography.

Fig. 2.4 Workflow of QAP-based SNARK


In the following, we will first introduce the QAP and then discuss representative constructions of modern zk-SNARKs, including non-universal zk-SNARK and universal zk-SNARK.

2.3.3.2 Quadratic Arithmetic Program (QAP)

Quadratic Arithmetic Program (QAP) [35] consists of three sets of polynomials A, B, C:

A = {ai(x)}, B = {bi(x)}, C = {ci(x)}, i ∈ [0, m].  (2.28)

A QAP can be constructed from an arithmetic circuit, where a target polynomial t(x) is constructed by selecting a random root for each multiplication gate in the circuit. We discuss the construction of a QAP as follows [20]:
1. For each multiplication gate in the circuit, choose a random root ri. Compute the target polynomial:

t(x) = ∏_i (x − ri).  (2.29)

2. ak(x) encodes the left input of each multiplication gate. Specifically, ak(ri) = 1 if the k-th wire is the left input of the i-th multiplication gate. Similarly, {bk(x)} and {ck(x)} encode the right input and the output of each multiplication gate.
Given the three sets of polynomials, {yi}, i ∈ [1, n] are valid assignments of the circuit iff we can find witnesses {yi}, i ∈ [n + 1, m], such that the following p(x) can be divided by t(x):

p(x) = (a0(x) + Σ_{i=1}^{m} yi·ai(x)) · (b0(x) + Σ_{i=1}^{m} yi·bi(x)) − (c0(x) + Σ_{i=1}^{m} yi·ci(x)).  (2.30)

With the equivalent QAP for the circuit, the circuit evaluation on given I/O assignments is converted to finding witnesses that satisfy the divisibility check of the QAP. The divisibility check can be instantiated by evaluating at special points, for which there are two main constructions:


• Non-universal SNARK: CRS is constructed for each relation, which requires a trusted setup process. • Universal SNARK: CRS can be universal and a trusted setup is not required or only required once.
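The divisibility condition on p(x) can be checked numerically: t(x) divides p(x) iff p vanishes at every root ri. The sketch below runs this check over a toy prime field for a hypothetical one-gate circuit y3 = y1 · y2 (root 5); the index-0 polynomials of Eq. 2.30 are folded into the sums by fixing y0 = 1. Real SNARKs instead evaluate "in the exponent" at a secret point s hidden in the CRS.

```python
F = 1019  # toy prime field modulus (illustration only)

def poly_eval(coeffs, x):
    """Horner evaluation over F; coeffs[i] is the coefficient of x^i."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % F
    return acc

def qap_satisfied(A, B, C, y, roots):
    """p(x) = (sum yi*ai(x)) * (sum yi*bi(x)) - (sum yi*ci(x));
    t(x) = prod (x - ri) divides p(x) iff p(ri) = 0 for every root ri."""
    for rt in roots:
        a = sum(yi * poly_eval(ai, rt) for yi, ai in zip(y, A)) % F
        b = sum(yi * poly_eval(bi, rt) for yi, bi in zip(y, B)) % F
        c = sum(yi * poly_eval(ci, rt) for yi, ci in zip(y, C)) % F
        if (a * b - c) % F:
            return False
    return True

# One multiplication gate y3 = y1 * y2 at root 5; wires (y0 = 1, y1, y2, y3)
A = [[0], [1], [0], [0]]   # a1(x) = 1: wire 1 is the left input
B = [[0], [0], [1], [0]]   # b2(x) = 1: wire 2 is the right input
C = [[0], [0], [0], [1]]   # c3(x) = 1: wire 3 is the output
```

For example, the assignment [1, 3, 4, 12] satisfies the QAP since 3 · 4 = 12, while [1, 3, 4, 11] does not.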

2.3.3.3 Non-Universal zk-SNARK

In non-universal SNARK [20], knowledge-based assumptions are required to generate the CRS. More specifically, the CRS consists of an evaluation key (EK) and a verification key (VK):
• EK encodes the three sets of polynomials into group elements with a trapdoor secret. EK is used by the prover to generate a proof π for valid assignments and witnesses to the QAP.
• VK is used for verifying the proof; its size often grows linearly with the number of I/O assignments in the QAP.
The trapdoor secret must be destroyed after the setup, which requires either a trusted authority or a secure multi-party computation protocol. The proof generation requires a prover to solve for a polynomial H(x) such that H(x) · t(x) = p(x) in Eq. 2.30. The prover then encodes the witnesses and H(x) into the proof with the EK. In the verification, a verifier needs to encode I/O values into group elements. The final checks with the I/O elements and the proof are conducted in bilinear pairings. The advantage of non-universal SNARK is that the proof size and proof verification are extremely efficient: except for the multi-exponentiation over I/O values, the proof size and verification cost remain constant regardless of the size of the QAP. However, this requires a large evaluation key and a trusted setup per relation.

2.3.3.4 Universal zk-SNARK

Universal zk-SNARK can be constructed from universal circuits. The number of addition/multiplication gates in the circuit is bounded [38], and each gate is activated or not for a specific relation instance, which can increase prover computation overheads. Another approach to universal zk-SNARK is to construct a universal structured reference string (SRS) [18]. Specifically, the SRS is updatable, and an offline indexing phase encodes a given index into polynomials. Combined with algebraic holographic proofs (AHP) and extractable polynomial commitment schemes, the encoded polynomials can be efficiently evaluated.

2.3.3.5 Open-Source Implementations

There are two essential steps to implement a SNARK system:
• Circuit design: Developers should first construct an arithmetic circuit that represents the relation to be proven. To make the process easier for non-experts in circuit design, existing works also provide compilers that generate circuits from programs. Pinocchio [20] provides a compiler that translates C programs into arithmetic circuits, while jsnark provides a JAVA library [39]. Note that random memory access, such as dynamic array access and "break" operations, needs to be carefully designed (or avoided) in programs. The number of multiplication gates in a circuit is the main metric that indicates its complexity.
• ZKP instantiation: Given an arithmetic circuit with valid assignments, a ZKP system consisting of setup, prove, and verify algorithms can be instantiated from the libsnark library1 [40]. More specifically, the libsnark library generates a QAP for the circuit and evaluates the QAP at the assignments. It provides submodules for setting up the CRS, calculating the proof, and verifying the proof, and it supports popular SNARK constructions including Pinocchio, Groth16, etc. Different crypto groups, including bn128 and alt-bn128, can be selected.
An example of implementing QAP-based SNARK on the blockchain includes the following modules:
• A circuit generator, such as jsnark, takes program code as input and outputs a *.arith file, which consists of descriptions of each gate in the circuit. Then, the libsnark library takes the circuit file as input and outputs the CRS, including the verification key and evaluation key.
• A prover can use the libsnark library and take the circuit file, the assignments of circuit input wires (*.in file), and the evaluation key as inputs. The prover can generate the output of the circuit evaluation together with a proof.
• The verification of the circuit output and the proof mainly consists of computing a multi-exponentiation over I/O wires and a few pairing operations. The verification can be conducted with the verification key through either the libsnark interface or the contract language on the blockchain.

1 https://github.com/akosba/xjsnark.

2.3.4 Commit-and-Prove ZKP

2.3.4.1 Application Scenario

SNARK constructions are designed for relations represented by circuits. Despite their generality and efficiency in proof verification, they increase the prover overhead and are not optimized for specific algebraic relations. For example, it is very efficient to use the Sigma protocol to prove knowledge of a commitment:

ZKP{(a) : Y = g^a},  (2.31)

where a is the secret and Y is the commitment without randomness. If we used SNARK to prove the above relation, a prover would need to encode the group operations into a large circuit and prove a valid assignment with a large number of crypto operations. This also applies to the membership proof based on vector commitment and the polynomial evaluation proof discussed before. A complex relation can contain different sub-relations, each of which is most efficiently instantiated from a different ZKP system. There are extensive use cases for such complex relations on the blockchain:
• A user who knows the secret of a public commitment would like to prove that the secret is also committed in a whitelist, e.g., a Merkle tree.
• An account owner on a stateless blockchain would like to prove that he owns a valid account committed in a vector commitment and that the balance of the account is larger than a threshold.
• A user who knows a witness to a public relation would like to generate a signature on a message, i.e., a Signature of Knowledge (SoK) [41].
The above use cases often combine ZKP with Sigma protocols (knowledge of a public Pedersen commitment) and SNARKs (knowledge of a pre-image of a hash function).

2.3.4.2 Constructions

To achieve efficient ZKP for complex relations, Commit-and-Prove (CP)-ZKP [42] is a promising approach. More specifically, secrets are first committed into different commitments that serve as "bridges" connecting various ZKP systems. We briefly discuss three CP-ZKP cases:
• The simplest case is shown at the top of Fig. 2.5 and can be written as follows:

ZKP{(a) : Com(a) ∧ R1(x1, a) ∧ R2(x2, a)}.  (2.32)

The same commitment Com(a) for a secret a is used in two different relations R1 and R2, where x1 and x2 are two statements. The two relations can be instantiated with a Sigma protocol and SNARKs [42].
• A more complex case, where the same secret is committed in two different commitments Com and Com′, is shown at the bottom of Fig. 2.5. The two commitments are associated with ZKP systems for relations R1 and R2, respectively.

Fig. 2.5 Example 1: CP-ZKP (top: one commitment Com shared by relations R1 and R2; bottom: two linked commitments Com and Com′, each used in its own relation)

At the same time, a "linking" method that demonstrates that Com and Com′ contain the same secret is also required. The linking method can be a Sigma protocol if the commitments are Pedersen commitments with different generators. This happens often for verifiable computation proofs, where an output of the first function is committed and converted into a committed input of the second function [43]. However, for commitments in different forms, the linking method needs to be carefully designed; for example, Com can be committed bit by bit while Com′ is committed using Pedersen commitment.
• As shown in Fig. 2.6, a commitment Com of multiple secrets can also be split into two different commitments, Com1 and Com2. That is, Com can be a vector commitment, while Com1 and Com2 commit to parts of the secrets in Com. With two different ZKP systems, the commitments can be used for different relations R1, R2, and R3. In this case, a ZKP system that proves a linear or membership relationship between Com and Com1/Com2 is required.
By combining the above templates of CP-ZKP systems, modular designs of ZKPs for complex relations can be achieved [42]. A commitment can also serve as a succinct digest of a large data set. When the data set is involved in a ZKP protocol, e.g., proving a search over the data set, commitments can be used in verifying the proof instead of the original data set [44]. This saves extensive communication overheads in verification, which is especially critical for verification on the blockchain. However, to directly use commitments in verifications, either of the following two conditions needs to be met:


Fig. 2.6 Example 2: CP-ZKP

• Trusted commitment construction: Commitments of the data set are computed by a trusted authority.
• Proof of well-formedness: A prover needs to demonstrate that the commitments of the data set are well-formed. For example, for an extended Pedersen commitment, the prover needs to demonstrate that he/she knows all the secrets encoded over a set of public generators.
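The Pedersen-commitment linking method discussed above can be sketched as a Chaum-Pedersen-style Sigma protocol: a single response for the shared secret is reused in both verification equations, which is exactly what links the two commitments. The snippet below is a conceptual illustration with toy parameters (a tiny group that is NOT secure) and hypothetical helper names (`H`, `commit`), not a production implementation.

```python
import hashlib
import secrets

# Toy parameters (NOT secure): safe prime p = 2q + 1 with a prime-order-q subgroup.
p, q = 1019, 509
g, h = 4, 9     # generators of the order-q subgroup for commitment Com
u, v = 16, 25   # independent generators for commitment Com'

def H(*vals):
    """Fiat-Shamir challenge: hash all transcript values into Z_q."""
    data = b"|".join(str(x).encode() for x in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def commit(base1, base2, s, r):
    """Pedersen commitment base1^s * base2^r mod p."""
    return (pow(base1, s, p) * pow(base2, r, p)) % p

# Prover: the SAME secret s sits inside two commitments with different generators.
s = secrets.randbelow(q)
r1, r2 = secrets.randbelow(q), secrets.randbelow(q)
C1 = commit(g, h, s, r1)   # Com
C2 = commit(u, v, s, r2)   # Com'

# Commitment phase of the Sigma protocol.
a = secrets.randbelow(q)
b1, b2 = secrets.randbelow(q), secrets.randbelow(q)
T1 = commit(g, h, a, b1)
T2 = commit(u, v, a, b2)

c = H(C1, C2, T1, T2)      # non-interactive challenge via Fiat-Shamir
zs = (a + c * s) % q       # one response for the shared secret s
z1 = (b1 + c * r1) % q
z2 = (b2 + c * r2) % q

# Verifier: reusing zs in BOTH equations proves Com and Com' open to the same s.
ok1 = commit(g, h, zs, z1) == (T1 * pow(C1, c, p)) % p
ok2 = commit(u, v, zs, z2) == (T2 * pow(C2, c, p)) % p
assert ok1 and ok2
```

The same response-sharing trick generalizes to the well-formedness proof of an extended Pedersen commitment: one response per secret over the set of public generators.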

2.3.5 Anonymous Credential

Anonymous credentials [23, 45] enable a user to prove a valid identity to a verifier without revealing his/her true identity. They have been widely used in blockchain-based applications:
• Privacy-preserving transactions: Anonymous credentials can be used to hide the identities of the transaction sender and receiver.
• Anonymous voting/review [46]: Blockchain can serve as trusted storage for recording votes or reviews from users. With anonymous credentials, the identity privacy of users can be preserved.
• Anonymous data trading [47]: The identity privacy of both data sellers and data buyers can be preserved on the blockchain using anonymous credentials.

Anonymous credentials can be instantiated from different techniques. In the Bitcoin blockchain, user anonymity comes from the pseudonyms of users' self-generated signing keys. However, pseudonym-based approaches cannot achieve unlinkability across authentication sessions: the same pseudonym is reused in different sessions, which can expose additional information. In contrast, group signature-based anonymous credentials [23, 48] enable unlinkable authentication sessions for stronger privacy guarantees. Generally speaking, a group signature lets a user demonstrate that he/she belongs to a group without revealing his/her true identity. In the following, we discuss definitions and representative constructions of group signature-based anonymous credentials.


Fig. 2.7 Overview of group signature-based anonymous credential

2.3.5.1 Definitions

In a group signature-based anonymous credential scheme [45, 49], there are the following entities:
• Issuer: A trusted entity responsible for setting up system parameters and public/secret keys, and for issuing anonymous credentials to users.
• User: A user applies for an anonymous credential from the issuer with a self-generated secret key. The user can then use the credential to sign messages anonymously.
• Verifier: A verifier with the system public keys can check the correctness of a group signature.
• Opening authority: For group signatures with conditional privacy, there can also be an opening authority that opens a valid signature to a registered user. In some cases, the opening authority can be the issuer.

As shown in Fig. 2.7, a group signature-based anonymous credential scheme consists of five procedures:
• Setup: The issuer generates the system public/secret keys.
• Credential Issuance: A user commits to a self-chosen user secret key and asks the issuer to sign the committed secret. The obtained signature, along with the user secret key, is the user's anonymous credential.
• Signature Generation: The user generates an anonymous signature on a message from the credential.
• Signature Verification: The verifier checks the correctness of the signature and message with the system public keys.


• Signature Open: The issuer opens a group signature to a user identity in case of dispute.

For group signature-based anonymous credentials, the security notions include:
• Unforgeability: A computationally bounded adversary cannot forge a valid anonymous credential without knowing the system secret keys.
• Unlinkability: An adversary cannot decide whether two group signatures were generated by the same user.
• Traceability: A valid group signature can be opened by the issuer to the exact user who generated it.

2.3.5.2 Representative Constructions

In this section, we briefly discuss a representative construction of anonymous credentials from the PS signature [45]. Suppose the issuer generates a set of system public/secret keys:

SK = (x, y_0, y_1, \ldots, y_n) \in \mathbb{Z}_p^{n+2},
PK = (\tilde{g}, \tilde{g}^{x}, \tilde{g}^{y_0}, \ldots, \tilde{g}^{y_n}) \in \mathbb{G}_2^{n+3}.    (2.33)

The PS signature has two forms: a single-message signature for identity credentials and a multi-message signature for attribute credentials:
• Identity Credential: For a user identity secret sk \in \mathbb{Z}_p, its PS signature has the form

\pi = (\pi_1, \pi_2) = (g_r, g_r^{\,x + y_0 \cdot sk}),    (2.34)

where g_r is a randomly chosen element from \mathbb{G}_1 and g_r \neq 1_{\mathbb{G}_1}.
• Attribute Credential: For a user identity secret sk \in \mathbb{Z}_p and a set of user attributes (a_1, a_2, \ldots, a_n) \in \mathbb{Z}_p^{n}, a multi-message PS signature on the secret and the attributes has the form

\pi = (\pi_1, \pi_2) = \left(g_r,\ g_r^{\,x + y_0 \cdot sk + \sum_{i=1}^{n} y_i \cdot a_i}\right),    (2.35)

where g_r is again a random generator with g_r \neq 1_{\mathbb{G}_1}. It should be noted that g_r must be different for different signatures.

With the forms of the PS signature in mind, the credential issuance phase signs a committed user secret, and the signature generation/verification phase proves/verifies knowledge of a signature using Sigma protocols [45].
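The verification of these signatures is not restated in the text above; for reference, the standard PS check (our addition, following the construction in [45]) uses a bilinear pairing e : G1 x G2 -> GT:

```latex
% Standard PS verification (added here for reference, following [45]).
% With PK = (\tilde{g}, \tilde{X} = \tilde{g}^x, \tilde{Y}_i = \tilde{g}^{y_i}),
% a multi-message signature \pi = (\pi_1, \pi_2) on (sk, a_1, \ldots, a_n)
% is accepted iff
\pi_1 \neq 1_{\mathbb{G}_1}
\quad \text{and} \quad
e\!\left(\pi_1,\; \tilde{X} \cdot \tilde{Y}_0^{\,sk} \cdot \textstyle\prod_{i=1}^{n} \tilde{Y}_i^{\,a_i}\right)
= e\!\left(\pi_2,\; \tilde{g}\right).
```

The identity credential in (2.34) is the special case n = 0; correctness follows since the left-hand pairing equals e(g_r, \tilde{g}) raised to the exponent x + y_0 sk + \sum_i y_i a_i, which matches \pi_2.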


Sign on Committed Secret A user with secret identity key sk \in \mathbb{Z}_p can generate a commitment g^{sk} along with a ZKP of sk and send them to the issuer. The issuer checks the ZKP and signs the committed message g^{sk} as

\pi = (\pi_1, \pi_2) = \left(g^{u},\ (g^{x}(g^{sk})^{y_0})^{u}\right),    (2.36)

where u is randomly chosen from \mathbb{Z}_p. We can see that the above \pi is a single-message PS signature on sk with g^{u} as the random generator.

Proof of Knowledge of Signature With (\pi, sk), the user can generate an anonymous signature on a message m \in \{0, 1\}^{*}.
• First, to make the signatures unlinkable, the user randomizes the signature as

\pi' = (\pi_1', \pi_2') = (\pi_1^{t}, \pi_2^{t}),    (2.37)

where t is randomly chosen from \mathbb{Z}_p.
• Then, the user generates a ZKP that he/she knows the sk in \pi':

ZKP = \{(sk) : \pi' \text{ is a valid PS signature on } sk\}(m),    (2.38)

where m is the message to be signed and is taken as an input of the hash function that calculates the challenge in a Sigma protocol.

The messages in the Sigma protocol serve as the anonymous signature on m, and verification of the signature checks the correctness of the Sigma protocol. Similar constructions for proving knowledge of attributes in a multi-message PS signature can also be instantiated from the Sigma protocol. The security of the above-mentioned scheme is inherited from the security of the PS signature and Sigma protocols, and relies on the trustworthiness of the issuer. It should be mentioned that the single issuer can also be extended to a set of issuing authorities using threshold cryptography.
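A faithful implementation of (2.37)-(2.38) requires pairings, which are beyond a few lines of standard-library code. As a stand-in, the sketch below uses a plain Schnorr signature over a toy group to illustrate the same Fiat-Shamir pattern used here: the signed message m enters the challenge hash, turning a proof of knowledge of sk into a signature on m. All parameters and helper names are illustrative only.

```python
import hashlib
import secrets

# Toy group (NOT secure): prime-order-q subgroup of Z_p^*.
p, q, g = 1019, 509, 4

def H(*vals):
    """Hash transcript values (including the message) into Z_q."""
    data = b"|".join(str(x).encode() for x in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

sk = secrets.randbelow(q)   # user secret key
pk = pow(g, sk, p)          # public value g^sk (the committed secret)

def sign(sk, m):
    """Sigma protocol made non-interactive: m is hashed into the challenge."""
    a = secrets.randbelow(q)
    T = pow(g, a, p)        # prover's first message
    c = H(T, pk, m)         # Fiat-Shamir challenge binds the signed message m
    z = (a + c * sk) % q    # response
    return (T, z)

def verify(pk, m, sig):
    T, z = sig
    c = H(T, pk, m)         # recompute the challenge from the same inputs
    return pow(g, z, p) == (T * pow(pk, c, p)) % p

sig = sign(sk, "pay 5 coins to Bob")
assert verify(pk, "pay 5 coins to Bob", sig)
```

In the PS-based construction, the same pattern applies, except that the statement proven is knowledge of sk inside the randomized signature pi' rather than in a bare g^sk.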

2.4 On/off-chain Computation Model for Blockchain

Blockchain systems adopt consensus protocols to achieve consistent storage and state updates among distributed nodes, which significantly increases the on-chain burden of blockchain nodes:


• Storage overhead: All blockchain (full) nodes must maintain a copy of the whole ledger, which can reach a few hundred gigabytes in public blockchains such as Bitcoin and Ethereum.
• Computation overhead: Blockchain nodes need to verify function calls of a smart contract before the calls can change the blockchain state.

As a result, directly storing large data sets and conducting heavy computations on the blockchain are prohibitively costly. The on/off-chain computation model [50, 51] is a promising approach to address this efficiency issue. In the following, we discuss SNARK-based and Trusted Execution Environment (TEE)-based approaches for constructing on/off-chain computation models.

2.4.1 SNARK-Based Approach

SNARKs can be used to construct verifiable computation (VC) schemes [20]. More specifically, consider a function F denoted as

F(x_1, x_2) \rightarrow y,    (2.39)

where x_1 and x_2 are inputs of F and y is the output. Using a SNARK, the function can be represented as an arithmetic circuit in which x_1, x_2, and y are I/O values. By doing so, a proof \pi that demonstrates x_1, x_2, and y are valid assignments can be generated. A verifier can use the verification algorithm of the SNARK to efficiently check the validity of the assignments.

SNARK-based VC can be used to construct an on/off-chain computation model [44, 50]. To move expensive on-chain overheads off-chain, a SNARK system can be set up for the on-chain operations, e.g., conducting a query over historic transactions. The operation is then conducted off-chain using the SNARK, and the result is verified on the blockchain. Since SNARK verification is succinct regardless of the complexity of the original operation, on-chain verification saves substantial storage and computation resources. Either the inputs (x_1, x_2) or the output y can be digested as vector commitments, which can be efficiently stored and verified on the blockchain. However, if the commitments do not come from trusted entities, a ZKP protocol that demonstrates their well-formedness should be integrated with the SNARK, which can result in non-negligible computation/communication overheads. At the same time, it should be noted that most SNARK constructions do not naturally provide proof non-malleability. For specific applications, such as a signature of knowledge built from a SNARK without non-malleability, additional designs are required, such as embedding a pseudo-random function (PRF) into the circuit.
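A full SNARK cannot be reproduced in a few lines, but the digest idea behind this model is easy to illustrate: the chain stores only a constant-size Merkle root of a data set, while the bulky data and the inclusion proofs stay off-chain. The sketch below is a minimal, self-contained illustration with hypothetical helper names and no real chain.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    """Constant-size digest of the whole data set (the only on-chain item)."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Off-chain inclusion proof: sibling hashes from leaf to root."""
    level = [h(x) for x in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append((level[index ^ 1], index % 2))  # (sibling, am-I-right-child?)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_inclusion(root, leaf, path):
    """Cheap check against the digest; this is the on-chain-sized work."""
    node = h(leaf)
    for sibling, is_right in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

data = [b"tx-%d" % i for i in range(5)]
root = merkle_root(data)        # only this 32-byte digest goes on-chain
proof = merkle_proof(data, 3)
assert verify_inclusion(root, b"tx-3", proof)
```

A SNARK plays the analogous role for arbitrary computations over such digested data: the verifier touches only the commitment and a short proof, never the data set itself.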


2.4.2 Trusted Execution Environment-Based Approach

A trusted execution environment (TEE) [52] is a hardware-based approach to achieving confidential and verifiable computation. Generally speaking, programs and data can be loaded and executed in a protected (trusted) memory space that preserves their confidentiality and integrity. Intel Software Guard Extensions (SGX) is a TEE that is widely deployed in Intel processors. As shown in Fig. 2.8, an SGX program works as follows:
• Enclave: The trusted computer program, consisting of code and initial data. It can be packed and shared as a software library and is associated with a unique measurement as its identity. Software dependencies in an enclave must come from a trusted source, and the trustworthiness of the enclave code can be established via attestation mechanisms.
• Host application: A host application runs on an untrusted host machine. It can load the enclave into protected memory and interact with the enclave to send/receive data via ECalls and OCalls, respectively. Note that when and how to activate enclave functions is decided by the host application. As a result, careful designs are required to preserve enclave security, since the host is assumed to be untrusted.

More details on SGX working principles can be found in the official "Intel Software Guard Extensions (Intel SGX)—Developer Guide" and "Intel Software Guard Extensions (Intel SGX) SDK for Linux OS—Developer Reference." In the following, we discuss two important native security mechanisms provided by SGX: attestation and sealing.

Fig. 2.8 SGX overview

2.4.2.1 Useful Mechanisms

• Attestation: It is important to ensure that the code and data in an enclave are trusted and that the enclave performs the desired function, which requires attestation of the enclave. More specifically, an attestation report over the enclave code and initial data is computed and signed by a quoting enclave with an Enhanced Privacy ID (EPID) key [53]. The signature is essentially a group signature for anonymous authentication: each SGX processor has a unique EPID signing key, while an Intel attestation entity holds an EPID verification key. The attestation report can then be verified by the attestation entity. If challenge-response messages are included in the attestation, the attestation entity and the enclave can negotiate a secret communication key to encrypt/decrypt later communications. This process is called remote attestation, where we need to trust the attestation entity. If two enclaves are loaded on the same host machine, they can also establish trust with each other via local attestation.
• Sealing: An enclave can generate additional data during runtime. For example, the enclave may generate a public/private signing key pair and register the public key at the attestation entity. When the enclave is shut down, the data in its memory is lost. To preserve the runtime data of an enclave, SGX introduces the sealing mechanism: an enclave can encrypt data using a sealing key and store the encrypted data on the host machine, and only the same enclave can decrypt it. With the sealing mechanism, the enclave can store state information and runtime data locally for future use. However, deletion or rollback of the local store can happen if the host machine is malicious [54].

To design an SGX-based verifiable computation mechanism, developers first write enclave code using trusted SGX libraries. Then, the enclave can be attested and provisioned with secret data by the attestation entity. The provisioned secret can be sealed in local storage and later used to prove the trustworthiness of the enclave. However, for real-world applications, additional designs may be required to address known attacks against SGX.
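Sealing can be modeled conceptually as deriving a measurement-bound key from a device secret. The sketch below is an assumption-laden stand-in (an HMAC-based construction, not the real SGX EGETKEY/AES-GCM path; all names are illustrative) meant only to show why sealed data survives restarts on the same device yet cannot be read by a different enclave.

```python
import hashlib
import hmac
import secrets

# Conceptual model only: real SGX derives sealing keys in hardware.
DEVICE_SECRET = secrets.token_bytes(32)   # fused into the CPU, never leaves it

def sealing_key(measurement: bytes) -> bytes:
    # Key depends on the enclave measurement: only the SAME enclave
    # on the SAME device can re-derive it after a restart.
    return hmac.new(DEVICE_SECRET, b"seal|" + measurement, hashlib.sha256).digest()

def seal(measurement: bytes, data: bytes) -> bytes:
    key = sealing_key(measurement)
    nonce = secrets.token_bytes(16)
    stream = hmac.new(key, b"enc|" + nonce, hashlib.sha256).digest()
    ct = bytes(a ^ b for a, b in zip(data, stream))     # data <= 32 bytes here
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag                  # blob stored on the untrusted host

def unseal(measurement: bytes, blob: bytes) -> bytes:
    key = sealing_key(measurement)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("sealed blob was modified")     # host tampering detected
    stream = hmac.new(key, b"enc|" + nonce, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(ct, stream))

state = b"enclave signing key v1"
blob = seal(b"enclave-measurement", state)
assert unseal(b"enclave-measurement", blob) == state
```

Note that even in this model the host can still replay an older sealed blob, which is exactly the rollback problem against sealing mentioned above [54].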

2.4.2.2 Integrating Blockchain with SGX

SGX can be integrated with the blockchain to achieve efficient on-chain verifiable computation. Traditionally, blockchain uses smart contracts to let blockchain nodes verify function calls, with security guaranteed by an honest majority of the blockchain nodes. With SGX, the collaborative verifications can be offloaded to an enclave with contract verification functionality. With proper signing-key provisioning, the enclave can generate a report of a contract call to be verified on the blockchain. The integration of blockchain and SGX can complement each other's drawbacks [51]:


• First, since many blockchain nodes are required to redo contract computations, verifying contract calls consumes substantial computing resources. With the SGX-based approach, only a single enclave needs to perform the contract verification, saving significant computation overhead.
• Second, the blockchain can serve as trusted storage for up-to-date state information of the SGX enclave. By doing so, some known attacks, such as rollback attacks against SGX sealing, can be addressed. Key management of enclaves is critical in this case, where frequent key updating or revocation needs to be further considered.

2.4.2.3 Implementations

SGX supports programming on Windows or Linux in C++. Developers first need to make sure that the processor supports SGX and that SGX is enabled in the BIOS settings. Developers can either build the SGX package from scratch or install it directly by following Intel's official documentation, which also provides trusted C++ libraries for SGX developers. For more details on SGX implementations, please refer to the "Intel Software Guard Extensions (Intel SGX)—Developer Guide."

2.5 Summary

In this chapter, we have summarized basic data security technologies that are essential for designing blockchain-based applications in HCN. The techniques covered in this chapter include crypto primitives, blockchain technologies, privacy-enhancing technologies for blockchain, and computation models for blockchain. In the following chapters, we will show in detail how these technologies can benefit the design of data security approaches in HCN, including reliable data provenance, transparent data query, and fair data marketing.

References

1. D. Boneh and M. Franklin, "Identity-based encryption from the Weil pairing," in Annual International Cryptology Conference. Springer, 2001, pp. 213–229.
2. P. S. Barreto and M. Naehrig, "Pairing-friendly elliptic curves of prime order," in International Workshop on Selected Areas in Cryptography. Springer, 2005, pp. 319–331.
3. D. Johnson, A. Menezes, and S. Vanstone, "The elliptic curve digital signature algorithm (ECDSA)," International Journal of Information Security, vol. 1, no. 1, pp. 36–63, 2001.
4. M. Dworkin, E. Barker, J. Nechvatal, J. Foti, L. Bassham, E. Roback, and J. Dray, "Advanced encryption standard (AES)," 2001.


5. T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithms," IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 469–472, 1985.
6. D. Liu, C. Huang, L. Xue, J. Hou, X. Shen, W. Zhuang, R. Sun, and B. Ying, "Authenticated and prunable dictionary for blockchain-based VNF management," IEEE Transactions on Wireless Communications, vol. 21, no. 11, pp. 9312–9324, 2022.
7. D. Eastlake 3rd and P. Jones, "US secure hash algorithm 1 (SHA1)," Tech. Rep., 2001.
8. S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," Decentralized Business Review, p. 21260, 2008.
9. G. Wood, "Ethereum: A secure decentralised generalised transaction ledger, Byzantium version," Ethereum Project Yellow Paper, pp. 1–39, 2018.
10. E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich et al., "Hyperledger Fabric: A distributed operating system for permissioned blockchains," in Proc. of the Thirteenth EuroSys Conference, 2018, pp. 1–15.
11. D. Catalano and D. Fiore, "Vector commitments and their applications," in Proc. of PKC. Springer, 2013, pp. 55–72.
12. A. Narayanan, J. Bonneau, E. Felten, A. Miller, and S. Goldfeder, Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction. Princeton University Press, 2016.
13. A. Kiayias, A. Russell, B. David, and R. Oliynykov, "Ouroboros: A provably secure proof-of-stake blockchain protocol," in Proc. of CRYPTO. Springer, 2017, pp. 357–388.
14. T. P. Pedersen, "Non-interactive and information-theoretic secure verifiable secret sharing," in Annual International Cryptology Conference. Springer, 1991, pp. 129–140.
15. D. Boneh and V. Shoup, "A graduate course in applied cryptography," Draft 0.5, 2020.
16. A. Kate, G. M. Zaverucha, and I. Goldberg, "Constant-size commitments to polynomials and their applications," in Proc. of ASIACRYPT. Springer, 2010, pp. 177–194.
17. S. Bowe, A. Gabizon, and M. D. Green, "A multi-party protocol for constructing the public parameters of the Pinocchio zk-SNARK," in Proc. of FC. Springer, 2018, pp. 64–77.
18. A. Chiesa, Y. Hu, M. Maller, P. Mishra, N. Vesely, and N. Ward, "Marlin: Preprocessing zkSNARKs with universal and updatable SRS," in Proc. of EUROCRYPT. Springer, 2020, pp. 738–768.
19. P. Feldman, "A practical scheme for non-interactive verifiable secret sharing," in IEEE Annual Symposium on Foundations of Computer Science, 1987, pp. 427–438.
20. B. Parno, J. Howell, C. Gentry, and M. Raykova, "Pinocchio: Nearly practical verifiable computation," in Proc. of IEEE S&P, 2013, pp. 238–252.
21. D. Boneh, B. Bünz, and B. Fisch, "Batching techniques for accumulators with applications to IOPs and stateless blockchains," in Proc. of CRYPTO, 2019, pp. 561–586.
22. R. C. Merkle, "A digital signature based on a conventional encryption function," in Proc. of CRYPTO. Springer, 1987, pp. 369–378.
23. J. Camenisch and M. Stadler, "Efficient group signature schemes for large groups," in Proc. of CRYPTO. Springer, 1997, pp. 410–424.
24. O. Goldreich and Y. Oren, "Definitions and properties of zero-knowledge proof systems," Journal of Cryptology, vol. 7, no. 1, pp. 1–32, 1994.
25. M. Bellare and O. Goldreich, "On defining proofs of knowledge," in Proc. of CRYPTO. Springer, 1992, pp. 390–420.
26. J. Groth, "On the size of pairing-based non-interactive arguments," in Proc. of EUROCRYPT. Springer, 2016, pp. 305–326.
27. C.-P. Schnorr, "Efficient identification and signatures for smart cards," in Conference on the Theory and Application of Cryptology. Springer, 1989, pp. 239–252.
28. J. Camenisch, A. Kiayias, and M. Yung, "On the portability of generalized Schnorr proofs," in Proc. of EUROCRYPT. Springer, 2009, pp. 425–442.
29. I. Damgård, J. Luo, S. Oechsner, P. Scholl, and M. Simkin, "Compact zero-knowledge proofs of small Hamming weight," in Proc. of PKC. Springer, 2018, pp. 530–560.
30. A. Fiat and A. Shamir, "How to prove yourself: Practical solutions to identification and signature problems," in Proc. of EUROCRYPT. Springer, 1986, pp. 186–194.


31. J. Camenisch, R. Chaabouni, and A. Shelat, "Efficient protocols for set membership and range proofs," in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2008, pp. 234–252.
32. B. Bünz, J. Bootle, D. Boneh, A. Poelstra, P. Wuille, and G. Maxwell, "Bulletproofs: Short proofs for confidential transactions and more," in Proc. of IEEE S&P, 2018, pp. 315–334.
33. M. Li, J. Weng, J.-N. Liu, X. Lin, and C. Obimbo, "Toward vehicular digital forensics from decentralized trust: An accountable, privacy-preserving, and secure realization," IEEE Internet of Things Journal, vol. 9, no. 9, pp. 7009–7024, 2021.
34. L. Wang, X. Shen, J. Li, J. Shao, and Y. Yang, "Cryptographic primitives in blockchains," Journal of Network and Computer Applications, vol. 127, pp. 43–58, 2019.
35. R. Gennaro, C. Gentry, B. Parno, and M. Raykova, "Quadratic span programs and succinct NIZKs without PCPs," in Proc. of EUROCRYPT. Springer, 2013, pp. 626–645.
36. J. Thaler et al., "Proofs, arguments, and zero-knowledge," Foundations and Trends in Privacy and Security, vol. 4, no. 2–4, pp. 117–660, 2022.
37. S. Goldwasser, Y. T. Kalai, and G. N. Rothblum, "Delegating computation: Interactive proofs for muggles," Journal of the ACM, vol. 62, no. 4, pp. 1–64, 2015.
38. A. Kosba, D. Papadopoulos, C. Papamanthou, and D. Song, "MIRAGE: Succinct arguments for randomized algorithms with applications to universal zk-SNARKs," in Proc. of USENIX Security, 2020, pp. 2129–2146.
39. A. Kosba, C. Papamanthou, and E. Shi, "xJsnark: A framework for efficient verifiable computation," in Proc. of IEEE S&P, 2018, pp. 944–961.
40. E. Ben-Sasson, A. Chiesa, E. Tromer, and M. Virza, "Succinct non-interactive zero knowledge for a von Neumann architecture," in Proc. of USENIX Security, 2014, pp. 781–796.
41. M. Chase and A. Lysyanskaya, "On signatures of knowledge," in Proc. of CRYPTO, 2006, pp. 78–96.
42. M. Campanelli, D. Fiore, and A. Querol, "LegoSNARK: Modular design and composition of succinct zero-knowledge proofs," in Proc. of ACM CCS, 2019.
43. S. Agrawal, C. Ganesh, and P. Mohassel, "Non-interactive zero-knowledge proofs for composite statements," in Proc. of CRYPTO, 2018, pp. 643–673.
44. D. Fiore, C. Fournet, E. Ghosh, M. Kohlweiss, O. Ohrimenko, and B. Parno, "Hash first, argue later: Adaptive verifiable computations on outsourced data," in Proc. of ACM CCS, 2016, pp. 1304–1316.
45. D. Pointcheval and O. Sanders, "Short randomizable signatures," in Proc. of CT-RSA. Springer, 2016, pp. 111–126.
46. D. Liu, A. Alahmadi, J. Ni, X. Lin, and X. Shen, "Anonymous reputation system for IIoT-enabled retail marketing atop PoS blockchain," IEEE Transactions on Industrial Informatics, vol. 15, no. 6, pp. 3527–3537, 2019.
47. T. Li, H. Wang, D. He, and J. Yu, "Blockchain-based privacy-preserving and rewarding private data sharing for IoT," IEEE Internet of Things Journal, vol. 9, no. 16, pp. 15138–15149, 2022.
48. J. Shao, X. Lin, R. Lu, and C. Zuo, "A threshold anonymous authentication protocol for VANETs," IEEE Transactions on Vehicular Technology, vol. 65, no. 3, pp. 1711–1720, 2015.
49. D. Boneh, X. Boyen, and H. Shacham, "Short group signatures," in Proc. of Annual International Cryptology Conference. Springer, 2004, pp. 41–55.
50. J. Eberhardt and S. Tai, "ZoKrates: Scalable privacy-preserving off-chain computations," in IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2018, pp. 1084–1091.
51. R. Cheng, F. Zhang, J. Kos, W. He, N. Hynes, N. Johnson, A. Juels, A. Miller, and D. Song, "Ekiden: A platform for confidentiality-preserving, trustworthy, and performant smart contracts," in Proc. of IEEE EuroS&P, 2019, pp. 185–200.
52. V. Costan and S. Devadas, "Intel SGX explained," Cryptology ePrint Archive, 2016.
53. S. Johnson, V. Scarlata, C. Rozas, E. Brickell, and F. Mckeen, "Intel software guard extensions: EPID provisioning and attestation services," White Paper, vol. 1, no. 1–10, p. 119, 2016.
54. S. Fei, Z. Yan, W. Ding, and H. Xie, "Security vulnerabilities of SGX and countermeasures: A survey," ACM Computing Surveys, vol. 54, no. 6, pp. 1–36, 2021.

Chapter 3
Reliable Data Provenance in HCN

3.1 Motivations and Applications

As communication networks keep evolving, AI-assisted network services and network management are gaining momentum [1]. As a result, more and more data are generated and collected to fuel AI model training and inference. For example, wireless sensor networks [2] can collect large amounts of environmental and human data for developing weather prediction or healthcare applications [3]. The collected and stored data can also be used for data provenance analysis. Specifically, network administrators can log system runtime data and events to maintain a detailed history of the networks [4, 5], which helps make system-level decisions and optimize network operations. There are multiple applications for data provenance in future networks:
• Data provenance can be used to establish causal relationships between network events. When a network error or bug happens, diagnosis can be conducted through analysis of the provenance data.
• Data provenance can serve as evidence of network operations for forensics purposes in case of dispute. It can also be used to identify malicious nodes in a distributed network [6].
• Data provenance can be collected and analyzed at system runtime. By doing so, a global threat detection system can be implemented to send system alerts efficiently [7].
• Data provenance can help build secure and reliable data management in distributed networks, such as IoT networks [8, 9]. In this regard, data sources, data transfers, and data usage can be traced and processed.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. D. Liu, X. (Sherman) Shen, Blockchain-Based Data Security in Heterogeneous Communications Networks, Wireless Networks, https://doi.org/10.1007/978-3-031-52477-6_3

This chapter investigates data provenance with blockchain-based solutions in future networks. The organization of this chapter is summarized as follows. In


Sect. 3.2, we discuss the application requirements for data provenance, including provenance trustworthiness, provenance privacy, and provenance query. In Sect. 3.3, we review state-of-the-art data provenance approaches with design challenges. In Sect. 3.4, we present a case study featuring distributed network provenance [10]. Network provenance model, security model, and representative constructions with evaluations are presented. Finally, we conclude this chapter in Sect. 3.5.

3.2 Application Requirements

To implement data provenance in future networks, multiple application requirements should be met, including provenance trustworthiness, provenance privacy, and provenance query.

3.2.1 Provenance Trustworthiness

Provenance data record critical system logs and events for root-cause analysis and threat detection. Therefore, provenance data should be collected and stored in a trustworthy manner, where the authenticity, integrity, and timeliness of the provenance data are preserved.

3.2.2 Provenance Privacy

Provenance data often contain sensitive information, including system configurations and network traffic information. Without proper protection, leakage of this sensitive information can lead to severe data breaches under privacy regulations. Therefore, the privacy of provenance data must be preserved to enforce secure data access and protection.

3.2.3 Provenance Query

To enable effective and efficient access to provenance data, rich query functionalities should be provided [11]. For example, network provenance data are often organized as a causal relation graph, over which keyword queries, graph queries, and range queries are needed. Therefore, an efficient query scheme that also accounts for the trustworthiness and privacy requirements should be designed for data provenance applications.


3.3 State-of-the-Art Data Provenance Approaches

In this section, we summarize the state-of-the-art data provenance approaches, in terms of non-blockchain-based and blockchain-based approaches. Then, we discuss design challenges and solutions for balancing decentralization and efficiency in data provenance.

3.3.1 Non-Blockchain-Based Approach

Data provenance is widely used for analyzing system behavior and safeguarding the security of modern information systems, such as databases, networks, and threat detection systems. Different data provenance scenarios can have different design requirements:
• For threat detection systems, an alert investigation scheme based on a provenance graph was designed in [7]. The proposed system analyzes the suspiciousness of an event along with its neighboring nodes in the provenance graph using a network diffusion algorithm to calculate anomaly scores.
• For data provenance applications with critical delay requirements, temporal provenance was proposed in [12]. The sequence of processing requests and the longest dependency chain for scheduling parallel tasks were thoroughly investigated to improve provenance efficiency. To identify root causes in Linux systems, static analysis of the Linux kernel code was conducted for runtime provenance analysis [4].
• In secure network provenance, a network system should be able to explain the network's state and its causes [13]. To be resilient to compromised nodes in the system, a secure provenance graph is constructed for forensic analysis. For diagnosing network bugs in distributed systems, "differential provenance" was proposed in [14], which tracks the key network events when network states change for root-cause analysis.
• Privacy-preserving network provenance was studied in [11]. The proposed scheme designs private provenance query and access control based on symmetric searchable encryption (SSE) and structured encryption techniques. That is, tuples and keywords from provenance graphs are encrypted using SSE to enable direct search over the encrypted data.

Existing data provenance approaches offer rich functionalities for different provenance applications. However, these solutions mainly consider collecting provenance data, constructing provenance graphs, and analyzing provenance events within a single trusted domain. Therefore, distributed provenance architectures and secure solutions across different trust domains should be investigated.


3.3.2 Blockchain-Based Approach To achieve reliable data provenance across different trust domains, the blockchain can serve as a trusted database for storing and querying provenance data. There are multiple use cases for blockchain-based data provenance: • A blockchain-based data provenance architecture was proposed in [15]. More specifically, data operations of the cloud provider were packed and uploaded onto the blockchain as transactions. With the immutable blockchain storage, the onchain data can be used for integrity and trustworthiness verifications, and data tracing [16]. Blockchain also can serve as a reliable platform for enforcing data accountability and provenance which complies with GDPR [17]. • Blockchain can be adopted to build a reliable and secure data provenance system for supply chain management [18, 19]. In this case, industry partners can design smart contract to collaboratively manage their business in a transparent and decentralized manner. • A provenance system for public blockchains was designed in [20], where finegrained data operations on the blockchain were captured. The provenance data can also be efficiently queried for analytical tasks. • Blockchain has the great potential to increase data security in vehicular networks [21]. A blockchain-based ad delivery scheme was proposed in [22] to promote trustworthy and transparent targeted advertising in vehicular networks. To reduce on-chain costs, a proof-of-misbehavior strategy was adopted to achieve postevent accountability. • Blockchain can be adopted to build data auditing architecture for cloud service providers [23]. A blockchain-based data auditing scheme was proposed in [24], where polynomial commitment and zero-knowledge proof techniques were utilized to compute data auditing proofs for on-chain verifications. Cross-domain query on the blockchain is a critical part of data provenance that has also attracted extensive attention. 
• A searchable encryption scheme that used the smart contract to store encrypted data for on-chain search was proposed in [25]. Designs and implementations on both public and private blockchains were explored and tested [25].
• A provenance scheme based on blockchain for encrypted search was designed in [26]. Verifications and fair payments of search operations were achieved via time-locked payment techniques and incentive mechanisms.
• Advanced query operations with verifications were also explored. For example, Boolean query over blockchain data was designed in [27], where accumulator-based authenticators were integrated for intra/inter-block data queries. Moreover, verifiable range query was proposed in [28] based on hybrid Merkle-based data authenticators.
Existing blockchain-based data provenance approaches have explored architectural designs and adopted smart contracts with on-chain authenticators for fulfilling



various provenance tasks. At the same time, due to the expensive on-chain storage and computation costs, the computation and communication efficiency of blockchain-based provenance schemes should be further investigated.

3.3.3 Decentralization and Efficiency Dilemma

There is a tradeoff between decentralization and efficiency in data provenance approaches. First, as there are often cases where provenance happens across trust boundaries, a decentralized data provenance architecture based on blockchain is required. Second, the blockchain-based approach may lead to increased implementation costs. To address the decentralization and efficiency dilemma, the on/off-chain computation model [29, 30] based on SNARK [31, 32] is a promising solution. A hash-then-prove paradigm was formulated and designed in [33], where a large volume of data can be stored as Pedersen commitments for providing verifiable computations with either SNARK or Sigma protocols [34]. To this end, provenance data can be stored on the blockchain as succinct authenticators. With the SNARK technique, operations over the provenance data can be conducted off-chain with proofs being efficiently verified on the blockchain. At the same time, due to the requirements of graph-based provenance queries and various query functionalities, it is still a non-trivial task to design an efficient blockchain-based data provenance scheme for future networks. In the following, we present distributed network provenance as a concrete use case to illustrate how to balance decentralization and efficiency in blockchain-based data provenance.

3.4 Use Case: Distributed Network Provenance

Future networks are envisioned to have a very heterogeneous and distributed architecture, where network stakeholders from different trust domains, including operators, vendors, service providers, etc., work collaboratively to manage network resources and provide user services [35]. As a result, data provenance can happen across trust boundaries. Due to the lack of a trusted centralized platform for managing provenance data, network administrators need to maintain their data locally. When a global error or bug happens, a cross-domain provenance query is often required. Blockchain can be adopted to build a reliable platform for storing and querying provenance data among different trust domains [36]. With the zero-knowledge proof technique [37], on-chain data privacy and trusted provenance queries can also be achieved. In this section, we propose a blockchain-based distributed network provenance scheme to address the decentralization and efficiency dilemma.



• First, we design a multi-level query index for graph-based provenance data to support rich query functionalities.
• Second, we design succinct on-chain digests for the index and provenance data. Integrated with SNARK-based on/off-chain computation models, the proposed scheme supports verifiable provenance query and guarantees archiving security.
• We conduct security analysis and extensive experiments to demonstrate the efficiency of the proposed scheme.
In the following, we present a representative construction for blockchain-based data provenance [10]. First, we present the graph-based data structure for distributed network provenance with the security model and goals. Then, we present detailed constructions and discuss the security and performance properties.

3.4.1 Network Provenance Model

3.4.1.1 Graph-Based Network Provenance

This chapter focuses on the graph-based network provenance model [11, 13]. More specifically, network events, such as packet forwarding or packet drop, can be modeled as a directed graph:

G = (V, E).    (3.1)

The vertexes refer to network events, which are denoted as:

v_i ∈ V.    (3.2)

The edges refer to provenance relations between the events, which are denoted as:

e = (v_1, v_2) ∈ E.    (3.3)

The direction of the edge indicates the causal relationship between the events. This model is widely used in the network provenance area and is also known as the Network Datalog (NDlog) model. In Fig. 3.1, we briefly illustrate an example of a provenance graph for packet forwarding. More specifically, there are three vertexes in the figure:
• V1 represents an incoming packet at node N2 with destination IP address "192.168.1.1."
• V2 represents a packet forwarding rule that packets at N2 with destination "192.168.1.1" should be forwarded to port 3.
• V3 represents a packet forwarding instance where the packet in V1 is forwarded to port 3 at N2.

Fig. 3.1 Packet forwarding instance:
V1: Packet(@N2, 192.168.1.1)
V2: ForwardRule(@N2, 192.168.1.1, P3)
V3: PacketForward(@N2, 192.168.1.1, P3)

We can see clearly that the three vertexes describe a packet forwarding instance with two events (V1 and V3) and a packet forwarding rule (V2). In the provenance graph, there is one edge from V1 to V3 and another edge from V2 to V3, which indicates the causal relationship. The provenance graph, denoted as provenance index I, should support query functionalities [5] against a query (Q). More specifically, a provenance query consists of three modules:
• Keyword query: This query module checks if a given keyword exists in a vertex of the provenance graph.
• Range query: This query module checks if a numeric value of a vertex lies in a given range.
• K-hop ancestor query: This query module returns the vertexes that are k hops before a given vertex.
A query example can be shown from the packet forwarding instance. Specifically, the query can be constructed as:

Q = (N2, PacketForward, 192.168.1.0-20).    (3.4)

This query will find (1) "packet forwarding" instances that happen at N2, which can be represented as keyword queries, and (2) destination IP addresses ranging from 192.168.1.0 to 192.168.1.20, which can be represented as a range query.
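The three query modules can be illustrated with a short sketch. The following Python snippet is an illustrative assumption, not the book's implementation; the vertex attributes, IDs, and helper names are hypothetical. It runs keyword, range, and K-hop ancestor queries over a toy graph modeled on the Fig. 3.1 packet forwarding instance:

```python
from collections import defaultdict

# Toy graph for the Fig. 3.1 packet-forwarding instance; vertex IDs and
# attribute names are illustrative assumptions (IPs encoded as 32-bit ints).
vertices = {
    1: {"node": "N2", "type": "Packet",        "dst_ip": 0xC0A80101},
    2: {"node": "N2", "type": "ForwardRule",   "dst_ip": 0xC0A80101},
    3: {"node": "N2", "type": "PacketForward", "dst_ip": 0xC0A80101},
}
edges = [(1, 3), (2, 3)]  # causal relations: V1 -> V3 and V2 -> V3

def keyword_query(value, field):
    """Keyword module: vertexes whose attribute equals the queried value."""
    return {vid for vid, a in vertices.items() if a[field] == value}

def range_query(field, lo, hi):
    """Range module: vertexes whose numeric attribute lies in [lo, hi]."""
    return {vid for vid, a in vertices.items() if lo <= a[field] <= hi}

def k_hop_ancestors(vid, k):
    """K-hop ancestor module: vertexes up to k hops before the given vertex."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    frontier, ancestors = {vid}, set()
    for _ in range(k):
        frontier = set().union(*(parents[v] for v in frontier)) - ancestors - {vid}
        ancestors |= frontier
    return ancestors

# Query (3.4): PacketForward events at N2 with destination in 192.168.1.0-20.
result = (keyword_query("N2", "node")
          & keyword_query("PacketForward", "type")
          & range_query("dst_ip", 0xC0A80100, 0xC0A80114))
```

Encoding IP addresses as 32-bit integers turns the range part of (3.4) into a plain numeric comparison, which is also what makes it circuit-friendly later.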

3.4.1.2 Distributed Network Provenance Model

Based on the graph-based network provenance, this chapter considers a distributed setting where there can be multiple network administrators in different network domains. As shown in Fig. 3.2, a set of N network administrators is denoted as:



Fig. 3.2 Distributed network provenance (Admin1, Admin2, and Admin3 each maintain the log of their own network domain and issue cross-domain provenance queries)

A = {A_1, A_2, . . . , A_N}.    (3.5)

Each network administrator can maintain a local provenance graph for cross-domain provenance queries. Key notations of this chapter are illustrated below:
• G = (V, E) represents a local provenance graph. Each administrator can have a different graph based on its system runtime events.
• v_i ∈ V represents a network event or rule, which is also a vertex in the graph.
• To store the information of v_i, we use the following data structure:

T_i = (vid_i, I_vi, c_i),    (3.6)

with ID vid_i, query index I_vi, and event log c_i. More specifically, I_vi can include various features or attributes to describe the event, such as time, location, and severity, which can be used as keywords or numeric values for a provenance query. c_i denotes a more detailed text description of the event.
• I is the overall query index of a network administrator, consisting of all I_vi of its local provenance graph. To improve search efficiency in the implementations, the index can be split into a first-level index I_F and a second-level index I_S.
• Q is a provenance query, and a query function is defined as:

F(I, Q) → R_I.    (3.7)

• R represents the query result, consisting of two parts: R_I is the index query result, while R_L is the corresponding detailed log information of R_I.
• D represents the digests of the provenance and log index. More specifically, D_I is the digest of the provenance index and D_L is the digest of the provenance log.
• π represents the query proof with three parts: π_I as the index query proof, π_L as the log proof, and π_D as the digest proof.
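To make the notation concrete, a minimal sketch of the per-vertex record T_i = (vid_i, I_vi, c_i) and the two-level index might look as follows. The class and field names are our own assumptions for illustration, not the book's code:

```python
from dataclasses import dataclass, field

@dataclass
class VertexRecord:
    """Per-vertex record T_i = (vid_i, I_vi, c_i)."""
    vid: int      # unique vertex ID vid_i
    index: dict   # query index I_vi: attribute name -> numeric or binary value
    log: str      # detailed event description c_i

@dataclass
class ProvenanceIndex:
    """Overall index I, split into first-level I_F and second-level I_S."""
    first_level: dict = field(default_factory=dict)   # I_F: keyword -> vids
    second_level: dict = field(default_factory=dict)  # I_S: vid -> I_vi

# Populate the index with the vertex V1 of the Fig. 3.1 example.
v1 = VertexRecord(vid=1,
                  index={"dst_ip": 0xC0A80101, "Forward": 1, "Loss": 0},
                  log="Packet(@N2, 192.168.1.1)")
I = ProvenanceIndex()
I.second_level[v1.vid] = v1.index
for kw in ("Forward", "Loss"):
    if v1.index[kw]:
        I.first_level.setdefault(kw, set()).add(v1.vid)
```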



When a cross-domain provenance query happens, such as a global diagnosis of network errors, there are two entities involved:
• Investigating administrator (A_I) would like to query the provenance graph of another network administrator with a provenance query Q.
• Source administrator (A_S) is the owner of a local provenance graph for the query.
It should be noted that a provenance query can happen between any two network administrators. Moreover, A_I can query many other source administrators at the same time. Each queried source administrator can perform a provenance query using Q over its local graph G and return (vid_i, c_i) ∈ G with a correctness proof. After receiving all the provenance query results, the investigating administrator can finally reconstruct a provenance graph for network diagnosis. The cross-domain query process consists of four algorithms:
• Setup(1^λ, F): It takes a public index query function F and outputs three sets of parameters, including system public parameters (pp), a digest key (DK), and a common reference string (CRS).
• ProvCon(G, pp, CRS, DK): Using the public parameters, CRS, and the digest key, the algorithm computes a provenance index I and a digest D from a provenance graph G.
• Query(Q, G, pp, CRS, DK, I): The algorithm takes a provenance query, the public parameters, CRS, and the digest key. It performs the provenance query Q over the provenance index I and outputs a query result R and a query proof π.
• ProvVer(Q, R, π, CRS, DK, D): The algorithm takes a query, a query result, and a proof, along with CRS, the digest key, and the digest. It outputs accept if the query proof is correct, or reject otherwise.
As discussed before, the index query function F is the key query component that can include three query modules. All three modules can be instantiated in a single verifiable computation (VC) framework.
By representing keywords and attributes as numeric values with a keyword dictionary W, the keyword query becomes an equality check between two values, while the range query can be represented as value comparisons. For example, for a query Q over a provenance graph G represented by I, a keyword query finds vertexes whose index I_vi contains the queried keyword.
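As a sketch of this encoding (the dictionary W and its numeric codes below are hypothetical), both checks reduce to arithmetic over numeric values, which is what later makes them expressible in an arithmetic circuit:

```python
# Hypothetical keyword dictionary W: keywords mapped to numeric codes.
W = {"Packet": 1, "ForwardRule": 2, "PacketForward": 3}

def keyword_match(vertex_code, query_code):
    """Keyword query as an equality check between two numeric values."""
    return int(vertex_code == query_code)

def range_match(value, lo, hi):
    """Range query as value comparisons: 1 iff lo <= value <= hi."""
    return int(lo <= value <= hi)

# Both predicates yield 0/1 flags that can be combined by multiplication,
# mirroring how they would be wired together in an arithmetic circuit.
flag = (keyword_match(W["PacketForward"], W["PacketForward"])
        * range_match(0xC0A80101, 0xC0A80100, 0xC0A80114))
```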

3.4.2 Defining Archiving Security

In this section, we discuss the security model and design goals of the proposed scheme.


3.4.2.1 Security Model

Network provenance is often used for post-event analysis and diagnosis. As a result, it is not trivial for network administrators to predict the global influence of network events at runtime or whether a network event can be the cause of a future diagnosis. Therefore, this chapter considers security properties related to the archiving nature of network provenance. More specifically, this chapter assumes that network administrators are honest at system runtime to record and archive network logs. Later, when a global error happens and a diagnosis is required, it can be tempting for administrators to modify archived network logs that do not look good for them. Under this security model, this chapter defines archiving security with the following properties:
• Correctness: An investigating network administrator will accept a query instance consisting of a provenance query and query results when two conditions are met: first, the Setup, ProvCon, and Query algorithms are honestly executed; second, the digests of the provenance index are honestly computed.
• Integrity: The query result should also be correct, which is threefold: first, on-chain provenance digests cannot be maliciously modified; second, only vertexes in the provenance graph that satisfy the query can be returned, and all such results must be returned; third, for each log result c_i ∈ R, c_i must be in the provenance graph.

3.4.2.2 Design Goals

Under the network provenance model and security model, the proposed scheme should achieve the following design goals:
• All defined query modules should be supported in a single VC framework, for keyword, range, and K-hop ancestor queries.
• Archiving security should be achieved for its correctness and integrity.
• Communication and computation overheads of the proposed scheme should be feasible and efficient in real-world implementations.

3.4.3 Building Blocks

The representative construction uses the following building blocks: cryptographic primitives and verifiable computation (VC) from Pinocchio [32].


3.4.3.1 Cryptographic Primitives

This chapter utilizes a security parameter λ, which is often taken implicitly in algorithms. A set of cyclic groups

G = (G_1, G_2, G_T)    (3.8)

is defined with a Type III bilinear pairing e and a prime order p [33, 38, 39]. A collision-resistant hash function is defined as:

Hash : {0, 1}* → {0, 1}^256.    (3.9)

We use poly(λ) to denote a polynomial bound in the security parameter λ.

3.4.3.2 Pinocchio-Based VC

This chapter adopts Pinocchio-based VC [32] in constructing the cross-domain provenance query. A cross-domain query function is defined as follows:

F(Q, I) = R_I.    (3.10)

Q is a cross-domain provenance query and I is a query index. R_I is the query result of executing Q over I. A verifiable version of the above function has the following two requirements:
• Input authenticity: The query and the index should be authenticated by trusted data sources.
• Execution correctness: The query execution should follow the pre-determined query rules. For example, a query can check the equality of two keywords (keyword query) or whether a given value lies in a range (range query).
To achieve a verifiable query, the function can first be converted to an equivalent arithmetic circuit. For the circuit, the input wires are the query and index, and the output wires are the query results. The results are correct iff (Q, I, R_I) are valid assignments of the circuit. The circuit evaluation is then translated to the check of the corresponding Quadratic Arithmetic Program (QAP), which can be cryptographically implemented with SNARK [31]. The benefit of using SNARK for verifiable provenance queries is that the verification of query results can be very efficient on the blockchain. For illustrative purposes, we define the algorithms of Pinocchio VC as follows [32]:
• KeyGen(pp, F): KeyGen implicitly uses the security parameter, system public parameters (pp), and a pre-defined provenance query function. KeyGen outputs a common reference string (CRS), which consists of two sets of keys: an evaluation key and a verification key.
• Eval(ek, Q, I): Eval takes a query, a provenance index, and the evaluation key from CRS. The algorithm performs the query over the index to output query results. Moreover, a succinct query proof (π_I) is also generated for verifying the query results.
• Verify(vk, Q, R_I, π_I): Verify uses the query, the results, the verification key, and the proof. The algorithm outputs either accept or reject.
Note that the verification omits the provenance index, since the proposed scheme utilizes an external linking method to build a relationship between the index digests in π_I and the on-chain provenance index digests. In this case, the proposed scheme requires a trusted setup of the on-chain digests, which complies with our definition of archiving security.

3.4.4 Representative Constructions

In this section, we present the detailed constructions of the four algorithms of the cross-domain network provenance model. There are mainly three entities involved:
• Trusted authority (TA) is responsible for setting up the system by running the Setup algorithm to generate the required system parameters. Note that TA in this chapter is a conceptual entity for illustration simplicity, which can be replaced by a multi-party computation protocol [40].
• Blockchain is a distributed ledger maintained by network administrators for storing provenance digests.
• Network administrators are responsible for honestly observing their network events and constructing their local provenance graphs. At the same time, the administrators also correctly compute the digests of the provenance graph to be uploaded onto the blockchain.
Briefly speaking, the cross-domain query works as follows:
• Any network administrator can be either an investigating administrator (A_I) or a source administrator (A_S). A_I can construct a provenance query Q and send the query to A_S.
• A_S runs the Query algorithm with its local provenance data and on-chain provenance digests to return the query results to A_I.
• Upon receiving the query results, A_I retrieves the on-chain digests and runs the ProvVer algorithm. If all verifications are correct, A_I accepts the query result; otherwise, A_I can report query incorrectness.



Fig. 3.3 Workflow of system setup: (1) TA generates the system parameters; (2) TA sends the public parameters to the blockchain

In the following, we present the detailed constructions of the four algorithms: Setup, ProvCon, Query, and ProvVer.

3.4.4.1 System Setup by TA

As shown in Fig. 3.3, the system setup is conducted by TA.
• First, TA chooses a set of system parameters:

(λ, G = (G_1, G_2, G_T), p, e, g_1, g_2),    (3.11)

where λ is the security parameter and G is a set of cyclic groups with generators g_1 ∈ G_1 and g_2 ∈ G_2. The well-formedness of the parameters can be easily checked. TA also chooses a secure hash function Hash. To encode provenance queries into circuit-based representations, TA also defines a dictionary that maps a provenance attribute to a numeric value:

W = (a_1, a_2, . . . , a_n) = ({a_x}_{x∈[1,n1]}, {a_y}_{y∈[n1+1,n]}).    (3.12)

More specifically, the first set denotes numeric values, such as event time; the second set denotes attributes (binary values), such as event descriptions. By doing so, integer operations can be supported, including comparison, multiplication, and addition. TA determines a template for constructing a provenance index for a provenance graph G as follows:

I = (I_F, I_S).    (3.13)

The first-level and second-level provenance indexes are defined as follows:

I_F = {I_ki}_{i∈[1,n2]},
I_S = {I_vi}_{i∈[1,m]}, n2 = n − n1, m = |V|.    (3.14)



n, n1, and n2 are dimensions of the sub-indexes, while |V| is the number of vertexes in the provenance graph. To enable verifiable provenance queries, TA needs to set up the evaluation and verification keys (ek, vk) for Pinocchio-based SNARK [32] by executing the following function:

KeyGen(pp, F).    (3.15)

There is a set of generators in ek that corresponds to the I/O wires of the provenance index:

S_I = {ḡ_i}_{i∈[1,m∗(n+n2)]}.    (3.16)

With the generators, TA first randomly chooses the following keys:

P = {P_i}_{i∈[1,m∗(n+n2)]} ∈_R G_1,
X = {X_i}_{i∈[1,m∗(n+n2)]} ∈_R G_1,
α, β, γ ∈_R Z_p.    (3.17)

Using P, X, α, β, γ, TA further computes:

Ê = g_2^α, F̂ = g_2^β, Ĝ = g_2^γ.    (3.18)

TA computes the set of digest keys as follows:

Y = {Y_i}_{i∈[1,m∗(n+n2)]},
Y_i = P_i^α · X_i^β · ḡ_i^γ, i ∈ [1, m ∗ (n + n2)].    (3.19)

Note that these secrets, along with the trapdoor secret used in generating (ek, vk), must be destroyed after the setup. Moreover, the size of the dictionary and indexes must be pre-determined due to the relation-dependent nature of the Pinocchio SNARK. At the same time, search optimization techniques, such as padding or prefix encoding, can also be applied in this framework. The digest key is defined as follows:

DK = (DK_E, DK_V).    (3.20)

Specifically, DK_E and DK_V are defined as follows:

DK_E = (P, S_I, X, Y),
DK_V = (Ê, F̂, Ĝ).    (3.21)



Fig. 3.4 Workflow of digest construction: (1) the administrator constructs the index/log digests; (2) the administrator sends the index/log digests to the blockchain

• Finally, the public system parameters are as follows:

pp = (G, g_1, g_2, W, Hash, F, CRS, DK).    (3.22)

TA can publish the public parameters onto the blockchain.

3.4.4.2 On-chain Digest Construction by Administrators

As shown in Fig. 3.4, each network administrator constructs its provenance index and digest as follows:
• First, the network administrator can construct an on-chain digest of its provenance graph:

G = (V, E).    (3.23)

For each vertex v_i in the provenance graph, the administrator needs to compute an associated query index I_vi [41]:

I_vi = ({r_x}_{x∈[1,n1]}, {k_y}_{y∈[1,n2]}).    (3.24)

r_x represents a numeric value, while k_y is a binary value that represents whether the corresponding keyword, a_{n1+y}, in W exists in the index or not. A typical example is shown in Fig. 3.5. We can see that each index I_vi has a unique ID vid_i along with a set of attributes. More specifically, the attribute for IP Addr can be a numeric value, while the other binary values indicate whether the vertex contains a specific keyword, such as "Forward." As graph traversal is not efficiently supported in SNARK-based VC [42], an index for querying K-hop ancestors also needs to be designed, which is denoted as I_K. The idea is simple: for each vertex in the provenance graph, I_K stores all vids of its ancestors within K hops. For the Pinocchio-based VC framework, computations are represented by circuits that do not efficiently support query breaks and advanced queries. Therefore, to improve query efficiency, we need to design multiple query indexes in a unified framework. More specifically, the provenance index can be split into two parts:



Fig. 3.5 Index digest (each vertex index I_vi holds a node IP address and binary keyword attributes such as Forward, Loss, and FTP; e.g., vid 1 → (192.168.2.24, . . . , (1, 0, . . . , 1)) and vid 3 → (192.67.7.59, . . . , (0, 0, . . . , 1)))

I = (I_F, I_S).    (3.25)

I_F is the first-level index that can locate all vertexes that contain a specific keyword. I_S is the second-level index that contains all vertex indexes I_vi. The index can be constructed using Algorithm 1. Briefly speaking, the two-level strategy first uses a keyword to find all vertexes that contain the keyword; then, a full query can be applied in the reduced search space.

Algorithm 1: Index construction
Input: {I_vi}_{vi∈V}
Output: I_F = {I_kj}_{j∈[1,n2]} and I_S = {I_vi}_{i∈[1,m]}
for v_i ∈ V do
    for j = 1 to n2 do
        if k_j ≠ 0 ∈ I_vi then
            Add vid_i to I_kj
for j = 1 to n2 do
    Add I_kj to I_F
for i = 1 to m do
    Add I_vi to I_S
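Algorithm 1 can be sketched in runnable form as follows. This is a Python illustration; the tuple layout of each I_vi (numeric values r, then keyword bits k) is an assumption carried over from the index description above, not the book's code:

```python
def build_index(vertex_indexes, n2):
    """Algorithm 1 sketch: vertex_indexes maps vid -> (r_values, k_bits)."""
    I_F = {j: set() for j in range(n2)}   # keyword slot j -> vids containing it
    I_S = {}                              # vid -> full vertex index I_vi
    for vid, (r_values, k_bits) in vertex_indexes.items():
        for j in range(n2):
            if k_bits[j] != 0:            # keyword k_j present in I_vi
                I_F[j].add(vid)
        I_S[vid] = (r_values, k_bits)
    return I_F, I_S

# Two vertices: one numeric attribute (n1 = 1) and two keyword slots (n2 = 2).
I_F, I_S = build_index({1: ([10], [1, 0]), 2: ([20], [1, 1])}, n2=2)
```

The first-level index I_F acts as a set of posting lists, so a later query can restrict the full scan to the vertexes listed under one required keyword.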

• After constructing the provenance index, the network administrator computes the on-chain digest for the index:

D_I = (D_F, D_S).    (3.26)



The administrator retrieves the digest key DK_E and computes D_F as follows:

D_F = ∏_{i∈[1,n2]} ∏_{j∈[1,m]} P_{(i−1)∗m+j}^{μ_{i,j}}.    (3.27)

μ_{i,j} is set as follows:

μ_{i,j} = 0 if vid_j ∉ I_ki;
μ_{i,j} ∈_R Z_p, otherwise.    (3.28)

The administrator computes D_S as follows:

D_S = ∏_{x∈[1,m]} ∏_{y∈[1,n]} P_{n2∗m+(x−1)∗n+y}^{ω_{x,y}}.    (3.29)

ω_{x,y} is set as follows:

ω_{x,y} = r_y ∈ I_vx if y ∈ [1, n1];
ω_{x,y} = k_y ∈ I_vx if y ∈ [n1+1, n].    (3.30)
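The digest computations (3.27)-(3.29) are products of group exponentiations. The sketch below illustrates the structure in Python using a toy multiplicative group modulo a small prime in place of the pairing group G_1; the modulus, key sampling, and data layout are assumptions for illustration only and offer no cryptographic security:

```python
import random

# Toy parameters (assumptions): 2 vertices, 3 attributes of which 1 is numeric.
p = 2**61 - 1                 # small prime modulus standing in for the group order
m, n, n1 = 2, 3, 1
n2 = n - n1
P = [random.randrange(2, p) for _ in range(m * (n + n2))]  # digest keys P_i

def digest_first_level(I_F):
    """D_F = prod_{i,j} P_{(i-1)m+j}^{mu_ij}, following (3.27)-(3.28)."""
    D_F = 1
    for i in range(n2):
        for j in range(m):
            # mu_ij is random for member vids and 0 otherwise
            mu = random.randrange(1, p) if (j + 1) in I_F[i] else 0
            D_F = D_F * pow(P[i * m + j], mu, p) % p
    return D_F

def digest_second_level(I_S):
    """D_S = prod_{x,y} P_{n2*m+(x-1)n+y}^{omega_xy}, following (3.29)-(3.30)."""
    D_S = 1
    for x in range(m):
        r_values, k_bits = I_S[x + 1]
        omega = list(r_values) + list(k_bits)   # omega_xy per (3.30)
        for y in range(n):
            D_S = D_S * pow(P[n2 * m + x * n + y], omega[y], p) % p
    return D_S

D_F = digest_first_level({0: {1, 2}, 1: {2}})
D_S = digest_second_level({1: ([10], [1, 0]), 2: ([20], [1, 1])})
```

The actual scheme performs the same product structure in G_1, where the digest key relation (3.19) lets a verifier check consistency via pairings in (3.42).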

The administrator also needs to compute the digest of the event description (c_i) for each vertex v_i. As shown in Algorithm 2, the administrator computes the Merkle tree [43] of all event descriptions. Figure 3.6 shows the constructed Merkle tree, where the Merkle root is the digest D_L.
• Finally, each administrator uploads the digest onto the blockchain:

D = (D_I, D_L).    (3.31)

As shown in Fig. 3.7, there are N network administrators who submit their digests on the blockchain. The digests can serve as secure authenticators for cross-domain queries.

Algorithm 2: Merkle digest generation
Input: Provenance logs {c_i}_{vi∈V}
Output: Log digest D_L
for i = m to 2m − 1 do
    Set M_i ← Hash(vid_{i−m+1} | c_{i−m+1})
for j = m − 1 to 1 do
    Set M_j ← Hash(M_{2j} | M_{2j+1})
Set D_L ← M_1
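Algorithm 2 can be written out as runnable code. In this Python sketch, SHA-256 stands in for the generic Hash, and a power-of-two number of logs is assumed, matching the 1-indexed array layout M[1..2m−1] in the algorithm:

```python
import hashlib

def merkle_digest(logs):
    """Algorithm 2 sketch: logs is a list of (vid, c) pairs, len a power of 2.
    Returns the Merkle root D_L over leaves Hash(vid_i | c_i)."""
    m = len(logs)
    M = [b""] * (2 * m)                       # 1-indexed array M[1..2m-1]
    for i in range(m, 2 * m):                 # leaves M_m .. M_{2m-1}
        vid, c = logs[i - m]
        M[i] = hashlib.sha256(f"{vid}|{c}".encode()).digest()
    for j in range(m - 1, 0, -1):             # internal nodes, bottom-up
        M[j] = hashlib.sha256(M[2 * j] + M[2 * j + 1]).digest()
    return M[1]                               # D_L

D_L = merkle_digest([(1, "Packet(@N2)"), (2, "ForwardRule(@N2)"),
                     (3, "PacketForward(@N2)"), (4, "PacketRecv(@N3)")])
```

Changing any single log entry changes its leaf and therefore the root, which is what makes D_L a succinct integrity anchor for all event descriptions.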



Fig. 3.6 Log digest (a Merkle tree over the event logs: leaves M_i = Hash(vid_i | c_i), internal nodes M_j = Hash(M_{2j} | M_{2j+1}), and root D_L = M_1)

Fig. 3.7 On-chain digest (the N network administrators publish their provenance digests (D_I, D_L) on the Ethereum blockchain)

3.4.4.3 Cross-Domain Provenance Query

An investigating administrator A_I can query the provenance graph of a source administrator A_S. As shown in Fig. 3.8, a cross-domain provenance query has the following steps. For illustrative simplicity, we assume that secure and authenticated channels have already been established between all network administrators with identity management and credential provisioning.
• First, A_I constructs a provenance query and sends the query to A_S:

Q = ({(r_l, r_r)_i}_{i∈[1,n1]}, {k̃_j}_{j∈[1,n2]}),    (3.32)



Fig. 3.8 Workflow of provenance query: (1) AdminI constructs and sends a provenance query; (2) AdminS performs the query, computes a proof, and sends back the results and the proof

where (r_l, r_r)_i is a range and k̃_j is a binary value that determines whether a specific keyword (a_{n1+j} ∈ W) is required or not.
• Second, after receiving the query from A_I, A_S runs Algorithm 3 to output the query result R_I, which consists of the relevance scores of the k highest-scoring vertexes in its local graph. The high-level idea is to search the first-level index and compute relevance scores for vertexes in the second-level index. It should be noted that, since CRS is generated once, the size of the index and query can be set with considerations of empty padding. Moreover, the size of all data structures must be fixed at the setup phase. For different provenance cases, the query sequence can be adjusted accordingly.

Algorithm 3: F execution
Input: Query vector Q, index I, graph G
Output: Query result R_I
Find k* = 1 ∈ Q with compact subindex I_k* ∈ I_F
for vid_i ∈ I_k* ≠ 0 do
    Set flag = 1
    for j ∈ [1, n1] do
        if r_j ∈ I_vi does not lie in (r_l, r_r)_j ∈ Q then
            Set flag = 0
    Set score = 0
    for j ∈ [1, n2] do
        Compute score = score + k̃_j ∗ k_j
    Set R_I[i] = score ∗ flag
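A runnable sketch of Algorithm 3 is given below in Python; the index layout follows the earlier sketches and the function names are our own assumptions. Vertexes failing a range predicate get score 0, and the score counts how many required keywords a vertex carries:

```python
def execute_query(Q_ranges, Q_keywords, I_F, I_S):
    """Q_ranges: [(lo, hi)] per numeric attribute; Q_keywords: 0/1 per keyword.
    Returns {vid: relevance score}, with score 0 if a range predicate fails."""
    # Pick a required keyword with the most compact first-level posting list.
    required = [j for j, kq in enumerate(Q_keywords) if kq == 1]
    k_star = min(required, key=lambda j: len(I_F[j]))
    results = {}
    for vid in I_F[k_star]:                       # reduced search space
        r_values, k_bits = I_S[vid]
        flag = 1
        for (lo, hi), r in zip(Q_ranges, r_values):
            if not (lo <= r <= hi):
                flag = 0                          # a range predicate fails
        score = sum(kq * kb for kq, kb in zip(Q_keywords, k_bits))
        results[vid] = score * flag
    return results

# One numeric range (5, 15); the first keyword required, the second not.
R_I = execute_query([(5, 15)], [1, 0],
                    I_F={0: {1, 2}, 1: {2}},
                    I_S={1: ([10], [1, 0]), 2: ([20], [1, 1])})
```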

• A_S proves the correct execution of R_I by computing:

Eval(ek, Q, I),    (3.33)

to get a proof π_I. From π_I, A_S extracts two group elements m_F, m_S, which represent the digests of I using generators from S_I. Furthermore, A_S needs to compute a digest proof that the correct index values are used in the computation, that is, that m_F, m_S contain the same values as the on-chain digest D_I. More specifically, A_S computes:

π_D = (X_F, Y_F, X_S, Y_S).    (3.34)

π_D consists of the items for the first-level index:

X_F = ∏_{i∈[1,n2]} ∏_{j∈[1,m]} X_{(i−1)∗m+j}^{μ_{i,j}},
Y_F = ∏_{i∈[1,n2]} ∏_{j∈[1,m]} Y_{(i−1)∗m+j}^{μ_{i,j}}.    (3.35)

π_D also consists of the items for the second-level index:

X_S = ∏_{x∈[1,m]} ∏_{y∈[1,n]} X_{n2∗m+(x−1)∗n+y}^{ω_{x,y}},
Y_S = ∏_{x∈[1,m]} ∏_{y∈[1,n]} Y_{n2∗m+(x−1)∗n+y}^{ω_{x,y}}.    (3.36)

A_S retrieves c_i for each v_i ∈ R_I and adds c_i to R_L. A_S computes a Merkle proof (π_L) that c_i is consistent with the Merkle root stored on the blockchain. A Merkle proof consists of all sibling nodes along the path from c_i to the root. A_S denotes R as:

R = (R_I, R_L).    (3.37)

A_S denotes π as:

π = (π_I, π_L, π_D).    (3.38)

• Finally, A_S returns the following message to A_I:

(R, π).    (3.39)

The query results consist of the index query result R_I and the corresponding log descriptions R_L. At the same time, there are three proofs, for query execution, log integrity, and digest consistency. All three proofs need to be verified by the investigating administrator.

3.4.4.4 Verification of Provenance Query

As shown in Fig. 3.9, the workflow of query verification consists of two steps:
• First, A_I retrieves the digest with the verification keys from the blockchain:



Fig. 3.9 Workflow of query verification: (1) the administrator retrieves the digests from the blockchain; (2) the administrator verifies the query and the proof

D = (D_I, D_L),    (3.40)

where D_I consists of two digests:

D_I = (D_F, D_S).    (3.41)

• Then, A_I checks the following three proofs:
– First, A_I receives the query result and proof from the source administrator via a secure and authenticated channel. A_I checks the correctness of the digest proof as follows:

e(Y_F, g_2) =? e(D_F, Ê) · e(X_F, F̂) · e(m_F, Ĝ),
e(Y_S, g_2) =? e(D_S, Ê) · e(X_S, F̂) · e(m_S, Ĝ).    (3.42)

This step is to ensure that consistent index values are used in the query execution, which corresponds to the input authenticity requirement of a verifiable computation scheme.
– Second, if the digest proof passes the verification, A_I further checks the correctness of the SNARK proof:

Verify(vk, Q, R_I, π_I) =? 1.    (3.43)

– Third, A_I also checks the correctness of the Merkle proofs related to c_i. Specifically, A_I recomputes the Merkle root using the nodes in the proof and checks whether the recomputed root equals D_L. If all the proofs are correct, A_I accepts the provenance query result. Finally, A_I can reconstruct a provenance graph from all provenance query results for network analysis.
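The Merkle check in this third step can be sketched as follows. In this Python illustration, SHA-256 and the 1-indexed array layout are carried over from the Algorithm 2 description, and the function names are our own assumptions:

```python
import hashlib

def H(data):
    return hashlib.sha256(data).digest()

def merkle_prove(M, i, m):
    """Sibling path for leaf i (0-based), M being the 1-indexed tree array."""
    idx, path = m + i, []
    while idx > 1:
        path.append((M[idx ^ 1], idx % 2))    # (sibling, am-I-the-right-child)
        idx //= 2
    return path

def merkle_verify(leaf, path, root):
    """Recompute the root from the leaf and its sibling path; compare to D_L."""
    node = leaf
    for sibling, is_right in path:
        node = H(sibling + node) if is_right else H(node + sibling)
    return node == root

# Build a 4-leaf tree as in Algorithm 2, prove leaf 0, verify against the root.
leaves = [H(f"{vid}|log{vid}".encode()) for vid in range(1, 5)]
m = len(leaves)
M = [b""] * (2 * m)
M[m:2 * m] = leaves
for j in range(m - 1, 0, -1):
    M[j] = H(M[2 * j] + M[2 * j + 1])
ok = merkle_verify(leaves[0], merkle_prove(M, 0, m), M[1])
```

The proof is logarithmic in the number of logs, so the verifier only needs the on-chain root D_L and a handful of sibling hashes per returned log entry.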



3.4.5 Security Analysis

We analyze the security properties of the proposed scheme. First, we discuss the required security assumptions and review blockchain security. Second, security arguments for SNARK-based verifiable computations are presented. Finally, archiving security is analyzed following the previous definition.

3.4.5.1 Security Assumptions

We introduce the security assumptions for analyzing the proposed scheme [32]. More specifically, a set of knowledge-based assumptions is required, including q-Power Diffie-Hellman (q-PDH), q-Strong Diffie-Hellman (q-SDH), and q-Power Knowledge of Exponent (q-PKE), where q is bounded by poly(λ). For distinct groups, the Symmetric External Diffie-Hellman (SXDH) assumption should also hold for computationally bounded adversaries. The proposed scheme achieves archiving security under the conditions that q-PDH, q-SDH, q-PKE, and SXDH hold in G, the public blockchain is secure with liveness and persistence, and the hash function is collision-resistant.

3.4.5.2 Blockchain Security

We briefly recall the security notions of blockchain mentioned in Chap. 2 [44]. Persistence requires that valid transactions be stable in the blockchain; that is, all blockchain nodes can reach a consistent view of the shared ledger in the long term. Liveness ensures that the transaction confirmation time is bounded; that is, a valid transaction will eventually be included in the chain within a specific time with high probability. Blockchain security in this chapter is used to achieve immutable on-chain digest storage. This chapter adopts a public blockchain, i.e., Ethereum, for distributed network provenance. Therefore, blockchain security requires that the majority of nodes in the blockchain network are honest. For example, in a PoW consensus protocol, a majority of the mining power must be honest. In the proposed scheme, network administrators are responsible for maintaining the blockchain, where the majority of them are considered to be honest. At the same time, the proposed scheme can also be applied to other public blockchains.

3.4.5.3 VC Security

There are three security notions: completeness, soundness, and collision resistance. The first two notions concern SNARK security, and the third concerns digest security.

3.4 Use Case: Distributed Network Provenance

79

• The definition of completeness is:

  Pr[ Verify(vk, Q, R_I, π_I) = 1 : CRS ← Setup(1^λ, F) ∧ R_I ← F(Q, I) ∧ π_I ← Eval(ek, Q, I) ] = 1.

  As in the definition of SNARK, an honest verifier always accepts a query instance with its proof when both are correctly computed. This property reduces to the correctness of the underlying SNARK scheme, which has been discussed in Chap. 2. More specifically, the theorem in [31] states that the divisibility check of the QAP is equivalent to the circuit evaluation of the computing program.

• The definition of soundness is:

  Pr[ Verify(vk, Q, R_I^*, π_I^*) = 1 : (CRS, pp) ← Setup(1^λ, F) ∧ R_I^* ≠ F(Q, I) ∧ (π_I^*, R_I^*) ← Adv(CRS, pp, F, Q, I) ] = negl(λ).

  This property is analogous to the classic (computational) soundness of SNARK. From [32], it reduces to the q-PKE, q-SDH, and q-PDH assumptions through a sequence of security games. At the same time, it is critical that TA destroys the trapdoor secret after generating the CRS, as required in secure schemes based on knowledge assumptions. An alternative is to run a secure multi-party computation protocol among the network administrators to collaboratively generate the CRS, which, however, consumes considerable computation and communication resources. Note that in this soundness game, the adversary is only given the common reference string (CRS); it cannot query an oracle for valid query/proof instances. This definition suffices for verifiable computation with SNARK, where only the correctness of the execution matters. In contrast, a signature-of-knowledge scheme built on SNARK requires knowledge soundness, where the adversary has access to valid instances.

• The definition of collision resistance is:

  Pr[ D_I = D_{I'} : I ≠ I' ∧ (DK, pp) ← Setup(1^λ, F) ∧ D_I ← Digest(DK, pp, I) ∧ D_{I'} ← Digest(DK, pp, I') ] = negl(λ).

  This notion essentially guarantees that the on-chain digest of provenance indexes cannot be opened to two different values. First, when generating the CRS, index-related generators should be linearly independent, which can be enforced via an augmented CRS generation phase; second, the SXDH assumption [33] should hold for a computationally bounded adversary; third, the blockchain storage should be immutable so that a digest cannot be maliciously modified after it is confirmed. Therefore, on-chain digests cannot be replaced after they are uploaded to the blockchain.

3.4.5.4 Security of Merkle Proof

The security of Merkle proof can be reduced to the collision resistance of the used hash function. That is, an adversary can forge a proof for items that are not digested in the Merkle tree if and only if the adversary can find collisions of the hash function, which is computationally infeasible.
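This reduction can be illustrated with a minimal sketch (illustrative Python, not the chapter's implementation): a verifier accepts a leaf only if the sibling hashes recompute the stored root, so forging a proof for a log entry that was never digested requires finding a collision of the hash function.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Collision-resistant hash; Merkle proof security reduces to this.
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all levels of a Merkle tree over the leaves (power-of-two count assumed)."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, idx):
    """Collect sibling hashes from leaf to root as the Merkle proof."""
    proof = []
    for level in levels[:-1]:
        sib = idx ^ 1
        proof.append((level[sib], sib < idx))  # (sibling hash, sibling-is-left flag)
        idx //= 2
    return proof

def verify(root, leaf, proof):
    node = h(leaf)
    for sib, sib_is_left in proof:
        node = h(sib + node) if sib_is_left else h(node + sib)
    return node == root

logs = [b"log-0", b"log-1", b"log-2", b"log-3"]
levels = build_tree(logs)
root = levels[-1][0]           # the on-chain log digest
p = prove(levels, 2)
assert verify(root, b"log-2", p)
assert not verify(root, b"log-tampered", p)
```

Any leaf substitution changes the recomputed root, so acceptance of a non-member leaf would exhibit a SHA-256 collision.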

3.4.5.5 Archiving Security

The archiving security in this chapter is twofold and is analyzed based on the security properties discussed above.

• According to the threat model, network administrators honestly compute the on-chain index digest and log digest. The completeness of SNARK and the correctness of the Merkle proof ensure that an honest verifier always accepts R = (R_I, R_L). This guarantees the correctness property of archiving security.
• For the first requirement of integrity in archiving security, once the provenance digests are stored on the blockchain, they cannot be modified due to the storage immutability of the blockchain. For the second and third requirements, the soundness of the SNARK and the collision resistance of on-chain digests ensure that correct query executions are carried out over the authenticated provenance index; this guarantees that all valid results for a query are returned. For the fourth requirement, the security of the Merkle proof ensures that the returned log information is correctly digested on the blockchain.

In summary, the proposed data provenance scheme achieves archiving security given that the security primitives discussed above are secure against a computationally bounded adversary.

Discussions First, compared with a traditional SNARK, the proposed scheme directly uses commitments for part of the I/O values in verification. In this case, the generators for these I/O values must be linearly independent; otherwise, an adversary can easily forge a digest that passes the verification algorithm. Second, the I/O commitments are linked with the on-chain digests to ensure that they contain the same set of values. The linking method is secure under the condition that on-chain digests come from trusted sources [33], which complies with our security assumption that network administrators are honest when generating the on-chain index and log digests.


Table 3.1 Storage overhead of digest keys

  Keys    Size
  DK_E    4mn'|G_1|
  DK_V    3|G_2|

Table 3.2 Computation overhead of index digest

  Key generation     Prover      Verifier
  3mn'E_1 + 3E_2     2mn'E_1     4P

3.4.6 Performance Evaluation

First, we report the theoretical analysis of the digest scheme and on-chain storage. Then, we evaluate the off-chain and on-chain performance of the proposed scheme.

3.4.6.1 Digest Performance Analysis

The proposed scheme requires an index digest scheme [33] to generate the on-chain digest D_I of the provenance index. Moreover, during the proof generation/verification phase, a digest proof π_D is also computed and verified. In the following, we denote n' = n + n_2, let E_1 be an exponentiation in G_1, E_2 an exponentiation in G_2, and P a pairing operation. We use |G_1| and |G_2| to denote the size of an element in G_1 and G_2, respectively. As shown in Table 3.1, there are two sets of digest keys, DK_E and DK_V, with 4mn'|G_1| and 3|G_2| storage overhead, respectively. The size of DK_E increases with m and n', while the size of DK_V remains constant. Similarly, the size of the digest proof is constant at 4|G_1|. As shown in Table 3.2, the computation overheads of the digest scheme are summarized as follows:

• Key generation requires 3mn'E_1 + 3E_2 operations, which also increase linearly with m and n' since each provenance item needs to be associated with multiple generators. However, key generation only needs to be conducted once.
• The prover needs 2mn'E_1 operations for each proof, which can be implemented efficiently on the alt-bn128 curve with specific pre-computation optimizations.
• The verifier needs 4 pairing operations. Compared with the verification in SNARK alone, this introduces additional costs since m_F and m_S need verification. At the same time, batch verification techniques [45] can be applied to verify multiple proofs at once.
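The counts in Tables 3.1 and 3.2 can be tabulated directly. The following sketch assumes illustrative element sizes (256-bit G_1 and 512-bit G_2 elements are placeholders, not exact alt-bn128 encodings) and uses n' = n + n_2 as defined above:

```python
def digest_overheads(m, n1, n2, g1_bits=256, g2_bits=512):
    """Theoretical digest-scheme costs following Tables 3.1 and 3.2:
    DK_E holds 4*m*n' G1 elements, DK_V holds 3 G2 elements,
    keygen costs 3*m*n' E1 + 3 E2, proving 2*m*n' E1, verifying 4 pairings."""
    n = n1 + n2
    n_prime = n + n2                           # n' = n + n2 as defined in the analysis
    return {
        "DK_E_bits": 4 * m * n_prime * g1_bits,
        "DK_V_bits": 3 * g2_bits,              # constant in m and n'
        "proof_bits": 4 * g1_bits,             # digest proof is constant-size
        "keygen_E1": 3 * m * n_prime,
        "keygen_E2": 3,
        "prover_E1": 2 * m * n_prime,
        "verifier_pairings": 4,
    }

c = digest_overheads(m=1000, n1=2, n2=98)
# DK_E and prover cost grow with m and n'; DK_V, proof size, and verification stay constant.
```

This mirrors the observation that only the evaluation-side costs scale with the index, while the verifier-side costs are succinct.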

82

3 Reliable Data Provenance in HCN

3.4.6.2 Off-chain Performance

For off-chain experiments, we set up the libsnark library1 to implement a preprocessing zk-SNARK. Our test bed is a laptop with a 2.3 GHz processor and 8 GB memory running Linux. We write the provenance query function in C and use an interface provided by Pinocchio [32] to translate the code into an arithmetic circuit. Finally, interfaces in xjsnark [46] are adopted to run the SNARK on the alt-bn128 curve.2 In the following, we report the performance of the index query function in terms of computation and storage overheads for setup, proof generation, and proof verification. More specifically, we evaluate the performance of querying the second-level index. I_S is an m × n matrix that consists of m provenance vertexes, where each vertex has an n-dimensional index I_vi. I_vi can be further divided into two parts: n_1 range values and n_2 keyword values. We adopt a conjunctive query strategy that compares each item in Q with the corresponding item in I_vi.

• Figure 3.10 shows the computation overheads against m when n = 100, i.e., we fix the dimension of the index/query at 100. The most expensive part is the CRS setup phase by TA. During this phase, TA needs to compute several evaluation and verification keys, which takes roughly a few minutes and increases with m. However, the setup only needs to be conducted once, and thus the cost is acceptable. At the same time, proof generation is more efficient than the CRS setup, taking less than a minute. The verification time remains constant at 27 ms, regardless of the size of the query and index. As shown in Fig. 3.11, another important observation is that increasing the number of range queries increases the computation time.
• Figure 3.12 shows the corresponding CRS size as m increases with n = 100. The size of ek is much larger than that of vk because ek needs to embed the information of the query function. Similarly, increasing n_1 from 2 to 3 increases the size of ek since the computation complexity increases accordingly. However, the size of vk remains the same for n_1 = 2, 3 since the query outputs one relevance score per vertex; as a result, the size of vk is only affected by m.
• Figure 3.13 shows the corresponding QAP size as m increases with n = 100. More specifically, the QAP size is the number of QAP variables and reflects the complexity of the computing program. The QAP size increases with m and n.
• We further test the computation time when n increases and m is fixed at 1000. As shown in Fig. 3.14, the CRS setup and proving time increase with n, while the verification time remains constant. The CRS setup still costs much more time than proof generation and verification.
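As a plaintext reference for the logic that is compiled into the arithmetic circuit, the conjunctive second-level query can be sketched as follows (the data layout is hypothetical: each vertex index holds n_1 (lo, hi) ranges followed by n_2 keywords, and the output is one relevance score per vertex):

```python
def conjunctive_query(index, query, n1):
    """index: m rows, each of n = n1 + n2 entries (n1 (lo, hi) ranges, then n2 keywords).
    query: n1 numeric values followed by n2 keywords.
    Returns one relevance score per vertex: the number of matched query items."""
    scores = []
    for vertex in index:
        score = 0
        for j in range(n1):                      # range checks
            lo, hi = vertex[j]
            if lo <= query[j] <= hi:
                score += 1
        for j in range(n1, len(vertex)):         # keyword equality checks
            if vertex[j] == query[j]:
                score += 1
        scores.append(score)
    return scores

index = [
    [(0, 10), (5, 9), "dns", "edge"],   # vertex 0
    [(3, 4),  (0, 2), "dns", "core"],   # vertex 1
]
print(conjunctive_query(index, [7, 6, "dns", "edge"], n1=2))  # [4, 1]
```

In the actual scheme, the same comparisons are expressed as circuit constraints so that the prover's execution over I_S can be verified.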

1 libsnark: a C++ library for zkSNARK proofs. https://github.com/scipr-lab/libsnark.
2 libff: a C++ library for Finite Fields and Elliptic Curves. https://github.com/scipr-lab/libff.


Fig. 3.10 Computation overheads of second-level query I, n_1 = 2

Fig. 3.11 Computation overheads of second-level query I, n_1 = 3

From the above experimental results, we can see that the proposed scheme achieves efficient verifications of query results by offloading the computational overheads to the proving operations.

Fig. 3.12 CRS size

Fig. 3.13 QAP size


Fig. 3.14 Computation overheads of second-level query II, n_1 = 3

3.4.6.3 On-chain Performance Analysis

To test the feasibility of the proposed scheme, a Parity-based Ethereum testing network [47] is implemented with the Proof-of-Authority (PoA) consensus protocol on the same laptop, with two authority nodes and several peer nodes. A provenance contract written in Solidity [48] simulates the simple storage of provenance digests. At the same time, our design does not rely on a specific blockchain architecture; network administrators can also implement the proposed scheme on the public Ethereum blockchain. By deploying a smart contract on the public blockchain, the administrators can store and read the provenance digests, TA can upload system parameters, and the on-chain storage cannot be modified afterward for archiving purposes. For on-chain storage, the proposed scheme only requires storing provenance and log digests, which is efficient for implementation since on-chain resources are more expensive than off-chain resources. To reduce the off-chain proving overheads, the on-chain digest can be split into digests of multiple sub-indexes. The tradeoff between on-chain and off-chain overheads can be tuned according to the requirements of different provenance cases. In the following, we further investigate the on/off-chain tradeoff with experimental results.
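Since the sketches in this chapter are given in Python rather than Solidity, the contract's storage logic can be mimicked by a minimal append-only store (a simulation only; the class and method names are hypothetical, and a real deployment would use contract state with block confirmations):

```python
class ProvenanceDigestStore:
    """Simulates the on-chain provenance contract: digests are append-only
    and cannot be overwritten once recorded, mimicking storage immutability."""

    def __init__(self):
        self._digests = {}          # epoch -> (index_digest, log_digest)

    def store(self, epoch, index_digest, log_digest):
        if epoch in self._digests:
            raise PermissionError("digest already confirmed; on-chain storage is immutable")
        self._digests[epoch] = (index_digest, log_digest)

    def read(self, epoch):
        return self._digests[epoch]

store = ProvenanceDigestStore()
store.store(1, b"\x01" * 32, b"\x02" * 32)       # administrator uploads digests
assert store.read(1) == (b"\x01" * 32, b"\x02" * 32)
try:
    store.store(1, b"\xff" * 32, b"\xff" * 32)   # overwrite attempt is rejected
except PermissionError:
    pass
```

Only the two fixed-size digests per epoch are "on-chain", matching the design goal of keeping on-chain storage succinct.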

Fig. 3.15 Computation overheads with p_r

3.4.6.4 Multi-Level Query Strategy

In this section, we evaluate the performance of the overall multi-level index and analyze the on/off-chain performance tradeoff. In the multi-level index, I_F is an n_2 × m matrix while I_S is an m × n matrix. By doing so, a provenance query can first locate the set of sub-indexes that contain a specific keyword to reduce the search space; a full query is then conducted over these sub-indexes to return their relevance scores. To test the impact of the multi-level strategy, we denote p_r = |I_k^*|/m as a reducing factor. In the following, we test the computation and storage overheads of the provenance query as p_r changes, fixing n = 100, m = 1000, and n_1 = 2.

• Figure 3.15 shows the computation overheads. A smaller p_r helps reduce the query space; as a result, the setup and proving time increase with a larger p_r, while the verification cost remains constant.
• Figure 3.16 shows the storage overheads as p_r changes. Similarly, a smaller p_r indicates a simpler query function with fewer QAP variables and a smaller common reference string.

In summary, the proposed scheme is efficient in proof verification and on-chain storage. At the same time, it provides a tradeoff parameter p_r for developers to adjust on/off-chain performance in different provenance scenarios.
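The first-level keyword lookup and the reducing factor p_r = |I_k^*|/m can be sketched as follows (a hypothetical layout: I_F is viewed as postings mapping each keyword to the vertices containing it):

```python
def first_level_lookup(I_F, keywords):
    """I_F: keyword -> set of vertex ids (the n2 x m first-level index as postings).
    Returns the candidate vertex set I_k* for a conjunctive keyword query."""
    candidates = None
    for kw in keywords:
        postings = I_F.get(kw, set())
        candidates = postings if candidates is None else candidates & postings
    return candidates or set()

# Toy first-level index over m = 5 provenance vertices.
I_F = {"dns": {0, 1, 3}, "edge": {0, 3, 4}, "core": {1, 2}}
m = 5
I_k = first_level_lookup(I_F, ["dns", "edge"])
p_r = len(I_k) / m        # reducing factor: fraction of vertices kept for the full query
print(sorted(I_k), p_r)   # [0, 3] 0.4
```

The second-level (SNARK-verified) query then only runs over the p_r fraction of vertices, which is the on/off-chain tradeoff knob evaluated in Figs. 3.15 and 3.16.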


Fig. 3.16 Storage overheads with p_r

3.5 Summary and Discussions

In this chapter, we have investigated blockchain-based data provenance, including its motivations, application requirements, and existing solutions. Design challenges in balancing the decentralization and efficiency requirements of data provenance have been discussed. To address these challenges, we have proposed a representative construction from the SNARK-based VC framework and on/off-chain computation models. Moreover, a multi-level index has been proposed to improve query efficiency within the SNARK-based VC framework. A thorough security analysis has shown that the proposed scheme realizes archiving security. Extensive experiments have demonstrated the on-chain efficiency (succinct overhead regardless of the complexity of the provenance index/query) and provide necessary benchmarks and insights for real-world implementations of distributed network provenance. The discussions, designs, implementations, and analysis in this chapter demonstrate the application advantages and feasibility of blockchain-based data provenance approaches in HCN. At the same time, more research effort can be directed to other provenance query modules for different provenance applications, including efficient sub-graph queries and graph traversal.


References

1. X. Shen, J. Gao, W. Wu, M. Li, C. Zhou, and W. Zhuang, "Holistic network virtualization and pervasive network intelligence for 6G," IEEE Communications Surveys & Tutorials, vol. 24, no. 1, pp. 1–30, 2021.
2. Z. Li, Y. Zhao, N. Cheng, B. Hao, J. Shi, R. Zhang, and X. Shen, "Multiobjective optimization based sensor selection for TDOA tracking in wireless sensor network," IEEE Transactions on Vehicular Technology, vol. 68, no. 12, pp. 12360–12374, 2019.
3. M. S. Mahmud, H. Fang, and H. Wang, "An integrated wearable sensor for unobtrusive continuous measurement of autonomic nervous system," IEEE Internet of Things Journal, vol. 6, no. 1, pp. 1104–1113, 2018.
4. T. Pasquier, X. Han, T. Moyer, A. Bates, O. Hermant, D. Eyers, J. Bacon, and M. Seltzer, "Runtime analysis of whole-system provenance," in Proc. of ACM CCS, 2018, pp. 1601–1616.
5. A. Chen, Y. Wu, A. Haeberlen, B. T. Loo, and W. Zhou, "Data provenance at internet scale: Architecture, experiences, and the road ahead," in Proc. of CIDR, 2017.
6. Z. Liu and Y. Wu, "An index-based provenance compression scheme for identifying malicious nodes in multihop IoT network," IEEE Internet of Things Journal, vol. 7, no. 5, pp. 4061–4071, 2019.
7. W. U. Hassan, S. Guo, D. Li, Z. Chen, K. Jee, Z. Li, and A. Bates, "NoDoze: Combatting threat alert fatigue with automated provenance triage," in Proc. of NDSS, 2019.
8. R. Hu, Z. Yan, W. Ding, and L. T. Yang, "A survey on data provenance in IoT," World Wide Web, pp. 1–23, 2019.
9. M. N. Aman, M. H. Basheer, and B. Sikdar, "Data provenance for IoT with light weight authentication and privacy preservation," IEEE Internet of Things Journal, vol. 6, no. 6, pp. 10441–10457, 2019.
10. D. Liu, J. Ni, C. Huang, X. Lin, and X. Shen, "Secure and efficient distributed network provenance for IoT: A blockchain-based approach," IEEE Internet of Things Journal, vol. 7, no. 8, pp. 7564–7574, 2020.
11. Y. Zhang, A. O'Neill, M. Sherr, and W. Zhou, "Privacy-preserving network provenance," Proc. of the VLDB Endowment, vol. 10, no. 11, pp. 1550–1561, 2017.
12. Y. Wu, A. Chen, and L. T. X. Phan, "Zeno: Diagnosing performance problems with temporal provenance," in Proc. of NSDI, 2019, pp. 395–420.
13. W. Zhou, Q. Fei, A. Narayan, A. Haeberlen, B. T. Loo, and M. Sherr, "Secure network provenance," in Proc. of ACM Symposium on Operating Systems Principles, 2011, pp. 295–310.
14. A. Chen, Y. Wu, A. Haeberlen, W. Zhou, and B. T. Loo, "The good, the bad, and the differences: Better network diagnostics with differential provenance," in Proc. of ACM SIGCOMM, 2016, pp. 115–128.
15. X. Liang, S. Shetty, D. Tosh, C. Kamhoua, K. Kwiat, and L. Njilla, "ProvChain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability," in Proc. of IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017, pp. 468–477.
16. H. Li, K. Gai, Z. Fang, L. Zhu, L. Xu, and P. Jiang, "Blockchain-enabled data provenance in cloud datacenter reengineering," in Proc. of ACM International Symposium on Blockchain and Secure Critical Infrastructure, 2019, pp. 47–55.
17. R. Neisse, G. Steri, and I. Nai-Fovino, "A blockchain-based approach for data accountability and provenance tracking," in Proc. of International Conference on Availability, Reliability and Security. ACM, 2017, p. 14.
18. S. Mann, V. Potdar, R. S. Gajavilli, and A. Chandan, "Blockchain technology for supply chain traceability, transparency and data provenance," in Proc. of International Conference on Blockchain Technology and Application, 2018, pp. 22–26.
19. K. Gai, Z. Fang, R. Wang, L. Zhu, P. Jiang, and K.-K. R. Choo, "Edge computing and lightning network empowered secure food supply management," IEEE Internet of Things Journal, vol. 9, no. 16, pp. 14247–14259, 2020.


20. P. Ruan, G. Chen, T. T. A. Dinh, Q. Lin, B. C. Ooi, and M. Zhang, "Fine-grained, secure and efficient data provenance on blockchain systems," Proc. of the VLDB Endowment, vol. 12, no. 9, pp. 975–988, 2019.
21. T. Jiang, H. Fang, and H. Wang, "Blockchain-based internet of vehicles: Distributed network architecture and performance analysis," IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4640–4649, 2018.
22. M. Li, J. Weng, A. Yang, J.-n. Liu, and X. Lin, "Towards blockchain-based fair and anonymous ad dissemination in vehicular networks," vol. 68, no. 11, pp. 11248–11259, 2019.
23. C. Zhang, Y. Xu, Y. Hu, J. Wu, J. Ren, and Y. Zhang, "A blockchain-based multi-cloud storage data auditing scheme to locate faults," IEEE Transactions on Cloud Computing, vol. 10, no. 4, pp. 2252–2263, 2021.
24. Y. Du, H. Duan, A. Zhou, C. Wang, M. H. Au, and Q. Wang, "Enabling secure and efficient decentralized storage auditing with blockchain," IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 5, pp. 3038–3054, 2021.
25. S. Hu, C. Cai, Q. Wang, C. Wang, Z. Wang, and D. Ye, "Augmenting encrypted search: A decentralized service realization with enforced execution," IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 6, pp. 2569–2581, 2021.
26. C. Cai, J. Weng, X. Yuan, and C. Wang, "Enabling reliable keyword search in encrypted decentralized storage with fairness," IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 1, pp. 131–144, 2021.
27. C. Xu, C. Zhang, and J. Xu, "vChain: Enabling verifiable Boolean range queries over blockchain databases," in Proc. of SIGMOD, 2019, pp. 141–158.
28. C. Zhang, C. Xu, J. Xu, Y. Tang, and B. Choi, "GEM^2-tree: A gas-efficient structure for authenticated range queries in blockchain," in Proc. of IEEE ICDE, 2019, pp. 842–853.
29. J. Eberhardt and S. Tai, "ZoKrates: Scalable privacy-preserving off-chain computations," in IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2018, pp. 1084–1091.
30. S. Steffen, B. Bichsel, M. Gersbach, N. Melchior, P. Tsankov, and M. Vechev, "zkay: Specifying and enforcing data privacy in smart contracts," in Proc. of ACM CCS, 2019, pp. 1759–1776.
31. R. Gennaro, C. Gentry, B. Parno, and M. Raykova, "Quadratic span programs and succinct NIZKs without PCPs," in Proc. of EUROCRYPT. Springer, 2013, pp. 626–645.
32. B. Parno, J. Howell, C. Gentry, and M. Raykova, "Pinocchio: Nearly practical verifiable computation," in Proc. of IEEE S&P, 2013, pp. 238–252.
33. D. Fiore, C. Fournet, E. Ghosh, M. Kohlweiss, O. Ohrimenko, and B. Parno, "Hash first, argue later: Adaptive verifiable computations on outsourced data," in Proc. of ACM CCS, 2016, pp. 1304–1316.
34. S. Agrawal, C. Ganesh, and P. Mohassel, "Non-interactive zero-knowledge proofs for composite statements," in Proc. of CRYPTO, 2018, pp. 643–673.
35. P. Yang, F. Lyu, W. Wu, N. Zhang, L. Yu, and X. Shen, "Edge coordinated query configuration for low-latency and accurate video analytics," IEEE Transactions on Industrial Informatics, vol. 16, no. 7, pp. 4855–4864, 2020.
36. M. Li, D. Hu, C. Lal, M. Conti, and Z. Zhang, "Blockchain-enabled secure energy trading with verifiable fairness in industrial internet of things," IEEE Transactions on Industrial Informatics, vol. 16, no. 10, pp. 6564–6574, 2020.
37. Z. Bao, D. He, W. Wei, C. Peng, and X. Huang, "LedgerMaze: An efficient privacy-preserving non-interactive zero-knowledge scheme over account-model blockchain," IEEE Transactions on Computers, pp. 1–15, 2023.
38. J. Ni, K. Zhang, Q. Xia, X. Lin, and X. Shen, "Enabling strong privacy preservation and accurate task allocation for mobile crowdsensing," IEEE Transactions on Mobile Computing, vol. 19, no. 6, pp. 1317–1331, 2020.


39. M. Li, L. Zhu, and X. Lin, "Privacy-preserving traffic monitoring with false report filtering via fog-assisted vehicular crowdsensing," IEEE Transactions on Services Computing, vol. 14, no. 6, pp. 1902–1913, 2021.
40. S. Bowe, A. Gabizon, and M. D. Green, "A multi-party protocol for constructing the public parameters of the Pinocchio zk-SNARK," in Proc. of FC. Springer, 2018, pp. 64–77.
41. J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou, "Fuzzy keyword search over encrypted data in cloud computing," in Proc. of IEEE INFOCOM, 2010, pp. 1–5.
42. Y. Zhang, C. Papamanthou, and J. Katz, "Alitheia: Towards practical verifiable graph processing," in Proc. of ACM CCS, 2014, pp. 856–867.
43. R. C. Merkle, "A digital signature based on a conventional encryption function," in Proc. of CRYPTO. Springer, 1987, pp. 369–378.
44. B. David, P. Gaži, A. Kiayias, and A. Russell, "Ouroboros Praos: An adaptively-secure, semi-synchronous proof-of-stake blockchain," in Proc. of EUROCRYPT. Springer, 2018, pp. 66–98.
45. C. Zhang, R. Lu, X. Lin, P.-H. Ho, and X. Shen, "An efficient identity-based batch verification scheme for vehicular sensor networks," in Proc. of IEEE INFOCOM, 2008, pp. 246–250.
46. A. Kosba, C. Papamanthou, and E. Shi, "xjsnark: A framework for efficient verifiable computation," in Proc. of IEEE S&P, 2018, pp. 944–961.
47. Parity Ethereum. https://github.com/paritytech/parity-ethereum. Accessed October 2019.
48. Solidity. https://solidity.readthedocs.io/en/v0.4.25/. Accessed January 2020.

Chapter 4

Transparent Data Query in HCN

4.1 Motivations and Applications

With the increase in data volume and data usage in future wireless networks, data-based approaches are becoming the driving force in advanced network management. Therefore, data services, including data collection, data storage, and data updates, can significantly affect the efficiency and effectiveness of network resource management. For example, AI-based network management places high requirements on data management efficiency across complex trust domains. Among the various data services, data query is a fundamental component that helps a user, network entity, or authority quickly locate and retrieve required data from a large database. The most familiar use case is the search engine on the Internet: users type a few keywords in the search box to find related web pages. For database queries, a general query language, such as SQL, can be used to support versatile and expressive query functionalities. More specifically, there are multiple applications of data query:

• Cross-domain data provenance: As discussed in the previous chapter, cross-domain provenance query enables network administrators to securely query provenance logs across trust domains. With the query results, a graph of event relations can be constructed for global diagnosis.
• Collaborative inventory management: Industrial partners often need to maintain a shared view of business logic and operations. For example, different suppliers can maintain a shared inventory for supply chain management. In this regard, data query helps industrial partners find lifecycle information of goods in the supply chain.
• Transparent log: For critical data or services, such as certificate management, a transparency log can be constructed [1] to support publicly verifiable data query and updates. More specifically, a user can query the log of a certificate authority to retrieve the history of a web certificate.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 D. Liu, X. (Sherman) Shen, Blockchain-Based Data Security in Heterogeneous Communications Networks, Wireless Networks, https://doi.org/10.1007/978-3-031-52477-6_4



• Smart advertising: Smart advertising enables an ad broker to send personalized advertisements to targeted users based on their preferences or browsing history. Various query rules, including keyword query and relevance-based query, can be applied in smart advertising to find suitable advertisements for users from a pool of ads.

The remainder of this chapter is organized as follows. In Sect. 4.2, we review the application requirements of blockchain-based data query, including query privacy, query trustworthiness, and query efficiency. Then, we investigate state-of-the-art data query approaches, in terms of cloud-based and blockchain-based data query, and discuss the dilemma of achieving both query decentralization and efficiency in Sect. 4.3. In Sect. 4.4, we present a concrete use case of data query in future networks [2]: blockchain-based collaborative VNF management. Specifically, models, design goals, and building blocks are presented. Detailed constructions for VNF listing and query are proposed, with extensive security analysis and performance evaluations on a real-world consortium blockchain. Finally, we conclude this chapter in Sect. 4.5.

4.2 Application Requirements

To realize data query applications in future networks, multiple application requirements should be met, including query privacy, query trustworthiness, and query efficiency.

4.2.1 Privacy

As data can contain sensitive information, the data are often encrypted before being stored on cloud/blockchain storage. At the same time, data queries can also contain sensitive information about the query user. For example, a location-based service can require users to include their current geographical location and preferences in a query, which can lead to privacy leakage [3]. More specifically, data query privacy has two requirements:

• Query privacy: Private or personal information in a data query should be concealed from the query service provider.
• Result privacy: Query results that include private or sensitive information should be concealed from the service provider. In cloud-based file retrieval, the access pattern of cloud files can also be considered private information.


4.2.2 Trustworthiness

In future networks, data are often generated and stored at distributed network entities. For example, network administrators can maintain their own network logs in local storage, while IoT devices with limited storage capability can outsource their data to a powerful cloud server for data services. As a result, data query is often conducted by an external entity outside the trust domains of query users. It is essential to ensure the trustworthiness of query results for data query services:

• First, the query process should be executed according to pre-defined query rules.
• Second, the integrity and completeness of the query results should be guaranteed. That is, all query results that match the query rules should be returned without modification.

4.2.3 Efficiency

Due to the large volume of data in future networks, data query efficiency should be achieved:

• First, query processing should be efficient. For different query types, various optimization techniques can be deployed; for example, an R-tree can reduce the search space for spatial queries. When the privacy requirements of data query are considered, the query algorithm should also be efficient over encrypted data.
• Second, when query trustworthiness is considered, a verifiable computation technique is often integrated, i.e., additional query proofs are generated and verified. In this case, proof generation should be efficient, and proof verification should be efficient in terms of both computation and communication overheads.

4.3 State-of-the-Art Data Query Approaches

In the following, we review recent data query approaches under the cloud-based and blockchain-based models. Then, we discuss the design challenges of achieving efficient blockchain-based data query.

4.3.1 Cloud-Based Data Query

To enjoy pay-as-you-go data services, many data owners outsource their data to a third-party cloud server. In this regard, the cloud server manages data query


and transmission for the users. However, as the users are losing physical control of their data, query privacy and query trustworthiness challenges need to be addressed. To address the query privacy challenge for cloud-based data query, a wide range of approaches have been proposed for keyword query, range query, and advanced query: • For keyword query, a secure search scheme over encrypted cloud data was proposed in [4]. An efficient search index that supported ranked keyword search was built using the order-preserving encryption technique. Secure inner product computation was adopted to design secure multi-keyword search over the encrypted data [5], which had a notable impact on subsequent works. • For range query, a secure query scheme over the encrypted cloud data was designed in [6]. Using the prefix encoding technique, range checks were converted into set membership checks in a tree structure. Bloom filter was adopted to achieve the query and index privacy. Geometric range query over the encrypted cloud data was developed in [7] with the polynomial fitting and R-tree technique to improve search security and efficiency. • For advanced query, there can be secure designs for skyline query, k-nearest neighbor (kNN) query, or graph query. In [8], a secure set reverse kNN query scheme was proposed for encrypted data over the cloud. Two-server model was adopted, and a set of privacy-preserving computation algorithms was designed. In a malicious security model where the cloud server may not follow the query protocol, query trustworthiness (verifiable query) is also explored from various perspectives: • For keyword query, verifiable multi-keyword search was proposed in [9]. A secure index was constructed for query verifications in the vector-space query model. A multi-user verifiable query scheme over the cloud data was later designed in [10]. 
An RSA accumulator was utilized to achieve search verifiability, and a novel keyword index was introduced to improve search efficiency.
• For range query, a verifiable query scheme over an outsourced database was proposed in [11]. A tree-based query index was designed, and a hash signature for the Bloom filter was proposed to verify the completeness and correctness of query results.
• For advanced query, verifiable SQL query over cloud storage was proposed in [12]. With designs of polynomial commitments and multi-linear extensions, the proposed scheme achieved efficient and general SQL operations with expressive functionality and data updates.

4.3.2 Blockchain-Based Data Query

Instead of relying on a single entity to store data and provide query services, distributed data owners can maintain shared data storage on the blockchain. In
this case, the blockchain can serve as a transparent and trusted entity that provides data query services [13, 14]. A straightforward approach is to store data on the blockchain and use smart contracts for data queries, but this is not practical for real-world implementations. To address the on-chain privacy and efficiency challenges, there is an extensive body of research:

• A verifiable query scheme for blockchain data was proposed in [15]. A query layer can be deployed in the cloud to extract data from the blockchain. With an authenticated data structure built from a Merkle Patricia tree, fingerprints of the database can be stored for efficient query verification.
• Cryptographic accumulators [16] can digest a set of items and provide membership proofs. In [1], verifiable query of certificate logs was proposed, where an append-only dictionary based on accumulators is stored on the blockchain.
• SNARG can support verifiable computation for relations represented as arithmetic circuits [17, 18]. For data query services, query algorithms can be converted into circuits that take data/query as inputs and output query results. Set membership queries can also be SNARKed for efficient on-chain verification [19].
• Vector commitment [20] is a powerful cryptographic primitive that digests a vector of items into a succinct commitment. A vector commitment can be securely opened at given positions, which can be utilized for index-based data queries. Combined with a zero-knowledge proof technique such as zk-SNARK, vector commitment can digest a large database on the blockchain to provide verifiable data query services [21]. Vector commitment can also be integrated with searchable encryption to provide privacy-preserving query with accountability enforcement [22].
• SGX can load the database into its secure memory to conduct verifiable and privacy-preserving data query [23].
Moreover, smart contracts can be integrated with SGX to offload contract verification operations into its secure memory [24].

4.3.3 Decentralization and Efficiency Dilemma

Data query approaches in HCN can suffer from a decentralization and efficiency dilemma:

• The heterogeneity of network entities requires a decentralized data management platform to store data and provide query services. To this end, a blockchain-based data query approach can be deployed for transparent data query services, which enhances trust among network entities [25].
• The blockchain needs to distribute its storage to peer nodes and run consensus protocols to maintain storage consistency. As a result, on-chain data storage and computation costs can be prohibitive for implementing complex query algorithms over a large data set.


With succinct non-interactive argument (SNARG)-based verifiable computation techniques [17, 18, 26, 27], data can be digested into succinct authenticators for efficient verification of off-chain query results. However, as computations are represented as arithmetic circuits in SNARG, random access memory (RAM) is not well supported. As a result, many programming constructs cannot be efficiently expressed:

• Loop break: a developer needs to fix the number of loop iterations before compiling a program to circuits [17].
• Dynamic array access: the position of an array to visit cannot be chosen at runtime.

For data query applications, inefficient RAM support and relation-dependent CRS setup result in linear scans over the whole dictionary, which incur expensive proving overheads. To summarize, blockchain-based data query achieves transparent query operations among distributed network entities. With the SNARG-based computation model, on-chain storage and proof verification overheads can be reduced at the cost of increased off-chain proving overheads. Therefore, it remains a non-trivial task to achieve transparent data query with both on-chain and off-chain efficiency. In the following, we present a collaborative VNF query scheme as a concrete use case and propose a dictionary pruning strategy to address the efficiency challenge of transparent data query.
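The dynamic array access limitation can be made concrete with a small sketch (illustrative Python, not an actual circuit compiler): since a circuit has no RAM, a read at a runtime index is typically compiled into a multiplexer that touches every slot, so one lookup costs one multiplication gate per dictionary entry.

```python
def circuit_read(arr, i):
    """Simulate how an arithmetic circuit reads arr[i] for a runtime
    index i: a selector bit (i == j) multiplies every slot, so the
    circuit needs one multiplication gate per slot no matter which
    single index is actually requested."""
    result = 0
    for j, slot in enumerate(arr):
        selector = 1 if i == j else 0   # in a real circuit: a constrained bit
        result += selector * slot       # one multiplication gate per slot
    return result

dictionary = [17, 42, 8, 99]
assert circuit_read(dictionary, 2) == dictionary[2]  # O(n) work for one read
```

This linear-scan behavior is exactly why proving over the whole dictionary is expensive and why the pruning strategy below aims to shrink the scanned range first.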

4.4 Use Case: Blockchain-Based VNF Query

Network function virtualization (NFV) is envisioned as a key enabling technology for future wireless networks [28]. More specifically, network resource providers, including operators for access resources, data centers for storage resources, and cloud centers for computing resources, can abstract their resources as virtualized network functions (VNFs). For each service with diversified resource requirements, a network slice can be configured that consists of a chain of VNFs. NFV-based resource management offers multiple benefits:

• Diversified and dynamic service demand in future networks can be met with NFV-enabled management and flexible slice configurations [29]. That is, a network slice can be constructed and destroyed per network service, drawing on a large VNF pool.
• Efficient network resource sharing among resource providers can be achieved to improve overall resource utilization [30].
• A new service paradigm for enterprise users can be provisioned. That is, operators can directly offer enterprise users multi-level network infrastructures as network slices, which lowers the implementation and maintenance costs of building a network from scratch.


To this end, NFV-based resource management solutions have been widely explored across future network architectures, such as satellite networks [31] and public cloud networks [32]. At the same time, AI-based slice management approaches can be deployed to improve operational efficiency and effectiveness [33, 34]. As discussed in Chap. 1, future networks will have a complicated and heterogeneous architecture [35], i.e., HCN. In NFV-enabled HCN, network resource providers can come from different trust domains. As a result, it becomes challenging for them to agree on a single trusted entity to manage their VNF information. To address this issue, a distributed VNF management architecture is required to enable reliable, shared, and transparent resource allocation [36–42] in HCN. More specifically, the emerging blockchain technology [43] can be a promising approach:

• First, the blockchain can serve as trusted storage for network resource providers to collaboratively store VNF information. With the smart contract technique [44], automatic and secure slice configuration based on service-level agreements can be conducted.
• Second, the blockchain can serve multiple roles in distributed VNF management. For example, it can be a trusted VNF repository that stores VNF information from different resource providers [45, 46]. At the same time, it can be a trustworthy VNF broker for resource providers [47, 48] to provision network services, where auction mechanisms for VNF trading can be deployed.
• Third, with additional fairness measures, fair VNF management can be achieved [49], with efficient detection of dishonest management behavior and effective enforcement of accountability [50].

To this end, blockchain-based VNF management has received extensive research attention due to these benefits.
At the same time, the inherent efficiency limitations of on-chain storage and computation have not been well addressed for practical implementations. In this chapter, we address the efficiency challenge with the design of an authenticated VNF dictionary on the blockchain:

• First, we utilize a consortium blockchain architecture, i.e., Hyperledger Fabric, to build a VNF management platform among distributed resource providers. By doing so, no single trusted VNF management entity is required, which improves mutual trust among distributed network resource providers.
• Second, integrated with SNARG-based on/off-chain computation models, we design an efficient VNF query scheme that reduces on-chain overheads compared with directly storing VNF information on the blockchain. Moreover, a succinct dictionary authenticator can be stored on chain for verifiable VNF query and slice configuration.
• Third, to reduce the off-chain proving overhead and mitigate the RAM issue of SNARG, we design a two-level verifiable query framework. More specifically, a key query item is introduced to reduce the search space, and a


verifiable dictionary pruning scheme is proposed based on pre-computed Merkle commitments of SNARG authenticators.
• Our extensive experiments demonstrate that the proposed scheme significantly reduces the off-chain proving overhead, while the pruning solution adds only limited on-chain storage and communication overhead compared with the straightforward SNARG-based solution.

In the following, we discuss a representative construction for blockchain-based data query [2]. First, we present the system model, threat model, and design goals of VNF management in future networks. Then, we discuss the building blocks of VNF management, including commitment schemes and SNARG. Furthermore, we investigate the design challenges of efficient blockchain-based VNF query and propose detailed constructions. With thorough security analysis and extensive experiments, we demonstrate that the proposed scheme is both secure and efficient for real-world implementations.

4.4.1 VNF Query in HCN

As shown in Fig. 4.1, there are five entities in blockchain-based VNF management:

• VNF Provider (VNF-P) is a network resource provider that abstracts its resources as VNFs. For example, a mobile operator can provide a wireless access function, while a cloud server can provide a network middlebox function [51]. There are multiple VNF-Ps from different trust domains. To enable efficient slice

Fig. 4.1 Blockchain-based VNF management (entities: VNF-Ts, VNF-M, VNF-Ps, and SAs around a consortium blockchain; operations: VNF listing into the VNF dictionary and VNF query)

configurations for diversified service requirements, a VNF-P constructs a VNF dictionary that stores necessary VNF information, including functionality, location, performance, price, version, etc.
• VNF Manager (VNF-M) is a powerful storage and computing unit that is responsible for collecting VNF information and managing slice configurations for VNF-Ps. For example, in cloud-based NFV, the cloud server can act as VNF-M (i.e., a VNF broker for VNF-Ps). However, the VNF-M is not considered fully trusted, and therefore verifiable VNF management should be deployed.
• VNF Tenant (VNF-T) is a personal or enterprise customer who would like to enjoy NFV-based network services. Given its service requirements, a VNF-T can be authorized a network slice by VNF-M that consists of a chain of VNFs from (multiple) VNF-Ps.
• Supervising Authority (SA) is a consortium that consists of many Supervising Nodes (SNs). SA maintains a consortium blockchain in this architecture for VNF management. In practice, SNs can be various network stakeholders, such as operators, edge service providers, and vendors.
• Blockchain is a distributed ledger run by the consortium, i.e., SA. A VNF management smart contract can be deployed to achieve automatic and transparent VNF management.

More specifically, the above-mentioned entities interact with each other to conduct blockchain-based VNF management as follows. For illustrative simplicity, only a single VNF-T and a single VNF-P are considered in the model:

• First, SNs work with each other to set up a VNF management contract on a consortium blockchain. Moreover, SNs set up system public parameters and generate the necessary keys.
• Second, VNF-P abstracts its network services as VNFs and builds a VNF dictionary. VNF-P submits its VNF dictionary to VNF-M and computes a dictionary authenticator to be uploaded onto the blockchain. Then, VNF-M can provide VNF query services to VNF-Ts. A VNF dictionary is defined as follows:

D = {V_i}_{i∈[1,m]}, V_i = (attr_1, attr_2, . . . , attr_n),    (4.1)

where V_i is a vector that consists of VNF attributes represented by integer values or numeric ranges. For example, a binary value can indicate whether a VNF is allocated, while an integer value can be associated with a keyword in a keyword dictionary. That is, there are at most m VNF vectors in the dictionary and n attributes in each vector.
• Third, a VNF-T constructs a VNF query that specifies its service requirements and submits the VNF query to the blockchain. VNF-M retrieves the VNF query


and performs the query over the dictionary in its local storage. At the same time, a correctness proof is computed and returned to the VNF-T. More formally, the VNF query can be defined as

F : (D, Q) → R,    (4.2)

where D is the VNF dictionary and Q is a VNF query:

Q = (q_1, q_2, . . . , q_n).    (4.3)

Each q_i is a VNF requirement, such as available resources, locations, and VNF functionalities. R is a VNF query result defined as

R = (id_1, id_2, . . . , id_{m*}),    (4.4)

where id_i is the identifier of a matched VNF. That is, there can be at most m* VNFs in the query result.
• Finally, the VNF-T verifies the proof and query results. If the proof is not correct, the VNF-T can report to the VNF management contract for a post-query investigation.
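As a plain-language sketch of the query function F in (4.2)–(4.4), the following Python fragment applies a conjunctive keyword/range match, the matching rule adopted later in the system setup; the attribute layout and values are hypothetical.

```python
def vnf_query(dictionary, query):
    """F: (D, Q) -> R. A query item is either an exact keyword (an int)
    or a numeric range (a, b); a VNF matches only if every check passes."""
    result = []
    for vnf_id, attrs in dictionary.items():
        ok = True
        for x, qi in zip(attrs, query):
            if isinstance(qi, tuple):          # range query [a, b]
                ok = ok and qi[0] <= x <= qi[1]
            else:                              # keyword query
                ok = ok and x == qi
        if ok:
            result.append(vnf_id)
    return result

# Hypothetical attributes: (functionality keyword, location keyword, capacity)
D = {1: (7, 3, 100), 2: (7, 5, 80), 3: (7, 3, 40)}
Q = (7, 3, (50, 200))
assert vnf_query(D, Q) == [1]   # R = identifiers of matched VNFs
```

VNF-M would run this function off-chain; the contribution of the chapter is making its result verifiable without re-executing it on-chain.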

4.4.2 Threat Model and Design Goals

The core of blockchain-based VNF management is the VNF query, where dictionary authenticators are constructed by VNF-P and queries are conducted by VNF-M. Therefore, we focus only on the transparency and verifiability issues in the query process. Specifically, the trust assumptions of the involved entities are summarized as follows:

• A majority of SNs are trusted. That is, SA is a trusted consortium that securely sets up the blockchain and system parameters.
• Blockchain is a trusted ledger with standard security guarantees.
• VNF-Ts are users who would like to enjoy NFV-enabled services. That is, VNF-Ts honestly construct their VNF queries and accept query results if the verifications pass.
• VNF-Ps are network resource providers that can be audited by governmental offices and SA. In our threat model, we consider that VNF-Ps honestly construct their VNF dictionaries and dictionary authenticators to provide slice configuration services. At the same time, the authenticator can be easily checked for correctness.
• VNF-M can be a third-party, profit-driven entity that may not always follow the pre-defined query protocol, due to the lack of execution transparency and efficient regulation. For example, like an advertising broker, VNF-M can


also set up a white list to promote VNFs of VNF-Ps who pay for a higher ranking. VNF-M may also reserve the same VNF for different slices to lower the risk of service cancelation.

Under the system and threat models, the design goals of the construction are twofold:

• Verifiability: To increase the transparency and trustworthiness of off-chain query processing at VNF-M, the proposed scheme should achieve verifiable and efficient VNF query in two respects:
– The VNF dictionary and VNF queries used in query processing should be authenticated and correct. That is, both should come from a trusted source and cannot be maliciously modified.
– Given a pre-defined query rule, VNF queries should be correctly executed using the dictionary and queries. That is, query results should be verifiable.
• Efficiency: To achieve practical blockchain-based VNF management, the verifiable query process should be succinct and efficient from two perspectives:
– On-Chain Efficiency: The on-chain storage overhead for the VNF dictionary should be succinct regardless of dictionary size, and on-chain verification of query results should be efficient.
– Off-Chain Efficiency: The off-chain computational overhead for generating query proofs should be low, and the RAM issue of SNARG should be mitigated. Moreover, the off-chain communication overhead in terms of proof size should be small.

4.4.3 Building Blocks

In this section, the building blocks of the representative constructions are given, including commitment schemes and SNARG.

4.4.3.1 Cryptographic Notations

We highlight the cryptographic notations used in this chapter:

• G = (G_1, G_2, G_T) represents a set of cyclic groups of prime order p with a bilinear pairing:

e : G_1 × G_2 → G_T.    (4.5)

• To distinguish generators of G_1 and G_2, we use g to represent an element of G_1 and its tilde version g̃ to represent an element of G_2.


• For integers, [m, n] represents the integers from m ∈ Z_p to n ∈ Z_p, m < n. An n-dimension vector is denoted as

v_n = (v_1, v_2, . . . , v_n) ∈ Z_p^n.    (4.6)

4.4.3.2 Commitment Schemes

This chapter adopts two types of commitment schemes: vector commitments built from Pedersen commitments [52, 53] and the Merkle tree [54]:

• For an n-dimension vector v_n, we can construct a vector commitment as discussed in Chap. 2. This chapter requires two security notions for vector commitments [55]:
– Hiding: Given only a vector commitment and the public parameters, a computationally bounded adversary cannot derive the committed values.
– Binding: A computationally bounded adversary cannot open a vector commitment to two different sets of values.
• The Merkle tree [54] is used to digest a set of elements for membership proofs. The Merkle root can serve as a commitment to the digested set. The detailed construction of the Merkle tree used in this chapter will be presented in the proposed scheme.
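A Pedersen-style vector commitment can be sketched in a toy prime-order subgroup of Z_p^* (the parameters below are illustrative only; real deployments use elliptic-curve groups of cryptographic size, and the generator/base selection would come from a trusted setup rather than a local RNG):

```python
import random

# Toy parameters: safe prime p = 2q + 1; the squares mod p form a
# subgroup of prime order q. (Real systems use ~256-bit EC groups.)
q, p = 1019, 2039
rng = random.Random(7)

def gen():
    """Random generator of the order-q subgroup: a nontrivial square mod p."""
    while True:
        g = pow(rng.randrange(2, p), 2, p)
        if g != 1:
            return g

def commit(vals, bases, h, r):
    """Pedersen vector commitment: C = h^r * prod_i g_i^{v_i} mod p."""
    c = pow(h, r, p)
    for g, v in zip(bases, vals):
        c = (c * pow(g, v % q, p)) % p
    return c

bases, h = [gen() for _ in range(3)], gen()
v = [5, 17, 300]
c1 = commit(v, bases, h, r=123)   # r is the blinding randomness
c2 = commit(v, bases, h, r=456)

assert c1 != c2                          # hiding: fresh r masks the vector
assert c1 == commit(v, bases, h, r=123)  # opening: reveal (v, r) to verify
```

Binding rests on the discrete logarithm assumption in the subgroup: producing a second opening (v', r') for the same C would reveal a discrete-log relation among the bases.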

4.4.3.3 SNARG

We review the specific notions of SNARG [17] used in this chapter. Recall that there are two entities in SNARG: a prover P and a verifier V. A relation R that holds on (x, w) is denoted as

R(x, w) = 1,    (4.7)

where x is an instance and w is the witness. In terms of verifiable computation, the relation can also be represented as a computing function F:

F(x_1, x_2, . . . , x_n) → (y_1, y_2, . . .).    (4.8)

The function consists of addition and multiplication operations over input values (x_1, x_2, . . . , x_n) and output values (y_1, y_2, . . .). Accordingly, there are input/output wires and intermediate wires corresponding to addition/multiplication gates in an arithmetic circuit. To this end, the evaluation of the computing function on input/output values can be converted into the evaluation of its corresponding circuit


with input/output wires [56], which is further converted into the divisibility check of a QAP. The SNARG construction in asymmetric groups with the pre-processing model and augmented QAP generation [18] is adopted in this chapter; it consists of Setup, Prove, and Verify algorithms, and the detailed constructions can be found in [18]. Briefly, a prover can compute the function F to get input/output values I_io. Using the Prove algorithm, the prover generates a correctness proof π. A verifier can check that I_io is a valid assignment for F using the Verify algorithm. For efficiency considerations, there are two requirements for the SNARG:

• SNARG should be succinct in terms of proof size. That is, the proof size should be constant regardless of the complexity of the function.
• SNARG should be non-interactive, i.e., a one-move proof system. This is important for efficient verification on the blockchain, as on-chain communications are expensive.

4.4.4 Representative Constructions

Given the above building blocks, a verifiable VNF query can be constructed in a straightforward way: the VNF query function is represented as a circuit, SA generates the evaluation and verification keys of the circuit, and VNF-M generates query results for a VNF query and computes a proof using the Prove function of the SNARG. Although this straightforward approach is easy to implement, it inflates both verification and proving costs:

• Plaintext inputs of the VNF dictionary and queries are required in the verification, which can incur a huge storage cost.
• Due to the RAM issue, VNF-M must conduct a linear scan over the dictionary to generate a proof, which can consume substantial computation resources for the involved cryptographic operations.

In the following, we present the detailed construction of an efficient blockchain-based VNF query, which consists of six phases: System Setup, Design of Pruning Function, VNF Listing, VNF Query Construction, VNF Query Processing, and VNF Query Verification. For simplicity of illustration, we assume that all involved entities have obtained their identity credentials from a trusted (distributed) certificate authority. Therefore, all communications among SA, VNF-M, VNF-T, VNF-P, and the blockchain are secure and authenticated.

4.4.4.1 System Setup

As shown in Fig. 4.2, SA takes the following steps to set up the system:


Fig. 4.2 Workflow of system setup: (1) SA sets up a consortium blockchain; (2) sets up a dictionary template; (3) generates parameters and keys; (4) publishes the parameters on the blockchain; (5) provides necessary keys to VNF-Ps and VNF-M

First, SNs work together to set up a blockchain for VNF management. Specifically, Hyperledger Fabric is adopted, and a VNF management contract is deployed as chaincode. Other parameters of the blockchain, including block time, block size, consensus protocols, channels, and membership services, are also determined in this phase.

Second, all entities need to agree on the query rules, including the dictionary and query formats and the matching rules:

• Based on the previous definition of the VNF dictionary, the attributes of each VNF information vector can be further split into two parts: keyword attributes and numeric attributes. As a result, we have a modified VNF dictionary template:

D = {V_i}_{i∈[1,m]}, V_i = ({w_x ∈ W}_{x∈[1,n_1]}, {v_y}_{y∈[n_1+1,n]}).    (4.9)

– Keyword attributes are represented as w_x ∈ W, where w_x is an integer and W is a keyword dictionary. There are in total n_1 keyword attributes in each VNF information vector.
– Numeric attributes are represented as

{v_y}_{y∈[n_1+1,n]},    (4.10)

where each v_y is an integer. There are in total n − n_1 numeric attributes; n is the total number of attributes in a VNF information vector, and there are m VNF vectors in a dictionary. To improve the randomness in a VNF vector for commitment indistinguishability, one or more unused numeric values can be set to random numbers from Z_p.


• A VNF query is defined as

Q = (q_1, q_2, . . . , q_n).    (4.11)

Similar to the VNF information vector, the query can be split into two parts: keyword queries and range queries:

– A keyword query is represented as an integer q_i that can be located in the keyword dictionary.
– A range query is represented by two integers:

[a, b], a < b.    (4.12)

• For a query Q and the dictionary D, we adopt a conjunctive matching rule that compares each item in Q against the corresponding attribute in each VNF vector V_i of D:

– For each keyword query q_x, we check the condition

q_x =? w_x, w_x ∈ V_i.    (4.13)

– For each range query [a, b], we check the condition

v_y ∈ V_i, v_y ∈ [a, b].    (4.14)

We say that a VNF satisfies a VNF query if all attribute checks pass.

As discussed before, there are two challenges in directly adopting SNARG: plaintext inputs in verification and prover inefficiency:

• For the first challenge, it is preferable to use honestly generated authenticators of the VNF dictionary in the verifications. That is, VNF-P can pre-compute Pedersen commitments for the dictionary over linearly independent generators in the CRS. At the same time, the interfaces of the SNARG scheme should be modified [57, 58]:

– Setup(G, F) → (CK, EK, VK). Setup generates evaluation keys and verification keys for F, and a set of commitment keys CK for digesting the dictionary. Note that CK can be part of EK.
– Commit(D, CK) → Aut_D. Commit takes the dictionary and the commitment keys and outputs a dictionary authenticator Aut_D.
– Prove(x_Q, D, EK) → (x_O, π).


Prove takes a VNF query represented as a vector x_Q, the VNF dictionary, and the evaluation keys. The algorithm evaluates the corresponding QAP to output a query result along with a correctness proof.
– Verify(Aut_D, x_Q, x_O, VK, π) → (0, 1). As desired, this algorithm directly uses the dictionary authenticator Aut_D along with the query and query results in the verification. Aut_D is a constant-size group element, which saves much storage overhead compared with verification using the dictionary in plaintext.

The modified SNARG can be instantiated from many online–offline SNARG constructions [18, 59, 60].

• For the second challenge, a dictionary pruning strategy is adopted to improve search efficiency. More specifically, there often exists a key query item q* in a VNF query, such as the VNF location or functionality. Using this key query item, we can first locate the VNFs that match it, significantly reducing the query space and yielding a pruned dictionary D' with a small number of VNFs. Compared with the original query function, we now have a modified query function defined as

F_1 : (D, q*) → R_1,
F_2 : (D', Q) → R_2.    (4.15)

– F_1 performs the key query item over the VNF dictionary. Its output R_1 consists of the VNF identifiers that match the key query item and is also denoted as the pruned dictionary D'.
– F_2 performs the full query over the VNFs in R_1 and outputs the identifiers of the VNFs that match the full query as the final result R_2.

To enable efficient verification of R_2 without plaintext inputs of the pruned dictionary, we need to design a prune function:

Prune(D, Aut_D, R_1) → (D', Aut_{D'}, π_p).    (4.16)

More specifically, the inputs of the prune function are the VNF dictionary, its authenticator, and the key query result. The prune function outputs an authenticator Aut_{D'} together with a correctness proof π_p, both of which should be efficiently verifiable based on R_1. The detailed construction of the prune function will be discussed later.

Third, with the templates for the VNF query decided, SA generates parameters and keys for the query templates:

• SA sets a security parameter λ and a set of groups:

G = (G_1, G_2, G_T).    (4.17)

SA chooses g from G_1 and g̃ from G_2. SA chooses SHA-512 as a secure hash function:

H : {0, 1}* → {0, 1}^512.    (4.18)

• SA defines a SNARG system Π_1 for the key query function and generates its keys using the modified SNARG:

Setup(G, F_1) → (CK1, EK1, VK1).    (4.19)

Note that Π_1 takes D and q* when generating a proof using the Prove function of the modified SNARG.
• Similarly, SA generates keys for the system Π_2 of the full query function using the modified SNARG:

Setup(G, F_2) → (CK2, EK2, VK2).    (4.20)

Note that the inputs of Π_2 when using the Prove algorithm are Q and the pruned dictionary. Moreover, an augmented QAP is generated for both functions to ensure that the generators for CK1 and CK2 are linearly independent. Another approach is to use external random generators for committing the dictionary and a linking method to show that two commitments open to the same value [58].
• Finally, after setting up the blockchain and generating the keys, SA provisions them to the blockchain and VNF-M. Specifically, SA publishes the following parameters onto the blockchain:

(λ, G, g, g̃, e, p, H, VK1, VK2).    (4.21)

• SA sends CK1 and CK2 to VNF-P for computing dictionary authenticators for the two SNARG systems.
• SA sends EK1 and EK2 to VNF-M for performing VNF queries and computing proofs.

4.4.4.2 Design of Pruning Function

To enable efficient dictionary pruning, the VNF-P needs to pre-compute a set of authenticators for the second SNARG system. Before going into the details, let us review the design choices for the dictionary pruning function:

• A modified SNARG could be used directly to generate a pruned dictionary using the key query item. Specifically, the SNARG could simply copy values from the


original dictionary to the output dictionary. However, since the positions in the original dictionary to be copied are not known in advance, this approach cannot be efficiently implemented with a non-RAM SNARG.
• Another approach is to adopt a subvector ZKP system, e.g., [55], to open the original dictionary (in plaintext) at the positions related to the pruned dictionary. Similarly, since the pruned dictionary is unknown without knowing q*, we cannot pre-set up keys for the subvector ZKP system.

Our goal is a dictionary pruning scheme that is efficient to verify. To achieve this, we observe that modern ZKP systems usually have a resource-consuming pre-processing phase; for example, SNARGs use a trusted pre-processing phase to encode function information into evaluation keys. Our strategy for the pruning function is therefore to offload the trusted pre-processing phase to VNF-Ps. More specifically, we let each VNF-P construct an auxiliary dictionary D_aux in which it pre-computes all individual authenticators of each VNF in its VNF dictionary. By doing so, any authenticator for a pruned dictionary can be quickly reconstructed from the authenticators in D_aux. To further enable verification of the correctness and integrity of the pre-computed authenticators, the VNF-P constructs a Merkle tree that digests all the authenticators and stores the root on the blockchain. The detailed constructions of the authenticators are discussed in the following section.
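Ignoring the cryptographic proofs, the two-level query of (4.15) can be sketched as follows; the attribute layout and the choice of the first attribute as the key query item q* are hypothetical. The point is that F_2, the only function whose circuit must scan the dictionary in full, now runs over |D'| entries instead of |D|.

```python
def matches(attrs, query):
    """Conjunctive rule: every attribute must pass its keyword/range check."""
    for x, qi in zip(attrs, query):
        if isinstance(qi, tuple):      # range query [a, b]
            a, b = qi
            if not (a <= x <= b):
                return False
        elif x != qi:                  # keyword query
            return False
    return True

def f1_prune(dictionary, key_idx, q_star):
    """F1: (D, q*) -> R1. Keep only VNFs matching the key query item."""
    return {i: v for i, v in dictionary.items() if v[key_idx] == q_star}

def f2_full(pruned, query):
    """F2: (D', Q) -> R2. Full conjunctive query over the pruned dictionary."""
    return [i for i, v in pruned.items() if matches(v, query)]

# Hypothetical layout: (functionality keyword, location keyword, capacity)
D = {1: (7, 3, 100), 2: (9, 3, 80), 3: (7, 3, 40), 4: (7, 5, 60)}
Q = (7, 3, (50, 200))            # key query item q* = functionality 7
D_prime = f1_prune(D, 0, 7)      # F2 now scans 3 VNFs instead of 4
assert f2_full(D_prime, Q) == [1]
```

In the actual scheme both steps are additionally proven correct: F_1 via the first SNARG system and the prune function, F_2 via the second SNARG system over the pruned authenticator.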

4.4.4.3 VNF Listing

As shown in Fig. 4.3, VNF-P works with VNF-M to conduct VNF listing with the following steps:

Fig. 4.3 Workflow of VNF listing: (1) construct the VNF dictionary with its authenticator; (2) compute the auxiliary VNF dictionary with authenticators; (3) send the dictionary authenticator and the Merkle root to the blockchain; (4) send the dictionary and pre-computed authenticators to VNF-M

• First, a VNF-P abstracts its resources as VNFs and constructs a VNF dictionary that consists of a set of VNF information vectors:


D = {V_i}_{i∈[1,m]}.    (4.22)

Fig. 4.4 Dictionary authenticator (left: the whole VNF dictionary digested into a single authenticator; right: the pre-computed authenticators Aut_{V_{i,j}}, i ∈ [1, m], j ∈ [1, m*])

Then, the VNF-P constructs a dictionary authenticator using the commitment keys CK1 as follows:

Aut_D = ∏_{i=1}^{m} ∏_{j=1}^{n} (CK1[i][j])^{V_i[j]}, CK1 ∈ G_1^{m×n}.    (4.23)

More specifically, in the i-th VNF information vector, V_i[j] denotes the j-th attribute, and the generator for the j-th attribute is the commitment key CK1[i][j]. An illustrative example of the authenticator is shown on the left side of Fig. 4.4: the whole dictionary is digested into a single authenticator, which saves much on-chain storage.
• Second, the VNF-P computes an auxiliary dictionary with authenticators for the second SNARG system as follows:
– The VNF-P splits its VNF dictionary into rows of VNF information vectors and computes an individual authenticator for each of them:


4 Transparent Data Query in HCN

$\mathrm{Aut}_{V_{i,j}} = \prod_{x=1}^{n} (CK2[j][x])^{V_i[x]}, \quad \forall V_i \in D,\ j \in [1, m^*_{max}],$  (4.24)

where $CK2 \in \mathbb{G}_1^{m^*_{max} \times n}$. The maximum number of matched VNFs in $R_2$ ($D'$) is $m^*_{max}$. $CK2[j][x]$ corresponds to the x-th attribute of the j-th VNF in $R_2$. $\mathrm{Aut}_{V_{i,j}}$ denotes the authenticator for the i-th VNF in the original dictionary when it appears as the j-th VNF in $R_2$. We can see that each VNF has $m^*_{max}$ pre-computed authenticators for the second SNARG system, since the order of VNFs in $R_2$ is not known before the key query operation. As shown on the right side of Fig. 4.4, a total of $m \cdot m^*_{max}$ authenticators are pre-computed.
– Due to the large size of the pre-computed authenticators, it is not storage efficient to store them on the blockchain. Therefore, the VNF-T uses a Merkle tree to digest them. First, each leaf node of the tree is calculated as follows:

$\mathrm{Auth}_{i,j} = H(i \| j \| \mathrm{Aut}_{V_{i,j}}), \quad \forall V_i \in D,\ j \in [1, m^*_{max}].$  (4.25)

Ideally, the number of total pre-computed authenticators is $2^{h-1}$, where h is the height of a balanced binary tree. If not, the VNF-T needs to pad the set of authenticators. Then, the VNF-T computes a Merkle tree using Algorithm 4. Specifically, the algorithm computes hashes from the leaf nodes upward to obtain a tree T and a Merkle Root.

Algorithm 4: Merkle tree construction
Input: {Auth_{i,j}}
Output: Merkle tree T
1: Set T to a (2^h − 1)-dimension array
2: Copy the leaf hashes Auth_{i,j} to the last m · m∗_max items in T
3: Calculate the values of intermediate nodes in a bottom-up manner
4: Set Root as the Merkle tree root

An illustrative example is shown in Fig. 4.5. The constructed Merkle tree serves as the auxiliary dictionary for the verifiable VNF query, and Root is its corresponding authenticator.
• Finally, the VNF-T distributes data to the other entities as follows:

[Fig. 4.5 Auxiliary dictionary authenticator: the items $(i \| j \| \mathrm{Aut}_{V_{i,j}})$ are hashed into a Merkle tree that serves as the auxiliary dictionary]

[Fig. 4.6 On-chain storage: the consortium blockchain stores the dictionary authenticator and the Merkle root with the VNF query contract]

– The VNF-T sends the dictionary authenticator and the Merkle root to the blockchain:

$(\mathrm{Aut}_D, \mathrm{Root}).$  (4.26)

The blockchain storage is shown in Fig. 4.6.
– The VNF-T sends its VNF dictionary, auxiliary dictionary, and pre-computed authenticators to VNF-M:


[Fig. 4.7 Workflow of query construction: (1) the VNF-T constructs a VNF query; (2) sends the query to the blockchain; (3) the blockchain checks the query correctness]

$(D, D_{aux}, \{\mathrm{Aut}_{V_{i,j}}\}), \quad i \in [1, m],\ j \in [1, m^*_{max}].$  (4.27)
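To make the pre-computation of Eqs. (4.23)–(4.24) concrete, the following Python sketch mimics the multi-base exponentiations in a toy prime-order group Z_P* instead of the pairing group G1 used by the scheme; the modulus, key sizes, and variable names are illustrative only. It also cross-checks the aggregation property that later lets the authenticator of a pruned dictionary be rebuilt as a product of pre-computed ones (cf. Eq. (4.32)):

```python
import random

P = 2**127 - 1  # Mersenne prime; stand-in group Z_P* (the scheme uses G1)
rng = random.Random(7)

def keygen(rows, cols):
    # Commitment keys: one random generator per (row, attribute) position
    return [[rng.randrange(2, P - 1) for _ in range(cols)] for _ in range(rows)]

def authenticate(vec, keys_row):
    # Multi-base exponentiation: prod_x keys_row[x]^vec[x] (cf. Eqs. 4.23/4.24)
    aut = 1
    for base, v in zip(keys_row, vec):
        aut = aut * pow(base, v, P) % P
    return aut

# Toy VNF dictionary: m VNFs, each an n-dimensional attribute vector
m, n, m_max = 5, 4, 3
D = [[rng.randrange(100) for _ in range(n)] for _ in range(m)]

# Pre-compute Aut_{V_i,j} for every dictionary entry i and result slot j
CK2 = keygen(m_max, n)
aux = {(i, j): authenticate(D[i], CK2[j]) for i in range(m) for j in range(m_max)}

# Aggregation: the pruned-dictionary authenticator is the product of the
# pre-computed authenticators of its members, one per result slot j.
R1 = [4, 1, 2]                      # indexes of VNFs matching the key query item
aut_pruned = 1
for j, i in enumerate(R1):
    aut_pruned = aut_pruned * aux[(i, j)] % P

# Recompute directly from the pruned dictionary to cross-check
direct = 1
for j, i in enumerate(R1):
    direct = direct * authenticate(D[i], CK2[j]) % P
assert aut_pruned == direct
```

The cross-check at the end is exactly why the m · m∗_max table pays off: VNF-M never re-runs the exponentiations, it only multiplies table entries.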

4.4.4.4 VNF Query Construction

As shown in Fig. 4.7, a VNF-T constructs a VNF query as follows:
• First, the VNF-T determines its service requirements, including a key query item $q^*$, keyword query items $w_i$ (the first keyword item serves as the key query item), and range query items $[a_j, b_j]$. The query vector is therefore denoted as follows:

$Q = (q^*, \{w_i\}_{i \in [1,n_1]}, \{[a_j, b_j]\}_{j \in [n_1+1,n]}).$  (4.28)

There are n query items in Q, of which $n_1$ are keyword items.
• Second, the VNF-T sends the query to the VNF contract on the blockchain via a secure and authenticated channel.
• Third, upon receiving the query, the blockchain checks the completeness and the signature of the query. If the checks pass, the blockchain stores the query with a unique query ID and a processing flag (identifying the query processing status). The detailed query contract is shown in Algorithm 5.
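The matching semantics that the two SNARG systems later prove can be sketched in plain Python. This is illustrative only: the actual predicates are arithmetic circuits written in xjsnark, and the attribute layout and availability-flag convention here are simplified from the chapter's description:

```python
def matches(vnf, keywords, ranges):
    """Conjunctive query: every keyword item must match by equality and every
    numeric item must fall in its range [a, b] (cf. the query vector Q)."""
    n1 = len(keywords)
    return all(vnf[i] == w for i, w in enumerate(keywords)) and \
           all(a <= vnf[n1 + j] <= b for j, (a, b) in enumerate(ranges))

def key_query(dictionary, q_star):
    """Key query (function F1): return indexes of VNFs whose first keyword
    attribute equals the key item q_star (e.g., 1 = available)."""
    return [i for i, vnf in enumerate(dictionary) if vnf[0] == q_star]

# Toy dictionary: 2 keyword attributes followed by 1 numeric attribute
D = [(1, "fw", 10), (0, "fw", 25), (1, "lb", 30)]
R1 = key_query(D, 1)                       # indexes forming the pruned dictionary
pruned = [D[i] for i in R1]
R2 = [j for j, vnf in enumerate(pruned)    # full query (F2) over pruned entries
      if matches(vnf, keywords=(1, "fw"), ranges=[(5, 20)])]
```

Here R1 would be [0, 2] (the available VNFs) and R2 would be [0], illustrating how the cheap key query shrinks the input of the expensive full query.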

4.4.4.5 VNF Query Processing

As shown in Fig. 4.8, VNF-M processes the VNF query using the following steps: • First, VNF-M retrieves the unprocessed query Q from the blockchain to conduct the VNF query:


Algorithm 5: VNF query contract
Inputs: VK1, VK2, Aut_D, Root
Set RECQ to be empty
Function VNFQuery(Q):
  Set flag to 0
  Add (Q, ID_T, flag) to RECQ
Function Confirm(Q):
  Retrieve ID_T, flag from RECQ by Q
  Check that the message sender is ID_T
  Set flag to 1
Function Complaint(Q, proof):
  Retrieve ID_T, flag from RECQ by Q
  Check the correctness of the proof
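A minimal Python sketch of the contract state machine in Algorithm 5 follows. In the scheme this is Go/Java chaincode on Hyperledger Fabric; the complaint-proof verification is stubbed out here as a callback, and all names are illustrative:

```python
class VNFQueryContract:
    """Toy in-memory model of Algorithm 5: a query moves from flag 0
    (pending) to flag 1 (confirmed); a complaint triggers proof checking."""

    def __init__(self, vk1, vk2, aut_d, root):
        self.params = (vk1, vk2, aut_d, root)    # public verification material
        self.recq = {}                           # query -> (sender_id, flag)

    def vnf_query(self, query, sender_id):
        self.recq[query] = (sender_id, 0)        # store with processing flag 0

    def confirm(self, query, sender_id):
        owner, _ = self.recq[query]
        if sender_id != owner:                   # only the querier may confirm
            raise PermissionError("confirm must come from the query owner")
        self.recq[query] = (owner, 1)

    def complaint(self, query, proof, verify):
        # `verify` stands in for the associated verification algorithm;
        # returning False means the reported proof is authentic but incorrect.
        owner, _ = self.recq[query]
        return not verify(proof)                 # True => misbehavior confirmed

c = VNFQueryContract("VK1", "VK2", "AutD", "Root")
c.vnf_query("Q1", sender_id="VNF-T")
c.confirm("Q1", sender_id="VNF-T")
```

The flag-based design mirrors the chapter's post-query accountability: the chain never executes the query itself, it only records status transitions and adjudicates complaints.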

[Fig. 4.8 Workflow of query processing: (1) VNF-M conducts the query over the dictionary; (2) computes the SNARG and Merkle proofs; (3) sends the query results, proofs, and authenticators to the VNF-T]

– Using the key query item $q^*$, VNF-M searches the VNF dictionary to find the VNFs that match $q^*$ and includes them in $R_1$:

$R_1 = (i_1, i_2, \ldots, i_{m^*}),$  (4.29)

where $V_{i_x,x}$ represents the $i_x$-th VNF in the original VNF dictionary, which is also the x-th VNF in the query result $R_1$, i.e., a VNF of the pruned dictionary $D'$.
– Using the pruned dictionary $D'$ and its VNF information vectors, i.e., $V_{i_x,x}$ in $D'$, VNF-M conducts a full VNF query to output a final query result $R_2$. More specifically, $R_2$ consists of the indexes of the VNFs in the pruned dictionary that pass the full query.
• Second, VNF-M computes proofs for the query results as follows:
– For the results in $R_1$, VNF-M computes a SNARG proof using the Prove algorithm of the first SNARG system:

$\mathrm{Prove}(D, q^*, EK1) \rightarrow (R_1, \pi_1).$  (4.30)


The inputs of the algorithm include the VNF dictionary, the key query item, and EK1, while the outputs include a SNARG proof $\pi_1$.
– For the results in $R_2$, VNF-M computes another SNARG proof using the Prove algorithm of the second SNARG system:

$\mathrm{Prove}(D', Q, EK2) \rightarrow (R_2, \pi_2).$  (4.31)

The inputs of the algorithm are the pruned dictionary, the full query, and EK2, while the outputs include a SNARG proof $\pi_2$.
– For each VNF in the pruned dictionary, VNF-M needs to compute an authenticator to be used in the verification of the second SNARG proof:

$\mathrm{Aut}_{D'} = \prod_{x=1}^{m^*} \mathrm{Aut}_{V_{i_x},x},$  (4.32)

where $\mathrm{Aut}_{V_{i_x},x}$ is the pre-computed authenticator that VNF-M receives from the VNF-P. To demonstrate that $\mathrm{Aut}_{V_{i_x},x}$ is consistent with the authenticator that the VNF-P computes, VNF-M further computes a Merkle proof $\pi_p$. As shown in Algorithm 6, VNF-M includes the siblings on the Merkle path for each authenticator in $R_1$. With the on-chain Merkle root, this proof can be used to verify that the authenticator $\mathrm{Aut}_{V_{i_x},x}$ is correctly digested in the Merkle tree.

Algorithm 6: Merkle tree proof
Input: D_aux, R_1
Output: Proof π_p
for all i ∈ R_1 do
  Identify V_i as the j-th VNF in R_1
  Find the path from Auth_{i,j} ∈ D_aux to the root
  Add the siblings of nodes on the path to π_p
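Algorithm 6 and the corresponding root check performed in the verification phase can be sketched together. This is an illustrative Python sketch (the scheme hashes with SHA-512 and stores only Root on chain; the sibling-path encoding below is one of several equivalent choices):

```python
import hashlib

def _h(b):
    return hashlib.sha512(b).digest()

def build(leaves):
    # Balanced tree over a power-of-two number of leaves; tree[1] is Root
    n = len(leaves)
    assert n and n & (n - 1) == 0, "pad leaf count to a power of two"
    tree = [None] * (2 * n)
    tree[n:] = leaves
    for i in range(n - 1, 0, -1):
        tree[i] = _h(tree[2 * i] + tree[2 * i + 1])
    return tree

def prove(tree, leaf_pos):
    # Algorithm 6: collect siblings on the path from the leaf to the root
    n = len(tree) // 2
    idx, path = n + leaf_pos, []
    while idx > 1:
        path.append((idx % 2, tree[idx ^ 1]))   # (am-I-right-child?, sibling)
        idx //= 2
    return path

def verify(root, leaf, path):
    # Recompute a candidate root from the leaf and siblings; compare with Root
    node = leaf
    for is_right, sibling in path:
        node = _h(sibling + node) if is_right else _h(node + sibling)
    return node == root

leaves = [_h(bytes([x])) for x in range(8)]
tree = build(leaves)
proof = prove(tree, 5)
assert verify(tree[1], leaves[5], proof)
assert not verify(tree[1], _h(b"forged"), proof)
```

Each proof carries h sibling hashes for a tree of height h, which matches the O(|R_1| · log(m · m∗_max)) communication cost reported in the evaluation.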

• Finally, VNF-M sends the following message to the VNF-T via a secure and authenticated channel:

$(\pi_1, \pi_2, \pi_p, R_1, \{\mathrm{Aut}_{V_{i_x},x}\}_{i_x \in R_1}, R_2).$  (4.33)

Note that VNF-M does not send the query results directly onto the blockchain. Since the proof $\pi_p$ consists of multiple Merkle proofs, $\pi_p$ cannot be efficiently verified on the blockchain. To address this issue, we let the VNF-T first receive the proofs and verify them off chain. Then, the VNF-T can either confirm the correctness of the proofs or make complaints about their incorrectness on the blockchain. By doing so, post-query accountability and proof-of-misbehavior can be achieved to reduce on-chain overheads [49].


[Fig. 4.9 Workflow of query verification: (1) the VNF-T retrieves the authenticator and root from the blockchain; (2) receives the query results, proofs, and authenticators from VNF-M; (3) reconstructs the authenticator for the pruned dictionary; (4) verifies the results and proofs; (5) confirms or makes complaints on the blockchain]

VNF-M can also generate commitments of $R_1$ and $R_2$ and sign the commitments. By doing so, the VNF-T can verify the correctness of the commitments locally. In case of any dispute, the VNF-T can send only the commitments, instead of the original $R_1$ and $R_2$, to the blockchain to save on-chain storage overheads.

4.4.4.6 VNF Query Verification

As shown in Fig. 4.9, the VNF-T can verify the VNF query results and proofs from VNF-M using the following steps:
• First, the VNF-T retrieves the dictionary authenticator and the Merkle root from the blockchain.
• Second, the VNF-T receives the query results from VNF-M via a secure and authenticated channel.
• Third, using the received pre-computed authenticators $\mathrm{Aut}_{V_{i_x},x}$, the VNF-T computes the aggregated authenticator for the pruned dictionary:

$\mathrm{Aut}_{D'} = \prod_{x=1}^{m^*} \mathrm{Aut}_{V_{i_x},x}.$  (4.34)

• Fourth, the VNF-T verifies the correctness of the received proofs:
– For each Merkle path in $\pi_p$, the VNF-T recomputes a Merkle root and checks whether the recomputed root equals Root.
– For $\pi_1$, the VNF-T checks its correctness using the Verify algorithm of the first SNARG system:

$\mathrm{Verify}(\mathrm{Aut}_D, q^*, R_1, VK1, \pi_1).$  (4.35)


– For $\pi_2$, the VNF-T checks its correctness using the Verify algorithm of the second SNARG system:

$\mathrm{Verify}(\mathrm{Aut}_{D'}, Q, R_2, VK2, \pi_2).$  (4.36)

• There are two possible outcomes of the proof verifications:
– If all proofs are correct, the VNF-T can send a confirmation message to the VNF contract as shown in Algorithm 5. The contract will check the correctness and authentication of the confirmation message and will mark the query as processed if all checks pass.
– If any of the proofs does not pass the verification, the VNF-T can send the incorrect proofs, along with VNF-M's signature, to the query contract to make complaints. The contract will check that the proof does come from VNF-M and verify its correctness using the associated verification algorithms. If the contract determines that the proof is authentic but incorrect, the contract confirms that VNF-M is not behaving honestly and can take further actions to enforce accountability. Note that, for the Merkle proof, the VNF-T only needs to upload the incorrect proof onto the blockchain for verification. This saves a lot of on-chain storage overheads.

4.4.5 Security Analysis

In this section, we analyze the security of the proposed blockchain-based VNF query scheme. First, we recall the security notions and requirements for the SNARG. Second, we demonstrate the security of the commitment schemes and the prune function, highlighting the considerations when generating the CRS. Finally, we show that the proposed scheme achieves the verifiable VNF query.

4.4.5.1 Security of SNARG

The security of the SNARG is defined as follows:

Pr[ Verify(Aut_D, Q, R, VK, π) = 1 :
    Setup(G, F) → (CK, EK, VK) ∧
    Commit(D, CK) → Aut_D ∧
    F(D, Q) ≠ R ∧
    A^F(D, Q, EK) → (R, π) ] = negl(λ).  (4.37)

Specifically, Verify and Setup are the algorithms from the definition of the modified SNARG scheme, and Commit is the scheme that generates a dictionary authenticator. The Setup and Commit algorithms are honestly executed. The adversary is given access to the function inputs and evaluation keys and outputs a function output with a proof. The security notion states that a computationally bounded adversary cannot forge a valid proof for query results that do not match the query rules. It should be noted that the adversary is not given access to previously generated valid proofs. That is, the security notion is similar to the definition of soundness when a SNARG is used in verifiable computation. The notion guarantees neither the non-malleability of the proofs nor the zero-knowledge property. The security of the SNARG is achieved under the following conditions:
• The system parameters are honestly computed by a trusted party, and the trapdoor secret is destroyed after the setup.
• The Commit algorithm is executed honestly, and its output is uploaded to the immutable blockchain storage. Moreover, to directly use the authenticators in the verifications, an augmented QAP is generated in Setup to obtain linearly independent generators for the I/O wires. Otherwise, the SNARG achieves a security notion called "weak cc-SNARG" [58].
• The pairing-based SNARG construction is sound against a computationally bounded adversary [17].

4.4.5.2 Security of Commitments

In the following, we discuss the security of the Merkle commitment and the Pedersen commitment. We mainly focus on the binding property of the two commitments:
• A Merkle tree is used to digest a set of elements for membership proofs. Its security can be reduced to the collision-resistance property of hash functions. That is, a computationally bounded adversary cannot prove membership of an element that is not digested in the tree. In the proposed scheme, each leaf node in the tree is also associated with an index, which results in an indexed Merkle tree. In this case, an adversary cannot open the same position in the tree to two different elements unless the adversary can find collisions of the hash function.
• The Pedersen commitment requires that a commitment cannot be opened to two different values by a computationally bounded adversary. When the generators of the Pedersen commitment are uniformly distributed, this property is easy to achieve.
To directly use commitments of I/O wires in the verification of the modified SNARG, the following conditions must be met:
• The SNARG system should achieve strong binding [58], or the converted QAP should have linearly independent generators for the I/O wires [18]. Otherwise, an adversary can easily forge a proof that passes the verification.
• Authenticators can be built using external, randomly chosen generators. In this case, a ZKP is required to show that the authenticators under the external generators open to the same values as the I/O commitments under keys from the CRS. A security notion called "adaptive soundness" [21] is achieved in this case.
• In either of the above cases, the authenticators should come from a trusted source; that is, the authenticators need to be honestly computed. Otherwise, an additional ZKP should be designed to demonstrate the well-formedness of the authenticators. An alternative approach is to let the owner of the authenticator open it at specific positions in case of any disputes.
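The homomorphic and binding behavior of the Pedersen commitment discussed above can be illustrated in a toy multiplicative group. The prime modulus and generators below are illustrative only; the scheme works in a pairing-friendly elliptic-curve group, and in practice h must be chosen so that log_g(h) is unknown:

```python
# Pedersen commitment C = g^v * h^r mod p over a toy multiplicative group.
p = 2**127 - 1          # Mersenne prime; illustrative, not a secure group choice
g, h = 3, 7             # stand-in generators (hypothetical values)

def commit(v, r):
    return pow(g, v, p) * pow(h, r, p) % p

# Additively homomorphic: commitments multiply, openings add.
c1, c2 = commit(5, 11), commit(9, 2)
assert c1 * c2 % p == commit(5 + 9, 11 + 2)

# Different randomness hides the value: the two commitments differ even
# though they commit to the same v.
assert commit(5, 11) != commit(5, 12)
```

Binding rests on the hardness of finding log_g(h): opening one commitment to two values would reveal that discrete logarithm.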

4.4.5.3 Dictionary Pruning Security

The proposed scheme adopts a dictionary pruning function to generate an authenticator of the pruned dictionary. The security of the dictionary pruning should guarantee that the generated authenticator is correctly computed for the VNF information vectors in $R_1$. The security notion is achieved as follows:
• The authenticators for the first and second SNARG systems are correctly computed using trusted keys. At the same time, the Merkle root of the pre-computed authenticators (with indexes) is honestly computed and stored on the immutable storage of the blockchain.
• The key query process is conducted securely by the first SNARG system. That is, the result of the pruned dictionary, $R_1$, is sound and cannot be forged by an adversary. A verifier will check the correctness of the returned $R_1$ and the Merkle proofs.
Under the above two security notions, an adversary cannot forge an authenticator for the pruned dictionary unless the adversary can break either the SNARG security or the Merkle security.

4.4.5.4 Verifiable VNF Query

As defined before, the verifiable VNF query should achieve input authenticity of the VNF queries and the dictionary, as well as execution correctness of the query process:
• Due to the security of the commitment schemes, the dictionary authenticator is securely generated and stored on the blockchain. Due to the security of the prune function, the authenticator for the second SNARG system is secure. Since communications among the blockchain, VNF-T, VNF-M, and VNF-P are secure, VNF queries and query results are authentic.
• Due to the security of the SNARG, the two query functions $F_1$ and $F_2$ are correctly computed and the results cannot be forged.

[Fig. 4.10 Implementation illustration of SNARG: a VNF query program is compiled by the xjsnark compiler into (1) a query circuit (*.arith) and (2) sample inputs (*.in), which are fed to the libsnark interface to produce benchmarks for the SNARG]

4.4.6 Performance Evaluation

In this section, we evaluate the performance of the proposed blockchain-based VNF query scheme. First, we present an implementation overview, including the high-level architecture of Hyperledger Fabric and the workflow of a chaincode call. Second, we report the off-chain benchmarks, including the computational and storage overheads for system setup, proof generation, and proof verification. Third, we present the performance comparisons between the dictionary pruning scheme and the traditional method. Finally, we discuss the on-chain benchmarks on a testing blockchain network.

4.4.6.1 Implementation Overview

We conduct our experiments on a laptop with 2.3 GHz processors and 8 GB memory. We compile and run all codes on 64-bit Ubuntu 16.04. The implementation includes the SNARG and Hyperledger Fabric:
• The overview of the SNARG implementation is shown in Fig. 4.10. First, the programs for conducting the VNF query are written in the high-level language defined by xjsnark [26]. xjsnark then compiles the VNF query program into an arithmetic circuit, generating two files: a circuit description file (*.arith) and a sample input file (*.in). Second, we use the jsnark interfaces adopted from libsnark.1 More specifically, the pre-processing model is instantiated [18] with the alt-bn128 curve. The interfaces take the circuit description and sample input files to run the SNARG algorithms and present benchmarks.

1 libsnark: a C++ library for zkSNARK proofs. https://github.com/scipr-lab/libsnark.


[Fig. 4.11 Implementation illustration of Hyperledger Fabric: organizations with peer nodes running chaincode (with JPBC) join a channel, while an ordering node runs the RAFT consensus]

• For the on-chain experiments, we set up a consortium blockchain network based on Hyperledger Fabric [61]. As shown in Fig. 4.11, the test network in Hyperledger Fabric consists of three layers:
– First, in the consensus layer, one ordering node is deployed with two or more organizations. The ordering node is responsible for running the RAFT consensus protocol. Each organization can have one or more peer nodes. Compared with a public blockchain, Hyperledger Fabric introduces the "organization" to cluster peer nodes into different groups.
– Second, Hyperledger Fabric introduces the "channel" to achieve side chains. For each channel, a different blockchain can be deployed with different access and membership controls. That is, peer nodes can only communicate with each other after joining the same channel.
– Third, after joining a channel, peer nodes can deploy and invoke a smart contract, which is denoted as chaincode in Hyperledger Fabric. It supports writing contracts in Go and Java. Note that when external code libraries are needed, they must be added to the chaincode dependencies. For example, Java Pairing-Based Cryptography (JPBC) [62] can be added and used to write chaincode.


[Fig. 4.12 Chaincode script: start the blockchain network → create and join a channel → create the chaincode package → install and approve the package → commit and invoke the chaincode]

To conduct the on-chain experiments, developers first need to install the prerequisites of Hyperledger Fabric. After successfully setting up the testing environment, we can start the test network to deploy and invoke chaincodes as shown in Fig. 4.12:
– Basic chain parameters can be set in the configuration file of the test network. For example, the number of organizations and peer nodes can be adjusted. After that, the test blockchain network can be started using Docker. In our experiments, we adopt a single ordering node running the RAFT consensus protocol.
– Peer nodes can create and join a channel using a script file. After that, peer nodes within the same channel can deploy and invoke smart contracts.
– Any peer node can write and package a chaincode. To do so, the peer node should determine the data structures and function calls of the chaincode. Using scripts provided by Hyperledger Fabric, the peer node can compile the chaincode into a package written in Go or Java.
– Each of the other peer nodes needs to approve the chaincode before it is deployed on the channel.
– Finally, the peer node can commit the chaincode to the channel. Any peer node in the channel can then invoke the chaincode.


For more details, interested readers can refer to the official documentation of Hyperledger Fabric (version 2.3).2 In the test network, a script file named "deployCC.sh" is provided to conduct steps 3–5 mentioned above.

4.4.6.2 Off-Chain Benchmarks

We first present the parameter settings of the off-chain experiments. Specifically, the query functions are set as follows:
• For F, we set the query as a 20-dimension vector that consists of 10 keyword values and 10 range values. The VNF dictionary is likewise set as a collection of VNF information vectors, denoted as V_i. Each V_i includes 10 keyword values (for equality checks) and 10 numeric values (for range checks). F adopts a conjunctive query strategy, where a VNF is matched only if all 20 query checks pass. Note that the outputs of F include the query results for all VNFs in the original dictionary, which are taken into the verification algorithm in plaintext.
• For F_1, the query is set as a vector with 10 keyword items and 10 range items. The key query item is set as a binary value that indicates whether a VNF is available: 1 indicates the VNF is available, while 0 indicates it is unavailable. That is, F_1 checks all VNF information vectors in the dictionary to see whether they are available. The function outputs the availability information of all VNFs in the original dictionary. These outputs are also taken into the verification algorithm in plaintext in our implementation.
• For F_2, it conducts the full query Q over the pruned dictionary. More specifically, each VNF in the pruned dictionary has a 20-dimension VNF information vector, denoted as V_i. In contrast to F, F_2 conducts the full query over the pruned dictionary rather than the original dictionary.
In the following, we present the computation and storage overheads for F, F_1, and F_2:
• As shown in Fig. 4.13, we test three algorithms of F: Setup, Prover, and Verifier. We assume that the number of VNFs in the VNF dictionary is 2^{h−1}, where h is the height of the Merkle tree over all VNF information vectors. We can see that the running time of Setup and Prover increases with the height. Specifically, Setup consumes roughly 600 seconds when the height is 12.
At the same time, the Prover algorithm can take around 100 seconds. As shown in Fig. 4.14, the storage overheads of F are presented. The QAP degree refers to the number of multiplication gates in the circuit of F, which roughly indicates the complexity of F. In our calculations, we set 1 KB = 8,000 bits and 1 MB = 1,000 KB. As the height increases, the size of the VNF dictionary increases. Since F must perform a linear scan over the whole dictionary, the computation complexity of F, i.e., the QAP degree, also increases. At the same time, the size of PK (the evaluation key) also increases and can reach hundreds of MB. By contrast, the size of VK is much smaller, reaching only hundreds of KB. As the size of the VNF dictionary grows further, the computational and storage costs of directly implementing F can become expensive in real-world applications.

2 Hyperledger Fabric Documentation. https://hyperledger-fabric.readthedocs.io/en/release-2.3/.

[Fig. 4.13 Computation overheads of F]

• As shown in Fig. 4.15, the running time of the same three algorithms for F_1 is reported. Similarly, the running times of the Setup and Prover algorithms increase as the height of the tree increases. However, compared with those of F, the algorithms are much more efficient. For example, when the height is 12, Setup consumes only around 10 seconds, while Prover consumes less than 6 seconds. This is because F_1 checks only 1 of the 20 items of a 20-dimension VNF query vector. In both F and F_1, the verification time remains very efficient. As shown in Fig. 4.16, the circuit complexity of F_1 also increases with the height of the Merkle tree, but it is much lower than that of F. For example, when the height is 12, the QAP degree is roughly 131,000 in F_1 but 3,100,000 in F. Similarly, the size of PK is much smaller in F_1 than in F. However, the size of VK remains the same in F and F_1. In both F and F_1, the outputs indicate whether the VNFs in D satisfy the full query or the key item query. Moreover, as plaintexts are required for the I/O values in the verifications, the size of VK is decided by the size of the query vector and the VNF dictionary.

[Fig. 4.14 Storage overheads of F]

[Fig. 4.15 Computation overheads of F_1]

[Fig. 4.16 Storage overheads of F_1]

• As shown in Fig. 4.17, the physical and swap memory usage of the SNARG is reported. We can see that the SNARG system of F requires significantly more memory than the SNARG system of F_1. This is because a full query is conducted in F, while only a key query is conducted in F_1.
• As shown in Fig. 4.18, we present the computation overheads of executing F_2 over the pruned dictionary. In this experiment, we vary the size of the pruned dictionary (the number of VNFs in D') from 100 to 250. This is reasonable since a key query item can significantly reduce the search space. We can see that the running times of the Setup and Prover algorithms also increase with the size of the pruned dictionary. However, compared with those of F, the times are significantly reduced. For example, generating a proof over a pruned dictionary with 250 VNFs requires less than 10 seconds. F_2 is less efficient than F_1, since F_1 only needs to query one item of a 20-dimension query vector. At the same time, the running time of the Verifier algorithm remains succinct. As shown in Fig. 4.19, the storage overheads of F_2 are presented. As with the computation overheads, the QAP degree of F_2 is larger than that of F_1 but smaller than that of F. More specifically, the size of PK in F_2 can reach nearly 60 MB when |D'| = 250. The complexity in terms of PK size and QAP degree is decided by 20m∗ in F_2, 20m in F, and m in F_1. At the same time, the size of VK in F_2 is much smaller than in both F and F_1, since the size of the pruned dictionary is smaller than the size of the original VNF dictionary.

[Fig. 4.17 Memory usage of F and F_1]

[Fig. 4.18 Computation overheads of F_2]

[Fig. 4.19 Storage overheads of F_2]

4.4.6.3 Performance Gain by Dictionary Pruning

In this section, we compare the computation overheads of the SNARG-based VNF query and our proposed scheme with dictionary pruning. For the former, we adopt the experimental results of F presented above. For the dictionary pruning strategy, we calculate the overall computation overheads as follows:
• The computational overheads of F_1 are included.
• The computational overheads of F_2 are included when the size of the pruned dictionary is either 150 or 200. We denote m∗/m (the ratio between the size of the pruned dictionary and the size of the original dictionary) as the pruning ratio. This ratio is an important indicator of the performance gain of our dictionary pruning strategy.
• The computation overheads for generating and verifying Merkle proofs are ignored. As shown in the next section, the running time for Merkle proofs is negligible compared with that of the SNARG.
As shown in Fig. 4.20, we report only the prover time of the algorithms. We can see that our pruning strategy achieves a significant performance improvement. While the prover time of F reaches more than 100 seconds when the height is 12, the prover time of our pruning strategy requires only a few seconds. Roughly speaking, the pruning strategy incurs O(m + m∗n) overheads, while the original SNARG-based approach incurs O(mn) overheads. That is, the pruning strategy reduces the search space for executing full queries and avoids many unnecessary memory accesses in generating proofs. It should be noted that we fixed the values of the VNF query, the dictionary, and the size of the pruned dictionary. For different VNF applications, the performance gain is affected by the pruning ratio.

[Fig. 4.20 Comparisons with dictionary pruning]
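The O(mn) versus O(m + m∗n) comparison can be sanity-checked with a symbolic constraint count. The counts below are back-of-the-envelope only; the actual QAP degrees reported in the figures include circuit-specific constants per check:

```python
def constraints_no_pruning(m, n):
    # Full query over the original dictionary: every one of the n query
    # items is checked against every one of the m VNFs -> O(m * n)
    return m * n

def constraints_with_pruning(m, m_star, n):
    # Key query over all m VNFs (one item each) plus the full n-item
    # query over only the m_star pruned VNFs -> O(m + m_star * n)
    return m + m_star * n

m, n = 2**11, 20          # height-12 tree: 2^11 VNFs; 20-dimension query
for m_star in (150, 200):
    assert constraints_with_pruning(m, m_star, n) < constraints_no_pruning(m, n)
```

With m = 2048 and n = 20, the no-pruning count is 40,960 versus 5,048 (m∗ = 150) or 6,048 (m∗ = 200) with pruning, i.e., roughly the order-of-magnitude prover-time gap seen in Fig. 4.20.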

4.4.6.4 Overheads for Dictionary Pruning

In the following, we present the computation and communication overheads of calculating Merkle proofs. We implement the Merkle construction and proof generation on a laptop with a 2.3 GHz processor and 8 GB memory. The hash function is instantiated with SHA-512 using the standard library from JPBC [62]. In the proposed scheme, a Merkle proof needs to be generated for each authenticator in the pruned dictionary. That is, the number of Merkle proofs increases with the size of the pruned dictionary. For each individual Merkle proof, the proof size is linear in the height of the Merkle tree, denoted as h:
• As shown in Fig. 4.21, we present the computation time for constructing a Merkle tree. Even as the height of the tree increases, construction remains very efficient since hash functions are cheap to evaluate. For example, when the height is 15, it takes less than 100 ms to construct the Merkle tree.

[Fig. 4.21 Computation overheads of Merkle construction]

• The computation of a single Merkle proof is extremely efficient since it involves only a few hash calculations. The verification of a single Merkle proof is conducted in the on-chain experiments.
• We calculate the communication overheads of the Merkle proofs. Roughly speaking, the size of the Merkle proofs is O(|R_1| · log(m · m∗_max)), which increases with the number of VNFs in the pruned dictionary and the height of the Merkle tree. As shown in Fig. 4.22, the proof size can reach a few hundred KB. For example, when the height of the tree is 16 and the size of R_1 is 250, the overall size of the Merkle proofs is around 250 KB.
In summary, the experimental results demonstrate that the proposed pruning strategy can significantly reduce the computation overhead at the prover while preserving verifier efficiency. Although the pruning strategy introduces additional communication overhead, this cost is justified by the reduction in computation overhead and can be further reduced with secure hardware-based designs.

4.4.6.5 On-Chain Benchmarks

In the proposed scheme, we adopt a post-query measure to improve on-chain efficiency. In case of a dispute, the VNF-T can report incorrect proofs to the blockchain for verification. In our experiment, we implement the verification of a single Merkle proof.

[Fig. 4.22 Communication overheads of Merkle proof]

More specifically, we write and compile a chaincode with JPBC for hash functions. A Merkle tree is calculated off chain, and the root is stored in the chaincode before it is committed onto the blockchain. We set a function in the chaincode to receive and verify a Merkle proof, and we modify the script "deployCC" in the test network to send the Merkle proof via function parameters. We measure the time difference between sending the function call and receiving a response from the blockchain. In the following, we report the response time under different blockchain settings and tree heights:
• As shown in Fig. 4.23, we set two organizations in the network, each with one peer node, and change the height of the Merkle tree to 12, 13, or 15. As we can see in the figure, the response time takes roughly 4–5 seconds and is highly random with respect to the tree height. That is, the response time is mainly affected by the network status and the consensus protocol used, while the computational time for verifying Merkle proofs is not critical.
• As shown in Fig. 4.24, we set the height of the Merkle tree to 13 and test the response time in three different network settings:
– There are 2 organizations in the network, and each organization has 1 peer node.
– There are 3 organizations, and each organization has 1 peer node.
– There are 2 organizations, and each organization consists of 2 peer nodes.

4.4 Use Case: Blockchain-Based VNF Query

Fig. 4.23 Response time of chaincode I (response time in ms over five runs, for Merkle tree heights of 12, 13, and 15)

Fig. 4.24 Response time of chaincode II (response time over five runs, on the order of 10^4 ms, for three settings: 2 organizations with 2 total peer nodes, 3 organizations with 3 nodes, and 2 organizations with 4 nodes)

We can see from the figure that as the number of peer nodes in the test network increases, the average response time also increases. When the number of peer nodes is 4, the response time can reach 14 seconds in our experiments. It should be mentioned that our proposal can be adapted to blockchain networks with any consensus protocol.
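For reference, the check performed by the chaincode's verification function can be sketched as follows. The actual chaincode is written in Java with JPBC, so this Python version only mirrors the logic: the contract stores nothing but the root, and verifying a reported proof costs one hash per tree level.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_proof(leaf: bytes, index: int, proof: list, root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path; O(height) hashes."""
    node = leaf
    for sibling in proof:
        # Even index: node is a left child; odd index: node is a right child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Tiny 4-leaf example: prove membership of the leaf at index 2.
L = [h(x) for x in (b"a", b"b", b"c", b"d")]
n01, n23 = h(L[0] + L[1]), h(L[2] + L[3])
root = h(n01 + n23)                       # only this value lives on chain
print(verify_merkle_proof(L[2], 2, [L[3], n01], root))  # True
```

A dishonest report (a leaf that is not under the committed root) makes the recomputed root mismatch, so the dispute is settled on chain without re-executing the query.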

4.5 Summary and Discussions

In this chapter, we have investigated blockchain-based data query approaches. First, we have reviewed the motivations and applications of blockchain-based data query with three specific requirements: query privacy, query trustworthiness, and query efficiency. Then, we have discussed the existing works on cloud-based data query and blockchain-based data query to highlight the design challenges in balancing decentralization and query efficiency. To address the challenges, a representative construction for blockchain-based VNF management with efficient VNF query has been proposed. With the on/off-chain computation model from SNARG, an efficient on-chain digest of the VNF dictionary is designed to support succinct on-chain storage and query verifications. Most importantly, the RAM issue of the SNARG that increases the off-chain proving overhead has been identified and mitigated by a dictionary pruning strategy. A verifiable dictionary pruning function has been designed based on pre-computed dictionary commitments in a Merkle tree. By doing so, many unnecessary memory accesses in generating VNF query proofs are avoided. Extensive experiments have confirmed that the proposed dictionary pruning strategy can improve the proving efficiency significantly while introducing only limited communication overheads. For future work, versatile query functionalities for VNF management should be designed in an efficient manner. Moreover, the on-chain verification costs of Merkle proofs in the dictionary pruning can be further reduced by hardware-based approaches.


Chapter 5

Fair Data Marketing in HCN

5.1 Motivations and Applications

In future wireless networks, data are generated and stored at distributed network stakeholders [1]. For example, mobile operators can collect data about the time and location of wireless accesses, while service providers can store users' click and browse history on the Internet. As future networks become more intelligent, these data are of great commercial value in a marketplace:

• For data owners, collecting and storing the data incurs management costs. To save costs and increase revenue, they would like to sell the data to data buyers at reasonable prices.
• For data buyers, they are motivated to buy data and use it for various services or applications, such as training an AI model or developing new products.

To this end, data marketing between data owners and data buyers is a promising business paradigm for future wireless networks. More specifically, there can be various applications:

• Cross-layer optimization: Network operations often require sharing data between network stakeholders at different network layers. For example, mobile operators can use the data from service providers to customize radio resource allocations.
• Personalized network service: User data such as physical locations and network access history can be used by service providers to build user profiles and provide personalized network services.
• AI-assisted network management [2]: Network controllers can buy data from operators, users, and service providers. Using the data, the controllers can train AI models for system-level network management, such as global network slice reservations and configurations.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
D. Liu, X. (Sherman) Shen, Blockchain-Based Data Security in Heterogeneous Communications Networks, Wireless Networks, https://doi.org/10.1007/978-3-031-52477-6_5


The remainder of this chapter is organized as follows. We discuss the application requirements of data marketing for HCN in Sect. 5.2, including regulation compliance, identity privacy, and data marketing fairness. In Sect. 5.3, we investigate the existing works on fair data marketing and highlight the research efforts on designing fair marketing schemes with on/off-chain models. In Sect. 5.4, we propose a representative construction of blockchain–cloud data marketing [3] in terms of system and threat models, design goals, detailed constructions, and security and performance analysis. Finally, we conclude this chapter in Sect. 5.5.

5.2 Application Requirements

With the emergence of data privacy laws, there are multiple application requirements for data marketing, including regulation compliance, identity privacy, and data marketing fairness.

5.2.1 Regulation Compliance

Various data privacy laws have taken effect recently. The General Data Protection Regulation (GDPR) [4] was launched in Europe in 2018. GDPR specifies responsibility and accountability for entities that are involved in the lifecycle of user personal data, especially for data sharing across EU borders. More specifically, GDPR defines several conceptual entities:

• A data subject is an identifiable natural person, while personal data mean the information that can lead to the identification of a data subject.
• A data processor is an entity that actually processes the data.
• A data controller is an entity that determines the usage purpose and processing methods of the data.

In future networks with heterogeneous network stakeholders, multiple entities can jointly determine the usage of user data; they are referred to as "joint controllers" in GDPR. In this case, the joint controllers shall determine a data usage agreement in a transparent manner, where the blockchain can serve as a shared platform for the joint controllers [5, 6]. More specifically, the network stakeholders can specify data usage terms and responsibilities for various data processing tasks on the blockchain [7, 8] and use smart contracts to manage data marketing instances. To this end, the blockchain is a promising solution for building a transparent data marketplace [9]. There are extensive discussions about the impact of GDPR on data sharing in different industrial sectors:


• The benefits and challenges of a blockchain-based approach for data sharing in the era of GDPR were discussed in [10] to emphasize that more research efforts need to be directed to the design and implementation of blockchain-based data marketing framework. • The conflicts between GDPR policies and cloud computing models were investigated in [11]. For example, cloud service providers may attempt to store, reuse, and trade data without clear explanations, which emphasizes the necessity to impose a transparent and reliable mechanism for cloud data management. • The GDPR impacts on data sharing in the online advertising system were discussed in [12]. With GDPR enforcement, data sharing between websites and third-party technology vendors, e.g., the ad exchange network, is strictly regulated.
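The idea of joint controllers recording a transparent data usage agreement on chain can be sketched minimally as follows. All field names and the contract interface are illustrative assumptions, not GDPR text or the chapter's actual contract; the point is that the agreement's canonical digest is fixed on a shared ledger before any consent is recorded.

```python
import hashlib, json

def digest(obj) -> str:
    """Canonical JSON digest so all joint controllers agree on the same bytes."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

# Hypothetical data-usage agreement between joint controllers (illustrative fields)
agreement = {
    "controllers": ["mobile_operator_A", "service_provider_B"],
    "purpose": "training a traffic-prediction model",
    "data_categories": ["access_time", "coarse_location"],
    "retention_days": 90,
}

class MarketingContract:
    """Toy stand-in for an on-chain contract recording agreements and consent."""
    def __init__(self):
        self.agreements = {}          # digest -> agreement terms
        self.consent = {}             # (digest, subject) -> bool

    def register(self, agreement):
        d = digest(agreement)
        self.agreements[d] = agreement
        return d

    def record_consent(self, d, subject, agree: bool):
        assert d in self.agreements   # right to be informed: terms must be on chain
        self.consent[(d, subject)] = agree   # right to agree/reject

c = MarketingContract()
d = c.register(agreement)
c.record_consent(d, "subject_001", True)
print(c.consent[(d, "subject_001")])   # True
```

Because the ledger is append-only, a controller cannot later dispute which terms the data subject consented to.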

5.2.2 Identity Privacy

To motivate data subjects to sell their data, identity privacy [13, 14] of the data subjects should be preserved in data marketing against data processors and controllers. Failing to protect identity privacy can lead to the leakage of users' daily routines or personal information. More specifically, identity privacy has the following three requirements:

– Anonymity: A malicious entity should not be able to extract the data subject's real identity from the shared data. That is, any identifiable information that can lead to the real identity of the data subject should be removed.
– Unlinkability: This is a stricter privacy requirement than anonymity. Specifically, a data subject can participate in multiple marketing instances on the blockchain. From the on-chain marketing messages, it should be infeasible for the public to determine whether the instances are related to the same data subject [15].
– Tracing: The identity privacy of the data subject should not be unconditional. Under certain conditions where a data subject misbehaves, an auditing party [16] should be able to trace the true identity of the data subject.
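These three properties can be sketched with a simple keyed-hash pseudonym scheme, used here only as a stand-in for the anonymous-credential constructions discussed later in this chapter (all names are illustrative): per-instance pseudonyms are PRF outputs, so they reveal nothing and are mutually unlinkable without the user's secret, while an auditor holding an escrowed secret can still trace them.

```python
import hashlib, hmac, secrets

def pseudonym(user_secret: bytes, instance_id: bytes) -> str:
    """Per-instance pseudonym: unlinkable to outsiders (HMAC as a PRF)."""
    return hmac.new(user_secret, instance_id, hashlib.sha256).hexdigest()

# Registration: the auditing party escrows (real_id -> secret) to enable tracing.
escrow = {}

def register(real_id: str) -> bytes:
    sk = secrets.token_bytes(32)
    escrow[real_id] = sk
    return sk

def trace(target_pseudonym: str, instance_id: bytes):
    """On dispute, the auditor recomputes pseudonyms to recover the real identity."""
    for real_id, sk in escrow.items():
        if pseudonym(sk, instance_id) == target_pseudonym:
            return real_id
    return None

sk = register("alice")
p1 = pseudonym(sk, b"instance-1")   # on-chain identity in marketing instance 1
p2 = pseudonym(sk, b"instance-2")   # looks unrelated to p1 without sk
print(p1 != p2, trace(p1, b"instance-1"))   # True alice
```

The real constructions replace the escrowed secret with threshold-issued anonymous credentials, so no single authority can trace users on its own.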

5.2.3 Data Marketing Fairness

To enable a healthy and prosperous marketplace for data trading, the following fairness requirements should be met [17]:

– Data quality: Data subjects should provide high-quality data for trading and clearly describe the content of the data.
– Financial incentive: Data subjects and other involved entities should be guaranteed fair payments for their data or provided services in the data marketing.


– Accountability enforcement: Dishonest marketing operations, such as providing low-quality data or refusing to pay, should be detected effectively. Corresponding accountability against misbehaving entities should be enforced.

5.3 State-of-the-Art Data Marketing Approaches

In this section, we review the recent technological advances in data marketing approaches. First, we investigate data marketing approaches based on a centralized platform. Then, we discuss the state-of-the-art decentralized data marketing approaches, including the on-chain storage model and the on/off-chain storage model. Finally, we highlight the design challenges of striking a balance between a decentralized architecture and marketing fairness.

5.3.1 Centralized Data Marketing

Data owners can be entities without powerful storage and computing capabilities. For example, it would be cost-inefficient for an owner of multiple IoT devices to store a large volume of IoT data for trading. As a result, many data owners outsource their data to an external cloud server, which may lead to the loss of physical control over their data and increase privacy risks [18]. Therefore, there are extensive research works on cloud-based data marketing:

– Fine-grained data access control over the cloud server was proposed in [19], where attribute-based access policies were enforced to support user updates.
– Data sharing over the cloud for a group of users was designed in [20]. A common reference key can be derived for each group member for data sharing with fault detection and tolerance. To reshare cloud data with a group, a broadcast proxy re-encryption scheme was proposed in [21] that enables a cloud server to transfer a ciphertext to different authorized recipients. At the same time, the identity privacy of the recipients was preserved.
– Data processing tasks can also be offloaded to the cloud server [22] using homomorphic encryption. By doing so, processing results can be shared directly instead of sharing the original data.

For data owners with sufficient computing and storage capabilities, data sharing services can be conducted directly with data buyers through the help of sharing service providers. With ZKP and homomorphic encryption, accountable data sharing was designed in [23] to verify the sharing providers' operations.


5.3.2 Decentralized Data Marketing

As discussed before, blockchain can be utilized to construct the data marketing platform. Depending on whether the blockchain is used to store the original data, the related works can be roughly divided into two types: the on-chain model and the on/off-chain model.

5.3.2.1 On-Chain Model

Using blockchain as trusted data storage, reliable data sharing can be achieved for various applications:

– For financial institutions, it is essential to validate customer information before providing services. A blockchain-based customer information sharing scheme was proposed in [24] to support consent-driven data sharing. For vehicular applications, secure sharing of GPS error evolution was proposed in [25].
– A secret sharing scheme over the blockchain was designed in [26]. Specifically, a user can encrypt a secret to be stored and shared on the blockchain via threshold cryptography. Secure exchange of digital identity information over the blockchain was designed in [27] based on zk-SNARK.

The on-chain model usually stores the data in an encrypted form and manages data sharing using a smart contract. It is suitable for storing data of small volume. When the data volume becomes large, directly storing and trading the data on the blockchain can be prohibitively expensive.

5.3.2.2 On/off-Chain Model

In the on/off-chain model, an external storage unit, e.g., a cloud server, can be used to store the large data set. There are extensive research works on building blockchain-based data sharing for cloud data [28]:

– A conceptual blockchain-based framework [29] was proposed to manage data sharing between users and customers, where data are encrypted and stored in an off-chain cloud server. Fine-grained access control and data sharing for cloud data were proposed in [30] based on attribute encryption and smart contract.
– A data sharing system based on distributed trust was designed in [31]. The authors carefully specified the anonymity requirements in the data sharing scheme and designed a verifiable anonymous history mechanism.
– There are also works that adopt functional encryption or a trusted execution environment (TEE) [7, 32] to directly process the encrypted data at the cloud server and only send the processing results to the data buyers. More specifically, data analysis results from a TEE can be encrypted and sent to a data trading contract [32] for verification and payment.


– Blockchain can be used to build auditing systems [33] for cloud-based data services. For example, GDPR rules can be written as code in a smart contract to verify the operations of a cloud server [34]. A blockchain-based data management model for cloud computing was proposed in [35].
– Early attempts to build a blockchain-based data management architecture with GDPR compliance were explored in [36]. Specifically, the authors proposed a blockchain-based framework where transparent and reliable access control over personal data was enforced. Data sharing for smart city applications was designed in [37] based on Hyperledger Fabric. Using the channel feature, the proposed scheme can achieve private data collection and data access control within a secure channel.
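The on/off-chain division underlying these works can be sketched minimally: the ledger stores only a small digest and listing metadata for the (encrypted) data set, while the cloud stores the actual bytes, and a buyer checks delivered data against the on-chain digest. All class and field names below are illustrative assumptions.

```python
import hashlib

class Blockchain:
    """Toy ledger: stores only small digests and listings, never the data."""
    def __init__(self):
        self.records = {}
    def list_data(self, data_id: str, digest: str, price: int):
        self.records[data_id] = {"digest": digest, "price": price}

class CloudServer:
    """Off-chain storage unit for the (encrypted) data set."""
    def __init__(self):
        self.blobs = {}
    def store(self, data_id: str, blob: bytes):
        self.blobs[data_id] = blob
    def deliver(self, data_id: str) -> bytes:
        return self.blobs[data_id]

bc, cs = Blockchain(), CloudServer()
blob = b"\x00" * 10_000_000          # a 10 MB (encrypted) data set stays off chain
cs.store("d1", blob)
bc.list_data("d1", hashlib.sha256(blob).hexdigest(), price=100)

# Buyer: fetch off chain, verify against the on-chain digest.
delivered = cs.deliver("d1")
ok = hashlib.sha256(delivered).hexdigest() == bc.records["d1"]["digest"]
print(ok)   # True
```

Only 32 bytes of digest land on chain per data set, which is what makes the on/off-chain model scale to large data volumes.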

5.3.3 Decentralization and Fairness Dilemma

While blockchain-based data marketing can enhance transparency and trustworthiness, the advantages of the blockchain-based architecture do not come for free:

• First, existing solutions have paid insufficient attention to the blockchain architecture requirements. While the consortium blockchain is a more suitable architecture considering the low scalability of public blockchain systems, it still requires detailed architecture-level considerations to build an effective consortium for cloud-based data marketing.
• Second, the transparent nature of the blockchain conflicts with the requirements of data confidentiality and user identity privacy in data marketing.
• Third, versatile fairness should be achieved for data owners, third parties, and the cloud server, which requires contract-level designs to motivate them to behave honestly and to enforce accountability effectively against misbehaving parties.

Therefore, special efforts should be directed to building a blockchain-based data marketing scheme that preserves user privacy and marketing fairness at the same time.

5.4 Use Case: Blockchain–Cloud Fair Data Marketing

In this section, we present a concrete use case for blockchain-based data marketing. We adopt a hybrid model where data owners outsource their data to a cloud server, with a blockchain as a controller/auditor to manage data marketing [38–41]. The decentralization and fairness dilemma is considered as follows:

• On-chain data confidentiality and identity privacy of data subjects should be preserved in the data marketing [42].


• Privacy regulations in GDPR [4] for data marketing should be complied with. That is, data subjects should be granted the right to be informed and the right to agree/reject [11].
• Fair payments for data subjects and correct data delivery for data buyers should be guaranteed [17]. Efficient and effective detection of marketing misbehavior with accountability enforcement should be achieved.

Compared with the on-chain data marketing model, an on/off-chain marketing model faces the following non-trivial challenges:

• First, the existing works utilize a cloud server for data storage and the blockchain for data marketing control. However, most of them still assume trusted single entities, such as a certificate authority (CA) or an honest storage server, which poses a risk to data confidentiality and identity privacy due to potential single-point failures. Moreover, for identity privacy in a (consortium) blockchain network, insufficient attention has been paid to potential privacy leakage through public communication channels between the data owners and the blockchain.
• Second, as we introduce a rational cloud server in our model, some communications between the cloud server and other entities are off chain. As a result, preserving marketing fairness and enforcing accountability [43] become challenging, since the cloud server may not follow the protocol and communications with the cloud are not publicly verifiable. For example, the cloud server may delay the off-chain transmission of the data items to the buyer and claim the buyer's deposit after the payment confirmation is overdue.

To address the above design challenges, we propose an efficient and fair blockchain–cloud data marketing scheme:

• First, we design a hybrid data marketing model with the blockchain as a data controller and a rational cloud server as a storage unit. By doing so, on-chain overheads for storing large data sets are reduced.
With a data marketing contract, the proposed scheme preserves GDPR rights for data subjects, including right to be informed and right to agree/reject. • Second, with multi-message PS signature and threshold cryptography, the proposed scheme enables distributed management of anonymous credentials for data subjects. That is, identity privacy for data subjects is preserved without a single authority for communicating with the data marketing contract. • Third, we design succinct commitments and efficient ZKP for data marketing operations. We design an on/off-chain marketing protocol by carefully specifying message exchanges between data subjects, data buyers, the cloud, and the blockchain. With financial incentives and efficient verifications of marketing operations, the marketing protocol motivates all rational entities to behave honestly, while dishonest behavior can be efficiently detected. • Finally, we formulate and achieve the security notions through security analysis, including consortium management and marketing fairness. We conduct extensive


experiments on a consortium blockchain network to demonstrate the feasibility of the proposed scheme.

In the following, we discuss a representative construction for blockchain–cloud data marketing [3]. First, we present the system and threat models with design goals. Then, we review the building blocks of the blockchain–cloud data marketing. Third, we propose detailed designs with extensive security analysis. Finally, performance evaluations are conducted with off-chain and on-chain benchmarks on a real-world testing network based on Hyperledger Fabric.
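The delayed-delivery fairness concern raised above (a cloud claiming the buyer's deposit after the confirmation window lapses) can be illustrated with a toy deposit/timeout state machine. The class name, deadline policy, and payout rule are our own illustrative assumptions, not the chapter's actual marketing protocol; here the deadline protects the buyer, who is refunded if delivery is never confirmed.

```python
class DataTradeContract:
    """Toy escrow: the contract holds the buyer's deposit and releases payment
    only if delivery is confirmed before a deadline; otherwise the buyer is
    refunded. (Illustrative sketch, not the chapter's protocol.)"""
    def __init__(self, buyer_deposit: int, deadline: int):
        self.deposit = buyer_deposit
        self.deadline = deadline          # e.g., a block-height bound
        self.state = "FUNDED"

    def confirm_delivery(self, now: int):
        """Buyer confirms correct off-chain delivery in time: seller gets paid."""
        if self.state == "FUNDED" and now <= self.deadline:
            self.state = "PAID"
            return ("seller", self.deposit)
        return None

    def refund(self, now: int):
        """Past the deadline with no confirmation: buyer reclaims the deposit."""
        if self.state == "FUNDED" and now > self.deadline:
            self.state = "REFUNDED"
            return ("buyer", self.deposit)
        return None

c = DataTradeContract(buyer_deposit=100, deadline=50)
print(c.refund(now=40))            # None: too early to refund
print(c.confirm_delivery(now=45))  # ('seller', 100)
```

Because every state transition is an on-chain transaction, neither side can dispute when confirmation or timeout occurred.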

5.4.1 Blockchain–Cloud Data Marketing Model

In this section, we present the hybrid blockchain–cloud data marketing model. As shown in Fig. 5.1, there are four entities in our model:

• Supervising Authority (SA): SA consists of supervising nodes that are independent stakeholders, such as governmental authorities. The duties of SA include setting up system parameters, issuing/opening anonymous credentials, and managing marketing disputes.
• Data Subject (DS): We consider DS to be data owners of the massive data generated by IoT applications [44]. It has limited computing and storage capability to manage

Fig. 5.1 System model: SA (system setup, registration, and tracing), DS with IoT devices (data listing/trading), CS, and TP interact through the data marketing contract on the consortium blockchain

the massive data and therefore relies on DC, including a powerful cloud server and a transparent blockchain, for data marketing services.
• Third Party (TP): TP would like to utilize the data from DS for data-intensive applications. For example, TP can be a network manager that uses the data to train resource allocation models. TP can also be an enterprise that uses the data to develop new products.
• Data Controller (DC): DC is a conceptual term in GDPR; in this chapter, DC is responsible for managing data marketing. We divide the role of DC into two actual entities: cloud server (CS) and blockchain (BC):
  – CS is a powerful storage and computing unit that can help DS with data management services. In data marketing, CS can also directly send the data to TP upon DS's consent.
  – BC can be maintained by a trusted consortium, such as SA. It is a transparent ledger that records data marketing instances and operations. BC regulates the data marketing process with an external cloud and provides evidential records of data marketing operations for DS/CS/TP. With a data marketing contract on BC, it can also enforce automatic verification of the marketing operations of DS/CS/TP.

Under the blockchain–cloud architecture, data marketing consists of five phases: setup, registration, data listing, data trading, and tracing:

• In setup, SA generates public parameters and determines the basic algorithms for use. SA also defines marketing terms and initializes a data marketing contract on BC.
• In registration, DS communicates with each supervising node of SA to obtain partial credentials and aggregates them into one single credential. The aggregated credential demonstrates the anonymous identity of DS in the data marketing process.
• In data listing, DS builds his/her data set as a collection of data blocks with a searchable index. DS encrypts the data set using a symmetric encryption algorithm, e.g., AES-256.
DS sends the encrypted data set to CS and uploads the identifier and authenticator of the data set to BC. CS verifies the correctness of the stored data with the authenticators on BC. If the verification passes, CS sends a confirmation to the marketing contract. • In data trading, TP searches data items on the cloud and retrieves the data item of interest. TP obtains the identifier/authenticator and the encrypted data of the item to check if the item is correct. Then, TP can send a data request to the marketing contract. If DS approves the data request, DS can send a consent message with an encrypted key for decrypting the requested data item. TP retrieves the consent message, decrypts the data, and checks the validity of the data. TP pays DS if the data are correct; otherwise, TP can make a complaint to SA. • In tracing, if any misbehavior of DS is confirmed, SA can trace the true identity of the DS by opening DS’s anonymous signature.


5.4.2 Security Model and Goals

SA is trusted and cannot be compromised. CS is a multi-sector entity in the marketing process that may not always follow the pre-determined rules [11]. Since it is hard to enforce transparency in CS, we introduce BC as a secure and trusted ledger to provide honest recording and execution of data marketing instances. Both TP and DS are considered rational; they are motivated to conduct honest operations by financial incentives and accountability enforcement. More specifically, TP will pay for the data item if the item is correctly received; DS will send the correct data to TP if fair payments and GDPR requirements are satisfied [17, 45]. The blockchain–cloud marketing scheme considers the following GDPR requirements:

• Right to be informed: DS should be informed of the following information: the data item for trading, its means of use, and the receipt of the item by TP.
• Right to agree/reject: Full control of data items should be granted to DS to either agree to or reject the data sharing [10].
• Identity privacy: Conditional identity privacy should be guaranteed for DS. In most cases, DS's real identity should be concealed in the data marketing. When DS misbehavior is confirmed, SA should be able to trace the DS's true identity.

It should be noted that the security notion of identity privacy is similar to the notion of anonymity in cryptography. Given access to generated anonymous signatures and public parameters, a computationally bounded adversary cannot determine the real identity of the signer. Advanced attacks to recover the DS identity, including side-channel leakage or cross-layer identification, are out of the scope of this chapter. Under the security model, the security goals of the blockchain–cloud marketing scheme are as follows:

• Consortium Management: GDPR requirements for DS, including the right to be informed, the right to agree/reject, and identity privacy, should be achieved in a transparent and distributed manner.
At the same time, the identity privacy of DS can be revoked if any misbehavior is confirmed by SA.
• Marketing Fairness: First, DS only sends TP the data if DS is guaranteed to get the payment; second, TP only pays DS if TP obtains exactly the data that DS commits to in the marketing contract.

Data marketing can involve a wide range of operations and potential misbehavior. Other operations or attacks, such as re-sharing of the data and locking of crypto deposits, are out of the scope of this chapter.


5.4.3 Design Goals

Under the system and security model, the design goals of the blockchain–cloud marketing scheme include:

• Security: Consortium management and marketing fairness should be achieved for blockchain–cloud data marketing.
• Efficiency: The hybrid marketing architecture with the cloud and the blockchain should be efficient in real-world implementations. On-chain costs for storage and computing should also be feasible with a consortium blockchain implementation.

5.4.4 Building Blocks

In this section, we present the building blocks of the blockchain–cloud data marketing scheme, including cryptographic notations, ElGamal encryption, zero-knowledge proof, multi-message PS signature, and Publicly Verifiable Secret Sharing (PVSS).

5.4.4.1

Cryptographic Notations

G = (G1, G2, GT) is denoted as a set of cyclic groups that is written multiplicatively. g and g̃ are group elements from G1 and G2, respectively. The order of G is a prime number p, and a bilinear pairing for G is denoted as

e : G1 × G2 → GT.    (5.1)

In this chapter, we use five collision-resistant hash functions:

• A hash function that maps a string to an element in Zp, denoted as

H : {0, 1}* → Zp.    (5.2)

• A hash function that maps a string to an element in Zp and an element in G1, denoted as

H0 : {0, 1}* → Zp × G1.    (5.3)

• A hash function that maps a string to n elements in G1, denoted as

H1 : {0, 1}* → G1^n.    (5.4)

• A hash function that maps an element in G1 to a 256-bit string, denoted as

H2 : G1 → {0, 1}^256.    (5.5)

• A hash function that maps a string to a group element in G1, denoted as

H3 : {0, 1}* → G1.    (5.6)

It should be noted that all the hash functions are modeled as random oracles for the security of the blockchain–cloud marketing scheme.
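The mapping into Zp can be instantiated, for example, by reducing a SHA-256 digest modulo the group order. The sketch below is a toy illustration under stated assumptions (a placeholder prime p and domain-separation tags of our own choosing), not the curve parameters of the actual implementation:

```python
import hashlib

# Toy instantiation of H : {0,1}* -> Zp via SHA-256 (assumption: the
# prime below is a placeholder, not the scheme's actual group order).
p = 2**255 - 19  # placeholder prime standing in for the group order

def H(data: bytes) -> int:
    """Hash a byte string into Zp by reducing SHA-256 output mod p."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % p

def H0(data: bytes):
    """Toy H0 : {0,1}* -> Zp x 'G1': derive two independent digests by
    domain separation; the second value stands in for a G1 element."""
    m_prime = H(b"zp|" + data)
    h = H(b"g1|" + data)  # placeholder for hashing into G1
    return m_prime, h

m_prime, h = H0(b"ids-example")
assert 0 <= m_prime < p and 0 <= h < p
assert H0(b"ids-example") == (m_prime, h)  # deterministic
```

Domain separation (the `b"zp|"`/`b"g1|"` tags) keeps the two outputs of H0 independent even though both are built from the same underlying hash.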

5.4.4.2

ElGamal Encryption

We adopt the ElGamal encryption [46] in this chapter. Suppose we have a set of groups G with a public generator g ∈ G1. As discussed in Chap. 2, ElGamal encryption consists of three algorithms: KeyGen, Enc, and Dec:

• KeyGen generates a public/private key pair (sk, pk).
• Enc takes a message in Zp and a public key to output a ciphertext.
• Dec takes a private key and a ciphertext to output a message.

ElGamal encryption is secure if the decisional Diffie–Hellman assumption holds in G1. It should be noted that we lift the message to the exponent, i.e., g^m, which will be used to construct an identity token for tracing a misbehaving DS.
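As a toy illustration of lifted ElGamal, the following sketch works in a small Schnorr group (the tiny parameters q, p, g are assumptions for readability; the scheme itself operates in G1 of a pairing-friendly curve). Decryption recovers g^m rather than m, which suffices for the identity-token use mentioned above:

```python
import secrets

# Toy "lifted" ElGamal in the prime-order subgroup of Z_q^*
# (assumption: tiny illustrative parameters, not real curve groups).
q = 2039   # safe prime: q = 2*p + 1
p = 1019   # subgroup order (stands in for the group order p)
g = 4      # generator of the order-p subgroup (a quadratic residue)

def keygen():
    sk = secrets.randbelow(p - 1) + 1
    return sk, pow(g, sk, q)

def enc(pk, m):
    """Encrypt g^m (message lifted to the exponent)."""
    r = secrets.randbelow(p - 1) + 1
    return pow(g, r, q), (pow(pk, r, q) * pow(g, m, q)) % q

def dec(sk, ct):
    """Recover g^m; m itself is only recoverable if it is small."""
    c1, c2 = ct
    return (c2 * pow(c1, p - sk, q)) % q  # c2 / c1^sk, since c1^p = 1

sk, pk = keygen()
ct = enc(pk, 42)
assert dec(sk, ct) == pow(g, 42, q)
```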

5.4.4.3

Zero-Knowledge Proof

For the blockchain–cloud marketing scheme, we utilize AND relations of Sigma protocols [47–49] with Fiat–Shamir transformation. That is, there can be multiple secrets across the exponents of various public generators. For detailed construction, please refer to Chap. 2. We require the standard security notions of ZKP, including completeness, soundness, and zero knowledge.
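A minimal non-interactive Sigma protocol obtained via the Fiat–Shamir heuristic can be sketched for the simplest relation, knowledge of a discrete logarithm (Schnorr). The toy group parameters are assumptions; the chapter's proofs extend this pattern with AND composition over several generators:

```python
import hashlib
import secrets

# Toy non-interactive Schnorr proof of knowledge of x in y = g^x,
# made non-interactive with the Fiat-Shamir heuristic.
q, p, g = 2039, 1019, 4   # safe-prime group; subgroup order p (toy values)

def fs_challenge(*elems) -> int:
    """Fiat-Shamir: the challenge is a hash of the public transcript."""
    data = b"|".join(str(e).encode() for e in elems)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % p

def prove(x):
    y = pow(g, x, q)
    k = secrets.randbelow(p - 1) + 1   # commitment randomness
    t = pow(g, k, q)                   # commitment t = g^k
    c = fs_challenge(g, y, t)          # challenge from hash
    s = (k + c * x) % p                # response
    return y, (t, s)

def verify(y, proof) -> bool:
    t, s = proof
    c = fs_challenge(g, y, t)
    return pow(g, s, q) == (t * pow(y, c, q)) % q  # g^s ?= t * y^c

x = secrets.randbelow(p - 1) + 1
y, pi = prove(x)
assert verify(y, pi)
```

The verification identity g^s = t · y^c holds because s = k + c·x in the exponent; the witness x never leaves the prover.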

5.4.4.4

Multi-message PS Signature

As discussed in Chap. 2, the PS signature scheme [50, 51] enables a signer to sign multiple messages and obtain a short signature. Since the obtained signature can be randomized in verifications, the PS signature can be used to construct and prove knowledge of the signature for anonymous credentials. In this chapter, the multi-message variation of the PS signature (MPS) [52] is adopted, where multiple signatures by different signers can be aggregated into a single signature. Specifically, MPS consists of the following algorithms:

• MPS.KeyGen(G, g, g̃) → ({ski, pk̃i}i∈[n], vk̃)


Suppose there are n signers in the system. Given public parameters G, g, and g̃, MPS.KeyGen generates key pairs for the signers. First, for each signer indexed by i, MPS.KeyGen selects a set of three private keys:

ski = (xi, yi,1, yi,2) ∈ Zp^3.    (5.7)

With the private keys, MPS.KeyGen computes the public keys:

pk̃i = (X̃i, Ỹi,1, Ỹi,2),  X̃i = g̃^{xi},  Ỹi,1 = g̃^{yi,1},  Ỹi,2 = g̃^{yi,2}.    (5.8)

In practice, each signer can self-generate a private/public key pair to join the system. To resist malicious key attacks, the signers also need to generate a ZKP to prove knowledge of the private key. Given all the public keys of the n signers, MPS.KeyGen computes:

R = H1(pk̃1, pk̃2, ..., pk̃n) = (r1, r2, ..., rn),  ri ∈ Zp.    (5.9)

MPS.KeyGen further aggregates all the public keys into a single verification key:

vk̃ = (X̃A, ỸA,1, ỸA,2),  X̃A = ∏_{i=1}^{n} X̃i^{ri},  ỸA,1 = ∏_{i=1}^{n} Ỹi,1^{ri},  ỸA,2 = ∏_{i=1}^{n} Ỹi,2^{ri}.    (5.10)

• MPS.Sign(pp, ski, m) → πi
Given public parameters, a signer's secret key, and a message m ∈ Zp, MPS.Sign first computes a hash of the message:

H0(m) → (m′, h).    (5.11)

Note that this step is a modified form of the original PS signature for security considerations [51]. Then, MPS.Sign computes:

πi,2 = h^{xi + yi,1 m + yi,2 m′}.    (5.12)

The signature of the message using the secret key of signer i is as follows:

πi = (m′, h, πi,2).    (5.13)

• MPS.Aggregate({πi}i∈[n], R) → πA
Given n signatures on the same message from all the signers and R, MPS.Aggregate first checks that all signatures are on the same message. Then, MPS.Aggregate computes:

πA,2 = ∏_{i=1}^{n} πi,2^{ri} = h^{Σ_i xi ri + m Σ_i yi,1 ri + m′ Σ_i yi,2 ri}.    (5.14)

MPS.Aggregate outputs an aggregated signature as follows:

πA = (m′, h, πA,2).    (5.15)

• MPS.Verify(πA, m, vk̃) → {0, 1}
Given an aggregated signature, a message, and the aggregated verification key, MPS.Verify first computes:

H0(m) → (m′, h)    (5.16)

and checks if the obtained (m′, h) are equivalent to those in πA. Then, MPS.Verify checks the following condition:

h ≠ 1_{G1}.    (5.17)

If the condition holds, MPS.Verify further checks:

e(h, X̃A · ỸA,1^{m} · ỸA,2^{m′}) ?= e(πA,2, g̃).    (5.18)

If all the checks pass, MPS.Verify outputs accept; if any of the checks fail, MPS.Verify outputs reject.

In the proposed scheme, supervising nodes can act as signers using self-generated private/public key pairs. The verification key can be publicly aggregated. For each DS, the supervising nodes can sign on the DS’s chosen blinded ID. By aggregating all the signatures, the DS obtains an anonymous credential. With a ZKP based on


Sigma protocol, the DS can then prove knowledge of a signature that can be verified by the aggregated verification key.
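The random-weight aggregation in Eqs. (5.9) and (5.10) can be illustrated without pairings: weights ri are derived by hashing all public keys, and the weighted product of per-signer keys collapses into a single exponent. The sketch below simulates only this aggregation step in a toy group (an assumption for readability); the pairing-based verification of Eq. (5.18) is not reproduced:

```python
import hashlib
import secrets

# Pairing-free toy of the MPS random-weight key aggregation
# (Eqs. (5.9)-(5.10)); toy Schnorr-group parameters are assumptions.
q, p, g = 2039, 1019, 4
n = 5

def weights(pubkeys):
    """One weight r_i per signer, hashed from all public keys (Eq. (5.9))."""
    seed = b"|".join(str(pk).encode() for pk in pubkeys)
    return [int.from_bytes(hashlib.sha256(seed + bytes([i])).digest(),
                           "big") % p
            for i in range(n)]

xs = [secrets.randbelow(p - 1) + 1 for _ in range(n)]  # secret keys x_i
pks = [pow(g, x, q) for x in xs]                       # X_i = g^{x_i}
rs = weights(pks)

# Aggregated verification key X_A = prod X_i^{r_i}  (cf. Eq. (5.10))
X_A = 1
for pk, r in zip(pks, rs):
    X_A = (X_A * pow(pk, r, q)) % q

# The same aggregation done directly in the exponent:
exp = sum(x * r for x, r in zip(xs, rs)) % p
assert X_A == pow(g, exp, q)
```

Deriving the weights from a hash of all public keys is what defends against rogue-key aggregation: no signer can pick its key after seeing its own weight.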

5.4.4.5

Publicly Verifiable Secret Sharing (PVSS)

Publicly Verifiable Secret Sharing (PVSS) [53, 54] is widely used in threshold cryptography. Suppose we have a set of n participants:

(P1, P2, ..., Pn).    (5.19)

A dealer would like to share a secret s among the participants such that t out of the n participants can recover the secret. All operations of the dealer and the participants should be verifiable. The scheme is denoted as (t, n) PVSS and consists of the following algorithms:

• PVSS.Setup(G, g̃1) → ({pski, ppk̃i}i∈[n])
Given public parameters G and g̃1, the participants generate their public/private key pairs of ElGamal encryption. Specifically, Pi selects a random secret key:

pski ∈ Zp.    (5.20)

Using the secret key, Pi computes a public key:

ppk̃i = g̃1^{pski}.    (5.21)

• PVSS.Share(s, {ppk̃i}i∈[n], g, g̃1, g̃2) → ({Ei, πpi}, Hs, {Aj})
The dealer constructs a polynomial by picking a set of random numbers as coefficients:

(a1, ..., a_{t−1}) ∈ Zp^{t−1}.    (5.22)

Denote s as the secret to be shared. The dealer sets the first coefficient of the polynomial to be s:

a0 = s.    (5.23)

Given all the coefficients, the polynomial is denoted as

P(x) = Σ_{j=0}^{t−1} aj x^j.    (5.24)

To enable verifiable secret sharing, the dealer needs to compute "commitments" for all coefficients:

– For each coefficient except for a0, the dealer computes

Aj = g^{aj},  j ∈ [1, t − 1].    (5.25)

– For the secret, the dealer computes

Hs = g^s.    (5.26)

For each participant Pi, the dealer first encrypts his/her share using the participant's public key:

Ei,1 = g̃1^{ri},  Ei,2 = ppk̃i^{ri} g̃2^{P(i)}.    (5.27)

The encrypted share is denoted as

Fi = (Ei,1, Ei,2).    (5.28)

The dealer computes a ZKP for the encrypted share as

πpi = ZKP{(ri) : Ei,1 = g̃1^{ri} ∧ e(g, Ei,2/ppk̃i^{ri}) = e(Hs ∏_{j=1}^{t−1} Aj^{i^j}, g̃2)}.    (5.29)

Note that the proof can be constructed using a standard Sigma protocol. Finally, the dealer broadcasts the following message to all the participants:

({Ei,1, Ei,2, πpi}, Hs, {Aj}).    (5.30)

• PVSS.Verify(πpi, Ei,1, Ei,2) → {0, 1}
Each participant can check the correctness of his/her encrypted share using the verification algorithm of the ZKP.

• PVSS.Recover({pski, Fi}i∈[t]) → g̃2^s
t out of the n participants, denoted as SOt, can recover the secret from their shares. Each of the t participants decrypts his/her share and computes a ZKP for verifiable secret recovery as follows:

πFi = ZKP{(pski) : Ei′ = Ei,2/Ei,1^{pski} ∧ ppk̃i = g̃1^{pski}}.    (5.31)

All the participants in SOt send {Ei′, πFi} to an opener. The opener first checks the correctness of all received proofs using the verification algorithm of the ZKP. Then, the opener recovers the secret over g̃2 as follows:

g̃2^s = ∏_{Pi∈SOt} (Ei′)^{λi},    (5.32)

where the Lagrange coefficients are constructed as follows:

λi = ∏_{Pj∈SOt, j≠i} j/(j − i).    (5.33)

Note that g̃2^s can be utilized in the tracing phase of the proposed scheme, where it is used for identifying the signer of an anonymous PS signature.
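The share-and-recover arithmetic of PVSS.Share and PVSS.Recover can be sketched as follows. The toy group, the omission of share encryption, and the omission of the ZKPs are simplifying assumptions; the sketch only demonstrates dealing P(i) and recovering g^s (standing in for g̃2^s) via the Lagrange coefficients of Eq. (5.33):

```python
import secrets

# Toy (t, n) secret sharing with in-the-exponent recovery, mirroring
# Eqs. (5.24) and (5.32)-(5.33). Toy Schnorr group (assumption);
# share encryption and the ZKPs are omitted.
q, p, g = 2039, 1019, 4
t, n = 3, 5

s = secrets.randbelow(p)                     # the secret a_0 = s
coeffs = [s] + [secrets.randbelow(p) for _ in range(t - 1)]

def P(x):                                    # P(x) = sum_j a_j x^j mod p
    return sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p

shares = {i: P(i) for i in range(1, n + 1)}

def lagrange_at_zero(i, subset):
    """lambda_i = prod_{j != i} j / (j - i) mod p  (Eq. (5.33))."""
    num, den = 1, 1
    for j in subset:
        if j != i:
            num = (num * j) % p
            den = (den * (j - i)) % p
    return (num * pow(den, p - 2, p)) % p    # division via Fermat inverse

subset = [1, 3, 5]                           # any t participants suffice
recovered = 1
for i in subset:
    E_i = pow(g, shares[i], q)               # stands in for a decrypted share
    recovered = (recovered * pow(E_i, lagrange_at_zero(i, subset), q)) % q

assert recovered == pow(g, s, q)             # g^s recovered (cf. Eq. (5.32))
```

Because Σ λi·P(i) = P(0) = s, multiplying the shares-in-the-exponent with Lagrange weights yields g^s without any participant revealing P(i) in the clear.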

5.4.5 Representative Constructions

We overview the design rationales of the blockchain–cloud marketing scheme as follows:

• For identity privacy with consortium management, the multi-message form of the PS signature [52] is adopted to enable distributed issuance of anonymous credentials for DS. At the same time, a linking method based on ZKP is designed for DS to prove that two messages are from the same user. To enable threshold opening of a misbehaving DS's identity, an opening token for DS is shared among supervising nodes using PVSS [52, 54].
• Compared with the fair data trading protocol with only a data buyer and a data owner [17], our hybrid model introduces the cloud server as an external computing and storage unit. As a result, off-chain transmissions between the cloud and other entities are not publicly verifiable. To address the issue, we design an on/off-chain data marketing protocol with efficient on-chain commitments and verifications of marketing operations. With time-locked cryptocurrency deposits and on-chain operation verifications, data marketing can only proceed after a sequence of on-chain confirmations from DS, TP, and CS to ensure marketing fairness.

In the following, we present the detailed constructions of the blockchain–cloud marketing scheme, which consists of five phases: setup, registration, data listing,


data trading, and tracing. We make the following assumptions for illustrative simplicity:

• All off-chain communications are conducted via secure and authenticated channels. Additionally, communications between CS and DS are conducted via anonymous and secure channels. Broadcast messages are transmitted via synchronous channels.
• The consortium blockchain network has been securely set up. To conduct secure and authenticated communications with the blockchain, SA, CS, and TP obtain valid credentials from a (distributed) trusted certificate service. Therefore, we omit the descriptions of identity verification and consistency checks in our protocol.
• The blockchain can receive and verify transactions from anonymous nodes. A global clock and time stamps are available on the blockchain. The data marketing contract can support time-locked deposits and transfer the deposits to specific entities under certain conditions. On-chain identities of files, marketing sessions, and public parameters are checked for uniqueness.
• Both TP and CS register themselves at SA. TP obtains a public key (pkt) for encryption and a blockchain identity memt; CS obtains a public key (pkc) for digital signatures and a blockchain identity memc.

5.4.5.1

Setup

We denote SA as a set of supervising nodes:

SA = (S1, S2, ..., Sn).    (5.34)

As shown in Fig. 5.2, SA initializes the system using the following steps:

Fig. 5.2 Workflow of setup: (1) SA generates parameters; (2) SA publishes parameters on the blockchain

• SA sets the public curve parameters:

pp = (λ, G, e, p, g, g̃).    (5.35)

Generators g and g̃ should be uniformly distributed. Each supervising node Si generates the key pairs for the MPS scheme using the MPS.KeyGen algorithm:

MPS.KeyGen(G, g, g̃) → ({ski, pk̃i}i∈[n], vk̃).    (5.36)

The aggregated verification key is denoted as

vk̃ = (X̃A, ỸA,1, ỸA,2).    (5.37)

Each supervising node initializes PVSS and generates a key pair for ElGamal encryption:

PVSS.Setup(G, g̃) → {pski, ppk̃i}i∈[n].    (5.38)

Each supervising node also runs a distributed protocol [55] to generate a public key pks for a threshold ElGamal encryption with the threshold number being larger than or equal to t. To prevent malicious key attacks, each supervising node also needs to prove knowledge of his/her secret keys for all the generated public keys to all the other supervising nodes. For example, a ZKP can be constructed for ski and pski as follows:

ZKP{(ski, pski) : pk̃i ∧ ppk̃i}.    (5.39)

• Finally, SA publishes the following information to the blockchain:

(pp, pk̃i, ppk̃i, vk̃, pks).    (5.40)

5.4.5.2

Registration

As shown in Fig. 5.3, DS works with SA to register his/her identity and obtain an anonymous credential using three steps:

Fig. 5.3 Workflow of registration: (1) credential request; (2) credential response; (3) credential verification

• Suppose DS has a unique identity ids. DS computes the hash of ids:

H0(ids) = (m′, h),    (5.41)

to obtain a generator h. DS chooses a random secret sks ∈ Zp. Using public parameters of the system and the verification keys of the MPS scheme, DS computes shares of sks using PVSS:

PVSS.Share(sks, {ppk̃i}i∈[n], h, g̃, ỸA,1).    (5.42)

More specifically, DS chooses a set of coefficients to construct a polynomial P(x). For each coefficient, DS computes a commitment, denoted as {Aj}. For sks, DS computes a commitment as follows:

Hs = h^{sks}.    (5.43)

For security considerations, DS also computes

gsk = g^{sks},    (5.44)

with a ZKP as follows:

πsks = ZKP{(sks) : Hs = h^{sks} ∧ gsk = g^{sks}}.    (5.45)

For each Si, DS encrypts the share of sks as follows:

Fi = (Ei,1, Ei,2) = (g̃^{ri}, ppk̃i^{ri} ỸA,1^{P(i)}),    (5.46)

where ri ∈ Zp is a randomly chosen number. Using PVSS.Share, DS also computes πpi for each Fi. Finally, DS broadcasts the following information to all supervising nodes:

({Aj}j∈[1,t−1], Hs, gsk, πsks, ids, {Fi, πpi}i∈[1,n]).    (5.47)


• Suppose each Si in SA receives the message from DS. All supervising nodes first check that ids has not been registered before. This step is crucial since ids is used to compute h, and it is insecure to compute a PS signature over two identical generators. If ids has not been registered before, Si computes

(m′, h) = H0(ids).    (5.48)

With h, g̃, ỸA,1, {Aj}, and ppk̃i, Si checks if the encrypted share is correctly computed:

PVSS.Verify(πpi, Ei,1, Ei,2).    (5.49)

Si also checks the correctness of πsks. If all the checks pass, Si uses its secret MPS key (xi, yi,1, yi,2) to sign on Hs as follows:

πi,2 = h^{xi + yi,2 m′} · Hs^{yi,1}.    (5.50)

This is essentially a PS signature on a committed value. Finally, Si returns the following information to DS via a secure channel:

πi = (m′, h, πi,2).    (5.51)

• DS collects all partial credentials πi from SA. To aggregate the credentials, DS uses the public keys of SA and computes

R = H1(pk̃1, pk̃2, ..., pk̃n) = (r1, r2, ..., rn).    (5.52)

DS runs the aggregation algorithm of MPS:

MPS.Aggregate({πi}i∈[n], R),    (5.53)

to obtain an aggregated PS signature:

πA = (m′, h, πA,2).    (5.54)

DS verifies the correctness of πA using the verification algorithm of MPS:

MPS.Verify(πA, sks, vk̃).    (5.55)

In the verification, we use ids to compute (m′, h) instead of the original message to be signed:


– If the verification passes, DS stores πA for future anonymous communications.
– If the verification fails, DS reports the result to SA to determine which individual signature is not valid.

Fig. 5.4 Workflow of data listing: (1) DS generates a data item; (2) DS lists the item on the blockchain; (3) DS sends the item to CS; (4) CS sends a confirmation to the blockchain

5.4.5.3

Data Listing

As shown in Fig. 5.4, DS works with CS and BC to list data items using four steps:

• We denote a single file of DS as File. DS sets a unique file identifier and a file description as idf and DF, respectively. To encrypt the file, DS uses AES to compute:

AES(File)_{H2(g^{Ks})},    (5.56)

where Ks is randomly chosen from Zp to compute the AES encryption key H2(g^{Ks}). An advanced key derivation function may also be adopted to generate the encryption key [56]. Using pks of SA, DS also encrypts Ks as follows:

Ef = (Ef,1, Ef,2) = (g^{rf}, pks^{rf} g^{Ks}),    (5.57)

where rf is randomly chosen from Zp. DS computes a ZKP to demonstrate the encryption of Ks as follows [57]:

πf = ZKP{(rf, Ks) : Ef,1 = g^{rf} ∧ Ef,2 = pks^{rf} g^{Ks}}.    (5.58)

DS chooses a pair of ElGamal encryption keys:

(pkf, skf),    (5.59)

for a data buyer to encrypt usage descriptions. If needed, an additional proof of possession of skf can also be required. DS computes a hash of the file information:

Hf = H(idf || AES(File) || DF).    (5.60)

DS chooses a payment address, which can be an anonymous payment address and is denoted as addrf. DS chooses a random number rl to compute a linkage token as follows:

Tl = H3(idf)^{rl}.    (5.61)

• DS includes all information for the file in mf:

mf = (idf, Hf, Ef, πf, pkf, addrf, Tl).    (5.62)

DS randomizes his/her anonymous credential πA as follows:

(π1′, π2′) = (h^r, πA,2^r),    (5.63)

where r ∈ Zp is randomly chosen. Then, DS computes an anonymous signature on mf by proving knowledge of sks, m′, and rl:

πfs = ZKP{(sks, m′, rl) : Tl = H3(idf)^{rl} ∧ e(π1′, X̃A) e(π1′, ỸA,1)^{sks} e(π1′, ỸA,2)^{m′} = e(π2′, g̃)}.    (5.64)

Note that mf is included in computing the challenge of πfs, and rl should be kept secret by DS. Finally, DS sends the following information to the data marketing contract:

(mf, πfs).    (5.65)


The contract checks the correctness and uniqueness of mf and πfs. If the check passes, the contract stores the file information on the blockchain. Note that DS communicates with the consortium blockchain via anonymous transactions [58]. DS can also frequently update the data listing. Specifically, DS generates new encryption keys, chooses new linkage tokens, and computes new proofs. To restrict who in the system can see the on-chain storage, channels can also be constructed in Hyperledger Fabric to enforce access control policies.

• DS constructs a message:

mc = (idf, AES(File), DF).    (5.66)

Using rl, DS computes an anonymous signature on mc as a ZKP [59]:

πc = ZKP{(rl) : Tl = H3(idf)^{rl}}.    (5.67)

Note that mc is included in computing the challenge for πc. DS sends the following message to CS:

(mc, πc).    (5.68)

• CS needs to check the correctness of the received data against the on-chain authenticators. CS checks the consistency of idf and Tl in mf and mc. CS also checks that πc is correct. Then, CS checks the integrity of the received encrypted data as follows:

H(idf || AES(File) || DF) ?= Hf.    (5.69)

If all the checks pass, CS can send a confirmation message to the data marketing contract that sets the status of the data item to valid. CS is motivated to confirm the data item since successful trading of the item will give CS some rewards. There can be a scenario where the following inequality holds:

H(idf || AES(File) || DF) ≠ Hf.    (5.70)

In this case, CS sends (mc, πc) to SA for investigation. SA conducts the consistency check for the on/off-chain data listing messages. SA further checks the correctness of πc. The two checks ensure that CS reports the correct messages from the same DS. Since the hash check fails, DS did not send the correct data item to CS. To this end, SA can mark the data item with identifier idf as invalid in the marketing contract.
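The integrity check of Eqs. (5.60) and (5.69) amounts to recomputing a hash over the file identifier, the ciphertext, and the description. A minimal sketch (the field values and the byte encoding of || are assumptions):

```python
import hashlib

# Toy version of the check in Eqs. (5.60)/(5.69): CS (or TP) recomputes
# H(idf || AES(File) || DF) and compares it with the on-chain Hf.
def file_hash(idf: bytes, ciphertext: bytes, DF: bytes) -> bytes:
    return hashlib.sha256(idf + b"||" + ciphertext + b"||" + DF).digest()

idf, DF = b"file-001", b"temperature readings"   # illustrative values
ciphertext = b"\x8a\x01\xfe"                     # stands in for AES(File)

Hf = file_hash(idf, ciphertext, DF)              # published on-chain by DS

# CS's check on the received off-chain copy:
assert file_hash(idf, ciphertext, DF) == Hf      # item is valid
assert file_hash(idf, b"tampered", DF) != Hf     # any mismatch is detected
```

Binding idf and DF into the hash prevents CS from swapping the ciphertext, the description, or the identifier independently of the on-chain authenticator.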


Fig. 5.5 Workflow of data trading: (1) TP sends an off-chain data request to CS; (2) TP sends an on-chain data request; (3) DS sends a confirmation on-chain; (4) TP verifies the retrieved data

5.4.5.4

Data Trading

As shown in Fig. 5.5, the data trading includes the following steps:

• TP searches data items on CS to find a specific file with an identifier idf and the description DF. Then, TP retrieves the following information from CS and BC:

CS : mc = (idf, AES(File), DF),
BC : mf = (idf, Hf, Ef, πf, pkf, addrf, Tl).    (5.71)

Note that idf should be identical across CS and BC. TP checks the following equation:

H(idf || AES(File) || DF) ?= Hf.    (5.72)

If the hash check fails, TP sends the authenticated communication between TP and CS to SA for investigation. SA recomputes Hf and checks if it is consistent with the one in the marketing contract. If SA determines Hf is not correctly computed, SA concludes that the data item was not correctly confirmed by CS and marks the item as invalid on the blockchain.

• If the check passes, TP can ensure that the correct encrypted data item is retrieved from CS. Then, TP constructs Dp as a data usage description and encrypts Dp using pkf of the data item as follows:

Enc(Dp, pkf).    (5.73)


TP chooses a unique session id, sid, and sets idt as his/her ID. pkt is TP's encryption key, and possession of the corresponding secret key is proved by TP. TP sends a request to the Request function in the contract:

(sid, idf, idt, pkt, Enc(Dp, pkf), Ct),    (5.74)

where Ct is the cryptocurrency deposit from TP for payment to DS and CS. The data marketing contract checks the message validity and authenticity using TP's memt. In case TP retrieves the data item multiple times but does not send the request, CS can refuse to provide further services to TP.

• DS retrieves the file request on the blockchain and decrypts Enc(Dp, pkf) to obtain Dp. DS checks the usage descriptions. DS can abort the request by not sending a confirmation to the marketing contract. In this case, TP can request the deposit back after the file request has been pending on the blockchain for a certain time. If DS approves the file request, DS computes

gc = H3(sid || idf || idt),  Tl′ = gc^{rl},    (5.75)

where rl is the linkage secret generated in the data listing phase. Using the provided public key pkt, DS chooses a random number rf′ ∈ Zp and encrypts the file decryption key Ks as follows:

Eft = (E1′, E2′) = (g^{rf′}, pkt^{rf′} g^{Ks}).    (5.76)

DS includes the above information in a message:

mcon = (sid, idf, idt, Eft, gc^{rl}).    (5.77)

DS computes a signature on mcon as follows:

πft = ZKP{(rf, rf′, Ks, rl) : E1 = g^{rf} ∧ E1′ = g^{rf′} ∧ E2 = pks^{rf} g^{Ks} ∧ E2′ = pkt^{rf′} g^{Ks} ∧ Tl′ = gc^{rl} ∧ Tl = H3(idf)^{rl}}.    (5.78)


mcon is included when generating the challenge for the ZKP. The above ZKP proves (publicly) that the same decryption key is included in both (E1, E2) and (E1′, E2′), and that DS knows the linkage secret between the data listing message and this data response message. Alternatively, with the proof-of-misbehavior strategy, DS can only encrypt Ks under TP's public key. Later, if TP finds out that the decryption key does not work, TP can request DS to generate the proof or locally compute a verifiable decryption of the key for SA's investigation. Here, we use the linkage secret to construct a signature instead of the anonymous credential of DS. By doing so, on-chain costs of verifying the signature can be reduced. The method is secure as long as DS does not reveal its linkage secret and does not reuse randomness in Sigma protocols. Another approach is to let DS construct the same base name for this data item with its secret key in the exponent. DS can prove knowledge of its secret key with the base name [60]. Finally, DS sends the following message to the contract:

(mcon, πft).    (5.79)

The contract uses idf to identify the data listing item (E1, E2, Tl) for a consistency check. The contract also checks the validity of πft. An alternative approach is to adopt the proof-of-misbehavior strategy. That is, the contract only stores the proof but does not check its correctness. Later, TP can retrieve the proof from the contract, check the proof offline, and report an incorrect proof to the contract. By doing so, some of the on-chain verifications can be shifted off chain.

• Upon seeing a response from DS, TP retrieves mcon and decrypts Eft to obtain the decryption key g^{Ks} using skt. TP decrypts the encrypted file AES(File) using the decryption key:

– If the file decryption is successful and the decrypted file complies with the file description, TP sends the following message to the Pay function on the marketing contract:

(sid, idf, idt, pay).    (5.80)

The contract checks the validity and authenticity of the message, i.e., that the sender of the data request and of this confirmation message is the same TP for the same file. The contract then marks the file request as complete and transfers TP's deposit to the provided payment address addr_f and to CS.

– If TP determines that the decrypted file does not comply with the file description, TP sends the following complaint message to the Complain function on the contract:

(sid, id_f, id_t, complain).    (5.81)


5 Fair Data Marketing in HCN

SA checks the consistency of the file identity/message sender and the correctness of the proofs in the data listing and data request messages. If all checks pass, SA can decrypt E_f with a distributed ElGamal encryption scheme [61]. The supervising nodes in SA compare the file content with the file description and send voting messages to the marketing contract to determine whether the complaint is valid. If the complaint is invalid, the contract still transfers TP's deposit to CS and addr_f and marks the data request/complaint as finalized. If the complaint is valid, the contract transfers the deposit back to TP, marks the data request/complaint as finished, and marks the data item as invalid. It should be noted that it is not easy for an individual supervising node to assess file content; therefore, we adopt a majority-voting mechanism. At the same time, a complaint from TP leads to the decryption of the encrypted file. To prevent TP from making an arbitrary number of complaints, SA can enforce financial incentives or other countermeasures against a TP who makes falsified complaints.

– If neither a confirmation nor a complaint message is received within a certain time, SA or CS can send the following message to the Resolve function:

(sid, id_f, id_t, pay).    (5.82)

The contract checks the message’s validity, freshness, and authenticity. The contract compares the recorded time of the DS confirmation message with the current block time. If the time difference is larger than a threshold, the contract can transfer the TP’s deposit to .addrf and CS and end the data request. The data marketing contract is shown in Algorithm 7.
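The off-chain key handling above — DS encrypting the key element g^{K_s} under TP's public key pk_t, and TP recovering it with sk_t — can be sketched with a toy ElGamal instance. This is an illustrative sketch only: the modulus, generator, and the SHA-256 key derivation are assumptions for demonstration, not the scheme's pairing-friendly groups.

```python
import hashlib

# Toy ElGamal over Z_p^* (demo parameters; insecure for real use).
p = 2**127 - 1          # Mersenne prime, demo modulus
g = 3

def elgamal_encrypt(pk, m, r):
    # E_ft = (g^r, m * pk^r): DS encrypts the key element m = g^{K_s} under pk_t
    return (pow(g, r, p), (m * pow(pk, r, p)) % p)

def elgamal_decrypt(sk, ct):
    # m = c2 / c1^{sk}; inverse via Fermat's little theorem since p is prime
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, p), p - 2, p)) % p

sk_t = 123456789
pk_t = pow(g, sk_t, p)
K_s = 42
key_elem = pow(g, K_s, p)                       # g^{K_s}, the key element
E_ft = elgamal_encrypt(pk_t, key_elem, r=987654321)
recovered = elgamal_decrypt(sk_t, E_ft)
assert recovered == key_elem
# In practice TP would hash the recovered group element into a symmetric key:
aes_key = hashlib.sha256(str(recovered).encode()).digest()
```

If the recovered key fails to decrypt AES(File), TP proceeds to the Complain path described above.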

5.4.5.5 Tracing

Identity privacy is guaranteed for DS in the data marketing process. However, this privacy is not unconditional. As shown in Fig. 5.6, CS and TP can work with SA to recover the true identity of a misbehaving DS. Two cases can lead to the recovery of DS's true identity:

• In the data listing phase, CS receives the off-chain encrypted data item but detects that

H_f ≠ H(id_f || AES(File) || DF).    (5.83)

This means that DS does not generate the commitment of the data item correctly on the blockchain. If TP detects the same issue in the data trading phase, this is determined to be caused by CS’s incorrect confirmation in the data listing. • TP detects that the decrypted file does not comply with the file description and makes a complaint to SA. SA checks and confirms the complaint.

5.4 Use Case: Blockchain–Cloud Fair Data Marketing


Algorithm 7: Data marketing
  Set Rec_F, Rec_R, Rec_K to be empty
  // Rec_F stores data items, Rec_R stores data requests, Rec_K stores DS confirmations
  Function Initialize(m_f, π_fs):
    Check message/proof validity
    Set flag_F = 0
    Add m_f, π_fs, flag_F to Rec_F
  Function CSConfirm(id_f):
    Check message validity
    Set flag_F = 1 for the file
  Function Request(sid, id_f, id_t, pk_t, Enc(D_p, pk_f), C_t):
    Check message validity
    Set flag_R = 1
    Add sid, id_f, id_t, pk_t, Enc(D_p, pk_f), C_t, and flag_R to Rec_R
  Function DSResponse(m_con, π_ft):
    Check message/proof validity
    Set flag_S = 1
    Add m_con, π_ft, and flag_S to Rec_K
  Function Pay(sid, id_f, id_t, pay):
    Check message validity and freshness
    Locate flag_R ∈ Rec_R and flag_S ∈ Rec_K by id_f, id_t, and set them to 0
    Transfer C_t to addr_f and CS
  Function Complain(sid, id_f, id_t, complain):
    Check message validity and freshness
    SA starts a majority-voting process to determine the validity of the complaint
  Function Resolve(sid, id_f, id_t, pay):
    Check message validity and freshness
    Check that the DS confirmation has been received
    Check that a TP confirmation or complaint has not been received for a certain time
    Locate flag_R ∈ Rec_R and flag_S ∈ Rec_K by id_f, id_t, and set them to 0
    Transfer C_t to addr_f and CS
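The control flow of the data marketing contract can be sketched as a small state machine. This is an illustrative Python sketch, not Fabric chaincode: proof and signature checks are stubbed out, the timeout value and method names are assumptions, and deposits are plain numbers.

```python
# Minimal state-machine sketch of the data marketing contract (Algorithm 7).
class MarketingContract:
    TIMEOUT = 3600  # seconds TP has to confirm or complain (assumed value)

    def __init__(self):
        self.rec_f, self.rec_r, self.rec_k = {}, {}, {}
        self.payouts = []

    def initialize(self, id_f, m_f, pi_fs):
        # Check message/proof validity (stubbed), then store with flag_F = 0
        self.rec_f[id_f] = {"m_f": m_f, "pi_fs": pi_fs, "flag_f": 0}

    def cs_confirm(self, id_f):
        self.rec_f[id_f]["flag_f"] = 1  # CS confirms the off-chain data item

    def request(self, sid, id_f, id_t, deposit, now):
        self.rec_r[(id_f, id_t)] = {"sid": sid, "deposit": deposit,
                                    "flag_r": 1, "t": now}

    def ds_response(self, id_f, id_t, m_con, pi_ft, now):
        # Contract checks pi_ft (stubbed) and records DS's confirmation
        self.rec_k[(id_f, id_t)] = {"m_con": m_con, "flag_s": 1, "t": now}

    def pay(self, id_f, id_t, addr_f):
        req = self.rec_r[(id_f, id_t)]
        self.rec_k[(id_f, id_t)]["flag_s"] = 0
        req["flag_r"] = 0
        self.payouts.append((addr_f, req["deposit"]))  # to addr_f and CS

    def resolve(self, id_f, id_t, addr_f, now):
        # DS confirmed but TP stayed silent past the timeout: pay out anyway
        conf = self.rec_k[(id_f, id_t)]
        if conf["flag_s"] == 1 and now - conf["t"] > self.TIMEOUT:
            self.pay(id_f, id_t, addr_f)
            return True
        return False
```

The `resolve` path captures the deadline comparison described above: the recorded time of the DS confirmation is compared with the current block time before the deposit is released.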

Fig. 5.6 Workflow of tracing: (1) CS or TP submits a complaint to SA; (2) SA performs verification and tracing


Algorithm 8: Blacklisting
  Set Rec_E, Rec_R as empty
  Function Report(Evid):
    Randomly select t supervising nodes
    Randomly select an opening node SO
    Add Evid to Rec_E
  Function Reveal(id_s, Tok):
    Compute H_0(id_s) = (m', h)
    Compute λ_i = ∏_{S_j ∈ SO_t \ S_i} j/(j − i)
    Check e(π'_1, X̃_A · Ỹ_{A,2}^{m'}) · Tok ?= e(π'_2, g̃)
    Add (id_s, Tok) to Rec_R

Other cases, which may require a case-by-case investigation by SA, are not discussed in this chapter. Since communications are authenticated in the data marketing process, either CS or TP can send evidence to SA. After checking the validity of the evidence, SA can submit it to a blacklisting contract to reveal the true identity of DS, as shown in Algorithm 8. The contract randomly selects a set of t supervising nodes, denoted as SO_t. The supervising nodes conduct the following procedures:

• SO_t retrieves the anonymous signature from the evidence as

(π'_1, π'_2) ∈ π_fs.    (5.84)

Each S_j ∈ SO_t retrieves each DS identity id_s with the encrypted share (E_{j,1}, E_{j,2}) from its storage and computes

O_{j,id_s} = e(π'_1, E_{j,2} / E_{j,1}^{psk_j}).    (5.85)

S_j then sends the following message to SO via a secure channel:

O_j = {id_s, O_{j,id_s}}.    (5.86)

• Upon receiving all messages from SO_t, SO first computes

λ_i = ∏_{S_j ∈ SO_t \ S_i} j/(j − i),    Tok = ∏_{S_j ∈ SO_t} O_{j,id_s}^{λ_j}.    (5.87)

For each registered DS with id_s, SO computes


H_0(id_s) = (m', h),    (5.88)

and checks the following equation:

e(π'_1, X̃_A · Ỹ_{A,2}^{m'}) · Tok ?= e(π'_2, g̃).    (5.89)

SO stops the process once a match is found. Finally, SO uploads the following information to the Reveal function of the blacklisting contract:

(id_s, Tok).    (5.90)

The contract checks that the opening is correct and adds the information to the DS blacklist. As a result, the misbehaving DS can no longer use the anonymous credential to list data on the blockchain for data marketing. As the blacklist grows, verifying non-membership of the blacklist may incur additional overhead. We argue that identity revocation requires further research attention, but outline potential solutions: (1) DS can generate a non-membership proof for every transaction; (2) smart contracts can incentivize other blockchain nodes (e.g., an offline auditor [62]) to check the validity of each signature and report invalid signatures, claiming the deposit as a reward.
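The threshold recombination in Eq. (5.87) is Lagrange interpolation "in the exponent": each node contributes a value carrying g^{s_j} for its Shamir share s_j, and SO recombines them into g^{secret}. A minimal sketch with toy group parameters (an order-11 subgroup of Z_23^*, chosen only so the arithmetic is easy to follow):

```python
# Lagrange recovery in the exponent, as in Eq. (5.87). Toy parameters:
# g = 2 has order q = 11 inside Z_23^*; exponents live in Z_q.
p, q, g = 23, 11, 2

def lagrange_at_zero(i, ids):
    # lambda_i = prod_{j != i} j / (j - i) mod q, i.e., interpolation at x = 0
    num = den = 1
    for j in ids:
        if j != i:
            num = num * j % q
            den = den * ((j - i) % q) % q
    return num * pow(den, -1, q) % q

def shamir_share(secret, coeffs, i):
    # f(i) for f(x) = secret + c1*x + c2*x^2 + ... over Z_q
    return sum(c * pow(i, k, q) for k, c in enumerate([secret] + coeffs)) % q

ids = [1, 2, 3]
secret = 7
shares = {i: shamir_share(secret, [3, 4], i) for i in ids}
O = {i: pow(g, shares[i], p) for i in ids}        # per-node contributions O_j
tok = 1
for i in ids:
    tok = tok * pow(O[i], lagrange_at_zero(i, ids), p) % p
assert tok == pow(g, secret, p)                    # Tok = g^secret
```

In the scheme itself, the contributions are pairing values rather than plain group elements, but the recombination with coefficients λ_j is the same.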

5.4.6 Security Analysis

In this section, we present the security analysis of the blockchain–cloud marketing scheme. First, we present the security notions of the blockchain and the distributed credential. Then, we demonstrate that the proposed scheme achieves the security goals, including consortium management and marketing fairness.

5.4.6.1 Blockchain Security

Consortium blockchain [63] is adopted to achieve secure recording and updating of data marketing operations. There are two security requirements:

• First, all nodes that maintain the blockchain, e.g., supervising nodes, should have a consistent view of the ledger storage. This property is achieved with consensus protocols [64]. In a consortium blockchain, various consensus protocols can be adopted, including RAFT, PBFT, etc.
• Second, valid data marketing operations can change the ledger storage, e.g., updating the status of a data item. In a consortium blockchain, peer nodes can send chaincode transactions to change ledger states if the transactions satisfy the conditions defined in the chaincodes and the endorsement policies. After a chaincode transaction is accepted by peer nodes, the state change is immutable on the blockchain.

5.4.6.2 Credential Security

In data marketing, DS obtains an anonymous credential to preserve identity privacy, which requires DS to prove knowledge of a randomized PS signature. More specifically, a modified form of the PS signature is generated for DS [51], where a unique id is hashed (as a random oracle) to obtain a message and a generator. Then, DS can randomize the obtained signature and prove knowledge of sk_s and m'. To enable consortium management of the credential, the signing and opening abilities should be distributed to a set of supervising nodes. Suppose there are n supervising nodes and the threshold number is t. We require that all n nodes participate in the credential issuance and that t out of n nodes can open an anonymous signature to a specific DS. The security notions are as follows:

• Distributed issuance: We adopt the multi-message form of the PS signature [52]. That is, each supervising node has an individual signing key to blindly sign a committed secret of DS. All individual signatures can then be aggregated into a single signature that is verified against an aggregated verification key. The following conditions need to be met for its security:

– All supervising nodes need to correctly follow the issuance protocol. In the key generation phase, the supervising nodes prove knowledge of their secret keys to ensure that the keys are correctly computed. Each individual signature can also be verified using the corresponding public key.
– The aggregated verification key needs to be securely computed. More specifically, a set of random numbers is generated for aggregating the verification keys.
– All supervising nodes need to sign the same message with the same random generator. That is, the hash function for generating m' and h must be modeled as a random oracle. It is critical that different signatures are never generated with the same generator, as this would lead to a total break of the signature's security.

To forge a single signature, an adversary either needs to obtain the secret key or break the unforgeability of the PS signature. Therefore, the adversary cannot forge a valid aggregated signature.

• Threshold opening: This notion requires that a valid anonymous signature can always be traced to a registered DS. To achieve this, at least t supervising nodes work collaboratively to open the signature. Threshold opening is guaranteed under the following conditions:

– In the setup phase, each supervising node generates a pair of ElGamal encryption keys. The supervising nodes also prove knowledge of the secret keys to ensure they are honestly generated.


– In the credential request phase, DS must compute an encrypted share of its secret sk_s using Shamir's secret sharing technique. The shares are encoded over the public verification key Y_{A,1} [52]. DS also needs to prove, using a ZKP, knowledge of sk_s and that the shares are correctly computed. The supervising nodes first verify the correctness of the shares before generating a PS signature.
– Due to the security of the issuance process, an adversary cannot forge a valid aggregated PS signature without compromising supervising nodes. Without a valid PS signature or the secret id_s, the adversary cannot generate an anonymous signature, due to the security of the ZKP. This ensures that a valid anonymous signature must come from a registered DS.
– At least t supervising nodes correctly recover the encrypted shares [53]. Moreover, each participating supervising node generates a ZKP in the recovery phase to demonstrate a correct decryption. The correctness of Shamir's secret sharing further guarantees that ∏_{S_j ∈ SO_t} O_{j,id_s}^{λ_j} is correctly reconstructed. For each registered DS, the opening node can use ∏_{S_j ∈ SO_t} O_{j,id_s}^{λ_j} to find a specific match. An adversary cannot open an anonymous signature unless the adversary knows the secret keys of SA or obtains the decrypted shares.

• Anonymity: This notion requires that an adversary cannot extract DS's identity or secret key given access to the public parameters and valid anonymous signatures. It holds under the following conditions:

– In the credential request phase, DS computes a commitment of its secret key with a ZKP to demonstrate its correct construction. By doing so, supervising nodes cannot extract the secret key from the committed secret.
– With a valid aggregated PS signature and the corresponding secret key, DS first chooses a random number to compute a randomized signature. By choosing different random numbers, the signatures are indistinguishable even for the same message with the same secret.
– DS then uses a ZKP protocol to prove knowledge of sk_s and m' in the PS verification algorithm. Due to the security of the ZKP, an adversary cannot recover the secrets from the generated anonymous signature.
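The share-consistency check that supervising nodes perform before signing can be illustrated with a Feldman-style verification sketch. This is an assumption for illustration only — the chapter's construction encodes encrypted shares over Y_{A,1} with ZKPs rather than plain Feldman commitments — but the consistency idea is the same: commit to the polynomial coefficients so each node can check its own share.

```python
# Feldman-style verifiable sharing sketch (toy parameters): the dealer
# commits to polynomial coefficients, and node i checks its share f(i)
# against the commitments. g = 2 has order q = 11 in Z_23^*.
p, q, g = 23, 11, 2

def commit_poly(coeffs):
    return [pow(g, c, p) for c in coeffs]           # C_k = g^{a_k}

def share(coeffs, i):
    # f(i) for f(x) = a_0 + a_1*x + a_2*x^2 + ... over Z_q
    return sum(c * pow(i, k, q) for k, c in enumerate(coeffs)) % q

def verify_share(i, s_i, commits):
    # Check g^{f(i)} == prod_k C_k^{i^k}
    lhs = pow(g, s_i, p)
    rhs = 1
    for k, C in enumerate(commits):
        rhs = rhs * pow(C, pow(i, k, q), p) % p
    return lhs == rhs

coeffs = [7, 3, 4]                 # f(x) = 7 + 3x + 4x^2, secret a_0 = 7
commits = commit_poly(coeffs)
assert all(verify_share(i, share(coeffs, i), commits) for i in [1, 2, 3])
assert not verify_share(1, (share(coeffs, 1) + 1) % q, commits)  # bad share fails
```

A node that receives a share failing this check refuses to sign, which is exactly the "verify the correctness of the shares before generating a PS signature" step above.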

5.4.6.3 Consortium Management

As discussed in Sect. 5.4.2, there are three goals for consortium management:

• Right to be informed: DS encrypts the data item and sends the encrypted data to CS. DS also encrypts the decryption key for the data item and uploads the encrypted key to the blockchain. DS grants data access only by providing another encryption of the key on the blockchain. That is, TP must submit a data request on the blockchain, so DS is always notified of the request.


• Right to agree/reject: To request a data item, TP encrypts a file usage description and uploads the request to the contract. Only DS can review the file usage to determine whether the decryption key should be provided to TP. That is, DS has the right to approve or reject the data request. It should be noted that re-sharing of data by a TP who has already obtained it is out of the scope of this chapter.
• Identity privacy: DS obtains an anonymous credential for the data marketing process. Due to the anonymity of the credential, the identity privacy of DS is preserved unless dishonest behavior is confirmed. That is, an efficient adversary can extract sk_s from valid anonymous signatures of DS only if the adversary controls t out of n supervising nodes or breaks the zero-knowledge property of the ZKP. Data confidentiality is guaranteed since an efficient adversary cannot break the security of the ElGamal/AES encryption.

The data marketing contract gives DS the right to be informed of, and the right to control, data requests. The contract is implemented over a transparent and trusted consortium blockchain without relying on a single entity. At the same time, the issuance and opening capabilities for DS credentials are distributed to a set of supervising nodes without a single trusted certificate authority. Therefore, consortium management of DS rights is achieved in the proposed scheme.

5.4.6.4 Marketing Fairness

Marketing fairness is twofold, covering both TP and DS:

• To make a data request, TP needs to generate a transaction and make a deposit to the marketing contract. After seeing the data request, DS can decide to either approve or reject it. DS sends an encrypted key to TP only if DS approves the data request, and the key is verified by the marketing contract. Since the message exchanges happen on the blockchain, neither DS nor TP can deny receipt of the DS confirmation message:

– If the encrypted key and the file are correct, TP sends a confirmation message to the data marketing contract. The contract checks the message validity and transfers the deposit to DS and CS.
– If TP does not send a confirmation to the contract in time, DS can invoke the Resolve function on the contract to obtain TP's deposit. The contract checks whether the deadline has passed for TP and transfers the deposit to DS and CS accordingly.

In both cases, DS is guaranteed to get paid if DS sends a correct response to TP's request.

• TP should pay DS only if the received data are correct, which is guaranteed by the following conditions:


– In the data listing phase, DS generates a set of commitments and proofs for the encrypted data item and decryption key. More specifically, H_f ensures the integrity of the encrypted file and file description; E_f is the encryption of the decryption key, and π_f ensures that the encryption is correctly computed; π_fs serves as an anonymous signature showing that the listing message comes from a registered DS. The commitments and proofs are stored on the blockchain. From the security of the hash function, the soundness of the ZKP protocol, and the immutability of blockchain storage, an adversary cannot modify DS's data items on the blockchain.
– In the data listing phase, to prevent the cloud server from purposely delaying on-chain confirmation messages, we use part of the buyer's deposit as a financial incentive for CS to honestly verify and confirm the data items.
– In the data trading phase, there are off-chain communications between TP and CS. To prevent CS from refusing to send the data item after TP makes a deposit, TP must first contact CS to retrieve the data item of interest and then send the data request with the deposit to the marketing contract. After TP makes the data request, DS may fail to respond; in this case, TP can reclaim the deposit after the request has been logged on the blockchain for a period of time.
– DS generates a proof π_ft by proving knowledge of the linkage secret r_l in both the data listing and data trading messages. By doing so, DS proves that exactly the same decryption key is sent to TP and that the two messages come from the same DS.
– All the proofs and commitments are publicly verifiable on the blockchain to ensure that TP receives the committed data and decryption key from DS. However, if TP complains to SA that the decrypted file does not match the file description, SA can launch an investigation and use majority voting to determine whether DS misbehaved. If DS misconduct is confirmed, SA can start a blacklisting process to trace DS's true identity and mark DS's data item as invalid.

5.4.7 Performance Evaluation

In this section, we report the performance of the blockchain–cloud marketing scheme. First, we analyze the computation and storage complexity of the proposed scheme. Second, we present the experimental setup and benchmarks for both on-chain and off-chain experiments based on Hyperledger Fabric.

5.4.7.1 Complexity Analysis

We adopt a hybrid blockchain–cloud marketing model where data are stored on the cloud and the authenticator (H_f) is stored on the blockchain. For on-chain storage, each full blockchain node needs to maintain a copy of the ledger. We denote the number of blockchain nodes as n_F and the average size of a file as |F|_a. The blockchain–cloud marketing model therefore requires an overall storage cost for a file with H_f of

n_F · |H_f| + |F|_a.    (5.91)

For the on-chain marketing model, data are directly stored on each full blockchain node's storage, which results in an overall storage cost of

n_F · |F|_a.    (5.92)
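The two cost formulas can be checked numerically. The sketch below uses the 512-bit H_f (64 bytes) and the n_F values of 80 and 100 from the evaluation:

```python
# Overall storage for one file: on-chain model (Eq. 5.92) vs. the hybrid
# on/off-chain model (Eq. 5.91), both in MB.
def on_chain_cost(n_f, file_mb):
    return n_f * file_mb                       # every node stores the file

def hybrid_cost(n_f, file_mb, hf_bytes=64):
    # Nodes store only H_f; one copy of the file lives on the cloud.
    return n_f * hf_bytes / 2**20 + file_mb

file_mb = 50
for n_f in (80, 100):
    print(n_f, on_chain_cost(n_f, file_mb), round(hybrid_cost(n_f, file_mb), 4))
```

For a 50 MB file the on-chain model costs 4000–5000 MB in total, while the hybrid model stays barely above the single 50 MB cloud copy.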

We set n_F to 80 or 100 for a practical consortium blockchain. The size of H_f is roughly 512 bits in the proposed scheme. As shown in Fig. 5.7, the proposed hybrid data marketing model significantly reduces the overall storage costs. For example, storing the data on the blockchain can cost a few thousand MBs, while storing only the hash of the data costs a few bytes per node. This is because each blockchain node only needs to store a succinct hash of the file.

Fig. 5.7 Overall storage cost vs. file size (on-chain model with n_F = 80 and n_F = 100 vs. the on/off-chain model with n_F = 80)

We show the computational complexity analysis when the data listing and data trading phases are conducted honestly for a single data item. We mainly consider cryptographic operations, including exponentiations over G, pairing operations, and AES. Specifically, we denote E_1, E_2, and E_t as exponentiations in G_1, G_2, and G_t, respectively, and P as a pairing operation. We denote AES_e and AES_d as AES encryption and decryption, respectively. As the counts below show, only a few cryptographic operations are required for DS to list a data item and conduct the trading. For on-chain overheads, a constant number of operations is required regardless of the file size. The computation costs are summarized as follows:

• For DS: 11E_1 + 2E_t + 2P + AES_e in the data listing and 12E_1 in the data trading.
• For TP: 4E_1 + AES_d in the data trading.
• For CS: 2E_1 in the data listing.
• For BC: 7E_1 + 4E_t + 4P in the data listing and 14E_1 in the data trading.

Moreover, a proof-of-misbehavior strategy can be adopted to further reduce the on-chain costs. For the communication complexity, the credential issuance involves one round of communication between DS and SA, while blacklisting requires one round of communication between SA and the opening authority; both are dominated by the size of the supervising committee. This is reasonable since the proposed scheme aims at a higher security level with consortium regulations. Moreover, registration and blacklisting happen less frequently than data listing and data trading. For the data listing and trading, only succinct proofs of the data items are stored on the blockchain, while large files are stored off-chain, which significantly reduces the on-chain storage cost.
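The operation counts above can be turned into rough wall-clock estimates once per-operation timings are measured. The unit costs in this sketch are placeholder assumptions, not measurements from the chapter; substitute microbenchmarks for E_1, E_t, P, and AES on the target platform.

```python
# Rough per-party cost estimate from the operation counts listed above.
COUNTS = {
    "DS_listing": {"E1": 11, "Et": 2, "P": 2, "AESe": 1},
    "DS_trading": {"E1": 12},
    "TP_trading": {"E1": 4, "AESd": 1},
    "CS_listing": {"E1": 2},
    "BC_listing": {"E1": 7, "Et": 4, "P": 4},
    "BC_trading": {"E1": 14},
}

def estimate_ms(counts, unit_ms):
    # Total time = sum over operations of (count * unit cost)
    return sum(n * unit_ms[op] for op, n in counts.items())

# Placeholder unit costs in milliseconds (assumed values for illustration).
unit_ms = {"E1": 2.0, "Et": 1.0, "P": 10.0, "AESe": 5.0, "AESd": 4.0}
for phase, counts in COUNTS.items():
    print(phase, estimate_ms(counts, unit_ms), "ms")
```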

5.4.7.2 Experimental Setup

We present the experimental settings of both on-chain and off-chain experiments:

• For off-chain experiments, we adopt the Java Pairing-Based Cryptography (JPBC) library [65] to implement the distributed credential and PVSS schemes. We choose the Type F curve in JPBC, i.e., the BN128 curve. The test bed is a laptop with a 2.3 GHz processor and 8 GB of memory, running 64-bit Windows. The code is written in Java SE 1.8. SHA-512 is adopted as the hash function, and AES-256 is adopted for data encryption.1 We do not implement the hash-to-point function and thus omit its computation overhead.
• For on-chain experiments, we implement the test network of Hyperledger Fabric (release 2.1) on the same laptop, running 64-bit Ubuntu 16.04. The smart contract is implemented as chaincode with JPBC as a dependency. Curve parameters are encoded into the JPBC jar files. RAFT consensus is adopted with a single ordering node. We adjust the number of organizations and the number of

1 https://github.com/mkyong/core-java/tree/master/java-crypto.


peer nodes in each organization. Other public parameters, such as generators and public keys, are written as static data in chaincode.

5.4.7.3 Off-Chain Performance

We implement and test the performance of the algorithms in the blockchain–cloud marketing scheme. Suppose there are n supervising nodes in the system; the threshold number is t = n in our experiments. In the following, we report the computational costs of PVSS and MPS. To obtain accurate results, we ran the experiments multiple times with different instantiations of the group generators:

• As shown in Fig. 5.8, the total computation cost when DS shares a secret for verification increases linearly with the number of supervising nodes in the system. This is because the shares of the secret need to be computed and transmitted to every other supervising node. The most expensive part is the Verify algorithm, since multiple pairing operations are required. The setup phase is very efficient, while the time cost for recovery can reach 1 second when n = 10.
• As shown in Fig. 5.9, we test four functions of the distributed credential issuance — Setup, Sign, Aggregate, and Verify — and report the running time of each algorithm. The running time of Setup increases linearly with n. Signature aggregation and verification of the aggregated signature cost around 300 ms. It is efficient for a single supervising node to generate a signature.

Fig. 5.8 Computation overheads of PVSS (Setup, Share, Verify, Recover; time in ms vs. n)


Fig. 5.9 Computation overheads of MPS (Setup, Sign, Aggregate, Verify; time in ms vs. n)

Fig. 5.10 Computation overheads of proof generation and verification (time in ms for π_f, π_ft, and π_fs)

In the data listing and data trading phases, the most expensive parts are computing and verifying the three proofs π_f, π_fs, and π_ft. As shown in Fig. 5.10, the computational costs are summarized as follows:


• Generating and verifying π_f and π_ft is efficient, taking less than 100 ms each.
• The most expensive part is verifying π_fs, which can take nearly 1 second; generating π_fs takes around 500 ms.

We also implement the hash function and AES-256 encryption/decryption. On our test bed, the encryption rate is 12 MB/s and the decryption rate is 16.8 MB/s. For example, for a 1 GB file, the decryption time is roughly 61 seconds. From the above experimental results, we calculate the computation costs for CS, DS, and TP:

• In the data listing, we calculate the time costs of CS for verifying the file hash, π_f, and π_fs, and of DS for generating an AES key, computing π_f and π_fs, generating the hash of the file, and encrypting the file.
• In the data trading, we calculate the time costs of TP for verifying the hash, π_f, π_fs, and π_ft, and decrypting the file.

As shown in Fig. 5.11, the computation costs are determined by the size of the file due to the AES encryption/decryption. At the same time, the off-chain computation overheads for CS and DS in the data trading remain very low, since the trading mainly involves the generation and verification of the proof π_ft.
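The file-size-dependent part of these costs follows directly from the measured AES throughput. A quick check of the 1 GB example above (treating 1 GB as 1024 MB):

```python
# Time to process a file at a fixed throughput, using the measured rates
# from the test bed: 12 MB/s encryption, 16.8 MB/s decryption.
def transfer_time_s(size_mb, rate_mb_s):
    return size_mb / rate_mb_s

dec_s = transfer_time_s(1024, 16.8)   # decryption time for a 1 GB file
enc_s = transfer_time_s(1024, 12.0)   # encryption time for the same file
print(round(dec_s, 1), round(enc_s, 1))
```

This reproduces the roughly 61-second decryption figure and shows encryption of the same file would take somewhat longer at 12 MB/s.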

Fig. 5.11 Computation overheads of data listing/trading for DS, CS, and TP (time in s vs. file size in MB)

5.4.7.4 On-Chain Performance

We implement the verification of the three most complicated proofs: π_f, π_fs, and π_ft. The verification algorithms are written as chaincode functions, with public parameters encoded into the chaincode. All peer nodes install and approve the same chaincode package on the same channel. We encode the off-chain proofs in a shell script, and a peer node can run the script to invoke the different verification functions of the contract. We measure the time difference between a function call and the blockchain response. For each experiment, we restart the blockchain network to create a new channel for installing and running the chaincode. We test the response time in three settings: 2 organizations with 2 peer nodes; 3 organizations with 3 peer nodes; and 2 organizations with 4 peer nodes:

• As shown in Fig. 5.12, the response time for verifying π_fs is roughly 8.5 seconds, while the response time for verifying π_f and π_ft is roughly 6 seconds.
• As shown in Fig. 5.13, it takes roughly 12 seconds to verify π_fs and roughly 10 seconds to verify π_f and π_ft.
• As shown in Fig. 5.14, the response time for verifying π_fs can reach nearly 18 seconds. The response time for verifying π_ft is steadier and fluctuates around 15 seconds, while π_f can take 14–16 seconds.

Comparing the results in the three settings, the response time increases with the number of nodes in the blockchain network. Network status can also affect the response time. At the same time, our designs can be adapted to any consensus protocol. It should be noted that we implement the verifications of π_f, π_fs, and π_ft

Fig. 5.12 Response time of chaincode I (time in s for π_fs, π_f, and π_ft)

Fig. 5.13 Response time of chaincode II (time in s for π_fs, π_f, and π_ft)

Fig. 5.14 Response time of chaincode III (time in s for π_fs, π_f, and π_ft)

as the state-change transactions. A pure query function call that does not trigger state changes should be more efficient.


In this section, we have implemented the proposed scheme on a real-world blockchain network. From the experimental results, we identify the key performance indicators and conclude that the proposed scheme is practical for real-world data marketing applications.

5.5 Summary and Discussions

In this chapter, we have investigated blockchain-based data marketing. We have reviewed the motivations and applications of fair data marketing in HCN and discussed its application requirements, including regulation compliance, identity privacy, and marketing fairness. We have investigated state-of-the-art data marketing approaches, both centralized and blockchain-based, to highlight the design challenges for decentralized and fair data marketing. To address the challenges, we have proposed a representative construction based on blockchain and zero-knowledge proofs. First, we adopt a blockchain–cloud hybrid model for data marketing, where the blockchain serves as a transparent and trusted GDPR controller and the cloud is the storage and processor of massive IoT data. By doing so, data owners are relieved of the heavy computation and storage overheads of directly managing data. Second, with the distributed anonymous credential and a data marketing contract, the proposed scheme achieves identity privacy for data owners and transparent data marketing. Third, we have designed succinct commitments of data marketing operations with efficient on-chain verifications. The commit-then-complain marketing process can help detect dishonest behaviors of the cloud server, data owner, and data buyer for regulation and accountability enforcement, e.g., identity tracing of a misbehaving data owner. Furthermore, by requiring data buyers to first obtain and confirm data from the cloud server, the proposed scheme achieves marketing fairness even with a dishonest cloud server. We have conducted detailed security analysis and extensive experiments based on Hyperledger Fabric to demonstrate the security and efficiency of the proposed scheme.

References 1. X. Shen, C. Huang, D. Liu, L. Xue, W. Zhuang, R. Sun, and B. Ying, “Data management for future wireless networks: Architecture, privacy preservation, and regulation,” IEEE Network, vol. 35, no. 1, pp. 8–15, 2021. 2. W. Quan, M. Liu, N. Cheng, X. Zhang, D. Gao, and H. Zhang, “Cybertwin-driven DRL-based adaptive transmission scheduling for software defined vehicular networks,” IEEE Transactions on Vehicular Technology, vol. 71, no. 5, pp. 4607–4619, 2022. 3. D. Liu, C. Huang, J. Ni, X. Lin, and X. Shen, “Blockchain-cloud transparent data marketing: Consortium management and fairness,” IEEE Transactions on Computers, vol. 71, no. 12, pp. 3322–3335, 2022.


4. General Data Protection Regulation (GDPR). https://gdpr-info.eu. Accessed October 2023. 5. Z. Su, Y. Wang, Q. Xu, and N. Zhang, "LVBS: Lightweight vehicular blockchain for secure data sharing in disaster rescue," IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 1, pp. 19–32, 2020. 6. X. Liu, S. X. Sun, and G. Huang, "Decentralized services computing paradigm for blockchain-based data governance: Programmability, interoperability, and intelligence," IEEE Transactions on Services Computing, vol. 13, no. 2, pp. 343–355, 2019. 7. Y. Xiao, N. Zhang, J. Li, W. Lou, and Y. T. Hou, "PrivacyGuard: Enforcing private data usage control with blockchain and attested off-chain contract execution," in European Symposium on Research in Computer Security. Springer, 2020, pp. 610–629. 8. K. Nguyen, G. Ghinita, M. Naveed, and C. Shahabi, "A privacy-preserving, accountable and spam-resilient geo-marketplace," in Proc. of ACM SIGSPATIAL, 2019, pp. 299–308. 9. V. Koutsos, D. Papadopoulos, D. Chatzopoulos, S. Tarkoma, and P. Hui, "Agora: A privacy-aware data marketplace," in Proc. of IEEE ICDCS, 2020, pp. 1211–1212. 10. R. Herian, "Blockchain, GDPR, and fantasies of data sovereignty," Law, Innovation and Technology, pp. 1–19, 2020. 11. S. Shastri, M. Wasserman, and V. Chidambaram, "GDPR anti-patterns: How design and operation of modern cloud-scale systems conflict with GDPR," arXiv preprint arXiv:1911.00498, 2019. 12. T. Urban, D. Tatang, M. Degeling, T. Holz, and N. Pohlmann, "Measuring the impact of the GDPR on data sharing in ad networks," in Proc. of ACM Asia Conference on Computer and Communications Security, 2020, pp. 222–235. 13. A. Sonnino, M. Al-Bassam, S. Bano, S. Meiklejohn, and G. Danezis, "Coconut: Threshold issuance selective disclosure credentials with applications to distributed ledgers," in Proc. of NDSS, 2019. 14. J. Yin, Y. Xiao, Q. Pei, Y. Ju, L. Liu, M. Xiao, and C. Wu, "SmartDID: A novel privacy-preserving identity based on blockchain for IoT," IEEE Internet of Things Journal, vol. 10, no. 8, pp. 6718–6732, 2022. 15. D. Liu, A. Alahmadi, J. Ni, X. Lin, and X. Shen, "Anonymous reputation system for IIoT-enabled retail marketing atop PoS blockchain," IEEE Transactions on Industrial Informatics, vol. 15, no. 6, pp. 3527–3537, 2019. 16. H. Duan, Y. Du, L. Zheng, C. Wang, M. H. Au, and Q. Wang, "Towards practical auditing of dynamic data in decentralized storage," IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 1, pp. 708–723, 2022. 17. S. Dziembowski, L. Eckey, and S. Faust, "FairSwap: How to fairly exchange digital goods," in Proc. of ACM CCS, 2018, pp. 967–984. 18. L. Wei, H. Zhu, Z. Cao, X. Dong, W. Jia, Y. Chen, and A. V. Vasilakos, "Security and privacy for storage and computation in cloud computing," Information Sciences, vol. 258, pp. 371–386, 2014. 19. S. Xu, G. Yang, Y. Mu, and R. H. Deng, "Secure fine-grained access control and data sharing for dynamic groups in the cloud," IEEE Transactions on Information Forensics and Security, vol. 13, no. 8, pp. 2101–2113, 2018. 20. J. Shen, T. Zhou, D. He, Y. Zhang, X. Sun, and Y. Xiang, "Block design-based key agreement for group data sharing in cloud computing," IEEE Transactions on Dependable and Secure Computing, vol. 16, no. 6, pp. 996–1010, 2019. 21. J. Sun, G. Xu, T. Zhang, X. Yang, M. Alazab, and R. H. Deng, "Verifiable, fair and privacy-preserving broadcast authorization for flexible data sharing in clouds," IEEE Transactions on Information Forensics and Security, vol. 18, pp. 683–698, 2022. 22. Q. Zhang, L. T. Yang, and Z. Chen, "Privacy preserving deep computation model on cloud for big data feature learning," IEEE Transactions on Computers, vol. 65, no. 5, pp. 1351–1362, 2015. 23. C. Huang, D. Liu, J. Ni, R. Lu, and X. Shen, "Achieving accountable and efficient data sharing in industrial Internet of Things," IEEE Transactions on Industrial Informatics, vol. 17, no. 2, pp. 1416–1427, 2020.

References

181

24. K. Bhaskaran, P. Ilfrich, D. Liffman, C. Vecchiola, P. Jayachandran, A. Kumar, F. Lim, K. Nandakumar, Z. Qin, V. Ramakrishna et al., “Double-blind consent-driven data sharing on blockchain,” in Proc. of IEEE International Conference on Cloud Engineering (IC2E), 2018, pp. 385–391. 25. C. Li, Y. Fu, F. R. Yu, T. H. Luan, and Y. Zhang, “Vehicle position correction: A vehicular blockchain networks-based GPS error sharing framework,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 898–912, 2021. 26. E. Kokoris-Kogias, E. C. Alp, L. Gasser, P. Jovanovic, E. Syta, and B. Ford, “CALYPSO: Private data management for decentralized ledgers,” Proc. of the VLDB Endowment, vol. 14, no. 4, pp. 586–599, 2020. 27. H. Gunasinghe, A. Kundu, E. Bertino, H. Krawczyk, S. Chari, K. Singh, and D. Su, “PrivIdEx: Privacy preserving and secure exchange of digital identity assets,” in The World Wide Web Conference, 2019, pp. 594–604. 28. B.-K. Zheng, L.-H. Zhu, M. Shen, F. Gao, C. Zhang, Y.-D. Li, and J. Yang, “Scalable and privacy-preserving data sharing based on blockchain,” Journal of Computer Science and Technology, vol. 33, no. 3, pp. 557–567, 2018. 29. X. Zheng, R. R. Mukkamala, R. Vatrapu, and J. Ordieres-Mere, “Blockchain-based personal health data sharing system using cloud storage,” in Proc. of Healthcom, 2018, pp. 1–6. 30. K. Fan, Q. Pan, K. Zhang, Y. Bai, S. Sun, H. Li, and Y. Yang, “A secure and verifiable data sharing scheme based on blockchain in vehicular social networks,” IEEE Transactions on Vehicular Technology, vol. 69, no. 6, pp. 5826–5835, 2020. 31. Y. Hu, S. Kumar, and R. A. Popa, “Ghostor: Toward a secure data-sharing system from decentralized trust,” in Proc. of NSDI, 2020, pp. 851–877. 32. W. Dai, C. Dai, K.-K. R. Choo, C. Cui, D. Zou, and H. Jin, “SDTE: A secure blockchain-based data trading ecosystem,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 725–737, 2019. 33. Y. Xu, J. Ren, Y. Zhang, C. Zhang, B. 
Shen, and Y. Zhang, “Blockchain empowered arbitrable data auditing scheme for network storage as a service,” IEEE Transactions on Services Computing, vol. 13, no. 2, pp. 289–300, 2019. 34. M. Barati and O. Rana, “Tracking GDPR compliance in cloud-based service delivery,” IEEE Transactions on Services Computing, vol. 15, no. 3, pp. 1498–1511, 2022. 35. L. Zhu, Y. Wu, K. Gai, and K.-K. R. Choo, “Controllable and trustworthy blockchain-based cloud data management,” Future Generation Computer Systems, vol. 91, pp. 527–535, 2019. 36. N. B. Truong, K. Sun, G. M. Lee, and Y. Guo, “GDPR-compliant personal data management: A blockchain-based solution,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1746–1761, 2020. 37. I. Makhdoom, I. Zhou, M. Abolhasan, J. Lipman, and W. Ni, “PrivySharing: A blockchainbased framework for privacy-preserving and secure data sharing in smart cities,” Computers & Security, vol. 88, p. 101653, 2020. 38. O. O. Malomo, D. B. Rawat, and M. Garuba, “Next-generation cybersecurity through a blockchain-enabled federated cloud framework,” The Journal of Supercomputing, vol. 74, no. 10, pp. 5099–5126, 2018. 39. E. Fernandes, A. Rahmati, J. Jung, and A. Prakash, “Decentralized action integrity for triggeraction IoT platforms,” in Proc. of NDSS, 2018. 40. M. S. Rahman, A. Al Omar, M. Z. A. Bhuiyan, A. Basu, S. Kiyomoto, and G. Wang, “Accountable cross-border data sharing using blockchain under relaxed trust assumption,” IEEE Transactions on Engineering Management, vol. 67, no. 4, pp. 1476–1486, 2020. 41. D. Francati, G. Ateniese, A. Faye, A. M. Milazzo, A. M. Perillo, L. Schiatti, and G. Giordano, “Audita: A blockchain-based auditing framework for off-chain storage,” in Proceedings of the Ninth International Workshop on Security in Blockchain and Cloud Computing, 2021, pp. 5– 10. 42. J. Liang, Z. Qin, J. Ni, X. Lin, and X. 
Shen, “Practical and secure SVM classification for cloudbased remote clinical decision services,” IEEE Transactions on Computers, vol. 70, no. 10, pp. 1612–1625, 2021.

182

5 Fair Data Marketing in HCN

43. M. Li, J. Weng, J.-N. Liu, X. Lin, and C. Obimbo, “BB-VDF: Enabling accountability and finegrained access control for vehicular digital forensics through blockchain,” Cryptology ePrint Archive, Report 2020/011, 2020, https://eprint.iacr.org/2020/011. 44. T. Linden, R. Khandelwal, H. Harkous, and K. Fawaz, “The privacy policy landscape after the GDPR,” Proceedings on Privacy Enhancing Technologies, vol. 2020, no. 1, pp. 47–64, 2020. 45. C. Lin, D. He, X. Huang, and K.-K. R. Choo, “OBFP: Optimized blockchain-based fair payment for outsourcing computations in cloud computing,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3241–3253, 2021. 46. T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 469–472, 1985. 47. M. Bellare and O. Goldreich, “On defining proofs of knowledge,” in Proc. of CRYPTO. Springer, 1992, pp. 390–420. 48. J. Camenisch and M. Stadler, “Efficient group signature schemes for large groups,” in Proc. of CRYPTO. Springer, 1997, pp. 410–424. 49. T. P. Pedersen, “Non-interactive and information-theoretic secure verifiable secret sharing,” in Annual International Cryptology Conference. Springer, 1991, pp. 129–140. 50. D. Pointcheval and O. Sanders, “Short randomizable signatures,” in Proc. of CT-RSA. Springer, 2016, pp. 111–126. 51. ——, “Reassessing security of randomizable signatures,” in Proc. of CT-RSA. Springer, 2018, pp. 319–338. 52. J. Camenisch, M. Drijvers, A. Lehmann, G. Neven, and P. Towa, “Short threshold dynamic group signatures,” in International Conference on Security and Cryptography for Networks. Springer, 2020, pp. 401–423. 53. B. Schoenmakers, “A simple publicly verifiable secret sharing scheme and its application to electronic voting,” in Proc. of CRYPTO. Springer, 1999, pp. 148–164. 54. P. 
Feldman, “A practical scheme for non-interactive verifiable secret sharing,” in IEEE Annual Symposium on Foundations of Computer Science, 1987, pp. 427–438. 55. R. Gennaro, S. Jarecki, H. Krawczyk, and T. Rabin, “Secure distributed key generation for discrete-log based cryptosystems,” in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1999, pp. 295–310. 56. H. Krawczyk, “Cryptographic extraction and key derivation: The HKDF scheme,” in Proc. of Crypto. Springer, 2010, pp. 631–648. 57. C. P. Schnorr and M. Jakobsson, “Security of signed EIGamal encryption,” in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2000, pp. 73–89. 58. D. Bogatov, A. De Caro, K. Elkhiyaoui, and B. Tackmann, “Anonymous transactions with revocation and auditing in Hyperledger Fabric.” IACR Cryptol. ePrint Arch., vol. 2019, p. 1097, 2019. 59. C.-P. Schnorr, “Efficient identification and signatures for smart cards,” in Conference on the Theory and Application of Cryptology. Springer, 1989, pp. 239–252. 60. J. Camenisch, M. Drijvers, and A. Lehmann, “Anonymous attestation using the strong Diffie Hellman assumption revisited,” in International Conference on Trust and Trustworthy Computing. Springer, 2016, pp. 1–20. 61. P.-A. Fouque and D. Pointcheval, “Threshold cryptosystems secure against chosen-ciphertext attacks,” in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2001, pp. 351–368. 62. Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, “Enabling public auditability and data dynamics for storage security in cloud computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 5, pp. 847–859, 2010. 63. E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich et al., “Hyperledger Fabric: a distributed operating system for permissioned blockchains,” in Proc. 
of the Thirteenth EuroSys Conference, 2018, pp. 1–15.

References

183

64. M. Vukoli´c, “The quest for scalable blockchain fabric: Proof-of-work vs. BFT replication,” in International Workshop on Open Problems in Network Security. Springer, 2015, pp. 112–125. 65. A. De Caro and V. Iovino, “JPBC: Java pairing based cryptography,” in Proceedings of the 16th IEEE Symposium on Computers and Communications, ISCC 2011, Kerkyra, Corfu, Greece, June 28–July 1, 2011, pp. 850–855.

Chapter 6

Conclusion and Future Works

6.1 Conclusion

This monograph investigates data security challenges in future heterogeneous communications networks. Specifically, future networks are envisioned to integrate artificial intelligence (AI) into every aspect of resource management and service provisioning. To support AI in future networks, the quality and volume of user and system data become the driving force. However, as data can come from stakeholders in different trust domains, ranging from operators and vendors to service providers, a reliable solution for distributed data lifecycle management is urgently needed. To achieve reliable data management, this monograph introduces distributed ledger technology, i.e., blockchain, as the underlying architecture for network stakeholders. Specifically, the blockchain serves as a trusted and shared platform among stakeholders to record and update critical data lifecycle events for regulation-compliance purposes. While blockchain is a promising approach, its intrinsic storage and computation costs raise significant design challenges for practical applications. At the same time, on-chain data can suffer from privacy risks due to the transparent nature of the blockchain. Furthermore, the fairness of the platform cannot be preserved trivially, given the varied behaviors of blockchain participants. To balance decentralization with efficiency, privacy, and fairness in the blockchain-based approach, this monograph presents designs, implementations, and evaluations of three data security approaches: reliable data provenance, transparent data query, and fair data marketing.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 D. Liu, X. (Sherman) Shen, Blockchain-Based Data Security in Heterogeneous Communications Networks, Wireless Networks, https://doi.org/10.1007/978-3-031-52477-6_6


6.1.1 Reliable Data Provenance

In future networks, data provenance enables reliable archiving of network runtime data. Network stakeholders from different trust domains can adopt blockchain to collaboratively maintain a provenance graph with causal dependencies, which helps diagnose global network errors. However, due to the transparency of on-chain data, the blockchain-based provenance approach can increase the risk of leaking sensitive network information. Chapter 3 presents a distributed network provenance scheme based on blockchain to achieve reliable data provenance. To address the privacy challenge, Chap. 3 adopts an on/off-chain computation model based on zk-SNARK. By doing so, only succinct commitments of network provenance data need to be stored on the blockchain, without directly storing the original data. At the same time, off-chain provenance proofs can be generated and efficiently verified on the blockchain. The chapter formalizes and realizes "archiving security" for distributed network provenance and conducts extensive experiments to demonstrate the implementation feasibility of the proposed scheme.
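As a toy illustration of this on/off-chain split, the sketch below uses a plain SHA-256 hash commitment in place of the zk-SNARK-backed commitments of Chap. 3, and the `event` fields are hypothetical: only a succinct digest of a provenance record goes on-chain, while the record and its blinding randomness stay off-chain until an auditor asks for an opening.

```python
import hashlib
import json
import os

def commit(record: dict) -> tuple[str, bytes]:
    """Commit to a provenance record: only the digest is stored on-chain;
    the record and the blinding randomness stay off-chain."""
    r = os.urandom(16)  # blinding randomness, keeps the digest hiding
    payload = json.dumps(record, sort_keys=True).encode() + r
    return hashlib.sha256(payload).hexdigest(), r

def verify(digest: str, record: dict, r: bytes) -> bool:
    """Auditor re-computes the digest from the opened record."""
    payload = json.dumps(record, sort_keys=True).encode() + r
    return hashlib.sha256(payload).hexdigest() == digest

# Hypothetical provenance event for a network function.
event = {"vnf": "fw-01", "parent": "lb-02", "action": "config-update"}
digest, r = commit(event)        # digest is what the blockchain records
assert verify(digest, event, r)  # opening check succeeds for the real record
```

A real scheme would additionally attach a zk-SNARK proof that the committed records satisfy the provenance-graph constraints, so the opening itself never needs to be revealed on-chain.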

6.1.2 Transparent Data Query

Data query lets a data user efficiently retrieve data from a large data set and can include many query types, such as keyword and range queries. It is a fundamental module and can serve as the starting point for a variety of data applications. For a blockchain-based data query scheme, developers can adopt the smart contract technique to directly query on-chain data. However, this approach can incur heavy on-chain storage and computation costs. While the on/off-chain computation model with zk-SNARK reduces on-chain costs, it lacks efficiency in off-chain proof generation. To address the efficiency challenge, Chap. 4 presents a blockchain-based data query scheme. First, the chapter designs a blockchain-based framework for collaborative VNF management in future networks. The framework enables VNF providers to manage their VNF information and slice configurations in a transparent and trustworthy manner. Second, the chapter designs an on-chain authenticated VNF dictionary from Pedersen and Merkle commitments. The commitments are integrated with a two-level SNARK system to achieve efficient dictionary pruning and full query execution. By doing so, the random-access memory issue of the SNARK technique can be mitigated to significantly increase off-chain efficiency for proof generation. Furthermore, a proof-of-misbehavior strategy is adopted to reduce the on-chain verification costs of Merkle proofs. Finally, thorough security analysis and experiments on a real-world consortium blockchain are conducted, which show that the proposed scheme achieves significant efficiency improvements in off-chain proof generation compared with existing work.
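The authenticated-dictionary idea can be illustrated with a minimal Merkle tree sketch; the VNF entries are hypothetical, and a real scheme would combine this with Pedersen commitments and SNARK proofs as described above. Only the root is stored on-chain, and membership of a dictionary entry is verified against a logarithmic-size sibling path.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_tree(leaves):
    """Build a Merkle tree; returns the list of levels, leaves first."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, idx):
    """Sibling path from leaf idx up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((level[idx ^ 1], idx % 2))  # (sibling hash, node is right child)
        idx //= 2
    return path

def verify(root, leaf, path):
    node = h(leaf)
    for sib, is_right in path:
        node = h(sib + node) if is_right else h(node + sib)
    return node == root

# Hypothetical VNF dictionary entries.
entries = [b"vnf:firewall|v1.2", b"vnf:lb|v0.9", b"vnf:dpi|v2.0", b"vnf:nat|v1.0"]
levels = build_tree(entries)
root = levels[-1][0]                        # only the root goes on-chain
assert verify(root, entries[2], prove(levels, 2))
```

Dictionary pruning then amounts to proving membership (or consistent updates) for the small subset of entries a query touches, instead of re-executing the query over the full dictionary inside a SNARK circuit.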


6.1.3 Fair Data Marketing

In data marketing, data owners sell their data to buyers to boost data-driven applications in future networks, such as AI-assisted network management. To comply with the requirements of recent privacy laws, a transparent data marketing platform is required to record data usage purposes, data selling instances, etc. At the same time, due to the unpredictable behavior of rational players in data marketing, fairness issues must be addressed to motivate honest participation and enforce accountability against marketing misbehavior. To address the fairness challenge, Chap. 5 presents a blockchain–cloud data marketing scheme. First, the chapter designs a hybrid data marketing architecture that adopts the cloud as a processing unit and the blockchain as a marketing control unit, which complies with the transparent "joint controllers" model specified in the GDPR. Second, the chapter carefully tailors Sigma-based ZKP techniques to achieve consortium management and marketing fairness. Specifically, the chapter introduces a rational third-party cloud platform and designs a data marketing protocol with verifiable on-chain operation commitments. Furthermore, without relying on a centralized authority, the chapter realizes distributed issuance and tracing of anonymous credentials for data sellers. With extensive security analysis and real-world blockchain experiments, the chapter demonstrates the efficiency and feasibility of the proposed blockchain–cloud data marketing scheme.
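The Sigma-based ZKP building block can be sketched with a toy Schnorr proof of knowledge of a discrete logarithm, made non-interactive via the Fiat–Shamir heuristic. The tiny group parameters below are for demonstration only and are far too small for real security; a deployment would use a standardized prime-order group.

```python
import hashlib
import secrets

# Toy Schnorr group (demo only): p = 2q + 1 with p, q prime,
# and g = 4 generating the order-q subgroup of Z_p^*.
p, q, g = 467, 233, 4

def prove(x):
    """Non-interactive proof of knowledge of x such that y = g^x mod p."""
    y = pow(g, x, p)
    r = secrets.randbelow(q)
    t = pow(g, r, p)                    # Sigma-protocol commitment
    c = int.from_bytes(                  # Fiat-Shamir challenge from a hash
        hashlib.sha256(f"{g}|{y}|{t}".encode()).digest(), "big") % q
    s = (r + c * x) % q                  # response
    return y, (t, s)

def verify(y, proof):
    t, s = proof
    c = int.from_bytes(
        hashlib.sha256(f"{g}|{y}|{t}".encode()).digest(), "big") % q
    # Check g^s == t * y^c, i.e., g^(r + c*x) == g^r * (g^x)^c.
    return pow(g, s, p) == (t * pow(y, c, p)) % p

y, pi = prove(secrets.randbelow(q))
assert verify(y, pi)
```

Proofs of this shape let a data seller demonstrate possession of a credential secret without revealing it, which is the kind of statement the consortium-management and tracing protocols build on.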

6.2 Future Works

This monograph has studied three representative blockchain-based data security approaches and addressed their efficiency, privacy, and fairness challenges. For future work, this monograph further investigates two research directions that take AI techniques in future networks fully into consideration. In the following, the background, motivations, challenges, and potential solutions of the two research directions are briefly discussed.

6.2.1 On/off-Chain Computation Model with Modular Designs

The on/off-chain computation model enables blockchain nodes to offload on-chain computation tasks to less expensive off-chain parties. For example, this monograph has thoroughly investigated the construction of on/off-chain models with zk-SNARK techniques to design efficient blockchain-based data query schemes. By using on/off-chain computation models, expensive on-chain overheads can be significantly reduced for practical implementations of blockchain applications. However, as AI-assisted data processing tasks can play a critical role in future networks, simply adopting the SNARK-based computation model may not be sufficient. First, AI-assisted tasks can increase the computational overheads of model training and model inference, especially when large models are adopted, such as deep neural networks for language processing. Second, the SNARK-based approach lacks efficiency in off-chain proof generation, which can be prohibitively expensive for computations with large AI models. Beyond zk-SNARK, on/off-chain computation models can also be constructed from various verifiable computation (VC) techniques, including SGX, ZKP, and SMC under malicious settings. These techniques have distinctive advantages and limitations. For example, the SGX-based VC approach is computationally efficient but increases the costs of enclave/key management and is vulnerable to side-channel attacks. Therefore, designing efficient on/off-chain computation models for AI computations requires further research attention, where a modular design strategy can be a potential solution:

• Divide and conquer: A complex AI task can be divided into multiple subtasks, such as the linear and non-linear layers of a deep neural network. For each subtask, a proper privacy model can be constructed to quantify privacy requirements and risks. Based on the privacy model and the computational features of the subtask, efficient instantiations of specific VC techniques can be designed to improve the overall computation efficiency.
• Bridge gaps between subtasks: Different VC techniques can require distinctive data representations. For example, either Boolean or arithmetic circuits can be constructed, and commitment scheme designs can also differ considerably. Therefore, efficient linking methods between subtasks, such as commit-and-prove ZKPs, should be studied to bridge the gaps and achieve efficient on-chain verifications.
• Implementation and evaluation: Comprehensive implementations should be conducted, and specific optimizations, such as floating-point data representation, should be applied for different AI tasks. At the same time, management costs, such as enclave package updating and key escrow, should also be considered for practical use cases.

With the modular design strategy for on/off-chain computation models, efficient verifiable AI-based data processing can be further studied. That is, the computation results of AI tasks should be efficiently verifiable on the blockchain, which is important for achieving fair AI model sharing.
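The "bridge gaps between subtasks" idea above can be sketched as follows; hash commitments stand in for commit-and-prove ZKPs, and the two-stage "linear then ReLU" pipeline is hypothetical. Each subtask's output commitment becomes the binding input reference of the next subtask, so only the chain of commitments needs to be checked on-chain.

```python
import hashlib
import json

def commit(data) -> str:
    """Hash-based commitment to intermediate data (stand-in for a
    commit-and-prove ZKP commitment)."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

def run_subtask(fn, inputs, in_commit):
    """Check that the inputs open the previous commitment, compute the
    subtask, and commit to its output, linking the subtasks together."""
    assert commit(inputs) == in_commit, "input does not match previous commitment"
    outputs = fn(inputs)
    return outputs, commit(outputs)

# Hypothetical two-stage AI pipeline: a "linear" layer, then a "ReLU" layer.
x = [1.0, -2.0, 3.0]
c0 = commit(x)
h1, c1 = run_subtask(lambda v: [2 * t for t in v], x, c0)         # linear subtask
h2, c2 = run_subtask(lambda v: [max(t, 0.0) for t in v], h1, c1)  # ReLU subtask
# On-chain, only (c0, c1, c2) would be posted; each subtask's VC proof
# is verified against the commitment pair that brackets it.
assert h2 == [2.0, 0.0, 6.0]
```

In a full design, each subtask could use a different VC technique (SGX for the heavy linear algebra, ZKP for the sensitive parts) as long as adjacent subtasks commit to the shared intermediate values in a compatible way.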

6.2.2 Multi-party Fair AI Model Sharing

This monograph has investigated fair data sharing for future networks to fuel the training of AI models and the provisioning of personalized services. Instead of sharing the data itself, AI-assisted network management can also be boosted by direct model sharing among network stakeholders. Federated learning-based model training is a typical example. Specifically, edge service providers or mobile operators can train a local traffic prediction model using their own system runtime data. The local models can then be aggregated into a global traffic prediction model. By doing so, the efficiency of system-level network management is increased and local data privacy can be better preserved. However, model sharing is very different from data sharing and brings new design challenges. First, data trading operations can be simple and efficiently verified on the blockchain; for example, data integrity can be verified using a hash function, and the data source can be verified using a digital signature. In comparison, verification of AI models can be computationally expensive, especially for large models. Second, AI model sharing usually involves multiple parties rather than the two parties of data sharing. As a result, there can be more unpredictable behaviors, which makes marketing fairness a non-trivial task. Building on the aforementioned on/off-chain computation models, other emerging techniques can be integrated to achieve multi-party fair model sharing. For example, game theory is effective in modeling multi-party interactions, and time-locked hashes on the blockchain are widely used to enforce on-chain fairness. Specifically, designing fair AI model sharing for future networks can include the following aspects:

• Efficient model verification: AI model training can be divided into multiple submodules to be verified on a public testing data set. To be efficiently verified on the blockchain, commitments to both data sets and model parameters can be designed from Pedersen-, Merkle-, or MAC-based commitment schemes. Integrated with the on/off-chain computation models, critical parts of AI model inference can be publicly verified on the blockchain to determine model quality.
• Behavior modeling: From the analysis of real-world attacks, potential malicious behaviors in multi-party model sharing should be carefully investigated. For different attack methods, such as data poisoning or model poisoning attacks, efficient verification criteria should be designed. Moreover, for rational model owners, proper reward and penalty mechanisms should be specified in terms of financial or regulatory measures, which can be analyzed using game theory.
• Accountability contract design: Based on the above model verification and behavior modeling, an accountability smart contract can be designed to achieve on-chain fair model sharing. First, model quality can be verified to determine model owner behavior. Second, with proper rewards and penalties in the contract, effective accountability enforcement can motivate all participants to behave honestly.
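The efficient-model-verification aspect above can be sketched as follows; a hash commitment stands in for the Pedersen-, Merkle-, or MAC-based schemes, and the one-feature threshold classifier and test set are hypothetical. The model owner commits to the parameters on-chain, and a verifier re-opens the commitment and scores the model on a public test set before any reward would be released.

```python
import hashlib
import json

def commit_model(params) -> str:
    """Hash-based commitment to model parameters (stand-in for the
    Pedersen/Merkle/MAC-based commitment schemes in the text)."""
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()

def verify_quality(params, digest, test_set, threshold):
    """Re-check the commitment, then score the model on a public test set;
    an accountability contract would pay out only if accuracy >= threshold."""
    if commit_model(params) != digest:
        return False  # opened parameters do not match the on-chain commitment
    w, b = params["w"], params["b"]
    correct = sum(1 for x, label in test_set if (w * x + b >= 0) == label)
    return correct / len(test_set) >= threshold

model = {"w": 1.0, "b": -0.5}   # hypothetical one-feature classifier
digest = commit_model(model)     # posted on-chain by the model owner
tests = [(0.0, False), (1.0, True), (2.0, True), (-1.0, False)]
assert verify_quality(model, digest, tests, threshold=0.75)
```

For large models, the scoring step is exactly where the on/off-chain computation model comes in: inference on the test set runs off-chain with a VC proof, and only the commitment check and the proof verification happen on-chain.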

Index

B
Blockchain, 9, 23, 57, 92, 138, 185

C
Collaborative VNF management, 14, 92, 186
Consortium management, v, 19, 27, 143, 146, 147, 153, 167–170, 187

D
Data management, v, 4, 6–12, 17, 57, 91, 95, 139, 142, 185
Data marketing, v, vi, 12, 15–16, 18, 19, 54, 137–179, 185, 187
Data provenance, v, 12–13, 17, 18, 32, 54, 57–87, 91, 185, 186
Data query, v, 12–18, 54, 91–132, 185–187
Data security, v, vi, 12–17, 19, 23–54, 60, 185, 187
Dictionary pruning, v, 18, 96, 98, 106–108, 118, 119, 127–128, 132, 186
Digital signature, 17, 23, 24, 27, 37, 154, 189
Distributed network provenance, 13, 18, 58, 61–87, 186

F
Fairness, v, 10–13, 15–17, 97, 138–140, 142, 143, 146, 147, 153, 167, 170–171, 179, 185, 187, 189

H
Heterogeneous communications networks (HCN), v, vi, 1–2, 4–12, 19, 37, 54, 57–87, 91–132, 137–179, 185

I
Identity privacy, v, 11, 16, 18, 47, 138–140, 142, 143, 146, 153, 164, 170, 179

M
Model sharing, 19, 188–189
Modular design, 46, 187–188

N
Network intelligence, 3, 4

O
On/off-chain computation model, v, 14, 15, 18, 19, 50–54, 61, 62, 87, 97, 132, 186–189

P
Privacy-enhancing technology, 31–50, 54
Provenance digest, 64, 66, 68, 85

S
Smart contract, v, 9, 12, 14, 17, 27, 29–31, 53, 60, 85, 95, 97, 99, 121, 138, 141, 142, 167, 173, 186, 189

V
Virtualized network function (VNF), 3, 14, 18, 92, 96–132, 186
