English Pages 211 [222] Year 2020
Obaid Ur-Rehman, Natasa Zivic Security in Autonomous Driving
Authors Dr.-Ing. Obaid Ur-Rehman University of Siegen Hölderlinstr. 3 57068 Siegen Germany [email protected] Dr.-Ing. habil. Natasa Zivic Chair for Data Communications Systems University of Siegen Hölderlinstr. 3 57068 Siegen Germany [email protected]
ISBN 978-3-11-062707-7 e-ISBN (PDF) 978-3-11-062961-3 e-ISBN (EPUB) 978-3-11-062715-2 Library of Congress Control Number: 2020945304 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2020 Walter de Gruyter GmbH, Berlin/Boston Cover image: Just_Super/iStock/Getty Images Plus Typesetting: Integra Software Services Pvt. Ltd. Printing and binding: CPI books GmbH, Leck www.degruyter.com
Preface
Autonomous vehicles are not fiction anymore. Developments in recent years have turned the dream of autonomous vehicles into reality: vehicles are now able to navigate themselves safely with little or no human intervention. This, however, did not happen overnight. A modern vehicle draws its inspiration and attributes from technological advancements in many fields, such as embedded systems, computer networks, wireless and mobile communications, the Internet of things and, last but not least, data communications, network and computer security. The technology is still evolving at a rapid pace, and a whole array of standardization activities is under way. The goal of this book is to bring together and explain in simple words the ideas revolving around the emerging and fascinating field of autonomous driving. However, the focus of the book is on the security aspects of autonomous driving rather than on autonomous driving itself. Autonomous driving is an amalgam of modern technological advancements and classic automotive engineering. With the fusion of modern technology and the automotive field, an autonomous vehicle is becoming a smartphone on wheels. It is able to navigate itself, drive with little or no intervention of the driver and observe traffic rules as well as, or sometimes even better than, a human driver, all while taking safety and security into account. Just like a smartphone, today's vehicles are connected to the Internet for many reasons, such as user comfort, enhanced connectivity, over-the-air software updates or simply maintenance support. However, connecting such an autonomous vehicle to the Internet without ensuring its security would give hackers a chance to create havoc on the roads, because being connected to the Internet entails a high risk of being hacked.
If hacked, the vehicles could be remotely turned into enemy machines, creating havoc on public roads and causing life-threatening damage not only to road users, such as passengers and pedestrians, but also to the infrastructure. On the one hand, taking security into consideration has the potential to eliminate, or at least minimize, the risk of being hacked. On the other hand, it also enables integrating more features into vehicles, such as the ability to perform financial transactions on the fly without the risk of being robbed. The authors of the book have many years of experience in research and development in communications, network and computer security, data communications and embedded systems. The authors have also recently gained individual experience in different fields of the automotive industry where applied cryptography and security are used extensively. This fusion of traditional IT security with applied knowledge of the automotive world, together with the lack of a book on the subjects covered here, led to the idea of writing this book. The book starts in Chapter 1 by introducing cryptography, which lays the foundation of security. Chapter 2 addresses threat analysis and risk assessment, which are needed to better understand the risks associated with the development and integration of new technologies. With this knowledge, one is able to prioritize possible risks and address the critical ones. Chapter 3 addresses machine learning algorithms that are important for including artificial intelligence in autonomous vehicles. These concepts are extended in Chapter 4, where anomaly detection methods based on machine learning are discussed. Chapter 5 introduces distributed ledger technologies, which lay the foundations for secure machine-to-machine interactions; this includes a discussion of blockchain technology and its alternatives. Finally, Chapter 6 presents an algorithm for enhanced authentication and error correction through cognitive capability. This algorithm enables authentication of data, including images and video, in the presence of noisy transmissions. Such noisy transmissions may arise, for example, when a vehicle captures and transmits images needed for sensor functions, or exchanges software updates, multimedia and other information with the backend and infrastructure.
https://doi.org/10.1515/9783110629613-202
Contents
Preface
1 Introduction to Cryptography
1.1 Cryptography
1.1.1 Branches of Cryptology
1.1.1.1 Introduction
1.1.1.2 Cryptology
1.1.1.3 Cryptography
1.1.1.4 Encryption
1.1.1.5 Decryption
1.1.1.6 Cryptographic Key
1.1.1.7 Cryptographic Protocol
1.1.1.8 Cryptanalysis
1.1.1.9 Cryptanalyst
1.1.2 Cryptographic Design Principles
1.1.2.1 Confusion
1.1.2.2 Diffusion
1.1.2.3 Avalanche Effect
1.1.2.4 Random Oracle
1.1.2.5 Kerckhoffs’ Principle
1.1.3 Symmetric Cryptography
1.1.3.1 Secret Key
1.1.3.2 Number of Keys
1.1.4 Asymmetric Cryptography
1.1.4.1 Key Pair for Asymmetric Cryptography
1.1.4.2 Private Key
1.1.4.3 Public Key
1.1.4.4 Asymmetric Encryption
1.1.4.5 Digital Signatures
1.1.4.6 Combination of Encryption and Digital Signatures
1.1.4.7 Man-in-the-Middle Attack
1.1.4.8 Digital Certificate
1.2 One-Way Hash Function
1.2.1 Characteristics
1.2.2 Hash Functions in Practice
1.3 Block Cipher
1.3.1 Construction
1.3.2 Padding
1.3.3 Block Ciphers in Practice
1.3.3.1 Advanced Encryption Standard
1.3.3.2 Lightweight Cipher PRESENT
1.4 Block Cipher Modes of Operations
1.4.1 ECB Mode
1.4.2 CBC Mode
1.5 Bit Stream Ciphers
1.6 Message Authentication Codes
1.6.1 Authentication
1.6.2 MAC Generation Using a Symmetric Block Cipher
1.6.3 MAC Generation Using a Dedicated Hash Function
1.6.4 Security Aspects of MAC
1.7 Digital Signatures
1.7.1 Digital Signatures with Appendix
1.7.2 Digital Signatures with Message Recovery
1.7.3 RSA Algorithm
1.7.4 Digital Signature Algorithm
1.7.5 Elliptic Curve Cryptography
References
2 Threat Analysis and Risk Assessment
2.1 Background
2.1.1 Software in Automotive
2.1.2 Threat Model
2.1.3 Threat Analysis
2.2 Threat Analysis and Risk Assessment in Automotive
2.2.1 Security Analysis Methodologies
2.2.2 HEAVENS Project Approach
2.2.2.1 Threat-Level Parameters
2.2.2.2 Impact-Level Parameters
2.3 Case Study: Advanced Driver Assistance System
2.3.1 System Background
2.3.2 Vehicular Architecture
2.3.3 Important Elements of the System
2.3.4 Threat Identification
2.3.5 Risk Assessment
2.3.5.1 Automated Parking via Bluetooth
2.3.5.2 Remote Access – Mobile Communications
2.3.5.3 Remote Access – Wi-Fi
2.3.5.4 Radar Spoofing
2.3.5.5 Reflashing ADAS ECU through Physical Access
2.3.5.6 Summary
2.3.5.7 Security Requirements
References
3 Machine Learning
3.1 Machine Learning Categories
3.1.2 Supervised Machine Learning
3.1.2.1 Linear Regression
3.1.2.2 Logistic Regression
3.1.2.3 Support Vector Machines and Support Vector Regression
3.1.2.4 Decision Trees
3.1.2.5 Random Forests
3.1.2.6 Naïve Bayes
3.1.2.7 Artificial Neural Networks
3.1.2.8 Bootstrap Aggregating (Bagging) and Boosting
3.1.2.9 Stacked Aggregating
3.1.3 Unsupervised Machine Learning
3.1.3.1 Clustering Algorithms
References
4 Machine Learning for Anomaly Detection
4.1 Intrusion Detection Systems
4.1.1 Overview and Categorization
4.1.2 Categorization and Properties of ML Techniques in Intrusion Detection Systems
4.1.3 Security Levels of Intrusion Detection Systems
4.1.4 ML-Based Anomaly Detection
4.1.5 Cyberattacks on ML-Based Intrusion Detection Systems
4.1.5.1 Use of GAN Technology for Targeted Circumvention of Protection Systems
4.1.6 Use of IDS for Automotive CAN
4.1.6.1 CAN Bus Architecture
4.1.6.2 CAN Bus Vulnerabilities and Possible Attacks
4.1.6.3 Security Solutions for CAN Bus
References
5 Distributed Ledger Technologies
5.1 Introduction
5.2 Cryptocurrencies
5.3 Blockchain
5.3.1 Proof of Work
5.3.2 Bitcoin Vulnerabilities and Handling of Issues
5.3.3 Other Blockchain Cryptocurrencies
5.4 Tangle
5.4.1 IOTA Bundle
5.5 Hashgraph
5.6 DLT Applications in Autonomous Driving
5.6.1 Financial Sector
5.6.2 Other Sectors
5.6.3 Tracking Supply Chains
5.6.4 Autonomous Driving Systems
5.6.5 Vehicle Lock/Unlock
5.6.6 Payments Related to Cars
5.6.7 Ridesharing of Autonomous Cars
5.6.8 Unadulterated Reading of Mileage
5.6.9 Motivating Ecologically Responsible Driving
5.6.10 Reservation of Parking Places
5.6.11 Avoiding Traffic Obstacles
5.6.12 Data Exchange for Digital Twin Synchronization
5.6.13 Registering and Management of Serial Numbers
5.6.14 Registries as the Digital Proof of Ownership
5.6.15 Software and Documents Release Management
References
6 Self-Correcting and Authentication Algorithm for Automotive Applications
6.1 Self-Learning Algorithms
6.2 Related Work
6.3 Error Correction Codes
6.3.1 RS codes
6.3.2 Turbo Codes
6.3.3 Low Density Parity Check Codes
6.4 Two-Phase Self-Correcting Algorithm
6.4.1 Phase-I
6.4.2 Phase-II
6.5 Learning Property
6.6 Security Analysis
6.7 Simulation Results
References
Index
1 Introduction to Cryptography
1.1 Cryptography
1.1.1 Branches of Cryptology
1.1.1.1 Introduction
Cryptography is essential to providing security. This chapter introduces the basic cryptographic concepts and terminology, which are used in different contexts throughout the following chapters.
1.1.1.2 Cryptology
Although cryptography is the main topic of this chapter, there is a more general umbrella term, cryptology, which can be further split into cryptography and cryptanalysis.
1.1.1.3 Cryptography
Cryptography is the study and practice of methods for secure communication between two or more parties in the presence of adversaries. The goal is to conceal the meaning of the messages exchanged among the communication partners. Cryptography can be further split into subfields such as symmetric encryption, asymmetric encryption and security protocols. Among other things, cryptography provides
– Data confidentiality through encryption and decryption
– Data integrity
– Authentication of the communication partners
– Authentication of the data source
Cryptography is ubiquitous in the modern digital world. Its applications include
– Online banking
– E-mail exchange
– Internet communications
– Mobile communications
– Virtual private networks
Fig. 1.1 shows cryptology and some of its most important subfields.
https://doi.org/10.1515/9783110629613-001
Fig. 1.1: Branches of cryptology.
1.1.1.4 Encryption
Encryption applies mathematical transformations to the original data, called plaintext, to turn it into another form, called ciphertext, which cannot be understood by a third party or an adversary. The transformation is reversible: given the ciphertext and the right key, it is always possible to recover the corresponding plaintext. An encryption algorithm uses side information, called a cryptographic key, to perform the transformation. The key is shared with the recipient through a secure channel; it is not intended to be disclosed to third parties and should be known only to the intended recipients of the message. A simplified view of the encryption and decryption process with the help of a shared key K is shown in Fig. 1.2. The transmission can occur in time or in space. Transmission in time means that the encrypted data is stored, for example, in a database, and retrieved later, when it is decrypted. Transmission in space means that the encrypted data is sent to the receiver over an insecure communication channel, that is, a channel shared by many communication partners, including adversaries.
Fig. 1.2: Encryption and decryption.
1.1.1.5 Decryption
Decryption is the reverse of encryption: it transforms the ciphertext back into the corresponding plaintext by applying the inverse mathematical transformations in reverse order. For decryption to succeed, the same key that was used for encryption is needed. It is therefore important to share this key with the desired communication partner and not to disclose it to anyone else.
1.1.1.6 Cryptographic Key
A cryptographic key is needed for encryption as well as decryption. The key is exchanged between the communication partners before the encrypted communication starts. It should be kept secret and never be disclosed: if the key is disclosed, all past and future communication encrypted with it is compromised.
1.1.1.7 Cryptographic Protocol
A cryptographic protocol performs a security-related function by applying cryptographic algorithms; that is, it describes how cryptographic algorithms are used in practice for a specific purpose. The most prominent cryptographic protocols used in practice include
– Hypertext transfer protocol secure (HTTPS)
– Transport layer security (TLS)
– Internet protocol security (IPsec)
– Medium access control security (MACsec)
1.1.1.8 Cryptanalysis
Cryptanalysis is the study and application of methods to defeat the objectives of cryptography, for example, by decrypting a ciphertext without knowledge of the key. This can be done in many ways, such as through mathematical analysis or by exploiting weaknesses in the implementation of cryptographic algorithms.
Implementation weaknesses may lie in the software implementation of the algorithm, in the hardware on which the algorithm runs, in the design of the security protocols that use the cryptographic algorithms, in the implementation of these protocols, or in a combination of these.
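The following minimal Python sketch makes the ideas of Sections 1.1.1.4 to 1.1.1.6 concrete: one shared key both encrypts and decrypts, and a different key does not recover the plaintext. It is a toy stream cipher assumed purely for illustration: the keystream is built by hashing key, nonce and a counter with SHA-256, and the helper names `keystream`, `toy_encrypt` and `toy_decrypt` are hypothetical. It offers no integrity protection and is not a substitute for a vetted cipher such as AES in an authenticated mode.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream (illustration only, NOT secure)."""
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))


def toy_decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """XOR is its own inverse, so decryption reapplies the same keystream."""
    return toy_encrypt(key, nonce, ciphertext)


shared_key = secrets.token_bytes(16)  # exchanged beforehand over a secure channel
nonce = secrets.token_bytes(12)       # must never repeat for the same key
ct = toy_encrypt(shared_key, nonce, b"open the trunk")
print(toy_decrypt(shared_key, nonce, ct))             # the original plaintext
wrong = toy_decrypt(secrets.token_bytes(16), nonce, ct)  # unrelated key: garbage
```

The nonce is part of this assumed design so that the same key never produces the same keystream twice; reusing a keystream would let an attacker XOR two ciphertexts and cancel it out.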
1.1.1.9 Cryptanalyst
A person who performs cryptanalysis is called a cryptanalyst; in layman's terms, a cryptanalyst is typically called an attacker or a hacker. The main goal of a cryptanalyst is to defeat the objectives of cryptography. A cryptanalyst is interested in the plaintext corresponding to a given ciphertext; however, the ideal outcome for a cryptanalyst is to extract the cryptographic key itself, since that exposes every message protected by it. A cryptanalyst can use multiple methods to obtain this key. The most basic of these is a brute-force attack, which tries every possible key until the right one is found. For a key length of 32 bits, this means trying all $2^{32}$ possible keys, a set known as the key space. The effort required by the cryptanalyst grows exponentially with the key length: doubling the key length to 64 bits increases the effort of a brute-force attack to $2^{64}$ trials, which is $2^{32}$ times more work. With much larger key lengths, such as 1,024 bits, a brute-force attack is impractical with current technology. A cryptanalyst therefore needs smarter methods to reduce the effort of searching for a key. For example, a dictionary attack may be used instead of searching the whole key space, which succeeds when keys are derived from predictable, human-chosen passwords.
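A brute-force search can be sketched on a deliberately tiny key space. The sketch below assumes a hypothetical toy cipher with a 16-bit key (XOR with the SHA-256 hash of the key) and a known plaintext/ciphertext pair, which the attacker uses to recognize the right key; all names are illustrative. The loop finishes in a fraction of a second for 16 bits, while the printed key-space sizes show why the same loop is hopeless for 64 or 128 bits.

```python
import hashlib


def toy_encrypt(key: int, plaintext: bytes) -> bytes:
    """Toy cipher with a 16-bit key: XOR with SHA-256(key) (illustration only)."""
    ks = hashlib.sha256(key.to_bytes(2, "big")).digest()  # 32-byte keystream
    return bytes(p ^ k for p, k in zip(plaintext, ks))


def brute_force(plaintext: bytes, ciphertext: bytes, key_bits: int = 16):
    """Try every key in the key space until one maps plaintext to ciphertext."""
    for candidate in range(2 ** key_bits):
        if toy_encrypt(candidate, plaintext) == ciphertext:
            return candidate, candidate + 1  # key found, number of trials
    return None, 2 ** key_bits


secret_key = 0xBEEF
known_pt = b"speed=50km/h"           # known plaintext (at most 32 bytes here)
ct = toy_encrypt(secret_key, known_pt)
found, trials = brute_force(known_pt, ct)
print(f"recovered key {found:#06x} after {trials} trials")

for bits in (32, 64, 128):
    print(f"{bits}-bit key space: {2 ** bits:.3e} keys")
```

Each additional key bit doubles the work, so a 32-bit search would take $2^{16}$ times as long as this 16-bit one.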
1.1.2 Cryptographic Design Principles
1.1.2.1 Confusion
Confusion aims to make the relationship between the key and the ciphertext as complicated as possible, so that the statistics of the ciphertext cannot be linked to the key or to the plaintext. Thus, given a ciphertext, it should be impossible to derive the corresponding plaintext, or parts of it, without the key. In today's ciphers, confusion is typically achieved through bit or byte substitution.
1.1.2.2 Diffusion
Another important principle used in modern cryptography is diffusion. Diffusion aims to dissipate the statistical structure of the plaintext over the whole ciphertext. It is achieved when each input bit affects many output bits, which also means that each output bit is affected by many input bits.
1.1.2.3 Avalanche Effect
The avalanche effect means that changing a single input bit of a cryptographic function changes approximately 50% of the output bits, which is a consequence of good diffusion. This is a desirable property of good cryptographic algorithms. Essentially, when a single bit of the plaintext is changed, each bit of the ciphertext changes with a probability of 0.5.
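The avalanche effect can be observed directly. Python's standard library has no block cipher, so this sketch uses SHA-256 as a stand-in cryptographic function (an assumption for illustration): it flips one input bit, counts how many of the 256 output bits change, and then averages over every possible single-bit flip of the message. The helper names are hypothetical.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def changed_bits(a: bytes, b: bytes) -> int:
    """Hamming distance: number of differing bits between two byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


message = b"CAN frame 0x1A0: steering angle 12.5 deg"
h1 = hashlib.sha256(message).digest()
h2 = hashlib.sha256(flip_bit(message, 0)).digest()
distance = changed_bits(h1, h2)
print(f"{distance} of 256 output bits changed ({distance / 256:.1%})")

# Averaging over every single-bit flip of the message approaches 50 %.
n_bits = len(message) * 8
avg = sum(
    changed_bits(h1, hashlib.sha256(flip_bit(message, i)).digest())
    for i in range(n_bits)
) / n_bits
print(f"average over {n_bits} flips: {avg / 256:.1%}")
```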
1.1.2.4 Random Oracle
A random oracle is an idealized black box that answers every new input with a truly random output. If the same input is queried again, the random oracle returns the same output as before, while different inputs receive independent random outputs. Once again, the concept of diffusion can be linked to a random oracle, where every input bit influences many output bits to achieve randomness.
1.1.2.5 Kerckhoffs’ Principle
In cryptography, hiding the cryptographic algorithms is not considered a good idea. This is based on Kerckhoffs’ principle [1], which states that the security of a cryptosystem should not rely on keeping the cipher secret, but on the key alone. A cryptosystem should therefore remain secure even if an attacker knows everything about the system, such as the encryption and decryption algorithms. The only exception is the secret key, which must not be public knowledge.
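A random oracle cannot be built in practice, but it can be simulated in a proof or a test by lazy sampling: a fresh random output is drawn the first time an input is queried and then stored, so repeated queries are answered consistently. The class `RandomOracle` below is a hypothetical illustration of this modeling idea, not a hash function.

```python
import secrets


class RandomOracle:
    """Idealized random oracle simulated by lazy sampling (a model, not a real hash)."""

    def __init__(self, output_bytes: int = 32):
        self.output_bytes = output_bytes
        self._table = {}  # maps each input seen so far to its random output

    def query(self, message: bytes) -> bytes:
        # A new input is answered with a fresh, uniformly random output ...
        if message not in self._table:
            self._table[message] = secrets.token_bytes(self.output_bytes)
        # ... and a repeated input gets exactly the same answer as before.
        return self._table[message]


oracle = RandomOracle()
a1 = oracle.query(b"input A")
a2 = oracle.query(b"input A")  # same input, same output
b = oracle.query(b"input B")   # different input, independent output
```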
1.1.3 Symmetric Cryptography
In symmetric cryptography, a single key is used for both encryption and decryption. This key must not be shared with unintended recipients of the encrypted data, which is why it is called a secret or private key.
1.1.3.1 Secret Key
A secret key is the key used in symmetric cryptographic techniques by a specified set of entities [2]. Cryptography based on a single key is therefore also known as symmetric or private key cryptography. Before encrypted communication begins, the key must be exchanged between the communication partners over a “secure” channel, as shown in Fig. 1.3. The sender encrypts at the source and transmits the ciphertext over an insecure channel, such as the Internet. An adversary, Eve, may be able to obtain the messages transmitted over the insecure channel, but not the key. A secure channel can be established through key exchange protocols, which are described later in this chapter.
1.1.3.2 Number of Keys
Assume there are n participants who want to exchange encrypted data. If each participant shares a separate key with every other participant, the total number of keys that must be exchanged is $n(n-1)/2$. For 100 participants, this already amounts to 4,950 keys. The number of keys thus grows quadratically with the number of participants, which makes key management, that is, key generation, exchange, storage and maintenance, quite challenging.
Fig. 1.3: Symmetric encryption and decryption.
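The quadratic growth of the number of pairwise keys from Section 1.1.3.2 can be checked with a few lines of Python. The function name is hypothetical, and the formula $n(n-1)/2$ is cross-checked by enumerating every unordered pair of participants.

```python
from itertools import combinations


def symmetric_keys_needed(n: int) -> int:
    """Number of pairwise secret keys among n participants: n(n-1)/2."""
    return n * (n - 1) // 2


for n in (2, 10, 100, 1000):
    # Cross-check the formula by counting every unordered pair of participants.
    pairs = sum(1 for _ in combinations(range(n), 2))
    assert pairs == symmetric_keys_needed(n)
    print(f"{n:>5} participants -> {symmetric_keys_needed(n):>7} keys")
```

Going from 100 to 1,000 participants multiplies the number of keys by roughly 100, which is what makes pairwise symmetric keys hard to manage in large networks.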
1.1.4 Asymmetric Cryptography
In asymmetric cryptography, proposed by Diffie and Hellman [3], the communication partners use different keys for encryption and decryption. Data encrypted with one key cannot be decrypted with the same key, only with the other key of the pair; hence the name asymmetric cryptography. Each communication partner gets a key pair, of which one key, called the public key, is made publicly available, whereas the other, called the private key, is kept secret by its owner. It is computationally infeasible to obtain the private key from the public key, so the public key can be shared with all communication participants.
1.1.4.1 Key Pair for Asymmetric Cryptography
A key pair is a pair of related keys, called the private and public keys, used in asymmetric cryptography. The private key defines the private transformation and the public key defines the public transformation [4].
1.1.4.2 Private Key
A private key is the key of an entity’s asymmetric key pair that should only be known to and used by that entity [5]. It is reserved solely for that entity and must not be made public, hence the name.
1.1.4.3 Public Key
The public key of an entity’s asymmetric key pair can be made public [5]. However, because the key is intended to be public, it must fulfill the following conditions:
– Given a public key, it should not be possible to derive the private key from it through inverse operations.
– Even if one can transform chosen plaintexts or ciphertexts with the public key, it should still not be possible to deduce the private key.
Asymmetric cryptography can be used for multiple purposes, such as encryption and digital signatures. In principle, the private key is mathematically determined by the public key. To make deriving it sufficiently difficult, asymmetric algorithms are based on problems that are believed to be computationally hard, such as factoring large integers or computing discrete logarithms. The underlying functions are called one-way trapdoor functions. A one-way function has the property that, for a given input argument, the function value is easy to compute, whereas the inverse transformation, that is, going back from the function value to the input argument, is not possible with reasonable effort. If there is a parameter, such as a private key, that makes the inverse transformation easy, the function is called a one-way trapdoor function.
1.1.4.4 Asymmetric Encryption
When asymmetric cryptography is used for data encryption, the public key is used for encryption and the private key for decryption. Let us assume that there are two communication partners, A and B, as shown in Fig. 1.4, and that A wants to send a confidential message to B. To do so, A takes the public key of B and encrypts the message with it. Since only B has the corresponding private key, only B can decrypt a message encrypted with B’s public key.
Fig. 1.4: Asymmetric encryption and decryption.
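The one-way trapdoor idea and the A-to-B encryption of Fig. 1.4 can be illustrated with textbook RSA on tiny primes. RSA itself is covered later in the book (Section 1.7.3); here it serves purely as an example, with toy parameters p = 61, q = 53 and e = 17 chosen for readability and no padding, so it is insecure by design. Knowing p and q (the trapdoor) makes computing the private exponent d easy, whereas an attacker who only sees n would first have to factor it.

```python
# Toy "textbook RSA" parameters (tiny primes, no padding): illustration only.
p, q = 61, 53
n = p * q                # 3233, part of B's public key
phi = (p - 1) * (q - 1)  # 3120, computable only with the trapdoor p and q
e = 17                   # public exponent, gcd(e, phi) == 1
d = pow(e, -1, phi)      # private exponent via modular inverse (Python 3.8+)


def encrypt(m: int, public_key) -> int:
    """Anyone holding the public key (e, n) can compute c = m^e mod n."""
    e, n = public_key
    return pow(m, e, n)


def decrypt(c: int, private_key) -> int:
    """Only the holder of (d, n) can invert the trapdoor: m = c^d mod n."""
    d, n = private_key
    return pow(c, d, n)


public_key_B = (e, n)
private_key_B = (d, n)
m = 65                                 # message encoded as an integer < n
c = encrypt(m, public_key_B)           # A encrypts with B's public key
recovered = decrypt(c, private_key_B)  # only B's private key undoes it
print(n, d, c, recovered)
```

In the hybrid use of asymmetric encryption described in the following paragraph, m would be a short symmetric key rather than the bulk message.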
Asymmetric encryption can also be used to encrypt secret (shared) keys. A secret key for symmetric encryption is encrypted using the public key of the intended recipient, and only the intended recipient, who holds the corresponding private key, can decrypt and use the secret key. This combination, commonly called hybrid encryption, lets the slower asymmetric algorithm protect only a short key, while the bulk data is encrypted symmetrically. The public key must be made available to A, for example, through a public directory that everyone can access, as shown in Fig. 1.5.
Fig. 1.5: Public directory of keys.
1.1.4.5 Digital Signatures
Besides encryption, another important use of asymmetric cryptography is digital signatures. If a message is encrypted with the private key of the sender, the receiver can decrypt it with the public key of the sender. If this succeeds, it is evident who sent the message, so the source of the message can be authenticated. Because this is equivalent to signing, it is called a digital signature. Digital signatures can be used for various purposes, such as providing entity authentication, data origin authentication, data integrity and nonrepudiation services [6].
1.1.4.6 Combination of Encryption and Digital Signatures
A message can be signed and encrypted at the same time. This is done by first signing the message with the private key of the sender A and then encrypting the signed message with the public key of the receiver B, as shown in Fig. 1.6. In effect, the signed message is transmitted in encrypted form over a public network.
Fig. 1.6: Sign and encrypt.
1.1.4.7 Man-in-the-Middle Attack
If an attacker has access to the messages exchanged between two legitimate communication partners and can intercept and modify them, the attacker can perform a man-in-the-middle attack. In public key cryptography, an attacker, say Eve (E for short), can launch a man-in-the-middle attack to intercept the communication between the legitimate users A and B, and possibly also to replace it with fabricated messages. This works as follows:
– A sends its public key to B; E intercepts it and replaces it with its own public key.
– B sends its public key to A; E also intercepts it and replaces it with its own public key.
From this point onward, when A sends an encrypted message to B, E intercepts the message, which is encrypted with E’s public key. E can therefore decrypt it with its own private key and read its content. For the communication to continue, E then encrypts the message with the public key of B and forwards it to B. B decrypts the message with its own private key and cannot notice that the message has already been disclosed. The same holds for the communication in the opposite direction, from B to A. Since public keys are exchanged openly, often over insecure channels, a man-in-the-middle attack is of particular significance in asymmetric cryptography compared with symmetric cryptography. A man-in-the-middle attack is shown in Fig. 1.7: E pretends to be B while communicating with A and, at the same time, pretends to be A while communicating with B.
Fig. 1.7: Man-in-the-middle attack.
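The key substitution of the man-in-the-middle attack can be simulated with toy textbook RSA (tiny primes, no padding, illustration only). The sketch follows the A-to-B direction of Fig. 1.7: Eve replaces B's public key with her own, reads the message and forwards a re-encrypted copy, and B decrypts it without any error. The prime pairs and function names are assumptions, chosen so that e = 17 is a valid exponent for both key pairs.

```python
def make_keypair(p: int, q: int, e: int = 17):
    """Toy textbook-RSA key pair from two small primes: returns (public, private)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # Python 3.8+
    return (e, n), (d, n)


def encrypt(m: int, public_key) -> int:
    e, n = public_key
    return pow(m, e, n)


def decrypt(c: int, private_key) -> int:
    d, n = private_key
    return pow(c, d, n)


B_pub, B_priv = make_keypair(89, 97)    # legitimate receiver B
E_pub, E_priv = make_keypair(107, 109)  # attacker Eve

# Step 1: B's public key is intercepted; E hands A its own public key instead.
key_A_thinks_is_B = E_pub

# Step 2: A encrypts a secret (an integer smaller than both moduli).
m = 1234
c_to_B = encrypt(m, key_A_thinks_is_B)

# Step 3: E decrypts with its own private key, reads m and re-encrypts for B.
stolen = decrypt(c_to_B, E_priv)
forwarded = encrypt(stolen, B_pub)

# Step 4: B decrypts successfully and has no indication that E read the message.
received = decrypt(forwarded, B_priv)
print(stolen, received)
```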
1.1.4.8 Digital Certificate
A digital certificate is a proof of ownership of a public key. It consists of an entity’s public key signed by a trusted third party, known as a certification authority (CA). Public key certificates are normally available in public key databases and contain information about the owner of the certificate, such as name, address and public key. By signing the certificate, the CA guarantees that the public key belongs to the owner and that the information about the owner is correct. An attacker who substitutes a different public key, as in the man-in-the-middle attack, cannot produce a matching CA signature, so the substitution is detected by anyone who trusts the CA’s public key. One of the most commonly used standards for digital certificates is the X.509 v3 certificate format. X.509 is a standard of the International Telecommunication Union (ITU-T) for public key infrastructures. As shown in Fig. 1.8, an X.509 certificate binds a public key to a distinguished name, an email address and certain other information about the owner.
Fig. 1.8: X.509 v3 digital certificate.
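The following sketch shows why a CA signature defeats key substitution. It is a heavily simplified, hypothetical stand-in for an X.509 certificate: the "certificate" is a Python dictionary, the CA signs the SHA-256 hash (reduced modulo its RSA modulus) of the subject name and public key with textbook RSA, and the CA modulus is built from the Mersenne primes $2^{61}-1$ and $2^{31}-1$, far too small for real use. Real certificates use ASN.1/DER encoding and standardized signature padding.

```python
import hashlib

# Toy CA key pair from two Mersenne primes (illustration only, far too small).
P, Q = 2**61 - 1, 2**31 - 1
CA_N = P * Q
CA_E = 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))  # CA private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """SHA-256 of the data, reduced into the toy RSA message space."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def tbs(cert: dict) -> bytes:
    """The 'to-be-signed' part: the subject name bound to its public key."""
    return f"{cert['subject']}|{cert['public_key']}".encode()


def issue(subject: str, public_key) -> dict:
    """CA signs the binding of subject and public key with its private key."""
    cert = {"subject": subject, "public_key": public_key}
    cert["signature"] = pow(digest(tbs(cert)), CA_D, CA_N)
    return cert


def verify(cert: dict) -> bool:
    """Anyone can check the signature using only the CA's public key."""
    return pow(cert["signature"], CA_E, CA_N) == digest(tbs(cert))


cert_B = issue("CN=Vehicle B", (17, 8633))
forged = dict(cert_B, public_key=(17, 11663))  # Eve swaps in her own key
print(verify(cert_B), verify(forged))
```

Because Eve cannot produce a valid CA signature for her own key, A rejects the forged certificate, provided A already trusts the CA's public key.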
1.2 One-Way Hash Function
1.2.1 Characteristics
A one-way hash function H converts an input M of variable length into an output H(M) of fixed length. This property has many uses, for example, computing a checksum over data to be transmitted. Hash functions are standardized in an ISO standard [7]. The output of a hash function is called the hash code or simply the hash. The defining feature of a one-way hash function is its one-way property: given an input M, it is easy to calculate the output value H(M) = h, but given h, it is computationally infeasible to find an input that hashes to it. A one-way hash function is shown in Fig. 1.9.
Fig. 1.9: One-way hash.
In addition to being easy to compute, a one-way hash function should fulfill the following properties, the first of which formalizes the one-way property:
Preimage Resistance: If a hash value h is given, it should be difficult to find a message M such that
$$H(M) = h \tag{1.1}$$
Second Preimage Resistance: If M and H(M) = h are given, it should be difficult to find another message M′ ≠ M such that
$$H(M') = H(M) = h \tag{1.2}$$
Collision Resistance: It should be difficult to find two different messages M and M′ such that
$$H(M) = H(M') \tag{1.3}$$
These properties are shown in Fig. 1.10.
Fig. 1.10: Properties of a one-way hash.
A one-way hash function is useful in the following cases:
– To compress a message into a short “fingerprint”, which can be used in digital signatures. The fingerprint can also be used to verify a given value without disclosing the value itself.
– To generate pseudorandom numbers, since the output of a hash function should appear random.
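Collision resistance is harder to achieve than it may seem because of the birthday paradox: for an n-bit hash, a collision is expected after roughly $2^{n/2}$ random messages. The sketch below demonstrates this on a deliberately weakened hash, SHA-256 truncated to 24 bits (an assumption for illustration, since collisions on full SHA-256 are infeasible to find), where a collision typically appears after a few thousand attempts. The function names are hypothetical.

```python
import hashlib
from itertools import count


def truncated_hash(message: bytes, n_bits: int) -> int:
    """Toy weak hash: the first n_bits of SHA-256(message) as an integer."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - n_bits)


def find_collision(n_bits: int):
    """Birthday search: hash distinct messages until two share a hash value."""
    seen = {}  # hash value -> first message that produced it
    for i in count():
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, n_bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


m1, m2, tries = find_collision(24)
print(f"{m1!r} and {m2!r} collide after {tries} messages")
```

Every extra 2 bits of hash length only doubles the expected effort, which is why collision resistance, not preimage resistance, usually dictates the hash length.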
1.2.2 Hash Functions in Practice
Seven dedicated hash functions, including RIPEMD and its strengthened versions RIPEMD-128 and RIPEMD-160, are specified in RFC 5652 [8]. The other four hash functions are SHA-1, SHA-256, SHA-384 and SHA-512. SHA-1 has a hash code length of up to 160 bits; SHA-256 and SHA-384 have hash code lengths of up to 256 and 384 bits, respectively; and SHA-512 has a fixed hash code length of 512 bits. These dedicated hash functions make use of a round function that is called iteratively.
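The fixed output lengths of the SHA family can be checked with Python's `hashlib`. This minimal sketch computes full-length (untruncated) hash codes of a 1-byte and a 1-MB input for each algorithm and confirms that the output length does not depend on the input length. RIPEMD-160 is left out because its availability in `hashlib` depends on the underlying OpenSSL build. Note also that SHA-1 is no longer considered collision resistant, since a practical collision was published in 2017.

```python
import hashlib

ALGORITHMS = ("sha1", "sha256", "sha384", "sha512")


def output_bits(name: str, data: bytes) -> int:
    """Length in bits of the digest that algorithm `name` produces for `data`."""
    return len(hashlib.new(name, data).digest()) * 8


sizes = {}
for name in ALGORITHMS:
    short_input = output_bits(name, b"x")
    long_input = output_bits(name, b"x" * 1_000_000)
    assert short_input == long_input  # fixed output length regardless of input size
    sizes[name] = short_input
    print(f"{name}: {short_input}-bit hash code")
```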
1.3 Block Cipher
1.3.1 Construction
A block cipher operates on bit strings of a fixed, predefined length called blocks [4]. In block encryption, the encryption algorithm processes one block of plaintext at a time to produce a corresponding block of ciphertext. If the plaintext is longer than one block, it is split into multiple blocks, the last block is padded if needed, and the plaintext is encrypted block by block to produce ciphertext blocks. A block cipher operation is shown in Fig. 1.11.
Fig. 1.11: Block cipher encryption.
1.3.2 Padding
Most often, the plaintext length is not a multiple of the block length, so the plaintext cannot be divided into blocks of equal length. In such cases, the plaintext needs to be padded with additional bits to make its length a multiple of the block length. Padding is defined as appending extra bits to a data string [7]. There are many ways to perform padding; one method commonly used in the literature is PKCS7 padding [8].
PKCS7 Padding
In PKCS7 padding, each padding byte is set to the number of padding bytes that are added. For example, if the plaintext is 14 bytes long and needs to be split into blocks of 8 bytes, then 2 padding bytes, each with the value 2, are appended to complete the second block. If the plaintext length is already a multiple of the block length, for example, 16 bytes, then a full extra block of 8 bytes is appended, and each byte in this block is set to 8. The padding can be removed by reading the value of the last byte and removing that many bytes from the data. PKCS7 padding is depicted in Fig. 1.12.
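The PKCS7 rule can be written as two short functions. This is a minimal sketch with hypothetical function names, using the 8-byte block length of the example above (AES would use 16); unpadding validates the padding bytes before stripping them and raises an error for malformed input.

```python
def pkcs7_pad(data: bytes, block_size: int = 8) -> bytes:
    """Append n bytes of value n, where 1 <= n <= block_size."""
    if not 1 <= block_size <= 255:
        raise ValueError("block size must be between 1 and 255 bytes")
    n = block_size - len(data) % block_size  # a full block is added if already aligned
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes, block_size: int = 8) -> bytes:
    """Check and strip PKCS7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    n = padded[-1]
    if not 1 <= n <= block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


print(pkcs7_pad(b"A" * 14)[-2:])  # the two padding bytes of the 14-byte example
print(pkcs7_pad(b"A" * 16)[-8:])  # the extra block of the 16-byte example
```

In a deployed protocol, padding should be combined with integrity protection, because detailed padding error reports can leak information to an attacker.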
1.3.3 Block Ciphers in Practice 1.3.3.1 Advanced Encryption Standard The Advanced Encryption Standard (AES) is a symmetric block cipher described in ISO/IEC 18033-3:2010 [9]. AES operates on data blocks of 128 bits and supports three key lengths: 128, 192 and 256 bits. The corresponding algorithms are called AES-128, AES-192 and AES-256. The number of rounds Nr depends on the key length: it is 10, 12 and 14 for key lengths of 128, 192 and 256 bits, respectively. The encryption process using AES is shown in Fig. 1.13.
1 Introduction to Cryptography
Fig. 1.12: PKCS7 padding.
Fig. 1.13: AES encryption.
In AES, the data after each transformation is called the “state”. Each round consists of four byte-oriented transformations called subBytes, shiftRows, mixColumns and addRoundKey. The state is represented as a two-dimensional array of bytes; for the data block length of 128 bits, it is an array of 4×4 bytes. As the names suggest, subBytes performs byte substitution using a nonlinear substitution function, implemented through substitution boxes called S-boxes. shiftRows cyclically shifts the rows of the state array by different offsets. mixColumns combines the bytes of each column of the state array using an invertible linear transformation. Together, mixColumns and shiftRows provide diffusion. addRoundKey XORs the current state with the round key Ki, where 0 ≤ i ≤ Nr. The decryption process uses the inverse of each transformation used in encryption. These inverse transformations are denoted invsubBytes, invshiftRows and invmixColumns. The decryption process using AES is shown in Fig. 1.14.
Fig. 1.14: AES decryption.
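Two of the AES round transformations, shiftRows and mixColumns, together with their inverses, can be sketched as follows. This is an illustrative sketch, assuming the state is a 4×4 list of rows of byte values (state[r][c] is row r, column c) and using the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) for multiplication in GF(2^8); subBytes and addRoundKey are not shown. The column used in the check is a commonly cited mixColumns test vector.

```python
from typing import List

State = List[List[int]]  # state[r][c]: row r, column c, one byte each

MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
INV_MIX = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def xtime(b: int) -> int:
    """Multiply a byte by x (i.e. by 2) in GF(2^8) modulo 0x11B."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def shift_rows(s: State) -> State:
    """Rotate row r cyclically to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(s)]


def inv_shift_rows(s: State) -> State:
    """Rotate row r cyclically to the right by r positions."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(s)]


def _mix(s: State, m: List[List[int]]) -> State:
    """Multiply every state column by the 4x4 matrix m over GF(2^8)."""
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        for r in range(4):
            acc = 0
            for k in range(4):
                acc ^= gmul(m[r][k], s[k][c])
            out[r][c] = acc
    return out


def mix_columns(s: State) -> State:
    return _mix(s, MIX)


def inv_mix_columns(s: State) -> State:
    return _mix(s, INV_MIX)


if __name__ == "__main__":
    state = [[0xDB, 0, 0, 0], [0x13, 0, 0, 0], [0x53, 0, 0, 0], [0x45, 0, 0, 0]]
    print([hex(row[0]) for row in mix_columns(state)])  # ['0x8e', '0x4d', '0xa1', '0xbc']
```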
The round keys K′i used in decryption are derived from the round keys Ki using the following transformation:

$$K'_i = \begin{cases} K_i, & i = 0 \text{ or } i = N_r \\ \mathrm{invmixColumns}(K_i), & 1 \le i \le N_r - 1 \end{cases} \qquad (1.4)$$
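Eq. (1.4) can be sketched in code, assuming each round key is given as 16 bytes in the usual AES column order (bytes 4c to 4c + 3 form column c) and that the encryption round keys K0, …, KNr have already been produced by the AES key expansion, which is not shown. The sketch only applies the transformation of Eq. (1.4); how the decryption loop consumes these keys is not modeled.

```python
from typing import List


def xtime(b: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def inv_mix_column(col: List[int]) -> List[int]:
    """Apply invmixColumns to a single 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        gmul(a0, 14) ^ gmul(a1, 11) ^ gmul(a2, 13) ^ gmul(a3, 9),
        gmul(a0, 9) ^ gmul(a1, 14) ^ gmul(a2, 11) ^ gmul(a3, 13),
        gmul(a0, 13) ^ gmul(a1, 9) ^ gmul(a2, 14) ^ gmul(a3, 11),
        gmul(a0, 11) ^ gmul(a1, 13) ^ gmul(a2, 9) ^ gmul(a3, 14),
    ]


def inv_mix_columns_key(key: bytes) -> bytes:
    """Apply invmixColumns to each of the four columns of a 16-byte round key."""
    out = bytearray()
    for c in range(4):
        out += bytes(inv_mix_column(list(key[4 * c:4 * c + 4])))
    return bytes(out)


def decryption_round_keys(round_keys: List[bytes]) -> List[bytes]:
    """K'_i = K_i for i = 0 and i = Nr, invmixColumns(K_i) for 1 <= i <= Nr - 1."""
    nr = len(round_keys) - 1
    return [k if i in (0, nr) else inv_mix_columns_key(k) for i, k in enumerate(round_keys)]
```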
1.3.3.2 Lightweight Cipher PRESENT For constrained and embedded environments, such as automotive control units or sensors with limited resources, AES might not be a suitable option [10]. In such cases, a lightweight cryptographic algorithm optimized for resource-constrained systems is preferred. The PRESENT algorithm is a lightweight symmetric block cipher specified in ISO/IEC 29192-2:2012 [11]. It processes data blocks of 64 bits and supports key lengths of 80 and 128 bits; the respective algorithms are called PRESENT-80 and PRESENT-128. PRESENT processes data in 31 rounds. A total of 32 round keys are derived from the given key: one key is used in each round, and the 32nd key is used after the last round. As in AES, the data after each transformation is called the state. Each round has three steps:
– addRoundKey
– sBoxLayer
– pLayer
The function addRoundKey XORs the current round key with the current state, that is,

$$b_j \rightarrow b_j \oplus k^i_j, \quad 0 \le j \le 63,$$

where the state is $b_{63} \ldots b_0$ and the round key is $K_i = k^i_{63} \ldots k^i_0$ for $1 \le i \le 32$. sBoxLayer is a nonlinear substitution function that replaces each 4-bit value with another 4-bit value. Since the 64-bit state consists of 16 such 4-bit values, 16 substitutions are applied in parallel in each round. sBoxLayer is shown in Fig. 1.15. The pLayer function permutes the bits of the state to scramble the output state of the current round. pLayer is shown in Fig. 1.16.
Fig. 1.15: PRESENT sBoxLayer.
After the last round, addRoundKey is applied once more to the state with the last (32nd) round key, which has a whitening effect on the state. The key update function produces the round key for the next round. Encryption using PRESENT is shown in Fig. 1.17. The decryption process of the PRESENT algorithm, shown in Fig. 1.18, applies the inverse operations in the reverse order of encryption. It uses addRoundKey, invsBoxLayer and invpLayer.
Fig. 1.16: PRESENT pLayer.
Fig. 1.17: PRESENT encryption.
Here addRoundKey is the same as in encryption, since XORing the same round key twice cancels out. invsBoxLayer is the inverse of the substitution performed by sBoxLayer, that is, the 4-bit to 4-bit substitutions are reversed. invpLayer applies the inverse bit permutation, moving each bit back to the position it had before pLayer.
Fig. 1.18: PRESENT decryption.
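The PRESENT round structure described above can be sketched for the 80-bit key variant. The S-box table, the bit permutation P(i) = 16·i mod 63 (with bit 63 fixed) and the 80-bit key schedule (rotate the key register left by 61 bits, pass its top four bits through the S-box, XOR the round counter into bits 19 to 15) are taken from the original PRESENT specification rather than from the text above. State and keys are Python integers with bit 0 as the least significant bit. This is an unoptimized, non-constant-time illustration, not a production implementation.

```python
from typing import List

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(x) for x in range(16)]
MASK80 = (1 << 80) - 1


def perm(i: int) -> int:
    """pLayer destination of state bit i."""
    return 63 if i == 63 else (16 * i) % 63


def s_box_layer(state: int, box: List[int]) -> int:
    """Apply a 4-bit S-box to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for n in range(16):
        out |= box[(state >> (4 * n)) & 0xF] << (4 * n)
    return out


def p_layer(state: int) -> int:
    """Move bit i of the state to position perm(i)."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << perm(i)
    return out


def inv_p_layer(state: int) -> int:
    """Move bit perm(i) back to position i."""
    out = 0
    for i in range(64):
        out |= ((state >> perm(i)) & 1) << i
    return out


def round_keys_80(key: int) -> List[int]:
    """Derive the 32 round keys K1..K32 (leftmost 64 bits of the key register)."""
    keys = [key >> 16]
    for counter in range(1, 32):
        key = ((key << 61) | (key >> 19)) & MASK80               # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))  # S-box on top nibble
        key ^= counter << 15                                     # counter into bits 19..15
        keys.append(key >> 16)
    return keys


def encrypt(block: int, key: int) -> int:
    """31 rounds of addRoundKey, sBoxLayer, pLayer, then addRoundKey with K32."""
    keys = round_keys_80(key)
    state = block
    for i in range(31):
        state = p_layer(s_box_layer(state ^ keys[i], SBOX))
    return state ^ keys[31]


def decrypt(block: int, key: int) -> int:
    """Inverse operations in reverse order: addRoundKey, invpLayer, invsBoxLayer."""
    keys = round_keys_80(key)
    state = block ^ keys[31]
    for i in reversed(range(31)):
        state = s_box_layer(inv_p_layer(state), INV_SBOX) ^ keys[i]
    return state
```

The all-zero plaintext and key can be checked against the test vectors published with the PRESENT specification.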
1.4 Block Cipher Modes of Operations A block cipher transforms N-bit plaintext blocks into N-bit ciphertext blocks and vice versa. Used on its own, this fixed mapping can be exploited, for example, because identical plaintext blocks always yield identical ciphertext blocks under the same key. Therefore, data longer than one block is encrypted and decrypted using a block cipher mode of operation. The following five modes of operation are defined for block ciphers in ISO/IEC 10116:2017 [12]:
– Electronic codebook (ECB)
– Cipher block chaining (CBC)
– Cipher feedback (CFB)
– Output feedback (OFB)
– Counter (CTR)
In the following, only the ECB and CBC modes are discussed in order to illustrate the usage and properties of modes of operation. Note that the five modes listed above are not the only modes of operation used in practice. Other noteworthy modes include the ciphertext stealing mode used for disk storage, standardized in IEEE Standard 1619 [13] and NIST Special Publication [14]. The Galois/counter mode (GCM) and the Galois message authentication code (GMAC) are standardized by NIST [15]. GCM supports authenticated encryption, that is, authentication and confidentiality at the same time, and is widely adopted for its performance, since its computation can be parallelized. GMAC is the authentication-only counterpart of GCM. The offset codebook mode (OCB) is another mode of operation that supports authenticated encryption. OCB has three versions, OCB1 [16], OCB2 [17] and OCB3 [18].
1.4.1 ECB Mode Using ECB mode is equivalent to applying the block cipher on its own. The plaintext is divided into blocks and each block is encrypted independently of the other blocks. This resembles a codebook (dictionary) in which each plaintext block is looked up to find its ciphertext block, and vice versa. The last block of the message may need to be padded so that it fills a complete block. Encryption and decryption of data using ECB mode are shown in Fig. 1.19.
Fig. 1.19: ECB mode encryption and decryption.
As each block of N bits is processed independently of the other blocks, the blocks can be encrypted in any order without affecting decryption. Under the same key, two identical plaintext blocks always yield identical ciphertext blocks in ECB mode. This property is critical, because repetitive sequences often occur in databases and data communication protocols. It also facilitates chosen-plaintext attacks, in which an attacker obtains pairs of plaintext and ciphertext blocks. A bit or burst error in a ciphertext block corrupts only the corresponding plaintext block, which, due to the avalanche effect described earlier, is then garbled as a whole; the error does not propagate beyond the affected block. If block boundaries are lost during transmission, for example, due to transmission errors, encryption and decryption lose synchronization.
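Since Python's standard library provides no block cipher, the following sketch uses a hypothetical toy 128-bit block cipher: a four-round Feistel network whose round function is truncated HMAC-SHA256. It stands in for AES only to keep the example self-contained and must not be used for real encryption. The sketch shows that identical plaintext blocks produce identical ciphertext blocks in ECB mode and that swapping ciphertext blocks simply swaps the decrypted plaintext blocks.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes of the toy cipher
ROUNDS = 4  # Feistel rounds of the toy cipher


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    """Toy Feistel round function: truncated HMAC-SHA256 (illustrative stand-in)."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """Encrypt every block independently; len(data) must be a multiple of BLOCK."""
    return b"".join(encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK))


def ecb_decrypt(key: bytes, data: bytes) -> bytes:
    return b"".join(decrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK))


if __name__ == "__main__":
    pt = b"SAME BLOCK 16 B!" * 2 + b"different block!"
    ct = ecb_encrypt(b"k" * 16, pt)
    print(ct[:16] == ct[16:32])  # True: repetition in the plaintext is visible
```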
1.4.2 CBC Mode In CBC mode, each plaintext block is XORed with the previous ciphertext block before it is encrypted, so the encryption of each block depends on the previous block. For the chaining mechanism to work for the first block as well, an agreed-upon starting value called the initialization vector (IV) is required, which takes the place of the previous ciphertext block in the first step. Unlike the key, the IV is assumed to be public. The same plaintext results in the same ciphertext only if both the key and the IV are repeated. Since the blocks are chained, each ciphertext block depends on the IV and all previous plaintext blocks. This means that the plaintext blocks cannot be rearranged or shuffled. Using a different IV prevents the same plaintext from producing the same ciphertext under the same key. Encryption and decryption using CBC mode are shown in Fig. 1.20.
Fig. 1.20: CBC mode.
An erroneous ciphertext block disrupts the decryption of that block and of the following block. If the ith ciphertext block is erroneous, approximately 50% of the bits of the ith plaintext block are wrong due to the avalanche effect [19]. In the (i + 1)th plaintext block, only those bits are wrong that were disturbed in the ith ciphertext block, because that ciphertext block is XORed directly into the decryption result of the next block. The blocks after that decrypt correctly. Consequently, successful decryption of the last block does not guarantee that the entire message was decrypted successfully.
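The chaining and the error behavior described above can be reproduced with a short sketch. Because Python's standard library has no block cipher, a hypothetical toy Feistel cipher built from HMAC-SHA256 stands in for a real block cipher such as AES; it is for illustration only. The IV is passed explicitly and the data is assumed to be padded already.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes of the toy cipher
ROUNDS = 4  # Feistel rounds of the toy cipher


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    """Toy Feistel round function: truncated HMAC-SHA256 (illustrative stand-in)."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """C_i = E(P_i XOR C_(i-1)) with C_0 = IV; data must already be padded."""
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        prev = encrypt_block(key, _xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """P_i = D(C_i) XOR C_(i-1) with C_0 = IV."""
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out.append(_xor(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)
```

Flipping one bit in the second ciphertext block garbles the second plaintext block, flips exactly the same bit in the third plaintext block, and leaves the first and fourth blocks intact, so a correct last block says nothing about the earlier ones.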
1.5 Bit Stream Ciphers Bit stream encryption is used when a bit stream must be encrypted without delay. In bit stream ciphers, a single bit is encrypted at a time. A possible shortcoming of bit stream encryption is that decryption might return wrong data if the received bit stream is erroneous. The synchronization and error propagation properties are therefore of particular importance in bit stream encryption. The CFB and OFB modes of operation can be used for bit stream encryption.
OFB mode is an example of a synchronous stream cipher, and CFB mode an example of a self-synchronizing stream cipher. However, both are very inefficient for bit stream encryption, because only one bit of each block cipher output is used for the XOR operation. Bit stream encryption is depicted in Fig. 1.21.
Fig. 1.21: Bit stream encryption.
In practice, stream cipher algorithms are based on linear feedback shift registers (LFSRs), supplemented by nonlinear feedback or nonlinear functions to increase the linear equivalent, that is, the length of the shortest LFSR that can generate the same sequence. This matters because every periodic sequence, whatever generator produced it, can also be generated by a linear shift register. The only provably secure encryption method is stream or block encryption with a one-time key stream. A one-time key stream means that each bit takes the value 0 or 1 with probability 0.5, independently of all previously generated bits, and that no part of the key stream has been used before. The practical problem with the one-time pad is that the one-time key stream has to be shared between the communication partners before encryption starts. Stream ciphers used in practice include RC4, SEAL, Enocoro and Trivium [20].
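A minimal sketch of an LFSR keystream generator follows, using a hypothetical 4-bit register with feedback polynomial x^4 + x + 1, chosen only for illustration. Because this polynomial is primitive, every non-zero start state yields the maximal period 2^4 − 1 = 15. A purely linear generator like this one is not secure on its own, which is why practical designs add the nonlinear components mentioned above.

```python
from typing import Iterator, List


def lfsr_bits(state: int) -> Iterator[int]:
    """Fibonacci LFSR for x^4 + x + 1, i.e. a[n+4] = a[n] XOR a[n+1].

    `state` holds the 4 current bits (bit 0 is the next output) and must be non-zero.
    """
    if not 0 < state < 16:
        raise ValueError("state must be a non-zero 4-bit value")
    while True:
        yield state & 1                          # output the lowest bit
        feedback = (state ^ (state >> 1)) & 1    # bit 0 XOR bit 1
        state = (state >> 1) | (feedback << 3)   # shift right, insert feedback at the top


def keystream(state: int, n: int) -> List[int]:
    gen = lfsr_bits(state)
    return [next(gen) for _ in range(n)]


def xor_bits(bits: List[int], key_bits: List[int]) -> List[int]:
    """Bitwise stream encryption and decryption: c_i = m_i XOR k_i."""
    return [b ^ k for b, k in zip(bits, key_bits)]


if __name__ == "__main__":
    print(keystream(0b0001, 15))  # [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
```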
1.6 Message Authentication Codes 1.6.1 Authentication Message authentication codes (MACs) are data integrity mechanisms that compute a short string as a complex function of every bit of the data and of a secret key [21]. Their main security property is unforgeability, that is, without knowledge of the secret key, it should not be possible to predict the MAC on any new data string [21]. A MAC algorithm is defined [22] as an algorithm that maps a string of input bits and a secret key to a string of bits of fixed length. Additionally, the function must satisfy the following two properties:
– The function can be efficiently computed for any key and input string.
– For a given key, without prior knowledge of the key, it is computationally infeasible to find the function value for a given input string. This holds even if a set of input strings and their corresponding function values is known, where the ith input string might have been chosen after observing the function values of the first i − 1 input strings (for an integer i > 1).
MACs are also used to verify data integrity when there is a risk of intentional or unintentional changes to the data during transmission or storage. Data integrity is defined as the property that the data has not been altered or destroyed in an unauthorized manner. ISO has the following three standards for MAC generation mechanisms:
– Mechanisms for MAC using block ciphers [22]
– Mechanisms for MAC based on dedicated hash functions [21]
– MAC mechanisms based on a universal hash function [23]
MAC algorithms can also be used to ensure data origin authentication. Data origin authenticity means [21] that a message originated from an entity in possession of the shared secret key used to generate the MAC tag. This works as follows. A cryptographic checksum, called the MAC or MAC tag, is calculated on the data to be protected using a secret key and is appended to the data. The verifier also generates the MAC tag on the data and compares it with the appended tag to verify both the origin and the integrity of the message. The function for calculating a MAC tag over a message M of length m is a symmetric cryptographic function that produces a MAC tag of fixed length n, where typically n ≤ m. Because the MAC generation function compresses the m-bit input to an n-bit MAC, multiple messages can have the same MAC tag. MACs have many applications, for example, in banking, data transmission networks, industrial applications and, more recently, automotive systems.
To verify the order and completeness of message sequences, a time-varying parameter such as a sequence number or a timestamp is appended to the message before its MAC tag is calculated. The receiver receives a possibly modified message M′ and a possibly modified MAC′. The receiver then calculates MAC′′ on the received message M′ and compares it with the received MAC′. If MAC′′ is equal to MAC′, the receiver concludes that the message was received without errors. This also assures the receiver that the message originated from the sender with whom it shares the secret key. The process of verification using a MAC is shown in Fig. 1.22. A MAC can be generated using
– a symmetric block cipher [22], or
– a dedicated cryptographic hash function, for example, SHA-256 or RIPEMD-160, combined with a symmetric key [21].
Fig. 1.22: MAC verification.
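The generate-and-verify procedure, including a sequence number as the time-varying parameter, can be sketched with Python's standard hmac module. HMAC-SHA256 is used here as the MAC function; the 8-byte big-endian encoding of the sequence number and the function names are illustrative assumptions.

```python
import hashlib
import hmac


def mac_tag(key: bytes, seq: int, message: bytes) -> bytes:
    """Sender: compute the MAC over the sequence number and the message."""
    return hmac.new(key, seq.to_bytes(8, "big") + message, hashlib.sha256).digest()


def verify(key: bytes, seq: int, message: bytes, received_tag: bytes) -> bool:
    """Receiver: recompute MAC'' on the received data and compare it with MAC'."""
    expected = mac_tag(key, seq, message)
    return hmac.compare_digest(expected, received_tag)  # constant-time comparison


if __name__ == "__main__":
    key = b"shared-secret-key"
    tag = mac_tag(key, 7, b"brake")
    print(verify(key, 7, b"brake", tag))  # True
    print(verify(key, 8, b"brake", tag))  # False: replay under another sequence number
```

In this sketch the receiver is assumed to track the sequence number it expects next, so a replayed message, verified against a newer sequence number, fails the comparison.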
1.6.2 MAC Generation Using a Symmetric Block Cipher Six algorithms are specified in ISO/IEC 9797-1:2011 [22] for generating an m-bit MAC using an N-bit symmetric block cipher such as DES, 3DES or AES. This mechanism is also known as CBC-MAC [12]. In this approach, the message M is divided into k N-bit blocks M1, M2, . . ., Mk. The last block might need to be padded so that the message length is a multiple of N. The padding is used only temporarily for calculating the MAC tag; it is not transmitted or stored along with the message. The blocks are then processed by the block cipher in CBC mode, and the leftmost m bits of the final N-bit output are taken as the MAC tag. The sender and the receiver both compute the MAC on the message in the same way. If they obtain the same MAC, the receiver assumes that the message arrived unaltered and was sent by someone who holds the same key K.
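The basic CBC-MAC computation can be sketched as follows. Since the standard library has no block cipher, a hypothetical keyed function, the first 16 bytes of HMAC-SHA256(K, x), stands in for an N = 128-bit block cipher E_K; CBC-MAC only needs the forward direction. The padding follows padding method 2 of ISO/IEC 9797-1 (a single 1 bit, then zeros), chaining starts from an all-zero block, and none of the optional final transformations of the standard are applied.

```python
import hashlib
import hmac

N = 16  # block length in bytes (N = 128 bits)


def block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for E_K; a real system would use a block cipher such as AES."""
    return hmac.new(key, block, hashlib.sha256).digest()[:N]


def pad_method2(message: bytes) -> bytes:
    """Append a single 1 bit (byte 0x80), then zero bytes up to a multiple of N."""
    padded = message + b"\x80"
    return padded + b"\x00" * (-len(padded) % N)


def cbc_mac(key: bytes, message: bytes, m_bits: int = 64) -> bytes:
    """Chain the padded blocks in CBC mode and keep the leftmost m bits of the last output."""
    if m_bits % 8 or not 0 < m_bits <= 8 * N:
        raise ValueError("m must be a multiple of 8 between 8 and 128")
    padded = pad_method2(message)  # used only for the computation, never transmitted
    h = bytes(N)
    for i in range(0, len(padded), N):
        block = padded[i:i + N]
        h = block_encrypt(key, bytes(a ^ b for a, b in zip(h, block)))
    return h[:m_bits // 8]
```

The receiver would call cbc_mac on the received message with the shared key and compare the result with the received tag, preferably with hmac.compare_digest.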
1.6.3 MAC Generation Using a Dedicated Hash Function ISO/IEC 9797-2:2011 [21] specifies three algorithms for MAC generation that use a secret key and a hash function or its round function. The underlying dedicated hash functions are chosen from [7]. The strength of the data integrity and message authentication mechanisms depends on many factors, including the length and secrecy of the key, the length and strength of the hash function, the length of the MAC tag and the specific mechanism.
The three mechanisms specified in ISO/IEC 9797-2:2011 [21] are:
– MDx-MAC: It makes a minor adjustment to the round function by adding a key to the additive constants of the round function.
– HMAC: In contrast to MDx-MAC, it calls the complete hash function twice.
– The third mechanism is a variant of MDx-MAC that accepts only short strings as input.
1.6.4 Security Aspects of MAC Length Extension Attack Length extension attacks can be performed on MACs of the form H(key || message), that is, constructions in which the key is prefixed to the message and the hash function is, for example, MD5, SHA-1 or SHA-256. Let the message M, sent from A to B, be a sequence of blocks x = (x1, x2, x3, . . ., xn), and let K be the key. A computes the authentication tag on M as

$$m = \mathrm{MAC}_K(x) = h(K \,\|\, x_1 \,\|\, x_2 \,\|\, x_3 \,\|\, \cdots \,\|\, x_n) \qquad (1.5)$$
The problem is that the MAC for another message x′ = (x1, x2, x3, . . ., xn, xn+1), obtained by appending a block xn+1 to x, can be constructed from m without knowledge of the secret key K (strictly, x′ must also contain the padding that the hash function appended to K || x, which depends only on its length). Thus, B can be tricked into accepting x′ as a valid message, even though only x was authenticated by A. The attack is possible because the hash is computed iteratively: processing the additional block xn+1 only requires the output of the previous hash computation, which equals A’s tag m, and xn+1 itself, but not the secret key. This problem can be addressed using the hash-based message authentication code (HMAC). It uses two hashes, called the inner and the outer hash. The key K is first XORed with an inner pad (ipad), and the result (K ⊕ ipad) is prepended as a block to the message blocks (x1, x2, . . ., xn). This results in the hash

$$h_{\mathrm{ipad},x} = H\big((K \oplus \mathrm{ipad}) \,\|\, x_1 \,\|\, x_2 \,\|\, \cdots \,\|\, x_n\big) \qquad (1.6)$$
As a next step, the key is XORed with the outer pad (opad) and prepended to $h_{\mathrm{ipad},x}$ as calculated above. The result is hashed to produce the HMAC of message x as

$$\mathrm{HMAC}(x) = H\Big((K \oplus \mathrm{opad}) \,\|\, H\big((K \oplus \mathrm{ipad}) \,\|\, x_1 \,\|\, x_2 \,\|\, \cdots \,\|\, x_n\big)\Big) \qquad (1.7)$$
The ipad and opad are chosen as constant bit patterns, where

$$\mathrm{opad} = \texttt{0x5c5c}\ldots\texttt{5c} \qquad (1.8)$$

and

$$\mathrm{ipad} = \texttt{0x3636}\ldots\texttt{36} \qquad (1.9)$$
The lengths of opad and ipad are each equal to one block length of the hash function; the key K is padded with zeros to the same length (a key longer than one block is hashed first). Forgery Attack In a forgery attack, an attacker predicts the MAC of a message M without knowing the secret key. If the attacker can do this for at least one message, the attack is called an “existential forgery”. If the attacker can do this for a specific message of its choice, the attack is called a “selective forgery”. Forgeries are further classified as verifiable or nonverifiable, depending on whether the attacker can check beforehand whether the forged MAC is correct.
1.7 Digital Signatures 1.7.1 Digital Signatures with Appendix Digital signatures can be generated in multiple ways. In a digital signature with appendix [6], a hash is computed on the message, the hash is signed, and the signature is transmitted along with the original message as an appendix. The receiver needs the original message as input for verification: it regenerates the hash and checks it against the received signed hash. Digital signature schemes with appendix are standardized in ISO/IEC 14888-1:2008 [6] and are the most common form of digital signature. Signature generation with this method is shown in Fig. 1.23 and signature verification is shown in Fig. 1.24.
Fig. 1.23: Digital signature generation [6].
Fig. 1.24: Digital signature verification [6].
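The hash-then-sign flow of a signature with appendix can be sketched with the RSA transformations of Section 1.7.3, using the toy parameters of the RSA example given there (n = 209, e = 17, d = 53). Reducing the SHA-256 hash modulo n is an assumption made only so that the toy modulus can sign it; standardized schemes use large moduli and a defined encoding of the hash, and these toy values offer no security.

```python
import hashlib

# Toy RSA key from p = 11, q = 19; far too small for real use.
N, E, D = 209, 17, 53


def hash_to_int(message: bytes) -> int:
    """Hash the message and reduce it modulo N (assumption for the toy modulus)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signer: S = h(M)^d mod n; S is sent as an appendix to M."""
    return pow(hash_to_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verifier: recompute h(M) from the received message and compare it with S^e mod n."""
    return pow(signature, E, N) == hash_to_int(message)
```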
1.7.2 Digital Signatures with Message Recovery In a digital signature with message recovery [24, 25], the verification process reveals all or part of the protected message. The message is supplemented with redundancy, transformed by the signature algorithm, and transmitted. At the receiver, the original message is not needed separately for verification, since it can be obtained through the inverse transformation. It is recommended that at least 50% of the input to signature generation is redundancy. Since the complete input has to be smaller than the maximum input length of the signature algorithm, digital signatures with message recovery can only be applied to very short messages. Some variants support a recoverable part of the message and a non-recoverable part protected by a hash value, which is also included in the digital signature.
1.7.3 RSA Algorithm One of the best-known asymmetric encryption methods is the RSA algorithm, named after its inventors Ronald Rivest, Adi Shamir and Leonard Adleman [26]. RSA-based digital signature schemes with message recovery are standardized in ISO/IEC 9796-3:2006 [25]. The security of these signature schemes is based on the factorization problem, that is, the difficulty of decomposing a large number n into its prime factors. The encryption, that is, the mathematical transformation E that converts the plaintext M into the ciphertext C, is

$$C = E(M) = M^e \bmod n \qquad (1.10)$$
The inverse transformation D, which recovers the plaintext M from the ciphertext C, is

$$M = D(C) = C^d \bmod n \qquad (1.11)$$

The message M is represented by an integer between 0 and n − 1. Due to the modular reduction, C is also between 0 and n − 1. Messages whose numerical representation is larger than n − 1 are divided into blocks; the RSA algorithm is therefore essentially a block algorithm. It is a commutative process, that is,

$$M = D(E(M)) \quad \text{and} \quad M = E(D(M)) \qquad (1.12)$$

It is therefore suitable for both confidentiality and digital signatures. The digital signature, that is, the mathematical transformation D applied to the message M, yields the signature S:

$$S = D(M) = M^d \bmod n \qquad (1.13)$$

The inverse transformation E recovers the message M from the signature:

$$M = E(S) = S^e \bmod n \qquad (1.14)$$
RSA can be used to generate digital signatures with appendix or with message recovery. The public key is (e, n) and the secret key is (d, n), where n is the product of two very large, freely chosen prime numbers p and q, that is, n = p⋅q. When the method was introduced, its inventors proposed using primes of about one hundred decimal digits for p and q. Today, n is mostly chosen with a length of 1,024–2,048 bits. The RSA scheme is shown in Fig. 1.25.
Fig. 1.25: The RSA encryption and digital signature.
Example of RSA-Based Encryption Let us choose

$$p = 11, \quad q = 19, \quad e = 17. \qquad (1.15)$$

This gives (p − 1)(q − 1) = 180 and

$$n = pq = 209, \quad d = 53, \qquad (1.16)$$

since e · d = 17 · 53 = 901 = 5 · 180 + 1. Let the message M be ‘101’, which is the decimal number 5. The corresponding ciphertext is

$$C = 5^{17} \bmod 209 = 80. \qquad (1.17)$$

The inverse transformation gives back the plaintext:

$$M = 80^{53} \bmod 209 = 5. \qquad (1.18)$$
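The worked example can be reproduced with Python's built-in three-argument pow, which performs modular exponentiation; pow(e, -1, phi) computes the modular inverse and needs Python 3.8 or later.

```python
p, q, e = 11, 19, 17
n = p * q                   # 209
phi = (p - 1) * (q - 1)     # 180
d = pow(e, -1, phi)         # 53, since 17 * 53 = 901 = 5 * 180 + 1

M = 0b101                   # message '101' = decimal 5
C = pow(M, e, n)            # Eq. (1.17): 5^17 mod 209
M_recovered = pow(C, d, n)  # Eq. (1.18): 80^53 mod 209

print(n, d, C, M_recovered)  # 209 53 80 5
```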
Generation of RSA Key System An RSA key system consists of the following keys:
– The public key: (e, n)
– The private key: (d, n)
Note that n is a common component of the public and private keys. The main task in RSA is to find a very large random number n such that n = p⋅q, where p and q are prime numbers. p and q should differ slightly in length, so that they cannot be found simply by testing all prime numbers around $n^{1/2}$. The generation and testing of prime numbers is specified in ISO/IEC 18032:2005 [27]. After p and q have been generated, a number e is determined such that

$$\gcd\big(e, (p-1)(q-1)\big) = 1 \qquad (1.19)$$

Then d is determined such that

$$e \cdot d \bmod \big((p-1)(q-1)\big) = 1 \qquad (1.20)$$
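Key generation according to Eqs. (1.19) and (1.20) can be sketched with an explicit extended Euclidean algorithm. The sketch takes the primes p and q as given; generating and testing large primes as in ISO/IEC 18032 is not shown, and the small values used to exercise it are for illustration only.

```python
from math import gcd
from typing import Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y = g = gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def rsa_keys(p: int, q: int, e: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the public key (e, n) and the private key (d, n) for primes p and q."""
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:  # Eq. (1.19)
        raise ValueError("e must be coprime to (p-1)(q-1)")
    _, x, _ = extended_gcd(e, phi)
    d = x % phi           # Eq. (1.20): e*d mod (p-1)(q-1) = 1
    n = p * q
    return (e, n), (d, n)


if __name__ == "__main__":
    print(rsa_keys(11, 19, 17))  # ((17, 209), (53, 209))
```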
The extended Euclidean algorithm is used to calculate d. The length of the secret key d of the RSA key system must be at least N/3, where N is the length of n. Since the security of the whole scheme depends on d and on the factorization of n rather than on e, e can be chosen so that the computational effort is minimized. e can even be chosen as a constant, provided that it is large enough to rule out certain chosen-plaintext attacks. It must be ensured that e and (p − 1)(q − 1) have no common divisor and that d and n are relatively prime. If e is a prime that does not divide (p − 1)(q − 1), the first condition is automatically fulfilled. Mathematical Background n, e and d are constructed such that

$$e \cdot d \bmod \big((p-1)(q-1)\big) = 1 \qquad (1.21)$$
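and

$$\varphi(n) = (p-1)\cdot(q-1) \qquad (1.22)$$

where p and q are prime numbers. Then

$$e \cdot d = t \cdot \varphi(n) + 1, \quad t \ge 1 \qquad (1.23)$$

and, by Euler’s theorem ($M^{\varphi(n)} \bmod n = 1$ for M coprime to n; the result extends to all M by the Chinese remainder theorem, because n is a product of distinct primes),

$$M = C^d \bmod n = (M^e \bmod n)^d \bmod n = (M^e)^d \bmod n = M^{e d} \bmod n = M^{t \cdot \varphi(n) + 1} \bmod n = \big(M^{\varphi(n)}\big)^t \cdot M \bmod n = 1^t \cdot M \bmod n = M \qquad (1.24)$$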
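For the toy parameters of the RSA example (n = 209, e = 17, d = 53), Eqs. (1.23) and (1.24) can be confirmed exhaustively. The check deliberately includes the messages that share a factor with n, for which Euler’s theorem does not apply directly; they still decrypt correctly because n is a product of distinct primes.

```python
n, e, d = 209, 17, 53
phi = (11 - 1) * (19 - 1)  # Eq. (1.22): 180

assert (e * d - 1) % phi == 0  # Eq. (1.23): e*d = t*phi(n) + 1
t = (e * d - 1) // phi         # t = 5

# Eq. (1.24): decrypting the encryption of every M in [0, n - 1] returns M.
failures = [M for M in range(n) if pow(pow(M, e, n), d, n) != M]
print("t =", t, "failures:", failures)  # t = 5 failures: []
```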
The message must be large enough for encryption so that no attacks by repetition or division are possible. Therefore, if the message is small, it must be padded before encryption. Similarly, e should not be too small; in practice, e is commonly chosen as the fourth Fermat number:

$$e = 2^{2^4} + 1 = 65537 \qquad (1.25)$$

This e has a low Hamming weight of 2, which allows very fast computation of the encryption operation E when the square-and-multiply algorithm is used. Many further optimizations exist for computing the modular exponentiations of E and S.
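A left-to-right square-and-multiply sketch shows why a low Hamming weight helps: every exponent bit after the leading one costs a squaring, and every further 1 bit costs an additional multiplication. The operation counter is included only for illustration; real code would use the built-in pow or a constant-time library routine.

```python
from typing import Tuple


def square_and_multiply(base: int, exponent: int, modulus: int) -> Tuple[int, int]:
    """Left-to-right binary exponentiation.

    Returns (base**exponent mod modulus, number of modular multiplications),
    counting squarings and multiplications alike.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    if exponent == 0:
        return 1 % modulus, 0
    bits = bin(exponent)[2:]           # most significant bit first
    result, mults = base % modulus, 0  # the leading 1 bit just loads the base
    for bit in bits[1:]:
        result = result * result % modulus  # one squaring per remaining bit
        mults += 1
        if bit == "1":
            result = result * base % modulus  # one extra multiplication per 1 bit
            mults += 1
    return result, mults


if __name__ == "__main__":
    print(square_and_multiply(5, 17, 209))        # (80, 5)
    print(square_and_multiply(5, 65537, 209)[1])  # 17: 16 squarings + 1 multiplication
```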
1.7.4 Digital Signature Algorithm A variant of the El Gamal method, called digital signature algorithm (DSA), proposed by NIST’s FIPS as Digital Signature Standard (DSS) is also based on the computational intractability of the discrete logarithm problem. It works as follows: – The sender selects a large prime p, where 21,023