Table of Contents

Preface

Part 1: Blockchain and Ethereum Basics
Chapter 1: Blockchain and Cryptocurrency
Chapter 2: Ethereum Architecture and Ecosystem
Chapter 3: Decentralized Finance
Chapter 4: EVM-Compatible Blockchain Networks
Chapter 5: Deep Research and the Latest Developments in Ethereum

Part 2: Ethereum Development Fundamentals
Chapter 6: Fundamentals of Solidity
Chapter 7: Web3 API Fundamentals
Chapter 8: Developing Your Own Cryptocurrency

Part 3: Ethereum Development Fundamentals
Chapter 9: Smart Contract Development and Test Fundamentals
Chapter 10: Writing a Frontend to Build the NFT Marketplace DApp
Chapter 11: Ethereum Tools and Frameworks

Part 4: Production and Deployment
Chapter 12: Setting Up an Ethereum Private Chain
Chapter 13: Deployment of Your DApps
Chapter 14: Building Ethereum Wallets
Chapter 15: Oracles, Cross-Chain, and Layer 2 in Practice

Part 5: Conclusion
Chapter 16: Conclusion

Index
Other Books You May Enjoy
Preface

Welcome to the second edition of Learn Ethereum: A practical guide to help developers set up and run decentralized applications with Ethereum 2.0. This book is an indispensable resource for individuals seeking to understand and master the Ethereum blockchain platform. Within these pages, we embark on a captivating journey through the world of Ethereum, exploring its underlying principles and its potential for revolutionizing industries, and provide a step-by-step process for building Decentralized Applications (DApps). Whether you are a developer, entrepreneur, investor, or enthusiast, this comprehensive guide equips you with the necessary knowledge, tools, and skills to navigate the Ethereum ecosystem with confidence.

As a blockchain and decentralized computing platform, Ethereum has transformed the execution of smart contracts. This book establishes a solid foundation for comprehending the Ethereum ecosystem, starting with an introduction to blockchain, cryptography, and cryptocurrencies. We will explore vital concepts such as consensus mechanisms and mining processes. We will also dive into the architecture of Ethereum 2.0, the Ethereum Virtual Machine (EVM), and layer 1/layer 2 (L1/L2) scaling solutions (like optimistic rollups and ZK rollups), as well as the transition to proof of stake (PoS) through the Beacon Chain.

Moving forward, we will conduct an in-depth analysis of Decentralized Finance (DeFi), covering token standards, stablecoins, and various DeFi protocols to provide you with a comprehensive understanding of this thriving field. In addition, we will examine the significance of EVM compatibility, shedding light on prominent EVM-compatible blockchain networks, such as BNB Smart Chain, Polygon Chain, and Avalanche Chain. Furthermore, we will delve into advanced topics and the latest developments within the Ethereum ecosystem. We will extensively cover Ethereum’s plan for solving scaling challenges, with a focus on the end game of the rollup-centric Ethereum roadmap.
To facilitate practical application, we will devote a significant portion of this book to the Solidity programming language. Through an exploration of its fundamentals, contract patterns, exception handling, and more, we will empower you to develop your own Ethereum DApps. Moreover, we will guide developers in utilizing Ethereum web3 APIs with JavaScript, Python, and Java, enabling seamless interactions with the Ethereum blockchain.

Throughout the book, we will present comprehensive guides that will lead you through the entire process of designing, developing, testing, deploying, and monitoring DApps. By creating ERC-20, ERC-721, and ERC-1155 smart contracts, you will gain hands-on experience in building your own cryptocurrencies. To further solidify your understanding, we will provide a practical demonstration in the form of a decentralized NFT marketplace, employing essential tools such as Node.js, web3.js, Truffle, and Hardhat.

To expand your knowledge and proficiency, we explore a variety of Ethereum tools and frameworks, including IPFS, Infura, Alchemy, and QuickNode. Through practical examples, we will enhance your understanding of these tools. Additionally, we will discuss the distinctions between public and private blockchains, thus guiding you in deploying complete smart contract applications across various blockchain environments. Moreover, we will delve into wallet design, which will enable you to comprehend wallet functionality and construct secure wallet systems.

Finally, we will delve into cutting-edge topics such as oracles, cross-chain solutions, and layer 2 technologies. We introduce the concept of oracles and demonstrate how they operate in the decentralized Web 3.0 ecosystem, reacting to real-world events and interacting with traditional systems. Additionally, we briefly explore Ethereum cross-chain bridge technology, which allows users to transmit tokens and arbitrary data between blockchain networks. Lastly, we will examine practical implementations of L2 technologies, equipping you with knowledge of the latest Ethereum advancements.

By the end of this book, you will possess a comprehensive understanding of Ethereum, encompassing fundamental concepts, advanced topics, and the latest developments. Furthermore, you will have the ability to write smart
contracts, and develop, test, and deploy DApps using a diverse array of tools, wallets, and frameworks.

Now, let us embark on this exciting journey into the Ethereum ecosystem. Together, we will unlock the potential of this groundbreaking technology and explore the possibilities it holds for the future.
Who this book is for

Learn Ethereum, Second Edition, is designed for developers, entrepreneurs, investors, and enthusiasts seeking to master the fundamentals of the Ethereum blockchain and build real-world DApps.

Developers: Gain practical knowledge and hands-on experience in building DApps using Ethereum. This comprehensive guide caters to both beginners and experienced developers, providing essential tools and insights for enhancing your Ethereum development skills.

Entrepreneurs: Discover the potential of blockchain technology and its applications across industries. Explore Ethereum’s capabilities and practical implementations to identify opportunities for leveraging this technology in your business ventures and gaining a competitive edge.

Investors: Understand the underlying principles of Ethereum to make informed investment decisions in the cryptocurrency and blockchain space. Gain insights into Ethereum’s ecosystem and navigate the dynamic landscape of blockchain projects and tokens.

Enthusiasts: Immerse yourself in the world of blockchain, cryptocurrencies, and the Ethereum ecosystem. This book offers a thorough exploration of these subjects, providing you with a deep understanding of Ethereum’s core concepts, its impact on industries, and the tools and frameworks involved in Ethereum development.

Regardless of your background or expertise, Learn Ethereum, Second Edition, equips you with the knowledge, practical skills, and confidence to actively participate in Ethereum’s rapidly evolving ecosystem.
What this book covers

Chapter 1, Blockchain and Cryptocurrency, provides a comprehensive understanding of blockchain technologies, specifically focusing on the Ethereum ecosystem. Starting with basic concepts relating to Bitcoin, Ethereum, cryptocurrency, and blockchain, it covers topics such as introducing blockchain technology, exploring cryptography, understanding the blockchain consensus mechanism, delving into Bitcoin and cryptocurrency, previewing blockchain use cases in various industries and government sectors, and introducing the world of Ethereum. By the end of this chapter, you will have gained the necessary knowledge to understand Ethereum accounts, forks, and the concept of mining.

Chapter 2, Ethereum Architecture and Ecosystem, describes the architecture of Ethereum and helps you understand the EVM, gas, and accounts, among other concepts. It also covers the fundamentals of ether mining. The chapter delves into how the Beacon Chain operates and how Ethereum implements the PoS consensus mechanism. Additionally, it explores Ethereum’s merge of Eth1 and Eth2. This chapter discusses the challenges of scaling Ethereum and provides an overview of various L1 and L2 scaling solutions, including optimistic rollups and ZK rollups. By the end of this chapter, you will have a solid understanding of the internals and diverse technologies within the Ethereum ecosystem.

Chapter 3, Decentralized Finance, explores cryptocurrency and DeFi. It introduces Ethereum token standards, including fungible tokens and NFTs, and delves into stablecoins, with a focus on MakerDAO. The chapter then covers various DeFi categories such as lending, borrowing, exchanges, derivatives, fund management, lotteries, payments, and insurance. Prominent protocols within each category are highlighted, providing insights into the leading players in the DeFi ecosystem. Additionally, you will gain an understanding of the current state of the DeFi marketplace. The chapter concludes by offering a forward-looking perspective on the future of DeFi, equipping you with essential knowledge of Ethereum token standards, stablecoins, and the diverse range of DeFi products and services, along with insights into leading protocols and a glimpse into the future of this transformative financial landscape.

Chapter 4, EVM-Compatible Blockchain Networks, offers an overview of several EVM-compatible blockchain networks, namely Binance Smart Chain, Polygon, and Avalanche. This chapter explores the functionalities and workings of each blockchain, including a detailed examination of concepts and strategies for connecting EVM-compatible chains and facilitating asset bridging across different networks. You will gain insights into the diverse ecosystem of EVM-compatible blockchains and learn about the mechanisms that enable interoperability and seamless asset transfers.

Chapter 5, Deep Research and the Latest Developments in Ethereum, delves into the ongoing research and advancements within the Ethereum platform. This chapter starts by looking at challenges and considerations in distributed systems in general and introduces schools of thought in scaling blockchain networks. It then discusses the various phases of the Ethereum roadmap after the Merge. The chapter explores cutting-edge topics such as Proto-Danksharding, Danksharding, Data Availability Sampling, Maximal Extractable Value (MEV), and Proposer-Builder Separation (PBS), providing insights to help you make sense of the rollup-centric Ethereum roadmap. Additionally, you will gain an understanding of key improvements in user experience, including the smart contract wallet and account abstraction. This chapter also delves into the concept of zkEVM and the current state of zkEVM implementations. Finally, it provides an outlook on the future of Decentralized Autonomous Organizations (DAOs), Web3, the metaverse, NFT platforms, and blockchain technology, offering you a glimpse into the exciting possibilities and developments that lie ahead.

Chapter 6, Fundamentals of Solidity, provides a comprehensive exploration of Solidity, the leading programming language for smart contracts. You will gain a deep understanding of Solidity’s features and development tools. This chapter covers essential Solidity language fundamentals, including contract structure, patterns, and exception handling, along with smart contract security and best practices. It also offers practical insights by showcasing a complete real-world smart contract developed in
Solidity. You will learn how to functionally test your smart contracts and ensure their robustness. By the end of this chapter, you will have a solid foundation in Solidity and will be equipped with the knowledge and skills to build your own secure and functional smart contracts.

Chapter 7, Web3 API Fundamentals, delves into the fundamental features of the Web3 API. This chapter provides an in-depth exploration of three key Web3 APIs: web3.js for Ethereum in JavaScript, web3.py for Ethereum in Python, and web3j for Ethereum DApp development in Java. Through practical examples, you will gain the knowledge and skills to leverage these APIs to interact with smart contracts deployed on the blockchain.

Chapter 8, Developing Your Own Cryptocurrency, provides you with an overview of open-source smart contract libraries and delves into various ERC token standards. This chapter guides you through the process of creating your own cryptocurrencies using Solidity, starting with the ERC-20 token standard. You will learn how to develop a cryptocurrency called MyERC20Token based on the ERC-20 standard. Additionally, this chapter explores the creation of NFTs for a decentralized art marketplace, utilizing the ERC-721 standard to develop DigitalArtERC721Token. Furthermore, it introduces another prominent NFT token standard, ERC-1155, and provides insights into creating an ERC-1155 NFT token. By the end of this chapter, you will have a comprehensive understanding of token standards, the setup of an Ethereum development environment, and the ability to create your own cryptocurrencies using various token standards.

Chapter 9, Smart Contract Development and Test Fundamentals, focuses on providing you with practical insights into using development tools and conducting tests for smart contracts. This chapter begins by demonstrating the usage of Remix, a popular web-based IDE, for smart contract development and debugging. It further explores alternative options such as the Truffle suite and Hardhat as comprehensive development frameworks. You will also gain an understanding of smart contract unit testing by applying tests to the previously developed smart contract. This chapter emphasizes the importance of testing in ensuring the functionality and security of smart contracts. By engaging with these topics, you will
strengthen your proficiency in Ethereum development and be well-equipped to develop, test, and secure your own smart contracts.

Chapter 10, Writing a Frontend to Build the NFT Marketplace DApp, guides you through the process of creating a user interface (UI) for a decentralized digital art market DApp. This chapter introduces you to the concept of DApps and their two-tier architecture, comprising a frontend UI layer and a smart contract backend on the blockchain. Having already learned about smart contract development and unit testing in the previous chapter, this chapter focuses on developing the UI component, which allows end users to interact with smart contracts. React, a widely used JavaScript framework, is employed for this purpose, along with the web3.js library, which facilitates communication with the Ethereum blockchain through its APIs. By following the chapter’s content, you will acquire the knowledge and skills needed to build the UI for a DApp using React and web3.js. You will gain an understanding of the essential steps involved in setting up the development environment, constructing frontend components, and ultimately running a fully functional decentralized digital art market DApp.

Chapter 11, Ethereum Tools and Frameworks, provides you with an overview of the commonly used tools and frameworks in Ethereum development, including those for smart contract compilation, deployment, and testing. Additionally, it explores frameworks that facilitate the development of decentralized applications. This chapter also covers storage options within the Ethereum ecosystem, including on-chain storage utilizing smart contracts, as well as off-chain storage using distributed file systems such as the InterPlanetary File System (IPFS). You will gain an understanding of the available storage solutions and their applications. Furthermore, this chapter introduces popular smart contract libraries that offer pre-built functionalities and code templates. These libraries enable developers to streamline their development processes and enhance the security of their smart contracts.

Chapter 12, Setting Up an Ethereum Private Chain, shifts our focus to private Ethereum networks. Private blockchains are primarily used by
developers for testing purposes, offering advantages such as simplified testing without the need for node syncing or obtaining test ether. This chapter guides you through the process of setting up a private blockchain using Ethereum, highlighting the differences between public and private blockchains. Additionally, it explores the application of private blockchains in production use cases.

Chapter 13, Deployment of Your DApps, focuses on the next step in the smart contract development cycle: deploying and testing contracts in an environment similar to the Ethereum main network. Testnets serve as platforms where developers can closely simulate the main network and test their contracts effectively. This chapter provides a comprehensive understanding of deploying smart contracts on popular testnets, namely the Goerli and Sepolia test networks. You will learn the step-by-step process of deploying your contracts to these testnets, enabling you to assess the functionality and behavior of your contracts in an environment that closely resembles the Ethereum main network. Furthermore, this chapter delves into monitoring smart contracts after deployment. You will gain insights into the tools and techniques used to monitor the performance and interactions of your deployed contracts. By following the content of this chapter, developers will be equipped with the knowledge and skills necessary to confidently deploy and monitor their smart contracts on testnets.

Chapter 14, Building Ethereum Wallets, provides a comprehensive overview of Ethereum wallets and guides you on how to create your own wallets. This chapter delves deeper into the technology and functionality of Ethereum wallets. You will gain a solid understanding of Ethereum wallet concepts, including non-deterministic wallets and Hierarchical Deterministic (HD) wallets. This chapter explores the features of HD wallets, which offer enhanced security and convenience through the generation of a hierarchical tree of private keys. Additionally, you will explore advanced wallet features such as multiparty signatures, stealth addresses, and confidential transactions, which provide additional layers of privacy and security. This chapter also provides step-by-step guidance on creating an Ethereum wallet, empowering you to have full control over your wallet’s security and functionality. Furthermore, you will become familiar with popular third-
party Ethereum wallets, broadening your knowledge of the available wallet options and their respective features. By the end of this chapter, you will have a comprehensive understanding of Ethereum wallets, the security features they offer, and the various options available for wallet creation and management.

Chapter 15, Oracles, Cross-Chain, and Layer 2 in Practice, offers a comprehensive exploration of the fundamental concepts and technologies that enable interoperability and advanced functionalities within the decentralized Web 3.0 ecosystem. This chapter begins by providing you with a clear understanding of the crucial role played by oracles in retrieving and verifying external data, empowering smart contracts to respond and execute actions based on real-time information. Through practical examples, you will gain hands-on experience in integrating oracles into your smart contracts to access real-time token market prices. Continuing, this chapter delves into the intricacies of cross-chain bridge technology, which facilitates seamless communication and asset transfers between different blockchain networks. By comprehending the underlying mechanics of cross-chain bridges, you will grasp their significance in enabling the smooth interoperability of tokens and data across multiple chains. Furthermore, this chapter explores L2 technologies, which effectively tackle scalability challenges by providing off-chain solutions that enhance transaction throughput and minimize fees. You will be introduced to practical implementations of L2 technologies and come to appreciate their potential to significantly improve the overall efficiency and user experience of DApps. Upon concluding this chapter, you will possess a comprehensive understanding of oracles, cross-chain bridge technology, and L2 technologies. Equipped with this knowledge and the necessary tools, you will be empowered to leverage these technologies in your Ethereum projects, resulting in enhanced functionalities, seamless real-time data integration, and improved scalability.

Chapter 16, Conclusion, is a summary of the entire book and the Ethereum blockchain technologies covered therein. It offers you a comprehensive understanding of Ethereum and its blockchain technologies. It addresses the
challenges, explores the ecosystem, discusses the emerging trends in the blockchain and Ethereum ecosystem, and provides a glimpse into the future of Ethereum. It serves as a valuable resource for individuals seeking to grasp the fundamental concepts, opportunities, and advancements within the Ethereum blockchain space.
To get the most out of this book

Having a basic understanding of Ethereum frameworks, such as Remix and Truffle, will be beneficial for you. Additionally, familiarity with JavaScript is advantageous for comprehending the concepts presented in this book.

Software/hardware covered in the book: Angular 9, TypeScript 3.7, ECMAScript 11
Operating system requirements: Windows, macOS, or Linux
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Learn-Ethereum-Second-Edition. If there’s an update to the code, it will be updated in the GitHub repository. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “With web3.js installed, to instantiate web3, here is some typical JavaScript code.”

A block of code is set as follows:

componentDidMount = async () => {
  const web3 = await getWeb3();
  const contractInstance = await getInstance(web3);
  ...
  this.setState({ contractInstance: contractInstance });
}
Any command-line input or output is written as follows:

npm install -g create-react-app
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “Select cat-in-blockchain and click the PUBLISH button.”

Tips or important notes

Appear like this.
Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share Your Thoughts

Once you’ve read Learn Ethereum, Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback. Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book

Thanks for purchasing this book! Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice? Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application. The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

1. Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781804616512

2. Submit your proof of purchase.

3. That’s it! We’ll send your free PDF and other benefits to your email directly.
Part 1: Blockchain and Ethereum Basics

In this part, we will start with the concept of blockchain and dive deep into Ethereum fundamentals, its architecture, and its ecosystem. We will then introduce Decentralized Finance (DeFi) and analyze various DeFi protocols. We will discuss various Ethereum Virtual Machine (EVM)-compatible blockchains and cross-chain integration mechanisms. To keep you at the forefront of this evolving field, we will also bring you up to speed with the latest developments and most advanced research topics within the Ethereum community.

This part comprises the following chapters:

Chapter 1, Blockchain and Cryptocurrency
Chapter 2, Ethereum Architecture and Ecosystem
Chapter 3, Decentralized Finance
Chapter 4, EVM-Compatible Blockchain Networks
Chapter 5, Deep Research and the Latest Developments in Ethereum
Blockchain and Cryptocurrency

It is a common belief that the bankruptcy filed by Lehman Brothers, a Wall Street banking giant, on September 15, 2008, triggered the global financial crisis of 2008-2009. Excessive risk exposure to the subprime mortgage and financial derivative markets by large banks almost brought down the global financial system. The crisis was the ultimate consequence of a fundamental breakdown of trust in the relationship between customers and the financial institutions that should have been serving them.

Shortly after that, Satoshi Nakamoto, a mysterious and anonymous entity, published a whitepaper on October 31, 2008, called Bitcoin: A Peer-to-Peer Electronic Cash System, which is considered the origin of Bitcoin and all cryptocurrencies. Satoshi proposed a completely decentralized approach for Peer-to-Peer (P2P) payment without central banks or intermediaries. He outlined the principles and functions of what would be developed and introduced as Bitcoin in the following year. The central technology behind his invention is referred to as blockchain and has since evolved well beyond Bitcoin and digital payment. It is now a suite of technologies, forming the foundation of distributed ledgers and cryptocurrency. No one knows who or where Satoshi is, whether it is one individual or a group, but the whitepaper has profoundly changed money, digital and cryptocurrencies, business, and the world. You can learn more about the Bitcoin whitepaper authorship debate at https://www.judiciary.uk/wp-content/uploads/2022/08/Wright-v-McCormack-Judgment.pdf.

The purpose of this book is to help you to understand blockchain technologies, introduce you to the tools and technologies of the Ethereum ecosystem, and get you started with developing smart contracts and end-to-end decentralized applications. In this chapter, we will start with basic concepts in Bitcoin, Ethereum, cryptocurrency, and blockchain.
In this chapter, we will cover the following topics:

Introducing blockchain technology
Rehashing cryptography
Anatomizing the blockchain consensus mechanism
Understanding Bitcoin and cryptocurrency
Overviewing blockchain use cases in the industry and government
Ushering in the world of Ethereum
Technical requirements

All of the source code in this book can be found at the following GitHub link: https://github.com/PacktPublishing/Learn-Ethereum-Second-Edition/.
Introducing blockchain technology

You might have heard the parable of the blind men and the elephant. It is a folktale about six blind men’s individual descriptions of the same elephant based on their own perceptions from touching particular parts of the animal, each of them giving very different descriptions of what they think the creature looks like. It highlights the fact that different perspectives can lead to distinct viewpoints, emphasizing the limits of perception and the importance of complete context.

When Satoshi invented Bitcoin, the fundamental concept of the vision was to build a blockchain, a shared public ledger (the longest Proof-of-Work (PoW) chain), that verifies and immutably records all transactions through a decentralized computer network (a P2P network) and a consensus mechanism with computational proof. Satoshi thus came up with an elegant solution to the double-spend problem in digital money. A double-spend is an attack where someone tries to spend money in a transaction that isn’t actually available anymore, as the money has already been spent.
Blockchain is a new elephant in the digital world. To most of the public, blockchain is nothing but an obscure pseudonym for all cryptocurrencies, including Bitcoin, Ethereum, and more. So, what is blockchain? What does a blockchain look like? How does it work? Where can we use blockchain? Do you need a blockchain? There are many ways to describe a blockchain from different perspectives, but there is no universal definition of a blockchain. On the contrary, there are prevalent debates over the essential attributes or qualities of a blockchain. It is perceived as a new architecture using existing technologies, the next generation of the internet and web, a future database and distributed shared ledger, the new Napster (a P2P file-sharing system used in the 90s) with a pure decentralized P2P network, a cryptocurrency, a trustless secure transaction system, and so on. In reality, it is all of these. Only by combining all of these perspectives can we understand the whole picture of blockchain technologies and get a sense of the true potential of blockchain. The following picture illustrates different viewpoints of blockchain technology:
Figure 1.1 – Different viewpoints on blockchain technologies

So, what is a blockchain anyway? Think of blockchain as a new architecture paradigm and a new trust protocol. It is a computer science
primitive forming the foundation of most cryptocurrencies and decentralized applications. It is a P2P transaction model that can enable two parties to transact in a way that is tamper-resistant and cryptographically proven. As the technology behind Bitcoin and other cryptocurrencies, blockchain is an open, distributed ledger that can be simultaneously used and shared within a large, decentralized, publicly accessible network. In essence, blockchain is a distributed shared ledger technology supported by three pillars, as shown in the following figure; these are P2P networks, cryptography, and a consensus mechanism:
Figure 1.2 – Key components of blockchain

To understand how blockchain works, let’s start with the fundamental concepts and key building blocks of blockchain technologies. Then, we’ll discuss the key differences between centralized, distributed, and decentralized systems. We will then dive into the blockchain data structure
and discuss how transactions, blocks, and chains are maintained and how the network reaches a consensus on the state of the chain, as well as how to secure the blockchain with cryptographic technologies. The following lists the key building blocks of blockchain technologies:

Transactions: A transaction is a value transfer between two parties. It could be a transfer of money, tangible assets, or cryptocurrency. Transactions are broadcasted to the blockchain network. They are validated and verified by all nodes and collected into blocks. Once the block reaches a certain depth — in Bitcoin, this is six blocks — the transactions in the block can be considered irreversible.

Block: All verified transaction records are collected into a data structure called a block. It has a header and a body, where the header contains a cryptographic hash of the previous block, a timestamp, and a Merkle tree root hash of all transactions in the block. The body is the container of transaction data. A Merkle tree is like the digital fingerprint of transactions in the block, which we will discuss extensively later in this section. (A minimal sketch of this block structure follows the list.)

The chain of blocks (blockchain): A blockchain is a linked list of a chain of blocks. Blocks are linked together using a cryptographic hash as the pointer to the previous block.

Decentralized P2P network: This is a P2P network in which interconnected nodes share resources between themselves without the use of a central authority or some sort of intermediary.

Consensus protocol: The consensus protocol in blockchain is a set of rules that all network nodes enforce when considering the validity of a block and its transactions. The consensus mechanism is the process used by the network nodes to achieve agreement on the network state. It is a fault-tolerant mechanism to ensure the reliability and integrity of the network.

Mining: Mining is the process by which network nodes in blockchain systems add new blocks to the blockchain and get rewarded with crypto-incentives.
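To make the block structure concrete, here is a minimal sketch in Python (the language used for the web3.py examples later in this book). The Block class and its field names are illustrative only, not any real client’s data model:

import hashlib
import time
from dataclasses import dataclass, field

@dataclass
class Block:
    prev_hash: str      # hash of the previous block header; this forms the chain
    merkle_root: str    # fingerprint of all transactions in the body
    transactions: list  # the block body
    timestamp: float = field(default_factory=time.time)
    nonce: int = 0      # varied by miners during PoW

    def header_hash(self) -> str:
        # Hash the header fields; any change to a past block changes this
        # hash and breaks every later block's prev_hash link.
        header = f"{self.prev_hash}|{self.merkle_root}|{self.timestamp}|{self.nonce}"
        return hashlib.sha256(header.encode()).hexdigest()

Because each block commits to its predecessor’s header hash, changing any historical transaction would change that block’s hash and break the prev_hash link of every block built on top of it.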
In the next section, we will discuss how P2P networks work.
Decentralized P2P networks

To explain how blockchain works, let’s look at the steps involved with the existing business model for completing a simple payment transaction. A customer, Alice, needs to pay $10 to Bob, who is in a geographically distant region from Alice and happens to have an account at the same bank as Alice. She can make the payment either by visiting a bank branch or using the web. Let’s say she tries to do it online through the bank’s web portal. She will need to authenticate herself using her username and password, then put the transfer order in and wait for the bank system to confirm whether the transaction is completed. As shown in the following diagram, in order to support such online banking activities in the traditional world, the bank has to establish an identity and access management system and authenticate Alice’s login credentials. Behind the scenes, the bank needs to develop a bank web portal and a backend system to verify whether Alice has the right account with the bank and has enough money to pay Bob, upon which the bank can transfer $10 out of Alice’s account and put $10 in Bob’s account. The bank has to maintain a ledger to record the details of the transaction in a database and show the balance each person has. The following diagram shows a centralized bank system model:
Figure 1.3 – Centralized bank system model

As the business grows, customers’ needs change with it. The traditional brick-and-mortar business model is being replaced by the digital banking and commerce model. This requires technological changes in the bank system too. Banks nowadays deploy a distributed system model to serve the ever-growing needs of their customers. The following diagram shows the distributed bank system model:
Figure 1.4 – Distributed bank system model

The fundamental issue with the preceding centralized or distributed system models is the risk from single points of failure. Failure could come from malicious network attacks, system failures, or security and privacy breaches; it could come from business failures in the bank itself, as when millions of people lost their homes due to the bankruptcies of big banks during the global financial crisis. It could happen due to currency failure, such as the currency collapse in Venezuela, where the lifetime savings of average citizens became worthless overnight. Also, payments can be blocked due to government censorship.

Satoshi Nakamoto believed that the root problem with the traditional fiat system is all the trust required to make it work. Citizens have to trust the central bank not to devalue the currency. Consumers have to trust the bank to manage their money. But history has shown again and again that this trust is often breached.
Satoshi designed an elegant decentralized P2P electronic cash system, and the technology behind that, blockchain, is the solution, where transactions are maintained in a distributed shared ledger and replicated across a global P2P network. Security and privacy are ensured with cryptographic technologies, and transaction integrity is achieved through a consensus mechanism. The following diagram shows a decentralized bank system model:
Figure 1.5 – Decentralized bank system model

As new transactions are made, they are broadcasted to all network nodes, and over time all transactions that have occurred are sequenced together in the public ledger and made available on all replicated network nodes, as shown in the following diagram:
Figure 1.6 – Decentralized public ledger

Now that we understand the difference between centralized and decentralized models, let’s see how blockchain works.
How does blockchain work?

Using the previous example, as shown in the following diagram, let’s assume Alice wants to buy something from Bob and she agrees to pay Bob 10 bitcoins (BTC):
Figure 1.7 – Money transfer between two parties

Let’s walk through the high-level processes step by step to demonstrate how blockchain works:

1. Create blockchain transactions: A transaction is a value transfer between two parties. When Alice sends 10 BTC to Bob, it creates a transaction with one or more inputs and one or more outputs, where the inputs reference the funds Alice owns, and the outputs specify the account(s) Alice intends to transfer to (typically including a change output back to Alice). The transaction is then digitally signed with Alice’s private key and broadcasted to the P2P network. The receiver will use the digital signature to verify the ownership of Alice’s funds. We will discuss digital signatures and cryptographic hash functions in detail in later sections.

2. Validate the transactions and add to the transaction pool: Once the transaction is submitted to the blockchain network, the bookkeeper node (usually a full node in a P2P network that receives the transactions) will validate it according to protocol rules defined by the blockchain network. If the transaction is valid, the bookkeeper will add it to the transaction pool and relay the transaction to the peers in the network.

3. Create the candidate blocks: Transactions in the transaction pool are collected into a block periodically. In the Bitcoin network, every 10 minutes, a subset of network nodes, called mining nodes or miners, will collect all valid transactions from the transaction pool and create
the candidate blocks. The following diagram shows the structure of a candidate block:
Figure 1.8 – Creation of candidate blocks

As illustrated in the preceding diagram, the high-level processes are as follows:

The candidate block packages the recent valid transactions into the block structure based on block specifications. For each transaction in the package, it creates a cryptographic hash of the transaction data, recursively calculates the hash out of existing hashes, and creates a Merkle root of all transactions, as depicted in the following diagram:
Figure 1.9 – Merkle tree

The miner node looks for the latest block on the blockchain and adds its hash to the block header of the candidate block as a reference to the block it intends to link to.

4. Mine the new block: Once the candidate block is created, the race starts for the chance to add new blocks and win the rewards. The process for such a race is called mining. The winner of the race is determined by the consensus mechanism. We will discuss different consensus mechanisms in later sections. In blockchain systems such as Bitcoin or Ethereum 1.0, the PoW consensus mechanism is applied to mining. Miners keep trying to find a random number, the nonce in the block header structure, until the resulting hash meets certain challenging conditions. For example, one such condition is that the resulting block hash is smaller than a target number, or, in some cases, that the hash has a certain number of leading zeros. In practice, every random number has the same chance of winning the race, so a miner can simply loop through nonce values from 0 to 2³² until it finds a nonce whose hash meets the condition. It requires huge CPU hashing power to find such a nonce. The challenging condition, called difficulty, can be adjusted based on the target number or bits in the block header structure. The difficulty of winning the race grows exponentially the smaller the target number is or the fewer bits there are in the block header structure.

5. Add a new block to the blockchain: The first winning node will announce the new block to the rest of the network for verification. Once the block is verified and approved by the majority of the network miners, it will be accepted and becomes the new head of the chain. Since all blocks are chained together by linking the hash to the previous block, any tampering with the ledger becomes practically impossible, since it would require redoing the PoW on the tampered block and all subsequent blocks. All miners have the chance to solve the puzzle, but only the winning miner has the authority to add the block to the chain and claim the bounty. Once the new block is added to the blockchain, all other miners will stop their in-progress mining work and start the race again on a new block.

The following diagram summarizes the step-by-step process when new transactions are submitted to the blockchain network:
Figure 1.10 – How blockchain works in one picture

Cryptography plays a critical role in maintaining the transaction state in the blockchain and ensuring immutability. Cryptography is not new. In the next section, we will go over some key concepts in cryptography.
Rehashing cryptography

Cryptography is the study of secure communication techniques that prevent third parties or the public from reading private messages and allow only the intended recipient of a message to view its contents. It is the cornerstone of information security, which serves as the basis for delivering secure business applications and services. Modern cryptography concerns itself with the following five objectives of information security:

Confidentiality: This is the concept of preventing sensitive data from being accessible by any unauthorized entities.

Integrity: This means protecting sensitive data from unauthorized changes during transit from one party to another party.

Authentication: This is the process of ensuring that a user’s identity is truly what the user claims it to be, whether the user is human or a system.

Authorization: This is the concept of determining what actions an authenticated user is allowed to perform.

Non-repudiation: When a user performs an action on data, the action must be bound to the user so that they can’t deny performing such actions.

Cryptography deals with the design of algorithms for encryption and decryption, which are intended to ensure the secrecy and authenticity of the messages or transactions in question. Let’s start with some key elements in modern cryptography:

Encryption: This is the process of converting plain text or data into an unintelligible form, typically using a mathematical algorithm.

Decryption: This is the process of reversing encryption, converting an encrypted message back into its original text and data.

Hash: This is the process of converting any data block (arbitrary size or message) into a fixed-length hash code. A cryptographic hash function is a deterministic mathematical function performing such a conversion using cryptography, and it always maps to the same result for a given data block.

Cryptography is the linchpin and one of the three pillars of blockchain technology, along with the consensus mechanism and P2P network. It is used in many different forms, including, for example, wallets (for proof of cryptocurrency ownership), transactions (for PoW consensus), and P2P communication. In the following subsections, we will go over key blockchain-related cryptography topics, including public-key cryptography, digital signatures, cryptographic hashing, and Merkle trees.
Public-key cryptography

Public-key cryptography is a form of cryptographic function in which encryption and decryption are performed using two different keys — one public and one private key. They are generated in pairs. It is also called asymmetric cryptography. The public key can be shared with the public, but the private key is meant to be a secret code known only by its owner. The keys are used in tandem too: either of the two keys can be used for encryption, with the other one used for decryption. It is computationally improbable to determine the private key given only knowledge of the cryptographic algorithm and the public key.

Public-key cryptography is mostly used to do the following three things:

Secure the message transmission between two parties and ensure the confidentiality of messages or data

Authenticate the sender and ensure the message is indeed sent from the sender

Combine it with the cryptographic hashing function and provide a digital signature on a document before sending it to the receiver
We will go over the first two here and discuss digital signatures in the following section:

Public-key cryptography for confidentiality: In this case, as depicted in the following diagram, the receiver’s keys are used to encrypt messages between two parties during transmission. The sender (Alice) uses the receiver’s public key to encrypt a message, and the receiver (Bob), who holds their own private key in secrecy, can decrypt the messages using their private key:
Figure 1.11 – Confidentiality with public key

Public-key cryptography for authentication: In this case, as shown in the following diagram, the sender’s keys are used to authenticate the sender’s message. The sender uses its own private key to encrypt a message before sending it to the intended parties. The receiver can use the sender’s public key to confirm the message’s authenticity and decrypt it. The combination of this approach with the message’s cryptographic hashing function provides a digital signature, which we will discuss in the next section:
Figure 1.12 – Authentication with public key

Public-key cryptography is an essential technology underpinning wallets and transactions in the blockchain. We will discuss the Bitcoin wallet in the Understanding Bitcoin and cryptocurrency section.
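Before moving on, the following minimal sketch illustrates the confidentiality flow from Figure 1.11 in code, using the third-party Python cryptography package with RSA-OAEP. Treat this purely as a demonstration of the asymmetric pattern; blockchains such as Bitcoin and Ethereum use elliptic-curve keys rather than RSA:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Bob generates a key pair and publishes only the public key
bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob_public = bob_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Alice encrypts with Bob's public key ...
ciphertext = bob_public.encrypt(b"Meet at noon", oaep)

# ... and only Bob's private key can decrypt the message
assert bob_private.decrypt(ciphertext, oaep) == b"Meet at noon"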
Cryptographic hash function

A cryptographic hash function is an algorithm used to randomly convert a string of binary data into a condensed representation of a message — a message digest. Its output is called a hash value, digital fingerprint, digest, or checksum. It is deterministic and always results in the same hash value for a given message. It is capable of taking any size of data block and producing a fixed-size hash value that uniquely identifies the original data block. It is a one-way, irreversible function; the only way to recreate the input data is to try a brute-force approach with all possible values to see whether there is a match, which is almost computationally infeasible. Notable hash functions include MD5, SHA-1, SHA-2, and SHA-3. Although they are still widely in use, MD5 and SHA-1 are cryptographically broken due to collision attacks found in the algorithms, and are thus no longer recommended.
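These properties (determinism, fixed output size, and unpredictability) are easy to observe with Python’s standard hashlib module:

import hashlib

msg = b"Alice pays Bob 10 BTC"
print(hashlib.sha256(msg).hexdigest())  # 256-bit digest, printed as 64 hex chars

# The same input always yields the same digest; a one-character change
# produces a completely different, unpredictable digest.
print(hashlib.sha256(b"Alice pays Bob 11 BTC").hexdigest())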
Cryptographic hash functions have been widely used in blockchain technology, including the following:

Merkle trees: As we showed earlier, when a miner node pulls transactions from the transaction pool, it packages them in a block, where the block header has a field referencing the Merkle root of all transactions. (A short sketch of this computation follows below.)

Block chaining: Blocks in the blockchain are chained together with a reference to the previous block using a cryptographic hash.

PoW: The PoW consensus algorithm itself is a game of solving a cryptographic hash function. We will discuss it in more detail in the Anatomizing a blockchain consensus mechanism section.

In addition to cryptographic hash functions, digital signatures have been broadly leveraged in blockchain networks too. We will discuss their usage in the next subsection.
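Here is the short Merkle root sketch promised above. It reduces a list of transaction hashes level by level and duplicates the last hash when a level has an odd count, as Bitcoin does; unlike Bitcoin, it applies SHA-256 once rather than twice at each step, for brevity:

import hashlib

def merkle_root(tx_hashes: list[bytes]) -> bytes:
    if not tx_hashes:
        raise ValueError("need at least one transaction hash")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        # Hash each adjacent pair to build the next level up the tree
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

txs = [hashlib.sha256(t).digest() for t in (b"tx1", b"tx2", b"tx3")]
print(merkle_root(txs).hex())

Changing any single transaction changes its leaf hash and therefore the root, which is why the Merkle root in the block header acts as a fingerprint of the whole block body.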
Digital signatures

A digital signature is a set of algorithms for determining the authenticity and integrity of digital messages or documents. It assures the recipient that the message was indeed created by the expected sender and that the message was not altered during transmission. The sender cannot deny having sent the message. When Alice sends a document to Bob, she will follow certain steps to digitally sign the document, as shown in the following diagram:
Figure 1.13 – Digital signature

The steps to digitally sign the document are as follows:

1. Calculate the message digest of the document Alice wants to send to Bob with a cryptographic hash function, usually any SHA-2 or SHA-3 algorithm.

2. Encrypt the message digest with Alice’s private key, append the encrypted message digest to the original document, and send the combined message out.

3. Once Bob receives the combined message from Alice, he will separate the encrypted message digest from the document itself. Bob will use Alice’s public key to decrypt the encrypted message digest.

4. At the same time, Bob will calculate the message digest of the received document and compare the resulting message digest with the decrypted message digest to see whether there is a match. If yes, Bob is assured that the document originated from Alice without any tampering.
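The same steps can be sketched with an ECDSA key on the secp256k1 curve (the curve used by Bitcoin and Ethereum), again via the third-party Python cryptography package; here the sign call hashes the document with SHA-256 internally, covering steps 1 and 2 in one call:

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

alice_private = ec.generate_private_key(ec.SECP256K1())
alice_public = alice_private.public_key()

document = b"Alice sends 10 BTC to Bob"
signature = alice_private.sign(document, ec.ECDSA(hashes.SHA256()))

# Bob verifies with Alice's public key; verification fails on any tampering
try:
    alice_public.verify(signature, document, ec.ECDSA(hashes.SHA256()))
    print("Signature valid: the document came from Alice, unmodified")
except InvalidSignature:
    print("Signature invalid: wrong signer or tampered document")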
In blockchain, a digital signature is a way to prove ownership of the underlying cryptocurrency or electronic coin. When Alice needs to pay Bob 10 BTC, she will digitally sign a hash of the previous transaction, which can prove that Alice has ownership of the 10 BTC. In summary, cryptography is one of three foundational pillars in blockchain technology. Public-key cryptography is the basis for blockchain wallets and transactions, and the cryptographic hash function is a key element underpinning the PoW consensus mechanism. A digital signature is used as proof of ownership of electronic coins or cryptocurrency. In the next section, we will introduce and look at a blockchain consensus mechanism in detail and discuss how cryptography technologies are leveraged to reach consensus among decentralized parties.
Anatomizing a blockchain consensus mechanism

A fundamental problem in large-scale distributed systems is how to achieve overall system reliability in the presence of failures. Systems need to be fault-tolerant. This requires a process for distributed, often heterogeneous systems to reach a consensus and agree on the network state, whether it is a database commit or an action to take. In this section, we will discuss two types of consensus algorithms, PoW and PoS.
What is consensus?

Consensus in a blockchain is the process by which a network of mutually distrusting nodes reaches an agreement on the global state of the chain of blocks. In blockchain, transactions or data are shared and distributed across the network. Every node has the same copy of the blockchain data. Consensus allows all of the network nodes to follow the same rules to validate transactions and add new blocks to the chain, and therefore allows them to maintain uniformity in all copies of a blockchain.
Sometimes, it is also called a consensus mechanism or consensus algorithm. A consensus mechanism focuses on the rules and incentives for the network to reach an agreement. A consensus algorithm is a formal procedure or computer program for solving a consensus problem, based on conducting a sequence of specified actions. It is designed to achieve reliability in a network involving multiple nodes. Consensus algorithms ensure that the next block in a blockchain is fully validated and secured. Multiple kinds of consensus algorithms currently exist, each with different fundamental processes. Different blockchain platforms may implement different consensus mechanisms. In this section, we will focus on the following two popular algorithms, show how they work, and discuss the pros and cons of each mechanism:

PoW: This consensus algorithm was first coined and formalized in a 1999 paper by Markus Jakobsson and Ari Juels. It was popularized by Satoshi in the Bitcoin whitepaper and has been commonly adopted by many other blockchains, including Ethereum 1.0. PoW is the mining process with the purpose of finding an answer to a cryptographic hashing problem. To do so, the miner has to follow the block selection rules to locate the previous block and use the hash from the previous block header, together with the Merkle root of current transactions in the new block, to solve the hashing problem. It requires considerable computation and hashing power. In Bitcoin, block selection rules specify that the longest chain wins.

PoS: This consensus algorithm aims to select network nodes to propose new blocks using various combinations of random selection based on their wealth or age (the stake). Instead of miners competing to solve energy-consuming cryptographic hash functions, the network instead uses a pool of validators. Validators are network nodes that are willing to stake their cryptocurrency on the new block that they claim should be added to the public blockchain.

Let us get into the details of how PoW and PoS actually work in the following subsections.
Proof-of-work

Proof-of-work, also referred to as PoW, is the most popular consensus algorithm used by blockchains and cryptocurrencies such as Bitcoin and Ethereum 1.0, each one with its own differences. We will talk about the specific implementations of PoW in Bitcoin and Ethereum in later sections.
How PoW works

PoW, in terms of protocol design, is an intensive computation game among all miners in the network. The problem to be solved is a cryptographic puzzle. Behind the game theory, it is the incentive system that rewards the winners with bitcoins for contributing new blocks to the blockchain. As shown in the following picture, miners collect all pending transactions from the transaction pool and race against each other to solve the cryptographic puzzle. The miner solving the puzzle will create the new block and publish it to the network for verification by other nodes. Once verified, all nodes can add the new block to their own copy of the blockchain:
Figure 1.14 – How PoW works

The cryptographic puzzle that miners race to solve is identifying the value of the nonce. A nonce is an attribute in the block header structure. In the beginning, each miner guesses a number to start with, checking whether the resulting hash value is less than the blockchain-specific target. Bitcoin uses the SHA-256 algorithm for this. SHA-256 outputs a fixed-length number. Every number between 0 and 2³² has the same chance of solving the puzzle, so a practical approach is to loop through from 0 to 2³² until a number meets the criteria, as shown in the following diagram:
Figure 1.15 – PoW mining process

Once a miner finds the nonce, the results, including the previous block’s hash value, the collection of transactions, the Merkle root of all transactions in the block, and the nonce, are broadcasted to the network for verification. Upon being notified, the other nodes in the network automatically check whether the results are valid. If the results are valid, they add the block to their copies of the blockchain, stop the mining work in hand, and move on to the next block.
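The nonce search itself can be sketched in a few lines of Python. This is a toy version only: real Bitcoin mining double-hashes a serialized 80-byte header with SHA-256, and miners vary other header fields once the 32-bit nonce space is exhausted:

import hashlib

def mine(header: bytes, target: int) -> int | None:
    # Try every 32-bit nonce until the block hash falls at or below the target
    for nonce in range(2**32):
        digest = hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()
        if int.from_bytes(digest, "big") <= target:
            return nonce
    return None  # exhausted the nonce space without success

target = 2**236  # toy target: roughly 20 leading zero bits required
print("winning nonce:", mine(b"candidate block header", target))

With this toy target, a qualifying hash takes about a million attempts on average; lowering the target makes the search exponentially harder, which is exactly the difficulty knob discussed next.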
Targets and difficulty

A target is a blockchain-specific 256-bit number that the network sets for all miners. The SHA-256 hash of a block’s header — the nonce plus the rest of the block header — must be lower than or equal to the current target for the block to be accepted by the network. The difficulty of the cryptographic puzzle depends on the number of leading zeros in the target: the lower the target, the more difficult it is to generate a block. Each additional leading zero bit in the target halves the chance of finding a qualifying nonce, so the difficulty of finding one grows exponentially. As you can imagine, the higher the difficulty setting, the harder it is to find the nonce.

The difficulty is decided by the blockchain network itself. The basic rule of thumb is to set the difficulty proportionally to the total effort on the network. If the number of miner nodes doubles, the difficulty will also double. The difficulty is periodically adjusted to keep the block time around the target time. In Bitcoin, the target block time is 10 minutes.
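Bitcoin’s retargeting rule, in simplified form, shows how this periodic adjustment works: every 2,016 blocks, the difficulty is scaled by the ratio of the expected elapsed time to the actual elapsed time, and each adjustment is clamped to a factor of four:

def retarget(old_difficulty: float, actual_seconds: float) -> float:
    expected_seconds = 2016 * 10 * 60        # 2,016 blocks at the 10-minute target
    ratio = expected_seconds / actual_seconds
    ratio = max(0.25, min(4.0, ratio))       # Bitcoin clamps each adjustment to 4x
    return old_difficulty * ratio

# If blocks arrived twice as fast as intended, the difficulty doubles
print(retarget(100.0, 2016 * 10 * 60 / 2))   # -> 200.0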
Incentives and rewards
The winner of the cryptographic puzzle usually needs to expend huge amounts of energy and significant CPU time to find the nonce and win the chance to create new blocks in the blockchain. The reward for such work depends on the blockchain itself. In the Bitcoin blockchain, the winner is rewarded with bitcoin, the cryptocurrency of the Bitcoin blockchain. The PoW consensus is a simple yet reliable mechanism for maintaining the state of the blockchain, and it is simple to implement. It is a democratic, lottery-based system: any node can join the mining game and earn rewards, and higher CPU power does not guarantee winning any particular block. Currently, the winning miner is rewarded with 6.25 BTC for each block created in the Bitcoin blockchain.
Double-spend issues
Satoshi's original intention in using a PoW mechanism was to solve double-spend issues and ensure the integrity of the global state of the Bitcoin blockchain network. Let's say Alice sends 10 BTC to Bob, and at the same time or later on she pays Catherine the same 10 BTC. We could end up with the following three situations:

- The first transaction goes through PoW and is added to the blockchain when the second transaction is submitted. In this case, the second one will be rejected when miners pull it from the transaction pool and validate it against all parent blocks.
- Both transactions are submitted simultaneously and both go into the unconfirmed pool of transactions. In this case, only the first transaction gets a confirmation and will be added in the next block. The second transaction will not be confirmed, per the validation rules.
- Both get confirmed and are added into competing blocks. This happens when miners take both transactions from the pool and put them into competing blocks. The competing blocks form a temporary fork on the blockchain. Whichever transaction gets into the longest chain will be considered valid and spent, and the other one, within the block on the shorter chain, will be recycled. When it is reprocessed, it will be rejected since the coin is already spent. In this case, it may take a few blocks for the other one to be recognized as the double spend.

Double spend is a technical flaw in all digital currencies prior to Bitcoin, where the same unit of digital currency could potentially be used in multiple transactions. Bitcoin's solution to the double-spend problem paved the way for Bitcoin to become the first true digital currency.
Advantages and disadvantages
As discussed, PoW is a simple and reliable way to maintain the safety of a blockchain network, but it has a few drawbacks arising from the economic cost of doing so:
- Energy consumption: PoW consensus, which uses a network of powerful computers to secure the network, is extremely expensive and energy-intensive. Miners need to use specialized hardware with high computing capacity in order to perform mining and get rewards. A large amount of electricity is required to run these mining nodes continuously. Some people also claim these cryptographic hash calculations are useless, as they can't produce any business value. At the end of 2018, the Bitcoin network across the globe used more power than Denmark.
- Vulnerability: PoW consensus is vulnerable to 51% attacks, which means, in theory, dishonest miners could gain a majority of the hashing power and manipulate the blockchain to their advantage.
- Centralization: Winning the mining game requires specialized and expensive hardware, typically ASIC-type machines. Expenses grow unmanageable, and mining becomes possible only for a small number of sophisticated miners. The consequence is a gradual increase in the centralization of the system, as it becomes a game of riches. On the flip side, it requires huge computing power and electricity to take over a PoW-based blockchain. Therefore, PoW is perceived as an effective way to prevent Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) attacks on the blockchain.
Proof-of-stake
As opposed to PoW consensus, where miners are rewarded for solving cryptographic puzzles, in the PoS consensus algorithm a pool of selected validators take turns proposing new blocks. A validator is chosen in a deterministic way from a pseudo-random seed, with the chance of selection depending on its wealth, also defined as its stake. Anyone who deposits their coins as a stake can become a validator, and the chance to participate is typically proportional to the stake they put in. Let's say Alice, Bob, Catherine, and David stake 40 ether, 30 ether, 20 ether, and 10 ether respectively to participate; they will get a 40%, 30%, 20%, and 10% chance of being selected as the block creator. The following is how it works in the PoS consensus mechanism:
Figure 1.16 – How PoS works
As shown in the preceding diagram, the blockchain keeps track of a set of validators. Depending on their roles in creating new blocks, a validator is sometimes also called a block creator, builder, or proposer. Whenever new blocks need to be created, the blockchain randomly selects a validator. The selected validator verifies the transactions and proposes new blocks for all validators to agree on. New blocks are then voted on by all current validators, with voting power based on the stake each validator puts in. Whoever proposes invalid transactions or blocks, or votes maliciously, which means they intentionally compromise the integrity of the chain, may lose their stake. Once the new blocks are accepted, the block creator collects the transaction fees as the reward for creating them. PoS is considered more energy-efficient and environmentally friendly compared with the PoW mechanism. It is also perceived as more secure: it essentially reduces the threat of a 51% attack, since malicious validators would need to accumulate more than 50% of the total stake in order to take over the blockchain network.
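The stake-weighted selection in the Alice/Bob/Catherine/David example can be modeled with a few lines of Python. This is a toy sketch: real protocols derive their randomness on-chain (for instance, Ethereum's RANDAO), whereas here a plain seeded generator stands in for it:

```python
import random

def select_proposer(stakes: dict[str, int], seed: int) -> str:
    """Pick a validator with probability proportional to its stake (a toy model)."""
    rng = random.Random(seed)  # stand-in for protocol-derived randomness
    validators = list(stakes)
    return rng.choices(validators, weights=[stakes[v] for v in validators], k=1)[0]

stakes = {"Alice": 40, "Bob": 30, "Catherine": 20, "David": 10}
wins = {v: 0 for v in stakes}
for slot in range(10_000):
    wins[select_proposer(stakes, seed=slot)] += 1
print(wins)  # roughly 40% / 30% / 20% / 10% of the slots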
Similar to PoW, total decentralization may not be fully possible in a PoS-based public blockchain. This is because a few wealthy nodes can monopolize the stakes in the network; those who put in more stake can effectively control most of the voting. Both algorithms are subject to the socio-economic issue of making the rich richer. PoS has been gaining popularity due to these socio-economic considerations and the scalability limitations of the PoW mechanism. Ethereum transitioned to PoS and decommissioned PoW as part of the merge of Ethereum 1.0 and Ethereum 2.0 in September 2022. We will discuss Ethereum 1.0 and 2.0 in more detail in the next chapter.
Forking
Earlier, we spoke about the temporary fork that occurs when two competing blocks are added to the blockchain. As shown in the following screenshot, this can continue until the majority of the nodes see the longest chain. Newer blocks will be appended to the longest chain. Blocks added to the short branch of the forked chain will be discarded, and their transactions will go back to the transaction pool to be picked up again for reprocessing. Eventually, the blockchain will comprise all conforming blocks, chained together using cryptographic hashes pointing to their ancestors:
Figure 1.17 – Forking in a blockchain Just like software development, forking is a common practice in blockchain. Forking takes place when a blockchain bifurcates into two separate paths. The following events, intentionally or accidentally, can trigger a blockchain fork:
- New features are added, requiring a change in the blockchain protocol, such as the block size, mining algorithm, or consensus rules
- Hacking or software bugs occur
- A temporary fork occurs when competing blocks are created at the same block height

A general forking scenario in a blockchain may look like the following screenshot:
Figure 1.18 – Competing blocks during forking Depending on the nature of such events, the actions to fix the issues could be a hard fork or a soft fork or, in the case of a temporary fork, doing nothing and allowing the network to self-heal.
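For a temporary fork, the self-healing behavior boils down to a fork-choice rule. Here is a toy Python sketch; note that real Bitcoin nodes actually prefer the tip with the most accumulated work rather than the literal longest chain, so the sketch models that:

```python
from dataclasses import dataclass

@dataclass
class Tip:
    name: str
    height: int      # number of blocks from genesis
    total_work: int  # sum of per-block difficulty up to this tip

def fork_choice(tips: list[Tip]) -> Tip:
    """Resolve a temporary fork by preferring the chain with the most accumulated work."""
    return max(tips, key=lambda t: t.total_work)

tips = [Tip("branch-a", 700_001, 10_500), Tip("branch-b", 700_001, 10_650)]
print(fork_choice(tips).name)  # branch-b wins; branch-a's transactions return to the pool
```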
Hard fork
A hard fork happens when radical changes are introduced in the blockchain protocol that make historical blocks non-conformant with the new protocol or rules. Some hard forks are planned: developers and operators agree on the protocol changes and upgrade to the new software. Blocks following the old protocol will be rejected, and blocks following the new protocol will become the longest chain moving forward. In some cases, however, a hard fork is controversial and heavily debated in the blockchain community, as was the case with the Bitcoin fork on 6 August 2010 or the fork between Ethereum and Ethereum Classic. In such contentious hard fork cases, as long as miners continue to maintain both the
old and new software, the blocks created by the old and new software will diverge into separate blockchains. The following screenshot illustrates both planned and contentious hard forks:
Figure 1.19 – Hard forks
During a contentious hard fork of a blockchain, a new cryptocurrency will be created to fuel the new blockchain. The owners of existing crypto assets may stay in the current network or move to the new network. When moving to the new network, they will receive a proportional amount of the new cryptocurrency in the new network. Hard forks often create pricing volatility, and the conversion rate between the old and new forks may be determined by the market. It is important to know the context and details of a hard fork and understand its crypto-economic impacts on both cryptocurrencies in order to take advantage of such sudden and drastic changes. Once forked, the chains proceed along separate paths, and nodes need to decide which blockchain network they want to stay in. For example, Bitcoin Cash diverged from Bitcoin due to a disagreement within the Bitcoin community as to how to handle the scalability problem. As a result, Bitcoin Cash became its own chain and shares the transaction history from the genesis block up to the forking point. As of May 23, 2022, Bitcoin
Cash’s market cap is around $3.67 billion, ranking twenty-fourth, versus Bitcoin’s $556 billion.
Soft fork
A soft fork, by contrast, is any change of rules that is backward-compatible between the two versions of the software and the blocks, and it goes both ways. In the soft fork case, existing historical blocks are still considered valid blocks by the new software. At the same time, the new blocks created by the new software can still be recognized as valid by the old software. In a decentralized network, not all nodes upgrade their software at the same time. Nodes staying with an older version of the blockchain software continue creating new blocks using the older software, while nodes upgraded to the newer version create new blocks using the new software. Eventually, when the majority of the network's hashing capacity upgrades to the newer version, more blocks will, in theory, be created with the newer version, making it the longest chain. Nodes with older software can still create new blocks, but since these blocks are not on the longest chain, as illustrated in the following screenshot, they will soon be overtaken by the new chain, similar to the temporary fork case:
Figure 1.20 – Soft fork in progress
If more nodes stay on the older version, as illustrated in the following screenshot, the chain of blocks created by the older version of the blockchain software may grow longer and longer, and it will take a while for the new software to become effective:
Figure 1.21 – Soft fork at the end So far, you have learned how PoW and PoS work. We have analyzed the advantages and disadvantages of different consensus mechanisms. In the next section, we will help you understand what Bitcoin and cryptocurrency are and discuss how blockchain technology applies to Bitcoin.
Understanding Bitcoin and cryptocurrency Blockchain is the technology behind Bitcoin, which is considered the origin of all cryptocurrencies. In this section, we will introduce the basics of Bitcoin and discuss the digital payment mechanism with Bitcoin.
Bitcoin basics
Bitcoin is a decentralized electronic cash system that makes peer-to-peer payment possible without going through an intermediary. The original Bitcoin software was developed by Satoshi Nakamoto and released under the MIT license in 2009, following the Bitcoin whitepaper, Bitcoin: A Peer-to-Peer Electronic Cash System. Bitcoin is the first successful implementation of a distributed cryptocurrency. Thirteen years after Bitcoin was born, as of May 23, 2022, it has about 19 million bitcoins in circulation and has reached about a $556 billion market cap (https://coinmarketcap.com/currencies/bitcoin/).
Like any fiat currency or tangible asset, the price of Bitcoin can fluctuate over time, and its valuation is determined by the open market. Several factors can influence the price, including supply and demand on the market, competing cryptocurrencies and altcoins, and governance and regulations. The following screenshot shows the Bitcoin market cap, daily transaction volume, and price movement since its inception up to May 23, 2022:
Figure 1.22 – Bitcoin market cap In this section, we will present key concepts in Bitcoin, including the wallet, transaction and account balances, Bitcoin supply, and bootstrap. We will demonstrate how Bitcoin payments work with blockchain. We will also discuss major challenges in Bitcoin and the Bitcoin blockchain. Finally, we also briefly talk about various altcoins, different types of cryptocurrency on the market.
What is a wallet?
Bitcoin is a cryptocurrency, digital cash, or virtual money. Unlike a fiat currency, you can't touch or feel it. You can't stash bitcoins under your bed. So, where do you store your bitcoins? How do you prove ownership of them? Technically, bitcoins aren't stored anywhere; they don't exist in any physical form. They are a set of software objects circulating around the Bitcoin network, where ownership of a bitcoin is proved with a cryptographic key. Payment records detailing money being transferred in or out of people's wallets are recorded as a chain of digital signatures showing ownership transfer on the blockchain. If you own the private keys, you own that bitcoin. If you lose your keys, you lose everything you have on the Bitcoin network. A Bitcoin wallet is an application where the cryptographic keys, that is, pairs of public and private keys, are stored. There are many forms of Bitcoin wallets in use, as shown in the following diagram, but broadly, they are categorized into four types: desktop, mobile, web, and hardware wallets. Hardware wallets are considered cold wallets, while the rest are considered hot wallets. We have an extensive discussion on crypto wallets in Chapter 14, Building Ethereum Wallets:
Figure 1.23 – Types of Bitcoin wallets
Your private key is used by you to digitally sign a transaction when you spend some bitcoin. Anyone who knows your public key can verify your signature on the payment you make to them. The public key, or more accurately, a wallet address derived from your public key, is used by anyone else to pay bitcoin to you. You can have as many pairs of public and private keys as you want in your wallet. In Bitcoin, a private key is a 256-bit number and a public key is 512 bits long. They are commonly displayed in a shorter hexadecimal representation. A Bitcoin address is generated from the public key, using multiple rounds of the SHA-256 and RIPEMD-160 cryptographic hash functions. You can have as many addresses as you need, and each address is intended to be used only once per Bitcoin transaction. The following screenshot gives an example of a Bitcoin wallet generated from the website at https://www.bitaddress.org/bitaddress.org-v3.3.0-SHA256-dec17c07685e1870960903d8f58090475b25af946fe95a734f88408cef4aa194.html:
Figure 1.24 – Bitcoin wallet The QR code on the left side is the Bitcoin address you can share with your trading partners. The secret one, the QR code on the right, is your private key with which you sign your transaction.
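To see how an address emerges from a public key, the following Python sketch derives a legacy (P2PKH) address using the SHA-256 and RIPEMD-160 rounds described above plus Base58Check encoding. The public key bytes are a placeholder, and the availability of ripemd160 in hashlib depends on your OpenSSL build:

```python
import hashlib

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def hash160(pubkey: bytes) -> bytes:
    # SHA-256 first, then RIPEMD-160
    return hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()

def base58check(version: bytes, payload: bytes) -> str:
    data = version + payload
    data += hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]  # 4-byte checksum
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = BASE58[rem] + out
    # every leading zero byte is encoded as the character '1'
    return "1" * (len(data) - len(data.lstrip(b"\x00"))) + out

pubkey = bytes.fromhex("04" + "11" * 64)  # placeholder uncompressed public key
print(base58check(b"\x00", hash160(pubkey)))  # 0x00 is the mainnet P2PKH version byte
```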
Transactions, UTXO, and account balances Whenever you check your bank account, you always see a balance associated with your checking or savings accounts. Your bank keeps track of all of your transactions and updates your balances following each and every transaction. A Bitcoin wallet provides you with a balance too. However, the balance in Bitcoin is not that straightforward. Instead of keeping track of every transaction, Bitcoin keeps track of unspent coins, also called UTXO. UTXO stands for unspent transaction output. In Bitcoin, a transaction is a collection of inputs and outputs transferring the ownership of bitcoins between payer and payee. Inputs instruct the network which coin or coins the payment will draw from. Those coins in the inputs have to be unspent, which means they have not been used to pay someone else. Outputs provide the spendable amounts of bitcoins that the payer agrees to pay to the payees. Once the transaction is made, the outputs become the unspent amounts to the payee; they remain unspent until the current payee pays someone else with the coin. Taking the earlier example where Alice needs to pay Bob 10 BTCs, let’s assume, prior to this transaction, that Alice has two UTXOs in her wallet, one with 5 BTCs and another with 8 BTCs. Bob has one UTXO of 30 BTCs in his wallet from other transactions. Let’s also ignore the transaction fee for now. When Alice uses both UTXOs as the input to pay 10 BTCs to Bob, both will be the inputs of the transaction. One 10 BTC UTXO will be created as output to Bob, and one 3 BTC UTXO will be returned to Alice. After the transaction, Alice will have one 3 BTC UTXO in her account, and Bob will have two UTXOs in his account. They remain as UTXOs until they are used to pay for other transactions:
Figure 1.25 – How UTXOs work When either Alice or Bob pays someone with the remaining UTXOs, the unspent output from the previous transaction becomes an input to the new transaction. Since all transactions are digitally signed, essentially a Bitcoin becomes a chain of digital signatures on the Bitcoin blockchain network. In fact, the blockchain is a state machine that records all transactions on an immutable ledger. Each UTXO can be ultimately traced back to the original coins that were mined by miners, which in turn can be traced back to the first set of bitcoins on the first block. Piecing together all the transactions that have occurred on the Bitcoin blockchain, from the genesis block to all blocks on the blockchain, you would see Bitcoins changing hands as in the following directed acyclic graph:
Figure 1.26 – UTXO in a directed acyclic graph
To count the total amount of unspent bitcoins, you count the leaf UTXOs and sum the bitcoins they hold. To count how much bitcoin you have in your own wallet, all you need to do is add up the unspent bitcoins in all leaf UTXOs where you are specified as the payee in the transaction outputs.
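The UTXO bookkeeping is easy to model. The following minimal Python sketch encodes the state after Alice's payment in the earlier example and shows that a balance is nothing more than a sum over the unspent outputs payable to an address:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UTXO:
    txid: str    # transaction that created this output
    index: int   # position in that transaction's output list
    payee: str   # address entitled to spend it
    amount: int  # value in satoshis

# The UTXO set after Alice pays Bob 10 BTC (fees ignored, as in the example)
utxo_set = {
    UTXO("tx_payment", 0, "Bob", 10_0000_0000),   # 10 BTC to Bob
    UTXO("tx_payment", 1, "Alice", 3_0000_0000),  # 3 BTC change back to Alice
    UTXO("tx_earlier", 0, "Bob", 30_0000_0000),   # Bob's pre-existing 30 BTC
}

def balance(address: str) -> int:
    """A wallet balance is simply the sum of unspent outputs payable to the address."""
    return sum(u.amount for u in utxo_set if u.payee == address)

print(balance("Bob") / 1e8, balance("Alice") / 1e8)  # 40.0 3.0
```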
Genesis block and coin supply
In Bitcoin, there is no central authority to issue the cryptocurrency and control the money supply. Instead, bitcoin is created by the Bitcoin blockchain network through the discovery of new blocks. As shown in the following screenshot, the first block is also called the genesis block, or block #0, which was mined on January 3, 2009, with an output of 50 BTC. The first 50 BTC is not spendable. The following screenshot shows the genesis block in the Bitcoin blockchain:
Figure 1.27 – Genesis block Source: https://www.blockchain.com/btc/block/000000000019d6689c085ae165831 e934ff763ae46a2a6c172b3f1b60a8ce26f Bitcoin uses a Bitcoin generation algorithm to control how many coins will be minted and at what rate. It is a function of the Bitcoin block height and its block reward. It started with a block reward of 50 BTC. The block reward is cut in half for every 210,000 blocks, or approximately every four years. The rate of block creation is adjusted based on mining difficulty. The maximum capacity of Bitcoins in the system is 21 million, which can be reached when 6,929,999 blocks have been mined. For more information, you should check out the Bitcoin wiki site: https://en.Bitcoin.it/wiki/Controlled_supply.
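The halving schedule is easy to reproduce in a few lines of Python. This sketch mirrors the subsidy logic (the 64-halving cutoff reflects that right-shifting the reward that many times always yields zero) and sums the subsidies of the 6,930,000 blocks to approach the 21 million cap:

```python
def block_subsidy(height: int) -> int:
    """Block subsidy in satoshis: 50 BTC at genesis, halved every 210,000 blocks."""
    halvings = height // 210_000
    if halvings >= 64:
        return 0
    return (50 * 100_000_000) >> halvings

# 33 full eras of 210,000 blocks cover the 6,930,000 blocks mentioned above
total = sum(210_000 * block_subsidy(era * 210_000) for era in range(33))
print(total / 100_000_000)  # just under the 21 million BTC cap
```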
How does Bitcoin payment work? Take the earlier example when Alice needs to pay Bob 10 BTC. Alice opens her Bitcoin wallet, scans or copies Bob’s transaction address, and creates a transaction with a 10 BTC payment to Bob. Once the transaction is digitally signed and submitted, it is sent to the Bitcoin blockchain network:
Figure 1.28 – How Bitcoin payment works
Once the transaction is broadcast to the Bitcoin network, the bookkeeper node, usually a full node in the P2P network that receives the transactions, will validate it according to the Bitcoin protocol rules. If the transaction is valid, the bookkeeper will add it to the transaction pool and relay the transaction to its peers in the network. In the Bitcoin network, roughly every 10 minutes, a subset of network nodes, called mining nodes or miners, collect all valid transactions from the transaction pool and create candidate blocks. They also create a Coinbase transaction for themselves, so that they are rewarded with the block subsidy and transaction fees in the event that they win the mining race and add the block to the chain. All nodes will verify the new
block and add it to their own copies of the blockchain. Magically, Bob will be able to see the payment from Alice and 10 BTC in his wallet.
Bitcoin transaction and block structure When creating a Bitcoin transaction, the wallet application has to follow the Bitcoin protocol rules and create the transaction data structure in line with the Bitcoin specification. Invalid transactions will be rejected by the network. For details of the Bitcoin transaction and block structure, please refer to https://en.Bitcoin.it/wiki/. The following are key data structures in a Bitcoin transaction and block: Bitcoin block structure: The following table shows the data structure within a Bitcoin block:
Figure 1.29 – Bitcoin block structure Block header structure: The following table shows the data structure for a block header:
Figure 1.30 – Bitcoin header structure
In particular, hashPrevBlock references the 256-bit hash value of the previous block, and hashMerkleRoot is the Merkle root hash of all transactions in the block, including the Coinbase transaction. The nonce is the magic number that miners need to find so that the SHA-256 hash value of the block header is smaller than or equal to the blockchain-defined target. Transaction structure in Bitcoin: The following screenshot shows the general data structure of a Bitcoin transaction:
Figure 1.31 – Bitcoin transaction structure
A transaction can have many inputs and outputs, as specified in the list of inputs and list of outputs fields. The input structure is shown as follows:
Figure 1.32 – Transaction inputs in a Bitcoin transaction The following table shows the structure for the output:
Figure 1.33 – Transaction outputs in a Bitcoin transaction
Now that you understand the transaction and block data structures, let us see in the next subsection how transactions are processed in a blockchain network.
Transaction validation and block verification
The Bitcoin protocol defines a set of validation rules, including syntactic rules and valid values. Bookkeepers, or miner nodes, need to validate transactions according to those rules before a transaction is added to the pool. Validation also includes the following checks (https://en.Bitcoin.it/wiki/Protocol_rules):

- Transaction duplication: Checks whether there is a matching transaction in the transaction pool or in a block in the main branch
- Double spend: Checks whether the input is used to pay concurrently in any other transaction in the pool or in the main branch
- Orphan transaction: For each input, checks whether the referenced output transaction can be found in the main branch or the transaction pool
- Coinbase maturity: Makes sure coins from a Coinbase transaction are mature enough to be spent
- Overdraft: Checks the inputs and outputs to make sure there is enough to make the payment and pay a reasonable transaction fee

Once a miner completes a new block through mining, the new block is broadcast to the Bitcoin network for verification. Each full node,
including mining nodes, will verify the new block and all transactions within the block. The same set of transaction validation rules is applied. For block verification, all nodes check whether the block has the right cryptographic hash and whether the nonce makes the hash smaller than or equal to the target. The miner will add the block to the longest chain. As we discussed earlier, temporary forking may happen; the Bitcoin blockchain tends to self-heal, and only the blocks in the longest chain will stay.
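The following simplified Python sketch ties the two block-level checks together: recomputing the Merkle root from transaction IDs and re-checking the header hash against the target. Real verification applies many more protocol rules and precise byte ordering; this is only the skeleton:

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(txids: list[bytes]) -> bytes:
    """Hash transaction IDs pair-wise upward until a single root remains."""
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # Bitcoin duplicates the last hash on odd-sized levels
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_block(header: bytes, target: int, claimed_root: bytes, txids: list[bytes]) -> bool:
    # 1) the header must commit to the transactions via the Merkle root
    # 2) the header hash must satisfy the proof-of-work target
    return (merkle_root(txids) == claimed_root
            and int.from_bytes(dsha256(header), "big") <= target)
```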
Limitations in Bitcoin
Thanks to Bitcoin, blockchain technology has attracted worldwide attention. Like any new technology, it has its limitations. Notable limitations include the following:

- Scalability and throughput: Scalability is a major concern in the Bitcoin network, and more broadly in any PoW-based blockchain. By design, every transaction has to be verified by all nodes, and it takes an average of about 10 minutes to create a new block, with the block size limited to 1 MB. The block size and frequency limitations constrain the network's throughput.
- Transaction processing cost: Mining in the Bitcoin network is costly and energy-intensive. The miners who add new blocks to the blockchain are rewarded with bitcoins. As the bitcoin supply gets closer to the maximum capacity of 21 million bitcoins, mining becomes less profitable, and miners will rely more and more on transaction fees to offset the mining cost and make a profit. This will drastically increase the transaction cost in Bitcoin. Please refer to https://Bitcoinfees.info for real-time transaction fees in the Bitcoin network.
- Security and privacy: Bitcoin has the 51% attack issue. At least in theory, the network could be compromised if the majority of the CPU hashing power were controlled by dishonest miners. It may not be economically feasible to launch such an attack on the main Bitcoin network, but recently at least five cryptocurrencies with much smaller networks have been hit with attacks of this type. By design, all transactions are permanently stored in the Bitcoin network, can be traced to the involved parties, and are made public. This greatly improves transparency but, unfortunately, also raises privacy concerns.
- Usability: Bitcoin uses a stack-based scripting system for transaction processing. It supports only rudimentary operations and lacks the functionality of modern programming languages. It is Turing-incomplete, which inhibits the ability to build more sophisticated real-world business and payment applications.
- Finality: Transaction finality refers to the moment when blockchain transactions are considered complete and can no longer be reverted. In a PoW-based blockchain system such as Bitcoin, the blockchain follows the longest chain, so there is no immediate finality. The deeper in the chain a given block becomes, the more likely it is that the transactions in the block will be finalized. In Bitcoin, transaction finality is probabilistic: it is believed that it takes 6 blocks, or about 60 minutes, for a transaction to be considered safe and final.

By design, if you lose your private keys, you lose access to your bitcoins. In the same way, if your private keys are compromised by hackers, they can take possession of your bitcoins and make any transactions they wish. To address this issue and some accompanying security concerns, Bitcoin introduced multiple signatures (multisig) in 2014 to allow multiple keys to be used to authorize a single Bitcoin transaction. Bitcoin Core has been using the Elliptic Curve Digital Signature Algorithm (ECDSA) as its cryptographic algorithm for digital signatures from day one, when it was distributed by Satoshi in 2009. As shown in the following diagram, three payors, Alice, Kyle, and Sam, each need to sign the transaction with their own keys. All three digital signatures need to be added to the transaction when they pay Bob some bitcoins together:
Figure 1.34 – Multisig in Bitcoin transactions
The latest update to Bitcoin Core, in 2021, was the Taproot upgrade, designed to further address privacy concerns and improve scalability and throughput. The Taproot upgrade leverages Schnorr signatures as a replacement for the ECDSA scheme when signing transactions, and introduces a Merklized Abstract Syntax Tree (MAST) structure for committing to complex spending conditions. With Schnorr signatures, multiple signatures can be aggregated into a single signature for multisig transactions, as shown in the following screenshot:
Figure 1.35 – Schnorr signature in Bitcoin transactions
Schnorr signatures are much more efficient in signing and verification than the ECDSA scheme, and require less data to be transmitted within the P2P network and stored on the blockchain, which in turn makes the Bitcoin blockchain more efficient, secure, and scalable. With the Taproot upgrade, you no longer need to expose all your public keys when making multisignature Bitcoin transactions.
Note
For more information, you should check out the Bitcoin wiki site: https://en.bitcoin.it/wiki/Taproot_activation_proposals.
Altcoins Altcoins are cryptocurrencies other than Bitcoin. Some earlier altcoins, such as Litecoin, are variations of Bitcoin with changes and improvements implemented to address some of the particular limitations we discussed in the previous section. Some, including Ethereum, BNB Chain, Cardano, and Solana, are intended as new blockchain platforms for building decentralized applications. According to http://coinmarketcap.com, the following are the top ten altcoins based on the market cap, as of May 23, 2022:
Figure 1.36 – Top 10 altcoins
Compared with the top 10 altcoins published in 2019 in the first edition of this book, Bitcoin and Ethereum continue to hold the top two spots. XRP and Cardano also remain in the top 10, but the other six were nowhere to be seen back in 2019, although their market caps now range from $10 billion to $73 billion. Bitcoin variants such as Litecoin and Bitcoin Cash declined to #18 and #24, respectively. The following is a list of leading altcoins:

- Ethereum: This is one of the best-known smart contract platforms that enables Decentralized Applications (DApps). It was invented by Vitalik Buterin in 2013. Ether is the native currency of the Ethereum platform and uses the symbol ETH. It comes with the Ethereum Virtual Machine (EVM) to enable smart contract execution on the Ethereum blockchain. We will dive into the details of Ethereum throughout the rest of this book.
- XRP: XRP is the native cryptocurrency that powers the XRP Ledger, enabling value transfers in the Ripple network. Unlike Bitcoin or Ethereum, all XRP tokens were pre-minted at the beginning. The XRP Ledger (XRPL) is a decentralized public blockchain that maintains the order and sequence of all XRP transactions. It doesn't use PoW or PoS; instead, in the XRP consensus protocol, designated servers reach an agreement on outstanding transactions every 3-5 seconds. All transactions are made public, with strong cryptography to guarantee the integrity of the system.
- BNB Chain: Similar to Ethereum, BNB Chain is another smart contract-enabled blockchain platform intended to create a Decentralized Finance (DeFi) ecosystem. It is EVM-compatible, which means you can deploy Ethereum smart contracts on the Binance chain and vice versa. Instead of using PoW as in Bitcoin or PoS as in Ethereum 2.0, it operates using a Proof-of-Authority (PoA) consensus mechanism. The native token of BNB Chain is the BNB coin. We will discuss in detail how BNB Chain and other EVM-compatible blockchains work in Chapter 4, EVM-Compatible Blockchain Networks.
- Solana: Solana is another native blockchain platform created to support smart contracts and DApps. It uses the SOL symbol. Unlike other blockchain platforms, Solana uses a combination of the PoS consensus mechanism and a Proof-of-History (PoH) algorithm to ensure network security and the accurate recording of transaction sequences on the blockchain. We will briefly introduce the Solana blockchain in Chapter 5, Deep Research and the Latest Developments in Ethereum.
- Litecoin: This is almost identical to Bitcoin, except that the time for adding a new block was reduced from 10 minutes to 2.5 minutes.
- Bitcoin Cash: This is a hard fork of the Bitcoin chain that was created by a group of Bitcoin developers who wanted a different way of addressing the scalability issue.

Blockchain technology will continue to evolve. As blockchain finds more uses in industry, more advanced blockchain networks and newer altcoins
will continue to rise to the top. In the next section, we will showcase some of the influential blockchain use cases across all industries.
Overview of blockchain use cases in the industry and government
Since its invention in 2009, blockchain has garnered great interest across industries worldwide. It is considered a disruptive technology that has unsettled financial services, banking, and the payment industry, and it continues to fundamentally change the way business is conducted in all other industries. It has found great success in traditional industries, as well as in governments around the world. As the world searches for ways out of the tangled web of the Web2 world and explores uncharted paths on a voyage into the digital future, it is blockchain that ushers in the metaverse and the world of Web3. In this section, we will showcase a few successful use cases of blockchain technology in industry and government.
Financial services Blockchain started as a peer-to-peer electronic payments solution, and quickly found broad success in the financial services, banking, and payment industries. Decentralized Finance (DeFi) is a collective term for financial instruments created on top of distributed ledger technology and blockchain, which replicate all traditional financial instruments in the digital world using cryptocurrencies and smart contracts. It created a world in the digital and crypto space parallel to the real, traditional world of financial services to which we have been accustomed for the last few hundred years – and it offered more. Today, DeFi products and services range from crypto asset management, lending and borrowing, and DeFi exchange, to sophisticated risk management products such as derivatives, insurance, and more. Bridges between traditional finance and DeFi are built to provide blockchain and smart contract solutions to traditional financial institutions to enable them to diversify and expose themselves to the crypto markets.
You will learn more about blockchain, smart contracts, and cryptocurrency throughout the rest of this book. In particular, we will delve into DeFi use cases and the leading protocols in Chapter 3, Decentralized Finance.
Payments
There are great opportunities in the payment market due to its size. According to the McKinsey on Payments in 2021 report, covering the trends and opportunities in the global payments space, more than three quarters of Americans use some form of digital payment, and digital payments constitute 78% of all payments. McKinsey's survey of executives from leading banks found that leveraging blockchain and distributed ledger technologies to support the digitization of supply chain financing is one of the technology trends they observed. Blockchain-enabled models further support banks in offering frictionless real-time payments at lower costs. Ripple came into the picture as early as 2012 with the intention of making a dent in the inter-bank money transfer system, a space traditionally dominated by the Society for Worldwide Interbank Financial Telecommunication, better known by its acronym, SWIFT. That year, Ripple Labs Inc. released a real-time gross settlement system, including a currency exchange and remittance network. RippleNet provides a service to send money globally by connecting banks, payment providers, and digital asset exchanges. To transfer funds between banks, RippleNet uses its native XRP tokens and guarantees fast and secure settlements. As of February 2023, this payment network spans 55 countries and hosts over 120 currency pairs. More than 100 financial institutions worldwide have joined its payment network. Ripple is not the only blockchain-based payment service. Mastercard and the blockchain company R3 announced a partnership to produce a cross-border payment solution in September 2019. China Construction Bank also developed a blockchain solution enabling supply chain financing for cross-border payments, with the aim of reducing settlement time. We expect that companies with objectives such as financial inclusion, consumer protection, and regulatory compliance will continue to leverage
blockchain and other emerging technologies, and work together towards the same goal, a noble goal of offering a global currency and robust, secure infrastructure to empower the lives of people all over the world.
Audit and assurance
During an audit, an organization's financial statements are evaluated to determine their accuracy and fairness. If all transactions were recorded in an immutable blockchain as indelible marks, audits could become redundant. However, there is still the chance that blockchain transactions could be logged in the wrong sections of financial statements, or a transaction itself may be illegal if, say, it is sent and received between parties that do not comply with regulations. In some cases, it can even be sent as an off-chain agreement. Internationally, the audit market is dominated by big players such as PricewaterhouseCoopers (PwC), KPMG, Ernst & Young (EY), and Deloitte. All of them are bringing in blockchain innovations. PwC announced a blockchain auditing service in March 2018. PwC France and Francophone Africa brought together experts specialized in cybersecurity, big data, and audits in its blockchain lab, located in Paris. The lab collaborated with Request Network, a project that is building a decentralized payment system for the Ethereum network. KPMG, in partnership with Microsoft, has expanded its blockchain strategy for audits. KPMG combined Microsoft Azure's hybrid cloud capabilities, enterprise-level security, and extensive compliance certification portfolio to break down complex business workflows. The first joint blockchain nodes opened in Frankfurt and Singapore as early as February 2017. By April 2018, EY had announced its blockchain analyzer to make the lives of audit teams easier in sourcing all transaction data across an organization from multiple ledgers on the blockchain. May 2022 saw the production release of the third generation of EY Blockchain Analyzer: Reconciler,
being made available for the first time to non-audit clients. It enables auditors and non-audit users to do the following:

- Import enterprise records
- Reconcile off-chain enterprise records with on-chain transactions
- Track wallet balances

In 2017, Deloitte released 90,000 certificates on a private blockchain with DNV GL, an internationally accredited registrar. How does this work? As soon as a new certificate is issued, it is digitized and stored in a private blockchain. In this private network, each certificate is tagged uniquely and can be traced. Simply scanning a QR code makes it possible for anyone to verify that a company is certified. Blockchain technology also brings new challenges to auditing and assurance. The chaos from Initial Coin Offerings (ICOs) and Initial Token Offerings (ITOs), which we will talk about in Chapter 3, Decentralized Finance, as well as the security and audit issues in smart contracts, will make the auditing and assurance of DeFi companies, and the transparency and accountability of cryptocurrency, much harder to achieve. The dramatic crash of Terra/Luna demonstrates that this requires a different level of expertise that audit and assurance firms have to build. A lack of regulatory clarity on how crypto assets are audited will ultimately only hurt the average Joe.
Healthcare Healthcare is another industry that has strived to find ways to maintain health quality and lower overall costs. Blockchain has the potential to disrupt the entire industry.
COVID-19 contact tracing
The COVID-19 pandemic is an ongoing global pandemic of a coronavirus disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Since early 2020, it has overwhelmed healthcare systems and disrupted people's lives around the world. Beyond vaccines, the best ways to slow down the rate of infection are social distancing and contact tracing. IBM developed a digital health pass using blockchain, like a health passport, to track and selectively share individual COVID-19 status so that it can help to control the spread of the disease. It allows organizations to verify the health credentials of those they interact with or do business with. With privacy built into the tools, it enables individuals to manage their health information through an encrypted digital wallet on their smartphones and maintain control over who it is shared with and for what purpose. As published on HealthIT.gov in the middle of 2020, Villanova University developed a blockchain solution to help medical facilities track coronavirus cases globally. The system enables medical facilities around the world to publish coronavirus test results between doctors on a blockchain. Assisted by technologies such as the Internet of Things (IoT) and Artificial Intelligence (AI), the system triggers alerts about potential surges of COVID-19 cases based on high-risk gatherings and public space surveys. These alerts enable healthcare providers to make data-driven decisions in allocating medical staff and equipment already in short supply. Tech Mahindra developed a vaccine ledger using blockchain technology to track COVID-19 vaccines from manufacturers to recipients, with the intention of preventing counterfeits. As a result, vaccine distribution becomes safer and more reliable, especially during initial outbreaks, where the scarcity of authentic vaccines may drive price gouging and the spread of counterfeits. Similarly, Novartis, a global health company, has been exploring the potential of blockchain for pharmaceutics and developing drug tracking systems using blockchain to combat counterfeit and black-market medicine.
Electronic medical records
Blockchain in healthcare can help improve medical record access and sharing. Anthem, the largest health plan in the BlueCross BlueShield system and the second largest health insurer in the United States, is using blockchain technology to allow secure data access and sharing of its members’ medical records. It started as a pilot in 2019 and expects to roll out this feature to all 40 million members.
Medical claim adjudication and payment
Several leading healthcare payor and provider organizations are working together to explore the potential of blockchain to improve the efficiency of administrative processes, with the goal of bending the administrative cost curve in the healthcare ecosystem. Anthem, together with Health Care Service Corporation (HCSC), another of the largest insurers in the BlueCross BlueShield system, shared other-party liability information over blockchain and made it available to all parties, which in turn removes friction and the possibility of erroneous manual processing from coordination of benefits during claim adjudication. A group of top healthcare heavyweights, including Anthem, HCSC, and Aetna, together with IBM, initiated a blockchain healthcare network with the intention of tackling various healthcare industry challenges, including efficiency and end-to-end visibility in claims processing and payments, as well as provider data accuracy. Not coincidentally, United Healthcare, Humana, and Change Healthcare are also piloting blockchain solutions with another group of health organizations, initially focusing on addressing provider data quality issues using distributed ledger technology. Provider data accuracy and quality are prevailing industry challenges that cost the healthcare industry billions of dollars in administrative expenses each year. Providence St. Joseph Health, one of the leading not-for-profit healthcare systems, is focusing on building an integrated provider-payer system on a blockchain platform to streamline claims processing and interoperability among all parties, and improve revenue cycle efficiency. One of the successful use cases utilizes blockchain technology and shared ledgers
to improve prior authorization, a complex process that often infuriates providers and doctors due to erroneous denials of medical services and increases friction between providers and healthcare payors.
Blockchain use cases led by government Governments around the world have leveraged blockchain technology to improve service provision to citizens in their countries. In this section, we will discuss some of the prominent blockchain use cases implemented in the public sector.
Food safety initiatives
Believe it or not, the food industry is leading blockchain adoption. The FDA is taking a new approach to food safety. In the New Era of Smarter Food Safety blueprint, announced in July 2020, the FDA laid out its technology-heavy approach to creating a safer, more digital, and traceable food system, including tech-enabled traceability and smarter tools for prevention and outbreak response. In its blueprint, it plans to tap into new technologies, such as blockchain and the Internet of Things (IoT), for maintaining records and tracking events from growing the food to its arrival on the table. In fact, there are many such food safety tracking implementations already on blockchain. Walmart, partnering with IBM and Nestle, implemented a supply chain food tracking system using Hyperledger, a permissioned blockchain, as early as 2018. It tracks the journey of fresh produce from farms to grocery stores. Agricultural conglomerate Cargill Inc. is leveraging blockchain technology to track its Thanksgiving turkeys from the store they were sold in back to the farm that raised them. In a similar way, Nestle, the Swiss multinational food and drink processing conglomerate, is harnessing blockchain technology for baby food safety. It allows the consumer to simply scan a QR code to track items such as organic infant formula, baby
food, and instant mashed potatoes from the manufacturer to the grocery store shelf. The immutable and auditable nature of blockchain transaction records makes it well suited for supply chain tracking. Judging from the top 50 blockchain use cases reported by Forbes over the last three years, it is no wonder that supply chain tracking using blockchain has been applied in almost every industry. For example, AP Moller Maersk, a Danish shipping company, is using Hyperledger to track shipping containers during ocean and inland freight transportation. Breitling, the luxury watchmaker, has built an Ethereum-based blockchain system to track and prove the authenticity of its products. De Beers, the diamond producer, and LVMH, the world leader in high-quality luxury products, are also using blockchain technology to track products, prove authenticity, and fight counterfeiting. Blockchain is also finding its way into driving positive change on Environment, Social, and Governance (ESG) issues and supporting ESG's mission for a sustainable future. Mining giant BHP developed blockchain solutions to document emission data and trace its carbon footprint. Other use cases have been implemented by the Industrial and Commercial Bank of China (ICBC) to track and incentivize energy-efficient vehicle usage by connecting ICBC customer wallets to government transportation data via blockchain.
Smart city ambitions
The smart city concept is not new. A smart city is a technologically modern physical infrastructure that integrates and leverages information, communication, and network technology to optimize city operations and services for its residents. With the advent of blockchain, 5G, IoT, and AI/Machine Learning (ML), interpretations of the smart city concept have expanded. Modern smart cities use different types of IoT devices and sensors to collect specific data from citizens, devices, buildings, assets, and every element of city operations, and leverage 5G networks to efficiently transmit and share this data. AI/ML technology is used to gain insights and drive efficiency in managing assets, resources, and services, improving
operations across the city. Blockchain is expected to be used as an immutable and shared ledger to facilitate frictionless data exchange. One such ambition is Saudi Arabia’s $500 billion smart city project, NEOM, which intends to build a cognitive city and hyperconnected infrastructure from scratch. Advanced technology plays a key role in ensuring efficient and smart operations in cities. Blockchain and smart contracts will be used to manage instantaneous transactions and financial payments. It will enable network participants to exchange data with a high degree of reliability and transparency. If this has gotten you interested, you can check https://www.neom.com/en-us for any new developments.
Central bank-issued digital currency
A central bank-issued digital currency (CBDC) is a digital form of central bank money that is widely available to the public. It is also viewed as the digital form of fiat money. Although the US has not decided whether or when it will issue a CBDC, the Federal Reserve has recognized the benefits of CBDCs and is exploring the implications of, and options for, issuing a US CBDC, sometimes also called a digital dollar. In its Money and Payments: The U.S. Dollar in the Age of Digital Transformation report, published in January 2022, the Federal Reserve made it clear that it considers a CBDC to be a digitized version of fiat currency and an expansion of the existing fiat currency, not intended to reduce or replace it. Due to the widespread use of Bitcoin and other cryptocurrencies, central banks around the globe face the same dilemma. As tracked by the Atlantic Council's GeoEconomics Center, 9 out of 91 countries on the tracking list have launched their own CBDCs, including the Bahamas, Nigeria, and seven eastern Caribbean countries. Fifteen countries are piloting CBDCs, including China, Singapore, Russia, and South Korea. Sixteen countries are in the development stage of their own CBDCs. The US is one of 40 countries still in the research stage. China's central bank released a pilot version of its digital yuan wallet in April 2021 and has expanded the pilot to more than 11 regions as of February 2023. Although the US is behind the other top central banks,
including the EU, UK, and Japan, a number of technological experiments related to digital currencies have already been conducted in the US, including a hypothetical CBDC and the use of distributed ledger technology for wholesale payments. Research on economics and policy, stakeholder engagement and outreach activities, and international collaboration are underway to help the Fed reach a decision about the appropriateness of issuing a US CBDC. Powering many of the blockchain implementations in the preceding use cases are a set of newer blockchain technologies and smart contracts. In the next section, we will introduce you to Ethereum, a smart contract-enabled blockchain network.
Ushering in the world of Ethereum
Vitalik Buterin, the founder of Ethereum, addressed the limitations of Bitcoin discussed earlier quite differently. While working on Bitcoin, he recognized that Bitcoin's stack-based scripting is very limited and lacks the functionality and capability for application development beyond the transfer of cryptocurrency ownership. He saw it as a huge opportunity and began writing his own whitepaper in 2013. In his famous Ethereum whitepaper (https://github.com/ethereum/wiki/wiki/White-Paper), Vitalik laid out his vision and intent to build a blockchain that includes the following:

- A built-in Turing-complete programming language
- A smart contract and decentralized application platform, allowing anyone to define, create, and trade all types of cryptocurrencies and crypto assets

Similar to Bitcoin, Ethereum is built on blockchain technology. It has all of the critical characteristics of a blockchain. It is a shared distributed ledger on top of a decentralized P2P network. It works in a similar way to that discussed in the Understanding Bitcoin and cryptocurrency section earlier in this chapter.
Unlike Bitcoin, which is a decentralized state transition system with limited decentralized computing capability via Bitcoin scripting, Ethereum is a decentralized computing and data platform featuring Turing-complete smart contract functionality. Ethereum introduced a few new and critical concepts, including the smart contract, EVM, and account. We will cover them in detail in the rest of this book.
Smart contract
A smart contract is a piece of programming code that is stored and executed on the blockchain. Ethereum has a Turing-complete language, Solidity, which enables developers to develop and deploy smart contracts. In addition to moving ether (the cryptocurrency of the Ethereum network) between accounts, Ethereum smart contract code supports modern programming language constructs such as loops and can perform much more complex computations, including data access, cryptographic algorithms, and function calls. Each such operation has a gas cost associated with it. That is how Ethereum calculates the transaction cost of running smart contracts and, through a gas limit, protects smart contracts from infinite loops or programming errors. A smart contract is like a scripted agreement between interacting parties; the code built into the contract is stored on the Ethereum blockchain and cannot be tampered with or removed. This greatly increases the credibility of legal documents.
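As a quick illustration of gas accounting, the following hypothetical snippet uses web3.py (one of the Web3 libraries covered later in this book) to estimate the gas for a simple ether transfer; the local node endpoint and the library version (v6) are assumptions:

```python
from web3 import Web3

# Assumes a locally running Ethereum node with unlocked dev accounts
w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))

tx = {
    "from": w3.eth.accounts[0],
    "to": w3.eth.accounts[1],
    "value": w3.to_wei(1, "ether"),
}
# Every EVM operation consumes gas; estimate_gas simulates the transaction and sums the cost
print(w3.eth.estimate_gas(tx))  # 21000 for a plain ether transfer
```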
EVM The EVM is the runtime environment for smart contracts in Ethereum. It is a virtual operating system deployed as an Ethereum client to all network nodes across the globe. Similar to the Java Virtual Machine (JVM) in the Java world, contract code is compiled into bytecode, which is loaded into the EVM as part of contract creation.
Account There is no concept of accounts in Bitcoin. Instead, Bitcoin uses the concept of UTXO to keep track of money transfers and account balances. Ethereum introduces the concept of the world state and account. The world state comprises a mapping of all accounts and their public addresses. To facilitate both state transactions and decentralized computing, Ethereum introduces two types of accounts: Externally Owned Accounts (EOAs), controlled by private keys, and contract accounts, controlled by their contract code.
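A toy sketch of the account model in Python follows: the world state maps addresses to account records, and both account types share the same four fields. The addresses and hashes below are hypothetical, and the None values stand in for the empty-code and empty-storage hashes that the real protocol stores for EOAs:

```python
# A simplified picture of Ethereum's world state: address -> account record
world_state = {
    "0xAliceEOA...": {             # externally owned account, controlled by a private key
        "nonce": 7,                # number of transactions sent from this account
        "balance": 2 * 10**18,     # 2 ether, denominated in wei
        "codeHash": None,          # no contract code deployed
        "storageRoot": None,       # no contract storage
    },
    "0xTokenContract...": {        # contract account, controlled by its code
        "nonce": 1,
        "balance": 0,
        "codeHash": "0x3aC5...",   # hypothetical hash of the deployed bytecode
        "storageRoot": "0x9f1d...",# hypothetical root of the contract's storage trie
    },
}
```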
Summary In this chapter, we explained key blockchain components and elements and the different characteristics of blockchain, and we discussed how blockchain works. We reviewed cryptographic technologies and how they are leveraged in blockchain. We illustrated how PoW and PoS consensus mechanisms work. We went over some key concepts in Bitcoin, as well as examining some Bitcoin limitations. We provided a short overview of cryptocurrencies and altcoins. We then showcased some of the leading blockchain implementations across the industry. We also briefly introduced Ethereum, as well as the key differences between Bitcoin and Ethereum. In the next chapter, we will delve into the Ethereum architecture and ecosystem in greater depth. Stay tuned.
Ethereum Architecture and Ecosystem
In this chapter, we will show you the Ethereum 1.0 (Eth1) architecture and how Eth1 works under the hood. We will discuss how the Ethereum Virtual Machine (EVM) works and how smart contract code is executed within the EVM. We will help you understand blockchain scalability challenges, and various layer 1 and layer 2 options for scaling Ethereum. We will then introduce Ethereum 2.0 (Eth2), including Ethereum's transition to Proof of Stake (PoS), Beacon Chain, and the Eth1 and Eth2 merge. You will learn about the modular design of the Eth2 blockchain architecture, get a deep understanding of the execution layer and consensus layer, and see how the beacon chain works in the context of PoS. At the end of this chapter, we will discuss rollups and explain in detail how optimistic rollups and Zero-Knowledge (ZK) rollups work. To reflect the changes that took place with the Ethereum merge in September 2022, in this book we will change the nomenclature for the Ethereum protocol. Eth1 will now refer to the original Proof-of-Work (PoW)-based Ethereum, Eth2 will now refer to the beacon chain prior to the merge, and Ethereum will now be the merged term for both Ethereum 1.0 and Ethereum 2.0. We will cover the following topics in this chapter:

- Introducing the Eth1 Architecture
- Diving deep into Eth1
- Examining mining in Eth1
- Understanding scaling challenges in Ethereum
- Introducing Beacon Chains and Eth2
- Merging Eth1 and Eth2
- Scaling Ethereum with rollups
Technical requirements
For all the source code for this book, please refer to the following GitHub link: https://github.com/PacktPublishing/Learn-Ethereum-Second-Edition/.
Introducing the Eth1 Architecture
Vitalik Buterin, the creator of Ethereum and co-founder of the Ethereum Foundation, envisioned Ethereum as a decentralized computing platform that enables anyone to create, store, and run smart contract-based Decentralized Applications (DApps). What separates Ethereum from Bitcoin is the smart contract platform, which makes it possible to support much broader use cases with blockchain, beyond just payment and money transfer. Smart contracts are computer programs deployed and executed on the Ethereum blockchain network. They enable complex transactions between multiple parties. As the following diagram shows, an Ethereum blockchain network is a decentralized Peer-to-Peer (P2P) network of Ethereum clients, representing network nodes. An Ethereum client refers to software installed on a network node that can verify new transactions, execute smart contracts, and process new blocks of the chain. Residing in thousands of computers or devices on the internet and connected through the Ethereum P2P network, these clients form a kind of enclave: the EVM and the runtime environment for smart contract execution in the P2P network. The following diagram shows the P2P network:
Figure 2.1 – P2P network in Eth1
Ethereum clients run the EVM and can technically be written in any popular programming language. There are many different implementations of Ethereum clients. Ethereum makes a variety of different client implementations possible as long as the implementation conforms to the protocol specification defined in the Ethereum yellow paper (https://github.com/ethereum/yellowpaper). The design behind Ethereum, based on the whitepaper, is intended to build a simple, efficient, and extensible blockchain platform, with a Turing-complete programming language to support more sophisticated and complex computations. It not only has all of the benefits of a blockchain but can also serve as the framework to support all types of digital assets and value transfers. From the beginning of Ethereum, Ethereum creators and the community have promoted client diversity. There are many advantages to such a variety of Ethereum client implementations, including the following:
- It makes the network more resilient to bugs.
- It prevents the centralization of developer resources.
- In general, competition between teams helps to find the best solutions to common and challenging issues.
- Each client may have a different focus, strength, and weakness in mining, prototyping, DApp development, and more. DApp developers or private Ethereum blockchain operators may choose the ones fitting their own special needs.
Notable earlier Ethereum clients include the following:
| Client | Language | Developers | Where to download |
| --- | --- | --- | --- |
| Geth | Go | Ethereum Foundation | https://geth.ethereum.org/downloa |
| Parity | Rust | Parity | https://www.parity.io |
| Besu | Java | Hyperledger Foundation | https://besu.hyperledger.org/en/s |
| OpenEthereum | Rust | Ethereum Foundation | https://openethereum.github.io |
| Nethermind | .Net | Nethermind | https://nethermind.io/nethermindclient/ |
| Erigon | Go | A team of developers | https://github.com/ledgerwatch/er |

Table 2.1 – Ethereum clients
Ethereum introduces the concept of the world state and account. Ethereum uses an account model to keep track of money transfers and account balances, instead of the UTXO model used in Bitcoin. The world state comprises a mapping of all accounts and their public addresses. A state transition represents the mapping from an old stable state to a current state. Beyond smart contracts and the EVM, an Ethereum client provides all the
blockchain components for maintaining the world state and state transitions in the blockchain network, including the following:
- Managing transactions and state transitions within the Ethereum blockchain
- Maintaining the world state and account state
- Managing P2P communication
- Block finalization with mining
- Managing the transaction pool
- Managing crypto-assets, gas, ether, and tokens
We will discuss more details in the next section when we dive deep into Ethereum and the EVM.
Ethereum – the world computer
Ethereum is often pegged as the world computer of the decentralized world. What does that mean? How does Ethereum fulfill the tall order of the humongous amount of computation needed in the digital world? Let's start with the history of the internet and the web and discuss the potential of Ethereum.
The World Wide Web started as a decentralized content network in the early 90s. It was designed for people to publish and share content without going through any central authority or intermediary. But from the early 2000s, with the advent of e-commerce, social media, and mobile technology, collectively called Web 2.0, we began to communicate, interact, and transact with each other and share information through centralized services provided by big companies such as Google, Meta, Microsoft, and Amazon. Thanks to the power of platforms, platform businesses such as Uber, Airbnb, and Facebook managed to disrupt traditional business models, dominate vast traditional industries within just a few years of their launches, and outcompete traditional companies with a tiny fraction of
the number of employees and resources. The direct consequence of this success is that all user data is concentrated in the hands of the few, creating risks that our data will be misused or even hacked. It also makes it easier for governments to conduct surveillance and impose censorship.
Blockchain is on the way to becoming the new internet, Web 3.0. Bitcoin laid the foundation of decentralization with its shared public ledger, a digital cryptocurrency payment model, and P2P technology. Ethereum took this model further beyond finance and P2P payment, which propelled the creation of a new business model called DApps. By the original design, Ethereum provides a platform for anyone to write smart contracts and DApps based on their business needs and value propositions. It is intended as the world computer for the decentralized world. To support this goal, Ethereum provides four decentralized computing facilities, along with a large list of development and testing tools, which make it very easy to develop and deploy DApps onto the Ethereum blockchain. The four decentralized computing facilities are as follows:
- The Ethereum blockchain for a decentralized state
- Smart contracts for decentralized computing
- Swarm and IPFS for decentralized storage
- Whisper for P2P messaging
The following diagram shows the decentralized computing facilities:
Figure 2.2 – Ethereum as the world computer
Ethereum itself adopted many innovative ideas from the blockchain community too. With the transition to PoS, the enablement of rollups, the introduction of sharding, and the separation of the consensus layer and the execution layer, Ethereum is marching closer to becoming the decentralized computing platform and Web 3.0 infrastructure.
Web 2.0, Web3, to the Metaverse
As we all agree, Web 1, or Web 1.0, made it easier for everyone to share, find, and search for information. It enabled connectivity and the free flow of information. It brought convenience and the simplest form of digital interaction, which carried the internet boom to almost every corner of the world. Web 2.0, the current form of the internet, is a result of economies of scale. It further continued the momentum and brought massive convenience to the world at the expense of personal privacy, identity, and ownership. It made trade, commerce, transactions, interaction, and entertainment digital and possible to do online.
Ethereum co-founder Gavin Wood first coined the term Web3 as early as 2014, and he envisioned a truly decentralized and more democratic version of the internet, without monopoly or censorship, as the future of the web – that is, Web3. Since the original vision, Web3 has evolved as a collective ecosystem of everything decentralized, powered by and governed through the public blockchain.
What will Web3 bring to the world? What problems will Web3 try to solve? Let us start with what has not been addressed or solved by Web 1.0 or Web 2.0:
- Money and fiat currency – This is something that Web 2.0 has not attempted to resolve. On the contrary, companies big and small, as well as average citizens, rely on fiat money to facilitate today's thriving economy.
- Ownership – Believe it or not, everyone contributes to the thriving business of social networking and e-commerce. Your data becomes the new oil in the Web 2.0 engine. In exchange for convenience and interaction, you give up ownership of your data, and your privacy too.
- Identity – Everyone can create a digital identity on their own preferred social network, or on an e-commerce website. How secure can it be? You will get a clue if you check whether any of them appear in big data lakes. You need a sovereign identity, controlled and owned by you, representing you in the digital world, rather than an avatar.
- Privacy – This is a common problem, even occurring in the blockchain world.
- Governance – Today, the web is largely governed by big companies, market monopolies, and central governments. Blockchain may give a clue as to what the future could look like.
These aspects need to be solved by the next generation of the internet, Web 3.0, and the future Metaverse. The public blockchain was naturally considered for the Web3 infrastructure due to decentralization and censorship resistance. Blockchain platforms, such as Bitcoin, Ethereum, and so on, attempt to solve the issues of fiat currency with cryptocurrency. With the introduction of Non-Fungible Tokens (NFTs), the blockchain ecosystem makes digital ownership over real, digital, or virtual assets possible. They are valued with cryptocurrency or crypto-assets. No one can take away your ownership. They can be freely traded over the blockchain network. Decentralized Autonomous Organizations (DAOs) could be the key to addressing the governance gaps in today's world, and bring in transparency, governing rules, and decision-making through smart contracts.
Web3, NFTs, and blockchain technology will enable the world to voyage into the Metaverse, a virtual, digital universe centered around every autonomous digital me world. It's all about my identity, my experience, my interactions, and my ownership in my virtual world.
You may have played or remember Second Life, a massive, multiplayer 3D online game in a virtual world, developed by the San Francisco-based firm Linden Lab and launched in June 2003. According to Wikipedia, by 2013, Second Life had approximately one million regular users. Second Life has an internal economy, which is powered by the Linden dollar, or L$. The L$ is a virtual token for use only within the Second Life platform. You can buy or rent assets using L$. You can provide services and get paid with L$. The game itself is an autonomous, self-sustainable, and self-sufficient virtual world, where, as a virtual being, you can do anything, almost in parallel with the real world we are living in.
This is the earliest and simplest, linear version of the Metaverse. Powered by augmented reality (AR) and virtual reality (VR) technology, the future experience of the Metaverse will be more aerial than linear, more personal than machine-like, and more immersive, with Matrix-style collaboration rather than Pokémon Go-style social interaction. You will own your share of assets in the form of tokenization and cryptocurrency, and be able to claim ownership of any contributions, influence, or creations in the form of NFTs. Most importantly, you will live and interact with the world with your own me identity, which allows you to maintain your privacy in the Metaverse.
We are still in the very early stages of Web3 and the Metaverse. The Metaverse is not a topic we will discuss in detail in this book. However, we will discuss the Metaverse in the context of NFTs in Chapter 5, Deep Research and the Latest Developments in Ethereum.
DApps
A DApp is an application or service that runs on a blockchain network and enables direct interaction between consumers and providers, for example, connecting buyers and sellers in a decentralized marketplace. Similar to the centralized application architecture, a DApp usually involves a decentralized backend, which runs on the blockchain network, and a centralized frontend, which allows end users to access their wallets and make a transaction. The following diagram shows the differentiation between centralized and decentralized applications:
Figure 2.3 – Comparison of centralized applications and DApps
Although there are many different viewpoints, it is a common belief that a DApp must be completely decentralized and open source. It must run on a blockchain network and use and generate cryptographic tokens. Most DApps start with a whitepaper and a working prototype. If it garners enough attention from investors, a project may involve a token sale and an Initial Coin Offering (ICO). We will discuss tokens and coins, as well as their funding mechanisms, in Chapter 3, Decentralized Finance.
Ethereum clients provide a set of Web3 APIs over JSON-RPC for DApps to interact with an Ethereum blockchain. From your web or wallet application, you can use the Web3 object provided by the web3.js library to communicate with the Ethereum network. It works with any Ethereum client. Behind the scenes, it connects to a local or remote Ethereum node and makes RPC calls. In some sense, this is like the old client-server model, where DApps are the client and the entire Ethereum network as a whole acts as a server.
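As a quick illustration, the following sketch, assuming web3.js v1.x and a node endpoint of your own (the localhost URL below is a placeholder), creates a Web3 instance and issues two simple JSON-RPC calls:

    const Web3 = require('web3');
    // Placeholder endpoint: point this at your own node or a hosted provider
    const web3 = new Web3('http://localhost:8545');

    async function main() {
      const blockNumber = await web3.eth.getBlockNumber(); // JSON-RPC: eth_blockNumber
      const peers = await web3.eth.net.getPeerCount();     // JSON-RPC: net_peerCount
      console.log(`Latest block: ${blockNumber}, peers: ${peers}`);
    }
    main();

Every call like this travels over JSON-RPC to whichever node the provider URL points at, which is exactly the client-server-like interaction described above.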
To DApps, the Ethereum network is just like a giant world computer, assembled from thousands of computing devices throughout the internet. Once you connect to the network, you can connect to any node in the decentralized network, as shown in the following diagram:
Figure 2.4 – Ethereum DApp architecture
That is how the users of DApps interact and transact with each other on the blockchain. Many types of DApps are being created. According to State of the DApps, as of July 2022, there are over 4,000 DApps and about 7,000 smart contracts listed, and over 67% of them run on the Ethereum network. We will show how the Web3 API works in Chapter 7, Web3 API Fundamentals.
In the next section, we will discuss how Eth1 works and go into the details of the Eth1 architecture. We will then shift the focus to Eth2 and show how Ethereum transitioned to PoS and how rollups enable massive improvements to transaction throughput. We will introduce DAOs and discuss sharding, IPFS, Swarm, and Whisper in more detail in Chapter 5, Deep Research and the Latest Developments in Ethereum.
Diving deep into Eth1
In this section, we will get into the details of the Ethereum architecture. We will start with some basic concepts of accounts, contracts, transactions, and
messages, and then discuss the EVM internals and how the EVM actually works.
Accounts
As we discussed earlier, instead of the UTXO model, Ethereum manages accounts and transactions differently from Bitcoin. Ethereum introduces the world state concept, the collection of all accounts on the blockchain network. The world state presents the global state of the Ethereum network, which is constantly updated following any transaction execution. It is a kind of global database, which is replicated to all Ethereum nodes behind the scenes.
Like your bank account, an Ethereum account is used for holding ether and transacting with other accounts. It has a 20-byte cryptographic address and an account balance. The address identifies the owner of the account. In addition to the address, an Ethereum account contains four fields:
- Nonce: A counter used to identify distinct transactions
- Balance: The account's current ether balance
- Contract code: An optional cryptographic hash pointing to the smart contract code associated with the contract creation
- Storage: An optional cryptographic hash pointing to the account's storage
The following diagram further illustrates the structure of an Ethereum account:
Figure 2.5 – Illustration of an Ethereum account
In Ethereum, a transaction is a state transition of an account from one state to another, which is initiated by an external entity. All transactions, whether moving ether from one account to another or executing smart contract code, will be collated into a block. The resulting account states and transaction receipts are added to the block too. The new block will be mined by the blockchain network and added to the blockchain. Data in the blockchain is stored in supporting storage, usually a database. Depending on the Ethereum client implementation, it may be stored in a different type of database. For example, the Geth implementation uses Google LevelDB as the underlying database implementation for the global state, as shown in the following screenshot:
Figure 2.6 – State transitions in Ethereum
We will get into the different types of accounts in the next section.
Two types of accounts
Accounts play an essential role in Ethereum. Ethereum introduces two types of accounts. One is the Externally Owned Account (EOA), which is used for ether transfer and is controlled by private keys. There is no code associated with an EOA. The other is the Contract Account (CA), which is used for contract creation and smart contract code execution. The EVM activates and executes the smart contract code logic whenever the CA receives a message. Beyond normal operations, it may read from and write to internal storage, or invoke other smart contracts. Both are state objects; an EOA has a balance, and a CA has both a balance and storage. Without CAs, Ethereum would be limited to the mere transfer of value between accounts, as with Bitcoin.
EOAs
Just like your personal or business account at a financial institution, an EOA is associated with an external entity as an owner who has an interest in the account or ownership of the underlying crypto-assets. Every EOA has a pair of cryptographic keys. It is controlled by the owner's private key. The owner uses their private key to digitally sign all transactions so that the EVM can securely validate the identity of the sender. In the world state, the account is linked to a public address, which is generated based on the owner's public key. We will talk about the address in detail in the Addresses and wallets section, as part of the discussion about the Ethereum wallet. The following screenshot shows the structure of the EOA:
Figure 2.7 – EOA
As shown in the preceding screenshot, the EOA has a balance associated with the address, mainly used for ether transfer.
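One practical way to tell the two account types apart from a client is via the account fields themselves. The following sketch, assuming a connected web3.js instance and a placeholder address, reads the nonce, balance, and code of an account; getCode returns '0x' for an EOA and the deployed bytecode for a CA (described next):

    // Placeholder address -- substitute any mainnet or testnet address
    const addr = '0x0000000000000000000000000000000000000000';

    async function inspectAccount(web3, addr) {
      const nonce = await web3.eth.getTransactionCount(addr); // the account nonce
      const balance = await web3.eth.getBalance(addr);        // balance in wei
      const code = await web3.eth.getCode(addr);              // '0x' for an EOA
      console.log({ nonce, balance, code });
    }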
CAs
A CA, or a contract, has an ether balance and associated code, which is linked to the smart contract code in an EVM. It may have optional storage, which points to EVM storage. A state change in a CA may involve an update of the ether balance, the associated data in the storage, or both. A CA has an associated address too, which is calculated using the Keccak-256 hash function, based on the address of its creator (sender) and the nonce:
Figure 2.8 – A CA
The associated smart contract code is executed when it is triggered by transactions or messages received from other contracts. Once a new block is added to the blockchain, all participating nodes will execute the contract code again as part of the block verification process.
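To make the address rule concrete, here is a minimal sketch of the traditional (pre-CREATE2) derivation: RLP-encode the creator address and nonce, hash the result with Keccak-256, and keep the last 20 bytes. It assumes the rlp npm package alongside web3.js:

    const Web3 = require('web3');
    const RLP = require('rlp'); // assumed dependency for RLP encoding

    function contractAddress(creator, nonce) {
      // RLP-encode [creator, nonce], then Keccak-256 hash the result
      const encoded = Buffer.from(RLP.encode([creator, nonce]));
      const hash = Web3.utils.keccak256('0x' + encoded.toString('hex'));
      return '0x' + hash.slice(-40); // rightmost 20 bytes (40 hex characters)
    }

    // The creator's first deployment (nonce 0), using a placeholder address
    console.log(contractAddress('0x0000000000000000000000000000000000000000', 0));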
Transactions and messages
Transactions and messages in Ethereum are a little bit confusing, largely because they are often discussed together. There are some subtle differences. The following diagram will give some clarity about the differences and commonalities between these two terms:
Figure 2.9 – Transactions and messages in Ethereum
In Ethereum, the term transaction represents the signed data package of a message that is sent from an EOA to another account. The message itself instructs what action to take on the blockchain. All transactions require the initiator to digitally sign the message, and transactions will be recorded on the blockchain. Three types of transactions can happen:
- CA creation: In this case, an EOA acts as the initiator or creator of the new CA.
- A transaction between two EOAs: In this case, one EOA initiates an ether movement transaction by sending a message to the receiving EOA.
- A transaction between an EOA and a CA: In this case, the EOA initiates a message call transaction, and the CA will react with the referenced smart contract code execution.
The CA can send messages to other CAs or EOAs. Unlike a transaction, messages are virtual objects during the execution and will not be recorded on the blockchain. If an EOA is the recipient, the recipient's account state will be updated and recorded in the world state. If a CA is the message recipient, it is accepted as a function call and the associated contract code will be executed.
From a data structure perspective, a transaction is a digitally signed message. According to web3.js, a message contains the following attributes. A data type of DATA means alphanumeric, while QUANTITY means numeric. Besides the from attribute, all others are optional:
Figure 2.10 – Message structure in Ethereum
Transactions have additional attributes. Ethereum uses an Elliptic Curve Digital Signature Algorithm (ECDSA) signature for digital signatures; r and s are the outputs of an ECDSA signature, and v is the recovery ID.
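You can see these signature fields by signing a transaction locally. The following sketch, assuming web3.js v1.x, a connected node, and placeholder key and recipient values, signs a simple transfer and prints v, r, and s:

    // Inside an async function, with a connected web3 instance
    const signed = await web3.eth.accounts.signTransaction(
      {
        to: '0x0000000000000000000000000000000000000000', // placeholder recipient
        value: web3.utils.toWei('0.01', 'ether'),
        gas: 21000,
      },
      '0x...' // placeholder private key -- never hardcode a real one
    );
    console.log(signed.v, signed.r, signed.s);
    // signed.rawTransaction can now be broadcast with web3.eth.sendSignedTransaction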
Smart contracts
A smart contract is executable code that is digitally signed by the contract creator as part of CA creation. It is like a scripted agreement between transacting parties; the code built into the contract is stored on the Ethereum blockchain and cannot be tampered with or removed. This greatly increases the credibility of the contract as a legal document. Typically, DApp developers write smart contracts in a high-level programming language and then compile them into bytecode. It is the bytecode that lives on the blockchain and is executed within the EVM. There are not a lot of choices when determining which programming language to use. The following are a couple of options that developers may consider:
- Solidity: Solidity is the most popular language on the market for developing smart contracts. It is a JavaScript-like language and is Turing-complete.
- Vyper: Vyper is a general-purpose, experimental programming language that compiles down to EVM bytecode, as does Solidity. It is a contract-oriented, Pythonic programming language that targets smart contract development. Vyper aims to be auditable, secure, and human-readable. Being simple to read is more important than being simple to write.
Folks may wonder whether to use Solidity or Vyper. In the majority of use cases, this is a personal preference. Solidity is the most popular choice and has all the tools and utilities in place for developing end-to-end DApps. Vyper is still at an earlier, experimental stage. On purpose, it has omitted several programming constructs to be more secure, auditable, and human-readable. If your use case requires these constructs, use Solidity instead of Vyper. We will focus on the Solidity programming language in this book. In Chapter 6, Fundamentals of Solidity, we will provide a comprehensive introduction to various Solidity language fundamentals, including the structure of a contract, contract patterns, and exception handling.
More experienced smart contract developers may want to use Yul and Yul+ to work around the limitations of Solidity. Yul is designed as the intermediate programming language in the compilation of smart contract code into EVM code. Yul+ is an extension to Yul, which provides assembly-like, low-level code for EVM opcode manipulation. We will show how to use Yul in Chapter 6, Fundamentals of Solidity.
Ether and gas
The Bitcoin network uses bitcoin as the cryptocurrency to bootstrap the network and has a sophisticated algorithm to control the coin supply. The miner, by providing computing capacity for the costly mining process, gets rewarded with newly minted coins and transaction fees in bitcoin. Ether is the cryptocurrency powering the Ethereum blockchain network. ETH is the officially listed symbol for ether. Gas is the energy fueling smart contract execution in the EVM and can be purchased with ether. To obtain ether, you either need to trade for it on the crypto market or sign up as a miner. In Ethereum, the supply side will be lowered by the move to PoS, simply because there is no expensive mining anymore to compensate for. Ether will be issued at a constant linear rate during the block-mining process. You can check out online discussions and blogs here – http://ethdocs.org/en/latest/ether.html – for more information about the pros and cons of supply limitations.
Wei is the smallest denomination of ether in Ethereum. One ether is one quintillion, or 10^18, wei. The following table is a list of the named denominations and their value in wei:
| Unit | Wei value | Wei |
| --- | --- | --- |
| Wei | 1 wei | 1 |
| Kwei (babbage) | 1e3 wei | 1,000 |
| Mwei (lovelace) | 1e6 wei | 1,000,000 |
| Gwei (shannon) | 1e9 wei | 1,000,000,000 |
| Microether (szabo) | 1e12 wei | 1,000,000,000,000 |
| Milliether (finney) | 1e15 wei | 1,000,000,000,000,000 |
| Ether | 1e18 wei | 1,000,000,000,000,000,000 |
Table 2.2 – Named Ethereum denominations and their wei values
Ethereum is a general-purpose decentralized computing platform powered by ether. In addition to mining, all Ethereum nodes need to perform all computational steps as defined in the smart contract as part of the transaction and block verification process. The Ethereum protocol charges a fee per computational step in exchange for the computing resources supplied by the network nodes for contract execution. The execution fee is dynamically determined based on the total gas needed for the execution and the gas price within the network. Gas in Ethereum is an internal virtual machine token used to identify the relative cost between operations (calculations, storage, and memory access) of contract execution.
The base fee was introduced as part of the London hard fork and the EIP-1559 implementation, to make the gas price more stable and predictable. The base fee is determined by the system as a result of the supply and demand of block space on the network. In Ethereum, the block size is no longer fixed. If the demand for block space is high, a block with a larger block size may be created, and in the same way, when demand is subdued, a block smaller than normal may be created. The base fee is calculated based on the demand for block space compared to the block prior to the target block. The sender may add a priority fee as a tip to the miner for the inclusion of its transactions in the block.
The sender can purchase the gas from the miner using ether. The sender creating a transaction needs to set both a gas limit and a price per gas unit, which together become the price in ether that is paid. When a smart contract is compiled into bytecode, it is actually assembled as encoded opcodes and loaded into the EVM. Each opcode identifies a specific operation. The total gas cost of those operations will be the cost of your transaction. Every time the sender sends a transaction to a contract, the following two inputs need to be provided:
- Base fee per gas: The maximum price limit the sender is willing to pay for processing a transaction
- Priority fee per gas: The tip the sender is willing to pay as an incentive to the miner for the inclusion of their transactions
The execution fee the sender offers for the transaction will be calculated as follows:

executionFee = gasUsed × min(baseFeePerGas + priorityFeePerGas, maxFeePerGas)

Here, maxFeePerGas is the sender's overall per-gas ceiling, corresponding to the first input above. For a wallet that has not been upgraded to leverage the new gas fee improvement, the old formula, that is, gas * gasPrice, can still be used for the total execution cost. It is recommended to set a large enough base fee per gas limit, since all unused gas is returned to the sender. If there is not enough gas, the miner still collects the fee even though the transaction execution will be rolled back. If the transaction goes through, the unused gas will be refunded. In the next section, about the EVM, we will talk more about these computational steps and the gas associated with each step.
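As a quick numeric sketch of the formula above (the values are illustrative; the unit helpers come from web3.js):

    const Web3 = require('web3');
    const { toWei, fromWei } = Web3.utils;

    const gasUsed = 21000n;                               // a plain ether transfer
    const baseFeePerGas = BigInt(toWei('30', 'gwei'));    // set by the protocol
    const priorityFeePerGas = BigInt(toWei('2', 'gwei')); // the tip to the miner
    const maxFeePerGas = BigInt(toWei('50', 'gwei'));     // the sender's ceiling

    // Effective price per gas is the base fee plus the tip, capped by maxFeePerGas
    const sum = baseFeePerGas + priorityFeePerGas;
    const effective = sum < maxFeePerGas ? sum : maxFeePerGas;

    const fee = gasUsed * effective;
    console.log(fromWei(fee.toString(), 'ether'), 'ETH'); // 0.000672 ETH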
The EVM
In the Java world, developers write Java code in any Java IDE and compile it into bytecode, which, in turn, is loaded into a virtual machine called the Java Virtual Machine (JVM) behind the scenes through a Java class loader, and executed in the JVM. The JVM specification describes what is required of a JVM implementation and ensures the interoperability of Java programs across different implementations and underlying hardware platforms. Similarly, in Ethereum, smart contract and DApp developers code smart contracts in an Ethereum high-level language such as Solidity, compile them into bytecode, and upload them onto the blockchain for invocation and execution. The EVM is the runtime execution environment for smart contracts in Ethereum. It comes with a different implementation for each Ethereum client, as we discussed earlier. Each implementation follows the EVM specification defined in the Ethereum yellow paper: https://ethereum.github.io/yellowpaper/paper.pdf.
Now, let's delve deeper to understand what the EVM is and how it executes a smart contract. The EVM is a simple stack-based architecture. When executing a smart contract, it performs all operations, or in technical terms, opcodes, as defined in the EVM code, or bytecode. Ethereum provides three types of space in the EVM for the operations to access and store data:
- Stack: This is a last-in, first-out container with a fixed size and a maximum depth of 1,024 items, to which values can be pushed and popped. Each stack item is 256 bits long. This was chosen to facilitate the Keccak-256 hash scheme and elliptic-curve computations.
- Memory: This is a volatile and expandable word-addressed byte array.
- Key/value store: This is a word-addressable word array. It is meant to be used as the long-term account storage for the smart contract. Depending on whether you use account storage, your smart contract may be considered stateful or stateless.
During smart contract execution, the EVM has access to the incoming message, as well as block header data. Additionally, the EVM keeps track of the available gas and a Program Counter (PC) during execution. The following diagram shows the EVM:
Figure 2.11 – Illustration of the EVM
Unlike Java, Ethereum smart contract bytecode is uploaded through contract creation transactions, instead of through a class loader within the virtual machine. As we discussed earlier, all transactions are digitally signed by the EOA. As shown in the following diagram, when a new smart contract needs to be created, the developer typically follows these steps to get a smart contract deployed into the EVM:
1. The developer writes the smart contract in Solidity and compiles it into bytecode.
2. They use their own Ethereum account to sign the account creation transaction with the bytecode.
3. They send the account creation transaction to the Ethereum network.
Steps 2 and 3 allow Ethereum to upload the code to the blockchain and create the account in the world state through the Ethereum mining process:
Figure 2.12 – Deploying a smart contract to the EVM and CA creation
Depending on the tools you use, you will likely invoke a JSON-RPC call such as web3.eth_sendTransaction to create a new contract on the blockchain, with parameters as follows:

    params: [{
        "from": "<EOA Address>",
        "to": "",
        "gas": "",
        "gasPrice": "",
        "value": "",
        "data": "0x..."
    }]
Once the CA is created by the mining process, you can invoke a JSON-RPC call, such as web3.eth_getTransactionReceipt, to find out the contract address. You can also always go to the Etherscan site to view and search your contract code using the contract address. As we discussed in the Transactions and messages section, the contract can be invoked by an EOA via a transaction or by a CA via a function call from another smart contract.
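For example, with web3.js the receipt lookup is a one-liner; txHash below is a placeholder for the hash returned by the deployment transaction:

    // Returns null until the transaction has been mined
    const receipt = await web3.eth.getTransactionReceipt(txHash);
    console.log(receipt.contractAddress); // the new CA's address (null for plain transfers)
    console.log(receipt.gasUsed, receipt.status);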
Check out the following diagram:
Figure 2.13 – Smart contract execution inside the EVM
As shown in the preceding diagram, we can see the following:
1. Once invoked, the EVM will load the contract bytecode into memory, loop through the opcodes while checking the available gas, and perform all operations on the EVM stack.
2. During the execution, if there is not enough gas or an error occurs, the execution will abort (abend).
3. In that case, the transaction will not go through, but the sender will not get the spent gas back.
The EVM executes around 140 operations, or opcodes, which are divided into the following 11 categories. All of the opcodes and their complete descriptions are available in the Ethereum yellow paper (https://github.com/ethereum/yellowpaper):
- Stop and arithmetic operations (0x00-0x0b)
- Comparison and bitwise logic operations (0x10-0x1a)
- SHA3 (0x20)
- Environmental information (0x30-0x3e)
- Block information (0x40-0x45)
- Stack, memory, storage, and flow operations (0x50-0x5b)
- Push operations (0x60-0x7f)
- Duplication operations (0x80-0x8f)
- Exchange operations (0x90-0x9f)
- Logging operations (0xa0-0xa4)
- System operations (0xf0-0xff)
You may remember the relationship between transaction fees, gas, and gas prices from our earlier discussion. In fact, the winning node adding the new block to the blockchain gets paid the transaction fee for every smart contract executed within its EVM. The payment is calculated for all of the computations the miner made to store, compute, and execute the smart contract. The EVM specification defines a fee schedule for every operation or opcode. You can check the Ethereum wiki site to see all the opcodes supported by Ethereum. The following is a screenshot captured from the Ethereum yellow paper. For the latest gas fee schedule, please check out https://github.com/crytic/evm-opcodes.
Figure 2.14 – The gas fee schedule in Ethereum
In addition to smart contract execution, the EVM also performs transactions between two EOAs to transfer ether from one account to another. In any case, whenever a transaction is submitted to the Ethereum network, the EVM and Ethereum client on the mining node perform the mining operations and add new transactions to the blockchain. We will talk about mining in the next section.
If you come from a Java development background, you may remember the Java Just-in-Time (JIT) compiler and its advantages for the Java runtime. It compiles the bytecode into the machine code instructions of the running machine. This makes sense since, from the machine's point of view, it is machine code being executed, not bytecode. By doing the initial compilation, it avoids instant interpretation from bytecode into machine code at execution time, hence improving the performance of Java applications. Some newer EVM implementations support a JIT compiler too, including Parity and Geth. Similar to the Java JIT, the EVM JIT is a library for the JIT compilation of EVM bytecode. Before the EVM can run any code, it must first compile the bytecode into components that can be understood by the JIT-enabled EVM. By compiling the bytecode into logical pieces, the JIT can analyze the code more precisely and optimize where and whenever necessary, although it may seem slower at the beginning since it needs to check whether the JIT-compiled code is in the library and compile the bytecode if it is not. You can check out both the Go Ethereum and Parity sites for more details.
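Before leaving the EVM, a toy interpreter can make the stack-based, gas-metered execution model described earlier concrete. This is an illustrative sketch, emphatically not the real EVM; the opcode names and gas costs are simplified stand-ins:

    function run(code, gasLimit) {
      const stack = [];
      let gas = gasLimit;
      let pc = 0; // program counter
      const charge = (amount) => {
        gas -= amount;
        if (gas < 0) throw new Error('out of gas'); // execution aborts; spent gas is not refunded
      };
      while (pc < code.length) {
        const op = code[pc];
        if (op === 'PUSH') { charge(3); stack.push(BigInt(code[pc + 1])); pc += 2; }
        else if (op === 'ADD') { charge(3); stack.push(stack.pop() + stack.pop()); pc += 1; }
        else if (op === 'STOP') { break; }
        else throw new Error(`unknown opcode ${op}`);
      }
      return { stack, gasUsed: gasLimit - gas };
    }

    // PUSH 2, PUSH 40, ADD -> leaves 42 on the stack, costing 9 gas in this toy model
    console.log(run(['PUSH', 2, 'PUSH', 40, 'ADD', 'STOP'], 100));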
Addresses and wallets
Earlier, we introduced Ethereum accounts, transactions, and smart contracts. They are all linked to an address, which refers to the owners of both types of accounts or the senders of messages and transactions. Like Bitcoin, Ethereum has the concept of an Ethereum wallet too. It is a tool that allows you to easily share your public keys and securely keep private keys. In the following sub-sections, we will discuss the details of the Ethereum address and wallet. We will also provide a brief introduction to different wallet tools.
Addresses in Ethereum
As we discussed earlier, an EOA has a pair of public and private keys. The private key is used to digitally sign any transactions. The public key is used to generate the account address, which is linked to the account state in the world state. An Ethereum address is described as follows in the yellow paper:

A(pr) = B96..255(KEC(ECDSAPUBKEY(pr)))

In other words, to generate an Ethereum address, take the Keccak-256 hash of the public key. The rightmost 20 bytes of the hash, with a prefix of 0x as the hexadecimal identifier, is your Ethereum address.
The CA has an account address too. Traditionally, it is generated based on the sender's address and the transaction nonce. A newer address creation schema based on EIP-1014, Skinny CREATE2, was implemented as part of the system-wide upgrade release of Constantinople. It allows the developer to create an address for future smart contract deployment. You can check out the EIP link at https://eips.ethereum.org/EIPS/eip-1014. Once the transaction is processed, the CA will be linked to the account address in the world state.
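Putting the EOA address rule above into code, here is a minimal sketch with web3.js; pubKeyHex is a placeholder for the 64-byte uncompressed public key (without the 0x04 prefix):

    const Web3 = require('web3');

    function toAddress(pubKeyHex) {
      // Keccak-256 of the 64-byte public key ...
      const hash = Web3.utils.keccak256('0x' + pubKeyHex);
      // ... then keep the rightmost 20 bytes (40 hex characters)
      return '0x' + hash.slice(-40);
    }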
The Ethereum wallet
Ether, just like every other cryptocurrency, doesn't exist in any tangible shape or form. All that exists are records on the Ethereum blockchain. Similar to Bitcoin, you will need an Ethereum wallet to enable cryptocurrency transactions on the Ethereum blockchain. An Ethereum wallet stores the public and private keys, where the private key is used for paying with or spending ether, and the public key is used for receiving ether. A wallet can contain multiple public and private key pairs. An Ethereum address, derived from a public key, is used as the public address for you to receive ether. However, every piece of cryptocurrency is guarded by a private key. You will need the private keys from your wallet to unlock the funds and spend the coins. The following formula defines how the Ethereum wallet address is calculated:
Figure 2.15 – Calculation behind an Ethereum wallet address
We will discuss various wallet tools in the next sub-section.
Wallet tools
There are many wallet implementations to choose from. You may have to decide for yourself which wallet is best for you. You have options to choose between hot and cold wallets – in other words, online or offline wallets – or you can choose between hardware, software, or even paper wallets. There are a lot of mobile wallets available too. For Ethereum, you may want to choose an ERC-20-compatible wallet. ERC-20 is an Ethereum token standard that defines standards, smart contract interfaces, and rules for issuing crypto-tokens on the Ethereum network. The following are a few wallets known to be compatible with ERC-20 token standards:
- Atomic Wallet (https://atomicwallet.io)
- Trezor (a hardware-based wallet) (https://trezor.io)
- MyEtherWallet (https://www.mewwallet.com)
- MetaMask (https://metamask.io/)
- Mist (https://github.com/ethereum/mist/releases)
We will discuss Ethereum tokens more in Chapter 3, Decentralized Finance, including ERC-20, ERC-223, ERC-721, and ERC-777 tokens. We will further demonstrate how to build Ethereum wallets in Chapter 14, Building Ethereum Wallets.
Examining mining in Eth 1.0
In this section, we will explain how mining works in Ethereum. Depending on the progress of Eth2 and the merge of Eth1 and Eth2, using the PoW consensus in mining might already be decommissioned by the time you read this section.
The mining process in Ethereum is largely the same as the one we discussed for Bitcoin. For each block of transactions to be added to the Ethereum blockchain and the world state to be updated, a consensus must be reached between all network nodes: the new blocks proposed by the miners, including the nonce found with the PoW, must be verified by all nodes. However, there are quite a few notable differences between Ethereum mining and Bitcoin mining. Most of them are driven by the protocol and architectural differences in the blockchain. As we discussed earlier, Ethereum maintains both the transaction list and the world state on the blockchain. We will discuss those differences in detail here.
Mining and the consensus protocol
Bitcoin uses a general-purpose cryptographic hash function, SHA-256, as the PoW algorithm. With advances in specialized mining equipment such as ASICs, miners have built large mining pools to compete with each other for bitcoin rewards. This puts small miners at a disadvantage and leads to more mining centralization. To avoid such concerns, Ethereum uses a memory-hard hash function called Ethash, a modified version of the Dagger-Hashimoto algorithm, as the PoW algorithm and targets GPUs as the primary mining equipment. As with other PoW algorithms, Ethash involves finding a nonce that makes the resulting hash value fall under a protocol-defined target. The design idea behind the new hash algorithm is twofold. First, it should be hard enough for miners to mine but easy enough for validators to verify. Second, the hash results should be uniformly distributed, for easy control of when a new block is found. In Ethereum, new blocks are created every 12 seconds, instead of every 10 minutes as on the Bitcoin network. The difficulty is dynamically adjusted to ensure a fast block creation speed.
Overall, the Ethash algorithm in Ethereum involves two stages. The first stage is to generate a dataset of a Directed Acyclic Graph (DAG). This is usually calculated for each epoch – or every 30,000 blocks. The second stage is to repeatedly hash the dataset, the proposed header, and a random
nonce using Keccak-256 until the resulting hash value meets the difficulty target. DAG generation is composed of the following three steps:
1. The first step is to calculate the seed. Use the Keccak-256 cryptographic function to hash the headers of each block within the current epoch; the resulting hash becomes the seed for step 2.
2. Once the seed is found, a 16-MB pseudorandom cache is generated from the seed using Keccak-256.
3. A DAG is generated using the Fowler-Noll-Vo (FNV) hash function. It consists of many chunks of 64-byte elements; each of them depends on a part of the cache.
Each Ethereum client may implement the DAG differently. It is typically generated in advance and cached for performance improvement. For example, in the Geth client, the DAG is automatically generated. At any time, Geth keeps the current and previous DAGs to ensure smooth epoch transitions.
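Setting the DAG aside, the core "hash until the result falls under the target" loop can be sketched in a few lines. This is a toy model using plain Keccak-256 rather than real Ethash, and the header value and target below are illustrative:

    const Web3 = require('web3');

    function mine(blockHeaderHex, target) {
      for (let nonce = 0n; ; nonce++) {
        // Append the nonce (as 8 bytes of hex) to the header and hash the result
        const attempt = Web3.utils.keccak256(blockHeaderHex + nonce.toString(16).padStart(16, '0'));
        if (BigInt(attempt) < target) return { nonce, hash: attempt };
      }
    }

    // A very easy target so the loop terminates quickly in this demo
    const easyTarget = BigInt('0x00ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff');
    console.log(mine('0xdeadbeef', easyTarget));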
Ethereum transactions and block structure
Due to blockchain architecture and mining differences in Ethereum implementations, Ethereum defines quite different transaction and block structures in the Ethereum protocol. Ethereum DApps need to follow the Ethereum protocol rules to format and submit transactions to the network. Invalid transactions will be rejected by the network. The following are the essential block data structures in Ethereum, taken from Ethereum's GitHub code:
| Block header | Description | Size |
| --- | --- | --- |
| parentHash | A Keccak hash of the parent block | 32 bytes |
| ommersHash | A Keccak hash of the ommer block | 32 bytes |
| beneficiary | The beneficiary address, that is, the mining fee recipient | 20 bytes |
| stateRoot | A Keccak hash of the root node of the state trie after the execution | 32 bytes |
| transactionsRoot | A Keccak hash of the root node of the transaction trie | 32 bytes |
| receiptsRoot | A Keccak hash of the root node of the receipts trie | 32 bytes |
| logsBloom | A Bloom filter of two fields {log address + log topic} in the receipts | 256 bytes |
| difficulty | The scalar value of the difficulty of the previous block | Big Int |
| number | The scalar value of the number of ancestor blocks | Big Int |
| gasLimit | The scalar value of the current limit of gas usage per block | Big Int |
| gasUsed | The scalar value of the total gas spent on the transactions in this block | Big Int |
| timestamp | The scalar value of the output of Unix time() | Big Int |
| extraData | 32-byte data relevant to this block | 32 bytes |
| mixHash | A 256-bit hash of the nonce and the difficulty of the block to make computation hard enough | 32 bytes |
| nonce | A value to denote the computation power that went into creating this block | 8 bytes |

| Block body | Description | Size |
| --- | --- | --- |
| Transaction List | List of transaction(s) in this current block | Varies |
| Ommers List | List of ommer transactions in this current block | Varies |
Table 2.3 – Ethereum block data structure
Please note, stateRoot is the Merkle Patricia Trie root of the account state, transactionsRoot is the Merkle Patricia Trie root of all transactions in the block, and receiptsRoot is the Merkle Patricia Trie root of all transaction receipts. You can check out Vitalik's blog about the rationale for implementing the Merkle Patricia Trie in Ethereum: https://blog.ethereum.org/2015/11/15/merkling-in-ethereum/. Another difference in Ethereum is that, when creating new blocks, Ethereum adds the new block to the heaviest branch of the block tree, instead of to the longest chain as we saw in Bitcoin. The header attribute difficulty is used by the miner to determine which branch is heavier. The following is the transaction data structure the sender needs to use when sending a transaction to the Ethereum network:
| Field | Description | Size |
| --- | --- | --- |
| type | Transaction type | 1 byte |
| nonce | No. of transactions sent by the sender | 32 bytes |
| to | The recipient address | 32 bytes |
| gas | The maximum amount of Wei needed to complete this transaction | 20 bytes |
| value | No. of Wei to be transferred to the recipient's address | 32 bytes |
| data | Transaction data | 20 bytes |
| maxPriorityFeePerGas | The maximum fee per gas the sender is willing to pay to the miner, in Wei | 20 bytes |
| maxFeePerGas | The maximum total fee per gas the sender is willing to pay (including the network/base fee and the priority fee), in Wei | 32 bytes |
| accessList | An array list of addresses and storage keys | |
| chainId | The chain ID that this transaction is valid on | 32 bytes |
| v | ECDSAPUBKEY | 32 bytes |
| r | ECDSASIGN | 256 bytes |
| s | ECDSARECOVER | Big Int |
Table 2.4 – Ethereum transaction data structure
A transaction receipt is generated once the blockchain accepts the submitted transaction.
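As a sketch of how these fields look in practice, here is a type-2 (EIP-1559) transaction sent with a recent web3.js version; every value below is a placeholder, and the from account is assumed to be managed by the connected node:

    await web3.eth.sendTransaction({
      from: '0x<sender address>',
      to: '0x<recipient address>',
      value: web3.utils.toWei('1', 'ether'),
      gas: 21000,
      maxPriorityFeePerGas: web3.utils.toWei('2', 'gwei'),
      maxFeePerGas: web3.utils.toWei('50', 'gwei'),
      type: '0x2', // the EIP-1559 transaction type
    });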
Transaction validation and block verification
Ethereum maintains all accounts in the underlying world state, which makes the state transition much easier. When a transaction is submitted to the Ethereum blockchain, miners will perform an intrinsic validity check on the transaction. It will be validated according to the consensus rules and the heuristic limits of the local node, such as price and size. If the transaction size is over 32 KB, it will be rejected to prevent DoS attacks. The transaction needs to be well formed with Recursive Length Prefix (RLP) encoding. Transactions will be checked to ensure that they are properly signed by the sender and have the proper nonce ordering, and to make sure the sender has enough funds to cover the total transaction costs – in other words, the amount being transferred plus the gas cost for the smart contract execution.
Miners add transactions to the transaction pool once they pass the intrinsic validity check. Every 12 seconds, miners take transactions out of the transaction pool and start to propose a new block. They determine the ommer or uncle blocks, and the total gas used in the block. They will create the block structure as defined earlier and start mining. Once a nonce is found that meets the Ethash target, the new block with the newfound nonce is broadcast to the network for all network nodes to verify and add to their local copy of the blockchain.
By now, you have learned how the EVM works and how Ethereum transactions are processed in Eth1. In the next section, we will discuss the challenges in scaling Ethereum and the rationale for transitioning to PoS.
Understanding scaling challenges in Ethereum
Since the creation of Bitcoin, blockchain has enjoyed huge success well beyond transferring money and the maintenance of a distributed ledger. Ethereum was invented to address some of the fundamental computing limitations of Bitcoin with the introduction of smart contracts, which unlock infinite potential in blockchain technology and make broad applications of blockchain technology possible. Originally envisioned as the computer of the world, Ethereum has quickly found itself becoming the decentralized infrastructure powering a decentralized virtual economy and the Metaverse, the digital parallel to the real world that we have been used to for hundreds of years. As of July 2022, less than 10 years after its conception and about 7 years after the mainnet launch, Ethereum had become the most vibrant
blockchain ecosystem. According to https://messari.io/asset/ethereum/metrics/all, Ethereum now hosts about three thousand DApps and processes about 1.19 million transactions per day. It has a $200+ billion market capitalization and settles more than $4.4 billion in transactions daily, about $2-3 trillion per year. It works like a flywheel. The resulting flywheel effect has established Ethereum as the dominant blockchain infrastructure in the last few years.
However, Ethereum's success also highlights a few critical challenges for the future of Ethereum. The Ethereum community seems to be caught in both a virtuous and a vicious cycle:
1. Smart contracts and their composability, that is, the ability to integrate and invoke smart contracts, make it possible to orchestrate microtransactions to support large, complex, and emergent use cases. As a result, more and more sophisticated and innovative DApps have been developed.
2. The transaction throughput of Ethereum cannot meet the high demand from DApps, as more and more DApps are deployed.
3. Higher usage of the Ethereum blockchain drives the scarcity of block space.
4. Scarcity in block space jacks up the market-driven gas fee and makes Ethereum transactions more expensive.
5. High transaction fees and low transaction throughput will drive away network participants and, ultimately, as we will see in the next section, harm network security and the health of the Ethereum blockchain. The meteoric rise of new blockchains in the last few years is a clear example.
With the PoW consensus, high usage of the Ethereum blockchain drives up demand for electricity and energy consumption, which in turn creates Environment, Social, and Governance (ESG)-based issues; this incurs more and more government regulations on Ethereum, crypto, and blockchain in general.
At its heart, the scalability issue of low throughput and the scarcity issue of limited block space are crucial. Like Bitcoin, the main reason for the Ethereum scalability problem is the network protocol requirement that each node in the network has to process each transaction. Eth1 implements a slightly modified version of the PoW consensus mechanism. In Ethereum, miners have to race to find the nonce that meets the target difficulty. Every node needs to verify that the miners' work is valid and keep an accurate copy of the current network state. This greatly limits the transaction processing capability and throughput of the Ethereum blockchain network. Currently, it can only process 12-15 transactions per second.
The blockchain scalability trilemma
First recognized by Vitalik Buterin, the scalability trilemma is a concept in blockchain regarding its capability to address scalability, decentralization, and security, without compromising any of these aspects. The trilemma claims that it is almost impossible to achieve all three properties in a blockchain system:
- Decentralization: This is a core tenet upon which Bitcoin and blockchain were created. Decentralization enables censorship resistance and permits anyone to participate in a decentralized ecosystem without a central authority or intermediary.
- Security: This refers to the integrity and immutability of the public ledger, and the ability to resist 51%, Sybil, or DDoS-like network attacks, as well as protection from double-spending and tampering.
- Scalability: This concerns the ability to handle a growing number of transactions in the blockchain network. In order for the Ethereum blockchain to be the world computer its inventor envisioned, it needs to match the transaction throughput of many centralized systems, such as Amazon, Visa, or Mastercard.
The following diagram is an illustration of the scalability trilemma in blockchain:
Figure 2.16 – Blockchain scalability trilemma
The key challenge of scalability is finding a way to achieve all three at the base layer. The design choices of Bitcoin and Eth1 favor decentralization and security, while sacrificing scalability. Some of the altcoins we discussed in the Altcoins section in Chapter 1, Blockchain and Cryptocurrency, address Bitcoin's scalability issues with a compromise of either decentralization, by introducing some centralized concepts and components, or security, by applying variations to the consensus protocols. Solana, Cardano, and XRP prioritize scalability and security over decentralization; therefore, they can provide much higher transaction throughput per second than Bitcoin and Eth1.
On the contrary, Eth2 takes a more balanced approach to maximize all three properties. It prioritizes decentralization and believes a decentralized and well-diversified network will strengthen network security, while a centralized and concentrated network will have fewer means to protect the network and will therefore weaken network security. Eth2 provides a multi-pronged approach to solving the scalability trilemma – the crux of Eth2 – including moving to PoS, sharding, improving EVM execution, separating execution from consensus, and supporting off-chain execution. Most importantly, it has an economic design to incentivize participation, improve decentralization, and further protect network security. We will discuss Ethereum scaling solutions in the rest of this chapter.
Modular blockchain architecture
As we discussed earlier, blockchains, Bitcoin and Ethereum included, are shared and immutable distributed ledgers. Transactions, recorded as the state transitions of individual accounts, are secured through a consensus mechanism. Bitcoin has limited computing capability in facilitating the state transition of Bitcoin accounts beyond simply moving bitcoins. Ethereum addressed the computation limitations with the smart contract concept, which makes Ethereum a true world computer in the decentralized world.
In the most abstract form, blockchain is a new way of doing distributed computing at a global scale. At the fundamental level, it allows millions of diverse and untrusting participants to coordinate and agree upon what is happening in the decentralized world. Just like a traditional distributed computing platform, it needs computing power, such as a CPU, for execution; it needs storage for data; and it needs a mechanism to achieve consensus among distrusting parties. Earlier generations of blockchains, such as Bitcoin and Eth1, implemented all these separate concerns in their base layer, called layer 1. They were considered monolithic blockchains, since the blockchain data, execution,
and consensus intermingled in the original design, which provided less flexibility in improving any one perspective alone without impacting the stability of the blockchain itself. Earlier bitcoin-variant altcoins simply relaxed certain parameters or replaced certain algorithms without fundamentally changing the blockchain design. Some other altcoins, or newer blockchains, explored new ways to address the issues associated with PoW, started chipping away at the monolith, and plugged in new consensus mechanisms.
Taking a cue from the modular design of computer chips at AMD, the challenger to Intel's integrated central control design, modern blockchain designers started to modularize the blockchain components into layers and find innovative ways to optimize different layers to achieve the goals of different blockchain technologies. In Ethereum, the goal is to achieve the optimal balance between decentralization, security, and scalability. The following are the logical layers in a modern blockchain:
1. The data layer, sometimes called the data availability layer – where the world state resides
2. The execution layer, sometimes also called the execution engine, or in Ethereum, the execution client
3. The consensus layer, otherwise known as the consensus client or consensus engine
The following diagram illustrates these layers in Eth1, although it is implemented as a monolithic blockchain:
Figure 2.17 – Mapping Eth1 components to the modular concept
Modularization is not new in software engineering. In fact, it is an architecture and design practice to decouple large complex systems into multiple independent modules. It is a process to separate concerns and allow the flexibility to address those concerns differently. The resulting system will be easy to understand and maintain. The independent subsystems or modules can be reused and extended, and new implementations of any individual subsystem can easily be plugged in without too many impacts on the large complex system as a whole.
Ethereum scaling solutions
Ethereum scalability solutions are one of the most active topics in the Ethereum community. The following are a few areas of concern the community is trying to tackle:
Transaction processing and block creation time with PoW – How fast can miners process all transactions and create a new block through mining?
Transaction finality – How soon can the decentralized network reach a consensus that a transaction has happened and can't be reverted? Currently, it takes about six blocks in Bitcoin and 3-4 minutes in Ethereum for the network to consider a block finalized in the main chain. You should check out Vitalik's blog to read more about transaction settlement and block finality probability (https://blog.ethereum.org/2016/05/09/on-settlement-finality/).
Block space scarcity – How affordable can the transaction cost be as more and more transactions are posted and processed on the Ethereum blockchain? As of June 2022, the average Ethereum gas price is 47.63 Gwei, down from a high of over 470 Gwei on May 1, 2022.
Ethereum adopts an incremental approach to addressing scalability issues on all three layers. Solutions being implemented or proposed fall into three categories: on-chain or Layer 1 solutions, off-chain or Layer 2 solutions, and side-chain solutions. There are some obvious or theoretical ones, such as increasing the block size or slicing one blockchain into many independent altcoin chains. Due to the P2P nature of blockchains, a traditional horizontal scaling approach may not work.
Layer 1 solutions
On-chain solutions, sometimes also called layer 1 solutions, address scalability and performance issues at the base layer of the Ethereum blockchain network. One such solution is the beacon chain, implemented as part of Eth2, for transitioning from a PoW consensus to a PoS consensus. With a modular blockchain design, it becomes a standalone consensus layer that orchestrates transaction processing on the network and enforces network security and integrity. We will discuss how the beacon chain works and how PoS works in Ethereum in the next section.
Another layer 1 solution is sharding. Sharding is not a new concept; traditional RDBMSes and newer big data platforms have used sharding to improve scalability and performance for many years. In the original Eth2 roadmap, Ethereum planned to implement both computation and data sharding. Given the complexity of getting both computation and data sharding right, and the skyrocketing demand for the blockchain, Ethereum is taking a more practical approach and focusing on data sharding first in Eth2. Computation sharding may or may not be included in the future roadmap. The modular blockchain concept makes it easier to explain data sharding in Ethereum. By separating the data layer from the monolithic blockchain, data sharding provides a mechanism to divide the blockchain data into multiple shards and shard chains while still maintaining a link back to the beacon chain. Data integrity and blockchain data management will be coordinated through orchestration at the beacon chain. All data on different shards will be made available for transaction processing. Ethereum intends to implement data sharding soon after the merge of Eth1 and Eth2. For now, we can think of the beacon chain as the consensus layer, and the current world state in Eth1 as the data availability layer and a single shard. The execution layer, the execution environment for smart contracts and computation, stays in the base layer of the Ethereum blockchain. We will discuss Ethereum's planned implementation of data sharding in more detail. Beyond data sharding, there are protocol and EVM improvements planned in the Ethereum roadmap too. We will discuss them in Chapter 5, Deep Research and Latest Developments in Ethereum.
Layer 2 solutions
As we discussed earlier, the modular blockchain design makes it easier to swap and plug in any of the three layers and makes Ethereum more scalable. Currently, Ethereum's plan for computation and execution is to optimize the underlying protocols and improve the execution efficiency of the EVM, which was slated for later in the roadmap due to the pivot to the PoS implementation and data sharding. Therefore, a lot of off-chain
solutions have been developed in the Ethereum community to take execution off the base layer of the blockchain while allowing Ethereum to maintain security and transaction integrity. Notable solutions include the following:
Rollups – A rollup is a mechanism to take executions and computations off the Ethereum base layer. The closest analogy is batching Ethereum transactions instead of processing each transaction on Ethereum one by one. It improves transaction processing throughput and drastically lowers the transaction cost. There are two flavors of rollups: optimistic rollups and ZK rollups. The concepts are quite similar, but the approaches to transaction validity and the consequences for transaction finality are quite different. We will discuss how rollups work in the last section of this chapter.
State channels – State channels are another technique for performing transactions and other state transitions in a second layer built on top of a blockchain. By moving many processes off-chain, the blockchain can be more efficient and scalable while still retaining transaction integrity and network security. We will discuss state channels in Chapter 5, Deep Research and Latest Developments in Ethereum.
Sidechains – A sidechain is a separate blockchain that runs independently from the Ethereum blockchain while maintaining periodic checkpoints with the main Ethereum blockchain. The concept was first proposed in August 2017 by Joseph Poon and Vitalik Buterin. The design idea is to offload transactions to many faster and less crowded side chains, also called Plasma chains. Polygon is the first popular EVM-compatible sidechain implementation that brings scalability to the Ethereum platform. We will discuss Polygon, together with other EVM chains, in Chapter 4, EVM-Compatible Blockchain Networks.
With layer 2 solutions, transactions are processed outside of the base layer of the Ethereum blockchain, or by other blockchains. This certainly creates some new challenges and solutions too. In the case of state channels, you need multiparty agreements for such operations. In the case of sidechains, you need a bridge to link the sidechains to the main Ethereum blockchain. And in all solutions, a settlement step or layer is required as an anchor to
record and register transactions or transaction proofs on the main Ethereum blockchain.
Introducing Beacon Chains and Eth2
In this section, we will discuss how the beacon chain works in Eth2 and the key components of the beacon chain client.
PoS in Eth2
You may remember that we analyzed both PoW and PoS consensus algorithms in depth in the Anatomizing blockchain consensus mechanisms section of Chapter 1, Blockchain and Cryptocurrency. We discussed the advantages and disadvantages of both algorithms. The main issues with PoW are that it is energy-intensive, has a high cost of mining, and is less scalable. Eth2, originally known as Serenity, is planned to be the system's final upgrade, as the entire network transitions from a PoW to a PoS consensus algorithm and tackles fundamental questions such as scalability, economic finality, and security. It plans to fade out the PoW chain over time via the merge between Eth1 and Eth2. In its simplest form, with PoW, miners invest in GPUs and ASICs and consume large amounts of electricity as the expense of performing the duty of securing the network. In return, their investments are rewarded, and they earn bitcoin or other cryptocurrencies. In the case of PoS, validators, who play a role similar to that of miners in PoW, allocate certain assets as stakes to perform the duty of securing the network and are rewarded with crypto coins for their services. Their investment is a certain amount of ether or crypto coins; their expense is very low, since they don't need to do any complex computations. Validators are randomly selected to propose new blocks, and all nodes can validate the proposed blocks. Upon validation, the new blocks will be added to the blockchain. The validator proposing a valid
block will be rewarded with incentives. In this way, the blockchain continues to operate. One theoretical issue with the PoS consensus is the nothing-at-stake problem. The doubt about PoS is that the stake alone won't deter bad behavior. For example, in the case of a potential temporary fork, validators could build on both sides of the fork and keep the temporary fork alive forever, since they will collect transaction fees no matter which side wins. This also makes finality impossible. To avoid this nothing-at-stake issue, Ethereum implemented the Casper Friendly Finality Gadget (FFG) PoS consensus mechanism. It uses two rounds of voting: one round of votes for the head of the beacon chain, and one round of votes for two checkpoints in the current and prior epochs, to ensure transaction validity and finality and therefore safeguard the chain. PoW uses difficulty to control the speed at which new blocks are proposed on the blockchain. The Eth2 PoS implementation uses a shorter fixed time interval, called a slot, to control the creation of new blocks. In each slot, one selected validator will propose a new block, and a committee of at least 128 validators will attest to the new block. All validators will have the opportunity to attest to the head of the chain. It uses a longer fixed time interval, called an epoch, as the checkpoint interval for voting on the finality of transactions that occurred in the prior epoch. Dishonest validators will be penalized, and their stakes will be slashed. To encourage participation in the network, inactive or idle participants are discouraged too, through a smaller penalty for inactivity.
How the beacon chain works
To see how Ethereum implements PoS with a beacon chain, let us start with some key concepts and building blocks in Eth2:
Slot and epoch – Eth2 divides time linearly into slots and epochs. Starting from time 0, every 12 seconds becomes a slot, and every 32 slots, or every 6.4 minutes, becomes an epoch.
A validator – A validator is any node that stakes 32 ETH and is willing to participate in the network and perform the duties assigned by the beacon chain. A minimum of 16,384 validators is required to start Eth2. At this moment, per beaconcha.in, there are over 400,000 validators on the Eth2 network, with a total of about 13 million staked ether.
The validator registry – This is the pool managed by the beacon chain to maintain the state of all validators.
Staking – This is the action of depositing 32 ETH with the intention of becoming a validator.
Slashing – This is the action of reducing a validator's ether balance as a penalty for dishonest network behavior or inactive participation in assigned duties.
Attestation – An attestation is a validator's vote on a block in Eth2.
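The following toy Python sketch ties these concepts together; the field names are simplified and hypothetical, not the actual Eth2 spec types:

```python
from dataclasses import dataclass

GWEI_PER_ETH = 10**9
MIN_STAKE_GWEI = 32 * GWEI_PER_ETH  # 32 ETH, the activation threshold

@dataclass
class Validator:
    pubkey: str
    balance_gwei: int          # stake balance, in Gwei
    status: str = "pending"    # pending -> active -> exited

# The validator registry is the pool of all validators the beacon
# chain manages (simplified here as a list).
registry: list[Validator] = []

def stake(pubkey: str, amount_eth: int) -> None:
    """Deposit ETH; the validator joins the registry once >= 32 ETH."""
    v = Validator(pubkey, amount_eth * GWEI_PER_ETH)
    if v.balance_gwei >= MIN_STAKE_GWEI:
        registry.append(v)

def slash(v: Validator, penalty_gwei: int) -> None:
    """Reduce a dishonest validator's balance as a penalty."""
    v.balance_gwei = max(0, v.balance_gwei - penalty_gwei)
```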
Slots, epochs, and checkpoints
In Eth2, one slot is exactly 12 seconds, and one epoch is exactly 32 slots. The first slot is called the genesis slot, and the first epoch is called the genesis epoch. The following diagram shows how slots and epochs work in Eth2:
Figure 2.18 – Slots and epochs in Eth2
One randomly pre-selected block proposer and one validator committee are assigned to each slot. Validators are randomly shuffled epoch by epoch, and their roles and duties are defined at the epoch level. The first block in an epoch is the checkpoint for the current epoch. If no block is created in the first slot of an epoch, the most recent preceding block from an earlier slot serves as the checkpoint for the current epoch.
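Because slots and epochs are fixed time intervals, mapping wall-clock time to slots, epochs, and checkpoint slots is simple integer arithmetic, as this sketch shows (constants from the text; function names are ours):

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32   # one epoch = 32 * 12 s = 6.4 minutes

def slot_at(seconds_since_genesis: int) -> int:
    return seconds_since_genesis // SECONDS_PER_SLOT

def epoch_of(slot: int) -> int:
    return slot // SLOTS_PER_EPOCH

def checkpoint_slot(epoch: int) -> int:
    """First slot of an epoch; the block there (or the latest earlier
    block, if that slot is empty) is the epoch's checkpoint."""
    return epoch * SLOTS_PER_EPOCH

# One hour into the chain: slot 300, epoch 9, checkpoint at slot 288.
assert slot_at(3600) == 300
assert epoch_of(300) == 9
assert checkpoint_slot(9) == 288
```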
Validators and the validator registry
In Eth2, validators participate in the PoS consensus of the Ethereum protocol. A validator is a virtual entity on the beacon chain whose role is to perform transaction validation and block creation duties. To become a validator, you need to stake at least 32 ETH as a financial commitment. In addition, you also need to commit to staying online so that you can perform the duties assigned to you. In return, you will be rewarded for contributing to the security and smooth operation of the Eth2 blockchain. The majority of the beacon chain's responsibilities are maintaining the validator registry and coordinating validators to perform their duties.
Staking and the life cycle of a validator
In Eth2, staking is the process of depositing a total of at least 32 ETH, committed for an unspecified period of time, in order to become a validator on the Eth2 network. The staker is the one who provides the funds for staking. A staker deposits their ETH to a deposit contract on Eth1; the minimum threshold for a single deposit is 1 ETH. The deposit contract is a smart contract and the primary mechanism for locking up ETH on Eth1 and transferring funds from an Eth1 account to an Eth2 validator. The committed funds on Eth1 will get burned. A deposit records the following properties:
The staker's Eth1 address
The validator's public key on the beacon chain
The amount of the stake (a minimum of 32 ETH)
The public and private keys for withdrawing the funds on the beacon chain
The entire deposit process may take about 12 hours. That is because the beacon chain only considers transactions that have been in the deposit contract for at least 2,048 Eth1 blocks, plus a 32-epoch Eth2 waiting period. Once deposited, the validator will be in a pending state. Depending on the total amount deposited, it will be put into the validator queue on a first-in, first-out (FIFO) basis and wait to be activated. During each epoch, the beacon chain only activates 6 validators from the validator queue. The following diagram depicts the validator's life cycle. All active validators on the beacon chain will be assigned blockchain maintenance duties. They can be rewarded, or their funds can be slashed, based on the actions they perform. As long as they are active, they will be
pulled and assigned duties for the healthy progression of the blockchain. If their funds get slashed and fall below the threshold, or if they voluntarily exit the network, they will be pulled out of the pool, and their remaining funds can be withdrawn once the lock period is over:
Figure 2.19 – Life cycle of a PoS validator
A staker can make multiple small deposits to the deposit contract; only once the total amount reaches at least 32 ETH is it eligible to back a validator. A staker may also deposit more than 32 ETH as the stake. Since only 32 ETH is needed per validator, it is beneficial for such a staker to create multiple validators on the beacon chain to increase the passive return on their stake.
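The roughly 12-hour deposit delay mentioned earlier follows directly from the two waiting periods. Here is a quick back-of-the-envelope check, assuming an average Eth1 block time of about 14 seconds (an assumption, as block times vary):

```python
ETH1_BLOCK_TIME_S = 14          # assumed average Eth1 block time
FOLLOW_DISTANCE_BLOCKS = 2048   # deposit must be this deep on Eth1
EPOCH_SECONDS = 32 * 12         # one Eth2 epoch = 6.4 minutes
EPOCH_WAIT = 32                 # additional Eth2 epoch waiting period

eth1_wait_h = FOLLOW_DISTANCE_BLOCKS * ETH1_BLOCK_TIME_S / 3600
eth2_wait_h = EPOCH_WAIT * EPOCH_SECONDS / 3600
print(f"Eth1 follow distance: {eth1_wait_h:.1f} h")                # ~8.0 h
print(f"Eth2 epoch wait:      {eth2_wait_h:.1f} h")                # ~3.4 h
print(f"Total:                {eth1_wait_h + eth2_wait_h:.1f} h")  # ~11.4 h
```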
The following screenshot, taken from https://beaconcha.in, shows the current epoch, the current slots, the total number of active and pending validators, and the total staked ether as of July 9, 2022:
Figure 2.20 – Number of validators and amount of staked ether
As you can see, about 13 million ether is staked for participating in PoS. The total value staked on the beacon chain reaches about $15.8 billion.
The validator committee
The beacon node maintains the validator registry and keeps track of each validator's status based on the validator's life cycle. At the beginning of an epoch, all active validators in the registry are randomly assigned the roles they will play during the next epoch. The pool of validators in the registry is evenly divided into multiple committees; each committee has at least 128 validators. This ensures at least one committee for each slot. In the future, it will also ensure at least one committee for each shard. The validator's role within the committee is to provide attestations; they are also called attesters. Out of each committee, one or more validators may be selected as attestation aggregators, whose role is to aggregate all attestations within the committee and re-broadcast them to the network.
One validator is randomly selected as the block proposer for each slot. In this way, every slot has at least one pre-defined, randomly selected committee to work with. All nodes, including the designated block proposer and the 128 block attesters in a committee, form a subset. This selection is done through a RANDAO process. To be more precise, the RANDAO process is executed one epoch ahead so that all validators have time to prepare for their roles and duties. When an epoch starts, all roles, including the block proposers and the validators in each of the 32 committees, as well as the selected committee aggregators, are already assigned to perform their duties. The following figure depicts how the committees and block proposers are formed:
Figure 2.21 – Random shuffling and validator committee selection
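Conceptually, the per-epoch duty assignment is a seeded random shuffle followed by an even split into per-slot committees. The following simplified sketch uses Python's random module as a stand-in for the actual RANDAO-derived shuffling, and (unlike the real protocol) simply picks each slot's proposer from within its committee:

```python
import random

SLOTS_PER_EPOCH = 32

def assign_duties(validators: list, randao_seed: int) -> dict:
    """Shuffle active validators with the epoch's seed, then assign
    one committee and one block proposer to each slot."""
    rng = random.Random(randao_seed)
    shuffled = validators[:]
    rng.shuffle(shuffled)
    committee_size = len(shuffled) // SLOTS_PER_EPOCH  # >= 128 in the real protocol
    duties = {}
    for slot in range(SLOTS_PER_EPOCH):
        committee = shuffled[slot * committee_size:(slot + 1) * committee_size]
        duties[slot] = {
            "proposer": rng.choice(committee),   # one proposer per slot
            "attesters": committee,              # committee attests the block
        }
    return duties

# 4,096 validators split evenly: 32 committees of 128 each.
epoch_duties = assign_duties([f"v{i}" for i in range(4096)], randao_seed=42)
```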
Now that you have learned how time is divided into epochs and slots, and how the validator committees are selected for each slot, in the next subsection, we will show you how the beacon chain progresses.
The beacon chain
The beacon chain is the consensus-layer implementation of the PoS mechanism in Eth2. It plays the coordination role, and its responsibility is to assign roles to each validator, ensure all validators play their roles, and reward validators for keeping the network safe. To do so, it maintains each validator's status, manages each validator's stake balance, and keeps audit trails of validators' actions. Just like Eth1, it maintains a world state and the transaction records related to state transitions. In Eth2, the world state of validators, called the beacon state, as well as the beacon state transitions, including attestations and validator life cycle changes, are recorded as part of the beacon block. The result, as time progresses, is a separate blockchain in itself: a blockchain of beacon blocks.
The beacon block
The following is the beacon block and beacon block body data structure as defined by the Eth2 spec:
Figure 2.22 – Beacon block data structure
The slot is the slot in which the current block is proposed. The things recorded in the beacon block include the following:
Attestations – All attestations collected from the last slot
Slashings – All proposer or attester penalties due to wrong attestations or inactive participation
Deposits – All staking amounts and incentives for performing duties
Eth1_data – An Eth1 data vote for sending ether to the deposit contract for staking
Voluntary_Exits – The validator's signature for voluntary exits from participation in beacon chain activity
Reserved for the future merge is the execution_payload field, which links the normal Eth1 transaction data to the beacon block. We will discuss the merge in the next section.
The beacon state
The following is the beacon state data structure as defined by the Eth2 spec, which is the result of the beacon state transition:
Figure 2.23 – Beacon state data structure
As you can see, maintained as part of the world state, the beacon state includes the following information:
Historic data pointing to the latest block
Eth1 data for staked deposits
The validator registry and its balances
Checkpoints and justification for finality
Participation and inactivity
As with the BeaconBlockBody data structure, the beacon state also reserves a latest_execution_payload_header field for the merge of Eth1 with the beacon chain, which will provide a link between Eth1 data and the beacon state.
Beacon block creation
As we discussed in the Validators and the validator registry section, at the beginning of each epoch, the beacon chain performs a RANDAO function to randomly divide the network into multiple committees, assign a committee to each slot, and ensure each slot has at least one committee of 128 validators. At the beginning of each slot, all beacon transactions, including block proposals, attestations, validator status changes, incentive activities, and so on, submitted during the last slot are collected by the designated validator acting as the block proposer, which performs the duty of sequencing the transactions, verifying their validity, and computing the state transition as part of the block proposal. All validators in the designated committee validate the new block proposed by the block proposer and provide their attestations for the new block. Selected committee aggregators aggregate all attestation messages and new blocks from the committee and re-broadcast them to the entire network so that the block proposer and committee for the next slot can see and process them into the next block. Eth2 employs Latest Message Driven Greedy Heaviest-Observed SubTree (LMD GHOST) as the fork choice rule for settling the head of the beacon chain. If a fork occurs during block creation, the block proposer uses the fork choice rule to determine which side of the fork to add the new block to. The attesters in the same committee use the same rule to attest to the validity of the new block. All validators need to use the same fork choice rule to attest to the head of the beacon chain. The rule itself starts with
the beginning of the fork, considers which validators attested to the blocks on each side of the fork, and recursively calculates the total balance of the attesting validators' stakes. The heaviest branch wins. Once a new block is proposed, all validators cast an LMD GHOST vote and a Casper FFG vote as part of the two rounds of voting. The LMD GHOST vote is a vote on the head of the beacon chain, using the LMD GHOST fork choice rule. The Casper FFG vote is a vote for the target checkpoint in the current epoch and the source checkpoint from the last epoch. Any checkpoint that receives 2/3 of the votes from all validators is considered justified. In each epoch, there is one block that serves as the checkpoint of the chain, which is considered the last point at which blockchain finality is agreed upon among all validators.
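The "heaviest branch wins" rule can be sketched as a greedy walk from the fork point, at each step descending into the child whose subtree carries the most attested stake. This is a toy model; real clients track each validator's latest message rather than raw per-block totals:

```python
def heaviest_branch(fork_point, children, stake_votes):
    """Greedy walk: from the fork point, repeatedly descend into the
    child subtree with the largest total attested stake balance.
    children: block -> list of child blocks
    stake_votes: block -> total stake of validators attesting to it."""
    def subtree_weight(b):
        return stake_votes.get(b, 0) + sum(
            subtree_weight(c) for c in children.get(b, []))
    head = fork_point
    while children.get(head):
        head = max(children[head], key=subtree_weight)
    return head

# Fork at block "A": the branch via "B" carries 96 units of attested
# stake (64 + 32), the branch via "C" only 64, so "B2" becomes the head.
children = {"A": ["B", "C"], "B": ["B2"]}
votes = {"B": 64, "B2": 32, "C": 64}
assert heaviest_branch("A", children, votes) == "B2"
```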
The beacon node's role
Similar to Eth1, Eth2 is supported by the beacon chain client node software. The following screenshot shows the high-level architecture components of the beacon chain clients. Their role is to serve as the consensus layer and execute the PoS consensus mechanism to maintain the beacon blocks of the beacon chain and the world state of the beacon state:
Figure 2.24 – Illustration of a beacon chain client
Depending on the role each node plays in each slot, a beacon node may act as a block proposer, a block attester, or an attestation aggregator. The beacon nodes form the decentralized P2P network of the beacon chain, whose role is to maintain, secure, and progress the beacon chain. In summary, the following are the beacon node's roles in Eth2:
Watching for stake deposits on the Eth1 chain for new validators
Maintaining the validator status and state on the beacon chain and adding and removing validators from the registry Processing voluntary withdrawal (in the future) Maintaining a synchronized clock with other beacon nodes Randomly forming block proposers and committees for slots per epoch Serving as an RPC server/endpoint for validator clients to leverage to propose/attest beacon blocks Processing block attestations from validators/committees Progressing the beacon chain As a validator, they will play their assigned roles, including the following: Computing beacon state transitions and creating the new block (as a block proposer) Attesting on the new block, the head of the chain, and source and target checkpoints (all validators) Broadcasting attestation messages (as an attestation aggregator)
Transaction finality
All validators are required to vote on the source and target checkpoints, as well as the block head. If any checkpoint gets 2/3 of the votes, the checkpoint is considered justified. A block at a checkpoint is considered finalized, or in a finality state, once its next checkpoint is justified too. All blocks up to the latest finalized checkpoint block are considered final. Since each epoch potentially has one checkpoint, it takes at least 2 epochs, or 12.8 minutes, to finalize transactions.
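In code form, the two-step justification-then-finalization rule looks roughly like the following. This is a simplification; the actual spec also handles skipped epochs and alternative finalization paths:

```python
def is_justified(votes_for_checkpoint: int, total_stake: int) -> bool:
    """A checkpoint is justified once it receives >= 2/3 of all stake."""
    return 3 * votes_for_checkpoint >= 2 * total_stake

def is_finalized(checkpoint_justified: bool,
                 next_checkpoint_justified: bool) -> bool:
    """A justified checkpoint is finalized when the checkpoint of the
    following epoch is justified too, hence the minimum of about
    2 epochs (12.8 minutes) before transactions are final."""
    return checkpoint_justified and next_checkpoint_justified
```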
Incentives and penalties
Unlike PoW, where miners are rewarded with block creation and there are no disincentives to penalize bad behaviors, Ethereum PoS in the beacon chain implementation leverages both incentives and disincentives to reward good behavior and penalize bad and negligent actions. The following are the incentives with which the network participants will be rewarded: Consensus rewards, including incentives for attestations on the source, target, block head, sync committee, block proposal, as well as whistleblowing. Execution rewards, which allow the block proposer to collect the transaction fee. MEV rewards, which give the block proposer a chance to selectively process transactions in the mempool and maximize the extractable value (MEV) of transaction processing. We will discuss MEV in Chapter 5, Deep Research and Latest Developments in Ethereum. The Ethereum beacon chain requires all validators to perform their duties as designated in a timely fashion, which includes the commitment to active participation online. Validators can be penalized for the following: Slashing, which is a severe penalty for dishonest validators. In this case, a significant portion of their staked ether will be reduced and burnt. This happens when a block proposer signs two different beacon blocks for the same slot, or when an attester signs two different attestations with the same target. An inactivity penalty is a small penalty for negligence of their duty, such as being inactive or idle online and unable to perform their designated duties. An attestation penalty is a small penalty for being unable to provide one or more attestations for the source, the target, and the block head. A sync committee penalty is a small penalty for being unable to provide sync committee attestations.
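A hedged sketch of how these incentives and disincentives net out against a validator's balance each epoch follows. The reward and penalty magnitudes here are placeholders; the protocol derives the actual amounts from base rewards and participation rates:

```python
# Illustrative magnitudes only -- the protocol computes the real
# reward/penalty amounts from base rewards and participation rates.
REWARDS = {"source": 1, "target": 1, "head": 1, "sync": 1, "proposal": 4}
PENALTIES = {"inactivity": -1, "missed_attestation": -1, "sync_miss": -1}
SLASH_FRACTION = 0.05   # severe: a large cut of the staked balance

def settle_epoch(balance_gwei: int, duties_done: list[str],
                 duties_missed: list[str], slashed: bool) -> int:
    """Apply one epoch's incentives and disincentives to a balance."""
    delta = sum(REWARDS.get(d, 0) for d in duties_done)
    delta += sum(PENALTIES.get(d, 0) for d in duties_missed)
    if slashed:                                  # dishonest behavior
        delta -= int(balance_gwei * SLASH_FRACTION)
    return balance_gwei + delta
```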
Benefits from the transition to PoS
The transition to PoS will immediately bring the following benefits to the Ethereum ecosystem:
Security – PoS will continue to improve decentralization. As of now, there are about 400,000 validators and about 13 million ether staked. The sheer number of validators and the amount of money at stake make the network more secure. With ETH priced around $1,200, it would require over 10 billion dollars to compromise 2/3 of all staked validators to create and attest invalid transactions, and about 5 billion dollars to compromise 1/3 of validators to indefinitely delay finality and render the network useless (see the quick check below).
Scalability – Transitioning to PoS will improve Ethereum's transaction throughput, since validators don't need to compete to mine new blocks. Combined with rollups, throughput is expected to increase from 12-15 transactions per second (TPS) prior to the merge to thousands of TPS.
Sustainability – One immediate effect is a drastic drop in the energy consumption incurred by mining. It is expected to reduce energy usage by an estimated 99.95%, which will improve the sustainability of our environment.
Capital efficiency – One side effect of transitioning to PoS is the reduction in newly minted ETH, which will reduce the growth of the total ether supply. Since miners no longer rely on consuming tons of electricity for the ether rewards of new blocks, new ether issuance is expected to drop to around 0.3-0.4%, which in turn may improve capital efficiency. Unfortunately, without data sharding or L2 rollups, this alone may not reduce transaction costs.
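As a quick sanity check of the security figures quoted above:

```python
staked_eth = 13_000_000       # ~13M ETH staked (mid-2022)
eth_price = 1_200             # ~USD per ETH at the time

total_usd = staked_eth * eth_price
print(f"Total staked value:      ${total_usd/1e9:.1f}B")      # $15.6B
print(f"2/3 attack (finality):   ${total_usd*2/3/1e9:.1f}B")  # $10.4B
print(f"1/3 attack (liveness):   ${total_usd/3/1e9:.1f}B")    # $5.2B
```

In the next section, we will discuss the merge of Eth1 and Eth2.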
Merging Eth1 and Eth2
Ethereum's transition from the PoW consensus mechanism to the PoS consensus mechanism has been a long journey. It started with the beacon chain as a separate blockchain that operated under PoS, with the purpose of testing PoS. Although it had been live since December 2020, it didn't process any Ethereum transactions until the merge in September 2022. The merge of the Eth1 chain and the Eth2 beacon chain facilitates the transition to PoS and creates a unified chain going forward. We will refer to the merged Eth1 and Eth2 as Ethereum going forward.
Merging Eth1 data into Eth2
The following is a pictorial view of how the unified chain looks after the merge. The merge is controlled by the difficulty bomb, which has been ingrained in the Eth1 protocol since 2015. As we discussed in Chapter 1, Blockchain and Cryptocurrency, difficulty is a mechanism to control the speed at which new blocks are mined in a PoW-based blockchain. In Eth1, at a predefined block number, the difficulty bomb raises the difficulty to a significantly harder level, which results in much longer block mining times, disincentivizes miners from continuing to stay on Eth1, and forces network participants to move to the PoS consensus mechanism:
Figure 2.25 – Merging Eth1 blocks into the beacon block
As you can see, before the merge, both the Eth1 transaction chain and the Eth2 beacon chain run in parallel on their own courses. Although, in the beacon chain data structure, there is a placeholder to link Eth1 data to the
beacon chain, it was actually blank prior to the merge. Once the difficulty bomb is triggered, the network will start to package transaction blocks within the beacon blocks so that the merged chain continues to progress with beacon blocks, and the transaction blocks within the beacon blocks point to the previous transaction blocks. The first beacon block after the merge will have its transaction block pointing to the last block on the Eth1 chain. The following class diagram shows how Eth2 packages Ethereum transactions as the execution payload block within the beacon block body:
Figure 2.26 – The beacon block and execution payload
Notably, since difficulty is no longer used in PoS, it is replaced with randao, which is randomly generated every epoch. Similarly, the state root is linked to the beacon state, and the beacon state carries the execution payload header:
Figure 2.27 – The beacon state and execution payload header
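The relationship between the two structures can be summarized with a few Python dataclasses; the field names are abbreviated from the spec figures above, and many fields are omitted:

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionPayload:
    """Post-merge home of Eth1-style transaction data."""
    parent_hash: str
    state_root: str
    prev_randao: str                      # replaces PoW difficulty
    transactions: list = field(default_factory=list)

@dataclass
class BeaconBlockBody:
    attestations: list
    deposits: list
    voluntary_exits: list
    execution_payload: ExecutionPayload   # blank before the merge

@dataclass
class BeaconState:
    validators: list
    balances: list
    latest_execution_payload_header: str  # payload mirrored on the state side
```

Let us take a look at the Ethereum client architecture after the merge in the next subsection.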
The Ethereum client architecture after the merge
With the merge, Ethereum transaction blocks and the beacon state will be unified as one single blockchain. However, the clients orchestrating the PoS consensus mechanism and the clients executing Ethereum transactions
continue to be separate. Together, they will process Ethereum transactions, progress the head of the chain, and secure the Ethereum blockchain going forward. The following is a logical representation of the Ethereum client in the post-merge world. The consensus layer of the Ethereum client is the software for running the beacon node. All beacon nodes form a P2P PoS network. As we discussed earlier, it manages and maintains the validator registry, orchestrates and directs the creation of new blocks, incentivizes good network behavior, and disincentivizes bad or inactive network behavior. The following are considered the leading consensus clients available:
Prysm (https://prysmaticlabs.com)
Lighthouse (https://github.com/sigp/lighthouse)
Teku (https://github.com/ConsenSys/teku)
Nimbus (https://github.com/status-im/nimbus-eth2)
Depending on your role in Ethereum, you may download the full node and act as a beacon node. If you stake at least 32 ETH, you may just download a validator client and play the role of a validator. Please refer to the consensus client documentation listed for detailed instructions. The following screenshot shows what the merged Ethereum client looks like. The merged client is made up of the PoS consensus client, also called the beacon chain client, and the execution client. The execution client is a modified version of the original Eth1 execution client:
Figure 2.28 – The beacon chain client and execution client after the merge
An execution-layer client is the traditional Eth1 client with PoW mining removed; once merged, PoW mining is essentially useless. The execution-layer client, in the post-merge world, becomes a worker node that receives assignments from the beacon node to perform its duties as a block proposer, attester, or attestation aggregator. The beacon node communicates with the designated block proposer via the RPC protocol. Upon instruction from the beacon node, the block proposer collects all transactions from the mempool, sequences them, executes smart contracts within the EVM, and creates and proposes new blocks per the protocol rules.
The execution client nodes within the same randomly created committee form a subnet. Each node broadcasts its attestations to other client nodes in the same subnet using a gossip protocol. Selected attestation aggregators, using the execution client node software, further broadcast all attestations to the broader validator P2P network so that all validators hear all attestations and are ready to vote. The following are the four leading execution clients available in the post-merge world:
Geth (https://geth.ethereum.org)
Erigon (https://github.com/ledgerwatch/erigon)
Nethermind (https://docs.nethermind.io/nethermind/)
Besu (https://besu.hyperledger.org/en/stable/)
Ethereum promotes client diversity. In theory, you can mix and match consensus clients with execution clients, which means there are 16 or more combinations of Ethereum clients available for maintaining and securing the unified Ethereum blockchain.
Revisiting the modular blockchain architecture
By now, you have learned about Ethereum's approach to building a modular blockchain architecture. PoS and the merge decouple the consensus layer and the execution layer. The execution layer relies on data availability to create new blocks or attest to existing blocks. As depicted in the following figure, the modular blockchain design has made it possible to use any combination of consensus clients and execution clients to support the network, as we discussed earlier. In addition, it enables innovation in swapping and switching any layer, or any component within any layer, without compromising the entire design of the blockchain, as long as the blockchain protocol rules are followed. For example, a new rollup-based execution client, or a privacy-preserving EVM, can be plugged in to improve
the privacy of transaction execution. Similarly, data can be provided by a third-party data provider too; one example is the Validium design, which relies on off-chain data availability to provide validity proofs for ZK rollups. With a modular design, some key roles can be centralized without compromising the decentralized nature of the blockchain. For example, a block proposer or creator role can be split into a block builder role and a separate block proposer role. As more transaction traffic gets onto the Ethereum blockchain, the computation of state transitions becomes more complicated and requires more sophisticated computing capability. Therefore, a centralized subset of block builders could benefit the network instead of weakening network security. This is powerful when combining MEV with a separate entity for block building. We will touch on this topic in Chapter 5, Deep Research and Latest Developments in Ethereum. The following screenshot is an illustration of the modular blockchain architecture in Ethereum post-merge:
Figure 2.29 – Modular blockchain architecture in Ethereum post-merge
In Ethereum, data is another layer to be addressed after the merge, notably through data sharding. It will drastically increase the block space and make data available for execution. Without data sharding, block space is very limited, which in turn constrains the transaction throughput at the execution layer and continues to drive up the transaction cost. We will discuss data sharding in Chapter 5, Deep Research and Latest Developments in Ethereum, too.
Scaling Ethereum with rollups
Due to the popularity of the Ethereum blockchain, the Ethereum network is constantly congested. With limited space on the base layer of the Ethereum blockchain, transactions are very expensive, as everyone competes for scarce block space. Think of the Ethereum block space as Manhattan, and the Ethereum network as the I-95 highway between Washington DC and New York City, and you will be able to grasp what I am referring to. If everyone drives a single car to Manhattan from Washington DC on I-95, you will see the traffic backed up long before you are close to the city. Instead of driving solo, you could take a bus, train, or flight to get to Manhattan from Washington DC. The only limitation is that you may have to adjust your routine based on the public transportation schedule. This is how L2 rollups work. Instead of sending individual transactions to the Ethereum network and processing them one by one in real time or close to real time, L2 rollups allow transactions to be batched together and sent to the Ethereum network for batch processing at a fixed time interval. In this way, the transaction cost is much lower on average. Depending on the techniques each rollup employs, the actual block space needed for all transactions in a batch is much smaller. In general, there are two types of L2 rollups: optimistic rollups and ZK rollups. The commonality between the two approaches is outsourcing execution to L2 and keeping minimal computation on L1, while still relying on L1 for security and data. The key difference between them is how they implement the proofs. Optimistic rollups assume the transactions are valid and provide fraud proofs for disputes so that a verifier can present evidence that a state transition is invalid. ZK rollups implement validity proofs, where the block proposers are required to present evidence that the state transition is valid. Each type has quite a few actual implementations already on the market. We will discuss how optimistic rollups and ZK rollups work in the rest of this section.
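Before diving into each type, the cost saving from batching is easy to see with rough numbers. The gas figures below are illustrative assumptions, not measured values:

```python
L1_BATCH_OVERHEAD_GAS = 200_000   # assumed fixed cost to post one batch
CALLDATA_GAS_PER_TX = 2_000       # assumed compressed per-tx calldata cost
PLAIN_L1_TRANSFER_GAS = 21_000    # standard cost of a simple L1 transfer

def l1_gas_per_rollup_tx(batch_size: int) -> float:
    """The fixed batch overhead is amortized across every transaction."""
    return L1_BATCH_OVERHEAD_GAS / batch_size + CALLDATA_GAS_PER_TX

for n in (10, 100, 1000):
    print(f"batch of {n:4}: {l1_gas_per_rollup_tx(n):8.0f} gas per tx "
          f"(vs {PLAIN_L1_TRANSFER_GAS} on L1)")
# batch of   10:    22000 gas per tx -- no better than L1
# batch of  100:     4000 gas per tx
# batch of 1000:     2200 gas per tx
```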
Optimistic rollups
Optimistic rollups, as the name indicates, use an optimistic strategy for batching transactions. They assume the batched transactions are valid and allow a period of time for the network to provide fraud proofs to roll back invalid transactions. They claim the network is secure as long as there is a single honest validator who actively provides timely fraud proofs. Since fraud proofs and contests are allowed, transactions may take much longer to be finalized. In general, the following diagram depicts how optimistic rollups work:
Figure 2.30 – Conceptual architecture of an L2 optimistic rollup
Typically, to power the L2 network, a set of two-way bridges is defined to deposit ether on the L1 Ethereum network to fund transactions on L2. A bridge is a common blockchain design pattern that facilitates the exchange of two tokens native to their respective blockchains. This involves the
lockup of ether on L1 and the minting of new tokens on L2. Depending on the L2, some L2 optimistic rollups, such as Optimism, have their own tokens for asset transactions and gas on L2, while others, such as Arbitrum, only mint tokens for gas transactions on L2. Transactions submitted to L2 are collected by an aggregator or sequencer. The aggregator or sequencer batches all transactions together, processes them one by one per the L2 protocol rules, and records the state transitions using a pre-state Merkle root and a post-state Merkle root, as shown in the following figure. It packs them together with all the compressed transactions in a batch and sends them to a rollup contract on the L1 Ethereum network. On the L1 side, Ethereum treats the entire batch as one transaction and invokes the rollup contract, which essentially maintains the L2 state on L1. In the Optimism case, it adds the new batch to the contract-maintained ordered list. Depending on the L2 protocol for the optimistic rollups, a certain time window (typically 1-2 weeks) is allowed for fraud proofs to challenge the validity of the batch submission and present evidence of fraudulent transitions. Anyone watching the batch submissions to L1 can challenge them with a fraud proof. The following diagram shows how the transactions and the pre- and post-L2-execution states are packed for committing to L1:
Figure 2.31 – Batch submission to the L1 base chain
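The batch commitment and the fraud check it enables can be sketched as follows. Hashing a JSON dump stands in for a Merkle root here, and the "execution" is a trivial balance transfer; a real fraud proof replays the disputed state transition inside an L1 contract:

```python
import hashlib, json

def state_root(state: dict) -> str:
    """Stand-in for a Merkle root over the L2 state."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def apply_tx(state: dict, tx: dict) -> dict:
    """Trivial L2 'execution': move value between accounts."""
    s = dict(state)
    s[tx["frm"]] = s.get(tx["frm"], 0) - tx["amount"]
    s[tx["to"]] = s.get(tx["to"], 0) + tx["amount"]
    return s

def make_batch(pre_state: dict, txs: list) -> dict:
    """What the sequencer posts to the L1 rollup contract."""
    post = pre_state
    for tx in txs:
        post = apply_tx(post, tx)
    return {"pre_root": state_root(pre_state),
            "post_root": state_root(post),
            "txs": txs}

def fraud_check(batch: dict, pre_state: dict) -> bool:
    """A verifier replays the batch; a mismatching post-root is fraud."""
    replayed = make_batch(pre_state, batch["txs"])
    return replayed["post_root"] != batch["post_root"]

pre = {"alice": 10, "bob": 0}
batch = make_batch(pre, [{"frm": "alice", "to": "bob", "amount": 3}])
assert not fraud_check(batch, pre)   # an honest batch survives the challenge
```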
Upon receiving evidence of fraud, the smart contracts on the L1 Ethereum chain replay all or specific transitions on L1, with all the transaction data, and determine whether the computed post-state root is the same as the post-state root submitted from L2. If the fraud proof is verified as valid, the correct state of the batched L2 transactions will be restored on both the L1 and L2 networks, and the verifier will be rewarded. As long as there is one trustworthy verifier on the network who submits fraud proofs within the challenge period, the transactions on both L1 and L2 are considered final, and the network is considered secure. At this moment, Optimism has to replay all transactions in the batch, as well as all transactions following the batch. In the Arbitrum case, it can check fraud proofs for individual transactions in the batch and only revert the impacted transactions. When users exit L2, their L2 tokens or coins are burnt, and the equivalent ether is allowed to be withdrawn or used for transactions on the Ethereum network. Due to the nature of optimistic rollups, you may have to wait for the transactions to be finalized for the money to be available for withdrawal, which typically takes another 1-2 weeks. The advantages of optimistic rollups include lower transaction costs and higher transaction throughput. By outsourcing transaction execution to L2 and compressing transactions to minimize L1 block space usage, the average gas fee (L1 posting, L1 storage, plus L2 processing) per L2 transaction is much lower than posting transactions one by one to the L1 network. Most L2 networks use designated aggregator(s) or sequencer(s) to execute L2 transactions on an L2 virtual machine, without enforcing consensus on L2, and require only minimal L1 processing for the entire L2 batch. Transaction throughput, relative to processing everything on the L1 Ethereum blockchain network, can be much higher; some L2 networks claim thousands of TPS. Leading optimistic rollups, such as Arbitrum and Optimism, adopt an EVM-compatible virtual machine environment for L2 execution. This helps ease the transition from L1 to L2. Most existing Ethereum development and deployment tools continue to work with the L2 networks. Existing smart contracts on the L1 Ethereum network require minimal changes to be moved over to the L2 rollup network.
The downside of optimistic rollups is the longer finality time. By protocol design, they have to allow a certain time window for fraud-proof challenges. DApp designers have to consider the trade-off between a high TPS and delayed finality. Another issue with today's optimistic rollup implementations is the introduction of centralized or designated roles as the aggregator(s) or sequencer(s). The concern is that this may introduce a monopoly and thus compromise network security. The counter-argument to this issue is that the network can be secured as long as there is one honest verifier out of thousands of network participants. Notable optimistic rollup implementations include the following:
Arbitrum – Arbitrum is one of the leading L2 optimistic rollup implementations, developed by Offchain Labs. It runs on a slightly modified EVM called the AVM. To understand how Arbitrum works, developers can start at https://developer.arbitrum.io/intro/.
Optimism – Optimism is another leading L2 optimistic rollup, implemented and designed with four main pillars in mind: simplicity, pragmatism, sustainability, and optimism. With that in mind, it tries to make it easy for existing DApps and smart contracts to be transitioned from the Ethereum network to L2 without much developer effort. It has its own slightly modified version of the EVM, called the OVM. At this moment, it has a set of centralized entities as sequencers on L2 and is currently working on decentralizing the sequencer to avoid censorship. You should check it out at https://community.optimism.io/docs/protocol/.
ZK rollups
Zero-knowledge proof (ZKP), or the zero-knowledge protocol, is not a new concept. In fact, it was first developed in the 1980s as a method by which one party (the prover) convinces another party (the verifier) that a statement about some secret information is true without revealing the secret itself. In layman's terms, ZKP is like wanting others to know you have the keys to all the doors in a highly secured military complex. Instead of
verifying your keys one by one, you show them you can get in through the entrance and out through the back door. It is super easy to verify. A ZKP can be interactive or non-interactive. It has found great success in blockchain mainly through two separate use cases: one as a privacy preservation mechanism for addressing blockchain data privacy concerns, and the other as ZK rollups. Similar to optimistic rollups, ZK rollups batch transactions and publish them to the L1 blockchain periodically. Instead of assuming the transactions are valid, ZK rollups generate ZKPs using zk-SNARK or zk-STARK technology for the batched transactions, and send the transactions, pre-state root, and post-state root, together with the ZKP as the evidence of validity, to the L1 Ethereum mainnet. Rollup contracts on L1 verify the ZKP and post the batch as a single transaction on the L1 blockchain. In general, the following screenshot shows how ZK rollups work in the Ethereum ecosystem:
Figure 2.32 – Conceptual architecture of an L2 ZK rollup
Similar to optimistic rollups, ZK rollups rely on bridge contracts to lock up assets on L1 and provide liquidity on L2 by minting L2 tokens. In the same way, the transaction cost is lower too, since all batched transactions are posted to the L1 block as one single transaction. However, with ZK rollups, an L2 aggregator collects all transactions, executes them, generates the pre- and post-state Merkle roots, and runs zk-SNARKs to generate the ZKP. The computation for such a proof is quite complex, and it requires much more computational power in the aggregator node. Another complication of ZKP is that it can't run complex smart contracts at the moment. There are quite a few Zero-Knowledge EVM (zkEVM) implementations available for specific L2 ZK rollups, but it may take a while for a production-ready version of a general-purpose zkEVM to
be used for such validity proofs of smart contract executions. We will discuss the zkEVM concept further in Chapter 5, Deep Research and Latest Developments in Ethereum. The biggest advantage of ZK rollups is fast finality. Since all transactions are validity-proved using ZKPs, finality for ZK rollup transactions happens when the posted L1 block is finalized, which is typically the Ethereum block finality period (in current Eth1, about 6 blocks or 1 minute; after transitioning to PoS, 6-12 minutes), plus the L2 batch period. Compared to the 1-week finality time for optimistic rollups, this is much faster. Notable ZK rollup implementations include the following:
Loopring – Loopring is the first L2 ZK rollup implementation built for exchange and payment. It claims it can settle up to 2,025 trades per second while guaranteeing the same level of security as the underlying Ethereum blockchain. If you're interested, you should check it out at https://loopring.org.
StarkNet – StarkNet is a permissionless, decentralized L2 ZK rollup network on top of Ethereum. Instead of using zk-SNARKs like other ZK rollups, StarkNet uses zk-STARKs for validity proofs. It has its own virtual machine, Cairo, and it is also working on zkEVMs. If you're interested in the difference between zk-SNARKs and zk-STARKs and how StarkNet works, you should check it out at https://starkware.co/starknet/.
zkSync – zkSync is another trustless ZK rollup protocol for scalable, low-cost payments on Ethereum. It uses ZKPs and on-chain data availability to keep users' funds as safe as if they never left the mainnet. It has its own zkEVM implementation. Check it out at https://docs.zksync.io/userdocs/intro/#zksync-in-comparison.
Polygon Zero – Polygon Zero is another L2 ZK rollup solution for Ethereum. It claims its Plonky2 technology can generate ZKPs faster than any other existing tech. It has its own implementation of a zkEVM, fully compatible with the EVM in Ethereum. We will discuss more on Polygon as an EVM-compatible sidechain blockchain solution in
Chapter 4, EVM-Compatible Blockchain Networks. You can check out more details about Polygon and Polygon Zero at https://zkevm.polygon.technology/docs/introduction. Ethereum is building a rollup-centric roadmap as part of its end game to solve the scaling issues of the Ethereum blockchain. According to Vitalik, ZK rollups will be the long-term solution in the end, while optimistic rollups will serve our needs in the short term. We will talk about the rollup-centric roadmap and future plans post-merge in Chapter 5, Deep Research and Latest Developments in Ethereum.
Summary
In this chapter, you learned about the details of the Ethereum architecture and how Ethereum works. We went through the key Ethereum concepts, including accounts, contracts, transactions, and messages. We discussed how the EVM works and how smart contract code is executed within the EVM. We discussed how Eth2 and the beacon chain work and what the unified Ethereum blockchain looks like in the post-merge world. In the end, we also introduced you to various L2 optimistic rollups and ZK rollups. In the next chapter, we will discuss Decentralized Finance (DeFi), the most successful set of use cases built and powered on Ethereum so far.
Decentralized Finance
So far, you've learned how peer-to-peer payment works on both the Bitcoin and Ethereum networks. In this chapter, we will introduce you to Decentralized Finance (DeFi). DeFi is an umbrella of new digital financial systems and services powered by blockchain technology and built on top of a decentralized network. We will start by covering the concepts of DeFi and how it is different from Traditional Finance (TradiFi) and Fintech, another area of digital innovation in financial services and payments that predates the rise of DeFi. We will introduce Ethereum token standards and stablecoins to help you understand the DeFi primitives and basic constructs. Following that, for the remainder of this chapter, we will discuss various DeFi protocols, including products and services in decentralized lending and borrowing, decentralized exchanges, derivatives, and insurance. We will dive deep into how the leading protocols, such as Aave, Uniswap, dYdX, and Nexus Mutual, work. At the end, we will touch on the concepts of cryptoeconomics and token economics and clarify where they intersect and how they differ. In this chapter, we will cover the following topics:
Introducing decentralized finance
Mastering Ethereum token standards
Analyzing stablecoins and MakerDAO Dai
Understanding DeFi protocols
Making sense of cryptoeconomics
DeFi after the collapse of FTX
Technical requirements
To access the source code for this book, please refer to the following GitHub link: https://github.com/PacktPublishing/Learn-Ethereum-SecondEdition/.
Introducing decentralized finance
As its name suggests, DeFi is a decentralized version of the financial products and services that we normally saw and transacted with in our daily lives before blockchain and Bitcoin. It is a set of blockchain-native digital products and services powered by blockchain technology and a decentralized network. In the first edition of this book, we touched a little on the DeFi movement. Since then, DeFi has been evolving into a virtual parallel to traditional finance. In fact, over the last 30 years (since the start of the internet and the World Wide Web), newcomers and digital disruptors have been bringing convenience, digitization, automation, and innovation to disrupt the traditional financial markets and provide customers with an instant and seamless consumer experience. This gave rise to a new financial technology sector called Fintech. Notable firms include PayPal, Square (later renamed Block), and Stripe. Together with payment technology giants, such as Visa and MasterCard, they have dominated digital payments in the Web2 world. As the following diagram illustrates, Fintech grew out of TradiFi through automation and digitalization. It is the digital continuum of TradiFi and takes advantage of the internet, e-commerce, and technological advances. It automates inefficient and disjointed financial business processes and improves operational efficiency, therefore lowering the total cost per customer serviced. Because of that, it improves financial inclusion and makes finance accessible to a broader customer base that would otherwise have limited access to banking in the TradiFi world. On top of that, it leverages mobile, cloud, artificial intelligence (AI), and machine learning (ML) technology to reach out and engage with targeted customers to gain mass adoption. However, Fintech relies on the traditional financial infrastructure to provide digital financial products and services, including fiat and central bank-controlled monetary policies, as well as the antique
Society for Worldwide Interbank Financial Telecommunication (SWIFT) standard and batch-based cross-border payment processes:
Figure 3.1 – Finance technology evolution
At the same time, due to the financial crisis of 2008 to 2010, Bitcoin was developed as an alternative to the fiat system, and its underlying technology, blockchain, became the new technology infrastructure for the decentralized Web3 world. Enabled with smart contracts, Ethereum and other layer 1 smart contract-based blockchains have evolved into the DeFi infrastructure that now powers a new set of crypto products and services. It is not difficult to find the same, similar, or more
advanced financial services in DeFi compared to those in TradiFi or Fintech. Smart contracts are the magic in this transition. They enable the tokenization of physical or virtual products through token standards, which means they turn real-world assets into crypto assets in DeFi. They allow us to bridge one set of crypto assets powered by one blockchain network to other crypto assets on other blockchain networks. The collapse of FTX, the now-bankrupt centralized crypto exchange, in 2022 further propelled the move toward decentralization and disintermediation. Web3 empowers Fintech with the power of DeFi, helps centralized entities gain the trust of their consumers, and allows DeFi to be easily and seamlessly accessed by the general public. The train has already started, but there is no sign of a destination yet. This trend will continue. Since the start of COVID-19, more and more TradiFi and Fintech firms have joined the decentralization bus, manifesting the grand shift toward decentralization. The following diagram shows a pictorial view of the technology stacks of typical DeFi applications. At the base is the blockchain infrastructure, which powers smart contracts and DeFi applications. This is the ultimate settlement layer of all DeFi crypto assets. Smart contracts enable Layer 2 (L2) rollups to improve the scalability and throughput of the Layer 1 (L1) blockchain infrastructure and provide connectivity to bridge from one blockchain infrastructure to another. Crypto primitives, such as fungible tokens, Non-Fungible Tokens (NFTs), and stablecoins, are supported natively via smart contracts. Composability in smart contracts makes it possible to build the leading DeFi protocols and products, such as those in the categories of lending and borrowing, DEXs, crypto derivatives, and insurance and risk management:
Figure 3.2 – Decentralized finance technology stack
In the next section, we will introduce the basic concepts of crypto assets and Ethereum token standards. These are the primitives that fuel the crypto economy on the blockchain and propel DeFi products and services.
Mastering Ethereum token standards
In this section, we will go over Ethereum token standards. They are a set of smart contract standards that define the rules, conditions, and operations specifying how a token will function in the Ethereum ecosystem. We will start by looking at the different types of tokens or coins, and how they are funded.
In layman's terms, the concepts of cryptocurrency, crypto assets, coins, and tokens are used interchangeably, although they may mean different things in the crypto space. Let's take a look at what they mean.
Definition of cryptocurrency

Just like fiat currency, coins such as Bitcoin or Ether are the native cryptocurrency minted through an L1 blockchain protocol and circulated on the blockchain network to fuel the crypto economy in its ecosystem. Bitcoin is sometimes referred to as the original coin because it was the first crypto coin to be created. All other L1-minted coins, including Ether and those we mentioned in Chapter 1, Blockchain and Cryptocurrency, are called Altcoins. Tokens, on the other hand, are a form of cryptocurrency that acts as a digital or crypto representation of the entirety or a portion of a tradable asset in the blockchain ecosystem. They are created outside of L1, usually through smart contracts or L2 and Layer 3 (L3) constructs.

There are many ways to categorize assets and investments in TradiFi. The most well-known categories include stocks, bonds, currency, real estate, commodities, futures, and financial derivatives. Some financial professionals also include art and collectibles, as well as cryptocurrency. Crypto assets are tradable digital assets in the form of coins or tokens on the blockchain. They are a new class of assets that arises from cryptocurrency and blockchain technologies. They are managed autonomously by peer-to-peer networks and secured with cryptographic technologies.

Tokens typically fall under different categories, including the following:

Utility tokens: Tokens that are created to fuel Initial Coin Offerings (ICOs) or the operations of Decentralized Applications (DApps).

Security tokens: Tokens that are created and governed by financial regulations.
Payment tokens: Tokens that are used for payment, including native coins or any tokens issued for payments, such as stablecoins.

Fungible tokens: A type of transferrable token, of which Bitcoin is the classic example. One Bitcoin equals, functions as, and is valued the same as any other Bitcoin; only the quantity matters.

NFTs: A type of token that represents a unique underlying asset – for example, a piece of original art.

We will talk about fungible tokens and NFTs in more detail in the Ethereum token standards subsection. Blockchain or DApp projects are typically funded through ICOs, Security Token Offerings (STOs), Initial Exchange Offerings (IEOs), Initial Decentralized Exchange (DEX) Offerings (IDOs), or crypto airdrops. Those who were observing closely in 2017-2018 know how crazy the crypto ICO markets were.
Crypto funding mechanisms

Since the first crypto token sale, by Mastercoin in July 2013, ICOs have been popular among the blockchain, cryptocurrency, and investment communities, peaking in 2017-2018, when they became a quick and effective way to raise funding in the form of cryptocurrencies. An ICO is a type of crowdfunding, or crowd sale, that uses a blockchain platform and cryptocurrency. It is similar to an initial public offering (IPO) in the traditional finance market. In an IPO, the company raises funds through the sale of its stock. In an ICO, the company raises money in the form of Bitcoin, Ether, or fiat currency by selling its coins or tokens to the public, investors, or speculators. Unlike an IPO, which is regulated, an ICO is not bound by the same legal requirements. An ICO is used to fund a DApp's implementation, operations, and expansion. Usually, the company raising funds via an ICO will market its ICO plan publicly or privately and create a whitepaper that explains its project, the technology behind it, and details about its team
members. In most cases, a smart contract is created for prospective investors to participate in the ICO. Once launched, the smart contract credits the investors with the ICO's native crypto assets in exchange for existing cryptocurrencies, such as Bitcoin or Ether. ICOs were considered risky since they were not regulated.

In addition to the ICO, another common crowdfunding or crowd sale mechanism for funding blockchain and cryptocurrency projects is the STO. It is similar to an ICO in that an investor exchanges their assets (money and/or other assets) for coins or tokens as investments, and the same process is followed. However, there are some key differences. Most ICO tokens serve as utility tokens and are used to fund the normal operations of the cryptocurrency and blockchain network. STOs, by contrast, have to comply with regulatory governance and must be offered under securities laws. Therefore, the barrier to offering tokens under an ICO is relatively low, while for an STO, it is much higher. For the same reason, tokens under an STO are considered less risky than ICO coins, since STO tokens are backed by real assets and, in general, are protected under securities laws. You will continue to see both models as funding channels for cryptocurrency and blockchain projects. You may also continue to hear the debate about ICOs versus STOs; depending on the jurisdiction, the line between them can become very murky.

Both ICOs and STOs are a good way to secure funding for a cryptocurrency project, but both, by themselves, lack the liquidity the crypto market requires. After they cooled down in 2018, a new crypto-funding vehicle, the IEO, became more popular. Instead of issuing an ICO or STO, the token issuer can go to an exchange to get their tokens and projects listed under an IEO. In return, the exchange exercises due diligence on all the requirements for such listings and markets the tokens to potential investors on behalf of the token issuer. Binance is a perfect example of such an exchange; you can check out the Binance model at https://www.binance.com/en. An IEO requires the offeror to get permission from the exchange to list crypto projects for funding. As DEXs became more popular, another similar
funding model, called the Initial DEX Offering (IDO), emerged. The key difference is that an IDO doesn't require permission to list crypto projects. There are other new crypto funding models too, one notable example being the crypto airdrop. An airdrop is a marketing strategy whereby a crypto project transfers crypto assets to multiple wallets for free, or in exchange for completing small tasks to claim them. The purpose of such transfers is to promote the crypto project to current or potential users and increase awareness of the project.

Crypto funding has become increasingly popular over the years. It helps crypto projects quickly acquire the funding they need and jump-start the project. For investors, it offers investment opportunities in decentralized finance. However, like any investment, it comes with a darker side too, including rug pulls, pump-and-dumps, wash trading, and security hacks. A rug pull happens when the creators of a crypto project suddenly withdraw their funds and disappear, leaving investors with worthless tokens. A pump-and-dump is a type of fraudulent practice where a group of people artificially manipulates the price of a cryptocurrency for profit. Similar to a pump-and-dump, wash trading is another type of market manipulation, where a trader buys and sells the same cryptocurrency or NFT multiple times to create the illusion of high trading volume. Hacks occur when perpetrators take advantage of security holes in a system and steal millions of dollars in value from the protocol. It is important to be aware of these scams and the risks involved, since all of them have occurred in the short history of cryptocurrency.
Ethereum token standards

Ethereum started as the world computer, the first blockchain platform for developing smart contract-based DApps. The Ethereum community quickly realized it was also the perfect platform for creating unique tokens that exist and operate on the Ethereum blockchain. As such, many token standards were developed to facilitate the creation and launch of a vast variety of crypto tokens.
Fungible tokens and NFTs
Like fiat currency, cryptocurrencies, including Bitcoin, Ether, and Altcoins, are considered fungible tokens, which means they can be interchanged with tokens of the same type. A Bitcoin is a Bitcoin: when you exchange a Bitcoin with someone, you expect one Bitcoin in return, and only the quantity matters. NFTs, on the other hand, represent unique assets, such as digital artworks and collectibles, digital rights, digital royalties, and so on. One of the most popular early NFT projects is CryptoKitties, an Ethereum-based video game for breeding, trading, and collecting various types of virtual cats. NFTs are similar to a piece of artwork or a collectible: each item is considered distinct and valued differently.

Ethereum provides a platform and standard token interfaces, implemented through smart contracts, for crypto projects to launch new coins or tokens. A smart contract is a digital contract, written as code, between the investors and the Ethereum token offerors. The ERC-20 standard and its variations define the standard smart contract interface for fungible tokens. ERC-721 is another official standard, for implementing and launching NFTs. ERC-1155 is an Ethereum multi-token standard that enables you to bundle both fungible tokens and NFTs for trading in one transaction. ERC is an abbreviation for Ethereum Request for Comments and represents the standard proposal and comments process for Ethereum improvements. We will touch on ERC and the Ethereum Improvement Proposal (EIP) as part of Ethereum governance in Chapter 5, Advanced Topics and Latest Developments in Ethereum. The following diagram illustrates the most popular token standards:
Figure 3.3 – Ethereum token standards

Later in this chapter, we will discuss the ERC-20, ERC-721, and ERC-1155 standards in more detail. If you're interested, you can check out the Ethereum EIP site for more details on the various standards listed here and the new standards being proposed.
ERC-20 for fungible tokens

The ERC-20 standard defines the standard interface for the functionality needed to issue money-like fungible tokens, including how tokens can be transferred between accounts and how users can retrieve data associated with a given ERC-20 token. All ERC-20-compatible tokens have to implement the ERC-20 interface, as shown in the following screenshot:
Figure 3.4 – ERC-20 smart contract interface

The following six functions are defined in the standard interface:

totalSupply(): This function allows anyone to query the total supply of tokens. The smart contract implementation internally keeps track of the total token supply.

balanceOf(address tokenOwner): This function retrieves the token balance of the account associated with the tokenOwner address.

allowance(address tokenOwner, address spender): When executed, this function returns the number of tokens that the account owner has authorized a spender from another account to withdraw and spend.

transfer(address to, uint tokens): This function allows you to send a certain number of tokens to a given address. The sending account must have enough tokens. On success, it fires a Transfer event; if there aren't sufficient tokens to send, it throws an exception.

approve(address spender, uint tokens): This function enables the caller to approve a predefined upper limit on the number of tokens that can be withdrawn by the spender. Internally, the smart contract maintains, for each token owner, a list of spenders and their allowances. The allowance can be reset through this call. On a successful call, it triggers an Approval event.

transferFrom(address from, address to, uint tokens): This function allows an approved spender, such as another smart contract, to move a specific number of tokens from one address to another, within the approved allowance.
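Putting these together, the six functions and the two standard events can be rendered in Solidity as a minimal interface sketch that mirrors the EIP-20 definition:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Minimal ERC-20 interface, following the EIP-20 definition
interface IERC20 {
    function totalSupply() external view returns (uint256);
    function balanceOf(address tokenOwner) external view returns (uint256);
    function allowance(address tokenOwner, address spender) external view returns (uint256);
    function transfer(address to, uint256 tokens) external returns (bool);
    function approve(address spender, uint256 tokens) external returns (bool);
    function transferFrom(address from, address to, uint256 tokens) external returns (bool);

    // Emitted on transfers and on allowance approvals, respectively
    event Transfer(address indexed from, address indexed to, uint256 tokens);
    event Approval(address indexed tokenOwner, address indexed spender, uint256 tokens);
}
```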
Optionally, the token issuer may define additional attributes for the token, such as its name, symbol, and decimals. Decimals specifies the number of decimal places the token uses; by default, it is 18. The contract can provide getter methods to query these attributes.

Some other standards have been extended from ERC-20 to address some of its issues, including the following:

The ERC-223 standard was created to address the lost tokens problem in ERC-20, where a transfer transaction mistakenly sends tokens to a wrong or invalid smart contract address. It added a security perspective to ensure approve and transferFrom behave as they are supposed to. It is backward compatible with the ERC-20 standard.

ERC-777 addresses the same issue differently. It tries to improve the widely used ERC-20 standard while remaining backward compatible with it. It allows both contract accounts and regular accounts to have control over the tokens being transferred. It introduces an operator concept as a mediator that sends tokens on the sender's behalf. The operator, set up as a verified contract, can act like an exchange and transfer funds or burn coins on behalf of the token holders. The interface provides methods for senders to authorize and revoke operators who can send tokens on their behalf.

ERC-4626, according to the EIP definition, was created to provide tokenized vaults representing shares of a single underlying ERC-20 token. A tokenized vault is a smart contract that can be used to store and manage ERC-20 tokens. In layman's terms, ERC-20 is like the money in your checking account, which can be used to make payments, while a tokenized vault is like a savings account, where interest is generated from your money.
ERC-4626 is useful in decentralized lending or yield-farming use cases, where you lend or stake your cryptocurrency for a profit in the form of some sort of crypto interest. According to the EIP, all ERC-4626 tokenized vaults have to implement ERC-20 to represent shares. The vault holds the underlying crypto tokens, and ERC-20 operations such as balanceOf, transfer, and totalSupply operate on the interest-bearing shares of the underlying assets. Using the earlier layman's terms, balanceOf() returns the current balance of your interest-bearing position, transfer() allows you to transfer a portion of it out, and totalSupply() gives you the total shares the vault has issued. We will discuss decentralized lending in the Understanding DeFi protocols section later in this chapter. We encourage you to check out the EIP site for more details about the ERC-20 family of standards: https://eips.ethereum.org/erc.
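Since an ERC-4626 vault builds directly on ERC-20, its core can also be sketched as a Solidity interface. The following is a trimmed sketch based on the EIP-4626 definition; the full standard defines additional functions, such as mint, maxDeposit, and the preview* family:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Trimmed ERC-4626 tokenized vault sketch; the vault share itself is an ERC-20 token
interface IERC4626 /* is IERC20 */ {
    function asset() external view returns (address);        // the underlying ERC-20 token
    function totalAssets() external view returns (uint256);  // total underlying assets managed
    function convertToShares(uint256 assets) external view returns (uint256);
    function convertToAssets(uint256 shares) external view returns (uint256);
    function deposit(uint256 assets, address receiver) external returns (uint256 shares);
    function withdraw(uint256 assets, address receiver, address owner) external returns (uint256 shares);
    function redeem(uint256 shares, address receiver, address owner) external returns (uint256 assets);
}
```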
ERC-721 for NFT

ERC-721 defines the standard for building NFTs on the Ethereum blockchain. In this class of crypto assets, every token is unique, has distinct properties, and is valued differently. It is akin to a collectible, such as a rare stamp, a work of art, or an exotic car. The standard interface defined by the ERC-721 standard enables smart contract implementations to manage, own, and trade unique tokens, as appropriate to the underlying crypto asset. The following screenshot shows the functions specified in the ERC-721 interface:
Figure 3.5 – ERC-721 smart contract interface

Every such token has a unique token identifier, or token ID. The ERC-721 standard does not mandate any standard set of token metadata, nor does it restrict adding supplemental functions. It is up to the developer of the NFT to define additional metadata or functions suitable for capturing the uniqueness of the token. The following list explains some of the functions present here:

balanceOf(address _owner): This function returns the count of all NFTs assigned to the given owner.

ownerOf(uint256 _tokenId): This function returns the token owner's address for a given token ID.

safeTransferFrom(address _from, address _to, uint256 _tokenId, bytes data): This function allows the token owner to transfer the ownership of an NFT to another address. It throws an exception if the sender is not the owner of the token or an authorized operator. Arbitrary data with no specific format can be passed as part of the transfer.

safeTransferFrom(address _from, address _to, uint256 _tokenId): This is an overloaded function with no data passed as part of the transfer.

approve(address _approved, uint256 _tokenId): This function grants the third-party operator, given by the _approved address, permission to transfer the token on behalf of the token owner.

setApprovalForAll(address _operator, bool _approved): This function gives the token owner the ability to grant or revoke an operator's approval permissions over all of the owner's tokens. When _approved is set to true, it grants the approval permissions; otherwise, it revokes the operator's permissions.

getApproved(uint256 _tokenId): This function returns the approved operator address for a given NFT ID.

isApprovedForAll(address _owner, address _operator): This is a Boolean function that checks whether the operator, given by the _operator address, is an approved operator for the given token owner, given by the _owner address.
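For reference, here is a minimal Solidity rendering of the interface discussed above, following the EIP-721 definition:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Minimal ERC-721 interface, following the EIP-721 definition
interface IERC721 {
    event Transfer(address indexed _from, address indexed _to, uint256 indexed _tokenId);
    event Approval(address indexed _owner, address indexed _approved, uint256 indexed _tokenId);
    event ApprovalForAll(address indexed _owner, address indexed _operator, bool _approved);

    function balanceOf(address _owner) external view returns (uint256);
    function ownerOf(uint256 _tokenId) external view returns (address);
    function safeTransferFrom(address _from, address _to, uint256 _tokenId, bytes calldata data) external payable;
    function safeTransferFrom(address _from, address _to, uint256 _tokenId) external payable;
    function transferFrom(address _from, address _to, uint256 _tokenId) external payable;
    function approve(address _approved, uint256 _tokenId) external payable;
    function setApprovalForAll(address _operator, bool _approved) external;
    function getApproved(uint256 _tokenId) external view returns (address);
    function isApprovedForAll(address _owner, address _operator) external view returns (bool);
}
```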
ERC-721 and NFTs introduce infinite opportunities to crypto markets. Beyond art collectibles, anything that has unique, valuable traits can be implemented as an NFT. Music, patents and trademarks, the deed to your house, a famous player's game highlights: you name it, they can all be issued as NFTs for someone to collect and trade. Imagine that you have a super clever miner extractable value (MEV) strategy and you want to profit from anyone who uses it; an NFT may be a good option here.
ERC-1155 for multi-tokens

ERC-1155 defines the smart contract standard for bundling a group of fungible tokens and NFTs within one transaction on the Ethereum blockchain. There are a lot of use cases for this kind of implementation. The most touted is the gaming or metaverse use case, where the offeror may allow users to buy and sell game money in the form of fungible tokens and game collectibles in the form of NFTs in the same trade. The standard offers both ERC-20-like and ERC-721-like operations, with further enhancements. This class of crypto assets has a portion of its assets represented as fungible tokens, as well as a portion of unique assets represented as NFTs. The standard interface defined by the ERC-1155 standard makes it easy for smart contract implementations to manage, own, and trade a variety of tokens within the same transaction, therefore reducing transaction costs. The following screenshot shows the functions specified in the ERC-1155 interface:
Figure 3.6 – ERC-1155 smart contract interface

The functions in the ERC-1155 smart contract interface are listed here:

safeTransferFrom(address _from, address _to, uint256 _id, uint256 _value, bytes calldata _data) external: The safeTransferFrom operation transfers a _value amount of the token type represented by _id from the _from address to the _to address. It emits a TransferSingle event. The caller must be approved to manage the token from the _from address. When the _to address is a smart contract, it must follow the safe transfer rules.

safeBatchTransferFrom(address _from, address _to, uint256[] calldata _ids, uint256[] calldata _values, bytes calldata _data) external: The safeBatchTransferFrom operation transfers a list of amounts of the given token types from the _from address to the _to address. It emits a TransferBatch event. The caller must be approved to manage all such tokens from the _from address. When the _to address is a smart contract, it must follow the safe transfer rules.

balanceOf(address _owner, uint256 _id) external view returns (uint256): The balanceOf operation queries the balance of the token type represented by _id held by the _owner address. It returns the actual balance.

balanceOfBatch(address[] calldata _owners, uint256[] calldata _ids) external view returns (uint256[] memory): The balanceOfBatch operation queries the balances of a list of token types (represented by _ids) for the given owner addresses. It returns the actual balance of each type.

setApprovalForAll(address _operator, bool _approved) external: The setApprovalForAll operation approves or revokes a third-party operator's ability to manage all of the caller's tokens. It emits an ApprovalForAll event. The Boolean value of _approved means approval when it's set to true and revocation when it's set to false. The owner can always manage all of their own tokens; by granting a third-party operator this ability, the owner delegates the management of their tokens, and can take away such approval using the same method.

isApprovedForAll(address _owner, address _operator) external view returns (bool): The isApprovedForAll operation queries whether the operator has approval to manage all of the tokens in the _owner address.
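Here is a minimal Solidity rendering of the functions above, following the EIP-1155 definition (the standard also defines an optional URI event, which is omitted here):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Minimal ERC-1155 multi-token interface, following the EIP-1155 definition
interface IERC1155 {
    event TransferSingle(address indexed _operator, address indexed _from, address indexed _to, uint256 _id, uint256 _value);
    event TransferBatch(address indexed _operator, address indexed _from, address indexed _to, uint256[] _ids, uint256[] _values);
    event ApprovalForAll(address indexed _owner, address indexed _operator, bool _approved);

    function safeTransferFrom(address _from, address _to, uint256 _id, uint256 _value, bytes calldata _data) external;
    function safeBatchTransferFrom(address _from, address _to, uint256[] calldata _ids, uint256[] calldata _values, bytes calldata _data) external;
    function balanceOf(address _owner, uint256 _id) external view returns (uint256);
    function balanceOfBatch(address[] calldata _owners, uint256[] calldata _ids) external view returns (uint256[] memory);
    function setApprovalForAll(address _operator, bool _approved) external;
    function isApprovedForAll(address _owner, address _operator) external view returns (bool);
}
```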
Safe transfer rules are defined to ensure that tokens are transferred securely from one address to another. The caller must be approved to manage the tokens before they can transfer any tokens. As per the specification, when the _to address is a smart contract, the recipient's smart contract must implement the ERC1155TokenReceiver interface and act appropriately. ERC1155TokenReceiver is used to handle the receipt of single or batch transfers. In such cases, at the end of the safeTransferFrom function, the token contract must call the onERC1155Received method on the recipient's smart contract after the balance has been updated. Similarly, at the end of the safeBatchTransferFrom function, it must call the onERC1155BatchReceived method on the recipient's contract once all the balances have been updated. If you're interested, you should check out the EIP site for more details: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-1155.md.

With that, you have learned about the various Ethereum token standards. In the next section, we will discuss what a stablecoin is and how the ERC-20 standard is used to create the MakerDAO Dai stablecoin.
Analyzing stablecoins and MakerDAO Dai

Imagine that it's a sunny day and you go for a cup of coffee in a busy downtown coffee shop. You pay for the coffee with your Ether, and after 30 minutes of waiting in line, by the time you get your coffee, you have to pay 10% more Ether. That is as crazy as it can get. Unfortunately, that is the volatility we saw in major cryptocurrencies such as Bitcoin and Ether during the crypto market crash between late 2017 and early 2018, and again in 2022. The value of some coins fluctuated by as much as 20% daily. Stability is a major issue for cryptocurrencies looking to gain mainstream adoption in the mass market. That is where stablecoins help. As the name suggests, a stablecoin is a type of token designed to smooth out the volatility of coin values and keep them stable over a certain period. In the next subsection, we'll cover the different types of stablecoins on the market.
Categories of stablecoins
Stablecoins are crypto tokens that offer investors relatively stable value by pegging to specific assets, such as a fiat currency, a commodity, or a leading cryptocurrency. Their supply is adjusted based on demand using pre-defined governance rules. The very first stablecoin, bitUSD, was issued in 2014 with its value pegged to the US dollar. Stablecoins are intended to minimize typical cryptocurrency volatility by maintaining collateral in the form of reserves. Since then, stablecoins have garnered great traction and become critical building blocks of the DeFi ecosystem. A stablecoin functions as money: it serves as the common currency denomination and medium of exchange that powers almost all the leading DeFi protocols. In general, there are four types of stablecoins, as shown in the following diagram:
Figure 3.7 – Different types of stablecoins

Let's review them in detail:

Fiat-collateralized stablecoin: This is pegged to a fiat currency, such as the US dollar or the euro. It is backed by a reserve holding a certain amount of fiat currency as collateral. One of the most popular is Tether (USDT), which is pegged to the US dollar at a ratio of 1:1. Circle's USDC is another example of such a stablecoin.
Commodity-collateralized stablecoin: This is pegged to a common commodity, such as gold. It is backed by a certain value of the commodity as collateral. One example is Digix Gold (DGX), an ERC-20 coin that is pegged to gold at a target ratio of 1 DGX to 1 gram of gold. Paxos Gold (PAXG) is another such example.

Crypto-collateralized stablecoin: This is collateralized by a major cryptocurrency, such as Bitcoin or Ether. This category of stablecoins is normally backed by a certain amount of Ether or Bitcoin. A popular example is Dai, created by MakerDAO, which is pegged to the US dollar with a target ratio of 1:1.

Algorithmically stabilized stablecoin: This is also called a seigniorage-style coin. It uses algorithms to control the stablecoin's money supply. There is no collateral for such stablecoins; their value is controlled purely by supply and demand and stabilized by various algorithms.

Stablecoins are blockchain-based versions of fiat currencies, which enables them to interact with blockchain-based applications and smart contracts. Stablecoins backed by cryptocurrency or algorithms are supported through the blockchain and secured by the decentralized peer-to-peer network and cryptography. Fiat-collateralized and commodity-collateralized stablecoins introduce a centralized entity that oversees the collateral provisions and manages the reserves. They may require additional transparency and oversight, as well as audit and compliance enforcement.
MakerDAO Dai

The concept of the Dai stablecoin is like a home equity line of credit (HELOC) loan, through which you tap into the equity of your house as collateral and get a pool of cash for your financial purposes. The interest rate of a HELOC loan is normally tied to the prime rate, which moves with the benchmark rate set by the Federal Reserve. The bank lending you the HELOC loan may also charge fees for loan applications and the monthly or annual maintenance of the HELOC account.
With the Dai stablecoin, you are locking up some Ether as the collateral asset in exchange for Dai. It is intended to be a stable cryptocurrency whose unit value is pegged to one US dollar. The advantage of such stablecoins is that they can leverage all essential capabilities of blockchain technology. At the same time, Dai can help you realize the full potential of a cryptocurrency and decentralized network.
Mechanism of MakerDAO Dai

The Dai stablecoin started as a project at MakerDAO in 2015. It was launched in 2017 as a single-collateral smart contract platform on Ethereum that allows anyone to deposit Ether as a collateral asset; in return, it generates a certain number of Dai on the Maker platform. You can then use Dai just like any other cryptocurrency: you can send Dai to anyone and pay for your coffee or any goods and services, so long as the merchant accepts it. You can also open a Dai Savings Rate (DSR) deposit account to earn interest by locking up Dai in DSR deposits. It is like a savings account at your regular bank, except that you earn interest on a per-second basis, based on the system-defined DSR.
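To make the per-second accrual concrete, here is a small illustration with hypothetical numbers. With 31,536,000 seconds in a 365-day year, an annual DSR of 5% corresponds to a per-second growth factor of (1.05)^(1/31,536,000). A deposit of 10,000 Dai left in the DSR for 30 days would therefore grow to roughly 10,000 × (1.05)^(30/365), or about 10,040 Dai.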
MakerDAO moved away from the original single-collateral platform to a multi-collateral Dai (MCD) system in 2019, which accepts many leading crypto tokens as collateral, in addition to Ether. The MCD system is also called the Maker protocol. The following diagram shows a high-level conceptual architecture of the Maker protocol:

Figure 3.8 – MakerDAO Dai conceptual architecture

At its core is the Maker vault, which locks up collateral, mints Dai stablecoins, maintains the accounting of all Dai and its collateral, and burns Dai when the user exits. Here is how it works behind the scenes:

The user can open a Maker vault by depositing Ether or any Maker-supported crypto asset as collateral. This allows the Maker vault to lock up the collateral in exchange for Dai as a loan. Dai is pegged to one US dollar. The maximum number of Dai is minted in proportion to the dollar value of the locked collateral. The Maker protocol leverages oracles to access the real-time price information of external crypto tokens and determine the value of the collateral in real time.
When the user wants to exit, the Maker protocol clears all debts and redeems the collateral. In this case, the user repays the loaned Dai and pays all accrued system stability fees. The stability fee is like the interest you accrue on a Dai loan. Once all debts are cleared, the vault is closed and the Dai is burned, allowing the Maker protocol to maintain a healthy collateralization ratio.

The system sets a minimum collateralization ratio, the liquidation ratio, and triggers the liquidation of a vault if its collateralization ratio falls below this minimum. This was designed as part of the Maker protocol to mitigate risk. Once triggered, the system liquidates part or all of the collateral to pay back the loaned Dai, plus the accrued system stability fees and penalties. Whatever is left is returned to the user.

The governance of the Maker platform is established at the system level through on-chain voting by MKR holders. The Maker platform allows internal governance variables to be modified through a designated smart contract, also called the active proposal smart contract. Any MKR account holder can deploy smart contracts to propose changes to the governance variables. The MKR voters then vote, with their MKR tokens, to select the active proposal from all the proposals. The smart contract with the highest number of votes is designated as the active proposal smart contract and granted permission to make the system changes. You can check out the Dai stablecoin white paper for more details: https://makerdao.com/whitepaper/White%20Paper%20The%20Maker%20Protocol_%20MakerDAO’s%20MultiCollateral%20Dai%20(MCD)%20System-FINAL-%20021720.pdf.
The technical architecture of MakerDAO

The following diagram shows the high-level technical architecture and components of the Maker protocol. The components depicted in this diagram are a set of smart contract modules that facilitate the initiation, shutdown, governance, and regular operations of the protocol. Components marked as proxies indicate that a proxy interface is in place for some of the functions. On the right-hand side, we can see the Maker vault, which contains the system records that store the vaults and keep track of all collateral and loaned Dai. It comes with a set of core modules that allow most of the components in this architecture to interact with the Maker vault to actively maintain the collateral account. MakerDAO only allows one type of collateral per vault. If you want to use multiple crypto assets as collateral for loans, you have to create multiple vaults:
Figure 3.9 – MakerDAO Dai technical architecture
On the left-hand side, we can see a set of smart contracts in the governance module that facilitate MKR voting, proposal execution, and voting for the security of the Maker protocol. MakerDAO Dai is a dual-token system. One token is the ERC-20-compliant Dai, used as the stablecoin. The other is the MKR token, which is also ERC-20-compliant and is used as the governance token. The MKR module in this diagram mints, burns, and circulates MKR tokens. In addition to the standard ERC-20 interface, MKR also has DSAuth-protected mint and burn functions; DSAuth is a smart contract that defines the authorization rights for minting and burning tokens.

MakerDAO governance in the Maker protocol is executed on-chain through votes made by MKR token holders. MKR represents the voting power of its holders in votes on the risk management and business logic of the Maker protocol, including votes on the risk parameters for each collateral type. Here are the key parameters for each collateral type:

Stability fee: A fee that continuously accrues on the debt in a vault, like the interest accrued on a mortgage or HELOC. Stability fees are paid in Dai.

Debt ceiling: The maximum amount of Dai that can be generated for a given collateral type.

Liquidation ratio: The minimum ratio of collateral value to debt per vault. It is set to 150%.

In addition, the MKR token is used as a utility token to balance the surplus and debt of the protocol. For healthy vaults that stay above the liquidation ratio, Dai stability fees accrue, and the total stability fees accrued from all healthy vaults become the Dai surplus. On the other hand, if any vault falls under the liquidation ratio, the protocol accrues debt. Once the surplus-versus-debt ratio hits a certain system-defined threshold, MKR holders can use MKR to vote to enable the Flapper auction house to sell the Dai
surplus for MKR, and to use the Flopper smart contract to cover debt by auctioning MKR for Dai. In these cases, MKR tokens can be minted or burned. In the case of an emergency, MKR holders can deposit MKR into the emergency shutdown smart contract and authorize a shutdown, closing all vaults. This is intended to be invoked to mitigate malicious governance or to prevent the exploitation of a critical bug, such as one that allows collateral to be stolen.

The following are high-level descriptions of the modules in the middle of the preceding diagram:

System Stabilizer: This comprises the vow, flopper, and flapper contracts. The purpose of the System Stabilizer smart contracts is to correct the system and maintain the balance of the surplus and debt of Dai. The vow smart contract represents the overall protocol balance of surplus and debt. The flopper smart contract is used for debt auctions, covering deficits by auctioning off MKR in exchange for Dai; once the debt has been auctioned, new MKR tokens are minted. On the other hand, the flapper contract auctions off the surplus Dai in exchange for MKR; once auctioned, those MKR tokens are burned.

Peg Stability: This comprises the dssPsm contracts. The Peg Stability module was created to allow users to swap a given collateral type directly for Dai at a fixed rate, rather than borrowing Dai – for example, swapping other stablecoins for Dai. It is a special vault with a 100% collateralization ratio and zero fee accrual.

Rates: This comprises the jug and pot smart contracts. It was created for accumulating the stability fees on the loaned Dai and the interest on the DSR deposits.

Dai Module: This provides an ERC-20 smart contract for Dai tokens, as well as the DaiJoin adapters that accept the allowed types of collateral assets to be collateralized for Dai.

Oracle Module: The Maker protocol relies on external pricing information to determine the collateralization ratio, and the Oracle module was created for this purpose. It has two smart contracts, called median and Oracle Security Module (OSM). The median contract, as its name suggests, calculates the median of prices from a whitelist of price feed contracts and uses it to update the token value in the MakerDAO system. The OSM ensures that delayed prices from those whitelisted price feed contracts are used in the system.

Flash Module: The Flash module enables flash loans, which allow anyone to mint Dai up to a limit set by Maker governance without the need for collateral, as long as the loaned Dai is paid back in the same transaction. The system charges a fee for such transactions, which provides an additional income source for the protocol.

Collateral Module: This comprises the join and clip smart contracts. Joins are adapters that deposit or withdraw unlocked collateral assets into or from a Maker vault. Clip contracts are used for selling the transferred collateral for Dai in an attempt to cancel out any debt assigned to the Maker protocol.

Shutdown Module: This comprises the end smart contract, the purpose of which is to coordinate a system shutdown in the case of an emergency, a system upgrade, and so on. The Shutdown module closes down the system and reimburses Dai holders.
MakerDAO Dai protocol details

The Maker vault maintains collateralization ratios using the following formula:

Collateralization ratio = (number of tokens in collateral × token unit price) / (dollar value of Dai loaned) × 100%

Let's say you collateralize 10 Ether, each Ether is priced at 1,500 dollars, 1 Dai is 1 US dollar, and the Maker collateralization ratio is 150%, or 1.5. If you deposit 10 Ether, which is worth 15,000 US dollars, you are allowed to mint and take out up to 10,000 Dai. You can take a one-time loan of 10,000 Dai, or take out Dai multiple times, but the total loan can't amount to more than 10,000 Dai. The 10 Ether will be locked in the Maker vault until the 10,000 Dai and the associated stability fees are paid back.
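To make the arithmetic concrete, here is a small illustrative Solidity helper; this is not MakerDAO's actual code, and the contract and function names are hypothetical:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: computes the maximum Dai mintable for a given
// collateral deposit, per the formula above (1 Dai = 1 US dollar)
contract VaultMath {
    function maxDaiMintable(
        uint256 tokenAmount,         // e.g., 10 Ether
        uint256 unitPriceUsd,        // e.g., 1,500 dollars per Ether
        uint256 liquidationRatioPct  // e.g., 150 (that is, 150%)
    ) public pure returns (uint256) {
        // collateral value in dollars, divided by the liquidation ratio
        return (tokenAmount * unitPriceUsd * 100) / liquidationRatioPct;
    }
}
```

Calling maxDaiMintable(10, 1500, 150) returns 10,000, matching the example above.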
If the price of Ether increases, you may be able to borrow more Dai. But if the price of Ether drops, you may have to deposit more Ether or return some Dai to maintain the required ratio. The following variables are used to calculate the collateralization ratio:

The total amount of locked collateral

The real-time unit price of the collateralized assets, which is assessed through oracles

The system stability fee

The total amount of Dai loaned

The following diagram shows the collateralization ratio as time lapses:
Figure 3.10 – Collateralization in MakerDAO
From time 0 until around time 5, the collateralization ratio is above 1.5, which means the vault is over-collateralized at or above a ratio of 150%. During this period, the vault owner may borrow more Dai. But at around time 5, when the ratio falls below 1.5, the system liquidates the assets in the vault to cover the loaned Dai and the stability fees, and applies a penalty. Between times 5 and 6, if the sale of assets through the liquidation process can cover the total cost of the loan, the remainder is returned to the vault owner. If it can't be covered, or the vault is bankrupt at time 6 or after, all assets are confiscated and the loss is covered by the protocol.

Stablecoins, together with the Ethereum token standards, are the fundamental building blocks and primitives of the DeFi ecosystem. In the next section, we will discuss how the leading DeFi protocols work.
Understanding DeFi protocols

In this section, we will help you understand popular DeFi constructs and concepts and show you how different DeFi applications work, including decentralized lending and borrowing, exchanges, derivatives, and insurance. We will also dive deep into the leading DeFi protocols in those categories.
Basic DeFi constructs

Let's start by looking at the key DeFi building blocks and basic constructs:

Liquidity pool: Liquidity and liquidity pools play a critical role in the DeFi ecosystem. Liquidity, in the form of crypto assets, is the lifeline of any liquidity-based DeFi application. It is used to bootstrap, govern, and secure the DeFi protocol, and to facilitate the trading of crypto assets according to the protocol rules. A liquidity pool is a mechanism by which liquidity is locked into a smart contract; the process of locking liquidity is also called staking. The liquidity provider may get governance tokens – that is, LP tokens – as proof of staking.
In the MakerDAO Dai section, we showed how liquidity and liquidity pools are used in MakerDAO Dai, the popular stablecoin protocol; its governance token, MKR, is the LP token. A liquidity pool can lock one or multiple types of crypto coins. MakerDAO locks one type of crypto coin per vault. In a DEX such as Uniswap, which we will discuss in the next few sections, the pool locks a pair of crypto coins to facilitate the exchange of one coin for the other. In return for their staking, liquidity providers accrue interest in the form of governance tokens or interest-bearing tokens. This process of staking tokens for maximum returns is the most popular type of yield farming. We will discuss yield farming later in this section.

Exchange: In traditional finance, an exchange is a marketplace where financial instruments such as stocks, commodities, futures, and derivatives are traded. The exchange of crypto assets is supported through a DEX, where the holder of one type of coin can trade their holdings in exchange for another type of crypto coin. In traditional finance, such trades happen through an order book or via a market maker. An order book records the bid and ask price movements that reflect the interest of buyers and sellers in a particular financial instrument. A matching engine uses the order book to determine which orders can be fully or partially executed. A market maker is an entity that provides its own liquidity to quote both the buy and sell sides of a trade, hoping to make a profit on the bid-ask spread. In the DeFi world, on-chain order book matching is too costly, since the bids and asks, as well as the crypto price movements, have to be actively maintained on the blockchain network. The Automated Market Maker (AMM) was introduced to allow people to match and swap two types of crypto coins. We will discuss the DeFi exchange and AMM in more detail in the following subsections.
Flash loan: We briefly mentioned flash loans when we discussed the technical architecture of MakerDAO Dai. A flash loan is a common concept implemented in many DeFi protocols. It is a type of uncollateralized loan that can be lent instantly without any collateral, so long as the loan is repaid in the same blockchain transaction.

Oracle: We briefly introduced the oracle as the component of MakerDAO Dai that collects pricing information about collateralized crypto assets. The oracle is a common concept in all DeFi protocols, as well as in the broader blockchain ecosystem. It is a mechanism that a blockchain uses to integrate with external systems. For example, a sports betting DApp may need weather information to determine how the bets play out. In such cases, the DApp may have to rely on an oracle to retrieve the weather information to aid with betting. In DeFi, oracles are commonly utilized to share and integrate the price movements of different types of crypto assets, as MakerDAO Dai does.

Bridge: In addition to integrating with external systems, a blockchain sometimes needs to communicate with and gain visibility into transactions on other blockchains or L2 protocols. That is where the bridge comes into play. We mentioned bridges when we discussed L2 rollups in Chapter 2, Ethereum Architecture and Ecosystem. There, we stated that the bridge is a common design pattern for enabling token exchanges between the L1 and L2 networks. The bridge is another critical component in DeFi and blockchain and has been implemented in almost all DeFi protocols. A bridge acts as a smart contract component on both sides of a network to facilitate the communication and exchange of information between blockchains.

Tokenization: So far, we have mostly discussed native tokens and coins supported by blockchains and decentralized networks. In DeFi, tokenization is the process of designing fungible tokens or NFTs representing underlying physical or digital assets. For example, the real estate properties you own can be tokenized in the form of fungible tokens and put on a blockchain for buying or selling. If your property has any unique traits, you can tokenize them and put them on the blockchain network as NFTs. Tokenization brings new liquidity to DeFi and makes it possible to connect the real world to the crypto world.
Composability: Composability in DeFi refers to the ability to quickly integrate and aggregate different DeFi constructs and smart contracts to create new DeFi products. As we saw in Figure 3.2, smart contracts are the real magic behind all these DeFi products and constructs. Composability enables one smart contract to utilize the features and behaviors of another smart contract via inheritance or composition. So long as they implement the same smart contract interface, new smart contracts can be plugged in to replace a portion of an older protocol and form a new DeFi protocol.

As you can see, once enabled by smart contracts, all these DeFi constructs are like the Lego blocks of the DeFi ecosystem. They empower DeFi innovation and make it possible to create new and innovative DeFi products. We will show you how different DeFi protocols work in the following subsections.
Lending and borrowing

Traditional lending and borrowing are normal financial activities we engage in all the time in our daily lives. When you purchase a house, instead of paying cash in full, which not many people can afford, you normally put down 20% of the purchase price as the down payment and get a mortgage to pay the remaining 80%. In return, the lender who lent you the money earns interest for a certain period – that is, until the loan matures. The mortgage rate is determined by multiple factors, including your financial situation and credit score. Credit scores are personalized numbers that represent the creditworthiness of an individual. In the US, they normally come from agencies such as Equifax, Experian, and TransUnion.

DeFi lending and borrowing of crypto assets is very popular in the crypto world and the DeFi community. Similar concepts from traditional finance have been ported to or implemented in DeFi protocols, but with one caveat: there is no central authority for crypto credit ratings. Even if you could oraclize real-world credit scores into the DeFi world, it would be very difficult to
check someone's creditworthiness due to the lack of know-your-customer (KYC) checks, which are enforced by law in the traditional finance world.
How does DeFi lending and borrowing work?

MakerDAO Dai is a popular DeFi crypto lending platform that allows a borrower to borrow Dai tokens. As we discussed in the MakerDAO Dai section, Dai is a stablecoin whose value is pegged to the US dollar. Anyone can open a Maker vault, lock in Ether or Maker-accepted coins as collateral, and generate Dai as a loan against that collateral. The borrower can use the Dai for any other crypto transactions, or simply create a Dai savings account and earn interest. In the absence of a credit scoring entity, and without knowing its customers, the MakerDAO protocol has to resort to oversized collateral as a risk control mechanism to protect the protocol from under-collateralized vaults. This is how most DeFi lending and borrowing protocols work.

As shown in the following diagram, the protocol maintains a liquidity pool where lenders can stake their crypto assets and, in turn, get LP tokens and earn interest until they close the stake and withdraw the funds. In a DeFi lending/borrowing protocol, protocol governance is enacted to govern the normal operations of the protocol. Its role is to serve as a mechanism for making crucial decisions about protocol changes, or even changes to the protocol governance framework itself. Protocol governance can be centralized, decentralized, or hybrid. In the case of decentralized governance, protocol governance is codified as smart contracts and made transparent on the blockchain:
Figure 3.11 – DeFi lending and borrowing

Similar to the MakerDAO Dai protocol, the borrower can lock their crypto assets as collateral and obtain a crypto loan. Like MakerDAO, the DeFi lending and borrowing protocol typically requires an over-collateralized position to protect itself. Let's say you have 300 Ether in your wallet. Instead of letting the 300 Ether sit idle and earn nothing, you can lock those 300 Ether into the DeFi lending and borrowing protocol and obtain a crypto loan worth 200 Ether. In DeFi, anyone can be a lender: you can turn around, lend your assets out, and earn interest. The interest you earn can fluctuate every minute. The interest rates on the underlying crypto assets vary protocol by protocol and change over time. The complexity and abundance of choices in earning interest has made yield aggregation possible, which is the algorithmic process of maximizing the yield for a given crypto position. We will discuss yield aggregators in more detail in an upcoming subsection.

As with MakerDAO Dai, liquidation in DeFi lending and borrowing is the process of liquidating the crypto assets locked as collateral and selling them at a discount to close the loan. Crypto prices are very volatile. The collateralization ratio – that is, the collateral value over the loan amount – fluctuates all the time and indicates the health of the borrower's position. The DeFi protocol maintains a collateralization ratio threshold – that is, the
minimum collateralization ratio at which the borrower can continue to hold their crypto loan position without liquidation. This threshold is normally defined by protocol governance and varies protocol by protocol. When any account's collateralization ratio falls below this threshold, the protocol enacts the liquidation process and auctions the collateral in the market to recover the potential loss to the protocol. The oracle is the component that feeds the protocol with near real-time pricing information for all crypto assets. Different protocols may determine crypto prices differently, depending on the protocol's design. Aside from MakerDAO Dai, Aave and Compound are other popular DeFi lending and borrowing platforms. We will discuss the Aave protocol in the following subsection, and highlight the key differences between Aave and Compound.
Diving deep into Aave

Since its launch in 2020, Aave has become one of the most popular decentralized lending and borrowing protocols in the DeFi ecosystem. It is an open source smart contract platform built using Solidity, a popular smart contract language. As a non-custodial liquidity protocol, it allows lenders to deposit crypto assets into the protocol and earn interest without transferring ownership of the tokens to a custodian. Borrowers, in turn, pay interest on their borrowed assets.

As shown in the following diagram, the lender can deposit supported tokens as liquidity into the Aave lending pool. In return, they receive an equivalent amount of interest-bearing aTokens. For example, if you deposit 300 Ether into the Aave lending pool, Aave will lock the 300 Ether into the lending pool and mint 300 aETH. In the same way, if you deposit 30,000 Dai into the Aave lending pool, the Aave protocol will lock the 30,000 Dai into the pool and mint 30,000 aDai for you. The interest rates on aTokens are determined by the protocol; in Aave, the interest rates are algorithmically adjusted based on supply and demand, and interest accrues in aTokens too. When the lender exits the system, they simply return the aTokens, covering the original stake and all accrued interest, and the protocol exchanges the target tokens back to them:
Figure 3.12 – High-level technical view of the Aave protocol

The borrower has the option of a stable rate loan or a variable rate loan against their collateralized crypto assets. In Aave, if you choose a stable rate loan, the protocol will mint the corresponding amount of sTokens and issue them to you; similarly, if you want a variable rate loan, you will get vTokens back. Stable rate loans and variable rate loans are like fixed-rate mortgages and adjustable-rate mortgages (ARMs) in traditional finance. In Aave, stable rates act as fixed rates in the short term but can be readjusted in the long term in response to supply and demand in the market, while variable rates are continuously adjusted based on current market conditions.

As we introduced earlier, a flash loan is a special loan transaction that doesn't require collateral for borrowing, so long as the loan, together with the fee, is repaid within the same transaction. Flash loans are not a standalone product in the Aave protocol; instead, the protocol gives developers two operations for building smart contracts that execute flash loans on Aave: flashLoan and flashLoanSimple. The former allows the borrower to access liquidity from multiple reserves, while the latter only allows the borrower to access a single reserve for flash loan transactions.
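To illustrate the pattern, here is a minimal sketch of a flashLoanSimple receiver, based on the Aave v3 interfaces. The addresses and the business logic are placeholders, and you should consult the Aave documentation before building on this pattern:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Assumed shapes of the Aave v3 pool and ERC-20 interfaces used here
interface IPool {
    function flashLoanSimple(
        address receiverAddress,
        address asset,
        uint256 amount,
        bytes calldata params,
        uint16 referralCode
    ) external;
}

interface IERC20 {
    function approve(address spender, uint256 amount) external returns (bool);
}

contract FlashLoanExample {
    IPool public immutable pool;

    constructor(address poolAddress) {
        pool = IPool(poolAddress);
    }

    // Kick off a flash loan of `amount` of `asset`
    function requestLoan(address asset, uint256 amount) external {
        pool.flashLoanSimple(address(this), asset, amount, "", 0);
    }

    // Called back by the pool within the same transaction; the loan
    // plus the premium must be repayable when this function returns
    function executeOperation(
        address asset,
        uint256 amount,
        uint256 premium,
        address, /* initiator */
        bytes calldata /* params */
    ) external returns (bool) {
        // ... use the borrowed funds here (e.g., arbitrage or liquidation) ...

        // Approve the pool to pull back the principal plus the premium
        IERC20(asset).approve(address(pool), amount + premium);
        return true;
    }
}
```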
Portal was added in Aave v3 to support easy integration with other L1 and L2 networks. The role of Portal is to facilitate the circulation of liquidity among Aave v3 markets across various blockchain networks. In the latest version of the protocol, Aave v3, you can use bi-directional, governance-approved bridges to move assets from one network to another. Typically, that involves burning aTokens on the source network while instantly minting them on the destination network. The underlying assets can then be supplied to the Aave lending pool after they have been moved through a bridge. Not everyone can perform this asset movement over a bridge: a new system role, BRIDGE, was introduced in v3, and only an address with this role has permission to move the supplied liquidity through the bridge.

To access prices from external systems or other networks, Aave v3 leverages Chainlink, a decentralized oracle network, as an oracle proxy to supply the price data required by the Aave protocol. To secure the protocol, Aave v3 has a sophisticated governance model that incentivizes AAVE holders to lock AAVE tokens into a Safety Module (SM) smart contract. Once locked, these tokens are used by the protocol as a mitigation tool to cover deficits should a shortfall event occur. If you're interested, you can learn more at https://docs.aave.com/aavenomics/governance#protocol-governance. To conclude this section, let's take a look at how Compound works.
A brief overview of Compound

Similar to Aave, Compound is another popular algorithmic and autonomous DeFi lending and borrowing platform. Compound offers its governance token, COMP, which represents voting rights over protocol and governance decisions. This includes decisions to incorporate new assets, protocol upgrades, or technical upgrades on the platform.
Similarly, it converts deposited crypto assets into Compound's native tokens, cTokens, which are then used to track positions (supplied assets) in Compound. If you deposit 300 Ether into Compound, it issues the corresponding amount of cETH as the claim on your assets in the Compound liquidity pool. Likewise, if you deposit 30,000 Dai, you will be issued the corresponding amount of cDai. Similar to Aave, different cTokens earn different interest rates, and those rates are determined by the Compound protocol.

Just like any other DeFi protocol, a liquidation mechanism is designed to protect the lenders and the protocol from further detriment to the system. In Compound v3, the system defines liquidation collateral factors and borrow collateral factors. Borrow collateral factors are used to determine the initial borrowing capacity, while liquidation collateral factors are used to trigger the liquidation process. Liquidation collateral factors are always higher than the borrow collateral factors, leaving a price buffer to prevent unnecessary liquidation. Liquidation is triggered when any account's borrowing exceeds the limits set by the liquidation collateral factors. When that happens, the protocol absorbs the debt of those underwater accounts: it takes over the collateral assets from those accounts and uses the protocol's cash reserves to repay the debt. The protocol sets a target amount of reserves of the base token to protect itself from the risk of insolvency. If the protocol holds less than the target reserves, it allows liquidators to buy the collateral at a discount. If you're interested in learning more, you can find more details about the Compound protocol at https://docs.compound.finance/liquidation/.
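To visualize the liquidation check just described, here is an illustrative-only sketch; it is a simplified rendering of the logic above, not Compound's actual code:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: an account is liquidatable when its borrow value
// exceeds the sum of each collateral's value scaled by its
// liquidation collateral factor
contract LiquidationCheck {
    function isLiquidatable(
        uint256[] memory collateralValuesUsd,
        uint256[] memory liquidationFactorsPct, // e.g., 80 means 80%
        uint256 borrowValueUsd
    ) public pure returns (bool) {
        uint256 limit = 0;
        for (uint256 i = 0; i < collateralValuesUsd.length; i++) {
            limit += (collateralValuesUsd[i] * liquidationFactorsPct[i]) / 100;
        }
        return borrowValueUsd > limit;
    }
}
```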
Decentralized exchanges

Earlier, we introduced the concept of an exchange, where you can swap and trade one financial instrument for another. Crypto assets can be traded on centralized exchanges, such as Coinbase and Binance, or on a DEX. On a centralized exchange, trades are typically normalized to fiat currency.
Let's say you want to exchange your BTC for ETH – you would sell your BTC in exchange for a fiat currency, such as the US dollar, and then use the fiat currency to buy ETH. A DEX is a decentralized marketplace for trading crypto coins, tokens, and crypto products in a non-custodial manner. Typically, transactions are made directly between two traders without any intermediary and without converting to fiat currency. The leading DEX protocols include Uniswap and Curve.

In the next section, we will discuss how a DEX works and cover the mechanics of the AMM. Then, we will dive deep into Uniswap, the leading DEX protocol. We will conclude this section with a brief overview of the DEX aggregator, a new type of DeFi service that has become more popular lately. By the end of this section, you will have a good understanding of DEXs and how the leading protocols implement them.
How does DEX work?
In traditional finance or a centralized exchange, the exchange relies on the order book to match orders and facilitate the trade of one financial instrument for another. As we discussed earlier, the order book approach requires keeping track of the bids and asks of all orders and matching your bid with someone’s ask. This approach becomes costly in the DeFi world. Another common approach in traditional finance is to use a market maker, who mediates the trades between traders. This becomes impractical in the DeFi ecosystem since, in a decentralized network, there is no centralized intermediary to mediate the trades. This gives rise to the AMM concept in DeFi. The following diagram shows the key concepts and mechanisms of a DEX protocol. Like any other DeFi product, a DEX maintains a large pool of liquidity in the liquidity pool. Any liquidity providers who want to generate yields from their crypto assets can stake them into the liquidity pool. The DEX leverages the liquidity pool to make the market and match the orders according to deterministic algorithms. Similar to all other DeFi
products, protocol governance is done through a decentralized governance process. An Oracle can be leveraged to bring external pricing data into the DEX protocol. Some protocols use a decentralized Oracle network, such as Chainlink, to bring the pricing feed on-chain. Uniswap, instead of relying on a third-party Oracle feed, chose to build an Oracle from historical observations on-chain. We will discuss this when we unpack the Uniswap protocol in the subsequent sections:
Figure 3.13 – DEX
AMM is the cornerstone of many popular DEX protocols on the market. It was initially proposed as an idea by Vitalik, the co-founder of Ethereum, in a Reddit post, and was later popularized following the successful implementation of the constant product formula in Uniswap. AMM allows crypto assets to be traded in a permissionless and automatic way, at a price determined by the supply and demand of the liquidity pools within the protocol. In the next subsection, we’ll cover the constant product AMM strategy and help you understand how the protocol determines the exchange rate between two types of crypto assets.
Understanding the mechanics of AMM
AMM is a novel market-making concept that originated from Ethereum. Typically, it comprises a set of smart contracts designed to determine the pricing of the underlying crypto assets based on the quantities of cryptocurrencies in the liquidity pool. The most popular approach is the constant product algorithm. It is based on the hypothesis that, when you exchange one token for another in a large pool, the product of the two quantities of the respective tokens stays constant. Vitalik initially theorized this as X * Y = K (a constant). As seen in the following screenshot, we have a pool with a constant product of 1,000, which initially holds 20 X coins and 50 Y coins. If a trader wants to buy 40 Y, the pool will have 10 Y left after the trade. Using the constant product formula, after the trade, the pool should hold 100 X coins to keep the product constant at 1,000. This means the trader has to put 80 X coins into the pool in exchange for the 40 Y coins being taken out of the pool:
Figure 3.14 – Constant product AMM strategy
Let’s use a real-world crypto exchange example to help you understand how AMM helps with market making. In current market conditions (October 2022), 1 ETH is priced at around 1,300 US dollars. Let’s say we set up a balanced ETH-DAI liquidity pool with an initial staked fund of 1.3 million dollars in ETH (or 1,000 Ether) and 1.3 million dollars in DAI (or 1.3 million Dais). The constant product of Ether and Dais in the pool is exactly 1.3 billion.
If a trader opens a swap position to exchange 10 Ether in their wallet for some Dais, how will the AMM smart contract determine how many Dais the trader would receive? To exchange 10 Ether for Dais, they must add 10 Ether to the liquidity pool, which would contain 1,010 Ether if the trade goes through. According to the constant product formula, the liquidity pool should then hold the following Dais: 1,300,000,000 / 1,010 ≈ 1,287,129 Dais. Deducting this from the total number of Dais before the trade, the trade would exchange 10 Ether for 12,871 Dais. AMM is simple to implement. The downside is that it may create price slippage for the underlying assets, where the realized price of the exchange differs from the expected price. Another issue is that it may incur an impermanent loss, where the liquidity provider may see their staked assets priced lower than the market due to price fluctuations in the AMM market.
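The same arithmetic can be expressed as a small Python sketch of the constant product rule; this is a simplified illustration that ignores trading fees:

```python
def swap_out(x_reserve: float, y_reserve: float, dx: float) -> float:
    """Amount of Y paid out for depositing dx of X, keeping x * y constant."""
    k = x_reserve * y_reserve
    new_y = k / (x_reserve + dx)
    return y_reserve - new_y

eth_reserve, dai_reserve = 1_000.0, 1_300_000.0   # the pool from the example
dai_out = swap_out(eth_reserve, dai_reserve, 10)
print(round(dai_out))            # 12871 DAI for 10 ETH
print(round(dai_out / 10, 2))    # ~1287.13 DAI per ETH vs the 1,300 spot price
```

The effective price of roughly 1,287 Dais per Ether, against the 1,300 spot price, is exactly the slippage effect described above.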
Unpacking the Uniswap protocol
Ever since Uniswap was launched and deployed on the Ethereum blockchain in November 2018, it has become one of the most popular decentralized exchange platforms. As we mentioned earlier, the initial idea of the constant product AMM strategy came from Vitalik, but the Uniswap team turned the AMM hypothesis into a reality. The following screenshot illustrates the earlier implementation of Uniswap. In the Uniswap v1 implementation, liquidity providers deposit pairs of crypto assets into the Uniswap liquidity pool, where the protocol builds all possible pairs of pools the platform allows. When traders want to exchange a number of A coins for B coins, they simply ask the protocol to swap the two coins at the price determined by the AMM, a smart contract that implements the constant product algorithm:
Figure 3.15 – Uniswap v1 protocol
As we explained in the previous section, the new price will ensure the pair’s liquidity pool stays compliant with the constant product principle once the swap has been made. The trader pays a fee for the swap, which is accrued and subsequently redistributed to the liquidity providers for their stake in the liquidity pool. This passive way of incentivizing the liquidity providers is also called yield farming, which we will discuss at the end of this section. Since its initial implementation, Uniswap has gone through multiple upgrades. v2 and v3 of the Uniswap protocol added many new capabilities to improve the price stability of the system, as well as capital efficiency for liquidity providers. The following diagram illustrates the Uniswap v3 architecture:
Figure 3.16 – Uniswap v2/v3 protocol
Although the basic mechanics didn’t change, a lot of new capabilities were added to the protocol. Here are some notable features:
Liquidity pool: In Uniswap v1 and v2, only a single pool can be initialized per pair, and each pool has a standard fee rate of 0.30%. This is simple to implement but also limits the capital efficiency of the liquidity providers. A standard rate of 0.30% may be too high for stable assets such as stablecoins but too little for volatile assets. To address this capital efficiency issue, v3 allows multiple pools to be created per pair, and each pool can have a differentiated fee rate.
Concentrated liquidity: Introduced in Uniswap v3, concentrated liquidity allows the liquidity provider to specify a price range for the staked liquidity in the liquidity pool. The exchange can only be executed against that liquidity if the price falls within the specified range. This makes sense since most crypto tokens trade within a certain range. In earlier versions of Uniswap, the price range is from zero to infinity due to the constant product formula, and all staked positions earn yield uniformly, proportional to the ownership percentage of the liquidity pool. In v3,
with concentrated liquidity, the liquidity provider can open liquidity positions at different price points and with different liquidity amounts. To support such customized staking positions, behind the scenes, Uniswap v3 leverages NFTs to manage the concentrated liquidity positions; swap fees are continuously collected and accrued.
Flexible fee: Instead of a fixed fee collected from every trade, v3 introduced a flexible fee structure. Since v3 allows multiple pools per asset pair, the fee tier for each pool can be set during the initialization of the liquidity pool. The fee tiers include 0.05%, 0.30%, and 1%.
Oracle: In its initial version, Uniswap didn’t rely on an external price Oracle or historical prices of the liquidity pool to determine the price. Instead, it purely followed the constant product formula. v2 introduced the Time Weighted Average Price (TWAP) Oracle for building historical price observations on-chain. Historical data is stored as an array of observations. At each block, each pool tracks only the current observation; as the blocks progress, older observations are overwritten. It is the user’s or aggregator’s responsibility to aggregate the observations of previous blocks. In v3, the pool accumulates such observations for roughly the last 9 days. Any protocol that wants to quote the price before making a trade can leverage the Oracle to get the historical observations (see the TWAP sketch after this list).
Flash swap: Flash swaps were first implemented in v2. They allow a trade to go through without enforcing that enough input tokens have been received upfront. Without a flash swap, the transaction would be rolled back on the Ethereum network if the input tokens don’t cover the underlying assets as well as the fee. With a flash swap, such transactions can be executed, and the remaining gap can be paid back as part of a callback function.
As with all DeFi protocols, Uniswap has a unique governance model that protects the platform. If you’re interested in learning more, check out https://docs.uniswap.org/protocol/introduction.
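To illustrate the TWAP idea, here is a simplified Python sketch; it uses cumulative prices over hypothetical timestamps, whereas Uniswap v3 actually accumulates tick values, so treat this as a conceptual model rather than the protocol’s implementation:

```python
# Each observation stores (timestamp, cumulative_price); the time-weighted
# average price between two observations is the difference of the cumulative
# values divided by the elapsed time.
observations = [
    (1_000, 0.0),
    (1_060, 78_000.0),    # 60 s at an average price of 1,300
    (1_120, 155_400.0),   # next 60 s at an average price of 1,290
]

def twap(start, end):
    (t0, c0), (t1, c1) = start, end
    return (c1 - c0) / (t1 - t0)

print(twap(observations[0], observations[2]))  # 1295.0 over the whole window
```

Because the TWAP averages over a window, a single manipulated block moves it far less than it moves the spot price, which is why protocols quote against it.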
A glance at DEX aggregators
Before we wrap up our discussion on DEX, let’s briefly go over various exchanges on the market and understand the role of a DEX aggregator. At this point, you should have a good understanding of decentralized exchanges and how Uniswap, the leading DEX protocol, works. In addition to Uniswap, there is quite a long list of DEXs on the market. We encourage you to check out listing sites such as CoinMarketCap (https://coinmarketcap.com/rankings/exchanges/dex/) for the top DEX protocols. With the abundance of DEX protocols, traders have many options for exchanging and swapping their crypto assets. A DEX relies on liquidity pools to maintain pricing stability, and some DEXs may not have enough liquidity to be stable; exchange rates therefore vary from DEX to DEX. Generating the maximum return for liquidity providers, or getting the best exchange rates for your assets, becomes a challenge. That is what DEX aggregators try to address. 1inch initiated the concept and became the leading DEX aggregator. Typically, a DEX aggregator is built on top of existing DEX protocols, leveraging the composability of smart contracts. It provides a single entry point or dashboard for traders to view all available DEX markets; price Oracles, such as the Uniswap v3 TWAP Oracle, advise the traders on the best trading strategy to handle slippage or impermanent loss and maximize their yields.
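At its core, an aggregator’s routing decision can be sketched in a few lines of Python; the venues and quotes below are hypothetical numbers, not live market data:

```python
# Quotes for the same swap (10 ETH -> DAI) from several hypothetical venues
quotes = {
    "Uniswap": 12_871.3,
    "Curve": 12_905.8,
    "SushiSwap": 12_844.1,
}

best_venue = max(quotes, key=quotes.get)  # route to the venue paying the most
print(f"Route 10 ETH -> DAI via {best_venue}: {quotes[best_venue]} DAI")
```

Real aggregators go further, splitting a single order across several pools so that no single pool absorbs enough of the trade to cause heavy slippage.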
Decentralized derivatives and insurance
Similar to traditional finance, derivatives are popular crypto financial products and services in the DeFi ecosystem too. Over the last few years, many DeFi derivative products, including futures, options, and swaps, have been made available on decentralized exchanges, and leading DeFi derivative protocols, including Synthetix, UMA, dYdX, and others, have emerged. At the same time, decentralized insurance products, especially those that protect DeFi investments and smart contracts, have been gaining traction. In the following subsection, we will provide a brief overview of decentralized
derivatives and insurance in DeFi and go over some of the leading DeFi derivative and insurance protocols.
Overview of derivatives and insurance in DeFi
In financial terms, derivatives are financial instruments that derive their value from the future price movement of underlying investment assets. The same applies to crypto derivatives. A decentralized derivative is a contract between two parties that generates profits from the performance of underlying crypto assets. When the contract’s conditions are met, the contract is executed through smart contracts on the blockchain. Popular DeFi derivatives include futures, options, and perpetuals. Let’s take a brief look at them:
Futures: Similar to futures in traditional finance, a crypto futures contract is an agreement that obliges two parties to buy and sell certain crypto assets at a predefined price and date. For example, if you think that ETH will increase in price following the merge, you may want to open a long position on Ether by buying an Ether futures contract with a monthly expiry date. If you feel the merge might introduce uncertainty and the price may drop, you may want to open a short position on an Ether futures contract.
Options: In the same way, crypto options are available through DeFi options contracts. They are quite similar to crypto futures contracts; however, options contracts don’t mandate buying or selling the underlying crypto asset – they simply give the contract holder the option to execute the buy or sell order. Options are normally a way to hedge against the price movement of underlying crypto assets in case the price moves against you.
Perpetuals: A perpetual contract is a crypto futures contract that never expires. Similar to crypto futures, a perpetual contract allows the contract holder to buy or sell the underlying crypto asset at a pre-agreed price at any time in the future.
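The payoff asymmetry between these contract types can be sketched in Python; the prices and sizes are hypothetical, and real protocols add funding rates, fees, and margin requirements on top:

```python
def futures_pnl(entry: float, settle: float, size: float) -> float:
    """Long futures: the holder is obliged to buy at the entry price."""
    return (settle - entry) * size

def call_option_pnl(strike: float, settle: float, premium: float, size: float) -> float:
    """Call option: exercised only when doing so is profitable."""
    return max(settle - strike, 0.0) * size - premium

print(futures_pnl(1_300, 1_450, 2.0))            # 300.0 -- the price rose
print(futures_pnl(1_300, 1_150, 2.0))            # -300.0 -- the obligation cuts both ways
print(call_option_pnl(1_300, 1_150, 50.0, 2.0))  # -50.0 -- loss capped at the premium
```

A perpetual would behave like the futures payoff with no expiry date, with periodic funding payments keeping its price anchored to the spot market.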
Insurance is a written contract in which the insurer indemnifies the policyholder against financial losses from certain risks or perils. Insurance is all about risk and uncertainty: risk is the probability that something bad could happen, causing financial losses when it does. Traditionally, this involves a central entity, the insurer, who prices the risks and offers insurance products for anyone to buy to get protection against those risks. In return, the insurer collects premiums from the policyholders, and when something bad happens, they pay the insured in the form of a claim. In DeFi, a similar risk and insurance concept has been implemented in decentralized insurance protocols that allow peer-to-peer provisioning of insurance protection without a central entity. Leveraging blockchain and decentralized networks to replace traditional insurance is still relatively new, but insurance and protection from DeFi protocol risks have taken shape thanks to some of the leading decentralized insurance protocols on the market, such as Nexus Mutual. In the next few subsections, we will introduce you to the leading protocols dYdX and Nexus Mutual.
Understanding dYdX
In mathematical terms, dy/dx represents the first-order derivative of a function, y = f(x), as shown here:
dy/dx = lim_{h→0} [f(x + h) − f(x)] / h
As its name suggests, dYdX is the DeFi protocol that pioneered decentralized derivatives, including perpetuals, margin and spot trading, and lending and borrowing. It was originally built on the Ethereum L1 network, but in 2020, it moved to StarkNet, the L2 ZK rollup platform on Ethereum. All dYdX trades are settled in StarkNet, which, in turn, periodically publishes ZK proofs to Ethereum to ensure transaction security. Funds must be deposited into the Ethereum smart contract before they can be used to trade on dYdX. We mentioned StarkWare and its implementation of ZK rollups, StarkNet, in Chapter 2, Ethereum Architecture and Ecosystem.
As shown in the following screenshot, dYdX is a decentralized exchange for futures, options, and perpetuals. The key difference between dYdX and other DEXs is its support for margin and leverage. In dYdX, margin allows the trader to buy derivatives with borrowed money in the form of crypto assets, while leverage measures the size of the position relative to the collateral that protects the protocol when prices fall:
Figure 3.17 – dYdX protocol overview
The trader needs to send crypto assets as collateral to open a margin account. Internally, collateral is held as USDC, and the quote asset for all perpetual markets is USDC. The trader can open multiple positions within the same account, but all positions share the same collateral. The dYdX protocol defines three types of risk parameters per crypto asset type: the initial margin fraction, the maintenance margin fraction, and the incremental initial margin fraction. Maximum leverage is determined from these risk parameters. The protocol then uses these parameters to determine the value that must be held by an account when opening a position, increasing positions, or avoiding liquidation of the account and positions. Another key difference is that dYdX uses an off-chain order book to maintain the open positions and leverages an off-chain matching
mechanism to match the orders, instead of using an on-chain AMM for order matching. All transactions are then posted on-chain for security and settlement once the orders are matched. Market makers can use dYdX too, just like market makers in traditional finance, performing arbitrage trading when opportunities present themselves. Liquidation may be triggered during extreme market conditions, where the high volatility of the underlying assets may cause the account value to drop below zero before it can be liquidated. The protocol defines an insurance fund to protect the system from insolvency. When account liquidation is triggered, the insurance fund absorbs the loss. If the insurance fund is exhausted, the protocol will use profitable positions to offset underwater accounts to maintain the stability of the entire system. dYdX plans to move the order book and matching on-chain in the future. Instead of staying on Ethereum, they plan to build out an L1 blockchain on Cosmos and leverage the validators in the decentralized network to manage the order book and match the orders. They foresee that this approach will enable the protocol to handle tens to thousands of transactions per second while taking advantage of an order book and staying decentralized.
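A simplified sketch of how such margin fractions could gate an account follows; the parameter values are assumed for illustration and are not dYdX’s actual risk parameters:

```python
INITIAL_MARGIN_FRACTION = 0.10      # assumed: at most 10x leverage to open
MAINTENANCE_MARGIN_FRACTION = 0.05  # assumed: below this, liquidation triggers

def can_open(position_value: float, collateral: float) -> bool:
    """An account may open a position only if it meets the initial margin."""
    return collateral >= position_value * INITIAL_MARGIN_FRACTION

def should_liquidate(position_value: float, collateral: float) -> bool:
    """Liquidation triggers once collateral falls below the maintenance margin."""
    return collateral < position_value * MAINTENANCE_MARGIN_FRACTION

collateral_usdc = 1_000.0
print(can_open(9_000.0, collateral_usdc))           # True: within 10x leverage
print(can_open(12_000.0, collateral_usdc))          # False: exceeds initial margin
print(should_liquidate(25_000.0, collateral_usdc))  # True: below maintenance margin
```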
Understanding the risks in the context of insurance
Although jurisdictions and local rules may differ, generally, insurance deals with the risks of unexpected events. The basic formula of insurance is that the funds collected from the premiums paid by the insured must cover the cost of any incurred losses. The following screenshot shows a high-level conceptual business model for insurance:
Figure 3.18 – Conceptual insurance business model
As we mentioned earlier in the Overview of derivatives and insurance in DeFi section, insurance is all about risks and the probability of something bad happening. Risk and insurance always work in tandem: without risk, there is no need for insurance. The probability is driven by risk factors and therefore can differ from person to person. The insurer needs to price the risks and come up with a premium so that, when they sell the policy to the buyer, they collect enough premiums to fund future claim payments to those who suffer losses. The higher the risk, the stronger the likelihood that something bad may happen, and therefore the higher the premium. The lower the risk, the lower the premium; otherwise, no one would buy the protection. In both cases, premiums paid by all policyholders are pooled into the insurance reserve, and any future claims are paid out of the reserve.
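The basic premium arithmetic can be sketched as follows; the probabilities, coverage amounts, and loading factor are made-up illustrative values:

```python
def premium(loss_probability: float, covered_amount: float, loading: float = 0.2) -> float:
    """Expected loss plus a loading for costs and a reserve margin."""
    return loss_probability * covered_amount * (1 + loading)

policies = [
    (0.01, 100_000.0),  # low-risk policy -> low premium
    (0.05, 100_000.0),  # higher risk -> higher premium
]

reserve = sum(premium(p, amount) for p, amount in policies)
expected_claims = sum(p * amount for p, amount in policies)
print(reserve, expected_claims)  # 7200.0 vs 6000.0 -- the reserve covers expected losses
```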
Introducing the Nexus Mutual protocol
Nexus Mutual is a decentralized insurance platform built on the Ethereum blockchain to protect smart contracts in DeFi and DApps. It protects smart contracts from bugs, hacks, and unforeseeable events. It is a membership-based, risk-sharing insurance model. Anyone who wants protection for their smart contracts can become a member, which allows them to buy insurance coverage and participate in the protocol’s governance. When an adversarial event occurs, any financial losses can be covered by the protocol’s insurance reserve. The following diagram shows a high-level conceptual view of the Nexus Mutual protocol:
Figure 3.19 – Decentralized insurance Nexus Mutual protocol
Nexus Mutual leverages tokens and incentives to govern insurance operations and members’ behaviors. At its core is the membership token, NXM, the native ERC-20 token on the Nexus Mutual platform. Anyone can purchase NXMs and become a member; they can then use these NXMs to purchase insurance coverage. Once insurance coverage has been established, 90% of those NXM tokens are burned; the other 10% is locked for the duration of the coverage, plus the system-defined claim filing period, which is currently set to 35 days. In addition to being able to purchase smart contract coverage,
NXMs enable members to take part in protocol governance and play a role in risk assessment and claims assessment. Nexus Mutual introduced the concept of decentralized risk assessment, where members can stake their NXMs and partake in the risk assessment. The members who stake for risk assessment become risk assessors, and they earn risk assessment rewards as NXMs from insurance coverage purchases. If early claims indicate a lack of judgment in the risk assessment, those stakes may be lost. Similarly, claims are assessed through voting by the claim assessors. A claim assessor needs to stake NXMs to take part in the claims assessment. The stakes are deposited for a certain period and are returned, provided the claims are assessed and voted on honestly. The members who stake for a claims assessment become the claim assessors. When a claim arises, they assess the claim and vote on approving or denying it. Voting within the consensus outcome, which is set at a 70% majority, entitles the claim assessors to earn additional NXMs as a processing fee. Voting against the consensus outcome causes the assessor’s stake to be locked for a long period. If there is no consensus among claim assessors, the claim is escalated to all members for voting.
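A toy tally of this stake-weighted voting might look as follows; this is a conceptual sketch, not Nexus Mutual’s actual contract logic:

```python
CONSENSUS_THRESHOLD = 0.70  # the 70% majority described above

def assess_claim(votes):
    """votes: list of (staked NXM, True=approve / False=deny) per assessor."""
    total = sum(stake for stake, _ in votes)
    approve = sum(stake for stake, vote in votes if vote)
    if approve / total >= CONSENSUS_THRESHOLD:
        return "approved"
    if (total - approve) / total >= CONSENSUS_THRESHOLD:
        return "denied"
    return "escalated to member vote"  # no consensus among claim assessors

print(assess_claim([(100, True), (50, True), (40, False)]))  # approved (~79%)
print(assess_claim([(100, True), (90, False)]))              # escalated
```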
Yield farming and DeFi continuum
As you can see, in all the DeFi protocols, one critical enabler is the liquidity pool, a collection of tokens or digital assets locked in a smart contract. These pools bootstrap the DeFi protocol and its ecosystem and provide liquidity for decentralized lending and borrowing, exchanges, derivatives, insurance, and other DeFi protocols. They facilitate trading, payments, and other peer-to-peer crypto transactions on the blockchain network. As we discussed when we talked about Uniswap, a paired liquidity pool allows crypto assets to be exchanged in a decentralized manner via AMM. Without AMM, protocol designers have to rely on the order book to match orders, which has proven to be much harder and more expensive to implement.
Liquidity providers (LPs) provide liquidity to the liquidity pool. In return, they are incentivized, typically through interest, LP tokens, or governance tokens. In addition to financial rewards, this may enable the liquidity providers to partake in protocol governance and other protocol functions. The process of staking or locking liquidity into the liquidity pools and earning rewards is called yield farming, also known as liquidity mining. To liquidity providers, this is a form of passive income from crypto assets. Most of the DeFi protocols we’ve discussed in this section allow you to earn rewards by participating in the liquidity pool. These rewards are typically annualized and expressed as an annual percentage yield (APY). APYs can differ vastly among protocols. This gives rise to a new category of DeFi protocol, called the yield aggregator, which you first saw in Figure 3.2. In concept, a yield aggregator works as a set of smart contracts that pool crypto assets from investors and invest them in various yield-producing products or services with the intent to pursue optimal yields. One such example is yearn.finance. It aggregates yields from leading DeFi lending and borrowing protocols, such as Aave, Compound, and others, and helps liquidity providers find the best returns. The liquidity providers, also called yield farmers, deposit their crypto assets into a yearn vault, called a yVault. The funds are then converted into yVault tokens, or yTokens. For example, if you deposit 10 ETH, they will be converted into yETH. If you deposit 10,000 Dais, they will be converted into yDais. All yTokens are ERC-20 tokens; they act as a deposit receipt and represent the farmer’s share of the yVault they are participating in. A vault is a set of smart contracts that implement automated yield generation algorithms for different crypto assets, each driven by one or more yield strategies, defined as strategy contracts. A yVault may have many strategies active at the same time. It may change its strategies, rebalance capital allocations, or automatically shift capital as opportunities arise. A key function of a strategy, harvest, triggers the rebalancing process to realize the profits and reinvest them back into the strategy. Anyone can build a strategy and add it to the yVault. To add a new one, the strategist, who proposes the strategy, has to
go through a rigorous strategy vetting process, including concept vetting, code review, security review, and mainnet testing. DeFi has transformed traditional financial products and services and emerged as the future of the finance sector. However, it is not without risks. Like any software, there are inherent risks in implementing DeFi software and its underlying blockchain infrastructure. Scalability challenges in the L1 blockchain may deter DeFi adoption. Vulnerabilities in smart contract implementations have caused millions of dollars of losses on leading DeFi platforms. Many protocols rely on Oracles to price crypto assets, which could pose systemic risks if the prices are compromised. Governments around the world are closely watching DeFi protocols and decentralized financial instruments on the crypto markets, and new regulations may come out at any time. Regulatory risks may lead to additional uncertainty and volatility in the DeFi ecosystem, and government regulation of crypto markets may determine the future and rules of engagement in the DeFi ecosystem for years to come. So far, you have learned how most of the leading DeFi protocols work. In the next section, we will discuss the economic perspective of the blockchain network and DeFi protocols and help you understand cryptoeconomics and tokenomics in the crypto world.
Making sense of cryptoeconomics
The Ethereum community started to use the term cryptoeconomics in 2014. In most people’s views, blockchain and cryptocurrency enable a new autonomous, self-sustainable, and self-sufficient decentralized digital world. In this decentralized world, the blockchain network exhibits all the basic elements of an economic system, which in broad terms is what people referred to as cryptoeconomics in the early days. In a talk in 2020, Vitalik credited Satoshi with the creation of cryptoeconomics. In his view, Satoshi didn’t solve the consensus problem, as most believe. The consensus problem had already been solved long before
Satoshi invented blockchain. Cryptography, as a technology, had been applied everywhere to secure transactions, systems, and communications between parties. What Satoshi really invented was the incentive mechanism. When Satoshi developed Bitcoin, the first decentralized digital currency platform, he implemented a PoW protocol to secure the Bitcoin blockchain network through cryptography and fuel the system with an incentive structure. The actor, the miner in this case, gets rewarded with Bitcoin for the computational effort of creating new blocks and keeping the chain alive. Similarly, in a PoS system, to which Ethereum has just moved, the actors, or validators, stake their crypto assets to ensure the security of the network. In return, they get rewarded with transaction fees. In both cases, honest actors get rewarded, and dishonest behaviors are penalized. The fork choice rules built into the protocol ensure the continuation of blocks and self-correction when conflicting blocks appear. The economic incentives built into the protocol encourage all network actors to follow good economic practices while maintaining the network’s autonomy and sustainability. Together with cryptography and game theory, the incentive structure, designed to boost and power the creation and integrity of a particular cryptocurrency, can create an adaptive, fault-tolerant, live ecosystem. That is what most believe cryptoeconomics to be these days. While cryptoeconomics focuses on the incentive design of the blockchain ecosystem and the protocol as a whole, token economics, also known as tokenomics, deals with the economic and financial behaviors of crypto tokens at the application layer, similar to how they are handled in the DeFi protocols we discussed earlier in the Understanding DeFi protocols section. The goal is to ensure that a crypto token is used as intended. Tokenomics relies on cryptoeconomics to ensure secure transactions on the protocol and leverages the underlying cryptocurrency to provide liquidity and bootstrap the ecosystem. In layman’s terms, cryptoeconomics is concerned with monetary policies, economic policies, and regulations, while tokenomics cares more about a company’s stocks or bonds in the context of the company’s financial
performance and the shareholders’ investment performance. Usually, to raise capital for expansion or growth, a company may offer stocks or bonds to the public through IPOs, and may continue with secondary offerings as needed. Similarly, DAOs or DeFi protocols offer tokens to raise funds to bootstrap ecosystems. Tokenomics focuses more on the capital structure and funding models of DAOs, token distribution and release, and the supply and demand of those tokens. It does have an incentive system, but it may not focus on rewards or penalties for good or bad behaviors. Instead, it may concentrate on the financial health and proper governance of DAOs. To the token holders, this translates into investment gains or profit sharing from what happens on the protocol.
DeFi after the collapse of FTX
FTX, one of the largest centralized exchanges for trading cryptos, collapsed in November 2022 after a liquidity crisis. FTX’s token, FTT, lost 90% of its value over 10 days. It started with the company’s balance sheet being leaked, which triggered mass withdrawals by its customers and led to the bankruptcy of the company. The collapse of FTX sent shockwaves through the financial industry; it exposed the vulnerability of the crypto market and DeFi and raised questions about the stability of the entire crypto market. FTX’s business model and its inadequate governance and transparency are the direct reasons the company fell. Several factors contributed to the collapse. One is directly related to the overall crypto market sentiment during the crypto winter of 2022: a lack of confidence in cryptocurrencies made it difficult for FTX to raise capital. Another factor was the company’s close ties to Alameda Research, a crypto trading firm that was also facing financial difficulties. When Alameda collapsed, it took FTX down with it. However, the collapse of FTX increased awareness of the risks of centralization and the need for decentralized governance. It showed that even the largest and most seemingly reputable centralized crypto exchange platforms are not immune to failure. To the DeFi community, this unfortunate event has had some positive impacts. For one, it
has led to a growing interest in DeFi and increased demand for DeFi products and services. Investors are looking for ways to store and manage their crypto assets in a more secure and decentralized way. It prompted improvements in the security and governance of DeFi protocols and helped DeFi developers find ways to address the vulnerabilities that were exposed by the FTX collapse. Even in the wake of the FTX fiasco, DeFi is still evolving. New DeFi protocols continue to come to market, and existing ones continue to improve and mature. Institutional investors are increasingly looking to DeFi as a way to access new investment opportunities and reduce their risk. One area worth watching is institutional DeFi, a version of DeFi that combines the power of DeFi with innovations from TradFi’s safeguards to maintain regulatory compliance and ensure financial integrity and customer protection. It leverages tokenization, DeFi composability, smart contracts, and blockchain technology to unlock value for investors, financial institutions, and even the average Joe.
Summary
Wow, that was a lot to unpack. By now, you should have a good understanding of the DeFi stack. We started with DeFi primitives and helped you understand the concept of native cryptocurrencies and tokens, including ETC, ETH, ERC-20 and ERC-721 tokens, and stablecoins. We introduced the basic constructs and building blocks in DeFi, including liquidity pools, Oracles, flash loans, exchanges, tokenization, and composability. We then dived deep into the leading DeFi protocols, including decentralized lending and borrowing, decentralized exchanges, and decentralized derivatives and insurance protocols. We also provided an overview of DeFi aggregators, such as DEX aggregators and yield aggregators. We briefly touched on the cryptoeconomic perspective of blockchain and the DeFi protocols. We discussed the FTX fiasco and the importance of decentralization in its wake. The DeFi movement is a continuum in that it continues to evolve. Smart contracts are the lynchpin of
all these new financial products and services. The composability of smart contracts enables innovation and evolution in DeFi ecosystems. It is a fascinating area where Ethereum has truly found huge success. Thanks to smart contracts, as well as their interoperability and composability, many such protocols find it easy to expand or integrate with the DeFi protocols on other L1 blockchains, especially those that are EVM-compatible. In the next chapter, we will cover other EVM-compatible blockchains, including Binance Smart Chain, Polygon Matic, and Avalanche. We will help you understand how to bridge different L1 blockchains and show you how leading DeFi protocols support interoperability between different EVM-compatible chains.
EVM-Compatible Blockchain Networks
So far, we have shown you how Ethereum works and how Ethereum Virtual Machine (EVM), as the execution layer, works under the hood in Chapter 2, Ethereum Architecture and Ecosystem. You also learned how EVM and smart contracts are enabling many of the leading Decentralized Finance (DeFi) protocols and creating a whole set of decentralized financial products and services on blockchain networks in Chapter 3, Decentralized Finance. As we discussed in the last chapter, smart contracts are the linchpin of DeFi protocols and decentralized applications. EVM is the computing environment enabling smart contract execution on the Ethereum blockchain network. Ethereum is the first blockchain network to pioneer the concept of smart contracts. Since then, many other EVM-compatible blockchain networks have been developed and have brought the innovation and flywheel effect of DeFi into other layer 1 ecosystems. In this chapter, we will help you understand the rationales behind EVM compatibility, and then discuss some of the leading EVM-compatible blockchain networks, including BNB Smart Chain (BSC), Polygon chains, and Avalanche chains. We will analyze popular integration mechanisms for bridging different L1 EVM chains. At the end of this chapter, to complete the discussion, we will briefly take a glimpse at some of the leading non-EVM chains, including TRON and Solana. In this chapter, we will cover the following topics:
Understanding EVM blockchain ecosystems
Introducing BSC
Scaling Ethereum with Polygon chains
Diving deep into high-performance Avalanche chains
Bridging interoperability gaps between blockchains
Glancing over non-EVM blockchain networks
Technical requirements
For all the source code of this book, please refer to the following GitHub link: https://github.com/PacktPublishing/Learn-Ethereum-Second-Edition/.
Understanding EVM blockchain ecosystems
As we discussed in Chapter 2, Ethereum Architecture and Ecosystem, Ethereum’s success in propelling the broad adoption of DeFi protocols and cryptocurrencies has created such a flywheel effect that more and more innovative protocols have been developed in the Ethereum ecosystem, and, in turn, more and more transactions are coming to the Ethereum network. The busier the network, the costlier the transaction fees. The fundamental issues behind such skyrocketing costs are the scalability of the Ethereum network and the scarcity of limited block space. You may recall the blockchain scalability trilemma from Chapter 2, Ethereum Architecture and Ecosystem: it is generally deemed unlikely for a decentralized network to achieve all three attributes of scalability, security, and decentralization. Blockchain protocol designers have to sacrifice one of them to achieve the other two. Bitcoin’s goal is to create a decentralized and secure peer-to-peer (P2P) network for digital payment; therefore, it sacrifices scalability. The same goes for Ethereum. Some other layer 1 blockchains, such as Solana and BSC (formerly Binance Smart Chain), prioritize scalability over decentralization. Therefore, they can achieve great scalability, throughput, and usability, and in turn, create phenomenal adoption of their ecosystems. Avalanche is the first blockchain trying to achieve all three attributes in its own layer 1 design.
On the other hand, Ethereum is addressing its scalability with its own roadmap. Transitioning from Proof of Work (PoW) to Proof of Stake (PoS) and merging Ethereum 1 and Ethereum 2 were some of the earlier steps in its massive plan. As we discussed earlier, the ecosystem has also innovated various L2 options to improve throughput and reduce transaction costs, especially with both optimistic and zero-knowledge (ZK) rollups. Data sharding, a scaled-down version of the originally planned sharding solution in Ethereum 2.0, is still on the roadmap; it will create more block space and ultimately alleviate the scarcity issues. Since the start of Covid-19, the rapid growth of Decentralized Applications (DApps) and DeFi adoption has exacerbated the scalability challenges in Ethereum and created more issues than the Ethereum community can solve at its own pace. Some of the leading DeFi protocols shown in Figure 4.1 started to migrate from layer 1 to layer 2, and some decided to move out of Ethereum altogether to new and inexpensive blockchain networks. Thanks to such scalability challenges, new blockchain protocol designers seized these opportunities and developed new layer 1 blockchain networks. Many new blockchains follow the modular blockchain architecture paradigm, where consensus, execution, and data become separate but interconnected layers, instead of being coupled together in a monolithic blockchain such as Bitcoin or Ethereum 1.0. The benefit of the modular approach is that any layer can be replaced with better solutions suitable for the use cases and needs of its targeted ecosystems. On one hand, designers want to improve or totally address scalability and throughput issues with a new design; on the other hand, they also want to take advantage of the DeFi and DApps successes in Ethereum, which is why many of the newer blockchain protocols have adopted EVM as the execution layer for their new blockchains. The immediate benefit is the portability of smart contracts: the same smart contracts in DeFi protocols and DApps can be deployed without much, or any, change. The abundant tools, APIs, and frameworks make it attractive for developers to develop new applications on the new blockchain ecosystems and redeploy them onto any other EVM-compatible chains.
Each layer 1 blockchain is considered a separate blockchain ecosystem. To maximize the benefits of DeFi and DApps across blockchain ecosystems, interoperability and connectivity between chains become a necessity. This requires capabilities for asset transfer from one chain to another, data sharing between chains, and transaction coordination across chains. The following diagram depicts such concepts in the broad blockchain ecosystems:
Figure 4.1 – EVM-compatible blockchain network ecosystem
Bridges, which we briefly introduced in Chapter 2, Ethereum Architecture and Ecosystem, are common design patterns that facilitate asset transfer and transaction coordination across EVM-compatible chains. Decentralized oracles, which we introduced in Chapter 3, Decentralized Finance, become
the cross-chain infrastructure enabling data sharing between chains and bringing off-chain data on-chain. More sophisticated interoperability mechanisms, such as Polkadot and Cosmos, are facilitating interconnectivity and integration across chains, including EVM and non-EVM chains, which we will discuss in later sections. Let us dive into the fascinating world of the EVM ecosystem. We will start with BSC in the next section.
Introducing BSC
Binance started with its own blockchain, previously called Binance Chain, to power its own centralized flavor of decentralized exchange for trading crypto assets in 2017. Its main function is to provide a highly performant order book-based matching engine that aims to replicate the sub-second trading efficiency of centralized exchanges in the traditional financial world. Binance jumped onto the wagon of DeFi mania in 2020 with the rollout of Binance Smart Chain, a smart contract platform intended to power DeFi protocols and smart contracts. Instead of building a new blockchain infrastructure from the ground up, Binance forked Geth, one of the popular Ethereum 1.0 clients, plugged in its own implementation of the Proof of Staked Authority (PoSA) consensus algorithm, and made it available quickly to the developer community, which was deeply frustrated by the progress of Ethereum scaling solutions. Since then, it has become one of the leading DeFi and cryptocurrency platforms. According to https://defillama.com/chains, its total value locked (TVL) is second only to Ethereum among all blockchains. Since early 2022, Binance has consolidated its blockchain portfolio and renamed its blockchain products. The new product family is called BNB Chain. Figure 4.2 shows the consolidated product offerings from Binance:
Figure 4.2 – Binance blockchain product portfolio
The consolidated BNB Chain portfolio is made up of the following products:
BNB Beacon Chain (previously Binance Chain), which is used for BNB Chain governance. One of its key responsibilities is to elect the validators participating in the consensus for BSC. BNB is the native cryptocurrency on both Beacon Chain and BSC.
BSC (previously Binance Smart Chain), which provides an EVM-compatible execution engine, a PoSA consensus layer, as well as bridges to BNB sidechains and other EVM-compatible blockchains.
BNB Sidechain—PoS solutions for developing custom PoS blockchains and DApps with existing BSC functionality.
BNB ZkRollup—Binance’s own version of ZK rollup solutions, to be developed in the future for scaling BSC.
We will focus on BSC in the rest of this section. We encourage interested readers to check out more details on the other three products at
https://docs.bnbchain.org/docs/overview.
Consensus mechanism in BSC
Binance took a progressive decentralization approach to address the scalability challenges in Ethereum. It focused on usability, scalability, and transaction throughput from the get-go. It also clearly saw the momentum in the Ethereum development community and made a conscious decision to stay compatible with EVM. Instead of building blockchain infrastructure from the ground up, Binance forked the Ethereum 1.0 client implementation, Geth, created BSC, and then added many improvements to address challenges and issues in Ethereum. BSC implements the PoSA consensus mechanism, a combination of the Delegated Proof of Stake (DPoS) and Proof of Authority (PoA) consensus mechanisms. In a PoA-based blockchain network, validators are approved to participate in transaction validation and block creation based on a certain identity. They have to gain a reputation to become validators and have an incentive to maintain their reputation and identity. DPoS works similarly to PoS, but with a twist: the users of the blockchain stake their coins and vote for delegates to become representatives that validate transactions and propose new blocks. BSC combines both DPoS and PoA in its implementation of the consensus mechanism, namely PoSA consensus. The following diagram shows how PoSA works in BSC:
Figure 4.3 – PoSA consensus mechanism in BSC
Anyone can stake BNB coins on the BNB beacon chain through the staking module to participate as a validator. In Binance, you can delegate your own BNB stake to yourself, delegate it to someone else, or receive delegated BNB stakes from anyone. The beacon chain ranks the validators by their delegated tokens and operator addresses: validators with more delegated tokens get a higher ranking, and if two validators tie on delegated tokens, the validator with the smaller address bytes ranks higher. The top 21 validators with the highest rankings are then elected. The beacon chain communicates the 21 elected validators, including any voting changes, to the BSC chain via the BSC relayer; these updates are applied to BSC periodically, currently on a daily basis. All elected validators are put into the active validator set on the BSC chain. They take turns validating and proposing new blocks. In return, the validator who proposes a new block and gets it added to the BSC chain collects the transaction fees. Since BNB coins are pre-minted, there is no new issuance of BNB coins as an incentive for block creation, which is a deviation from PoW-based consensus. The protocol enforces double-sign detection and slashing logic to guarantee the security and finality of the blockchain. Validators can also be slashed for unavailability to ensure full participation in protocol security.
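The election rule just described (rank by delegated stake, break ties by the smaller address, and keep the top 21) can be sketched as follows; the addresses are truncated and the stakes invented for illustration:

```python
ACTIVE_SET_SIZE = 21

candidates = [
    # (operator_address, delegated_bnb) -- hypothetical, truncated addresses
    ("0x0a...", 2_000_000),
    ("0x3c...", 1_500_000),
    ("0x1b...", 1_500_000),  # ties with 0x3c...; smaller address ranks higher
    ("0x9d...", 900_000),
]

# Sort by stake descending, then by address ascending as the tie-breaker
# (lexicographic comparison stands in for comparing address bytes here).
elected = sorted(candidates, key=lambda c: (-c[1], c[0]))[:ACTIVE_SET_SIZE]
for address, stake in elected:
    print(address, stake)
```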
In addition, BSC also introduced a candidate pool of 20 validators as backups. The candidate validators may be called to propose new blocks, but the chances of them being called are much smaller than for those in the active validator pool. In the same way, candidate validators can collect transaction fees if they are called to propose new blocks, but if they are inactive, they will be slashed with a small penalty for negligence.
Block creation and finality
The creation of new blocks in BSC is very similar in concept to Ethereum’s implementation of PoS, but there are notable differences in BSC. The following diagram shows the process of creating new blocks:
Figure 4.4 – Block creation and verification in BSC
As we discussed earlier, the system maintains an active validator set through staking and voting. The validators in the active validator set take turns proposing new blocks. BSC has a much shorter block time, currently around 3 seconds, which means a new block is created every 3 seconds. Similar to Ethereum PoS, BSC has epochs too. An epoch in BSC has 200 blocks, and the block sitting on every 200th block is also called an epoch block.
When a validator is called for its block creation duty, it becomes a block proposer. It validates all transactions according to the protocol rules and prepares the block header of the next block. It then assembles all transactions, finalizes the candidate block, signs everything in the block header, and finally broadcasts the new block to the network for the rest of the validators to verify. Once the new block is broadcast, all validators start their verification process by verifying the header and all transactions inside the block. If more than half of the validators agree on the new block, it is considered valid and added to the BSC chain. In BSC, transaction finality happens when the new block has been sealed by more than two-thirds of the validators, which takes about 15 blocks, or 45 seconds. The average transaction cost is about 21 gwei, which is much lower than the average transaction cost on the Ethereum network. The projected throughput of BSC is around 500 transactions per second (TPS). One of the issues with BSC is the centralization of validators, which makes the BSC network less secure. Since only the top 21 validators are elected to secure the network, it is easier to compromise more than half of the 21 validators (that is, 11 validators) and thus compromise the security of the BSC chain. Since the system requires two-thirds of the validators to sign for an incoming block to become final, BSC chain finality can’t happen if more than a third of its validators, or seven validators, are compromised. Compared with the Ethereum network and its beacon chain, where tens of thousands of validators line up to secure the network, the BSC network does pose some potential network security risks.
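The finality arithmetic stated above follows directly from the validator count and the block time:

```python
import math

VALIDATORS = 21
BLOCK_TIME_S = 3

seal_quorum = math.ceil(VALIDATORS * 2 / 3)   # 14 validators must seal a block
finality_blocks = seal_quorum + 1             # ~15 blocks in practice
print(finality_blocks * BLOCK_TIME_S)         # ~45 seconds to finality

# The security margins mentioned above:
print(VALIDATORS // 2 + 1)        # 11 validators can compromise block validity
print(math.ceil(VALIDATORS / 3))  # 7 validators are enough to stall finality
```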
BNB governance
As shown in Figure 4.3, BNB comprises two layers of blockchains—one is the beacon chain, and the other is the BNB smart chain. As we already know, the BNB smart chain is the execution environment and crypto asset transfer hub for smart contracts and DApps. Don’t confuse the BNB beacon chain with the Ethereum beacon chain. In Ethereum 2.0, the beacon chain is the heartbeat of Ethereum; its main function is to
orchestrate how validators are randomly selected for the block proposal and record all activities for transaction audits and traces. The BNB beacon chain is used for electing the top 21 validators to participate in securing BSC. In addition, it is responsible for the governance of the BNB smart chain. To control the system behaviors of BSC, BNB defines a set of system parameters that can be changed through the platform governance process, including the following:
Slashing threshold
Cross-chain transfer fees
Relayer reward amount
Parameters of the staking/slash/oracle modules on the beacon chain
Any other parameters that provide the flexibility of BSC
All these parameters are voted on the beacon chain as part of the staking and voting process. Once those parameters are established, they are communicated to the BSC chain via the relayer and made effective via the governance contracts on the BSC chain. It is not clear how the governance of the beacon chain itself works. One could imagine that, in Binance, this could be a centralized function, since the Binance chain started as a centralized entity for decentralized exchanges.
BNB ecosystem and roadmap
Binance took a pragmatic approach to addressing the scalability challenges in Ethereum and developing its own blockchain ecosystems. Although it started as a fork of Ethereum 1.0, Binance introduced the improvements necessary to address the high transaction fees and throughput issues that have plagued the Ethereum community since the start of Covid-19. Ethereum’s solution with Ethereum 2.0 was perceived as the long shot and the long-term solution for scalability issues. Binance seized the window of
opportunity and rolled out its own variation of Ethereum to address the needs of DeFi products and services in the broad cryptocurrency community. As we mentioned earlier, Binance is taking a progressive approach to decentralization. It is considered a centralized network since it only has 21 validators securing the network at present, but Binance’s approach did address the transaction throughput and cost issues. Another key feature of BSC is that it is EVM-compatible. DApps built for Ethereum can be ported and deployed on BSC, taking advantage of faster and cheaper transactions. Since its rollout in 2020, many leading DeFi protocols, including those we discussed in Chapter 3, Decentralized Finance, have been able to extend their offerings to Binance networks or aggregate their services out of both Ethereum and other EVM-compatible blockchains. Some replicated DeFi protocols, such as PancakeSwap and other variations of Uniswap, can easily attract billions in locked value and find success in the BNB ecosystem. As we saw in Figure 4.2, the future roadmap may include the Binance version of ZK rollups and sidechains. BNB ZK rollups are the replicated or ported version of ZK rollups in the Ethereum ecosystem, and the BNB sidechain is considered the counterpart of the Ethereum sidechains, such as Polygon Matic, or Avalanche subnets, which we will discuss in the next two sections. BNB Sidechain allows developers to create BSC-compatible sidechains suitable for specific use cases. It comes with a PoS consensus mechanism and a staking module. It allows you to define your own tokens and transaction fees and choose your own set of validators. In this regard, BNB’s implementation of sidechains is more similar to Avalanche subnets. BNB’s ZK rollups are on the roadmap and are expected to become available soon. Combining BSC with both ZK rollups and sidechains will certainly make the BNB ecosystem a legitimate challenger to the Ethereum ecosystem in the years to come.
Scaling Ethereum with Polygon chains
Since it started as a Plasma sidechain in 2017, Polygon has been taking an Ethereum-centric approach to its product strategy. Contrary to Binance and Avalanche, Polygon has never considered building its own layer 1 blockchain. Instead, it has built a wealth of L2 scaling solutions on top of Ethereum to augment Ethereum’s scalability, including sidechains, ZK rollups, and optimistic rollups. In Chapter 2, Ethereum Architecture and Ecosystem, we introduced you to the concept of modular blockchain architecture, where modern blockchain implementations are moving away from monolithic design to a more modular architecture. With modular blockchain architecture, the execution layer, consensus, and blockchain data are separated into different modules but can still integrate and work seamlessly as a whole. Any of those layers, or parts of them, can be supplemented by components at L2 or other higher layers. With rollups and sidechains, settlement is another component in this modular design, allowing L2 transactions to be settled on the L1 network. To support L2, sidechains, or even on-chain L1 light clients, data needs to be available when it is needed. Ethereum 2.0 started with some of these concepts. The following diagram is an abstraction of such a modular blockchain architecture design:
Figure 4.5 – Revisiting modular blockchain architecture
Polygon’s vision fits neatly into this modular blockchain architecture paradigm. It is intended as the decentralized Ethereum scaling platform that enables developers to continue to leverage Ethereum for guaranteed economic security and augment the Ethereum community with scalable, user-friendly DApps with low transaction fees. By economic security, we really mean the cost to attack the blockchain. In its strategy, Ethereum is the hub and ultimate settlement layer for everything in Polygon. The following diagram shows the product offerings in the Polygon ecosystem:
Figure 4.6 – Polygon’s blockchain product portfolio
Largely, Polygon has three categories of blockchain solutions. At the top is a set of L2 rollup solutions intended to scale Ethereum. We discussed both ZK rollups and optimistic rollups in Chapter 2, Ethereum Architecture and Ecosystem. Between these two rollup strategies, Polygon has a bias toward ZK rollups and ZK technologies. Although it is not shown in this diagram, it is worth mentioning that Polygon has a separate module, Polygon Avail, which provides data availability to the rollups.
On the right side are Polygon’s multi-chain solutions targeted at enterprise use cases. Standalone sidechains and enterprise chains, supported by Polygon Edge, are conceptually very similar to BNB sidechains and Avalanche subnets. We will briefly introduce Polygon Edge and multi-chain solutions later in this section. In the next subsection, we will review how Polygon PoS and Plasma, together with Ethereum, provide EVM-compatible chains and enable scalability in Ethereum.
How Polygon PoS and Plasma work
Polygon is a blockchain application platform that provides scaling solutions through hybrid PoS and Plasma-enabled sidechains. Mapped onto the modular blockchain architecture, PoS is a generic validation layer implementing a PoS-based consensus algorithm, and a Plasma-enabled sidechain is a full-blown EVM-compatible execution layer that takes the computation, smart contract execution, and block-building logic out of the Ethereum base layer, similar to most L2 rollups. The following diagram shows a three-layer Polygon PoS network architecture:
Figure 4.7 – Polygon PoS network layer architecture
It has three separate layers, as follows:
Bor execution layer (Plasma)—This is the execution layer where selected producers execute transactions and produce blocks in an EVM-compatible execution environment. It is based on the Go Ethereum client software. At any given time, a set of pre-defined block producers take turns producing new blocks on the Bor sidechain, and consensus is reached via the clique PoA consensus protocol. The Heimdall PoS layer randomly selects the set of producers from all staked validators.
Heimdall network layer (PoS)—Heimdall acts as the network manager and orchestrator within the Polygon network. Its function is somewhat similar to the beacon chain in Ethereum 2.0, although Heimdall doesn’t record the PoS network heartbeat the way the beacon chain does. Similar to the Ethereum beacon chain, Heimdall manages all validators and randomly selects validators for the assignment of block production on the Bor sidechain. It manages the clock for the spans and commits the Merkle root hash of all Bor block hashes within a span. To achieve consensus at the Heimdall PoS network, it uses a modified version of the Tendermint consensus protocol called Peppermint. In addition, the PoS layer acts as the middleman to relay Ethereum state data from the Ethereum mainnet to producers on the Bor network.
Ethereum layer (settlement)—This layer consists of a set of contracts on the Ethereum mainnet. Checkpoint contracts allow the Heimdall PoS layer to send checkpoints to the Ethereum mainnet and enable the Polygon PoS and Plasma chain to anchor on Ethereum. Beyond that, anyone who wants to participate in the Polygon PoS network can stake crypto assets here and become a Heimdall PoS validator. The rewards, as well as penalties, for PoS activities are managed at the Ethereum layer too.
Polygon has a flexible shared security model where you can choose either PoS security, Plasma security, or a hybrid security model of both PoS and Plasma. In the Plasma security model, PoA consensus at the Bor Plasma
sidechain can be trusted to ensure transaction validity and transaction security. If the Plasma sidechain is corrupted or the selected Bor block producers are compromised, since all funds at the sidechains are settled at the Ethereum chain, the users can exit Plasma without any loss of their crypto assets. With PoS security, you don’t have to rely on PoA fast consensus from the Bor Plasma layer; instead, you rely on the validators on the Heimdall PoS layer for security and transaction validity. If the block proposers are compromised, committing checkpoints to Ethereum won’t succeed. A hybrid model allows you to use both PoA fast consensus at the Plasma sidechain layer and the PoS network layer. They all rely on Ethereum to provide economic security at a global level.
Block production and checkpoints
In Polygon PoS, whenever someone stakes their assets on the Ethereum mainnet for an opportunity to become a Polygon validator, the Heimdall listener is triggered to add them to the validator pool. Similar to other PoS consensus protocols, the chance of being selected is proportional to the stake they have deposited. In Heimdall, the time interval is divided into spans; each span has several sprints, and a number of Bor blocks are created in each sprint. The time interval, the number of sprints within a span, and the number of blocks created within a sprint are defined through Heimdall governance. The following diagram shows the process for selecting a set of block producers and the proposer for a given span:
Figure 4.8 – Polygon PoS validator selection
At the start of any span, Heimdall PoS randomly shuffles the validator pool and selects the next set of validators as the block producers for the entire span, as well as a set of proposers to commit Bor checkpoints to the Ethereum mainnet at the end of the current span. Validators in the Bor block producer set will take turns producing new blocks for each sprint; they will earn rewards for securing the network. All rewards and penalties are accumulated on the Ethereum mainnet. When any validator exits from the network, they can withdraw their stake and rewards from the Ethereum mainnet. The chance of being selected is determined by stake voting power, which is proportional to the stake deposited. The more stake you put in, the more likely you are to be selected. Let us say A deposited 1,000 MATIC coins, B deposited 500 MATIC coins, and C and D each deposited 300 coins, and that 100 MATIC coins earn a validator one unit of voting power. In this case, A will have a voting power of 10, B will have 5, and C and D will each have 3. When they are put into the validator pool to shuffle, it will start as {AAAAAAAAAABBBBBCCCDDD}, and the random shuffling process will determine the Bor block producer set and proposers. Let us say the size of the producer set is 5, which means the top 5 validators after the shuffling will be popped out and assigned the role of producers for the current span. Random shuffling is done using a seed derived from
the historical blocks. Let us say that after shuffling, the validator pool becomes {AABCCABCDAABBDAAADBAA}. The first five entries, A, A, B, C, and C, form the producer set: A will produce blocks for the first two sprints, B will produce blocks for the third sprint, and C will produce blocks for the next two sprints, and so on until we reach the end of the current span. In a similar manner, proposers are selected out of the Bor producer set using Tendermint’s weighted round-robin algorithm. One of the validators in the producer set will be selected as the proposer to commit the checkpoints. At the end of the span, the proposer will verify all transactions since the last checkpoint and calculate the Merkle hash tree out of all block hashes in the current span, committing the Merkle root hash as the checkpoint to the Ethereum mainnet. All Bor transactions in the span are considered to have reached finality once the proposer successfully commits the checkpoint to Ethereum. The following diagram shows how checkpoints are processed:
Figure 4.9 – Block production, checkpoint, and commitment
As you can see, once the producers and proposers are selected, they form a subnet as a committee performing the assigned duties for the current span. This is somewhat similar to how the beacon chain manages validators for each epoch and slot in Ethereum 2.0.
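To make the selection mechanics concrete, here is a small Go sketch of the stake-weighted pool and shuffle from the example above. The validator names, stakes, and the fixed random seed are illustrative assumptions; the real Heimdall implementation derives its shuffling seed from historical block data and uses its own data structures:

```go
package main

import (
	"fmt"
	"math/rand"
)

// validator holds a name and its stake in MATIC.
type validator struct {
	name  string
	stake int
}

func main() {
	validators := []validator{
		{"A", 1000}, {"B", 500}, {"C", 300}, {"D", 300},
	}

	// 100 MATIC of stake earns one unit of voting power, so each
	// validator occupies one pool slot per unit of voting power.
	var pool []string
	for _, v := range validators {
		for i := 0; i < v.stake/100; i++ {
			pool = append(pool, v.name)
		}
	}
	// pool now holds {A x10, B x5, C x3, D x3}, 21 slots in total.

	// Shuffle the pool; the seed here is arbitrary, whereas Heimdall
	// derives it from historical blocks.
	rng := rand.New(rand.NewSource(42))
	rng.Shuffle(len(pool), func(i, j int) { pool[i], pool[j] = pool[j], pool[i] })

	// The top five slots become the Bor block producer set for the span.
	fmt.Println("producer set for this span:", pool[:5])
}
```

Because A holds 10 of the 21 slots, A is the most likely to occupy multiple producer positions, which matches the intuition that a higher stake means a higher chance of selection.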
Consensus mechanism in Polygon PoS
As we discussed earlier, in Polygon, blocks are produced continuously within a span. At the end of the last sprint within a span, a proposer is selected using Tendermint’s weighted round-robin algorithm. The chosen proposer commits the checkpoint for the given span to the Ethereum mainnet to attest that all transactions have been verified. Polygon PoS uses a two-phase commit process to ensure the successful submission of a checkpoint to the Ethereum mainnet. The following diagram shows such a checkpoint submission process:
Figure 4.10 – Checkpoints and Peppermint consensus
Polygon PoS consensus is based on a modified version of the Tendermint consensus algorithm, called Peppermint consensus. Tendermint is a Byzantine Fault Tolerant (BFT) blockchain consensus algorithm that can reach consensus as long as more than two-thirds of the validators act honestly. That means it can tolerate up to one-third of validators being compromised, unavailable, or acting maliciously. With Tendermint, validators participate in the consensus process by signing votes for blocks. There are three types of votes in Tendermint: a pre-vote, a pre-commit, and a commit vote. When more than two-thirds of the validators have signed and voted on a new block, the new block is considered to be committed on the blockchain. Polygon made some modifications to the data structures to fit the Tendermint algorithm into the Polygon PoS network protocol.
At the end of a span, a proposer is selected using Tendermint’s weighted round-robin process and starts its duty for checkpoint submission. It will wait for all blocks within the span to be included, up until a timeout. If not all blocks are completed in time, it simply sends out a no-acknowledgment (no-ACK) event to force the system to select the next proposer, and continues to wait for all blocks to be included. Once all blocks are included, the proposer calculates the Merkle root hash of all block hashes in the span and proposes it as the checkpoint for all validators to verify and sign. Once received, all validators in the validator pool verify the transactions and the checkpoint, vote and sign it with their signatures, and send the vote with the signature to the network. In the meantime, the proposer starts its clock and begins to collect all votes and signatures. If over two-thirds of the votes are collected, the proposer submits the checkpoint to the Ethereum mainnet.

On the Ethereum network, the checkpoint smart contract keeps track of all checkpoints received and by whom they were submitted. The checkpoint transaction committed by the proposer is added to the list. If the commitment is successful on the Ethereum network, the proposer sends an ACK message to notify all validators that the checkpoint transaction was committed, providing the checkpoint number as proof of such a successful commitment. Every validator can verify that the checkpoint was committed successfully on the Ethereum mainnet. Once committed, the network moves on to select another validator as the next proposer. Whoever is selected, the incoming proposer then sends an ACK transaction to confirm that the previous checkpoint transaction has been successfully committed onto the Ethereum mainnet. Whenever there is a change in the validator set, the change is synchronized to all validator nodes via Heimdall.

If the proposer times out while collecting votes and signatures, or if the checkpoint commitment is not successful, it sends a no-ACK transaction to the network. This forces Polygon to select another proposer, and the newly selected proposer starts the whole process all over again.
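The following Go sketch captures the two numerical ingredients of this flow: the Merkle root the proposer computes over the span's block hashes, and the greater-than-two-thirds signature quorum it must collect before submitting the checkpoint. The tree construction and vote counting are simplifications for illustration, not Heimdall's actual hashing scheme or wire format:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleRoot computes a binary Merkle root over the block hashes in a
// span, duplicating the last hash when a level has an odd count.
func merkleRoot(hashes [][32]byte) [32]byte {
	if len(hashes) == 1 {
		return hashes[0]
	}
	var next [][32]byte
	for i := 0; i < len(hashes); i += 2 {
		j := i + 1
		if j == len(hashes) {
			j = i // duplicate the last hash
		}
		pair := append(hashes[i][:], hashes[j][:]...)
		next = append(next, sha256.Sum256(pair))
	}
	return merkleRoot(next)
}

// quorumReached reports whether more than two-thirds of the validators
// signed the checkpoint, the Tendermint-style threshold Peppermint uses.
func quorumReached(votes, totalValidators int) bool {
	return votes*3 > totalValidators*2
}

func main() {
	blocks := [][32]byte{
		sha256.Sum256([]byte("bor block 1")),
		sha256.Sum256([]byte("bor block 2")),
		sha256.Sum256([]byte("bor block 3")),
	}
	fmt.Printf("checkpoint root: %x\n", merkleRoot(blocks))

	// With 100 validators, 67 signatures meet the >2/3 threshold, 66 do not.
	fmt.Println("quorum with 67/100 votes:", quorumReached(67, 100))
	fmt.Println("quorum with 66/100 votes:", quorumReached(66, 100))
}
```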
Multi-chain solutions with Polygon Edge
Earlier, we discussed modular architecture adoption in Polygon PoS and Plasma. Polygon Edge is a framework for building modular, extensible, and EVM-compatible public or private blockchains. Figure 4.11 shows a high-level abstraction of such a blockchain framework:
Figure 4.11 – Polygon Edge modular architecture
This is a framework to build a new EVM-compatible blockchain network. It provides the following key components and constructs:
Consensus is the abstraction of a consensus layer, which provides an Istanbul BFT (IBFT) consensus implementation. It supports both PoA and PoS consensus.
Blockchain is the core component where all blocks are maintained and new blocks are added. Internally, it uses LevelDB for storing blockchain data and metadata. This is similar to the data availability layer we discussed earlier when we looked at modular blockchain architecture.
State is the component executing transactions and managing state transitions in the blockchain. It is effectively the execution layer. Polygon Edge supports an EVM-compatible execution environment.
Libp2p is the P2P networking abstraction facilitating communications across multiple blockchain networks, enabling transfers of both Ethereum Request for Comment 20 (ERC-20) and ERC-721 tokens with a bridge solution. In addition, you can define your own edge currency and implement your own ERC-20 or ERC-721 tokens.
The Web3 JSON-RPC interface allows industry-standard wallets and tools to interact with Polygon Edge, just as they would with other blockchain networks, as the sketch after this list shows.
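Because Polygon Edge speaks the standard Web3 JSON-RPC protocol, any Ethereum client library can connect to it unchanged. As a minimal sketch, the following Go program uses go-ethereum's ethclient to query a node; the endpoint URL assumes a locally running Edge node on the default JSON-RPC port, which may differ in your setup:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Connect to the node's JSON-RPC endpoint (assumed local port).
	client, err := ethclient.Dial("http://localhost:8545")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := context.Background()

	// The chain ID identifies the network we are connected to.
	chainID, err := client.ChainID(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// The latest block number shows the chain is producing blocks.
	head, err := client.BlockNumber(ctx)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("chain ID: %s, latest block: %d\n", chainID, head)
}
```

For more information, check out the Polygon Edge documentation at the link we showed at the beginning of this subsection.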
Polygon ecosystem and roadmap Polygon took an Ethereum-centric approach to building products and solutions that make Ethereum accessible and affordable, kind of like building public infrastructure such as roads, bridges, and subways for those commuting into and out of Manhattan and doing business at the center of New York City. What Polygon has built are blockchain infrastructures for DApps to connect to and transact on the Ethereum network.
In addition to the ZK rollups we mentioned in Chapter 2, Ethereum Architecture and Ecosystem, as well as Polygon PoS, Plasma, and multi-chain solutions, the following solutions are worth mentioning too:
Polygon Avail—An on-chain decentralized data availability solution in development to provide blockchain data to L2 rollups and allow data sampling on historical transactions. Validium pioneered such a solution, but it offers off-chain, rather than on-chain, data availability. Ethereum is developing such a solution too. According to its roadmap, which we will discuss in Chapter 5, Deep Research and the Latest Developments in Ethereum, it may take a while for such an Ethereum solution to be available. Before that happens, Polygon Avail could be a data sampling and availability solution for all L2/L3 and Web3 applications.
Polygon Nightfall—This is an optimistic rollup solution that leverages Zero-Knowledge Proof (ZKP) technology to address privacy issues in blockchain applications, a pressing enterprise concern.
Polygon Supernets—This is a Blockchain-as-a-Service (BaaS) model offered by Polygon. Instead of building and deploying their own blockchain using Polygon Edge, application developers can leverage Polygon Supernets to set up, bootstrap, and roll out their blockchain network.
Polygon has a rich set of solutions in its product portfolio, and it continues to evolve and innovate. It has a vibrant developer community too. Interested readers should check out the Polygon website to get up-to-date information (https://polygon.technology).
Diving deep into high-performance Avalanche chains
Avalanche blockchain is a decentralized network and smart contract-enabled blockchain platform designed from the ground up to address scalability, decentralization, and security challenges, the famous blockchain
trilemma. Avalanche tackles these trilemma problems with a novel approach to network consensus, which is designed to remain scalable no matter how large the decentralized network becomes. In addition, it has a unique modular architecture approach to address issues related to interoperability, finality, and usability using three separate blockchains. To attract developers from the Ethereum and DeFi community, Avalanche adopted EVM as its smart contract execution platform, which it claims is the fastest smart contract platform in the blockchain industry, as measured by time-to-finality (TTF). In this section, we will dive deep into Avalanche blockchain architecture and its consensus mechanism. We will also discuss how three separate chains—Exchange Chain (X-Chain), Platform Chain (P-Chain), and Contract Chain (C-Chain)—work. As usual, we will give a brief overview of network governance at Avalanche and its ecosystems.
Avalanche blockchain architecture overview
Avalanche took a drastically different approach from Binance or Polygon when building its blockchain and DeFi ecosystem. The designers believe most PoS-based BFT consensus mechanisms, which are based on votes and messages, cannot scale because the number of messages passed over the network grows quadratically with the number of validators. That is why most such networks have to be permissioned or introduce centralized components, essentially sacrificing decentralization for scalability and performance. That is what Binance did with its BNB Smart Chain. In the other camp, Bitcoin and Ethereum 1.0, which are based on the Nakamoto consensus and the PoW protocol, prioritized security and decentralization over scalability and throughput. The throughput and performance of both networks cannot meet the demands of DApp and DeFi ecosystems. Ethereum 2.0, as well as its L2 ecosystems, including Polygon, is addressing this blockchain trilemma through modular blockchain architecture design. By separating consensus with the beacon chain from the smart contract layer, and further separating data into data shards, it is a
common belief that Ethereum, at the end game, will be able to address the blockchain trilemma and fundamentally solve scalability issues. With additional L2 scaling solutions from Polygon and other rollup providers, Ethereum 2.0 will be an essential decentralized platform to power DeFi, Web3, and the Metaverse for years to come. Avalanche took a brand-new approach to tackle the challenges and issues facing Ethereum, Bitcoin, and other L1 ecosystems. It started with a new consensus, which we will discuss in the next subsection. Its consensus is inspired by the intuition that, instead of everyone having to reveal their votes, if you continuously survey randomly selected participants, you will probably know what the entire population prefers. The sample of randomly selected network participants doesn’t have to be large; Avalanche showed that a sample size of 20 would ensure network security and provide a drastic improvement in reaching network consensus, no matter how large the network. Similar to Ethereum 2.0, Avalanche follows a modular blockchain design and separates execution, consensus, and asset management into separate modules. In Avalanche, those modules are designed as three separate chains, called X-Chain, P-Chain, and C-Chain. The entire network, as a permissionless decentralized network, is called the primary network. To meet enterprise and vertical industry needs, any subset of the entire network can form a custom chain and serve the needs of specific industries or verticals. The following diagram depicts the Avalanche blockchain architecture:
Figure 4.12 – Avalanche blockchain architecture
To understand the Avalanche blockchain, let us start with its primary network. Avalanche’s primary network is a permissionless P2P decentralized network made up of all its validators. AVAX is the native token powering the Avalanche network. Anyone can stake AVAX and become a validator. Validators may be randomly selected to participate in sampling as part of the Avalanche consensus protocol. In Avalanche, a blockchain is abstracted as a virtual machine (VM), which defines the interface, state, and behaviors of a blockchain. The VM and a blockchain are similar to a class and an object in an object-oriented language: the blockchain in Avalanche is a concrete instantiation of a VM. In a way, this is not very different in concept from Polygon Edge, which provides a framework and template for spawning new blockchains. In Avalanche, a blockchain is an instance of a VM and runs on the primary network. The primary network and the Avalanche consensus mechanism ensure the liveness, safety, and finality of the blockchain. A VM defines the data structure of a block and the state transition during block creation, as well as the APIs and endpoints used to interact with the blockchain. Avalanche also introduced the subnet concept. A subnet is a subset of validators grouped together to ensure consensus on a blockchain. The primary network itself is a special subnet. By default, Avalanche defines
three separate blockchains: X-Chain, P-Chain, and C-Chain. They are validated and secured by the entire primary network. You can create as many custom blockchains as you want; each one can be secured and validated by one subnet. You can slice and dice the entire primary network into subnets. Blockchains and subnets have a one-to-many relationship, where multiple blockchains can be validated by one subnet. Let us go over the three default blockchains in Avalanche, as follows:
X-Chain—X-Chain, the exchange chain, is the default asset blockchain on Avalanche and comes with X-Chain APIs enabling the creation of new assets, exchanges between assets, and the execution of cross-subnet transfers. It is an instance of the Avalanche Virtual Machine (AVM). X-Chain implements the Avalanche consensus protocol. AVAX is the native token on the Avalanche network and can be traded on X-Chain.
P-Chain—P-Chain is the Avalanche platform chain; its role is very similar to the beacon chain in Ethereum 2.0. It is an implementation of the Platform Virtual Machine (PVM) and is responsible for staking and protocol governance. It is the metadata blockchain that coordinates validators, keeps track of active subnets, and enables the creation of new subnets. It provides APIs for creating custom blockchains and subnets and adding or removing validators from a subnet. As you can see from Figure 4.12, through the API, Avalanche allows you to create custom chains and attach a subnet to your own chain.
C-Chain—C-Chain is the contract chain; it is an EVM-compatible chain for smart contract execution. It is an instance of the geth Ethereum VM powered by Avalanche. Since it is EVM-compatible, most existing Ethereum development tools work with Avalanche seamlessly, and existing smart contracts can be deployed on Avalanche C-Chain with little or no change, taking advantage of the Avalanche blockchain network.
Another notable deviation from the traditional blockchain is that Avalanche introduced a directed acyclic graph (DAG) into its chain formation, instead of a linear chain formation where one block can only
have one parent block and one child block. In a DAG, one parent block can have multiple children, and a child block can have multiple parents. In fact, a linear chain formation is a specialized DAG formation. As you learned from Chapter 1, Blockchain and Cryptocurrency, Bitcoin uses the Unspent Transaction Output (UTXO) model for its block formation, while Ethereum introduced the Account model in its block formation. The UTXO model is good for simple state transitions such as asset exchange and transfer, but the Account model is a better fit for smart contract execution and computation. Avalanche adopted both in its platform. As depicted in Figure 4.12, X-Chain adopted a UTXO-based DAG formation for the state transition of asset exchange in its own chain progression. Both P-Chain and C-Chain continue to use linear formation for their blockchains. Don’t confuse the DAG representation of UTXO transactions with the DAG we discussed here. In Bitcoin, UTXO transactions themselves can form a DAG, as we showed in Figure 1.25 in Chapter 1, Blockchain and Cryptocurrency. But transactions in the Bitcoin network are still packed in a block, and the block is then added to the linear blockchain. In Avalanche X-Chain, transactions are packed into a block, which becomes a vertex of the DAG during chain formation. You may recall from Chapter 1, Blockchain and Cryptocurrency, that the linear chain formation and UTXO model help solve the double-spend problem in digital payment. With a DAG, Avalanche handles the double-spend problem in its consensus protocol through the rejection of conflicting transactions, which we will discuss in a later section. Each chain has its own consensus mechanism. X-Chain implements the Avalanche consensus algorithm. Both P-Chain and C-Chain implement the Snowman consensus mechanism, which is an optimized version of the Avalanche consensus mechanism for linear chain formation. Next, let us delve into the consensus mechanism and understand how all three blockchains work.
Avalanche consensus mechanism
Imagine that you are speaking to a crowd in a large stadium on the topic of blockchain and Avalanche. You are not sure how familiar they are with the topic, so you will probably survey the audience to see whether they prefer a more technical presentation or a more business-oriented, non-technical discussion. You will likely move section by section to get a sense of the audience’s preference. After a few rounds of such surveys return the same answer, you can be fairly sure, by intuition, of the preference of the whole stadium. Avalanche consensus works in the same way. Instead of notifying every validator in a large, decentralized network, network consensus can be achieved with a few rounds of random sampling of a small subset of validators.
Snowball consensus algorithm in Avalanche
Let us say a new transaction is submitted to a network that has 2,000 validators, and the network needs to reach a consensus on whether to accept the transaction or not. Suppose the sample size is 6, which means that in each round, 6 validators will be randomly selected; the quorum size is 4, which means a sampling round yields a preference only if at least 4 of the sampled validators report the same preference; and the decision threshold is set as 5, the number of consecutive agreeing rounds needed to reach a decision. Figure 4.13 shows a visual representation of the sampling process in Avalanche:
Figure 4.13 – Network sampling in the Snowball algorithm
In the first round, a surveying validator will query six randomly selected validators and ask their preference for accepting the new transaction or not. If at least four out of the six validators agree on accepting this new transaction, the surveying validator will record Yes or Y for accepting the transaction. By the same token, if four or more selected validators reject the transaction, the surveying validator will record No or N for rejecting the transaction. If fewer than four validators provide the same preference, this round of sampling will not have any preference, so the surveying validator will record it as Undecided or U. This means that after round 1, the recorded preference will be Y, N, or U. If you continue to sample, you will get Y, N, or U in the second round, and so on; what you get will be some word made up of Y, N, and U. After five rounds, if there are five consecutive Ys (YYYYY) or five consecutive Ns (NNNNN), then consensus has been achieved.
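The following Go program is a toy simulation of this repeated sampling, using the same parameters: a sample size of 6, a quorum size of 4, and a decision threshold of 5 consecutive agreeing rounds. The preference distribution is made up for illustration, and real validators also update their own preference as they sample, which this sketch omits:

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	k     = 6 // sample size
	alpha = 4 // quorum size
	beta  = 5 // consecutive agreeing rounds needed to decide
)

// sampleOnce queries k random validators and returns 'Y', 'N', or 'U'.
func sampleOnce(rng *rand.Rand, prefs []bool) byte {
	yes := 0
	for i := 0; i < k; i++ {
		if prefs[rng.Intn(len(prefs))] {
			yes++
		}
	}
	switch {
	case yes >= alpha:
		return 'Y'
	case k-yes >= alpha:
		return 'N'
	default:
		return 'U' // no quorum this round
	}
}

func main() {
	rng := rand.New(rand.NewSource(1))

	// 2,000 validators, 80% of which currently prefer to accept.
	prefs := make([]bool, 2000)
	for i := range prefs {
		prefs[i] = rng.Float64() < 0.8
	}

	streak, last := 0, byte(0)
	for round := 1; ; round++ {
		r := sampleOnce(rng, prefs)
		if r == last && r != 'U' {
			streak++ // another consecutive round with the same quorum result
		} else {
			streak, last = 1, r
		}
		if streak >= beta && last != 'U' {
			fmt.Printf("decided %c after %d rounds\n", last, round)
			return
		}
	}
}
```

Note that the surveying validator never contacts all 2,000 validators; a handful of small random samples is enough to converge, which is what lets the protocol scale with network size.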
In Avalanche, such a sampling process for reaching consensus is called the Snowball algorithm. The following diagram depicts the consensus process in the Avalanche protocol:
Figure 4.14 – Avalanche consensus mechanism
When a transaction is submitted, it will be sent to the primary network. A validator will add the transaction to its voting set. It selects a small subset of validators out of the primary network and queries those selected validators for their preference of acceptance. Upon receiving the query, all queried validators will validate the transaction and determine whether the transaction is valid or conflicts with any accepted transactions. If the transaction is invalid or there is any conflicting transaction, the incoming transaction will be rejected. The inquiring validator will collect the responses from all queried validators and record the preference based on sampling results from this set of
selected validators. If a large enough portion of responses think the transaction should be accepted, the inquiring validator will record the subsampling preference as accepted; otherwise, if a large enough portion of responses prefer to reject the transaction, the inquiring validator will record the subsampling result as rejected. The inquiring validator repeats such a subsampling process until the decision threshold is met, which means alpha (α) of the validators queried reply the same way (accept or reject) for beta (β) consecutive rounds. After that, if the consecutive rounds of subsampling prefer acceptance, the inquiring validator will accept the transaction; otherwise, it will reject the transaction.
Block creation and finality
Similar to all other blockchains, transactions are bundled together as blocks, and the blocks are added to the blockchain, keeping the blockchain live. Instead of each transaction going through the Avalanche subsampling process described in the previous subsection, in Avalanche X-Chain, the entire block of transactions is processed together. If a block is accepted, it will be added to the chain and keep the chain progressing; otherwise, rejected transactions within the block will be discarded, and all valid transactions will be reprocessed in future blocks. The blocks in X-Chain form a DAG, instead of the linear chain we commonly see in most layer 1 blockchains. The following diagram illustrates X-Chain and a DAG in Avalanche. Each block in the DAG is also called a vertex:
Figure 4.15 – DAG in Avalanche X-Chain
In Avalanche, when creating a new block, instead of conducting consecutive rounds of subsampling until the threshold is met for each block, block creation is optimized with both subsampling and transitive voting. Transitive voting means that if you vote on a new block, you also vote on all of its ancestors. Transitive voting allows X-Chain to progress without the need for consecutive rounds of subsampling on each block. For example, in the preceding diagram, if you vote on vertex B, you also give a vote to its parent block, vertex A. In the same way, if you vote on vertex E, you also throw your vote to its ancestors—in this case, vertices C, B, and A. But if you vote on G, since G has multiple parents, you essentially give your vote to all vertices except vertex F. In this way, when new blocks are added to the DAG, the system accumulates the vote counts on all vertices, and if any ancestor vertex’s vote count exceeds the threshold, that ancestor vertex will be accepted. The following diagram illustrates the progression and process of how a vertex is accepted to X-Chain. The system maintains a trio of values (chit, confidence, and consecutive successes) for each vertex. When the subsampling votes for acceptance, the chit is set to True, and the confidence and consecutive successes are each set to 1. As child blocks are added, the confidence and consecutive successes increase for all their ancestors.
In this way, Avalanche can keep track of how blocks are accepted in X-Chain:
Figure 4.16 – Block creation and finality in X-Chain
In the preceding example, when vertex A is added, if the subsampling from selected validators prefers to accept the block and all its transactions, the trio is set as A{true,1,1}. By the time block B is proposed and preferred by the subsampling, the trio for B is set as B{true,1,1}. Due to transitive voting, A is preferred too, since A is the parent of B; therefore, the trio for block A is updated to A{true,2,2}, and so on and so forth. By the time E is added and preferred, the trio for block A has been updated to A{true,5,5}, which means A has just reached the threshold of five consecutive successful votes. Therefore, A is marked as accepted. Once a vertex is accepted, all transactions in the vertex are considered final. Finality in Avalanche X-Chain only takes seconds, which is super fast compared with the Ethereum and Bitcoin blockchains.
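A minimal Go sketch of this bookkeeping follows. The DAG fragment is the linear A-through-E chain from the example, and the acceptance threshold of five consecutive successes matches the walkthrough above; the real X-Chain implementation tracks considerably more state:

```go
package main

import "fmt"

// vertex tracks the (chit, confidence, consecutive successes) trio.
type vertex struct {
	id          string
	parents     []*vertex
	chit        bool
	confidence  int
	consecutive int
}

const acceptThreshold = 5

// vote records a successful subsampling vote on v and, via transitive
// voting, credits all of v's ancestors as well.
func vote(v *vertex, accepted map[string]bool) {
	seen := map[string]bool{}
	var walk func(*vertex)
	walk = func(u *vertex) {
		if seen[u.id] {
			return // a vertex may be reachable via multiple parents
		}
		seen[u.id] = true
		u.chit = true
		u.confidence++
		u.consecutive++
		if u.consecutive >= acceptThreshold {
			accepted[u.id] = true
		}
		for _, p := range u.parents {
			walk(p)
		}
	}
	walk(v)
}

func main() {
	// A linear fragment of a DAG: A <- B <- C <- D <- E.
	a := &vertex{id: "A"}
	b := &vertex{id: "B", parents: []*vertex{a}}
	c := &vertex{id: "C", parents: []*vertex{b}}
	d := &vertex{id: "D", parents: []*vertex{c}}
	e := &vertex{id: "E", parents: []*vertex{d}}

	accepted := map[string]bool{}
	for _, v := range []*vertex{a, b, c, d, e} {
		vote(v, accepted) // each new block is voted in by subsampling
	}
	// After E's vote, A has reached A{true,5,5} and is accepted.
	fmt.Printf("A = {%v,%d,%d}, accepted: %v\n",
		a.chit, a.confidence, a.consecutive, accepted["A"])
}
```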
Snowman consensus for P-Chain and C-Chain
Unlike X-Chain, blockchains in both P-Chain and C-Chain form linear chains, just as with most layer 1 blockchains. In fact, a linear
blockchain is a simplified case of a DAG, where one block has exactly one parent block and one child block. It may temporarily have multiple child blocks, but eventually, the longest or heaviest chain wins. Both P-Chain and C-Chain in Avalanche implement Snowman consensus to secure the blockchain. Snowman consensus is a specialized implementation of Snowball consensus, which generates a linear ordering of blocks during chain formation and requires each new block to have exactly one parent block. In the case where multiple child blocks exist during a temporary fork, some branches will be pruned, and only one branch will eventually survive. You now understand how the built-in chains work in Avalanche. In the next subsection, we will introduce the concept of subnets and custom blockchains.
Subnets and enterprise blockchains
Avalanche made it easy to create custom blockchains through the concept of the subnet. A subnet is a subset of all Avalanche validators grouped together to secure custom blockchains. As we discussed in the overview of the Avalanche blockchain, the primary network, which comprises all staked validators, is a special subnet of Avalanche. The primary network is responsible for the security and liveness of all three built-in Avalanche chains: X-Chain, P-Chain, and C-Chain. P-Chain is responsible for the metadata management layer of all custom chains in Avalanche. In the network topology shown in the following diagram, two subnets, C1 and C2, are formed; C1 is created for securing the custom chain, while the C2 subnet is created for the enterprise chain. They can be permissioned or permissionless. All metadata regarding the subnets and the custom chains is managed through P-Chain:
Figure 4.17 – Avalanche subnet and custom blockchains
A subnet has a one-to-many relationship with custom or enterprise chains. That means one subnet can be used for many blockchains, but each blockchain can only be secured by one subnet. A validator can participate in multiple subnets. Similar to the three built-in chains, X-Chain, P-Chain, and C-Chain, each custom blockchain is an instance of a VM too. Once defined, the custom blockchain will be instantiated and live on one subnet. Together with the subnet, each custom blockchain defines its own execution logic, maintains its own state, and agrees on the blockchain state using its own consensus. Custom blockchains can have their own native tokens and crypto markets. They become independent networks, and nothing is shared between the custom blockchains. They have to communicate with each other through P-Chain, the platform chain. Avalanche allows anyone to create a default EVM-compatible VM or custom blockchain through simple configuration and a command-line interface (CLI). You can customize the default configuration and define a specific EVM custom chain based on your specific needs. But if you want to
build your very own blockchain, you have to implement your own VM interface. Similar to Polygon Edge, Avalanche provides a framework for creating custom blockchains through the concept of a VM. A VM is a blueprint for a custom blockchain in Avalanche. All blockchains are instantiated from a VM. A VM defines the interface to handle building, processing, and storing blocks on the blockchain. It is a custom execution layer that defines how transactions are executed in the subnet, how the states are maintained as part of state transitions, and how blocks are created in the custom chain. Taken from Ava Labs’ GitHub site, the following screenshot shows the VM interface in Go:
Figure 4.18 – The VM interface for a custom chain
To build a custom VM, you have to implement the ChainVM interface, which allows the subnet to build a new block via the BuildBlock method, notify the VM of a preference via the SetPreference method, and return the latest accepted block via the LastAccepted method. If you are interested, you should check out Avalanche’s subnet instructions for more details on how to build a custom VM (https://docs.avax.network/subnets/introduction-to-vm). Similar to other modular blockchains, the Avalanche subnet allows the decoupling of the execution layer, implemented through the VM, and the consensus layer. All VMs run on top of the Avalanche Consensus Engine, which allows validators in the subnet to agree on the state of the blockchain. AvalancheGo provides the consensus engine for every blockchain on the Avalanche network. You can check the AvalancheGo GitHub site for more details on how consensus and VMs work together to progress the blockchain (https://github.com/ava-labs/avalanchego/tree/master/snow). A simplified sketch of the ChainVM shape follows.
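As a rough sketch of the shape of this contract, the following Go interfaces model only the three methods just mentioned. The actual ChainVM interface in AvalancheGo has more methods, context parameters, and different concrete types, so treat this as an illustration rather than the real API:

```go
package main

// Block is whatever concrete block type the custom VM produces.
type Block interface {
	ID() [32]byte
	Parent() [32]byte
	Verify() error
	Accept() error
	Reject() error
}

// ChainVM is the contract between a custom VM and the consensus
// engine running in the subnet.
type ChainVM interface {
	// BuildBlock assembles a new block from pending transactions
	// when the consensus engine asks for one.
	BuildBlock() (Block, error)

	// SetPreference tells the VM which block the consensus engine
	// currently prefers as the head of the chain.
	SetPreference(blockID [32]byte) error

	// LastAccepted returns the ID of the most recently accepted block.
	LastAccepted() ([32]byte, error)
}

func main() {} // interface-only sketch; nothing to run
```

In the next subsection, we will give an overview of the governance model in the Avalanche platform.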
Governance in Avalanche chains
Earlier, we mentioned that P-Chain is responsible for staking and protocol governance. In fact, in Avalanche, anyone can stake AVAX tokens to become a validator and partake in Avalanche governance. Avalanche leverages on-chain governance for critical governance parameters, where governance participants can vote on changes to network settings and make network upgrade decisions democratically. The following governance parameters can be changed through governance voting:
The minimum staking amount
The minimum and maximum time a node can stake
Minting rate
Transaction fee structure
Staking rewards, fees, and airdrops
To avoid drastic changes in the governance parameters and ensure the predictability of the protocol, all governance parameters are also subject to limits on change frequency and range of change. This prevents the same governance parameter from being changed too often within a short period. It also restricts how large a change to a governance parameter is allowed to be. By restricting both change frequency and range, the system becomes more predictable. In addition, there is a set of consensus-related parameters, including those controlling how the subsampling works, which can be configured at blockchain initialization. Before we conclude the discussion of Avalanche blockchains, let us take a look at the Avalanche ecosystem in the next subsection.
Avalanche ecosystem
Avalanche built its blockchain infrastructure and consensus mechanism from the ground up. Its modular approach allows the decoupling of consensus and blockchain execution so that developers can leverage both an established EVM-compatible execution environment for smart contracts and DApps and the innovative Snow family of consensus protocols for scalability and fast finality. Avalanche claims it can achieve 4,500 transactions per second (TPS) and that transactions can reach finality within a second, which would make it the fastest smart contract platform in the layer 1 blockchain space. Since its mainnet launch in September 2020, it has become one of the leading DeFi and blockchain platforms. It is Solidity-compatible, which makes all smart contract development tools and DeFi protocols portable to Avalanche. In fact, most of the DeFi protocols we discussed in Chapter 3, Decentralized Finance, have extended their support to Avalanche.
The ability to build custom chains with high throughput and scalability, as well as fast transaction finality, makes Avalanche an ideal platform for certain use cases that failed to find success on other layer 1 platforms. The following categories of projects on Avalanche are worth mentioning:
Gaming—Avalanche found great success in the gaming industry. The instant finality, scalability, and high throughput of Avalanche chains afford the gamer an immersive gaming experience that is almost comparable with the gaming experience in the centralized world. The ability to create a subnet gives the game developer flexibility in offering differentiated features and operating in an insulated blockchain environment.
Enterprise and consortium blockchains—Avalanche subnets enable enterprise and institutional verticals to deploy permissioned EVM-compatible blockchains or build custom blockchains specific to the enterprise and the industry.
Web3 and Metaverse—The Avalanche platform seems to have positioned itself to provide one platform for supporting all kinds of use cases, from DeFi to NFT, from marketplace to exchange, and from public permissionless blockchains to permissioned enterprise or consortium blockchains.
In addition, Avalanche provides the built-in infrastructure for bridging assets from all custom chains within the platform. It leverages Intel Software Guard Extensions (SGX) enclave technology to secure bridged assets between Avalanche and the Bitcoin or Ethereum networks. Cross-chain interoperability and communication are complex subjects. We will provide an overview of different cross-chain interoperability options in the next section.
Bridging interoperability gaps between blockchains
So far, you have learned about quite a few L1 blockchains, including Bitcoin, Ethereum, BSC, and Avalanche. Most L2 rollups and sidechains are considered separate blockchains according to modular blockchain architecture, since they mainly offload transaction execution from the L1 chain and rely on the main chain for security. Limitations in mainstream blockchains such as Ethereum and Bitcoin, as well as the mass adoption of DeFi and other DApps, have propelled the growth of other blockchains, such as BNB Smart Chain, Avalanche, and Solana, and the transition to L2 rollups. Blockchains are like individual countries. Each has its own protocol rules, ecosystem, and cryptocurrency, which creates an interoperability gap between different blockchains. Without proper channels, you can’t take your assets or your data from one chain to another chain, or back and forth. Cross-chain interoperability refers to the means to communicate and share data across different blockchains and act upon information or events received from other chains. In the following subsections, we will discuss cross-chain integration challenges and go over common blockchain bridge patterns, as well as infrastructure and frameworks facilitating communications between blockchains.
Cross-chain integration challenges
Most blockchains are built from the ground up, with their own security and consensus mechanisms, their own protocol rules, and their own ways of managing assets and transactions. The first generation of blockchains, such as Bitcoin and Ethereum, was designed as standalone, sovereign, and self-contained networks powering their own crypto economies and ecosystems. Integration with off-chain systems was very limited. Integration with other blockchains and interoperability across chains were not natively built into blockchains. Let us say you want to transfer 100 ethers from Ethereum to another blockchain such as Avalanche. For such a simple asset transfer use case, you have to deposit 100 ethers to your Ethereum account and instruct Ethereum to transfer them to Avalanche. Ethereum has to wait until the
transaction is processed and finalized, and then transfer the custody of your 100 ethers and lock them into some smart contract so that it can communicate to Avalanche that there are 100 ethers to be transferred. Without proof of the deposit transaction, Avalanche may not recognize or trust such a request. Some entity has to watch for the deposit and lock events on Ethereum, wait for finality, and then notify Avalanche that it is safe to transfer. On the Avalanche side, it has to transfer the custody of the 100 ethers to an Avalanche smart contract or account, mint the AVAX tokens, and issue them to the user account on Avalanche X-Chain. Since Avalanche doesn’t recognize the ether as its native token, somewhere in between, both sides have to agree on the swap price between ether and AVAX or convert back and forth to wrapped tokens both can recognize. If you want to withdraw your 100 ethers back from Avalanche to Ethereum, all the steps have to be reversed. Any of these steps could cause transaction integrity issues, and any third-party entity introduced as a mediator could create trust issues and lead to security concerns. Asset transfer is just one such cross-chain interoperability use case. More sophisticated use cases, such as DeFi products and services across chains, require much more complex transaction coordination and workflows across chains. Despite the importance of blockchain interoperability and the pressing demand for such integrated solutions, there are many technological challenges, barriers, and concerns that make interoperability a hard problem to solve, including the following:
Transaction and finality—These are related to the differences in processing transactions and achieving finality among different blockchains.
Asset and value conformity—Blockchains have their own native currencies and tokens. Without an exchange-and-swap mechanism, one blockchain can’t recognize assets from other blockchains.
Atomicity and synchronization—Depending on the blockchain, cross-chain interoperability and communication involve many atomic, asynchronous transactions on both sides of the blockchains; it is not
possible to come up with one transaction that coordinates the steps on both sides and makes them atomic.
Trust and consensus—Cross-chain communication requires some entity, including any involved party themselves, to present evidence of transactions that occurred on one side, communicate the evidence to the other side, and coordinate actions on the destination chain. Trust and consensus issues refer to concerns about how to prove the validity of transactions from the source chain and allow the target chain to easily verify them, as well as which entity or entities you trust as mediators for facilitating such activities.
At its core, there is no security guarantee that an asset can safely land in the owner’s account on the other blockchain. Over a billion dollars in losses, due to security hacks of leading blockchain bridge protocols such as Wormhole, Nomad, and Polygon, have shown that blockchain interoperability has yet to mature.
Common cross-chain bridge designs
We discussed in Chapter 2, Ethereum Architecture and Ecosystem, that bridges are a common design pattern between the Ethereum mainnet and L2 rollup solutions. A common bridge design includes smart contracts on both sides of the Ethereum mainnet and the L2 rollup network. The user may deposit ethers and exchange them for L2 tokens, and then use the L2 tokens to process transactions and trade on L2 rollups. In such a design, the ethers are locked inside the bridge smart contract, which then mints the ERC-20 tokens on L2. When the user exits, the L2 rollup network simply burns the L2 tokens and returns the remaining ethers to the user. Without a direct bridge between L2 chains, a token transfer between Arbitrum and Optimism, two leading optimistic L2 rollups, has to go through the Ethereum mainnet to be completed, as the following diagram shows:
Figure 4.19 – Bridge options between L2 rollups
In general, a cross-chain interoperable transaction comprises multiple activities on both the source chain and the target chain. Evidence of activities that happened on the source chain must be communicated or relayed to the target chain for it to complete the activities there. How the target chain verifies the validity of the messages differs from one bridge design to another. At a high level, there are three categories of blockchain bridges, as follows:
Natively verified bridge—In this design, communication may be made directly between the source and target chain. Both chains rely on blockchain-native capabilities, such as a light client, to verify the message.
Locally verified bridge—In this design, communication may also be made directly between the source and target chain. Both chains rely on smart contracts to verify the message. This kind of verification normally covers only the transactions involved in cross-chain communication.
Externally verified bridge—The communication across chains may be helped by a third-party watcher, called a relayer, which collects
evidence of the transaction completion and presents that evidence on the target chain. Several notary-based bridges fall into this category.
Both natively verified bridges and locally verified bridges are point-to-point bridge solutions where communications are established without a third-party entity. With the externally verified bridge solution, you have to trust a centralized third-party entity or rely on a trustless decentralized network.
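Before getting into the verification categories, the following Go sketch models the generic lock-and-mint flow recalled at the start of this subsection. Both chains are plain in-memory maps in one process, which is precisely the simplification real bridges cannot make: the lock and the mint happen on separate networks, and proving one to the other is the problem the three designs below solve differently:

```go
package main

import "fmt"

// bridge is a toy, single-process model of a lock-and-mint bridge.
type bridge struct {
	locked   map[string]uint64 // ether locked in the L1 bridge contract
	l2Tokens map[string]uint64 // wrapped tokens minted on L2
}

// deposit locks ether on L1 and mints an equal amount of L2 tokens.
func (b *bridge) deposit(user string, amount uint64) {
	b.locked[user] += amount
	b.l2Tokens[user] += amount
}

// withdraw burns L2 tokens and releases the locked ether on L1.
func (b *bridge) withdraw(user string, amount uint64) error {
	if b.l2Tokens[user] < amount {
		return fmt.Errorf("insufficient L2 balance")
	}
	b.l2Tokens[user] -= amount
	b.locked[user] -= amount
	return nil
}

func main() {
	b := &bridge{locked: map[string]uint64{}, l2Tokens: map[string]uint64{}}
	b.deposit("alice", 100)
	fmt.Println("locked:", b.locked["alice"], "l2:", b.l2Tokens["alice"])
	if err := b.withdraw("alice", 40); err != nil {
		fmt.Println(err)
	}
	fmt.Println("locked:", b.locked["alice"], "l2:", b.l2Tokens["alice"])
}
```

We will go over all three categories of bridge design in the following sections.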
Natively verified bridge design
One common bridge design is the natively verified bridge based on light clients, where a light client of the target chain is deployed as a smart contract on the source chain and vice versa. As you learned from Chapter 2, Ethereum Architecture and Ecosystem, a light client is a node that carries all the blockchain header information, unlike a full-node client, which has to carry all the block details since the genesis block. A light client has all the information needed for the counterpart chain to confirm and verify that transactions happened on the source chain. The following diagram shows such a design:
Figure 4.20 – Point-to-point bridge with light clients
Obviously, bridge design with light clients can’t scale when a large number of blockchains are involved in such cross-chain transactions, since it requires on the order of N² pairs of light-client deployments to facilitate interoperability and communications among N chains. Another issue with the light-client design is that too much information is shared with the target chain, beyond the cross-chain-related transactions themselves. Let us say one of the transactions in a block on the source chain deals with an asset transfer to the target chain. Since that specific asset transfer transaction is packaged together with other transactions in block 1000, the light client will carry all the header information, not only for block 1000 but for all the other blocks and non-cross-chain transactions as well.
Locally verified bridge design
Another category of bridge solution is the locally verified bridge based on the hashed timelock mechanism. A hashed timelock is a time-based mechanism that restricts access to a certain resource or action using both a cryptographic secret and a deadline. It works by requiring the sender to lock funds against the cryptographic hash of a secret value, combined with a predetermined time limit. The receiver is required to provide cryptographic proof in the form of the secret behind the hash. If the correct secret is provided before the time limit elapses, the receiver is granted access to the resources or allowed to perform the action; otherwise, the funds are returned to the sender. Hashed timelocks have been used as a mechanism for cross-chain bridges, where cross-chain asset transfer is made to happen via a hashed timelock. The following diagram shows how crypto asset transfer over a hashed timelock-based cross-chain bridge works:
Figure 4.21 – Point-to-point bridge using hashed timelock technique In this type of bridge, a hashed timelock contract (HTLC) will be deployed on both sides of the blockchains. Let us say Alice needs to send 10 ethers on blockchain A to Bob on blockchain B. Let us assume both chains recognize ethers as the supported crypto asset type. The specific process for transferring 10 ethers will look like the following:
1. Alice deposits 10 ethers into an HTLC on chain A and initiates the cross-chain transfer to Bob on blockchain B.
2. Alice chooses a passcode and hashes it; the hash locks the funds inside the HTLC.
3. Alice sets the timelock on the asset locked inside the HTLC.
4. Alice then sends the passcode to Bob.
5. Once Bob receives the passcode, he uses it to unlock the funds before the timelock expires. If the lock time expires first, the funds are refunded back to Alice on blockchain A.
The hashed timelock-based bridge solves the trust problem inherent in the cross-chain process. As long as Alice keeps the passcode to herself until she is ready to share it with Bob, and Bob has a large enough time window to unlock and transfer the funds, Alice and Bob can complete the cross-chain transaction without the need for any trusted third party.
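Here is a minimal, single-process Go sketch of this flow. In practice, the lock lives in an HTLC smart contract on each chain; here the hashlock and timelock checks are plain Go, and the passcode and amounts are made up for illustration:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
	"time"
)

// htlc models a hashed timelock: funds locked against the hash of a
// passcode, claimable before a deadline and refundable after it.
type htlc struct {
	hashlock [32]byte  // hash of Alice's passcode
	deadline time.Time // after this, only a refund is possible
	amount   uint64
	claimed  bool
}

func newHTLC(passcode []byte, amount uint64, lockFor time.Duration) *htlc {
	return &htlc{
		hashlock: sha256.Sum256(passcode),
		deadline: time.Now().Add(lockFor),
		amount:   amount,
	}
}

// claim releases the funds to Bob if he presents the correct passcode
// before the timelock expires.
func (h *htlc) claim(passcode []byte) (uint64, error) {
	if time.Now().After(h.deadline) {
		return 0, errors.New("timelock expired; funds refundable to sender")
	}
	digest := sha256.Sum256(passcode)
	if !bytes.Equal(digest[:], h.hashlock[:]) {
		return 0, errors.New("wrong passcode")
	}
	h.claimed = true
	return h.amount, nil
}

// refund returns the funds to Alice once the timelock has expired.
func (h *htlc) refund() (uint64, error) {
	if h.claimed {
		return 0, errors.New("already claimed")
	}
	if time.Now().Before(h.deadline) {
		return 0, errors.New("timelock has not expired yet")
	}
	return h.amount, nil
}

func main() {
	lock := newHTLC([]byte("alice-secret"), 10, time.Hour)
	amount, err := lock.claim([]byte("alice-secret"))
	fmt.Println(amount, err) // 10 <nil>
}
```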
Externally verified bridge design
More common bridge designs rely on third-party entities as mediators. The mediator could be one or many trusted entities or many trustless entities. A trustless bridge has to rely on a blockchain to reach a consensus on what to relay to the destination chain. Such a blockchain is also called a relay chain, which we will discuss in the next subsection, together with Polkadot. In the case of trusted third-party mediators, as the following diagram shows, a set of trusted mediators is selected to collect evidence of events that occurred on the source chain, provide their signatures on the evidence, and present it to the target chain. The target chain acts upon the receipt of such evidence and completes the transaction:
Figure 4.22 – Bridge options with a third-party notary
There are three types of mediators in such a bridge design, as follows:
Single notary—In this case, the bridge sets up a small number of watchers, called relayers, to monitor transactions on the source chain. In Ethereum’s case, these could be the events emitted from smart contract executions and transaction logs. One of the watchers will be selected to sign the evidence and relay it to the target chain.
Multi-sig notary—In this case, the bridge sets up multiple watchers. All of them monitor events on the source chain, and upon collecting any cross-chain transaction evidence, they sign it as a multi-sig notary to relay the evidence to the target chain. The target chain makes sure that all required signatures are present. Similar to a multi-sig wallet, a multi-sig notary does introduce privacy concerns since the private keys of all parties have to be made available.
Distributed notary—In this case, a set of watchers is established. They all watch and monitor events on the source chain. When relaying the evidence to the target chain, they instead work together to provide a joint signature on the evidence, each using a piece of the key in a multi-party computation (MPC) approach. In the MPC approach, the unique key used to sign the evidence is split into multiple fragments, and those fragments are randomly distributed among a set of notaries. Compared with the single notary or multi-sig notary approach, this is considered to be more secure and better able to protect privacy.
Notary-based bridge design relies on the honesty of all notaries. If any of the notaries is compromised, the bridge may be open to security holes. In such cases, the bridge becomes the weakest link in the security of both the source and target blockchains. The sketch that follows illustrates the quorum check a target chain performs in the multi-sig notary design.
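Here is a minimal Go sketch of the m-of-n check, using the standard library's ed25519 signatures. The notary set, threshold, and evidence format are illustrative assumptions rather than any particular bridge's scheme; in the MPC variant, the target chain would instead verify a single joint signature, and individual key shares would never be exposed:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// notary holds one watcher's signing key pair.
type notary struct {
	pub  ed25519.PublicKey
	priv ed25519.PrivateKey
}

func main() {
	const n, m = 5, 3 // 5 notaries, 3 signatures required

	notaries := make([]notary, n)
	for i := range notaries {
		pub, priv, _ := ed25519.GenerateKey(rand.Reader)
		notaries[i] = notary{pub, priv}
	}

	// Evidence of a cross-chain event observed on the source chain.
	evidence := []byte("lock(100 ETH) in tx 0xabc... finalized")

	// Three of the notaries sign the evidence.
	sigs := map[int][]byte{}
	for _, i := range []int{0, 2, 4} {
		sigs[i] = ed25519.Sign(notaries[i].priv, evidence)
	}

	// The target chain verifies each signature against the known
	// notary set and counts the distinct valid signers.
	valid := 0
	for i, sig := range sigs {
		if ed25519.Verify(notaries[i].pub, evidence, sig) {
			valid++
		}
	}
	fmt.Printf("valid signatures: %d, quorum met: %v\n", valid, valid >= m)
}
```

In the next section, we will introduce trustless bridge solutions built using the Polkadot framework.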
Trustless bridge solutions using Polkadot
Polkadot is a next-generation blockchain framework for building connected, interoperable blockchains. It is a heterogeneous hub-and-spoke multi-chain infrastructure with built-in support for interoperability and a shared security model. At its core is the relay chain acting as a hub, with many parallel blockchains, called parachains, as the spokes. The relay chain is responsible for coordinating the system as a whole, including parachains, interacting with governance, and participating in nominated PoS consensus, a variation of the PoS consensus. Parachains can be application-specific blockchains responsible for transaction processing, state transition, and execution. In Polkadot, a parachain can be built using the Substrate framework and connected to the relay chain through parachain slots. Cross-chain communications between Substrate framework-based parachains are
natively supported within Polkadot. It has built-in support for cross-chain communication via a cross-consensus protocol. The cross-consensus message (XCM) defines the message format passed from one chain to another; the cross-consensus protocol specifies how the XCM message is transferred across the chains. Two message-passing protocols are defined in Polkadot: cross-chain message passing (XCMP) and vertical message passing (VMP). XCMP is used for messaging between parachains, and VMP is used between the relay chain and parachains. The relay chain uses XCMP to connect the message sender and receiver and transmit messages from the sender parachain to the receiver parachain. The cross-chain message, formatted per the XCM definition, is sent as the payload of the message. The same messaging mechanism is leveraged for building externally verified trustless cross-chain bridges. The following diagram shows how a Polkadot bridge works:
Figure 4.23 – Trustless bridge options with Polkadot
The bridge is made of a collator network and a number of collators. The bridge itself is a decentralized blockchain, a parachain connected to the relay chain via a parachain slot. The blockchain in the bridge is maintained by collators. All collators continuously watch for events from the blockchain that the bridge connects to and present cross-chain transactions as XCM messages to the collator network. The collator network has to reach a consensus on the validity of the messages before the messages are sent to the relay chain. The relay chain, once it receives the messages, will use XCMP to determine the destination parachains to handle the messages and, in turn, send the messages to the target parachains. On the destination parachain, the bridge is used to interact with the target blockchain to complete the cross-chain transactions.

Taking the same example, let us walk through how to transfer 100 ethers from Ethereum to Avalanche using a Polkadot bridge. For such a simple asset transfer use case, you deposit 100 ethers to your Ethereum account and instruct Ethereum to transfer them to Avalanche. Ethereum has to wait until the transaction is processed and finalized, and then transfer the custody of your 100 ethers to the bridge smart contract on Ethereum. The 100 ethers will be locked in the bridge smart contract. Collators on the Ethereum-Polkadot bridge will watch for the events emitted from the smart contract. If a majority of the collators agree on such events, the collators will format XCM messages and send them using XCMP to instruct the relay chain to transfer 100 ethers to Avalanche. Once the parachain and the Polkadot-Avalanche bridge receive such an ether transfer message, they will notify the bridge smart contract on Avalanche to mint a number of wrapped ether ERC-20 tokens on the Avalanche network, with a value equivalent to 100 ethers. The bridge smart contract will then deposit those wrapped tokens to the user-specified Avalanche account. When you want to exit the Avalanche network and take your assets back from Avalanche to the Ethereum network, the bridge smart contract will simply burn the remaining wrapped ether tokens and notify the parachain to return the equivalent ethers back to the user-specified Ethereum account on the Ethereum network. In addition to value transfer, Polkadot can be used as a generalized cross-chain communication solution. A bridge in Polkadot is a connection for transferring data across chains using XCM and XCMP as messaging mechanisms. These chains are standalone sovereign chains with their own protocol rules, execution engines, and governance. By leveraging the
Polkadot bridge, they can interact and integrate with each other. The bridges connect to the relay chain and are secured through the Polkadot consensus mechanism, maintained by collators. Check out the Polkadot site for more information (https://wiki.polkadot.network/docs/getting-started). Similar to Polkadot, Cosmos can be leveraged to build a trustless generalpurpose cross-chain bridge too. Let us examine how a Cosmos-based bridge works in the next section.
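Before turning to Cosmos, the lock-and-mint and burn-and-release flow from the walkthrough above can be summarized in a short Python sketch. This is a deliberately simplified, hypothetical model: real bridges replace the in-memory bookkeeping below with smart contracts on each chain, plus the collator consensus step that validates the emitted events.

class LockAndMintBridge:
    def __init__(self):
        self.locked = 0    # ether held by the bridge contract on the source chain
        self.wrapped = {}  # wrapped-ether balances on the destination chain

    def transfer_out(self, account, amount):
        # Source chain: lock native ether; after the collators agree on the
        # emitted lock event, an equivalent amount of wrapped tokens is minted.
        self.locked += amount
        self.wrapped[account] = self.wrapped.get(account, 0) + amount

    def transfer_back(self, account, amount):
        # Destination chain: burn wrapped tokens, then release the locked ether.
        assert self.wrapped.get(account, 0) >= amount, "insufficient wrapped balance"
        self.wrapped[account] -= amount
        self.locked -= amount

bridge = LockAndMintBridge()
bridge.transfer_out("avalanche:0xabc...", 100)   # 100 ethers locked, 100 wrapped minted
bridge.transfer_back("avalanche:0xabc...", 100)  # wrapped tokens burned, ether released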
Generalized bridge solutions with Cosmos
Cosmos offers more generalized blockchain bridge solutions using Inter-Blockchain Communication (IBC). In its original version, it primarily focused on the interoperability and connectivity of IBC-connected chains, as well as the creation of a multi-chain ecosystem that can support a wide range of DApps and services. It builds the infrastructure for an internet of homogeneous blockchains, each powered by BFT-based consensus mechanisms such as Tendermint. It provides developers with a stack of technologies, including Tendermint, IBC, the Cosmos SDK, interchain security, and liquid staking, to develop and operate a blockchain and join the Cosmos network. Private blockchains, such as Hyperledger Fabric and R3 Corda, as well as non-BFT-based chains, can be IBC enabled and interact with the Cosmos network.
In the Cosmos network, groups of blockchains, called zones, are connected to a central hub through secure and trustless channels. The hub is also called the hub zone. The hub acts as a central point of coordination and ensures that the different blockchains are able to communicate with each other securely and reliably. There can be multiple such central hubs, and one hub can connect to another hub. In this way, even without connecting to the same hub, a blockchain can reach another blockchain from another cluster of hubs and zones.
IBC is a more generalized cross-chain communication protocol. Similar to the TCP protocol for the internet, IBC handles the transport, authentication, and ordering of data between two blockchains. The following diagram shows how IBC-enabled blockchains communicate with each other:
Figure 4.24 – Generalized bridge options with Cosmos
IBC has two layers built into the protocol to facilitate cross-chain communication. The base infrastructure layer is the IBC TAO layer, which defines the transport, authentication, and ordering of packets. It provides the necessary infrastructure to establish secure connections, channels, and ports, and to transport and authenticate data packets between chains. The following is a list of key components in the IBC TAO protocol stack:
Port—A port in Cosmos denotes the type of application, such as a fungible token transfer or a token exchange. Ports are identified by a unique ID, and an IBC module can bind to any number of ports.
Channel—Channels facilitate communications, including sending, receiving, and acknowledging packets, between the IBC modules on both ends of the blockchains. Channels are established with a handshake.
Connection—Connections are responsible for facilitating all cross-chain verification of the IBC state. A connection can be associated with any number of channels. The connection is established through a handshake, whose purpose is to verify that the light clients on each chain are the correct ones for their respective counterparties.
Light client—Cosmos relies on light clients to keep track of the state of the counterparty blockchains and to verify proofs against the light client's consensus state. The relayer also uses the light client to verify the messages exchanged.
Relayer—In Cosmos, cross-chain messages are relayed from one chain to another through the IBC relayer. The relayer monitors the state changes of each chain and submits those updates to the counterparty chains.
On top of the IBC TAO layer is the application layer, or the IBC/APP layer, which defines exactly how data packets should be packaged and interpreted by the sending and receiving chains. Application use cases include, for example, fungible token transfers (ICS-20), NFT transfers (ICS-721), interchain accounts (ICS-27), and so on. To complete the discussion of blockchain bridges, let us quickly take a look at decentralized oracle-based bridges in the next section.
Decentralized oracle as a bridge
In both Chapter 2, Ethereum Architecture and Ecosystem, and Chapter 3, Decentralized Finance, we mentioned oracles as a mechanism for bringing off-chain data on-chain. In fact, many of the DeFi protocols we discussed in Chapter 3, Decentralized Finance, use oracles to access the off-chain pricing data of underlying crypto assets.
By definition, a blockchain oracle is an entity that allows the native blockchain to connect to external systems or data sources and enables smart contracts to execute based on off-chain inputs and outputs. A decentralized oracle network (DON), a decentralized network of oracle nodes, facilitates the creation of hybrid smart contracts, where on-chain smart contracts and off-chain infrastructure are integrated to support more advanced DApp use cases that react to off-chain events and interoperate with external systems. Chainlink is one notable implementation of a DON. In essence, all cross-chain interoperability problems are blockchain oracle problems. A blockchain oracle can be leveraged as a bridge solution to facilitate interoperability between blockchains. The following diagram illustrates such a bridge solution based on a DON:
Figure 4.25 – Bridge options with a DON
In this type of bridge solution, one blockchain can get inputs from the decentralized oracle network and write output data to the oracle for the other blockchain to use. The oracle becomes the medium of data exchange.
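As a concrete taste of how a DApp consumes oracle data, the following web3.py sketch reads the latest answer from a Chainlink price feed through the standard AggregatorV3Interface. The RPC URL is a placeholder, and the feed address shown is the commonly cited ETH/USD proxy on Ethereum mainnet; verify both against the Chainlink documentation before relying on them.

from web3 import Web3

# Minimal ABI covering only latestRoundData from AggregatorV3Interface
ABI = [{
    "name": "latestRoundData", "type": "function", "stateMutability": "view",
    "inputs": [],
    "outputs": [
        {"name": "roundId", "type": "uint80"},
        {"name": "answer", "type": "int256"},
        {"name": "startedAt", "type": "uint256"},
        {"name": "updatedAt", "type": "uint256"},
        {"name": "answeredInRound", "type": "uint80"},
    ],
}]

w3 = Web3(Web3.HTTPProvider("https://YOUR-RPC-ENDPOINT"))  # placeholder RPC URL
feed = w3.eth.contract(
    address="0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419",  # assumed ETH/USD feed proxy
    abi=ABI,
)
round_id, answer, started_at, updated_at, answered_in = \
    feed.functions.latestRoundData().call()
print(answer / 10**8)  # USD-denominated feeds report prices with 8 decimals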
To support the growing demand for cross-chain bridge and interoperability solutions, Chainlink is working on the Cross-Chain Interoperability Protocol (CCIP)—a new open source standard for cross-chain communication. Similar to XCMP, which we discussed with Polkadot, CCIP aims to be a general inter-blockchain messaging and data exchange protocol. As with IBC in the Cosmos network, CCIP is a layered protocol. CCIP is intended to establish a universal connection between hundreds of permissioned and permissionless blockchain networks. On top of that, it provides users with cross-chain application services, such as the Chainlink Programmable Token Bridge, various other bridge implementations, and the ability to create powerful cross-chain applications that span any blockchain network. CCIP is still under development. Check out the Chainlink site for more details (https://blog.chain.link/introducing-the-cross-chain-interoperability-protocol-ccip/). So far, we have discussed several leading EVM-compatible blockchains, as well as different bridge solutions for cross-chain interoperability. In the next section, we will quickly go through some of the leading non-EVM blockchains.
Glancing over non-EVM blockchain networks
There are several non-EVM blockchains worth watching too. Before we wrap up this chapter, let us take a high-level look at two of those non-EVM chains, namely the TRON blockchain and Solana.
TRON blockchain overview
TRON is a leading non-EVM open source public blockchain platform that supports smart contracts and can achieve high TPS. It has its own Turing-complete VM, called the TVM, which is very similar to the EVM in Ethereum and EVM-compatible networks. It supports smart contracts written in Solidity, with support for other popular smart contract languages planned. Its consensus mechanism is based on the DPoS algorithm: in the TRON network, 27 super representative nodes, elected in rounds every 6 hours, validate blocks and transactions and reach consensus via DPoS. TRX is the native cryptocurrency powering the TRON network and paying for gas. In addition, very similar to the ERC-20 token standard, TRON supports the TRC-20 fungible token standard via smart contracts. The following diagram shows a high-level architecture view of the TRON blockchain network:
Figure 4.26 – TRON blockchain architecture It has three layers, as follows:
Application layer—In this layer, DApps and customized TRON wallets can interact with smart contracts on the core layer. In addition to Ethereum-compatible JSON-RPC 2.0, TRON supports HTTP and gRPC APIs for DApps and wallets to send transactions to smart contracts on TRON.
Core layer—This layer defines the modules for smart contract execution, account management, and DPoS consensus.
Storage layer—This layer defines how the TRON network state is stored. It includes Chain Storage and State Storage, the distinctive distributed storage protocols created by TRON. Behind the scenes, TRON uses a graph database in the storage layer to manage its varied data storage needs.
There are three types of nodes on the TRON network: witness nodes, full nodes, and Solidity nodes. Witness nodes are responsible for block production, creating and voting on proposals, and governance. Full nodes are responsible for broadcasting transactions and blocks to the TRON network. Solidity nodes synchronize irrevocable blocks and provide inquiry APIs for accessing transactions and accounts on the chain. If you are interested in more details, please check out the TRON blockchain developer site (https://developers.tron.network).
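As a quick illustration of the Ethereum-compatible JSON-RPC support mentioned above, the following sketch queries the latest TRON block height. The https://api.trongrid.io/jsonrpc endpoint is an assumption based on TRON's public TronGrid gateway; substitute your own node's endpoint as needed.

import requests

# Query the latest block number through TRON's Ethereum-compatible JSON-RPC.
resp = requests.post(
    "https://api.trongrid.io/jsonrpc",  # assumed public TronGrid endpoint
    json={"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1},
    timeout=10,
)
print(int(resp.json()["result"], 16))  # result is hex-encoded, as in Ethereum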
Introduction to Solana
Solana is another L1 blockchain built from scratch with the goal of achieving high throughput and low transaction costs. It has attracted significant attention in the blockchain and DeFi community since its launch in 2020. Solana uses a unique mechanism called Proof of History (PoH) to order and validate transactions on its network. PoH is not a standalone consensus algorithm; rather, it complements Solana's PoS-based consensus by using a verifiable delay function (VDF) to generate a unique timestamp for each block. The timestamp is generated based on the inputs, the previous hash, and a sequence number; it is difficult to compute but easy to verify, so it can be used to prove that a block was created at a specific time without relying on a trusted third party.
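The essence of PoH can be illustrated with a toy sketch: a sequential SHA-256 hash chain into which events are mixed. Because each hash depends on the previous one, the chain must be produced one tick at a time, but anyone can re-run it to verify that a recorded event was mixed in before all later ticks. This is an illustrative simplification; Solana's actual implementation adds tick counting, entries, and VDF-style verification.

import hashlib

def tick(state, event=b""):
    # Each tick hashes the previous state, optionally mixed with an event.
    # Production is inherently sequential, but verification is trivially parallel.
    return hashlib.sha256(state + event).digest()

state = b"genesis"
history = []
for i in range(5):
    event = b"tx-batch" if i == 2 else b""  # record an event at tick 2
    state = tick(state, event)
    history.append((i, event, state.hex()[:16]))

for slot, event, digest in history:
    print(slot, event or b"-", digest)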
The network is called a Solana cluster. Anyone can stake SOL, the native token of the Solana blockchain, and become a validator. Solana is a leader-based blockchain: one of the validators is selected as the leader, and at any time, there is only one leader, who plays the role of block producer. The rest of the validators act as verifiers. The following diagram shows how the Solana blockchain works:
Figure 4.27 – PoH consensus and the Solana blockchain network
As shown in the preceding diagram, when transactions are submitted to the Solana cluster, one of the validators is selected as the leader. The leader sequences the transactions, calculates the timestamps of all transactions, records the chain state, and signs the chain state. The leader then packages all the transactions and the signed state, chunks the package into smaller pieces, and broadcasts them to the network. Each verifier assembles those pieces into a replica of the original package and verifies that the transactions are valid. All verifiers are required to sign the state and send their signatures back to the leader. Solana reaches consensus using a variation of the BFT consensus algorithm (known as Tower BFT).
PoH allows Solana to achieve high transaction throughput while maintaining a secure and decentralized network. It also allows the platform to verify the ordering of transactions without needing to wait for multiple confirmations, which makes it possible to process transactions faster than on other blockchain platforms. Check out the Solana site for more details (https://docs.solana.com/cluster/overview).
Summary
By now, you should have a good overall understanding of the leading EVM-compatible blockchains, including BNB Smart Chain (BSC), Polygon PoS, and Avalanche, as well as their ecosystems. To help you get a grasp of blockchain L1 ecosystems, we also briefly introduced you to two other leading non-EVM chains, the TRON blockchain and Solana. Each is thriving in its own DeFi and DApp ecosystem.
In fact, we are living in a multi-chain world today, and for the foreseeable future, there will continue to be multi-chain ecosystems. To help you understand how to transfer assets between different L1 blockchains, or how to interact with smart contracts on another L1 blockchain, we discussed various blockchain interoperability options, including both trusted bridge design patterns and trustless bridge mechanisms. We went through the more generalized cross-chain messaging options, including Polkadot and Cosmos IBC, as well as DONs such as Chainlink CCIP. The blockchain ecosystem is evolving, and interoperability and security continue to be pressing challenges in the blockchain community.
In the next chapter, we will take you through the rest of the Ethereum future roadmap, including all the pieces of scaling Ethereum. We will discuss what the end game means to the Ethereum community and what a rollup-centric roadmap looks like. We will also discuss how zkEVM works and the latest developments in zkEVM, an exciting area in EVM. To give you a complete understanding of blockchain and Web3, we will introduce key concepts of Decentralized Autonomous Organizations (DAOs), NFTs, and the Metaverse. For folks ready to jump into developing smart contracts and get their feet wet, you can skip the next chapter and jump into Part 2, Ethereum Development Fundamentals.
Deep Research and the Latest Developments in Ethereum
In Chapter 2, Ethereum Architecture and Ecosystem, we explained the key concepts in the Ethereum blockchain, including accounts, smart contracts, and the Ethereum Virtual Machine (EVM). We discussed the internals of Ethereum in detail, as well as how the EVM works when executing smart contracts. We introduced you to the concept of modular blockchain design, delved into the Beacon Chain, and looked at how Ethereum transitioned to PoS with the merge of Ethereum 1.0 and Ethereum 2.0. At the end of that chapter, we provided an overview of various L2 scaling solutions, including optimistic rollups and zero-knowledge (ZK) rollups. We discussed leading Decentralized Finance (DeFi) protocols extensively in Chapter 3, Decentralized Finance, and showed you popular EVM-compatible L1 chains in Chapter 4, EVM-Compatible Blockchain Networks. By now, you should have a good understanding of how DApps work and be ready to develop your very first decentralized application.
Scaling Ethereum is the major focus post-merge. In this chapter, we will start by looking at the challenges and considerations in distributed systems in general and introduce the schools of thought on scaling blockchain networks. We will then help you make sense of the rollup-centric Ethereum roadmap and discuss the various phases of the Ethereum roadmap post-merge. We will also delve into the deep research topics that aim to solve the various scaling puzzles in Ethereum. To help you understand and transition to the Web3 world, we will introduce the latest developments in Decentralized Autonomous Organizations (DAOs), Non-Fungible Tokens (NFTs), Web3, and the Metaverse.
The following topics will be covered in this chapter:
Understanding the challenges in distributed systems
Making sense of the Ethereum roadmap
Sharding and data availability sampling Discovering MEV and PBS zkEVM and EVM improvements Smart contract wallet and account abstraction DAOs NFTs, Web3, and Metaverse For those of you who want to get your feet wet and plunge into developing smart contracts, you can jump directly to Part 2, Ethereum Development Fundamentals, and come back later on to grasp various deep research topics in the Ethereum community.
Technical requirements For all the source code for this book, please refer to the following GitHub link: https://github.com/PacktPublishing/Learn-Ethereum-Second-Edition/.
Understanding the challenges in distributed systems In today’s world, distributed systems are everywhere. Internet, intranet, and mobile networks are examples of distributed network systems. More sophisticated ones include clusters, grids, and cloud infrastructure. Examples of distributed applications vary from client-server applications and service-oriented architecture (SOA)-based systems to massively multiplayer online games. Thanks to social, mobile, and cloud applications, large-scale distributed systems have evolved into an indispensable technology platform and ubiquitous always-on environment for businesses, consumers, and average citizens around the world. Technology advances in hardware, devices, and software have made heterogeneity, openness, and transparency less of an issue. However,
guaranteeing availability, security, and scalability and ensuring data consistency and fault tolerance are still major challenges that business and technology leaders face when developing large-scale distributed systems. It comes as no surprise that a decentralized peer-to-peer network is a distributed system at a global scale too. Designers of distributed systems have to make design choices and trade-offs to address these challenges and issues, and come up with products that achieve their design goals and meet real-world business needs. In distributed database systems and big NoSQL data platforms, the goal is to ensure the security and scalability of the system; the consistency and availability of data, along with fault tolerance in a distributed network, are the trade-offs and design choices. In a blockchain network, which is a distributed database running on a decentralized peer-to-peer network, decentralization and trustlessness are the ultimate goals, and a blockchain network has to make trade-offs between decentralization, security, and scalability. We will look at this in more detail in the next few sections, where we will examine design trade-offs in distributed systems in general and in blockchain networks in particular.
The CAP theorem
The CAP theorem is a fundamental concept in distributed system design. It states that a distributed system cannot simultaneously guarantee all three of the following properties:
Consistency
Availability
Partition tolerance
In particular, the CAP theorem suggests that the distributed system designer has to understand the trade-off between consistency and availability and prioritize one over the other. This is because network partitions can always occur, since there is no guarantee that network nodes are free of failure in a distributed, heterogeneous environment. Traditional RDBMSs, by contrast, ensure consistency and availability in a centralized system, as shown in the following diagram:
Figure 5.1 – The CAP theorem in distributed systems
For example, the Cassandra database, a massively scalable open source NoSQL database from the Apache Software Foundation, is the right choice for applications that track large amounts of data but can't afford to lose any. It provides high availability and can be scaled horizontally. The design choices made in Cassandra mean that it prioritizes availability and partition tolerance over consistency, although it can be configured to provide strong consistency, eventual consistency, or a hybrid of the two. CouchDB is in the same category. On the other hand, MongoDB is strongly consistent by default, which also means it compromises availability. It is a general-purpose, document-based distributed database that achieves horizontal scaling by sharding data across many servers. Its design goal is to offer strong consistency and partition tolerance (the C and P of CAP), making the trade-off for availability during
network partitions. It can be configured for eventual consistency too. HBase is in the same category.
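The consistency dial in stores such as Cassandra is often reasoned about with Dynamo-style read/write quorums: with N replicas, a write acknowledged by W nodes and a read that consults R nodes are guaranteed to overlap in at least one up-to-date replica whenever R + W > N. A minimal sketch of this rule of thumb:

# Dynamo-style quorum intuition used by Cassandra and similar stores:
# a read overlaps the latest write in at least one replica when R + W > N,
# which is what makes the read see that write (strong consistency).
def read_your_writes(n, r, w):
    return r + w > n

print(read_your_writes(n=3, r=2, w=2))  # True: QUORUM reads + QUORUM writes
print(read_your_writes(n=3, r=1, w=1))  # False: fast, but only eventually consistent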
Horizontal scaling versus vertical scaling
One of the most challenging problems in a distributed system is scalability. Scalability refers to the ability of a system or network to handle increasing amounts of work. In the context of software systems, it refers to the ability of a system to handle increasing numbers of users, data, or transactions. For example, a system is considered scalable if it can maintain or increase its throughput under an increased load when resources are added. Consider a typical three-tier application, as the following diagram shows, where a web server accepts a user's request and, depending on the request, the application server processes it and the final state is updated in the underlying database on the database server. When the number of users increases drastically—let's say 5-10 times the number of concurrent users suddenly access the system—how do you scale the multi-tier application to meet the required service-level agreement (SLA)? There are two ways to scale the system, as depicted in Figure 5.2, and the designer or architect of the multi-tier application needs to ensure each tier can scale independently so that, as a whole, the system meets the desired performance as concurrent users increase:
Figure 5.2 – Vertical versus horizontal scaling
One technique is vertical scaling. In this case, you increase the computing capacity of each layer by adding memory, storage, a more advanced CPU, and so on. Another technique is horizontal scaling, where additional servers are provisioned and added to the server farm in each layer. There are trade-offs between these two techniques. Vertical scaling makes server management straightforward, while horizontal scaling increases the complexity of server management. Horizontal scaling is considered more advantageous since, with today's cloud and virtualization technologies, it is possible to scale the system up and down based on user traffic; with vertical scaling, it is not easy to increase or decrease system capacity on demand. In the case of distributed databases, with both RDBMSs and NoSQL databases, approaches such as partitioning and sharding, together with design trade-offs, enable them to grow to a very large size while still being able to scale out and back in. A blockchain network is also a type of distributed system, and many of the scaling techniques from distributed systems have been applied to the scalability challenges in Ethereum too. Now, it is time to go over the different Ethereum scalability solutions and proposals and understand how the Ethereum community addresses various scalability issues.
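As a tiny illustration of the horizontal partitioning idea that sharding builds on, the following sketch routes records to shards by hashing their keys, so every node computes the same placement without coordination. This is a simplified model; production systems typically add consistent hashing and rebalancing.

import hashlib

def shard_for(key, num_shards):
    # Hash the key so records spread evenly and routing is deterministic.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shards = {i: [] for i in range(4)}
for tx in ["0xaaa...", "0xbbb...", "0xccc...", "0xddd..."]:
    shards[shard_for(tx, 4)].append(tx)
print(shards)  # each transaction lands deterministically in one of four shards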
Scaling Ethereum
Finding an Ethereum scaling solution is one of the most active research topics in the Ethereum community. As we discussed in Chapter 2, Ethereum Architecture and Ecosystem, it was long believed that a decentralized blockchain network can't achieve all three of the properties of security, decentralization, and scalability, a constraint known as the blockchain scalability trilemma. Therefore, blockchain designers designed their protocol rules and blockchain architectures by prioritizing two of the three attributes to solve the pressing needs of their ecosystems. However, modular blockchain architecture makes it possible to achieve all three attributes by optimizing each layer separately so that, as a whole, the system addresses the trilemma. This is manifested by the new Ethereum roadmap, as well as its Beacon Chain implementation, where decentralization and security are achieved through the PoS implementation of the Beacon Chain, and scalability and transaction throughput are achieved through L2 rollups, such as optimistic rollups and ZK rollups, or L2 plasma chains, such as Polygon PoS. Avalanche is another example of modular blockchain design, which aims to achieve scalability, security, and decentralization in its blockchain implementation. Avalanche solved its scalability challenges through a new consensus mechanism, called Avalanche Consensus. No matter how many nodes are in the network, Avalanche can achieve consensus and transaction finality within sub-seconds.
As a distributed system, the CAP theorem applies to blockchain networks too. In addition, since blockchains power billions of dollars of crypto assets, economic security is a major consideration in designing scaling solutions. The following are a few areas of concern from an economic security perspective:
Transaction finality: This refers to the point at which the decentralized network reaches a consensus that a transaction has happened and can't be undone. Currently, post-merge, it takes about two epochs (64 slots of 12 seconds each, or roughly 12.8 minutes) for a transaction to reach finality.
Transaction cost: Beyond how many transactions the network can process per second, the cost perspective asks how expensive those transactions are on the Ethereum network.
Sustainability: This is the game theory that incentivizes good behaviors and penalizes bad and malicious ones. Without it, the network is left vulnerable and may not be sustainable in the long run. Higher costs and slower finality may drive away network participants and lead to the network becoming less decentralized and secure. This is manifested by the rise of L2 scaling solutions, where transactions are moved off-chain.
Solutions that have been implemented or proposed fall into three categories: on-chain solutions, off-chain solutions, and consensus mechanism protocols. With the rollup-centric Ethereum roadmap, Ethereum seems to have settled on off-chain scaling solutions. These are supported through various L2 rollups, including both optimistic and ZK rollups, as we discussed in Chapter 2, Ethereum Architecture and Ecosystem. The PoS implementation through the Beacon Chain, as well as the merge with Ethereum 1.0, is a huge step in addressing scalability and economic security through consensus mechanism protocols. Sharding is the on-chain solution Ethereum is trying to tackle post-merge.
Similar to the scaling options in distributed systems, we can categorize Ethereum scaling solutions as vertical or horizontal. Increasing the block size is considered the vertical solution. As shown in the following diagram, instead of having many more small blocks, you can increase the block size and have more transactions packed into one large block:
Figure 5.3 – Ethereum scaling option with a large block size
However, like vertical scaling, increasing the block size requires nodes to possess better computing capabilities to process the transactions. This may lead to centralization and thus ultimately compromise decentralization and security, the main tenets of the blockchain, and Ethereum long shunned such scaling options due to these centralization concerns. However, with a slew of data-sharding options minimizing the impact on decentralization, a larger block size has now become a critical building block in Ethereum scaling solutions. One of the available horizontal scaling solutions, data sharding with up to 64 shards, may still be on the future Ethereum roadmap, but it won't be considered any time soon. L2 rollups are considered horizontal solutions too. As shown in the following diagram, many L2 rollups can
execute transactions independently on their own L2 chains and settle back into the L1 Ethereum chain:
Figure 5.4 – Ethereum scaling option with L2 rollups
Ethereum is taking on both vertical and horizontal options when it comes to scaling the Ethereum blockchain. In the next few subsections, we will see what the latest Ethereum roadmap looks like, as well as how Ethereum plans to achieve its goal of 100K transactions per second (TPS) through multi-pronged scaling solutions.
Making sense of the Ethereum roadmap
Ethereum has been taking a practical and adaptive approach to defining and adjusting its future roadmap. The original Ethereum 2.0 roadmap, defined in 2016-2018, was based on the monolithic concept where the blockchain does it all, achieving scalability through sharding and securing the system through PoS, with transactions spread across 64 shards. It was a massive undertaking: complexity arises when interlinking transactions across shards, even before considering quadratic sharding, a multi-layer sharding design that could further increase the capacity of the blockchain. Since the beginning of Ethereum, PoS has been the ideal state and ultimate goal in the Ethereum community. Implementing a PoS consensus and smoothly transitioning from PoW to PoS have been among the most challenging and painstaking efforts in Ethereum's history. Although it took much longer than planned, the merge of Ethereum 1.0 and Ethereum 2.0 and the implementation of the Beacon Chain were successfully executed in September 2022 without any hiccups or disruptions. Behind this journey is the evolution of blockchain technology, from the monolithic blockchain design of the first two generations of blockchains to the modular blockchain architecture of the third generation. With the modular blockchain architecture, the blockchain is supported by three decoupled but integrated layers, namely the consensus, execution, and data layers. Each layer can be optimized on its own with minimal or no impact on the system as a whole. As we discussed in Chapter 2, Ethereum Architecture and Ecosystem, the Beacon Chain becomes the consensus layer, and the original Ethereum 1.0 becomes the execution layer.
At the same time, due to huge demands from DeFi protocols and DApps, the common challenges of first-generation blockchains, including scalability, throughput, block space scarcity, and transaction cost, became pressing issues for Ethereum blockchain adoption. This led to the popularity of various L2 scaling solutions. Over the years, solutions such as state channels and plasma came and went. The more practical and modular L2 rollups, which offload execution to L2 and settle the transactions on L1 through fraud or validity proofs, became the best choice for the Ethereum community for addressing scalability issues and reducing transaction costs. We discussed both optimistic rollups and ZK rollups in Chapter 2, Ethereum Architecture and Ecosystem. Ethereum finally pivoted
to the rollup-centric approach as the critical piece to solving the Ethereum scalability puzzle. To support these L2 rollups, Ethereum became the final security and settlement layer for its ecosystem.
Pivoting to rollup-centric
To understand the rationale behind the rollup-centric Ethereum roadmap, let's revisit how data is managed and exchanged between the Ethereum base layer (L1) and L2 rollups. As shown in Figure 5.5, when transactions are sent to L2 rollups, the rollup validates the transactions, performs the computations and state transitions, and manages the state at L2. L2 rollups, in this case, are just like any other blockchain:
Figure 5.5 – Data exchange between L1 and L2
They need to access data from L1 to ensure the validity of the transactions, and transactions, in batches, need to be sent back to L1 for settlement, leveraging L1 to ensure global security. In the case of optimistic rollups, whoever challenges the validity of the transactions needs to access the data to construct the fraud proof. In the case of ZK rollups, the validator at L2 needs to access the data to submit the validity proof. In either case, the transaction cost is much lower than the cost of individual L1 transactions, since a large number of L2 transactions can be batched and bundled into a single L1 transaction when settled into the Ethereum base layer.
The rise of both optimistic rollups and ZK rollups demonstrated that rollups can act as scaling mechanisms for processing thousands of transactions, lowering transaction costs, and achieving higher TPS. Essentially, L2 rollups allow Ethereum to leverage every rollup as a sharded execution environment and provide horizontal scaling solutions to Ethereum. This has drastically reduced the complexity of sharding and simplified the sharding design in Ethereum, since it no longer needs to slice and dice smart contract execution and computation. However, Ethereum was not originally built to support these rollups. With a large number of transactions settled into L1, rollups have to rely on calldata to send large volumes of data back to Ethereum, and calldata is stored on the blockchain permanently, at least in the current Ethereum design. The limited space in Ethereum creates block space scarcity issues, which drive up transaction costs further.
Data sharding was the original design for exponentially increasing the blockchain space and lowering the transaction cost. The idea was to have 64 data shards coordinated through the Beacon Chain. Complexity increases when dealing with cross-linking transactions across shards, and implementing such a data-sharding mechanism may take quite a long time since it involves extensive protocol-level changes. With billions of dollars of crypto assets locked on Ethereum, economic security remains the top priority among all improvements and enhancements.
Ethereum may implement full data sharding eventually. But for the short term, to meet block space demands, it continues to focus on vertical scaling
solutions that gradually increase the block size. Proto-Danksharding is the first step, increasing the per-block data capacity from around 64 KB to 1-2 MB. The full implementation of Danksharding will further increase the block size to 32 MB, which is considered an essential step in scaling Ethereum up to 100K TPS. As we discussed in the Scaling Ethereum section, solutions based purely on a larger block size increase hardware and bandwidth requirements, which, in turn, is detrimental to decentralization and makes the network less secure. To address such issues, proposer-builder separation (PBS) has been placed on the roadmap to allow a more centralized pool of builders, with more advanced computing power and faster broadband bandwidth, to build the large blocks, while a separate set of proposers is solely responsible for proposing the blocks. The idea behind such a design is that the network remains secure so long as the validators are decentralized. To make it easier for every node to validate the blocks, regardless of whether they're staked as validators or are just network participants, validity proofs need to be simplified and the need to download the full dataset for validity proofing reduced. The KZG commitment scheme was proposed as the commitment mechanism to ensure data is available for sampling. Data availability sampling was proposed to ensure data is available when needed without downloading the full blockchain dataset, thereby minimizing the reliance on broadband bandwidth. The Verkle tree was proposed to simplify validity proofs without the need for the full dataset. The following diagram shows how all these proposals fit into the modular architecture of Ethereum:
Figure 5.6 – Proposals for scaling Ethereum Scaling Ethereum is like a complex puzzle, with each of the proposals helping Ethereum move to its end game. In the next subsection, we’ll review how Ethereum plans to finish implementing its rollup-centric roadmap.
Overview of the post-merge Ethereum roadmap
The rollup-centric Ethereum roadmap is much simpler than, and drastically different from, the original roadmap. The following diagram shows the different phases, or stages, in the latest Ethereum roadmap:
Figure 5.7 – Rollup-centric Ethereum roadmap
Let's briefly go over each phase and discuss the key building blocks and goals Ethereum is trying to achieve:
The merge: This is the first phase in the new roadmap and intends to create an ideal, simple, robust, and decentralized PoS consensus. The majority of the tasks in this phase were already completed as part of the merge of Ethereum 1.0 and Ethereum 2.0. In Chapter 2, Ethereum Architecture and Ecosystem, we discussed how the Beacon Chain works and how Ethereum transitioned from PoW to PoS consensus. Stake withdrawals and various improvements leading to single-slot finality still need to be accomplished.
The surge: This phase intends to support the L2 rollups and address scalability and throughput to achieve 100,000 TPS or beyond. The focus in this phase is to implement a data sharding solution through Danksharding and data availability sampling and achieve full rollup scaling. This is a step-by-step approach: implementing EIP 4844, also called Proto-Danksharding (an intermediate step that increases blockchain capacity and further reduces transaction costs for L2 rollups), will be the first step toward a full data sharding solution. We will discuss sharding and conclude the Ethereum scaling solutions in the next section.
The scourge: The scourge is a newly added stage that intends to improve the consensus protocol through PBS and avoid or minimize the risks of maximal extractable value (MEV). These items were originally scheduled further out, in the splurge phase. However, post-merge, block proposers are selected upfront for creating new blocks, and the computation, hardware, and network bandwidth demands of block creation increase the potential for centralization; separating the roles of the block builder and the block proposer helps the system improve decentralization and security. We will discuss PBS and MEV in the Discovering MEV and PBS section, after discussing sharding and data availability.
The verge: The verge intends to implement the Verkle tree, as well as transition the account states from the Merkle tree to the Verkle tree. The goal of the verge is to simplify the validation process on the blockchain and enable a fully SNARKed Ethereum through zkEVM and SNARKs for Verkle proofs.
The purge: The purge stage is reserved for historical data pruning and technical debt reduction. This includes the implementation of EIP 4444, which allows historical data older than one year to be purged. Various improvements and simplifications of the EVM were proposed in this phase, as well as the implementation of state expiry specifications. We will discuss zkEVM in detail and briefly touch on EVM improvements in the zkEVM and EVM improvements section.
The splurge: This is a catch-all phase that's reserved for fixing everything else, including various EVM improvement tracks, account abstraction, and further refinement of the EIP 1559 fee mechanism. One notable item is the implementation of the verifiable delay function (VDF), which we briefly mentioned in Chapter 4, EVM-Compatible Blockchain Networks, when we provided an overview of the Solana blockchain. We will discuss EIP 4337 and account abstraction in the Smart contract wallets and account abstraction section, but beyond that, we will not cover much of this phase in this book. If you wish to learn more, take a look at the Ethereum Foundation website.
In the next few sections, we will dive into the surge, scourge, and verge phases and discuss the latest developments and research in the Ethereum ecosystem before helping you understand the big picture and the end game of the Ethereum blockchain.
Sharding and data availability sampling
As we explained earlier, sharding is the major focus in the Ethereum community post-merge. It is considered the final piece of the puzzle in the overall Ethereum scaling solutions. With the rollup-centric roadmap, Ethereum has scaled back from full execution and data sharding to a much-simplified data sharding solution. In this section, we will dive deep into Ethereum's approach to data sharding. Sharding is not a new concept – it is a common scaling technique in distributed systems. It has been implemented in a variety of distributed database systems, from RDBMSs to many modern big data NoSQL databases. Essentially, sharding is a method for horizontally partitioning large datasets within a database. More specifically, the database is broken into little pieces called shards that, when aggregated, form the original database. In the following diagram, we can see that one large dataset can be sliced horizontally into two or more partitions, and each partition may be stored in a separate database instance:
Figure 5.8 – Database sharding
In a decentralized blockchain network, the network consists of a series of nodes connected in a peer-to-peer format, with no central authority. As is the case with the current blockchain system, each node stores all the states of the network and processes all of the transactions. To match the transaction scalability and throughput of Mastercard or Visa, Ethereum has to process large volumes of transactions much faster. L2 rollups may have taken care of the complexity and speed of computation and smart contract execution, but a huge amount of transaction data still needs to be stored on the blockchain. One logical approach, which we discussed earlier, is to increase the block size. But this may not fundamentally address the scalability issues, since every node still needs to validate and verify the larger block containing many more transactions, and will need more computing resources to handle that much load. That is where data sharding was originally considered. Instead of creating a larger physical block, which was considered detrimental to decentralization at that time, what if you were to slice it into smaller blocks, as shown in the following diagram, and have the smaller blocks stored in subsets of network nodes? Similar to database sharding, to find all the transactions and verify any particular transaction, you have to aggregate the transactions from all the shards for that period. When you aggregate all the transactions, you have a virtually large block. Figure 5.9 shows this data-sharding concept:
Figure 5.9 – Intuitive view of blockchain data sharding
The initial design of Ethereum data sharding follows the same rationale as the Ethereum merge, where the Beacon Chain coordinates the PoS consensus within the Beacon Chain and the executions through the execution client. The Beacon Chain attaches the Ethereum 1 data block to the beacon block and progresses the chain. In the case of data sharding, as shown in Figure 5.10, a large amount of transaction data is split into 64 sharded blocks, which are then linked to the beacon block. Each shard forms a blockchain of sharded blocks. The consensus of each sharded blockchain is maintained through a subcommittee of validators, which is randomly selected by the Beacon Chain:
Figure 5.10 – Initial Ethereum data sharding design
It looks like a simple concept, but complexity arises when dealing with cross-linking transactions across shards. Sharded chains may not be able to reach consensus at the same pace as the Beacon Chain since each sharded chain has its own validator pool. All the data being permanently stored on-chain will create other issues, which may lead to inevitable centralization. Ethereum has instead adopted a much-simplified sharding concept called Proto-Danksharding, which will evolve into Danksharding. Proto-Danksharding is considered the practical step toward, and prerequisite for, implementing Danksharding. Full Danksharding may take years to come to fruition, but Proto-Danksharding is moving at full speed and is expected to be implemented with EIP 4844 in mid to late 2023.
The idea behind Proto-Danksharding and Danksharding is that L2 transaction data doesn't have to be stored on-chain forever. For L2 optimistic rollups, once the fraud-proof period is over, transactions are considered final. For L2 ZK rollups, transactions can reach finality much faster. As time goes on, it becomes more and more unlikely for finalized transactions to flip, and the chance of finalized transactions turning out to be fraudulent becomes slimmer. From a separation of duties perspective, it is the L2 rollup that must interpret and validate the transaction data. Although data is submitted to L1 through calldata, Ethereum L1 doesn't validate the transaction data itself; all it does is make sure the data follows the L1 protocol when formatting calldata. Since Ethereum has pivoted to a rollup-centric roadmap and is targeting L1 as the final security and settlement layer, using a sharded chain for the shards may be overkill. These sharded blocks don't have to form a blockchain, which simplifies the design of sharding. All that is needed is temporal storage to hold L2 transaction data and make it available when required. Figure 5.11 shows the concept of Danksharding and Proto-Danksharding:
Figure 5.11 – Temporal blobs in Danksharding and Proto-Danksharding
Therefore, instead of providing more block space for transactions on L1, Danksharding, as well as Proto-Danksharding, provides more temporal space, as blobs of data, that allows L2 transactions to be held for a shorter period. It is like a sidecar running alongside the blocks on the Beacon Chain. The goal is to provide about 1 MB of storage through Proto-Danksharding, and then further expand to 16-32 MB with the full Danksharding implementation. With just Proto-Danksharding, this is around a 16 times increase over the current 50-100 KB of calldata space. The Ethereum protocol itself does not attempt to interpret the blob data, nor does it need to validate it. All L1 needs to do is check that the blob is available and make sure it can be downloaded from the network.
The data in these blobs can be used by L2 rollups that support high-throughput transactions. After a certain time, it will expire and be purged from the network. This may seem counterintuitive, since one of the blockchain principles, going back to the beginning of Bitcoin, is to keep transactions on the chain permanently so that they can be verified all the way back to the genesis block. Ethereum's view is that, so long as some entities host the historical blocks somewhere and make them available, you can trace them back to the genesis block when required. There is no need for every node, whether it is a light client or a full node, to download all the historical blocks in the entire chain. This begs the question: what if the entity holding the history of the entire chain is a malicious actor? The short answer is that it is not economically feasible to manipulate the historical transactions, as any manipulation can easily be detected; all the proofs have already been built into the sequence of the chain. The worst case in such a scenario is that malicious actors may not share historical data or may extort others for access to such data.
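A back-of-the-envelope calculation shows why temporal blob storage stays manageable for node operators. The numbers below use the roughly 1 MB-per-slot target discussed in this section; the 18-day retention window is an assumption taken from the EIP 4844 draft parameters (4,096 epochs) and may change:

# Back-of-the-envelope blob storage under the draft Proto-Danksharding targets.
SECONDS_PER_SLOT = 12
TARGET_BLOB_BYTES_PER_SLOT = 1_000_000           # ~1 MB-per-slot target
slots_per_day = 24 * 3600 // SECONDS_PER_SLOT    # 7,200 slots per day
bytes_per_day = slots_per_day * TARGET_BLOB_BYTES_PER_SLOT
retention_days = 18                              # assumed pruning window (4,096 epochs)
print(bytes_per_day / 1e9)                       # ~7.2 GB of blob data per day
print(bytes_per_day * retention_days / 1e9)      # ~130 GB held before pruning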
Proto-Danksharding
Proto-Danksharding, also called EIP 4844, is the first attempt to partially expand storage capacity within the Danksharding design and enable a full Danksharding implementation in the future. It introduced a new transaction type, known as a blob, to enable L2 to carry more transaction data to L1, instead of using the calldata data type. An L2 transaction that carries such a blob is also called a blob-carrying transaction. It is future-compatible and expected to be used in the full sharding solution. Figure 5.12 shows the high-level design of Proto-Danksharding:
Figure 5.12 – Design of Proto-Danksharding
In such a design, L2 rollups need to send batched transactions as blob-carrying transactions using the new blob transaction type, together with the KZG commitment hash, to the Ethereum base layer. As we discussed in Chapter 2, Ethereum Architecture and Ecosystem, such submissions from L2 will be picked up by the block proposer designated for that slot. As Figure 5.13 and Figure 5.14 show, the block proposer must validate the KZG commitment using the verification rules. It separates the blob data and adds the KZG commitments together with other beacon block data before sending it to the execution layer, the EVM, for smart contract execution and computation. Once the Eth1 block is generated, the block proposer packages the Eth1 block within the beacon block, attaches the blobs to the beacon block, and publishes the beacon block to the network. The blob is carried as a sidecar and is made available when validators or clients need to access the blob-carrying transactions.
Blob data, which is stored alongside the Beacon Chain, is still available to the block validator at this stage. Blob data can also be made available to L2 rollups when they need to submit fraud proofs or validity proofs. In both cases, the Beacon Chain allows L2 to check whether data is available via the KZG commitments. The following figure shows how fraud proofs work in L2 optimistic rollups. Any node at L2 challenging the transactions will submit the transactions in question as part of the calldata. On the beacon side, the blob verification, which was added with Proto-Danksharding, will check the original KZG commitments to determine whether the transactions in question are available:
Figure 5.13 – Proto-Danksharding transaction processing in optimistic rollups
In the case of ZK rollups, the ZK proof of the transactions is submitted alongside the original batch, as shown in the following diagram. The block proposer uses the point evaluation precompile, which was also added in Proto-Danksharding, to verify the proofs:
Figure 5.14 – Proto-Danksharding transaction processing in ZK rollups
Compared to calldata, a Proto-Danksharding blob is a low-cost solution for on-chain data storage. Blobs are designed to be temporary caches of L2 transaction data that expire after a certain period, and each can hold a large batch of L2 transaction data. The pure expansion of such storage will help Ethereum improve its transaction throughput to roughly 16 times its current level.
In the Proto-Danksharding phase, validators and clients still have to download the full blob content. Data bandwidth in Proto-Danksharding is targeted at 1 MB per slot, with 2 MB being the maximum. With each blob being 500 KB in size, Proto-Danksharding allows up to four blobs in each batch submission. To be future-compatible with data availability sampling when full Danksharding is implemented, the sidecar design with blob data allows data availability to be checked through the is_data_available() function, although this still relies on the full data to check availability in the Proto-Danksharding phase.
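To see how a blob-carrying transaction is tied to its blob, the EIP 4844 drafts have the execution layer reference each blob by a versioned hash of its KZG commitment: a version byte followed by the truncated SHA-256 of the commitment. A minimal sketch, matching the draft spec at the time of writing:

import hashlib

VERSIONED_HASH_VERSION_KZG = b"\x01"

def kzg_to_versioned_hash(kzg_commitment):
    # The blob-carrying transaction references its blob by this 32-byte value:
    # one version byte, then the last 31 bytes of sha256(commitment).
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(kzg_commitment).digest()[1:]

commitment = bytes(48)  # a KZG commitment is a 48-byte BLS12-381 G1 point
print(kzg_to_versioned_hash(commitment).hex())

Because only this 32-byte reference lives in the execution layer, the consensus layer can later prune the blob data itself without breaking the transaction's validity.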
Danksharding
Full data sharding, through Danksharding, may take years to come to fruition. Most of the on-chain mechanisms will already be in place after Proto-Danksharding. Danksharding will further expand the blob space to a minimum of 16 MB and a maximum of 32 MB. With each blob being 500 KB in size, full Danksharding will allow up to 64 blobs in each batch submission. Execution clients and rollups don't require many changes to adopt Danksharding, but due to the large blob space, significant changes need to be made in the consensus layer to make the large blobs work.
One such challenge is the data bandwidth requirement. Since the block proposer and validators still need to download the data to create the blocks and attest to them, much faster network bandwidth would be required to download the data and calculate and execute it within the defined slots, which, in turn, could lead to centralization. To address this challenge, Danksharding introduced the data availability sampling concept so that validators don't have to download the entire dataset. Once the block is created, the validators attesting to the candidate block only need to know whether the transactions are available. The idea is that instead of storing large blobs as sidecars on the consensus layer, with Danksharding, they are split into slices, and all the little slices are spread across 64 shards. 2D sampling at the consensus layer then runs on top of the shards to check data availability when inquired.
In addition, Danksharding also introduced the PBS concept to separate the role of block creation from block proposing. With such a design, the block builder is responsible for validating the KZG commitments, storing blobs as sidecars on the Beacon Chain, performing smart contract execution and state transitions, and building and packaging the candidate blocks for the block proposer to propose on the Ethereum network. We will discuss PBS alongside MEV in the Discovering MEV and PBS section.
Data availability sampling
To understand data sharding and data availability sampling, we need to study how data is sliced and stored in the Ethereum network. One intuitive approach is to evenly slice the entire dataset into shards and have individual nodes on the network each keep only a small number of slices of the data. The issue with this approach is that if any node withholds a slice, or is too slow to respond to an availability inquiry, the Ethereum network can't progress as expected. Instead of putting the same slice onto many nodes on the network, Ethereum resorts to the erasure coding technique, which ensures data loss can be prevented so long as half of the nodes can provide the individual slices they hold.
Erasure coding is a common storage technique that prevents data loss: data is broken into fragments, expanded and encoded with redundant data chunks, and stored across a set of storage media. In the event of storage damage, the data can be recovered from the surviving original and expanded fragments. As shown in Figure 5.15, the original data is broken into three fragments, and each is expanded into two additional chunks through polynomial expansion. The underlying polynomial encoding ensures that any three or more of the nine chunks can be used to recover the original data:
Figure 5.15 – Erasure coding for data loss prevention
The same technique is used to break up the data blobs and create redundancy. Figure 5.16 shows how erasure coding is used in Danksharding. Let's say that up to 64 blobs are submitted in one batch of L2 submissions. The transaction data is broken into 64 shards, with roughly one blob per shard. Each shard is broken into up to 1,024 chunks, with each chunk being roughly 500 bytes. For each chunk, the consensus layer uses polynomial commitments to create a redundant chunk as an expansion of the original data blob. For both the original and expanded chunks, the consensus layer creates a KZG commitment and attaches it to the data slices. At the end of this process, up to 131,072 chunks (64 shards times 2,048 chunks) are ready to be scattered across the Ethereum network. This happens on a per-slot basis, as the block proposer creates the candidate block:
Figure 5.16 – Erasure coding in Danksharding
To spread those chunks on the Ethereum network, Ethereum uses the validator subcommittee concept, where a randomly selected validator is assigned the duty of block creation and 128 randomly selected validators form a subcommittee to validate the candidate blocks. With a full data sharding implementation, the idea is to slice and dice the entire validator network and line it up for the duty of storing and attesting to small slices of original or expanded transaction data.
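Before looking at how these slices are assigned to committees, the polynomial expansion at the heart of erasure coding can be sketched in a few lines of Python. This toy version uses Lagrange interpolation over a small prime field (the real scheme works over the BLS12-381 scalar field and adds KZG commitments): k data chunks define a degree-(k-1) polynomial, extra chunks are further evaluations of it, and any k surviving chunks recover the data.

P = 2**31 - 1  # a small Mersenne prime; illustrative only

def lagrange_interpolate(points, x):
    # Evaluate the unique degree-(k-1) polynomial through `points` at `x` (mod P).
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # modular inverse of den
    return total

def encode(data_chunks, n):
    # Extend k data chunks to n total chunks (the first k equal the originals).
    points = list(enumerate(data_chunks))
    return [lagrange_interpolate(points, x) for x in range(n)]

def recover(known, k):
    # Recover the original k data chunks from any k surviving (index, value) pairs.
    return [lagrange_interpolate(known[:k], x) for x in range(k)]

data = [12, 34, 56]                  # k = 3 original chunks
chunks = encode(data, 9)             # 9 total chunks; any 3 recover the data
survivors = [(1, chunks[1]), (5, chunks[5]), (8, chunks[8])]
assert recover(survivors, 3) == data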
At the beginning of each epoch, in addition to randomly selecting 32 block proposers and subcommittees of 128 nodes to progress the chain (one proposer and one subcommittee per slot), the RANDAO process also creates 2,048 vertical committees and 2,048 horizontal committees. On a per-slot basis, there are 64 horizontal committees, so that each shard gets one horizontal committee to attest to the publication of the slices for the given blob. The 2,048 vertical committees attest to the slices for a given chunk position, with each of the 2,048 chunk positions assigned one vertical subcommittee. During each slot, a proposer is also selected for each shard. Each shard proposer is entitled to propose the blob, which includes the 1,024 chunks of original data and the 1,024 chunks of expanded data, as well as extra proofs that allow each part of the blob to be verified independently. The following diagram shows how the vertical and horizontal committees are established and how they work together to scatter the chunks across the network:
Figure 5.17 – Blob publish process in Danksharding
When an L2 transaction batch is submitted, the block proposer distributes the blobs to the shard proposers, one blob per shard proposer. The shard proposer publishes the blob to the appropriate horizontal subnet, along with the proofs of all its chunks. The other participants on the horizontal subnet publish the chunks to each vertical subnet they are in. Each participant in a vertical subnet verifies the proof and attests to the validity of the data chunk. If more than half of the vertical subnet agrees on the chunk, consensus on the chunk is reached. To support L2, either for fraud proofs or validity proofs, those chunks can be used to check data availability through sampling. This is a process where, through enough samples, evidence of data availability can be obtained via the proofs of the polynomial commitments. To put this all together, Danksharding's final form could look as follows:
Figure 5.18 – Full picture of data sharding
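One reason this design can remain decentralized is that a validator needs only a handful of random samples to be confident that the data behind a block is available. Because of the erasure coding described earlier, a blob is unrecoverable only if more than half of its extended chunks are withheld, so every successful random sample at least halves the chance that a validator is being fooled. A back-of-the-envelope sketch (the 50% threshold follows from the 2x erasure-coding expansion; the sample counts are illustrative):

# If a malicious proposer withholds enough chunks to make a blob unrecoverable,
# more than half of the extended chunks must be missing. A uniformly random
# sample then lands on an available chunk with probability below 0.5, so the
# chance that s independent samples all succeed against unavailable data is
# bounded by 0.5**s.
for samples in (10, 20, 30):
    print(samples, 0.5 ** samples)  # 30 samples: below one in a billion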
The data-sharding design above could still change, since the actual implementation of full Danksharding is years away. Without PBS, the concept may work, but the risk of this approach is that it may lead to validator centralization, since computing and validating such large volumes of transaction data would deter the participation of average node operators. It also highlights and exacerbates the risks of MEV. We briefly touched on MEV in Chapter 2, Ethereum Architecture and Ecosystem. In the next section, we'll look at MEV in detail and review how PBS works in conjunction with data availability sampling for a full Danksharding implementation, and how it addresses the challenges and risks of MEV.
Discovering MEV and PBS

As you saw in the latest Ethereum roadmap, the Scourge phase plans to implement PBS and address the systematic risks of MEV. We will start by providing an overview of MEV before discussing how MEV extraction works in Ethereum today and how PBS changes it.
Overview of MEV

In a distributed system like blockchain, without a centralized entity managing the order of the transaction flow, arbitrage opportunities always exist. Earlier in this chapter, we talked about the CAP theorem. With the existence of network partitions, a distributed system has to decide whether to prioritize consistency or availability. This applies to decentralized networks too. Due to the decentralized nature of blockchain, data may not be immediately consistent, although it is eventually consistent. Certain data may not be available until the transaction is considered final. Both properties create opportunities for arbitrage and value extraction. You may recall from Chapter 1, Blockchain and Cryptocurrency, that in a PoW-based blockchain system, all submitted transactions are added to the mempool, at which point miners race to mine the blocks and collect the transaction fees and block rewards. There is a delay between the time the transactions are broadcast to the network and the time a miner
picks up the transactions from the mempool and starts mining the blocks. The miner has the leeway to include or exclude certain transactions, or to issue competing transactions, to achieve the maximum profit from processing all those transactions. Since the mempool is public and transparent, any actor can submit competing transactions to take advantage of the price movement and capture value from the mempool. This led to the initial definition of MEV: miner extractable value. This rarely happens on the Bitcoin blockchain since transactions in the Bitcoin mempool are just simple payment transactions. Due to smart contracts and DeFi, it became noticeable and disruptive in Ethereum 1.0. MEV became popular in early 2021 when the Ethereum gas price skyrocketed. The high gas price propelled all kinds of strategies for identifying arbitrage opportunities and defining solutions to maximally extract profits from available transactions. At the earlier stage of blockchain, network participants relied on the transaction fees and block rewards to run and operate miners to progress and secure the blockchain. This is a healthy incentive that helps sustain the blockchain network. However, as the total value locked (TVL) in crypto assets grew (on Ethereum, in particular, due to DeFi), MEV began to pose potentially existential risks to Ethereum, as per the Flashbots report in 2020 (https://docs.flashbots.net). Ethereum 2.0, moving to PoS, didn’t alleviate these MEV issues and risks. With the Beacon Chain and PoS, block proposers are randomly selected at the beginning of an epoch, which creates opportunities for selected block proposers to be targeted for extracting maximal profit. With more DeFi products and constructs developed on the newly merged Ethereum, as well as the rise of various L2 rollups, the risks were exacerbated and the opportunities became more abundant. This was manifested by the various yield farming mechanisms, which we discussed extensively in Chapter 3, Decentralized Finance. To understand why that is the case, let’s explain where the profit comes from using one simple use case. Let’s say someone is submitting a large buy transaction for ETH. This will push the ETH price higher if it goes through. The miner can issue the same buy transaction and hope to
purchase the ETH before the price moves up. The price difference will be the profit, or the value, that the miner captures. In general, there are three patterns of MEV extraction:

Front-running: This is a process where the miner submits a competing transaction to get in front of unconfirmed transactions in the mempool. For example, if the miner or any third party discovers a large buy position for ETH in the mempool, they can submit the same buy transaction with a slightly higher transaction fee. The new transaction will be executed ahead of the original transaction since it has a higher transaction fee.

Sandwich-running: This is a process where the miner submits a pair of buy and sell transactions, before and after the unconfirmed transactions in the mempool. Using the same example as for front-running, if the miner or any third party discovers a large buy position for ETH in the mempool, they can submit a buy transaction with a slightly higher transaction fee and a sell transaction with a slightly lower transaction fee. The original transaction is sandwiched between the new buy and sell transactions.

Back-running: This is a process where the miner submits an opposite transaction after the unconfirmed transactions in the mempool, hoping to profit from the price movement. For example, if the miner or any third party discovers a large buy position for ETH in the mempool, they can submit a sell transaction with a slightly lower transaction fee. The new transaction will be executed after the original transaction.

We discussed yield farming in Chapter 3, Decentralized Finance, which can be leveraged as a strategy for MEV. Let’s say you are arbitraging across decentralized exchanges (DEXs) using a yield farming protocol. As we discussed in Chapter 3, Decentralized Finance, most DEXs, such as Uniswap and Sushiswap, implement automated market maker (AMM) algorithms to determine the buy and sell prices of the underlying crypto assets. The price of the same crypto token may differ significantly from one DEX to another. You could profit from the price difference by submitting a buy
transaction to buy the crypto assets at a lower price and a sell transaction to sell them at a higher price. Another scenario is the liquidation process, where a liquidated asset can be offered at a discount. The miner or validator can buy the liquidated assets at the discount and resell them ahead of the waiting buy transactions to make a profit. For most L2 rollups, transactions are processed in almost real time and batched for settlement on L1. The time delay in settlement can also create arbitrage opportunities.
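To see how a sandwich extracts value from an AMM, consider a constant product pool, where the reserves $x$ and $y$ always satisfy $x \cdot y = k$. The numbers below are purely illustrative and ignore swap fees and gas costs. Take a pool holding 1,000 ETH and 2,000,000 DAI ($k = 2 \times 10^9$), and a victim about to buy ETH with 100,000 DAI:

$$
\begin{aligned}
\text{Attacker front-runs with 100,000 DAI:} \quad & \Delta_{\text{ETH}} = 1{,}000 - \tfrac{2 \times 10^9}{2{,}100{,}000} \approx 47.62\ \text{ETH} \\
\text{Victim's trade now executes at a worse price:} \quad & \Delta_{\text{ETH}} = 952.38 - \tfrac{2 \times 10^9}{2{,}200{,}000} \approx 43.29\ \text{ETH} \\
\text{Attacker back-runs, selling 47.62 ETH:} \quad & \Delta_{\text{DAI}} = 2{,}200{,}000 - \tfrac{2 \times 10^9}{956.71} \approx 109{,}500\ \text{DAI}
\end{aligned}
$$

The attacker turns 100,000 DAI into roughly 109,500 DAI within a single block, while the victim receives about 43.29 ETH instead of the 47.62 ETH an unsandwiched trade would have yielded.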
MEV implementation in Ethereum

MEV was kind of like the wild west before the merge of Ethereum 1.0 and Ethereum 2.0. According to Flashbots, in Ethereum 1.0, 90% of miners used a slightly modified geth client to interact with the Flashbots MEV capture module and extract maximal value from the mempool. Such a client is also called mev-geth. Due to the large concentration of mining pools, MEV profits were largely captured by the largest mining pool operators. To mitigate the risks of MEV, Flashbots took the concept of mev-geth and developed the MEV Boost component, which acts as a sidecar to the Beacon Chain to help post-merge Ethereum democratize access to MEV. This architecture makes the roles in the MEV supply chain clearer. The MEV Boost architecture is shown in Figure 5.19:
Figure 5.19 – MEV Boost in Ethereum
To understand the MEV Boost architecture, first, we’ll go through the MEV value chain. When transactions are submitted to the Ethereum network, they are all put into the mempool. At each slot, a designated block proposer collects the transactions, builds the block, and proposes it to the network for validation. Someone needs to go through all the transactions, determine whether there is any value left on the table, and submit MEV transactions to capture that value. A block builder needs to include all MEV transactions in the block. Someone also needs to ensure the consensus client, as well as the designated block proposer, knows there is an MEV-carrying block out there. To do so, the MEV Boost architecture formalized the roles of the MEV searcher, the block builder, and the MEV relayer. An MEV searcher is responsible for finding profitable transactions and sending them to the block builder for inclusion in a block. If new transactions need to be created to capture the value, the searcher creates and submits MEV transactions to the block builder, instead of passing them through the public mempool. The block builder aggregates both the MEV transactions and the original public mempool transactions and sends the bundle to the MEV relayer, which communicates directly with the consensus client and the designated block proposer for that slot. The MEV relayer validates the transaction bundles and then passes them to the designated block proposer for inclusion in a block. There are multiple block builders and MEV searchers on the Ethereum network. Any searcher can connect to any builder to have their identified MEV transactions included in a transaction bundle. In the same way, there are multiple relayers on the network to help relay the bundles to the consensus client. The designated proposer evaluates all incoming bundles and chooses the bundle with the most MEV value. It sends the block with that MEV transaction bundle to the validators to attest and sign.
Proposer builder separation

PBS is a necessary step in implementing Danksharding. As we discussed in the previous subsection, as the blob size expands to 16 MB at a minimum and 32 MB at a maximum, Danksharding drastically increases the requirements for both computing and network capacity, therefore raising
the bar for becoming a validator. Another consideration is the MEV issue. As we discussed in the MEV implementation in Ethereum section, post-merge Ethereum puts the block proposer in a position to leverage ever more sophisticated MEV extraction strategies to determine what transactions to include, exclude, or add in order to capture the maximal profit. The complexity of these strategies propels block proposers to outsource the complex block-building and MEV extraction computations to a much smaller pool of high-performance computing operators. The consequence is that both issues make the network less decentralized. PBS is a design concept, or architecture pattern, that splits the block builder role from the block proposer role. The block builder is responsible for determining the order of transactions, building the execution block bodies, and bidding the new block bodies to the block proposer. The block proposer simply selects the highest bid and proposes the new block to the network. This design concept is not new. In the current design, MEV Boost implements a similar approach, where the proposer relies on the relayers to solicit the builders to create transaction bundles with maximal MEV value. The relayers act as the brokers between the block proposer and the builders. What is different in the future PBS design is that it turns the block-building process into an auction, where specialized actors with high-performance computing capacity can bid for and earn the right to build the blocks. With this design, the MEV supply chain is simplified. The relayers aren’t needed, and the protocol itself becomes the broker between the proposer and the builder. Regular validators only need to accept the highest bid. PBS is still in the design stage at the time of writing (January 2023). Together with data availability sampling, after full Danksharding is implemented, Ethereum will be able to address the challenges and issues around scalability and achieve 100K TPS. Only the block builders need to process the entire block. All other validators and users can verify the blocks efficiently through data availability sampling.
zkEVM and EVM improvements
At the heart of Ethereum is the EVM, which acts as the engine that powers the entire Ethereum network. It is the runtime execution environment for smart contracts and the blockchain. Quite a few lower-level protocol improvements have been proposed, including some at the EVM opcode level. There are many discussions regarding account model improvements, including further account data abstraction. If you’re interested, you can check out the Ethereum EIP site for details (https://github.com/ethereum/EIPs/issues). In the original Ethereum 2.0 roadmap, eWASM, the Ethereum version of WebAssembly (WASM), was considered the next generation of the EVM. WebAssembly is a W3C standard that defines the binary instruction set for a stack-based virtual machine. It allows high-level languages such as C/C++/Rust to be compiled and deployed on the web, similar to the JavaScript engine inside a web browser. It is supported by all four major browsers – that is, Safari, Chrome, Firefox, and Edge. The goal is to be able to execute code close to the metal and take advantage of common hardware capabilities available on a wide range of platforms. The eWASM specification defines a subset of WASM components to be supported by the newer EVM. Since blockchain requires deterministic behavior from smart contract execution, non-deterministic features in WASM were restricted. This also includes several system smart contracts that provide access to Ethereum platform features. It intends to provide improved EVM performance and make Ethereum a true world computing platform. As we discussed in Chapter 4, EVM-Compatible Blockchain Networks, the EVM, as the smart contract execution environment, has been adopted by many L1 blockchains too. Each provides some customization and improvements on top of the EVM to make it suitable for their chains. The popularity of L2 rollups, especially ZK rollups, in scaling Ethereum also propelled changes in the EVM roadmap. Post-merge Ethereum is gearing toward supporting ZKP and getting zkEVM ready as the general-purpose ZK implementation of the EVM. It is not clear where eWASM sits in the future of Ethereum. Some still believe that eWASM is the future of the EVM. If you’re interested, you can
check out the eWASM GitHub site for design rationales and project status (https://github.com/ewasm/design). For the rest of this section, we will focus our discussion on zkEVM and help you understand the types of zkEVM implementations on the market and how a general-purpose zkEVM works in concept.
Overview of zk-SNARK

In Chapter 2, Ethereum Architecture and Ecosystem, we briefly introduced ZK and ZKP as we discussed how ZK rollups work. A ZKP is a cryptographic approach that allows one party, the verifier, to verify a claim made by another party, the prover, without the prover needing to disclose any secrets. There are two types of ZKP: interactive and non-interactive. The most commonly used one in blockchain is the Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (zk-SNARK). At a high level, as its name suggests, zk-SNARK allows the prover to generate a short proof from its knowledge of a solution to a problem. In the case of Ethereum, the solution could be the block data or the state transition during block formation. The verifier can then use the proof to quickly verify the solution without the prover sharing its secret about the solution. Zcash, which we mentioned in Chapter 1, Blockchain and Cryptocurrency, was the first L1 blockchain to leverage zk-SNARKs to address blockchain privacy issues; there, zk-SNARKs are used to prove ownership of crypto assets without revealing the owner’s public key or identity. In mathematical terms, zk-SNARK works as follows. Let’s say that a function maps $A$ to $B$ – that is, $f(A) = B$, with $A \in \mathbb{R}^n$. If two parties know how the function works, one can compute the function with $A$ as input and receive $B$. In other words, the statement about the equation is true. Whoever wants to verify whether the computation is right can simply do the computation itself, so long as it can get hold of $A$. That is how PoW and PoS work today. Now, let’s say both parties don’t want a third party to know the input; they can introduce a secret key, $S$, and share it between the two parties. The mapping becomes as follows:
$$f'(A, S) = B'(S), \quad A \in \mathbb{R}^n$$

When one party computes the function with $A$ and the secret, $S$, and the result comes out as $B'$, the other party can verify the computation so long as $S$ is shared. No other third party would know how the computation is done without the secret. However, this still creates an issue: both parties need to share the secret. To avoid disclosing the secret, we can preprocess the input and generate a pair of secrets. One party can then use one key of the pair to do the computation, while the other party uses the other key of the pair to verify it. The following represents such a mapping:

Preprocessing: $P(A) = \{S_P, S_V\}, \quad A \in \mathbb{R}^n$

Computation: $f''(A, S_P) = B''(S_V), \quad A \in \mathbb{R}^n$

In this way, one party can prove and the other can verify the computation, and there is no need to disclose any secret information to the other party. The downside of this approach is that it requires a preprocessing stage that uses the input to generate the pair of secret keys. The key that’s used by the prover is called the proving key, while the key that’s used by the verifiers is called the verification key. To understand how zk-SNARK works, let’s go over some basic concepts and building blocks in a ZKP system:

Prover and verifier: These are the two main roles in the ZKP system. The prover is the party who can prove a statement without disclosing the secret to other parties, while the verifier is the party who can verify the proof from the prover without any knowledge of the original secret.

ZK circuit: There are two types of ZK circuits in the ZKP system. One is the arithmetic circuit, while the other is the Boolean circuit. The EVM, in its current state, is a Turing-complete state machine, which may be efficient at dealing with operations such as NAND and NOR, as well as memory. However, it may not be effective at processing the polynomial commitments in ZKP. An arithmetic circuit performs arithmetic operations, such as addition and multiplication, on the inputs and produces an output, using a directed acyclic graph (DAG) representation of a polynomial. In zk-SNARKs, the arithmetic circuit is used to perform mathematical operations on secret keys so that the prover and the verifier do not reveal their keys to the other parties. The following diagram is an arithmetic circuit of the polynomial $x_1 \cdot x_2 \cdot (x_2 + x_3)^3 \cdot (x_3 + 1)$:
Figure 5.20 – Arithmetic circuit of a polynomial

On the other hand, a Boolean circuit performs AND and OR operations by using logic values such as true and false on the inputs to produce an output.
zk-SNARK uses the Boolean circuit to create a logical representation of a statement to be proven in a ZK proof.

Rank-1 constraint system (R1CS): As we discussed earlier, an arithmetic circuit is a polynomial expression that computes a function from its inputs. In general, an arithmetic circuit can be represented in many different ways. One such representation is the R1CS, which presents the arithmetic circuit as a system of constraints that can be expressed as a set of vectors and matrices. The R1CS is then converted into a quadratic arithmetic program (QAP).

Quadratic arithmetic programs (QAPs): In a QAP, the constraint system is further expressed as a set of polynomial equations. The QAP is then used as the basis for the rest of the zk-SNARK pipeline, generating a proof of knowledge that enables the verifier to check the solution without learning any secrets. A small worked example of circuit flattening and the resulting witness follows this list.

Trusted setup, proving, and verification keys: As we discussed earlier, making zk-SNARK work requires a trusted setup. This is a preprocessing procedure that generates the proving and verification keys. The setup involves secret randomness, commonly referred to as toxic waste, which must be destroyed afterward; if it leaked, it could be used to generate false proofs. The prover uses the proving key to generate a proof of knowledge. Similarly, the verification key is used by the verifier to verify the proof generated by the prover. The verification key is derived during the same setup and is publicly available. The proving key and verification key are somewhat like the private and public key pair in a public key infrastructure (PKI).
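To make circuit flattening and R1CS less abstract, here is the classic illustrative example (popularized by Vitalik Buterin’s QAP write-up) of proving knowledge of an $x$ satisfying $x^3 + x + 5 = 35$ (the answer is $x = 3$). The computation is first flattened into gates with at most one multiplication each:

$$
s_1 = x \cdot x, \qquad s_2 = s_1 \cdot x, \qquad s_3 = s_2 + x, \qquad \text{out} = s_3 + 5
$$

The witness is the vector of all wire values, $w = (1, x, \text{out}, s_1, s_2, s_3) = (1, 3, 35, 9, 27, 30)$, and each gate becomes one R1CS constraint of the form $\langle a, w \rangle \cdot \langle b, w \rangle = \langle c, w \rangle$ for fixed coefficient vectors $a$, $b$, and $c$. The QAP step then interpolates these per-gate constraints into polynomials so that all constraints can be checked at once.

Now that you understand the concept of ZKP and the building blocks of zk-SNARKs, let’s discuss how they are used in different zkEVM implementations.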
Types of zkEVM implementations

A zkEVM is an EVM-compatible smart contract environment that supports proofs and verifications through ZK technologies. The following figure
shows an ideal zkEVM stack on the right that is compatible with Ethereum’s infrastructure, tools, and languages:
Figure 5.21 – Comparison of a non-ZK EVM and a fully ZK-compatible EVM

There are several zkEVM implementations in development or on the market. We mentioned several zkEVM implementations when we discussed ZK rollups in Chapter 2, Ethereum Architecture and Ecosystem, including Loopring, Starkware, zkSync, and Polygon. Loopring uses specific circuits to support simple payment and asset transfer transactions through its ZK
rollups. Starkware, zkSync, and Polygon all have their own implementations of zkEVMs. In addition, Scroll, a new player in the L2 ZK rollup space, is working on its own general-purpose zkEVM implementation. In general, these zkEVM implementations are categorized based on how compatible they are with the EVM stack and the Ethereum infrastructure. The following are the different categories of zkEVMs, according to Vitalik:

Type 1 – fully Ethereum-equivalent: This is fully compatible with existing Ethereum tools, development environments, and Ethereum’s infrastructure. All smart contracts in Solidity and all developer tools work seamlessly and flawlessly. Ethereum has been developing ideas for such zkEVMs through its research. If you’re interested, check out the zkEVM specifications at https://github.com/privacy-scaling-explorations/zkevm-specs.

Type 2 – fully EVM-equivalent: This is EVM-compatible, although it may not be fully compatible with the Ethereum infrastructure. Similar to type 1, smart contracts in Solidity and developer tools continue to work seamlessly and flawlessly, but it may introduce low-level changes to optimize proving and smart contract execution in the ZK environment. Scroll is a general-purpose zkEVM that aims to be fully compatible with the EVM and its infrastructure. We will use Scroll to discuss how a general-purpose zkEVM works. Some claim that Polygon Hermez falls into this category too.

Type 3 – almost EVM-equivalent: With this type of zkEVM, you start to see some incompatibilities with the standard EVM. Smart contracts may need to be rewritten or changed to run properly in this type of zkEVM. Some of the opcodes may not be supported. Some of the development tools may also not be supported due to such compatibility issues. Polygon Hermez may fall under this category too since it introduced some non-compatible micro-opcodes in its own zkEVM implementation.

Type 4 – high-level language-equivalent: This type of zkEVM introduces a new smart contract language. Some of them may provide translation tools to translate existing Solidity code into the new language
and claim to be Solidity-compatible. But to understand how the internals of such a zkEVM work, you have to understand the new language, its intermediate representations of the smart contract code, and potentially a whole new set of opcodes. Existing smart contracts will likely need to be rewritten, and common Ethereum developer tools may not be supported. Starkware falls under this category. It introduced a new smart contract language called Cairo and a bytecode-level language called Cairo Assembly. ZKSync 2.0 claims to be EVM-compatible, except for a couple of already deprecated opcodes and one soon-to-be-deprecated opcode. It also introduces its own language, Zinc, for smart contract development.

In addition, Vitalik also defines a type 2.5. These zkEVMs are similar to type 2 but may choose to adjust the gas costs for zkEVM execution and ZK proofs. The following table summarizes where the known zkEVM implementations belong:

| | Type 1 – fully Ethereum-equivalent | Type 2 – fully EVM-equivalent | Type 3 – almost EVM-equivalent | Type 4 – high-level language-equivalent |
| --- | --- | --- | --- | --- |
| Scroll | | X | | |
| Polygon | | X | X | |
| ZKSync 2.0 | | | | X |
| Starkware | | | | X |

Table 5.1 – Categories of existing zkEVM implementations

We will use Scroll to discuss how zkEVM works in the next subsection.
Workings of zkEVM
Scroll is an EVM-equivalent L2 ZK rollup for scaling Ethereum. Similar to the ZK rollups we discussed in Chapter 2, Ethereum Architecture and Ecosystem, Scroll is built upon three major pieces. The first is the sequencer, which collects the L2 transactions, executes smart contracts, and creates blocks on L2; execution traces are generated as a result of block formation. The coordinator is the middleman: it collects the execution traces and randomly selects one of the rollers from the roller network to generate the proofs. The roller runs the core piece of the Scroll L2 ZK rollup implementation, its zkEVM, which is used to prove the correctness of EVM execution on L2. The roller creates the witness and commitments based on the execution traces and recursively generates proofs for each block using EVM circuits. Once the proofs are generated for all the blocks in the batch, the coordinator selects one roller to aggregate all the proofs and sends the proof aggregate to the L1 rollup contract as the ZK proof for the entire batch. In this way, the batch transactions from the Scroll rollup can be settled on the Ethereum mainnet. The zkEVM takes the pre-state commitment, smart contract code commitments, and the witness of the pre-state, after-state, and smart contract code to create the after-state commitment and the state transition proof. Within an EVM circuit, it recursively uses RAM/storage and opcode circuits to complete the proof that the state transition from pre-state to post-state is valid. To learn more about how these proofs are generated, check out the Scroll blog: https://scroll.io/blog/proofGeneration. The following figure shows a high-level Scroll rollup architecture:
Figure 5.22 – Scroll zkEVM and ZK rollup architecture

The good part of this architecture is that the rollers can execute independently, and the proofs can be generated in parallel. In theory, instead of PoW or PoS consensus, a blockchain system could rely on zk-SNARKs to ensure the blockchain’s integrity and the progress of the chain. A decentralized pool of provers could assume the role of the block proposers, and the network nodes could act as the verifiers to verify the state transitions. This sounds like a long stretch, but with the technological advancements in both hardware and ZKP technologies, this may someday become a reality. In the next section, we will discuss EIP 4337 and the concept behind smart contract wallets and account abstraction.
Smart contract wallets and account abstraction

According to the post-merge Ethereum roadmap, account abstraction is slated to be supported as part of the Splurge phase. There are two proposed EIP solutions for account abstraction, where one requires the protocol to
change and the other doesn’t. The solution that requires consensus layer protocol changes may take much longer to implement and also demands thorough testing. In the meantime, EIP 4337, which doesn’t require protocol changes, has been implemented to support account abstraction and smart contract wallets. Once EIP 4337 is fully adopted, little may be left to do in the Splurge phase in terms of account abstraction. Let’s start by understanding what EIP 4337 is in the next section.
Account abstraction and EIP 4337

As we discussed in Chapter 2, Ethereum Architecture and Ecosystem, Ethereum supports two types of accounts: externally owned accounts (EOAs) and contract accounts (CAs). To interact with Ethereum, users have to create EOAs; only through those EOAs can users send transactions and execute smart contracts on Ethereum. EOAs are controlled by private keys. They are used to pay gas fees and sign transactions or messages. Contrary to the programmability and flexibility of CAs, an EOA’s functionality is very limited. This creates user experience issues. In addition, the loss of the private key makes an EOA permanently inaccessible. Account abstraction is the proposed solution for making smart contract wallets natively supported on Ethereum. A smart contract wallet is a wallet that’s controlled by a smart contract instead of private keys. It can create transactions and interact with Ethereum. It attempts to give users the programmability of smart contracts to improve the security, privacy, and overall user experience of their accounts. With the programmability of smart contract wallets, users can define their own security rules and recover their accounts when their private keys are lost or compromised. A user can share account security across trusted devices or individuals, which enables social recovery. You can pay someone else’s gas fees or batch transactions together. The user experience is greatly improved with the support of smart contract wallets. EIP 4337 makes it much easier to build smart contract wallets and
much safer to use them. With a smart contract wallet, the user does not need to keep track of a private key or seed phrase. Instead, the user can simply interact with the smart contract through a Web3 wallet or DApp. Instead of making consensus layer protocol changes, which would require a hard fork and extensive testing leading up to the launch, EIP 4337 is a proposed standard for supporting account abstraction without the need for protocol changes. In addition to enabling social recovery of a wallet, EIP 4337 also allows multiple operations to be performed atomically in the same transaction. It gives users the flexibility to pay transaction fees with ERC-20 tokens, have their gas fees paid for them, and more. DApp developers get more bells and whistles to offer innovative features – for example, sponsoring gas fees on DApps. In the next subsection, we’ll learn how EIP 4337 works.
How a smart contract wallet works

EIP 4337 is an Ethereum improvement proposal that defines smart contract wallets and an account abstraction implementation. Figure 5.23 illustrates the key components and transaction flow proposed in EIP 4337:
Figure 5.23 – EIP 4337 account abstraction architecture

The following components are proposed in EIP 4337:

The UserOperation transaction: This was introduced for smart contract wallets to send transactions to the Ethereum network on behalf of a user. It is a pseudo-transaction object that is almost the same as a normal Ethereum transaction but adds more attributes to support the needs of smart contract wallets, including initCode, callData, callGasLimit, verificationGasLimit, preVerificationGas, maxFeePerGas, maxPriorityFeePerGas, and paymasterAndData.

The UserOperation mempool: Once submitted, all UserOperation transactions go to a separate mempool, the UserOperation mempool, rather than the public mempool.

Bundler: The bundler’s role is similar to that of a block builder in Ethereum. Bundlers pick up user operations from the separate mempool and package them into a single transaction, as a bundle, for inclusion in a block. Nodes on the Ethereum network can choose to act as bundlers.
EntryPoint: EntryPoint is a global singleton smart contract on the Ethereum network that handles the user operations in a bundle. It defines the handleOps function for the bundler to call to execute all the operations. EIP 4337 uses a two-phase approach for transaction execution. The first phase is a simulation that verifies that the signature is correct and that the operation pays its fees. If the simulation fails, the transaction is dropped from the bundle. The second phase is the actual execution of the user transactions. If the smart contract account doesn’t exist yet, an account can be created before execution.

Smart contract wallet account: All accounts have to implement a validateUserOp function and an execute function for the entry point to call during execution. The first function is for the validation phase; the actual execution is done through the execute function. A minimal sketch of such an account follows this list.

In addition, EIP 4337 defines the aggregator interface for smart contract wallets that work with aggregated signatures. It also allows smart contract wallets to sponsor transactions for other users through the paymaster component. When a paymaster is set for the operation, the entry point executes the user operations according to the logic defined in the paymaster interface. This is particularly useful for DApp developers who wish to attract users and promote user adoption. For example, the paymaster can be used to subsidize gas fees for new users, allow DApp users to pay fees with DApp-specific ERC-20 tokens, and so on. Account abstraction is a fast-moving implementation that attempts to drive the mass adoption of Ethereum and DApps. If you’re interested, check out the EIP 4337 page for a more in-depth discussion of its design and implementation (https://eips.ethereum.org/EIPS/eip-4337).
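The following Solidity sketch shows what a minimal EIP 4337 account could look like. It is illustrative only, based on the component descriptions above: the UserOperation struct fields mirror the attributes listed earlier, the signature check is deliberately simplified, and the fee prepayment logic a real wallet must implement is omitted:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Pseudo-transaction object carrying a user's intent (fields per EIP 4337)
struct UserOperation {
    address sender;
    uint256 nonce;
    bytes initCode;
    bytes callData;
    uint256 callGasLimit;
    uint256 verificationGasLimit;
    uint256 preVerificationGas;
    uint256 maxFeePerGas;
    uint256 maxPriorityFeePerGas;
    bytes paymasterAndData;
    bytes signature;
}

contract SimpleWallet {
    address public owner;
    address public immutable entryPoint; // the global EntryPoint singleton

    constructor(address _owner, address _entryPoint) {
        owner = _owner;
        entryPoint = _entryPoint;
    }

    // Phase 1: called by the EntryPoint during simulation; returns 0 if valid.
    // A real wallet verifies userOp.signature against userOpHash and prepays
    // missingAccountFunds to the EntryPoint; both checks are elided here.
    function validateUserOp(
        UserOperation calldata userOp,
        bytes32 userOpHash,
        uint256 missingAccountFunds
    ) external view returns (uint256 validationData) {
        require(msg.sender == entryPoint, "only EntryPoint");
        require(userOp.sender == address(this), "wrong sender");
        return 0; // 0 signals a valid operation
    }

    // Phase 2: called by the EntryPoint to carry out the user's intent
    function execute(address dest, uint256 value, bytes calldata data) external {
        require(msg.sender == entryPoint, "only EntryPoint");
        (bool ok, ) = dest.call{value: value}(data);
        require(ok, "call failed");
    }

    receive() external payable {} // allow funding the wallet for gas
}
```

Because all calls are funneled through the EntryPoint, the wallet’s own code, rather than a private key held directly by the user, decides what counts as a valid operation; this is where rules such as social recovery, spending limits, or multi-device approval would be implemented.

In the next section, we will discuss Decentralized Autonomous Organizations (DAOs) and their governance.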
DAOs

DAOs play a critical role in many blockchain applications and DeFi protocols. Many of the DeFi protocols we discussed in Chapter 3,
Decentralized Finance, rely on DAOs to gather the liquidity to jump-start the project and to automate governance decisions. In the following sections, we will provide an overview of DAOs, explain how DAOs work, and use the Uniswap DAO to illustrate how this leading DeFi protocol is governed.
Introduction to DAOs

Traditionally, an entrepreneur or a group of entrepreneurs spots a market opportunity and forms a corporation to address the market need. Typically, starting a traditional corporation requires you to submit a list of required documents. For example, if you want to start a company in the state of Virginia in the USA, you need to provide the name of the company, a business address, a registered agent, the incorporator’s name and address information, the total number of shares, and an effective start date. You need to specify who the officers or directors are. Following that, you need to pay a filing fee, get insurance, and open a bank account. Once it is in full operation, you will need to file annual reports and conduct regular shareholder meetings. A traditional corporation is a vertical, centralized, hierarchical structure, and the organization’s decisions are made at the top and disseminated downward through the reporting hierarchy. A DAO is a decentralized organizational structure that uses blockchain, crypto assets, and Web3 technologies to govern organizational functions. It leverages blockchain and smart contracts to allocate organizational resources, coordinate management activities, and automate governance decisions. Compared to traditional organizations, forming a DAO requires far fewer formalities. The founding members and the participants of the DAO are not required to fill out any paperwork to start a DAO. Typically, the founding members, who share similar goals and objectives, form a DAO to address certain market needs or take on certain missions. It differs from the traditional organizational model in the following ways:

Funding: Traditional corporations rely on a funding source from the founding members. They normally require large upfront investments to
get funded and started. They may try to secure additional funding from investors in exchange for a portion of the company, representing shareholder value. This may create a misalignment in organizational missions and make governance decisions opaque. On the contrary, a DAO may start with an initial token offering through an ICO or any other means, as we discussed in Chapter 3, Decentralized Finance. The funding of a DAO is aligned with its governance, which prescribes the tokenomics of its tokens in a smart contract. The smart contract defines an incentive mechanism and a monetary policy for the supply and demand of the tokens. The tokenomics of the token determine how and when new tokens should be generated or removed from the system. These funding decisions are automated via the smart contract when certain conditions are met.

Ownership: The ownership of traditional corporations manifests in the form of shareholder value. In a DAO, token-based ownership is used. The tokens you own give you the right to vote and participate in DAO governance.

Structure: Traditional corporations rely on the hierarchical reporting structure to make and disseminate organizational decisions. This keeps the organization aligned with the corporate missions and strategies from the top down. A DAO tends to operate under a flat and decentralized structure since there is no central decision authority. All organizational decisions should already be prescribed and enacted through the smart contract.

Operations: Traditional corporations may leverage technologies to automate their business processes and operations. A DAO normally relies on the blockchain and smart contracts to maintain its daily operations.

Governance: The critical difference between a traditional corporation and a DAO is governance. Traditional firms rely on management, in particular the chief executive officer (CEO) and the board of directors, to make organizational decisions, set the organizational direction, and ensure it is on the right path.
A DAO lays down the rules and policies of organizational decision-making in the form of smart contracts. It is transparent to everyone how the organization’s decisions are made and when those decisions are enacted. The decisions are auditable and traceable as part of smart contract execution on the blockchain. When certain governance aspects need to change, an on-chain governance management process, such as voting, is used to determine and effectuate the changes. The DAO has emerged as the main organizational model in the decentralized and Web3 world. There is no standard way to classify these DAOs. Some classify DAOs based on their purpose and mission, defining them as corporate DAOs, community DAOs, or protocol DAOs. Other classifications further subdivide DAOs into grant DAOs, social DAOs, collector DAOs, venture or investment DAOs, media DAOs, social media DAOs, entertainment DAOs, and more. The protocol DAO is one of the most common DAO types on the market today. It is built for the governance of decentralized protocols. Most of the DeFi lending and borrowing, exchange, and derivative and insurance protocols we discussed in Chapter 3, Decentralized Finance, fall under this category. The immediate benefits of the DAO as an organizational model include transparency, efficiency, and security. It removes the human and emotional parts of decision-making and automates the organization’s decisions through smart contract execution.
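To make token-based governance concrete, the following is a deliberately minimal, illustrative Solidity sketch of token-weighted voting; the token interface, the 3-day voting window, and the simple-majority rule are all assumptions made for this example rather than the design of any particular DAO:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Any ERC-20 style governance token exposing balanceOf will do
interface IGovToken {
    function balanceOf(address holder) external view returns (uint256);
}

contract SimpleDAO {
    struct Proposal {
        string description;
        uint256 forVotes;
        uint256 againstVotes;
        uint256 deadline;
    }

    IGovToken public immutable token;
    Proposal[] public proposals;
    mapping(uint256 => mapping(address => bool)) public hasVoted;

    constructor(IGovToken _token) {
        token = _token;
    }

    // Anyone can propose here; a real DAO would gate this on token holdings
    function propose(string calldata description) external returns (uint256 id) {
        proposals.push(Proposal(description, 0, 0, block.timestamp + 3 days));
        return proposals.length - 1;
    }

    // One token, one vote: voting weight equals the caller's token balance
    function vote(uint256 id, bool support) external {
        Proposal storage p = proposals[id];
        require(block.timestamp < p.deadline, "voting closed");
        require(!hasVoted[id][msg.sender], "already voted");
        hasVoted[id][msg.sender] = true;

        uint256 weight = token.balanceOf(msg.sender);
        if (support) {
            p.forVotes += weight;
        } else {
            p.againstVotes += weight;
        }
    }

    // Simple majority after the deadline decides the outcome
    function passed(uint256 id) external view returns (bool) {
        Proposal storage p = proposals[id];
        return block.timestamp >= p.deadline && p.forVotes > p.againstVotes;
    }
}
```

Note that reading the live balance at vote time is exploitable (tokens can be borrowed just to vote); production governance systems such as Uniswap’s therefore snapshot voting power at the block a proposal is created, using checkpointed token balances and delegation.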
DAO governance case study using Uniswap

As we discussed in Chapter 3, Decentralized Finance, the Uniswap development team pioneered the constant product automated market maker mechanism for the exchange of crypto assets. The team had responsibility for the key decisions in the design and implementation of the initial Uniswap protocol. Following the protocol’s initial launch in November 2018, the team launched its governance token, UNI, in September 2020 to enable a self-sustainable and competitive decentralized exchange. This gave rise to the Uniswap DAO, the Uniswap community, which is in charge of protocol
governance and protocol development. UNI tokens give the token holders the right to vote, or delegate votes, on new development proposals, as well as on changes to the governance attributes that drive the daily operation or infrastructure of the Uniswap protocol. According to Uniswap (https://uniswap.org/blog/uni), one billion UNI tokens were minted and shared among the Uniswap development team and its community members, investors, and advisors. The distribution to each group is as follows:

60% of UNI tokens go to the community members

21.266% of UNI tokens are allocated to the Uniswap development team

18.044% of UNI tokens are allocated to the Uniswap investors

0.69% of UNI tokens belong to a group of advisors

UNI provides the mechanics for voting rights in Uniswap governance. Uniswap employs a three-step governance process: the temperature check, the consensus check, and the governance proposal. The following figure shows this governance process in the Uniswap DAO. Once enough votes are garnered at the end of the governance proposal process, a formal implementation of on-chain code is enacted to effectuate the protocol changes:
Figure 5.24 – Uniswap governance process

Collectively, by holding UNI tokens, the token holders can participate in governance over the following key functions in the DAO:

Uniswap governance

UNI community treasury

The protocol fee switch

The uniswap.eth Ethereum Name Service (ENS) name

The Uniswap default token list (tokens.uniswap.eth)

SOCKS liquidity tokens

If you’re interested in learning more, you should check out the Uniswap website for more details regarding its governance: https://uniswap.org/governance. Another golden example of protocol governance is MakerDAO, which we covered when we introduced the MakerDAO DAI and stablecoins in
Chapter 3, Decentralized Finance. You are encouraged to go through the websites of the leading DeFi protocols to understand how each protocol or platform is governed through on-chain voting and smart contracts. In the next section, we will delve into a related but broader subject and walk you through new concepts in NFTs, Web3, and the Metaverse. Stay tuned.
NFTs, Web3, and Metaverse

In Chapter 2, Ethereum Architecture and Ecosystem, we briefly introduced the concepts of Web3 and the Metaverse. We also discussed NFTs as part of the Ethereum token standards in Chapter 3, Decentralized Finance. In this section, we will further illustrate what the Metaverse is and how blockchain, NFTs, and Web3 play critical roles in it, while offering our own, admittedly biased, view of what the future looks like in a world of NFTs, Web3, and the Metaverse.
Introduction to a world of NFTs, Web3, and Metaverse

There is no standard definition of what Web3 or the Metaverse is. A common consensus, especially within the blockchain community, is that Web3 is the next generation of web and internet infrastructure. It is a continuation of the progression from Web1, which we saw in the 1990s, through Web2, which we have today, to a future decentralized infrastructure that is powered by blockchain and enables digital ownership. The following figure summarizes the evolution of the web and the internet’s infrastructure over the last 30 years:
Figure 5.25 – The evolution of the web’s infrastructure

To understand what the future holds for the Metaverse, let’s explain how we got here and why Web3 is the logical and critical building block of the Metaverse’s future. Web1, commonly known as the World Wide Web, started as a decentralized network of information and took off in the 1990s. The first generation of use cases on the web was mostly information sharing. The activities that users could do on the web included sharing and retrieving information from static websites. Late in the 90s, e-commerce, user interaction, and user-generated content started to emerge. Use cases such as blogs, chat, social media platforms, and large e-commerce sites became mainstream. Web2 is often called the read-write web for this reason. Together with the maturity of cloud, social, and data technologies, Web2 played a critical role in digitalizing business processes, innovating digital business models, and vastly improving the end user experience. Web3 refers to the blockchain and decentralized technology stack that enables the development of decentralized applications and gives users control over their identity and data, which is manifested by the concepts of
decentralized identity (DID), crypto-assets, and NFTs. DAOs give the participants control over how the organization or community is governed. Combined with immersive experience technologies such as augmented reality (AR), virtual reality (VR), and extended reality (XR), it creates an ecosystem where users can play, transact, and interact with each other in the Metaverse, which is a digital reflection of the physical reality we live in.
The current state of NFTs

As we explained in Chapter 3, Decentralized Finance, NFTs are created out of blockchain technology and live on a decentralized network. Ethereum made it easy for anyone to mint an NFT out of almost anything. It could be something meaningful, such as a real asset in the physical world; it could be a collectible that holds value through ownership; or it could be a moment – something that may happen, is happening, or has happened. NFTs have been a crypto phenomenon since the start of the COVID-19 pandemic in 2020, and the market reached its peak in 2021. The rise and fall of NFTs in 2022 reflects the general sentiment of the crypto market. To understand why NFTs make sense, let’s use an art collectible as an example of how NFTs fill some gaps in today’s world. Consider an art collectible created by an artist and then sold on to a series of art collectors. Traditionally, the artist would only benefit from the first sale; all subsequent sales would benefit only the seller of the collectible. NFTs, by tokenizing the art collectible, enable the artist to collect royalties on all such sales. Earlier collectors, as the initial investors in the collectible, can also collect some portion of the proceeds if the NFT is designed that way. The rationales behind NFTs are digital asset ownership, copyright protection, and royalty collection, all protected through blockchain, smart contracts, and decentralized identity. ERC-721 and ERC-1155 are the two notable interfaces for creating and minting NFTs on Ethereum. An NFT allows you to record token symbols, unique token IDs, ownership identifiers, metadata, transaction history, as well as internal or external storage identifiers. Smart contracts define the trade and
ownership transfer rules. Over the last few years, NFTs have been created for all kinds of things, including events and moments in the real and virtual worlds, ranging from real-world assets, artworks and collectibles, event tickets, and music and media to virtual items such as games, memes, and moments. An NFT marketplace is a centralized or decentralized platform for buying, selling, and trading NFTs. These platforms allow buyers to discover and buy the NFTs listed on the marketplace and enable sellers and owners to store, display, and list their NFTs so that they can be sold to buyers. One notable example is OpenSea, which is the first and largest digital marketplace for crypto collectibles and NFTs. Check out the OpenSea website for more details: https://opensea.io.
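To illustrate the royalty idea described above, here is a minimal, illustrative sketch of an NFT contract that advertises a creator royalty via the ERC-2981 standard. It assumes the OpenZeppelin contracts library is available; the 5% rate and the contract name are arbitrary choices for this example:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// ERC721Royalty combines ERC-721 with the ERC-2981 royalty standard,
// exposing royaltyInfo(tokenId, salePrice) for marketplaces to query
import "@openzeppelin/contracts/token/ERC721/extensions/ERC721Royalty.sol";

contract ArtCollectible is ERC721Royalty {
    uint256 private _nextId;

    constructor(address artist) ERC721("ArtCollectible", "ART") {
        // Route 5% (500 basis points) of every resale price to the artist
        _setDefaultRoyalty(artist, 500);
    }

    function mint(address to) external returns (uint256 tokenId) {
        tokenId = _nextId++;
        _safeMint(to, tokenId);
    }
}
```

One caveat worth noting: ERC-2981 only reports the royalty; actually paying it is up to the marketplace contract that executes the sale, which is why royalty enforcement has remained a marketplace policy rather than a protocol guarantee.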
Web3 and the future of the internet

There’s no denying it – Web2 advanced the world by supplying the best user experience, providing instant and seamless access, and enabling convenience through digitization and digitalization. But people sacrifice privacy and ownership for that convenience and easy access. Web3 is supposed to bridge that gap and enable ownership, identity, and privacy. The following figure shows a blockchain-centric view of the Web3 world:
Figure 5.26 – Blockchain-enabled Web3 technology stack

From this, we believe DLT, blockchain, and decentralized networks will be the infrastructure foundation for the next generation of the web. The leading L1 blockchains are building a web and internet infrastructure to power the finance, e-commerce, social, gaming, and payment activities of this new world. Blockchain natives, including smart contracts, crypto token standards, central bank digital currency (CBDC), and L2 rollups, could become the building blocks that power all social, gaming, and commerce activities in this new world. The ownership of user-generated content, data, identity, as well as digital assets, will be guaranteed by the underlying infrastructure. A new set of do-and-earn models has emerged. In GameFi, you can play a game, and your activities in that game earn you incentives or crypto assets. With SocialFi, your content contributes to the health of a social platform and earns you a certain portion of the platform’s profits. As part of DAO governance, you can earn additional incentives by participating in DAO governance and protocol activities. Thanks to blockchain technology, you own your identity, content, data, rights, and assets, which are censorship-resistant, and no one can take them away from you.
Anyone can interact with the Web3 world through their wallets, engage with others through immersive experiences, and buy, sell, and trade their assets through decentralized marketplaces. That leads us to the Metaverse.
The Metaverse, virtual reality, and future interactions

With or without blockchain as a major building block, the world is diving into a new world known as the Metaverse. This is manifested by the fact that all the big tech companies are investing heavily in the Metaverse. Facebook even changed its name to Meta to put its focus on the future of social interactions in the world of the Metaverse. What is missing from today’s Web2 world is ownership and an immersive experience. Ownership can’t be delivered by the traditional technology stack alone. Technologies such as DLT, blockchain, decentralized networks, and decentralized identity are playing, and will continue to play, critical roles in ensuring and guaranteeing ownership. What exactly the future of the Metaverse will look like is still murky. We hope the following analysis framework will help you understand what the future of a Metaverse world holds:
Figure 5.27 – The future of the Metaverse

At the center is a digital ME. Putting on our decentralized lenses, the digital identity is the decentralized identity (DID), which puts you back in control of your information while ensuring a verifiable, decentralized digital identity. Trust is guaranteed by blockchain technology and verifiable credentials, with us as the decentralized identifiers. Blockchain, as well as the DAO, is the digital infrastructure and governance structure that can ensure trust among decentralized parties. A DID is supposed to work across different Web3 platforms and can be used to prove ownership of NFTs, social media accounts, and other assets on the blockchain. At the bottom are the digital twin and digital ownership. The Web3 technology stack, together with smart contracts, NFTs, and blockchain technologies, enforces the ownership of digital and crypto assets. You own
the content and data you generate, and through smart contracts, you can earn incentives and certain ownership of digital and crypto assets. The digital twin facilitates a real-time connection between the digital ME and the physical ME, and enables interactions and synchronization between the virtual and physical worlds. To ME, it is an omnichannel experience: a real-time representation of a unique individual that allows ME to be present in both the physical and digital worlds simultaneously and provides a seamless experience, regardless of whether I am in the physical world, the digital world, or crossing between the two. At the top, it is about interaction and experience. Technologies in VR/AR/XR will enable a new set of digital experiences and interactions. Altogether, we think blockchain and decentralized networks will become the internet infrastructure of the future. No one is sure how big a role blockchain will play in the future Metaverse, but blockchain and smart contracts will make it easy to implement ownership and identity in this new world. Take, for example, Decentraland, one of the leading Metaverse platforms, which allows users to create, explore, and monetize digital things in the virtual world. The platform provides an immersive social experience by allowing users to socialize and interact with each other through chat features, attend virtual events, and explore the diverse environments created by other users. Decentraland is governed by the Decentraland DAO, a decentralized autonomous organization that is controlled by its users. The platform supports a decentralized economy, similar to the economy in the physical world. Decentraland is built on the Ethereum blockchain, which enables users to buy, sell, and trade virtual land and digital assets using the platform’s native cryptocurrency, MANA. The virtual world in Decentraland is divided into parcels of land, which are represented as NFTs on the Ethereum blockchain. These parcels can be purchased using MANA and provide the owners with complete control and ownership rights over the content and experiences within their land. The ownership of virtual assets is secured by the blockchain, ensuring provable scarcity and verifiable ownership.
Within Decentraland, you can create and customize 3D scenes, structures, and digital objects using its development tools. These digital creations can then be traded in the marketplace. Imagine that, someday, you could tokenize your property in the physical world and put it on Decentraland or another Metaverse platform as an NFT with your own decentralized identity. By doing so, you can host social events or do business as a DAO seamlessly in both the virtual world and the physical world. This is not a far-fetched idea, and it could become a reality sooner than anyone thinks.
Summary

In this chapter, we showed you Ethereum’s plans post-merge and its rollup-centric roadmap. At the time of writing, sharding is the top priority in implementing the final solutions for scaling Ethereum. We discussed how Danksharding and Proto-Danksharding work and why data availability sampling is needed. We touched on PBS, as well as its impact on MEV. We also discussed the different zkEVM implementations. A fully ZK-compatible EVM is considered the holy grail of EVM improvement. We introduced account abstraction and the EIP 4337 implementation to help you understand how a smart contract wallet works and what innovation it may unleash. Both zkEVM and account abstraction are hot topics in the Ethereum community, and we think we will see real breakthroughs soon. At the end of this chapter, we briefly introduced DAOs, NFTs, Web3, and the Metaverse. In the next chapter, we will introduce Solidity, the smart contract programming language of Ethereum. We will dive into the details of the language’s structure and discuss the best practices for writing a smart contract in Solidity.
Part 2: Ethereum Development Fundamentals

In this part, you will thoroughly explore Ethereum and the intricacies of smart contract development. The journey begins with a detailed examination of Solidity programming and its best practices. We will then delve into the Web3 API, demonstrating how to seamlessly integrate smart contracts into your decentralized applications (DApps). We guide you through practical examples, helping you create your own cryptocurrency using open-source smart contract libraries and ERC token standards. This part comprises the following chapters:

Chapter 6, Fundamentals of Solidity

Chapter 7, Web3 API Fundamentals

Chapter 8, Developing Your Own Cryptocurrency
Fundamentals of Solidity

In this chapter, we will dive into the details of Solidity, the most popular smart contract programming language. We will look at the features of the Solidity programming language and provide an overview of Solidity’s development tools. We will learn about various Solidity language fundamentals, including the structure of a contract, contract patterns, and exception handling. We will also cover smart contract security and best practices. At the end of this chapter, we will show you a complete example of a real-world smart contract developed in Solidity and demonstrate how you can functionally test your smart contract. The following topics will be covered in this chapter:

Introducing Solidity

Learning about the fundamental programming structure in Solidity

Enabling the contracts library

Understanding inheritance, abstract contracts, and interfaces

Examining smart contract execution under the hood

Mastering advanced programming concepts in Solidity

Types of smart contracts

Putting it all together – rental property leasing
Technical requirements

For all the source code of this book, please refer to the following GitHub link: https://github.com/PacktPublishing/Learn-Ethereum-Second-Edition/.
Introducing Solidity
Solidity is an Ethereum smart contract programming language with a syntax similar to C++ and JavaScript. It was designed for writing smart contracts that execute on the Ethereum Virtual Machine (EVM). Gavin Wood, Christian Reitwiessner, Alex Beregszaszi, and several Ethereum core contributors developed it. Solidity is a statically typed, object-oriented language that contains state variables, functions, and complex user-defined types and supports inheritance and libraries. It allows developers of Decentralized Applications (DApps) to implement business logic functions in a smart contract. Like any other statically typed language, Solidity has a compiler that verifies and checks syntax rules at compile time. Similar to Java, Solidity code is compiled into bytecode that can be executed on the EVM. Unlike many other compiled languages, the generated bytecode remains the same across platforms, provided that the input parameters to the compiler and the compiler version remain the same. Once a contract is compiled to bytecode and deployed, the EVM generates a contract address for the deployed smart contract. Users with permissions can then interact with the contract’s functions to submit transactions, such as calling a transfer function. Here is a flow diagram showing the deployment process of a Solidity smart contract:
Figure 6.1 – Smart contract development process This can be further explained with the following steps: 1. We start by coding smart contracts in Solidity and compiling them into bytecode. 2. Then, we deploy and test our smart contract in the test environment, including a local simulated non-mining environment and the Ethereum testnet.
3. Once the smart contracts have been thoroughly tested, we can deploy a smart contract to the Ethereum mainnet and let the DApps invoke and execute the smart contracts. In the next section, we will discuss smart contract development tools for developing Solidity contracts.
Tools for the Solidity development environment As we discussed in the DApp development tools section of Chapter 2, Ethereum Architecture and Ecosystem, there are quite a few development tools available to make smart contract development easier. Which one to use is truly a personal choice. The following are the tools we are going to use to build, monitor, and deploy smart contracts on the Ethereum platform: Browser-based IDEs Standalone IDEs with Solidity plugins Command-line development management tools
Browser-based IDEs There are quite a few browser-based integrated development environments (IDEs) you can use to develop Solidity smart contracts. The advantage of such tools is that you can directly start to develop, test, and deploy code entirely in a web browser without requiring any local software installation. In this section, we will be looking at online browser-based tools such as Remix and EthFiddle. Let’s look at one of them in a bit more detail now. Remix is a powerful open source IDE for coding, compiling, testing, and debugging smart contracts in Solidity:
It is a browser-based development environment that allows you to program in Solidity within your browser It allows you to compile Solidity code in all available versions and supports three runtime environments for testing and debugging, including JavaScript VM, Injected Web3, and the Web3 provider In addition, it allows you to see all transaction logs, events, input, and output, as well as gas consumption for the transactions You can start Remix by connecting your browser to https://remix.ethereum.org/. The following is a screenshot of the UI of Remix:
Figure 6.2 – Remix IDE Next, let’s move on to standalone IDEs.
Standalone IDEs
Although online Solidity compilers such as Remix offer an easy-to-use interface for developing and deploying smart contracts, many developers prefer the versatility and customization options provided by standalone IDEs when working on large and complex Solidity projects. These standalone IDEs offer advanced capabilities such as code completion, debugging, and project management, which can optimize workflow and improve code efficiency. A variety of standalone IDEs, including Visual Studio Code, Atom, and Sublime Text, incorporate Solidity plugins that enable developers to write, compile, and deploy Solidity code directly from within the IDE. These plugins provide sophisticated functionalities such as syntax highlighting, code completion, and error checking, which speed up development and reduce errors. Furthermore, frameworks such as Truffle and Embark provide comprehensive development environments for developing and deploying DApps. They contain integrated built-in testing frameworks, tools for managing smart contracts, and integration with popular blockchain platforms such as Ethereum, simplifying the process of building and deploying DApps. Overall, standalone IDEs with Solidity plugins present a powerful development environment for creating DApps. With their advanced features, flexibility, and customization, they constitute an appealing choice for developers aiming to construct top-quality smart contracts and DApps. The following screenshot shows us a Visual Studio Code IDE example:
Figure 6.3 – Visual Studio Code IDE In the next section, we will go over the fundamentals of the Solidity programming language via various examples.
Learning about the fundamental programming structure in Solidity Let’s get a taste of Solidity’s code and use an example to show you the layout and constructs of a smart contract. We will begin with the most basic smart contract example, HelloWorld.sol, as shown in the following screenshot:
Figure 6.4 – HelloWorld contract
Solidity's file extension is .sol. It is similar to .js for JavaScript files and .java for Java source code. The preceding code defines a smart contract called HelloWorld, which has a constructor for setting the initial greeting, a setGreeting method to reset the greeting, and a hello method for the authorized party to get the greeting. In the rest of the chapter, we will go over the fundamentals of the Solidity programming language, including the following headings:
The layout of a Solidity source file
Structure of a contract
State variables
Functions
Function modifiers
The layout of a Solidity source file A Solidity source file is typically composed of the following constructs: SPDX license identifier Pragma Comments Import Contract definition You can have many of those constructs in one Solidity source file. We will briefly go over the first three constructs in this section, and then discuss the contract construct in detail in the following section.
SPDX license identifier
From version 0.6.8, Solidity introduces SPDX license identifiers to indicate relevant license information, from the package down to the source-code file level. SPDX stands for Software Package Data Exchange, and it is an open standard for the software bill of materials (SBOM). For details on SBOM, please check the following website: https://www.cisa.gov/sbom. You should specify an SPDX license for your smart contract file. The Solidity compiler encourages the use of machine-readable SPDX license identifiers and includes the supplied string in the bytecode metadata. Every source file should start with a comment indicating its license: // SPDX-License-Identifier: MIT
If you do not want to open source the source code, you can use the UNLICENSED value.
Pragma
The pragma keyword in line 2 says that the source file needs to be compiled with compiler version 0.8.9 or later, as long as the newer version does not introduce breaking changes; the caret (^) excludes version 0.9.0 and above. This is important because different versions of Solidity can have different syntax, semantics, and features. By specifying the version range, you can ensure that the smart contract will work with the compiler versions you chose. For example, ^0.8.9 implies the contract will not compile on compilers lower than version 0.8.9, nor on version 0.9.0 or higher.
Comments Just like any other modern language, comments are used to make the source code easier to understand for the readers or developers. They are ignored by the compiler. The comment can be a single-lined comment starting with // or a multi-lined comment starting with /* and ending with */. Comments can be used to document the input and output of a function. In the preceding example, there are comments to define the input parameter using @param in line 23 and the output parameter via @return in line 16.
Import
The import keyword in Solidity is used to include code from another file in the current contract. The import statement is similar to the import statement in Java. It is often used to make the source code more modular: 1. The following are a few ways to import a Solidity source file:
import "HelloWorld.sol";
2. The preceding line of code can be used to import all the global symbols from a local Solidity source file: import "openzeppelin-solidity/contracts/token/ERC20/StandardToken.sol";
3. The preceding line of code can be used to import a file from the OpenZeppelin library, installed locally, with a defined path. You can also import from a URL: import "http://github.com/OpenZeppelin/openzeppelin-solidity/contracts/token/ERC20/ERC20.sol";
The preceding line of code imports an ERC20.sol file from the public repository. 4. You can import a file and define a new global symbol as follows: import * as MyContract from "BaseContract.sol";
5. The preceding code imports from BaseContract and creates a new global symbol called MyContract, whose members are the global symbols from the imported file – that is, BaseContract.sol. 6. Another Solidity-specific syntax equivalent to the preceding import is as follows: import "BaseContract.sol" as MyContract;
7. Instead of importing all the global symbols from the imported file, you can import specific symbols and give some of them an alias, as shown here:
import {symbol1 as alias, symbol2} from "BaseContract.sol";
Depending on your import statement, the compiler will use the file path to locate the source file. The path can be a relative path, something like ../orders/OpenOrder.sol; an absolute path, such as /home/evergreen/orders/OpenOrder.sol; or a URL pointing to a public repository, as shown here: import "https://github.com/OpenZeppelin/openzeppelin-contracts/blob/v4.8.3/contracts/utils/math/SafeMath.sol".
Contract definition This will define the name of the contract and a collection of code (its functions), data (its state), and conditions to execute the contract. For example, we define the SimpleStorage contract as follows: contract SimpleStorage {
...
}
A Solidity file is saved as a .sol file type, and source code can include import directives, pragma directives, struct, enum, and function. In the next section, we will cover the structure of a contract.
Structure of a contract Contracts in Solidity are similar to classes or objects in most object-oriented languages. A contract defines the following constructs: Data or state variables Functions Events and function modifiers User-defined types in enums or structs
Collections with mappings
Functions can access and modify the state variables or user-defined types of variables. When calling a function on a different contract, the called function will not have access to the state variables of the caller contract. Smart contracts are deployed to the Ethereum network and will only react to calls from an external account or contract through Inter-Process Communication (IPC), HTTP, or Remote Procedure Call (RPC).
State variables
In Solidity, state variables are defined at the contract level and permanently stored in contract storage in the EVM. Contract state variables are maintained and used across multiple transactions. State variables can be defined using various data types, including integers, strings, arrays, mappings, structs, and user-defined types. As shown in the preceding HelloWorld contract, greeting is a state variable defined as a string type. The visibility of a state variable can be declared as public, private, or internal, as follows:
The public variable: The compiler automatically generates a getter method; other contracts can read this public variable.
The private variable: The variable is only visible to the smart contract itself, and not even to the child contracts.
The internal variable: This can be used if you want the state variable to be visible to the smart contract and its child contracts. It is the default visibility level for state variables.
Built-in data types Solidity is a statically-typed language. Developers with other programming language backgrounds, such as JavaScript, Java, or Python, will easily learn
the Solidity syntax. Each variable needs to specify the data type. Depending on the data type, a declared variable will have a default value, which is an initial default value whose byte representation is all zeros. Data types in Solidity can be either of the following:
Value types: These will always be passed by value
Reference types: These will be passed by reference
Solidity defines a large number of built-in data types and allows you to define complex types using enums, structs, and mappings. The basic data types in Solidity are as follows:
Type: Boolean (bool)
Operators: !, &&, ||, ==, and !=
Example: bool transferable = true;
Note
The Booleans are true or false expressions. By default, the value is false.
> 0x0x57C08691e56BB0b43f0f39643eBF616dCfEa5dC6
ERROR[02-10|19:39:30.940] Invalid address length, please retry
> 0x57C08691e56BB0b43f0f39643eBF616dCfEa5dC6
> 0xA5FBe80812d3e572A264e176Cc45c551cBF7aac1
> 0x
Should the precompile-addresses (0x1 .. 0xff) be prefunded with 1 wei? (advisable yes)
> yes
4. Select yes to prefund the accounts with 1 wei. 5. Next, we need to specify the network ID, which is 1155: Specify your chain/network ID if you want an explicit one (default = random) > 1155 INFO [02-10|19:40:10.151] Configured new genesis block >
6. Now we can generate and export the genesis file by selecting Manage existing genesis, then Export genesis configurations. During the process, you will be prompted to save and accept the output of the 1155.json native genesis file. You can save this file to your local machine and proceed by pressing the Enter key. Finally, you can quit Puppeth with Ctrl + D or Ctrl + C: What would you like to do? (default = stats)
1. Show network stats 2. Manage existing genesis 3. Track new remote server 4. Deploy network components > 2 1. Modify existing configurations 2. Export genesis configurations 3. Remove genesis configuration > 2 Which folder to save the genesis spec into? (default = current) Will create 1155.json > INFO [02-10|19:40:29.151] Saved native genesis chain spec path=1155.json What would you like to do? (default = stats) 1. Show network stats 2. Manage existing genesis 3. Track new remote server
4. Deploy network components > CRIT [02-10|19:40:41.214] Failed to read user input err=EOF
You should see 1155.json be generated in the current folder of the local system. 7. You can rename it to genesis.json by using the following command: mv 1155.json genesis.json
8. Open genesis.json and you will see the initial configuration generated by Puppeth. "alloc" is an Ethereum-specific functionality to handle the "Ether presale" period. Puppeth generates a list of pre-filled wallets from 0000000000000000000000000000000000000000 to 00000000000000000000000000000000000000ff. A total of 255 addresses
are given 1 wei. Since we are running a local private Ether node, we don't use this data. Instead, we can delete these 255 pre-funded addresses from alloc and keep the last two user accounts we created manually with the Geth tool. 9. Puppeth's default prefunded accounts have 0x200000000000000000000000000000000000000000000000000000000000000 wei, which is around 9.0462e+74 wei, or roughly 9.05e+56 ether. This large amount
will be enough to pay the gas fee when we do some Web3 exercises. We can see what genesis.json looks like in the following screenshot:
Figure 7.3 – Generated genesis.json file using the Puppeth tool The genesis.json file plays a pivotal role in setting up and initializing a blockchain network. It acts as the essential configuration file that defines the initial state and parameters of the blockchain, including the genesis block. Think of the genesis.json file as the settings for your blockchain. It allows you to define important aspects such as the chain configuration, the difficulty level for mining blocks, gas limit, and initial allocations of assets.
When creating a genesis.json file, there are four required values that must be specified: config, difficulty, gasLimit, and alloc. These values provide crucial information for the blockchain network, such as networkspecific configurations, the level of computational effort required for mining new blocks, the maximum amount of gas allowed per block, and the initial allocations of tokens or assets to specific accounts. By carefully configuring the genesis.json file with these required values, you can tailor the characteristics and behavior of your blockchain network to suit your specific needs and requirements. Once we have generated the genesis.json file, we can proceed with initializing the chain instance.
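Before moving on, here is a minimal sketch of what such a file can look like, expressed as a small Node.js script that writes out the four required fields (we use JavaScript here since we will be working in Node.js later in this chapter). The fork-block settings and values shown are illustrative assumptions, not the exact file Puppeth generates:
// Minimal sketch: write a genesis.json containing the four required fields.
// chainId 1155 matches the network ID chosen above; the other values are illustrative.
const fs = require('fs');
const genesis = {
  config: { chainId: 1155, homesteadBlock: 0, eip150Block: 0, eip155Block: 0, eip158Block: 0 },
  difficulty: '1',     // low difficulty, so a private node can mine blocks easily
  gasLimit: '8000000', // maximum gas allowed per block
  alloc: {
    // pre-fund one of the accounts created earlier with Puppeth's default amount
    '0x57C08691e56BB0b43f0f39643eBF616dCfEa5dC6': { balance: '0x2' + '0'.repeat(62) },
  },
};
fs.writeFileSync('genesis.json', JSON.stringify(genesis, null, 2));
console.log('genesis.json written');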
Initializing the chain instance
To initialize the chain instance, we can run the following geth command:
geth --datadir private-chain/ init genesis.json
The --datadir parameter specifies where we should save our blockchain network's data. After the initialization command runs, you should see that the genesis state is updated and the chain instance is initiated:
ubuntu@ip-172-31-22-102:~$ geth --datadir private-chain/ init genesis.json
INFO [02-10|20:29:53.526] Maximum peer count ETH=50 LES=0 total=50
…
INFO [02-10|20:29:53.610] Writing custom genesis block
INFO [02-10|20:29:53.634] Persisted trie from memory database nodes=358 size=50.78KiB time=2.085958ms gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
..
INFO [02-10|20:29:53.736] Successfully wrote genesis state database=lightchaindata hash=e85e8b..167136
The private chain’s root folder should contain something like the following:
Figure 7.4 – Geth file structure Next, we will start our local Geth node.
Starting a Geth node
To start a Geth node, we will run the geth command to start the network with the given params:
geth --nousb \
--datadir=$(pwd) \
--syncmode 'full' \
--port 30310 \
--networkid 1515 \
--miner.gasprice 0 \
--miner.gastarget 470000000000 \
--http \
--http.addr 0.0.0.0 \
--http.corsdomain '*' \
--http.port 8545 \
--http.vhosts '*' \
--http.api admin,eth,miner,db,net,txpool,personal,web3,debug \
--verbosity 3 \
--mine \
--allow-insecure-unlock \
--unlock '0,1' \
--password password.txt
This will bring up a local Ethereum Geth node. You will see lots of logs printed out on the console:
Figure 7.5 – Geth logs Having successfully started the Ethereum network, the next section will focus on connecting to Ethereum networks and performing some basic tests to ensure proper functionality.
Connecting to Ethereum networks
In this section, we will connect to the Geth console and explore the fundamentals of using Web3 commands in Geth. Geth accepts instructions encoded as JSON objects, adhering to the JSON-RPC API specification. Users can directly send these instructions using tools such as curl over HTTP, for things such as querying an account balance. The following code snippet demonstrates an example request sent to the local Geth node with HTTP port 8545 exposed, followed by Geth's response: 1. We query the network ID of the running node with curl:
ubuntu@ip-172-31-22-102:~$ curl -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"net_version","params":[],"id":1515}' http://localhost:8545
{"jsonrpc":"2.0","id":1515,"result":"1515"}
Executing the command will yield the blockchain network ID 1515 as the output. Once the geth node runs, a geth.ipc IPC file will be generated in the private chain folder. IPC is short for inter-process communication and generally operates on local computers. IPC behaves in the same way as RPC/HTTP; the only change is the client implementation. 2. We can log in to the geth console by running the following command: geth attach geth.ipc
This will enter the geth console and present the following output:
ubuntu@ip-172-31-22-102:~/private-chain$ geth attach geth.ipc
Welcome to the Geth JavaScript console!
instance: Geth/v1.10.26-stable-e5eb32ac/linux-amd64/go1.18.5
coinbase: 0x57c08691e56bb0b43f0f39643ebf616dcfea5dc6
at block: 0 (Fri Feb 10 2023 19:39:11 GMT+0000 (UTC))
datadir: /home/ubuntu/private-chain
modules: admin:1.0 debug:1.0 engine:1.0 eth:1.0 ethash:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0
To exit, press ctrl-d or type exit
>
3. Next, let’s check the ether values of the two accounts we created previously using eth.accounts: > eth.accounts ["0x57c08691e56bb0b43f0f39643ebf616dcfea5dc6", "0xa5fbe80812d3e572a264e176cc45c551cbf7aac1"] >
4. To get the balance of the first address, we can use the following command:
> eth.getBalance(eth.coinbase)
9.04625697166532776746648320380374280103671755200316906558262375061821325312e+74
5. Let's use the web3 library to convert the amount from wei to ether. We can see the unit is divided by 10^18 from wei to ether:
> web3.fromWei(eth.getBalance(eth.coinbase),"ether")
9.04625697166532776746648320380374280103671755200316906558262375061821325312e+56
We have now configured a local Ethereum network on an Ubuntu Linux machine and performed some basic Web3 functions. In the next section, we will
run web3.js to interact with the private Ethereum network.
Learning the fundamentals of web3.js – the Ethereum JavaScript API
web3.js is a collection of Ethereum JavaScript APIs that enable you to develop clients to interact with the Ethereum blockchain. This lets you read and write data to and from the Ethereum blockchain with smart contracts. Web3 provides interactions with Ethereum nodes (be they local or remote) via HTTP, IPC, or WebSocket. web3.js is used to interact with an Ethereum node (a Geth node) using JSON-RPC to read and write data to the network. We can install the web3.js library with Node Package Manager (npm). In the following subsections, we'll discuss the web3.js basics and see how to start using web3.js to interact with the Ethereum blockchain. web3.js can be used both in frontends and backends. To simplify our work, we will run the web3.js API from the command line and will cover some popular web3.js APIs.
web3.js project setup
To begin working with web3.js, we need to set up a project that includes the web3.js library and its dependencies. This section will guide you through the process of setting up a web3.js project, ensuring that you have the required tools and configurations in place. We will cover essential steps such as installing Node.js and npm, creating a new project directory, initializing a new npm project, and installing the web3.js library. These steps will lay the foundation for building Ethereum-based applications using web3.js, allowing you to harness the power of blockchain technology within your projects. Let's get started:
1. Install Node.js if you haven't done so. The official installation guidelines can be found at https://nodejs.org/en/download/package-manager/.
2. We will install version 14 in our local Ubuntu development environment. Run the following commands. Once complete, we should see that version 14 of Node.js is available:
sudo apt update
curl -sL https://deb.nodesource.com/setup_14.x | sudo bash
sudo apt -y install nodejs
node -v
v14.21.2
3. Create a folder called web3js-example. 4. Initialize a Node.js project. 5. We first navigate to the web3js-example folder, then run the npm init -y command to initialize it as a Node.js project: ubuntu@ip-172-31-22-102:~/web3js-example$ npm init -y Wrote to /home/ubuntu/web3js-example/package.json: { "name": "web3js-example", "version": "1.0.0", "description": "",
"main": "index.js", "scripts": { "test": "echo \"Error: no test specified\" && exit 1" }, "keywords": [], "author": "", "license": "ISC" }
6. Install web3.js: npm install web3 --save
This will set up a basic web3.js project. The web3.js library has a key class called web3 that holds the majority of the library's functions. The following are the five additional modules that make up web3.js:
web3-eth is a module that provides functions for web3.js clients to interact with the Ethereum blockchain: nodes, mined blocks, externally owned accounts, and smart contracts
web3-shh is a module for the Whisper protocol, to communicate via P2P and broadcast
web3-bzz is a module for the Swarm protocol, for decentralized file storage
web3-utils is a module that contains useful helper functions for DApp development (a short example follows this list)
web3-net is a module that provides network-related functions, including checking connectivity status, retrieving the peer count, and obtaining the listening status of the connected node
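As a quick taste of the web3-utils module, the following sketch calls a few helpers statically, without connecting to a node; the outputs shown in the comments are what web3.js 1.x returns:
// web3-utils helpers work without a provider.
const Web3 = require('web3');
console.log(Web3.utils.toWei('1', 'ether')); // '1000000000000000000'
console.log(Web3.utils.fromWei('1000000000000000000', 'ether')); // '1'
console.log(Web3.utils.isAddress('0x57C08691e56BB0b43f0f39643eBF616dCfEa5dC6')); // true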
In the next few sections, we will explore the web3-eth API.
web3.js Account web3.js Account is a key component in interacting with the Ethereum blockchain using the web3.js library. It represents an Ethereum account, which is essentially a public-private key pair used for signing transactions and accessing the blockchain. With web3.js Account, developers can perform various operations, such as creating new accounts, managing account information, checking account balances, and signing transactions. It provides a convenient interface to handle cryptographic operations securely and seamlessly within Ethereumbased applications. In this section, we will review the account API.
Creating a Geth Ethereum account
We will start by creating a Geth Ethereum account:
1. Create a file called account.js.
2. Initialize a Web3 instance:
1. We first import the web3.js library:
const Web3 = require("web3")
2. We then connect to a local Ethereum node at http://localhost:8545:
const web3 = new Web3("http://localhost:8545")
3. Call getBalance to read a wallet balance:
web3.eth.getBalance(walletAddress).then(res => console.log(res));
The content of account.js is as follows:
async function getWeb3() {
The content of account.js is as follows: async function getWeb3() {
const Web3 = require('web3')
//Instantiating a Web3 instance and using localhost Ethereum node
const web3 = new Web3(new Web3.providers.HttpProvider('http://localhost:8545'))
return web3
}
async function getAccounts(web3) {
return await web3.eth.getAccounts()
}
async function main() {
let web3= await getWeb3()
let accounts = await getAccounts(web3)
console.dir(accounts)
}
main()
4. Run account.js with the following command: node account.js
[
'0x57C08691e56BB0b43f0f39643eBF616dCfEa5dC6',
'0xA5Fbe80812d3e572A264e176Cc45c551cBF7aac1'
]
With two accounts returned, we have now run our first web3.js call from the client side. The remaining web3.js APIs will follow a similar account.js setup.
Creating an Ethereum account
Next, let's create an account. The web3.eth.accounts.create function takes an optional random string as an input parameter to increase entropy; the string should be at least 32 characters, and if none is given, a random string will be generated by randomhex. Alternatively, you can create an account on the node itself through web3.eth.personal.newAccount(password), which is the API we will use here.
Let’s write a createAccount function in account.js and print it out: async function createAccount(web3, password) {
return await web3.eth.personal.newAccount(password);
}
async function main() {
let web3= await getWeb3()
let password = '!@superpassword';
let account = await createAccount(web3, password)
console.log(`account created: ${account}`)
let accounts = await getAccounts(web3)
console.log("Total accounts are: ")
console.dir(accounts)
}
main()
The console will show the following result. We can see that one new account has been created: node account.js
account created: 0xdB6784E53871750752E44A9225DD606e669d84d7
Total accounts are:
[
'0x57C08691e56BB0b43f0f39643eBF616dCfEa5dC6',
'0xA5Fbe80812d3e572A264e176Cc45c551cBF7aac1',
'0xdB6784E53871750752E44A9225DD606e669d84d7'
]
Note
Since we could send a plain-text password over an unsecured WebSocket or HTTP provider, in actual live applications we should be very careful when
using this API to create user accounts.
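If you want to avoid transmitting a password to the node altogether, the web3.eth.accounts.create function mentioned earlier generates the key pair entirely on the client side. Here is a minimal sketch:
// Client-side account creation: no password travels to the node, but the node
// does not manage this key either; you are responsible for storing it safely.
const Web3 = require('web3');
const web3 = new Web3('http://localhost:8545');
const account = web3.eth.accounts.create(); // optionally pass an entropy string (at least 32 chars)
console.log(account.address);
// account.privateKey is also returned; never log or commit it in a real application.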
Checking the balances of Ethereum accounts web3.js has a get balance API shown in the following code. The function returns the balance of an address at a given block: web3.eth.getBalance(address [, defaultBlock] [, callback])
Let’s write a get balance function to retrieve all three account balances: async function getAccountBalances(web3, accounts) {
// get balance
await accounts.forEach(async account => {
const balance = await web3.eth.getBalance(account)
console.log(`account: ${account}, balance: ${balance}`)
});
}
async function main() {
let web3= await getWeb3()
let accounts = await getAccounts(web3)
await getAccountBalances(web3, accounts)
}
main()
The result is as follows: ubuntu@ip-172-31-22-102:~/web3js-example$ node account.js
account: 0x57C08691e56BB0b43f0f39643eBF616dCfEa5dC6, balance: 9046256971665327767466483203803742801036717552003169065582623 75061821325312
account: 0xdB6784E53871750752E44A9225DD606e669d84d7, balance: 0
account: 0xA5Fbe80812d3e572A264e176Cc45c551cBF7aac1, balance: 9046256971665327767466483203803742801036717552003169065582623 75061821325312
We can see the two accounts we created previously have 9.04e74 wei, while the new account has a balance of 0. We’ve checked the balances of Ethereum accounts, so now let’s move on to the web3.eth.Contract object. This object helps us interact with smart contracts on the Ethereum blockchain.
web3.js ABI
The Ethereum Virtual Machine (EVM) defines the standard way data interacts with contracts in the Ethereum ecosystem. Solidity has a global variable called abi (for the contract's application binary interface) with encode and decode methods used on the parameters of any contract function. web3.js provides several encode and decode methods under web3.eth.abi, listed as follows:
encodeFunctionSignature
encodeEventSignature
encodeParameter
encodeParameters
encodeFunctionCall
decodeParameter
decodeParameters
decodeLog
In the following sections, we will run a few encode and decode APIs using the web3.js ABI module.
encodeFunctionSignature
encodeFunctionSignature is used to encode a function signature from its ABI JSON as well as from a plain string. Here is an example:
async function encodeFunctionSignatureJson(web3) {
return await web3.eth.abi.encodeFunctionSignature({
name: 'myMethod',
type: 'function',
inputs: [{
type: 'uint256',
name: 'input1'
},{
type: 'string',
name: 'input2'
}]
})
}
async function encodeFunctionSignatureString(web3) {
return await web3.eth.abi.encodeFunctionSignature('myMethod(uint256,string)')
}
async function main() {
let web3= await getWeb3()
let encodeOutPut = await encodeFunctionSignatureJson(web3)
console.log(`encodeFunctionSignature by Json: ${ encodeOutPut}`)
encodeOutPut = await encodeFunctionSignatureString(web3)
console.log(`encodeFunctionSignature by String: ${ encodeOutPut}`)
}
main()
We can see the same result by running it on JSON and string input: ubuntu@ip-172-31-22-102:~/web3js-example$ node abi.js
encodeFunctionSignature by Json: 0x24ee0097
encodeFunctionSignature by String: 0x24ee0097
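The 4-byte value 0x24ee0097 is the function selector: the first 4 bytes of the keccak-256 hash of the canonical signature string. You can verify this yourself with web3-utils:
// The selector is the first 4 bytes (8 hex characters) of keccak256 of the signature.
const Web3 = require('web3');
console.log(Web3.utils.keccak256('myMethod(uint256,string)').slice(0, 10)); // '0x24ee0097'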
Next, let's look at encodeParameters and decodeParameters.
encodeParameters and decodeParameters
encodeParameters and decodeParameters are essential functions provided by the web3.js library for handling the encoding and decoding of input and output parameters in Ethereum smart contracts. By utilizing encodeParameters and decodeParameters in web3.js, developers can seamlessly interact with smart contracts, passing arguments and retrieving results in a standardized and efficient manner. These functions simplify the process of working with complex data structures and ensure accurate communication between Ethereum smart contracts and the web3.js application. encodeParameters encodes a function's JSON parameters, and decodeParameters is the API used to decode the encoded parameters to their JavaScript types. Here is an example:
async function encodeParameters(web3, typesArray, parameters) {
return await web3.eth.abi.encodeParameters(typesArray, parameters);
}
async function decodeParameters(web3, typesArray, hexString) {
return await web3.eth.abi.decodeParameters(typesArray, hexString);
}
async function main() {
let web3= await getWeb3()
let typesArray = [
'uint8[]',
{
"Struct": {
"propertyOne": 'uint256',
"propertyTwo": 'uint256'
}
}
];
let parameters= [
['10','11'],
{
"propertyOne": '100',
"propertyTwo": '200',
}
];
let hexString = await encodeParameters(web3, typesArray, parameters);
console.log(`encodeParameters hexString: ${ hexString}`)
let decodeOutput = await decodeParameters(web3,
typesArray, hexString)
console.log(`decodeOutput: ${ JSON.stringify(decodeOutput)}`)
}
main()
The JSON encoded input and decoded result are as follows: ubuntu@ip-172-31-22-102:~/web3js-example$ node abi.js
encodeParameters hexString: 0x00000000000000000000000000000000000000000000000000000000000 0006000000000000000000000000000000000000000000000000000000000 0000006400000000000000000000000000000000000000000000000000000 000000000c800000000000000000000000000000000000000000000000000 0000000000000200000000000000000000000000000000000000000000000 0000000000000000a00000000000000000000000000000000000000000000 0000000000000000000b
decodeOutput: {"0":["10","11"],"1": ["100","200"],"__length__":2}
Lastly, let's look at encodeFunctionCall.
encodeFunctionCall is a vital function offered by the web3.js library, primarily used for encoding function calls to ensure compatibility with the function signatures defined within a smart contract. encodeFunctionCall
encodes a contract function call using its JSON data and given parameter values in 32 bytes of data. Here is an example: encodeFunctionCall
async function encodeFunctionCall(web3, jsonInterface, parameters) {
return await web3.eth.abi.encodeFunctionCall(jsonInterface, parameters);
}
async function main() {
let web3= await getWeb3()
let jsonInterface = {
name: 'myMethod',
type: 'function',
inputs: [{
type: 'uint256',
name: 'input1'
},{
type: 'string',
name: 'input2'
}]
};
let parameters2 = [100, 'test']
let encodeFunctionCallOutput = await encodeFunctionCall(web3, jsonInterface, parameters2)
console.log(`encodeFunctionCallOutput: ${encodeFunctionCallOutput}`)
}
main()
Here is the encoded function result: ubuntu@ip-172-31-22-102:~/web3js-example$ node abi.js
encodeFunctionCallOutput: 0x24ee0097000000000000000000000000000000000000000000000000000 0000000000064000000000000000000000000000000000000000000000000 0000000000000040000000000000000000000000000000000000000000000 0000000000000000004746573740000000000000000000000000000000000 0000000000000000000000
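For completeness, the list at the start of this section also included decodeParameter, the single-value counterpart of decodeParameters. Here is a small sketch:
// Decode one ABI-encoded value back to its JavaScript representation.
const Web3 = require('web3');
const web3 = new Web3(); // no provider is needed for ABI encoding/decoding
const decoded = web3.eth.abi.decodeParameter(
  'uint256',
  '0x0000000000000000000000000000000000000000000000000000000000000010'
);
console.log(decoded); // '16'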
Next, we will delve into the Web3 provider, a critical component in web3.js that acts as a communication bridge between your application and the Ethereum blockchain. The Web3 provider enables you to establish connections with Ethereum nodes such as Geth and Infura, facilitating seamless interaction with the blockchain network.
Web3 providers Web3 providers come in three types: HTTP provider WebSocket provider IPC provider
In the example in the Creating a Geth Ethereum account section, we pointed to a local Geth node using the HTTP provider: const Web3 = require('web3')
const web3 = new Web3(new Web3.providers.HttpProvider('http://localhost:8545'))
We can change providers to use the WebSocket provider by giving the WebSocket URL web3.setProvider(new Web3.providers.WebsocketProvider('ws://localhost:8546')). You can also use the IPC provider as follows: // Using the IPC provider in node.js
var net = require('net');
var web3 = new Web3('/home/ubuntu/private-chain/geth.ipc', net);
In real-world development, you may often point to a remote testnet node provider. Using Alchemy as an example, here is how we use a remote node provider: var Web3 = require('web3');
var web3 = new Web3("https://eth-mainnet.alchemyapi.io/v2/your-api-key");
The market's top node providers/Web3 providers are Infura, Alchemy, Moralis, QuickNode, and Tatum. We will use QuickNode as our remote node provider for our web3.js contract and transaction-related API examples in the next few sections.
Setting up the Ethereum testnet environment In the next section, we will guide you through the process of setting up an Ethereum testnet environment. By doing so, you will be able to simulate the functionalities and behavior of the Ethereum blockchain within a controlled
and sandboxed environment. This setup is particularly beneficial as it allows you to test your smart contracts, debug code, and ensure that everything is functioning properly prior to deploying your contracts on the live Ethereum network.
Creating a QuickNode API endpoint Visit the QuickNode website at https://www.quicknode.com/. You can easily onboard if you don’t have an account by following the on-screen instructions. In addition, you can choose a free plan for your web3.js API experiments. Once you log in, select Ethereum | Goerli, then click Continue:
Figure 7.6 – Create an endpoint in QuickNode Configure the add-ons page, choose Discover free plan, and create a QuickNode Goerli testnet API instance. Once the setup is complete, QuickNode will have created a Goerli testnet endpoint instance for you. Use the HTTP Provider API to interact with the Ethereum Goerli testnet as follows:
Figure 7.7 – Goerli testnet endpoint in QuickNode The number at the end of the URL is a private API key to access this endpoint. To keep this sensitive QuickNode URL confidential, we will set up a Node.js configuration and store this URL locally.
Creating the environment configuration
Create an .env file and add your QuickNode endpoint to it:
# QuickNode API key for Goerli
L1RPC=https://chaotic-proud-waterfall.ethereum-goerli.discover.quiknode.pro/fc33ae7cxxxxxxxxx15f25370/
# your wallet private key
privateKey=
There is a private key field in the preceding code, which is used for your wallet private key for accounts. We will add it later when we go over the web3.js transaction API. Install the Node.js dotenv npm library. dotenv is an npm module designed to seamlessly load environment variables from an .env file into the process.env object: npm i dotenv
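Once installed, a require('dotenv').config() call at the top of a script loads the .env values into process.env. A quick sanity check might look like this:
// dotenv reads .env from the current working directory into process.env.
require('dotenv').config();
console.log(process.env.L1RPC ? 'L1RPC endpoint loaded' : 'L1RPC is missing');
// Never print process.env.privateKey itself; treat it strictly as a secret.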
With the environment configuration set up, the next step involves acquiring test tokens on the Goerli testnet. The Goerli testnet test tokens are needed to
simulate the use of real Ether on a test network. These test tokens hold no real-world value and are specifically designed for testing and experimentation purposes. By obtaining test tokens on the Goerli testnet, developers can perform various operations, such as deploying and interacting with smart contracts, testing transaction functionality, and verifying the behavior of their applications in a test environment.
Getting Goerli testnet tokens Get two accounts from the MetaMask wallet on the Goerli testnet network. You can create a new account through the wallet itself:
Figure 7.8 – Create a new account on the Goerli testnet network using the MetaMask wallet
We will need to get some free test ether from the Goerli test network. Copy Account 1's Ethereum address and use one of the Goerli testnet faucets to get ether sent to your Ethereum account address:
Official Goerli testnet faucet: https://goerli-faucet.slock.it/
Goerli Faucet: https://goerlifaucet.com/
Getting an account private key
In MetaMask, click Account 1, then select account details. On the details page, click Export private key, enter your wallet password, then click Confirm. MetaMask will show your account’s private key:
Figure 7.9 – Get account private key Copy this private key to the .env file’s privateKey field.
The web3.js Transaction API At a high level, submitting a transaction, such as transferring a token from one account to another, involves two steps: 1. Sign the transaction using web3.eth.accounts.signTransaction. This will only sign the transaction, not send it. 2. Next, we send transactions to the network using web3.eth.sendSignedTransaction. In this section, we will walk through how to sign and send transactions using the web3.js library. We will show how to sign and send a transaction between two accounts by transferring ether through a testnet: 1. Set up two account addresses as fromAddress and toAddress. We copy these two addresses from the MetaMask accounts we created:
const Web3 = require('web3')
require('dotenv').config()
//setup web3
const web3 = new Web3(new Web3.providers.HttpProvider(`${process.env.L1RPC}`))
const fromAddress = '0x5381E3e6b740C82b294653711cF16619D68b71B8'
const toAddress = '0xA7f4b23804502E1E1a5Aeaa8FA6571A412EfdC6C'
// 2. Create account variables
const accountFrom = {
privateKey: `${process.env.privateKey}`,
address: fromAddress,
};
// Change addressTo
const addressTo = toAddress;
Notice that we use dotenv to read the QuickNode Goerli test network endpoint and wallet private key in the code. 2. Write the transfer function as follows: const transfer = async () => {
console.log(`Sending transaction from ${accountFrom.address} to ${addressTo}`); // 4. Sign tx with PK const createTransaction = await web3.eth.accounts.signTransaction( { gas: 21000, to: addressTo, value: web3.utils.toWei('0.01', 'ether'), }, accountFrom.privateKey ); // 5. Send tx and wait for receipt const createReceipt = await web3.eth.sendSignedTransaction(createTransaction.rawTran saction); console.log(`Transaction successful with hash: ${createReceipt.transactionHash}`); }; // 6. Call transfer function
transfer();
3. Following the two-step transaction process, we first call the signTransaction API, passing the account private key to sign the transaction. 4. Signing on the node with web3.eth.signTransaction(transactionObject, address [, callback]) passes the transaction data to be signed and the address to sign it with; for this, the from account needs to be unlocked. The signed result returned is in hex format and contains your signed message. This API calls the remote Ethereum JSON-RPC eth_sign method:
{"jsonrpc":"2.0","method":"eth_sign".. }
sendSignedTransaction takes the already-signed transaction and sends it to the Ethereum network. The method is a wrapper for eth_sendRawTransaction:
web3.eth.sendSignedTransaction(signedTransactionData [, callback])
Let’s run the transaction.js script: node transaction.js
Sending transaction from 0x5381E3e6b740C82b294653711cF16619D68b71B8 to 0xA7f4b23804502E1E1a5Aeaa8FA6571A412EfdC6C
Transaction successful with hash: 0xc447ab21e6ba81d972d16b4ed2bcf70b90423710aea2625ccb9f43d13eb c5ffa
Once the transaction is complete, go to MetaMask to verify it. You should see that 0.01 ether was transferred from Account 1 to Account 3:
Figure 7.10 – Transfer of ether from Account 1 to Account 3 With the successful transfer of ether from Account 1 to Account 3, we have gained a basic understanding of how the web3.js Transaction API functions. In the next section, our focus will shift to web3.js smart contract deployment, where we will explore the process involved.
Deploying a smart contract using web3.js In this and the following section, we will deploy and interact with a smart contract using the web3.js API. The web3.eth.Contract object is used to interact with smart contracts on the Ethereum blockchain. Before we can deploy the contract, we need to prepare some contract data: 1. Get the smart contract ABI and bytecode. In Chapter 6, Fundamentals of Solidity, we used the Remix IDE to compile Orders.sol. Click on the ABI icon and copy the Orders.sol ABI. We can find the Orders contract ABI and bytecode by clicking on the Compilation Details button. You can find the ABI and bytecode in the popup that appears:
Figure 7.11 – Contract ABI and bytecode 2. Then create an orders.json file in the same folder as contract.js. Paste the ABI and bytecode content into this .json file:
Figure 7.12 – Contract ABI .json file
3. Create a file called contract.js.
4. Copy the getWeb3() setup along with the account and key configuration logic from transaction.js (see The web3.js Transaction API section) into contract.js. We also read the orders.json file to get the ABI and bytecode:
const fs = require('fs');
const Web3 = require('web3')
require('dotenv').config()
//setup web3
const web3 = new Web3(new Web3.providers.HttpProvider(`${process.env.L1RPC}`))
// 1. Import the contract file
const contractJsonFile = fs.readFileSync('orders.json');
const contract = JSON.parse(contractJsonFile);
const fromAddress = '0x5381E3e6b740C82b294653711cF16619D68b71B8'
// 3. Create address variables
const accountFrom = {
privateKey: `${process.env.privateKey}`,
address: fromAddress,
};
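As an aside, instead of copying the ABI and bytecode out of Remix, you could also compile Orders.sol locally with the solc npm package (npm install solc). The following is a hedged sketch using solc's standard-JSON interface; it assumes the Orders.sol file from Chapter 6 sits next to the script and that the contract is named Orders:
// Alternative to Remix: compile Orders.sol locally and extract the ABI and bytecode.
const fs = require('fs');
const solc = require('solc');
const input = {
  language: 'Solidity',
  sources: { 'Orders.sol': { content: fs.readFileSync('Orders.sol', 'utf8') } },
  settings: { outputSelection: { '*': { '*': ['abi', 'evm.bytecode'] } } },
};
const output = JSON.parse(solc.compile(JSON.stringify(input)));
const orders = output.contracts['Orders.sol']['Orders'];
// Write the same shape of orders.json that we pasted from Remix above.
fs.writeFileSync('orders.json', JSON.stringify({
  abi: orders.abi,
  bytecode: '0x' + orders.evm.bytecode.object, // solc returns hex without the 0x prefix
}, null, 2));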
With the basic web3.js contract configuration set up, we are now ready to utilize the web3.js API to deploy our smart contract. There are several steps we need to follow to deploy a smart contract:
1. Create a contract instance.
2. To create a new contract instance, the following function can be used:
new web3.eth.Contract(jsonInterface[, address][, options])
The input parameters are as follows:
jsonInterface: An object. This is the JSON interface (ABI) used to instantiate the contract.
address: Optional string. This is the address of the smart contract to call.
options: Optional object. The smart contract's options, including the following:
from: string
gasPrice: string
gas: number
data: string
An object will be returned when the function is called. The object contains information on the instance of the smart contract, such as the events and methods: var contract = new Contract(jsonInterface, address);
const contractInst = new web3.eth.Contract(abi);
3. Create a constructor transaction. 4. A constructor (deployment) transaction is built by invoking the contract instance's deploy function and passing the bytecode of the contract. Note that this call does not deploy the contract by itself; it returns a transaction object that will deploy the contract to the blockchain once it is signed and sent: const contractTx = contractInst.deploy({ data: bytecode });
5. Sign the transaction. Similar to the previous transaction.js, we sign the contract transaction with our account's private key. The deployment transaction object has an estimateGas method, which will give us an estimated gas amount. The data will be the encoded ABI data from the contract instance: const createTransaction = await web3.eth.accounts.signTransaction(
{
data: contractTx.encodeABI(),
gas: await contractTx.estimateGas(),
},
accountFrom.privateKey
);
6. Send the signed transaction over the Ethereum network and get the transaction receipt, which contains the deployed contract address: const createReceipt = await web3.eth.sendSignedTransaction(createTransaction.rawTran saction); console.log(`Contract deployed at address: ${createReceipt.contractAddress}`);
7. Let’s run contractDeploy.js: node contractDeploy.js Attempting to deploy from account 0x5381E3e6b740C82b294653711cF16619D68b71B8 Contract deployed at address: 0xE442a2A8B95e3445640E67D854f6E81C907F1e35
The contract was successfully deployed to the 0xE442a2A8B95e3445640E67D854f6E81C907F1e35
address.
8. To verify the contract was successfully deployed on the Goerli testnet, have a look for it in the Goerli explorer: https://goerli.etherscan.io/address/0xE442a2A8B95e3445640E67D854f6E81C907F1e35:
Figure 7.13 – Etherscan showing the contract deployment With our smart contract successfully deployed, the next step is to interact with it using the web3.js API. In the upcoming section, we will explore how to utilize the web3.js API to interact with our deployed smart contract, allowing us to call its functions, retrieve data, and update its state on the blockchain.
Interacting with a smart contract
This section will discuss how to interact with deployed smart contracts using the web3.js API. With our contract now deployed, we can call the contract functions: 1. Orders.sol contains the setOrder method, which stores buyer order information with product and quantity values: function setOrder(address _address, string memory _buyer, string memory _product, uint _quantity) public { Order storage order = orders[_address]; order.buyer = _buyer; order.product = _product; order.quantity = _quantity; }
2. Since this method will update the blockchain state, it will be quite similar to the previous transaction.js logic. Let’s first set up the environment configuration value. Then, we provide the addresses of the deployed contract for submitting orders: const fs = require('fs'); const Web3 = require('web3') require('dotenv').config() //setup web3
const web3 = new Web3(new Web3.providers.HttpProvider(`${process.env.L1RPC}`)) // 1. Import the contract file const contractJsonFile = fs.readFileSync('orders.json'); const contract = JSON.parse(contractJsonFile); const fromAddress = '0x5381E3e6b740C82b294653711cF16619D68b71B8' // 3. Create address variables const accountFrom = { privateKey: `${process.env.privateKey}`, address: fromAddress, }; // 3. Create address variables const contractAddress = '0xE442a2A8B95e3445640E67D854f6E81C907F1e35'; // 4. Get the bytecode and API const abi = contract.abi; // 5. Create contract instance const contractInst = new web3.eth.Contract(abi, contractAddress);
3. Call the setOrder method to build the order transaction: // 5. Build order tx
4. To set up an order for Alice with 10 units of Asset1, we invoke the setOrder method: const setOrderTx = contractInst.methods.setOrder(fromAddress, "Alice", "Asset1", 10);
5. Similar to the previous example in the The web3.js Transaction API section, we sign and send the transaction for the setOrder function: const setOrder = async () => { console.log( `Calling the setOrder function in contract at address: ${contractAddress}` ); // Sign Tx with PK const createTransaction = await web3.eth.accounts.signTransaction( { to: contractAddress, data: setOrderTx.encodeABI(), gas: await setOrderTx.estimateGas(), },
accountFrom.privateKey ); // Send Tx and Wait for Receipt const createReceipt = await web3.eth.sendSignedTransaction(createTransaction.rawTran saction); console.log(`Tx successful with hash: ${createReceipt.transactionHash}`); };
6. Run the contractSendAndSignTnx.js script. The transaction should be returned successfully with a hash value: node contractSendAndSignTnx.js Calling the setOrder function in contract at address: 0xE442a2A8B95e3445640E67D854f6E81C907F1e35 Tx successful with hash: 0xd5150a0d308fc6442ca22926c5132c6c8823e5e8b418802a12eb54 655ce7625e
7. Now let’s verify it by calling getOrder on the contract: function getOrder(address _address) view public returns (string memory, string memory, uint) { return (orders[_address].buyer, orders[_address].product, orders[_address].quantity); }
8. The function needs to pass the buyer’s contract address. 9. The contract object has a call method that executes its smart contract method in the EVM without sending any transaction or modifying the transaction state: myContract.methods.myMethod([param1[, param2[, …]]]).call(options [, defaultBlock] [, callback])
Here is the getOrder call logic: // 5. Create getOrder function
const getOrder = async () => {
console.log(`Making a call to contract at address: ${contractAddress}`);
// 6. Call contract
const retrieve = await contractInst.methods.getOrder(fromAddress).call();
console.log(`The current order is: ${retrieve}`);
};
// 7. Call getOrder function
getOrder();
10. Let’s run it: node contractCall.js Making a call to contract at address: 0xE442a2A8B95e3445640E67D854f6E81C907F1e35 The current order is: {"0":"Alice","1":"Asset1","2":"10"}
We then get Alice’s order. This is the end of the web3.js section. In this section, we reviewed some popular web3.js APIs, using them to create user accounts, get balances, transfer ether, deploy smart contracts, and submit and call smart contract functions. For more information about web3.js, you can browse the documentation (for version 1.7.3) at https://web3js.readthedocs.io/en/v1.7.3/#.
Getting started with web3.py We have now learned how to interact with the Ethereum blockchain using web3.js. web3.py is a Python library that was originally derived from the web3.js library, but has since evolved separately for Python developers. In this section, we will embark on a journey to explore web3.py, a powerful Python library for interacting with the Ethereum blockchain. web3.py provides developers with a comprehensive set of functionalities to interact with smart contracts, send transactions, and query blockchain data using the Python programming language. web3.py brings the capabilities of web3.js to Python developers, allowing them to harness the power of Ethereum and build decentralized applications using familiar Python syntax and conventions. With its intuitive API and extensive documentation, web3.py simplifies the process of integrating Ethereum blockchain functionality into Python-based projects.
Prerequisites Before diving into web3.py, it is essential to set up a dedicated virtual environment to manage your Python dependencies. A virtual environment creates an isolated environment in which you can install specific packages and libraries without interfering with your system-wide Python installation. In the next section, we will guide you through the process of setting up a virtual environment for web3.py. This step ensures a clean and controlled development environment, allowing you to manage dependencies and packages specific to your web3.py projects.
Setting up the virtual environment We will start with setting up the virtual environment. The virtual environment can protect us from unsupported packages or version conflicts: 1. In a Terminal, install pip if it’s not already available:
which pip || curl https://bootstrap.pypa.io/get-pip.py | python3
2. Then install virtualenv if you don’t have it already: which virtualenv || pip install --upgrade virtualenv
3. Next let’s create a virtual environment: virtualenv -p python3 ~/.venv-py3
4. After our virtual environment has been created, we will also need to activate it: source ~/.venv-py3/bin/activate
Each time we start a new Terminal session, we need to activate virtualenv using the preceding command. 5. With virtualenv active, (.venv-py3) will show in the Terminal. 6. We then upgrade to the latest tools: pip install --upgrade pip setuptools
7. Finally, we install the web3.py library: pip install --upgrade web3
At this point, you have successfully installed web3.py and set up the necessary virtual environment. You are now ready to start utilizing the powerful features and functionalities of web3.py to interact with the Ethereum blockchain using Python. In the next section, we will dive deeper into web3.py, exploring its core concepts and demonstrating how to connect to Ethereum networks, interact with smart contracts, and perform various operations using the web3.py library.
Overview
For your reference, the structure of the web3.py library is listed in the following tree:
Web3
Web3 API
Address utilities
Web3.isAddress()
Web3.isChecksumAddress()
Web3.toChecksumAddress()
Converting currency Web3.fromWei() Web3.toWei()
Encoding and decoding Web3.is_encodable() Web3.toBytes() Web3.toJSON()
Cryptographic hashing Web3.keccak() Web3.solidityKeccak()
web3.eth API Data – fetching data web3.eth.get_balance() web3.eth.get_transaction()
Transactions – making transactions web3.eth.send_transaction() web3.eth.sign_transaction() web3.eth.send_raw_transaction() web3.eth.wait_for_transaction_receipt() web3.eth.get_transaction_receipt() web3.eth.sign()
Contracts – smart contracts utilities web3.eth.contract() Contract.address Contract.abi Contract.functions Contract.fallback Contract.constructor() Contract.encodeABI()
Logs and filters – getting logs and applying filters on events web3.eth.filter()
web3.eth.get_logs()
Net – network properties web3.net.listening web3.net.peer_count web3.net.version
After quickly reviewing the web3.py APIs, it is now time to move on to the next step: connecting to Ethereum networks. By establishing a connection to an Ethereum network, you will be able to interact with the blockchain, retrieve data, send transactions, and execute smart contracts. In the upcoming section, we will guide you through the process of connecting to Ethereum networks using web3.py, enabling you to seamlessly integrate your Python applications with the Ethereum ecosystem.
Connecting to Ethereum networks
web3.py allows us to connect to local nodes such as those provided by Ganache, as well as to remote providers. Besides this, it also provides a test provider, EthereumTesterProvider from the eth-tester library, which has test accounts with test ether. If we want to connect to the test provider, we can install the eth-tester library using these steps:
1. Open the Terminal and reactivate the virtual environment:
source ~/.venv-py3/bin/activate
Before proceeding, make sure to install eth-tester if it is not already installed:
pip install eth-tester
2. Now let's see how to connect to Ethereum networks. Open the python3 shell:
python3
Import the modules:
>>> from web3 import Web3, EthereumTesterProvider
web3.py has three types of built-in providers, plus a test provider, to connect to the corresponding JSON-RPC servers, as listed in the following table:
Provider: Web3.IPCProvider
JSON-RPC server to connect to: IPC socket
Code to create a new instance of Web3: w3 = Web3(Web3.IPCProvider('./path/to/geth.ipc'))
Provider: Web3.HTTPProvider
JSON-RPC server to connect to: HTTP and HTTPS
Code to create a new instance of Web3: w3 = Web3(Web3.HTTPProvider('http://127.0.0.1:8545'))
Provider: Web3.WebsocketProvider
JSON-RPC server to connect to: ws and wss (websocket)
Code to create a new instance of Web3: w3 = Web3(Web3.WebsocketProvider('ws://127.0.0.1:8546'))
Provider: EthereumTesterProvider
JSON-RPC server to connect to: Test provider
Code to create a new instance of Web3: w3 = Web3(EthereumTesterProvider())
Table 7.2 – Built-in providers in web3.py
3. Check the connection. If the connection is established, it will return True:
>>> w3.isConnected()
True
4. Once connected, we can start to use web3.py. We can get the latest block information as follows: >>> w3.eth.get_block('latest')
AttributeDict({'number': 0, 'parentHash': HexBytes('0x0000000000000000000000000000000000000000000000000 000000000000000'), 'nonce': HexBytes('0x000000000000002a'), 'sha3Uncles': HexBytes('0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f 0a142fd40d49347'), 'logs_bloom': 0, 'transactionsRoot': HexBytes('0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc00 1622fb5e363b421'), 'receipts_root': '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e3 63b421', 'stateRoot': HexBytes('0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc00 1622fb5e363b421'), 'miner': '0x0000000000000000000000000000000000000000', 'difficulty': 131072, 'totalDifficulty': 131072, 'size': 0, 'extraData': HexBytes('0x00000000000000000000000000000000000000000000000 00000000000000000'), 'gasLimit': 30029122, 'gasUsed': 0, 'timestamp': 1654777902, 'transactions': [], 'uncles': [], 'baseFeePerGas': 1000000000, 'hash': HexBytes('0xdd0103dbdfaf4bf4f268c6a9c5bb7d34505d4c402115e534d 28373f10165d85c')})
Now that we have successfully connected to the blockchain using web3.py, the next section will focus on deploying smart contracts. Deploying smart contracts is a crucial step in utilizing the power of blockchain technology. We will explore how to use web3.py to compile and deploy smart contracts to the Ethereum network. By following the step-by-step instructions, you will learn how to deploy your own smart contracts and make them accessible to the network participants.
Deploying smart contracts using web3.py In the previous section for web3.js, we learned how to fetch information about accounts and smart contracts and make transactions. With web3.py, we will show an example of how to deploy a smart contract. First, let’s install a few tools so we can run our example:
1. Open a Terminal and reactivate the virtual environment: source ~/.venv-py3/bin/activate
2. Install the sandbox node that eth-tester provides: pip install -U "web3[tester]"
3. Install the Solidity compiler: pip install py-solc-x
4. Install the latest version of solc in the python3 shell: python3 >>> from solcx import install_solc >>> install_solc(version='latest') Version('0.8.14')
5. We are all set for deploying smart contracts now. Next, import the dependencies: >>> from web3 import Web3 >>> from solcx import compile_source
6. Connect to the test provider: >>> w3 = Web3(Web3.EthereumTesterProvider())
7. Set the default account. We will use this as the account from which the ether will be sent: >>> w3.eth.default_account = w3.eth.accounts[0]
8. Compile a hello world Solidity contract from a source string:
compiled_sol = compile_source(
'''
pragma solidity >=0.4.16