English Pages 376 [377] Year 2023
Next Generation Peer-to-Peer Engineering
Next Generation Peer-to-Peer Engineering: Mediated Computing on the Edge
By William J. Yeager and Rita Yu Chen
Next Generation Peer-to-Peer Engineering: Mediated Computing on the Edge
By William J. Yeager and Rita Yu Chen
This book first published 2023
Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Copyright © 2023 by William J. Yeager and Rita Yu Chen
All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
ISBN (10): 1-5275-9479-3
ISBN (13): 978-1-5275-9479-1
To the memory of Gene Kan, a dear friend and colleague. His contributions to Gnutella helped to bootstrap P2P in the new millennium.
TABLE OF CONTENTS
Chapter 1 Introduction
  1.1 The .COM Supernova
  1.2 The Changing Internet Marketplace
  1.3 What Is a P2P Network and Computing on Edge?
  1.4 Why P2P Now?
  1.5 Basic P2P Rights
  1.6 Contents of This Book and P2P PSEUDO-Programming Language
Chapter 2 P2P: The Endgame of Moore’s Law
  2.1 The 1980’s
  2.2 The 1990’s - The Decade of the Information Highway and the New Millennium
  2.3 The New Millennium
Chapter 3 Components of the P2P Model
  3.1 The P2P Document Space
  3.2 Peer Identity
  3.3 The Virtual P2P Network
  3.4 Scope and Search - With Whom Do I Wish to Communicate?
  3.5 How to Connect
Chapter 4 Basic Behavior of Peers on a P2P System
  4.1 The P2P Overlay Network Protocols
  4.2 P2P Mediator Protocols
  4.3 P2P PeerNode-to-PeerNode Protocols
  4.4 P2P Connected Community Protocols
  4.5 P2P Hashing Algorithms in the Context of Our P2P Overlay Network
  4.6 More 4PL Examples
Chapter 5 Security in a P2P System
  5.1 Internet Security
  5.2 Reputation Based Trust in P2P Systems
  5.3 More Building Blocks for P2P Security
  5.4 Digital Rights Management on P2P Overlay Networks
Chapter 6 Wireless and P2P
  6.1 Why P2P in the Wireless Device Space?
  6.2 Introduction to the Wireless Infrastructures
  6.3 The Telco Operator/Carrier
  6.4 Fixed Wireless Networks
  6.5 Mobile Ad-hoc
  6.6 P2P for Wireless
Chapter 7 Applications, Administration, and Monitoring
  7.1 Java Mobile Agent Technology
  7.2 Implementation of Java Mobile Agents on the P2P Overlay Network
  7.3 The Management of the P2P Overlay Network with Java Mobile Agents
  7.4 Email on the P2P Overlay Network
Appendix I 4PL Specification
  Introduction
  1.0 P2P Overlay Network Documents
  2.0 P2P Overlay Network Commands
References
CHAPTER 1 INTRODUCTION
Although the goal of this book is technical as well as instructional (if one wishes to know how to build a P2P network, this book provides multiple recipes for both the student and the practitioner), its arrival is not independent of existing economic trends and technical market forces. Thus, up front, before diving headlong into the engineering chapters, let’s take a closer look at the significant events that motivated writing a book on “P2P Engineering and Computing on the Edge” at this time.
1.1 The .COM Supernova
The year 2000 was amazing. The new millennium began with fireworks displays broadcast worldwide as successive capital cities greeted the stroke of midnight. The high-tech economy responded likewise, reaching new limits of worldwide growth as the .COM frenzy took hold. This folly reached the point where a clever new domain name could generate a few million dollars in venture capital to fund startups and early-stage, emerging companies. This explosive economic growth had to self-destruct sooner or later. It did, and sooner, when in March 2000 the stock market bubble burst like a sudden .COM supernova that sent hundreds of these Internet companies into bankruptcy. Since then, practically all of the .COMs that drove the NASDAQ to new heights have vanished. It is historically important to note that the bursting of the .COM stock market “bubble” is not a new phenomenon. The same bubble effect accompanied earlier technologies: electricity in 1902, the radio in 1929, and the promise of the mainframe computer in 1966. In each case, initial expectations drove stocks’ P/E ratios to unrealistic heights, only to see them abruptly come tumbling down again. The new technology in each case was sound and produced long-term revenue; the investors’ enthusiasm and the market’s response were not.
One can first ask why the .COMs came into being, and whether they and their supporting technologies were legitimate. From this point of view, one can try to understand the above events. Secondly, given the void that their disappearance left, what, if anything, could be done to put in their place the products and services that would result in long-term, stable economic growth in the high-tech sector? This collapse has shown this sector to be the heart of the cardiovascular system of the new millennium economy. There are those who denied the latter statement and looked to the old gold-standard industries in times of economic decline; but the new economy was and is as dependent on the high-tech Internet technologies as the automobile industry is on our highway system. The birthrate of .COMs was directly tied to the ease with which client/server, web-based architectures can be deployed on the Internet, and this drove the .COM expansion further than it should have gone. While this ease of deployment was certainly a big plus for the underlying technologies, the resulting digital marketplace could not provide the revenue streams necessary either to support the already existing .COMs or to sustain the growth being experienced. One wonders about their business plans, since office space was rented, employees hired, software written, and hardware purchased and connected to the “Internet Information Highway,” all with speculative venture capital investments. Venture capital support was supposed to be based on a sound business plan as well as on new ideas based on Internet technology. Here it is extremely important to note that the technology is not at fault. The technology was and is fantastic. Rather, it was the desire to make a quick buck today without worrying about tomorrow’s paycheck that brought the economy to its knees. One senses that a sound business plan was irrelevant, and that a quick and highly inflated IPO was the goal.
When the .COMs went supernova and their hardware capital was reabsorbed into the economy through bankruptcy sales, the result brutalized companies, like Sun Microsystems, that depended on hardware revenue. Similarly, tens of thousands of very talented employees were left jobless. This can only be viewed as a course in “business 101,” as “lessons learned” for the young entrepreneurs and their technical counterparts. Those who profited from this bubble implosion-explosion, and there were many, came back in force with their technical teams, wiser and smarter, and made the necessary changes to the system that had exploited their talent, energy and dreams. On the technical side of things, again, the ease of deployment of the hundreds of .COMs was a proof of concept not only of the web-based, distributed computing model but also of the high-tech design, manufacturing and delivery cycle. High-tech was and would continue to be responsive across the board, and clearly, the Internet did not go away, either as a way of doing e-business or as a means of enhancing both personal communication and one’s day-to-day life. With respect to personal communication, a beautiful thing happened during this time. Strong partnerships and alliances were created between nearly all aspects of the high-tech and telecommunications sectors because personal communication in all its forms can be enhanced with Internet connectivity. Yes, there was a rush to add features to mobile phones before the technology was mature, but the i-Mode, Java/MIDP experiment of the early 21st century alone proved that the business and technical models were sound. At the same time, the WAP GSM and 2.5G experiments proved that without uniform standards across this space, as well as responsive data networks, these services would be abandoned. The partnerships continued to flourish, and the standards problem was recognized mid-stream at the WAP Forum: WAP and its gateways, which acted as proxies to the Internet, were abandoned, and end-to-end Internet protocols were adopted for mobile devices. This is discussed in detail in Chapter 6. Consequently, business opportunities with personal communication, media-sharing and life-enhancing applications were expected to be a large part of the P2P economic model. We will point out throughout this book how P2P can better address areas like these, and more easily solve the multiple-device problems for which client/server web-based solutions have not been adequate. In this book we describe how, in some of these cases, a marriage of client/server web-based technologies with P2P networks will suit collaborative P2P applications in the enterprise, where the documents produced are proprietary and need to be highly secured and regularly checkpointed, with all transactions audited; in other cases, pure, ad-hoc P2P networks will be the best solution.
Here, one might have the exchange of content, like family trip photos, or video sessions between neighbors connected either on a shared Wi-Fi network or with cable or ADSL. In all such cases, content can be protected from ISP snooping. Above all, in all such cases, using P2P to provide computation on the edge, that is, nearer to the users, will significantly improve user-to-service bandwidth. It will also lessen the load on the long-haul backbone networks that data must traverse to reach the massive Cloud Services. And by distributing such services in this manner, users and providers escape the “all your services in one location” syndrome. Thus, the distributed servers on which these services run are less vulnerable to attacks. As the .COM rollout proceeded in the first decade of the new millennium, limitations of the underlying network, its administration and services were also revealed: SPAM continued to be an uncontrolled problem; bandwidth was not sufficient in the “last mile”; Internet service to China, India and Africa was minimal to non-existent (Allen, 2001, 76-83); and denial-of-service attacks and security problems were then, and still are, a constant concern. These are discussed in section 1.3.1.1, and P2P engineering solutions, where applicable, are found throughout the book. For example, security, distributed-denial-of-service (DDoS) attacks, and SPAM are covered in Chapter 5. In the final analysis, the .COM supernova was not technology gone wrong, but rather a business failure. Yes, we did have those unemployed, creative engineers, managers, and marketeers. Happily, creative beings tend not to be idle beings. Creative energy is restless energy. From this perspective, the “supernova” was a positive event. Just as with its astronomical counterpart, which must obey the laws of thermodynamics and whose released energy and matter reorganize to form new stars, one can view the laid-off, post-.COM-supernova employees as a huge pool of intellect, of creative energy that did not remain dormant like an extinct volcano, but rather regathered in small meetings in bars, cafes, homes and garages to create business plans based on surprising and disruptive technologies, some of which appeared to come out of “left field,” and others from a direct analysis of what is required to advance Internet technology. And the post-.COM entrepreneurs were not much older but were much wiser. As a result, the technologies they introduced were based on sound computer science and on an examination of the underlying reasons why failure of such huge proportions happened so abruptly, and thus yielded products with long-term revenue as a goal rather than a short-term doubling of one’s stock portfolio.
Life continued for these individuals, and their dreams, still in place, are here to stay, reshaping themselves as necessary. The .COM supernova was the natural progression of things: a necessary reorganization of a huge amount of energy that cannot be destroyed.
1.2 The Changing Internet Marketplace
Why does a marketplace change? How did we come to shop online in large, farmers-market-like marketplaces? Fundamental to these two questions is access to the distribution and aggregation of the commodities being sold. The Internet, digital, or e-Commerce marketplace is no different. As we accelerated through the events discussed in the previous section, rules for accessing, distributing, and delivering digitally purchased items were put in place. But right after the .COM bubble burst, much of what was purchased on the Internet could not be delivered in the same manner. Many examples come to mind. Two typical ones at that time were eBay and WebVan. Delivery requires revenue streams, and so the supply chain that brought purchased e-Commerce products to the consumer stalled. It is interesting to note that there were two salient exceptions: Amazon and Netflix. They did not overexpand; their business models were steady-as-you-go. How such budding companies can build a P2P, mediated, edge-computing infrastructure to save costs is one of the several goals of this book. With respect to digital music, Napster’s P2P network was an early, effective, and inexpensive delivery mechanism. It was brought down by copyright infringement problems, which are discussed below. Fortuitously, as a side-effect, it helped bootstrap P2P technology: although Gnutella, Freenet, LimeWire and BitTorrent, noteworthy early efforts, were also drawn into the fray of copyright infringement, their P2P technologies were an excellent motivation for the P2P research and the diverse systems that followed. As an example, one of the more successful early P2P efforts was Sun Microsystems’ open-source Project JXTA (Project JXTA, 2011). It was initiated in 2001. This open-source project provided a P2P platform on which anyone could build P2P applications and services. The technology was globally downloadable at no cost, and its more than 20 United States patents permitted unlimited derivative works and were royalty free. The latter permitted its CTO to work with the IETF’s Internet Architecture Board to create the first P2P Internet Research Task Force. One of Sun Microsystems’ goals as a company was to create Internet standards. How can P2P ease access to products and services that are, for the most part, reached through a web-based browser interface or a mobile app?
If one needs to search for a hotel at a particular price in a given region, then Internet access to this digital information can be extremely tedious and time consuming. While the results are often gratifying, the means of access can certainly be improved. If one wishes to purchase an orange, then it should not be necessary to go to Florida. With the HTTP GET mode of access, one almost always returns to the home site for fresh data. Napster showed us that one can much more effectively deliver digital content using hybrid P2P technology. Yes, Napster no longer exists, but the technology it used is only getting better. There were legal issues with respect to copyright protection, and so digital rights management software has been written to protect digital ownership. Why? Because, even as users stopped using Napster, the recording industry realized the huge potential of offloading the infrastructure cost of delivering mpeg files onto the users’ own systems and networks. P2P is sure to eventually become an essential part of the digital marketplace because safeguards can be put in place to guarantee both the payment for purchased, copyrighted material and its P2P distribution. Dr. Stefan Saroiu pointed out that about 40-60% of the peers shared only 5-20% of the total of shared Napster files (Saroiu, 2002); i.e., Napster was used as a highly centralized, client/server system with very little incentive for sharing the wealth of music in the system. One can speculate that the resistance to sharing was directly correlated with the fear of getting caught for copyright theft, and that a system of sharing incentives, such as digital coupons leading towards free or low-cost content, will be easy to put in place in a marketplace that discourages digital content theft. Finally, given that Internet access is now primarily to services in the Cloud by means of an ISP, users computing on the Edge will be able to access proximity Cloudlet services. This will optimize the Cloud’s bandwidth, providing faster Internet service through the local, high-speed distribution and streaming of media in all its forms. This includes voice recognition without delays; enhanced e-Commerce; optimized virtual reality services and apps; and significantly faster response times for AI algorithms. In fact, AI at the edge can potentially yield near-instantaneous decision-making. For P2P to become a fundamental building block in the existing digital marketplace, the digital content needs to be aggregated closer to home. While powerful servers are essential for the financial management and initial distribution of this data, always “going to Florida” is no longer a sensible solution. Just as the freeways are filled with semi-trucks during off hours delivering oranges to local markets, the same use of the information highway makes good “bandwidth sense.” As the Internet Cloud continues to expand like the Universe, the consumer experience will require local, proximity-based Cloudlet data access.
This is thoroughly discussed in Chapter 2. As mentioned above, even with Google search, the web-based process of finding what you wish to purchase is tedious and time consuming, and it can be streamlined if, for example, the user wishes to purchase from a nearby merchant. One might like to have a digital aide with a template exactly describing a desired purchase, and a P2P app that does the shopping by querying one’s local, on-the-Edge marketplace Cloudlets and returning only the best matches. Scanning Google lists, for example, is really a waste of time. Cloudlets with digital yellow pages for locally available purchases will create a very attractive e-Commerce marketplace. We discuss this thoroughly in Chapter 7 under the topic of Java Mobile Agents on P2P networks.
It is also important to understand that commercial P2P does not exclude user P2P networks for communication and for exchanging personal media, subject to the usual legal requirements for copyright protection. P2P applications using Computing on the Edge technology can enable such small communities in Virtual Private P2P networks.
1.3 What Is a P2P Network and Computing on Edge?
What is P2P? That is the real question. Most people still believe that P2P means the ability to violate copyright and exchange what amounts to billions of copyrighted mpeg or jpeg files free of charge. The music and motion picture industries are strongly against P2P for this reason. We will discuss the historical origins of P2P in Chapter 2, and this history makes no reference to the “Napster-like” definitions of P2P. Rather, it discusses the foundations of distributed, decentralized computing. One also finds that the non-technically initiated have begun to equate decentralized computing with a dark force of the computing psyche. It almost comes down to the old battle of totally centralized versus decentralized governments. Note also that Cloud Services provide centralized computing in The Cloud. Amusingly enough, the United States and the European Union, for example, are organized somewhat like hybrid P2P networks in the definition we will propose below. And capitalism was born out of such an organization of political and legal balance of power. Where does that leave the cabalistic opponents of P2P? We leave that to the reader to decide. But this opposition is at least partially fueled by a lack of understanding of P2P, and thus a goal of this book is to help shed some light on this P2P issue. Decentralized, distributed computing is a powerful tool for organizing large collections of nodes on a network into cooperative communities, which in turn can cooperate with one another. Yes, one possible instance of this paradigm is anarchy, where each member node of such an organization of computational resources can do what it wishes without concern for others. At the opposite extreme is a dictatorship. An organization of network nodes that leads to either anarchy or a dictatorship does not have a place in P2P networks as viewed from this book’s perspective. Nearly everything in between does.
And, certainly, we need to establish some rules to prevent the violation of others’ rights, whether they are members of society or nodes in a network. Napster, from this point of view, was not P2P. Rather, it was a centralized system that encouraged non-cooperation among the member nodes, a subtle form of anarchy, which is almost a self-contradiction. It was centralized because all mpeg distribution began with centralized registration and access to copyright-protected mpeg files from about 160 servers (Saroiu, 2002). And it exhibited anarchy-like behavior among the nodes because once a node possessed a copy of an mpeg file, the tendency was not to redistribute it, and thus to ignore the underlying “share the content” model which is at the roots of P2P. Clay Shirky (Winer, 2002) gives us a litmus test for P2P:
1. Does it treat variable connectivity and temporary network addresses as the norm?
2. Does it give the nodes at the edges of the network significant autonomy?
While this is a test for P2P, there will be instances of P2P networks from this book’s point of view that treat fixed network addresses and 24x7 connectivity as the norm. Our goal is not to have a purist litmus test that excludes a major segment of the computational network, but rather a test that is less exclusive. To this end, a collection of nodes forms a P2P overlay network, or P2P network, if the following hold:
1. A preponderance of the nodes can communicate with one another; can run app-services enabling each of them to play the role of both a client and a server; and exhibit a willingness to participate in the sharing of resources.
2. Nodes may be completely ad-hoc and autonomous, or use traditional, centralized, client/server technology as necessary.
Here one notes the term overlay network. From this book’s point of view, P2P networks are overlay networks on top of the real network transports and physical networks themselves, as shown in Figure 1-1. P2P means mutually supportive nodes on the one hand, and being able to use as much of the available technology as is necessary on the other, and thus arriving at a network that behaves optimally. A P2P network in an enterprise will be different from a P2P network in a neighborhood, and the two may or may not communicate with one another. The former will in all probability be stable, and the latter most likely ad-hoc and unstable.
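The first clause of the definition above, that nodes play both the client and the server role and are willing to share, can be made concrete with a small in-process model. The Python sketch below is illustrative only. The `Peer` class, its `serve` and `fetch` methods, and the `preponderance_shares` helper are hypothetical names, not part of the book's 4PL or of any P2P library. Each peer answers requests from its own shared resources (server role) and asks its direct neighbors for what it lacks (client role). It then keeps a fetched item so that it can share that item in turn. Reading "preponderance" as "more than half of the nodes" is also an assumption, and the helper checks only the sharing clause, not connectivity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Peer:
    """Hypothetical peer that acts as both a server and a client."""
    name: str
    shared: dict[str, bytes] = field(default_factory=dict)  # resources it is willing to share
    neighbors: list[Peer] = field(default_factory=list)     # peers it can talk to directly

    def serve(self, key: str) -> Optional[bytes]:
        # Server role: answer a request from this peer's own shared resources.
        return self.shared.get(key)

    def fetch(self, key: str) -> Optional[bytes]:
        # Client role: use a local copy if present, otherwise ask each direct neighbor.
        if key in self.shared:
            return self.shared[key]
        for peer in self.neighbors:
            data = peer.serve(key)
            if data is not None:
                # Willingness to share: keep the copy so this peer can serve it too.
                self.shared[key] = data
                return data
        return None


def preponderance_shares(peers: list[Peer]) -> bool:
    # Assumed reading of "preponderance": more than half the peers share something.
    sharing = sum(1 for p in peers if p.shared)
    return sharing > len(peers) / 2


# Usage: c can obtain the song only after b has fetched it from a and shares it.
a = Peer("a", {"song": b"la"})
b = Peer("b", neighbors=[a])
c = Peer("c", neighbors=[b])
print(c.fetch("song"))                   # None: b does not have it yet
print(b.fetch("song"))                   # b'la'
print(c.fetch("song"))                   # b'la'
print(preponderance_shares([a, b, c]))   # True
```

The sketch queries only direct neighbors, which keeps it small. Multi-hop search and routing across the overlay are deliberately left out.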
Figure 1-1. P2P Overlay Network
The lifetimes of network addresses and connectivity, as well as an autonomous node’s symbolic “Edge” position in the Internet topology, lie at the far end of a very broad P2P spectrum of possibilities offered by the above definition. If one wishes P2P to be a success, then the engineering principles to which it adheres, as well as its domain, must be able to encompass, interact with, and improve current Internet app-services based on centralized client/server technology. In fact, the appropriate vision is to view the ultimate Internet as a single network of nodes for which P2P provides an underlying fabric to help assure optimal, and thus maximum, service to each device, limited only by the device’s inherent shortcomings and not by its symbolic position in the network. Yes, an ad-hoc, autonomous, self-organizing network of unreliable nodes is inherently P2P. Yet a highly available cluster of database systems supporting a brokerage firm can also be configured as a P2P network, as can these systems’ clients. The members of such a cluster can be peers in a small P2P network, using P2P communication for the exchange of availability and failover information; the clients can also be members of the same network to help mediate both network-wide load balancing and data checkpointing, as well as members of a client P2P network to share content and to suspend and resume tasks on one another’s systems. In such a configured P2P network there may be centralized client/server relationships to, for example, ensure authenticated, authorized access, and both this P2P network and the pure, ad-hoc P2P network satisfy the definition, both being points in the P2P spectrum. The application of the fundamentals in this book will enable one to create such networks. But standard, distributed client/server email and database systems are not P2P, even if the clients may keep data locally and can act as auto-servers, either to improve performance or at times of disconnection. These latter client/server systems do not communicate with one another as peers and adhere strictly to their roles as clients and servers. This book does not discuss the fundamentals of these latter systems but will describe methods for morphing them towards the P2P paradigm for better performance. Such morphed systems will certainly be hybrid rather than pure P2P, and they are an extremely important step in the evolution of P2P technology.
Figure 1-2. The P2P Spectrum
The symbolic “Edge” of the network is really better viewed as pre-Columbian network terminology in the sense that, as popular legend has it, before Columbus the western world believed the world was flat and had an edge. When Columbus looked for the edge of the world, he never found it; this fictional point of view was dropped, and the possibilities for travel have been limitless ever since. If one is at any node in the Internet, then there is no sense of, “I am at the network’s Edge.” One’s services may be limited because of a slow or poor connection, and this can happen anywhere on the Internet. It is much better to view each node as located at the “Center” of the network, and then do what is possible to make all such “Centers” equal. This is closer to the task P2P has set out for itself in this book.
1.4 Why P2P Now?
Internet e-Commerce will be only as successful as users’ willingness both to use, on a regular basis, and to pay for the applications and services (app-services) that the digital marketplace offers. One of the reasons we had a .COM supernova was that consumers did not accept the app-services offered by the .COM marketplace in the above sense. Some of the app-services were used some of the time, and very few were used all of the time. Those very few are the ones that survived. The acceptance one envisions here means much more than, “Hmm... This is an interesting URL, maybe I’ll try it someday.” Acceptance is expressed by users saying things like, “This app is so cool I can’t get along without it”; “This service is so compelling that I feel like I am undercharged for its use”; and “This app is as necessary as my car, my roller blades, and my hiking boots, and, hey, this is fun!” The app-services must offer a break from the tedium of day-to-day living. People spend enough time waiting in lines, sitting in traffic, and being overloaded with junk postal mail, spam and those unwanted popup advertisements. Each of the above produces revenue, but why perpetuate pain when it is not necessary? In the last three cases the advertisers neither think of nor care about the recipients of the advertisements; rather, they use any technique they can to increase sales revenue. How many times have you, the reader, been frustrated by these advertising methods? As its primary goal, the kind of acceptance envisioned here requires maximal service with minimal user hassles. Furthermore, optimal app-service response time is essential to enable the creation of real sources of revenue rather than using bogus nuisances for this purpose. To achieve maximal service with minimal user hassles we must look beyond the client/server/Cloud-service mode of distributed computing that drives the Internet. The era has arrived where billions of devices will be interconnected. Although the client/server/Cloud-service structured network is straightforward, and has served us well up to now, even looking up the right server is a painful job for both clients and servers, since our everyday directory service, the Domain Name System (DNS), has the potential to become a bottleneck with the sustained growth of the Internet. In fact, a Distributed Denial of Service (DDoS) attack in October 2016 on Dyn, a major provider of DNS infrastructure, made many major websites unreachable for users across large parts of the United States for several hours. For authentication and authorization, centralized, server-based services such as Kerberos are in use. Internet security protocols like Secure Sockets Layer (SSL) and Transport Layer Security (TLS) currently require centralized Public Key Infrastructures (PKI) and well-known, centralized Certificate Authorities (CA) to manage X.509 certificate creation and distribution.
These systems are vulnerable to disruptions such as DDoS attacks, yet secure Internet transactions depend on them. Finally, Cloud services such as AWS, Azure, and Google Cloud, which build, deploy, and manage websites, apps, and processes, certainly save money because private data centers are costly to run. But an attack on any one of these Cloud services can disrupt thousands of businesses and millions of users. This "all of your eggs in one basket" mode of operation is in fact one of the problems that inspire distributed computing on the edge. There is no reason why one cannot have edge-based Cloud services, which we call Cloudlets. Each Cloudlet acts as a peer in a P2P network serving a regional or local user base, and it stores only the data that its P2P network is using. This will significantly improve the achievable bandwidth, and thus user satisfaction, and it will also make these more finely distributed services harder to attack. Cloudlets are not Internet cache storage. Rather, they are complete, user-proximity services. They can abandon media that is no longer in use and just as easily refresh it from the corresponding Cloud service when it is needed again. The result is maximal resiliency. Beyond these computational infrastructure limitations of the client/server/Cloud model, we also face the mobility paradigm. People travel with their mobile devices or laptops and want to communicate with other such systems. There is no reliable solution to ad-hoc mobility, in which a node appears in a network, joins it, and begins to communicate with other nodes. We now have millions of mobile devices with disposable IP addresses and no home address identifiers. These devices need a way to discover one another and communicate over secured, end-to-end connections. Facebook, Twitter, and Instagram don't cut it with their viral behaviors. One can also envision P2P sensor networks and P2P drone networks. Sensors could monitor a bridge for material failures, watch shopping centers for misbehavior, help firefighters locate hotspots, locate poachers, monitor traffic and alert drivers to congestion, operate in areas of high radioactivity, and so on. These can be viewed as inter-P2P networks: AI-enhanced swarms with well-known home bases that provide real-time data communication and guidance. As mobile devices gain enough computing power and better cameras, they can run small neural networks locally to execute machine learning algorithms. They can then send the preprocessed images to their home bases for more intensive machine learning. Such networks will save both human and animal lives and protect both public and private property.
Over the past few decades, companies, institutes, and governments trying to build a reliable computing powerhouse for billions of clients and applications have treated Moore's Law both as a monarch to follow and as a limit to challenge. Sooner or later, individual machines will reach that limit. Scientists have already begun to investigate more powerful computing engines based on technologies, from optical to quantum, that will no longer be subjects of the Moore monarchy. We are excited about the future, but we are worried about the present: idle computers; Internet services wedged like 5 p.m. traffic on a Los Angeles freeway; devices that can no longer communicate with one another; the impossibility of secure communication between any two devices; wasted manpower; and energy outages. Are we solving the right problem? Are there better solutions already available?
We need P2P now more than ever, for several reasons. First, when information is duplicated at various locations closer to the systems that access it, it can still be retrieved even when one or more of these sites are disabled. It can also be retrieved more efficiently, because both the load on a centralized resource and the access bandwidth to it are minimized. Second, when a direct connection between any two devices identified by unique IDs becomes practically achievable, centralized directory lookup is no longer a "must-have" component or a source of bottlenecks in the network. Third, ad-hoc mobility becomes achievable for many kinds of devices, including drones and sensors. Fourth, with mobile agent services built on P2P networks, objects, applications, and clients can communicate and share information in a way that minimizes the user's involvement in tedious search tasks, which makes systems more responsive to users. P2P technology brings many more possibilities, and any one of them can lead us toward the next wave. Given current technology, these possibilities are much closer to realization, and they are preferable to sitting and waiting for the next technical revolution. So why will P2P help optimize Internet access now, and dispel the illusion that the user is isolated at the Internet's edge? The short answer is that current Internet technology cannot, without P2P, support the sustained, optimized growth of multidimensional, multiple-device app-services. The network topology that P2P implicitly creates, by contrast, is location independent and hot with localized activity everywhere. Let's look at why this is true. As mentioned above, one of the first requirements is user-based app-services that fully support multimedia data. Among other things, this means that at least music and video must be delivered on demand.
The evidence is already here that centralized services cannot support the current demand for domain name lookup (Cheriton, 1988, 314-333), and the massive exchange of multimedia content is a far larger problem. The bandwidth is not there, and the centralized, Cloud-based web-services answer is always to add more bandwidth capacity and larger server farms. Historically, this has been viewed as keeping up with increasing demand while providing the same quality of service. It is neither acceptable nor successful at satisfying users. It is like adding more lanes to freeways to "improve" traffic flow instead of seriously looking for alternatives convenient or compelling enough that drivers would consider using them. The build-out of Wireless LANs (WLANs) based on 802.11a/b networks, anticipated in the first decade of the 21st century, has arrived. As will be discussed in Chapter 6, P2P is a natural fit for optimal content distribution in WLANs. Section 3.4 explains how P2P will encourage the organization of mixed network devices to evolve. This again leads to an optimal use of bandwidth and services, and it eliminates the centralized bottlenecks reinforced by a pre-Columbian illusion about where the center of the Internet is located. The world is not flat. The Internet's edges, and even the sky itself with its airborne, mobile P2P networks, can all be hot spots of computation and communication. A second way P2P can optimize the Internet now is by using the processing power of quiescent CPUs at no cost. In 1966 it was projected that mainframe computers would revolutionize the world. No one anticipated the arrival of the now-extinct mini-computer, or of the microprocessor. A mobile phone's processor is more powerful than a typical 1980s mainframe! Counting mobile devices, there are more than a billion computers connected to the Internet, and a large share of the world's population is not yet connected. The available processing power is massive. Using P2P, one can create coordinated, loosely coupled, distributed nodes as members of a P2P network. SETI@Home (SETI@home, 2022) proved successful as an administratively centralized computing grid, in which software at SETI@Home's laboratory at the University of California, Berkeley was responsible for decisions. With the addition of P2P capabilities, SETI@Home-like networks will be able to offload some of these administrative tasks by giving each client server capabilities. This will lessen the bandwidth used between the laboratory and the volunteered computers. It will also speed up the overall grid computation, for example by letting checkpointed jobs be handed off directly to another client. Right now, one's home can become a fully connected P2P edge network.
This network in turn can be connected to a laptop, PDA, mobile phone, automobile, or a workstation in one's office behind a firewall, giving each family its own personal peer community. This book presents the fundamentals needed to start the process.
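The first benefit claimed earlier in this section is that duplicated information can still be retrieved from a nearby copy when some sites are disabled. The Python sketch below illustrates that idea. It is a minimal illustration, not a protocol from this book, and it rests on stated assumptions. The `Replica` record, its `rtt_ms` and `reachable` fields, and the `pick_replica` function are hypothetical names. We also assume that a peer already knows which peers hold a copy and has measured its round-trip time to each of them.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    peer_id: str      # hypothetical overlay identifier of a peer holding a copy
    rtt_ms: float     # measured round-trip time to that peer, in milliseconds
    reachable: bool   # False if the peer is down or cannot be contacted


def pick_replica(replicas):
    """Return the reachable replica with the lowest round-trip time, or None."""
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.rtt_ms)


replicas = [
    Replica("peer-a", 120.0, True),   # distant, centralized copy
    Replica("peer-b", 8.5, False),    # nearby copy whose peer has gone offline
    Replica("peer-c", 15.2, True),    # nearby copy that is still reachable
]
print(pick_replica(replicas).peer_id)  # peer-c
```

Choosing the lowest round-trip time is only one plausible policy. A real peer might also weigh the load or bandwidth of each replica holder.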
1.5 Basic P2P Rights
P2P networks are organized overlays built on top of, and independent of, the underlying physical networks. Our definition is general enough to let almost any device with network connectivity be a node in some P2P network. It also permits these networks to be ad-hoc and autonomous, and their nodes to be anonymous. At the same time, it admits strongly administered, well-authenticated, secure networks. In either case, openness and secrecy can and will coexist. Some find this paradigm frightening because, at one extreme, it is antagonistic to the state George Orwell describes in his book "1984": "Big Brother" will not know that you exist and therefore cannot be "watching you." Others find it frightening because it also permits Orwellian P2P networks. In such networks there is no secrecy, all communication is monitored and audited, and all data is in clear text, either monitored in real time or saved for surveillance. Both the monitored and the saved data can then be processed with contemporary Big Data analytics, which can use Natural Language Processing and Machine Learning algorithms, among other things. What matters is the freedom of choice that P2P offers. This choice carries concomitant responsibilities, a P2P credo if you like: respect one another's rights, data, and CPU usage, and share the bandwidth fairly; spread neither viruses nor worms; be tolerant of one another's resource differences; be a good network neighbor; and do no harm to others. It is in the nature of credos to be violated. That is understood and part of human nature. The goal is to minimize non-altruistic P2P behavior, either by confining it to those P2P networks where it is acceptable or by punishing it appropriately when it is discovered. The usual punishment will be expulsion from the P2P network. Rightly enough, the sanctity of one's home can hold a P2P network whose members are all the devices and appliances that have a network presence. As you will learn in this book, a P2P overlay network is independent of the multiple possible "underlying" real networks and their protocols. Examples include Bluetooth, Wi-Fi, and ethernet networks, and some of these may or may not use the Internet protocols on the home's local area network. All in-the-home communication and data can be private, and only search warrants will permit entry.
The United States Bill of Rights can be viewed as a P2P-supporting document, since freedom of assembly, freedom of speech, and protection from unreasonable search and seizure are at the heart of P2P. One can certainly imagine a well-organized "network militia" bearing its software arms to secure the network and the data resident in it. Freedom of access for network devices and their associated data is at the heart of P2P networks. The member nodes decide the rules for network and data use, and those rules are the member nodes' policies. One should then be able to buy several devices and appliances, along with applications, that use appropriate, standards-based, and hopefully open source P2P protocols¹. These protocols would enable a P2P network and let the various nodes join it.
¹ This book teaches the reader the fundamentals of building a spectrum of secure P2P networks that support Edge Computing. Standards ultimately accepted by a recognized standards body may differ from this book. Without standardized P2P protocols, however, the ever-present dark side of the Internet prevents practitioners from guaranteeing secure P2P networks.
1.5.1 "Freedom of Assembly" for the Right to Connect Everything
The first decade of the new millennium saw exponential growth in network-aware devices capable of sending and receiving data. The list keeps growing, and the possible combinations defy one's imagination. Along with computers, we have home routers and modems, PDAs, mobile phones, automobiles, TVs, surveillance systems, doorbells, temperature controls, light switches and light bulb receptors, fans, refrigerators, alarm systems, wrist watches, stoves, dishwashers, ovens, home theaters, electricity and gas meters, pet licenses, eyeglasses, rings, necklaces, bracelets, and more. Any combination of these can be interconnected to form ad-hoc P2P networks. One might ask, "To what end?" Imagine that you have just left home and are worried that you left the oven or stove on. Rather than turning back, you can make a quick check. Suppose these appliances are peers in a home P2P network, and this home edge network is reachable from a wireless device in your automobile or from your mobile phone, both of which are peers in your private home network. Then a simple control panel on those devices is enough. In fact, you could securely launch a mobile agent within the home LAN to run a full integrity check of the home and report back. A few seconds later you would receive either "all is well" or "you left the stove on, and it is now turned off." Before and after photos can also be sent. Another scenario involves ad-hoc networks of people in coffee houses, railroad stations, parks, or outdoor cafes. Here, jeweled bracelets or necklaces might light up in similar colors for everyone who belongs to the same ad-hoc P2P network community. In the evening, seen from a distance, small islands of similarly colored lights would move about, toward and away from one another. They would form a beautifully illustrated ad-hoc social contract, the "Smart Mobs" described by Rheingold (Rheingold 2005). Such scenarios are endless, practical, and part of our future.
P2P engineering for wireless devices and networks is discussed in Chapter 6.
1.5.2 "Freedom of Assembly" for Survival of the Species
Ecosystems are self-organizing and self-maintaining, and in the case of reasonable injury they are self-healing. There is life and death, and the ecosystem evolves through natural selection. The process continues, and new life forms arrive over time as the result of mutation. Ecosystems are great for trial-and-error testing, and the same must be said for P2P overlay networks. Peers come and go, crash during data transfers, lose their visibility, and are rediscovered. New devices are accepted at face value and permitted to participate in the network's activities. P2P networks are spawning grounds and playgrounds for creative thinkers. In this manner, a P2P network can keep gathering new members, make them as known as policy permits, and behave much like an ecosystem in which diversity leads to survival of the fittest. Peers are free to assemble with others to exchange content. Peers such as mobile-agents are free to traverse the P2P network, visit peers where entry is permitted, gather information, and return it to their originators. As such, "Freedom of Assembly" is the ultimate P2P right. As our definition of P2P states, each device is part of a cooperative whole, but it is also a node in a P2P network that makes its own decisions and acts independently. A device's presence is neither required nor denied. Hence, the failure of one device should not harm other devices. If two devices are connected and one abruptly crashes, this should be a small hiccup in the P2P network, and a redundant device ought to take its place. Still, everything has two sides. This independence also means that, at first, there may be no one to help the poor, temporarily stranded device. A highly available client-server system always has servers behind each client and back-up servers configured for each server, but those servers are subject to bottlenecks, resource depletion, and denial-of-service attacks. So, in centralized client/server systems, self-maintaining, self-healing, and self-adaptive features cannot always reduce these burdens. For a device in a P2P network, on the other hand, these features are not only essential but inherent in the network ecology.
Thus, the "poor guy" that was sharing content and abruptly lost its connection can expect to resume the operation with another node, although recovery may not be instantaneous. During its apparent isolation, it might be visited by a wandering mobile-agent that is validating the network topology, and that agent can redirect the peer to a new point of contact. Similarly, it is difficult for hackers to mount denial-of-service attacks because, as in an ecosystem, there is no center to attack. P2P networks have built-in redundancy. From a software engineer's perspective, P2P software should ideally be written to live in a self-healing network topology. Typically, a device uses software to monitor its tasks, schedule its resources to maximize performance, and set pointers and flush memory so that it can resume operations efficiently after a failure. At a higher level, the P2P software should be able to adjust to a new environment, both to recover and to improve performance. For example, a device might keep dedicated caches or tables of its favorite peer neighbors for fast-tracking connections or for network topology sanity checks. When a neighboring device fails, or one of its buddies stays not so "truthful" for an intolerable period, the P2P software on the device should be able to modify its records dynamically. In this way, at least the same connectivity can be guaranteed. This is one of the most straightforward cases showing that P2P software must be self-healing and self-adaptive if the network is to behave in the same way. The engineering dynamics of these scenarios are discussed in detail in later chapters. Unfortunately, not all devices are capable of self-management. Small wireless sensors, for example, lack the computing power and memory to embed such sophisticated software or hardware, so they must rely on other machines for these purposes. P2P purists hate the word "server," but small devices do require such "server-like" proxy or surrogate machines, and this fits perfectly with the definition of P2P overlay networks given above. As mentioned just above, "Freedom of Assembly" in P2P networks supports a multiplicity of devices and software "organisms." They arrive, try to succeed by finding a niche, and either continue to flourish or die out. Since the early 1990s, mobile-agent technology has been looking for the right execution environment. Mobile-agents can be invasive, pervasive, informative, or directed, and they come in all shapes and sizes. In the authors' research, they work best when implemented in Java, because Java byte code is hardware independent. Mobile-agents can be written to adapt to self-healing, ad-hoc network behavior, and in fact they thrive in such an environment. They are mobile by nature, they can have self-adapting itineraries and agendas, they can be signed and thus secured, and they are opportunistic as they travel. For all these reasons, they have always needed a network ecosystem to survive and to evolve into mainstream technology.
The authors of this book are advocates of mobile-agent technology as applied to P2P overlay networks. The engineering details are discussed in Chapter 4.
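Earlier in this section we described a self-healing neighbor cache: a table of favorite neighbors whose entries are replaced when a neighbor fails, or stays untruthful, for an intolerable period. The Python sketch below shows one way such a cache might look. The `NeighborCache` class, its `max_bad_seconds` threshold, and the idea of refilling from a list of candidate peers are illustrative assumptions. They are not the protocols that Chapter 4 defines.

```python
import time


class NeighborCache:
    """Hypothetical cache of a peer's favorite neighbors.

    A neighbor is evicted once it has been failing (unreachable or
    returning untruthful answers) for longer than max_bad_seconds.
    """

    def __init__(self, max_bad_seconds=30.0, clock=time.monotonic):
        self.max_bad_seconds = max_bad_seconds
        self.clock = clock
        # peer_id -> time of the first unresolved failure, or None if healthy
        self.bad_since = {}

    def add(self, peer_id):
        self.bad_since.setdefault(peer_id, None)

    def report_ok(self, peer_id):
        # A good answer clears any pending failure record.
        if peer_id in self.bad_since:
            self.bad_since[peer_id] = None

    def report_bad(self, peer_id):
        # Record only the first failure, so we measure how long it has been bad.
        if peer_id in self.bad_since and self.bad_since[peer_id] is None:
            self.bad_since[peer_id] = self.clock()

    def prune(self, candidates=()):
        """Evict neighbors bad for too long, refill from candidates, return evictees."""
        now = self.clock()
        evicted = [peer for peer, since in self.bad_since.items()
                   if since is not None and now - since > self.max_bad_seconds]
        for peer in evicted:
            del self.bad_since[peer]
        # Try to keep the same number of neighbors so connectivity is preserved.
        needed = len(evicted)
        for peer in candidates:
            if needed == 0:
                break
            if peer not in self.bad_since and peer not in evicted:
                self.add(peer)
                needed -= 1
        return evicted


# Usage with a fake clock so that the example is deterministic.
fake_now = [0.0]
cache = NeighborCache(max_bad_seconds=30.0, clock=lambda: fake_now[0])
for p in ("peer-a", "peer-b"):
    cache.add(p)
cache.report_bad("peer-b")        # peer-b stops answering at t = 0 s
fake_now[0] = 45.0                # it is still silent 45 s later
evicted = cache.prune(candidates=["peer-c"])
print(evicted)                    # ['peer-b']
print(sorted(cache.bad_since))    # ['peer-a', 'peer-c']
```

The injectable clock is a design choice for testability. In a running peer, the default `time.monotonic` would be used.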
1.5.3 "Freedom of Speech" for the Right to Publish Data and Meta-data
As previously mentioned, the data published and transferred on the Internet is multi-dimensional and enormous in volume. Brute-force pattern matching for searching data is therefore no longer viable. It has become a dinosaur-like relic of the era when only textual data was available. A file sharing system that depends on such simple techniques is easily hacked, since a virus needs only to corrupt the data to destroy it. A description of data, that is, data about data or meta-data, is now an essential part of how data on the Internet is organized, and it makes tasks such as search achievable. For example, one can keep signed hashes of the data in its meta-data, which makes it possible to detect modification attacks. Similar techniques can detect data that has been modified in transit across the Internet. Nodes on a P2P overlay network have the absolute right to exchange data, meta-data, or both. This meta-data can be public or private: freely published and available to every subscriber, or secret and viewable only by a select few. Meta-data can be stored as clear text and posted to a public domain site so that the data it describes is widely distributed. One immediate use of such sites is sharing research publications among institutes. On the other hand, P2P applications can choose whether or not to hide meta-data. They can use strong encryption or IP Security (IPsec) so that the data or meta-data being exchanged cannot be monitored, because well-written security systems can assure the end-to-end privacy of these "conversations." Encrypted meta-data on peers' systems can therefore be made impossible to read. Access to a system's data directories, that is, the meta-data describing the files on that system, can also require authentication with appropriate credentials. This meta-data can be transmitted as clear text or as encrypted descriptions of the directories, so again only the holder of those credentials may be able to see it. With current encryption key sizes and algorithms, recovering the clear text without the key is computationally infeasible. Processors are so fast that the time needed to encrypt or decrypt a megabyte of data is negligible.
The processing time required to keep both local and remote user data and meta-data secret is therefore a given. There are almost no obstacles to the "freedom" of Internet privacy protection, because code implementing the required cryptographic algorithms is freely available on the Internet (Free Code 2003) (The Legion of the Bouncy Castle 2022) (OpenSSL 2015). Given that thirty-three percent of all Internet traffic is directed toward pornographic sites (Buchholtz 2019), will P2P networks be any different in the data and meta-data that is published? Probably not. "Freedom of Speech" gives one the right to publish data and meta-data within certain legal limits, and the Internet knows fewer and fewer boundaries to data exchange. Note with caution, though, that applying the First Amendment to the United States Constitution worldwide meets significant resistance from governments that want to control the information crossing their borders.
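Earlier in this section we mentioned keeping signed hashes of data in its meta-data. That idea can be illustrated with Python's standard `hashlib` and `hmac` modules. To keep the sketch self-contained, a keyed HMAC-SHA256 tag stands in for the public-key signature the text refers to. This substitution is our assumption. A real deployment would sign with the publisher's private key, so readers would not need a shared secret. The `make_metadata` and `verify` functions and the meta-data layout are hypothetical.

```python
import hashlib
import hmac


def _tag(key: bytes, name: bytes, digest: str) -> str:
    # HMAC-SHA256 over the file name and the data digest. This keyed hash
    # stands in for a public-key signature, which would need a third-party library.
    return hmac.new(key, name + b"|" + digest.encode(), hashlib.sha256).hexdigest()


def make_metadata(name: bytes, data: bytes, key: bytes) -> dict:
    """Build hypothetical meta-data carrying a hash of the data and a tag over it."""
    digest = hashlib.sha256(data).hexdigest()
    return {"name": name, "sha256": digest, "tag": _tag(key, name, digest)}


def verify(meta: dict, data: bytes, key: bytes) -> bool:
    """Return True only if neither the meta-data nor the data was modified."""
    expected = _tag(key, meta["name"], meta["sha256"])
    if not hmac.compare_digest(expected, meta["tag"]):
        return False  # the meta-data itself was tampered with
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), meta["sha256"])


key = b"hypothetical-shared-key"
data = b"research paper contents"
meta = make_metadata(b"paper.pdf", data, key)
print(verify(meta, data, key))                  # True
print(verify(meta, data + b" (altered)", key))  # False
```

The tag covers the hash. An attacker who replaces both the data and the hash recorded in the meta-data is therefore still caught, unless the attacker also holds the key.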
When P2P networks are pervasive, published content may or may not reside on individuals' systems, depending on where their networks fall on the P2P spectrum. Some of these systems will be much harder to locate on the Internet, because their visible IP addresses are NAT endpoints. Still, the permitted and accepted private exchange of data and meta-data must be treated no differently than a VoIP telephone conversation moving from base station to base station. The problem lies not with the system that enables the conversation but with the endpoints of the conversation. Endpoint nodes on a P2P overlay network are assigned permanent identifiers as addresses, and these identifiers are globally independent of their real network addresses. The P2P protocols and routing we define on the overlay network therefore adapt in real time to arbitrary IP addresses, giving nodes true IP address independence. How? The overlay network binds to the Internet layer and detects changes to the device's IP address. It then updates the overlay routing entry {endpoint node ID, IP address}, and this update propagates across the overlay network as necessary. The details are explained in Chapter 4. New technology forges new pathways across all systems, including legal ones. This has always been, and always will be, a side-effect of crossing new frontiers. It is exciting for some and frightening for others. One generation's laws may become the next generation's blue laws, that is, outdated relics of historical interest whose support diminishes over time. Undeniably, all users will be subject to the current "laws of the land" and open to arrest and prosecution if they violate them. But P2P technology will create new markets for honest digital commerce with enormous economic incentives. It will also permit private network conversations between consenting adults, along with the expected criminal activity.
Such criminal activity is well-understood human behavior that P2P neither encourages nor discourages, much as a highway neither encourages drivers to exceed the posted speed limit nor discourages them from doing so. The solution to these problems is neither to abolish driving nor to stop innovative technological progress because it can be misused. Such reactions are neither well thought out nor well founded, and they will not lead to long-term, fair, and meaningful solutions. Those solutions will come from technologists working with lawmakers and media firms, always respecting an individual's Basic P2P Rights. The engineering aspects of data and meta-data are discussed in Chapter 3. In the next section we give an overview of the book and a brief description of a P2P Pseudo-Programming Language that we invented. This language enables a more technical description of our protocol design.
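Earlier in this section we described how overlay addresses stay independent of IP addresses. A peer's permanent overlay identifier stays fixed, while its {endpoint node ID, IP address} entry is updated and propagated whenever the underlying IP address changes. The Python sketch below illustrates that mechanism. The `OverlayRoutes` class, the per-node version counter that discards stale announcements, and the simple flooding in `propagate` are all our own assumptions for illustration. The book's actual routing protocols are defined in Chapter 4.

```python
from dataclasses import dataclass, field


@dataclass
class OverlayRoutes:
    """Hypothetical per-peer table: permanent node ID -> (current IP, version)."""
    routes: dict = field(default_factory=dict)

    def update(self, node_id: str, ip: str, version: int) -> bool:
        """Apply an address announcement; return True if the table changed.

        The owning node bumps `version` on every address change, so a peer
        can ignore stale or duplicate announcements that arrive late.
        """
        current = self.routes.get(node_id)
        if current is not None and current[1] >= version:
            return False
        self.routes[node_id] = (ip, version)
        return True

    def lookup(self, node_id: str):
        entry = self.routes.get(node_id)
        return entry[0] if entry else None


def propagate(tables, neighbors, origin, node_id, ip, version):
    """Flood an announcement; a peer forwards it only if it was news to that peer."""
    pending = [origin]
    while pending:
        peer = pending.pop()
        if tables[peer].update(node_id, ip, version):
            pending.extend(neighbors[peer])


# Three peers in a line: A - B - C.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
tables = {peer: OverlayRoutes() for peer in neighbors}
propagate(tables, neighbors, "A", "node-42", "10.0.0.5", version=1)
# node-42 moves to another network, gets a new IP, and announces it via A.
propagate(tables, neighbors, "A", "node-42", "192.168.1.9", version=2)
print(tables["C"].lookup("node-42"))  # 192.168.1.9
```

The version check is what stops the flood. Once every peer holds the newest version, further copies of the same announcement change nothing and are not forwarded.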
1.6 Contents of This Book and P2P PSEUDO-Programming Language
This book is organized as follows:
• Chapter 2 gives a historical perspective on the roots of P2P and on the evolution of the Internet in two phases, pre-WWW and post-WWW.
• Chapter 3 covers the essential engineering components of the generic P2P model. These include the descriptive language and the resulting document space, unique peer identity, the overlay network, and the communication components. The chapter concludes by showing how these components can be assembled to permit communication on the overlay network, given the limitations of the real, underlying network and physical layers.
• Chapter 4 brings the components to life and describes the protocols used to create an active P2P network. It discusses connecting, routing, load balancing, and querying.
• Chapter 5 presents the details of implementing standards-based security in such a system. It concludes by applying these security fundamentals to show how to create secure Java mobile agent P2P services.
• Chapter 6 is a thorough discussion of wireless networks. It then shows how P2P can enable exciting new applications that are independent of device and bearer network, and so act as a long-needed unifying force between wired and wireless networks. It also describes what is required to build a Java P2P application for mobile handsets.
• Chapter 7 explores some possible P2P applications. It starts with familiar email and chat programs and ends with less familiar, innovative, almost science-fiction-like possibilities.
To express the engineering principles in this book explicitly, we have devised a P2P Pseudo-Programming Language, 4PL. The syntax and semantics of 4PL are defined in Appendix I.
4PL lets one programmatically define nodes on the P2P overlay network and describe their interaction. It does this by defining each P2P component introduced in Chapter 3 as a 4PL data type, with a set of associated 4PL commands that accept and return typed variables. As mentioned above, Chapter 4 defines several overlay network communication protocols, and there we use 4PL to create flow charts that describe the precise protocol behavior. 4PL thus gives the engineering and protocol definitions a solid logical foundation and, barring 4PL programming bugs, eliminates the possibility of inconsistent behavior. We suggest that the reader use Appendix I as a reference when reading Chapters 3 and 4.
CHAPTER 2
P2P: THE ENDGAME OF MOORE'S LAW
Today, desktops and laptops with clock speeds above 4 gigahertz, 16 gigabytes of high-speed RAM, and 1 to 8 terabytes of disk storage are readily available. The best mobile phones have clock speeds above 3 gigahertz, 6 gigabytes or more of RAM, and up to a terabyte of storage. In the 1980s, the story for micro-computers was very different, and today's mobile phones were a distant dream: "I recall quite vividly having been awestruck by the processing power and memory size that was available to me when I took it upon myself in 1981 to rewrite my PDP-11/05 mini-computer based, multiple protocol router code to take advantage of a Masters Degree student's, Andy Bechtolsheim's, mc68000 microprocessor board. The latter had a clock speed of 8 megahertz and 256K bytes of DRAM, while the former had a clock speed of 2 MHz and 56K bytes of core memory available to run software. Andy's board was one of the first in a long line that would bring us to where we are at this time.¹"
Now, more than forty years later, we can see the predictive power of Moore's Law. In 1965, Gordon Moore stated that transistor density would double every 12 months. Over the years the pace slowed to a doubling every two years. Still, the doubling of transistor density has been accompanied by a concomitant increase in processor speeds. There are two reasons: first, more transistors fit in a given space, and second, the delay on the transistors and wires is reduced.² Moore's Law is widely expected to stop applying to silicon-based transistors by the end of this decade. Given the computing resources now in the hands of the average user, and given the Internet, it is no coincidence that the new millennium was greeted by a rebirth of P2P technology. The potential "computational energy" available boggles the mind. We view the emergence of P2P as a means to tap this energy source. In that sense, P2P represents the final moves and the logical conclusion of the evolution brought about by Moore's Law: "P2P is the End-Game of Moore's Law." Decentralize and conquer is the end-game's winning strategy. Harnessing the latent CPU power of personal systems into communities of users with common interests is the inevitable, logical conclusion of Moore's Law. There is simply too much CPU power to ignore.
¹ William Yeager's recollections of his days at Stanford University, December 2003.
² Delay is proportional to resistance times capacitance, and Moore's Law reduces both resistance and capacitance.
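The doubling rule quoted above can be written as a simple growth formula. As a rough worked example, assume (our simplification) a constant doubling period T, and ask how much transistor density grows over the roughly forty years since the 1981 board described above. Here N_0 is the initial density and t is the elapsed time in years.

```latex
\begin{aligned}
N(t) &= N_0 \cdot 2^{t/T} \\
\frac{N(40)}{N_0} &= 2^{40/2} = 2^{20} = 1\,048\,576 \approx 10^{6} && (T = 2\ \text{years}) \\
\frac{N(40)}{N_0} &= 2^{40/1} = 2^{40} \approx 1.1 \times 10^{12} && (T = 1\ \text{year})
\end{aligned}
```

The gap of six orders of magnitude between the two results shows how sensitive long-range projections are to the doubling period. This is why the slowdown from twelve months to two years matters so much.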
2.1 The 1980s
As processor speeds began to diligently follow the almost mystical curve of Moore's Law, the first beneficiaries were the servers in the client/server model of distributed computing. Still, even in the 1980s some people dreamed of harnessing the untapped computational resources of the workstations that were beginning to dominate researchers' desktops. They viewed these systems as peer nodes capable of sharing content and computational services. The embryonic beginnings of the P2P technology that would surface at the debut of the new millennium were already in place twenty years before its birth. During this twenty-year gestation period, several events put in place exactly what was needed for P2P computing to be reborn and to grow successfully in the first ten years of the new millennium. Owen Densmore, formerly of Sun Labs and later of the Complexity Workshop (Complexity workshop 2000) in Santa Fe, predicted that 2000-2010 would be the "Decade of The Peer." We believe, as do many others, that Owen was correct. In this chapter we look at the history of P2P: its initial appearance in the 1980s, and the highlights of the twenty-year gestation period that led to its inevitable and logical reappearance in 2000. One imagines that most of the software architects and engineers who designed Napster, Gnutella, and FreeNet were about thirty years old in 2000. That puts them at the start of their teenage years when the Arpanet switched from NCP, the Arpanet protocols, to IP in 1983. From our point of view, the internet became "The Internet" with the addition of end-to-end IP connectivity. During the 1980s, IP began to displace the other network protocols then in use, such as XNS, DECnet, AppleTalk, LocalTalk, CHAOSnet, and LAT.
By 1984 IP and its accompanying protocol suites were already well on their way to becoming the global network standards. So, what does this have to do with P2P? Clearly, the rapid growth of networking technology that took place in the 1980s was the major impetus, the force that carried network applications, technically, to where we are today.
P2P: The Endgame of Moore’s Law
Behind these advances, behind the scenes if you like, we find first the effects of Moore’s law: smaller is faster; smaller is more memory; more memory implies network hardware and software with more functionality; better networks imply higher productivity and new and creative ways to communicate. Second, the IETF as a forum for open Internet standards was then, and still is, a major factor. From 1980 until 1990, three hundred and two RFCs were approved. The 1980s, indeed, set the stage and put in place the sets and the scenery for an as yet unwritten drama. That drama would be all about the evolution of the Internet, and P2P has played a major role in the act that began in the year 2000.
2.1.1 LANs, WANs, and the Internet
While some Local Area Networks (LANs) did exist in the 1970s at places like Xerox PARC, where ethernet was invented, the real upsurge occurred in the 1980s, and in particular after 1983. In this context, it is important not to forget that the 3 Mbps ethernet, ethernet “version 1,” used the PARC Universal Packet (PUP) specifications. Ethernet was officially documented in 1976 in a paper by Bob Metcalfe and David Boggs entitled “Ethernet: Distributed Packet Switching for Local Computer Networks.” The interested reader can find on the Internet the PARC Interoffice Memorandum, written by Ed Taft and Bob Metcalfe in June 1978, that describes the PUP specifications (Boggs, 1979). We can safely assume that the hardware and software discussed in this memo existed well before that date. PUP was used to link together Altos, Lisp Machines, Xerox Star systems, and other servers at PARC. Bob Metcalfe left Xerox in 1979 to form 3Com and promote ethernet. The Ethernet version 2 standard, or 10 Mbps ethernet, is specified in IEEE 802.3. The standardization was the result of a joint effort by Xerox, DEC and Intel from 1979 to 1983 that was motivated by Bob Metcalfe. Today ethernet is the world’s standard, comprising 85% of LANs. We now give a brief description of the emergence of the Stanford University LAN. It is by no means a unique event, but it is one we can describe from Bill Yeager’s deep personal involvement in the effort. Bill was a member of the Stanford Knowledge Systems Lab’s Symbolic System Research Group (SSRG). And certainly, as we will see, what happened at Stanford did spearhead the growth of networking on a world-wide scale.
By means of a grant of hardware and software received by Stanford University from Xerox PARC in December 1979, PUP became the original 3 Mbps ethernet LAN protocol at Stanford University and was primarily used to link D-Machines, Altos, Sun Workstations, VAXes and TENEX/TOPS20 systems across the University. The first three subnets
Chapter 2
linked the medical center with the departments of Computer Science and Electrical Engineering by means of the first incarnation, in 1980, of Bill’s router, which ran on a PDP-11/05 and routed the PUP protocol. These three subnets were the basis for the original Stanford University LAN. IP showed up at Stanford in late 1981, and to support IP Bill invented the multiple-protocol, packet-switched ethernet router that routed both PUP and IP. He continued to develop the code over the next five years; by 1985 it routed PUP, IP, XNS and CHAOSNET. It was officially licensed by Cisco Systems in 1987 and was the basis for the Cisco Systems router technology. The initial version of this hardware was created by Nick Veizades, a hardware engineer in the SSRG; as of late 1981 the hardware for these multiple-protocol routers was known as the “blue box,” which was invented in the Stanford department of Computer Science. Both versions had a Multibus backplane outfitted with a power supply and a motherboard with an MC68000 CPU and 256 Kbytes of DRAM, and held up to four ethernet interfaces. The motherboard was Andy Bechtolsheim’s invention (Carey, 2002). The first Cisco routers used identical hardware, and they ultimately became the Internet router of choice. 10 Mbps ethernet subnets appeared in late 1981 and, along with IP, began to dominate the LAN, which soon outgrew its origins as a research LAN. Starting from those three subnets, the LAN grew to connect all of Stanford’s academic and non-academic buildings. The Stanford LAN had the IP class A internet address 36.0.0.0 and was first connected to the Internet in 1983 by means of a BBN router called the Golden Gateway, which was maintained by a graduate student named Jeff Mogul. Jeff has been very active in network research since his graduate student days in the early 1980s. As a graduate student he co-authored rfc903 on the Reverse Address Resolution Protocol in 1984, and since that time he has joined with others to write an additional fifteen RFCs.
His most notable effort may be rfc2068, which specifies HTTP/1.1; one of his co-authors was Tim Berners-Lee. By 1990 the Stanford LAN had more than 100 subnets. This growth was accomplished by formally acknowledging that the Stanford LAN was necessary for doing day-to-day business, and by forming a department in 1985, under the management of Bill Yundt, to support its further growth. Stanford continued to support the PUP, IP, XNS and CHAOSnet protocols into the late 1980s since ongoing research required it. The Stanford LAN service was superb and made possible seminal work in distributed systems, work that is clearly a forerunner of P2P. This research is discussed in section 2.1.3. In a similar context, the MIT AI Lab had CHAOSnet, which was originally used to link its Lisp machines, and later a large selection of
machines at MIT. This was documented by David Moon in 1981 (Moon, 1981). By the mid-1980s LANs like these were commonplace in universities, and businesses began to follow suit. In 1985 Golden was retired, and Bill Yundt’s department provided connections to the Internet via the NSF backbone. These were T1 (1.544 Mbps) networks formally called the NSFNET. Similarly, a T1 Bay Area Network (BARNET) was created to link universities, research institutions and companies in the greater Bay Area; BARNET extended as far as UC Davis near Sacramento. Bill Yundt played a major role in, and was an impetus to, the formation of BARNET, which was one of the first Wide Area Networks (WANs). There were no restrictions on who might connect to BARNET; it was a pay-as-you-go network. From 1985 onward, LANs and WANs popped up everywhere, with the NSFNET providing cross-country Internet connectivity. Networking was the rage. The Internet grew dramatically with NSFNET as the major motivating force. “NSF had spent approximately $30 million on NSFNET, complemented by in-kind and other investments by IBM and MCI. As a result, 1995 saw about 100,000 networks—both public and private—in operation around the country. On April 30 of that year, NSF decommissioned the NSF backbone. The efforts to privatize the backbone functions had been successful, announced Paul Young, then head of NSF's CISE Directorate, and the existing backbone was no longer necessary.” (Colwell 2000)
Before we move on, it is worth reflecting on just why networking was the rage. What was behind the rapid adoption of the latest technology? Clearly, one did not spend millions of dollars on a technology because it was something “cool” to do; those who financed the research and development ultimately required a return on investment (ROI). Networking thrived because networks stirred the imagination of visionaries, researchers, software designers, and systems analysts; of CEOs and CFOs; of thousands of students in universities; and above all, of the users of network applications. Thus, a job market was created to implement what would become the next industrial revolution. Networked applications streamlined business processes, opened the doors to new forms of interactive entertainment, and provided avenues for long-distance collaboration in real time. Expertise became location independent, and so, increasingly, did one’s office. The effective network virtual office was born in this decade because of the network infrastructure and the early network applications.
2.1.2 Early Network Applications
The first, foremost and most popular network application has always been email. It existed in the 1970s on the ARPANET and became standardized in the 1980s on the Internet with SMTP, POP1 and IMAP. The rapid information exchange that email provides has always made it a great tool for communication, whether for research, business, or personal use. It also shows that applications that promote person-to-person, end-to-end communication will always yield a decent return on investment. The IMAP email service of the late 1980s was a harbinger of how effective a protocol can be when it is specifically targeted at both the network infrastructure and the available computer resources. It had significant advantages over POP at that time, since IMAP permits a client to access the properties of messages independently of the messages themselves, and the parsing of messages was done entirely on the mail servers. This greatly simplified the writing of client UI code and made efficient use of network bandwidth. It is important to recall that the NSFNET was T1 based at this time, and that clients were very limited with respect to computational and storage resources. Also, many clients ran on computers that used serial lines to connect to the network. Bill Yeager, who along with Mark Crispin co-invented IMAP in 1984-85 and together with him wrote the first clients and servers, recalls demonstrating IMAP and macMM, a Macintosh II IMAP client, at the Apple Corporate building in Cupertino in 1989. He read and replied to his email on the Stanford LAN via BARNET, and was not surprised to see no humanly perceptible difference between reading email in Cupertino from the SUMEX-AIM IMAP server some fifteen miles away at Stanford and doing the same thing at his desktop at Stanford.
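IMAP’s ability to fetch the properties of messages independently of the messages themselves can be illustrated with the command lines a client sends. The sketch below is an assumption-laden illustration: it uses modern IMAP4rev1 syntax (RFC 3501) rather than the 1980s IMAP2 wire format, the `ImapCommandBuilder` class is hypothetical, and it performs no network I/O; it only builds the text of the commands.

```python
from itertools import count


class ImapCommandBuilder:
    """Hypothetical helper that formats tagged IMAP command lines.

    Uses modern IMAP4rev1 syntax (RFC 3501); it performs no network I/O.
    """

    def __init__(self, prefix: str = "A") -> None:
        self._prefix = prefix
        self._counter = count(1)  # each command gets a unique, increasing tag

    def _next_tag(self) -> str:
        return f"{self._prefix}{next(self._counter):03d}"

    def fetch(self, message_set: str, items: list[str]) -> str:
        # Ask the server for specific data items; the server parses the
        # stored messages and returns only what was requested.
        return f"{self._next_tag()} FETCH {message_set} ({' '.join(items)})\r\n"


builder = ImapCommandBuilder()
# Mailbox summary: flags, envelope (From, Subject, Date, ...) and size only;
# no message bodies need to cross a slow T1 or serial link.
summary_cmd = builder.fetch("1:20", ["FLAGS", "ENVELOPE", "RFC822.SIZE"])
# Only when the user opens message 7 is its text requested.
body_cmd = builder.fetch("7", ["BODY[TEXT]"])
print(summary_cmd, end="")
print(body_cmd, end="")
```

The design choice the text credits to IMAP is visible here: a client can draw a twenty-line mailbox summary from a few hundred bytes of envelope data, and it pays for a message body only when the user actually opens it.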
A great deal of this performance was implicit in the original design of the protocol, provided that the clients were well written, as macMM and Mail Manager on the Xerox Dolphin, Dorado and Dandelion Interlisp Lisp Machines (MMD) both were. While mail is not P2P, it gives the user a sense that it is: servers in the middle tend to be invisible servants for most of today’s email users. And, just as IMAP was born of the requirements of the 1980s, we see today’s network infrastructures and computer resources as ready for new email protocols. To this end, for a discussion of P2P Email see chapter 7. Similarly, one cannot discuss early network applications without mentioning telnet and ftp. Both can be viewed as early P2P applications for those users who had Unix workstations or Lisp machines as their desktops, since each system was both a telnet/ftp client and server. Users regularly ran these end-to-end applications on one another’s systems and exchanged data with ftp. This is discussed further in the next section.
It is also amusing that today’s users believe instant messaging and chat rooms are a phenomenon of the new millennium. Chat room applications were available on mainframes before we had networks. One’s buddy list was extracted from a “systat” command, which showed who was logged in at the time, and the chat rooms were supported on dumb terminals like Heathkit Z29s or Datamedias. Mike Achenbach of SUMEX-AIM wrote a chat application exactly like this that ran on TENEX mainframes in 1981. Much like Zoom on today’s Internet, the terminal screen was broken into rectangles, one for each member of the chat room. These chat rooms were small, but the functionality was the same. When networks arrived, we had PUP-based chat on Lisp Machines with graphics-based UIs. Finally, the Unix talk command was always networked and was used in conjunction with rwho for presence detection. The protocols have evolved over the years, but the ideas came from the 1980s. Nothing new has been invented in this arena since that time. Chat and talk were both P2P applications. LAN and Internet router software also used routing protocols to exchange routing information with one another in a P2P manner. Here we are thinking of the Routing Information Protocol (RIP) and interdomain routing based on the Border Gateway Protocol (BGP). Whether routing information was broadcast (RIP) or supplied over a reliable connection (BGP), the service was symmetric, with each such system behaving as both a client and a server. Another application that was born in the 1980s is the network bulletin board, which arrived in many forms. While some were simple digests with moderators, others were fully distributed client/server systems with interest lists to which a user could subscribe, and both post and read messages. Good examples of the former are the SF-Lovers digest and info-mac. The SF-Lovers digest consisted of ongoing email discussions whose daily threads were mailed out to subscribers as a moderated digest.
The subject was science fiction and fantasy, and SF-Lovers became extremely popular in the late 1970s after the 1977 release of the first Star Wars film, “A New Hope.” Info-mac was all you wanted to know about the Macintosh and was hosted by SUMEX-AIM for more than a decade. What was admirable about such digests was the dedication of the moderators. Keeping such a digest active was a full-time job, and those who moderated these digests did it voluntarily, out of a passion for the subject. The Network News Transfer Protocol (NNTP) was specified in rfc977 in 1986. “NNTP specifies a protocol for the distribution, inquiry, retrieval, and posting of news articles using a reliable stream (such as TCP) server-client model.” USENET servers on top of NNTP were P2P systems in the sense that they were clients of one another in order to update their news databases. The actual net news
clients could run on any system that had TCP/IP as the transport. The client-side protocol is simple and elegant, and the USENET client/server system provided a powerful mechanism for the massive exchange of opinions on almost any topic imaginable. One might pause and ask where the ROI is: “show me the money.” Even applications as simple as ftp on TCP/IP encouraged the creation of digital data repositories, and thus the rapid exchange of information. People caught on quickly in the 1980s, and soon other content query, storage, and exchange protocols were placed on top of TCP/IP. Among these were networked SQL and distributed database technology; digital libraries of medical information at NIH; remote medical analysis; networked print services from companies like IMAGEN; and laboratory-to-laboratory research, as exemplified by national resources like the Stanford University Medical Experimentation in AI and Medicine (SUMEX-AIM). All of these networked technologies led to huge cost savings and streamlined both research and business processes, thus yielding more profit and ROI. Finally, “During the late 1980s the first Internet Service Provider companies were formed. Companies like PSINet, UUNET, Netcom, and Portal were formed to provide service to the regional research networks and provide alternate network access (like UUCP-based email and Usenet News) to the public.” (History of the Internet 2022)
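The peer relationship between USENET servers, each acting as a client of its neighbors to keep its news database current, can be sketched as a flood with duplicate suppression. This is a minimal in-memory model under stated assumptions: the `NewsServer` class and the `link` and `flood` functions are hypothetical, no TCP is involved, and the `wants` check only mirrors the spirit of the IHAVE exchange defined in rfc977, in which a server offers an article and the peer declines it if it has already been seen.

```python
from collections import deque


class NewsServer:
    """Hypothetical USENET site: a server to its readers, a client of its peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.articles: dict[str, str] = {}  # message-ID -> article text
        self.peers: list["NewsServer"] = []

    def wants(self, message_id: str) -> bool:
        # The spirit of an IHAVE offer: accept only articles not yet seen.
        return message_id not in self.articles


def link(a: NewsServer, b: NewsServer) -> None:
    """Make two sites news-feed peers of one another (a symmetric relationship)."""
    a.peers.append(b)
    b.peers.append(a)


def flood(origin: NewsServer, message_id: str, text: str) -> int:
    """Post an article at origin and propagate it; return the number of transfers."""
    origin.articles[message_id] = text
    pending = deque([origin])
    transfers = 0
    while pending:
        site = pending.popleft()
        for peer in site.peers:
            if peer.wants(message_id):  # site now acts as a client of its peer
                peer.articles[message_id] = text
                transfers += 1
                pending.append(peer)
    return transfers


a, b, c, d = (NewsServer(n) for n in "abcd")
link(a, b)
link(b, c)
link(c, a)  # a, b and c form a cycle
link(c, d)  # d hangs off c
sent = flood(a, "<1@example.edu>", "First post")
resent = flood(b, "<1@example.edu>", "First post")
```

Because every site checks the message-ID before accepting, the article reaches all four sites in exactly three transfers despite the cycle, and re-posting the same article moves nothing.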
2.1.3 Workstations and Distributed File Systems
The 1980s also marked the birth of systems such as the personal Lisp machine, the Unix workstation desktop, the Macintosh, and the PC. These machines, for the first time, gave users their own systems for running applications and network services, and broke away from the “all of your eggs in one basket” approach, that is to say, a dependency on a serial-line tether to a time-shared mainframe to run applications and store data. Routers, for their part, as already discussed, inspired a “Let’s connect everything to everything” attitude. They provided the means to this interconnectivity, be it on a LAN, a WAN, or the Internet. An important feature of routers that is often overlooked is that they also form barriers that confine local subnet traffic to that subnet. Consequently, they permit a great deal of experimentation to take place within a LAN without disrupting the day-to-day business conducted among the many hosts connected to the LAN. Thus, the 1980s found users and researchers alike in the ideal network environment, where cohabitation was the accepted policy and routers effectively administered the policy. We were at this time clearly on the path towards both centralized
client/server and decentralized, distributed computational services. And, as we will see below, although it was not called P2P, the freedom this environment provided encouraged both distributed file sharing and computation. Since many of these systems (Unix desktops and Lisp Machines in particular) had client as well as server capabilities, telnetting or ftping between them was the norm. Mutual discovery was done with DNS: every host on the Internet could have a fixed IPv4 address, and it was easy to keep track of the unique hostnames of interest that were bound to those addresses. In this sense, symmetric ftp access between users’ systems is P2P in its generic form. And since each such system had a unique IP address, this was as easily done across the Internet as on one’s local subnet or LAN; the true end-to-end connectivity that existed at that time yielded P2P in its purest state. The early 1980s featured the rise of Unix servers. These servers ran the rdist software, which permitted them to share binary updates automatically and nightly; from the perspective of rdist they were peers. Similarly, Lisp machines such as Symbolics systems and Texas Instruments Explorers were extremely popular as research workstations, and they too behaved as both clients and servers, as peers, using their own file sharing applications as well as ftp. The Network File System (NFS) was introduced by Sun Microsystems in 1984 and specified in rfc1094 in 1989. It was quickly followed by the Andrew File System (AFS) from Project Andrew at Carnegie Mellon University. While NFS was restricted to the LAN, AFS was Internet wide. These file systems run on both clients and servers and permit users to view a distributed file system as a collection of files virtually on their own systems. The Unix “ls” command was location independent, and since drag-and-drop user interfaces did not yet exist, one accessed a file with the usual local command-line interfaces.
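Location independence of this kind rests on a client-side mapping from local path prefixes (mount points) to remote exports. The sketch below shows only that path mapping, under stated assumptions: the `Mount` and `resolve` names and the hosts `fileserver-a` and `fileserver-b` are hypothetical, and a real NFS client performs this lookup inside the kernel and then issues NFS remote procedure calls, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    server: str  # host exporting the file system (hypothetical name)
    export: str  # directory exported by that host


def resolve(path: str, mounts: dict[str, Mount]) -> tuple[str, str]:
    """Map a local path to (server, remote path) using the longest matching mount point."""
    best = ""
    for point in mounts:
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        if inside and len(point) > len(best):
            best = point
    if not best:
        return ("localhost", path)  # not under any mount point: a local file
    mount = mounts[best]
    rest = path[len(best):].lstrip("/")
    remote = mount.export.rstrip("/") + ("/" + rest if rest else "")
    return (mount.server, remote)


mounts = {
    "/usr/share": Mount("fileserver-a", "/export/share"),
    "/usr/share/music": Mount("fileserver-b", "/vol/music"),
}
print(resolve("/usr/share/music/bach.snd", mounts))  # served by fileserver-b
print(resolve("/usr/share/doc/readme", mounts))      # served by fileserver-a
print(resolve("/tmp/notes", mounts))                  # local
```

The user types the same path in every case; only the mount table knows which host actually holds the bytes, which is what made “ls” location independent.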
As long as the connectivity was there, any file for which the user had access privileges could be shared simultaneously, as if it were on the local system. This is again an example of P2P file sharing. A major difference between NFS and AFS file sharing and what has become known as file sharing in the current decade is that the latter is done by retrieving a copy and storing it locally, while the distributed file systems worked, and still work, perfectly well as virtual file systems. The file itself need not reside on the local system even if it appears to do so. Thus, a file can be read or written with simultaneous access, with appropriate locking mechanisms to prohibit simultaneous writes. One other difference is the nature of the content. During the 1980s shared content was for the most part either text or application binaries, and thus the impetus for massive file sharing did not exist as it does now. The user communities
in the 1980s were initially technical and research based, and evolved to include businesses towards the end of the decade. Still, it is easy to imagine what might have happened if digital music had been available for distribution during that epoch. We are quite sure that speakers would have appeared on workstations, and that distributed virtual file systems like NFS and AFS would have been one of the communication layers beneath the Napster equivalents of the 1980s. Sure, the audiences would have been smaller, but the technology was there to do what was required for LAN/WAN-wide distribution of digital content, and the Internet connected LANs and WANs. You get the picture. These distributed file systems would have been a natural fit for P2P read-only sharing of multimedia content. Recall that disk drives were limited in size, and that many of the early workstations were diskless: they booted off the network and ran programs using NFS. Still, peer nodes could have auto-mounted the file systems containing content of interest, and then searched, listed and viewed it as appropriate for the media type. The meta-data for each file could have been cached throughout the P2P Network on small servers behaving much like mediators, carrying with it the file system location where the file resided. The meta-data and content might have migrated with access, to be close to the users to whom they were most popular. Scatter-gather techniques are a variation on themes used in the 1980s both for interleaving memory and for storing files across multiple disk drive platters, so that several read heads could access them simultaneously to improve performance. Coming up with a similar scheme for distributing files more efficiently is, and was, an obvious next step: a file might in fact have existed as several chunks spread across the P2P network thus constructed. The demand would have motivated the innovation.
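The scatter-gather scheme imagined above can be sketched in a few lines: split a file into chunks, place them on different peers, and keep meta-data recording where each chunk lives so the file can be gathered back in order. This is a minimal sketch, not a reconstruction of any 1980s system; the `stripe` and `gather` functions and the peer names are hypothetical, placement is simple round-robin, and SHA-1 is used only as a convenient content identifier and integrity check.

```python
import hashlib


def stripe(data: bytes, peers: list[str], chunk_size: int) -> tuple[dict, dict]:
    """Scatter data round-robin across peers.

    Returns (meta, stores): meta is the file's meta-data, recording size, a
    checksum and the ordered (peer, chunk_id) locations; stores maps each
    peer to the chunks it holds.
    """
    stores: dict[str, dict[str, bytes]] = {p: {} for p in peers}
    locations = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        chunk_id = hashlib.sha1(chunk).hexdigest()  # content-derived identifier
        peer = peers[index % len(peers)]
        stores[peer][chunk_id] = chunk
        locations.append((peer, chunk_id))
    meta = {"size": len(data), "sha1": hashlib.sha1(data).hexdigest(), "chunks": locations}
    return meta, stores


def gather(meta: dict, stores: dict) -> bytes:
    """Fetch the chunks in order from the peers that hold them and verify the result."""
    data = b"".join(stores[peer][chunk_id] for peer, chunk_id in meta["chunks"])
    if len(data) != meta["size"] or hashlib.sha1(data).hexdigest() != meta["sha1"]:
        raise ValueError("reassembled file failed its integrity check")
    return data


song = bytes(range(256)) * 4  # 1,024 bytes standing in for a media file
meta, stores = stripe(song, ["peer-a", "peer-b", "peer-c"], chunk_size=100)
restored = gather(meta, stores)
```

The `meta` dictionary plays the role of the cached meta-data described in the text: it is small enough to replicate widely, and it carries the location of every piece of the file.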
Finally, since the content never needed to be stored on the system that was accessing it, digital rights management software could, if necessary, have been integrated as part of the authentication for access privileges. Thus, P2P content sharing existed in a seminal, pure form in the 1980s. The engineering innovations in place today that give us global content sharing on P2P networks are really a tuning and reworking of old ideas, accompanied by the expansion of the Internet, the performance-enhancing corollaries of Moore’s law, and drastically increased local disk storage. The authors sincerely believe that careful research for prior art would uncover sufficient examples from the 1980s to invalidate huge numbers of current software patents. Just as distributed file systems were developed in the 1980s, so were distributed operating systems. The latter bear a very strong resemblance to
P2P systems. In this spirit we next describe the V-System that was developed at Stanford University.
2.1.4 The V-System
One thing that can be said about the 1980s is that every effort was made to support heterogeneous computing environments. We’ve already mentioned the multiple network protocols that were present. Along with these protocols one also found a large assortment of computers. At Stanford University, for example, research was done on systems such as Sun workstations, VAXes, DEC-20s and Lisp machines3. These systems also supported student users. Appropriately enough, one would expect distributed systems research to look at ways to use these machines in a cooperative fashion, and this is exactly what the V-System did under the guidance of computer science assistant professors David Cheriton and Keith Lantz. It brought together the computing power of the above collection of machines in a way that was P2P. The major goal of the V-System was to distribute processing and resources, and to do so with protocols and APIs that were system independent. Why discuss the V-System in detail? As you will see, the organization of the V-System, its model, and the approach that was taken towards development were carefully thought out and implemented to satisfy the needs of the user. The system components were separated by APIs and protocols that were machine independent. The developers had user satisfaction and performance as primary goals rather than afterthoughts, and they submitted the V-System’s network protocols to the IETF. The software development practices followed by the graduate students were way ahead of their time: all protocols and APIs were carefully documented, and rules for writing consistent C code were strictly followed. And, last but not least, the V-System exhibited many features of P2P systems. The V-System begins with its user model. Each user had a workstation, and state-of-the-art user interface support was a first principle.
“The workstation should function as the front end to all available resources, whether local to the workstation or remote.” To do so the V-System adheres to three fundamental principles:
3
William Yeager wrote an Interlisp version of the VGTS messaging system in 1984. The VGTS is the V-System’s UI component and is explained in this section. The Interlisp VGTS was used to demonstrate the power of remote virtual graphics by permitting Xerox PARC Dolphins to communicate with a Sun workstation client running the V-System. The graphics were generated on the Interlisp Dolphin servers and displayed on the client.
1. The interface to the application programs is independent of physical devices or intervening networks.
2. The user is allowed to perform multiple tasks simultaneously.
3. Response to user interaction is fast (Cheriton 1988, 314-333).
It is refreshing to see the user placed first and foremost in a research project. All processing was parallel, and a “reasonably sophisticated” window system was employed. Applications ran either locally or remotely and were associated with one or more virtual terminals when user interaction was required. “The V-System adheres to a server model (Cheriton 1988, 314-333).” In the V-System, resources are managed by servers and accessed by clients. One can view a server as an API that hides the resource it represents; it is by means of the API that the resource can be accessed and manipulated. The APIs are well defined across servers, thus yielding consistent access. In a sense, the V-System has many properties of today’s application servers, with perhaps the following exception: a server can act as a client when it accesses the resources managed by another server. “Thus, client and server are merely roles played by a process (Cheriton 1988, 314-333).” And here we see the P2P aspect of the V-System. It is easy to imagine the collection of workstations running the V-System all sharing resources in a symmetric way. The resources can be CPU cycles or content or both. This is again pure P2P. Let’s look a little more closely to see what else can be revealed. The system is a collection of clients and servers that can be distributed throughout the Internet and that can access and manipulate resources by means of servers. Access to a resource is identical whether the resource is local or remote, since the same APIs and protocols are used for this access. This access is said to be “network transparent.” There is a principle of P2P that resources will tend to migrate closer to the Peer Nodes that show interest in them.
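The idea that “client and server are merely roles played by a process” can be made concrete with a small sketch in which a print server answers its own clients while itself acting as a client of a storage server. This is loosely modeled on the V-System’s synchronous request/reply messaging and is not its actual API: the `Server`, `Registry`, `StorageServer` and `PrintServer` names are hypothetical, everything runs in one Python process, and the Inter-kernel Protocol that would carry a request to another host is not modeled.

```python
class Server:
    """A process playing the server role for one resource."""

    def handle(self, request: tuple) -> object:
        raise NotImplementedError


class Registry:
    """Hypothetical name service mapping a resource name to its server."""

    def __init__(self) -> None:
        self._servers: dict[str, Server] = {}

    def register(self, name: str, server: Server) -> None:
        self._servers[name] = server

    def send(self, name: str, request: tuple) -> object:
        # Request/reply: send() returns only when the server has replied.
        # Callers use the same call whether the server is local or remote;
        # here every server lives in one process, whereas a networked
        # version would forward the message to another host.
        return self._servers[name].handle(request)


class StorageServer(Server):
    def __init__(self) -> None:
        self.files = {"motd": "Welcome to V"}

    def handle(self, request: tuple) -> object:
        op, key = request
        return self.files.get(key) if op == "read" else None


class PrintServer(Server):
    """Server to its own clients, and a client of the storage server."""

    def __init__(self, registry: Registry) -> None:
        self.registry = registry

    def handle(self, request: tuple) -> object:
        op, key = request
        if op != "print":
            return None
        text = self.registry.send("storage", ("read", key))  # now playing the client role
        return f"printed: {text}"


registry = Registry()
registry.register("storage", StorageServer())
registry.register("printer", PrintServer(registry))
reply = registry.send("printer", ("print", "motd"))
```

Nothing in `PrintServer` distinguishes “being a server” from “being a client”; it simply sends and answers messages, which is the symmetry the text identifies as the P2P aspect of the V-System.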
The V-System has a similar feature: V-System clients may influence or determine the location of a resource. To support the server processes, the V-System has a distributed kernel, which is the collection of V-Kernels that run on each machine or host in the distributed system. “Each host kernel provides process management, interprocess communication, and low-level device management facilities.” Furthermore, there is an Inter-kernel Protocol (IKP) that permits transparent interprocess communication between processes running in V-Kernels. Let’s take a quick look at a few of the typical V-Servers:
1. Virtual Graphics Terminal Server: Handles all terminal management functions. There is one per workstation. An application may manipulate multiple virtual terminals, and the Virtual Graphics Terminal Protocol (VGTP) is used for this purpose. The VGTP is an object-oriented protocol in which graphic objects can be recursively defined by other graphic objects; thus the VGTS supports structured display files, which are highly efficient with respect to both the frequency of communication and the amount of data communicated.
2. Internet Server: Provides network and transport level support.
3. Pipe Server: Standard asynchronous, buffered communication.
4. Team Server: Where a team is a collection of processes on a host, the team server provides team management. Since applications can migrate between hosts, this migration and remote execution are managed by the team server.
5. Exception Server: Catches process exceptions and manages them appropriately.
6. Storage Server: Manages file storage.
7. Device Server: Interfaces to standard physical devices like terminals, mice, serial lines, and disks.
It is therefore simple to visualize a typical workstation running the V-System, with users running applications that communicate with processes, which form teams all managed by the distributed V-Kernel’s servers. The symmetry of client/server roles is clear, and symmetry is at the heart of P2P. Now, suppose that the distributed V-Kernel is active across a LAN on multiple hosts, and that there are team processes on several of the hosts that have a common goal, or at least a need to share resources. What is the most efficient way for this communication to take place? First, we need to organize the teams. In the V-System the teams are organized into host groups. A host group is a collection of servers on one or more hosts, and there can certainly be many host groups active at the same time in the V-System. They resemble both our connected communities and JXTA peer groups.
In fact, a host group can be implemented as a connected community. Again, the computer science roots of P2P reach back at least to the 1980s. To communicate efficiently with distributed host groups, the V-System used multicast, which was first described in rfc966 and ultimately obsoleted by rfc1112. The authors of rfc966 are David R. Cheriton and Steve Deering; Steve was a computer science graduate student in David’s distributed systems group. The author of rfc1112, entitled “Host Extensions for IP Multicasting,” is Steve Deering. Rfc1112 is an Internet
standard. What follows is an excerpt from rfc1112:
IP multicasting is the transmission of an IP datagram to a “host group”, a set of zero or more hosts identified by a single IP destination address. A multicast datagram is delivered to all members of its destination host group with the same “best-efforts” reliability as regular unicast IP datagrams. The datagram is not guaranteed to arrive intact at all members of the destination group or in the same order relative to other datagrams. The membership of a host group is dynamic; that is, hosts may join and leave groups at any time. There is no restriction on the location or number of members in a host group. A host may be a member of more than one group at a time. A host need not be a member of a group to send datagrams to it. A host group may be permanent or transient.
Indeed, host groups are the forerunners of connected communities. To accommodate host groups, IPv6 has dedicated group multicast addresses. It would have been quite simple to implement P2P chat rooms in the V-System given the VGTS, and the implementation would have been quite efficient using IP Multicasting as it is implemented: IP Multicast datagrams are directed to the subnets on which the host group members reside, and on each such subnet a single IP datagram is multicast to all of the host group members, yielding a huge savings in bandwidth. Content sharing would also be straightforward with the storage server and the VGTS. The V-System could also have been used for grid computing, with host groups partitioning the grid for targeted, host-group-based calculations. Finally, we are sure that the V-System is not the only example from the 1980s of a distributed system that is very P2P-like in its behavior. P2P is really a long-latent capability that a couple of decades of progress have brought to the mainstream.
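The bandwidth argument for host groups can be checked with simple counting. The toy model below follows the rfc1112 membership rules quoted above (membership is dynamic, and a sender need not be a member) and compares one datagram per member against one datagram per subnet that has members. It is a sketch under stated assumptions: `HostGroups` and its methods are hypothetical, no packets are sent, router forwarding between subnets is ignored, and the group address and subnet labels are illustrative only.

```python
from collections import defaultdict


class HostGroups:
    """Toy model of rfc1112 host-group membership; no packets are sent."""

    def __init__(self) -> None:
        # group address -> set of (subnet, host) members
        self.members: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def join(self, group: str, subnet: str, host: str) -> None:
        self.members[group].add((subnet, host))  # membership is dynamic

    def leave(self, group: str, subnet: str, host: str) -> None:
        self.members[group].discard((subnet, host))

    def deliver(self, group: str) -> list[tuple[str, str]]:
        # The sender need not be a member; every current member receives a copy.
        return sorted(self.members[group])

    def unicast_datagrams(self, group: str) -> int:
        # Addressing each member separately costs one datagram per member.
        return len(self.members[group])

    def multicast_datagrams(self, group: str) -> int:
        # Multicast costs one datagram on each subnet with at least one member.
        return len({subnet for subnet, _ in self.members[group]})


groups = HostGroups()
chat = "224.1.2.3"  # hypothetical class D (multicast) group address
for subnet, count in [("36.8", 4), ("36.9", 4), ("36.40", 2)]:
    for n in range(count):
        groups.join(chat, subnet, f"host{n}")
before = (groups.unicast_datagrams(chat), groups.multicast_datagrams(chat))
groups.leave(chat, "36.40", "host0")
groups.leave(chat, "36.40", "host1")
after = (groups.unicast_datagrams(chat), groups.multicast_datagrams(chat))
```

With ten chat-room members spread over three subnets, one message costs ten unicast datagrams but only three multicast datagrams; the gap widens as more members share a subnet, which is the savings the text attributes to a multicast chat room.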
We next look at the 1990s, a decade in which the ideas of the 1980s matured, with a great deal of help from excellent hardware engineering that took advantage of Moore’s Law.
2.2 The 1990s - The Decade of the Information Highway
Recall from section 2.1.1 that the Internet had been so successful, having grown to about 100,000 networks in the United States, that on April 30, 1995, NSF abandoned the NSFNET backbone in favor of a fully privatized backbone. During the same period the Internet⁴ was on a global growth path. While universities, research laboratories, governments, and companies were discovering a better, more streamlined way of doing business using the Internet, it is clear that the invention of the World Wide Web by Tim Berners-Lee in 1991 was the real force behind bringing the Internet from where it was then to where it is now, in 2004. Tim Berners-Lee writes, “Given the go-ahead to experiment by my boss, Mike Sendall, I wrote in 1990 a program called ‘World Wide Web’, a point and click hypertext editor which ran on the ‘NeXT’ machine. This, together with the first Web server, was released to the High Energy Physics community at first, and to the hypertext and NeXT communities in the summer of 1991. Also available was a ‘line mode’ browser by student Nicola Pellow, which could be run on almost any computer. The specifications of UDIs (now URIs), HyperText Markup Language (HTML) and HyperText Transfer Protocol (HTTP) published on the first server in order to promote wide adoption and discussion.” (Berners-Lee 1998) The first web server, info.cern.ch, was put online in 1991, and access grew by an order of magnitude each year up until 1994. By 1994 interest in the web was so great in both business and academia that Tim decided to form the World Wide Web Consortium (W3C). At the same time, a series of RFCs specified the protocols and definitions in the IETF:
1. RFC 1630 Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web. T. Berners-Lee. June 1994.
2. RFC 1738 Uniform Resource Locators (URL). T. Berners-Lee, L. Masinter, M. McCahill. December 1994.
3. RFC 1866 Hypertext Markup Language - 2.0. T. Berners-Lee, D. Connolly. November 1995.
4. RFC 1945 Hypertext Transfer Protocol -- HTTP/1.0. T. Berners-Lee, R. Fielding, H. Frystyk. May 1996.
The 1990s also launched the commercial use of the Internet. There was resistance from academics to the commercialization: “Many university users were outraged at the idea of non-educational use of their networks. Ironically it was the commercial Internet service providers who brought prices low enough that junior colleges and other schools could afford to participate in the new arenas of education and research.” (History of the Internet 2022) With the end of commercial restrictions in 1994, the Internet experienced unprecedented growth. ISPs flourished and began to offer both web access and email service. Fiber optic cables were pulled to connect major industrial areas globally, satellite service was added, and the web went mobile and wireless with the introduction of the Wireless Application Protocol (WAP) in the late 1990s, bringing web access to mobile phones and PDAs. As we are all aware, by the end of the decade web sites flourished, and the .COM era arrived. Perhaps the best measure of this growth is the number of web pages that are indexed: “The first search engine, Lycos, was created in 1993 as a university project. At the end of 1993, Lycos indexed a total of 800,000 web pages.” (History of the Internet 2022) Google currently indexes around 30 trillion web pages. In nearly 20 years the increase is 37,500,000-fold!
With respect to standards, the IETF was extremely active during the 1990s: 1,679 RFCs were published, compared with 380 in the previous decade. The intellectual contribution to the IETF was escalating, and the IETF became an extremely formal standards body during the 1990s. Processes were put in place to create application areas, working groups, and an overall governing body for the IETF. The WAP 1.0 specification was released in 1999, giving a standards foundation for proxied Internet access by mobile phones. As is discussed in chapter 6, Java standards for these same devices were put into place with MIDP 1.0 in the Fall of 1999. These standards, along with the increase in Internet bandwidth, brought to the users of the Internet the protocols to access a diversity of content types on a variety of devices. In the 1980s we had, for the most part, text-based content. The innovations of the 1990s provided multimedia content to the masses: text, images, sound, and video. And the masses loved it!
Finally, we have the issue of security, and in particular ciphers and public key algorithms. With respect to the former, the patent for the DES cipher expired in 1993. Because 56-bit DES can be cracked by a brute-force attack, which makes it obsolete, Triple DES (3DES) was introduced to make up for this shortcoming. Also, Bruce Schneier introduced the Blowfish cipher in 1993 as a public domain alternative to DES. With respect to the latter, the Diffie-Hellman patent expired in 1997, and on September 6, 2000, RSA Security made the RSA algorithm publicly available and waived its rights to enforce the RSA patent. Thus, by the end of the 1990s developers were able to secure their software without concern for royalties, which gave a large boost to e-commerce on the Internet.
As we reflect upon the last few paragraphs, the one salient thing, beyond the global connectivity provided by the Internet and beyond the hardware, that motivates this growth is people’s drive to communicate and to be entertained. Behind it all is a hunger for social interaction and for access to information for education and entertainment. People will pay for the applications that fulfill these needs. And here the stage is set for P2P to enter the scene and play a major role. The Information Highway is in place and ready to handle the traffic!
4
The term “Internet” as we use it includes both the private and public networks. Purists may find this objectionable, but in hindsight that is what the Internet became in the mid-1990s.
2.3 The New Millennium
P2P exploded into the public’s eye in 2000 with the flurry of lawsuits against Napster for contributing to the infringement of copyright by its users, or peers. By that time billions of MP3 music files had been exchanged by users of the Napster network and clients. The network was a collection of servers that indexed music found on the systems of users who ran the Napster client. The Napster client used the servers to find the indexed music and the peers on which it resided, and the network provided the mechanisms necessary to permit the music to be shared between peers. Napster was created by Shawn Fanning in May of 1999 as a content sharing application. It was not P2P in the purest sense: at its peak there were 160 Napster servers at the heart of the Napster network. The lawsuits had the ironic effect of popularizing Napster. In March of 2001 a ruling by a U.S. District Court of Appeals upheld an injunction against Napster, requiring it to block copyrighted songs. In June of 1999 Ian Clarke brought us Freenet. Freenet, unlike Napster, is a pure P2P system. Ian is very interested in personal privacy and freedom of speech. In an article on “The Philosophy of Freenet”, Ian states, “Freenet is free software which lets you publish and obtain information on the Internet without fear of censorship. To achieve this freedom, the network is entirely decentralized, and publishers and consumers of information are anonymous. Without anonymity there can never be true freedom of speech, and without decentralization the network will be vulnerable to attack.” (Clarke, 1999)
Communications between Freenet nodes are encrypted and are “routed through” other nodes, making it extremely difficult to determine who is requesting the information and what its content is. Ian’s next P2P system is Locutus. Locutus emphasizes security, runs on .NET, and is targeted at the enterprise. A third generic P2P system of this decade is Gene Kan’s Gnutella. Gnutella is an elegant protocol for distributed search with five commands. It, too, is pure P2P: each peer plays the role of both a client and a server. A brief description is the following: “Gnutella 2 is a protocol for distributed search. Although the Gnutella protocol supports a traditional client/centralized server search paradigm, Gnutella’s distinction is its peer-to-peer, decentralized model. In this model, every client is a server, and vice versa. These so-called Gnutella servents perform tasks normally associated with both clients and servers. They provide client-side interfaces through which users can issue queries and view search results, while at the same time they also accept queries from other servents, check for matches against their local data set, and respond with applicable results. Due to its distributed nature, a network of servents that implements the Gnutella protocol is highly fault-tolerant, as operation of the network will not be interrupted if a subset of servents goes offline.” Gnutella has undergone a huge amount of analysis since it was launched. It had weaknesses, and these weaknesses were part of its strength: they encouraged excellent research in P2P and, as a consequence, improved the algorithm. The P2P world is grateful to Gene for his vision of P2P and his energy as an evangelist of P2P technology.
Finally, we close our discussion of the history by noting the launch of Sun Microsystems’ Project JXTA (for “juxtapose”) on April 25, 2001. JXTA was open source and had been under continuous development for several years. Its specifications define a P2P infrastructure that includes both peer nodes and super-peers called rendezvous. The JXTA infrastructure is fully realized with a Java implementation. From the beginning, one of the goals of JXTA has been to create a P2P standard. Currently, P2P networks are not interoperable: their differing protocols create P2P networks that are isolated islands. As a direct consequence of this desire by members of the JXTA community to standardize P2P, William Yeager, who was JXTA’s CTO, worked with the Internet Architecture Board (IAB) to create an Internet Research Task Force Research Group on P2P in 2003. JXTA was used worldwide by P2P enthusiasts for creating P2P applications (Project JXTA 2011).
We have made a conscious choice in writing this book not to be encyclopedic, and thus not to list the remaining P2P applications and networks that now exist. No such list will ever be current until a consensus is reached on a P2P standard for the Internet. What we have now are cul-de-sac protocols that cannot possibly do justice to the possibilities of P2P imagined by its visionaries. Understandably, these dead-end alleys are driven by the desire to capitalize, to profit from P2P. This is not bad in itself, since capital is necessary to support research and development, but we really want to see the history of P2P come to the point where agreement on a standard is reached. We hope that this brief history of P2P has given the reader an idea of its roots, some of which do not appear to be P2P on the surface. So much of technical history is filled with side-effects. One cannot always guess what a new idea will bring along with it. The original Internet of the 1980s had very few worries about security until the end of the decade, when it became global and, to everyone’s surprise, hackers arrived to cause serious problems. The Internet’s age of innocence was short-lived. Still, the energy and creativity of those who continued to build out this amazing infrastructure could not be stopped. Security problems are an impediment that this creative energy continues to overwhelm. Most of the history of P2P is in front of us. Let’s get to work to realize its possibilities.
CHAPTER 3 COMPONENTS OF THE P2P MODEL
From thirty thousand feet a P2P overlay network¹ appears as a collection of peer-nodes that manage to communicate with one another. At this altitude it is sufficient to discuss content sharing, its pros and cons, and how it will create a new Internet digital economy, as was done in Chapter 1. In order to come down out of the clouds and discuss the ground-level engineering concepts, it is necessary to define the real engineering parts of these peer-nodes, the components that comprise the P2P overlay network, as well as the protocols used for peer-node to peer-node communication. This is not unlike assembling a detailed plastic model of a classic automobile or a futuristic spacecraft. Each part is important, and the rules for assembling them correctly, i.e., the blueprints, are indispensable to the process. To this end, we begin this chapter with a discussion of the P2P document language, which is a universal, descriptive meta-component (a component for describing components). Just as the final blueprint of a home is always a collection of blueprints, one describing the plumbing, several for the multiple views, others for each room, the exterior walls, etc., our final peer-node document will also be a collection of documents. Thus, as we define our P2P model component by component in this chapter, starting with those that are fundamental and using these as building blocks for more complex components, we will be defining a set of 4PL types and combinations thereof. These types will then be the grammar of our document language and permit us to create the multiple blueprints that will be the engineer’s guide to constructing a peer-node. To help the reader build a conceptual understanding of the final, assembled model, each section explains the motivations and behaviors of the components it defines. It is from these explanations and 4PL that we derive the semantics of the document language.
1
We may sometimes use “P2P network” in lieu of “P2P overlay network”.
3.1 The P2P Document Space
It is important to use a document language that is globally accepted on the Internet, portable across multiple programming languages, and easy to parse. It must also permit software engineers, architects, and analysts to use their own namespaces. Our choices were between JSON and XML. We selected the latter because it satisfies all the above constraints.
3.1.1 XML as a Document Language
In any society, to establish communication between people, either everyone needs to speak a common language or each person must be able to translate their language into one that both parties understand. Peer-node to peer-node communication on a P2P overlay network is no exception to this rule. For example, to transfer content or to initialize connections, one peer-node sends messages to a target peer-node, and the target peer-node sends responses. The “language” in which the messages are expressed must be understood by all peer-nodes participating in the communication. In the networking world, such a “language” does not mean the same thing as a programming language in the computing world. Instead, it defines the structure (or format, or syntax) in which messages are written, and, unlike a programming language, this structure is independent of the message’s semantics, or meaning. This structure should allow messages not only to say “hello”, but also to permit peer-nodes to negotiate a secure communication channel and to transfer multi-dimensional data along that channel. The semantics of such a negotiation or data transfer are defined by the associated protocols’ behavior and not by the underlying document language. A required feature of the document language is that it permit the creation of structured documents with flexible descriptors, or names. It is difficult to describe almost arbitrary peer-node components in a document language whose descriptor set, or namespace, is fixed. The language of choice must also be an accepted standard in the Internet community, and off-the-shelf, open-source parsers must be available. It should also be simple to write parsers for minimal, application-defined namespaces so that the language can be used across the device space. Extensible Markup Language (XML) (W3C n.d.) naturally meets the above requirements, is widely deployed on the Internet, is a World Wide Web Consortium (W3C) standard, and, for us, is an ideal markup language for creating structured documents that describe the various engineering components whose properties we must communicate amongst the peer-nodes on a P2P overlay network. Because XML’s format is structured, XML parsers are easy to implement. Also, unlike HTML, XML does not fix its set of tags, and therefore elements can be defined based on any application’s needs. Each application can have its own XML namespace that defines both the tags it uses and the text that appears between these tags. Because of these properties, HTML can be expressed in XML as XHTML (W3C n.d.). Given this freedom to choose tags suitable for specific applications or devices, XHTML Basic (W3C n.d.), a subset of XHTML, is used for mobile-phone-like devices. There are XHTML Basic browsers with a footprint of about 60K bytes, which shows the programmability and power of XML; the parsers are a small percentage of this overall browser footprint. In other words, XML is a versatile markup language that is able to represent the nature of both computer and human behavior. For example, one can imagine a small “talk” application expressed in XML. Here, at the highest level, John talks to Sam:
talk.jar John Sam Hi Sam!
John’s local system analyzes the document, starts the talk program, connects to Sam’s system, and sends the document. Upon receipt, Sam will see something like: Message from John: Hi Sam! The structure is in the document; the behavior is in the talk program. This is admittedly an oversimplification, but it does express the power of XML. In this case, for example, the talk application might need only four or five tags, and XML is used to describe several programmatic actions:
1. The “run” tag is interpreted by an application to mean to run a program named talk.jar. Here, the .jar extension implicitly invokes Java to accomplish this task.
2. The “from” and “to” tags implicitly describe the local and remote ends of a communication and are explicit parameters for talk.jar.
3. The “text” tag is also a parameter for talk.jar, and the program’s code sends this as a text message to Sam.
4. Notice that the data is included in the scope of ... .
Again, it is important to emphasize that the meanings of the tags are not implied by the XML document. Our brains have associations with respect to the tag names, and the tags are named in this manner because humans have come to understand “run program.” We could just as well have used , , , , and as tags:
talk.jar John Sam Hi Sam!
The talk program’s code doesn’t care and can be written to interpret any text string to mean “send the message to the name bound to the tag pair defined by this text string.” After all, this is just a string match. Now, let’s generalize the above example to explain what is meant by meta-data, or data about data.
tcp://John tcp://Sam
talk.jar java 13.0 27691
read-only
file:///home/John/friends-only
hello txt
image/gif John.gif
video/jpeg Hawaii.jpeg
In the above example, “xmlns:dc” identifies the namespace with a Uniform Resource Identifier (URI) (Berners-Lee 1998). This URI need only be unique and persistent; it is not intended to reference a document. The example contains several kinds of meta-data: the application’s version, size, and type, the access control fields, and the attachments’ content types. Because meta-data is such a powerful tool, many efforts have been made to standardize its format, such as those of the open forum Dublin Core Metadata Initiative (DCMI), the W3C standardization community, and the knowledge representation community. Out of the W3C efforts we have the Resource Description Framework (RDF) (W3C, 2014). The goal of RDF is not only to specify what kinds of tags are needed, but also to enable the creation of relationships between these tags; i.e., RDF is explicitly about semantics, and it uses XML syntax to specify the semantics of structured documents. For example, here’s a relationship: Holycat is the creator of the resource www.holycat.com. RDF indicates this relationship by mapping it to the proper tags: Holycat as the creator in an RDF “Description about” www.holycat.com. The relevant part of the metadata is below:
HolyCat.com
holycat
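The following illustrative RDF description, a sketch rather than the book’s own listing, expresses the same relationship and is read with Python’s standard xml.etree.ElementTree. The rdf and dc namespace URIs are the standard RDF and Dublin Core element-set URIs; the document text and the helper name creators_by_resource are assumptions for illustration. The parser matches elements by namespace URI, not by prefix, which is exactly the sense in which the URI serves only as a unique, persistent name.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set (1.1).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative document: "holycat is the creator of www.holycat.com".
RDF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.holycat.com">
    <dc:title>HolyCat.com</dc:title>
    <dc:creator>holycat</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def creators_by_resource(xml_text: str) -> dict:
    """Map each rdf:about resource URI to its dc:creator value."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as {namespace-uri}local-name.
    about_attr = "{%s}about" % NS["rdf"]
    result = {}
    for desc in root.findall("rdf:Description", NS):
        result[desc.get(about_attr)] = desc.findtext("dc:creator", namespaces=NS)
    return result
```

Here creators_by_resource(RDF_DOC) returns {'http://www.holycat.com': 'holycat'}, and it returns the same mapping if the document binds the Dublin Core URI to some other prefix, such as meta:, instead of dc:.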
In the following sections and chapters, we require a markup language for the structured documents that describe the P2P components of our overlay network, and we are selecting XML for this purpose. As mentioned above, it is a W3C standard with wide and growing support in the high-tech industry. It permits us to create our own namespace to clearly express the concepts of each component. Also, engineers who wish to implement a system modeled on these components will have the tools available to parse the documents (SourceForge, 2013), or they can write their own application- and namespace-specific XML parsers, as has been done for many small devices and existing P2P systems.
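As an illustration of how small an application-specific parser can be, the following Python sketch reads a talk document like the one earlier in this section and produces the message Sam sees. It uses the standard-library xml.etree.ElementTree; the root element name talk, the arbitrary alternative tag names, and the role-to-tag mapping are assumptions, since the text names only the run, from, to, and text tags. Passing a different mapping shows that the program, not XML, gives the tags their meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical default mapping from roles to tag names, following the talk example.
DEFAULT_TAGS = {"run": "run", "from": "from", "to": "to", "text": "text"}


def parse_talk(xml_text: str, tags: dict = DEFAULT_TAGS) -> dict:
    """Extract the talk parameters; each tag's meaning is just a string match."""
    root = ET.fromstring(xml_text)
    fields = {}
    for role, tag in tags.items():
        value = root.findtext(tag)  # looks only at direct children of the root
        if value is None:
            raise ValueError(f"missing <{tag}> element for role {role!r}")
        fields[role] = value.strip()
    return fields


def render_message(fields: dict) -> str:
    """What the receiving peer's talk program would display."""
    return f"Message from {fields['from']}: {fields['text']}"


doc = "<talk><run>talk.jar</run><from>John</from><to>Sam</to><text>Hi Sam!</text></talk>"

# The same document with arbitrary tag names works once the mapping says so.
renamed = "<x0><x1>talk.jar</x1><x2>John</x2><x3>Sam</x3><x4>Hi Sam!</x4></x0>"
renamed_tags = {"run": "x1", "from": "x2", "to": "x3", "text": "x4"}
```

Both render_message(parse_talk(doc)) and render_message(parse_talk(renamed, renamed_tags)) yield "Message from John: Hi Sam!".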
3.1.2 Publish and Subscribe: Administrated versus Ad-hoc
Our P2P document language, XML, will provide structured information describing our components, their network behavior, the content exchanged between peer-nodes, cryptographic data, and much more. When this information is published, either locally or remotely, it is crucial to administer this large document space efficiently, and this administration may or may not be centralized. Recall the “P2P Spectrum” introduced in Chapter 1. A P2P network may be configured as hybrid or pure ad-hoc, and each point in the spectrum needs its own methods to distribute, publish, and subscribe to these documents, as well as policies that control both publication and subscription. Inside a hybrid P2P network, any document can be stored and managed on centralized nodes. The more centralized the network becomes, the more server-like these peer-nodes become, and, consequently, the more control the P2P network imposes upon the peer-nodes’ behavior. For example, the initial copies of the most popular music, digital certificates, the status of critical peer-nodes, and the registration of peer-node names can be published on content, enterprise key escrow, system status, and naming servers, respectively, and access is controlled by means such as passwords and firewalls. On the other hand, in a pure, ad-hoc P2P network, the information is not required to be centrally stored, and the administrative policies that control its access are set between the peer-nodes themselves. In fact, in this latter case, all peer-nodes may have unrestricted read access to all documents.
3.1.2.1 Administrated Registries
Well-administered registries and servers were already in use on the Internet in 1982 (Harrenstien, White, and Feinler, 1982). In 1983 the concept of domain names was introduced (Mockapetris, 1983). This was accompanied by the full specification, as well as by specifications of domain name servers and resolvers (Mockapetris, 1983). The domain name concepts, facilities, and their use for Internet mail were defined (Mockapetris, 1983), as was an implementation schedule (Postel, 1984). Soon afterwards, the Domain Name System (DNS) was in place and was used exclusively for domain name to IP address lookups, or vice-versa. With time, the DNS implementation has become a general, administered Internet database. Thus, in principle, we can manage XML documents in such a fashion. While it is possible to store an entire XML document in DNS servers, this is impractical given the billions of possible peers and their associated documents. On the other hand, several fields in these documents must be unique, and in some cases their creation must be controlled by those who administer the P2P network in question. The probability of collision for some fields will be effectively zero if the right algorithms are used to generate them, for example, secure random byte strings of sufficient length. Other such fields will be text strings, for example, a peer’s name, and as such have a high probability of collision. These will certainly be administered and controlled in enterprise deployments of P2P systems, and DNS may be appropriate for these names. The problem with DNS is that this service is already overloaded, and adding even millions of additional entries is not a good idea. The Lightweight Directory Access Protocol (LDAP) (Yeung, Howes and Kille, 1995) provides naming and directory services to read and write an X.500 Directory in a very efficient way.
Its operations can be grouped into three basic categories: binding/unbinding, which starts/terminates a protocol session between a client and server; reading the directory, including searching and comparing directory entries; and writing to the directory, including modifying, adding, and deleting entries. Such a simple protocol can be used to store and access our P2P documents. LDAP has the advantage over DNS that the administration and scope of its directories are more flexible. A small company, or even a neighborhood in a community, can decide to put in place and administer its own LDAP directory. For a highly centralized P2P network it is appropriate to store entire XML documents, or selected fields from these documents, in LDAP directories. Tagged fields can have URIs referencing LDAP directory entries. In certain cases, it will be necessary to authenticate peers so that they can access, for example, private data, and the peer’s credentials can be validated through an LDAP search. Here is a hypothetical example of using LDAP to access XML fields from an LDAP directory. Assume that an LDAP server, P2PLDAPServer, exists with the hostname P2PCommerce.com. Furthermore, assume that the organization P2PCommerce.com supports shopping in department stores. Now, a client will make a query to search for Anne-Sophie Boureau, who works for a department store in Paris, France. The “Grands Magasins” in Paris are well organized and have XML documents for each employee.
médicins.com
médicins
Anne-Sophie médicins.com
Anne-Sophie Boureau
Anne-Sophie
Boureau
Anne-Sophie.Boureau@médicins.com
So, in the following code the client creates a query and then makes a connection to the Nth server. After the connection succeeds, the client can perform the search and save the result:
LDAP_NAME[N] = "P2PLDAPServer";
LDAP_SERVER[N] = "P2PCommerce.com";
LDAP_ROOT_DN[N] = "ou=DepartmentStores,o=P2PCommerce.com";
// Search filter in standard LDAP syntax: locality, country code, and common name must all match.
LDAP_QUERY = "(&(l=Paris)(c=FR)(cn=Anne-Sophie Boureau))";
// Open a session with the Nth directory server.
connect_id = ldap_connect(LDAP_SERVER[N]);
// Search the subtree under the root DN, then collect the matching entries.
search_id = ldap_search(connect_id, LDAP_ROOT_DN[N], LDAP_QUERY);
result = ldap_get_entries(connect_id, search_id);
In the result, we will have several items: common name (cn), distinguished name (dn), first name (gn), last name (sn), and email address (email):
result["cn"] = "Anne-Sophie Boureau"
result["dn"] = "uid=Anne-Sophie,company=médicins.com"
result["gn"] = "Anne-Sophie"
result["sn"] = "Boureau"
result["email"] = "Anne-Sophie.Boureau@médicins.com"
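A real directory search takes an LDAP search filter in the string syntax of RFC 4515, such as (&(l=Paris)(c=FR)(cn=Anne-Sophie Boureau)), where l, c, and cn are the standard locality, country, and common-name attributes. Any value supplied by a user must be escaped before it is placed in a filter; otherwise characters such as * and ( change the meaning of the query. The following Python sketch builds such a filter. The function names are our own, and it covers only the equality-match AND filters used in this example.

```python
# Characters RFC 4515 requires to be escaped inside an assertion value,
# each written as a backslash followed by two hex digits.
_SPECIALS = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a value for use in an LDAP search filter (one pass, so no double escaping)."""
    return "".join(_SPECIALS.get(ch, ch) for ch in value)


def and_filter(**attributes: str) -> str:
    """Build an equality-match AND filter, e.g. (&(l=Paris)(c=FR))."""
    terms = "".join(f"({attr}={escape_filter_value(val)})" for attr, val in attributes.items())
    return f"(&{terms})"
```

For example, and_filter(l="Paris", c="FR", cn="Anne-Sophie Boureau") produces exactly the filter string shown above, while a malicious name such as "*)(cn=*" is neutralized into literal characters.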
3.1.2.2 Ad-hoc Registries
Peers will be required to have a set of unique XML documents to describe the components of which they are composed. Because the overlay network is really the join of these components, the network itself will then be consistent, and there will be no conflicts from duplicate documents. The details of each of these XML documents are described later in this chapter and in Chapter 4. When a collection of nodes on a network decides to form an ad-hoc, structured P2P overlay network, the associated XML document management cannot rely on centralized “super peer” computers. Instead, in our model for P2P behavior, and in a purely ad-hoc network, each peer manages its own document space, as well as conflicts due to duplication, to assure consistency. Suitable algorithms to avoid duplication are described in the following sections. The probability of duplication is extremely small, and when duplication arises the code will be present to deal with it. Other degrees of ad-hoc behavior are possible. For example, an ad-hoc P2P network may have one or several peer-nodes whose systems are reliable. In this case one can off-load the real-time registration of these documents to these peer-nodes. Registration of documents is purely ad-hoc, without any administration except for the knowledge of how to connect to these peer-nodes. One advantage of this pseudo-centralization is the presence of systems whose role is to act as real-time guardians of document consistency. A system that volunteers for this role is making a commitment to provide solid or near-solid services, being up constantly and having a predictable degree of reliability, so that the problem of duplication of XML documents is non-existent. We create unique documents for components by including unique identifiers (see section 3.2). An enterprise P2P overlay network behind firewalls can guarantee consistency and eliminate duplications by helping to form unique identifiers and managing documents on enterprise servers. But when these peer-nodes connect to a non-enterprise P2P overlay network, this guarantee is difficult to maintain because there is no global monitoring of unique identifiers. Moreover, the algorithms used to generate unique identifiers may permit the duplication of some of the components’ XML documents, consequently yielding an inconsistent join of these different overlay networks. Thus, even if both P2P networks are strictly administered with independent, centralized control to guarantee each one’s consistency, we still run into this problem. The same problem exists for the pure ad-hoc network discussed just above: because there is no global registration, joins might be inconsistent. Figure 3-1 shows the problem in both situations.
Figure 3-1. The P2P Join Problem
While, clearly, enterprises may not want to permit joins of this nature for security reasons, the join problem can be solved with global, ad-hoc registries. We argue that prohibiting a global P2P address space, which is what we are describing, is the wrong way to solve such security problems, and that such an address space is a good idea for promoting P2P technology and e-Commerce. What would the Internet be like today if the IP address space had been treated in the same manner? Security problems are best solved with security mechanisms, such as security algorithms. How a global P2P overlay network with globally unique identifiers, and therefore a globally consistent document space, is accomplished is discussed in section 3.2.2 of this chapter.
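The join problem shown in Figure 3-1 can be stated precisely: two document spaces join consistently only if no identifier is bound to two different documents. The following Python sketch checks this under a simplifying assumption of ours, namely that each overlay network’s documents are held in a dictionary keyed by unique identifier; join_document_spaces is a hypothetical name, not part of the book’s model.

```python
from typing import Dict, List, Tuple


def join_document_spaces(a: Dict[str, str], b: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge two ID -> document maps and report IDs bound to different documents."""
    merged = dict(a)
    conflicts = []
    for doc_id, doc in b.items():
        if doc_id in merged and merged[doc_id] != doc:
            # Same identifier, different component: the join is inconsistent.
            # The sketch keeps network a's document and records the conflict.
            conflicts.append(doc_id)
        else:
            # A new identifier, or an identical copy of the same document.
            merged[doc_id] = doc
    return merged, sorted(conflicts)
```

With globally unique identifiers the conflict list is always empty; with independently administered identifiers, nothing guarantees that, which is the case the global, ad-hoc registries are meant to handle.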
3.2 Peer Identity
How does one establish a unique, global peer identity when there will ultimately be a trillion or more peers and the connected device space grows without end? The IPv4 address space was long ago exhausted, and the migration to IPv6 is slow to arrive. IPv6 also has uniqueness problems, even though its address space is more than sufficient. We discuss these issues below.
3.2.1 One Peer among Billions
As the Internet has evolved since the year 2000, billions of devices have become Internet-enabled. Each of these networked devices is capable of being a peer-node on a P2P overlay network, and each will require a unique identity that is recognizable independently of the device’s user and of its location within this network. While we do not expect most appliances in a home network to be mobile, unique peer identities will still be necessary, so that one peer among billions, like one human being among billions, has its own unique identity. The identity must be globally unique for every kind of device: sensors, desktops and laptops, PDAs, mobile phones, Bluetooth-enabled devices, wearables like necklaces and watches, etc. The users of mobile devices may meander in a crowd looking for contact, and the users of laptops go from coffee shops to hotel rooms while remaining active members of their home P2P network. To permit individual peers to have anonymity, a peer should be capable of changing this identity at will, and the new identity must maintain both its uniqueness and mobility properties on the P2P network to which it belongs. You are stuck with your DNA but not with your peer identity. And, as always, policies that permit a change of identity are set by the appropriate members of the peer network. It is clear that most enterprises, and probably all governments, will want to carefully administer the peer identities that they legally control. But the large majority of these identities may not be registered, and registration is neither necessary nor always desirable in a P2P network.
Components of the P2P Model
A peer should be able to generate a unique identity when it is initially configured, and the engineering fundamentals for building the P2P model described in this book permit such a peer to become a member of a P2P network, to be discovered by other peer-nodes on this network, and to communicate with those peer-nodes. Registration of unique identities is also within the scope of this model. The model is, in fact, indifferent to administrative policies, which are decided and implemented by the P2P network owners. So, what can one use for such a universal peer identity? Certainly, IP version 4 (IPv4) addresses are out of the question, given that the 32-bit IPv4 address space was long ago exhausted. IPv4 addresses are currently the default Internet-visible addresses, and a device’s Internet-visible address may differ from its Local Area Network address, because nearly all devices sit behind a router that performs Network Address Translation (NAT). The difficulties this raises for P2P networks are discussed in section 3.5 of this chapter. We can consider using IP version 6 (IPv6) addresses, which provide a 128-bit address space, but at this time and for the near future, IPv6 has not been universally deployed. Still, in anticipation of this deployment, the IPv6 option must be considered from the point of view of format. If we believe that IPv6 addresses are the inevitable future of the Internet, then we can argue for at least using the IPv6 format, both to minimize the number of formats the software is required to support and to stay as close to the Internet standards as possible, with the long-term goal of interoperable P2P software. Another choice is a unique random identifier of 128 or more bits generated by a secure random number generator. We can also “spice” any of these identifiers with cryptographic information to create unique, secure cryptographically based identities (CBID) (Montenegro 2002). These and other possibilities are discussed in the next section.
3.2.2 Unique Identifiers for Peers
As mentioned above, it was recognized by the early 1990’s that the 32-bit IPv4 address space would soon be exhausted. Most of the early innovators of the Internet thought that 2³² (4,294,967,296) addresses were sufficient, and thus the initial IP specification, RFC760, which was authored by Jon Postel (Postel, 1980) and published by the Department of Defense (DOD) as a DOD standard in January of 1980, allocated two 32-bit fields in the IP packet header for the source and destination IP addresses. As a historical note, the XEROX Palo Alto Research Center’s XNS network protocol arrived at about the same time as IP and had 80 bits of address space, 48 bits for the host and 32 for the network, and the 48-bit host address was usually the 48-bit MAC address. As we will see below, using the MAC address as part of the IPv6 address is one possible format option. It is always amusing that, from a historical perspective, so many cool things were done in the 1980’s that were not abandoned but rather, as was pointed out in Chapter 2, sat on the shelf until rediscovered. The problem with XNS was that it was XEROX-proprietary at the time, whereas IP has never been proprietary. They came from two different visions, and it is clear which vision won, i.e., open standards, and this is the vision for the future. In any case, in December of 1995, the first specifications for IPv6 were submitted to the Internet Engineering Task Force (IETF) in RFC1883 and RFC1884. To understand more fully the appropriateness of IPv6 addresses as unique identifiers, a careful reading of RFC2373 and RFC2460, which obsolete the previous two RFCs, is necessary. We give the reader a good enough overview of these RFCs in subsection 3.2.2.1. Again, it is important to keep in mind that the immediate goal with respect to IPv6 is, when appropriate, to use its address format to store and publish generated, unique identities. As mentioned in the introduction to this section, IPv6 is not the only possible choice for the format of what is called a Universally Unique Identifier (UUID). There are many good algorithms that can generate many thousands of UUIDs per second. The IPv6 format may not be suitable in some cases, and in this book’s model multiple UUIDs will be necessary for its component set. These UUIDs are discussed in subsection 3.2.2.2. Keeping this in mind, let’s first move on to the primer on IPv6 addresses.

3.2.2.1 IPv6 Addresses
IPv6 remedies several shortcomings of IPv4. It is designed to improve upon IPv4’s scalability, security, ease of configuration, and network management.
The scalability improvements reflect both an increase in address space size and a mechanism for scalable Internet routing. We’ve adequately discussed in this chapter the 32-bit address space limitation, which IPv6 eliminates with 128-bit addresses. This is an almost unimaginable number: 2¹²⁸, or about 3.4 × 10³⁸ addresses. If we start now and use one billion addresses per second without recycling, then we have enough addresses to last about 10²² years. Noting that our sun will burn out in a few billion years, and that if the universe is closed its calculated lifetime is about 10¹¹ years, clearly IPv6 addresses solve the address space size problem for the foreseeable future. Since IPv4 addresses do not provide a mechanism for hierarchical routing like the one the telephone exchange provides for phone calls with country and area codes, the size of IP routers’ routing tables has become problematic as the Internet has grown in a way that was not anticipated by its founders. The original IPv4 address formats, the class A, B, and C networks, provided no mechanism for hierarchical routing. The classic IPv4 address format, as defined in RFC796, permits 127 class A networks, 16,383 class B networks, and 2,097,151 class C networks. Since this is a flat address space, routing to every network under this scheme requires an entry for each network in each router’s routing table. With the advent of Classless Inter-Domain Routing (CIDR) (Fuller, Li, Yu and Varadhan 1993) in 1993, a hierarchical means of creating 32-bit IPv4 addresses was devised as a near-term solution to this problem. CIDR is backward compatible with the older IPv4 addresses but does not eliminate the already existing legacy networks. A route to each one must still be maintained in the routing tables of all routers that provide a path to such a network, although these routes can coexist with the CIDR network addresses. Thus, despite the CIDR near-term solution, a true hierarchical addressing scheme is required, and IPv6 provides such a mechanism.
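The address-space figures in the last two paragraphs are easy to check. The sketch below assumes a 365.25-day year and the one-billion-addresses-per-second rate from the text; classful IPv4 addresses carry 7, 14, and 21 network bits for classes A, B, and C, and the counts follow the text’s convention of not counting the all-zero network number.

```python
# Back-of-the-envelope checks of the address-space figures quoted above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: a 365.25-day year


def classful_network_counts():
    """Non-zero network numbers available in each classful IPv4 class.

    Class A has 7 network bits, class B 14 and class C 21 (the remaining
    leading bits identify the class); one is subtracted for the all-zero
    network number.
    """
    return {cls: 2**bits - 1 for cls, bits in (("A", 7), ("B", 14), ("C", 21))}


def years_to_exhaust(address_bits, addresses_per_second=10**9):
    """Years needed to hand out every address once at a fixed rate."""
    return 2**address_bits / addresses_per_second / SECONDS_PER_YEAR


print(classful_network_counts())  # {'A': 127, 'B': 16383, 'C': 2097151}
print(f"IPv4 at 1e9/s: {years_to_exhaust(32) * SECONDS_PER_YEAR:.1f} seconds")  # 4.3
print(f"IPv6 at 1e9/s: {years_to_exhaust(128):.1e} years")  # 1.1e+22
```

The contrast is the point: at the same rate, the entire IPv4 space would be gone in a few seconds.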
Figure 3-2. IPv4 Address Format
IPv6 offers multiple ways to format its 128-bit addresses, and there are three types of addresses: unicast, anycast, and multicast. Because a node on an IP network may have more than one interface attached to that network, these addresses identify interfaces: a unicast address is an identifier for a single interface; an anycast address is an identifier for a collection of interfaces, and a packet sent to it is delivered to one and only one of the interface members; a multicast address is an identifier for a collection of interfaces, and a packet sent to it is delivered to all of them. Because an anycast address is syntactically indistinguishable from a unicast address, nodes sending packets to anycast addresses are not generally aware that an anycast address is being used. We will concentrate our explanations on the addresses that are most useful for our P2P UUID purposes. In particular, the IPv6 aggregatable global unicast address (Hinden, O'Dell and Deering, 1998) is salient here, since it solves the scalable routing problem and provides a method to generate globally unique IP addresses when used in conjunction with IPv6 Neighbor Discovery (ND) (Narten, Nordmark and Simpson, 1998) and IPv6 stateless address autoconfiguration (Thomson and Narten, 1998). As we see in Figure 3-3, the aggregatable global unicast address permits aggregation in a three-level hierarchy.
Figure 3-3. Aggregatable Global Unicast Address Structures
The Top-Level Aggregator (TLA) identifiers are at the top node in the Internet routing hierarchy and must be present in the default-free routing tables of all of the top-level routers in the Internet. The TLA ID has 13 bits and thus permits 8,191 such IDs. This keeps these routing tables within reasonable size limits and minimizes the number of routes per routing update that a router must process. It is worth noting that in the spring of 1998 the IPv4 default-free routing table contained approximately 50,000 prefixes. The technical requirement was to pick a TLA ID size that was, with a reasonable margin, below what was being done with IPv4. The Next-Level Aggregator (NLA) identifier is for organizations below the TLA nodes and is 24 bits. This permits 16,777,215 flat IDs, or it can be used to give a hierarchical arrangement of addresses like that of IPv4. One could, for example, do something similar to CIDR here. Next, we have the Site-Level Aggregator (SLA) for the individual site subnets. This ID has 16 bits, which permits 65,535 subnets at a given site. The low-order 64 bits are the interface identifier on the local link to which a host with an IPv6 address belongs. This identifier is usually derived from the MAC address of the host’s interface. It is certainly possible that during certain time windows two hosts may end up with the same interface identifier, and there are means available to resolve these conflicts and to guarantee global uniqueness. These are discussed just below.

The authors of the IPv6 RFCs clearly understood the burden IPv4 imposed on network administrators. The seemingly simple task of assigning a new IP address is, in fact, not that simple. The address must be unique. Yet often enough there are unregistered IP addresses on a subnet, and in most cases the perpetrator is innocent: a temporary address was needed for a test and was never unassigned. The unfortunate side effect is that two systems will receive Address Resolution Protocol (ARP) requests from, for example, a router, and both will reply. Which system receives the packet that initiated the ARP is arbitrary. There is also the assignment of a default router and DNS servers. While most of this is now solved with the Dynamic Host Configuration Protocol (DHCP) (Droms, 1993), it is still a source of administrative difficulty when hosts change subnets or IP addresses must be renumbered.
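To make the three-level hierarchy of the aggregatable global unicast address concrete, the sketch below packs its fields into 128 bits with Python’s standard `ipaddress` module. The field widths (a 3-bit format prefix of 001, a 13-bit TLA, 8 reserved bits, a 24-bit NLA, a 16-bit SLA, and a 64-bit interface ID) are those of RFC 2374. The TLA, NLA, and SLA values and the MAC address are made up for illustration, and the interface ID is derived with the modified EUI-64 rule, which inserts FFFE in the middle of the MAC address and flips its universal/local bit.

```python
import ipaddress

# Field widths of the aggregatable global unicast format (RFC 2374),
# from the most significant bits down.
FIELDS = (("fp", 3), ("tla", 13), ("res", 8), ("nla", 24), ("sla", 16), ("iid", 64))


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64: insert FFFE in the middle of the MAC, flip the U/L bit."""
    octets = bytes.fromhex(mac.replace(":", ""))
    eui = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    eui[0] ^= 0x02
    return int.from_bytes(eui, "big")


def build_address(tla, nla, sla, iid, fp=0b001, res=0):
    """Pack the hierarchy fields into a single 128-bit IPv6 address."""
    values = {"fp": fp, "tla": tla, "res": res, "nla": nla, "sla": sla, "iid": iid}
    packed = 0
    for name, width in FIELDS:
        field = values[name]
        if not 0 <= field < (1 << width):
            raise ValueError(f"{name}={field} does not fit in {width} bits")
        packed = (packed << width) | field
    return ipaddress.IPv6Address(packed)


def split_address(addr):
    """Recover the named fields from an address built by build_address()."""
    remaining = int(addr)
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = remaining & ((1 << width) - 1)
        remaining >>= width
    return fields


iid = eui64_interface_id("00:11:22:33:44:55")
addr = build_address(tla=0x1, nla=0x2A, sla=0x7, iid=iid)
print(addr)  # 2001:0:2a:7:211:22ff:fe33:4455
print(split_address(addr))
```

Because each level occupies a fixed bit range, a router at a given level only needs to inspect its own prefix of the address, which is what keeps the top-level routing tables small.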
Also, mobility (Mobile IP) adds a new twist. Most large organizations have a dedicated staff to deal with these administrative issues, which are more often than not a source of painful and costly network bugs. An illustrative example, as recalled by one of the authors, William Yeager, is sufficient here. In the late 1980’s Stanford University had networks that supported multiple protocols, and an organization was put in place to administer the university’s rapidly growing local area network. One afternoon all XEROX Interlisp machines in the Knowledge Systems Laboratory (KSL) went into hard garbage collection loops. These systems were used as desktops as well as for research, and so about one hundred and twenty-five people lost the use of their systems. Rebooting did not solve the problem. William Yeager always watched the network traffic (he kept a network sniffer continuously active in his office), and he noticed a huge, constant upsurge in Xerox Network Systems (XNS) routing table updates in which all of the advertised routes were illegal, constantly changing, and non-repeating. The Lisp machines in question cached XNS routing table entries, and thus were madly updating internal data structures and freeing up entries, which resulted in a hard garbage collection loop. At that time, when a router received a new route, it always immediately advertised it. These routes were originating on the backbone network from a previously unknown pair of routers. Fortunately, the KSL managed its own routers and the code they ran. William generated an immediate patch, which was added to the appropriate router to firewall the errant routing table advertisements and keep them on the backbone. A phone call to a Stanford network administrator alerted them to the problem. It was their own. They had installed two XNS routers to support some administrative software and assumed they worked fine. They did on small networks, but when the number of XNS networks exceeded 17, all hell broke loose. The KSL had 17 such networks and triggered this bug. The routers were shut down until the problem was resolved. Such scenarios are not atypical; they arrive out of nowhere on a daily basis. Anything that can be done to ease the burden on network administrators is important.

To simplify the task of assigning IPv6 addresses, IPv6 autoconfiguration capabilities have also been defined. Both stateful and stateless autoconfiguration are possible. Either one or both can be used, and which is in use is flagged, and thus automated, in periodic IPv6 router advertisements. If stateful autoconfiguration is used, then a stateful configuration server is contacted, which assigns an IPv6 address from a known, administered list. Even in this case ND, as described below, is used to ensure that the server-supplied address is unique. If it isn’t, the address is not assigned, and an appropriate error message is logged.
Stateless autoconfiguration begins with the formation of a link-local address (Thomson and Narten, 1998), which combines the well-known link-local prefix with a 64-bit interface ID. The interface ID is usually derived from the MAC address, but any unique token will do. Next, the host uses the ND Neighbor Solicitation (NS) message to see if this identifier is unique. If no system complains, then it is assumed to be unique. If it is found not to be unique, because a Neighbor Advertisement is received from a neighbor already using a matching token, then an administrator is required to assign an alternative link-local address. This may appear heavy-handed but is not: it is important to verify whether there are in fact two identical MAC addresses on the local link. The authors believe that it is sufficient to log the problem and to use a secure random number generator to create 64-bit tokens to be used here in conjunction with ND. These can be created in such a way as not to be in MAC address format. Such a scheme will at least permit a system to autoconfigure and get online. A later administrative action can fix the address if necessary. Next, given a unique link-local address, periodic router advertisements contain the necessary prefix information to form a complete 128-bit IPv6 address. A node can request such an advertisement with an ND Router Solicitation.

IPv6 addresses have preferred and valid lifetimes, where the valid lifetime is at least as long as the preferred lifetime. An address is preferred if its preferred lifetime has not expired. An address becomes deprecated when its preferred lifetime expires, and invalid when its valid lifetime expires. A preferred address can be used as the source and destination address in any IPv6 communication. A deprecated address must not be used as the source address in new communications but can be used in communications that were in progress when the preferred lifetime expired. An address is valid while it is either preferred or deprecated; thus, deprecation gives a grace period for an address to pass from preferred to invalid. The interested reader can explore the complete details of autoconfiguration in the RFCs mentioned in this section. A full list of the IPv6 RFCs can be found in the bibliography.

Finally, IPv6 provides an Encapsulating Security Payload (ESP) extension header (Kent and Atkinson, 1998). This permits authentication of the data’s origin (anti-source spoofing), integrity checks, confidentiality, and the prevention of replay attacks. The well-tested SHA-2 and SHA-3 hash algorithms are used, authentication is done with Message Authentication Codes (MACs), which are keyed hashes such as HMAC, and confidentiality is provided by symmetric encryption algorithms like 3DES, AES, and Camellia. Sequence numbers are mandatory in ESP; they increase monotonically and must never wrap, which prevents replay attacks.
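The preferred, deprecated, and invalid states of an autoconfigured address can be captured in a few lines. In this minimal sketch the lifetimes are measured in seconds from the moment the address was configured, and the 3,600 and 7,200 second values in the example are arbitrary assumptions rather than protocol defaults. A real stack also refreshes these timers whenever a new router advertisement arrives, which is not modelled here.

```python
from enum import Enum


class AddressState(Enum):
    PREFERRED = "preferred"
    DEPRECATED = "deprecated"
    INVALID = "invalid"


def address_state(age, preferred_lifetime, valid_lifetime):
    """Classify an autoconfigured address by its age (all values in seconds)."""
    if preferred_lifetime > valid_lifetime:
        raise ValueError("preferred lifetime must not exceed valid lifetime")
    if age < preferred_lifetime:
        return AddressState.PREFERRED
    if age < valid_lifetime:
        return AddressState.DEPRECATED  # grace period before invalidation
    return AddressState.INVALID


def usable_as_source(state, new_communication):
    """Preferred: always; deprecated: only for communications already in progress."""
    if state is AddressState.PREFERRED:
        return True
    if state is AddressState.DEPRECATED:
        return not new_communication
    return False


for age in (600, 5000, 9000):
    state = address_state(age, preferred_lifetime=3600, valid_lifetime=7200)
    print(age, state.value, usable_as_source(state, new_communication=True))
```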
The duration or lifetime of an IPv6 address poses a problem for its use as a UUID on a P2P overlay network, which is independent of the underlying real network, in this case an IPv6 network. While the 64-bit interface ID can be assumed to remain unique indefinitely, even if periodic ND checks must be made to ensure that this is the case, the router prefixes can expire, since they arrive with preferred and valid lifetimes bound to them. Periodic router advertisements must be monitored to ensure that an address is not deprecated, and if it is, then appropriate actions must be taken. These actions are discussed in detail in Chapter 4. As mentioned in the introduction to this section, UUIDs are used to give a unique identity to each peer on the overlay network. These peers are also mobile. Thus, if one takes a laptop from the office to the home, or vice versa, the IPv6 prefix will most likely change, and a new UUID will be required. Why? If the prefixes are different, which can be discovered from router advertisements, then there is no way to use ND at the new location to verify that the UUID is still unique. While the laptop is at home, another system at the office may acquire its old IPv6 address, because the absent laptop cannot respond to ND Neighbor Solicitation messages there. That other system can then join the P2P overlay network using this IPv6 address as its UUID and thereby create a conflict. This implies that when a system becomes mobile, it must abandon its old IPv6 address and acquire another for use on the local link as well as for a UUID on the overlay network. Again, this does not impose a management problem on the P2P overlay network, given the mechanisms described in Chapter 4. One thing is clear. If IPv6 addresses as described above are used as UUIDs, then either a system that intends to be mobile must be able to flush any knowledge of itself from the overlay network before it disconnects, or the overlay network must associate time-to-live values with dynamic information so that the information can be expunged when it expires.

It is important to understand that the IPv6 stateless autoconfiguration protocols are attackable. There are obvious attacks, like a malicious host replying to all ND NS messages, thus denying any new node the ability to autoconfigure. This kind of attack is detectable with a reasonableness heuristic: generate up to five 64-bit interface IDs using a good pseudo random number generator. If each of these five is denied as a duplicate, then there is an attack, and measures can be taken to find the attacker. Another equally obvious form of this attack is a node with a duplicate interface address not responding to ND. In this case, a duplicate IPv6 address will be created on the same local link. Also, a node masquerading as a router and generating bogus prefixes, or valid prefixes with incorrect lifetimes, is possible.
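The authors’ suggestion of random 64-bit tokens and the five-try heuristic can be combined in one routine. In this sketch the `is_duplicate` callback is a hypothetical stand-in for a real ND duplicate address detection exchange: it returns True when a neighbor claims the tentative address. The token comes from Python’s `secrets` module, and clearing the universal/local bit marks it as locally administered, so it does not look like an identifier derived from a globally assigned MAC address.

```python
import ipaddress
import secrets

LINK_LOCAL_PREFIX = ipaddress.IPv6Network("fe80::/64")
UL_BIT = 1 << 57  # universal/local bit: 0x02 in the first octet of the 64-bit ID
MAX_ATTEMPTS = 5  # the "up to five" heuristic from the text


def random_interface_id():
    """64 random bits with the universal/local bit cleared (locally administered)."""
    return secrets.randbits(64) & ~UL_BIT


def autoconfigure_link_local(is_duplicate, max_attempts=MAX_ATTEMPTS):
    """Form a link-local address, retrying with fresh random interface IDs.

    is_duplicate(address) is a hypothetical stand-in for ND duplicate
    address detection. If every attempt is reported as a duplicate,
    assume a neighbor is answering every solicitation and raise.
    """
    for _ in range(max_attempts):
        candidate = LINK_LOCAL_PREFIX.network_address + random_interface_id()
        if not is_duplicate(candidate):
            return candidate
    raise RuntimeError(
        f"all {max_attempts} random interface IDs were reported as duplicates; "
        "possible denial-of-service attack on the local link"
    )


in_use = set()  # addresses already claimed on this simulated link
address = autoconfigure_link_local(lambda addr: addr in in_use)
print(address)

try:
    autoconfigure_link_local(lambda addr: True)  # a neighbor that claims everything
except RuntimeError as err:
    print("attack suspected:", err)
```

With 64 random bits, five honest collisions in a row are vanishingly unlikely, which is why the heuristic can treat them as evidence of an attacker rather than bad luck.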
It is important to understand here that even with these possible attacks, IPv6 is a major step forward and can be deployed while solutions to these attacks are in progress. The IETF does not stand still, and its members are pursuing solutions. Also, IPv4 can be similarly attacked, is being attacked as we write, and many of the IPv4 attacks are not possible with IPv6. In spite of these security problems, IPv4 has been tremendously successful, and IPv6 will be even more so. Finally, there are alternatives to using IPv6 addresses as UUIDs, and they are discussed in the next section.

3.2.2.2 Universally Unique Identifiers (UUID)
In the previous sections we have given a good overview of IPv6 addresses and their appropriateness as UUIDs on the P2P overlay network. The major problem faced with the IPv6 alternative is deployment. The attacks on IPv6 described in our chapter on security should not slow down its deployment for general use and are less menacing when the IPv6 address format is used for P2P UUIDs. The most serious attack in the P2P space would be theft of peer identity. As dangerous as this sounds, recall that someone attached to the Internet can use almost any IPv4 address they wish if they are clever enough and have a connection that does not go through an Internet Service Provider (ISP); ISPs, for example, can refuse to route some addresses. It is all too easy to change one’s IPv4 address on most home systems. IPv6 can be made more difficult to attack if stateless autoconfiguration is used. There is a computational and personal cost: the user must beware and take the right precautionary measures, and that cost must be weighed against the probability of being hacked, which is minuscule. In any case, we feel that IPv6 gives us a good future solution for UUIDs for several reasons:
1. Built-in global registration.
2. Barring attacks and administrative errors, the possibility of globally unique addresses, and therefore UUIDs.
3. IPv6 addresses can be used as UUIDs when a system is mobile, permitting it to reattach and acquire a new UUID, and here the interface identifier is almost always reusable.
4. The attacks and related security problems are being addressed as we write.
5. Global uniqueness also permits disjoint overlay networks to join, as mentioned in section 3.1.2.2.
Until IPv6 is sufficiently deployed, we can implement a P2P UUID generation strategy that is similar to ND. The interested reader can read Chapter 4, section 4.3.3.4, on mediator-prefixed UUIDs. There are other methods that can be used to generate UUIDs that have a high probability of uniqueness, given enough bits, and that are essentially impossible to spoof.
One can use a good pseudo random number generator, or better yet, a secure random number generator like /dev/random on Unix and Linux systems, to generate enough random bits per ID to make the probability of duplication essentially zero. If one uses 128-bit UUIDs generated in this way, the probability of a collision is less than that of winning the lottery 9 times in a row. Larger UUIDs can be created if desired. We can never fill up the UUID space. Yes, there will be cheaters who will attempt to create peers with duplicate UUIDs, since these values are public. This problem is currently resolvable with several emerging identifier generation techniques.
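A minimal sketch of this approach uses Python’s `secrets` module, which draws on the operating system’s cryptographically secure random source through `os.urandom`, together with the standard birthday-bound approximation for the chance that any two of n random b-bit identifiers collide. The trillion-peer figure in the example echoes the opening of section 3.2.

```python
import math
import secrets


def random_uuid(bits=128):
    """A random identifier of the given size, as a hex string."""
    if bits <= 0 or bits % 8:
        raise ValueError("bits must be a positive multiple of 8")
    return secrets.token_hex(bits // 8)


def collision_probability(n_ids, bits=128):
    """Birthday bound: chance that any two of n_ids random IDs are equal.

    Uses P ~= 1 - exp(-n(n-1) / 2^(bits+1)), which is accurate whenever
    the ID space is much larger than n_ids.
    """
    exponent = n_ids * (n_ids - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)


peer_id = random_uuid()
print(peer_id, len(peer_id) * 4, "bits")
print(f"{collision_probability(10**12):.1e}")  # a trillion peers: about 1.5e-15
```

Each extra byte of identifier divides the collision probability by 256, so moving from 128 to 256 bits, as some systems do, pushes it far below any practical concern.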
There are Statistically Unique and Cryptographically Verifiable (SUCV) identifiers and Crypto Based IDs (CBID), which are also referred to as Cryptographically Generated Addresses (CGA) (Aura, 2002). While the security issues discussed in these papers will be covered in Chapter 5, the basic common ideas that play a role in UUID generation are reviewed here. Where H-x denotes the high-order x bits of the output of hash algorithm H, a host generating a UUID can do the following:
1. Create a public/private key pair using, say, RSA or Diffie-Hellman.
2. Using a hash function H, like SHA-3, generate H(Public Key), for example the 256-bit SHA-3 hash.
3. For a CGA, use H-64(Public Key) as the CBID IPv6 interface identifier along with the high-order 64-bit prefix. This can be an IPv6-based UUID.
4. For a UUID, one can also use H-128(Public Key) as the CBID.
Given such a UUID, a challenge can be used to verify that the owner of the UUID possesses the private key associated with the public key. When peer1 receives a document containing the UUID from peer2, peer1 requests from peer2 a message, signed with peer2’s private key, containing peer2’s public key and a random session identifier, SID, generated by peer1. The SID prevents peer1 from spoofing peer2’s identity in a communication with peer3. Without the SID, peer1 could save the signed message from peer2 and replay it to peer3, thus faking ownership of peer2’s private key. Continuing, peer1 can calculate H-128(Public Key), and if the hash is correct, then verify the signature of the message. The signature can be a straightforward private-key signed SHA-3 hash of the message. If the signature is correct, then the document indeed belongs to peer2, and peer2’s identity has been established. How can this be attacked? There are those who worry that the H-64(Public Key) interface identifier can be attacked with brute force.
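Steps 2 and 4, together with the first half of peer1’s check, can be sketched with Python’s `hashlib`, which provides SHA3-256. This is a simplified illustration of the idea as described here, not the full CGA algorithm of the standards. The byte strings standing in for public keys are placeholders: in a real system they would be encoded keys produced by a cryptographic library, and peer1 would go on to verify peer2’s signature over the SID with that same library, a step that is not shown.

```python
import hashlib
import hmac


def h_bits(data: bytes, x: int) -> int:
    """H-x: the high-order x bits of SHA3-256(data), as an integer."""
    digest = hashlib.sha3_256(data).digest()
    if not 0 < x <= len(digest) * 8:
        raise ValueError("x must be between 1 and 256")
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - x)


def cbid_uuid(public_key: bytes) -> str:
    """Step 4: the 128-bit CBID, H-128(public key), as 32 hex digits."""
    return format(h_bits(public_key, 128), "032x")


def key_matches_uuid(claimed_uuid: str, presented_key: bytes) -> bool:
    """Does the public key peer2 presented hash to the UUID it claims?

    This only binds the key to the UUID; possession of the private key
    must still be proven by verifying peer2's signature over peer1's SID.
    """
    return hmac.compare_digest(cbid_uuid(presented_key), claimed_uuid.lower())


peer2_key = b"placeholder for peer2's encoded public key"
peer2_uuid = cbid_uuid(peer2_key)
print(peer2_uuid)
print(key_matches_uuid(peer2_uuid, peer2_key))           # True
print(key_matches_uuid(peer2_uuid, b"an impostor key"))  # False
```

`hmac.compare_digest` is used for the comparison so that the check takes the same time whether or not the values match.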
To succeed by brute force, an attacker would need to find a different public/private key pair whose public key hashes to the exact H-64(Public Key) value, i.e., find another public key that collides with the original one. Let’s assume RSA-2048 is used. First, consider precomputing a table with 2⁶⁴ values, and assume that a Hard Disk Drive (HDD) with a 1.25-inch radius can hold 40 terabytes of data. We will need 2⁶⁴ 64-bit, or 8-byte, values, about 1.5 × 10²⁰ bytes in all. A back-of-the-envelope calculation says the total number of HDDs is roughly 3,700,000. Now, suppose one just wants to compute until a collision is found. It is generous to assume that an RSA-2048 public/private key pair can be computed in 500 milliseconds, so let’s assume that sometime in the future the calculation will take 1 microsecond, or that multiple systems are used in parallel to achieve one public/private key pair per microsecond. In this case, an exhaustive search for a collision will take about 585,000 years. Assuming that only half of the possible values are required to achieve this single collision, this reduces to about 292,000 years. That’s a lot of CPU cycles. Even with Moore’s law, we should not lose sleep over this attack succeeding in the near future, and since each additional bit doubles the work, 128-bit UUIDs are, for all practical purposes, impossible to attack by brute force. Other attacks are possible if care is not taken to prevent them. These are typically the “man-in-the-middle” (MITM) attacks. There are several ways to prevent MITM attacks: one can use a secure channel like TLS to exchange CBIDs; one can use infrared communication with eyeball contact between the individuals exchanging the CBIDs; out-of-band verification is also possible, where upon receipt of a 16-byte CBID the source is contacted and asked to verify the value; and a trusted third party can act as a CBID escrow. Finally, MITM attacks are not always a threat. For example, if one is exchanging MPEG or JPEG content in an ad-hoc P2P network where CBIDs are used as UUIDs, then as long as the content satisfies the recipient, there is no real security threat. And a great deal of P2P activity will be the ad-hoc exchange of content. When financial information like credit card numbers is exchanged, then it is necessary to use strong security and verifiable CBIDs. This, as with other security details, is covered in Chapter 5.

3.2.2.2.1 The BestPeer Example
BestPeer (Ng, Ooi and Tan, 2002, 272-284) is a self-configurable, mobile-agent-based P2P system. It is highly centralized and relies on the Location Independent Global Names Lookup (LIGLO) server to identify the peers with dynamic IPv4 addresses. When a peer node joins the BestPeer P2P system, it registers with a LIGLO server. The server gives the peer node a unique global ID (BestPeerID).
This ID is a combination of the LIGLO server’s IPv4 address and a random number that the server assigns to the peer node. The LIGLO server saves the (BestPeerID, IP address) pair for the peer node. The LIGLO server also sends the new peer node a list of such (BestPeerID, IP) pairs to which it can connect. When a node acquires a new IP address, it should update its LIGLO server with this information. These IDs can be easily spoofed, thus permitting identity theft, because any MITM can watch the network activity to obtain BestPeerIDs and then notify the LIGLO server of a change in the IP address associated with these IDs.
3.2.2.2.2 Microsoft’s Pastry Example
Pastry (Rowstron and Druschel, 2001) is a P2P overlay network performing application-level routing and object location. Each peer node in the Pastry network is randomly assigned a 128-bit numeric nodeID. When a peer node joins the Pastry network, its ID can be generated through a cryptographic hash of the node’s public key or of its IP address. The value of the ID plays a crucial role in scalable application-level routing. Applications hash the file name and owner to generate a fileID, and replicas of the file are stored on the nodes whose IDs are numerically closest to the fileID. Given a numeric fileID, a message can be routed to the node whose ID is numerically closest to it. Although there is no mention of CBIDs in Pastry, if the hash of the public key is used, then CBID techniques could be used to secure Pastry routes.

3.2.2.2.3 The JXTA Example
Project JXTA (Project JXTA, 2011) is another P2P overlay network, and it assigns a UUID, the node’s peerID, to each peer. The peerID is implemented as a 256-bit UUID that is unique to the peer node and independent of the IP address of the node. JXTA permits peers to form groups, which are called peer groups. Peer groups make the overlay network more scalable, since all peer activities are restricted to the current peer group of which the peer is a member. Also, all communication between peers on the overlay network is done through pipes. Besides peer nodes, there are UUIDs for peer groups, for data, and for communication pipes. In JXTA a UUID is a URI string, for example:
urn:jxta:uuid-59616261646162614A78746150325033E7D0CCAB80FD4EBB99BB89DD0597D12F03
The UUIDs of the peer and of its current peer group, along with the ID type, are encoded into the above, yielding the 256-bit peerID. CBIDs are also implemented for JXTA peerIDs. In this case the peer’s UUID is derived from the SHA-3 hash of its X.509 v3 root certificate.
If peers use X.509 v3 certificates for peer group membership authentication, then the peer group’s UUID part of the peerID is also a SHA-3 hash of the peer group root certificate.
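The example peerID can be taken apart programmatically. The sketch assumes the layout implied by the description above: after the urn:jxta:uuid prefix come 16 bytes of peer group UUID, 16 bytes of peer UUID, and a final ID-type byte (0x03 in the example). That byte layout is our reading of the example, not a statement of the JXTA specification, so treat the function as illustrative. Decoding the group bytes as ASCII shows that they spell YabadabaJxtaP2P3.

```python
URN_PREFIX = "urn:jxta:uuid"


def parse_peer_id(urn: str):
    """Split a JXTA-style peerID URN into (group UUID, peer UUID, ID type).

    Assumed layout: 16-byte group UUID, 16-byte peer UUID, 1 type byte.
    """
    if not urn.lower().startswith(URN_PREFIX):
        raise ValueError("not a urn:jxta:uuid URN")
    body = urn[len(URN_PREFIX):].lstrip("-")
    raw = bytes.fromhex(body)
    if len(raw) != 33:
        raise ValueError(f"expected 33 bytes, found {len(raw)}")
    return raw[:16].hex().upper(), raw[16:32].hex().upper(), raw[32]


example = ("urn:jxta:uuid-59616261646162614A78746150325033"
           "E7D0CCAB80FD4EBB99BB89DD0597D12F03")
group, peer, id_type = parse_peer_id(example)
print(group, bytes.fromhex(group).decode("ascii"))  # group bytes spell 'YabadabaJxtaP2P3'
print(peer, hex(id_type))
```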
3.2.3 Component 1 - The Peer-UUID
We require that every peer node on the overlay network has a UUID, which we call the Peer-UUID. From the above discussion it is clear that we have many options for generating these UUIDs. The feature we desire, given any of the options, is global uniqueness. An absolute requirement is uniqueness within one’s peer overlay network. If an enterprise decides to form an enterprise-wide overlay network, then registration techniques can be used to administer uniqueness. One might consider the SHA-256 hash of each system’s IP address or MAC address. But this can lead to problems if an enterprise decides to renumber its IP addresses, uses IPv6, where IP addresses have a definite lifetime, or inadvertently creates two identical MAC addresses programmatically. In ad-hoc networks other techniques are required. In this latter case the best choice is to use a sufficient number of bits, x, from H-x(public key) or H-x(X.509 v3 certificate). If one uses, for example, RSA-2048, then public/private key pairs are unique with overwhelming probability. Thus, if x equals 128, then the probability of a hash collision is close enough to zero to guarantee global uniqueness in practice, and as discussed in section 3.2.2.2, one can get by with even fewer bits from the hash. Therefore, while the choice of UUID is up to the designer of the P2P software, in our discussions we will assume UUIDs that are unique within the overlay network, and when security is an issue, CBID-based UUIDs will be used. If one is behind a firewall, and all communication is secure, this may not be necessary. Still, we cannot overlook the implicit advantage of cryptographic information being embedded in the Peer-UUID.

3.2.3.1 Towards a Standard UUID for Peers
Why do we need a standard? The answer is straightforward. We want to have a single, world-wide, peer-to-peer network for all devices. And, when and if the Internet becomes Interplanetary or even Intergalactic, we want this to remain true.
Standards drive a worldwide Internet economy. What should the standard look like? Our discussion of IPv6 was not meant to waste the reader's time: IPv6 is clearly the correct model for standardized Peer-UUIDs. As will be explained in Chapter 4, we introduce the mediator component. Mediators behave like Internet routers on the overlay network. Therefore, we can introduce protocols similar to neighbor discovery and mediator prefix assignment to yield Peer-UUIDs in IPv6 format. Then, when IPv6 is fully deployed, we can use IPv6 addresses directly, as long as we use CBIDs for the 64-bit interface identifier. The reasons for this latter requirement are discussed in section 3.2.2.1.
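The following Java sketch shows both forms of Peer-UUID discussed above, using only JDK classes. It assumes SHA-256 as the hash H and takes x = 128 bits for the plain form. For the IPv6 form it appends a 64-bit interface identifier, taken from the same hash, to a 64-bit prefix that a mediator is assumed to have assigned. That truncation is a simplification, not the Cryptographically Generated Address algorithm of RFC 3972, and the class name, method names, and `uuid-` formatting are hypothetical. The last method is the usual birthday-bound approximation, n^2 / 2^(x+1), for the probability that any two of n peers collide.

```java
import java.net.InetAddress;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.PublicKey;
import java.util.Arrays;

// Hypothetical helpers for CBID-style Peer-UUIDs. Assumptions: H = SHA-256 over the
// X.509-encoded public key; 128 bits for the plain form; 64 bits for the
// interface identifier of the IPv6 form.
final class PeerUuids {
    static byte[] hash(PublicKey key) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(key.getEncoded());
    }

    // Plain form: "uuid-" followed by the first 128 bits of H(public key) in hex.
    static String cbid(PublicKey key) throws NoSuchAlgorithmException {
        StringBuilder sb = new StringBuilder("uuid-");
        for (byte b : Arrays.copyOf(hash(key), 16)) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // IPv6 form: 64-bit prefix (assumed mediator-assigned) + 64-bit CBID interface id.
    // Simplified truncation, not the RFC 3972 CGA algorithm.
    static InetAddress ipv6(byte[] prefix64, PublicKey key) throws Exception {
        if (prefix64.length != 8) throw new IllegalArgumentException("prefix must be 64 bits");
        byte[] addr = new byte[16];
        System.arraycopy(prefix64, 0, addr, 0, 8);   // network prefix
        System.arraycopy(hash(key), 0, addr, 8, 8);  // interface identifier
        return InetAddress.getByAddress(addr);       // 16 bytes give an Inet6Address
    }

    // Birthday bound: P(some collision among n random x-bit ids) is about n^2 / 2^(x+1).
    static double collisionBound(double n, int x) {
        return (n * n) / Math.pow(2.0, x + 1);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        PublicKey pub = gen.generateKeyPair().getPublic();
        System.out.println(cbid(pub));
        // 2001:db8:1:2::/64 lies in the IPv6 documentation range 2001:db8::/32.
        byte[] prefix = {0x20, 0x01, 0x0d, (byte) 0xb8, 0x00, 0x01, 0x00, 0x02};
        System.out.println(ipv6(prefix, pub).getHostAddress());
        // A trillion peers with 128-bit ids: roughly 1.5e-15.
        System.out.println(collisionBound(1e12, 128));
    }
}
```

With x = 128, even a trillion peers give a collision probability on the order of 10^-15, which is the sense in which such identifiers can be treated as globally unique.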
Chapter 3
Open-source cryptographic software is available for generating public/private key pairs, and the SHA-2 and SHA-3 hash algorithms can be found in current versions of the JDK (Oracle 2022).
3.2.3.2 The PeerIdentity document
It is not an accident that much of what is described here is an outgrowth of the JXTA peer advertisement. Both of us worked on Project JXTA and helped define its specifications, and, after all, there are generic requirements that cannot be avoided. At the highest level each peer needs an XML document describing several basic properties that are common to all P2P systems. First, a peer will have a human-readable peerName and a Peer-UUID. The name is usually assigned by the person using the peer node during a configuration phase. Because this peerName can be used by applications like P2P Email (see Chapter 7), its allowable characters are restricted by both XML and MIME. For the details we refer the reader to the XML and MIME specifications in the references; the MIME Header Extensions specify how the XML characters in the name must be formatted to satisfy email address constraints. We do expect some peers to have machine-generated peer names. Peer names need not be unique. Where uniqueness is an absolute requirement, registration is required, as discussed in section 3.1.2.1. If one were to ask for an enumeration of all the peers on the overlay network, a list of (peer name, Peer-UUID) pairs would be given. In most ad-hoc networks, users of peer nodes will be familiar with the names of the peers with whom they regularly communicate, and registration will not be necessary. The names are likely to be unique within the personal peer community in which they are used. In this case, a peer can restrict its searches to this community and not worry too much about unrecognized peer names. Still, name collisions are possible even in such a community.
To help with this situation we add an optional peer description field to the document. The description is usually created by the user when the peer is initially configured and is there to help differentiate peer names in the case of duplication. The description will usually be a simple text string, but almost any digital data is permitted, for example, a GIF photo of the user. All such descriptions must be digitally signed; without this signature, forgery is possible. To verify the signature, another peer will require the signing peer's public key. Performance must also be considered, because PeerIdentity documents will be accessed frequently. Consequently, text is a good choice for this field, and in the case of a GIF file, or even executable code, a URN should be provided so that the data is accessed only when
necessary to disambiguate name collisions. The details of personal peer communities are discussed in section 3.4 of this chapter.
A peer's PeerIdentity document is required to communicate with that peer: one needs its unique identity as well as the other information discussed just below. When two peer nodes communicate with one another, directly or by means of a mediator, each peer node must provide the other peer, or the mediators, with the possible ways it can communicate. For example, a peer node may prefer to always use a secure communication channel like TLS when it is available, or may be behind a firewall where a means of traversal such as HTTP or SOCKS is necessary. To this end the PeerIdentity document contains a list of available protocols for communication. Communication protocols can be on the real network or on the overlay network. For example, TCP/IP is a communication protocol on the real network and requires an IP address as well as a TCP port in its specification. On the other hand, TLS runs between peers on the overlay network and requires only the Peer-UUID in its specification, since all communication on this network is between peers and is independent of the real network and the underlying physical bearer networks. Thus, the following URIs describe real and overlay network communication protocols that can be used to contact the peer whose PeerIdentity document includes them:
    tcp://150.8.11.3.8788
    udp://150.8.11.3.9999
    tls://uuid-AACDEF689321121288877EEFZ9615731
Finally, a physical layer may or may not permit the use of multicast. If it does, the peer node is configured to take advantage of this functionality, and the multicast field is marked TRUE; otherwise it is marked FALSE. Given this introduction we define the PeerIdentity document as follows:
Document type = PEERIDENTITY
Content tags and field descriptions:
    peer name: Restricted Legal XML character string [XML][MIME]
    Peer-UUID: uuid-Legal UUID in hexadecimal ascii string
    description (optional): Legal XML character string [XML], or a Legal Universal Resource Name
    protocols: real protocol URI, overlay network URI
    multicast: TRUE | FALSE
There may be multiple protocols specified on both the real and overlay networks. Below is an example of a PeerIdentity document:
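    peer name: LBBoureau
    Peer-UUID: uuid-AACDEF689321121288877EEFZ9615731
    description: Je suis le mec francais (French: "I'm the French guy"), signed SHA-3 hash of text, http://www.Beaune.org/chateau/grandCru/LB
    real network protocols: tcp://152.70.8.108.9133, http://152.70.8.108.1111
    overlay network protocol: tls://uuid-AACDEF689321121288877EEFZ9615731
    multicast: FALSE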
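Before turning to the 4PL version, here is a sketch of how the signed description might be produced and checked with the JDK's java.security.Signature class. We assume RSA-2048 keys and the SHA256withRSA algorithm; the example document mentions a SHA-3 hash, and a SHA-3 based signature algorithm could be substituted where the JDK provider in use supports one. Carrying the signature as Base64 text, and the names DescriptionSigner, sign, and verify, are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical helper for the signed description field of a PeerIdentity document.
final class DescriptionSigner {
    static final String ALG = "SHA256withRSA"; // assumption; see lead-in

    // The describing peer signs the description with its private key.
    static String sign(String description, PrivateKey key) throws Exception {
        Signature s = Signature.getInstance(ALG);
        s.initSign(key);
        s.update(description.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(s.sign());
    }

    // Any other peer verifies it with the signing peer's public key.
    static boolean verify(String description, String signature, PublicKey key) throws Exception {
        Signature s = Signature.getInstance(ALG);
        s.initVerify(key);
        s.update(description.getBytes(StandardCharsets.UTF_8));
        return s.verify(Base64.getDecoder().decode(signature));
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair kp = gen.generateKeyPair();
        String sig = sign("Je suis le mec francais", kp.getPrivate());
        System.out.println(verify("Je suis le mec francais", sig, kp.getPublic())); // true
        System.out.println(verify("Je suis le mec anglais", sig, kp.getPublic()));  // false
    }
}
```

A forged or altered description fails verification, which is exactly why the signature is mandatory for this field.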
Using 4PL we create the above example as follows:
    Document pi = new Document(PEERIDENTITY, "LBBoureau");
The other PeerIdentity document fields will be known to the system as part of its boot-time configuration data. These details will vary from implementation to implementation. In some implementations the creation of a PeerIdentity document will automatically publish it. Document publication on the overlay network is described in detail in Chapter 4. To publish a document explicitly in 4PL we use the publish command:
    publish(pi);
In the next section we will discuss the Virtual P2P Network. For two peers to communicate with one another on this network they must possess one another's PeerIdentity document. This, of course, enables communication on the real, underlying networks, and a P2P system using what we describe must implement the code necessary to create, update, and analyze the above document as well as establish communication on these real networks.
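As one small piece of analyzing a PeerIdentity document, a peer must choose which listed protocol to try first. The sketch below assumes a hypothetical preference order that favors the overlay TLS channel when it is offered, and it parses real-network URIs in the address.port notation used in this chapter. It does not open any connection, and the class and method names are illustrative.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical selection of a contact URI from a PeerIdentity protocol list.
final class EndpointChooser {
    // Assumed preference: secure overlay channel, then TCP, then HTTP, then UDP.
    static final List<String> PREFERENCE = List.of("tls", "tcp", "http", "udp");

    static Optional<String> choose(List<String> uris) {
        for (String scheme : PREFERENCE) {
            for (String uri : uris) {
                if (uri.startsWith(scheme + "://")) return Optional.of(uri);
            }
        }
        return Optional.empty(); // nothing usable was listed
    }

    // Splits "tcp://152.70.8.108.9133" into host "152.70.8.108" and port "9133".
    static String[] hostAndPort(String uri) {
        String rest = uri.substring(uri.indexOf("://") + 3);
        int dot = rest.lastIndexOf('.');
        return new String[] {rest.substring(0, dot), rest.substring(dot + 1)};
    }

    public static void main(String[] args) {
        List<String> protocols = List.of(
            "tcp://152.70.8.108.9133",
            "http://152.70.8.108.1111",
            "tls://uuid-AACDEF689321121288877EEFZ9615731");
        System.out.println(choose(protocols).orElse("none")); // tls://uuid-AACDEF689321121288877EEFZ9615731
        String[] hp = hostAndPort("tcp://152.70.8.108.9133");
        System.out.println(hp[0] + " port " + hp[1]);         // 152.70.8.108 port 9133
    }
}
```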
3.3 The Virtual P2P Network
Up to this point we have discussed the notion of an overlay network in general terms. The reader has in mind a possibly ad-hoc collection of peer nodes with certain rights and policies for communication. Also, the requirement that each such peer have a UUID as a means of identification is well understood at this point. A single UUID is necessary to facilitate communication on this network. It does in fact give a single point of entry, but it lacks a means to organize the information that might be communicated. One might argue that the parameters required for data triage can be included in the initial part of the data. While this is true, first, there may be several underlying transports on the real network, and we want an end-to-end delivery mechanism. Second, including addressing information as part of the data does not yield a very satisfactory network communication stack and hides what is really going on. Such a stack has always been a part of networking protocols, and we will build an overlay network stack in the same spirit.
One can ask, why not just use the IP stack and be done with it? Why go to all the trouble of inventing yet another network stack on top of an existing one? As we have carefully examined in the sections on IPv6 and UUIDs, reestablishing end-to-end network communication requires UUIDs that are independent of the underlying real networks, which may not in fact use IP. In the case of IPv4, the reader now understands the address space problem and why, for this reason, IPv4 addresses cannot be used as UUIDs. Also, with the imminent arrival of literally billions of devices, the ability to create ad-hoc UUIDs is necessary. We have mentioned that IPv6 addresses are possible candidates for UUIDs, but their deployment is still an issue.
Finally, there are Network Address Translators (NATs), firewalls, and other barriers on the underlying real networks. For example, propagated multicast is prohibited, for good reasons, and this in turn makes long-range, ad-hoc discovery impossible without UUIDs. Thus, to establish a viable P2P network topology, a simple network stack with the UUID layer at the bottom is necessary. Before we give the details of the overlay network stack, let's briefly examine the IP network stack.
3.3.1 Hosts on the Internet
There are many network stacks. The most general is probably the Open Systems Interconnection (OSI) stack, which has seven layers ranging from the physical layer at the bottom to the application layer at the top. A few examples of physical layers are Ethernet, 802.11a/b/g, LTE, Bluetooth, and wideband CDMA. An IP stack has five layers, and level 1 is the physical layer. The IP stack is used on the Internet and is shown alongside the OSI layers in Figure 3-4:
Figure 3-4. The IP Network Stack
Level 2 is the link layer, and this is where the device drivers do their work. On Ethernet, an IPv4 packet is identified to the device driver by the EtherType value 0x0800 (2048 decimal), and this value is used to dispatch the packet to the next level. IP is at level 3. Other protocols, such as the Address Resolution Protocol (ARP) and the Reverse Address Resolution Protocol (RARP), also operate at this level. The transport is at level 4. Here we have, for example, TCP and UDP; ICMP, although often listed with them, is carried inside IP packets and is usually considered part of IP at level 3. Finally, the application is at level 5.
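The level 2 dispatch step can be sketched as a lookup on the EtherType field of an Ethernet II frame. The three values below are the registered EtherTypes for IPv4 (0x0800), ARP (0x0806), and IPv6 (0x86DD); returning a label instead of handing the payload to a protocol handler is a simplification for illustration, and the class name is hypothetical.

```java
import java.util.Map;

// Sketch of link-layer demultiplexing on the EtherType field of an Ethernet II frame.
final class EtherTypeDispatch {
    static final Map<Integer, String> HANDLERS = Map.of(
        0x0800, "IPv4",
        0x0806, "ARP",
        0x86DD, "IPv6");

    // Bytes 12 and 13 (after two 6-byte MAC addresses) carry the EtherType, big-endian.
    static String dispatch(byte[] frame) {
        int etherType = ((frame[12] & 0xFF) << 8) | (frame[13] & 0xFF);
        return HANDLERS.getOrDefault(etherType, "drop");
    }

    public static void main(String[] args) {
        byte[] frame = new byte[60]; // minimum Ethernet frame size, excluding the FCS
        frame[12] = 0x08;
        frame[13] = 0x00;
        System.out.println(dispatch(frame)); // IPv4
    }
}
```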
There is a multitude of network applications that run at level 5. A few examples are telnet, FTP, IMAP, POP3, HTTP, SMTP, and SNMP. Their port numbers are well defined and registered through the Internet Assigned Numbers Authority (IANA). For the complete list, see the latest assigned numbers published on the IANA website (IANA 2011). As discussed in the next section, the transport protocols at level 4 dispatch data to these applications by means of these well-defined port numbers.
3.3.1.1 Addresses and Ports
Each host on the IP network has an IP address by which it can be contacted or, if the address is not registered, at least responded to. These addresses give hosts end-to-end communication during the lifetime of the hosts' IP addresses. At the transport layer, to permit application triage, network application port numbers are associated with each transport layer protocol. For the short list above we have the following ports, all of which also have TLS/SSL counterparts:
Application Protocol   TCP Ports
telnet                 23
ftp                    20, 21
smtp                   25
http                   80
pop3                   110
imap                   143
snmp                   161 (normally carried over UDP)
Thus, a telnet daemon listens on port 23 for incoming TCP/IP telnet connections at a host IP address. The listening IP-address.port pair can be viewed as a socket that will accept incoming connection requests to run the application associated with the port number. Not all port numbers are assigned, and this leaves room for experimentation as well as for assigning a random port number to a host requesting a connection to a particular Internet application's service. That is to say, if a host with IP address A1 wants IMAP service from the host with IP address A2, then the initiating host chooses as its source port a unique, unassigned port number, PN, to be associated with A2.143, and creates the source socket A1.PN. The combination of A1.PN and A2.143 is a unique connection on the Internet.
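The source-port assignment and the resulting connection identity can be modeled in a few lines. This sketch assumes the IANA dynamic port range, 49152 to 65535, for the unassigned port PN (operating systems are free to use other ranges), writes sockets in the address.port notation used above, and uses hypothetical class names.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical model of TCP connection identity: (source socket, destination socket).
final class SocketPairs {
    record Socket(String address, int port) {
        @Override public String toString() { return address + "." + port; }
    }

    record Connection(Socket source, Socket destination) {}

    static final int DYNAMIC_LOW = 49152, DYNAMIC_HIGH = 65535;

    // Picks a source port PN not already in use on the initiating host.
    static int ephemeralPort(Set<Integer> inUse, Random rng) {
        for (int tries = 0; tries < 100_000; tries++) {
            int pn = DYNAMIC_LOW + rng.nextInt(DYNAMIC_HIGH - DYNAMIC_LOW + 1);
            if (inUse.add(pn)) return pn;
        }
        throw new IllegalStateException("no free source port");
    }

    public static void main(String[] args) {
        Set<Integer> used = new HashSet<>();
        Random rng = new Random(42);
        Socket imap = new Socket("A2", 143);
        Connection c1 = new Connection(new Socket("A1", ephemeralPort(used, rng)), imap);
        Connection c2 = new Connection(new Socket("A1", ephemeralPort(used, rng)), imap);
        System.out.println(c1.source() + " -> " + c1.destination());
        // Two IMAP sessions from A1 to A2.143 stay distinct because their PNs differ.
        System.out.println(!c1.equals(c2)); // true
    }
}
```

Because the destination socket A2.143 is the same for every IMAP client, it is the source port that keeps concurrent connections from one host apart.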
3.3.2 Peers on the Overlay P2P Network
As is the case with the IP network, we also define a stack on the overlay network. This stack has three layers because there is neither a physical nor a link layer. At level 1 is the Overlay Network Protocol (ONP), which is analogous to IP in the IP stack; thus, Peer-UUIDs play the role of IP addresses. There is a transport layer at level 2 with two protocols, which are discussed in detail later. ONP messages are our equivalent of IP packets, and for transports we have the Application Communication Protocol (ACP), which is a reliable message protocol, and the Universal Message Protocol (UMP), which, like UDP, is not reliable. Hence, when an application using UMP requires reliability, the application must provide it. Applications reside at level 3. As with IP, we also require virtual ports for the triage of incoming application data.
Figure 3-5. The Overlay Network Stack
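To make the layering concrete, the records below show one way the overlay headers might nest: an ONP message addressed by Peer-UUIDs carries a transport segment addressed by virtual-port UUIDs. The field names and the use of java.util.UUID are assumptions for illustration, not the ONP, ACP, or UMP message formats, which are discussed in detail later.

```java
import java.util.UUID;

// Hypothetical nesting of overlay-network headers (not the actual wire format).
final class OverlayStack {
    enum Transport { ACP, UMP } // ACP: reliable messages; UMP: unreliable, like UDP

    // Level 2: transport segment, addressed by virtual-port UUIDs.
    record Segment(Transport transport, UUID srcPort, UUID dstPort, byte[] data) {}

    // Level 1: ONP message, addressed by Peer-UUIDs instead of IP addresses.
    record OnpMessage(UUID srcPeer, UUID dstPeer, Segment segment) {}

    // A receiving peer accepts only messages addressed to its own Peer-UUID and then
    // hands the segment to ACP or UMP; no real-network address is consulted.
    static String receive(UUID self, OnpMessage m) {
        if (!m.dstPeer().equals(self)) return "not for this peer";
        Segment s = m.segment();
        return s.transport() + " -> port " + s.dstPort();
    }

    public static void main(String[] args) {
        UUID rita = UUID.randomUUID(), bill = UUID.randomUUID();
        UUID chatPort = UUID.randomUUID();
        Segment seg = new Segment(Transport.UMP, UUID.randomUUID(), chatPort, "hi".getBytes());
        System.out.println(receive(bill, new OnpMessage(rita, bill, seg)));
    }
}
```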
3.3.2.1 The Virtual Port
Like the Peer-UUID, a virtual port is also a UUID; it defines a point of contact for a given application. Just like the Peer-UUID, the virtual-port-UUID can be a random number with enough bits, say 128, to guarantee uniqueness, or, as we prefer, a CBID, so that cryptographic challenges can be made to verify the application source of the virtual port information. A virtual port can be ad-hoc or registered and has an associated name that usually identifies the application. So, for example, for instant messaging one might have the (name, virtual-port-UUID) pair (IMApp, UUID). Again, unlike IP ports, the names are not necessarily unique unless registration is used, as will certainly be the case in more formal, enterprise P2P networks. Continuing the IP analogy with TCP, we might have, on either an ad-hoc or a registered network:
Application Protocol   ACP Ports
chat                   UUID1
p2pftp                 UUID2
chess                  UUID4
mobile Agents          UUID5
p2pEmail               UUID6
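A peer's triage of incoming level 3 data might then look like the hypothetical registry below, which binds a (name, virtual-port-UUID) pair to a handler. For ad-hoc ports it uses java.util.UUID.randomUUID(), whose 122 random bits stand in for the random option above (a CBID would instead be derived from a key hash), and it keeps the all-ones UUID reserved for the Level 2 communication virtual port defined in section 3.3.2.2. All class and method names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical virtual-port registry: (name, virtual-port-UUID) -> handler.
final class VirtualPorts {
    // Reserved L2CVP: the 128-bit UUID whose bits are all 1's.
    static final UUID L2CVP = new UUID(-1L, -1L);

    private final Map<UUID, String> names = new HashMap<>();
    private final Map<UUID, Consumer<String>> handlers = new HashMap<>();

    // Creates an ad-hoc port; names need not be unique, but the UUIDs are.
    UUID open(String name, Consumer<String> handler) {
        UUID port;
        do {
            port = UUID.randomUUID();
        } while (port.equals(L2CVP) || handlers.containsKey(port));
        names.put(port, name);
        handlers.put(port, handler);
        return port;
    }

    String nameOf(UUID port) { return names.get(port); }

    // Triage: deliver data to the application bound to the destination port.
    boolean deliver(UUID port, String data) {
        Consumer<String> h = handlers.get(port);
        if (h == null) return false; // no listener on this port: drop
        h.accept(data);
        return true;
    }

    public static void main(String[] args) {
        VirtualPorts ports = new VirtualPorts();
        UUID chess = ports.open("chess", d -> System.out.println("chess got " + d));
        ports.deliver(chess, "e2e4");  // chess got e2e4
        System.out.println(L2CVP);     // ffffffff-ffff-ffff-ffff-ffffffffffff
    }
}
```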
In the case of ad-hoc networks we will describe in detail how peers discover such ports within the context of their personal peer communities.
3.3.2.2 Level 2 Communication Channel Virtual Port
Once a peer possesses another peer's PeerIdentity document, it has enough information to communicate with that peer at level 2 on the overlay network. This "operating system" to "operating system" communication is required to publish the system documents to which other peers subscribe. These system documents enable level 3, that is, application and service, communication. In a sense, one is distributing the operating system communication primitives across an overlay network; that is to say, we are really dynamically bootstrapping an ad hoc distributed operating system. We use the Level 2 communication virtual port (L2CVP) and UMP for this purpose. This communication is established using the reserved virtual port whose 128-bit UUID is all 1's.
3.3.2.3 Unicast, Unicast Secure, Multicast and Multicast Secure Virtual Ports
There are two basic types of virtual ports on the overlay network: unicast and multicast. A unicast port permits two peers to establish a unique, bidirectional overlay network connection. Similarly, a multicast
port accepts unidirectional input from multiple peers. Each of these ports has a secure counterpart that can ensure the authenticity of the communicating parties and always guarantees the privacy and integrity of the data that is exchanged. The protocols that secure overlay network communication in this manner are discussed in Chapter 5. As previously mentioned, a virtual port is identified by a UUID.
3.3.2.4 Component 2: The (Name, Virtual-Port-UUID)
The (name, virtual-port-UUID) pair is our second component. As with the peer name, the name must be a legal XML character string. It is used by services and applications both to publish and to establish communication channels on the overlay network. This component is published by means of a VirtualPort document.
3.3.2.4.1 The VirtualPort document
Two documents must be published by a peer to establish level 3, application-based communication on the overlay network. The first is the PeerIdentity document discussed above, and the second is the VirtualPort document. The VirtualPort document is created by applications and services at level 3 and, like the PeerIdentity document, is published and subscribed to at level 2 using the L2CVP and UMP. See section 3.5 for an overview of publication, subscription, and how communication is established. Given this introduction we define our VirtualPort document as follows:
Document type = VIRTUALPORT
Content tags and field descriptions:
    name: Legal XML character string [XML]
    virtual-port-UUID: uuid-Legal UUID in hexadecimal ascii string
    type: unicast | unicastSecure | multicast | multicastSecure
    multicast group UUID: uuid-Legal UUID in hexadecimal ascii string
    expiration date: MM DD YYYY HH:MM:SS +/-HHMM
    right to publish: Right to publish ID - Hexadecimal string
    publisher: peer UUID of publisher
The multicast group UUID field is the UUID of the multicast group bound to the virtual-port-UUID. This field is used exclusively for a virtual port of type multicast, and its functionality is described in Chapter 4, section 4.1.4. The expiration date is the date after which this port is no longer accessible. If this field is missing, then the virtual port has no expiration date. Examples of such a date are:
    Aug 03 2032 05:33:33 +1200
    Jun 16 2040 05:20:14 -0800
The format is standard. The +/-HHMM is the hours and minutes offset of the time zone from GMT. The right-to-publish field provides a mechanism to prove that the source of this document is its creator; other peer nodes should not publish this document. Mechanisms for detecting false publication are discussed in Chapter 5. If this field is not present, then the default is FALSE. The publisher field gives a subscriber to this document a communication hint to aid in contacting its publisher. Thus, given the publisher's Peer-UUID and its associated PeerIdentity document, a peer may have all that is required to communicate. We say "may" because real network barriers can prohibit true end-to-end communication on the real network. In this case additional communication hints will need to be added to this document. See section 3.5 and Chapter 4 for the overlay network solutions to this latter problem. Below is an example of a VirtualPort document:
MobileAgent
uuid-AACDEF689321121288877EEFZ9615731
unicastSecure
uuid-61AAC8932DE212F169717E15731EFZ96
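Checking the expiration field is straightforward with java.time. Note that the field description gives the pattern as MM DD YYYY HH:MM:SS +/-HHMM, while both example dates use an abbreviated English month name; this sketch follows the examples, using the DateTimeFormatter pattern "MMM dd yyyy HH:mm:ss Z" with an English locale. The class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical check of a VirtualPort document's expiration date.
final class PortExpiry {
    // "Z" parses offsets written as +HHMM or -HHMM, e.g. +1200 or -0800.
    static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("MMM dd yyyy HH:mm:ss Z", Locale.ENGLISH);

    // A missing field (null) means the virtual port never expires.
    static boolean isExpired(String field, OffsetDateTime now) {
        if (field == null) return false;
        OffsetDateTime expires = OffsetDateTime.parse(field, FORMAT);
        return now.isAfter(expires);
    }

    public static void main(String[] args) {
        OffsetDateTime now = OffsetDateTime.now();
        System.out.println(isExpired("Aug 03 2032 05:33:33 +1200", now));
        System.out.println(isExpired(null, now)); // false
    }
}
```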
The following 4PL commands create and publish the above VirtualPort document:
    Document pi = new Document(PEERIDENTITY, "LBBoureau");
    Document vp = new Document(VIRTUALPORT, pi, "MobileAgent", unicastSecure);
    publish(pi);
    publish(vp);
Again, we include the creation of the PeerIdentity document to make clear what is required to build a VirtualPort document. In most implementations the system will be able to get these values from a peer's global context. In the next section we discuss the virtual socket, which is used to establish a connection between two peers.
3.3.2.5 Component 3: The Virtual Socket
To enable application-to-application communication on the overlay network, we require network identifiers on each peer, unique to that peer, that are analogous to the IP sockets mentioned above. We call the unique (peer-UUID, virtual-port-UUID) pair a virtual socket on the overlay network. On a system there are two kinds of virtual sockets. The first kind is well known and published by means of the VirtualPort document, and its publication implies that the publishing peer will accept incoming connection requests for this virtual socket. The second is generated by the peer that is trying to establish a connection on the overlay network. While the Peer-UUID part uniquely defines that peer, the virtual port number need only be unique on that peer and can be generated in any way that peer desires, as long as it preserves peer-local uniqueness. When we are discussing published virtual sockets, i.e., published PeerIdentity documents and their accompanying VirtualPort documents, we will refer to them as listening virtual sockets. Note that on each peer a human-readable representation of the listening virtual sockets is given by (peer-name, virtual-port-name). This permits a socket-like application programming model, which in turn hides the underlying complexity of the real network's behavior.
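The two kinds of virtual sockets can be sketched as a (peer-UUID, virtual-port-UUID) record. The hypothetical LocalPeer class below enforces only what the text requires: a listening socket uses a published port UUID, and an outgoing socket uses any port UUID that is unique on the local peer. Random generation is our assumption; any peer-local scheme would do.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical model: a virtual socket is a (peer-UUID, virtual-port-UUID) pair.
record VirtualSocket(UUID peer, UUID port) {}

final class LocalPeer {
    final UUID peerUuid;
    private final Set<UUID> portsInUse = new HashSet<>();

    LocalPeer(UUID peerUuid) { this.peerUuid = peerUuid; }

    // Listening socket bound to the port UUID of a published VirtualPort document.
    VirtualSocket listening(UUID publishedPort) {
        if (!portsInUse.add(publishedPort)) throw new IllegalStateException("port in use");
        return new VirtualSocket(peerUuid, publishedPort);
    }

    // Outgoing socket: any port UUID unique on this peer; nothing is published.
    VirtualSocket outgoing() {
        UUID port;
        do {
            port = UUID.randomUUID();
        } while (!portsInUse.add(port));
        return new VirtualSocket(peerUuid, port);
    }

    // Closing releases the port, so the socket is no longer registered on this peer.
    void close(VirtualSocket s) { portsInUse.remove(s.port()); }
}
```

Because the Peer-UUID half is globally unique, peer-local uniqueness of the port half is enough to make every virtual socket unique on the overlay network.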
In 4PL, to create and publish a listening virtual socket as well as an outgoing socket on the MobileAgent virtual port, we do the following:
    // First, we create the PeerIdentity document
    Document pi = new Document(PEERIDENTITY, "LBBoureau");
    // Second, we create a unicast VirtualPort document
    Document vp = new Document(VIRTUALPORT, pi, "MobileAgent", unicast);
    // We now create the listening virtual socket using the VirtualPort
    // document
    VirtualSocket vs = new VirtualSocket(vp);
    listen(vs);
    // We next publish the PeerIdentity and VirtualPort documents so that
    // incoming connections can be established
    publish(pi);
    publish(vp);
    // Create a virtual socket used to establish outgoing connections.
    // This virtual socket can then be used to connect to any listening
    // socket. Note: The P2P system has the PeerIdentity document stored
    // locally. Thus, calling the VirtualSocket constructor without any
    // parameters generates a virtual socket with a random source virtual
    // port UUID. There is no VirtualPort document associated with this
    // virtual port; it appears only in the outgoing messages. Responses
    // can be sent to this socket, which is registered in the system
    // until it is closed.
    VirtualSocket local_out = new VirtualSocket();
    // Imagine we have "discovered" the mobile agent listening virtual
    // socket, remoteMobileAgent. We then open a connection as follows
    // (see Chapter 4 for a definition of TYPE):
    VirtualChannel out = new VirtualChannel(local_out, remoteMobileAgent, TYPE);
Given these fundamentals we can describe the peer communication channel, which is used for data transfer. It is important to note here that we are giving an overview of the fundamentals required to understand the detailed specifications in Chapter 4.
3.3.3 Putting it all together: Communication on the Virtual P2P Network
Each peer can now create a PeerIdentity document, VirtualPort documents, and virtual sockets. These are the components necessary for establishing end-to-end communication between peers on the overlay network. The medium for end-to-end communication on the overlay network will be called a channel. Let's imagine that peer1 and peer2 wish to establish a channel to permit a mobile agent to migrate between them. Peer1 and peer2 are named peer_rita and peer_bill, and they each have personal data repositories containing restaurant reviews that they wish to share with others. Both therefore need listening virtual sockets that can accept incoming connection requests to establish either unicast or unicastSecure channels. In any system there must be a service or application that created the initial listening socket and that will send and receive data using either UMP/ONP or ACP/ONP. Whether or not these applications can accept more than one connection request is implementation dependent and not important to our discussion. Thus, we can assume without loss of generality that peer_rita and peer_bill have listening sockets named rita.mobileAgent and bill.mobileAgent, and that their VirtualPort documents have been both published and received by one another. Furthermore, let us assume that peer_rita and peer_bill are communicating with one another, have created the outgoing virtual sockets rita.338888881 and bill.338888881, and have established two channels. We will then have on each of the peers:
Bill                                Rita
local             remote            local             remote
bill.mobileAgent  rita.338888881    rita.338888881    bill.mobileAgent
bill.338888881    rita.mobileAgent  rita.mobileAgent  bill.338888881
Note that we intentionally use the same outgoing virtual port number on each peer, because the only requirement is that 338888881 be unique on the system where it was created; the socket pairs still define unique channels. Two channels are required, even though channels are bidirectional, because each peer plays the role of both a client and a server. The listening socket is the server side of any protocol the peers establish between themselves. Certainly, a single channel could be used to send migrating mobile agents in both directions, but in that case the listening servers would have dual personalities. This leads to code complexity that is best avoided by strictly separating client and server roles in applications. Still, each peer's channels can be multiplexed on the same mediated, real network TCP/IP connection or, where no mediator is required, on a single TCP/IP connection between the peers.
So, what do we have? The fundamental mechanisms that permit peers to connect with one another. But some protocols are still missing at this point. The above scenario is high level and postpones the engineering details until we have established a more complete description of the components that comprise a P2P overlay network, as well as the protocols that manage communication. That is to say, we need to describe the publication and subscription mechanisms in detail, as well as how two peers can discover one another given the complexities of the underlying real networks. The interested reader can skip to Chapter 4, where the engineering description of how to connect is given.
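The channel tables above can be checked mechanically. In this sketch, which uses the readable peer.port names from the tables rather than UUIDs, a channel is the unordered pair of its two sockets. Reading both peers' tables yields exactly two channels, and reusing 338888881 on both peers causes no ambiguity because each socket name also carries its peer's identity. The class and record names are illustrative.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative check of the Bill/Rita channel tables, using readable socket names.
final class ChannelTables {
    // One row of a peer's table: its local socket and the remote socket it talks to.
    record Entry(String local, String remote) {
        // A channel is the same pair of sockets whichever end's table lists it.
        Set<String> channel() { return Set.of(local, remote); }
    }

    static final List<Entry> BILL = List.of(
        new Entry("bill.mobileAgent", "rita.338888881"),
        new Entry("bill.338888881", "rita.mobileAgent"));
    static final List<Entry> RITA = List.of(
        new Entry("rita.338888881", "bill.mobileAgent"),
        new Entry("rita.mobileAgent", "bill.338888881"));

    static Set<Set<String>> channels(List<Entry> a, List<Entry> b) {
        return Stream.concat(a.stream(), b.stream())
            .map(Entry::channel)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Four table rows describe exactly two channels, one per client/server direction.
        System.out.println(channels(BILL, RITA).size()); // 2
    }
}
```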
3.4 Scope and Search - With Whom Do I Wish to Communicate?
Imagine a global P2P overlay network with billions of devices, ranging from light switches to massively parallel supercomputers. The global network will be a union of home networks, manufacturing assembly line control processors, enterprise-wide networks, military and government networks, smart cities and homes, automobiles, sensors, etc. This gives us a massive collection of PeerIdentity documents and their associated VirtualPort documents. P2P communication over such a search space is unmanageable without some means to scope, or limit, search to those peers that a given peer desires to contact. Why should a light switch wish to contact a tank on military maneuvers? Searching the entire space, even with the best of algorithms, is not only too time consuming but also pointless. It would make this P2P topology interesting to researchers in search and routing algorithms but would have no practical applications. To attack this problem, we need to organize the search space in a way that has a certain logic to it and reflects how humans and machines might really want to group themselves.
3.4.1 The Virtual Address Space
At this point the above global overlay network can be described as a collection of PeerIdentity and VirtualPort documents. So, given an overlay network with n peers and their PeerIdentity documents, peerIdentity(i), i = 1, 2, ..., n, then for each such i, virtualPort(i, j), j = 1, ..., m, is the complete collection of virtual ports for peer(i). Thus, this virtual address space contains n PeerIdentity documents and mn VirtualPort documents. If peer1 and peer2 wish to communicate with one another on a well-known socket, the problem of discovery can be both bandwidth and computationally intensive. To minimize the discovery problem, we need to minimize the document search space. To this end we use a commonsense, real-world approach and recognize that communication is usually based on shared interests, friendship, and other human ways of social interaction. There are lonely hearts clubs, gambling clubs, baseball clubs, families, instant messaging buddy lists, assembly lines, military maneuver squadrons, mission control, forest fire fighting teams, political groups, chess clubs, etc. The list is endless. Therefore, we define a connected community, CC, to be a subset of the collection $\{peerIdentity(i) \mid i = 1, 2, \ldots, n\}$ whose peer members have a common interest. A peer can belong to multiple connected communities, i.e., given CC(1) and CC(2), $CC(1) \cap CC(2)$ may be non-empty. Any peer can create a CC, and a peer must always be a member of at least one CC. The default CC should be set when a peer is initially configured. Given our overlay network with its collection of connected communities, $\{CC(i) \mid 1 \le i \le N\}$, let CC(j) be any member of this collection. Then $CCS(j) = \{p \mid p \text{ is a peer node and a member of } CC(j)\}$ forms an overlay subnetwork with a very special property: for $k \ne l$, if p1 is a node on CCS(k) but not on CCS(l), and p2 is a node on CCS(l) but not on CCS(k), then p1 cannot connect to p2 on either of these two CCSes, and vice versa. This implies that the virtual ports and virtual sockets on CCS(k) are visible only to the nodes on CCS(k). As we will see later, the act of creating and publishing virtual ports and sockets is always restricted to a CCS. Thus, connected communities become secure "walled gardens" in the larger overlay network, with inter-CCS connectivity not permitted, and the set of all CCSes is pairwise communication disjoint.
Certainly, p1 can be an active member of multiple CCs and concurrently communicate with members of each such CC on the associated CCS. With this definition the larger overlay network is the union of pairwise communication-disjoint CCSes. In Figure 3-6 we illustrate a simple overlay network with four CCs, each determining a CCS. The elliptical boundaries indicate that the CCSes are pairwise communication disjoint. Note that Peer 1 and Peer 2 are in multiple CCs.
Figure 3-6. Connected Communities on an Overlay Network
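The walled-garden property reduces to a membership test: two peers may connect on CCS(j) only if both are members of CC(j). The communities and peer names below are hypothetical, with Peer1 and Peer2 each in more than one CC as in Figure 3-6, and the PubCC introduced next is not modeled.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical connected-community membership and the CCS connection rule.
final class Communities {
    static final Map<String, Set<String>> CC = Map.of(
        "CC1", Set.of("Peer1", "Peer2", "Peer3"),
        "CC2", Set.of("Peer1", "Peer4"),
        "CC3", Set.of("Peer2", "Peer5"));

    // Connections (and discovery) are scoped to one CCS: both ends must be members.
    static boolean canConnect(String p1, String p2, String cc) {
        Set<String> members = CC.getOrDefault(cc, Set.of());
        return members.contains(p1) && members.contains(p2);
    }

    // The CCSes on which the two peers are allowed to communicate.
    static Set<String> commonCommunities(String p1, String p2) {
        return CC.keySet().stream()
            .filter(cc -> canConnect(p1, p2, cc))
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(canConnect("Peer1", "Peer2", "CC1")); // true
        System.out.println(canConnect("Peer4", "Peer5", "CC2")); // false
        System.out.println(commonCommunities("Peer4", "Peer5")); // []
    }
}
```

Restricting discovery to the members of one CC is what shrinks the search space from the whole overlay network to a single CCS.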
We see that a CC isolated to its CCS in this manner is a very powerful concept that will, first, speed up the discovery of virtual sockets on ad-hoc P2P networks by limiting the search to a single CCS; second, minimize the time required to establish connections and route data between its peer node members; and third, simplify implementing CC-based policies that guarantee strong authentication, privacy, and integrity of data, whether it is being transmitted on the CCS or stored remotely. A CCS does raise one problem. How can publicly readable information be accessed by any peer? An example of this occurs in the following section, where some of the documents describing connected communities must be publicly available. Note that this is not a requirement and is determined by the community's creator. In this case, a connected community document for CC0 must be accessible outside of the explicit access control of CC0 so that peers that are not CC0 members are able to find it, and thus join CC0. To solve this access problem, we define the Public Connected Community (PubCC). All peers are members of this community and, as members, can both publish documents and access all documents published by the PubCC's members. As a rule, documents describing CCs, and metadata describing publicly accessible data, can be published in the PubCC. The former are like CC bootstrap documents; certainly, other CC documents will be published in particular CCs, and access to them will be restricted to the scope of the CC's members. The latter might be metadata containing URLs that can be used to access information about connected communities, for example, images and other publicity. Thus, the PubCC provides a global peer node context for publishing some CC documents and metadata. As is seen in Chapter 4, section 4.2.3.3, the PubCC restricts data access to CC documents.
To bind a virtual port to the CC in which it is created, we require another tag in the VirtualPort document. This will be the CC UUID, which is generated by the overlay network's UUID algorithm and is described in the next section. Let's now revisit the VirtualPort document:
Document type = VIRTUALPORT
Content tags and field descriptions:
    name: Legal XML character string [XML]
    virtual-port-UUID: uuid-Legal UUID in hexadecimal ascii string
    type: unicast | unicastSecure | multicast | multicastSecure
    multicast group UUID: uuid-Legal UUID in hexadecimal ascii string
    publisher: peer UUID of publisher
    connected community: connected community UUID
Below is the revised virtual port example:
MobileAgent
uuid-AACDEF689321121288877EEFZ9615731
unicastSecure
uuid-DDEDEF689321121269717EEFZ9615659
uuid-FBCAEF689321121269717EEFZ9617854
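A minimal Java sketch of why binding each VirtualPort to a CC UUID narrows discovery: published ports are indexed by community, so a lookup scans only one CC’s advertisements rather than the whole overlay network. The record and its field names (name, portUuid, type, publisherPeerUuid, ccUuid) are our own labels for the five fields above, not the book’s XML tags, and the in-memory CcPortRegistry class is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for a VirtualPort document; field names are illustrative, not the book's tags. */
record VirtualPort(String name, String portUuid, String type, String publisherPeerUuid, String ccUuid) {
    private static final Set<String> TYPES =
            Set.of("unicast", "unicastSecure", "multicast", "multicastSecure");

    VirtualPort {
        // Only the four port types listed in the VirtualPort document are accepted.
        if (!TYPES.contains(type)) {
            throw new IllegalArgumentException("unknown port type: " + type);
        }
    }
}

/** Hypothetical registry that indexes published ports by connected community UUID. */
final class CcPortRegistry {
    private final Map<String, List<VirtualPort>> portsByCc = new HashMap<>();

    void publish(VirtualPort port) {
        portsByCc.computeIfAbsent(port.ccUuid(), cc -> new ArrayList<>()).add(port);
    }

    /** Scans only the ports published in one CC, never the whole overlay network. */
    List<VirtualPort> find(String ccUuid, String portName) {
        List<VirtualPort> hits = new ArrayList<>();
        for (VirtualPort p : portsByCc.getOrDefault(ccUuid, List.of())) {
            if (p.name().equals(portName)) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

With the example document above published, a lookup of "MobileAgent" in CC uuid-FBCAEF689321121269717EEFZ9617854 returns that single port, and a lookup in any other CC never sees it.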
Components of the P2P Model
Given this additional tag, the virtual socket also reflects its CC identity. Let’s assume Bill and Rita are both members of CC1 and CC2, have created mobileAgent ports in each CC, and have established connections on the associated CCSs. Then the connection tables on Bill and Rita could appear as in Table 3-1. When CCs are created, their descriptions need to be published in connected community documents so that their existence can be recognized by other peers. This document is discussed in the next section.
Bill (local)           Bill (remote)          Rita (local)           Rita (remote)
bill.mobileAgent.CC1   rita.338888881.CC1     rita.338888881.CC1     bill.mobileAgent.CC1
bill.338888881.CC1     rita.mobileAgent.CC1   rita.mobileAgent.CC1   bill.338888881.CC1
bill.mobileAgent.CC2   rita.338888881.CC2     rita.338888881.CC2     bill.mobileAgent.CC2
bill.338888881.CC2     rita.mobileAgent.CC2   rita.mobileAgent.CC2   bill.338888881.CC2
Table 3-1. Local and Remote Socket with CC Identity.
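The socket names in Table 3-1 follow a peer.port.CC pattern. The hypothetical sketch below builds such names and keeps one peer’s connection table; we read the numeric component 338888881 as an automatically assigned port identifier, which is an assumption. The table also refuses to connect sockets that belong to different CCs, reflecting the rule that connected communities do not communicate with one another.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical virtual-socket endpoint: peer, port, and the CC that scopes it. */
record VirtualSocket(String peer, String port, String cc) {
    @Override
    public String toString() {
        return peer + "." + port + "." + cc; // e.g. bill.mobileAgent.CC1
    }
}

/** One peer's connection table, mapping each local socket to its remote end (cf. Table 3-1). */
final class ConnectionTable {
    private final Map<VirtualSocket, VirtualSocket> remoteByLocal = new LinkedHashMap<>();

    void connect(VirtualSocket local, VirtualSocket remote) {
        // Connected communities do not talk to each other, so both ends must share a CC.
        if (!local.cc().equals(remote.cc())) {
            throw new IllegalArgumentException("sockets belong to different CCs: " + local + " / " + remote);
        }
        remoteByLocal.put(local, remote);
    }

    VirtualSocket remoteOf(VirtualSocket local) {
        return remoteByLocal.get(local);
    }

    int size() {
        return remoteByLocal.size();
    }
}
```

Bill’s first row in Table 3-1 corresponds to connect(new VirtualSocket("bill", "mobileAgent", "CC1"), new VirtualSocket("rita", "338888881", "CC1")).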
3.4.2 Component 4: The Connected Community Document
When a connected community is created, it is given a human-readable text name to render it recognizable, and a UUID so that it has a unique identity on the overlay network. Admittedly, as we’ve frequently discussed, it is possible to have connected community name collisions in purely ad hoc networks. Because of this problem, the connected community document contains a description field provided by the creator, which can give detailed information about the community to help distinguish it from others with the same name. This field contains digital data whose format and use are up to the creator. It could be a plain text string, or a GIF, MPEG, or JPEG file. If a non-text description is chosen, we suggest using a Uniform Resource Name (URN) (IETF Datatracker 2002) to reference the data for better
performance. Certainly, a URN list can be used to point to multiple locations, such as peers, or even websites. This data must be publicly available and accessible via the Public Connected Community. Because of virus problems, we caution against, but do not prohibit, the use of executable code. It is certainly possible to create execution environments that are virus safe, and the current version of Java, with the exception of bugs, is such an example. The fourth field is for policies that moderate community behavior. Examples are authentication, digital rights, content restrictions, data security, etc. These policies may also refer to executable code, and the same cautions apply. We define three membership policy types: ANONYMOUS, REGISTERED, and RESTRICTED.
1. ANONYMOUS – Any peerNode can join a CC and access its associated content.
2. REGISTERED – A peerNode must register with the CC creator. This is a “good faith” registration and is not intended to strictly control access to content. Registration does result in returning a registration validation stamp that must be presented to other members for content access. On the other hand, no attempt is made to prevent forgery of the registration validation stamp. There is no deep security here.
3. RESTRICTED – Here a secure credential is returned when membership is granted. The credential is used to authenticate the new member, and without the credential, access to content is denied. Such a credential may use either strong security or “Webs-of-Trust”-like mechanisms (Chen and Yeager, 2001). How to implement this security is discussed in chapter 5.
The fifth field is the optional Email VirtualPort UUID for this connected community. For details see chapter 7, section 7.4. The following is the connected community document:
Document type = CONNECTEDCOMMUNITY
Content tags and field descriptions:
Restricted Legal XML character string [XML][MIME]
uuid-Legal UUID in hexadecimal ascii string
Legal XML character string [XML]
Legal Universal Resource Name
ANONYMOUS | REGISTERED | RESTRICTED
Legal XML character string [XML]
Legal Universal Resource Name
uuid-Legal UUID in hexadecimal ascii string
Below is the connected community example:
China green tea club
uuid-AACDEF689321121288877EEFZ9615731
We love green tea from China
urn:peerIdentity:uuid-DDEDEF689321121269717EEFZ9615659/TeaCup
ANONYMOUS
No membership restrictions
GREEN tea related content only
uuid-AACDEF689321121288877EEFZ9615731
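The three membership policy types can be sketched as an access check. This is a hypothetical illustration: the class names are ours; REGISTERED only checks that some stamp is presented, matching the “no deep security” description; and RESTRICTED credentials are modeled as HMAC-SHA256 tags under a key held by the CC creator purely to keep the example short. With a shared key, every verifier must hold that key, so chapter 5’s mechanisms, not this one, are what the text intends.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

enum MembershipPolicy { ANONYMOUS, REGISTERED, RESTRICTED }

/** Hypothetical content-access gate for one CC, driven by its membership policy. */
final class MembershipGate {
    private final MembershipPolicy policy;
    private final byte[] creatorKey; // assumption: HMAC key held by the creator, used only for RESTRICTED

    MembershipGate(MembershipPolicy policy, byte[] creatorKey) {
        this.policy = policy;
        this.creatorKey = creatorKey;
    }

    /** Called by the CC creator when RESTRICTED membership is granted. */
    String issueCredential(String peerUuid) {
        return Base64.getEncoder().encodeToString(tag(peerUuid));
    }

    boolean mayAccess(String peerUuid, String presented) {
        switch (policy) {
            case ANONYMOUS:
                return true; // any peerNode may access content
            case REGISTERED:
                // A stamp must be shown, but nothing checks it for forgery.
                return presented != null && !presented.isEmpty();
            case RESTRICTED:
                if (presented == null) {
                    return false;
                }
                byte[] given;
                try {
                    given = Base64.getDecoder().decode(presented);
                } catch (IllegalArgumentException notBase64) {
                    return false;
                }
                return MessageDigest.isEqual(tag(peerUuid), given); // constant-time comparison
            default:
                return false;
        }
    }

    private byte[] tag(String peerUuid) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(creatorKey, "HmacSHA256"));
            return mac.doFinal(peerUuid.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

For the China green tea club above, the gate would be constructed with MembershipPolicy.ANONYMOUS and every request is allowed.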
We now have complete descriptions of four components: The peerIdentity document, the virtualPort document, the virtualSocket document, and the ConnectedCommunity document. We have also introduced the concepts and definitions of the overlay network and connected community subnetworks. As well, we have given high level descriptions of how connections are established on the overlay network given the restriction to connected community subnetworks. Finally, we also noted the optional
Public Connected Community for data that must be available to all peer nodes on the overlay network when they are active members of this community. Virtual connectivity exists to avoid the underlying issues of real network connectivity and, by means of peer identities, to provide both unique identification and true end-to-end visibility across all nodes on the overlay network despite those issues. Yet a system that provides overlay network functionality must have code that addresses the real underlying issues to permit this simplified form of network access. The implementation of the required software is where the real engineering takes place. In the next section we begin to discuss these real network issues and the programmable solutions that make them transparent to application programmers using the overlay network.
3.5 How to Connect
In this section we discuss the Internet IP stack and its binding to the Peer-to-Peer overlay network stack. We show how the latter does not have the address limitations of IPv4 and need not change as the Internet migrates to IPv6. The Peer-to-Peer overlay network can bind to any underlying physical network and its associated transport protocols. Consequently, establishing a Peer-to-Peer connection on the overlay network is independent of the physical layer and its protocols.
3.5.1 Real Internet Transport
Imagine yourself in front of your familiar web browser connecting to some website anywhere in the world after just having done a search. What really happens? What Internet protocols are involved when you click on the URL? Let’s take a typical URL, http://www.ietf.org. First, the browser code knows that the protocol that will be used to connect to www.ietf.org is http, and that the default TCP port is 80. Before a connection can be made, the real IP address of www.ietf.org must be found. How does that happen? In most cases one has a Domain Name System (DNS) server that will return the IP address of a valid, registered domain name like www.ietf.org. Therefore, the DNS protocol is used to do the name-to-address translation before the connection is even attempted. But to use the DNS protocol one must be able to locate DNS servers. How the latter is done is described in section 3.5.2.1. Let’s assume we know a DNS server and the appropriate translation takes place. Then your system can connect to the IP address of www.ietf.org
using the http protocol and TCP/IP. But your system is on the Internet and most likely not on the same subnetwork as www.ietf.org. Therefore, your system requires a route from your IP network to the destination IP network where www.ietf.org is hosted. How are routers discovered? Sometimes they are part of your system’s initial configuration; otherwise, there are protocols to find them (see section 3.5.2.1). Let’s assume your system knows the IP address of its router so that an attempt to connect to www.ietf.org can be made. Neither your system’s network interface nor the router’s understands IP addresses, since they operate at the link layer; they each require the MAC addresses used on that layer, e.g., Ethernet. Since your system and the router must be on the same subnet, your system uses the Address Resolution Protocol (ARP) (Plummer, 1982): it broadcasts an ARP packet that includes its own IP and MAC addresses and asks the system with the router’s IP address, i.e., the router, to respond with its MAC address. We assume the router is up and its MAC address is found. Given that the network is in good form, IP packets from your system destined for www.ietf.org are sent to the router and forwarded on to their destination. In this manner the connection will succeed, and http running on top of TCP/IP will retrieve the site’s home page. While we have glossed over many details here and have left out other possible barriers that might have to be resolved to connect to the IETF’s website, we have pointed out the key steps required to connect to a host somewhere on the Internet. In the following sections we fill in the missing details that your P2P software must know to permit any two peers to connect to one another.
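The routing decision in the walkthrough above, whether to ARP for the destination itself or for the router, reduces to a subnet comparison. A minimal sketch, assuming the local prefix length and router address are already known (for example, from DHCP); real IP stacks consult a full routing table, and DNS resolution is omitted. NextHop and its methods are hypothetical names, and 203.0.113.5 is a documentation address standing in for a remote host.

```java
/** Hypothetical next-hop choice: deliver directly on the local subnet, otherwise via the router. */
final class NextHop {
    /** Parses a dotted-quad IPv4 address into a 32-bit int. */
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not IPv4: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns the address whose MAC must be resolved with ARP before the first packet is sent. */
    static String arpTarget(String localIp, int prefixLength, String routerIp, String destinationIp) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("bad prefix length: " + prefixLength);
        }
        // Java masks int shift distances to 5 bits, so a /0 prefix needs its own case.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        boolean sameSubnet = (toInt(localIp) & mask) == (toInt(destinationIp) & mask);
        return sameSubnet ? destinationIp : routerIp;
    }
}
```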
Figure 3-7. Connecting on the Internet
3.5.2 Issues
The most basic IP component your system possesses is its IP address. We have previously discussed the IPv4 address limitations, the not yet fully deployed IPv6 solution, and why it is highly probable that your IP address is not fixed. That is to say, each time your system boots, or after an appropriate lease expiration discussed below, this address may change.
3.5.2.1 Dynamic Host Configuration Protocol (DHCP)
ISPs, enterprises, home routers, and LANs with more clients than can be accommodated by their allocated IP address space require a mechanism to fairly distribute IP addresses to these clients. Here, fairness means that an address not currently allocated to one client can be reused by another; the assumption is that not every client is always up. DHCP provides a method to allocate shared IP addresses from pools of reusable, administratively assigned addresses. DHCP includes a lease time for address use as well as a way for a client to relinquish the address when it no longer needs it. DHCP can provide all that is required to auto-configure a client in most situations. The DHCP
service must be reachable by broadcast. Below is a short list of the more than one hundred DHCP options for providing information to a host about the network to which it is connected. A complete list can be found at the Internet Assigned Numbers Authority (IANA 2011). Because on the Internet in general there is no guarantee that a system’s IP address will remain fixed, end-to-end connectivity on the real network is not guaranteed. Since end-to-end connectivity is guaranteed on the overlay network with a unique peer-UUID, this guarantee must be reflected in the connectivity on the real network. To do this, the underlying P2P software must recognize the change of IP address and update the PeerIdentity document when it occurs. Let’s assume that the peer, LaRueMansart, relinquished the IP address 152.70.8.108, and then DHCP assigned the new address 152.70.8.169. This peer’s PeerIdentity document would then be updated as described below with 4PL:
String oldStr = "tcp://152.70.8.108.9133";
String newStr = "tcp://152.70.8.169.9133";
replaceField(peerIdentity, "", oldStr, newStr);
oldStr = "http://152.70.8.108.1111";
newStr = "http://152.70.8.169.1111";
replaceField(peerIdentity, "", oldStr, newStr);
This results in the new PeerIdentity document that is below. How these changed documents are redistributed on the overlay network is discussed in the next chapter.
LaRueMansart
uuid-AACDEF689321121288877EEFZ9615731
tcp://152.70.8.169.9133
http://152.70.8.169.1111
tls://uuid-AACDEF689321121288877EEFZ9615731
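A minimal Java counterpart to the 4PL update above, assuming transports are written in the “scheme://a.b.c.d.port” form used in this PeerIdentity example. TransportRewriter is a hypothetical helper, not part of any overlay network API. Non-IP transports such as the tls://uuid-... entry are copied unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: rewrites the IP part of PeerIdentity transport strings after a DHCP change. */
final class TransportRewriter {
    /**
     * Returns a new list in which every "scheme://oldIp.port" entry becomes "scheme://newIp.port".
     * Entries whose host is not exactly oldIp (e.g. tls://uuid-...) are copied unchanged.
     */
    static List<String> rewrite(List<String> transports, String oldIp, String newIp) {
        List<String> updated = new ArrayList<>(transports.size());
        for (String transport : transports) {
            String entry = transport;
            int sep = transport.indexOf("://");
            if (sep >= 0) {
                String scheme = transport.substring(0, sep + 3); // e.g. "tcp://"
                String rest = transport.substring(sep + 3);      // e.g. "152.70.8.108.9133"
                // Require the '.' before the port so 152.70.8.10 does not match 152.70.8.108.
                if (rest.startsWith(oldIp + ".")) {
                    entry = scheme + newIp + rest.substring(oldIp.length());
                }
            }
            updated.add(entry);
        }
        return updated;
    }
}
```

Returning a new list rather than editing in place keeps the old document intact until the updated one has been redistributed.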
This situation becomes much more complicated with the introduction of Network Address Translators (NAT).
3.5.2.2 Network Address Translator (NAT)
Again, because of the shortage of IPv4 addresses, Network Address Translators (Egevang and Francis, 1994) (Srisuresh and Egevang, 2001) provide a simple scheme to permit the use of internal, Intranet-private addresses and map them to globally unique addresses on the exterior Internet. NAT boxes shield the external Internet from IP addresses used for LAN communication and WAN access. Imagine that a small business assigns the ten class A addresses 10.0.1.2 through 10.0.1.11 to the ten systems it uses, and 10.0.1.1 is the subnetwork address of the router supporting NAT. Also, assume that externally the router has a single, globally unique IP address, in this case 192.88.24.7, that is assigned by either an administrator or an ISP using DHCP. Furthermore, suppose that the system with IP address 10.0.1.3 wishes to use a TCP/IP application to connect to the external system with address 128.47.88.11. The application will use a randomly chosen local TCP port, say 2077, and try to connect to TCP port 23 on host 128.47.88.11. The router does several things on the reception of IP packets from the system on the subnetwork. First, it replaces the source IP address with the globally unique IP address, 192.88.24.7. Second, it assigns a TCP port to the outgoing connection, say 1024, and changes the source TCP port accordingly. It then maintains a port map, which maps received packets destined for 192.88.24.7:1024 to 10.0.1.3:2077 by appropriately updating the destination IP address and TCP port before delivering the IP packet on the LAN. In this manner, if there are simultaneous outgoing TCP connections from multiple systems on the subnetwork, every connection will have its own unique port mapping. Because the source IP address is changed, the IP header checksum must be recomputed for each outgoing packet.
This change of IP address also forces a new computation of the TCP checksum because the TCP pseudo-header is updated to reflect the changed IP address. The pseudo-header and source port are part of the checksum calculation. Since the checksum calculation also includes all the TCP data in the current IP packet, NAT can negatively affect TCP performance for large data transfers. UDP/IP packets also have a port and can be similarly mapped. The basic NAT is shown below in Figure 3-8.
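The port map described above can be modeled as two tables, one per direction. This sketch is simplified and hypothetical: real NATs key mappings on the full connection tuple, expire idle entries, and handle UDP as well. Note that an unsolicited incoming packet finds no mapping, which is exactly why outside peers cannot initiate connections to hosts behind the NAT.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified NAT port map: one public address, ports handed out sequentially. */
final class NatPortMap {
    private final String publicIp;
    private int nextPublicPort;
    private final Map<String, Integer> outbound = new HashMap<>(); // "privateIp:port" -> public port
    private final Map<Integer, String> inbound = new HashMap<>();  // public port -> "privateIp:port"

    NatPortMap(String publicIp, int firstPublicPort) {
        this.publicIp = publicIp;
        this.nextPublicPort = firstPublicPort;
    }

    /** Rewrites the source of an outgoing packet and returns the new "publicIp:publicPort". */
    String translateOutgoing(String privateIp, int privatePort) {
        String key = privateIp + ":" + privatePort;
        Integer port = outbound.get(key);
        if (port == null) { // first packet of this connection: allocate a mapping
            port = nextPublicPort++;
            outbound.put(key, port);
            inbound.put(port, key);
        }
        return publicIp + ":" + port;
    }

    /** Finds the LAN endpoint for an incoming packet, or null when no mapping exists. */
    String translateIncoming(int publicPort) {
        return inbound.get(publicPort);
    }
}
```

Using the numbers from the text, new NatPortMap("192.88.24.7", 1024).translateOutgoing("10.0.1.3", 2077) yields "192.88.24.7:1024".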
Figure 3-8. NAT Network Setup
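The checksum recomputation discussed above uses the Internet checksum of RFC 1071: the one’s complement of the one’s-complement sum of the data taken as 16-bit words. The sketch below is a minimal illustration; the test vectors are the RFC 1071 example and a sample IPv4 header with its checksum field zeroed, and rewriting that header’s source address, as a NAT does, changes the result.

```java
/** RFC 1071 Internet checksum, as used by the IPv4 header and (with a pseudo-header) by TCP and UDP. */
final class InternetChecksum {
    /** Checksum over the given bytes; an odd trailing byte is padded with zero. */
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0;
            sum += (hi << 8) | lo;
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // fold carries back into the low 16 bits
        }
        return (int) (~sum & 0xFFFF);
    }
}
```

Production NATs need not recompute over the whole packet: RFC 3022, cited above as Srisuresh and Egevang (2001), describes adjusting the existing checksum using only the old and new address and port values.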
Finally, NAT causes another restriction. One cannot fragment either TCP or UDP packets on the subnetwork side of a router. Packet fragmentation uses the following IP header fields: the IP packet identification field, a more-fragments flag, and a fragment offset. The more-fragments flag is set in every fragment except the last, where it is clear. If two systems on the same subnetwork happen to be fragmenting outgoing TCP/IP (UDP/IP) packets, and those packets happen to have the same identification field and are sent to the same destination, then on the global side of the router the IP header information that is used to reassemble the packets at the ultimate destination will be identical. Since the TCP and UDP header information is only in the first fragment, all bets are off. Consequently, if one is behind such a router, fragmentation cannot take place for those packets that have global destinations. Now, since routers supporting NAT can also act as DHCP servers, the subnet addresses need not be fixed; they may be dynamically assigned and reused. There is the additional complication that the globally unique IP address on the Internet side of the router is acquired with DHCP and thus changes over time, as discussed in the section above. Furthermore, the subnetwork systems do not know the globally unique external router address. Consequently, the PeerIdentity document will contain real IP addresses that do not reflect the
addresses really seen by other peers outside of the subnetwork, and these peers themselves may be on their own subnetworks. The death blow to end-to-end connectivity is that one cannot reliably initiate connections from the exterior side of these routers. How this is resolved so that one has end-to-end connectivity on the overlay network is discussed in section 3.5.3.
3.5.2.3 Multiple Devices and Physical Layer Characteristics
Besides the headaches coming from firewall and NAT traversal, the growth in the scope and dimension of the Internet is causing more pain. With billions of devices almost always connected, wired and/or wirelessly, not every device can have an IPv4 address. All 4,294,967,296 (2^32) IPv4 addresses are exhausted; we’ve run out of fixed IPv4 addresses. Is it too much to ask for end-to-end, fixed unique-address-to-unique-address communication in this sea of devices? Even with IPv6, a device’s unique address can change to a different unique IPv6 address. Building a P2P overlay network in which every single device has a unique identity does have a solution. Let’s look at the fundamentals of the problems as an opportunity to create something with longevity that is independent of the underlying Internet communication protocols and the underlying physical layer, be it wired, wireless, or quantum based.
1. The problems caused by the growing number of devices are already covered by our careful design of the PeerIdentity document. Not every device has to have an IP address. Instead, a peer-UUID can be assigned for identification purposes. These identifiers are globally unique and independent of the underlying physical layer.
2. To solve the problems caused by the growing types of devices and their associated communication protocols and physical layers, we need to find the common transport point between different networks. Then, end-to-end P2P overlay network communication can be established on top of the transport chain using the common transport point, which in our case is the mediator.
3.5.2.4 Changing Identities – How do I Know the Color of a Chameleon
The above issues are from a technological point of view, but the most unpredictable element contributing to the “chaos” of the Internet is human users. Yes, the carefully designed PeerIdentity documents are perfect for identifying people and the systems that they use. But a user can discard an
old identity, receive a new user identity, and start a new “life” with a new “face” on a P2P overlay network. One should be permitted to have multiple identities on different P2P overlay networks. As will be pointed out in Chapter 5, there are many possible ways to engineer an overlay network so that it does not permit a change of user identity. This is analogous to the IETF protocol working groups and the standards they produce, which do not dictate how those standards are used: the P2P overlay transport layer design is likewise neutral with respect to its use.
3.5.3 Solutions
Let’s briefly summarize what we have learned about our P2P overlay network up to this point, and what obstacles we must overcome to permit P2P communication on this network. The P2P overlay network is a collection of peer-nodes, each having a name, a unique identity, and a description. Each peer must belong to a connected community to communicate with other peers in that community. Also, since communication between connected communities is not possible, and we have overlay network information that must be communicated independently of these communities, every P2P overlay network has a Public Connected Community. Any community creator can belong to the Public Connected Community to make the creator’s community universally discoverable in the corresponding P2P overlay network. Any peer belonging to at least one connected community in this overlay network has read access to the Public Connected Community’s data and may be able to post a request there to join a connected community, for example, after another member has told this user about the community. In this instance, the creator will be alerted and can invite this user, or ignore the request and auto-delete it. It is important to note that “stealth” connected communities are permitted and that their existence will never be publicly available. Other out-of-band means are necessary to discover and join these communities. For peer-nodes to discover and communicate with one another, we have defined three documents: the PeerIdentity document, the VirtualPort document, and the Connected Community document. As noted just above, the PeerIdentity document identifies peers. The VirtualPort document permits a peer to establish both listening and outgoing virtual sockets, and the Connected Community documents create safe “walled gardens” that both minimize the requirements for discovery, content exchange, and routing, and provide a framework upon which privacy and security can more easily be added.
Recall that as we began to look under the network hood, we found that there are many obstacles to prevent end-to-end communication on the real network. To overcome these obstacles we require mediators. Most importantly, mediators make these barriers invisible to the P2P application programmer. Also, a description of a peer-node’s real transports is in the PeerIdentity document. While the overlay network is an abstraction that simplifies communication, for this abstraction to be realized the underlying P2P software must bind the overlay network to the real network’s transports. In this section we describe exactly what a mediator does as well as how the above binding is implemented.
3.5.3.1 Transport Mediator
Reviewing the issues with the real transports and their current protocols:
1. IP multicast limited to a local subnet prevents peer-nodes from discovering one another if they are not on the same subnet,
2. Multicast is non-existent in mobile phone networks, and device discovery is limited on a physical network like Bluetooth because of its small radius,
3. Non-fixed IP addresses in the IPv4 address space,
4. NAT limitations, which are like (3) but even worse,
5. Small devices with limited memory and processor capabilities requiring both storage and computational aid,
6. Routing our overlay network protocols to give us end-to-end communication,
7. The 100,000,000,000 devices requiring robust search algorithms for the discovery of peer nodes, their associated documents, and content,
8. Ad hoc document registration for the peer-nodes in (7), i.e., administered registration is no longer feasible.
Our solutions to all these problems begin with the P2P Mediator. Mediators host peer-nodes, and any peer can be a mediator. For small, ad hoc P2P overlay networks, a single mediator is sufficient. Certainly, this depends on the processing power, memory size, and disk storage of the mediator.
If one imagines a neighborhood network with a maximum of 100 peer-nodes, then any fully equipped home system available today is fine. On the other hand, as we move to networks with thousands to millions of peer-nodes the ratio of mediators to peer-nodes on a single P2P overlay network should decrease because the systems hosting peer nodes must be
more powerful, and we also wish to minimize response time and maximize stability so that the technology is compelling for users. There are some subtle computational problems related to routing. We are all familiar with the structure of telephone numbers: local calls are easily differentiated from national and international calls. Our mediators will be organized similarly to simplify routing and minimize routing table memory requirements. The primary requirement is that each mediator has a fixed network address that is not NAT/firewall limited, so that it is always reachable from anywhere on its overlay network. Note that a mediator may be firewall limited, but in this case firewall traversal may not be permitted using this mediator. This is appropriate for an enterprise P2P overlay network that prohibits firewall traversal. A second requirement is that a mediator must be available within the context of the needs of the peer-nodes it supports. For example, a neighborhood mediator may only be up from 6pm until midnight during the week and 24 hours on Saturday and Sunday, while enterprise mediators will surely be 24x7 highly available. Now, let’s take a closer look at the problem of two peers discovering one another. If neither peer is separated from the other by NAT or a firewall, then discovery is possible even if they have dynamic IP addresses via DHCP or are on the same NAT subnetwork. They may be on the same local network, and then multicast can be used to announce a peer’s presence and send the documents previously described. Again, this is short-lived because both NAT and DHCP-acquired addresses can be leased and reused. Once two peers are not on the same local network or NAT subnetwork, the story changes. If these peers are not on the same local network and are using DHCP services, then an out-of-band communication, e.g., email, a telephone call, or an ad hoc name/address registry like an LDAP server, can be used to exchange network addresses so that communication is established and the documents are sent to one another. This is cumbersome, but it works. Finally, if either of the peers is NAT bound, then the external IP socket is hidden from the peer’s software since port mapping is used, and there is no reliable way that this peer can give the information necessary for an externally located peer to contact it. Simply stated: because of NAT port mapping, this is impossible in general. Clearly, we have a real problem with discovery. So, how do our mediators solve the problem of discovery? Mediators are always visible to the peers they host, and to the other mediators that are necessary to support the P2P overlay network. That is to say (see chapter 4 for the protocols and algorithm details):
1. They must have fixed IP addresses,
2. Mediators must be external to all NAT stub networks they support,
3. If the P2P overlay network supports multiple physical layers whose layer-2 transports are incompatible, e.g., Bluetooth and IP, then the mediator must support all such transports. In this case, note that the overlay network as we have defined it is independent of these incompatibilities,
4. If one wishes to traverse firewalls, then the mediator must be outside of the firewall, and some protocol must be permitted to traverse the firewall. Examples are http and SOCKS Version 5,
5. The mediators must maintain a map of all known mediators, called the mediator map. It is maintained using the mediator discovery protocol.
Peers must either have been preconfigured to initially contact a mediator or have another method to acquire this information. For IP-based mediators/peers, the contact information will be an IP address/port pair. This may be built into the software when it is booted; the software may have the address of an external website where mediators are registered; or, for a neighborhood P2P overlay network, it can be acquired out-of-band using, for example, email, a telephone call, or a casual conversation. Let’s assume a mediator-hosted peer has the required contact information. What does the peer do to make itself known? It registers its peerIdentity, virtualPort, and connected community documents on that mediator. Since mediators know about other mediators by means of the mediator-to-mediator communication protocol, this information is distributed among them. A mediator may have a limit to the number of peers it can host and may redirect a peer to a different mediator using the mediator redirect protocol. In Figure 3-9, each mediator has a complete map which includes all four mediators.
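Registration and redirection can be sketched as follows. This is a hypothetical, single-process model: the Mediator class, its capacity limit, and the choice of the first known mediator with spare room are our assumptions, not the mediator redirect protocol defined in chapter 4.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mediator: hosts up to `capacity` peers and redirects the rest to a mediator it knows. */
final class Mediator {
    final String id;
    private final int capacity;
    private final Map<String, List<String>> documentsByPeer = new LinkedHashMap<>();
    private final List<Mediator> mediatorMap = new ArrayList<>(); // other known mediators

    Mediator(String id, int capacity) {
        this.id = id;
        this.capacity = capacity;
    }

    void learnAbout(Mediator other) {
        if (other != this && !mediatorMap.contains(other)) {
            mediatorMap.add(other);
        }
    }

    boolean hasRoom() {
        return documentsByPeer.size() < capacity;
    }

    boolean hosts(String peerUuid) {
        return documentsByPeer.containsKey(peerUuid);
    }

    /** Registers a peer's documents here, or redirects; returns the hosting mediator, or null if none has room. */
    Mediator register(String peerUuid, List<String> documents) {
        if (hosts(peerUuid) || hasRoom()) {
            documentsByPeer.put(peerUuid, new ArrayList<>(documents));
            return this;
        }
        for (Mediator m : mediatorMap) {
            if (m.hasRoom()) {
                return m.register(peerUuid, documents); // redirect: m accepts because it has room
            }
        }
        return null;
    }
}
```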
As the number of peers on a P2P overlay network grows, discovery becomes more and more difficult because the amount of data that is distributed among the mediators is enormous. In this case, even with the best search algorithms, the task is still very difficult. One can imagine using a distributed hash table (DHT) and hashing references to all documents across the mediator map. Thus, a peer searching for a string will contact its mediator with the query; that mediator hashes the query’s search string and sends the query to the mediator that holds the matching information in its hash table. If we are clever, we append enough information to the hashed data
so that the mediator having the information in its hash table can forward the query to the peer from which the original document hash originated.
Figure 3-9. Simple Mediator Map
This glosses over some hash algorithm details, and routing issues, but nonetheless, it is a reasonable approach. This is illustrated in Figure 3-10 where two document descriptions are sent to mediator M3 along with the routing information. Then M3 hashes peerIdentity and virtualPort document descriptions to M1 and M0, respectively.
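The hashing scheme of Figures 3-10 and 3-11 can be sketched as tables keyed by the responsible mediator, where each entry remembers the publishing peer and that peer’s mediator so a query can be forwarded to it. A minimal, single-process sketch: we hash keys with SHA3-256 modulo the number of mediators (the text below refines this with a prime modulus), and MediatorDht and the key format "p2/virtualPort" are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical DHT over a fixed mediator map; entries keep the route back to the publishing peer. */
final class MediatorDht {
    record Entry(String key, String originPeer, String originMediator) {}

    private final List<String> mediators;                       // mediator map, e.g. [M0, M1, M2, M3]
    private final Map<String, List<Entry>> tables = new HashMap<>(); // mediator -> its hash table

    MediatorDht(List<String> mediators) {
        this.mediators = List.copyOf(mediators);
    }

    /** The mediator responsible for a key: SHA3-256(key) mod N. */
    String responsibleMediator(String key) {
        return mediators.get(index(key, mediators.size()));
    }

    void publish(String key, String originPeer, String originMediator) {
        tables.computeIfAbsent(responsibleMediator(key), m -> new ArrayList<>())
              .add(new Entry(key, originPeer, originMediator));
    }

    /** Entries found for a query; each names the peer (and its mediator) to forward the query to. */
    List<Entry> query(String key) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : tables.getOrDefault(responsibleMediator(key), List.of())) {
            if (e.key().equals(key)) {
                hits.add(e);
            }
        }
        return hits;
    }

    static int index(String key, int modulus) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA3-256").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).mod(BigInteger.valueOf(modulus)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA3-256 unavailable", e);
        }
    }
}
```

In the scenario of Figure 3-11, p2’s virtualPort entry records origin peer p2 and origin mediator M2, so the responsible mediator can forward p1’s query to p2 via M2.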
Figure 3-10. Hashing Scheme
In Figure 3-11, assume peer p1 queries Mediator M1 for p2’s virtualPort Document. This query includes the routing information necessary for the responding peer to reach p1. M1 hashes the query and finds that M3 has the required information, then sends the query to M3. M3 has the routing information to forward the query to p2 via M2. Finally, since the query contains the route back to p1, p2 sends the response to p1 through M1 if M1 is reachable directly from p2. Otherwise, p2 must use its mediator, M2, to communicate with M1 and p1.
Figure 3-11. Query and Response
Now, suppose a mediator crashes. In this case, a great deal must be done to stabilize the remaining mediators’ hash tables. To this end, assume that a mediator hosting several peers discovers that a mediator to which it has hashed data has crashed. What does this imply? There are multiple possibilities, and recovery is difficult because it can consume a great deal of bandwidth and CPU time. Imagine we have a simple DHT algorithm (note that there are many possible DHT algorithms): given a (string, object) pair, we take the SHA-3 hash of the string modulo the smallest prime number, p, greater than or equal to the original number of mediators. We then store the ((string, object), originating peerIdentity) in the resulting mediator’s DHT. Given mediators M0, M1, ..., MN, $j = \mathrm{SHA3}(\mathit{string}) \bmod p$. Then the data will be in mediator Mj’s hash table row j. Rows may have more than one entry if additional mediators join the P2P network. Suppose a mediator, say Mk, crashes. First, all data hashed to that mediator must be rehashed. This implies that when data is hashed, a reference to the mediator to which it is hashed must be kept; and if mediator Mj hashes data to mediator Mk, then Mk’s mediator map entry on Mj should maintain the reference for the sake of computational efficiency during crash
recovery. Then Mj need not search all its hashed data for Mk; rather, it goes directly to Mk’s mediator map entry. We can decide to keep the same modulus, p, in which case any data that was stored in the crashed mediator’s hash table is instead stored on its successor, Mk+1 mod p. This works, provided all mediators use the same algorithm. One could resort to a brute-force search instead, but this is bad for performance. When a mediator discovers that another mediator is down, it must notify all other mediators so that the DHT stays consistent. Because there is a race condition here, i.e., a mediator may be hashing data to a crashed mediator and not discover it is down until it tries to store the data, we have a simple rule: in this kind of failure, the mediator waits until there is again a consistent mediator map, i.e., it backs off and waits for a notification, and if none arrives, it applies the rule that is used to maintain a consistent map. That might be a simple ping of each member of the map that has not recently contacted it. The more unstable mediators one has, the more complicated this maintenance problem becomes. Here we must emphasize that mediators are supposed to be stable systems, and it is important to minimize the impact of such disaster recovery. In Figure 3-12, mediators M0, M1 and M2 have discovered that M3 has crashed, have updated their mediator maps, and have rehashed the data that was hashed to M3 to M0, M3’s successor mod 5. Thus, in Figure 3-13, p2’s peerIdentity document is rehashed to M0. Finally, p1’s query is directed to M0 instead of M3. Recall that connected communities are pairwise communication-disjoint. Using this feature and the mediator-to-mediator protocol, we do the following:
1. Mediators maintain a mediator map for each CC they host, i.e., for those connected communities in which the peerNodes they host are active (see chapter 4, section 4.2.3.1),
2. Mediators communicate the existence of each CC they host to all other mediators using the mediator-to-mediator protocol. If another mediator hosts this CC, it adds the contacting mediator to its CC mediator map and notifies that mediator that it also hosts peers belonging to the CC, so that both have the same CC mediator map.
Figure 3-12. Hashing Recovery Scheme
Figure 3-13. Query and Response after Hashing Recovery
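The hashing and successor rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's implementation. SHA-3 is taken to be SHA3-256 from the standard `hashlib` module. "Successor mod p" is interpreted as walking the indices j, j+1, ... mod p until a live mediator is reached. That interpretation reproduces the Figure 3-12 outcome, where M3's data lands on M0 when p = 5. The helper names are hypothetical.

```python
import hashlib


def smallest_prime_at_least(n: int) -> int:
    """Smallest prime p >= n (trial division; mediator counts are small)."""
    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    p = max(n, 2)
    while not is_prime(p):
        p += 1
    return p


def dht_row(key: str, p: int) -> int:
    """j = SHA-3(key) mod p, with SHA3-256 as the assumed SHA-3 variant."""
    digest = hashlib.sha3_256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % p


def owner(j: int, live: set, p: int) -> int:
    """Return j if Mj is live, else the next live successor mod p."""
    for step in range(p):
        candidate = (j + step) % p
        if candidate in live:
            return candidate
    raise RuntimeError("no live mediators")


# Four mediators M0..M3, so p = 5 (hypothetical example mirroring Figure 3-12).
p = smallest_prime_at_least(4)
live = {0, 1, 2, 3}
keys = [f"peerIdentity-{i}" for i in range(50)]
before = {k: owner(dht_row(k, p), live, p) for k in keys}

live.discard(3)  # M3 crashes; the modulus p is deliberately kept unchanged
after = {k: owner(dht_row(k, p), live, p) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(p, len(moved), {after[k] for k in moved})
```

Keeping p fixed means only the keys that lived on the crashed mediator move. Every other mediator's rows are untouched, which is the point of the rule in the text.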
It is not necessary for CC mediator maps to be complete, that is, to contain an entry for every mediator that hosts a peer that is a member of a given CC. If a map is incomplete, it may be impossible to find all members of a CC. This is permissible in P2P networks, because discovery need not be complete. But CC mediator maps must be consistent, i.e., every mediator in a CC mediator map hosts at least one peer that is a member of that CC. So, why do all of this? It simplifies disaster recovery. When a mediator crashes and this is discovered, recovery is limited to those CC mediator maps to which the crashed mediator belongs. Figure 3-14 shows that mediator M0 supports two connected communities, CC1 and CC2, and M1 only supports CC1. Similarly, M2 and M3 only support CC2, and M4 is the sole supporter of CC3. Figure 3-15 shows that mediator M0 has crashed, its peers have discovered alternative mediators, and the CC mediator maps have been updated to reflect this.
Figure 3-14. CC Mediator Map
Figure 3-15. CC Mediator Hash Recovery Scheme
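To make the scoping argument concrete, here is a small Python sketch of CC mediator maps using the assignments from Figure 3-14. The dictionary representation and function names are illustrative assumptions. The sketch shows two things. A crash only touches the maps that list the crashed mediator. Consistency (every listed mediator hosts at least one member) can be checked without requiring completeness.

```python
# CC mediator maps from Figure 3-14: connected community -> mediators that
# host at least one member peer. The representation is a hypothetical sketch.
cc_maps = {
    "CC1": {"M0", "M1"},
    "CC2": {"M0", "M2", "M3"},
    "CC3": {"M4"},
}

# Which CCs each mediator actually hosts peers for (used for consistency).
hosted = {"M0": {"CC1", "CC2"}, "M1": {"CC1"}, "M2": {"CC2"},
          "M3": {"CC2"}, "M4": {"CC3"}}


def is_consistent(cc_maps: dict, hosted: dict) -> bool:
    """Every mediator listed for a CC must host at least one member of it.
    Completeness is NOT required: a map may omit some hosting mediators."""
    return all(cc in hosted.get(m, set())
               for cc, mediators in cc_maps.items() for m in mediators)


def recover_from_crash(cc_maps: dict, crashed: str) -> list:
    """Remove the crashed mediator; return only the CCs that needed repair."""
    affected = sorted(cc for cc, ms in cc_maps.items() if crashed in ms)
    for cc in affected:
        cc_maps[cc].discard(crashed)
    return affected


assert is_consistent(cc_maps, hosted)
affected = recover_from_crash(cc_maps, "M0")
hosted.pop("M0")  # M0's peers rehome to other mediators (Figure 3-15)
print(affected)   # prints ['CC1', 'CC2']; CC3's map is never touched
```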
Finally, mediators can proxy storage and computation for hosted peers that are device constrained. Given this capability and the above discussion, each of the eight issues previously mentioned can be resolved by adding mediators to the P2P overlay network.
3.5.3.2 Putting the Components Together: Mapping P2P Overlay Network to the Real Transport
We now have the components and documents that define the P2P overlay network, and mediators. We've also given an overview of discovery. What is missing is how the overlay network is mapped to the underlying real network by means of mediators and the real transports in the peerIdentity document. Looking back at Figures 3-4 and 3-5, we have the IP stack and the Overlay Network stack. The code implementing the overlay network sits at the application layer of the IP stack, so the overlay network stack is bound to the IP stack at its application layer. Here we formalize this binding. P2P overlay network applications create virtual sockets; at this point it might be a good idea for the reader to review section 3.3. The following is extracted from Table 3-1 and represents an Overlay Network Protocol connection:
Peer    ONP local               ONP remote
Bill    bill.338888881.CC1      rita.mobileAgent.CC1
Rita    rita.mobileAgent.CC1    bill.338888881.CC1
Here, the peers peer_bill and peer_rita are members of connected community CC1, and peer_rita has an active, listening mobileAgent virtual socket. The table above shows an open connection on the P2P overlay network. Below, we describe exactly the steps necessary to establish this connection, which in turn requires a real connection on the chosen real transport. Every step from discovery to communication on the P2P overlay network requires a real transport mapping. This calls for a case-by-case analysis: 1. both peers are on the same local network; 2. the peers are not on the same local network, and thus a mediator is required at some point, and perhaps throughout the communication. In case 1, peer_bill and peer_rita would have discovered one another using IP multicast. They will also have the following in their peerIdentity documents. On peer node peer_bill we will have
tcp://152.70.8.108:3000
and on peer_rita,
tcp://152.70.8.109:3000
Having discovered one another on the same subnet, the software that implements ONP establishes a real TCP connection for communication between peer_bill and peer_rita. The table above, now showing both the ONP sockets and the IP sockets, appears as follows:
Peer    ONP local               ONP remote              TCP local            TCP remote
Bill    bill.338888881.CC1      rita.mobileAgent.CC1    152.70.8.108.3000    152.70.8.109.3000
Rita    rita.mobileAgent.CC1    bill.338888881.CC1      152.70.8.109.3000    152.70.8.108.3000
In this way, the ONP software dispatches the TCP data exchanged between peer_bill and peer_rita to the mobile agent applications, using the channel that is open on the P2P overlay network. There is a minor variation of this case in which both peers have TCP/IP connectivity but cannot discover one another for one of many possible reasons, e.g., they are not on the same subnet. Here, after receiving each other's peerIdentity documents from a mediator, the ONP software attempts an initial TCP/IP connection to the transport addresses in the peerIdentity documents. This succeeds, all further communication proceeds as above, and the table above is unchanged.

To describe case 2, we assume that peer_bill is on a NAT stub network, and peer_rita is behind a firewall. Thus, a mediator is required for all communication. This begins with discovery and then applies to every communication between the two peers. Bill is a member of CC1 and wishes to communicate with Rita, who is also a member of CC1. We assume that Bill and Rita already have one another's peerIdentity and virtualPort documents, obtained as described in the above discussion of mediators. As mentioned several times in this section, the details of document retrieval are covered thoroughly in chapter 4. Here, peer_bill's ONP software already knows that its peer node is behind a NAT. How? When peer_bill initially contacted its mediator, say M0, the mediator requested peer_bill's peerIdentity document. It then noted that the source IP address of the TCP/IP connection peer_bill made differs from the IP address in the peerIdentity comprotocols fields. In this case three things happen:
1. The mediator creates an INBOX for peer_bill. The INBOX holds all messages from other peers destined for peer_bill. Recall that the mediator cannot contact peer_bill because peer_bill is behind a NAT, and so peer_bill must poll the mediator at a reasonable frequency to retrieve its messages. A mediator should let peer_bill remain connected if system resources permit, and a fairness algorithm for disconnecting hosted peers must be implemented.
2. The mediator notifies peer_bill that it is behind a NAT and sends peer_bill its mediator document, which contains the routing information necessary to reach this mediator. It may be that peer_bill will communicate with peer_rita via a different mediator. In that case, peer_bill must append to all communications with peer_rita the route back to itself, i.e., via M0.
3. Peer_bill updates its peerIdentity and virtualPort documents to reflect the routing information in the mediator document it received. That way, any further request for either of these documents yields a highly probable route. Note that peers usually ask for virtualPort documents first. If these contain successful routes, then the peerIdentity document is not required. If a route fails, the virtualPort document also contains the peer UUID, which can be used to recover the most recent peerIdentity document with a viable route.

Items 1 and 2 above apply similarly to peer_rita. Let's assume, without loss of generality, that peer_rita is using M1. Peer_bill has an INBOX on M0 and peer_rita an INBOX on M1, and both have their mediators' documents. Using ACP/ONP, peer_bill sends a request to connect to Rita's mobileAgent port, addressed to Rita's INBOX on M1. Recall that Bill has already communicated with Rita and has received the requisite peerIdentity and virtualPort documents. He has therefore also received M1's routing information. This request includes the routing information necessary for Rita to respond to Bill. Peer_rita is polling its INBOX on M1 and receives this message. Peer_rita then completes the handshake by acknowledging to peer_bill, in a similar manner, that the request to connect has been received.
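The mediator's NAT check and the INBOX it creates can be sketched as follows. This is a hypothetical Python model, not a wire protocol. Addresses are plain strings, and the comparison uses only the IP address, since NAT commonly rewrites ports as well. The class and method names are invented for illustration, and 203.0.113.7 is simply a documentation address chosen for the example.

```python
from collections import deque


class Mediator:
    """Toy mediator: detects NAT'd peers and buffers their inbound messages."""

    def __init__(self, name: str):
        self.name = name
        self.inboxes = {}  # peer name -> deque of pending messages

    def register(self, peer: str, advertised_ip: str, observed_ip: str) -> bool:
        """Compare the IP in the peerIdentity comprotocols field with the
        source IP of the TCP connection. A mismatch implies NAT, so the
        mediator creates an INBOX the peer must poll. Returns True if NAT'd."""
        behind_nat = advertised_ip != observed_ip
        if behind_nat:
            self.inboxes.setdefault(peer, deque())
        return behind_nat

    def deliver(self, peer: str, message: str) -> None:
        """Store a message for a hosted peer that cannot be contacted."""
        self.inboxes[peer].append(message)

    def poll(self, peer: str, max_messages: int = 10) -> list:
        """Called by the peer at a reasonable frequency to drain its INBOX."""
        box = self.inboxes.get(peer, deque())
        return [box.popleft() for _ in range(min(max_messages, len(box)))]


m0 = Mediator("M0")
# peer_bill advertises its private address, but M0 sees the NAT's public one.
nat = m0.register("peer_bill", "10.0.1.3", "203.0.113.7")
m0.deliver("peer_bill", "connect-ack from peer_rita")
print(nat, m0.poll("peer_bill"))  # True ['connect-ack from peer_rita']
```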
Recall that ACP/ONP is a reliable channel protocol, and these messages are guaranteed to be delivered barring a catastrophic partitioning of the real network. Now we have the following connection tables on peer_bill and peer_rita describing the mappings between the overlay network and the real network.
Peer    ONP local               ONP remote              TCP local            TCP remote
Bill    bill.338888881.CC1      rita.mobileAgent.CC1    10.0.1.3.3000        129.14.7.25.3000
Rita    rita.mobileAgent.CC1    bill.338888881.CC1      152.70.8.109.3000    152.70.96.11.8080
In the table above, the overlay network connectivity is exactly the same as in case 1. This is what makes P2P a simplifying technology for applications. The real transport side, on the other hand, differs. Bill is behind a NAT and has received his IP address from the NAT device using DHCP. The NAT device's external IP address is different and hidden from Bill. Bill's remote address is the IP address of M0, 129.14.7.25.3000. Rita is behind a firewall and uses an HTTP proxy to contact M1; 152.70.96.11.8080 is the address of this proxy. This completes the discussion of the functional mapping that takes place so that two peers can communicate on the P2P overlay network. Notice that all the components and documents we have described up to here are required. The description of the new routing-information fields added to the peerIdentity and virtualPort documents will be completed in the next chapter. There we describe the protocols that moderate the behavior of peers and mediators, as well as the mediator document.
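The observation that the overlay half of the connection table never changes, while the real-transport half depends on NAT, firewalls, and mediators, can be captured in a small Python data model. The `Connection` type, its field names, and the `dispatch` helper are hypothetical. The addresses are Bill's rows from the tables above, written in the same address.port form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One row of a connection table: the ONP (overlay) socket pair plus
    the real TCP endpoints that currently carry it."""
    onp_local: str
    onp_remote: str
    tcp_local: str
    tcp_remote: str


# Case 1: same subnet, direct TCP between the peers (Bill's row).
direct = Connection("bill.338888881.CC1", "rita.mobileAgent.CC1",
                    "152.70.8.108.3000", "152.70.8.109.3000")

# Case 2: Bill behind NAT, connected to mediator M0 (Bill's row).
via_mediators = Connection("bill.338888881.CC1", "rita.mobileAgent.CC1",
                           "10.0.1.3.3000", "129.14.7.25.3000")


def overlay_view(c: Connection) -> tuple:
    """What the application sees: only the virtual sockets."""
    return (c.onp_local, c.onp_remote)


def dispatch(table: list, tcp_local: str, tcp_remote: str) -> tuple:
    """Map data arriving on a real TCP connection to its overlay channel."""
    for c in table:
        if (c.tcp_local, c.tcp_remote) == (tcp_local, tcp_remote):
            return overlay_view(c)
    raise KeyError("no overlay channel bound to this TCP connection")


# Identical application-level view; only the real-transport binding differs.
print(overlay_view(direct) == overlay_view(via_mediators))  # True
```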
CHAPTER 4 BASIC BEHAVIOR OF PEERS ON A P2P SYSTEM
We now have a good understanding of the definition of a P2P overlay network, its peer nodes, and how this network maps onto the underlying real transports. On the other hand, we have not yet provided the details necessary to ensure that peer nodes can dependably communicate with one another. Here "dependability" must be understood in the context of the P2P network's place on the P2P spectrum, as discussed in chapter 1, section 3. There is no P2P engineering way around the inherent ad hoc behavior of some P2P overlay networks. In any case, these details are called protocols: the rules that peer nodes must follow so that they can interact predictably and meaningfully. P2P overlay network protocols have both syntax and semantics. The syntax defines the form of the protocol data sent and/or received, and the semantics define the required behavior and the interpretation of this data. These protocols are programming-language independent, and if they are well defined, correct implementations will be interoperable. Network protocols are no different in spirit from the familiar rules of the road we must all know and obey to drive on the highways of our respective countries. Although there are no international standards for such rules, they share a reasonably familiar core behavior. So a driver in unfamiliar territory can still be well behaved and probably avoid a traffic fine. Likewise, we are all familiar with the use of and need for network protocols even if we have never read their specifications, for example, the RFCs for IP and TCP. This chapter is all about P2P overlay network protocols. Where the P2P overlay network provides the arteries and veins for overlay network transport, the protocols permit them to be filled with their life's blood and regulate its flow.
4.1 The P2P Overlay Network Protocols
Recall from chapter 3, section 3, that beneath the application level in the Overlay Network Stack is the transport level. The transport level has two protocols: the Universal Message Protocol (UMP) and the Application Communication Protocol (ACP). At the bottom is the Overlay Network Protocol (ONP). The ONP specifies both the syntax of our message format and the semantics associated with each field this format defines. It is the IP equivalent on the overlay network. Thus, we have ACP/ONP and UMP/ONP.
4.1.1 The Overlay Network Protocol
We assume that the real transports bound to the overlay network use protocols with the strength of IPv4 or IPv6 to manage packet traffic on the underlying infrastructure. Real transport issues are therefore of no concern in this section. Rather, the ONP's goal is the successful delivery of a message between two peers on the overlay network. This delivery requires a destination overlay network address. As with IP, we must also supply the source overlay network address. Communication is ultimately between two virtual sockets, and the information defining both sockets must be included in the message. There are many reasons for always including the source address, even if the communication does not require a response. For example, one needs to know the source of a message to discourage denial-of-service attacks, to maintain audit trails, to do billing, and to authenticate the source of a message for security reasons. Recall that a peerIdentity may contain cryptographically based information that permits a receiver to authenticate the source. Moreover, optional routing information can be added to simplify both routing to the destination and the destination peer node's task of responding to the source. Please note that we are not specifying an implementation, and thus we do not give real values to any of the fields. A message implementation can take many forms, e.g., XML, binary, (name, value) pairs, etcetera. But, for reasons of performance, we do suggest that the message be in a binary format. The current fashion of expressing everything, including the "kitchen sink," in XML often has its merits, but one takes a performance hit to analyze an XML document. ONP messages contain information that may be used, modified, and possibly extended along a message's route to the destination peerNode. It is therefore imperative that the ONP fields be readily accessible without invoking an XML parser. On the other hand, the data payload is certainly a candidate for XML, or any other format the application writer may prefer. The ONP message comprises the following fields:
ONP Header:
1. Version - The ONP version number. This ensures a controlled evolution of the protocol.
2. Length - The total length in bytes of the data following the ONP header.
3. Lifetime - How long a message can live. There are multiple possible implementations for this field. For example, it can be the maximum number of mediators this message can visit before being dropped. Each mediator decrements the value, and when it reaches zero, the message is discarded. Another possibility is the maximum time spent at mediators in transit. The field initially holds the maximum value in seconds and is decreased at each mediator just before the message is forwarded to the next one. When the value reaches zero, the message is discarded. If the lifetime expires, a nondelivery message may be returned to the source peer node.
4. Source Address - PeerIdentity
5. Destination Address - PeerIdentity
6. Connected Community Identity - Communication is restricted to a single connected community.
Optional header extensions:
1. Multicast Group UUID - This is for multicast UMP messages only. In this case, the initial destination address is set to this value if the multicast message is sent to a mediator for propagation. See section 4.1.4 for the details.
2. Destination routing information - An ordered list, M1, M2, ..., Ms, of the PeerIdentities of the mediators that define the most recent route to the destination peer node.¹ The destination path order is M1, M2, ..., Ms.
3. Source routing information - An ordered list, M1, M2, ..., Mt, of the PeerIdentities of the mediators that define the most recent route for the return path to the source peer node.

Given the possible volatility of some P2P overlay networks, the information in the routing extensions may be incorrect. It is a best guess based on previous communication and on routing information established using the mediator protocols discussed in section 4.2. This information can reduce network bandwidth because it is reusable, and the receipt of messages keeps it reasonably current. Routing on P2P overlay networks can take many forms, which is why the routing information extensions are optional. P2P overlay networks with a small number of mediators might have preconfigured routes and try to determine a route just before sending each message. Very large P2P overlay networks may use more dynamic, mediator-assisted routing mechanisms. Routing is discussed in section 4.2. Finally, given the future possibility of billions of peer nodes and mediators, a flat space will not be manageable. The routing information may be based on a hierarchical tree of mediators with a structure like that of IPv6 aggregators (see chapter 3, section 3.2.2.1). Figure 4-1 summarizes the ONP header fields. After the ONP headers we have the data, formatted as follows:
1. Data Protocol - ACP or UMP
2. Data Length - Number of bytes of data

¹ In section 4.5 we add more precision to this definition by imposing a hierarchical structure on the P2P Overlay Network.
Figure 4-1. ONP Header
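Since the text recommends a binary encoding for the ONP header, here is one possible fixed-layout encoding in Python using the standard `struct` and `uuid` modules. The field widths are assumptions made for illustration, because the book deliberately does not fix them. The layout is a 1-byte version, a 4-byte payload length, a 1-byte hop-count lifetime, and 16-byte UUIDs for the source PeerIdentity, the destination PeerIdentity, and the connected community. Optional routing extensions are omitted. The hypothetical `forward` helper shows the hop-count reading of Lifetime. A mediator can update that field at a fixed offset without parsing anything else.

```python
import struct
import uuid

# Network byte order, no padding: version (B), length (I), lifetime hops (B),
# then source, destination and connected-community UUIDs (16s each).
ONP_HEADER = struct.Struct("!BIB16s16s16s")
LIFETIME_OFFSET = 5  # after version (1 byte) and length (4 bytes)


def pack_header(version, length, lifetime, src, dst, cc) -> bytes:
    return ONP_HEADER.pack(version, length, lifetime,
                           src.bytes, dst.bytes, cc.bytes)


def unpack_header(data: bytes) -> dict:
    version, length, lifetime, src, dst, cc = ONP_HEADER.unpack_from(data)
    return {"version": version, "length": length, "lifetime": lifetime,
            "src": uuid.UUID(bytes=src), "dst": uuid.UUID(bytes=dst),
            "cc": uuid.UUID(bytes=cc)}


def forward(data: bytes):
    """Mediator step: decrement the hop-count lifetime; return None (drop)
    when it would reach zero, else the updated message bytes."""
    hops = data[LIFETIME_OFFSET]
    if hops <= 1:
        return None  # expired: discard (optionally send a nondelivery notice)
    out = bytearray(data)
    out[LIFETIME_OFFSET] = hops - 1
    return bytes(out)


src, dst, cc = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
hdr = pack_header(1, 722, 2, src, dst, cc)
hop1 = forward(hdr)    # lifetime 2 -> 1
hop2 = forward(hop1)   # lifetime would reach 0 -> dropped
print(ONP_HEADER.size, unpack_header(hop1)["lifetime"], hop2)  # 54 1 None
```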
The data that follows these headers is protocol dependent and is discussed in the next section.
4.1.2 Universal Message Protocol
The Universal Message Protocol (UMP) is our UDP analogue. Like UDP, it is unreliable and connectionless. One usually uses UMP for sending small amounts of data, or for simple queries. One might implement an interactive game, or a "ping-like" protocol, using UMP. The UMP format is as follows:
Figure 4-2. UMP Format
Let's assume we have a UMP status protocol. A peer sends a status request and receives a response; it is a simple UMP message exchange. Let the following be true (see chapter 3, section 3.3.2 for a definition of virtual socket and the PeerIdentity Document):
1. The local PeerIdentity document has been created and published.
2. The remote virtual socket, peerStatus, has been discovered.
Then, we send and receive the UMP messages using the following 4PL code:

// Create our own virtual socket: our PeerIdentity.randomVirtualPort
VirtualSocket loc_vs = new VirtualSocket();
// Bind the local and remote virtualSockets into a local channel
// identifier
VirtualChannel umpChan = new VirtualChannel(loc_vs, peerStatus, UMP);
// Send the status request message. Note that sending to the peerStatus
// virtualSocket is all that is required. It listens for peerStatus
// commands.
ump_send(umpChan, NULL);
// Receive a status message on the local channel.
// The status information is in the data portion of the UMPMessage.
UMPMessage m = ump_receive(umpChan);
// Report status
IF (m NEQ NULL) THEN report_status(m);
// Release the channel and the local virtual socket
close(umpChan);
delete(loc_vs);

For a second example, suppose we have a go game application. The moves are simple position descriptions on a 19 by 19 board. At two bytes per position, returning all 361 positions requires 722 bytes. Consequently, UMP is appropriate for playing the game. We require a listening virtual socket to initiate games, and then a channel for playing a game between two individuals. The following 4PL describes how to set up a game and sketches how it is played:

Document goPort = new Document(VIRTUALPORT, pidDoc, "goListener", unicast);
// Now we create the Virtual Socket
VirtualSocket goListen = new VirtualSocket(goPort);
// We also publish the VirtualPort document
publish(goPort);
// Now we listen for incoming "let's play go" messages.
// The listen code only detects input on the goListen virtual socket.
// It is protocol independent, although UMP messages will be
// received on this channel.
VirtualChannel playGo = listen(goListen, UMP);
// Here we remove the waiting message from an implicit input queue
// on the channel with a UMP receive.
UMPMessage newGame = ump_receive(playGo);
// We next accept the challenge to play a new game.
// This requires the creation of a UMP channel.
// Extract the source socket given the type, field, and UMP message
VirtualSocket source_vs = extractSocket(UMP, SOURCE, newGame);
// Create a local virtual socket with a random virtualPortID
VirtualSocket loc_vs = new VirtualSocket();
// Next, we again need a virtual channel that locally binds the local
// and remote sockets
VirtualChannel gameChan = new VirtualChannel(loc_vs, source_vs, UMP);
// We now have our game loop. The receipt of the first move
// is an acceptance to play Go. Underlying the code is a UI where
// the player makes a move. We can only hint at that code.
// Note that moves must be sequenced and acknowledged.
GoBoard gb = showGoBoard();
integer oldSeqNumber = 0;
loop () BEGIN "play";
    // Get the next move from the user interface
    GoMove myMove = acceptMove(gb);
    GoMove opponentMove = null;
    // Send the next move as data
    ump_send(gameChan, myMove);
    // Wait for a move, and on a timer resend the previous move
    loop () BEGIN "wait for reply";
        // ump_receive returns null if there is a time out
        opponentMove = ump_receive(gameChan, timeout);
        IF (opponentMove EQUAL null OR
            opponentMove.seqNumber NOT EQUAL oldSeqNumber + 1) THEN
        BEGIN "resend";
            ump_send(gameChan, myMove);
            continue;
        END "resend"
        ELSE break;
    END "wait for reply";
    // Keep us in sequence
    oldSeqNumber = opponentMove.seqNumber;
    // We have the opponent's move; display it
    displayMove(opponentMove);
END "play";
close(gameChan);
delete(loc_vs);
delete(goListen);
delete(goPort);

As the examples above show, UMP is intended for simple overlay network communication. Game playing requires some reliability, for example, retransmitting the last move, but this can be implemented so that it does not burden real network traffic. On the other hand, if one is moving large amounts of data, as a streaming video application would, UMP is inappropriate. The Application Communication Protocol, described next, provides that kind of functionality. It is an extremely important network protocol that must be included as part of any P2P system. Once ACP is written and debugged, it is available to all services and applications.
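The resend-until-answered loop in the go example relies on sequence numbers to ignore stale or duplicated datagrams. Below is a minimal Python simulation of that idea. It is not UMP itself, only the sequencing logic the 4PL sketches. It assumes a hypothetical lossy channel that drops a fixed, scripted set of transmissions, so the run is deterministic. It also assumes an opponent that answers each new move once and re-sends its last reply when it sees a duplicate.

```python
import itertools


class LossyChannel:
    """Hypothetical datagram channel: drops the transmissions whose ordinal
    (1, 2, 3, ..., counted in both directions) is listed in `drop`."""

    def __init__(self, drop):
        self.drop = set(drop)
        self.counter = itertools.count(1)

    def send(self, msg):
        return None if next(self.counter) in self.drop else msg


class Opponent:
    """Answers move k with its own move k; re-answers duplicates."""

    def __init__(self):
        self.answered = 0
        self.last_reply = None

    def on_move(self, seq, move):
        if seq == self.answered + 1:          # new move: make a reply
            self.answered = seq
            self.last_reply = (seq, f"reply-to-{move}")
        return self.last_reply               # duplicate: resend last reply


def play(my_moves, channel, opponent, max_tries=10):
    """Send each move, resending until a reply with the next sequence
    number arrives (None models a timeout). Returns (replies, sends)."""
    old_seq, log, sends = 0, [], 0
    for seq, move in enumerate(my_moves, start=1):
        for _ in range(max_tries):
            sends += 1
            delivered = channel.send((seq, move))
            reply = None
            if delivered is not None:
                reply = channel.send(opponent.on_move(*delivered))
            if reply is not None and reply[0] == old_seq + 1:
                break
        else:
            raise TimeoutError("opponent unreachable")
        old_seq = reply[0]                    # keep us in sequence
        log.append(reply[1])
    return log, sends


log, sends = play(["D4", "Q16"], LossyChannel(drop={1, 4}), Opponent())
print(log, sends)  # ['reply-to-D4', 'reply-to-Q16'] 4
```

Because the opponent recognizes a duplicate by its sequence number, a lost reply causes a resend but never a second move being applied.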
4.1.3 Application Communication Protocol
The Application Communication Protocol (ACP) is our TCP analogue. We assume that a reliable communication protocol underlies our P2P overlay network, and we are not going to reinvent TCP/IP. However, mediators are required to mitigate the problems of NAT, firewalls, and differing physical layer protocols, and so ONP messages can be dropped en route to their ultimate destination. TCP/IP, for example, can only guarantee the delivery of a message between two peer nodes if they have TCP/IP connectivity. If one of these nodes is a mediator, the mediator will have to buffer the message. If the mediator then runs out of input buffer space, messages will be dropped. Also, one may be doing video streaming, transport layer security, etc., where order is important: messages must be received in the order in which they are sent. Mediators, like routers, cannot guarantee the order of delivery. Thus, we must have our own reliable communication protocol on the P2P overlay network, and this is what ACP accomplishes. A reliable communication protocol like ACP has several basic requirements, some of which we discussed just above:
1. We require a way to initiate sending messages on a channel. We call this starting a channel session. Similarly, we must close a session. Consequently, we need a unique channel session ID.
2. Given that a channel session can be started and must also be gracefully closed, either the sender both starts and closes a session, or the session is closed by the recipient. In the latter case, we say that the session is aborted.
3. Messages must arrive in the same order in which they are sent. Therefore, we need a sequence numbering scheme so that each message has a unique sequence number associated with it. These numbers must be monotonically increasing.
4. Given sequenced messages, we need a way for the recipient to acknowledge the receipt of a message, or of several messages. To minimize bandwidth, we use selective acknowledgement (SACK), as is done in TCP. Here, the receiver acknowledges all the messages it has received, even if some may be missing. Also, if an appropriate amount of time expires without receiving messages, a duplicate SACK is sent. It tells the sender that the receiver is still alive and is not receiving data. As discussed below, this may be a sign that the P2P overlay network is congested².
These basic requirements enable communication between two peers in normal operation. Figure 4-3 highlights ACP communication without any interference.
Figure 4-3. Basic Communication Behavior
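A receiver-side SACK can be computed from the set of sequence numbers received so far. The book does not define ACP's SACK format here, so the sketch below assumes one for illustration, in the spirit of TCP's SACK blocks. It encodes a SACK as the highest in-order sequence number plus the ranges received beyond the first hole. The function names are hypothetical.

```python
def make_sack(received: set, first_seq: int = 1):
    """Return (cumulative_ack, blocks). cumulative_ack is the highest n such
    that first_seq..n were all received (first_seq - 1 if none); blocks are
    inclusive (start, end) ranges received above a hole."""
    cum = first_seq - 1
    while cum + 1 in received:
        cum += 1
    blocks = []
    for seq in sorted(s for s in received if s > cum + 1):
        if blocks and seq == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], seq)   # extend the current range
        else:
            blocks.append((seq, seq))           # start a new range
    return cum, blocks


def is_duplicate(sack, previous_sack) -> bool:
    """A SACK identical to the previous one tells the sender that the
    receiver is alive but has received nothing new."""
    return previous_sack is not None and sack == previous_sack


print(make_sack({1, 2, 3, 5, 6, 9}))  # (3, [(5, 6), (9, 9)])
```

Here messages 4, 7 and 8 are the holes the sender must eventually retransmit.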
² The rule that reception of a duplicate SACK implies congestion applies only to the wired Internet physical layer. Wireless links are noisy, with signal-to-noise ratios that at times cause data to be lost. In this case, the rules one applies are often the opposite, since the problem is loss rather than congestion. We discuss this in chapter 6.
4.1.3.1 Aborting an ACP Session
When some of the messages are missing, we say there is a hole in the sequence numbers. Alternatively, before the channel session is closed, the recipient may have received and acknowledged a complete sequence of messages while the sender is still attempting to send more messages that are not getting through, i.e., the P2P overlay network is congested. Thus, on a reasonable timer (what is meant by reasonable is discussed below), the recipient sends a duplicate SACK, which tells the sender both to back off and that the recipient is still alive. It may be that each peer is using a different mediator route, and the recipient's route is viable while the sender's is not. Or a mediator's buffer space may be full, and messages are being dropped. This is expected on a P2P overlay network because misbehavior is normal. Things may slow down, and in fact the recipient may abort the channel session after a session time-out has expired. In Figure 4-4, an ACP session starts normally, but partway through, sent data is no longer being received. Let's assume that two different routes are being used, and the sender-to-recipient route is no longer functional. The recipient sends a duplicate SACK. The sender continues to send, and it retransmits unacknowledged data after receiving the duplicate SACK. Finally, the recipient aborts the session because a session time-out expired. In Figure 4-4, M_n denotes message number n. Also, if a channel session has been closed and the recipient receives messages on the closed session, it discards them. They are delayed retransmissions. Closed is closed. We do not need to perturb the network with unnecessary traffic.
Figure 4-4. Recipient aborts session
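The recipient's timer behavior described above can be sketched as a small state machine. It covers three behaviors: a duplicate SACK on an idle timer, an abort when the session time-out expires, and silent discard of messages for a closed session. Time is passed in explicitly so the example is deterministic. The timer values and the class and method names are hypothetical, since the text defers what "reasonable" timer values are.

```python
class RecipientSession:
    """Hypothetical ACP recipient-side timers for one channel session."""

    def __init__(self, session_id, now, idle_timeout=2.0, session_timeout=10.0):
        self.session_id = session_id
        self.idle_timeout = idle_timeout          # assumed value
        self.session_timeout = session_timeout    # assumed value
        self.last_data = now      # time the last message arrived
        self.last_sack = now      # time the last SACK or duplicate was sent
        self.state = "open"
        self.received = set()

    def on_message(self, seq, now):
        if self.state != "open":
            return "discard"      # closed is closed: a delayed retransmission
        self.received.add(seq)
        self.last_data = self.last_sack = now
        return "sack"

    def on_tick(self, now):
        """Called periodically; returns the action the recipient takes."""
        if self.state != "open":
            return None
        if now - self.last_data >= self.session_timeout:
            self.state = "aborted"
            return "abort"
        if now - self.last_sack >= self.idle_timeout:
            self.last_sack = now
            return "duplicate-sack"   # "still alive, not receiving"
        return None


s = RecipientSession("chan-1", now=0.0)
actions = [s.on_message(1, now=0.5)]
actions += [s.on_tick(t) for t in (1.0, 2.5, 4.5, 10.5)]
actions.append(s.on_message(2, now=11.0))
print(actions)
# ['sack', None, 'duplicate-sack', 'duplicate-sack', 'abort', 'discard']
```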
4.1.3.2 Maximum Message Space on the P2P Overlay Network
Because we are sending messages on an overlay network, we need to consider two things. The first is the impact of message storage on the destination peer. The second is the message buffering capacity of mediators if the two peers do not have direct communication. Each peer has an input buffering capacity in bytes. During session initiation, the receiving peer sends its Peer Maximum Message Space (PMMS) to the peer initiating the channel session. The PMMS is the maximum number of unacknowledged bytes that can be sent. The sender may use it for one message of PMMS bytes, or for multiple messages whose total size is at most the PMMS, without waiting for an acknowledgement. The PMMS is a per-channel maximum, and a peer may have multiple open channels. In the second case, a mediator is a store-and-forward service. Suppose peer1 sends a message to peer2 via one or more mediators. The final mediator hosts the destination peer and must store and forward the messages for all active channels. This mediator knows peer2's PMMS and will discard messages from the sender that exceed a channel's PMMS. The final destination mediator has the option of acknowledging the messages it receives before sending them to peer2. It can then use the SACKs it receives from peer2 as polling requests for more messages. This minimizes the SACK traffic between peer1 and peer2 and is the preferred behavior. Otherwise, the mediator is an overlay network proxy between peer1 and peer2. The sender may continue to exceed the receiver's PMMS, or the mediator may stop receiving SACKs from its hosted peer. In either case, the mediator should close both ends of the connection, since either the destination peer has crashed or the sending peer is not obeying the overlay network protocols. See section 4.1.3.3 for the correct behavior when the latter situation arises. To protect its storage capacity, the mediator uses the peer-mediator communication protocol to send its maximum allowable message buffering size to the peers it hosts. The Mediator Maximum Message Space (MMMS) is the same for all mediators. It is either a predefined, global constant for the P2P Overlay Network mediators, or it is negotiated at mediator boot time before any peerNode connections are permitted. This is discussed in section 4.2. In large P2P Overlay Networks, the MMMS is likely to be large, like a mailbox. Still, it functions much like the Maximum Transmission Unit (MTU) does for IP. Mediators do not support message fragmentation, although applications may implement it. The Maximum Message Space (MMS) is the minimum of the MMMS and the PMMS when a mediator is required.
If the MMMS is less than the PMMS, then it is not wise to have more than one active channel between two peers using a mediator for communication. If no mediator is used, then the MMS is identical to the PMMS. This value is communicated to the application initiating the transmission before it sends its first message. Any attempt to send a message that exceeds the MMS must be rejected by the sending peer's ACP software with an appropriate error message. Why? Such a message would necessarily be rejected by the mediator or destination peerNode anyway, and we do not want to waste bandwidth getting it there. The MMMS is an attempt to minimize the number of messages dropped by mediators. The idea is that a mediator guarantees storage for the peers it hosts while also accounting for the transient buffer space it must allocate for routing. If the latter is exhausted, then a message may indeed be dropped rather than routed. We realize that multiple peers may in fact be communicating with a single peerNode by means of its mediator. In this case, the hosted peerNode's space can be exceeded. This is treated just like a mailbox that has reached its allocated quota: messages are dropped, and mediator-generated signals are sent to the senders. ACP will force these transmissions to slow down until more space is available, or until they are aborted. Figure 4-5 gives an example where Peer_1 sends messages to Peer_4, and the messages are stored in Peer_4's data storage space on its mediator. The mediator sends SACKs to Peer_1 for all messages sent to Peer_4. When Peer_4 sends a SACK to the mediator, the mediator sends any buffered messages not yet received by Peer_4, plus any newly received messages, up to Peer_4's PMMS. This frees data from the mediator's storage space. This method is an alternative to the mediator forwarding all messages, including those that Peer_4 may discard. Such discards can occur because Peer_1 retransmits messages not yet SACKed by Peer_4 as soon as the retransmission timeout (section 4.1.3.3) expires, and that timeout can expire while a SACK sent by Peer_4 is in transit. To minimize the impact of transmitted messages on the P2P Overlay Network, we have several more things to consider. We need to limit the number of messages that can be sent at a time. The transmission window is the maximum number of bytes that can be sent without an acknowledgement. Its maximum value is the MMS. But if a mediator is used, the MMS reflects the size of a shared input buffer on the mediator that hosts the peerNode. It is therefore sensible for applications to send large messages in multiple pieces. This is only a very general guideline.
For example, if MMMS = 10 megabytes and PMMS = 500K bytes, then sending large messages in 100K byte chunks is a reasonable rule of thumb, since at least 5 messages can then be sent without an acknowledgement. Keep in mind that if a security protocol like Transport Layer Security (TLS) is used over TCP/IP, then TLS breaks all application data into records of at most 16K bytes. For better performance on the P2P Overlay Network, one can implement ACP end-to-end security in the spirit of S/MIME (Ramsey, B., Turner, S., 2010). Then all ACP data is encrypted, and TLS over TCP/IP is not necessary. In this case ACP will mediate the overall behavior of the P2P Overlay Network message traffic. Good overlay network behavior is rewarded with good throughput; bad behavior gets the opposite treatment.
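The sizing rules above can be sketched in Python. This is a minimal illustration, not the book's implementation. It assumes that the MMS is the smaller of PMMS and MMMS when a mediator is used; the text only states that the MMS equals the PMMS when no mediator is involved. All function names are hypothetical.

```python
from typing import Iterator, Optional


def effective_mms(pmms: int, mmms: Optional[int]) -> int:
    """Return the MMS a sender must respect.

    Without a mediator (mmms is None) the MMS is the PMMS, as the text states.
    With a mediator we ASSUME the smaller of the two limits applies.
    """
    return pmms if mmms is None else min(pmms, mmms)


def check_send(size: int, mms: int) -> None:
    """Reject an oversized message locally instead of wasting bandwidth."""
    if size > mms:
        raise ValueError(f"message of {size} bytes exceeds MMS of {mms} bytes")


def chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Split a large application message into pieces of at most chunk_size bytes."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def messages_per_window(mms: int, chunk_size: int) -> int:
    """Number of full chunks that fit in one transmission window (at most MMS bytes)."""
    return mms // chunk_size


if __name__ == "__main__":
    mms = effective_mms(pmms=500_000, mmms=10_000_000)
    print(mms, messages_per_window(mms, 100_000))  # 500000 5
```

With the numbers from the example, five 100K byte chunks fill a 500K byte window. Rejecting oversized messages in `check_send` mirrors the requirement that the sender's ACP software refuse them before they reach a mediator.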
Basic Behavior of Peers on a P2P System
Figure 4-5. PMMS and MMMS
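The mediator-side behavior illustrated in Figure 4-5 can be sketched as follows. This is a simplified, hypothetical model with a single hosted peer and an in-memory store. A returned string stands in for the SACK or the mediator-generated quota signal. The class and signal names are illustrative, not taken from the book.

```python
from typing import Dict, Iterable, List, Tuple


class HostedInbox:
    """Storage a mediator reserves for one hosted peerNode (e.g. Peer_4).

    quota: bytes reserved for this peer on the mediator (its mailbox quota)
    pmms:  the hosted peer's maximum message space; caps each delivery batch
    """

    def __init__(self, quota: int, pmms: int) -> None:
        self.quota = quota
        self.pmms = pmms
        self.stored: Dict[int, bytes] = {}  # sequence number -> message data

    @property
    def used(self) -> int:
        return sum(len(data) for data in self.stored.values())

    def store(self, seq: int, data: bytes) -> str:
        """Accept a message from a remote sender (e.g. Peer_1).

        Returns "SACK" when the mediator acknowledges on Peer_4's behalf, or
        "QUOTA_SIGNAL" (hypothetical name) when the quota would be exceeded
        and the message is dropped.
        """
        if seq in self.stored:
            return "SACK"  # a retransmission of a message we already hold
        if self.used + len(data) > self.quota:
            return "QUOTA_SIGNAL"
        self.stored[seq] = data
        return "SACK"

    def on_peer_sack(self, acked: Iterable[int]) -> List[Tuple[int, bytes]]:
        """Free acknowledged messages, then return the next batch for the peer.

        The batch holds the lowest-numbered unacknowledged messages whose
        total size does not exceed the peer's PMMS.
        """
        for seq in acked:
            self.stored.pop(seq, None)
        batch: List[Tuple[int, bytes]] = []
        total = 0
        for seq in sorted(self.stored):
            size = len(self.stored[seq])
            if total + size > self.pmms:
                break
            batch.append((seq, self.stored[seq]))
            total += size
        return batch
```

Here a SACK from Peer_4 doubles as its request for more data. Acknowledged messages are freed, and the next PMMS-bounded batch is released. This avoids pushing every message, including retransmitted duplicates, down to the hosted peer.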
4.1.3.3 Acknowledgement and Retransmission of ACP Messages
As mentioned above, an application cannot send a message whose byte size exceeds the MMS. The transmission window decreases as the sender fills it with messages and places these sent messages on the retransmission queue to await acknowledgment from the recipient, as shown in Figure 4-6. Once the transmission window is filled, that is, when the size of the next message to be sent exceeds the transmission window, no new messages may be sent. Retransmission can be triggered by three conditions:
1. The receipt of a SACK.
a) This removes the acknowledged messages from the retransmission queue.
b) If the retransmission queue is non-empty, the sender retransmits unacknowledged, sent messages under the constraints discussed below.
2. The receipt of a duplicate SACK. This acknowledges old messages that are no longer in the retransmission queue. Apply (1b) above.
3. An idle timer expires, and the sender has not received a SACK. Again, apply (1b) above.
Figure 4-6. Sending a Message
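The window accounting above can be modelled in a few lines of Python. This sketch assumes byte-counted windows and in-memory bookkeeping only. No messages are actually transmitted, and the class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, List, Optional


class AcpSender:
    """Byte-based transmission window with a retransmission queue (sketch)."""

    def __init__(self, mms: int, first_seq: int = 1) -> None:
        self.mms = mms
        self.next_seq = first_seq
        # sequence number -> message size, in sending order
        self.retransmission_queue: "OrderedDict[int, int]" = OrderedDict()

    @property
    def window(self) -> int:
        """Bytes still available: MMS minus unacknowledged bytes in flight."""
        return self.mms - sum(self.retransmission_queue.values())

    def send(self, size: int) -> Optional[int]:
        """Queue a new message and return its sequence number.

        Returns None when the window is full, that is, when the next message
        would exceed it.
        """
        if size > self.mms:
            raise ValueError("message exceeds MMS")  # rejected locally
        if size > self.window:
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.retransmission_queue[seq] = size
        return seq

    def on_sack(self, acked: Iterable[int]) -> List[int]:
        """Condition 1: drop acknowledged messages (1a) and return the
        unacknowledged sequence numbers that remain candidates for (1b)."""
        for seq in acked:
            self.retransmission_queue.pop(seq, None)
        return list(self.retransmission_queue)
```

Because retransmitted messages stay in the queue until acknowledged, retransmitting them does not shrink `window` a second time. This matches the accounting described in the next paragraph.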
The number of messages in the retransmission queue that can be retransmitted is controlled by the size of the transmission window. Retransmitting messages does not reduce the transmission window. Either the messages were already accounted for when they were initially sent, or, in the case marked (*) in Figure 4-6, the retransmission window discussed below assures that the transmission window is not exceeded. Still, under certain circumstances one does not retransmit all the messages that the transmission window permits. The P2P overlay network may be congested, and a peer does not wish to exacerbate this situation. How do we know there is congestion? Again, the typical signal is the receipt of a duplicate SACK. Recall that this tells us two things: first, that the receiving end is still alive, and second, that the sender’s messages are not getting through. To manage congestion, the sender keeps a retransmission window, initially set to the size of the transmission window. When consecutive duplicate SACKs arrive, the sender halves the size of the retransmission window each time until it finally reaches 0, as shown in Figure 4-8. The receipt of a SACK that
acknowledges messages doubles the size of the retransmission window, although it never exceeds the transmission window size. What to do when every message on the retransmission queue is larger than the retransmission window is discussed below. After a sufficiently long period during which the recipient has not sent a SACK, the sender has three options. First, the sender may close the session. Second, since we do not want to retransmit a megabyte-sized message, which would seriously impact the performance and capacity of the P2P Overlay Network, an unacknowledged message can be retransmitted only if it is smaller than 32K bytes. In this case, we search the retransmission queue for the message with the minimum sequence number that meets the 32K criterion. We want to fill holes in the recipient’s input queue so that messages can be passed up to the application; thus, sending any message is better than sending none. Figure 4-9 shows this option. Third, if every message in the retransmission queue exceeds either 32K bytes or the retransmission window, then a “Peer UP” Signal (PUPS) is sent on the channel. These signals are sequenced, and only the most recent PUPS must be kept on the mediator hosting the destination peer. They are out-of-band, and when a hosted peerNode requests data, PUPS have priority over data. This is a peer-to-peer signal; either a SACK or an ABORT is a correct response.
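The retransmission window rules can be sketched as follows. The text halves the window on each duplicate SACK and doubles it on each acknowledging SACK. However, doubling 0 stays 0, so this sketch adds a hypothetical `restart_size` (32K bytes here) to recover from a collapsed window. That value is an assumption, not something the text specifies.

```python
from typing import Dict, List


class RetransmissionWindow:
    """Congestion control for retransmissions (sketch).

    restart_size is an ASSUMPTION: the text halves the window down to 0 and
    doubles it on each SACK, but doubling 0 stays 0, so we restart from this
    value once the window has collapsed.
    """

    def __init__(self, transmission_window: int, restart_size: int = 32 * 1024) -> None:
        self.limit = transmission_window
        self.size = transmission_window  # initially equal to the transmission window
        self.restart_size = restart_size

    def on_duplicate_sack(self) -> None:
        self.size //= 2  # repeated halving eventually reaches 0

    def on_sack(self) -> None:
        grown = self.size * 2 if self.size > 0 else self.restart_size
        self.size = min(grown, self.limit)  # never exceeds the transmission window


def select_for_retransmission(queue: Dict[int, int], budget: int) -> List[int]:
    """Pick unacknowledged messages in sequence order while their total size
    fits the retransmission window (queue maps sequence number -> size)."""
    chosen: List[int] = []
    total = 0
    for seq in sorted(queue):
        if total + queue[seq] > budget:
            break
        chosen.append(seq)
        total += queue[seq]
    return chosen
```

Selecting in sequence order favors the oldest holes in the recipient's input queue, which is the same priority the text gives to the 32K retransmission option.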
Figure 4-7. Reception of Normal SACK
Figure 4-8. Retransmission with Duplicate SACK
Figure 4-9. Retransmission Procedure
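The three options available to a sender facing a silent recipient can be sketched as one decision function, plus the mediator rule of keeping only the newest PUPS. In this minimal sketch, `give_up` stands in for whatever policy decides to close the session, and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple

SMALL_MESSAGE_LIMIT = 32 * 1024  # "less than 32K bytes" from the text


def on_prolonged_silence(queue: Dict[int, int], retransmission_window: int,
                         give_up: bool) -> Tuple[str, Optional[int]]:
    """Choose among the three options when no SACK has arrived for a long time.

    queue maps sequence number -> message size. give_up stands in for the
    sender's (policy-specific) decision to close the session.
    """
    if give_up:
        return ("close", None)                      # option 1
    for seq in sorted(queue):                       # lowest sequence number first
        size = queue[seq]
        if size < SMALL_MESSAGE_LIMIT and size <= retransmission_window:
            return ("retransmit", seq)              # option 2: fill the oldest hole
    return ("send_pups", None)                      # option 3: Peer UP Signal


class PupsSlot:
    """Mediator side: keep only the most recent PUPS for a channel."""

    def __init__(self) -> None:
        self.latest: Optional[int] = None

    def offer(self, signal_seq: int) -> None:
        if self.latest is None or signal_seq > self.latest:
            self.latest = signal_seq
```

A megabyte message at the head of the queue is skipped rather than resent. Only when no small message fits does the sender fall back to the PUPS probe.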
To better control the frequency of retransmission, two traditional variables are used: the Round-Trip Time (RTT) and the Retransmission Time-out (RTO). We say traditional because these variables did not first appear in the original TCP RFCs; rather, they preceded that specification. The authors who wrote an implementation of the PARC Universal Packet Byte Stream Protocol (PUP/BSP) in 1978 defined both RTT and RTO. A round-trip time is the number of milliseconds from the time a packet is sent until it is acknowledged. For the P2P Overlay Network we recommend computing the RTT as the running average of the 10 most recent round-trip times, with an initial RTT of 1000 milliseconds. The RTT can be calculated in many ways. We prefer to keep it simple because protocols like TCP, with their own timing algorithms, underlie ACP, so we leave the complexity to those algorithms. The RTO is calculated from the RTT and is always greater than the RTT. When the RTO timer expires, one can retransmit messages while respecting the transmission window size as discussed above. The RTO grows monotonically as a function of the RTT while the recipient does not acknowledge messages. For example, a simple rule is that the initial RTO = 2 x RTT, and it is doubled on each expiry until a SACK is received, up to a maximum of 10 minutes. When a SACK is received for an outstanding message, the RTO is reset to its initial
value, as shown in Figure 4-10. On a purely ad-hoc P2P Overlay Network we recommend a 30-minute channel session abort time-out. If the network is in an enterprise, is well administered, and all the mediators are of sufficient capacity, then a channel session time-out closer to the TCP time-out can be considered. We still prefer to be more forgiving and suggest, as mentioned above, 10 minutes.
Figure 4-10. Retransmission and RTO
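The RTT and RTO rules can be captured in a small timer object. The defaults come from the text: 10 samples, a 1000 ms initial RTT, an initial RTO of 2 x RTT, and a 10-minute ceiling. The class itself is a hypothetical sketch. It does not model the separate channel session abort time-out.

```python
from collections import deque


class RetransmitTimer:
    """RTT as a running average of recent samples; RTO doubles until a SACK.

    Defaults follow the text: 10 samples, 1000 ms initial RTT,
    initial RTO = 2 x RTT, and a 10-minute RTO ceiling.
    """

    def __init__(self, initial_rtt_ms: float = 1000.0, samples: int = 10,
                 max_rto_ms: float = 10 * 60 * 1000) -> None:
        self.initial_rtt_ms = initial_rtt_ms
        self.samples: deque = deque(maxlen=samples)  # oldest sample drops out
        self.max_rto_ms = max_rto_ms
        self.rto_ms = self.initial_rto()

    @property
    def rtt_ms(self) -> float:
        if not self.samples:
            return self.initial_rtt_ms
        return sum(self.samples) / len(self.samples)

    def initial_rto(self) -> float:
        return 2 * self.rtt_ms

    def on_timeout(self) -> None:
        """RTO expired without a SACK: back off, up to the ceiling."""
        self.rto_ms = min(2 * self.rto_ms, self.max_rto_ms)

    def on_sack(self, measured_rtt_ms: float) -> None:
        """A SACK for an outstanding message: record the sample, reset the RTO."""
        self.samples.append(measured_rtt_ms)
        self.rto_ms = self.initial_rto()
```

A fixed-size `deque` gives the 10-sample running average directly. The exponential backoff stops at the 10-minute cap no matter how many time-outs occur.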
4.1.3.4 Closing an ACP Session
When a session has successfully completed, the initiating application will close the channel, and this invokes the ACP close procedure. The closing peerNode sends an ACP close message to the destination
virtualSocket. It then waits for a close acknowledged response using the RTT as a timeout. If this response is not received within the timeout window, then the ACP close is re-sent and the session is closed as if it had received the response.
Figure 4-11. Closing a Session
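The close procedure is short enough to sketch directly. In this minimal Python version, the two callables are hypothetical hooks standing in for the ACP transport. Only the ordering comes from the text: send, wait one RTT, re-send once, then close regardless.

```python
from typing import Callable


def close_acp_session(send_close: Callable[[], None],
                      wait_for_close_ack: Callable[[float], bool],
                      rtt_ms: float) -> bool:
    """Send ACP close, wait one RTT for the acknowledgement, re-send once if it
    does not arrive, then treat the session as closed either way.

    Returns True if the acknowledgement arrived within the first RTT.
    """
    send_close()
    if wait_for_close_ack(rtt_ms):
        return True
    send_close()   # re-send the ACP close ...
    return False   # ... and close as if the response had been received
```

The return value only reports whether the peer confirmed the close. In both cases the caller should release the session's resources.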
4.1.3.5 ACP Messages
Given the above requirements and behavior, we can now define the ACP messages and their header fields. The first ACP message initializes an ACP channel session and has the following headers:
1. ACP message Type - Start ACP session.
2. ACP session identifier - Sending peer generated unique identifier.
3. Source Virtual Port ID - Sending peer unique port identifier.
4. Destination Virtual Port ID - Receiving peer VirtualPort UUID.
The second ACP message is a response to the Start ACP session message:
1. ACP message Type - Acknowledge start ACP session.
2. ACP session identifier - Sending peer generated unique identifier.
3. Source Virtual Port ID - Receiving peer VirtualPort UUID.
4. Destination Virtual Port ID - Sending peer unique port identifier.
5. PMMS - Receiving peer maximum message space.
The third ACP message sends data and has the following headers:
1. ACP message Type - ACP Data Message.
2. ACP session identifier - Sending peer generated unique identifier.
3. Source Virtual Port ID - Sending peer unique port identifier.
4. Destination Virtual Port ID - Receiving peer VirtualPort UUID.
5. Sequence number - Monotonically increasing sequence number starting from i. Given a sequence number j, j >= i, the successor of j is j + 1.
6. Data size in bytes.
7. Message Data.
The fourth ACP message is the selective acknowledgement:
1. ACP message Type - Acknowledge ACP Data Message.
2. ACP session identifier - Sending peer generated unique identifier.
3. Source Virtual Port ID - Receiving peer VirtualPort UUID.
4. Destination Virtual Port ID - Sending peer unique port identifier.
5. Acknowledged sequence numbers - Monotonically increasing, sorted list of the sequence numbers of received data messages. Given two sequence numbers i and j in the list, if i < j, then i precedes j in the list.
The fifth ACP message closes an ACP session:
1. ACP message Type - Close ACP session.
2. ACP session identifier - Sending peer generated unique identifier.
3. Source Virtual Port ID - Sending peer unique port identifier.
4. Destination Virtual Port ID - Receiving peer VirtualPort UUID.
The sixth ACP message is the response to the close ACP session message:
1. ACP message Type - Acknowledge close ACP session.
2. ACP session identifier - Sending peer generated unique identifier.
3. Source Virtual Port ID - Receiving peer VirtualPort UUID.
4. Destination Virtual Port ID - Sending peer unique port identifier.
The seventh ACP message aborts a session:
1. ACP message Type - Abort ACP session.
2. ACP session identifier - Sending peer generated unique identifier.
3. Source Virtual Port ID - Receiving peer VirtualPort UUID.
4. Destination Virtual Port ID - Sending peer unique port identifier.
5. Error message - Optional receiver supplied text error message.
The eighth is an ACP Signal:
1. ACP Signal Type - Peer UP Signal.
2. ACP session identifier - Sending peer generated unique identifier.
3. Signal sequence number - Monotonically increasing non-negative integer.
4. Source Virtual Port ID - Sending peer unique port identifier.
5. Destination Virtual Port ID - Receiving peer VirtualPort UUID.
As with the UMP, we use 4PL to establish ACP communication between two peers. To demonstrate this, we assume that the local peerIdentity document, pidDoc, has been created and published. Under this assumption, an application such as file transfer can use the following 4PL functions for ACP communication:

    // Create the VirtualPort document for the file listener
    Document filePort = new Document(VirtualPort, pidDoc, "fileListener", unicast);
    // Now we create the VirtualSocket listener
    VirtualSocket fileListen = new VirtualSocket(filePort);
    // We also publish the VirtualPort document so other peerNodes can
    // connect to our peer ftp daemon
    publish(filePort);
    // Now we listen for incoming requests to send files
    VirtualChannel listenChannel = listen(fileListen, ACP);
    // listen returns, signaling that an ACP message has arrived.
    // We need to locally create a transfer channel
    // (fileListen is the source VirtualSocket)
    VirtualChannel transferFile = acp_accept(listenChannel);
    // We assume a protocol is present, and the first message contains
    // the total number of bytes to be received and some data.
    // Here we receive the message from an implicit queue on the channel
    ACPMessage fileMsg = acp_receive(transferFile);
    // Read the file size
    integer fileSize = fileMsg.fileSize;
    loop() BEGIN "receiving file";
        integer nDataBytes = fileMsg.dataLength;
        byte[] data = fileMsg.getData();
        // assume data is saved to a file
        // Account for data received
        fileSize = fileSize - nDataBytes;
        // See if we have received all data
        IF (fileSize EQUAL 0) THEN break;
        // Otherwise, read the next message
        fileMsg = acp_receive(transferFile);
        IF (fileMsg EQUAL null) THEN BEGIN "Error";
            doErrMessage();
            break;
        END "Error";
    END "receiving file";
    // Release the channels, socket, and port document created above
    close(transferFile);
    close(listenChannel);
    delete(fileListen);
    delete(filePort);
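The receiving side of the data and SACK messages defined above can be sketched in Python. This is a minimal model under stated assumptions. The field names are hypothetical renderings of the header lists, the data-size header is implied by `len(data)`, and the set of received sequence numbers is never trimmed. A real implementation would bound it.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class AcpData:
    """Third ACP message; the data-size header is len(data)."""
    session_id: str
    src_port: str
    dst_port: str
    seq: int
    data: bytes


@dataclass(frozen=True)
class AcpSack:
    """Fourth ACP message; ports are swapped relative to the data message."""
    session_id: str
    src_port: str
    dst_port: str
    acked: Tuple[int, ...]  # strictly increasing sequence numbers


class AcpReceiver:
    """Buffers out-of-order data, delivers it in order, and builds SACKs."""

    def __init__(self, first_seq: int) -> None:
        self.next_expected = first_seq
        self.buffer: Dict[int, bytes] = {}
        self.received: Set[int] = set()  # SIMPLIFICATION: never trimmed

    def on_data(self, msg: AcpData) -> Tuple[AcpSack, List[bytes]]:
        if msg.seq >= self.next_expected:
            self.buffer.setdefault(msg.seq, msg.data)   # ignore duplicates
        self.received.add(msg.seq)
        delivered: List[bytes] = []
        while self.next_expected in self.buffer:        # hole filled: pass data up
            delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        sack = AcpSack(msg.session_id, msg.dst_port, msg.src_port,
                       tuple(sorted(self.received)))
        return sack, delivered
```

When message 2 arrives before message 1, it is buffered and acknowledged immediately. Both are passed up together once message 1 fills the hole. A repeated message produces the same acknowledgement list again, which is what the sender sees as a duplicate SACK.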
4.1.4 Multicast on the P2P Overlay Network
So far, we have discussed the unicast pipes and the three communication protocols in the context of establishing contact between peers and permitting them to communicate over channels. The channels in these instances are bidirectional. In this section we introduce the notion of multicast on the P2P overlay network. Why? At times the need arises for unidirectional, unreliable, UMP-based communication among a group of peerNodes in a connected community. For example: a peer using a P2P overlay network “ping” to see if there is either mediator congestion or unreachable known
peers, and lazy, non-urgent alerts to a subset of friends in a connected community. In such instances, when a peer wishes to send a message to all active members of a connected community and the message need not be reliable, we use the general multicast protocol. As we will also see, reliable multicast can be implemented using the mediator multicast protocol. Multicast requires three things. First, multicast on the P2P virtual overlay network is not true multicast, since true multicast is hardware based. A system enables a multicast address on its network interface, and then all multicast packets with this destination multicast address detected by the interface are passed to the system network software. The use of a host multicast group as an Internet multicast standard first appears in RFC 1112 (RFC1112, 1989). IPv6 allocates 112 bits for multicast group IDs in RFC 4291 (RFC4291, 2006) and accomplishes multicast by this mechanism. Both techniques use hardware address-based multicast only on the ultimate destination local network, because routers rarely forward these packets. The P2P virtual overlay network is software defined, so we need a means to imitate multicast without hardware. To provide this functionality, the ONP headers carry a multicast group UUID. If the same UUID appears in the multicastGroup field of a virtualPort document of type multicast, then the mechanisms described below permit the ONP message to be delivered to that virtualPort. Membership in a multicast group is application dependent within a given connected community, and a single member can belong to more than one multicast group at a time. We assume an application wishing to use multicast will generate the required multicast group UUID. It is also possible to have a collection of application-wide, well-known multicast group UUIDs.
Second, it follows that each peerNode creates a UMP virtualSocket that listens for input, is marked as type multicast, and contains the multicast group UUID in its virtualPort document; this document is then published locally if multicast is enabled. Third, using the Mediator Multicast Protocol, the virtualPort must be registered with the peerNode’s hosting mediator. The mediator, also using the Mediator Multicast Protocol, publishes its own association with the multicast group UUID to all mediators in the CC mediator map for the connected community in question. Each mediator hosting a peerNode that is listening for multicast within a multicast group maintains a list associating the peerNode’s communication information with the multicast group UUID. The actual delivery of multicast messages also uses the Mediator Multicast Protocol. This P2P Overlay Network protocol is required because, as mentioned above, long-range multicast is not generally supported on the Internet for either IPv4 or IPv6, and therefore cannot be
counted upon for multicast communication beyond the local subnet. These protocols and details are discussed in section 4.2. Additionally, on physical layers that do support multicast, as mentioned in the preceding paragraph, the multicast virtualPort documents are published locally, which eliminates the need for mediated multicast. Similarly, if two peerNodes have direct network connectivity, even if their publish/subscribe mechanism is mediated, they have no need for mediated multicast. In both cases mediated multicast must never be used. Mediated multicast communication bandwidth is expensive, and one does not want to burden the mediator with unnecessary message traffic. In fact, in these latter two cases, if the members share the same mediator, a single peerNode member of the multicast group can agree to represent the other members and proxy multicast messages between the mediator and the other peerNodes to which it has direct connectivity. This implies that the proxied peerNodes use the proxy to send multicast messages to the mediator, and vice versa. Such a peerNode is called a CC mediator multicast proxy. It is important to note that multicast is possible on the overlay network between any two peers that have a direct real network connection, and the multicast group organization still works; it simply is not mediated multicast. The publish/query procedures discussed in 4.2 permit these peerNodes to directly exchange the peerIdentity and virtualPort documents necessary to establish direct multicast group communication. However, given these alternative publication mechanisms, some peerNodes may receive duplicate messages. There are three solutions to this problem:
1. Permit duplicates and let the application manage the problem with sequence numbers or something similar. Note that well-written applications that use UMP multicast must always consider the possibility of duplicate messages.
2. Only use mediated multicast, to prevent the duplication that arises from physical layer multicast. We discourage this approach.
3. The mediator has at its disposal sufficient information in the ONP headers and peerIdentity documents to decide whether two peerNodes are on the same subnet and whether multicast is enabled. If so, the mediator should not deliver messages to one peerNode from the other peerNode. However, if two peerNodes have direct network communication and use mediators to multicast to those peerNodes that do not, then we cannot prevent them from receiving duplicate messages.
In the end, solution (1) is the simplest to implement.
Now, let’s assume that some members of a CC are running an application that uses multicast. Without loss of generality, suppose this is for inter-CC status messages. Any such peerNode can then connect to those peerNodes in the multicast group with which it has direct network connectivity, and multicast on the P2P overlay network to these peerNodes. It can also send the message to its hosting mediator. That mediator will store the message locally in the buffers of those hosted peerNodes that have registered for the service as members of the multicast group. Next, it will forward the multicast message to all mediators in the CC mediator map that have shown interest in this multicast group, that is, mediators on which a hosted peer has published a virtualPort document for that group. This is a general description of how multicast works on the topology shown in Figure 4-12. Note that Peer_1 is a mediator-multicast proxy for Peer_3 and Peer_4. The details of mediated multicast are discussed in the next section.
Figure 4-12. Multicast on P2P Overlay Network
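The mediator bookkeeping described above, together with the application-level duplicate handling of solution (1), can be sketched as follows. This is a hypothetical in-memory model, not the Mediator Multicast Protocol itself, which section 4.2 describes. Two behaviors are assumptions of the sketch and are marked in the code: messages are not echoed back to the sending peer, and messages arriving from another mediator are not re-forwarded.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple


class MulticastMediator:
    """Group bookkeeping on one mediator (sketch; names are hypothetical)."""

    def __init__(self) -> None:
        # multicast group UUID -> hosted peerNodes registered for that group
        self.members: DefaultDict[str, Set[str]] = defaultdict(set)
        # multicast group UUID -> other mediators hosting members of the group
        self.interested: DefaultDict[str, Set[str]] = defaultdict(set)
        # hosted peerNode -> buffered multicast messages
        self.buffers: DefaultDict[str, List[bytes]] = defaultdict(list)

    def register(self, group: str, peer: str) -> None:
        self.members[group].add(peer)

    def note_interest(self, group: str, mediator: str) -> None:
        self.interested[group].add(mediator)

    def on_multicast(self, group: str, msg: bytes,
                     from_peer: Optional[str] = None,
                     from_mediator: Optional[str] = None) -> List[str]:
        """Buffer msg for hosted members; return the mediators to forward to."""
        for peer in self.members[group]:
            if peer != from_peer:            # ASSUMPTION: no echo to the sender
                self.buffers[peer].append(msg)
        if from_mediator is not None:
            return []                        # ASSUMPTION: no re-forwarding
        return sorted(self.interested[group])


class DuplicateFilter:
    """Receiver side, solution (1): drop repeats by (sender, sequence number)."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[str, int]] = set()

    def accept(self, sender: str, seq: int) -> bool:
        key = (sender, seq)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

Refusing to re-forward mediator-originated messages is one simple way to avoid loops among mediators. A receiving application still needs the duplicate filter, because direct and mediated paths can both deliver the same message.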
4.2 P2P Mediator Protocols
Because of the underlying, real network limitations we’ve previously discussed, mediators are at the heart of a P2P system. In the general case that exists on the Internet today, peerNodes do not have direct connectivity, and they require at least a third party, or out-of-band communication, to
discover one another. A mediator fulfills this latter requirement on the P2P Overlay Network. And for two NAT-limited peerNodes to communicate with one another, a third party is required for the average user, who has no idea how to hack a NAT. If only one of two peerNodes is NAT bound, then the NAT-bound peerNode can contact the other, but not conversely. It is important to keep our goal in mind: a straightforward, standardized way to facilitate end-to-end communication with either participant initiating the contact. A mediator again provides this functionality as a willing helper-in-the-middle. We do note, however, that peerNodes may in fact have direct network connectivity. In this case, along with the P2P Mediator Protocols, we will require PeerNode-to-PeerNode (PNToPN) protocols. These will use many of the same mechanisms discussed below but will always permit peerNodes to take advantage of the gains from direct connectivity. Administrators of P2P Overlay Networks may not want to relinquish the implicit control that mediators give them, so we must make allowances for administrative control over the decision to use the PNToPN protocols. These are discussed in section 4.3. Finally, let us imagine that IPv6 has been fully deployed, that NATs no longer exist, that the end-to-end communication requirement is fulfilled, and that we have the expected billions of systems on the Internet. Why do we need mediators? How can one exchange information between systems that spontaneously arrive on and leave the Internet? First, some degree of organization is required. It may be several friends communicating their IPv6 addresses to one another (remember that these addresses are 16 bytes long), with each of these persons creating a private hostname table for the convenience of name-to-address translation.
This works fine but does not scale. There is no reliable way to update these hostname tables; presence detection is possible but lacks the analysis required to make sure bandwidth utilization is moderated; and ad-hoc behavior is difficult. Hence, a new system wishing to join up with the systems of these friends has no standard way of doing so. Ultimately this approach leads to chaos, with hundreds of millions of isolated collections of systems unable to intercommunicate. One can look at the ISP solution as a possibility. First of all, one must pay for the service, and the question arises, “Can we do as well or even better for free?” Second, we have created a dependency that is unnecessary for P2P. If you and your neighbor wish to communicate with one another without a third party charging you to do so, we drop back to the situation described above. 99.99999% of Internet users do not want to administer their connectivity and keep local host tables, or create a shared LDAP service and worry about that system. Somewhere in the middle of all of this, P2P
arrives, using software that creates, for the most part, self-administered connected communities where individuals have the freedom to do as they wish. Clearly, ISPs, or the ISPs of the future, will have a role to play here. They can in fact provide, maintain, and administer the systems that one uses for mediators, and many people will be willing to pay a small charge for this service. It will be a commodity where there is profit in numbers. On the other hand, it is not required, and this is the key. In all cases, we need those friendly systems-in-the-middle that have protocols smart enough to keep it all running. These latter systems are called Mediators, and they are what this section addresses. It is a good idea to review section 3.5.3.1 on the limitations imposed by the physical layer before continuing.
4.2.1 Mediator Document
Because our architecture must be consistent and communication must use the ONP stack, a mediator, like all other peerNodes, requires an identity and an associated mediator document. Here security considerations are relevant. When a peerNode first connects to a mediator, there is an assumed trust that the mediator is in fact the intended system it wishes to use. This is like answering the question, “Is the web server from which I am buying an automobile really the one I believe it is?” To address this issue, we again mention CBIDs as a desirable identity solution. The mediator UUID can be a CBID. At least in the weakest sense, this requires the contacted mediator to prove that it owns the X.509 v3 certificate from which the CBID is generated. This is attackable, but these attacks can be either minimized or prevented, as discussed in chapter 5. The mediator also needs a name and a description to be user friendly. Since we wish to minimize the complexity of the communication software, a mediator requires a collection of virtualPorts to support the protocols it uses. Next, the mediator’s real transports must be described. All connections to mediators are managed at the real transport level, and all messages with a mediator as one end of the connection use the UMP. Therefore, if an application sends an ONP message to another application by means of a mediator, then that ONP message is encapsulated in another ONP message that is sent directly to the mediator in question. Also, because we will be required to manage our P2P Overlay Network, a well-known management connected community is defined. This CC has a UUID that is known to all mediators at boot time and that is also advertised to peerNodes by means of this document. Finally, the mediator document contains the mediator map’s MMMS (see section 4.1.3.2).
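The encapsulation described above, and pictured in Figure 4-13, can be sketched with a minimal ONP message model. The header field names are hypothetical simplifications, not the book's ONP header layout. Reusing the inner message's connected community UUID on the outer message is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class OnpMessage:
    """Minimal ONP header model; field names are hypothetical."""
    src_peer: str      # source peer UUID
    dst_peer: str      # destination peer UUID
    src_port: str      # source virtualPort UUID
    dst_port: str      # destination virtualPort UUID
    cc: str            # connected community UUID
    protocol: str      # e.g. "UMP" or "ACP"
    payload: Union[bytes, "OnpMessage"]


def encapsulate(inner: OnpMessage, self_uuid: str, local_port: str,
                mediator_uuid: str, mediator_port: str) -> OnpMessage:
    """Wrap an application ONP message in a UMP message addressed to the mediator.

    ASSUMPTION: the outer message reuses the inner message's CC UUID.
    """
    return OnpMessage(self_uuid, mediator_uuid, local_port, mediator_port,
                      inner.cc, "UMP", inner)


def decapsulate(outer: OnpMessage, mediator_uuid: str) -> OnpMessage:
    """Mediator side: recover the inner message for routing or storage."""
    if outer.dst_peer != mediator_uuid or not isinstance(outer.payload, OnpMessage):
        raise ValueError("not an encapsulated message for this mediator")
    return outer.payload
```

The inner message keeps its end-to-end addressing untouched. Only the outer UMP envelope names the mediator, which is what lets the mediator route or store the message without interpreting the application protocol.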
Here we also have a bootstrap problem. In general, the transport address or addresses, and the virtualPort used for first contact with at least one mediator, must be known by the peerNode. There are many methods to bootstrap this information, for example, an LDAP directory entry, DNS, a web server, email, or another peerNode on the same subnet. The method will be part of the P2P software itself. For now, let us assume that such a “hello” transport address and virtualPort are known. We call this latter port the greeting virtualPort, and this combination of information is named the HelloTransportInfo.
Figure 4-13. Mediator / PeerNode Encapsulation
Note that these values are also in the mediator document for verification. As is discussed in chapter 5, this document may be signed. The signature, along with the CBID and a session-based challenge, will assure that neither an imposter nor a “man-in-the-middle” attack is in progress. An example of such a pair might be:
tcp://129.44.33.8:9510, hello-virtualPort UUID.
Given this introduction, we define the Mediator document as follows:
Document type = MEDIATOR
Content tags and field descriptions:
Legal XML character string [XML]
uuid-Legal UUID in hexadecimal ascii string
Legal XML character string [XML]
Legal Universal Resource Name
real protocol URI
uuid-Legal UUID in hexadecimal ascii string
uuid-Legal UUID in hexadecimal ascii string
uuid-Legal UUID in hexadecimal ascii string
uuid-Legal UUID in hexadecimal ascii string
uuid-Legal UUID in hexadecimal ascii string
uuid-Legal UUID in hexadecimal ascii string
uuid-Legal UUID in hexadecimal ascii string
Integer in ASCII format
The mediatorName, mediatorUUID and description are as in the peerIdentity document. As with the peerIdentity document, we require all communication protocols to be included. Mediators may be multi-homed
with more than one IP address, or may use a transfer protocol like HTTP. It is also possible that two different physical layers are in use, with different network transports. Next, we have the mediator’s virtualPorts. Each is associated with a protocol that the mediator supports. A brief description of each protocol follows:
1. Greeting - This is used to establish the initial connection between a peerNode and its mediator, or between two mediators. This protocol is required before any of the following protocols will be accepted. As mentioned above, it is included in this document for verification. The mediator greeting protocol, section 4.2.2, uses this virtualPort.
2. PNToMed - In section 4.2.3 we describe the PeerNode-to-Mediator communication protocol that uses this virtualPort.
3. MedToMed - The Mediator-to-Mediator Communication Protocol. See section 4.2.4.
4. Router - The Mediator Routing protocol, section 4.2.5, between mediators, or between mediators and peerNodes, uses this virtualPort.
5. MedInfo - The Mediator Information Query Protocol, between mediators or between mediators and peerNodes, passes status information such as load and number of connections between these nodes. See section 4.2.6 for the details.
6. Multicast - This virtualPort handles all aspects of multicast. See section 4.2.7.
Figure 4-14. Mediator VirtualPort Connectivity
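A peerNode can cross-check its HelloTransportInfo against the mediator document it receives, since the text notes that both values also appear there for verification. The sketch below uses Python's standard `urllib.parse.urlsplit` to parse a transport URI such as `tcp://129.44.33.8:9510`. The `MediatorDocument` class and its field names are hypothetical, because the document's tag names are not reproduced here. The protocol keys (`Greeting`, `PNToMed`, and so on) follow the list above.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class HelloTransportInfo:
    scheme: str
    host: str
    port: int
    greeting_port: str  # greeting virtualPort UUID, lower-cased


def parse_hello(transport: str, greeting_port_uuid: str) -> HelloTransportInfo:
    parts = urlsplit(transport)
    if not parts.scheme or parts.hostname is None or parts.port is None:
        raise ValueError(f"incomplete transport address: {transport!r}")
    return HelloTransportInfo(parts.scheme, parts.hostname, parts.port,
                              greeting_port_uuid.lower())


@dataclass
class MediatorDocument:
    """Hypothetical in-memory view of a MEDIATOR document."""
    name: str
    uuid: str
    transports: List[str]          # real protocol URIs
    virtual_ports: Dict[str, str]  # "Greeting", "PNToMed", ... -> UUID
    mmms: int


def matches_hello(hello: HelloTransportInfo, doc: MediatorDocument) -> bool:
    """True if the document lists both the greeting port and the hello transport."""
    if doc.virtual_ports.get("Greeting", "").lower() != hello.greeting_port:
        return False
    for uri in doc.transports:
        p = urlsplit(uri)
        if (p.scheme, p.hostname, p.port) == (hello.scheme, hello.host, hello.port):
            return True
    return False
```

A mismatch does not prove an attack, but it is a cheap first check to run before the signature and CBID verification described in chapter 5.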
It is important to note that once these virtualPorts are known through the greeting protocol, no further discovery is required. Unlike the application virtualPorts, the mediator virtualPorts can be contacted directly using the underlying transport information, without acquiring an associated virtualPort document. Also, all these protocols are subject to looming denial-of-service attacks. To confront these attacks at the front door, each protocol has as one of its responses a challenge that has a global effect on the attacking peerNode’s behavior. It is certainly possible in some P2P configurations that a peerNode can attack multiple peerNodes or mediators. To this end, an effort will be made to broadcast a list of such attackers to known peerNodes or mediators. Detection is complicated because attackers can hide their source identity. We have ways to combat this behavior, and they are covered in chapter 5. The mediator greeting protocol is described in the next section.
4.2.2 Mediator Greeting Protocol
The “greeting” protocol is initiated by a peerNode or mediator when it first contacts any mediator. A peerNode will be hosted by a single mediator, but in all probability will require more than one mediator to, for example, route a message appropriately. And certainly, in all but very small P2P overlay network infrastructures, mediators will contact multiple mediators. Recall that initially all peerNodes are in the public connected community, and all mediators maintain mediator maps for this same connected community. The ONP / UMP messages used in this protocol carry the public connected community’s UUID in its appropriate place following the source and destination virtualPorts. The greeting protocol exchanges two documents. On a peerNode’s first contact with a mediator, the peerNode sends a greeting message that contains the text “greeting” followed by its peerIdentity document. The responses are as follows:
1. The responding mediator sends a message containing “welcome” followed by its mediator document. The receipt of the document acknowledges three things: the peerIdentity document has been received, the mediator has placed the peerNode in its public connected community, and the peerNode’s M-INBOX has MMMS bytes reserved for the receipt of messages.
2. The responding mediator sends a message containing “redirect” followed by a list of alternative mediators to try. Here, the mediator is saying that it is overloaded.
3. The responding mediator sends a message containing “challenge” accompanied by a client puzzle to be solved. In this case, the next greeting command must contain the solution and either the peerNode or mediator document, as appropriate. These puzzles typically send an N-bit hash of J bytes (for example, 160 bits of a SHA256 hash) along with J-K of those bytes. The challenge is to find the K missing bytes that satisfy the hash. Sometimes multiple sets of the above are sent to increase the difficulty (Jules and Brainard 1999, 151-165).
4. The responding mediator sends a message containing “refused.” In this case, the sender has been refused service.
Command: Greeting text parameter
Text := “greeting”
parameter := peerNode document | mediator document | puzzle solution
Response: type ReturnValue
type := “welcome” | “redirect” | “challenge” | “refused”
If type == “welcome” OR type == “redirect”, then ReturnValue := {N} L(I), I = 1,...,N
N := 1 if type is “welcome”
L(I) := Mediator document
If type == “challenge”, then ReturnValue := Puzzle to solve
If type == “refused”, then ReturnValue := Optional explanation.
For the initial contact between mediators, there is an exchange of mediator documents using the same “greeting” and “welcome” preambles. The mediators use the public connected community in the CC ONP header field. Also, mediators exchange lists of known mediators, to help establish the public CC mediator map, and the keep-alive average round-trip time. Each element of this list has four members: (Mediator UUID, Mediator MedToMed Port, Mediator real transport addresses, MaximumCCMediatorGet). Finally, the Mediator-Election-ID is included. This is the identity of the next election to be held. See section 4.2.4 on the MedToMed
Basic Behavior of Peers on a P2P System
protocol for more details.
This scenario is very ad hoc in nature, and we have not yet mentioned security. For example, has the peerNode really connected to the mediator it believes it has, and is the peerNode one of those the mediator supports? The peerNode may be falling victim to an impostor attack, and the mediator may limit its support to a fixed list of peers. Chapter 5 details how to resolve these issues; here we note only that there are multiple ways to secure the above communication, and that the strength of this security varies with the requirements of the connected communities involved. Security questions arise at each phase of a protocol’s behavior, and each must be addressed. Once the greeting protocol has completed, both the peerNode and the mediator can begin to do what is necessary to support the multitude of possible P2P applications. The peerNode-to-Mediator communication protocol, which enables application-level communication within private connected communities, is discussed next.
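The client puzzle carried by a “challenge” response can be sketched in Python as follows. This is a minimal hash-reversal puzzle in the spirit of the description above, not the exact construction of the cited paper; the names `ClientPuzzle`, `make_puzzle`, `solve`, and `verify`, the choice to hide the first K bytes, and the default sizes J = 32, K = 2, N = 160 are assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import itertools
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientPuzzle:
    """Hash-reversal puzzle: recover the K hidden bytes of a J-byte string."""
    hash_prefix: bytes  # first N bits (N/8 bytes) of SHA-256 over the J-byte string
    known: bytes        # the J-K bytes sent in the clear
    k: int              # number of hidden bytes (assumed here to be the first K)


def make_puzzle(j: int = 32, k: int = 2, n_bits: int = 160) -> tuple[ClientPuzzle, bytes]:
    """Mediator side: pick a random J-byte string and reveal all but its first K bytes."""
    if n_bits % 8 or not 0 < n_bits <= 256 or not 0 < k < j:
        raise ValueError("need 0 < K < J, and N a multiple of 8 no larger than 256")
    secret = os.urandom(j)
    prefix = hashlib.sha256(secret).digest()[: n_bits // 8]
    return ClientPuzzle(prefix, secret[k:], k), secret


def verify(p: ClientPuzzle, answer: bytes) -> bool:
    """Mediator side: checking an answer costs a single hash."""
    return (answer[p.k:] == p.known
            and hashlib.sha256(answer).digest()[: len(p.hash_prefix)] == p.hash_prefix)


def solve(p: ClientPuzzle) -> bytes:
    """PeerNode side: brute force, at most 256**K hashes."""
    for guess in itertools.product(range(256), repeat=p.k):
        candidate = bytes(guess) + p.known
        if hashlib.sha256(candidate).digest()[: len(p.hash_prefix)] == p.hash_prefix:
            return candidate
    raise ValueError("puzzle has no solution")
```

The asymmetry is the point of the design: the mediator spends one hash to check an answer, while the peerNode spends on average about 256**K / 2 hashes (roughly 32,768 for K = 2) to find it. Hiding one more byte multiplies the work by 256, and sending several independent puzzles multiplies it by their number.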
Figure 4-15. Greeting Protocol
Figure 4-16. Greeting Message
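A mediator’s handling of a peerNode’s first greeting can be sketched in Python as below. It follows the command definition above (text “greeting”; a response of “welcome” with exactly one mediator document, “redirect” with a list of mediator documents, “challenge”, or “refused”), but the class names, the fixed-capacity redirect policy, and the in-memory tables are hypothetical, and the challenge branch and all security checks are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field

RESPONSE_TYPES = ("welcome", "redirect", "challenge", "refused")


@dataclass
class GreetingResponse:
    type: str                                            # one of RESPONSE_TYPES
    documents: list[str] = field(default_factory=list)  # L(1..N): mediator documents
    puzzle: bytes | None = None                          # only for "challenge"
    explanation: str = ""                                # optional, for "refused"

    def __post_init__(self) -> None:
        if self.type not in RESPONSE_TYPES:
            raise ValueError(f"unknown response type {self.type!r}")
        if self.type == "welcome" and len(self.documents) != 1:
            raise ValueError('"welcome" carries exactly one mediator document (N = 1)')


class GreetingMediator:
    """Hypothetical mediator-side handler for a peerNode's first greeting."""

    def __init__(self, own_document: str, alternates: list[str], capacity: int,
                 mmms: int, refused: set[str] | None = None) -> None:
        self.own_document = own_document   # this mediator's mediator document
        self.alternates = alternates       # mediator documents offered on "redirect"
        self.capacity = capacity           # assumed overload threshold
        self.mmms = mmms                   # M-INBOX bytes reserved per hosted peerNode
        self.refused = refused or set()    # peerIdentities refused service
        self.public_cc: dict[str, str] = {}       # peerIdentity -> peerIdentity document
        self.inbox_reserved: dict[str, int] = {}  # peerIdentity -> reserved bytes

    def on_greeting(self, text: str, peer_id: str, peer_document: str) -> GreetingResponse:
        if text != "greeting":
            raise ValueError(f"expected a greeting, got {text!r}")
        if peer_id in self.refused:
            return GreetingResponse("refused", explanation="service refused")
        if peer_id not in self.public_cc and len(self.public_cc) >= self.capacity:
            return GreetingResponse("redirect", documents=list(self.alternates))
        # "welcome" acknowledges the peerIdentity document: host the peerNode in the
        # public CC and reserve MMMS bytes in its M-INBOX.
        self.public_cc[peer_id] = peer_document
        self.inbox_reserved[peer_id] = self.mmms
        return GreetingResponse("welcome", documents=[self.own_document])
```

A real mediator would base its redirect and challenge decisions on measured load and on its security policy rather than on a fixed capacity; the fixed limit here only makes the outcomes easy to see.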
4.2.3 PeerNode-to-Mediator (PNToMed) Communication Protocol
As mentioned in chapter 3, section 3.4, a mediator maintains a CC mediator-map for the CC’s of the peerNodes it hosts, and all of its hosted peerNodes are automatically members of the public CC. The public CC is used to access public documents that the hosted peerNodes wish to share with all peerNodes. Whether a non-public CC has access control policies depends on that CC. One function of the PNToMed protocol is to provide the mechanisms for publishing, subscribing to, and finding documents in CC’s. There are many ways to implement publish, subscribe, and search algorithms, and these are discussed in sections 3 and 5 of this chapter. An example of a document accessible in the public CC is a CC document: a bootstrap procedure is needed to discover CCs, and thus to be able to join them, and this “bootstrapping” can be done in the public CC. We realize that in some instances CC discovery may avoid the public CC. This can be done through CC mediators that are restricted to the CC’s they support, by placing the CC documents in the P2P code, or by retrieving these documents from an LDAP directory, etc. Still, we need a way to discover CC’s without out-of-band techniques, and we believe that for the most part CC’s will be discovered by means of the public CC and mediators. The mechanism is in place for that purpose.
Next, a peerNode uses this protocol to receive messages that are queued for it on the mediator. These messages all contain the CC UUID, and it is the mediator’s responsibility to deliver them correctly. If the destination peerNode is not in the mediator-CC map associated with this CC UUID, then the message is not delivered, that is to say, it is not stored in the peerNode’s INBOX. To these ends the PNToMed protocol has the following commands:
1. Notify - A peerNode notifies the mediator that it is active or inactive as a member of a CC, or that it is no longer interested in being hosted as a member of that CC, that is, that it wishes to be removed. The CC UUID is the connected community identity in the ONP message header.
2. GetMessage - A peerNode pulls a message from its INBOX on the mediator.
3. Publish - A peerNode publishes meta-data to the mediator in a CC. Publish has many possible implementations, but in all cases meta-data describing documents, rather than the documents themselves, is published. We do not prescribe a representation for this meta-data; for example, (name, value) pairs may be used for indexing and posted on the mediator.
4. Query - A peerNode searches for a document. Just as above, we cannot say where the document is; rather, the mediator assists in finding it. This request has a counterpart, the query response.
5. Content Request - Once a query has returned sufficient meta-data about application data, a peerNode uses this command to request that content from the owning peerNode. This request has a counterpart, the content response.
6. Mobile Agent - A peerNode can use this command to send a mobile agent to be run on a Mediator. The use of this command is discussed in detail in chapter 7, section 7.3.
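The delivery rule described above, under which a message is stored only if its destination is in the mediator-CC map for the message’s CC UUID, can be sketched in Python as follows. The `OnpMessage` fields, the `MediatorDelivery` class, and keying the M-INBOX by (peerIdentity, CC UUID) are illustrative assumptions; a real ONP header carries more fields.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class OnpMessage:
    """Only the ONP header fields this sketch needs (hypothetical layout)."""
    source: str        # source peerIdentity
    destination: str   # destination peerIdentity
    cc_uuid: str       # connected community UUID from the ONP header
    payload: bytes


class MediatorDelivery:
    """Queues messages for hosted peerNodes, but only within their CCs."""

    def __init__(self) -> None:
        self.cc_map: dict[str, set[str]] = defaultdict(set)   # CC UUID -> hosted members
        self.m_inbox: dict[tuple[str, str], deque] = defaultdict(deque)

    def deliver(self, msg: OnpMessage) -> bool:
        """Store msg for its destination in this CC; refuse if it is not a member."""
        if msg.destination not in self.cc_map.get(msg.cc_uuid, set()):
            return False  # not delivered: nothing is stored
        self.m_inbox[(msg.destination, msg.cc_uuid)].append(msg)
        return True
```

Keying the queue by both peerIdentity and CC UUID means a message can never leak into a CC the destination has not joined, because the lookup that admits it is the same one that selects its queue.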
4.2.3.1 The Notify Command
We assume that a peerNode using the CC-Protocol has become a member of a CC with the CC UUID, UUID0. As a member of this CC, it can either become an active member, that is to say, one that wishes to be discovered and to communicate in the CC, or become inactive. It sends the Notify command to its hosting mediator at the PNToMed virtual port, with Notify and the CC status it wishes to set as the UMP parameters in the data payload. In response, the mediator sends OK, DENIED, or CHALLENGE. Each response carries an appropriate reason, which is implementation dependent. Below we use a text description:
1. OK “Notify/Active,” OK “Notify/Inactive,” or OK “Notify/Remove.”
2. DENIED “CC Quota Exceeded - Try Notify/Remove”
3. CHALLENGE “Solve the following puzzle,” accompanied by a client puzzle to solve.
An example of the Notify command is shown in Figure 4-17.
Figure 4-17. PeerNode notifies mediator it is active in CC UUID0
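On the peerNode side, building the Notify parameters and classifying the mediator’s reply might look like the following Python sketch. The slash-separated encoding (for example, “Notify/Inactive/Keep”) mirrors the command names used in this section, but the exact wire format, the `CCStatus` and `NotifyReply` names, and the reply parsing are assumptions, since the reason texts are implementation dependent.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CCStatus(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"
    REMOVE = "Remove"


def notify_parameters(status: CCStatus, option: str | None = None) -> str:
    """UMP text parameters, e.g. 'Notify/Active' or 'Notify/Inactive/Keep'."""
    if status is CCStatus.ACTIVE and option is not None:
        raise ValueError("Notify/Active takes no option")
    if status is CCStatus.INACTIVE and option not in ("Flush", "Delete", "Keep"):
        raise ValueError("Notify/Inactive needs Flush, Delete, or Keep")
    if status is CCStatus.REMOVE and option not in (None, "Flush"):
        raise ValueError("Notify/Remove accepts only the Flush option")
    return "/".join(["Notify", status.value] + ([option] if option else []))


@dataclass(frozen=True)
class NotifyReply:
    code: str    # "OK", "DENIED", or "CHALLENGE"
    reason: str  # implementation-dependent text; a CHALLENGE puzzle is not modeled


def parse_notify_reply(text: str) -> NotifyReply:
    """Split a reply such as 'OK "Notify/Active"' into its code and reason."""
    code, _, reason = text.partition(" ")
    if code not in ("OK", "DENIED", "CHALLENGE"):
        raise ValueError(f"unexpected Notify reply: {text!r}")
    return NotifyReply(code, reason.strip().strip('"'))
```

For example, a peerNode whose reply parses to code “DENIED” can follow the hint in the reason text and issue `notify_parameters(CCStatus.REMOVE)` for a CC it no longer needs before retrying.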
When the mediator receives an active CC status from a peerNode and is not itself a member of the CC mediator map for this CC, it adds itself to
this map and notifies all mediators in the mediator map (see chapter 3, section 3.5.3) that it is now a member of that CC mediator map, using the MedToMed protocol discussed in section 4.2.4. We only note here that this notification is done in a way that minimizes its impact on the P2P Overlay Network. During this notification process, a private copy of the CC mediator map is built. In response to the peerNode’s Notify/Active command, the mediator does several things to initialize the CC context:
1. It adds the peerNode to a list of hosted peerNodes for this CC. When this list becomes empty, for any of the reasons below or by means of the Notify/Remove command, the mediator removes itself from the associated CC mediator map and suitably notifies the other mediators in the mediator map. When all such mediators have been notified, it flushes all data associated with that CC. This procedure is discussed in detail in section 4.2.4.
2. It keeps, for each peerNode, a list of the CC’s to which that peerNode belongs. This simplifies the publication, subscription, and query mechanisms.
3. It keeps an expiration time for the binding between a peerNode and a CC. This time is extended whenever the peerNode shows an active interest in this CC. If the peerNode shows no interest in the CC for the expiration time, it is removed from the list in item 1, and its quota counter (item 4) is decremented.
4. It keeps, for each peerNode, a quota counter of the number of CC’s to which that peerNode belongs. The mediator certainly has the right to impose a quota and, as noted in the “DENIED” response, can refuse further Notify/Active commands.
5. It records the time between successive Notify commands. If the frequency of these commands suggests a DoS attack, the Mediator responds accordingly, as discussed in chapter 5.
6. A set of relationships is created to ensure the correct delivery of CC messages for this now active CC. For example, each CC can be associated with a list of pointers into the M-INBOX for the CC messages this peerNode has received, or the M-INBOX can be a union of multiple, disjoint M-INBOXes, one for each CC. From experience, we believe the latter approach is preferable because it makes the Notify/Remove command easier to implement, since the messages stored for a particular CC may have to be expunged.
7. Structures are created for the publication of, and subscription to, content within the context of the CC. In general, being a member of a CC creates a subscription to the CC’s documents. The mediator does not manage a CC’s access rights; these are CC dependent. For example, when a peerNode publishes a document in a CC, it may be indexed only across the mediators in the mediator map for that CC, and structures must be put in place to support this mechanism. One approach is for the list of edge peers in this CC to have a pointer to the mediator map for that CC.
The Notify/Inactive command has three parameters, Flush, Delete, and Keep, which are explained below. This command leaves the above data structures in place and takes the following actions with respect to messages in this CC:
1. The arrival of this command blocks any further input to the M-INBOX. The storing of any message already in progress can be finished. But before storing a message, the task that stores messages tests the CC active state for this peerNode; if that test is false, then no further messages are stored in the M-INBOX.
2. Notify/Inactive/Flush - The mediator sends all the pending messages in the M-INBOX to the peerNode.
3. Notify/Inactive/Delete - The mediator deletes all pending messages.
4. Notify/Inactive/Keep - The mediator keeps any messages in the M-INBOX until an expiration time is reached. The peerNode may send an additional parameter giving this expiration time in minutes. The mediator may also have an expiration time, in which case the minimum of the two times is used.
The Notify/Remove command does the following on the mediator:
1. If there are pending messages in the M-INBOX and the Notify/Remove/Flush command is received, then these messages are sent to the peerNode. The default action is to delete all such messages. All messages arriving after this command has been received are ignored.
2. All data structures associated with this peerNode and the CC are expunged.
3. The peerNode’s CC quota counter is decremented.
Figure 4-18. Mediator Structures for Notify Command
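The mediator-side bookkeeping for Notify/Active, Notify/Inactive, and Notify/Remove can be sketched in Python as follows. The sketch keeps one disjoint M-INBOX per (peerNode, CC) pair, as preferred in item 6 above, derives the quota counter from the number of bindings, extends a binding’s expiration on activity, and uses the smaller of the peerNode’s and the mediator’s Keep times. The class and method names, the injectable clock, and the assumption that the mediator always has a Keep limit are hypothetical; MedToMed notifications, mediator-map maintenance, and DoS detection are not shown.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Binding:
    """State for one (peerNode, CC) pair on the hosting mediator."""
    expires_at: float                            # idle expiration of the binding
    active: bool = True
    inbox: deque = field(default_factory=deque)  # this CC's disjoint M-INBOX
    keep_until: float | None = None              # set by Notify/Inactive/Keep


class NotifyState:
    def __init__(self, quota: int, binding_ttl: float, mediator_keep_minutes: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.quota = quota                       # max CCs per peerNode
        self.binding_ttl = binding_ttl           # seconds of idleness before removal
        self.mediator_keep_minutes = mediator_keep_minutes
        self.clock = clock
        self.bindings: dict[tuple[str, str], Binding] = {}  # (peerIdentity, CC UUID)
        self.hosted: dict[str, set[str]] = {}               # CC UUID -> hosted peerNodes

    def quota_count(self, peer: str) -> int:
        return sum(1 for p, _ in self.bindings if p == peer)

    def notify_active(self, peer: str, cc: str) -> str:
        b = self.bindings.get((peer, cc))
        if b is None:
            if self.quota_count(peer) >= self.quota:
                return 'DENIED "CC Quota Exceeded - Try Notify/Remove"'
            b = self.bindings[(peer, cc)] = Binding(expires_at=0.0)
            self.hosted.setdefault(cc, set()).add(peer)
        b.active, b.keep_until = True, None
        self.touch(peer, cc)
        return 'OK "Notify/Active"'

    def touch(self, peer: str, cc: str) -> None:
        """Active interest in the CC extends the binding's expiration time."""
        self.bindings[(peer, cc)].expires_at = self.clock() + self.binding_ttl

    def store(self, peer: str, cc: str, message: object) -> bool:
        """Input to the M-INBOX is blocked unless the binding is active."""
        b = self.bindings.get((peer, cc))
        if b is None or not b.active:
            return False
        b.inbox.append(message)
        return True

    def notify_inactive(self, peer: str, cc: str, mode: str,
                        peer_keep_minutes: int | None = None) -> list:
        b = self.bindings[(peer, cc)]
        b.active = False                     # blocks further storing
        if mode == "Flush":                  # send pending messages to the peerNode
            pending = list(b.inbox)
            b.inbox.clear()
            return pending
        if mode == "Delete":
            b.inbox.clear()
        elif mode == "Keep":                 # minimum of the two expiration times
            minutes = self.mediator_keep_minutes
            if peer_keep_minutes is not None:
                minutes = min(minutes, peer_keep_minutes)
            b.keep_until = self.clock() + 60 * minutes
        else:
            raise ValueError(f"unknown Notify/Inactive parameter {mode!r}")
        return []

    def notify_remove(self, peer: str, cc: str, flush: bool = False) -> list:
        b = self.bindings.pop((peer, cc))    # expunge; the quota count drops with it
        members = self.hosted[cc]
        members.discard(peer)
        if not members:
            del self.hosted[cc]              # last member gone: leave the CC mediator map
        return list(b.inbox) if flush else []  # default action deletes the messages

    def tick(self) -> None:
        """Purge expired Keep inboxes and remove idle bindings."""
        now = self.clock()
        for (peer, cc), b in list(self.bindings.items()):
            if b.keep_until is not None and now >= b.keep_until:
                b.inbox.clear()
                b.keep_until = None
            if now >= b.expires_at:
                self.notify_remove(peer, cc)
```

Because each CC has its own queue, Notify/Remove reduces to dropping one binding, which is the simplification that motivates the disjoint M-INBOX design.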
4.2.3.2 The GetMessage Command
Recall that the peerNode is restricted to receiving messages from within its active CC’s. We define M-INBOX(CC) to be all of the messages in a given M-INBOX from peerNodes in that particular CC. Thus, in a GetMessage command message sent from a hosted peerNode, the (source peerIdentity, CC UUID) pair in the ONP header selects the M-INBOX(CC) to be used for that peerNode. Because the peerNode must therefore know which CC to ask about, the GetMessage command has a subcommand, GetMessage/WhichCC, that asks for all the CCs in which the peerNode has pending messages. In this case, the CC UUID in the ONP header is all 0’s. This form of the GetMessage command is defined as follows:
Command: GetMessage/WhichCC
Returns: List of pairs, (CC UUID, N), where N is the number of messages in the respective virtual M-INBOX, N >= 0.
Given a particular CC UUID, the messages are from one or more peerNodes in the CC. The requesting peerNode may well want to select messages from a single peerNode, since applications can have priorities of which the mediator is unaware. To this end, given such a CC UUID, we have the “GetMessage/FromPeerIdentity” command:
Command: “GetMessage/FromPeerIdentity” (Note that this is a text string and not the peerIdentity as a UUID)
Returns: List of pairs, (peerIdentity, N), where N >= 0 is the number of messages pending from that peerIdentity in the M-INBOX(CC) identified by the CC UUID in the ONP header.
Figure 4-19. GetMessage/“FromPeerIdentity” Command
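Both query forms reduce to counting over the per-CC partitions of one hosted peerNode’s M-INBOX, as in the following Python sketch. The `HostedInbox` class, the string form of the all-zero CC UUID, and the `handle` dispatcher are illustrative assumptions, not a defined API.

```python
from __future__ import annotations

from collections import Counter, defaultdict, deque

# Hypothetical text form of the all-zero CC UUID carried by GetMessage/WhichCC.
ZERO_CC_UUID = "00000000-0000-0000-0000-000000000000"


class HostedInbox:
    """One hosted peerNode's M-INBOX, partitioned into M-INBOX(CC) queues."""

    def __init__(self) -> None:
        self.by_cc: dict[str, deque] = defaultdict(deque)  # CC UUID -> (source, message)

    def put(self, cc_uuid: str, source: str, message: object) -> None:
        self.by_cc[cc_uuid].append((source, message))

    def which_cc(self) -> list[tuple[str, int]]:
        """GetMessage/WhichCC: (CC UUID, N) for each CC partition, N >= 0."""
        return [(cc, len(queue)) for cc, queue in self.by_cc.items()]

    def from_peer_identity(self, cc_uuid: str) -> list[tuple[str, int]]:
        """GetMessage/FromPeerIdentity: (peerIdentity, N) pairs within M-INBOX(CC)."""
        return list(Counter(src for src, _ in self.by_cc.get(cc_uuid, ())).items())

    def handle(self, command: str, header_cc_uuid: str) -> list[tuple[str, int]]:
        """Select the reply from the command text and the CC UUID in the ONP header."""
        if command == "GetMessage/WhichCC":
            if header_cc_uuid != ZERO_CC_UUID:
                raise ValueError("WhichCC requires an all-zero CC UUID")
            return self.which_cc()
        if command == "GetMessage/FromPeerIdentity":
            return self.from_peer_identity(header_cc_uuid)
        raise ValueError(f"unsupported command {command!r}")
```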
Finally, to get messages from a given M-INBOX(CC) for a CC in which the peerNode is currently active, we use the GetMessage command defined as follows:
Command: “GetMessage[/peerIdentity][SPACE][N]”
Returns: 1 or more messages in M-INBOX(CC), or none if M-INBOX(CC) is empty. If the peerIdentity is present, then only messages from that peerIdentity are returned. If N is present, then at most N messages are returned; otherwise, the next message is returned.
Figure 4-20. GetMessage/peerIdentity/N Command
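The grammar “GetMessage[/peerIdentity][SPACE][N]” is small enough to parse with a regular expression. The Python sketch below parses it and then removes at most N matching messages from one M-INBOX(CC), leaving the others in arrival order. The assumptions that a peerIdentity contains no “/” or whitespace, that N is a positive decimal integer, and that returned messages are removed from the M-INBOX are ours, as are the function names.

```python
from __future__ import annotations

import re
from collections import deque

_GET_MESSAGE = re.compile(r"GetMessage(?:/(?P<peer>[^\s/]+))?(?: (?P<n>[1-9][0-9]*))?")
_SUBCOMMANDS = {"WhichCC", "FromPeerIdentity"}  # handled by the other GetMessage forms


def parse_get_message(command: str) -> tuple[str | None, int]:
    """Return (peerIdentity or None, N); N defaults to 1, i.e. the next message."""
    m = _GET_MESSAGE.fullmatch(command)
    if m is None or m.group("peer") in _SUBCOMMANDS:
        raise ValueError(f"not a GetMessage[/peerIdentity][SPACE][N] command: {command!r}")
    return m.group("peer"), int(m.group("n") or 1)


def get_messages(inbox_cc: deque, command: str) -> list:
    """Remove and return up to N (source, message) pairs, optionally from one source."""
    peer, n = parse_get_message(command)
    taken: list = []
    kept: deque = deque()
    while inbox_cc:
        source, message = inbox_cc.popleft()
        if len(taken) < n and (peer is None or source == peer):
            taken.append((source, message))
        else:
            kept.append((source, message))
    inbox_cc.extend(kept)  # unselected messages stay, in their original order
    return taken
```

Each call scans the whole M-INBOX(CC) once, which is acceptable for a sketch; a mediator with large inboxes could instead index each M-INBOX(CC) by source peerIdentity.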
4.2.3.3 The Publish Command
What we envision for publication is multi-faceted. Content exchange is extremely important; in many cases content must be protected, while in others there are no access restrictions. Our methodology for publication must address each of these needs, and everything in between. A means is also required to advertise content and attract users to it. In chapter 3 we discussed connected communities (CC) and the role they play in achieving these goals. They provide a firewall for their members, and the members set the access control for their content. We noted that to bootstrap this process we need a public connected community, PubCC, where access is limited to CC documents and to meta-data for other kinds of content. This latter meta-data is how one can advertise CC’s and draw users to join them. It may be URLs pointing to web pages with directions and the motivation for joining. This is left to the CC creators and members to decide. On the other hand, for the reasons described above, a CC’s content cannot be directly accessed in the PubCC. The PubCC can be viewed either as the base of a large bowl that contains all the content, or as the base nodes of a huge content tree. By means of the publish command, a peerNode makes its documents (see chapter 3 for document descriptions) and general content, such as jpeg and mpeg files, text files, videos, and code in both binary and source form, available to other peerNodes. Publish is explicitly tied to query. Thus, the publish command below has a somewhat general description whose details cannot be supplied until we give specific examples in section 4.5.
We know from experience that it is better not to publish data directly to mediators, but rather to publish representations or meta-data, such as indices to be hashed. In such a context, the data itself, when queried, is supplied by peerNodes. Moreover, some data must be accessed exclusively from the peerNode on which it was created, and other data may or may not migrate. For example, the virtual port document must always be accessed from the peerNode that created it. The reason is that if a peerNode no longer wishes to permit connections to this virtualPort, it withdraws the document from its local store. If such a document were republished by the peerNodes that retrieved a copy, then synchronizing its revocation would become almost impossible, because it could be retrieved from any peerNode that published it. Finally, if the expiration date in the document is ignored and peerNodes keep trying to connect to this non-existent virtual port, they waste their CPU cycles, since the connections are refused.
We cannot prevent multiple peerNodes from attempting to publish content whose author wishes to be the exclusive source. On the other hand, we can easily make it the responsibility of the peerNodes that receive such documents from non-originating sources to use these documents at their own risk. We are not addressing digital rights management here. Rather, in chapter 5 we point out some mechanisms that let the recipient of digital data recognize that it has come from the wrong source. Recall that in chapter 3, section 3.3.2.4.1, the virtualPort document has a field.
This contains data that may be used to prove that the source from which the document was retrieved is the originating source, which has the exclusive right to publish this document. If this document is published rather than an index, then mediators can respect this field. If indexing is used, then mediators may not be able to detect a republished document. In this case, the republisher is primarily responsible for honoring the originator’s wishes. But if the republisher decides to fraudulently republish the document or content, we provide mechanisms to detect this misbehavior at the recipient peerNode. These mechanisms can be either ad hoc or tightly controlled by trusted third parties (T3P).
Now to the publish command itself. The publish command has a list, L(i), of elements describing the data to be published. Each of these elements has at least four fields, F1, F2, F3, and F4. F1 is meta-data that describes the data; F2 is the publication lifetime of the data in minutes; F3 is the {peerIdentity, virtual port name} = {virtualSocket} of the originating peerNode, which permits a querying peerNode to contact the originating peerNode; and F4 is a flag indicating whether this is a system document or application meta-data. Without loss of generality, we say that F4 is true for system documents, and false otherwise. When the elements L(i) are published, this will be on one of the mediators in the CC’s mediator map. Each element, L(i), is published with the originating peerNode’s peerIdentity. Thus, any query result for an L(i) will be accompanied by the owner’s peerIdentity. Note that since the publication is within a particular CC, a mediator storing any of the publication list’s {L(i), virtualSocket} pairs must also store the source route taken from the ONP message. The peerNode’s source route may be incorrect because the P2P Overlay Network is dynamic; in this case, the mediator corrects the source route with its current source route information. Why store the source route? This source routing information is a hint of how to find the publishing peerNode, whose virtualSocket is known from the {L(i), virtualSocket} pair. Again, routing is covered in section 4.2.5. F5 and other fields are left to the implementers’ imaginations. For example, F5 might be a credit card number and its expiration date if there is a cost associated with data publication. Meta-data may be (name, value) pairs, URLs, etc. A peerNode must precede the publish command with a Notify/Active command to become an active member of a CC. The CC UUID in the ONP message containing the publish command must be the CC UUID of the currently active CC; otherwise, an error message is returned. Please note that we are not specifying the syntax of the list elements, L(i). Rather, we are showing the structure of the command.
For example, they may be in XML format:
Command: Publish {N} L(1), L(2),..., L(N)
   L(i) := List element, {F1, F2, F3, F4}(i)
   F1 := Meta-data
   F2 := Publication lifetime
   F3 := VirtualSocket of originating peerNode
   F4 := true | false, true if it’s a system document
Returns: OK [M]. If M is not present, then all N documents were successfully published. Otherwise, 1