English Pages 378 [373] Year 2020
Feng Zhang
High-speed Serial Buses in Embedded Systems
Feng Zhang
The 10th Research Institute of CETC
Chengdu, China
ISBN 978-981-15-1867-6    ISBN 978-981-15-1868-3 (eBook)
https://doi.org/10.1007/978-981-15-1868-3
© Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
What This Book Is About
A bus is a communication system that transfers data between components inside a computer, or between computers. The term covers all the related hardware components (wire, optical fiber, etc.) and software, including communication protocols. Since the birth of the computer, the bus has gone through three revolutions. The first generation began with PC/AT (Personal Computer/Advanced Technology), proposed by IBM in 1984, with a maximum transfer rate of 16 MBps. The second generation was PCI (Peripheral Component Interconnect Local Bus), which arrived in 1993 with a transfer rate of 133 MBps. Both generations transfer data in parallel mode. PCIE (PCI Express), the representative of the third generation, appeared in 2002 and brought a huge revolution by transferring data as a serial bit stream. PCIE reaches 2.5 Gbps over one differential pair and 20 Gbps in X8 mode.
Buses can be classified in different ways. By transfer rate, they divide into low-speed and high-speed buses, with a clock of 66 MHz as a rough dividing line. By transfer mode, they divide into parallel and serial buses. A parallel bus usually combines one clock signal with several data signals to complete the transmission. A serial bus has no dedicated clock signal; the clock is usually embedded in the serial bit stream. How to use high-speed serial buses in embedded systems is the central mission of this book.
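The PCIE figures above are simple multiples: the raw rate of a link is the per-lane rate times the number of lanes. The Python sketch below works this out for a few link widths. It assumes first-generation PCIE, where each lane runs at 2.5 Gbps per direction and uses 8b/10b encoding, so only 8 of every 10 bits on the wire carry data. The function name `link_rates` is illustrative only.

```python
"""Sketch: raw and payload rates of a multi-lane serial link such as first-generation PCIE.

Assumption: every lane runs at 2.5 Gbps per direction and uses 8b/10b encoding,
so 8 of every 10 bits on the wire are payload.
"""

LINE_RATE_GBPS = 2.5          # per lane, per direction
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: each 10-bit code group carries one data byte


def link_rates(lanes: int, line_rate_gbps: float = LINE_RATE_GBPS) -> tuple[float, float]:
    """Return (raw line rate, payload rate) in Gbps for one direction of the link."""
    raw = lanes * line_rate_gbps
    payload = raw * ENCODING_EFFICIENCY
    return raw, payload


if __name__ == "__main__":
    for lanes in (1, 4, 8):
        raw, payload = link_rates(lanes)
        # Divide Gbps by 8 to get GBps, then multiply by 1000 for MBps.
        print(f"X{lanes}: {raw:g} Gbps on the wire, "
              f"{payload:g} Gbps ({payload / 8 * 1000:g} MBps) of payload")
```

For an X8 link this gives 20 Gbps on the wire and 16 Gbps (2 GBps) of payload per direction, before any packet overhead is counted.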
Why Serial Buses?
In the early days, parallel buses were the preferred transfer mode, and their transfer rate could be raised in two ways: raising the transfer clock or widening the data path. The transfer clock rose from kHz to MHz and then to several hundred MHz, and the data width grew from 8 bit to 32 bit to 64 bit, and then to 128 or 132 bit. The higher the clock rate and the wider the data width, the harder it becomes to overcome signal integrity problems such as reflection and crosstalk. Serial buses avoid these weaknesses. Because the clock is embedded in the serial bit stream, there is no separate clock that must stay aligned with many parallel data lines, and a transfer rate of several Gbps can be reached easily. PCIE, SRIO, FC, Aurora, SATA and SAS are the mainstream serial buses in embedded systems today.
Who Is This Book For?
This book is meant to be an introduction to serial buses in embedded systems. I assume the reader has some familiarity with serial buses, including their definitions, layered structure, 8b/10b encoding and CRC methods, and also with FPGAs and their programming languages, VHDL or Verilog.
Outline of This Book
History and Development of Bus (Chap. 1)
Chapter 1 gives the definition and origin of the bus, followed by the three revolutions of the bus in computers and in embedded systems, and finally predicts a fourth revolution.
High-Speed Data Transfer Based on SERDES (Chap. 2)
Chapter 2 presents several basic concepts used in serial buses, including parallel-to-serial and serial-to-parallel conversion and the LVTTL/LVDS/CML logic levels, and gives a step-by-step example of using the SERDES in an FPGA. Finally, several crucial optimization schemes, including MAXSKEW, Offset and IDELAY2, are introduced.
JESD204/SRIO/Aurora/SATA (Chaps. 3–6)
Chapters 3–6 first present the basic principles of JESD204, SRIO, Aurora and SATA, and then give step-by-step instructions for building the FPGA project examples.
CPCIE/VPX/FC (Chaps. 7–9)
Chapters 7–9 discuss the factors that should be considered in the CPCIE and VPX architectures, as well as the analysis and implementation of the FC protocol.
Chengdu, China
Feng Zhang
Contents
1 History and Development of Bus
1.1 Appearance and Definition of Bus
1.2 Progress of Bus in PC
1.2.1 ISA
1.2.2 PCI/PCI-X
1.2.3 PCIE
1.2.4 ATA/SATA—Used for Storage
1.3 Progress of Bus in Embedded System
1.3.1 The Emergence of Embedded Systems
1.3.2 PC104—The Embedded Version of ISA
1.3.3 Compact PCI—The Embedded Version of PCI
1.3.4 Compact PCI Express—The Embedded Version of PCI Express
1.3.5 SRIO—The Embedded System Interconnection
1.3.6 JESD204—Solving the ADC, DAC Data Transfer Problem
1.3.7 FC—A Combination of Channel I/O and Network I/O
1.3.8 VPX—An Integration Architecture of High-Speed Serial Bus
1.4 Analysis of the Three Evolutions of Bus
1.5 Common Attributes in High-Speed Serial Buses
1.6 The Development Trend of High-Speed Serial Bus in Embedded System
1.6.1 Speed Upgrades Constantly
1.6.2 Adoption of Multiple Signal Levels
1.6.3 Laser Communication and Its Miniaturization
1.6.4 Extended Reading—Laser Takes the Place of Microwave Communication
References
2 High-Speed Data Transfer Based on SERDES
2.1 Brief Introduction to Serdes
2.2 LVDS—Physical Layer of Serdes
2.3 Data Transfer Based on Serdes Primitive Embedded in FPGA
2.3.1 FPGA Supports LVDS Level
2.3.2 FPGA Embeds-in OSERDESE2/ISERDESE2 Primitives
2.3.3 Analysis of the Transfer Rate of Serdes
2.4 Implementation of Serdes Transfer Function in FPGA
2.4.1 OSERDESE2 Configuration at the Transmitter in FPGA
2.4.2 ISERDESE2 Design at the Receiver in FPGA
2.4.3 Experiment Result of Serdes Communication
2.5 Extended Reading—Optimization Scheme for Multi-channel Communication Based on Serdes
2.5.1 Clock Region Optimization
2.5.2 MAXSKEW
2.5.3 Offset
2.5.4 IDELAY2 Primitives to Adjust the Delay
2.5.5 A Self-Adaptive Delay Adjustment Scheme Based on Idelay2 Primitive
2.6 Brief Summary
2.7 Extended Reading—A New Rising Star: Xilinx and Its FPGA
References
3 ADC, DAC Data Transmission Based on JESD204 Protocol
3.1 Introduction to JESD204 Protocol
3.2 Detailed Analysis of JESD204 Specification
3.2.1 JESD204 Physical Layer Analysis
3.2.2 Frame Padding
3.2.3 8B/10B
3.2.4 Scrambling/De-scrambling
3.2.5 Analysis of JESD204 Protocol Receiver State Machine
3.3 Implementation of JESD204 Protocol Based on GTX Embedded in FPGA
3.3.1 Feasibility Analysis—Physical Layer Electrical Characteristics Compatibility
3.3.2 GTX Structure Analysis
3.3.3 Build the FPGA Project for JESD204 IP Core Based on GTX
3.3.4 Analysis of Some Technical Points of JESD204 Protocol
3.4 Summary
References
4 SRIO: The Embedded System Interconnection
4.1 SRIO—Dedicated for the Embedded System Interconnection
4.1.1 Embedded Bus and PC Bus Applications Went Separate Ways
4.1.2 SRIO Technology Dedicated for Embedded System Interconnection
4.1.3 SRIO Versus PCIE Versus Ethernet Versus Others
4.2 SRIO Protocol Analysis
4.2.1 SRIO Protocol Hierarchical Structure
4.2.2 SRIO Physical Layer Specification
4.2.3 Packet and Operation Types
4.2.4 Lane Synchronization
4.2.5 Lane Encoding
4.2.6 Configuration Space
4.3 Point to Point SRIO Communication Based on FPGA
4.3.1 Create the SRIO Project
4.3.2 SRIO Project Structure Analysis
4.3.3 Analysis and Realization of Key Technology of SRIO Point-to-Point Communication
4.3.4 SRIO P2P Communication Function Test
4.4 The Implementation of Communication Function of SRIO Switch Fabric
4.4.1 Overview of the SRIO Switch Fabric
4.4.2 Brief Introduction on SRIO Switch Chip 80HCPS1616
4.4.3 The Configuration of SRIO Switch Chip 80HCPS1616
4.4.4 I2C Configuration Interface for 80HCPS1616
4.4.5 Maintenance Frame Configuration for SRIO Switch Chip
4.4.6 Communication Function Test of SRIO Switch Fabric
4.5 Summary
References
5 Transmission Technology Based on Aurora Protocol
5.1 Aurora Bus Overview
5.2 Aurora Bus Protocol Analysis
5.2.1 Aurora Bus Communication Model
5.2.2 Electrical Characteristics of Aurora Physical Layer
5.2.3 Aurora Data Frame Structure
5.2.4 Aurora Lane Synchronization
5.3 Implementation of Aurora Point-to-Point Data Transmission Between FPGAs
5.3.1 Establish Aurora Bus Testing Project
5.3.2 Analysis of Aurora Bus Protocol Files and Interfaces
5.3.3 Aurora Bus Frame Mode and Streaming Mode
5.3.4 Aurora Bus Communication Performance Analysis and Test
5.4 Summary
References
6 High Speed Data Storage Technology Based on SATA
6.1 Various Modes of High-Speed Data Storage Technology and the Involved Buses
6.1.1 Data Storage Mode Based on ATA Bus Standard
6.1.2 High Speed Data Storage Mode Based on SCSI Bus Standard
6.1.3 High Speed Data Storage Mode Based on SAS/SATA Bus Standard
6.1.4 Extended Reading—High Speed Data Storage Mode Based on NandFlash Arrays
6.1.5 Extended Reading—High Speed Data Storage Mode Based on eMMCs and Its Array
6.1.6 Comparison and Analysis of Multiple Storage Implementations
6.2 SATA Protocol Analysis
6.2.1 Architecture
6.2.2 OOB Process
6.2.3 Primitives and Frame Information Structures
6.2.4 Encode Scheme
6.3 Implementation of SATA IP Core in FPGA
6.3.1 Brief Introduction to ML50x Evaluation Platforms
6.3.2 Brief Introduction to Virtex-5 FPGA GTX
6.3.3 GTX Configurations to Comply with SATA Protocol
6.3.4 OOB Communication of SATA Protocol
6.3.5 Implementation of 8B/10B, CRC and Scrambling
6.3.6 Implementation of Analysis on Application Layer of SATA Protocol
6.3.7 Implementation of Application Layer
6.3.8 SATA Protocol IP Core Test
6.4 Summary
6.5 Extended Reading—DNA-Based Biology Storage Technology
Appendix 1: SATA CRC32 Implementation in VHDL
Appendix 2: SATA Scrambling Implementation in VHDL
References
7 Compact PCI Express
7.1 From ISA to PCI to PCIE
7.2 Compact PCIE—Embedded Version of PCIE
7.3 Classification of Functional Modules in CPCIE
7.4 CPCIE Connectors and Signals Definition
7.4.1 Connectors
7.4.2 Definition of Signals
7.5 System Design Considerations
7.5.1 Functional Labels of Boards
7.5.2 Power Supply Requirements
7.5.3 Clock Design
7.6 Summary
References
8 VPX Architecture
8.1 Brief Introduction to VPX and Its Origin VME
8.2 Analysis on VPX Protocol Families
8.3 Signals and Interconnect
8.3.1 VME32 Signals
8.3.2 VPX Signals
8.3.3 Pin Mappings Between Backplane and Plug-in Modules
8.4 System Design Consideration
8.4.1 Logical Topology
8.4.2 Connectors Selection
8.4.3 Backplane Keying
8.4.4 Power Design
8.5 Summary
References
9 Implementation and Application of FC Protocol
9.1 Brief Introduction to FC
9.1.1 FC Appears from Big Data, Clouds and SAN
9.1.2 Advantages of FC
9.1.3 FC Roadmap
9.1.4 Applications of FC to Airborne Avionics
9.2 Analysis of FC Specification
9.2.1 FC Topology
9.2.2 Hierarchical-Layered Structure
9.2.3 FC Protocol Families
9.2.4 Frame Structure and Coding Scheme
9.2.5 Classes of Service
9.2.6 Interface Forms
9.3 Analysis on Realization of FC Protocols
9.3.1 Realization Scheme Based on IP of Xilinx
9.3.2 Realization Method Based on ASICs
9.4 Summary
References
Chapter 1
History and Development of Bus
1.1 Appearance and Definition of Bus
The bus originated with the computer, whose history can be traced back to 1946. The first computer was designed for the military to calculate ballistic trajectories and gunnery, and this weapon-oriented machine was put into operation at the peace-loving University of Pennsylvania on February 15, 1946. It was called ENIAC (Electronic Numerical Integrator And Computer) [1]. ENIAC consisted of 17,468 vacuum tubes, consumed 174 kW of power, covered an area of 170 m², weighed 30 t, performed 5,000 additions or 400 multiplications per second, and cost USD 1 million. What does this operation speed mean? Take π as an example. It took Tsu Chung-Chi (a distinguished Chinese mathematician, born in A.D. 429) 15 years to calculate π to 7 decimal places, and William Shanks (a British mathematician, born more than 1,300 years after Tsu Chung-Chi) his whole life to calculate it to 707 decimal places. Shanks held the precision record for the following 73 years, until the appearance of ENIAC, which took only 40 s to reach the same precision and found that the 528th place of Shanks' calculation was wrong [2]. ENIAC thus opened a new world: it was the very beginning of the Information Revolution and marked the advent of the computer era. The 1st generation computers were usually called vacuum tube computers, and they gave birth to information technology. Vacuum tube computers were not good neighbors, since they consumed much of the precious electric power that belonged to the inhabitants nearby. In 1956, they were replaced by the 2nd generation computers, which were built from transistors, had higher operation speeds and smaller volumes, and could be accommodated in one room, whereas the older generation needed a whole plant building. Despite these advantages, they were used only by the military and government until 1963, because transistors were expensive.
In parallel with the development and evolution of the 2nd generation computers during the 1950s, substantial efforts were put into 3rd generation research. In 1958, three transistors were integrated into a single silicon chip, giving birth to the integrated circuit (IC) industry. IBM then became the first to use ICs in a computer, with its product the IBM 360. With ICs, the space a computer required shrank again, from the size of a room to that of a desk. The computers produced by 1971 were called the 3rd generation computers, featuring medium- and small-scale ICs, while those produced after 1971 were called the 4th generation computers, characterized by mega-scale ICs. On August 12, 1981, IBM released a new computer, the 5150 series; the personal computer era began and the computer became an everyday commodity [3]. The history of the four generations of computers is shown in Fig. 1.1.
© Springer Nature Singapore Pte Ltd. 2020 F. Zhang, High-speed Serial Buses in Embedded Systems, https://doi.org/10.1007/978-981-15-1868-3_1
Fig. 1.1 Four generations of computers. a 1st generation computers (vacuum tubes) [3], b 2nd generation computers (transistors), c 3rd generation computers (medium and small-scale ICs) [3], d 4th generation computers (mega-scale ICs) [4]
The 1st and 2nd generation computers were mainly used for professional and complex calculations. Apart from huge numbers of vacuum tubes or transistors, these computers also required many solderers, since data transmission inside them was accomplished by wires. Later, with the rapid development of ICs, data could be transferred between IC chips on a PCB, which replaced the wires. Because smaller but more powerful ASICs kept speeding up data transmission between the CPU and its peripheral slaves, IBM realized that it had to establish a standard that all manufacturers of CPUs and other peripherals would follow. In 1981, IBM introduced its representative computer, the PC/XT, specifying a data transmission mechanism between the CPU and its peripheral slaves. The mechanism was called the PC/XT bus or PC bus. The major characteristics of the PC/XT bus were an 8-bit data width, a 20-bit address width, and the CPU as the only master, with all the other peripherals, including the DMA controller and coprocessor, as slaves [5]. The PC/XT bus is an open architecture with several slots on the motherboard that can be used to connect peripheral I/O to the CPU. Because the PC/XT bus was cheap, robust, simple and highly compatible, many manufacturers joined the PC/XT family. As a result, IBM's PC/XT bus standard took first place in the computer market and became the de facto industry standard.
Based on IBM's bus concept, we can define the bus as follows: In computer architecture, a bus is a communication system that transfers data between components inside a computer, or between computers. This expression covers all related hardware components (wire, optical fiber, etc.) and software, including communication protocols (www.wikipedia.com).
A typical computer bus structure is shown in Fig. 1.2. It involves the FSB, ISA, AGP, PCI, SCSI, ATA, USB and other buses, which connect the CPU, chipset (North Bridge and South Bridge), hard disks, video card, sound card and other functional devices together. Introducing the bus into the computer brought obvious advantages:
• It simplifies the system structure and eases system expansion.
• It makes upgrading easier. Upgrading one module generally does not affect the other functions in the system slots.
• It makes diagnosis and repair easier.
1.2 Progress of Bus in PC
It was the development of ultra-large-scale integrated circuit computers that led to the emergence and evolution of the bus. Throughout the twentieth century, the bus developed hand in hand with the PC. The story starts with the ISA bus.
Fig. 1.2 A typical bus structure in PC
1.2.1 ISA
In 1984, three years after the PC/XT bus was launched, IBM launched a 16-bit PC named PC/AT (Personal Computer Advanced Technology), based on the latest Intel 80286 processor, and introduced the PC/AT bus, built on the PC/XT bus concept, to meet 8/16-bit data bus requirements. By this time the PC industry had begun to take shape, and IBM allowed third-party vendors to supply compatible products, so various manufacturers developed IBM PC-compatible peripherals and the PC/AT bus became the de facto standard. Based on the IBM PC/AT bus specification, an industry standard was gradually established, named the Industry Standard Architecture (ISA) [6, 11]. The IEEE formally drafted the ISA bus standard in 1987. The ISA-based computer architecture is shown in Fig. 1.3. The ISA bus ran through the 286 and 386SX era with a maximum transfer rate of 16 MBps, which was more than enough for 16-bit x86 systems. ISA did not run into trouble until the 32-bit 386DX processor appeared, when its 16-bit data width became a bottleneck. Its low transfer rate, high CPU occupancy and consumption of hardware interrupt resources severely constrained processor performance.
Fig. 1.3 ISA-based PC structure
In 1988, Compaq, HP and seven other manufacturers extended ISA to 32 bit, creating the famous EISA (Extended ISA) bus. The EISA bus operates at a frequency of up to 8 MHz and is fully compatible with the 8/16-bit ISA bus; its 32-bit data width doubles the bandwidth to 32 MBps. However, EISA did not repeat the success of ISA. Because of its high cost, it was replaced before it could become a standard bus by a new generation of bus, the PCI (Peripheral Component Interconnect Local) bus [7], in the early 1990s. Even so, the EISA bus did not disappear quickly; it coexisted with the PCI bus in computers for a long time. An ISA/PCI bus coexistence motherboard is shown in Fig. 1.4. It was not until 2000, 19 years after the ISA standard was proposed, that EISA officially withdrew.
1.2.2 PCI/PCI-X
CPUs developed rapidly in the early 1990s, led by Intel's 80486 processor. At that time, the CPU ran significantly faster than the bus, and ISA and EISA clearly became a bottleneck in the transmission between the CPU and peripherals. Because hard disks, graphics cards and other peripherals could only send and receive data through the slow, narrow ISA/EISA channel, the low-speed bus could not keep pace with the high-speed CPU, and overall performance suffered seriously. To solve this problem, Intel launched a new technology in 1993, PCI (Peripheral Component Interconnect Local Bus), which was quickly recognized and became a new industry standard. In the PCI local bus structure, one level of bus is added between the ISA bus and the CPU bus to coordinate data transmission and provide the bus interface. With this structure, high-speed peripherals such as network adapter cards and disk controllers can be moved off the ISA bus and connected to the CPU bus through the PCI bus, which matches their transmission rates, while low-speed devices still interact with the PCI bus through the traditional ISA bus controller.
Fig. 1.4 ISA/PCI bus coexistence motherboard
Figure 1.5 shows the internal structure of a PCI-based computer. The initial version of the PCI bus, operating at 33 MHz with a 32-bit data width and a transmission bandwidth of up to 133 MBps, met the transmission rate requirements of computer systems well, offering much higher performance than the ISA and EISA buses. The PCI local bus allows up to 10 PCI add-on cards to be installed in a single computer, and allows ISA and EISA expansion controller cards to be installed at the same time, achieving better system bus synchronization. This approach not only meets the interface and bus requirements of peripherals but also provides considerable flexibility, and it still informs today's design philosophy. Later PCI versions can use 32 or 64 bit (depending on the device) to exchange data with the system CPU. The PCI bus has some unique advantages over earlier buses:
(1) High-performance multimedia support. High-performance graphics, video and networking require a fast processor, and the wide bandwidth of the PCI local bus lets these applications run quite smoothly.
(2) Wide compatibility. Unlike earlier bus standards, PCI is processor independent: any device that follows the PCI bus protocol can be used in a PCI system, regardless of the type of processor.
In short, the PCI local bus provides wider bandwidth and faster access to the CPU, effectively overcoming the data transmission bottleneck. For a long time, the PCI local bus interface was the preferred interface for many adapters, such as network adapters, built-in modem cards and sound adapters, and most motherboards have PCI slots. One year after the PCI specification was released, Intel put forward the 64-bit/33 MHz PCI bus version with 266 MBps of bandwidth, used mainly in enterprise servers and workstations, where bus performance requirements were rising day by day.
The bandwidth of the 64-bit/33 MHz PCI specification soon became insufficient as well, so the clock was later raised to 66 MHz. As the x86 server market kept expanding, 64-bit/66 MHz PCI became the standard for servers and workstations equipped with SCSI (Small Computer System Interface) cards, RAID (Redundant Array of Independent Disks) cards, Gigabit Ethernet and other devices. Even today the 64-bit/66 MHz PCI specification is widely used.
Fig. 1.5 PCI-based PC structure
In 1999, PCI-SIG (Peripheral Component Interconnect Special Interest Group) built on the PCI bus to launch the PCIX (PCI eXtensions) bus [8]. PCIX uses split transactions, which eliminate wait states and greatly improve bus utilization. Running at 133 MHz with a 64-bit data width, PCIX reached a bandwidth of 1.066 GBps. PCIX marks the transition from shared bus architecture to switched bus architecture, and it was used to improve I/O throughput in high-performance servers. A PCIX bus interface on a motherboard is shown in Fig. 1.6. In 2002, PCI-SIG launched PCIX 2.0 (PCIX 266), which supports DDR (Double Data Rate) mode: data is transferred on both the rising and the falling edges of the clock, raising the theoretical bandwidth of PCIX to 2.1 GBps. The development of the ISA/PCI/PCIX buses is summarized in Table 1.1.
Encouraged by the success of PCIX 2.0, PCI-SIG announced in November 2002 that it would develop the PCIX 3.0 standard, known as PCIX 1066. The standard was reported to run at 1066 MHz with a shared bandwidth of up to 8.4 GBps, giving each peripheral at least 1.06 GBps. Unfortunately, the plan was abandoned before it was carried out and PCIX saw no further development, because synchronizing 64 parallel data bits at 1066 MHz was extremely difficult. This showed once again that incremental improvement alone is not always enough; sometimes a revolutionary change is needed, just as ISA was replaced by PCI and PCI in turn by PCIE.
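The peak bandwidths quoted for the parallel buses all follow from one product: bytes per transfer (width / 8) × clock frequency × transfers per clock. The minimal sketch below reproduces the PCI and PCIX figures; it assumes the nominal PCI-family clocks of 33.33, 66.67 and 133.33 MHz and models DDR simply as two transfers per clock. It ignores protocol overhead, so the results are theoretical peaks, not sustained throughput.

```python
def peak_bandwidth_mbps(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth of a parallel bus in MBps (10**6 bytes/s).

    bytes per transfer = width_bits / 8; transfers per second = clock * transfers_per_clock.
    """
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Nominal PCI-family clocks are 33.33, 66.67 and 133.33 MHz,
# which is why 32-bit/33 MHz PCI is usually quoted as 133 MBps.
CLK_33 = 100 / 3
CLK_66 = 200 / 3
CLK_133 = 400 / 3

BUSES = [
    # (label, data width in bits, clock in MHz, transfers per clock)
    ("PCI 32-bit/33 MHz", 32, CLK_33, 1),
    ("PCI 64-bit/33 MHz", 64, CLK_33, 1),
    ("PCI 64-bit/66 MHz", 64, CLK_66, 1),
    ("PCIX 1.0 64-bit/133 MHz", 64, CLK_133, 1),
    ("PCIX 2.0 (PCIX 266, DDR)", 64, CLK_133, 2),  # data on both clock edges
]

if __name__ == "__main__":
    for label, width, clock, tpc in BUSES:
        print(f"{label:<26} {peak_bandwidth_mbps(width, clock, tpc):8.1f} MBps")
```

The sketch gives about 133, 267, 533, 1067 and 2133 MBps; the 266 MBps and 1.066 GBps quoted in the text are the same products truncated rather than rounded.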
Fig. 1.6 PCIX becomes server’s major standard
Table 1.1 Comparison among ISA/PCI/PCIX standards

| Standard | Launched time/year | Data width/bit | Clock frequency/MHz | Transfer rate/MBps | Peripherals supported |
|---|---|---|---|---|---|
| PC/XT | 1981 | 8 | | | |
| ISA (AT) | 1984 | 8/16 | 8 | 16 | >12 |
| EISA | 1988 | 32 | 8.3 | 33 | >12 |
| PCI | 1993 | 32/64 | 33/66 | 133 (32 bit@33 MHz) | |
| PCIX 1.0 | 1999 | 64 | 133 | 1066 | 1–2 |
| PCIX 2.0 | 2002 | 64 | 266 | 2133 | 1–2 |
The PCI, PCIX and PCIE buses have coexisted for a long time, as shown in Fig. 1.7.
1.2.3 PCIE
With new technologies and devices emerging one after another, especially in the fast-growing fields of games and multimedia, the operating frequency and bandwidth of PCI could no longer meet demand. PCI also had other problems, such as IRQ sharing conflicts and support for only a limited number of devices. After ten years of incremental improvements, the PCI bus could no longer satisfy the CPU's performance requirements and had to be replaced by a new generation of bus with higher bandwidth, wider adaptability and greater development potential. This was the mission of the PCI Express bus. In spring 2001, at the Intel Developer Forum (IDF), Intel proposed the concept of the 3GIO (Third Generation I/O Architecture) bus, which achieved high performance through serial, high-frequency operation. The forward-looking 3GIO design aimed to replace the PCI bus and meet the needs of PCs for the next decade. The 3GIO plan received a wide response and was later submitted by Intel to PCI-SIG, which renamed it PCI Express ("Express" meaning high speed) [9] and launched it in April 2002.
Fig. 1.7 PCI/PCIX/PCIE coexistence
The performance of PCI Express is impressive: a single X1 link carries 2.5 Gbps in each direction. PCI Express also supports X2, X4, X8 and X16 links, and an X16 link reaches 80 Gbps counting both directions (2.5 Gbps × 16 lanes × 2 directions). More importantly, PCI Express changed the basic architecture: it completely abandons the shared parallel structure in favor of the point-to-point serial connections popular in the industry. Unlike the shared parallel architecture of PCI and earlier buses, every device in a PCI Express system has its own dedicated link. A device therefore no longer needs to request bandwidth from the whole system, and data can be transferred at frequencies that PCI cannot reach. A new era had begun. Typical PCIE connectors are shown in Fig. 1.8.
After PCI Express 1.0 was released, product development followed. In June 2004, Intel launched its i915/i925X chipsets, fully based on PCI Express, and the two graphics card manufacturers nVIDIA and ATI were the first to launch graphics cards with PCI Express X16 interfaces. The PCI Express era had officially arrived. In 2005, PCI Express 2.0 was released, raising the unidirectional bandwidth of an X1 link to 5 Gbps and the total bandwidth of an X16 link (both directions) to 160 Gbps. PCI Express 3.0 was released in 2010, raising the unidirectional rate to 8 Gbps per lane. On January 9, 2012, the world's first PCIE 3.0 graphics card, the Radeon HD 7970, came out. The replacement of PCI by PCIE is now in full swing. The PCIE roadmap is shown in Table 1.2, and the three revolutions of the PC bus are shown in Fig. 1.9. Refer to Chap. 7 for more details on the PCI/PCIX and PCIE buses.
Fig. 1.8 Typical PCIE module cards and system slots
Table 1.2 Roadmap of PCIE [9]

| Version | Release time (year) | Transfer rate/Gbps | Encoding type |
|---|---|---|---|
| PCIE 1.0 | 2002 | 2.5 | 8B/10B |
| PCIE 2.0 | 2005 | 5.0 | 8B/10B |
| PCIE 3.0 | 2010 | 8.0 | 128B/130B |
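The transfer rates in Table 1.2 are raw line rates per lane in one direction; the usable data rate is lower by the encoding overhead, and the 80 Gbps and 160 Gbps X16 figures in the text count both directions. The small sketch below makes both conversions explicit. The dictionary and function names are illustrative, and the sketch ignores packet and protocol overhead, so it gives upper bounds rather than measured throughput.

```python
# Per-lane line rate (GT/s, one bit per transfer) and encoding, from Table 1.2.
PCIE_GENERATIONS = {
    "PCIE 1.0": (2.5, 8, 10),     # 8b/10b: 8 payload bits in every 10 line bits
    "PCIE 2.0": (5.0, 8, 10),
    "PCIE 3.0": (8.0, 128, 130),  # 128b/130b: 2 sync bits per 128-bit block
}


def lane_payload_gbps(generation: str) -> float:
    """Usable data rate of one lane in one direction, after encoding overhead."""
    rate, payload_bits, line_bits = PCIE_GENERATIONS[generation]
    return rate * payload_bits / line_bits


def link_raw_gbps(generation: str, lanes: int, both_directions: bool = True) -> float:
    """Raw line rate of a link; each lane has its own transmit and receive pair."""
    rate = PCIE_GENERATIONS[generation][0]
    return rate * lanes * (2 if both_directions else 1)


if __name__ == "__main__":
    for gen in PCIE_GENERATIONS:
        per_lane = lane_payload_gbps(gen)
        print(
            f"{gen}: {per_lane:.3f} Gbps ({per_lane * 1000 / 8:.1f} MBps) per lane per direction; "
            f"X16 raw, both directions: {link_raw_gbps(gen, 16):.0f} Gbps"
        )
```

For PCIE 1.0 the sketch gives 2 Gbps (250 MBps) of payload per lane per direction, and for an X16 link 80 Gbps raw in both directions, matching the figure in the text; the switch to 128b/130b is why PCIE 3.0 nearly doubles usable bandwidth over PCIE 2.0 with only a 1.6× higher line rate.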
Fig. 1.9 Three Revolutions of the PC Bus
1.2.4 ATA/SATA—Used for Storage
The ISA, PCI and PCIE buses mentioned above are all system buses or local buses that connect the CPU with various peripherals over a common bus structure, whereas graphics cards, disks and other high-speed peripherals need dedicated bus channels. A typical example is the PC's dedicated storage bus. As PCs spread into millions of households, Compaq, Western Digital and several other companies saw a great opportunity and, to win a share of the market, jointly developed a hard-disk storage interface that later became known as Advanced Technology Attachment. The technology began to be applied to desktop PCs in the early 1990s under the well-known name ATA. In 1986, Compaq's 386 PC was equipped with a hard disk based on a Western Digital controller chip; this is the earliest record of an ATA hard drive, supporting data rates of up to 8.3 MBps [12]. ATA is a family of specifications starting from ATA-1, and ATA-5 appeared after years of development. ATA-5 used a 40-pin connector with two rows of pins, and a matching cable, to connect the motherboard and the hard disk, as shown in Fig. 1.10. By the time of ATA-6, the traditional parallel interface and cables could no longer support higher data rates because signal integrity deteriorated. Parallel transmission had become a technical bottleneck, and there was little hope of a breakthrough with the later ATA-7 protocol. Most manufacturers abandoned further ATA development; only Maxtor launched a series of ATA-7 products in 2004, supporting data rates of up to 133 MBps. ATA went the same way as ISA and PCI: just as PCI was replaced by PCI Express, ATA was replaced by Serial ATA [13, 14]. The transfer rates of the ATA family are shown in Table 1.3.
Fig. 1.10 ATA interface: (a) ATA interface on motherboard; (b) ATA interface on disk
Table 1.3 Roadmap of ATA standard

| Version | Launched time (year) | Transfer rate/MBps |
|---|---|---|
| ATA-1 | 1986 | 8.3 |
| ATA-2 | 1993 | 16.6 |
| ATA-3 | 1996 | 16.6 |
| ATA-4 | 1998 | 33 |
| ATA-5 | 1999 | 66 |
| ATA-6 | 2001 | 100 |
| ATA-7 | 2004 | 133 |

In February 2000, Intel first proposed the SATA concept at the Intel Developer Forum (IDF) and set up the Serial ATA Working Group. In August 2001, Seagate announced Serial ATA 1.0 at the IDF Fall 2001 conference, formally establishing the Serial ATA (SATA) specification. By changing data transmission from parallel to serial (on the same principle as PCIE), SATA avoids the crosstalk and synchronization problems of parallel transmission. SATA 1.0 runs at a line rate of 1.5 Gbps, which after 8b/10b encoding yields 150 MBps, faster than the 133 MBps of ATA-7; SATA 2.0 supports 3.0 Gbps, SATA 3.0 supports 6.0 Gbps, and SATA 3.2 supports up to 16 Gbps. Figure 1.11 shows the development of the SATA protocol. Besides solving the crosstalk and synchronization problems of ATA, SATA uses fewer pins, has better error correction capability and supports hot plugging; its interfaces are shown in Fig. 1.12. SATA has now fully replaced ATA and quickly become the dominant interface in the storage market. ATA/SATA is used mainly for hard-disk storage by individual users, while SCSI/SAS dominates the server field.
Fig. 1.11 Roadmap of SATA specification [15]
Fig. 1.12 SATA interface: (a) SATA interface on motherboard; (b) SATA interface on disk
1.3 Progress of Bus in Embedded System
1.3.1 The Emergence of Embedded Systems
Human desire is endless, and that desire in turn drives the development of technology and even of society. Given the great success of bus standardization in the PC, engineers have considered two questions since the very birth of the computer:
• Can the computer take the place of humans in some industrial control applications? This question led to the embedded system.
• Can embedded systems adopt bus specifications simply by borrowing the bus protocols of the PC? This question led to dedicated embedded system buses.
As computers developed rapidly, their high-speed numerical computing ability and high level of intelligence attracted the interest of control engineers. Could a computer replace humans in a specific field and achieve intelligent control of a specific system? For example, after electrical modification, mechanical reinforcement and the addition of various peripheral interface circuits, a computer can be installed on a large ship as an autopilot or a turbine condition-monitoring system. Once computers became small, cheap and highly reliable, they quickly left the machine room and were embedded in a wide range of specific systems to provide intelligent control; this marks the birth of the embedded system. An embedded system can therefore be defined as "a dedicated computer system embedded in a specific system".
At first, people reluctantly modified general-purpose computer systems for embedded applications in large-scale equipment. Restricted by size, environmental requirements and other factors, however, general-purpose computers cannot be embedded in most systems (home appliances, instruments, industrial units, etc.). The general-purpose computer had to be cut down to the minimum the target requires, with a suitable software/hardware configuration, to meet the target system's physical (small), electrical (reliable) and cost (inexpensive) requirements as well as the reliability that control demands. Embedded computer systems therefore had to be developed independently, and embedded applications, together with the rapid development of ASICs, became an important branch of modern computing. The counterpart of the computer's CPU in an embedded system is the microcontroller. One of the earliest microcontrollers was the 8048, launched by Intel in 1976, and Motorola launched the 68HC05 at about the same time.
These early microcontrollers contained 256 bytes of RAM, 4 KB of ROM, four 8-bit parallel ports, a full-duplex serial port and two 16-bit timers. In the early 1980s, Intel improved the 8048 further into the 8051 series [16], a milestone in microcontroller history; 51-series microcontrollers are still used extensively in a wide variety of products. In the 1990s, embedded system technology gained much more room to develop. In January 1999, the American futurist Negroponte predicted that within four to five years embedded intelligent (computer) devices would be the greatest invention after the PC and the Internet. After 2005, driven by digitization, intelligence and networking, embedded applications have flourished and now set the development trend for communication and consumer products. In communications, digital technology is replacing analog technology. In radio and television, the United States has begun to shift from analog to digital TV, Europe's DVB (Digital Video Broadcasting) technology has been adopted in most countries around the world, and digital audio broadcasting (DAB) has entered the commercial stage. It is the rapid development of embedded systems that has allowed smartphones to change every aspect of daily life. None of these products could exist without embedded system bus technology.
Embedded system bus technology originated mainly from PC buses: the embedded PC104, CPCI and CPCIE buses are all "variants" of the PC's ISA, PCI and PCIE buses. Later came bus technologies dedicated to embedded system interconnection, such as SRIO, JESD204 and Aurora.
1.3.2 PC104—The Embedded Version of ISA
The development of embedded systems has accompanied that of the PC. The earliest embedded buses were existing PC buses modified to meet embedded requirements on physical environment, electrical characteristics, cost and so on. The first-generation PC buses were PC/XT, PC/AT and later ISA; what, then, was the first-generation embedded bus? In 1984, the ISA bus was established as an industry standard architecture. To extend it to embedded control applications, the industry optimized the electrical and mechanical specifications of ISA to make it small and low-power. In 1992, RTD, AMPRO and other companies engaged in embedded system development, 12 in all, founded the International PC/104 (also written PC104) Association [17], which won support from many manufacturers around the world. The ISA bus for embedded applications was then given its own name, PC104, and PC104 bus technology has developed rapidly since. The PC104 bus has been updated in step with the PC system bus, as shown in Table 1.4. Like the original PC/AT bus, PC104 was designed by a non-statutory organization rather than by an industrial committee. In 1992, IEEE began to turn PC/XT and PC/AT into an international standard, IEEE P996, and PC104, an extension of PC/AT, was defined as IEEE P996.1, the PC-compatible embedded module standard. PC104 is thus an industrial control bus defined specifically for embedded control. The standard has since been accepted by many embedded manufacturers and has gradually become popular. The history of PC104 shows that it arose from practical needs, developed alongside the PC and won support from many manufacturers, all of which has given it a long and vigorous life. To date, more than 200 manufacturers produce and sell PC104-compliant products.
Table 1.4 Roadmap of PC104 [17]

| Standard | Launched time | Specification | Current version |
|---|---|---|---|
| PC104 | 1992 | ISA (AT and XT) | 2.6 |
| PC104-Plus | 1997 | ISA, PCI | 2.3 |
| PCI-104 | 2003 | PCI | 1.1 |
| PCI/104-Express and PCIE/104 | 2008 | PCI, PCIE | 2.10 |

PC104 differs from the ISA bus mainly in the following aspects:
• Small size: a PC104 module measures 3.6 × 3.8 inches (90 mm × 96 mm).
• Stacked connection: modules are connected through "pins" and "holes" instead of a backplane and slots; the pins of the upper module plug into the holes of the module below. This stacked mechanism has excellent shock resistance, as shown in Fig. 1.13.
• Low power consumption: most bus drive currents in PC104 are reduced to 4 mA, and each module consumes 1–2 W.
In the post-PC era, embedded systems have developed in a diversified and flourishing manner. Many embedded bus specifications have appeared and competed with each other, but in terms of appeal and manufacturer support, PC104's position in the embedded field remains unmatched.
A common question about PC104 is where the name comes from: what does 104 mean? The PC104 specification has two versions, 8-bit and 16-bit, corresponding to the 8-bit PC bus and the 16-bit PC/AT bus respectively. The 8-bit version uses only connector P1, with 64 pins; the 16-bit version adds connector P2, with 40 pins, for a total of 104 pins, hence the name PC104. By the 1990s the bus inside the PC had changed: ISA had been replaced by PCI, which had a great impact on embedded applications and led to Compact PCI, the embedded version of PCI.
Fig. 1.13 PC 104 module (the tops of the P0/P1 connectors consist of two columns of holes; the bottoms consist of two columns of pins)
1.3.3 Compact PCI—The Embedded Version of PCI
After PCI replaced ISA in the PC field, its good performance quickly influenced embedded systems. The "borrowing" approach continued and worked wonders again. PICMG (PCI Industrial Computer Manufacturers Group) adapted the PCI protocol for embedded applications and named the embedded version Compact PCI (CPCI), extending PCI from the PC into industrial and embedded areas. This happened in 1995, two years after the PCI specification was launched. PICMG described the goal of the CPCI specification as follows:
"CompactPCI Objectives: CompactPCI is an adaptation of the Peripheral Component Interconnect (PCI) Specification 2.1 or later for industrial and/or embedded applications that require a more robust mechanical form factor than Desktop PCI. CompactPCI uses industry standard mechanical components and high performance connector technologies to provide an optimized system intended for rugged applications. CompactPCI provides a system that is compatible with the PCI Specification, permit low cost PCI components to be used in a mechanical form factor suitable for rugged environments" [18].
From this description we can conclude that CPCI reworks PCI in three respects:
• It is electrically compatible with the PCI specification;
• It abandons the traditional mechanical structure of PCI in favor of the highly reliable Eurocard structure, proven over years of practice, which improves cooling and vibration resistance and meets electromagnetic compatibility requirements;
• It replaces the PCI gold-finger edge connector with a 2 mm pin-and-socket connector, which offers better air tightness and corrosion resistance and further improves reliability and load capacity.
Figures 1.14 and 1.15 compare the structures of PCI and CPCI.
The CPCI module card and system slots are shown in Fig. 1.15. Compared with traditional PCI, CPCI has advantages in three respects:
(1) Maintainability. Replacing a card in a traditional PC system is time-consuming: first unscrew the bolts, then remove the chassis cover, and possibly disconnect internal wires between components. This makes the process error-prone, and the wires are easily broken. In a CPCI system, by contrast, a card can be pulled out from the front panel and replaced without removing the chassis cover. In addition, because I/O is routed through the backplane, cards can be swapped quickly, which cuts repair time from hours to minutes and shortens the MTTR (mean time to repair). In terms of maintainability, the traditional PCI structure is clearly inferior to CPCI, which is much simpler and more efficient.
Fig. 1.14 PCI module card and system slots
Fig. 1.15 CPCI module card and system slots
(2) Vibration resistance. A traditional PC does not support its peripheral boards reliably: a card is simply inserted into and held by its connector, with no guide rails along its top and bottom edges, so the contact between card and slot can easily deteriorate under vibration. To overcome this weakness, a Compact PCI card is firmly connected to the system slot by the pin-and-socket connector, its top and bottom edges are held in rails fixed to the chassis, and its front panel is fastened to the chassis. Because all four sides of the card are fixed, the connection stays firm even under severe vibration.
(3) Ventilation. Air flow in a traditional industrial PC box is poor: the passive backplane, the various cards and the hard disks block the air, so hot air cannot be discharged from the chassis promptly and cool air cannot reach all the boards. The resulting heat damages ASICs and PCBs, for example by breaking traces, and shortens their service life. These problems do not arise in a CPCI system. The Compact PCI chassis provides a smooth cooling path for all cards in the slots, so cool air flows freely across the cards and carries the heat away; in addition, a fan at the bottom of the CPCI chassis accelerates cooling. Thanks to this well-considered heat-dissipation design, Compact PCI systems rarely have cooling problems.
Besides the advantages above, CPCI supports hot swap, its most prominent and attractive feature: a CPCI card can be pulled out of or plugged into a system slot while the system is powered on, without disturbing normal operation. The principle is that CPCI uses pins of three different lengths, so that power and ground, the PCI bus signals and the hot-swap start signal make or break contact in sequence when a card is inserted or removed. Hot swap and high reliability have led to wide use of CPCI in communications, networking, computing, intelligent control, industrial automation, data acquisition, military systems and other fields that require real-time operation. Because CPCI also offers high bandwidth, it is used in high-speed data communication equipment such as servers, routers and switches.
CPCI has gone through multiple versions. The latest architecture, specified by PICMG 3.0, is based on a more open platform that makes it easier for equipment suppliers and system integrators to provide users with convenient services and cost-effective products and solutions. PICMG 3.0 is a completely new technology, quite different from PICMG 2.x; it scales up to 2 Tbps and is aimed mainly at high-bandwidth telecommunications transmission to support the future development of telecommunications. PICMG 2.x, however, is still the mainstream of CPCI and will dominate the CPCI market for a long time.
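The staggered-pin principle behind CPCI hot swap can be illustrated with a tiny model: a pin touches its backplane contact once the card has travelled far enough for that pin's length to bridge the remaining gap, so longer pins mate first on insertion and shorter pins break first on removal. The sketch below is only a model of that ordering; the pin lengths 3, 2 and 1 are hypothetical relative units, not values from the CompactPCI specification, and the electrical and software details a real hot-swap design also needs are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinGroup:
    name: str
    length: int  # hypothetical relative length; longer pins mate first


# Three pin lengths as described for CPCI hot swap; the numbers are illustrative only.
CPCI_PIN_GROUPS = (
    PinGroup("power and ground", 3),
    PinGroup("PCI bus signals", 2),
    PinGroup("hot-swap start signal", 1),
)


def mated_groups(groups, travel, full_travel=3):
    """Names of the groups in contact once the card has moved `travel` units into the slot.

    A pin of length L touches its contact once travel + L > full_travel.
    """
    return [g.name for g in groups if travel + g.length > full_travel]


def insertion_sequence(groups, full_travel=3):
    """(travel, group) events in the order groups make contact as the card is pushed in."""
    events, seen = [], set()
    for travel in range(full_travel + 1):
        for name in mated_groups(groups, travel, full_travel):
            if name not in seen:
                seen.add(name)
                events.append((travel, name))
    return events


def removal_sequence(groups, full_travel=3):
    """(travel, group) events in the order groups lose contact as the card is withdrawn."""
    events = []
    previous = set(mated_groups(groups, full_travel, full_travel))
    for travel in range(full_travel - 1, -1, -1):
        current = set(mated_groups(groups, travel, full_travel))
        for g in groups:
            if g.name in previous and g.name not in current:
                events.append((travel, g.name))
        previous = current
    return events


if __name__ == "__main__":
    print("insert:", insertion_sequence(CPCI_PIN_GROUPS))
    print("remove:", removal_sequence(CPCI_PIN_GROUPS))
```

On insertion the model connects power and ground first, then the bus signals, and only then the short start pin that tells the system the card is fully seated; removal runs in exactly the reverse order, which is the property the three pin lengths are meant to guarantee.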
Besides bringing the PCI protocol into embedded and industrial applications, CPCI also standardizes the structure and size of modules with the 3U and 6U formats: a 3U board measures 160 × 100 mm and a 6U board 233 × 160 mm. These formats were quickly accepted by most manufacturers and became a standard structure for embedded design. In short, CPCI is the extension of PCI into industrial and embedded applications.
What happens when PCI enters the domain of measurement and instrumentation? Another extension, the PXI bus, was the result. PXI (PCI eXtensions for Instrumentation) [19] is a PCI-based measurement and automation specification released by NI. It was developed in 1997 and officially launched in 1998 as an open industry standard to meet the growing demand for complex instrumentation systems. The PXI standard is now managed and maintained by the PXI Systems Alliance (PXISA), whose more than 60 member companies promote the PXI standard, ensure PXI interchangeability and maintain the PXI specifications. PXI combines the electrical characteristics of PCI with the ruggedness, modularity and Eurocard mechanical packaging of Compact PCI, aiming to bring the benefits of the desktop PC to instrumentation and to make it a high-performance, low-cost platform for measurement and automation systems.
In short, PXI is based on PCI and Compact PCI plus some signals unique to PXI. First, PXI inherits the electrical signaling of PCI, which gives it the same transfer capability as the PCI bus, from 132 MBps (32-bit/33 MHz) up to 528 MBps (64-bit/66 MHz). Second, PXI is fully software-compatible with PCI. Third, PXI adopts the same mechanical form factor as Compact PCI, with its high-density, high-performance connectors. The architecture of PXI in the measurement and instrumentation field is shown in Fig. 1.16. PXI demonstrates once again the great success of the PCI bus and the CPCI architecture. After the PCIE bus and the CPCIE architecture were launched, PXIE (PXI Express) followed, extending the PCIE specification into measurement and instrumentation.
Fig. 1.16 PXI structure in measurement and instrument field [19] (one system slot and several peripheral slots)
1.3.4 Compact PCI Express—The Embedded Version of PCI Express
In 2002, the PCI Express specification was launched successfully as a revolutionary successor to PCI/PCIX in the PC area. Its point-to-point serial communication model became known as the third-generation bus standard. The impact of PCIE quickly spread from the PC to the embedded field. Drawing on its experience with Compact PCI, PICMG (PCI Industrial Computer Manufacturers Group) brought the PCIE protocol into embedded and industrial applications in June 2005. The new extension was called Compact PCIE (CPCIE), with the following objective:
"Compact PCIE Objectives: This specification's objective is to bring PCI Express technology to the popular PICMG 2.0 Compact PCI form factor. This specification is intended to meet the future market needs of the Compact PCI, PXI, military, and aerospace markets and defines the connector, electrical, and mechanical requirements of 3U/6U system boards, peripheral boards, switch boards, and backplanes." [20]
From this description, the following conclusions can be drawn:
• Compact PCIE follows the PCIE specification and has the same electrical performance as PCIE;
• Compact PCIE targets not only the industrial and embedded fields of traditional Compact PCI but also the military/aerospace markets.
To ensure a smooth transition from CPCI to CPCIE, early CPCIE systems typically reserved some CPCI slots for traditional CPCI cards, as shown in Fig. 1.17. The launch of the CPCI and CPCIE standards gave embedded applications standards to follow and accelerated the modularization and standardization of embedded systems.
1.3.5 SRIO—The Embedded System Interconnection
After 2000, PCIE, the representative serial bus, gradually became mainstream in the PC market. A PC architecture usually has only one central processing unit (CPU), and everything else is a peripheral: one master plus several slaves. An embedded system, in contrast, usually has multiple main processors that must be interconnected, such as PowerPC, FPGA and DSP devices, and it is difficult to define a master-slave relationship among them. CPCIE is less capable here, lacking support for peer-to-peer communication, routing, broadcast and multicast. The industry urgently needed a bus to interconnect multiple main processors such as PowerPC, FPGA and DSP devices, and SRIO was the answer.
Fig. 1.17 CPCIE module card and system slot
As the embedded market outgrew the traditional PC market, everyone wanted a share of this fast-growing market, and various organizations put forward a variety of bus standards, such as QuickRing, Futurebus, Fastbus, Rocket, Lights and StarFabric. Against this promising background, RapidIO came out. In 2000, Motorola and Mercury proposed a bus architecture to replace the low-speed parallel bus (Local Bus) of their PowerPC processors [21]. In February 2000, the RapidIO Trade Association was formed, its members mainly telecommunications/storage OEMs (Original Equipment Manufacturers) and FPGA, processor and switch-fabric chip manufacturers. The association's goal was a low-latency, packet-switched interconnect supporting messaging and read/write operations, with fault tolerance and flow control, high efficiency and low power consumption, and support for thousands of nodes. In 2001, the RapidIO Trade Association released RapidIO version 1.1. The physical layer of version 1.1 initially used parallel LVDS; a version with a serial physical layer, known as SRIO (Serial RapidIO), followed in December 2001. SRIO is used in wireless communications, image processing and military projects. After years of development, the RapidIO Trade Association launched RapidIO 3.0 in 2013. Compared with the 2.x versions, RapidIO 3.0 offers the following improvements [22]:
• The transfer rate reaches up to 10.3125 Gbps per lane;
• Two reach modes are supported, based on Ethernet 10GBASE-KR electrical characteristics: short run, spanning at least 20 cm of PCB material with one connector, and long run, spanning at least 100 cm of PCB material with two connectors;
• A 64B/67B encoding scheme is added to limit coding overhead and increase bandwidth efficiency;
• Asymmetric operation is supported, so that the bandwidth in each direction of a link, and the power consumption of the ports it connects, can be tailored to the link's performance requirements;
• Extend the Device ID to 32 bits for increasing the number of nodes in the system; • The packet exchange strategy is optimized. The roadmap of RapidIO is shown in Fig. 1.18 and a comparison between RapidIO and other buses on interconnection level is shown in Fig. 1.19. In the PC field, PCIE occupies the traditional PC bus market, and CPCIE takes up the embedded market, but it is SRIO that is just designed for the multi-processor interconnection. SRIO can provide Chip To Chip, Board To Board, and Chassis To Chassis interconnection. The topology of SRIO is more flexible, supporting topology of tree, star, mesh and so on. SRIO supports peer-to-peer communication, with routing and broadcast capabilities, and provides the embedded system a scalable, reliable and efficient transmission. A detailed comparison between RapidIO and PCIE is shown in Table 1.5. Refer to Chap. 4 for more information about SRIO. You will find that the varieties of bus previously mentioned are all part of the digital circuit field. Would the communication mechanism based on point-to-point
1 History and Development of Bus
[Figure: RapidIO Roadmap]
Fig. 1.18 Roadmap of RapidIO [22]
Fig. 1.19 Comparison between RapidIO and other bus standards on interconnection level [23]
high-speed links in the digital circuit field also affect analog circuits? If so, which interface would bear the brunt? It would be the interface between ADCs (analog-to-digital converters), DACs (digital-to-analog converters) and logic devices such as FPGAs. In 2006, the JEDEC organization responded promptly with a serial communication protocol called JESD204 [27] for the interface between ADCs, DACs and logic devices.
1.3 Progress of Bus in Embedded System
Table 1.5 Comparison between SRIO and PCIE [22, 24–26]
Attribute                 PCIE 2.0            PCIE 3.0              SRIO 2.0            SRIO 3.0
Channels supported        1, 2, 4+            1, 2, 4+              1, 2, 4+            1, 2, 4+
Transfer rate (per lane)  5 Gbps              8 Gbps                6.25 Gbps           10.3125 Gbps
Encoder                   8b/10b + scramble   128b/130b + scramble  8b/10b + scramble   64b/67b + scramble
X4 link bandwidth/Gbps    20                  32                    25                  40
Delay                     ms                  ms                    ms                  ms
1.3.6 JESD204—Solving the ADC, DAC Data Transfer Problem
In the PC and embedded areas, engineers recognized that the data width of a parallel bus was limited. Raising parallel-bus bandwidth simply by increasing the bus frequency was possible only in theory, because manufacturing capability set a practical limit. Research therefore shifted from parallel to serial buses, with SRIO and PCIE as representatives replacing the parallel Local Bus and PCI bus. Embedded ADC and DAC applications face the same problem. An ADC or DAC usually has 12–16 data lines, and their PCB traces must be strictly length-matched so that all bits can be captured by the same clock. As the sampling frequency rises, keeping the data bits synchronized becomes more difficult and more sensitive to length mismatch. Drawing on PCIE, SRIO and other serial protocols that transmit data in packets (frames), the international organization JEDEC launched the JESD204 protocol in 2006. JESD204 uses CML (Current Mode Logic) levels and replaces the original 12–16 parallel data signals with one differential pair. It serves as the serial interface between ADCs, DACs and logic devices, at rates of up to 3.125 Gbps. In January 2012 the protocol was upgraded to JESD204B.01, with a maximum transfer rate of 12.5 Gbps on a single differential pair [28]. Table 1.6 compares LVDS with the various versions of JESD204. The introduction of JESD204 significantly reduces the number of pins required on logic devices. Fewer pins make PCB routing simpler, ADC/DAC devices smaller and power consumption lower. Table 1.7 shows the number of pins required by a variety of ADC and DAC devices.
Thanks to the superior performance of JESD204, many manufacturers have launched ADC and DAC devices that comply with it, and its market share continues to expand. Refer to Chap. 3 for more information about JESD204 and how its IP is designed.
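A single serial lane can replace 12–16 parallel data lines, and a lane-rate estimate shows why. The sketch below assumes the relation commonly given in JESD204B converter data sheets and application notes: lane rate = M × N' × Fs × (10/8) / L. Here M is the number of converters, N' is the number of bits per sample on the link (12- and 14-bit samples are usually padded to 16), Fs is the sample rate and L is the number of lanes. The factor 10/8 is the 8b/10b overhead. The function names and the example converter are hypothetical.

```python
"""Estimate the JESD204/JESD204B serial lane rate for a converter link.

Assumes the commonly used relation
    lane_rate = M * N' * Fs * (10 / 8) / L
with 8b/10b encoding. Names are illustrative, not from any library.
"""

JESD204_MAX_GBPS = 3.125   # original JESD204 (2006)
JESD204B_MAX_GBPS = 12.5   # JESD204B.01 (2012)


def lane_rate_gbps(m: int, n_prime: int, fs_msps: float, lanes: int) -> float:
    """Return the per-lane line rate in Gbps, including 8b/10b overhead."""
    payload_bps = m * n_prime * fs_msps * 1e6   # sample bits per second on the link
    line_bps = payload_bps * 10 / 8             # 8b/10b: 10 line bits per 8 data bits
    return line_bps / lanes / 1e9               # spread evenly over the lanes


def supported_by(rate_gbps: float) -> list[str]:
    """List the JESD204 versions whose maximum lane rate covers rate_gbps."""
    versions = []
    if rate_gbps <= JESD204_MAX_GBPS:
        versions.append("JESD204")
    if rate_gbps <= JESD204B_MAX_GBPS:
        versions.append("JESD204B")
    return versions


if __name__ == "__main__":
    # Hypothetical example: one 14-bit ADC (padded to N' = 16) at 250 MSPS.
    for lanes in (1, 2):
        rate = lane_rate_gbps(m=1, n_prime=16, fs_msps=250, lanes=lanes)
        print(f"{lanes} lane(s): {rate:.3f} Gbps -> {supported_by(rate)}")
```

In this example a single lane needs 5 Gbps, which exceeds the 3.125 Gbps of the original JESD204. Either JESD204B or a second lane would be required.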
Table 1.6 Comparison among various ADC/DAC interfaces [27]
Attribute                      LVDS   JESD204   JESD204A   JESD204B
Year launched                  2001   2006      2008       2011
Transfer rate/Gbps             1.0    3.125     3.125      12.5
Multi-channel support          No     No        Yes        Yes
Link synchronization           No     Yes       Yes        Yes
Multi-channel synchronization  No     Yes       Yes        Yes
Deterministic latency          No     No        No         Yes
Table 1.7 Pin count comparison [27]
Number of channels  Resolution/bit  CMOS pin count  LVDS pin count (DDR)  CML pin count (JESD204B)
1                   12              13              7                     4
2                   12              26              14                    4
4                   12              52              28                    6
8                   12              104             56                    6
1                   14              15              8                     4
2                   14              30              16                    4
4                   14              60              32                    6
8                   14              120             64                    6
1                   16              17              9                     4
2                   16              34              18                    4
4                   16              68              36                    6
8                   16              136             72                    6
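The CMOS and DDR LVDS columns of Table 1.7 follow a simple pattern. For CMOS, each channel needs one pin per data bit plus one clock. For DDR LVDS, each channel needs half as many data lanes, since two bits are sent per clock cycle, plus one clock. This pattern is inferred from the table rather than stated in the source. The CML column depends on the lane and SYNC configuration, so it is copied from the table as given. The minimal sketch below regenerates the parallel columns so they can be compared with the serial count.

```python
"""Regenerate the parallel-interface columns of Table 1.7.

Assumption (inferred from the table's pattern, not from the standard):
  CMOS pins     = channels * (resolution + 1)       # 1 pin per bit + clock
  LVDS DDR pins = channels * (resolution / 2 + 1)   # 2 bits per clock + clock
CML (JESD204B) pin counts are copied from the table as given.
"""

CML_PINS = {1: 4, 2: 4, 4: 6, 8: 6}  # channels -> JESD204B pin count (Table 1.7)


def cmos_pins(channels: int, resolution: int) -> int:
    return channels * (resolution + 1)


def lvds_ddr_pins(channels: int, resolution: int) -> int:
    return channels * (resolution // 2 + 1)


def table_rows():
    """Yield (channels, resolution, cmos, lvds, cml) in the order of Table 1.7."""
    for resolution in (12, 14, 16):
        for channels in (1, 2, 4, 8):
            yield (channels, resolution,
                   cmos_pins(channels, resolution),
                   lvds_ddr_pins(channels, resolution),
                   CML_PINS[channels])


if __name__ == "__main__":
    for ch, res, cmos, lvds, cml in table_rows():
        saving = 100 * (1 - cml / cmos)
        print(f"{ch} ch x {res} bit: CMOS {cmos:3d}, LVDS {lvds:2d}, "
              f"CML {cml}, {saving:.0f}% fewer pins than CMOS")
```

The saving grows with the channel count, because the serial interface's pin count barely changes as channels are added.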
1.3.7 FC—A Combination of Channel I/O and Network I/O
The demand for high-speed serial communication exists everywhere. SRIO mainly solves the interconnection between embedded processors such as FPGAs, DSPs and PowerPC processors. JESD204 handles the interface between ADCs, DACs and FPGAs, and SATA primarily solves the storage problem for PCs. Is there also a demand for high-speed serial communication in the Storage Area Networks (SANs) that solve storage problems for enterprises? Yes. SANs also need a high-speed serial technology to replace parallel SCSI, and the long-awaited FC protocol met this need. The FC (Fibre Channel) standard was proposed by ANSI (American National Standards Institute) in 1988 [29, 30]. Its main purpose is to solve enterprise-level SAN storage problems, and it also serves as a backbone network technology for IP, audio streaming and other applications that require high-speed data transmission. Through the long-term efforts of many companies, Fibre Channel
technology has matured. It transmits data at high speed and offers high bandwidth, scalability, low latency and long transmission distance. The FC standard combines the advantages of both channel technology and network technology. Channel technology transfers data quickly between device buffers without much logic processing. It is hardware-intensive, and SRIO is one example. Network technology is software-intensive. It can handle a large number of nodes, and packets from one node can be routed to another node across domains. A typical application of network technology is the SAN. ANSI launched a draft FC-AE (Fibre Channel Avionics Environment) standard in 2002. The aim was to extend FC technology to military applications such as aircraft control, data acquisition, signal processing, data distribution and sensor/video signal transmission. The FC-AE draft has been continually improved in recent years. Besides FC-AE, ANSI has developed several protocols for avionics environments, such as FC-AE-ASM (Anonymous Subscriber Messaging), FC-AE-RDMA (SCSI-3 Remote Direct Memory Access) and FC-AE-1553. The FC standards support data transmission rates of 100, 200, 400, 800, 1600 and 3200 MBps. These figures are the effective data bandwidth in each direction. Once encoding overhead is included, the equivalent line rates on the physical channel are 1.0625, 2.125, 4.25, 8.5, 14.025 and 28.05 Gbps respectively. Table 1.8 shows the roadmap of FC. Its throughput values are twice the per-direction figures above because they count both directions of a full-duplex link.
Table 1.8 Roadmap of FC standard [30]
Version  Throughput/MBps  Equivalent line rate/GBaud  Spec technically completed (year)  Market availability (year)
1GFC     200              1.0625                      1996                               1997
2GFC     400              2.125                       2000                               2001
4GFC     800              4.25                        2003                               2005
8GFC     1600             8.5                         2006                               2008
16GFC    3200             14.025                      2009                               2011
32GFC    6400             28.05                       2013                               2016
128GFC   25600            4 × 28.05                   2014                               2016
64GFC    12800            56.1                        2017                               2019
256GFC   51200            4 × 56.1                    2017                               2019
128GFC   25600            TBD                         2020                               Market demand
256GFC   51200            TBD                         2023                               Market demand
512GFC   102400           TBD                         2026                               Market demand
1TFC     204800           TBD                         2029                               Market demand
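The line code accounts for most of the gap between line rate and quoted throughput. The minimal sketch below converts each line rate into raw payload bandwidth per direction. It assumes 8b/10b encoding for 1GFC to 8GFC and 64b/66b for 16GFC; later generations add FEC and are left out. The results sit slightly above the nominal figures because framing, idles and other protocol overhead are ignored. The names are illustrative.

```python
"""Raw payload bandwidth of Fibre Channel generations from their line rates.

Assumed line codes: 8b/10b for 1GFC-8GFC, 64b/66b for 16GFC.
Frame headers, idles and other protocol overhead are ignored.
"""
from fractions import Fraction

# name -> (line rate in GBaud, coding efficiency, nominal MBps per direction)
FC_GENERATIONS = {
    "1GFC": (1.0625, Fraction(8, 10), 100),
    "2GFC": (2.125, Fraction(8, 10), 200),
    "4GFC": (4.25, Fraction(8, 10), 400),
    "8GFC": (8.5, Fraction(8, 10), 800),
    "16GFC": (14.025, Fraction(64, 66), 1600),
}


def payload_mbytes_per_s(line_rate_gbaud: float, efficiency: Fraction) -> float:
    """Payload bandwidth in MB/s for one direction of the link."""
    data_bits_per_s = line_rate_gbaud * 1e9 * float(efficiency)
    return data_bits_per_s / 8 / 1e6


if __name__ == "__main__":
    for name, (rate, eff, nominal) in FC_GENERATIONS.items():
        raw = payload_mbytes_per_s(rate, eff)
        print(f"{name}: raw {raw:7.1f} MB/s per direction "
              f"(nominal {nominal}, full duplex {2 * nominal})")
```

The switch from 8b/10b to 64b/66b at 16GFC explains why its line rate is 14.025 GBaud rather than 17 GBaud. The lower overhead lets a smaller rate increase double the throughput.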
1.3.8 VPX—An Integration Architecture of High-Speed Serial Bus
This chapter has now mentioned a variety of serial buses, such as PCIE, CPCIE, SRIO, JESD204, FC and, for storage applications, SATA. Each of these buses targets specific applications. Can several or all of them be combined in a particular scenario? If so, how should the interconnection be established, and what rules should it follow? Answering these questions is the mission of VPX. The story goes back to 1981, when IBM launched PC/XT. That year Motorola introduced a 32-bit bus standard called VERSAbus, based on its own 68000-series microprocessors. VERSAbus was later renamed VME (Versa Module Eurocard) [31]. In view of its wide application, IEEE accepted it as a standard in 1987 under the number IEEE 1014, and it was then maintained by VITA (VMEbus International Trade Association). VME was also a parallel bus, with an initial transfer rate of 40 MBps. As the electronics industry developed, and in response to competition from Intel, Motorola's long-time rival, VME was upgraded repeatedly. VME64 was launched in 1995 with a bandwidth of 80 MBps and VME64x in 1997 with 160 MBps. VME160 and VME320 followed, raising bandwidth further. Another competitor of VME was the PCI family, including PCI, PCIX, PCIE, CPCI and CPCIE. VITA kept improving and upgrading the standard for the 25 years after VME was launched, both to protect existing VME applications and to preserve the continuity of VME itself. However, small modifications could not meet market demand, and a thorough reform was finally achieved in 2007.
The objective was a sharp shift from parallel to serial transmission. The new design increased backplane bandwidth, improved I/O capacity and adopted a different electrical definition, while retaining the original user-selectable VME signals. The new architecture was completely different from the original VME standard and was the most important improvement in VME's history, and it finally replaced VME. The new bus standard is called VITA 46 [32], consisting of VITA 46.0 and VITA 46.1. From then on, VITA no longer needed to confront the PCI family, because it incorporated PCI technology into the VITA 46 specification. Because of this PCI support, VITA 46 is also known as VPX (VME and PCI eXtensions). The VPX architecture has once again become widespread in defense and aerospace, in applications such as radar, sonar, video image processing and intelligent signal processing. After the successful launch of VITA 46, VITA continued to introduce a number of standards, as shown in Fig. 1.20. VME was originally adopted for servers, then widely applied in industrial control, and quickly spread to the entire embedded field. The success of VPX is strong evidence of the rapid development of embedded systems. VPX
Fig. 1.20 Roadmap of VITA specifications [33]
bus originated in the computer field, broke through its boundaries, and has become one of the mainstream architectures of embedded systems. From the application point of view, early VME was just a bus, defining both the physical interface and the protocol layer. VPX, by contrast, is in essence an architecture. It focuses on the physical interface and supports a variety of bus protocols, such as PCIE, SRIO, Ethernet, FC and InfiniBand. This chapter has now summarized the high-speed serial bus technologies commonly used in the PC and embedded fields. The following section analyzes why these technologies succeeded.
1.4 Analysis of the Three Evolutions of Bus
As mentioned above, three revolutions have taken place in the bus field. The first was marked by IBM's launch of PC/XT (later upgraded to ISA) from scratch in 1981. The second was symbolized by the launch of PCI (including PCIX) in 1993, with data transfer rates rising rapidly from 32 MBps for ISA to 2133 MBps for PCIX. The third was represented by the PCIE bus in 2002. PCIE replaced the parallel data transmission of ISA and PCI with two differential pairs (TX±, RX±), achieving a much higher data transmission rate based on the
Table 1.9 Three evolutions of bus
Generation      Bus standard
1st generation  PC/XT, PC/AT, ISA, EISA, PC104
2nd generation  PCI, CPCI, VME, PXI
3rd generation  PCIE, CPCIE, PXIE, SRIO, JESD204, SATA, SAS, FC, InfiniBand, Aurora, VPX
point-to-point architecture. The other buses mentioned in this chapter were derived from, or developed on the basis of, ISA, PCI or PCIE. The three generations of bus are listed in Table 1.9. Let me now analyze the three revolutions in hindsight.
(1) The first revolution: the original bus specification. In 1979 Intel introduced the 8088 processor, with an 8-bit data width and a 1 MB memory address space. IBM brought this chip to the market, selling large numbers of PCs to households. The huge market triggered a tsunami. Other manufacturers built peripheral products for IBM's PC according to IBM's interface requirements, and these requirements, known as PC/XT, became the de facto standard for the entire PC industry. The various peripherals were connected directly to the CPU via the PC/XT bus. Many other bus versions were derived from PC/XT, such as PC/AT, ISA, EISA and PC104. The data width grew and the frequency increased, but peripherals remained connected to the CPU in the same way, as shown in Fig. 1.21. This is how the first generation of bus standards came into being. It was in effect set by IBM on the strength of its large PC shipments.
Fig. 1.21 ISA-based PC structure
(2) The second revolution: a change in bus architecture. As shown in Fig. 1.21, the various peripherals were connected directly to the CPU via the ISA bus and shared all the address and data signals. As more peripherals were connected, their differences created problems that ISA could not solve. For example, a user might want to access the hard disk quickly yet have to wait a long time until a low-speed scanner handed bus control over to the high-speed hard disk. As the variety of peripherals grew, the waiting time grew longer and longer, and Intel took notice. Intel decided to resolve the speed mismatch among the high-speed CPU, high-speed peripherals and low-speed peripherals by adding a bridge, as shown in Fig. 1.22. The bridge, later called PCI, freed the CPU from the low-speed peripherals. High-speed peripherals were connected to the CPU through the PCI bridge, while low-speed peripherals were connected to ISA and then to PCI. The PCI protocol was proposed by Intel and released by the PCI-SIG. PCI was the representative of the second bus generation, and its derivatives, such as PCIX, PXI and CPCI, kept the structure shown in Fig. 1.22.
(3) The third revolution: a change in transmission mechanism. After the second revolution, the CPU, freed from low-speed devices, developed according to Moore's Law. By 1999, PCI/PCIX had become the bottleneck of the whole system in the PC field and even across the entire embedded market. Increasing parallel-bus bandwidth by widening the data path or raising the bus frequency (ATA/PCI buses reached 133 MHz) was impractical. The reason was that synchronizing parallel data bits became much more difficult, as shown in Fig. 1.23. Would the opposite approach be better: reducing the parallel data width to 1 bit and sending a serial bit stream with a specific data format (Fig. 1.24)? This
Fig. 1.22 PCI-based PC structure
Fig. 1.23 Misalignment in parallel data [15]
Fig. 1.24 Frame format in serial bit stream [15]
method easily avoids the synchronization problems of parallel transmission. Based on this idea, Intel introduced the concept of the Third Generation I/O Architecture (3GIO) in 2001, featuring high performance and a high-frequency serial bit stream, and submitted it to PCI-SIG. PCI-SIG renamed 3GIO as PCIE and released the new standard in 2002. Since then, serial buses have developed very rapidly, and a variety of serial bus standards have been introduced, such as SRIO, SATA, SAS, Aurora, VPX and FC. The third revolution began with PCIE and involved a wide range of organizations. These organizations put forward different bus standards, such as PCIE, CPCIE, PXIe, SRIO, Aurora, SATA and SAS. All these high-speed serial protocols share the same foundation: an LVDS (Low-Voltage Differential Signaling) physical layer. Before PCIE, data was transmitted in parallel as single-ended signals. Signal levels were reduced from 12 V to 3.3 V, 2.5 V and 1.8 V, with level standards including TTL, LVTTL, LVCMOS_25 and LVCMOS_18. Lower levels reduced power consumption but also reduced noise immunity. Is there a way to achieve both low power consumption and high noise immunity? In November 1995 a seemingly insignificant event took place: a new electrical level standard was launched. NS (National Semiconductor) proposed the ANSI/TIA/EIA-644 standard, and IEEE later published the IEEE 1596.3 standard in March 1996. The two standards defined a new differential level, LVDS [34, 35], and standardized its interface, electrical characteristics, interconnection and line termination. A comparison between single-ended and differential signals is shown in Fig. 1.25.
Fig. 1.25 Single-ended signal and differential signal levels
In Fig. 1.25, the single-ended signal represents logic “1” and “0” directly. The differential signal is obtained by subtracting two complementary waveforms, VA and VB, which swing around the common-mode voltage VCM. The common-mode level of LVDS is typically 1.2 V, with a differential swing of ±350 mV. Differential transmission has two clear advantages. First, it reduces the signal amplitude and hence the power consumption. Second, differential signals are more immune to interference, so the bit error rate is lower and the transmission rate can be higher. The LVDS standard recommends a maximum data transfer rate of 655 Mbps, and the theoretical maximum on a lossless transmission line is 1.923 Gbps. LVDS is only one typical representative of differential signaling. Others include CML and LVPECL. CML supports transfer rates from 300 Mbps to 3.125 Gbps, and LVPECL is usually used for clock signals. In view of these advantages, Intel adopted differential signaling and proposed the differential PCIE bus in 2001, five years after the LVDS level standard was introduced. The third bus revolution is generally attributed to the PCIE bus, and it thus rests on differential levels such as LVDS and CML.
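The noise-immunity argument can be illustrated with a toy model. The sketch below assumes an idealized LVDS-style pair with VCM = 1.2 V and the ±350 mV differential swing stated above, so each wire sits 175 mV above or below VCM. It adds the same noise voltage to both wires and lets the receiver decide on VA − VB, which removes that noise entirely. A hypothetical 0 V/1.2 V single-ended signal with a 0.6 V threshold is exposed to the same noise for comparison. Real links also carry noise that is not perfectly common-mode, which this model ignores.

```python
"""Toy model: common-mode noise rejection on an LVDS-style differential pair."""
import random

V_CM = 1.2          # common-mode voltage in volts (typical LVDS value)
V_OD = 0.350        # differential swing: VA - VB = +/-350 mV
SE_HIGH = 1.2       # hypothetical single-ended "1" level for comparison
SE_THRESHOLD = 0.6  # hypothetical single-ended decision threshold


def drive_pair(bit: int) -> tuple[float, float]:
    """Return (VA, VB): each wire sits 175 mV above or below V_CM."""
    half = V_OD / 2 if bit else -V_OD / 2
    return V_CM + half, V_CM - half


def receive_differential(va: float, vb: float) -> int:
    """A differential receiver decides on the sign of VA - VB only."""
    return 1 if va - vb > 0 else 0


def receive_single_ended(v: float) -> int:
    """A single-ended receiver compares one wire against a fixed threshold."""
    return 1 if v > SE_THRESHOLD else 0


def simulate(n_bits: int, noise_amp: float, seed: int = 1) -> tuple[int, int]:
    """Count (differential, single-ended) bit errors under common-mode noise."""
    rng = random.Random(seed)
    diff_errors = se_errors = 0
    for _ in range(n_bits):
        bit = rng.randint(0, 1)
        noise = rng.uniform(-noise_amp, noise_amp)  # same noise on every wire
        va, vb = drive_pair(bit)
        diff_errors += receive_differential(va + noise, vb + noise) != bit
        se_level = SE_HIGH if bit else 0.0
        se_errors += receive_single_ended(se_level + noise) != bit
    return diff_errors, se_errors


if __name__ == "__main__":
    d, s = simulate(n_bits=1000, noise_amp=0.8)
    print(f"differential errors: {d}, single-ended errors: {s}")
```

The differential receiver makes no errors even though the noise amplitude is more than twice the 350 mV swing. The single-ended receiver fails whenever the noise pushes a level across its threshold.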
1.5 Common Attributes in High-Speed Serial Buses
A variety of high-speed serial buses now exist. Their protocols differ, but they share several advantages over traditional parallel buses:
(1) Higher data transfer rate. Take the parallel ATA storage bus as an example. Its data transfer rate reaches 100 MBps in real products. Serial ATA (SATA), by contrast, starts at 150 MBps, SATA-2 reaches 300 MBps and SATA-3 600 MBps, each using only one differential pair per direction. Serial buses have succeeded in reaching 10 Gbps on a single lane (X1 mode), a rate that parallel buses failed to achieve.
(2) Better signal integrity. Crosstalk is inherent in parallel data buses and cannot be eliminated. Synchronization also becomes more difficult as the frequency rises, which further increases the bit error rate. In bus standards based on differential signals (such as LVDS or CML levels), crosstalk has far less effect. Noise coupled equally onto both wires of a pair appears as common-mode noise, and differential receivers are largely immune to it, which improves anti-interference capability. In addition, the transmitter embeds the clock in the serial bit stream, and the receiver extracts it from the data stream and uses it to recover the data. Together these features give better signal integrity and a lower bit error rate.
(3) Beneficial to miniaturization. Communication based on a serial bit stream uses only two differential pairs: one (Tx_P, Tx_N) for sending and one (Rx_P, Rx_N) for receiving. Traditional parallel communication, in contrast, coordinates many address, data and control signals to complete a transfer. Take PCIX in a PC as an example. A card connects to the motherboard through its “connecting fingers” or “golden fingers”, and there are 184 of these pins (including power and ground). PCIE in an embedded system needs only 4 (excluding power and ground). Fewer connector pins, and fewer pins on the corresponding ASICs, reduce the PCB area required, which favors miniaturization. Current high-speed serial buses usually run at Gbps rates on a single lane (X1 mode). For example, SRIO can run at 1.25, 2.5, 3.125, 5.0, 6.25 and 10.3125 Gbps. Much higher speeds can be reached with “channel bonding”, which combines 4, 8 or 16 lanes to transfer data together. The resulting modes are called X4, X8 and X16 respectively.
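Channel bonding can be sketched as byte striping. Consecutive bytes are dealt round-robin across the lanes, each lane carries its own serial stream, and the receiver interleaves the lanes back in the same order. PCIE stripes data across its lanes in this way. The sketch below shows only the idea: it assumes the lanes are already deskewed and omits the per-lane encoding and alignment characters a real link needs. The helper names are hypothetical.

```python
"""Sketch of channel bonding by round-robin byte striping (X1/X4/X8/X16)."""


def stripe(data: bytes, lanes: int) -> list[bytes]:
    """Deal bytes round-robin: byte i goes to lane i % lanes."""
    return [data[lane::lanes] for lane in range(lanes)]


def unstripe(lane_data: list[bytes]) -> bytes:
    """Re-interleave deskewed lanes back into the original byte order."""
    lanes = len(lane_data)
    out = bytearray(sum(len(chunk) for chunk in lane_data))
    for lane, chunk in enumerate(lane_data):
        out[lane::lanes] = chunk  # lane k fills positions k, k+lanes, ...
    return bytes(out)


def bonded_rate_gbps(lane_rate_gbps: float, lanes: int) -> float:
    """Aggregate line rate of a bonded link (before encoding overhead)."""
    return lane_rate_gbps * lanes


if __name__ == "__main__":
    payload = bytes(range(10))
    x4 = stripe(payload, 4)
    print([list(lane) for lane in x4])   # lane 0 gets bytes 0, 4, 8
    assert unstripe(x4) == payload
    for width in (1, 4, 8, 16):
        print(f"SRIO 3.125 Gbps X{width}: {bonded_rate_gbps(3.125, width)} Gbps")
```

Because each lane only has to run at the single-lane rate, bandwidth scales with lane count without pushing the UI any shorter. The price is that the receiver must deskew the lanes before reassembly.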
The common characteristics of the high-speed serial buses can be summarized as follows:
(1) Serial bit stream transmission. Data is organized in frames that start with SOF (Start of Frame) and end with EOF (End of Frame), which mark the data boundaries;
(2) An encoder/decoder, a CRC checksum and scrambling/de-scrambling structures. The encoder can be an 8B/10B, 64B/66B, 64B/67B or 128B/130B encoder. It maps 8-bit, 64-bit or 128-bit parallel data to 10-bit, 66-bit, 67-bit or 130-bit code groups in the serial bit stream, and the decoder does the opposite. The 8B/10B encoder is used by JESD204, SRIO, PCIE, Aurora and SATA. CRC (Cyclic Redundancy Check) detects errors that occur during serial transmission, following rules defined by its generator polynomial. JESD204, SRIO, PCIE, Aurora and FC all contain a CRC unit. Scrambling is mainly used to balance the numbers of “0”s and “1”s in the serial bit stream and to avoid long runs of “0”s or “1”s. JESD204, SRIO, PCIE and Aurora all have scrambling/de-scrambling structures.
(3) A link synchronization process. The difficulty in serial communication is sliding the data interception window to find the proper boundary, and then recovering the original parallel data from the serial bit stream. This is the
Fig. 1.26 Link synchronization by bus (Node 1 and Node 2 connected by the bus signal only)
Fig. 1.27 Link synchronization by discrete signal (Node 1 and Node 2 connected by the bus signal plus a discrete signal)
link synchronization process, which is essential for serial communication. JESD204, SRIO, PCIE, Aurora, SATA, FC and other bus protocols all include link synchronization. The process itself is simple. First, the transmitter sends a string of specific characters, usually called training patterns. Second, the receiver slides its interception window until it finds the proper boundary and restores the original data. Third, a feedback signal is sent back to the transmitter, which then starts sending normal data. Link synchronization works in one of three modes, depending on how the feedback signal is transmitted:
• The feedback signal is carried on the differential signals, as shown in Fig. 1.26. This mode is adopted by SRIO, PCIE, SATA and FC.
• The feedback signal is carried on a discrete single-ended signal, as shown in Fig. 1.27. This mode is adopted by JESD204.
• No feedback signal is used. The communication partners each complete their own synchronization according to a specific sequence. The simplex mode of Aurora works this way.
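The receiver's part of this process is to slide the interception window one bit at a time until a known training pattern lines up. It can be sketched as a search over bit offsets. The minimal sketch below assumes the transmitter repeats a 10-bit training word. The 8b/10b comma character K28.5 in its RD− form (0011111010) is used here as a stand-in pattern. Real protocols define their own training sequences and also track running disparity, which this toy ignores.

```python
"""Toy word alignment: slide a window until the training word lines up."""
from typing import Optional

TRAINING_WORD = "0011111010"  # K28.5 (RD-) code group, used as a stand-in pattern
WORD_LEN = len(TRAINING_WORD)


def find_alignment(bitstream: str, repeats: int = 3) -> Optional[int]:
    """Return the first bit offset where `repeats` training words match in a row.

    Mirrors the receiver sliding its interception window one bit at a time;
    None means no boundary was found (synchronization not achieved).
    """
    target = TRAINING_WORD * repeats
    for offset in range(len(bitstream) - len(target) + 1):
        if bitstream[offset:offset + len(target)] == target:
            return offset
    return None


def deserialize(bitstream: str, offset: int) -> list[str]:
    """Cut the stream into 10-bit words starting at the found boundary."""
    usable = bitstream[offset:]
    return [usable[i:i + WORD_LEN]
            for i in range(0, len(usable) - WORD_LEN + 1, WORD_LEN)]


if __name__ == "__main__":
    # The receiver starts listening mid-word: 3 stray bits precede the pattern,
    # followed by four training words and one hypothetical data word.
    stream = "101" + TRAINING_WORD * 4 + "1100110011"
    offset = find_alignment(stream)
    print("boundary found at bit", offset)
    if offset is not None:
        # At this point a real receiver would send its feedback signal.
        print(deserialize(stream, offset))
```

Requiring several consecutive matches, rather than one, makes it less likely that random data bits are mistaken for a word boundary.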
1.6 The Development Trend of High-Speed Serial Bus in Embedded System
As mentioned above, the bus field has seen three revolutions since 1981, starting with PC/XT as the first PC bus protocol. After more than 30 years of development, the high-speed serial buses based on LVDS (including LVDS, CML
and LVPECL) have become popular in both PC and embedded system applications, and the market is now shared by a variety of serial bus standards. What is the next step for the bus? What will spark the fourth revolution? Let us continue.
1.6.1 Speed Upgrades Constantly
Take SRIO as an example. The RapidIO Trade Association launched SRIO 3.0 in 2013 and a revised version in September 2014, raising the speed to 10.3125 Gbps. In actual applications, however, 10.3125 Gbps is rarely needed, and speeds of 1.25, 2.5 and 3.125 Gbps are commonly used. Because of manufacturing limitations, designs running above 5 Gbps require special attention. The RapidIO Trade Association now plans to raise the speed of the next SRIO generation to 25–400 Gbps, which will be a long journey from concept to practice. Other organizations are equally active in raising speeds. For example, PCIE 3.0 from PCI-SIG runs at 8.0 Gbps, and PCIE 4.0, due in 2017, raises the speed to 16 Gbps. JEDEC has lifted JESD204 from the initial 3.125 Gbps to 12.5 Gbps, and SATA 3.2, launched in August 2013, increased the rate to 16 Gbps. Through the joint efforts of these organizations and vendors, the bandwidth of serial buses will keep growing to meet the requirements of current and next-generation communications.
1.6.2 Adoption of Multiple Signal Levels
According to Shannon's information theory, the greater the uncertainty, the larger the amount of information. Four-level signaling carries twice as much information per symbol as the two-level signaling used today, doubling the bandwidth at the same symbol rate. Two-level transmission is reaching the point where its UI (Unit Interval) cannot be shortened much further. Four-level transmission has therefore become a new breakthrough in telecommunications, storage and network transmission. Four-level transmission is called PAM4 (Pulse Amplitude Modulation 4). It has been adopted in IEEE 802.3 (100 Gbps Ethernet) and in the Fibre Channel standards of the T11 committee. A comparison between two-level and four-level transmission is shown in Fig. 1.28. PAM4 has four signal levels: 0 (0b00), 1 (0b01), 2 (0b10) and 3 (0b11). Each level represents 2 bits, so a symbol carries twice the information of a two-level symbol. Channel bonding can raise the bandwidth further. The 100 Gbps Ethernet standard is implemented with four 25 Gbps channels, that is
Fig. 1.28 2-order and 4-order level transmissions (panels: code and eye diagram)
in X4 mode. The higher rate of 400 Gbps is expected to be implemented with eight 56 Gbps links, that is, in X8 mode. PAM4 also points to further options: PAM8, 16QAM and other modulation schemes common in wireless communication may be borrowed as well.
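A small mapping sketch makes the doubling concrete. It groups the bit stream into pairs and maps each pair to one of the four levels 0 to 3, using the plain binary assignment listed above (0b00 → 0, ..., 0b11 → 3). Deployed PAM4 links, such as those in IEEE 802.3, typically use a Gray-coded assignment instead, which this sketch does not model. At the same symbol rate, the PAM4 stream carries twice as many bits as NRZ.

```python
"""Minimal PAM4 vs NRZ sketch: two bits per PAM4 symbol, one per NRZ symbol."""


def nrz_encode(bits: list[int]) -> list[int]:
    """NRZ: one symbol (level 0 or 1) per bit."""
    return list(bits)


def pam4_encode(bits: list[int]) -> list[int]:
    """PAM4 with plain binary mapping: (b1, b0) -> level 2*b1 + b0, levels 0..3."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]


def pam4_decode(levels: list[int]) -> list[int]:
    """Invert pam4_encode: each level yields its two bits, MSB first."""
    bits = []
    for level in levels:
        bits.extend((level >> 1, level & 1))
    return bits


def bit_rate_gbps(symbol_rate_gbaud: float, bits_per_symbol: int) -> float:
    """Bit rate = symbol rate x bits carried per symbol."""
    return symbol_rate_gbaud * bits_per_symbol


if __name__ == "__main__":
    data = [0, 0, 0, 1, 1, 0, 1, 1]
    print("NRZ symbols: ", nrz_encode(data))     # 8 symbols
    print("PAM4 symbols:", pam4_encode(data))    # 4 symbols: 0, 1, 2, 3
    print("At 25 GBaud: NRZ", bit_rate_gbps(25, 1), "Gbps, PAM4",
          bit_rate_gbps(25, 2), "Gbps")
```

The trade-off is that the four levels share the same voltage range that NRZ splits into two. Each eye is therefore smaller and more sensitive to noise.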
1.6.3 Laser Communication and Its Miniaturization
The buses cited above are all based on electrical signaling. Take SRIO as an example. The UI (Unit Interval) is 400 ps at 2.5 Gbps, shrinks to about 97 ps at 10.3125 Gbps, and shrinks further to 40 ps at 25 Gbps. How short is 40 ps? In that time, light travels only about 12 mm in vacuum. Raising the transfer rate of a serial bus brings three problems. First, as shown in Fig. 1.29, a shorter UI makes the eye diagram shorter, but the data setup and hold times cannot be reduced in proportion. Recovering the parallel data thus becomes much more difficult, error bits become more likely, and the BER (bit error rate) increases. Second, the faster the bus runs, the more the signal attenuates, and attenuation narrows the eye opening, as shown in Fig. 1.29. When the bus runs faster than 5 Gbps, pre-emphasis at the transmitter and equalization at the receiver are usually adopted to compensate for the rapid attenuation of high-frequency components. There is no universal formula for pre-emphasis and equalization settings, because they depend closely on the PCB and even on the environment and temperature. Moreover, the rate cannot be raised indefinitely: for copper traces on FR4 PCB material, the limit is 16 Gbps. Third, the rising frequency turns the serial line itself into a radiation source, and the bit error rate will be
Fig. 1.29 Eye diagram under different speeds of SRIO
further increased by electromagnetic interference from the combined effect of many radiation sources. If 16 Gbps is the last straw for copper, in what direction will the high-speed serial bus go? Laser communication gives people hope again. It has already been verified between satellites, aircraft and ground stations. Is laser communication also possible between ASIC chips in embedded applications? The answer is yes, once laser communication devices have been sufficiently miniaturized. This miniaturization can be achieved in three steps. First, convert the electrical signal to a laser signal at the transmitter. The laser signal then travels over optical fiber and is converted back to an electrical signal at the receiver. Mature ASIC chips already exist for this purpose, and Fig. 1.30 shows an electrical/optical conversion ASIC with a 12T12R (12 transmit, 12 receive) capability. Second, use SIP (System in Package) technology to assemble the transmitter, receiver and optical fiber into one package, so that these functions are encapsulated in a single ASIC chip. Its external interfaces still use optical fiber as the transmission medium. This step is achievable with current technologies. Third, abandon electrical signals altogether and transfer data by laser inside the chips as well. Combined with the second step, laser is the only medium for transmission
Fig. 1.30 12T12R optical-electrical converter
both inside and outside the ASIC chip. If this can be realized, electromagnetic compatibility experts may well be out of work. Every new technology encounters a variety of problems, and so will laser communication in embedded systems, especially inside ASIC chips. There is no need to worry, though. The fourth generation of bus may arrive sooner than we expect.
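The unit-interval figures in Sect. 1.6.3 follow from UI = 1/bit rate, and the distance light covers in one UI is c × UI. The short sketch below reproduces them using the speed of light in vacuum. On an FR4 PCB, signals propagate at very roughly half that speed, so they cover even less distance per UI. That FR4 factor is an approximation, not a figure from the text.

```python
"""Unit interval (UI) and distance travelled by light within one UI."""

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def unit_interval_ps(bit_rate_gbps: float) -> float:
    """UI = 1 / bit rate, returned in picoseconds."""
    return 1e12 / (bit_rate_gbps * 1e9)


def light_distance_mm(ui_ps: float) -> float:
    """Distance light travels in vacuum during one UI, in millimetres."""
    return C_VACUUM_M_PER_S * ui_ps * 1e-12 * 1e3


if __name__ == "__main__":
    for rate in (2.5, 10.3125, 25.0):
        ui = unit_interval_ps(rate)
        print(f"{rate:>8} Gbps: UI = {ui:6.1f} ps, "
              f"light travels {light_distance_mm(ui):5.1f} mm")
```

At 25 Gbps the UI is 40 ps, during which light in vacuum covers about 12 mm. That distance is comparable to the trace-length differences on a typical board.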
1.6.4 Extended Reading—Laser Takes the Place of Microwave Communication [36]
Section 1.6.3 discussed laser communication in embedded systems, but lasers can also be used for wireless communication. Here is a brief description of laser communication in wireless applications. Radio frequency (RF) communication faces a severe problem: its limited spectrum cannot satisfy the unlimited demand for bandwidth. Scientists have therefore extended the usable spectrum from RF to laser. Table 1.10 shows the frequency spectrum distribution. Laser communication, also called free space optical communication (FSOC), has been around since the late 1960s. Lasers offered the potential for small transmitters and receivers with very high antenna gain (that is, small transmitter spot sizes). Specifically, FSOC systems could be much more efficient than an RF system of the same size and could provide orders-of-magnitude gains in data rate. Unfortunately, in the 1970s and 1980s much of the potential gain in efficiency was lost, for several reasons. Electrical-to-optical efficiency was poor, and so was optical detector efficiency. Limits on transmitter pointing error forced larger transmitter spot sizes. Most importantly, optical channel effects degraded the link. As a result, the advantages of optical communications over RF communications long went unrealized. After these problems were solved in 1994, the Optical Communication Demonstrator (OCD) designed by the Jet Propulsion Laboratory (JPL) and the Massachusetts Institute of Technology (MIT) achieved great success. It reached speeds of up to 250 Mbps at a wavelength of 800 nm with OOK modulation. This was lower than anticipated, but it proved that FSOC was feasible. Besides OCD, the Defense Advanced Research Projects Agency (DARPA) planned a prototype system that integrates free-space optical and radio frequency
1 History and Development of Bus
Table 1.10 Frequency distribution

| Frequency (Hz) | Denomination | Typical application |
|---|---|---|
| 3–30 | Extremely low frequency (ELF) | Long-range navigation, underwater communication |
| 30–300 | Super low frequency (SLF) | Underwater communication |
| 300–3000 | Ultra low frequency (ULF) | Long-range navigation |
| 3 k–30 k | Very low frequency (VLF) | Long-range navigation, underwater communication, sonar |
| 30 k–300 k | Low frequency (LF) | Navigation, underwater communication, radiosphere |
| 300 k–3000 k | Medium frequency (MF) | Broadcast, maritime communication, coast guard |
| 3 M–30 M | High frequency (HF) | Tele-broadcast, telegraph, telephone, fax |
| 30 M–300 M | Very high frequency (VHF) | Television, FM broadcast, land traffic, air traffic control, taxi, navigation, aircraft communication |
| 0.3 G–3 G | Ultra high frequency (UHF) | Television, microwave communication, navigation, satellite communications, GPS, monitor radar, radio altimeter |
| 3 G–30 G | Super high frequency (SHF) | Satellite communications, radio altimeter, microwave communication, airborne radar, meteorological radar, public land mobile communication |
| 30 G–300 G | Extremely high frequency (EHF) | Radar landing system, satellite communications, mobile communication, railway communication |
| 300 G–3 T | Sub-millimeter wave (0.1–1 mm) | Reserved for experiment |
| 43 T–430 T | Infrared (7–0.7 μm) | Laser communication system |
| 430 T–750 T | Visible light (0.7–0.4 μm) | Laser communication system |
| 750 T–3000 T | Ultraviolet (0.4–0.1 μm) | Laser communication system |
communications in a single network: the Optical and Radio Frequency Combined Link Experiment (ORCLE), in 2004. According to a statement released by DARPA, ORCLE combines optical communication technology, which transmits data via lasers, with radio signals and network management software to ensure a reliable network. ORCLE demonstrated an FSOC transfer rate of 2.5 Gbps. Other countries have also experimented with FSOC and hybrid FSO/RF communication, as shown in Fig. 1.31.
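Table 1.10 labels each optical band with both a frequency range and a wavelength range; the two are linked by λ = c/f. The short Python sketch below (the helper name wavelength_um is ours, for illustration only) converts the band edges listed in the table and reproduces its wavelength column.

```python
# Free-space wavelength from frequency: lambda = c / f.
C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_um(freq_hz: float) -> float:
    """Return the free-space wavelength in micrometres for a frequency in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return C / freq_hz * 1e6


# Band edges taken from Table 1.10 (infrared, visible light, ultraviolet).
for label, f in [("43 THz", 43e12), ("430 THz", 430e12),
                 ("750 THz", 750e12), ("3000 THz", 3000e12)]:
    print(f"{label:>8}: {wavelength_um(f):.2f} um")
```

The printed values (about 6.97, 0.70, 0.40 and 0.10 μm) match the 7–0.7, 0.7–0.4 and 0.4–0.1 μm ranges in the table.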
1.6 The Development Trend of High-Speed Serial …
Fig. 1.31 FSOC and FSO/RF communication experiments
China has also carried out FSOC experiments. In October 2011, a laser beam from the “Ocean II” (HY-2) satellite was captured on the ground for the first time, over a distance of nearly 2000 km and at a transfer rate of 504 Mbps.
References
1. ENIAC, https://encyclopedia.thefreedictionary.com/ENIAC
2. https://zhidao.baidu.com/question/198681067621014045.html
3. https://wenku.baidu.com/view/09934ec7b207e87101f69e3143323968001cf464.html
4. https://baike.baidu.com/item/%E7%AC%AC%E4%B8%80%E4%BB%A3%E7%94%B5%E5%AD%90%E8%AE%A1%E7%AE%97%E6%9C%BA/1430548
5. https://baike.baidu.com/item/IBM%20PC%2FXT/7825571?fr=aladdin
6. PC104 Consortium, PC104-plus specification, vol 2 (1997)
7. PCI Local Bus Specification V2.2, http://www.pcisig.com (1998)
8. PCI-X Addendum to the PCI Local Bus Specification, www.pcisig.com (2002)
9. https://zh.wikipedia.org/wiki/PCI_Express
10. PCI Express Base Specification Revision 3.0, www.pcisig.com (2008)
11. ISA, www.wikipedia.com
12. https://zh.wikipedia.org/wiki/%E9%AB%98%E6%8A%80%E8%A1%93%E9%85%8D%E7%BD%AE
13. Serial ATA Revision 2.5, www.sata-io.org
14. https://zh.wikipedia.org/wiki/SATA
15. www.sata-io.org
16. https://baike.baidu.com/item/8051/3624046?fr=aladdin
17. PC104, www.wikipedia.com
18. CompactPCI Specification Revision 3.0, www.picmg.org (1999)
19. PXI Systems Alliance, PXI hardware specification Revision 2.2 (September 22, 2004)
20. CompactPCI Express Revision 1.0, www.picmg.org (2005)
21. Fuller S, RapidIO: the embedded system interconnect, Wiley & Sons, Ltd (2005)
22. www.rapidio.org
23. RapidIO: The Interconnect Architecture for High Performance Embedded Systems, www.rapidio.org
24. Fuller S, The opportunity for sub microsecond interconnects for processor connectivity, www.rapidio.org
25. RapidIO, PCI Express and Gigabit Ethernet comparison, www.rapidio.org
26. RapidIO® technology and PCI Express™: a comparison, www.rapidio.org
27. Analog Devices, JESD204B Survival Guide (2013)
28. JEDEC Standard, Serial Interface for Data Converters (JESD204B.01) (2012.1)
29. Tate J, Introduction to Storage Area Networks, www.ibm.com/redbooks
30. Fibre Channel solution guide 2012–2013, www.fibrechannel.org
31. https://baike.baidu.com/item/VME/1960004?fr=aladdin
32. https://baike.baidu.com/item/VPX
33. http://www.vita.com
34. TI, LVDS Application and Data Handbook (2002.11)
35. TI, LVDS Owner’s Manual (2008)
36. Bagley ZC, Hughes DH, Juarez JC, Kolodzy P, Hybrid optical radio frequency airborne communications, http://spiedigitallibrary.org/ (2013), accessed 05/08/2013
Chapter 2
High-Speed Data Transfer Based on SERDES
2.1 Brief Introduction to Serdes
The SERDES (Serialization/De-serialization) transfer mechanism consists of a serializer at the transmitter, which converts parallel data into a serial bit stream, and a deserializer at the receiver, which recovers the original parallel data from the bit stream. The parallel-to-serial and serial-to-parallel functions were first implemented mainly by dedicated ASICs, such as the DS92LV16 [1] and the DS90CR287 [2]. With these chips, chip-to-chip, board-to-board and chassis-to-chassis communication can be established. The electrical level usually adopted is LVDS (Low-Voltage Differential Signaling). Owing to its high speed and low bit error rate, LVDS is widely used in signal acquisition, high-speed transmission, video monitoring and similar applications. The typical transmission rate of LVDS ranges from hundreds of Mbps to 1.923 Gbps [3]. After TIA/EIA-644 was proposed for LVDS in 1995, LVDS communication was mainly implemented with the external dedicated ASICs mentioned above. The change came in 2006, when Xilinx released the Virtex-4 FPGA series with embedded SERDES primitives [4]. Since the Virtex-4, Xilinx FPGAs have supported the LVDS electrical level with built-in SERDES primitives. A SERDES primitive can reach a transmission rate of about 1 Gbps over one pair of general-purpose I/O (Input/Output) pins. Compared with the external dedicated ASICs, the SERDES primitives greatly reduce pin count, PCB area and power consumption. With these advantages, replacing external LVDS ASICs with built-in SERDES primitives has become the ideal, even inevitable, choice. This chapter first introduces LVDS, the physical layer of SERDES, followed by a detailed analysis of the advantages and disadvantages of the embedded
© Springer Nature Singapore Pte Ltd. 2020 F. Zhang, High-speed Serial Buses in Embedded Systems, https://doi.org/10.1007/978-981-15-1868-3_2
SERDES primitives in FPGAs. An example SERDES communication project is then built on a Kintex-7 FPGA platform, and a transmission rate of 1.25–1.6 Gbps is reached after applying a variety of timing optimization methods.
2.2 LVDS—Physical Layer of Serdes
LVDS (Low Voltage Differential Signaling) was launched by NS (National Semiconductor) in the 1990s to overcome the high power consumption, high bit error rate and serious electromagnetic interference of TTL (Transistor-Transistor Logic) [6]. LVDS was initially proposed for digital video applications. There are currently two popular LVDS standards: ANSI/TIA/EIA-644, released in November 1995, and IEEE 1596.3, released by IEEE in March 1996. Both focus on the electrical characteristics, interconnection and line termination of the LVDS interface; neither specifies the power supply voltage. The transmission medium can be copper traces on a PCB or twisted-pair cable. The recommended maximum data rate of LVDS is 655 Mbps, and the theoretical maximum on a lossless transmission line is 1.923 Gbps, much higher than for other electrical levels. Fig. 2.1 compares LVDS with other levels in terms of transmission rate and distance [7]. The LVDS transmitter is a differential current source with a 1.2 V bias voltage that typically drives 3.5 mA. Because the LVDS receiver has a very high input impedance, almost all of this current flows through the matched 100 Ω termination resistor, generating about 350 mV, so the power consumption of LVDS is about
Fig. 2.1 Comparison on transfer speed and distance between LVDS and other signal levels [7]
Fig. 2.2 Comparison on power between LVDS and other signal levels [7]
1.2 mW (350 mV × 3.5 mA). Compared with other signal levels, LVDS consumes the least power, as shown in Fig. 2.2. With these advantages (high transmission rate, long distance and minimal power consumption), LVDS transmission technology is widely used, especially in display and CCD camera interfaces. A typical LVDS circuit consists of a transmitter, a transmission medium and a receiver, as shown in Fig. 2.3. The LVDS transmitter uses a TTL-to-LVDS conversion chip (for TTL levels of 5, 3.3 or 2.5 V) to convert the parallel data into a serial bit stream, as shown in Fig. 2.4. At the receiver, an LVDS-to-TTL conversion chip converts the serial bit stream back to the original parallel data. In an LVDS communication system, the transmitter must send a clock to the receiver in addition to the data. This clock, called the source clock, is what the receiver relies on to align word boundaries and recover the original parallel data. In Fig. 2.5, a common source clock is sent along with 4 LVDS data channels; the transmitter (DS90CR287) is on the left and the receiver (DS90CR288) on the right. This chipset is commonly used in CCD Camera Link interfaces. When many LVDS channels are needed, this approach becomes a serious problem that cannot be solved easily. Suppose 96 LVDS channels are required and the
Fig. 2.3 A typical LVDS interconnection system
Fig. 2.4 Parallel-clock-serializer coding scheme [8]
Fig. 2.5 Block diagram of a 4-channel + 1-clock LVDS transceiver [2]
parallel data width is M = 16 (4 ≤ M ≤ 16); then more than 96 × 16 = 1536 pins are required on each of the transmitter and receiver sides, which exceeds the pin capacity of the FPGA [9]. Is there a solution that keeps the advantages of LVDS without consuming so many FPGA pins, and that can directly replace the external LVDS chips? Yes: the SERDES (Serialization/De-serialization) primitives embedded in the FPGA.
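The arithmetic in this section (V = I × R across the termination, P = V × I, and the pin count for external deserializer chips) can be checked with a few lines of Python. This is a sketch using only the typical values quoted above; the function names are illustrative, and the pin count covers parallel data pins only, ignoring clock and control pins.

```python
def lvds_swing_mv(current_ma: float, r_term_ohm: float) -> float:
    """Differential swing across the termination resistor: V = I * R (mA * ohm = mV)."""
    return current_ma * r_term_ohm


def lvds_power_mw(swing_mv: float, current_ma: float) -> float:
    """Power delivered to the termination: P = V * I (mV * mA = uW; /1000 gives mW)."""
    return swing_mv * current_ma / 1000.0


def parallel_data_pins(channels: int, width: int) -> int:
    """Parallel data pins needed on one side when each LVDS channel is converted
    by an external chip to `width` bits (clock and control pins excluded)."""
    return channels * width


swing = lvds_swing_mv(3.5, 100)        # typical 3.5 mA into a 100 ohm termination
power = lvds_power_mw(swing, 3.5)
pins = parallel_data_pins(96, 16)      # 96 channels, M = 16
print(f"swing = {swing:.0f} mV, power = {power:.3f} mW, pins = {pins}")
# swing = 350 mV, power = 1.225 mW, pins = 1536
```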
2.3 Data Transfer Based on Serdes Primitive Embedded in FPGA
External LVDS ASICs consume many pins, which are valuable in an embedded system, so they are not suitable for multi-channel transmission. To replace the external LVDS chips with the SERDES primitives inside the FPGA, the FPGA must meet two conditions:
• the FPGA supports the LVDS level;
• the FPGA contains embedded parallel/serial conversion structures.
The two conditions are analyzed in detail below.
2.3.1 FPGA Supports LVDS Level
According to the supply voltage, the banks of a Kintex-7 FPGA are divided into two types: HP (High Performance) and HR (High Range). The supply voltage of an HP bank ranges from 1.2 to 1.8 V, while that of an HR bank ranges from 1.2 to 3.3 V [10]. Both HP and HR banks support LVDS signaling, only under different names: the standard is called LVDS_25 in HR banks and LVDS in HP banks. The electrical characteristics of LVDS_25 and LVDS are shown in Tables 2.1 and 2.2, respectively. The main difference between them is the supply voltage VCCO (the minimum output low voltage VOL also differs slightly); the other parameters are the same or similar, so an HR bank using LVDS_25 and an HP bank using LVDS can communicate with each other.

Table 2.1 LVDS_25 DC specifications [5]

| Symbol | DC parameter | Conditions | Min | Typ | Max | Units |
|---|---|---|---|---|---|---|
| VCCO | Supply voltage | – | 2.375 | 2.500 | 2.625 | V |
| VOH | Output high voltage | RT = 100 Ω | – | – | 1.675 | V |
| VOL | Output low voltage | RT = 100 Ω | 0.700 | – | – | V |
| VODIFF | Differential output voltage | RT = 100 Ω | 247 | 350 | 600 | mV |
| VOCM | Output common-mode voltage | RT = 100 Ω | 1.000 | 1.25 | 1.425 | V |
| VIDIFF | Differential input voltage | – | 100 | 350 | 600 | mV |
| VICM | Input common-mode voltage | – | 0.300 | 1.200 | 1.425 | V |
Table 2.2 LVDS DC specifications [5]

| Symbol | DC parameter | Conditions | Min | Typ | Max | Units |
|---|---|---|---|---|---|---|
| VCCO | Supply voltage | – | 1.71 | 1.800 | 1.890 | V |
| VOH | Output high voltage | RT = 100 Ω | – | – | 1.675 | V |
| VOL | Output low voltage | RT = 100 Ω | 0.825 | – | – | V |
| VODIFF | Differential output voltage | RT = 100 Ω | 247 | 350 | 600 | mV |
| VOCM | Output common-mode voltage | RT = 100 Ω | 1.000 | 1.25 | 1.425 | V |
| VIDIFF | Differential input voltage | Common-mode input voltage = 1.25 V | 100 | 350 | 600 | mV |
| VICM | Input common-mode voltage | Differential input voltage = ±350 mV | 0.300 | 1.200 | 1.425 | V |
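The claim that LVDS_25 (HR bank) and LVDS (HP bank) can talk to each other can be checked against Tables 2.1 and 2.2: the driver's differential output swing and output common-mode range must fall inside the receiver's differential input and input common-mode windows. The Python sketch below encodes only the min/max values from the two tables (converted to volts); it is a DC-range check under that assumption, not a substitute for signal-integrity analysis, and the dictionary names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    lo: float  # minimum, volts
    hi: float  # maximum, volts

    def inside(self, other: "Window") -> bool:
        """True if this whole range lies within `other`."""
        return other.lo <= self.lo and self.hi <= other.hi


# Min/max columns of Table 2.1 (LVDS_25, HR bank) and Table 2.2 (LVDS, HP bank).
LVDS_25_HR = {
    "VODIFF": Window(0.247, 0.600), "VOCM": Window(1.000, 1.425),
    "VIDIFF": Window(0.100, 0.600), "VICM": Window(0.300, 1.425),
}
LVDS_HP = {
    "VODIFF": Window(0.247, 0.600), "VOCM": Window(1.000, 1.425),
    "VIDIFF": Window(0.100, 0.600), "VICM": Window(0.300, 1.425),
}


def dc_compatible(tx: dict, rx: dict) -> bool:
    """Driver swing and common mode must sit inside the receiver's input windows."""
    return tx["VODIFF"].inside(rx["VIDIFF"]) and tx["VOCM"].inside(rx["VICM"])


print(dc_compatible(LVDS_25_HR, LVDS_HP))  # HR drives HP: True
print(dc_compatible(LVDS_HP, LVDS_25_HR))  # HP drives HR: True
```

Both directions pass because the output windows of each standard sit inside the input windows of the other; VOH and VOL are not checked in this sketch.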
2.3.2 FPGA Embeds OSERDESE2/ISERDESE2 Primitives
In a SERDES communication system built on 7 series FPGA devices, the transmit function is mainly performed by the OSERDESE2. The OSERDESE2 primitive is a dedicated parallel-to-serial converter with specific clocking and logic resources designed to facilitate the implementation of high-speed source-synchronous interfaces [11]. The receive function is mainly performed by the ISERDESE2. The ISERDESE2 primitive is a dedicated serial-to-parallel converter with specific clocking and logic features designed to facilitate the implementation of high-speed source-synchronous applications; it avoids the additional timing complexities encountered when designing deserializers in the FPGA fabric [11]. The structures of OSERDESE2 and ISERDESE2 are shown in Fig. 2.6. At the transmitter, OSERDESE2 ports D1–D8 take the 8-bit parallel data and port OQ outputs the serial bit stream; at the receiver, the serial bit stream enters ISERDESE2 through port D, and the converted 8-bit parallel data is output through ports Q1–Q8. Two OSERDESE2 modules can be cascaded to build a parallel-to-serial converter wider than 8:1 (up to 14:1), and likewise two ISERDESE2 modules can be cascaded to build a serial-to-parallel converter wider than 1:8 (up to 1:14), as shown in Fig. 2.7. SERDES primitives support two data sampling modes: SDR (Single Data Rate), in which data is sampled on the rising clock edge, and DDR (Double Data Rate), in which data is sampled on both clock edges. The parallel data widths supported in SDR and DDR modes are listed in Table 2.3. The above analysis shows that SERDES-based communication across different banks of one FPGA, or even between different FPGAs, is feasible, so external dedicated LVDS conversion chips can indeed be replaced.
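To make the parallel-to-serial and serial-to-parallel roles concrete, the following Python model serializes 8-bit words with bit 0 (the D1 input) sent first and regroups the bit stream into words at the receiver. It is a behavioral sketch under simplifying assumptions: it ignores clocks and SDR/DDR timing, and it does not reproduce the primitives' actual Q1–Q8 output ordering (the 7 series SelectIO user guide is the authority on that). The offset argument shows why the receiver must find the word boundary; correcting that misalignment is the job of the BITSLIP port that appears in the ISERDESE2 instantiation later in this section.

```python
def serialize(words, width=8):
    """width:1 parallel-to-serial model; bit 0 of each word (on D1) leaves first."""
    bits = []
    for w in words:
        if not 0 <= w < (1 << width):
            raise ValueError(f"word {w} does not fit in {width} bits")
        bits.extend((w >> i) & 1 for i in range(width))
    return bits


def deserialize(bits, width=8, offset=0):
    """1:width serial-to-parallel model.

    `offset` is how many bits the receiver's word boundary lags the
    transmitter's; only complete words are returned."""
    stream = bits[offset:]
    words = []
    for start in range(0, len(stream) - width + 1, width):
        chunk = stream[start:start + width]
        words.append(sum(b << i for i, b in enumerate(chunk)))
    return words


line = serialize([0xA5, 0x3C])
print([hex(w) for w in deserialize(line)])            # ['0xa5', '0x3c']
print([hex(w) for w in deserialize(line, offset=1)])  # ['0x52']
```

With a one-bit misalignment the receiver sees 0x52 instead of 0xA5: every bit is received correctly, but the word boundary is wrong.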
Fig. 2.6 OSERDESE2/ISERDESE2 primitives ports [11]
Fig. 2.7 Cascaded structure of OSERDESE2/ISERDESE2 [11]
Table 2.3 OSERDESE2/ISERDESE2 attributes

| Sampling mode | OSERDESE2 data width | ISERDESE2 data width |
|---|---|---|
| SDR | 2/3/4/5/6/7/8 | 2/3/4/5/6/7/8 |
| DDR | 4/6/8/10/14 | 4/6/8/10/14 |
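Table 2.3 can be expressed as a small lookup that rejects unsupported widths and reports whether a master/slave cascade is needed (widths above 8, as described in Sect. 2.3.2). This is an illustrative Python sketch; serdes_primitives_needed is a hypothetical helper, not part of any Xilinx tool.

```python
# Data widths from Table 2.3 (identical for OSERDESE2 and ISERDESE2).
SUPPORTED_WIDTHS = {
    "SDR": {2, 3, 4, 5, 6, 7, 8},
    "DDR": {4, 6, 8, 10, 14},
}


def serdes_primitives_needed(mode: str, width: int) -> int:
    """Return how many SERDES primitives one channel needs: 1 for widths up
    to 8, 2 (master + slave cascade) for 10 or 14. Raise on unsupported input."""
    mode = mode.upper()
    if mode not in SUPPORTED_WIDTHS:
        raise ValueError(f"unknown sampling mode {mode!r}")
    if width not in SUPPORTED_WIDTHS[mode]:
        raise ValueError(f"{mode} mode does not support width {width}")
    return 2 if width > 8 else 1


for mode, width in [("SDR", 8), ("DDR", 10), ("DDR", 14)]:
    print(mode, width, serdes_primitives_needed(mode, width))
```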
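Table 2.4 SERDES rate in Kintex-7 FPGA [5]

| Sampling mode | I/O bank type | −3 (1.0 V) | −2/−2L (1.0 V) | −1 (1.0 V) | −2L (0.9 V) | Units |
|---|---|---|---|---|---|---|
| SDR (data width 4–8 bit) | HR | 710 | 710 | 625 | 625 | Mbps |
| SDR (data width 4–8 bit) | HP | 710 | 710 | 625 | 625 | Mbps |
| DDR (data width 4–14 bit) | HR | 1250 | 1250 | 950 | 950 | Mbps |
| DDR (data width 4–14 bit) | HP | 1600 | 1400 | 1250 | 1250 | Mbps |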
2.3.3 Analysis of the Transfer Rate of Serdes
Two factors affect the transmission rate of the SERDES primitives in an FPGA:
(1) the SERDES rate characteristics of the FPGA, which depend on the data width, the sampling mode (SDR/DDR) and the speed grade of the FPGA, as shown in Table 2.4;
(2) the maximum clock frequency supported by the FPGA, which limits the clock and hence the transmission rate of the SERDES primitives, as shown in Table 2.5.
Taking a −3 speed grade FPGA as an example, the I/O clock FMAX_BUFIO reaches up to 800 MHz; since DDR mode samples on both clock edges, the maximum transmission rate is 800 MHz × 2 = 1.6 Gbps. This agrees with the DDR rate for HP banks in Table 2.4. Tables 2.4 and 2.5 thus confirm that the embedded SERDES primitives can replace external LVDS chips and achieve high-speed LVDS transmission.
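The reasoning above takes the lower of two limits: the clock bound (FMAX_BUFIO, doubled in DDR mode) and the SERDES rate listed in Table 2.4. A minimal Python sketch follows, with values copied from Tables 2.4 and 2.5; the grade labels are chosen for illustration ("-2" stands for the 1.0 V −2/−2L column and "-2L_0.9V" for the 0.9 V column).

```python
# Table 2.4: SERDES rate limits in Mbps, keyed by (mode, bank), then speed grade.
SERDES_LIMIT_MBPS = {
    ("SDR", "HR"): {"-3": 710, "-2": 710, "-1": 625, "-2L_0.9V": 625},
    ("SDR", "HP"): {"-3": 710, "-2": 710, "-1": 625, "-2L_0.9V": 625},
    ("DDR", "HR"): {"-3": 1250, "-2": 1250, "-1": 950, "-2L_0.9V": 950},
    ("DDR", "HP"): {"-3": 1600, "-2": 1400, "-1": 1250, "-2L_0.9V": 1250},
}
# Table 2.5: FMAX_BUFIO in MHz per speed grade.
FMAX_BUFIO_MHZ = {"-3": 800, "-2": 800, "-1": 710, "-2L_0.9V": 710}


def max_serdes_rate_mbps(mode: str, bank: str, grade: str) -> int:
    """Lower of the I/O clock bound and the Table 2.4 limit."""
    bits_per_cycle = 2 if mode == "DDR" else 1   # DDR samples on both edges
    clock_bound = FMAX_BUFIO_MHZ[grade] * bits_per_cycle
    return min(clock_bound, SERDES_LIMIT_MBPS[(mode, bank)][grade])


print(max_serdes_rate_mbps("DDR", "HP", "-3"))  # 1600, the case worked in the text
print(max_serdes_rate_mbps("DDR", "HR", "-1"))  # 950
```

For the −3 HP case both limits agree at 1600 Mbps; for a −1 HR bank the Table 2.4 figure (950 Mbps) is the binding limit, well below the 1420 Mbps clock bound.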
Table 2.5 Clock characteristics in Kintex-7 FPGA [5]

| Symbol | Description | −3 (1.0 V) | −2 (1.0 V) | −1 (1.0 V) | −2L (0.9 V) | Units |
|---|---|---|---|---|---|---|
| FMAX_BUFG | Global clock tree (BUFG) | 741.00 | 710.00 | 625.00 | 560.00 | MHz |
| FMAX_BUFIO | I/O clock tree (BUFIO) | 800.00 | 800.00 | 710.00 | 710.00 | MHz |
| FMAX_BUFR | Regional clock tree (BUFR) | 600.00 | 540.00 | 450.00 | 450.00 | MHz |
| FMAX_BUFH | Horizontal clock buffer (BUFH) | 741.00 | 710.00 | 625.00 | 560.00 | MHz |
2.4 Implementation of Serdes Transfer Function in FPGA
Objective: build a new Serdes_Test project based on the SERDES primitives (ISERDESE2 and OSERDESE2), verify the SERDES communication function, and then measure the upper-limit transmission rate of SERDES.
Thought: at the transmitter, generate 8-bit parallel data that counts cyclically from 0 to 255 in steps of 1 at 32 MHz, and serialize it into a bit stream with the OSERDESE2 primitive. The transmitter and receiver are connected through LVDS links on the motherboard. At the receiver, the ISERDESE2 primitive receives the differential bit stream, performs the serial-to-parallel conversion and recovers the original 32 MHz 8-bit parallel data.
Implementation method: there are two choices. The first is to instantiate the OSERDESE2/ISERDESE2 primitives and configure the relevant parameters and ports manually; the second is to use the visual configuration tools supplied by the Integrated Software Development Kits (SDK). In this example, the SDK is first used to build the project and configure the relevant parameters; the manual instantiation method is adopted later.
2.4.1 OSERDESE2 Configuration at the Transmitter in FPGA
Train of thought:
• generate 8-bit parallel data that counts cyclically from 0 to 255 in steps of 1 at 32 MHz;
• instantiate the OSERDESE2 primitive through the SDK to convert the data from parallel to serial;
• send the data out using the LVDS_25 standard.
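The transmit pattern and a matching receive-side check can be modeled in a few lines before they are written in HDL. In this Python sketch (tx_pattern and check_rx are illustrative names, not part of the project files), the checker only compares each word with its predecessor plus one, modulo 256, so it does not need to know where the receiver started capturing. At 32 MHz and 8 bits per word in SDR mode, this pattern corresponds to a serial rate of 32 × 8 = 256 Mbps.

```python
from typing import List


def tx_pattern(n_words: int, start: int = 0) -> List[int]:
    """8-bit counter pattern: start, start + 1, ..., wrapping from 255 to 0."""
    return [(start + i) % 256 for i in range(n_words)]


def check_rx(words: List[int]) -> List[int]:
    """Indices of received words that are not (previous word + 1) mod 256."""
    return [i for i in range(1, len(words))
            if words[i] != (words[i - 1] + 1) % 256]


rx = tx_pattern(300, start=250)  # capture starts mid-sequence and crosses the wrap
print(check_rx(rx))              # []
rx[10] ^= 0x01                   # flip one bit in one word
print(check_rx(rx))              # [10, 11]
```

A single corrupted word is flagged twice, once against its predecessor and once by its successor, which is a useful signature when reading error logs.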
1. Configuration of OSERDESE2 IP core
The first step is to create a new ISE project named Serdes_Test and add the SERDES IP core to it. Select the XC7K325T device in the FBG900 package with speed grade −1, as shown in Fig. 2.8. In the Serdes_Test project, add a new IP core named Serdes_Tx to implement the OSERDESE2 function, as shown in Fig. 2.9. Click the “Next” button in Fig. 2.9, and the window in Fig. 2.10 pops up. Select “SelectIO Interface Wizard” under “FPGA and Design/IO Wizard” and click the “Next” button; the summary shown in Fig. 2.11 appears. Click the “Finish” button in Fig. 2.11, and the “Serdes IP” core configuration interface pops up, as shown in Fig. 2.12.
Fig. 2.8 Create a new ISE project
Fig. 2.9 Add an IP core source
The specific OSERDESE2 configuration is as follows.
Select an Interface for IO configuration: the SERDES primitives support multiple protocols, such as DVI, Camera Link, SGMII and “Chip to Chip”. Select “Chip to Chip” from the drop-down menu here, since a user-defined interface is being built.
Data Bus Direction: the SERDES primitives (including OSERDESE2 and ISERDESE2) support Input, Output, Bidirectional and Separate I/O. The separate input and output options create independent input and output pins. In this example, only the output function of OSERDESE2 is used at the transmitter, so select “Configure output from the device”.
I/O signaling type: choose according to whether the bus is single-ended or differential. Select “differential” here, to match the LVDS_25 standard below.
Fig. 2.10 Select the SelectIO IP core
I/O signaling standard: select LVDS_25 according to the voltage supplied to the FPGA bank; the selected standard will appear in the generated HDL code. Click the “Next” button, and the interface shown in Fig. 2.13 pops up.
Data Rate: as mentioned above, SERDES supports two data capture modes, SDR and DDR. Select SDR if the data is clocked on the rising edge only, or DDR if the data is clocked on both edges. SDR mode is used in this project.
Use serialization: if “Use serialization” is ticked, the SERDES primitive appropriate to the selected device is instantiated; for a Kintex-7 transmitter, this is the OSERDESE2.
Fig. 2.11 Summary of adding IP core
Serialization factor: as shown in Table 2.3, the SERDES can serialize 2–8-bit parallel data in SDR mode and 4/6/8/10/14-bit parallel data in DDR mode. If the serialization factor is greater than 8, two SERDES primitives are cascaded. In this example, the serialization factor is set to 8.
External Data Width: the number of SERDES channels. Setting it to 1 means that only one SERDES transmission channel is used in this project.
Click the “Next” button, and the interface shown in Fig. 2.14 pops up. Since “Delay inserted into input data routing” applies to the receiver, which will be discussed later, keep the default values here.
Fig. 2.12 Configuration of OSERDESE2—Tab1
Fig. 2.13 Configuration of OSERDESE2—Tab2
Fig. 2.14 Configuration of OSERDESE2—Tab3
Click the “Next” button, and the interface shown in Fig. 2.15 pops up; configure the clock parameters here.
Clock signal: like the data, the input clock signal for SERDES can be either single-ended or differential. This project uses a differential clock with the LVDS_25 standard.
Clock strategy: there are two choices, a fabric clock based on an MMCM, or a source-synchronous clock based on BUFIO and BUFR. The differences between the MMCM strategy (which usually uses BUFG) and the source-synchronous strategy based on BUFIO and BUFR will be discussed later.
Keep the default values for the rest and click the “Next” button; Fig. 2.16 then pops up with a brief summary of the configuration parameters. After confirming them, click the “Generate” button to generate the files for Serdes_Tx, which are placed in the project directory “*\Serdes_Test\ipcore_dir”.
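With the source-synchronous strategy, the serial clock CLK (through BUFIO) and the parallel clock CLKDIV (through a dividing BUFR) must keep a fixed ratio: CLK/CLKDIV equals the serialization factor in SDR mode and half of it in DDR mode. The Python sketch below computes the line rate, the CLK frequency and that ratio for a given CLKDIV. clock_plan is a hypothetical helper, and the 1–8 divide range assumed for BUFR should be confirmed against the 7 series clocking documentation. For this project (CLKDIV = 32 MHz, factor 8, SDR) it gives 256 Mbps with CLK = 256 MHz, well within the −1 grade FMAX_BUFIO of 710 MHz in Table 2.5.

```python
def clock_plan(clkdiv_mhz, factor: int, mode: str = "SDR"):
    """Return (line rate in Mbps, CLK in MHz, CLK/CLKDIV ratio) for one channel."""
    mode = mode.upper()
    if mode not in ("SDR", "DDR"):
        raise ValueError(f"unknown data rate mode {mode!r}")
    bits_per_clk = 1 if mode == "SDR" else 2      # bits sent per CLK period
    if factor % bits_per_clk:
        raise ValueError("DDR needs an even serialization factor")
    ratio = factor // bits_per_clk                # CLK / CLKDIV, i.e. the BUFR divide
    if not 1 <= ratio <= 8:                       # assumed BUFR divide range
        raise ValueError(f"CLK/CLKDIV ratio {ratio} outside assumed range 1..8")
    rate_mbps = clkdiv_mhz * factor               # one word per CLKDIV period
    clk_mhz = clkdiv_mhz * ratio
    return rate_mbps, clk_mhz, ratio


print(clock_plan(32, 8, "SDR"))    # (256, 256, 8): this project
print(clock_plan(125, 8, "DDR"))   # (1000, 500, 4)
```

In DDR mode the same 8-bit word needs only half the CLK frequency, which is why DDR reaches the higher rates in Table 2.4.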
Fig. 2.15 Configuration of OSERDESE2—Tab4
Fig. 2.16 Configuration of OSERDESE2—Tab5
The generated files are listed below.
Serdes_Tx.asy
Serdes_Tx.gise
Serdes_Tx.sym
Serdes_Tx.ucf
Serdes_Tx.vhd
Serdes_Tx.vho
Serdes_Tx.xco
Serdes_Tx.xdc
Serdes_Tx.xise
Serdes_Tx\doc\selectio_wiz_ds746.pdf
Serdes_Tx\doc\selectio_wiz_gsg700.pdf
Serdes_Tx\doc\selectio_wiz_v4_1_readme.txt
Serdes_Tx\doc\selectio_wiz_v4_1_vinfo.html
Serdes_Tx\example_design\Serdes_Tx_exdes.ucf
Serdes_Tx\example_design\Serdes_Tx_exdes.vhd
Serdes_Tx\example_design\Serdes_Tx_exdes.xdc
Serdes_Tx\implement\implement.bat
Serdes_Tx\implement\implement.sh
Serdes_Tx\implement\planAhead_ise.bat
Serdes_Tx\implement\planAhead_ise.sh
Serdes_Tx\implement\planAhead_ise.tcl
Serdes_Tx\implement\planAhead_rdn.bat
Serdes_Tx\implement\planAhead_rdn.sh
Serdes_Tx\implement\planAhead_rdn.tcl
Serdes_Tx\implement\xst.prj
Serdes_Tx\implement\xst.scr
Serdes_Tx\selectio_wiz_v4_1_readme.txt
Serdes_Tx_flist.txt
Serdes_Tx_xmdf.tcl
At this point, the OSERDESE2 primitive has been configured and instantiated through the SDK GUI, and the files related to the transmit function have been generated. The file structure and the HDL code are analyzed in detail below.
Fig. 2.17 Directory structure of OSERDESE2 at the transmitter
2. Analysis of the Serdes_Tx source code
(1) Directory structure. Many files with different suffixes are generated in the Serdes_Test project, and most of them are not needed by the user. Simply remove Serdes_Tx.xco from the project and add Serdes_Tx.vhd and Serdes_Tx.ucf instead. The resulting project hierarchy is shown in Fig. 2.17.
Serdes_Tx.vhd is the top level of the project. Its HDL code performs three functions:
• instantiation of OBUFDS;
• instantiation of OSERDESE2;
• generation of the source data to be sent by the OSERDESE2.
The HDL code is shown below.
-- Instantiate the buffers
-- Instantiate a buffer for every bit of the data bus and convert the
-- single-ended signal to differential
obufds_inst : OBUFDS
  generic map (
    IOSTANDARD => "LVDS_25")
  port map (
    O  => DATA_OUT_TO_PINS_P,
    OB => DATA_OUT_TO_PINS_N,
    I  => data_out_to_pins_int);

-- Instantiate the OSERDESE2 primitives
oSerdese2_master : OSERDESE2
  generic map (
    DATA_RATE_OQ   => "SDR",
    DATA_RATE_TQ   => "SDR",
    DATA_WIDTH     => 8,
    TRISTATE_WIDTH => 1,
    Serdes_MODE    => "MASTER")
  port map (
    D1        => oSerdes_d(0),
    D2        => oSerdes_d(1),
    D3        => oSerdes_d(2),
    D4        => oSerdes_d(3),
    D5        => oSerdes_d(4),
    D6        => oSerdes_d(5),
    D7        => oSerdes_d(6),
    D8        => oSerdes_d(7),
    T1        => '0',
    T2        => '0',
    T3        => '0',
    T4        => '0',
    SHIFTIN1  => '0',
    SHIFTIN2  => '0',
    SHIFTOUT1 => open,
    SHIFTOUT2 => open,
    OCE       => clock_enable,
    CLK       => CLK_IN,
    CLKDIV    => CLK_DIV_IN,
    OQ        => data_out_to_pins_int,
    TQ        => open,
    OFB       => open,
    TBYTEIN   => '0',
    TBYTEOUT  => open,
    TFB       => open,
    TCE       => '0',
    RST       => IO_RESET);

-- Concatenate the serdes outputs together. Keep the timesliced
-- bits together, and placing the earliest bits on the right
-- ie, if data comes in 0, 1, 2, 3, 4, 5, 6, 7, ...
-- the output will be 3210, 7654, ...
out_slices: for slice_count in 0 to 7 generate begin
  oSerdes_d(slice_count) <= …
…
    DIFF_TERM  => FALSE,        -- Differential termination
    IOSTANDARD => "LVDS_25")    -- I/O signal standard
  port map (
    I  => DATA_IN_FROM_PINS_P,
    IB => DATA_IN_FROM_PINS_N,
    O  => data_in_from_pins_int);

-- Instantiate the ISERDESE2
iSerdese2_master : ISERDESE2
  generic map (
    DATA_RATE         => "SDR",
    DATA_WIDTH        => 8,
    INTERFACE_TYPE    => "NETWORKING",
    DYN_CLKDIV_INV_EN => "FALSE",
    DYN_CLK_INV_EN    => "FALSE",
    NUM_CE            => 2,
    OFB_USED          => "FALSE",
    IOBDELAY          => "NONE",
    Serdes_MODE       => "MASTER")
  port map (
    Q1        => iSerdes_q(0),
    Q2        => iSerdes_q(1),
    Q3        => iSerdes_q(2),
    Q4        => iSerdes_q(3),
    Q5        => iSerdes_q(4),
    Q6        => iSerdes_q(5),
    Q7        => iSerdes_q(6),
    Q8        => iSerdes_q(7),
    SHIFTOUT1 => open,
    SHIFTOUT2 => open,
    BITSLIP   => BITSLIP,
    CE1       => clock_enable,
    CE2       => clock_enable,
    CLK       => CLK_IN,
    CLKB      => not CLK_IN,   -- an expression as actual needs VHDL-2008; otherwise use an inverted signal
    CLKDIV    => CLK_DIV_IN,
    CLKDIVP   => '0',
    D         => data_in_from_pins_int,
    DDLY      => '0',
    RST       => IO_RESET,
    SHIFTIN1  => '0',
    SHIFTIN2  => '0',
    -- unused connections
    DYNCLKDIVSEL => '0',
    DYNCLKSEL    => '0',
    OFB          => '0',
    OCLK         => '0',
    OCLKB        => '0',
    O            => open);

-- Concatenate the serdes outputs together. Keep the timesliced
-- bits together, and placing the earliest bits on the right
-- ie, if data comes in 0, 1, 2, 3, 4, 5, 6, 7, ...
-- the output will be 3210, 7654, ...
in_slices: for slice_count in 0 to num_serial_bits-1 generate begin
  DATA_IN_TO_DEVICE(slice_count) <= …
IF PHYRDY = '0' THEN next_link_state