Field Programmable Gate Array and Applications [1 ed.] 9781783323241, 9781783322152


English Pages 373 Year 2016


Field Programmable Gate Arrays and Applications

S.S.S.P. Rao


Alpha Science International Ltd. Oxford, U.K.

Field Programmable Gate Arrays and Applications
372 pages | 203 figures | 41 tables

S.S.S.P. Rao
Department of Computer Science & Engineering
Indian Institute of Technology
Powai, Mumbai

Copyright © 2016
ALPHA SCIENCE INTERNATIONAL LTD.
7200 The Quorum, Oxford Business Park North
Garsington Road, Oxford OX4 2JZ, U.K.
www.alphasci.com

ISBN 978-1-78332-215-2
E-ISBN 978-1-78332-324-1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the publisher.

Dedicated to Srikantam Krishna Rao and Srikantam Saradamba

Preface

As Professor of Computer Science and Engineering at the Indian Institute of Technology-Bombay, I taught the architectural details and design methodologies of Field Programmable Gate Arrays (FPGAs) from the time of their introduction in 1983. During my tenure, my undergraduate and postgraduate students did projects using devices ranging from the Xilinx XC 2000 to the Spartan 6. Public and private sector organizations in India sponsored projects on FPGA-based embedded systems, and my project engineers completed them to the satisfaction of the sponsors. After my retirement from IIT-Bombay, I had the opportunity to join the Xilinx Centre, established in Hyderabad by Akshya Prakash of Xilinx, USA, as its Chief Technology Officer. After my tenure there, I joined CMC Limited, Hyderabad, and advised CMC engineers on their FPGA projects in the Embedded Systems Group. I then decided to put all my academic and industrial experience into book form and, with the help of some experts, finalized the contents of the book.

Since FPGAs belong to the family of programmable devices and designing with FPGAs requires knowledge of digital design, Chapter 1 gives an overview of Boolean algebra and logic design. It covers gates, minimization techniques for Boolean expressions, combinational and sequential logic circuits, and state machines, with design examples. Chapter 2 covers programmable logic devices. It explains SPLDs and CPLDs and presents their internal organization, which includes ROM, PAL and PLA. PLD design methodologies and tools are covered with design examples. FPGAs are then introduced, from the basic FPGA used as glue logic to the present-day advanced FPGA used in embedded systems. The design flow for FPGAs is given, applications of FPGAs are listed, and microcontrollers, FPGAs and ASICs are compared.

Chapter 3 is devoted to the evolution of Xilinx FPGAs and their architectural features. Chapter 4 covers Altera FPGAs, and Chapter 5 reviews Actel/Microsemi FPGAs. Since designing with FPGAs requires knowledge of hardware description languages, Chapter 6 covers Verilog and VHDL with many design examples, along with a brief introduction to SystemVerilog. Chapter 7 covers the complete Xilinx FPGA design flow, from architectural specification to obtaining the bitstream to be loaded into the FPGA, together with testing methodologies. Chapter 8 covers design exercises using the Spartan series, including an ALU, a multiplier unit and an arbiter using a round-robin algorithm. These applications give the reader a good understanding of designing systems using FPGAs with design tools.

Having seen the SRAM, antifuse and flash FPGA programming technologies, the designer will benefit from a comparison that helps in choosing the right one for an application, based on the conditions under which the product must function reliably and safely. Chapter 9 gives this comparison along with the important research issues of FPGA security and the future of FPGAs for the next decade.

S.S.S.P. Rao

Foreword-1

The advent of digital logic systems was facilitated by VLSI technology. It is interesting that one can implement even a complex system such as a digital computer using only AND, OR, and NOT gates; further, one can implement it using only NAND gates or only NOR gates. A transistor is basically an inverter, and as such synthesis using only NAND or NOR gates is possible, but one needs a large number of transistors. VLSI made it possible to have a large number of gates on a chip. The difference between systems then lay in the interconnection between the gates. Technology produced the gate array, which is an array of a large number of gates. It led to field programmable gate array chips with a matrix of interconnects that can be modified to suit the required logic by selectively burning out the undesired interconnects. The number of gates on an FPGA (Field Programmable Gate Array) can be in the thousands, and one can fabricate even a small special purpose computer in the lab in a short time. Early advertisements mentioned that “one can conceive a special purpose chip at breakfast and have the chip ready by dinner time!” FPGAs have made it easy for students to fabricate their projects and have the satisfaction of building the desired system. The FPGA is a boon to remote colleges teaching digital design.

Dr. SSSP Rao has over 30 years of experience in the teaching and design of digital systems, and it is welcome that he has chosen to write a book on FPGAs. It is like “hearing from the horse’s mouth”! The coverage is comprehensive and the book is suited for use as a textbook as well as for self-study. It is appropriate that it is based on XILINX technology, which is easily available in India. I thank Dr. Rao for the book and wish the book all success.

Prof. H N Mahabala
(Retd.) Professor
Department of Computer Science and Engineering
Indian Institute of Technology Madras
Chennai

Foreword-2

Ever since the transistor was invented in the middle years of the 20th century, an era of miniaturization in electronics has been heralded worldwide, based on silicon technology and the resulting microchip revolution. Microchips have now become the cornerstone of almost every product and service developed for the benefit of mankind, and have enabled the launching of new industries in design, technology, packaging, testing and applications, leading to the present-day Information Age. Field Programmable Gate Arrays (FPGAs) constitute an important class of microchips that can be programmed in the laboratory or on the shop floor to perform specific electronic functions in digital logic form. Since their inception in the 1980s, FPGAs have grown in capacity and complexity to include, in their current versions, millions of logic gates, megabytes of memory and high speed interfaces, and have become the preferred choice in a wide range of applications, such as consumer products, computing, communication, control and instrumentation, as well as military and space systems. As a result, good knowledge of and expertise in “FPGAs and Applications” is now expected from graduating engineers in electrical, electronic, computer and related branches, so that they can exploit the emerging opportunities and prepare themselves for good careers in this subject area. However, the absence of a comprehensive, contemporary text and reference book covering this subject matter has been a long felt limitation at Indian universities and institutions. Recognizing this, Prof. SSSP Rao, a senior computer science academic and researcher with high expertise and long experience in working with FPGAs at the Indian Institute of Technology, Mumbai, has prepared this book entitled “Field Programmable Gate Arrays and Applications”, which can serve as a text and reference source for engineering and technology students at both UG and PG levels and for professionals alike.

The book is well planned and organized to provide the reader with a good insight into the fascinating world of FPGAs. It begins with an overview of Boolean algebra and logic design in Chapter 1, followed by state of the art coverage of programmable logic devices and an introduction to FPGAs in Chapter 2. Xilinx, Altera and Actel/Microsemi FPGAs and an overview of hardware description languages are covered in the next four chapters.


The FPGA design flow is then described in Chapter 7, followed in Chapter 8 by a range of selected applications of Xilinx FPGAs that give a flavour of the potential uses of FPGAs in present day electronic products and services. The last chapter covers the important issues of FPGA security and the future of FPGAs. An excellent bibliography given at the end of the chapter is very helpful to the reader in learning and gaining more experience in this subject area. On the whole, the book is an excellent addition to the text and reference sources now available in the broad area of VLSI design and applications, and is of considerable use in learning and teaching as well as in R&D and industry. I would like to compliment the author for undertaking this commendable task and making a significant contribution to the world of academic and research endeavour.

Prof. B. S. Sonde
Former Vice Chancellor
University of Goa

Acknowledgment

As a faculty member of the Indian Institute of Technology (IIT)-Bombay, I taught FPGAs and offered FPGA-based projects from around 1982, when Field Programmable Gate Arrays (FPGAs) were introduced, until 2005, using devices from the Xilinx XC 2000 to the Spartan 6, to undergraduate and postgraduate students of the Electrical and Computer Science Engineering Departments. FPGA-based sponsored projects from the public and private sectors were also successfully completed by project engineers under my guidance at IIT-Bombay. After my retirement from IIT-Bombay, I joined the Xilinx Centre in Hyderabad in 2005 as its Chief Technology Officer. This centre was initially established at CMC, Hyderabad by Akshya Prakash of Xilinx, USA. After my tenure at the Xilinx Centre, I joined CMC Limited, Hyderabad as Chief Advisor and mentor and handled FPGA-based projects in the Embedded Systems Group. At that time I thought of putting all my academic and industrial experience in designing with FPGAs into the form of a book and drafted the contents. The contents were reviewed by the experts Akshya Prakash of Xilinx, USA, Ms Usha Priyadharshini of IBM-Bangalore and Dr. Vamsi Srikantam of AMCC, USA, and were then finalized. My sincere thanks to all these experts.

I place on record my sincere thanks to N.K. Mehra of Narosa Publishing House, Delhi, who readily agreed to publish the book when I approached him. I also place on record my sincere thanks and appreciation to my colleagues from IIT-Bombay, retired Professor M.R. Bhujade and Professor M.P. Desai, for providing some excellent subject matter for Chapter 1, Boolean Algebra and Logic Design: An Overview. Their help is gratefully acknowledged. Ron Wilson of ALTERA, USA and Gautam Sachin of MICROSEMI greatly helped me in writing the chapters on ALTERA and ACTEL/MICROSEMI FPGAs by providing technical information on their companies' FPGAs. I would like to place on record my sincere thanks and appreciation to both these experts.
My interaction with Sachin Gupta of Microsemi was excellent, and I would like to sincerely thank him once again for his untiring efforts to make the chapter on Actel/Microsemi FPGAs highly technical and informative. The contribution of Ms. Saambhavi of the Xilinx Centre, Hyderabad to Chapter 6 on Hardware Description Languages (Verilog/VHDL) deserves special mention for its excellence. Her assistance is highly appreciated and I am very grateful for her help.

In writing Chapter 7 on the FPGA design flow, I received immense help from Akshya Prakash of Xilinx, USA, Dr. Sudip Nag of Xilinx, USA and Mrinal Sarmah of the Xilinx Centre, Hyderabad. I would like to express my heartfelt gratitude to Mrinal Sarmah for the excellent support he gave me in writing this chapter and for providing the subject matter for this very important chapter. His untiring support and great contribution in shaping the design flow chapter are gratefully acknowledged.

Chapter 8 deals with design exercises using FPGAs. In this chapter, instead of giving purely academic material, I thought of giving projects that were actually implemented and tested, so that readers gain real experience and understanding of FPGA designs. Towards this end I sought help from an FPGA training institute. Xilinx, USA has approved an institute in Bangalore, India, called Sandeepani-School of Embedded System Design, CG-CoreEl Technologies, to train students and practicing engineers in Xilinx FPGA-based designs. On my request, Sandeepani-CG-CoreEl Technologies agreed to provide FPGA-based designs, from specifications to implementation and test results, done by students of the training institute under the supervision of Ms. Deepa and Padmanabhan in consultation with me. I am very confident that these design projects give readers a very good understanding of FPGA design practices. I am highly grateful to Deepa and Padmanabhan of Sandeepani-CG-CoreEl Technologies for all their assistance in giving the material for this chapter a picture of reality. After these three simple design exercises, a complex project involving the design of a High Speed Telescope Data Acquisition System using FPGAs is explained.
This design illustrates how a complex FPGA design is translated from high level specifications to a detailed microarchitecture, and how a self-checking test bench architecture is designed for a complex FPGA design. This project was done by Sunil Puranik of Computational Research Laboratory, Pune and Dr. Subrahmanya of Raman Research Institute, Bangalore. On my request they provided this information so that the reader can understand how complex projects can be designed using FPGAs. I would like to thank both of them profusely for their untiring efforts to help me in presenting this design.

I thank profusely Prof. H.N. Mahabala, retired Professor from the Department of Computer Science and Engineering, IIT-Madras, for writing Foreword-1. I would also like to thank and express my gratitude to Prof. Sonde, retired Professor from IISc, Bangalore and former Vice Chancellor of Goa University. While writing Foreword-2 for my book, after going through some chapters, he suggested that I include some research oriented issues related to FPGAs, such as FPGA security, since FPGAs are being used in very sensitive and confidential projects in industrial and military applications. Dr. Trimberger of Xilinx, USA presented a paper on FPGA security which was published in the Proceedings of the IEEE. With the permission of the author and the publisher, IEEE, I included this research oriented topic in Chapter 9. I hereby express my thanks and gratitude to Dr. Trimberger and IEEE for giving permission to take material from the paper “FPGA Security: Motivations, Features, and Applications”, Proceedings of the IEEE, Vol. 102, No. 8, August 2014, pp. 1248-1265.

In an ACM workshop, industry people such as Dr. Ivo Bolsens, Xilinx CTO and Senior Vice President, and Misha Burich, Altera CTO and Senior Vice President of Research and Development, and academicians such as Prof. Peter Cheung, Head of the Department of Electrical and Electronic Engineering, Imperial College, London, debated the future of FPGAs till 2032, which raises very interesting topics for research. The discussion that took place at this ACM workshop was published by Ron Wilson, Editor-in-Chief of Altera. I met him at Altera and had discussions on this topic, and with his permission I included it in Chapter 9, where it will help research scholars to do further research. My sincere thanks to Ron Wilson for all the help he gave in writing this material on the future of FPGAs till 2032.

I am extremely thankful to Ms N.P. Shravya of CMC Limited for drawing the diagrams for this book with meticulous care. Finally, I express my appreciation and gratitude to my wife, Mrs. Rajeswari, who gave me immense encouragement and support while I was busy with my writing at home. I hope that the material in this book will greatly help students and practicing engineers in understanding FPGA-based designs, and research scholars in doing further research on the future of FPGAs and FPGA security methods.

S.S.S.P. Rao

Contents

Preface ..... vii
Foreword-1 ..... ix
Foreword-2 ..... xi
Acknowledgment ..... xiii
1. Boolean Algebra and Logic Design: An Overview ..... 1.1–1.45
  1.1 Introduction ..... 1.1
  1.2 Truth Table ..... 1.1
  1.3 Basic Logic Gates ..... 1.3
    1.3.1 AND Gate ..... 1.3
    1.3.2 OR Gate ..... 1.3
    1.3.3 NAND Gate ..... 1.3
    1.3.4 NOR Gate ..... 1.4
    1.3.5 Buffer ..... 1.4
    1.3.6 Inverting Buffer ..... 1.4
    1.3.7 Exclusive OR Gate ..... 1.5
    1.3.8 Exclusive NOR Gate ..... 1.5
    1.3.9 Tri-state Gate ..... 1.5
  1.4 Boolean Function Minimization ..... 1.6
    1.4.1 Theorems ..... 1.6
    1.4.2 Karnaugh Map ..... 1.8
      1.4.2.1 Boolean Algebra Expression Minimization by Karnaugh Map ..... 1.8
      1.4.2.2 Minimization Rules ..... 1.9
      1.4.2.3 Summary ..... 1.11
      1.4.2.4 Don’t cares ..... 1.12
      1.4.2.5 Design Example Race hazards ..... 1.12
      1.4.2.6 Some Examples ..... 1.12


    1.4.3 Elimination of Race Hazards ..... 1.15
    1.4.4 Quine-McCluskey Algorithm ..... 1.16
      1.4.4.1 Example ..... 1.16
      1.4.4.2 Alternate Method ..... 1.19
    1.4.5 Logic Circuits ..... 1.22
      1.4.5.1 Combinational Circuits ..... 1.22
      1.4.5.2 Examples of Combinational Circuits ..... 1.22
      1.4.5.3 Half Adder ..... 1.23
      1.4.5.4 Full Adder ..... 1.23
      1.4.5.5 2 to 4 Line Decoder ..... 1.24
      1.4.5.6 4 to 1 Multiplexer ..... 1.25
      1.4.5.7 BCD to Excess-3 Code Converter BCD Input Excess-3 Output Decimal ..... 1.26
      1.4.5.8 Programmable Logic Devices are Discussed in Chapter 2 ..... 1.28
      1.4.5.9 An Example of a Combinational Circuit Design ..... 1.28
    1.4.6 Sequential Circuits ..... 1.29
      1.4.6.1 Examples ..... 1.29
    1.4.7 Sequential Circuit Design ..... 1.35
      1.4.7.1 Sequential Circuit Design Example ..... 1.35
    1.4.8 State Machine ..... 1.37
      1.4.8.1 An Example of a State Machine is the Digital Computer ..... 1.38
      1.4.8.2 Design Example 1 ..... 1.38
      1.4.8.3 Design Example 2 ..... 1.41
  1.5 Summary ..... 1.44
2. Programmable Logic Devices (PLDs) ..... 2.1–2.31
  2.1 Introduction ..... 2.1
  2.2 SPLDs ..... 2.2
    2.2.1 A Commercial PAL 22V10 is shown in Fig. 2.4. Programming this Chip is Explained Later ..... 2.5
  2.3 Complex Programmable Logic Devices (CPLD) ..... 2.5
    2.3.1 Comparison of SPLD and CPLD ..... 2.7
    2.3.2 Some Commercial PLDs ..... 2.7
    2.3.3 Design Methodologies for PLDs ..... 2.9
      2.3.3.1 Steps involved in manual programming ..... 2.9
      2.3.3.2 Steps Involved in Designing with SPLDs Using CAD Tools are Shown in Fig 2.8 ..... 2.13
  2.4 Field Programmable Gate Array (FPGA) ..... 2.16
    2.4.1 XILINX FPGAS In the beginning ..... 2.18
    2.4.2 Advanced FPGAs ..... 2.18


    2.4.3 Spartan Family ..... 2.20
    2.4.4 Designing with Xilinx FPGAs ..... 2.20
    2.4.5 Applications of FPGAs ..... 2.21
  2.5 Microcontrollers ..... 2.22
  2.6 Application Specific Integrated Circuit (ASIC) ..... 2.24
    2.6.1 Full-Custom ASIC ..... 2.25
    2.6.2 Standard Cell ASIC ..... 2.25
    2.6.3 Gate Array ASIC ..... 2.25
    2.6.4 Generalised ASIC Design Flow Steps ..... 2.25
    2.6.5 ASIC Design Flow ..... 2.26
  2.7 Comparisons ..... 2.26
    2.7.1 Comparison of Microcontrollers vs FPGA ..... 2.26
    2.7.2 Comparison of FPGA vs ASIC ..... 2.28
  2.8 Current Status ..... 2.28
  2.9 Conclusions ..... 2.28
3. Xilinx FPGAs ..... 3.1–3.43
  3.1 Introduction ..... 3.1
    3.1.1 Look-up Table (LUT) ..... 3.2
    3.1.2 Slice ..... 3.2
    3.1.3 Fast Carry Logic ..... 3.3
    3.1.4 Multiplier Unit ..... 3.5
    3.1.5 Shift Register ..... 3.5
  3.2 Configurable Logic Block (CLB) ..... 3.6
  3.3 Interconnect and Routing ..... 3.7
    3.3.1 Direct Connections ..... 3.9
    3.3.2 Single-Length Lines ..... 3.9
    3.3.3 Double-Length Lines ..... 3.10
    3.3.4 Long Lines ..... 3.10
    3.3.5 Global lines ..... 3.10
  3.4 IOB - Input/Output Block ..... 3.10
    3.4.1 Pull-up and Pull-down Resistors ..... 3.12
    3.4.2 Digitally Controlled Impedance ..... 3.12
  3.5 Further advances in FPGAs ..... 3.13
  3.6 Virtex Architecture ..... 3.13
    3.6.1 Main features of Virtex ..... 3.14
    3.6.2 Block RAM ..... 3.14
    3.6.3 Delay-Locked Loop (DLL) ..... 3.15


  3.7 Virtex-II ..... 3.18
    3.7.1 Digital Clock Manager (DCM) ..... 3.19
  3.8 Virtex-II Pro ..... 3.20
    3.8.1 Main Features of Virtex-II Pro ..... 3.20
    3.8.2 Rocket IO Transceiver Features ..... 3.21
    3.8.3 Processors in Xilinx FPGA ..... 3.21
      3.8.3.1 Hard Core Processor IBM POWER PC 405 ..... 3.21
      3.8.3.2 A Power PC based embedded system ..... 3.22
    3.8.4 Soft Core Processor MICROBLAZE ..... 3.23
      3.8.4.1 An Embedded System Using Microblaze and Power PC ..... 3.24
  3.9 Virtex-4 ..... 3.25
    3.9.1 Sub Families ..... 3.26
  3.10 Virtex-5 ..... 3.27
  3.11 Virtex-6 ..... 3.29
    3.11.1 Virtex-6 sub-families ..... 3.31
  3.12 Virtex-7 ..... 3.31
  3.13 Spartan Family ..... 3.34
    3.13.1 Spartan-3 ..... 3.34
    3.13.2 Spartan-6 ..... 3.35
  3.14 Boundary Scan ..... 3.40
  3.15 FPGA Design flow ..... 3.40
  3.16 FPGA vs ASIC ..... 3.41
  3.17 Conclusions ..... 3.43
4. Altera FPGAs ..... 4.1–4.23
  4.1 Introduction ..... 4.1
  4.2 New Application, New Techniques ..... 4.3
  4.3 Enter the Processors, and The Great Crash ..... 4.4
  4.4 Trends in IP ..... 4.6
  4.5 Stratix FPGA ..... 4.8
  4.6 Cyclone FPGAs ..... 4.9
  4.7 Arria FPGA ..... 4.15
  4.8 An Evolving Methodology ..... 4.17
  4.9 The CPU-Centric Phase ..... 4.18
  4.10 Core and Multicore ..... 4.19
  4.11 Consensus and Hardening ..... 4.21
  4.12 Conclusions ..... 4.23

5. Microsemi FPGAs
  5.1 Introduction
  5.2 Technologies
  5.3 The Antifuse/Flash Advantage over SRAM based FPGAs
    5.3.1 Lower-Total-Cost-of-Ownership
    5.3.2 Instant-on
    5.3.3 Low-Power
    5.3.4 Reliability
  5.4 FPGA Security
    5.4.1 Design Security
    5.4.2 Data Security
  5.5 Faster Time-to-Market
  5.6 Antifuse FPGAs of Microsemi
    5.6.1 SX-A/SX Architecture is shown in Fig. 5.3
    5.6.2 MX FPGAs
  5.7 ProASIC3 FPGA Overview
    5.7.1 The ProASIC3 series
  5.8 Third-Generation (ProASIC3) FPGA Architecture (Fig. 5.7)
  5.9 IGLOO Low Power FPGA Family
  5.10 IGLOO FPGA Family Overview
  5.11 Flash*Freeze Technology
  5.12 Small Footprint Packages
  5.13 SmartFusion2 SoC FPGA
  5.14 Reliability
  5.15 Highest Security Devices
  5.16 Design Security
  5.17 Data Security
  5.18 Low Power
  5.19 High-Performance FPGA Fabric
  5.20 Dual-Port Large SRAM (LSRAM)
  5.21 Three-Port Micro SRAM (uSRAM)
  5.22 Mathblocks for DSP Applications
  5.23 Microcontroller Subsystem (MSS)
    5.23.1 ARM Cortex-M3 Processor
    5.23.2 Cache Controller
    5.23.3 DDR Bridge
    5.23.4 AHB Bus Matrix (ABM)
    5.23.5 System Registers
    5.23.6 Fabric Interface Controller (FIC)
    5.23.7 Embedded SRAM (eSRAM)
    5.23.8 Embedded NVM (eNVM)
    5.23.9 DMA Engines
    5.23.10 APB Configuration Bus
    5.23.11 Peripherals
  5.24 Clock Sources: On-Chip Oscillators, PLLs, and CCCs
  5.25 High Speed Serial Interfaces
  5.26 High Speed Memory Interfaces: DDRx Memory Controllers
  5.27 MDDR Subsystem
  5.28 FDDR Subsystem
  5.29 IGLOO2 FPGAs
    5.29.1 High-Performance FPGA Fabric
    5.29.2 Dual-Port Large SRAM (LSRAM)
    5.29.3 Three-Port Micro SRAM (uSRAM)
    5.29.4 Mathblocks for DSP Applications
    5.29.5 High-Performance Memory Subsystem (HPMS)
    5.29.6 DDR Bridge
    5.29.7 AHB Bus Matrix (ABM)
    5.29.8 Fabric Interface Controller (FIC)
    5.29.9 Embedded SRAM (eSRAM)
    5.29.10 Embedded NVM (eNVM)
    5.29.11 DMA Engines
    5.29.12 APB Configuration Bus
    5.29.13 Peripherals
  5.30 Clock Sources: On-Chip Oscillators, PLLs, and CCCs
  5.31 High Speed Serial Interfaces
  5.32 High Speed Memory Interfaces: DDRx Memory Controllers
  5.33 MDDR Subsystem
  5.34 FDDR Subsystem
  5.35 Software Development Tools
    5.35.1 LiberoSoC
    5.35.2 Libero IDE
  5.36 Programming and Debugging
    5.36.1 FlashPro Hardware Programmer
    5.36.2 FlashPro5
    5.36.3 FlashPro4
    5.36.4 FlashProLite
  5.37 Evaluation and Development Kits
    5.37.1 SmartFusion2 Security Evaluation Kit


6. Hardware Description Languages (Verilog and VHDL)
  6.1 Introduction
  6.2 Verilog HDL
    6.2.1 Terminology and Basic Concepts of Verilog
  6.3 Examples of Hardware Modeling in Verilog
  6.4 VHDL
  6.5 Examples of Hardware Modeling in VHDL
  6.6 System Verilog
  6.7 Summary

7. FPGA Design Flow
  7.1 Introduction
  7.2 Architecture Specification
    7.2.1 Requirement Gathering
    7.2.2 Data Flow Specification
    7.2.3 Control Flow Specification
    7.2.4 Clocking and Reset Specification
    7.2.5 Performance Specification
  7.3 Behavioral Simulation
    7.3.1 Simulation using C based simulator
    7.3.2 Simulation using MATLAB
  7.4 RTL Coding
    7.4.1 RTL Coding Considerations
      7.4.1.1 Clock Considerations
      7.4.1.2 Reset Considerations
      7.4.1.3 RTL Design Considerations
    7.4.2 Use of pre-verified IP
    7.4.3 Use of High Level Synthesis
  7.5 Functional Simulation
    7.5.1 RTL Simulation with Directed Tests
    7.5.2 RTL Verification with Constraint Random Test
  7.6 Synthesis
    7.6.1 User Constraint Specification
    7.6.2 Synthesis Strategy
  7.7 Implementation
    7.7.1 Design Optimization
    7.7.2 Technology Mapping
    7.7.3 Place and Route
  7.8 Static Timing Analysis
    7.8.1 Definition of Setup and Hold Time
    7.8.2 Static Timing Report Analysis
    7.8.3 Handling Static Timing Violation in the Design
  7.9 Post Route Timing Simulation
    7.9.1 Running Post Route Timing Simulation
    7.9.2 Handling Post Route Timing Failures
  7.10 Power Estimation
    7.10.1 Static Power Analysis
    7.10.2 Dynamic Power Analysis
  7.11 Device Configuration
    7.11.1 Bitstream Generation
    7.11.2 Device Programming
  7.12 System Debugging
    7.12.1 FPGA Debug Interfaces
    7.12.2 Board Level Debugging
    7.12.3 RTL Design Debugging

8. Design Exercises
  8.1 Introduction
  8.2 Design Exercise 1: Design of a 4 bit ALU
    8.2.1 Tools: Modelsim 6.4, Xilinx ISE v14.2
    8.2.3 ALU Operations
    8.2.4 ALU Block Diagram
    8.2.5 ALU Operation Flow Chart
    8.2.6 Verilog Code
    8.2.7 Verilog Test Bench
    8.2.8 Simulation Results
    8.2.9 Synthesis Result
    8.2.10 Timing Summary
  8.3 Background of Round Robin Algorithm
    8.3.1 Background of Round Robin Algorithm
    8.3.2 Scope of Work
    8.3.3 Project Overview
      8.3.3.1 Abstract
    8.3.4 Round Robin Arbiter Specification
    8.3.5 Design of the Round Robin Arbiter Core
    8.3.6 Top Design
    8.3.7 Pin Description
    8.3.8 Top Design with Submodule
    8.3.9 Verilog Code
    8.3.10 Submodules
      8.3.10.1 Priority Logic
    8.3.11 Verilog Code
      8.3.11.1 Verilog Code Leaf_Level Priority Logic
      8.3.11.2 Verilog Code for Or_4
    8.3.12 Arbitration Edge Detector
    8.3.13 Verilog Code
    8.3.16 Feedback Logic
    8.3.15 Verilog Code
    8.3.16 Output Logic
    8.3.17 Verilog Code
    8.3.18 FSM Approach for the Round Robin Arbiter
    8.3.19 FSM Verilog Code
    8.3.20 Test Benches Codes for DUT and FSM
      8.3.20.1 Verilog Code Test Bench for Arbiter_FSM
      8.3.20.2 Verilog Code for DUT
    8.3.21 Simulation Result
    8.3.22 Implementation Results
    8.3.23 Device and Tool Details
    8.3.24 Process Properties
    8.3.25 Resource utilization Summary
    8.3.26 Applications
  8.4 Design Exercise 3: Design and Implementation of Booth Multiplier
    8.4.1 Introduction
    8.4.2 Booth’s Multiplication Algorithm
    8.4.3 Description
    8.4.4 Examples
    8.4.5 Design Details
      8.4.5.1 PIN Description Table
    8.4.6 BLOCK DIAGRAM: Block Diagram of Booths Multiplier is shown in Fig 8.26
    8.4.7 Implementation
    8.4.8 Verilog Code
    8.4.9 Test/Verification Plan
    8.4.10 Test Bench
    8.4.11 Simulation Results
    8.4.12 RTL Schematic
    8.4.13 Technology Schematic
    8.4.14 Device Utilization Summary
    8.4.15 Timing Summary
  8.5 Background
  8.6 Specifications of Data Partitioning System
  8.7 High Level Design (Macro-Architecture) of DAS
    8.7.1 Pooler Card Macro-Architecture
    8.7.2 Bridge Card Macro-Architecture (virtex-5 board)
    8.7.3 The Bridge Card Micro-architecture
      8.7.3.1 Aurora Receiver IPs (Aurora_receiver_inst1/inst2)
      8.7.3.2 Frame Buffers (Frame_buffer_l1_pi1-pi8 & Frame_buffer_l1_po1-po8, Frame_buffer_l2_pi1-pi8 & Frame_buffer_l2_po1-po8)
      8.7.3.3 Frame Write Controllers (Frame_wr_cntlr1 & Frame_wr_cntlr2)
      8.7.3.4 Frame Read/Write Controller (Frame_rdwr_cntlr)
      8.7.3.5 DDR2 Memory Controller
      8.7.3.6 Burst Fetcher (brst_fschr)
      8.7.3.7 Burst Fifo (brst_fifo)
      8.7.3.8 Burst Fifo Read Controllers (brst_frd_cntlr1-brst_frd_cntlr4)
    8.7.4 Test Bench Framework for Bridge Card
    8.7.5 Synthesis of Bridge Card
    8.7.6 Conclusions

9. Conclusions
  9.1 Having seen SRAM based, Anti-fuse and Flash FPGAs, a comparison of these FPGAs in the concluding chapter will be well in place
    9.1.1 SRAM based FPGAs
    9.1.2 Antifuse-Based FPGAs
    9.1.3 E2PROM/Flash-Based FPGAs
    9.1.4 Hybrid Flash-SRAM FPGAs
  9.2 FPGA Security
    9.2.1 Need for FPGA Security
    9.2.2 Security Methods
    9.2.3 Modern FPGA Security
  9.3 Future of FPGAs
    9.3.1 Advances in Microelectronics
    9.3.2 Architecture
    9.3.3 Design Tools

1 Boolean Algebra and Logic Design: An Overview

1.1 INTRODUCTION
All digital devices use the binary number system, which has two states, ‘1’ and ‘0’. Active electronic components such as transistors are used to represent these two states. In one state, say ‘1’, the transistor is not conducting at all and is OFF. To represent the other state, ‘0’, the transistor is fully conducting. These two stable states represent the binary digit, called a BIT. Boolean algebra, developed by George Boole in the 1840s, is the logical calculus of truth values and operates on elements that take one of two values, ‘1’ or ‘0’. Since this book deals with digital devices called programmable logic devices, with which Boolean functions are implemented, a brief overview of Boolean algebra is appropriate.

1.2  TRUTH TABLE
A truth table shows how a logic circuit’s output responds to various combinations of the inputs, using logic 1 for true and logic 0 for false. All combinations of the inputs are listed on the left, and the output of the circuit is listed on the right. The desired output can be achieved by a combination of logic gates. An example of a truth table for three input variables A, B, C is shown in Table 1.1. Since there are three binary variables, there are 2^3 = 8 input combinations. For certain combinations of the input variables A, B, C the output is 0, and for other combinations the output is 1.


Table 1.1: Truth Table for 3 variables

    Input       Output
A    B    C       Z
0    0    0       0
0    0    1       1
0    1    0       1
0    1    1       0
1    0    0       1
1    0    1       0
1    1    0       0
1    1    1       1
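As a rough illustration of how the rows of such a table are enumerated, the following Python sketch lists the 2^3 input combinations in counting order and pairs them with the output column. The names `input_combinations` and `Z_COLUMN` are illustrative and do not come from the text; the output values are copied from Table 1.1.

```python
from itertools import product

# Output column Z of Table 1.1, in row order 000, 001, ..., 111.
Z_COLUMN = [0, 1, 1, 0, 1, 0, 0, 1]

def input_combinations(n):
    """Return all 2**n combinations of n binary inputs in counting order."""
    return list(product((0, 1), repeat=n))

rows = input_combinations(3)

# Print each input combination (A, B, C) next to its output Z.
for (a, b, c), z in zip(rows, Z_COLUMN):
    print(a, b, c, "|", z)
```

With n input variables the same helper returns 2^n rows, which is why truth tables become impractical to write by hand beyond a few variables.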

A minterm is a special product of literals in which each input variable appears exactly once. For example, in the above truth table the minterm A’B’C gives an output of ‘1’. A’ is called the complement of A, so if A = 1, A’ will be 0, and if A = 0, A’ will be 1. A Boolean function can be expressed algebraically from a given truth table by forming a minterm, which is a product of literals, for each combination of the variables that produces a 1 in the function and then taking the OR (+) of all those terms. For Table 1.1, this Boolean function is:

Z = A’B’C + A’BC’ + AB’C’ + ABC

This Boolean function, formed from the minterms of the truth table, is called a SUM OF PRODUCTS (SOP) expression. The above Boolean function can also be written as:

Z = m1 + m2 + m4 + m7 or Z = Σ(m1, m2, m4, m7)

A maxterm is a sum of literals in which each input variable appears exactly once. Each maxterm is false for exactly one combination of inputs. A Boolean function can be expressed algebraically from a given truth table by forming a maxterm for each combination of the variables that produces a 0 in the function and then taking the AND (.) of all those terms. For Table 1.1, it can be seen that Z is 0 for rows 0, 3, 5 and 6. So

Z = M0· M3· M5· M6 = ( A + B + C) . ( A + B + C) . ( A + B + C) . ( A + B + C) This form also lends itself to a compact notation: using the Greek letter capital pi to denote a product, we write only the numbers of the maxterms included in:

Z = P(0,3,5,6)

The inverse of the function can be expressed as a product (AND) of its 1-maxterms. It can be noted that in this case Z’ will be

Z’ = (A + B + C’) · (A’ + B + C) · (A’ + B + C’)



Z’ = P(1,2,4,7)


A Boolean function formed out of the maxterms of the truth table is called a PRODUCT OF SUMS (POS). Boolean functions expressed as a sum of minterms or a product of maxterms are said to be in canonical form. Boolean functions in canonical form can be implemented using basic logic gates.
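The two canonical forms can be read off a truth table mechanically. The following is a minimal Python sketch, assuming the rows of Table 1.1 are numbered 0 to 7 with A as the most significant bit; the names `Z`, `minterm` and `maxterm` are illustrative and do not come from the text.

```python
from itertools import product

NAMES = "ABC"
Z = [0, 1, 1, 0, 1, 0, 0, 1]  # output column of Table 1.1, rows 0..7 (assumed A = MSB)

def minterm(bits):
    # Product term: a variable appears complemented where its bit is 0.
    return "".join(n if b else n + "'" for n, b in zip(NAMES, bits))

def maxterm(bits):
    # Sum term: a variable appears complemented where its bit is 1.
    return "(" + " + ".join(n + "'" if b else n for n, b in zip(NAMES, bits)) + ")"

rows = list(product((0, 1), repeat=3))
sop = " + ".join(minterm(r) for r, z in zip(rows, Z) if z == 1)
pos = "".join(maxterm(r) for r, z in zip(rows, Z) if z == 0)
minterm_numbers = [i for i, z in enumerate(Z) if z == 1]
maxterm_numbers = [i for i, z in enumerate(Z) if z == 0]

print("SOP:", sop)
print("POS:", pos)
```

The rows where Z is 1 give the minterm list and the rows where Z is 0 give the maxterm list, so the two lists always partition the row numbers between them.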

1.3  BASIC LOGIC GATES

1.3.1  AND Gate

The output is high only when both inputs A and B are high.

The AND operation is written AB or A * B; other common mathematical notations for it are A ∧ B and A ∩ B, the latter called the intersection of A and B.

1.3.2  OR Gate

The output is high when either or both of the inputs A and B are high.

The OR operation is written A + B; other common mathematical notations for it are A ∨ B and A ∪ B, the latter called the union of A and B.

1.3.3  NAND Gate

The output is high when either of inputs A or B is high, or if neither is high. In other words, it is normally high, going low only if both A and B are high.

1.3.4 NOR Gate

The output is high only when neither A nor B is high. That is, it is normally high, but any high input will take it low. The NOR gate and the NAND gate are said to be universal gates, since combinations of either of them can be used to accomplish any of the basic operations and can thus produce an inverter, an OR gate or an AND gate. The non-inverting gates do not have this versatility, since they cannot produce an inversion.

Out = (A + B)’

  A   B   Out
  0   0   1
  0   1   0
  1   0   0
  1   1   0
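The universality claim can be checked exhaustively. The sketch below is an illustration in Python with hypothetical function names: it builds NOT, AND and OR from NAND alone and from NOR alone, and compares each against the ordinary operation on all input combinations.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def nor(a, b):
    return 1 - (a | b)

# Basic operations built only from NAND gates.
def not_from_nand(a):
    return nand(a, a)

def and_from_nand(a, b):
    return not_from_nand(nand(a, b))

def or_from_nand(a, b):
    return nand(not_from_nand(a), not_from_nand(b))

# Basic operations built only from NOR gates.
def not_from_nor(a):
    return nor(a, a)

def or_from_nor(a, b):
    return not_from_nor(nor(a, b))

def and_from_nor(a, b):
    return nor(not_from_nor(a), not_from_nor(b))

universal = all(
    and_from_nand(a, b) == (a & b) == and_from_nor(a, b)
    and or_from_nand(a, b) == (a | b) == or_from_nor(a, b)
    and not_from_nand(a) == 1 - a == not_from_nor(a)
    for a, b in product((0, 1), repeat=2)
)
print("NAND and NOR reproduce NOT, AND, OR:", universal)
```

Tying both inputs of a NAND or NOR gate together is what produces the inverter; the other two operations follow from De Morgan’s theorem.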

1.3.5 Buffer

The buffer is a single-input device which has a gain of 1, mirroring the input at the output. It is useful for impedance matching and for isolation of the input and output, and it may provide more fan-out capacity.

  In   Out
  0    0
  1    1

1.3.6  Inverting Buffer

The inverting buffer is a single-input device which produces the state opposite to the input. If the input is high, the output is low, and vice versa.

  In   Out
  0    1
  1    0

This device is commonly referred to as just an inverter.


1.3.7 Exclusive OR Gate

The output is high when exactly one of the inputs A or B is high; it is low when both A and B are low or both are high. The XOR operation is written A ⊕ B.

  A   B   Out
  0   0   0
  0   1   1
  1   0   1
  1   1   0

1.3.8 Exclusive NOR Gate

The output is high when both inputs A and B are low or both are high, and low when exactly one of A or B is high. The XNOR operation is written (A ⊕ B)’.

  A   B   Out
  0   0   1
  0   1   0
  1   0   0
  1   1   1

1.3.9 Tri-state Gate

Another gate is the tri-state buffer, which is commonly used in bus-based systems. It has a data input x, an output z and a control input c, and it comes in two variants:

Tri-state buffer with active high control

Tri-state buffer with active low control

Apart from the input and output, this gate has a control signal, which can be active HIGH or active LOW. When the control signal is activated, the gate passes the input to the output, that is, Z = X. When the control signal is not activated, the gate does not pass the input signal to the output but places the output in a high-impedance state.

If Boolean functions in canonical form are implemented directly with logic gates, many gates may be required to realize the function. Since a logic gate is an electronic circuit, there is some delay between a change at the input and a reliable output. Reducing the number of gates reduces not only the cost of implementation but also the delay from inputs to outputs. Boolean functions can be minimized using various Boolean theorems. The minimized function yields the same result with fewer logic gates and less delay from input to output.

1.4  BOOLEAN FUNCTION MINIMIZATION

Minimization theorems with proofs, and also some minimization methods, are discussed in the following sections.

1.4.1 Theorems

Theorem 1 (commutative): a + b = b + a, ab = ba

Theorem 2 (distributive): a + bc = (a + b)(a + c), a(b + c) = ab + ac

Theorem 3 (identity): a + 0 = a, a · 1 = a

Theorem 4 (complement): a + a’ = 1, a · a’ = 0

Theorem 5 (Involution Law): For every element a in B, (a’)’ = a.

Proof: a is one complement of a’. The complement of a’ is unique. Thus a = (a’)’.

Theorem 6 (Absorption Law): For every pair a, b in B, a · (a + b) = a; a + a · b = a.

Proof:
a(a + b) = (a + 0)(a + b)      (P1)
         = a + ab + 0 · a + 0 · b   (P2)
         = a + ab + 0 + 0      (P3)
         = a + ab              (P4)
         = a(1 + b)            (P5)
         = a                   (P6)

Theorem 7: For every pair a, b in B,

a + a’ · b = a + b; a · (a’ + b) = a · b

Proof:
a + a’ · b = (a + a’) · (a + b)   (P1)
           = 1 · (a + b)          (P2)
           = a + b                (P3)

Theorem 8 (De Morgan’s Theorem): For every pair a, b in set B:

(a + b)’ = a’b’, and (ab)’ = a’ + b’.

Proof: We show that a + b and a’b’ are complementary. In other words, we show that both of the following are true (P1): (a + b) + (a’b’) = 1, (a + b)(a’b’) = 0.

De Morgan’s theorem can also be proved using truth tables.

  A   B   A + B   (A + B)’   A’B’
  0   0   0       1          1
  0   1   1       0          0
  1   0   1       0          0
  1   1   1       0          0

The above truth table proves the De Morgan theorem (A + B)’ = A’B’.

  A   B   (AB)’   A’ + B’
  0   0   1       1
  0   1   1       1
  1   0   1       1
  1   1   0       0

The above truth table proves the De Morgan theorem (AB)’ = A’ + B’.

Theorem 9. Boolean Transformation

Show that a’b’ + ab + a’b = a’ + b.

Proof 1:
a’b’ + ab + a’b = a’b’ + (a + a’)b   (P1)
                = a’b’ + b           (P2)
                = a’ + b

Proof 2:
a’b’ + ab + a’b = a’b’ + ab + a’b + a’b
                = a’b’ + a’b + ab + a’b      (P1)
                = a’(b’ + b) + (a + a’)b     (P2)
                = a’ · 1 + 1 · b             (P3)
                = a’ + b                     (P4)

Theorem 10. Boolean Transformation

(a’b’ + c)(a + b)(b’ + ac)’ = (a’b’ + c)(a + b)(b(ac)’)     (De Morgan’s)
                            = (a’b’ + c)(a + b)b(a’ + c’)   (De Morgan’s)
                            = (a’b’ + c)b(a’ + c’)          (Absorption)
                            = (a’b’b + bc)(a’ + c’)         (P1)
                            = (0 + bc)(a’ + c’)             (P2)
                            = bc(a’ + c’)                   (P3)
                            = a’bc + bcc’                   (P4)
                            = a’bc + 0                      (P5)
                            = a’bc                          (P6)
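Because each variable takes only the values 0 and 1, any of these identities can also be confirmed by trying every input combination. The following Python sketch does this for a few of the theorems; the helper names `NOT` and `holds` are hypothetical.

```python
from itertools import product

def NOT(x):
    return 1 - x

def holds(law, n_vars):
    # True if the identity is satisfied for every combination of 0/1 inputs.
    return all(law(*bits) for bits in product((0, 1), repeat=n_vars))

de_morgan_sum = holds(lambda a, b: NOT(a | b) == (NOT(a) & NOT(b)), 2)
de_morgan_product = holds(lambda a, b: NOT(a & b) == (NOT(a) | NOT(b)), 2)
absorption = holds(lambda a, b: (a & (a | b)) == a and (a | (a & b)) == a, 2)
theorem_7 = holds(lambda a, b: (a | (NOT(a) & b)) == (a | b), 2)
theorem_9 = holds(
    lambda a, b: ((NOT(a) & NOT(b)) | (a & b) | (NOT(a) & b)) == (NOT(a) | b), 2
)
theorem_10 = holds(
    lambda a, b, c: (
        ((NOT(a) & NOT(b)) | c) & (a | b) & NOT(NOT(b) | (a & c))
    ) == (NOT(a) & b & c),
    3,
)
print(de_morgan_sum, de_morgan_product, absorption, theorem_7, theorem_9, theorem_10)
```

This is the truth-table style of proof used above for De Morgan’s theorem, applied mechanically; it confirms an identity but does not show the algebraic steps.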

1.4.2  Karnaugh Map

The Karnaugh map, also known as the K-map, is a method of simplifying Boolean algebra expressions. The K-map was introduced by Maurice Karnaugh as an improvement over the Veitch diagram method suggested by Edward Veitch in 1952. The Karnaugh map reduces the need for extensive calculations by taking advantage of humans’ pattern-recognition capability. It also permits the rapid identification and elimination of potential race conditions.

The required Boolean results are transferred from a truth table onto a two-dimensional grid where the cells are ordered in Gray code. Each cell position represents one combination of input conditions, while each cell value represents the corresponding output value. Optimal groups of 1s or 0s are identified, which represent the terms of a canonical form of the logic in the original truth table. These terms can be used to write a minimal Boolean expression representing the required logic.

Karnaugh maps are used to simplify real-world logic requirements so that they can be implemented using a minimum number of physical logic gates. A Sum of Products (SOP) expression can always be implemented using AND gates feeding into an OR gate, and a Product of Sums (POS) expression using OR gates feeding an AND gate.

1.4.2.1  Boolean Algebra Expression Minimization by Karnaugh Map

If there are two input variables, there are four combinations, and the Karnaugh map is arranged as a 2 × 2 grid with two rows and two columns. For three input variables there are eight combinations, and the Karnaugh map grid is 2 × 4, with 2 rows and 4 columns (or 4 rows and 2 columns). For four input variables, because of the 16 combinations, the Karnaugh map is organized as a 4 × 4 grid with four rows and four columns, as shown in Fig. 1.1.

2 variables: rows A = 0, 1; columns B = 0, 1.

3 variables: rows A = 0, 1; columns BC = 00, 01, 11, 10.

4 variables: rows AB = 00, 01, 11, 10; columns CD = 00, 01, 11, 10.

Fig. 1.1 Karnaugh Maps for two, three and four variables.

The row and column values are shown across the top, and down the left side of the Karnaugh map and they are ordered in Gray Code rather than binary numerical order. Gray code ensures that only one variable changes between each pair of adjacent cells. Each cell of the completed Karnaugh map contains a binary digit representing the function’s output for that combination of inputs. After the Karnaugh map has been constructed it is used to find one of the simplest possible forms—a canonical form—for the information in the truth table. Adjacent 1s in the Karnaugh map represent opportunities to simplify the expression. The minterms (‘minimal terms’) for the final expression are found by encircling groups of 1s in the map. Minterm groups must be rectangular and must have an area that is a power of two (i.e., 1, 2, 4, 8…). Minterm rectangles should be as large as possible without containing any 0s. Groups may overlap in order to make each one larger. The grid is toroidally connected, which means that rectangular groups can wrap across the edges. Cells on the extreme right are actually ‘adjacent’ to those on the far left; similarly, so are those at the very top and those at the bottom.
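The Gray code ordering of the row and column labels can be generated rather than memorised. The sketch below uses the standard reflected Gray code formula i XOR (i >> 1); the function names are illustrative.

```python
def gray_code(n_bits):
    # Reflected Gray code: consecutive labels differ in exactly one bit.
    return [format(i ^ (i >> 1), "0{}b".format(n_bits)) for i in range(2 ** n_bits)]

def differ_in_one_bit(x, y):
    return sum(p != q for p, q in zip(x, y)) == 1

labels = gray_code(2)
# Include the pair (last, first) to check the wrap-around edge of the map.
adjacent_ok = all(
    differ_in_one_bit(labels[i], labels[(i + 1) % len(labels)])
    for i in range(len(labels))
)
print(labels, adjacent_ok)
```

The wrap-around check is what justifies treating the leftmost and rightmost columns (and the top and bottom rows) as adjacent.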

1.4.2.2 Minimization Rules

The Karnaugh map uses the following rules for the simplification of expressions by grouping together adjacent cells containing ones. Examples of correct grouping and wrong grouping are illustrated in Fig. 1.2.

● Groups may not include any cell containing a zero.

● Groups may be horizontal or vertical, but not diagonal.

● Groups must contain 1, 2, 4, 8, or in general 2^n cells. That is, if n = 1 a group will contain two 1’s since 2^1 = 2, and if n = 2 a group will contain four 1’s since 2^2 = 4. A group of 2 or a group of 4 is right; a group of 3 or a group of 5 is wrong.

● Each group should be as large as possible.

● Each cell containing a one must be in at least one group.

● Groups may overlap.

● Groups may wrap around the table. The leftmost cell in a row may be grouped with the rightmost cell, and the top cell in a column may be grouped with the bottom cell.

● There should be as few groups as possible, as long as this does not contradict any of the previous rules.

Fig. 1.2 Examples of correct grouping and wrong grouping in Karnaugh Maps.

1.4.2.3 Summary

1. No zeros allowed.

2. No diagonals.

3. Only power of 2 numbers of cells in each group.

4. Groups should be as large as possible.

5. Every one must be in at least one group.

6. Overlapping allowed.

7. Wrap around allowed.

8. Fewest number of groups possible.

Courtesy: http://www.ee.surrey.ac.uk/Projects/Labview/minimisation/karrules.html

1.4.2.4 Don’t Cares

Karnaugh maps also allow easy minimization of functions whose truth tables include ‘don’t care’ conditions. A ‘don’t care’ condition is a combination of inputs for which the designer doesn’t care what the output is. Therefore ‘don’t care’ conditions can either be included in or excluded from any circled group, whichever makes it larger. They are usually indicated on the map with a dash or X.

1.4.2.5  Design Example: Race Hazards

Karnaugh maps are useful for detecting and eliminating race hazards. Race hazards are very easy to spot using a Karnaugh map, because a race condition may exist when moving between any pair of adjacent, but disjoint, regions circled on the map.

1.4.2.6 Some Examples

Example 1.

F = x’yz’ + x’yz + xy’z’ + xy’z + xyz

Example 2. Use a K-map to simplify the following Boolean expression: F(a, b, c) = Σm(2, 3, 6, 7). Written out in minterms, the function is F = a’bc’ + a’bc + abc’ + abc.

Step 1: Plot the K-map

          AB
  C       00   01   11   10
  0       0    1    1    0
  1       0    1    1    0

Step 2: Circle the prime implicants

The four 1s in columns AB = 01 and AB = 11 form a single group of four. Within this group a and c both change, and only b keeps a constant value (b = 1).

Solution:

F(a, b, c) = b

Example 3. Use a K-map to simplify the following Boolean expression:

F(a, b, c, d) = Σm(0, 2, 3, 6, 8, 12, 13, 15)

Draw the K-map:

          CD
  AB      00   01   11   10
  00      1    0    1    1
  01      0    0    0    1
  11      1    1    1    0
  10      1    0    0    0

Solution.

F = a’b’d’ + a’b’c + a’cd’ + abd + ac’d’

The term a’cd’ is needed to cover minterm 6, which is adjacent only to minterm 2.

Now we show an example using don’t care terms.

Example 4. Use a K-map to simplify the following Boolean expression:

F(a, b, c, d) = Σm(0, 2, 6, 8, 12, 13, 15) + d(3, 4, 9)

where d = don’t care (i.e. either 1 or 0).

Draw the K-map:

          CD
  AB      00   01   11   10
  00      1    0    d    1
  01      d    0    0    1
  11      1    1    1    0
  10      1    d    0    0

Taking the don’t cares 4 and 9 as 1 and the don’t care 3 as 0 gives:

F = a’d’ + ac’ + abd

Example 5. Given the Boolean function Z = A’B’C’D’ + A’BC’D’ + ABC’D’ + AB’C’D’ + A’BC’D + ABC’D + A’B’CD’ + AB’CD’, minimize the function using a K-map.

Solution.

          CD
  AB      00   01   11   10
  00      1    0    0    1
  01      1    1    0    0
  11      1    1    0    0
  10      1    0    0    1

Z = BC’ + B’D’
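A K-map result can be checked by comparing the simplified expression with the original minterm list on every input, skipping the don’t care rows. The following Python sketch does this for Examples 2, 4 and 5; it assumes the first variable is the most significant bit of the minterm number and takes the don’t care set of Example 4 as d(3, 4, 9). The helper names are hypothetical.

```python
from itertools import product

def n(x):
    return 1 - x

def agrees(expr, ones, dont_cares=(), n_vars=4):
    """True if expr equals 1 exactly on the listed minterms, ignoring don't cares."""
    for index, bits in enumerate(product((0, 1), repeat=n_vars)):
        if index in dont_cares:
            continue  # either output value is acceptable here
        if bool(expr(*bits)) != (index in ones):
            return False
    return True

example_2 = agrees(lambda a, b, c: b, {2, 3, 6, 7}, n_vars=3)
example_4 = agrees(
    lambda a, b, c, d: (n(a) & n(d)) | (a & n(c)) | (a & b & d),
    {0, 2, 6, 8, 12, 13, 15},
    dont_cares={3, 4, 9},
)
example_5 = agrees(
    lambda a, b, c, d: (b & n(c)) | (n(b) & n(d)),
    {0, 2, 4, 5, 8, 10, 12, 13},
)
print(example_2, example_4, example_5)
```

A check of this kind confirms that an expression is correct, not that it is minimal; minimality still has to be argued from the map.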

1.4.3  Elimination of Race Hazards

As mentioned earlier, Karnaugh maps are useful for detecting and eliminating race hazards. Race hazards are very easy to spot using a Karnaugh map, because a race condition may exist when moving between any pair of adjacent, but disjoint, regions circled on the map. Consider the example:

f (A, B, C, D) = Σm (6, 8, 9, 10, 11, 12, 13, 14)

The Karnaugh map for this Boolean equation is shown in Fig. 1.3.

          AB
  CD      00   01   11   10
  00      0    0    1    1
  01      0    0    1    1
  11      0    0    0    1
  10      0    1    1    1

Fig. 1.3 Karnaugh Map for Race hazard example.

Grouping the 1s gives f = AC’ + AB’ + BCD’. In the figure these three groups are drawn in red, green and blue respectively.

● In the example, a potential race condition exists when C is 1 and D is 0, A is 1, and B changes from 1 to 0 (moving from the blue state to the green state in Fig. 1.3). For this case, the output is defined to remain unchanged at 1, but because this transition is not covered by a specific term in the equation, a potential for a glitch (a momentary transition of the output to 0) exists.

● There is a second potential glitch in the same example that is more difficult to spot: when D is 0 and A and B are both 1, with C changing from 1 to 0 (moving from the blue state to the red state). In this case the glitch wraps around from the top of the map to the bottom.

Whether these glitches will actually occur depends on the physical nature of the implementation, and whether we need to worry about them depends on the application. In this case, an additional term AD’ would eliminate the potential race hazard, because it covers both transitions described above.

[Figure: the K-map of Fig. 1.3 with the AD’ term added to avoid race hazards.]

The term is redundant in terms of the static logic of the system, but such redundant, or consensus, terms are often needed to assure race-free dynamic performance.

Courtesy: https://www.google.co.in/search?hl=enIN&source= hp&q=karnaugh+map+tutorial+ppt&g bv= 2&oq= karn&gs_l=heirloomhp.1.0.35i39j0i20l2j0l7.2937.6703.0.11297.4.4.0.0.0.0.610. 1485.22j1j0j1.4.0....0...1ac.1.34.heirloom-hp..0.4.1485.GwfNqIIdJig

Karnaugh maps can also be used for more than four variables.
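The hazard argument can be made mechanical: a static-1 hazard is possible wherever two adjacent cells both hold 1 but no single product term covers both. The Python sketch below assumes the cover f = AC’ + AB’ + BCD’ for the map of Fig. 1.3 (one valid grouping, chosen here for illustration) and writes the extra consensus term as AD’, that is, A AND NOT D. Terms are represented as dictionaries from variable position (0 = A to 3 = D) to required value; all names are hypothetical.

```python
from itertools import product

# Assumed cover of f = Σm(6, 8, 9, 10, 11, 12, 13, 14).
AC_ = {0: 1, 2: 0}          # A C'
AB_ = {0: 1, 1: 0}          # A B'
BCD_ = {1: 1, 2: 1, 3: 0}   # B C D'
AD_ = {0: 1, 3: 0}          # A D' (the added consensus term)

def covers(term, bits):
    return all(bits[i] == v for i, v in term.items())

def evaluate(terms, bits):
    return any(covers(t, bits) for t in terms)

def static1_hazards(terms, n_vars=4):
    """Adjacent input pairs that are both 1 but share no common product term."""
    found = []
    for bits in product((0, 1), repeat=n_vars):
        for i in range(n_vars):
            if bits[i] == 1:
                continue  # visit each adjacent pair once, from its 0 side
            other = bits[:i] + (1,) + bits[i + 1:]
            if evaluate(terms, bits) and evaluate(terms, other):
                if not any(covers(t, bits) and covers(t, other) for t in terms):
                    found.append((bits, other))
    return found

before = static1_hazards([AC_, AB_, BCD_])
after = static1_hazards([AC_, AB_, BCD_, AD_])
same_function = all(
    evaluate([AC_, AB_, BCD_], b) == evaluate([AC_, AB_, BCD_, AD_], b)
    for b in product((0, 1), repeat=4)
)
print(before, after, same_function)
```

Under these assumptions the two pairs reported for the three-term cover are the two transitions described in the bullets, and adding AD’ removes both without changing the function.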

1.4.4  Quine-McCluskey Algorithm

The Quine-McCluskey algorithm (or the method of prime implicants) is a method for the minimization of Boolean functions which was developed by W.V. Quine and Edward J. McCluskey in 1956. It is functionally identical to Karnaugh mapping, but the tabular form makes it more efficient for use in computer algorithms, and it also gives a deterministic way to check that the minimal form of a Boolean function has been reached. It is sometimes referred to as the tabulation method. The method involves two steps:

1. Find all prime implicants of the function.

2. Use those prime implicants in a prime implicant chart to find the essential prime implicants of the function, as well as other prime implicants that are necessary to cover the function.

Although more practical than Karnaugh mapping when dealing with more than four variables, the Quine-McCluskey algorithm also has a limited range of use since the problem it solves is NP-hard: the runtime of the Quine-McCluskey algorithm grows exponentially with the number of variables. It can be shown that for a function of n variables the upper bound on the number of prime implicants is 3^n/n. If n = 32 there may be over 6.5 × 10^15 prime implicants. Functions with a large number of variables have to be minimized with potentially non-optimal heuristic methods, of which the Espresso heuristic logic minimizer is the de facto standard.

1.4.4.1 Example

Step 1: finding prime implicants

Minimizing an arbitrary function:

f(A, B, C, D) = Σm(4, 8, 10, 11, 12, 15) + d(9, 14)

This expression says that the output function f will be 1 for the minterms 4, 8, 10, 11, 12 and 15 (denoted by the ‘m’ term). It also says that we don’t care about the output for combinations 9 and 14 (denoted by the ‘d’ term). In the table, ‘x’ stands for don’t care.

         A   B   C   D   f
  m0     0   0   0   0   0
  m1     0   0   0   1   0
  m2     0   0   1   0   0
  m3     0   0   1   1   0
  m4     0   1   0   0   1
  m5     0   1   0   1   0
  m6     0   1   1   0   0
  m7     0   1   1   1   0
  m8     1   0   0   0   1
  m9     1   0   0   1   x
  m10    1   0   1   0   1
  m11    1   0   1   1   1
  m12    1   1   0   0   1
  m13    1   1   0   1   0
  m14    1   1   1   0   x
  m15    1   1   1   1   1

One can easily form the canonical sum of products expression from this table, simply by summing the minterms (leaving out don’t-care terms) where the function evaluates to one:

f(A, B, C, D) = A’BC’D’ + AB’C’D’ + AB’CD’ + AB’CD + ABC’D’ + ABCD

which is not minimal. So to optimize, all minterms that evaluate to one are first placed in a minterm table. Don’t-care terms are also added into this table, so they can be combined with minterms:

  Number of 1s   Minterm   Binary Representation
  1              m4        0100
                 m8        1000
  2              m9        1001
                 m10       1010
                 m12       1100
  3              m11       1011
                 m14       1110
  4              m15       1111

At this point, one can start combining minterms with other minterms. If two terms differ in only a single digit, that digit can be replaced with a dash, indicating that the digit doesn’t matter. Terms that can’t be combined any more are marked with a ‘*’. When going from Size 2 to Size 4, treat ‘-’ as a third bit value: two terms can be combined only if their dashes are in the same positions and they differ in exactly one other digit. For example, -110 and -100 can be combined (giving -1-0), but -110 and 011- cannot. (Trick: match up the ‘-’ first.)

  Number of 1s   Minterm   0-Cube   Size 2 Implicants   Size 4 Implicants
  1              m4        0100     m(4,12)   -100*     m(8,9,10,11)    10--*
                 m8        1000     m(8,9)    100-      m(8,10,12,14)   1--0*
                                    m(8,10)   10-0
                                    m(8,12)   1-00
  2              m9        1001     m(9,11)   10-1      m(10,11,14,15)  1-1-*
                 m10       1010     m(10,11)  101-
                                    m(10,14)  1-10
                 m12       1100     m(12,14)  11-0
  3              m11       1011     m(11,15)  1-11
                 m14       1110     m(14,15)  111-
  4              m15       1111

Note: In this example, none of the terms in the Size 4 implicants column can be combined any further. In general, this process should be continued (size 8, etc.) until no more terms can be combined.

Step 2: prime implicant chart

None of the terms can be combined any further than this, so at this point we construct an essential prime implicant table. Along the side go the prime implicants that have just been generated, and along the top go the minterms specified earlier. The don’t care terms are not placed on top; they are omitted from this section because they are not necessary inputs.

                        4    8    10   11   12   15        A B C D
  m(4, 12)*             X                   X         ⇒    – 1 0 0
  m(8, 9, 10, 11)            X    X    X              ⇒    1 0 – –
  m(8, 10, 12, 14)           X    X         X         ⇒    1 – – 0
  m(10, 11, 14, 15)*              X    X         X    ⇒    1 – 1 –

To find the essential prime implicants, we run along the top row and look for columns with only one X. If a column has only one X, the minterm can be covered by only one prime implicant, and this prime implicant is essential. For example, in the first column, with minterm 4, there is only one X. This means that m(4, 12) is essential, so we place a star next to it. Minterm 15 also has only one X, so m(10, 11, 14, 15) is also essential. Now all columns with one X are covered. The second prime implicant can be ‘covered’ by the third and fourth, and the third prime implicant can be ‘covered’ by the second and first, so neither is essential.

If a prime implicant is essential then, as would be expected, it is necessary to include it in the minimized Boolean equation. In some cases, the essential prime implicants do not cover all minterms, in which case additional procedures for chart reduction can be employed. The simplest ‘additional procedure’ is trial and error, but a more systematic way is Petrick’s method. In the current example, the essential prime implicants do not handle all of the minterms (minterm 8 remains uncovered), so one can combine the essential implicants with one of the two non-essential ones to yield one of two equations:

f(A, B, C, D) = BC’D’ + AB’ + AC

f(A, B, C, D) = BC’D’ + AD’ + AC

Both of those final equations agree with the original, verbose equation on every input that is not a don’t care:

f(A, B, C, D) = A’BC’D’ + AB’C’D’ + AB’C’D + AB’CD’ + AB’CD + ABC’D’ + ABCD’ + ABCD

Courtesy: http://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm

As an exercise, the same Boolean equation

f(A, B, C, D) = Σm(4, 8, 10, 11, 12, 15) + d(9, 14)

is now minimized using the Karnaugh method. The Karnaugh map is shown in Fig. 1.4.

Fig. 1.4 Karnaugh Map for the Boolean function f(A, B, C, D) = Σm(4, 8, 10, 11, 12, 15) + d(9, 14).

F(A, B, C, D) = BC’D’ + AC + AB’

This is the same result as that obtained using the Quine-McCluskey method.
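The tabular procedure translates directly into a short program. The following Python sketch is one possible illustration, not a reference implementation: implicants are strings over ‘0’, ‘1’ and ‘-’, `prime_implicants` repeats the combining step until nothing more merges, and `essential` applies the single-X rule of the prime implicant chart. All function names are hypothetical.

```python
from itertools import combinations

def combine(x, y):
    """Merge two implicants that differ in exactly one non-dash position, else None."""
    diff = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
    if len(diff) == 1 and "-" not in (x[diff[0]], y[diff[0]]):
        i = diff[0]
        return x[:i] + "-" + x[i + 1:]
    return None

def prime_implicants(minterms, dont_cares, n_vars):
    # Don't cares take part in combining, exactly as in the minterm table.
    current = {format(m, "0{}b".format(n_vars)) for m in set(minterms) | set(dont_cares)}
    primes = set()
    while current:
        merged, used = set(), set()
        for x, y in combinations(sorted(current), 2):
            z = combine(x, y)
            if z is not None:
                merged.add(z)
                used.update((x, y))
        primes |= current - used  # terms that could not be combined are prime
        current = merged
    return primes

def covers(implicant, minterm, n_vars):
    bits = format(minterm, "0{}b".format(n_vars))
    return all(p in ("-", b) for p, b in zip(implicant, bits))

def essential(primes, minterms, n_vars):
    # A prime implicant is essential if it is the only cover of some minterm.
    result = set()
    for m in minterms:
        owners = [p for p in primes if covers(p, m, n_vars)]
        if len(owners) == 1:
            result.add(owners[0])
    return result

primes = prime_implicants([4, 8, 10, 11, 12, 15], [9, 14], 4)
essentials = essential(primes, [4, 8, 10, 11, 12, 15], 4)
print(sorted(primes), sorted(essentials))
```

For the example function this yields the four prime implicants of the chart, with -100 (BC’D’) and 1-1- (AC) essential. Choosing among the remaining implicants (here, covering minterm 8) is the part that needs trial and error or Petrick’s method and is not attempted in this sketch.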

1.4.4.2 Alternate Method

Now an alternate solution for the same equation is tried using the Quine-McCluskey method.

f(A, B, C, D) = Σm(4, 8, 10, 11, 12, 15) + d(9, 14).

Table 1 (minterms and don’t cares, grouped by number of 1s)

  0 1 0 0
  1 0 0 0
  ..........
  1 0 0 1
  1 0 1 0
  1 1 0 0
  ..........
  1 0 1 1
  1 1 1 0
  ..........
  1 1 1 1

Table 2 (terms combined in pairs)

  – 1 0 0    (4, 12)     P1
  1 0 0 –    (8, 9)
  1 0 – 0    (8, 10)
  1 – 0 0    (8, 12)
  ..........
  1 0 – 1    (9, 11)
  1 0 1 –    (10, 11)
  1 – 1 0    (10, 14)
  1 1 – 0    (12, 14)
  ..........
  1 – 1 1    (11, 15)
  1 1 1 –    (14, 15)

Table 3 (terms combined in fours)

  1 0 – –    (8, 9, 10, 11)      P2
  1 – – 0    (8, 10, 12, 14)     P3
  1 – 1 –    (10, 11, 14, 15)    P4

The PIs are P1 = BC’D’, P2 = AB’, P3 = AD’, P4 = AC.

PI chart (the don’t care 14 is shown as a column but need not be covered):

  PI                         MINTERMS
                             4    8    10   11   12   14   15
  P1 = – 1 0 0   BC’D’       ×                   ×
  P2 = 1 0 – –   AB’              ×    ×    ×
  P3 = 1 – – 0   AD’              ×    ×         ×    ×
  P4 = 1 – 1 –   AC                    ×    ×         ×    ×

AC is an essential PI since 15 is covered only by AC, and BC’D’ is an essential PI since 4 is covered only by BC’D’. Collect these two PIs and remove the rows and columns they cover (columns 10, 11, 14, 15 for AC and columns 4, 12 for BC’D’) to get the reduced table below.

REDUCED PI CHART

           8
  AB’      ×
  AD’      ×

To cover 8, either AB’ or AD’ can be used. One form of the function is f1 = BC’D’ + AC + AD’. This is an irreducible and minimal form.

Checking the answer:

  BC’D’ covers   4, 12
  AC covers      10, 11, 14, 15
  AD’ covers     8, 10, 12, 14

So f1 = BC’D’ + AC + AD’ = Σ(4, 8, 10, 11, 12, 14, 15), that is, every required minterm together with the don’t care 14.

The other form is f2 = BC’D’ + AC + AB’.

Both forms use the same number of terms and literals, so there are two minimal equations for the given Boolean function: one solution is BC’D’ + AC + AB’ and the other is BC’D’ + AC + AD’. If the grouping is done differently in the Karnaugh map, the second solution is obtained, as shown in Fig. 1.5.

Fig. 1.5 Karnaugh map for F(A, B, C, D) = BC’D’ + AC + AD’.

Comparison of the Quine-McCluskey Method with the Karnaugh Method. The Quine-McCluskey method is known as the tabular method and is a more systematic way of minimizing expressions with a larger number of variables. It therefore has an edge over the Karnaugh map method, which supports a maximum of about six variables. The Quine-McCluskey method is suitable for hand computation as well as for implementation as a software program.

1.4.5  Logic Circuits

There are two types of logic circuits: combinational and sequential.

1.4.5.1  Combinational Circuits

A combinational circuit consists of logic gates whose outputs at any time are determined only by the present combination of inputs. A combinational circuit performs an operation that can be specified logically by a set of Boolean functions. The classification of combinational circuits is shown in Fig. 1.6.

Fig. 1.6 Classification of combinational circuits.

1.4.5.2  Examples of Combinational Circuits

The Boolean expression from a truth table is Q = CB’A’. This expression could be implemented with two inverters and an AND gate. Fig. 1.7 shows how to implement the same Boolean function in a different way, using the Boolean expression Q = (A·B)’ · (A + B)’ · C.

Typical truth table:

  C   B   A   Q
  0   0   0   0
  0   0   1   0
  0   1   0   0
  0   1   1   0
  1   0   0   1
  1   0   1   0
  1   1   0   0
  1   1   1   0

Fig. 1.7 Implementation of simple Boolean function OUTPUT = CB’A’.

1.4.5.3 Half Adder

A half adder is a combinational logic circuit with two inputs and two outputs. It is designed to add two single-bit binary numbers A and B, and it is the basic building block for the addition of two single-bit numbers. The circuit has two outputs: sum (S) and carry (C).

Truth Table

  Inputs      Outputs
  A   B       S   C
  0   0       0   0
  0   1       1   0
  1   0       1   0
  1   1       0   1

Fig. 1.8 Half Adder

1.4.5.4 Full Adder

The full adder was developed to overcome the drawback of the half adder circuit, which has no carry input. It can add two one-bit numbers A and B together with a carry Cin which comes from the previous stage. The full adder is a three-input, two-output combinational circuit.

Truth Table

  Inputs          Outputs
  A   B   Cin     S   Co
  0   0   0       0   0
  0   0   1       1   0
  0   1   0       1   0
  0   1   1       0   1
  1   0   0       1   0
  1   0   1       0   1
  1   1   0       0   1
  1   1   1       1   1

Fig. 1.9 Full Adder.
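The two adder truth tables can be reproduced with a few lines of Python. This is a minimal sketch using the standard equations S = A XOR B and C = A AND B for the half adder, with the full adder built from two half adders and an OR gate; these equations match the tables above but the construction is an assumption, since the circuit diagrams are not reproduced here. The function `ripple_add` is a hypothetical extension showing why the carry input matters.

```python
def half_adder(a, b):
    # Returns (sum, carry).
    return a ^ b, a & b

def full_adder(a, b, cin):
    # Two half adders plus an OR gate; returns (sum, carry out).
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

def ripple_add(x, y, n_bits=4):
    # Chain full adders, feeding each carry out into the next stage.
    carry, total = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

print(full_adder(1, 1, 1), ripple_add(5, 6), ripple_add(9, 7))
```

Chaining the stages in this way is only possible because each full adder accepts the carry from the stage before it.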

1.4.5.5  2 to 4 Line Decoder

The block diagram of a 2 to 4 line decoder is shown in Fig. 1.10. A and B are the two inputs, and D0 through D3 are the four outputs. The truth table explains the operation of the decoder. It shows that each output is 1 for only one specific combination of inputs.

Truth Table

  Inputs      Outputs
  A   B       D0   D1   D2   D3
  0   0       1    0    0    0
  0   1       0    1    0    0
  1   0       0    0    1    0
  1   1       0    0    0    1

Fig. 1.10 2 to 4 Line Decoder

1.4.5.6 4 to 1 Multiplexer

The following 4-to-1 multiplexer is constructed from 3-state buffers and AND gates (the AND gates act as the decoder). Each data input I0 to I3 passes through a 3-state buffer that is enabled by one decoded combination of the two select inputs (00, 01, 10, 11), and the buffer outputs are tied together to form the output Out.

Fig. 1.11 4 to 1 Multiplexer.
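The relationship between the decoder and the multiplexer can be sketched in Python. The code below assumes that input A (or the first select line) is the most significant bit, so that AB = 10 activates D2; the function names are illustrative. The multiplexer uses the decoder outputs as buffer enables, and disabled buffers are modelled simply as not driving the output.

```python
def decoder_2to4(a, b):
    # Exactly one of D0..D3 is 1, chosen by the 2-bit input (assumed a = MSB).
    index = (a << 1) | b
    return [1 if i == index else 0 for i in range(4)]

def mux_4to1(inputs, s1, s0):
    # Each decoded line enables one 3-state buffer; the others are high impedance.
    enables = decoder_2to4(s1, s0)
    driven = [x for x, en in zip(inputs, enables) if en]
    return driven[0]  # the decoder guarantees exactly one driver

data = [0, 1, 1, 0]  # I0..I3
print(decoder_2to4(1, 0), [mux_4to1(data, s1, s0) for s1 in (0, 1) for s0 in (0, 1)])
```

Because the decoder is one-hot, only one buffer ever drives the shared output, which is what makes tying the buffer outputs together safe.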

1.4.5.7  BCD to Excess-3 Code Converter

  Decimal    BCD Input       Excess-3 Output
             A  B  C  D      W  X  Y  Z
  0          0  0  0  0      0  0  1  1
  1          0  0  0  1      0  1  0  0
  2          0  0  1  0      0  1  0  1
  3          0  0  1  1      0  1  1  0
  4          0  1  0  0      0  1  1  1
  5          0  1  0  1      1  0  0  0
  6          0  1  1  0      1  0  0  1
  7          0  1  1  1      1  0  1  0
  8          1  0  0  0      1  0  1  1
  9          1  0  0  1      1  1  0  0

From the truth table for the code converter, use K-maps to minimize the Boolean expressions. From the K-maps, we can reduce the expressions for each of the outputs to a minimal sum of products:

W = A + BC + BD = A + B(C + D)

X = B’C + B’D + BC’D’ = B’(C + D) + B(C + D)’

Y = CD + C’D’ = (C XOR D)’

Z = D’

Unlike in the case of minimisation for a single output, the alternative forms of the expressions, with further factorisation of the term (C + D), allow circuits to be shared between outputs in this case. The penalty, however, is increased delay, because there are more gates in series between the inputs and outputs.

Fig. 1.12 BCD to Excess-3 code converter.
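The output equations can be checked against the table by evaluating them for the ten valid BCD digits. The Python sketch below writes the complements out explicitly, taking X = B’(C + D) + B(C + D)’, Y = (C XOR D)’ and Z = D’; these are the forms that reproduce the Excess-3 column of the table, and the shared term (C + D) is computed once. The function name is hypothetical.

```python
def n(x):
    return 1 - x

def bcd_to_excess3(a, b, c, d):
    c_or_d = c | d                            # shared term (C + D)
    w = a | (b & c_or_d)                      # W = A + B(C + D)
    x = (n(b) & c_or_d) | (b & n(c_or_d))     # X = B'(C + D) + B(C + D)'
    y = n(c ^ d)                              # Y = (C XOR D)'
    z = n(d)                                  # Z = D'
    return w, x, y, z

for digit in range(10):
    bits = [(digit >> shift) & 1 for shift in (3, 2, 1, 0)]
    w, x, y, z = bcd_to_excess3(*bits)
    print(digit, bits, (w, x, y, z))
```

Each output word, read as a binary number, should equal the input digit plus three. Inputs 10 to 15 are not valid BCD digits and are treated as don’t cares in the minimization, so the sketch does not evaluate them.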

1.4.5.8  Programmable Logic Devices are Discussed in Chapter 2

1.4.5.9  An Example of a Combinational Circuit Design

Step 1: Define the problem. Design a 3-input majority detector: if two or more inputs are one, the output is high.

Step 2: Call the 3 inputs A, B, and C. Call the output F.

Step 3: Draw the function table

  A   B   C   F
  0   0   0   0
  0   0   1   0
  0   1   0   0
  0   1   1   1
  1   0   0   0
  1   0   1   1
  1   1   0   1
  1   1   1   1

Step 4: Simplify the expression

F = A’BC + AB’C + ABC’ + ABC

Since ABC + ABC = ABC, the term ABC can be repeated so that it pairs with each of the other three terms:

F = A’BC + ABC + ABC’ + ABC + AB’C + ABC

F = BC(A’ + A) + AB(C’ + C) + AC(B’ + B)

F = BC + AB + AC     (simplified SOP form)

The same result is obtained from the K-map:

          AB
  C       00   01   11   10
  0       0    0    1    0
  1       0    1    1    1

Step 5: Draw the logic diagram

1.4.6  Sequential Circuits

Sequential circuits employ storage elements in addition to logic gates. Their outputs are a function of the inputs and the state of the storage elements. Because the state of the storage elements is a function of previous inputs, the outputs of a sequential circuit depend not only on the present values of the inputs but also on past inputs, and the circuit behaviour must be specified by a time sequence of inputs and internal states. Sequential circuits are the building blocks of digital systems.

1.4.6.1 Examples

(a) R-S latch using NOR gates. R and S are the inputs, Q is the output and Q’ is the complement of Q.

Fig. 1.13(a) RS Latch (NOR)

(b) S’ and R’ latch using NAND gates.

Fig. 1.13(b) RS Latch (NAND)

(c) A synchronous SR latch (sometimes called a clocked SR flip-flop) can be made by adding a second level of NAND gates to the inverted SR latch (or a second level of AND gates to the direct SR latch). The extra NAND gates further invert the inputs, so the simple SR latch becomes a gated SR latch (and a simple SR latch would transform into a gated SR latch with inverted enable). With E high (enable true), the signals can pass through the input gates to the encapsulated latch; all signal combinations except for (0, 0) = hold then immediately reproduce on the (Q, Q’) output, i.e. the latch is transparent. With E low (enable false) the latch is closed (opaque) and remains in the state it was left in the last time E was high. The enable input is sometimes a clock signal, but more often a read or write strobe.

Fig. 1.14 A gated SR latch circuit diagram constructed from NOR gates.

(c) D Flip Flop

Fig. 1.15 A D-type transparent latch based on an SR NAND latch (inputs D and E, outputs Q and Q’).

Fig. 1.16 A positive-edge-triggered D flip-flop (inputs Clock and Data, outputs Q and Q’).

(d) Another important flip-flop is the J-K flip-flop. Its truth table is:

J | K | Q | Q’
0 | 0 | Q | Q’ (no change)
0 | 1 | 0 | 1
1 | 0 | 1 | 0
1 | 1 | Toggles (if Q was 1 it becomes 0, and if it was 0 it becomes 1)

Fig. 1.17 J-K Flip Flop.
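The J-K truth table is equivalent to the characteristic equation Q+ = J.Q’ + K’.Q. A minimal sketch that reproduces the four rows is given below; the function name jk_next is an assumption made for this illustration.

```python
def jk_next(q, j, k):
    """Next state of a J-K flip-flop after a clock: Q+ = J.Q' + K'.Q."""
    return (j & (1 - q)) | ((1 - k) & q)

# Print the next state for Q = 0 and Q = 1 under every J, K combination.
for j in (0, 1):
    for k in (0, 1):
        print("J =", j, "K =", k, "next Q for Q=0,1:", [jk_next(q, j, k) for q in (0, 1)])
```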

(e) Registers: An n-bit register has n flip-flops to store n bits of information. Registers have control signals such as CLEAR and LOAD. The CLEAR signal clears the register so that all its bits are zero. The LOAD control signal loads the input into the register. Each flip-flop has an input signal and an output signal. A 4-bit register is shown in Fig. 1.18.

Fig. 1.18 4 bit Register using D Flip-Flops with CLEAR and LOAD with Clock.

(f) Shift Registers: A shift register shifts its contents by one bit, either right or left depending on the CONTROL input, on each clock.

Fig. 1.19 Serial-in parallel-out shift Right Register.

Fig. 1.20 Parallel-in Serial out 4 bit Shift Right Register.

Fig. 1.21 Serial in Parallel out Shift Register.

Fig. 1.22 Register with right/left shift control.
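The shifting action can be sketched in a few lines. This is a behavioural illustration only; the list layout (leftmost element is the first stage) and the function names are assumptions made here, not part of the figures.

```python
def shift_right(bits, serial_in):
    """One clock of a shift-right register: the new bit enters at the left end."""
    return [serial_in] + bits[:-1]

def shift_left(bits, serial_in):
    """One clock of a shift-left register: the new bit enters at the right end."""
    return bits[1:] + [serial_in]

reg = [0, 0, 0, 0]
for b in [1, 0, 1, 1]:          # serial data, one bit per clock
    reg = shift_right(reg, b)
print(reg)                       # parallel outputs after four clocks
```

After four clocks all four serial bits are available in parallel, which is the serial-in parallel-out mode of operation.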

(g) Counters: There are asynchronous and synchronous counters. An asynchronous counter is shown in Fig. 1.23. Assume that the counter is initially cleared. A flip-flop changes state when its clock makes a transition from HIGH to LOW. The clock is applied to the first flip-flop, which is the least significant bit of the counter. The output of the first flip-flop is the clock of the second stage, the output of the second stage is the clock of the third stage, and so on. In the 4-bit asynchronous counter shown in Fig. 1.23 the leftmost flip-flop is the least significant bit of the counter and the last flip-flop is the most significant bit. When the clock to the first stage makes a transition from HIGH to LOW, this flip-flop toggles to 1 and the counter reads 1000. Since the output of the first flip-flop changed from LOW to HIGH, the second flip-flop is not affected. On the next HIGH to LOW transition of the clock the first flip-flop toggles again and goes to 0. Because its output changed from HIGH to LOW, this transition toggles the second flip-flop, which goes to 1, and the counter reads 0100. The count continues in this way up to 1111, and the next clock transition returns the counter to 0000. A 4-bit asynchronous counter therefore has sixteen states, from 0000 to 1111.

(h) Synchronous Counter: A 4-bit synchronous counter is shown in Fig. 1.24. In this counter the clock goes to all flip-flops, and all flip-flops change state in synchronism with the clock, depending on the status of their J and K inputs.

Fig. 1.23 4 bit Asynchronous Counter.

Fig. 1.24 4 bit Synchronous Counter.
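The ripple action of the asynchronous counter can be sketched as follows. The model is behavioural and ignores propagation delay; the list is written with the least significant stage first, matching the readings 1000, 0100, ... used in the description, and the function name ripple_clock is hypothetical.

```python
def ripple_clock(bits):
    """Apply one HIGH-to-LOW clock edge; bits[0] is the least significant stage."""
    bits = list(bits)
    for i in range(len(bits)):
        bits[i] ^= 1             # this stage sees a falling edge and toggles
        if bits[i] == 1:         # its output rose (LOW to HIGH): the next stage is not clocked
            break
    return bits

state = [0, 0, 0, 0]
readings = []
for _ in range(16):
    state = ripple_clock(state)
    readings.append("".join(map(str, state)))
print(readings)
```

In a real ripple counter each stage changes one flip-flop delay after the previous one, so intermediate readings appear briefly; the sketch shows only the settled values.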

1.4.7  Sequential Circuit Design

The main components of a sequential circuit are shown in Fig. 1.25. As mentioned earlier, the output of a sequential circuit depends not only on the present inputs to the combinational circuit but also on the previous states held in the storage elements.

Fig. 1.25 Main Components of a Sequential Circuit.

1.4.7.1  Sequential Circuit Design Example

Design a sequential circuit for a binary counter that goes through the following sequence of states, advancing on each clock:

000 → 011 → 101 → 111 → 110 → 000 ...

Draw a truth table listing the present state, the next state after the clock, and the required J and K inputs of the three flip-flops. This truth table is shown in Table 1.2 (d denotes a don't care; the states 001, 010 and 100 are not used).

Table 1.2: Truth Table for the counter.

Present state | Next state | JK flip-flop inputs
A B C | A B C | JA KA JB KB JC KC
0 0 0 | 0 1 1 | 0 d 1 d 1 d
0 0 1 | - - - | d d d d d d
0 1 0 | - - - | d d d d d d
0 1 1 | 1 0 1 | 1 d d 1 d 0
1 0 0 | - - - | d d d d d d
1 0 1 | 1 1 1 | d 0 1 d d 0
1 1 0 | 0 0 0 | d 1 d 1 0 d
1 1 1 | 1 1 0 | d 0 d 0 d 1

From the truth table, use Karnaugh maps to simplify the Boolean expressions for the J and K inputs of the three flip-flops. The resulting binary counter is shown in Fig. 1.26.

Fig. 1.26 Binary Counter with states 000 → 011 → 101 → 111 → 110 → 000 ...
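One way to check a solution of this design is to simulate it. The J and K expressions below are an assumption: they are one valid simplification read from Table 1.2 (JA = B, KA = C’, JB = 1, KB = A’ + C’, JC = A’, KC = A.B), and the circuit of Fig. 1.26 may use a different but equivalent choice of don't cares.

```python
def jk_next(q, j, k):
    """J-K flip-flop characteristic equation: Q+ = J.Q' + K'.Q."""
    return (j & (1 - q)) | ((1 - k) & q)

def step(a, b, c):
    """One clock of the counter, using J/K expressions derived from Table 1.2 (assumed)."""
    ja, ka = b, 1 - c
    jb, kb = 1, (1 - a) | (1 - c)
    jc, kc = 1 - a, a & b
    return jk_next(a, ja, ka), jk_next(b, jb, kb), jk_next(c, jc, kc)

state = (0, 0, 0)
seq = [state]
for _ in range(5):
    state = step(*state)
    seq.append(state)
print(" -> ".join("".join(map(str, s)) for s in seq))
```

The simulation only covers the five used states; the behaviour from the unused states 001, 010 and 100 depends on how the don't cares were assigned.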

1.4.8  State Machine

A state machine, in general, can be any device that stores at a given time the status of something and can operate on input to change the status and/or cause an action or output to take place for any given change. In other words, the state machine can have:

● An initial state or record of something stored some place



● A set of possible input events



● A set of new states that may result from the input



● A set of possible actions or output events that result from a new state

A finite state machine is one that has a limited or finite number of possible states. (An infinite state machine can be conceived but is not practical.) A finite state machine can be used both as a development tool for approaching and solving problems and as a formal way of describing the solution for later developers and system maintainers. There are a number of ways to show state machines, from simple tables through graphically animated illustrations.

Finite state machines in general have outputs in addition to the state variable. For example, vending machine controllers generate output signals to dispense product, provide change, illuminate displays, etc. Two types of finite state machines are the Mealy and Moore machines, which are shown in Figs. 1.27 and 1.28.

Fig. 1.27 Mealy Type Machine.

Characteristics of Mealy Machine

1. Current outputs are affected by the current state and the current inputs



2. Outputs are unstable until current inputs achieve steady state



3. More difficult to engineer because of the unstable outputs



4. Require less hardware than Moore circuits



5. Inputs can affect outputs in the current clock period


Fig. 1.28 Moore Type Machine.

Characteristics of Moore Machine

1. Current outputs are affected by the current state only



2. Current outputs are always stable since they depend only on the current state which is always stable



3. Easy to engineer since current outputs are always stable



4. Require more hardware than Mealy circuits



5. Inputs can affect outputs in the next clock period only

Finite State Machine Design Steps:

(a) Draw the state diagram and state transition table
(b) State minimization
(c) State assignment (state encoding)
(d) Minimize the next state logic
(e) Implement the design

1.4.8.1 An Example of a State Machine is the Digital Computer

1.4.8.2  Design Example 1

Problem. Design a circuit that can detect 3 or more consecutive 1’s in a bit string.

Solution: First draw the state diagram, as shown in Fig. 1.29. Then draw the state transition table, which is shown in Table 1.3.

Fig. 1.29 State Diagram

Table 1.3: Transition Table

Current Output | Reset | Current State | Input | Next State
0 | 1 | - | - | 00
0 | 0 | A | 0 | A
0 | 0 | A | 1 | B
0 | 0 | B | 0 | A
0 | 0 | B | 1 | C
0 | 0 | C | 0 | A
0 | 0 | C | 1 | D
1 | 0 | D | 0 | A
1 | 0 | D | 1 | D

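From the state transition table, perform state encoding (Table 1.4), with A = 00, B = 01, C = 10 and D = 11.

Table 1.4: State Encoding

Reset | Current State (MSB- LSB-) | Input (IN) | Next State (MSB+ LSB+) | Output
1 | - - | - | 0 0 | 0
0 | 0 0 | 0 | 0 0 | 0
0 | 0 0 | 1 | 0 1 | 0
0 | 0 1 | 0 | 0 0 | 0
0 | 0 1 | 1 | 1 0 | 0
0 | 1 0 | 0 | 0 0 | 0
0 | 1 0 | 1 | 1 1 | 0
0 | 1 1 | 0 | 0 0 | 1
0 | 1 1 | 1 | 1 1 | 1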

Write Boolean equations for MSB+, LSB+ and OUT+ in terms of the current state and the input:

MSB+ = MSB-’.LSB-.IN + MSB-.LSB-’.IN + MSB-.LSB-.IN

LSB+ = MSB-’.LSB-’.IN + MSB-.LSB-’.IN + MSB-.LSB-.IN

OUT+ = MSB-.LSB-.IN’ + MSB-.LSB-.IN

Minimize these Boolean equations using Karnaugh maps (CS is the current state, MSB- LSB-):

MSB+
IN \ CS | 00 | 01 | 11 | 10
0 | 0 | 0 | 0 | 0
1 | 0 | 1 | 1 | 1

LSB+
IN \ CS | 00 | 01 | 11 | 10
0 | 0 | 0 | 0 | 0
1 | 1 | 0 | 1 | 1

OUT+
IN \ CS | 00 | 01 | 11 | 10
0 | 0 | 0 | 1 | 0
1 | 0 | 0 | 1 | 0

Minimized equations are:

MSB+ = MSB-.IN + LSB-.IN

LSB+ = MSB-.IN + LSB-’.IN

OUT+ = MSB-.LSB-

These equations are implemented as shown in Fig. 1.30.

Fig. 1.30: Circuit that can detect 3 or more 1’s in a bit string.
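The detector can be exercised in software before building it. The sketch below assumes the next-state equations MSB+ = IN.(MSB- + LSB-) and LSB+ = IN.(MSB- + LSB-’), which agree with every row of Table 1.4, and reports the output MSB.LSB of the state reached after each input bit. The function names are hypothetical.

```python
def step(msb, lsb, inp):
    """Next state (MSB+, LSB+) from the current state and the input bit."""
    nmsb = (msb & inp) | (lsb & inp)
    nlsb = (msb & inp) | ((1 - lsb) & inp)
    return nmsb, nlsb

def detect(bits):
    """Feed a bit string through the FSM, starting from the reset state 00."""
    msb = lsb = 0
    out = []
    for b in bits:
        msb, lsb = step(msb, lsb, b)
        out.append(msb & lsb)        # output is 1 only in state 11 (state D)
    return out

print(detect([1, 1, 0, 1, 1, 1, 1, 0]))
```

The output goes to 1 on the third consecutive 1, stays at 1 while further 1’s arrive, and returns to 0 as soon as a 0 is received.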


1.4.8.3  Design Example 2

Design of an FSM: a modulo-3 two-bit up/down counter

Suppose we are given the following building blocks:

● Inverters, NAND2, NOR2 gates each with input capacitance 1 unit. Assume that the parasitic delay of the inverter is 1 τ unit.



● D-flip flops with input capacitance 2 units, Clock → Q delay of 4 τ units, set-up time of 2 τ units and hold-time of 2 τ units. The load on the D-flip flop can be at most 2 units.

We illustrate the design process of obtaining a logic circuit implementation of a Mealy FSM starting with an abstract specification.

1. The FSM specification

● The set of input symbols is Σ = {reset, up, down}.



● The set of output symbols is Σ = {Y, N}.



● The set of states is Σ = {A, B,C}.



● The initial state is A.

The next state function and output function are shown in Table 1.5.

Table 1.5: Next state function and output function

x(k) | q(k) | q(k + 1) | y(k)
Reset | any | A | N
Up | A | B | N
Down | A | C | N
Up | B | C | N
Down | B | A | N
Up | C | A | Y
Down | C | B | N

Note that the input symbol reset is used to put the machine in the initial state A. The specification can be visualized by the state transition graph (STG) shown in Figure 1.31.

Fig. 1.31 State transition graph of the up/down counter


The input capacitance of the FSM implementation (at each input) is to be at most 4 units (except for the clock, which can have arbitrary input capacitance). The output capacitance at each output is given to be 4 units.

2. Input encoding and combinational function implementation

First, we need to encode the input, output and state symbols. Let us use the binary encoding shown in Table 1.6 (d denotes a don't care).

Table 1.6: Binary Encoding

State | q1 | q0
A | 0 | 0
B | 0 | 1
C | 1 | 0

Input | r | x
Reset | 1 | d
Up | 0 | 1
Down | 0 | 0

Output | y
N | 0
Y | 1

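Thus, the output bit y has the following truth table (rows are rx, columns are q1q0; d is a don't care, since the state 11 is not used):

y
rx \ q1q0 | 00 | 01 | 11 | 10
00 | 0 | 0 | d | 0
01 | 0 | 0 | d | 1
11 | 0 | 0 | d | 0
10 | 0 | 0 | d | 0

which has the simplest formula y = q1.r′.x (r′ is the complement of r).

The next state variable nq1 has the truth table

nq1
rx \ q1q0 | 00 | 01 | 11 | 10
00 | 1 | 0 | d | 0
01 | 0 | 1 | d | 0
11 | 0 | 0 | d | 0
10 | 0 | 0 | d | 0

so that nq1 = r′.x′.q1′.q0′ + q0.r′.x.

The next state variable nq0 has the truth table

nq0
rx \ q1q0 | 00 | 01 | 11 | 10
00 | 0 | 0 | d | 1
01 | 1 | 0 | d | 0
11 | 0 | 0 | d | 0
10 | 0 | 0 | d | 0

so that nq0 = r′.x.q1′.q0′ + q1.r′.x′.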

To implement these three equations, we can use the following simplified set of formulas (we have introduced the intermediates u, v, w):

u = q1′.q0′
v = r′.x
w = r′.x′
y = v.q1
nq1 = w.u + v.q0
nq0 = v.u + w.q1

Implement these using our library of gates to get the logic network in Figure 1.32. In this logic network, even if we keep all gates at minimum size, we get a reasonable solution (because the total effort of each path is quite small).

Fig. 1.32 Logic Network.

3. The final timing analysis: Assuming that all gates are sized to be 1 unit, the hold time is not a problem (because the flip-flop delay is itself equal to the hold time).

● The maximum delay from a circuit input to a flip-flop input is 16.7 units.

● The maximum delay from launch flip-flop to capture flip-flop (from clock to nq) is 16 units.

● The maximum delay from clock to the output is 8.7 units.

● The maximum delay from circuit input to circuit output is 15.33 units.
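The simplified formulas can be checked against the specification of Table 1.5 by exhaustive evaluation. The sketch below assumes the encoding of Table 1.6 (A = 00, B = 01, C = 10; Up is r = 0, x = 1; Down is r = 0, x = 0); the dictionary and function names are hypothetical.

```python
ENC = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}          # state -> (q1, q0)
DEC = {code: name for name, code in ENC.items()}
INPUTS = {"Up": (0, 1), "Down": (0, 0)}                 # symbol -> (r, x)

def logic(q1, q0, r, x):
    """Evaluate the formulas with intermediates u, v, w; returns (nq1, nq0, y)."""
    u = (1 - q1) & (1 - q0)
    v = (1 - r) & x
    w = (1 - r) & (1 - x)
    y = v & q1
    nq1 = (w & u) | (v & q0)
    nq0 = (v & u) | (w & q1)
    return nq1, nq0, y

# Table 1.5: (input, present state) -> (next state, output)
SPEC = {("Up", "A"): ("B", "N"), ("Down", "A"): ("C", "N"),
        ("Up", "B"): ("C", "N"), ("Down", "B"): ("A", "N"),
        ("Up", "C"): ("A", "Y"), ("Down", "C"): ("B", "N")}

mismatches = []
for (sym, state), (nxt, out) in SPEC.items():
    nq1, nq0, y = logic(*ENC[state], *INPUTS[sym])
    if DEC.get((nq1, nq0)) != nxt or "NY"[y] != out:
        mismatches.append((sym, state))
print("mismatches:", mismatches)
```

With r = 1 both v and w are 0, so the next state is 00 (state A) and y is 0 whatever the present state, which is the reset row of Table 1.5.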

1.5 SUMMARY

This chapter presented an overview of Boolean algebra, minimization methods, logic gates, combinational and sequential circuits, and state machines, with some design examples. This material is required for designing systems using programmable logic devices, which are covered in the following chapters.

EXERCISES

Q.1. Using only NAND gates, implement AND, OR and INVERT functions.

Q.2. What is a universally complete logic gate? Give examples.

Q.3. Simplify the following functions using Karnaugh maps.
(a) f(A, B, C, D) = Σ m(0, 1, 3, 5, 6, 9, 11, 12, 13, 15)
(b) f(A, B, C, D) = Σ m(0, 2, 3, 4, 5, 7, 8, 9, 13, 15)
(c) f(A, B, C, D) = Π M(0, 3, 4, 5, 6, 7, 11, 13, 14, 15)
(d) f(A, B, C) = Σ m(0, 1, 2, 4, 5, 6)

Q.4. Simplify the following Boolean functions using the Quine-McCluskey method.
(a) f(A, B, C, D) = Σ m(4, 8, 10, 11, 12, 15) + d(9, 14)
(b) f(A, B, C, D) = Σ m(0, 1, 2, 5, 6, 7, 8, 9, 10, 14)
(c) f(A, B, C, D) = A′B′C′D′ + AB′C′D′ + A′BC′D + ABC′D + AB′C′D + A′BCD + A′B′CD′ + AB′CD′ + ABCD
(d) f(A, B, C, D) = Σ m(4, 5, 6, 8, 9, 10, 13) + d(0, 7, 15)

Q.5. For the R-S latch using NOR gates shown in Fig. 1.13(a), a pulse of width t appears at the S terminal so that S = 1 for time t. If the propagation delay time of each gate is tpd, show the output Q as a function of time if Q is initially 0 for (a) t > 2tpd (b) tpd < t < 2tpd (c) t < tpd

Q.6. Design a binary counter that goes through the states 000 → 001 → 011 → 111 → 110 → 000 for every occurrence of a clock pulse.

Q.7. What is the difference between a latch and a flip-flop, and where would you prefer to use a latch?

Q.8. A ripple counter is to operate at a maximum frequency of 10 MHz. If the propagation delay time of each flip-flop is 10 ns and the strobing time is 50 ns, how many stages can the counter have?

Q.9. Do you face any problem while decoding the contents of an asynchronous counter? If so, what is that problem? Do you see a similar problem while decoding the contents of a synchronous counter?

Q.10. Show how an SR flip-flop can be constructed using a D flip-flop and other logic gates.

Q.11. Design a vending machine that releases an item after receiving 15 cents. It has a single coin slot for dimes and nickels, and the machine does not give change.

Q.12. Design a traffic controller for a busy highway intersected by a little-used farm road. With no vehicle on the farm road, the green light remains on in the highway direction. If a vehicle is on the farm road, the highway light goes from green to yellow to red, allowing the farm road light to go green. The farm road light remains green only as long as vehicles are detected, but not longer than a set interval. When these conditions are met, the farm road light goes from green to yellow to red, allowing the highway light to go green. Even if vehicles are waiting on the farm road, the highway gets at least some interval of green light.

Q.13. Design an FSM (give the state table) that outputs Z = 1 for a binary input sequence that has an odd number of 1s in the input at any time t.
Example X = 00110011110010101101
Z = 001 0010100010001010

Q.14. Design an FSM that outputs Z = 1 when the input binary sequence X has exactly 3 consecutive 1’s.
Example X = 001111000011100101010001110001
Z = 00000000000010000000.000001000


2 Programmable Logic Devices (PLDs)

2.1 INTRODUCTION

Digital logic circuits were implemented using vacuum tubes, transistors, and small scale integrated (SSI) and medium scale integrated (MSI) circuits. A possible disadvantage of using these standard SSI/MSI ICs is that a logic design may require a large number of ICs, which results in a considerable amount of board space and a great deal of time and cost in inserting, soldering and testing. Unlike these logic circuits, which have a fixed function, a programmable logic device (PLD), introduced around 1978, is an electronic component for building reconfigurable digital circuits. A PLD has an undefined function at the time of manufacture and needs to be programmed for a specific logic function. The unprogrammed PLD has all the fuses in its programmable AND and/or OR array intact. Programming involves blowing fuses at appropriate places, based on the logic to be implemented. The advantages of using PLDs over standard SSI/MSI ICs are less board space, higher speed, lower power requirements, less costly assembly processes, higher reliability due to fewer solder joints, and the availability of design software. Fig. 2.1 shows the programmable device family. In this chapter, all these programmable devices are discussed in detail.

Fig. 2.1 Programmable Device Family.

2.2 SPLDS

This category comprises the Programmable Read Only Memory (PROM), Programmable Array Logic (PAL) and Programmable Logic Array (PLA). Figs. 2.2(a) and 2.2(b) show the configuration of these programmable devices. All these devices have an AND array and an OR array. In a PROM the OR array is programmable, in a PAL the AND array is programmable, and in a PLA both the AND and OR arrays are programmable:

(a) Programmable Read Only Memory (PROM): fixed AND array (decoder), programmable OR array
(b) Programmable Array Logic (PAL) device: programmable AND array, fixed OR array
(c) Programmable Logic Array (PLA) device: programmable AND array, programmable OR array

Fig. 2.2(a) Configuration of SPLDs.

Fig. 2.2(b) Circuit configuration of SPLDs.

These are one-time programmable (OTP) devices. The SPLDs shown in Fig. 2.2(b) can be used only for combinational logic. Using a PROM as a programmable logic device has a disadvantage. A combinational circuit may occasionally have don’t-care conditions. When implemented with a read-only memory, a don’t-care condition becomes an address input that will never occur. The words at the don’t-care addresses need not be programmed and may be left in their original state (all 0’s or all 1’s). The result is that not all the bit patterns available in the read-only memory are used, which may be considered a waste of the available component.
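The PROM view of combinational logic, a fixed decoder followed by a programmable OR array, amounts to storing one output word per input address. The sketch below illustrates this with a full adder; the function name program_prom and the choice of example are assumptions made for this illustration.

```python
def program_prom(n_inputs, functions):
    """Build the PROM contents: one stored word per address (input combination)."""
    words = []
    for addr in range(2 ** n_inputs):
        # The fixed AND array (decoder) selects exactly one word line per input combination.
        bits = [(addr >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        # The programmable OR array holds the value of every output for that word line.
        words.append(tuple(f(*bits) for f in functions))
    return words

total = lambda a, b, c: a ^ b ^ c                      # sum output of a full adder
carry = lambda a, b, c: (a & b) | (b & c) | (a & c)    # carry output

prom = program_prom(3, [total, carry])
print(prom[0b011], prom[0b111])    # read the words stored at addresses 011 and 111
```

Every one of the 2^n words is stored whether or not the corresponding input combination ever occurs, which is the waste mentioned above when a function has don’t-care conditions.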

There are also registered programmable logic devices. These devices have a D flip-flop at the output, and this output can be fed back as an input to the AND array. These configurations are shown in Fig. 2.3(a) and Fig. 2.3(b). Registered programmable devices can be used to implement sequential logic.

Fig. 2.3(a) Output Stage Cell of Registered PLA/PAL.

Fig. 2.3(b) Block Diagram of Combinational and Registered PAL.

2.2.1 A Commercial PAL

A commercial PAL, the 22V10, is shown in Fig. 2.4. Programming this chip is explained later.

Fig. 2.4 PAL LV22V10

2.3 COMPLEX PROGRAMMABLE LOGIC DEVICES (CPLD)

PLAs and PALs are useful for implementing a wide variety of small digital circuits. Each device can be used to implement circuits that do not require more than the number of inputs, product terms, and outputs provided in the particular chip. These chips are limited to fairly modest sizes, typically supporting a combined number of inputs plus outputs of not more than 32. To implement circuits that require more inputs and outputs, either multiple PLAs or PALs can be employed, or a more sophisticated type of chip, called a complex programmable logic device (CPLD), can be used. A CPLD comprises multiple circuit blocks on a single chip, with internal wiring resources to connect the circuit blocks. Each circuit block is similar to a PLA or a PAL; we will refer to the circuit blocks as PAL-like blocks. Macrocells are functional blocks that perform combinatorial or sequential logic, and also have the added flexibility of true or complement outputs along with varied feedback paths. An example of a CPLD is given in Figs. 2.5(a) and 2.5(b). It includes four PAL-like blocks that are connected to a set of interconnection wires. Each PAL-like block is also connected to a sub-circuit labelled I/O block, which is attached to a number of the chip’s input and output pins.

Fig. 2.5(a) CPLD architecture.

Fig. 2.5(b) CPLD Block Structure.

CPLDs perform a variety of useful functions in system design owing to their unique capabilities.

Significant characteristics for the CPLD-architecture :

● product terms generated in programmable macrocells



● typically one dedicated flip-flop per macrocell




● many macrocells per logic-block



● typically all logic-blocks identical



● minimum two logic-blocks per device



● routing between logic-blocks via global switch matrix

2.3.1 Comparison of SPLD and CPLD

A brief comparison of SPLD and CPLD is shown in Table 2.1.

Table 2.1 Comparison of SPLD and CPLD.

Criteria | SPLD | CPLD
Propagation Delay | High Speed - Typically 3.5 ns | High Speed - Typically 10 ns
Density | 100 to Several Hundred | Several Hundred to Several Thousand (25,000 Gates)
Technology | Bipolar and CMOS | CMOS
Ease of Designing | Very Easy | Medium Ease
Complexity | Simple Architecture | Medium to Difficult Architecture
Frequency | 200 MHz | 100 MHz
Programmable I/Os | Yes | Yes

2.3.2  Some Commercial PLDs

Some commercial PLDs are listed in Table 2.2.

Table 2.2: PAL and PLD Commercial Product Examples

Company | Product Family | Basic Cell Type | Programming Technology | Fabrication Technology
Advanced Micro Devices | CMOS PALs | Macrocell | EEPROM | CMOS
Advanced Micro Devices | Bipolar PALs | Macrocell | One-Time Programmable | CMOS
Altera | FLEXlogic | sum-of-products macrocell or 128 x 10 RAM | SRAM and Flash EPROM | CMOS
Altera | Classic | sum-of-products macrocell | UV-EPROM | CMOS (0.65 µm, MAX 9000)
Atmel | V750 | 22V10 sum-of-products style | UV-EPROM | 0.65 µm CMOS
Atmel | 22V10 / 16V8 / 20V8 | sum-of-products | Flash EEPROM | 0.65 µm CMOS
Cypress Semiconductor | PAL22V10 family | EPROM PLD | Flash EEPROM, EPROM | CMOS, BiCMOS
Cypress Semiconductor | PALLE16V8 | PLD | Flash EEPROM | CMOS
ICT | PEEL | SOP macrocell | EEPROM | CMOS
Lattice Semiconductor | ispGAL and GAL families | Programmable AND - Fixed OR w/ regs. | EEPROM (isp: in-circuit prog.) | CMOS
Philips Semiconductor | GAL family | macrocell | one-time prog., and reprogrammable | Bipolar, BiCMOS EPROM, EEPROM
Philips Semiconductor | PAL family | combinational & registered outputs | one-time programmable | Bipolar
Philips Semiconductor | Prog. Logic Arrays (PLAs) | bidirectional I/O or combinational outputs | one-time programmable | Bipolar
Philips Semiconductor | Prog. Macro Logic and Sequencers | JK, D registered and combinational I/O | one-time prog. and UV-EPROM | Bipolar and EPROM CMOS
Texas Instruments | PLDs and Field Prog. Sequencers | PLD | TiW fuses | Bipolar
Xilinx | XC7200A EPLD | 21-input, 9-output sum-of-products macrocell w/ ALU | EPROM | Submicron CMOS
Xilinx | XC7300 EPLD | 9 macrocell blocks | EPROM | Submicron CMOS

2.3.3  Design Methodologies for PLDs

Though some engineers programmed PAL devices by manually editing files containing the binary fuse pattern data, most designers opted to design their logic using a hardware description language (HDL) such as Data I/O’s ABEL, Logical Devices’ CUPL, or MMI’s PALASM. These were computer-assisted design (CAD) programs (now referred to as “design automation”) which translated (or “compiled”) the designer’s logic equations into binary fuse map files used to program (and often test) each device.

2.3.3.1  Steps involved in manual programming

1. Prepare the truth table



2. Write the Boolean expression in SOP (sum of products) form.



3. Obtain the minimum SOP form to reduce the number of product terms to a minimum.



4. Decide the input connection of the AND matrix for generating the required product term.



5. Then decide the input connections of OR matrix to generate the sum terms.



6. Decide the connections of invert matrix.



7. Program the PLA.

Example 2.1. Implement the following Boolean functions using a combinational PAL :

W = A′B′CD′ + ABC′D′ + ABC′D

X = A′BCD + AB′C′D′ + AB′C′D + AB′CD′ + AB′CD + ABC′D′ + ABC′D + ABCD′ + ABCD

Y = A′B′C′D′ + A′B′CD′ + A′B′CD + A′BC′D′ + A′BC′D + A′BCD′ + A′BCD + AB′C′D′ + AB′CD′ + AB′CD + ABCD

Z = A′B′C′D + A′B′CD′ + AB′C′D′ + ABC′D′ + ABC′D

Simplifying the 4 functions to a minimum number of terms using Karnaugh map results in the following Boolean functions:

W = ABC′ + A′B′CD′



X = A + BCD



Y = A′B + CD + B′D′



Z = ABC′ + A′B′CD′ + AC′D′ + A′B′C′D

Z can be rewritten as

Z = W + AC′D′ + A′B′C′D

The programmed PAL is shown in Fig. 2.6


Fig. 2.6 Combinational PAL implementing the Boolean equations of example 2.1.
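The simplification in Example 2.1 can be confirmed by comparing the simplified expressions with the original minterm lists over all 16 input combinations. The sketch below writes each original product term as a 4-bit string ABCD; the helper names are hypothetical.

```python
from itertools import product

def minterms(terms):
    """Convert 4-bit strings 'ABCD' into a set of minterm numbers."""
    return {int(t, 2) for t in terms}

ORIGINAL = {
    "W": minterms(["0010", "1100", "1101"]),
    "X": minterms(["0111", "1000", "1001", "1010", "1011",
                   "1100", "1101", "1110", "1111"]),
    "Y": minterms(["0000", "0010", "0011", "0100", "0101", "0110",
                   "0111", "1000", "1010", "1011", "1111"]),
    "Z": minterms(["0001", "0010", "1000", "1100", "1101"]),
}

def w(a, b, c, d):
    return (a & b & (1 - c)) | ((1 - a) & (1 - b) & c & (1 - d))

SIMPLIFIED = {
    "W": w,
    "X": lambda a, b, c, d: a | (b & c & d),
    "Y": lambda a, b, c, d: ((1 - a) & b) | (c & d) | ((1 - b) & (1 - d)),
    # Z reuses the W output, as in Z = W + AC'D' + A'B'C'D
    "Z": lambda a, b, c, d: w(a, b, c, d) | (a & (1 - c) & (1 - d))
                            | ((1 - a) & (1 - b) & (1 - c) & d),
}

mismatches = []
for name, f in SIMPLIFIED.items():
    ones = {8 * a + 4 * b + 2 * c + d
            for a, b, c, d in product((0, 1), repeat=4) if f(a, b, c, d)}
    if ones != ORIGINAL[name]:
        mismatches.append(name)
print("functions that differ:", mismatches)
```

Writing Z in terms of W mirrors the way a PAL output can be fed back into the AND array to save product terms.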

Example 2.2. Design a 4-bit counter with synchronous clear using the PAL16R4.

PAL16R4 chip pin details are shown in Fig. 2.7(a), its internal details in Fig. 2.7(b), and the PALASM design of a 4-bit counter in Fig. 2.7(c).

PAL16R4 pinout:

Pin | Signal
1 | CLK
10 | GND
11 | OE
12, 13 | I/O
14 to 17 | Q
18, 19 | I/O
20 | VCC

Fig. 2.7(a) Pin Details of PAL16R4.

PAL16R4 internal details are shown in Fig. 2.7(b).

Fig. 2.7(b) PAL16R4 Internal details.

PAL16R4
PAL DESIGN SPECIFICATION
CNT4SC
4 bit counter with synchronous clear
Michael Holley and Dave Pellerin

Clk Clear NC NC NC NC NC NC NC GND
OE NC NC /Q3 /Q2 /Q1 /Q0 NC NC VCC

Q3 := Clear
    + /Q3 * /Q2 * /Q1 * /Q0
    + Q3 * Q0
    + Q3 * Q1

Q2 := Clear
    + /Q2 * /Q1 * /Q0
    + Q2 * Q0
    + Q2 * Q1

Q1 := Clear
    + /Q1 * /Q0
    + Q1 * Q0

Q0 := Clear
    + /Q0

Function Table

OE  Clear  Clk  /Q0  /Q1  /Q2  /Q3
L   H      C    L    L    L    L
L   L      C    H    L    L    L
L   L      C    L    H    L    L
L   L      C    H    H    L    L
L   L      C    L    L    H    L
L   H      C    L    L    L    L

Fig 2.7(c) PALASM design of a 4-bit counter.

The PALASM (from “PAL assembler”) language was used to express Boolean equations for the output pins in a text file, which was then converted to the ‘fuse map’ file for the programming system using a device programmer. Courtesy: http://en.wikipedia.org/wiki/Programmable_Array_Logic

2.3.3.2  Steps Involved in Designing with SPLDs Using CAD Tools

The steps involved in designing with SPLDs using CAD tools are shown in Fig. 2.8.

Fig. 2.8: CAD Design Flow for SPLDs.

Design Example 2.3: Design a binary to Gray code converter that accepts a 4-bit binary input (B3 B2 B1 B0, where B3 is the MSB) and produces the corresponding 4-bit Gray code (G3 G2 G1 G0, where G3 is the MSB) at its output. Use the PALCE16V8. The pin assignment for the PAL is shown in Fig. 2.9.

Pin | Signal
1 | BIN0
2 | BIN1
3 | BIN2
4 | BIN3
10 | GND
12 | G0
13 | G1
14 | G2
15 | G3
20 | VCC

Fig. 2.9 PALCE16V8

Pins 1 to 4 are input pins and pins 12 to 15 are output pins. A logic ‘1’ output is represented by zero volts at the output pin. The truth table for binary to Gray code conversion is shown in Table 2.3.

Table 2.3: Truth table for Binary to Gray code conversion

Binary | Gray Code
BIN3 BIN2 BIN1 BIN0 | G3 G2 G1 G0
0 0 0 0 | 0 0 0 0
0 0 0 1 | 0 0 0 1
0 0 1 0 | 0 0 1 1
0 0 1 1 | 0 0 1 0
0 1 0 0 | 0 1 1 0
0 1 0 1 | 0 1 1 1
0 1 1 0 | 0 1 0 1
0 1 1 1 | 0 1 0 0
1 0 0 0 | 1 1 0 0
1 0 0 1 | 1 1 0 1
1 0 1 0 | 1 1 1 1
1 0 1 1 | 1 1 1 0
1 1 0 0 | 1 0 1 0
1 1 0 1 | 1 0 1 1
1 1 1 0 | 1 0 0 1
1 1 1 1 | 1 0 0 0

Title     Binary To Gray Code Converter
Pattern   Bingra.pds
Revision  1.0
Author    Rao
Company   IITB
Date      03/03/14

Chip _Decoder PALCE16V8

; Pin Declarations
Pin 1   Bin0  Combinatorial ; Input
Pin 2   Bin1  Combinatorial ; Input
Pin 3   Bin2  Combinatorial ; Input
Pin 4   Bin3  Combinatorial ; Input
Pin 10  Gnd
Pin 12  G0    Combinatorial ; Output
Pin 13  G1    Combinatorial ; Output
Pin 14  G2    Combinatorial ; Output
Pin 15  G3    Combinatorial ; Output
Pin 20  Vcc

Equations

From the truth table, Boolean equations for G3, G2, G1 and G0 are simplified and listed below:

G3 = B3



G2 = B3∗.B2 + B3.B2∗ where B∗ is complement of B



G1 = B2.B1∗ + B2∗.B1



G0 = B1∗.B0 + B1.B0∗
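Each of the equations for G2, G1 and G0 is an exclusive-OR of two adjacent binary bits, so the whole conversion is the same as n XOR (n shifted right by one). The sketch below checks the four equations against that identity for all 16 inputs; the function name is hypothetical.

```python
def gray_from_equations(b3, b2, b1, b0):
    """Evaluate the four Gray code equations on 0/1 inputs."""
    g3 = b3
    g2 = ((1 - b3) & b2) | (b3 & (1 - b2))
    g1 = ((1 - b2) & b1) | (b2 & (1 - b1))
    g0 = ((1 - b1) & b0) | (b1 & (1 - b0))
    return g3, g2, g1, g0

table = {}
for n in range(16):
    bits = [(n >> i) & 1 for i in (3, 2, 1, 0)]       # B3, B2, B1, B0
    g3, g2, g1, g0 = gray_from_equations(*bits)
    table[n] = (g3 << 3) | (g2 << 2) | (g1 << 1) | g0

for n in range(16):
    print(format(n, "04b"), format(table[n], "04b"))
```

The printed pairs can be compared row by row with Table 2.3.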

SIMULATION

The following are the steps for designing using PALASM.

Write a PDS file and run PALASM:

● Turn on the computer and wait until Windows starts.

● Double click the PALASM icon.

● Select FILE and then CHANGE DIRECTORY

● Type c:\microp

● Select Begin New Design. Go to the New File Name field and type a name of up to 8 characters. Press F10. Don’t forget the filename.

Compile the PDS file and verify your design.

● Once the PDS file is completed, return to the main menu by pressing ALT-X and then .



● Compile the file by selecting Run / Compilation and then press F10. If you find ERROR count: 0 WARNING count: 0 at the bottom of the screen, your compilation is successful; go to the next step.



● If errors occur or warning count is not zero, press ESC to leave this window. Select EDIT / Text File to edit the PDS file again and correct errors. Re-compile after any modification.



● Run the simulation by selecting SIMULATION to verify the design. Press N for not using an auxiliary simulation file, and then press F10. If there are no errors or warnings, go to the next step. Otherwise, edit the PDS file again.

The simulation result can be examined by using View / Simulation Data / Trace in the main menu. A series of input and output signals appeared on the screen. ‘H’ means ‘1’ and ‘L’ means ‘0’. Check the signals with the truth table. Save your final compiled file and program it into a PAL chip by following the procedure given. Programming PAL and Testing the Design Copy the JED file to floppy disk by following the steps below:

● At PALASM, select FILE and then Go To System.



● Type copy filename.JED a:\ , where filename is the name of your design file exit

Go to the PLD writer. Follow the procedure for programming a PLD listed next to the programmer. Test the programmed chip using the experiment board according to the truth table of the decoder. If any error occurs, go back to the PDS file and locate any possible mistakes.

Courtesy: http://kcchu888.tripod.com/Lab2_PLD.pdf

2.4  FIELD PROGRAMMABLE GATE ARRAY (FPGA)

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing; hence it is field programmable. FPGAs are manufactured by several companies, as shown in Fig. 2.10. Among the top FPGA companies profiled for 2013, Altera and Xilinx continue to dominate the market for general-purpose programmable logic. These two companies held approximately 90% of the market (Xilinx 47%, Altera 41%) in 2012, with combined revenues in excess of $4.5B and a market capitalization of over $20B. Worldwide PLD/FPGA revenue in US dollars is given in Table 2.4.

[Figure: pie chart. Legend: Xilinx, Altera, Actel, Vantis, Lattice, Lucent, QuickLogic, Cypress. Segment labels: 36%, 31%, 10%, 7%, 7%, 6%, 2%, 2%.]

Fig. 2.10 Market Share of Various FPGA Manufacturers (Year 2010).

Table 2.4: Worldwide PLD/FPGA revenue (US$ millions)

Rank   FPGA Companies           CY2013 Revenue   CY2013 Percent
       Total Market             $4,542
1      Xilinx                   2,297            50.6%
2      Altera                   1,703            37.5%
3      Lattice Semiconductor    333              7.3%
4      Microsemi                195              4.3%
5      Atmel                    6                0.1%
6      QuickLogic               3                0.1%
7      Cypress Semiconductor    1                0.0%
8      Other Companies          4                0.1%

Calendar years, pre-split data. Source: iSuppli
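The percentage column of Table 2.4 follows directly from the revenue column. As a small illustration, the Python sketch below recomputes each share from the revenue figures quoted in the table; the dictionary and function names are assumptions made for this example only.

```python
# CY2013 revenue in millions of US dollars, copied from Table 2.4.
REVENUE_MUSD = {
    "Xilinx": 2297,
    "Altera": 1703,
    "Lattice Semiconductor": 333,
    "Microsemi": 195,
    "Atmel": 6,
    "QuickLogic": 3,
    "Cypress Semiconductor": 1,
    "Other Companies": 4,
}


def market_shares(revenue):
    """Return each vendor's share of the total, in percent, to one decimal."""
    total = sum(revenue.values())
    return {name: round(100.0 * value / total, 1)
            for name, value in revenue.items()}


if __name__ == "__main__":
    print("Total market:", sum(REVENUE_MUSD.values()))
    for name, share in market_shares(REVENUE_MUSD).items():
        print(f"{name:24s}{share:5.1f}%")
```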

Comparing the 2010 chart with the 2013 figures gives a quick overview of the changes in the last few years. While revenues have been relatively flat in recent quarters, the programmable logic sector is expected to see growth towards the end of the year. All companies in this market, along with their associated foundry partners, have continued to invest heavily in new technologies and manufacturing. Actel has been acquired by Microsemi Corporation, but the Actel brand has been kept alive since it has some brand recognition.

There are two primary classes of FPGA architecture: coarse grained and fine grained. Coarse-grained systems consist of fewer, larger components than fine-grained systems; a coarse-grained description of a system regards large subcomponents, while a fine-grained description regards the smaller components of which the larger ones are composed. Another classification of FPGAs is based on the process technology used to manufacture the switch, viz. SRAM based or antifuse. Xilinx FPGAs are SRAM based. Since Xilinx dominates the FPGA market, we will briefly look at the early Xilinx FPGAs, the advances made in FPGA architectures, the currently available FPGAs, and the low-power, low-cost Spartan versions. Xilinx FPGAs are discussed at length in Chapter 3, and overviews of Altera and Actel/Microsemi FPGAs are presented in Chapter 4 and Chapter 5 respectively.

2.4.1  Xilinx FPGAs in the Beginning

Simpler versions of FPGAs, such as the XC 2000 and XC 3000, contain an internal array of logic blocks and registers called Configurable Logic Blocks (CLBs), surrounded by a ring of programmable I/O blocks that provide the interface between the FPGA and the outside world. These are connected together through programmable interconnect, as shown in Fig. 2.11.

[Figure: block diagram showing Input/Output Blocks, Logic Blocks and Programmable Interconnect.]

Fig. 2.11 Basic FPGA (XC 2000, XC 3000).

2.4.2  Advanced FPGAs

Additional resources of the Virtex-II FPGA include a clock manager (PLL, DCM), dedicated multipliers, block RAMs (memory), etc., as shown in Fig. 2.12(a). Advanced Xilinx FPGAs of the Virtex family (Virtex-II, Virtex-II Pro, Virtex-4, Virtex-5, Virtex-6, Virtex-7) have further resources such as hard-core processors (IBM PowerPC and ARM), soft-core processors (MicroBlaze), Digital Signal Processors (DSPs), very high speed transceivers, etc., as shown in Fig. 2.12(a), 2.12(b) and 2.12(c). All units of Xilinx FPGAs are discussed in detail in Chapter 3.

[Figure: Virtex-II device showing I/O Blocks (IOBs), Block SelectRAM resources, Programmable Interconnect, Dedicated Multipliers, Configurable Logic Blocks (CLBs) and Clock Management (DCM, BUFGMUXes). The core voltage of the Virtex-II architecture is 1.5 V.]

Fig. 2.12 (a) Virtex-II Architecture

[Figure: device with 2 PowerPC processors at 450 MHz, high-speed 11.1 Gbps serial transceivers, programmable I/O with DCI (Digitally Controlled Impedance), more than 500 DSP datapaths, 200,000 LUTs (about 10 million gates) and 10 Mbit dual-port RAM.]

Fig. 2.12 (b) Advanced Xilinx FPGAs.

[Figure: column-based layout with CLB, Memory Controller, I/O, MGT, CMT, BUFG, PCIe Endpoint, BUFIO, Block RAM and DSP48 blocks.]

Fig. 2.12 (c) General overview of the FPGA architecture.

2.4.3  Spartan Family

Xilinx also offers the Spartan-3 FPGA family, which delivers a balance of low risk, low cost and low power for cost-sensitive applications, addressing the problem commonly faced by gate-centric designers of increasing design functionality while minimizing device cost. With advances in semiconductor technology, Xilinx’s Spartan-6 FPGAs offer 42% less power consumption and 12% higher performance than previous-generation devices, along with advanced power management technology, up to 150K logic cells, integrated PCI Express® blocks, advanced memory support, 250 MHz DSP slices, and 3.2 Gbps low-power transceivers. The Spartan family of devices is also discussed in detail in Chapter 3.

2.4.4  Designing with Xilinx FPGAs

To design with FPGAs, CAD tools need to be used. The Xilinx CAD tools are ISE and Vivado. A block diagram of the VLSI/FPGA design flow is shown in Fig. 2.13.

[Figure: VLSI design flow. Plan & Budget; Create Design (HDL/Schematic) and Integrate IP Core; Functional Simulation (HDL RTL Simulation); Synthesize to Create Netlist; Implement (Translate, Map, Place & Route), marked as unique to the FPGA design flow; Timing Simulation; Attain Timing Closure; Create Silicon: Bit File, 5 min (FPGA) vs. Mask Files, 5 months (ASIC). Translate: merge multiple design files into a single netlist. Map: assign (‘map’) gates to physical components (LUTs, registers, etc.).]

Fig. 2.13 VLSI/FPGA Design Flow.

2.4.5 Applications of FPGAs

Table 2.5 lists common application areas for FPGAs today.

Table 2.5: Applications of FPGAs

End Markets                      Subsegments         Applications
Communications                   Wireless            Cellular Base Stations, Wireless LAN
                                 Networking          Metro Area Networks, Optical Networks, DSL Modems, Switches, Routers
Storage                          Mass Storage        Storage Area Networks, Network Attached Storage
                                                     High Speed Servers, Computer Peripherals, Mass Storage
                                 Office Automation   Copiers, Printers
Consumer, Industrial and Other   Consumer            Plasma Displays, DVRs, Set Top Boxes, MP3 Players, Digital Cameras
                                 Industrial          Factory Automation, Medical Imaging, Test Equipment
                                 Automotive          Multimedia Systems, GPS Navigation Systems, Voice Recognition
                                 Military            Satellite Surveillance, Radar and Sonar Systems, Secure Communication


Xilinx FPGA design tools and applications are discussed in detail in Chapter 8.

Microcontrollers are used in many of the applications where FPGAs are now being used. It is therefore useful to have an overview of microcontrollers and to compare them with FPGAs so that the designer can choose the device that best suits the application.

2.5 MICROCONTROLLERS

There are LSI/VLSI chips that provide a CPU, memory, and general-purpose and special-purpose controllers and interfaces. One can build a microcomputer using these chips with the help of additional SSI/MSI components such as latches, drivers and decoders, which are normally termed glue logic. This is a microcomputer built from separate components (CPU, memory, etc.). For some specific applications, single-chip computers are also available as a VLSI chip. Such a single-chip microcomputer has a CPU, memory, I/O interfaces, timers, ADCs/DACs, etc. on the chip itself. This component is called a single-chip microcontroller. Nowadays it is also referred to as an embedded controller. It could also be called a System-on-Chip (SoC), although the term SoC is generally applied to the high end of single-chip microcontrollers. It is used for specific purposes.

Since memory and I/O interfaces are on the chip along with the CPU, there are constraints on the amount of memory and the number of interfaces that can be provided. As we will see later, currently available microcontrollers have a limited amount of memory (ROM, EPROM, Flash ROM and RAM) and a limited number of I/O ports. If the chip can be used as it is for a specific application, this is highly beneficial, and a system built around a single-chip microcontroller is highly reliable. One can choose a ROM-based microcontroller if the production volume is large. For prototyping and limited applications, one can choose EPROM-based or Flash-ROM-based single-chip microcontrollers. Manufacturers of single-chip microcontrollers provide a facility for adding external memory if required, but some I/O ports have to be sacrificed for it. Since the external memory is accessed through I/O ports, overall memory access times are longer. To illustrate this, the internal details of the Motorola 8-bit microcontroller M6801 are shown in Fig. 2.14.

If the microcontroller is used in single-chip mode, all four ports can be used as input/output pins along with the on-chip 2K-byte ROM and 128-byte RAM. If external memory needs to be used, port 3 and port 4 have to be used for the external memory address and data and cannot be used as I/O ports. There are 4-bit, 8-bit, 16-bit and 32-bit microcontrollers, and an appropriate microcontroller can be chosen depending on the application. After selecting the microcontroller, an executable application programme needs to be loaded into the ROM/EPROM/Flash memory. Typical microcontroller applications are listed in Table 2.6.

[Figure: block diagram of the M6801. Pins: VCC, VSS, XTAL1, XTAL2, E, NMI, IRQ1, RESET, VCC Standby. Internal blocks: MPU, Timer, SCI, 128 × 8 RAM, 2048 × 8 ROM. Ports: Port 1 (P10 to P17), Port 2 (P20 to P24), Port 3 (P30 to P37, with SC1 and SC2) and Port 4 (P40 to P47). Port functions by mode:
Expanded Multiplexed: Port 3 carries A7/D7 to A0/D0, with R/W and AS; Port 4 carries A15 to A8.
Expanded Non-Multiplexed: Port 3 carries D7 to D0, with R/W and IOS; Port 4 carries A7 to A0.
Single Chip: Port 3 is I/O, with OS3 and IS3; Port 4 is I/O.]

Fig. 2.14 Internal details of Microcontroller M6801

Table 2.6 Examples of typical microcontroller applications

Application                      Function Performed by the Microcomputer

Consumer
Washing machine                  Controls the water and spin cycles
Exercise equipment               Measures speed, distance, calories, heart rate, logs workouts
Remote controls                  Accepts key touches and sends infrared (IR) pulses to base system
Clocks and watches               Maintains the time, alarm, and display
Games and toys                   Entertains the user, joystick input, video output
Audio/video                      Interacts with the operator and enhances performance

Communication
Telephone answering machines     Plays outgoing message, saves and organizes messages
Telephone system                 Interactive switching and information retrieval
Cellular phones and pagers       Key pad inputs, sound I/O, and communicates with central station
ATM machines                     Provides both security and banking convenience

Automotive
Automatic braking                Optimizes stopping on slippery surfaces
Noise cancellation               Improves sound quality by removing background noise
Theft deterrent devices          Keyless entry, alarm systems
Electronic ignition              Controls sparks and fuel injectors
Power windows and seats          Remembers preferred settings for each driver
Instrumentation                  Collects and provides the driver with necessary information

Military
Smart weapons                    Recognizes friendly targets
Missile guidance systems         Directs ordnance at the desired target
Global positioning systems       Determines where you are on the planet

Industrial
Setback thermostats              Adjusts day/night thresholds, thus saving energy
Traffic control systems          Senses car positions and controls traffic lights
Robot systems                    Input from sensors, controls the motors
Bar code readers and writers     Input from readers, output to writers for inventory control and shipping
Automatic sprinklers             Used in farming to control the wetness of the soil

Medical
Apnea monitors                   Detects breathing and alarms if the baby stops breathing
Cardiac monitors                 Measures heart functions
Renal monitors                   Measures kidney functions
Drug delivery                    Administers proper doses
Cancer treatments                Controls doses of radiation, drugs, or heat
Pacemakers                       Helps the heart beat regularly
Prosthetic devices               Increases mobility for the handicapped
Dialysis machines                Performs functions normally done by the kidney

2.6  APPLICATION SPECIFIC INTEGRATED CIRCUIT (ASIC)

An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular application rather than intended for general-purpose use. As feature sizes have shrunk and design tools have improved over the years, the maximum complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over 100 million. Modern ASICs often include entire microprocessors, memory blocks including ROM, RAM, EEPROM and Flash memory, and other large building blocks. Such an ASIC is often termed an SoC (System-on-Chip). Designers of digital ASICs use a hardware description language (HDL), such as Verilog or VHDL, to describe the functionality of ASICs. Examples of ASICs include chips designed for a satellite, a car, a mobile phone, a digital voice recorder, etc. The following paragraphs describe the types of ASICs.

2.6.1  Full-Custom ASIC

In a full-custom IC, the designer designs all of the logic cells and the layout for that one chip. The designer does not use predefined gates in the design. The complete design of the chip is done from scratch.

2.6.2  Standard Cell ASIC

Standard cell ASIC design uses predesigned logic cells such as AND, NAND and NOR gates. These gates are called standard cells. The advantage of standard cell ASICs is that designers save time and money and reduce risk by using a predesigned and pretested standard cell library. Each standard cell can also be optimized individually. The standard cell libraries are themselves designed using the full-custom methodology, and these predesigned libraries are then used in the standard cell design. This design style gives a designer the same flexibility as full-custom design, but reduces the risk.

2.6.3  Gate Array ASIC

In a gate array ASIC, the transistors are predefined in the silicon wafer. The predefined pattern of transistors on the gate array is called a base array, and the smallest element in the base array is called a base cell. The base cell layout is the same for each logic cell; only the interconnect between the cells and inside the cells is customized. The following are the types of gate arrays:

(a) Channeled Gate Array
(b) Channelless Gate Array
(c) Structured Gate Array

When designing a chip, the following factors need to be taken into consideration:

1. Speed
2. Area
3. Power
4. Time to Market

To design an ASIC, one needs to have a good understanding of CMOS technology.

2.6.4  Generalised ASIC Design Flow Steps

The steps involved in ASIC design are given in Table 2.7.

Table 2.7: Steps involved in ASIC design



● High Level Design



● Specification Capture



● Design Capture in C, C++, SystemC or SystemVerilog



● HW/SW partitioning and IP selection



● RTL Design



● Verilog/VHDL



● System, Timing and Logic Verification



● Is the logic working correctly?



● Physical Design



● Floor planning, Place and Route, Clock insertion



● Extraction of Physical View



● Verification of timing and signal integrity



● Design Rule Checking/ LVS



● Performance and Manufacturability/Lithography Verification (DFM)

2.6.5 ASIC Design Flow

The ASIC design flow is shown in Fig. 2.15.

2.7 COMPARISONS

Of the programmable logic devices studied so far, PALs and PLAs are at the low end, while current FPGAs and ASICs are at the high end. It is now necessary to compare the high-end devices, microcontrollers vs FPGAs and FPGAs vs ASICs, for system design so that a designer can choose the component that best suits the application at hand.

2.7.1 Comparison of Microcontrollers vs FPGA

1. Microcontrollers are custom-built computers with limited resources in an IC and are driven by application software, while FPGAs are composed of logic blocks, RAM, DSP blocks, processors, high-speed transceivers, etc., and these parts can be interconnected electrically.

2. Microcontrollers consume less power than FPGAs.

3. FPGAs take considerably longer to set up, whereas microcontrollers are available ready built and only the application software needs to be written.

4. Building devices with FPGAs is more costly than using a single-chip microcontroller.

5. Since microcontrollers run software and FPGAs are mostly hardwired, FPGAs are faster than microcontrollers and are suitable for real-time applications.

6. Microcontrollers are cheaper than FPGAs.

[Figure: flowchart. Design Specification → Behavioral Description → RTL Description → RTL Functionality Verified? (checked with Verification Vectors; Yes/No branches) → HDL Design Synthesis under Constraints (RTL to Logic, Logic Optimization, Logic to Technology, Timing/Area Optimization) → Scan Path Insertion & Test Vector Generation → Netlist Logic & Timing Verified? (Yes/No branches) → Design Implementation (Floor Planning, Place & Route, Physical Layout) → Layout Function & Timing Verified? (Yes/No branches) → Chip Production.]

Fig. 2.15 ASIC Design Flow.

2.7.2  Comparison of FPGA vs ASIC

FPGA Advantage                                  Benefit
Faster time-to-market                           No layout, masks or other manufacturing steps are needed
No upfront non-recurring expenses (NRE)         Costs typically associated with an ASIC design
Simpler design cycle                            Due to software that handles much of the routing, placement, and timing
More predictable project cycle                  Due to elimination of potential re-spins, wafer capacities, etc.
Field reprogrammability                         A new bit stream can be uploaded remotely

ASIC Advantage                                  Benefit
Full custom capability                          For design since device is manufactured to design specs
Lower unit costs                                For very high volume designs
Smaller form factor                             Since device is manufactured to design specs
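The trade-off between "no upfront NRE" for the FPGA and "lower unit costs" for the ASIC can be made concrete with a break-even calculation. The Python sketch below is a simplified cost model written for this illustration only: it assumes total cost = NRE + unit cost × volume, and all the figures in the example are hypothetical, not vendor data.

```python
import math


def total_cost(nre, unit_cost, volume):
    """Total project cost under a simple linear model (an assumption)."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost, fpga_nre=0):
    """Smallest volume at which the ASIC is no more expensive than the FPGA.

    Only meaningful when the ASIC has the lower unit cost and higher NRE.
    """
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("ASIC unit cost must be lower than FPGA unit cost")
    return math.ceil((asic_nre - fpga_nre) / (fpga_unit_cost - asic_unit_cost))


if __name__ == "__main__":
    # Hypothetical numbers: $1,000,000 ASIC NRE, $5 per ASIC, $25 per FPGA.
    volume = break_even_volume(1_000_000, 5, 25)
    print("break-even volume:", volume)
    for units in (10_000, volume, 200_000):
        print(units,
              "FPGA:", total_cost(0, 25, units),
              "ASIC:", total_cost(1_000_000, 5, units))
```

Below the break-even volume the FPGA is cheaper overall; above it the lower ASIC unit cost outweighs the NRE, which is why the table lists lower unit costs as an ASIC advantage only for very high volume designs.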

2.8  CURRENT STATUS

In 2015 Xilinx released version 2014.4.1 of its FPGA design suite. As technology has advanced further, Xilinx has announced its new 16nm UltraScale+™ family of FPGAs, 3D ICs and MPSoCs, which combines new memory, 3D-on-3D and multi-processing SoC (MPSoC) technologies. The newly extended Xilinx® UltraScale+ FPGA portfolio comprises the Kintex® UltraScale+ FPGA and Virtex® UltraScale+ FPGA and 3D IC families, while the Zynq® UltraScale+ family includes the industry’s first all-programmable MPSoCs. According to Xilinx, UltraScale+ is optimized at the system level and delivers value beyond a traditional process node migration, providing 2-5X greater system-level performance per watt than 28nm devices, far more systems integration and intelligence, and the highest level of security and safety.

2.9 CONCLUSIONS

This chapter presented programmable logic devices, FPGAs, microcontrollers, design methodologies and their applications. Comparisons of FPGAs vs microcontrollers and FPGAs vs ASICs were also discussed so that a designer can choose the device that best suits the target market. The remaining part of the book covers SRAM-based and antifuse/Flash FPGAs from Xilinx, Altera and Actel/Microsemi. Only the architectural details and main features of the FPGAs are discussed in these chapters. To design with a selected FPGA, the designer will require more information, including electrical specifications, pin assignments, software design tools, etc. After selecting the most suitable FPGA for a project, the designer therefore needs to visit the website of the FPGA vendor to obtain data sheets and other required information.

Since the book is about designing with FPGAs, an overview of hardware description languages is presented, followed by design methodologies and selected exercises based on Xilinx FPGAs, in the remaining chapters. Finally, FPGA security and the future of FPGAs are discussed.

EXERCISES

Q.1. Which factors would you take into account in selecting a PROM, PAL or PLA to implement a given application?

Q.2. Design a combinational circuit using a ROM. The circuit accepts a 3-bit number and generates an output binary number equal to the square of the number.

Q.3. Tabulate the truth table for an 8 × 4 ROM that implements the following four Boolean functions:
A(X, Y, Z) = Σ(3, 6, 7); B(X, Y, Z) = Σ(0, 1, 4, 5, 6)
C(X, Y, Z) = Σ(2, 3, 4); D(X, Y, Z) = Σ(2, 3, 4, 7)

Q.3. Specify the size of a ROM (number of words and number of bits per word) that will accommodate the truth table for the following combinational circuit: an 8-bit adder/subtractor with Cin and Cout.

Q.4. Design a half adder using a PAL. Give the complete circuit diagram.

Q.5. Design a full adder using a PAL. Give the complete circuit diagram.

Q.6. Using a PLA, design a 3-bit ripple carry binary adder circuit. Give the complete circuit diagram.

Q.7. A 20-pin registered-output PAL is selected and a programme in PALASM is written as follows:

CNT4SC (4-bit counter with synchronous clear)

Pin of PAL    Signal Assignment
1             Clock
2             Clear
3             NC
4             NC
5             NC
6             NC
7             NC
8             NC
9             NC
10            Ground
11            OE
12            NC
13            NC
14            /Q3
15            /Q2
16            /Q1
17            /Q0
18            NC
19            NC
20            VCC

OE (output enable) is active HIGH and Clear is active LOW. The logic expressions are:

Q3 := Clear
    + /Q3 * /Q2 * /Q1 * /Q0
    + Q3 * Q0
    + Q3 * Q1
    + Q3 * Q2

Q2 := Clear
    + /Q2 * /Q1 * /Q0
    + Q2 * Q0
    + Q2 * Q1

Q1 := Clear
    + /Q1 * /Q0
    + Q1 * Q0

Q0 := Clear
    + /Q0

Using the above information, fill in the function table:

OE    Clear    Clock          Q0    Q1    Q2    Q3
H     L        Present (P)
H     H        P
H     H        P
H     H        P
H     H        P
H     H        P
H     H        P
H     L        P

Q.8. Implement a MOD 7 counter with asynchronous clear using a PAL or PLA.

Q.9. A controller is to be designed to monitor and control the temperature of a furnace within the upper limit of Tmax and the lower limit of Tmin. The controller should read the temperature of the furnace at a regular interval of t seconds and check that the temperature is within the limits of Tmax and Tmin. If the temperature is outside the limits, the controller should give an alarm indication. Using the microcontroller M6801, design the unit along with the additional components such as ADC/DAC, alarm circuit, etc. Give the complete circuit diagram along with pseudocode for the control program.

Q.10. Would you choose a microcontroller with a real-time operating system (RTOS) or an FPGA from the Virtex family for a real-time application? Give reasons for your choice.


3 Xilinx FPGAs

3.1 INTRODUCTION

Xilinx started manufacturing Field Programmable Gate Arrays (FPGAs) in 1984. The initial FPGAs, such as the XC 2000, XC 3000 and XC 4000, were simpler devices used for combinational and sequential logic, mainly as ‘glue’ logic. These FPGAs had only basic resources: combinational logic gates and registers, interconnect for connecting the appropriate logic gates, and I/O blocks to provide an interface between the FPGA and the outside world. The XC 4000 FPGA architecture is shown in Fig. 3.1. All components of this FPGA are discussed in the following sections.

Fig. 3.1 XC4000 FPGA Architecture.

3.1.1  Look-up Table (LUT)

Combinational functions are implemented using a look-up table (LUT). A 4-input LUT has 16 programmable states and can implement any arbitrary function of four inputs, as shown in Fig. 3.2. A LUT stores a predefined output for every combination of inputs and so provides a fast way to retrieve the output of a logic operation. LUTs can therefore implement combinational logic for any arbitrarily defined Boolean function of four inputs.

[Figure: combinatorial function with inputs A, B, C, D and output Z, shown with its truth table.]

A   B   C   D   Z
0   0   0   0   0
0   0   0   1   0
0   0   1   0   0
0   0   1   1   1
0   1   0   0   1
0   1   0   1   1
.   .   .
1   1   0   0   0
1   1   0   1   0
1   1   1   0   0
1   1   1   1   1

Fig. 3.2 4 Input Look-up Table.
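The idea that a LUT is just a 16-entry memory addressed by its inputs can be sketched in software. The following Python model is an illustration only; the class name, the bit ordering (A as the most significant address bit) and the example function are assumptions of this sketch rather than details of any Xilinx device.

```python
class LUT4:
    """A 4-input look-up table: 16 stored bits, one per input combination."""

    def __init__(self, contents):
        if len(contents) != 16 or any(bit not in (0, 1) for bit in contents):
            raise ValueError("a 4-input LUT needs exactly 16 bits")
        self.contents = list(contents)

    @classmethod
    def from_function(cls, fn):
        """'Program' the LUT by evaluating fn for all 16 input combinations."""
        bits = []
        for index in range(16):
            a = (index >> 3) & 1
            b = (index >> 2) & 1
            c = (index >> 1) & 1
            d = index & 1
            bits.append(fn(a, b, c, d) & 1)
        return cls(bits)

    def read(self, a, b, c, d):
        """Evaluate the function: the inputs simply form the memory address."""
        return self.contents[(a << 3) | (b << 2) | (c << 1) | d]


# Example function chosen for illustration only: Z = (A AND B) OR (C XOR D).
example = LUT4.from_function(lambda a, b, c, d: (a & b) | (c ^ d))

if __name__ == "__main__":
    print("stored bits:", example.contents)
    print("Z(1,1,0,0) =", example.read(1, 1, 0, 0))
```

Changing the stored bits changes the function without changing the hardware, which is what makes the logic "configurable".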

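A LUT can be followed by a flip-flop, as shown in Fig. 3.3, for implementing sequential logic.

[Figure: a LUT with inputs A, B, C, D and output Y, which feeds a flip-flop (FF) with a Clock input and output Q.]

Fig. 3.3 Look-up Table with a Flip Flop.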

3.1.2 Slice

A logic slice consists of two sets of the following, as shown in Fig. 3.4:

– Carry and control
● Fast arithmetic logic
● Multiplier logic
● Multiplexer logic

– Storage element
● Latch or flip-flop
● Set and reset
● True or inverted inputs
● Synchronous or asynchronous control

Fig. 3.4 Slice configuration.

3.1.3  Fast Carry Logic

To perform addition and subtraction, fast carry generation is used in the FPGA. Each CLB contains separate logic and routing for the fast generation of SUM and CARRY signals. Within a slice, as shown in Fig. 3.5, the carry look-ahead principle is used: the carry is produced from carry-generate and carry-propagate logic. Between slices, the carry is rippled through as shown in Fig. 3.6.

[Figure: one bit of the carry logic, with inputs x, y and cin and outputs s and cout, together with a table of cout against x and y.]

p = x ⊕ y
g = y
s = p ⊕ cin = x ⊕ y ⊕ cin

Fig. 3.5 Carry Look-ahead and Carry propagate generation in a slice.

[Figure: a CLB with four slices, S0 to S3, each containing LUTs, MUXCY carry multiplexers and flip-flops (FF). Two carry chains, labelled First Carry Chain and Second Carry Chain, each run from CIN to COUT; the COUT outputs go to CIN of S2 of the next CLB and to S0 of the next CLB.]

Fig. 3.6 Ripple Carry between slices for fast carry generation.
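The propagate and generate equations of Fig. 3.5 can be exercised with a short behavioural model. The Python sketch below is illustrative only: the function names are invented for this example, and the selection rule used for the carry multiplexer (cout = cin when p = 1, otherwise g) is the usual behaviour of a carry multiplexer and is assumed here rather than taken from the text.

```python
def carry_bit(x, y, cin):
    """One bit position: propagate/generate, sum, and multiplexed carry out."""
    p = x ^ y                  # propagate: p = x XOR y
    g = y                      # generate input, as given in the text: g = y
    s = p ^ cin                # sum: s = p XOR cin
    cout = cin if p else g     # assumed MUXCY behaviour: p selects cin, else g
    return s, cout


def ripple_add(a, b, width=8, cin=0):
    """Chain `width` bit positions, passing each carry out to the next bit."""
    total = 0
    carry = cin
    for i in range(width):
        s, carry = carry_bit((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    total, carry = ripple_add(200, 100)
    print("200 + 100 ->", total, "carry out", carry)
```

When p = 0 the two operand bits are equal, so either of them (here y) gives the correct carry out; this is why g = y is sufficient.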

3.1.4  Multiplier Unit

FPGAs are increasingly being applied to DSP applications but are often inefficient in space and time compared with dedicated DSP chips, particularly for multiplication-based operations. To improve FPGA arithmetic performance, a flexible multiplication unit and configurable carry logic circuitry suitable for incorporation into an FPGA logic block have been proposed, as shown in Fig. 3.7. The multiplier unit is based on a modified carry-save adder and, together with the carry logic circuitry, efficiently supports multiplication, addition and multiply/accumulate operations in serial or parallel form. Preliminary results indicate that the logic utilization for a multiplier implementation in such an FPGA is approximately a third of that of the XC4000 architecture. Propagation delays are also reduced owing to the use of dedicated inter-block interconnect for all sum and carry signals and flexible routing multiplexers. The multiplier gives a highly efficient multiply-and-add implementation: earlier FPGA architectures require two LUTs per bit to perform multiplication and addition, whereas this multiplier reduces area by performing multiply and add in one LUT per bit.

Fig. 3.7 Multiplier in XC 4000.

3.1.5  Shift Register

Dynamically addressable serial-in serial-out shift registers can be built using LUTs (SRL16CE), as shown in Fig. 3.8. They can be cascaded with other LUTs or CLBs to form longer shift registers: Q15 of one shift register can be connected to the D input of the next SRL16CE.

[Figure: a LUT used as a shift register, with inputs D, CE and CLK, address inputs A[3:0], and cascade output Q15.]

Fig. 3.8 Shift Register using LUT.
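The behaviour described above (shift in at D, read any stage through the address inputs, cascade from Q15) can be modelled in a few lines. The Python sketch below is a behavioural illustration only; the class and method names are assumptions of this example and are not a Xilinx primitive interface.

```python
class ShiftRegisterLUT:
    """Behavioural model of a 16-stage LUT-based shift register."""

    def __init__(self, depth=16):
        self.bits = [0] * depth      # bits[0] holds the newest sample

    def clock(self, d, ce=1):
        """On a clock edge with clock enable set, shift d in at stage 0."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, addr):
        """Dynamically addressed output: the stage selected by A[3:0]."""
        return self.bits[addr]

    @property
    def q15(self):
        """Cascade output: the last stage of the register."""
        return self.bits[-1]


def clock_cascade(stages, d):
    """Clock several registers on the same edge, Q15 of one feeding the next.

    All cascade outputs are sampled before any register shifts, as they
    would be on a common clock edge.
    """
    inputs = [d] + [stage.q15 for stage in stages[:-1]]
    for stage, din in zip(stages, inputs):
        stage.clock(din)


if __name__ == "__main__":
    first, second = ShiftRegisterLUT(), ShiftRegisterLUT()
    clock_cascade([first, second], 1)          # shift in a single 1
    for _ in range(16):
        clock_cascade([first, second], 0)      # followed by zeros
    print("1 has reached stage 0 of the second register:", second.q(0))
```

Reading a different address gives a different delay from the same register, which is why these structures are convenient for variable-length delay lines.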


3.2  CONFIGURABLE LOGIC BLOCK (CLB)

Configurable Logic Blocks implement most of the logic in an FPGA. The principal CLB elements are shown in Fig. 3.9. Two 4-input function generators (F and G) offer unrestricted versatility. Most combinatorial logic functions need four or fewer inputs. However, a third function generator (H) is provided. The H function generator has three inputs. Either zero, one, or two of these inputs can be the outputs of F and G; the other input(s) are from outside the CLB. The CLB can, therefore, implement certain functions of up to nine variables, like parity check or expandable identity comparison of two sets of four inputs.

Each CLB contains two storage elements that can be used to store the function generator outputs. However, the storage elements and function generators can also be used independently. DIN (Direct Input) can be used as a direct input to either of the two storage elements. H1 can drive the other through the H function generator. Function generator outputs can also drive two outputs independent of the storage element outputs. This versatility increases logic capacity and simplifies routing. Thirteen CLB inputs and four CLB outputs provide access to the function generators and storage elements. These inputs and outputs connect to the programmable interconnect resources outside the block.

[Figure: XC4000 CLB. Control inputs C1...C4 (H1, DIN/H2, SR/H0, EC); function generator G with inputs G1 to G4; function generator F with inputs F1 to F4; function generator H combining F, G and H1; two flip-flops (D, SD, RD, EC, Q) with S/R control, clocked by K; outputs YQ, Y, XQ and X; bypass paths; multiplexers controlled by the configuration program.]

Fig. 3.9 XC4000 CLB Structure.
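The nine-variable parity check mentioned above shows how the F, G and H function generators combine. In the Python sketch below, F and G each compute the parity of four inputs and H combines their outputs with H1. This is a behavioural illustration only; the helper names and the table-based representation are assumptions of this example.

```python
from itertools import product


def make_lut(fn, n_inputs):
    """Build a truth table by evaluating fn for every input combination."""
    return [fn(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]


def read_lut(table, *inputs):
    """Look up the output; the first input is the most significant bit."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


# F and G: 4-input function generators, each programmed as 4-bit parity.
F_TABLE = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
G_TABLE = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
# H: 3-input function generator combining F, G and the external input H1.
H_TABLE = make_lut(lambda f, g, h1: f ^ g ^ h1, 3)


def clb_parity9(f_inputs, g_inputs, h1):
    """Nine-input parity using one F, one G and one H function generator."""
    f_out = read_lut(F_TABLE, *f_inputs)
    g_out = read_lut(G_TABLE, *g_inputs)
    return read_lut(H_TABLE, f_out, g_out, h1)


def check_parity9():
    """Compare against a direct parity calculation for all 512 inputs."""
    for bits in product((0, 1), repeat=9):
        if clb_parity9(bits[0:4], bits[4:8], bits[8]) != sum(bits) % 2:
            return False
    return True


if __name__ == "__main__":
    print("nine-input parity correct:", check_parity9())
```

Only "certain" nine-variable functions fit in one CLB because H sees just the single-bit results of F and G, not their eight inputs directly.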


3.3 INTERCONNECT AND ROUTING

Programmable interconnect points, or PIPs, provide the routing paths used to connect the inputs and outputs of IOBs and CLBs into logic networks. Routing for the XC 4000 is shown in Fig. 3.10(a) and 3.10(b).

[Figure: a 3 × 3 array of CLBs with programmable switch matrices (PSM) at the intersections of the routing channels, which carry single-length (Singles) and double-length (Doubles) lines.]

Fig. 3.10(a) XC 4000 Routing.

CLB inputs and outputs connect to channels on all four sides to provide maximum routing flexibility. A switch matrix connects rows and columns, as shown in Fig. 3.11(a). The switch matrix uses six CMOS transistors, as shown in Fig. 3.11(b), each of which can be programmed to turn ON or OFF. There are three kinds of interconnect:

1. Row and column routing.

2. IOB routing, which forms a ring around the outside of the CLB array.

3. Dedicated networks, primarily intended for clocks, but usable for other signals.

3.8

Field Programmable Gate Arrays and Applications

Double

Single

Double

Long

F4C4G4

YQ Y

Direct

G1 C1 F1

CLB K X XQ

G3 C3 F3

Feedback

F2C2G2

Long

k

c ba ed

Fe

ct ire

D

e

l ba lo

G

ng

Lo

bl ou

D

le ng Si

e

bl ou

D

ng Lo

*

Fig. 3.10(b) Interconnect details in XC 4000


Fig. 3.11(a) Switch Matrix

Fig. 3.11(b) CMOS transistors in Switch matrix.

CLBs connect to lines of various lengths.

3.3.1 Direct Connections

Direct connections are used between adjacent CLBs.

3.3.2 Single-Length Lines

Single-length lines provide the greatest interconnect flexibility and offer fast routing between adjacent blocks. There are eight vertical and eight horizontal single-length lines associated with each CLB. These lines connect the switching matrices that are located in every row and column of CLBs. Single-length lines are connected by way of the programmable switch matrices, as shown in Fig. 3.10(a). Single-length lines incur a delay whenever they go through a switching matrix. Therefore, they are not suitable for routing signals over long distances. They are normally used to conduct signals within a localized area and to provide the branching for nets with fan-out greater than one.


3.3.3  Double-Length Lines

The double-length lines consist of a grid of metal segments, each twice as long as the single-length lines: they run past two CLBs before entering a switch matrix. Double-length lines are grouped in pairs with the switch matrices staggered so that each line goes through a switch matrix at every other row or column of CLBs. There are four vertical and four horizontal double-length lines associated with each CLB. These lines provide faster signal routing over intermediate distances, while retaining routing flexibility. Double-length lines are connected by way of the programmable switch matrices.
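The reason double-length lines are faster over intermediate distances can be seen by counting switch matrices. The sketch below is a rough first-order model under a stated assumption: a net enters one switch matrix at the end of every segment, so a segment spanning one CLB (single) or two CLBs (double) determines the count. The function name `switch_matrices_crossed` is hypothetical and no real delay values are implied.

```python
import math

def switch_matrices_crossed(distance_in_clbs, segment_length):
    """Approximate number of switch matrices a net passes through.

    distance_in_clbs: how many CLBs the net has to travel past.
    segment_length:   CLBs spanned by one segment (1 = single, 2 = double).
    Assumes one switch matrix at the end of each segment (a simplification).
    """
    return math.ceil(distance_in_clbs / segment_length)

# Compare single-length and double-length routing for the same distance.
for distance in (2, 6, 10):
    singles = switch_matrices_crossed(distance, 1)
    doubles = switch_matrices_crossed(distance, 2)
    print(f"{distance} CLBs: {singles} matrices on singles, {doubles} on doubles")
```

Since each switch matrix adds delay, halving the number of matrices crossed is what makes the double-length lines attractive for intermediate distances in this simplified picture.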

3.3.4  Long Lines

Long lines are used for high fan-out, time-critical nets, or nets that need to be distributed over much of the chip.

3.3.5  Global Lines

Global lines are used for clocking or distributing logic signals.

3.4  IOB - INPUT/OUTPUT BLOCK

An FPGA includes a number of input/output blocks (IOBs). An interconnect network routes signals between the IOBs and the CLBs. The IOBs are arranged along the top, left, bottom and right sides of the FPGA. An IOB includes (1) a delay element for timing input signals, (2) a configurable output latch which may be set or reset in response to control signals, and (3) a transistor for controlling a NOR line. The IOB is programmably connected to the interconnect network, which includes vertical and horizontal interconnect channels made up of adjacent interconnect lines. The IOB has two input registers and two output registers with tristate control, as shown in Fig. 3.12.

Fig. 3.12 Input/Output Block. (Figure labels: registers clocked by OCK1 and OCK2 with DDR MUXes on the 3-state and output paths, input registers clocked by ICK1 and ICK2, and the PAD.)


The Xilinx I/O cell is the input/output block (IOB). Figs. 3.13(a) and 3.13(b) show the Xilinx XC4000 IOB.

Fig. 3.13(a) XC 4000 IOB. (Figure labels: I/O pad; output buffer with three-state control (OE/TS), slew-rate control and passive pull-up/pull-down; output flip-flop or latch clocked by the output clock OK; input buffer with optional delay feeding a flip-flop or latch clocked by the input clock IK; inputs I1 and I2; programmable MUXes; M = SRAM cell.)

Fig. 3.13(b) The Xilinx XC4000 family IOB (input/output block). (Figure labels: Pad; output buffer with slew rate control and passive pull-up/pull-down; three-state input T; output flip-flop with clock enable CE and output clock; input buffer with optional delay; input flip-flop/latch with clock enable CE and input clock; inputs I1 and I2.)


The outputs contain features that allow the designer to do the following:
● Switch between a totem-pole and a complementary output.
● Include a passive pull-up or pull-down (both n-channel devices) with a typical resistance of about 50 kΩ.
● Invert the three-state control (output enable, OE, or three-state, TS).
● Include a flip-flop, or latch, or a direct connection in the output path.
● Control the slew rate of the output.

The features on the inputs allow the designer to do the following:
● Configure the input buffer with TTL or CMOS thresholds.
● Include a flip-flop, or latch, or a direct connection in the input path.
● Switch in a delay to eliminate an input hold time.

In addition:
● IOB has more versatile clocking polarity options.
● IOB has programmable input set-up time:
– long to avoid potential hold time problems.
– short to improve performance.
● IOB has Longline access through its own TBUF.
● Outputs are n-channel only; lower VOH increases speed.
● XC4000 outputs can be paired to double sink current to 24 mA.

3.4.1 Pull-up and Pull-down Resistors

Programmable pull-up and pull-down resistors are useful for tying unused pins to Vcc or Ground to minimize power consumption and reduce noise sensitivity. The configurable pull-up resistor is a p-channel transistor that pulls to Vcc. The configurable pull-down resistor is an n-channel transistor that pulls to Ground. The value of these resistors is 50 kΩ to 100 kΩ. This high value makes them unsuitable as wired-AND pull-up resistors. The pull-up resistors for most user-programmable IOBs are active during the configuration process. After configuration, voltage levels of unused pads, bonded or un-bonded, must be valid logic levels, to reduce noise sensitivity and avoid excess current. Therefore, by default, unused pads are configured with the internal pull-up resistor active. Alternatively, they can be individually configured with the pull-down resistor, or as a driven output, or to be driven by an external source. To activate the internal pull-up, attach the PULLUP library component to the net attached to the pad. To activate the internal pull-down, attach the PULLDOWN library component to the net attached to the pad. Separate clock signals are provided for the input and output flip-flops.
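A short calculation suggests why a 50 kΩ to 100 kΩ pull-up is fine for holding an unused pin at a valid level but too weak for a wired-AND bus. The sketch below uses Ohm's law and the usual 10% to 90% rise-time approximation of 2.2·R·C. The 5 V supply and the 50 pF bus capacitance are assumed values chosen only for illustration, and the function names are hypothetical.

```python
def pullup_current_ua(vcc_volts, resistance_ohms):
    """Current (in microamps) drawn when the pin is held low against the pull-up."""
    return vcc_volts / resistance_ohms * 1e6

def rise_time_us(resistance_ohms, load_farads):
    """Approximate 10%-90% rise time (in microseconds) of an RC node: 2.2 * R * C."""
    return 2.2 * resistance_ohms * load_farads * 1e6

VCC = 5.0          # assumed supply voltage
BUS_CAP = 50e-12   # assumed bus capacitance, 50 pF

for r in (50e3, 100e3):
    print(f"R = {r / 1e3:.0f} kohm: "
          f"I = {pullup_current_ua(VCC, r):.0f} uA, "
          f"rise time = {rise_time_us(r, BUS_CAP):.1f} us")
```

Under these assumptions the pull-up draws only tens of microamps, but the released line needs several microseconds to rise, which is far too slow for a shared wired-AND signal.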

3.4.2  Digitally Controlled Impedance

A bus carrying address, data and control signals in a computer system needs to be terminated in the characteristic impedance (R0) of the bus wire, which depends on the inductance and capacitance of the wire per unit length, to avoid reflections on the bus. In an FPGA, digitally controlled impedance (DCI) provides on-chip termination for receivers and transmitters


that matches the characteristic impedance of the traces and improves signal integrity by eliminating stub reflections. DCI also reduces board routing complexity and component count by eliminating external resistors, and it uses an internal feedback circuit to compensate for temperature, voltage, and process variations.
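The dependence of the characteristic impedance on inductance and capacitance per unit length, and the effect of a mismatched termination, can be illustrated numerically. The sketch below assumes a lossless line, for which R0 = sqrt(L/C), and uses the standard reflection coefficient (ZL - R0)/(ZL + R0). The line parameters (250 nH/m and 100 pF/m) are assumed example values, and the function names are hypothetical.

```python
import math

def characteristic_impedance(l_per_m, c_per_m):
    """R0 of a lossless line from inductance (H/m) and capacitance (F/m) per unit length."""
    return math.sqrt(l_per_m / c_per_m)

def reflection_coefficient(z_load, z0):
    """Fraction of the incident voltage wave reflected by a load z_load on a line z0."""
    return (z_load - z0) / (z_load + z0)

# Assumed example line: 250 nH/m and 100 pF/m.
r0 = characteristic_impedance(250e-9, 100e-12)
print(f"R0 = {r0:.1f} ohm")
for z_load in (r0, 150.0, 1e9):   # matched, mismatched, nearly open
    gamma = reflection_coefficient(z_load, r0)
    print(f"load {z_load:.3g} ohm -> reflection coefficient {gamma:.2f}")
```

A termination equal to R0 gives a reflection coefficient of zero, which is the condition an on-chip termination scheme such as DCI aims to maintain.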

3.5  FURTHER ADVANCES IN FPGAS

After the initial, simpler FPGAs such as the XC 2000, XC 3000 and XC 4000, which were used for small-scale applications and ‘glue logic’, Xilinx brought out derivatives of these families such as the XC 5200, the one-time programmable non-volatile XC8100, and the fully reprogrammable XC6200. All these FPGAs used nearly the same basic blocks. With the tremendous advances in microelectronics, and with Moore’s law still holding, Xilinx then started a new FPGA family, Virtex. The first member of this family, Virtex, was introduced in 1998.

3.6  VIRTEX ARCHITECTURE

About 14 years after its first FPGA, Xilinx introduced Virtex in 1998, made possible by the advances in semiconductor technology that followed Moore’s law. Virtex FPGAs are composed of an array of Configurable Logic Blocks (CLBs) surrounded by a ring of Input/Output Blocks (IOBs). On the east and west edges are Block RAMs (BRAMs). The CLBs are the primary building blocks that contain elements for implementing customizable gates, flip-flops, and wiring for connectivity. The IOBs provide circuitry for communicating signals with external devices. The BRAMs allow for synchronous or asynchronous storage of kilobits of data, though each CLB can also implement synchronous/asynchronous 32-bit RAMs. Virtex architecture is shown in Fig. 3.14. Its main elements are:
● IOBs - Input/Output Blocks
● CLBs - Configurable Logic Blocks
● GRM - General Routing Matrix
● 3-state buffers
● BRAMs - Block SelectRAM
● DLLs - Delay-Locked Loops
● VersaRing - I/O interface routing resources

Fig. 3.14 Architecture of Virtex. (Figure: a central array of CLBs flanked by BRAM columns, surrounded by the VersaRing and IOBs on all four sides, with a DLL at each corner.)

3.6.1 Main features of Virtex

1. 0.25 micron, 5 metal layer, 2.5V core voltage
2. 5 volt compatible IO usable in 14 standards
3. Introduced
(a) 4K-bit Block RAM
(b) Delay Locked Loop (not shown in fig.)
(c) IO decoupling from logic

3.6.2 Block RAM

The block RAM in FPGAs stores up to 18K bits of data and can be configured as either two independent 9 Kb RAMs, or one 18 Kb RAM. Each RAM can be addressed through two ports, but can also be configured as a single-port RAM. The block RAM resources include output registers to increase pipeline performance. Block RAMs are placed in columns. The total number of block RAMs depends on the size of the device. In block RAMs, write and read are synchronous operations; the two ports are symmetrical and totally independent, sharing only the stored data. Each port can be configured in one of the available widths, independent of the other port. The memory content can be initialized or cleared by the configuration bitstream. During a write operation, the data output can be set to remain unchanged, to reflect the new data being written, or to show the previous data now being overwritten. A dual-port block RAM is shown in Fig. 3.15.

Fig. 3.15 Dual Port Block RAM.

True Dual-Port Names and Descriptions

Port Name      Description
DI[A|B](1)     Data Input Bus
DIP[A|B](1)    Data Input Parity Bus
ADDR[A|B]      Address Bus
WE[A|B]        Byte-wide Write Enable
EN[A|B]        When inactive, no data is written to the block RAM and the output bus remains in its previous state.
SSR[A|B]       Synchronous Set/Reset of the output registers (DO_REG = 1).
CLK[A|B]       Clock Input
DO[A|B](1)     Data Output Bus
DOP[A|B](1)    Data Output Parity Bus
REGCE[A|B]     Output Register Clock Enable
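The three behaviours of the data output during a write can be made concrete with a small behavioural model. This is only a sketch, not cycle-accurate and not a vendor model: the class name `BlockRam`, its `clock` method and the mode labels `WRITE_FIRST`, `READ_FIRST` and `NO_CHANGE` are names chosen for this illustration, and parity bits, byte-wide write enables and the optional output registers are left out.

```python
class BlockRam:
    """Behavioural sketch of a synchronous dual-port RAM sharing one storage array."""

    def __init__(self, depth, write_mode="WRITE_FIRST"):
        if write_mode not in ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE"):
            raise ValueError("unknown write mode")
        self.mem = [0] * depth            # storage shared by both ports
        self.write_mode = write_mode
        self.dout = {"A": 0, "B": 0}      # data output latch of each port

    def clock(self, port, addr, en=True, we=False, din=0):
        """Apply one rising clock edge on `port` and return that port's data output."""
        if not en:
            return self.dout[port]        # port disabled: output keeps its value
        if we:
            old = self.mem[addr]
            self.mem[addr] = din
            if self.write_mode == "WRITE_FIRST":
                self.dout[port] = din     # output reflects the new data
            elif self.write_mode == "READ_FIRST":
                self.dout[port] = old     # output shows the data being overwritten
            # NO_CHANGE: the output is left untouched during a write
        else:
            self.dout[port] = self.mem[addr]
        return self.dout[port]


ram = BlockRam(depth=16, write_mode="READ_FIRST")
print(ram.clock("A", 3, we=True, din=7))   # port A writes 7, sees the old content
print(ram.clock("B", 3))                   # port B reads the shared location
```

Both ports operate on the same `mem` list, which mirrors the statement that the two ports are independent and share only the stored data.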

3.6.3  Delay-Locked Loop (DLL)

For clock synchronization, phase-locked loops (PLLs) have been used in analog implementations since the 1940s. A PLL block diagram is shown in Fig. 3.16.

Fig. 3.16 PLL Block Diagram. (Figure labels: CLKIN, CLKFB, control, voltage-controlled oscillator, CLKOUT, clock distribution network.)


Recent emphasis on digital methods has made it desirable to match signal phases digitally. Using a digital delay-locked loop (DLL) in place of an analog PLL eliminates the need for separate noise-free ground and power planes. Virtex uses DLLs. Associated with each global clock input buffer is a fully digital Delay-Locked Loop (DLL) that can eliminate skew between the clock input pad and internal clock-input pins throughout the device. Each DLL can drive two global clock networks. The DLL monitors the input clock and the distributed clock, and automatically adjusts a clock delay element. Clock edges reach internal flip-flops one to four clock periods after they arrive at the input. This closed-loop system effectively eliminates clock-distribution delay by ensuring that clock edges arrive at internal flip-flops in synchronism with clock edges arriving at the input.

In addition to eliminating clock-distribution delay, the DLL provides advanced control of multiple clock domains. The DLL provides four quadrature phases of the source clock, and can double the clock or divide it by 1.5, 2, 2.5, 3, 4, 5, 8, or 16. The DLL also operates as a clock mirror. By driving the output from a DLL off-chip and then back on again, the DLL can be used to de-skew a board-level clock among multiple Virtex devices. To guarantee that the system clock is operating correctly before the FPGA starts up after configuration, the DLL can delay the completion of the configuration process until it has achieved lock. A block diagram of the DLL is shown in Fig. 3.17.

Fig. 3.17 DLL Block Diagram. (Figure labels: inputs CLKIN, CLKFB and RST; outputs CLK0, CLK90, CLK180, CLK270, CLK2x, CLKDV and LOCKED.)
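The relationship between the input clock and the DLL outputs named in Fig. 3.17 can be tabulated with a few lines of arithmetic. The sketch below is only an illustration of the phase and frequency relationships stated in the text; the function name `dll_outputs` is hypothetical and the 100 MHz input in the example is an assumed value.

```python
ALLOWED_DIVIDES = (1.5, 2, 2.5, 3, 4, 5, 8, 16)   # divide values listed in the text

def dll_outputs(f_in_mhz, divide=2):
    """Return {output name: (frequency in MHz, phase delay in ns)} for one DLL."""
    if divide not in ALLOWED_DIVIDES:
        raise ValueError("unsupported CLKDV divide value")
    period_ns = 1000.0 / f_in_mhz
    return {
        "CLK0":   (f_in_mhz, 0.0),
        "CLK90":  (f_in_mhz, period_ns / 4),       # quarter-period shift
        "CLK180": (f_in_mhz, period_ns / 2),       # half-period shift
        "CLK270": (f_in_mhz, 3 * period_ns / 4),   # three-quarter-period shift
        "CLK2x":  (2 * f_in_mhz, 0.0),             # doubled clock
        "CLKDV":  (f_in_mhz / divide, 0.0),        # divided clock
    }

for name, (freq, delay) in dll_outputs(100.0, divide=2.5).items():
    print(f"{name:7s} {freq:6.1f} MHz, shifted by {delay:4.2f} ns")
```

For a 100 MHz input the quadrature outputs are spaced 2.5 ns apart, and a divide value of 2.5 gives a 40 MHz CLKDV.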

Functioning of DLL: A delay-locked loop (DLL), shown in Fig. 3.18, is a digital circuit similar to a phase-locked loop (PLL), the main difference being that the internal voltage-controlled oscillator is replaced by a delay line. A DLL can be used to change the phase of a clock signal (a signal with a periodic waveform), usually to improve the clock-rise-to-data-output-valid timing of integrated circuits. DLLs can also be used for clock and data recovery (CDR). From the outside, a DLL can be seen as a negative-delay gate placed in the clock path of a digital circuit. The main component of a DLL is a delay chain composed of many delay gates connected in series. The input of the chain (and thus of the DLL) is connected to the clock that is to be negatively delayed. A multiplexer is connected to each stage of the delay chain; the selector of this multiplexer is automatically updated by a control circuit to produce the negative delay effect. The output of the DLL is the resulting, negatively delayed clock signal.
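The action of the control circuit on the multiplexer selector can be sketched as a simple search. In the model below, which is a deliberate simplification, the controller adds one delay stage at a time until the clock-distribution delay plus the delay-line delay reaches one full clock period; at that point the distributed clock edges line up with the input edges. The function name `lock_tap` and all numeric values are assumptions made for illustration only.

```python
def lock_tap(clock_period_ps, distribution_delay_ps, tap_delay_ps, n_taps):
    """Return the multiplexer tap at which the loop locks, or None if it cannot.

    The loop locks when distribution delay + (tap * tap delay) first reaches
    one clock period, so the residual error is less than one tap delay.
    """
    tap = 0
    while tap < n_taps:
        total_delay = distribution_delay_ps + tap * tap_delay_ps
        if total_delay >= clock_period_ps:
            return tap        # edges aligned to within one tap delay
        tap += 1              # distributed clock still early: add a stage
    return None               # delay line too short for this clock period


# Assumed example: 100 MHz clock (10 000 ps), 3 200 ps distribution delay,
# 50 ps per delay stage, 256 stages available.
tap = lock_tap(10_000, 3_200, 50, 256)
print("locked at tap", tap)
```

With these assumed numbers the total delay equals exactly one period, so the distributed clock appears to have no delay relative to the input, which is the "negative delay" effect described above.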

Fig. 3.18 Delay Locked Loop (DLL). (Figure: a delay line whose total delay is locked to the clock period. A phase comparator compares the reference signal X(s) from the reference clock with the local clock Y(s) and produces the error signal E(s) = X(s) – Y(s). The processed error G·E(s) corrects the total delay by changing the delay of each stage of the variable delay line of N elements, which provides N phases of the reference clock. Note: with a flat gain block the DLL is a loop of 0th order and type 0. With an integrator instead, it is a loop of 1st order and type 1, which locks with zero steady-state phase error.)

The delay-locked loop is a variable delay line whose delay is locked to the duration of the period of a reference clock. Depending on the signal processing element in the loop (a flat amplifier or an integrator), the DLL loop can be of 0th order and type 0 or of 1st order and type 1. Another way to view the difference between a DLL and a PLL is that a DLL uses a variable phase (= delay) block where a PLL uses a variable frequency block. A DLL compares the phase of its last output with the input clock to generate an error signal, which is then integrated and fed back as the control to all of the delay elements. The integration allows the error to go to zero while keeping the control signal, and thus the delays, where they need to be for phase lock. Since the control signal directly affects the phase, this is all that is required. Features of the DLLs implemented in Virtex are given in Table 3.1.

Table 3.1: Features of DLLs in Virtex.

Feature                  Virtex
Architecture             DLL
Technology               100% Digital
Quantity                 4
Max Output (MHz)         200
Input Duty Cycle         n/a
Output Duty Cycle        50%
Min Input Clock (MHz)    25
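The difference between the type 0 loop (flat gain) and the type 1 loop (integrator) discussed before Table 3.1 can be checked numerically. The sketch below is a toy discrete-time model, not a model of the Virtex circuit: for the flat-gain loop it uses the closed-form result that follows from Y = G·E and E = X – Y, and for the integrator loop it accumulates the error step by step. The function names and the gain values are assumptions for illustration.

```python
def steady_state_error_flat_gain(x, gain):
    """Type 0 loop: Y = G * E and E = X - Y, hence E = X / (1 + G)."""
    return x / (1.0 + gain)

def residual_error_integrator(x, gain, steps):
    """Type 1 loop: the control value accumulates gain * error on every cycle."""
    y = 0.0
    for _ in range(steps):
        error = x - y
        y += gain * error     # the integrator keeps the correction once the error is gone
    return x - y

phase_offset = 1.0   # assumed initial phase error, arbitrary units
print("flat gain  :", steady_state_error_flat_gain(phase_offset, gain=9.0))
print("integrator :", residual_error_integrator(phase_offset, gain=0.25, steps=100))
```

The flat-gain loop is left with a finite error of X/(1 + G), whereas the integrator loop drives the error towards zero, in line with the note in Fig. 3.18.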


3.7 VIRTEX-II

After Virtex, Xilinx introduced Virtex-II. Virtex-II Architecture is shown in Fig. 3.19.

Fig. 3.19 Virtex-II Architecture. (Figure labels: Block SelectRAM resource, I/O Blocks (IOBs), programmable interconnect, dedicated multipliers, Configurable Logic Blocks (CLBs), clock management (DCM, BUFGMUXes). The core voltage of the Virtex-II architecture is 1.5V.)

In Virtex-II, additional resources are:
(a) Distributed RAM and block RAM
– Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
– Block RAM is a dedicated resource on the device (18-kb blocks)
(b) Dedicated 18 × 18 multipliers next to block RAMs
(c) Clock management resources
– Sixteen dedicated global clock multiplexers
– Digital Clock Managers (DCMs)

RAM applications are given in Table 3.2.

Table 3.2: RAM Applications
● Operand stacks
● Register files
● Instruction caches
● DMA buffers
● Instruction memories
● State tables
● Logic functions
● Message buffers
● Virtual channels
● Video line buffers
● Digital delay lines
● RAMDAC color mapping tables
● Test vector buffers
● PCI configuration space
● Sequential machines
● More...
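The choice between distributed RAM and block RAM for the applications in Table 3.2 often comes down to size. The sketch below estimates the resources needed for a memory of a given depth and width, using the figures quoted above (16 RAM bits per LUT, 18-kb blocks). It is a rough estimate under stated assumptions: it ignores address decoding and output multiplexing for distributed RAM, treats an 18-kb block as 18 × 1024 bits including parity, and ignores the aspect ratios actually supported. The function names are hypothetical.

```python
import math

LUT_RAM_BITS = 16              # 1 LUT = 16 RAM bits (from the text)
BLOCK_RAM_BITS = 18 * 1024     # 18-kb block, assumed to include parity bits

def luts_for_distributed_ram(depth, width):
    """LUTs needed when each LUT stores a 16 x 1 slice of the memory."""
    return math.ceil(depth / LUT_RAM_BITS) * width

def block_rams_needed(depth, width):
    """Block RAMs needed by total bit count only (aspect ratio limits ignored)."""
    return math.ceil(depth * width / BLOCK_RAM_BITS)

for depth, width in ((64, 8), (1024, 16), (2048, 16)):
    print(f"{depth} x {width}: "
          f"{luts_for_distributed_ram(depth, width)} LUTs or "
          f"{block_rams_needed(depth, width)} block RAM(s)")
```

A small register file fits comfortably in a few dozen LUTs, while a buffer of a thousand words would consume around a thousand LUTs and is better placed in a single block RAM.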


Fig. 3.20 shows a single-bit SRAM read/write circuit.

Fig. 3.20 Single bit SRAM Read/Write Circuit. (Figure labels: row address decoder driven by A0 and A1; column address decoder driven by A2 and A3; Left Row (LR), Right Row (RR), Left Column (LC) and Right Column (RC) transistors; control logic with ENA, WE and OE; DATA IN and DATA OUT.)

3.7.1  Digital Clock Manager (DCM)

Each DCM provides familiar clock generation capability. To generate deskewed internal or external clocks, each DCM can be used to eliminate clock distribution delay. The DCM also provides 90°, 180°, and 270° phase-shifted versions of the output clocks. Fine-grained phase shifting offers higher-resolution phase adjustment in increments of a fraction of the clock period. Flexible frequency synthesis provides a clock output frequency equal to a fractional or integer multiple of the input clock frequency. DCMs are located on the top and bottom edges of the die and are driven by clock input pads. The DCM block diagram is shown in Fig. 3.21.

Fig. 3.21 DCM Block Diagram. (Figure labels: DCM primitive with RST, PS Done, Status and Locked signals; BUFG clock buffers; flip-flops (FF) in the configurable logic.)
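Frequency synthesis and fine phase shifting both reduce to simple ratios, as the following sketch shows. The output frequency is taken as the input frequency times a ratio M/D, and the fine phase shift as a number of steps, each a fixed fraction of the clock period. The function names, the example M and D values, and the default of 256 steps per period are assumptions made for this illustration rather than device limits taken from this chapter.

```python
from fractions import Fraction

def synthesized_frequency_mhz(f_in_mhz, multiply, divide):
    """Output frequency as an exact fraction: f_in * M / D."""
    return Fraction(f_in_mhz) * Fraction(multiply, divide)

def fine_phase_shift_ps(f_in_mhz, steps, steps_per_period=256):
    """Phase shift in picoseconds for `steps` increments of period / steps_per_period."""
    period_ps = Fraction(1_000_000) / Fraction(f_in_mhz)
    return period_ps * Fraction(steps, steps_per_period)

print(float(synthesized_frequency_mhz(100, 7, 4)))   # fractional multiple of 100 MHz
print(float(fine_phase_shift_ps(100, 64)))           # a quarter of a 10 ns period
```

Using exact fractions avoids rounding surprises when M/D is not an integer, for example 7/4 of a 100 MHz input.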


3.8  VIRTEX-II PRO

Xilinx introduced the Virtex-II Pro family in the year 2003. The Virtex-II Pro architecture is shown in Fig. 3.22.

Fig. 3.22 Virtex-II Pro Architecture.

3.8.1  Main Features of Virtex-II Pro

● 0.13 micron process



● Up to 24 RocketIO™ Multi-Gigabit Transceiver (MGT) blocks



– Serializer and deserializer (SERDES)



– Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant transceivers, and others



– 8-, 16-, and 32-bit selectable FPGA interface



– 8B/10B encoder and decoder



● PowerPC™ RISC processor blocks



– Thirty-two 32-bit General Purpose Registers (GPRs)



– Low power consumption: 0.9mW/MHz



– IBM CoreConnect bus architecture support

We will focus on the additional functional blocks added in Virtex-II Pro.


3.8.2  Rocket IO Transceiver Features

● Full-Duplex Serial Transceiver (SERDES) Capable of Baud Rates from 600 Mb/s to 3.125 Gb/s

● 120 Gb/s Duplex Data Rate (24 Channels)

● Monolithic Clock Synthesis and Clock Recovery (CDR)

● Fibre Channel, 10G Fibre Channel, Gigabit Ethernet, 10 Gb Attachment Unit Interface (XAUI), and Infiniband-Compliant Transceivers

● 8-, 16-, or 32-bit Selectable Internal FPGA Interface

● 8B/10B Encoder and Decoder (optional)

● 50 Ω/75 Ω on-chip Selectable Transmit and Receive Terminations

● Programmable Comma Detection

● Channel Bonding Support (from 2 to 24 Channels)

● Rate Matching via Insertion/Deletion Characters

● Four Levels of Selectable Pre-Emphasis

● Five Levels of Output Differential Voltage

● Per-Channel Internal Loopback Modes

● 2.5V Transceiver Supply Voltage
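The figures in this list can be related to one another with a short calculation. With 8B/10B encoding, every 8 data bits are sent as 10 line bits, so the payload rate is 8/10 of the baud rate. The sketch below applies this to the maximum rate and channel count quoted above; reading the 120 Gb/s figure as payload counted in both directions is an interpretation assumed here, and the function names are hypothetical.

```python
def payload_rate_gbps(line_rate_gbps, encoded_8b10b=True):
    """Usable data rate of one direction of one channel."""
    if encoded_8b10b:
        return line_rate_gbps * 8 / 10    # 10 line bits carry 8 data bits
    return line_rate_gbps

def aggregate_duplex_gbps(line_rate_gbps, channels):
    """Payload summed over all channels and both directions (assumed interpretation)."""
    return payload_rate_gbps(line_rate_gbps) * channels * 2

print(payload_rate_gbps(3.125))            # payload per channel per direction
print(aggregate_duplex_gbps(3.125, 24))    # all 24 channels, both directions
```

At 3.125 Gb/s each channel carries 2.5 Gb/s of payload per direction, and 24 full-duplex channels then add up to 120 Gb/s, which agrees with the duplex data rate listed above under this interpretation.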

3.8.3  Processors in Xilinx FPGA

Xilinx FPGAs offer two types of processor: (1) hard core and (2) soft core. A hard core processor is built into the silicon and is an integral part of the FPGA. The hard core processors implemented are the IBM Power PC 405 and ARM. A soft core processor is not physically present in the FPGA; it is available as soft IP, and if the designer wants to use it, the IP must be included in the design that is downloaded into the FPGA, where it is implemented in the programmable logic. Picoblaze and Microblaze are the soft core processors available from Xilinx. Picoblaze is an 8-bit soft core processor and Microblaze is a 32-bit soft core processor.

3.8.3.1 Hard Core Processor IBM POWER PC 405 Power PC Block diagram is given in Fig. 3.23. Main features of IBM Power PC 405:

● Embedded 300+ MHz Harvard Architecture Block



● Low Power Consumption: 0.9 mW/MHz



● Five-Stage Data Path Pipeline




● Hardware Multiply/Divide Unit



● Thirty-Two 32-bit General Purpose Registers



● 16 KB Two-Way Set-Associative Instruction Cache



● 16 KB Two-Way Set-Associative Data Cache



● Memory Management Unit (MMU)



– 64-entry unified Translation Look-aside Buffers (TLB)



– Variable page sizes (1 KB to 16 MB)



● Dedicated On-Chip Memory (OCM) Interface



● Supports IBM CoreConnect™ Bus Architecture



● Debug and Trace Support



● Timer Facilities

Fig. 3.23 IBM Power PC 405.

3.8.3.2  A Power PC based embedded system A Power PC based embedded system is shown in Fig. 3.24.


Fig. 3.24 Power PC based Embedded System.

3.8.4  Soft Core Processor MICROBLAZE

A block diagram showing the various units of Microblaze is given in Fig. 3.25.

Fig. 3.25 Microblaze Architecture

Microblaze Features:
● Soft processor, around 900 LUTs
● RISC architecture
● 32 bit, with 32 × 32 bit general purpose registers
● Supported in Virtex/Spartan family
● 32 bit instruction word with three operands and two addressing modes
● Separate 32 bit instruction and data buses that conform to the IBM On-chip Peripheral Bus (OPB) specification
● Separate 32 bit instruction and data buses with direct connection to on-chip block RAM through a Local Memory Bus (LMB)
● 32-bit address bus
● Single issue pipeline
● Instruction and data cache
● Hardware debug logic
● Fast Simplex Link (FSL) support

3.8.4.1 An Embedded System Using Microblaze and Power PC An embedded system using Microblaze and Power PC is shown in Fig. 3.26

Fig. 3.26 Microblaze and Power PC based Embedded System.


3.9 VIRTEX-4

After Virtex-II Pro, Xilinx introduced Virtex-4 in the year 2004. Virtex-4 architecture is shown in Fig. 3.27.

Fig. 3.27 Virtex-4 Architecture. (Figure labels: RocketIO™ Multi-Gigabit Transceivers, 622 Mbps-10.3 Gbps; SmartRAM, new block RAM/FIFO; Xesium clocking technology, 500 MHz; advanced CLBs, 200K logic cells; Tri-Mode Ethernet MAC, 10/100/1000 Mbps; XtremeDSP™ technology slices, 256 18 × 18 GMACs; PowerPC™ 405 with APU interface, 450 MHz, 680 DMIPS; 1 Gbps SelectIO™ with ChipSync™ source-synchronous support and XCITE active termination.)

Virtex-4 Main features
● Xesium™ Clock Technology
– Digital clock manager (DCM) blocks
– Additional phase-matched clock dividers (PMCD)
– Differential global clocks
● XtremeDSP™ Slice
– 18 × 18, two’s complement, signed Multiplier
– Optional pipeline stages
– Built-in Accumulator (48-bit) and Adder/Subtracter
● Smart RAM Memory Hierarchy
– Distributed RAM
– Dual-port 18-Kbit RAM blocks
– Optional pipeline stages
– Optional programmable FIFO logic automatically remaps RAM signals as FIFO signals
– High-speed memory interface supports DDR and DDR-2 SDRAM, QDR-II, and RLDRAM-II
● SelectIO™ Technology
– 1.5V to 3.3V I/O operation
– Built-in ChipSync™ source-synchronous technology
– Digitally controlled impedance (DCI) active termination
– Fine grained I/O banking (configuration in one bank)
● Flexible Logic Resources
● Multiple Tri-Mode Ethernet MACs
● Secure Chip AES Bitstream Encryption
● 90 nm Copper CMOS Process
● 1.2V Core Voltage
● Flip-Chip Packaging including Pb-Free Package Choices
● RocketIO™ 622 Mb/s to 6.5 Gb/s Multi-Gigabit Transceiver (MGT)
● IBM PowerPC RISC Processor Core
– PowerPC 405 (PPC405) Core
– Auxiliary Processor Unit Interface (User Coprocessor)
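The XtremeDSP slice listed above combines an 18 × 18 signed multiplier with a 48-bit accumulator. The sketch below is a bit-accurate software illustration of that multiply-accumulate arithmetic only; it does not model pipeline stages, the adder/subtracter options or any other feature of the real slice, and the function names `to_signed`, `mac` and `dot_product` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def mac(acc, a, b):
    """One multiply-accumulate step: 18 x 18 signed multiply, 48-bit accumulate."""
    product = to_signed(a, 18) * to_signed(b, 18)
    return to_signed(acc + product, 48)   # the accumulator wraps at 48 bits

def dot_product(xs, ys):
    """Accumulate the products of two sample sequences, one MAC step per pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

print(dot_product([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6
```

The 36-bit product of two 18-bit operands leaves 12 guard bits in a 48-bit accumulator, so several thousand products can be summed before overflow becomes possible.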

3.9.1  Sub Families

Xilinx manufactured sub-families so that the user can choose the best mix of resources to optimize cost and performance. The sub-families are shown in Fig. 3.28. LX offers high-performance logic, block RAMs and I/O, and can be used for high-performance logic applications.

Virtex-4 Family: Optimized for Logic, Embedded, and Signal Processing

Resource        LX              FX              SX
Logic           14K-200K LCs    12K-140K LCs    23K-55K LCs
Memory          0.9-6 Mb        0.6-10 Mb       2.3-5.7 Mb
DCMs            4-12            4-20            4-8
DSP Slices      32-96           32-152          128-512
SelectIO        240-960         240-896         320-640
RocketIO        N/A             0-24 Channels   N/A
PowerPC         N/A             1 or 2 Cores    N/A
Ethernet MAC    N/A             2 or 4 Cores    N/A

Fig. 3.28 Virtex-4 Sub-family.


SX has more DSP blocks and block RAMs and less logic, and is well suited to digital signal processing applications. FX adds powerful system features such as the Power PC, Ethernet controller and 11 Gbps transceivers, and is well suited to high-performance applications.

After Virtex-4, Xilinx introduced Virtex-5 in 2007, Virtex-6 in 2009 and Virtex-7 in 2010. Since the main blocks of the advanced Xilinx FPGAs have already been explained, only the main features of Virtex-5, Virtex-6 and Virtex-7 are presented in the following sections.

3.10 VIRTEX-5

Summary of Virtex-5 FPGA Features

The architectural features of Virtex-5 are shown in Fig. 2.12(b) in Chapter 2.

● Five platforms LX, LXT, SXT, TXT, and FXT



– Virtex-5 LX: High-performance general logic applications



– Virtex-5 LXT: High-performance logic with advanced serial connectivity



– Virtex-5 SXT: High-performance signal processing applications with advanced serial connectivity



– Virtex-5 TXT: High-performance systems with double density advanced serial connectivity



– Virtex-5 FXT: High-performance embedded systems with advanced serial connectivity



● Cross-platform compatibility

– LXT, SXT, and FXT devices are footprint compatible in the same package using adjustable voltage regulators

● Most advanced, high-performance, optimal-utilization FPGA fabric



– Real 6-input look-up table (LUT) technology



– Dual 5-LUT option



– Improved reduced-hop routing



– 64-bit distributed RAM option



– SRL32/Dual SRL16 option



● Powerful clock management tile (CMT) clocking



– Digital Clock Manager (DCM) blocks for zero delay buffering, frequency synthesis, and clock phase shifting



– PLL blocks for input jitter filtering, zero delay buffering, frequency synthesis, and phase-matched clock division



● 36-Kbit block RAM/FIFOs



– True dual-port RAM blocks



– Enhanced optional programmable FIFO logic



– Programmable

– True dual-port widths up to x36

– Simple dual-port widths up to x72

– Built-in optional error-correction circuitry



– Optionally program each block as two independent 18-Kbit blocks



● High-performance parallel SelectIO technology



– 1.2 to 3.3V I/O Operation



– Source-synchronous interfacing using ChipSync™ technology



– Digitally-controlled impedance (DCI) active termination



– Flexible fine-grained I/O banking



– High-speed memory interface support



● Advanced DSP48E slices



– 25 × 18, two’s complement, multiplication



– Optional adder, subtracter, and accumulator



– Optional pipelining



– Optional bitwise logical functionality



– Dedicated cascade connections



● Flexible configuration options



– SPI and Parallel FLASH interface



– Multi-bitstream support with dedicated fallback reconfiguration logic



– Auto bus width detection capability



● System Monitoring capability on all devices



– On-chip/Off-chip thermal monitoring



– On-chip/Off-chip power supply monitoring



– JTAG access to all monitored quantities



● Integrated Endpoint blocks for PCI Express Designs



– LXT, SXT, TXT, and FXT Platforms



– Compliant with the PCI Express Base Specification 1.1

– x1, x4, or x8 lane support per block

– Works in conjunction with RocketIO™ transceivers



● Tri-mode 10/100/1000 Mb/s Ethernet MACs



– LXT, SXT, TXT, and FXT Platforms



– Rocket IO transceivers can be used as PHY or connect to external PHY using many soft MII (Media Independent Interface) options



● Rocket IO GTP transceivers 100 Mb/s to 3.75 Gb/s

– LXT and SXT Platforms

● Rocket IO GTX transceivers 150 Mb/s to 6.5 Gb/s

– TXT and FXT Platforms

● PowerPC 440 Microprocessors



– FXT Platform only



– RISC architecture



– 7-stage pipeline



– 32-Kbyte instruction and data caches included



– Optimized processor interface structure (crossbar)



● 65-nm copper CMOS process technology



● 1.0V core voltage



● High signal-integrity flip-chip packaging available in standard or Pb-free package options
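The "real 6-input LUT" with its "dual 5-LUT option" in the list above can be pictured as one 64-bit truth table that may also be read as two 32-bit halves sharing five inputs. The sketch below is a behavioural illustration under that assumption; the function names `lut6`, `dual_lut5` and `pack_dual`, and the convention that the sixth input selects the upper half, are choices made for this example rather than details taken from this chapter.

```python
def lut6(init, inputs):
    """Output of a 6-input LUT whose 64-bit truth table is `init` (input 0 = LSB)."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> index) & 1

def dual_lut5(init, inputs5):
    """Read the same table as two 5-input functions sharing the five inputs."""
    index = sum(bit << i for i, bit in enumerate(inputs5))
    low = (init >> index) & 1            # function stored in bits 0..31
    high = (init >> (32 + index)) & 1    # function stored in bits 32..63
    return low, high

def pack_dual(func_low, func_high):
    """Build a 64-bit table from two 5-input Boolean functions."""
    init = 0
    for index in range(32):
        bits = [(index >> i) & 1 for i in range(5)]
        init |= (func_low(*bits) & 1) << index
        init |= (func_high(*bits) & 1) << (32 + index)
    return init

# Example: 5-input AND in the lower half, 5-input XOR in the upper half.
table = pack_dual(lambda a, b, c, d, e: a & b & c & d & e,
                  lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
print(dual_lut5(table, [1, 1, 1, 1, 1]))   # both functions evaluated at once
```

Two independent five-input functions of the same inputs therefore cost one LUT instead of two, which is one way such an option can improve utilization.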

3.11 VIRTEX-6

Fig. 3.29 Virtex-6 Architecture.

Virtex-6 Features: Advanced, high-performance FPGA logic

● Real 6-input look-up table (LUT) technology



● Dual LUT5 (5-input LUT) option



● LUT/dual flip-flop pair for applications requiring rich register mix



● Improved routing efficiency



● 64-bit (or two 32-bit) distributed LUT RAM option per 6-input LUT



● SRL32/dual SRL16 with registered outputs option

● Powerful mixed-mode clock managers (MMCM)
● MMCM blocks provide zero-delay buffering, frequency synthesis, clock-phase shifting, input-jitter filtering, and phase-matched clock division
● 36-Kb block RAM/FIFOs
● Dual-port RAM blocks
● Programmable
– Dual-port widths up to 36 bits
– Simple dual-port widths up to 72 bits
● Enhanced programmable FIFO logic
● Built-in optional error-correction circuitry
● Optionally use each block as two independent 18 Kb blocks
● High-performance parallel SelectIO™ technology
● 1.2 to 2.5V I/O operation
● Source-synchronous interfacing using ChipSync™ technology
● Digitally controlled impedance (DCI) active termination
● Flexible fine-grained I/O banking
● High-speed memory interface support
● Advanced DSP48E1 slices
● 25 x 18, two’s complement multiplier/accumulator
● Optional pipelining
● New optional pre-adder to assist filtering applications
● Optional bitwise logic functionality
● Dedicated cascade connections
● Flexible configuration options
● SPI and Parallel Flash interface
● Multi-bitstream support with dedicated fallback reconfiguration logic
● Automatic bus width detection
● System Monitor capability on all devices
● On-chip/off-chip thermal and supply voltage monitoring
● JTAG access to all monitored quantities
● Integrated interface blocks for PCI Express® designs
● Compliant to the PCI Express Base Specification 2.0
● Gen1 (2.5 Gb/s) and Gen2 (5 Gb/s) support with GTX transceivers
● Endpoint and Root Port capable
● x1, x2, x4, or x8 lane support per block
● GTX transceivers: up to 6.6 Gb/s
● Data rates below 480 Mb/s supported by oversampling in FPGA logic
● GTH transceivers: 2.488 Gb/s to beyond 11 Gb/s
● Integrated 10/100/1000 Mb/s Ethernet MAC block
● Supports 1000BASE-X PCS/PMA and SGMII using GTX transceivers
● Supports MII, GMII, and RGMII using SelectIO technology resources
● 2500 Mb/s support available
● 40 nm copper CMOS process technology
● 1.0V core voltage (-1, -2, -3 speed grades only)
● Lower-power 0.9V core voltage option (-1L speed grade only)
● High signal-integrity flip-chip packaging available in standard or Pb-free package options

3.11.1  Virtex-6 sub-families

Three sub-families:
● Virtex-6 LXT FPGAs: High-performance logic with advanced serial connectivity
● Virtex-6 SXT FPGAs: Highest signal processing capability with advanced serial connectivity
● Virtex-6 HXT FPGAs: Highest bandwidth serial connectivity

3.12 VIRTEX-7

Architecture of Virtex-7 is shown in Fig. 3.30.

Fig. 3.30 Architecture of Virtex-7. (Figure labels: high-speed serial transceivers, DSP datapath, programmable logic cells, programmable I/O, dual-port RAM.)

Virtex-7 Main Features:

● Advanced high-performance FPGA logic based on real 6-input lookup table (LUT) technology configurable as distributed memory.



● 36 Kb dual-port block RAM with built-in FIFO logic for on-chip data buffering.



● High-performance SelectIO™ technology with support for DDR3 interfaces up to 1,866 Mb/s.



● High-speed serial connectivity with built-in multi-gigabit transceivers from 600 Mb/s to maximum rates of 6.6 Gb/s up to 28.05 Gb/s, offering a special low-power mode, optimized for chip-to-chip interfaces.



● A user configurable analog interface (XADC), incorporating dual 12-bit 1MSPS analog-to-digital converters with on-chip thermal and supply sensors.



● DSP slices with 25 × 18 multiplier, 48-bit accumulator, and pre-adder for high-performance filtering, including optimized symmetric coefficient filtering.



● Powerful clock management tiles (CMT), combining phase-locked loop (PLL) and mixed-mode clock manager (MMCM) blocks for high precision and low jitter.



● Integrated block for PCI Express® (PCIe), for up to x8 Gen3 End point and Root Port designs.



● Wide variety of configuration options, including support for commodity memories, 256-bit AES encryption with HMAC/SHA-256 authentication, and built-in SEU detection and correction.



● Low-cost, wire-bond, lidless flip-chip, and high signal integrity flip-chip packaging offering easy migration between family members in the same package. All packages available in Pb-free and selected packages in Pb option.



● Designed for high performance and lowest power with 28 nm, HKMG, HPL process, 1.0V core voltage process technology and 0.9V core voltage option for even lower power.

The 7 series FPGAs include the Artix-7, Kintex-7 and Virtex-7 families, as shown in Fig. 3.31:



● Artix®-7 Family: Optimized for lowest cost and power with small form-factor packaging for the highest volume applications.



● Kintex®-7 Family: Optimized for best price-performance with a 2X improvement compared to previous generation, enabling a new class of FPGAs.



● Virtex®-7 Family: Optimized for highest system performance and capacity with a 2X improvement in system performance. Highest capability devices enabled by stacked silicon interconnect (SSI) technology.


Fig. 3.31 Artix-7, Kintex-7 and Virtex-7

Hard PowerPC processors were included in the Virtex-II Pro (V2P), Virtex-4 FX (V4FX), and Virtex-5 FXT (V5FX) families. The 6-series FPGAs (Virtex-6 and Spartan-6) did not have any hard processors. The 7-series FPGAs consist of the Artix-7, Kintex-7, and Virtex-7 sub-families, none of which has a hard-core processor. The Zynq family has dual ARM Cortex-A9 cores as hard processors. The Zynq family has six device members: the three smaller members (7010, 7015, and 7020) use Artix-7-like fabric in the programmable logic (PL) section, whereas the three bigger members (7030, 7045, and 7100) use Kintex-7-like fabric.

Fig. 3.32 FPGA Usage Model Evolution


To summarize, Xilinx started in 1985 with FPGAs that were used for glue logic. With advances in semiconductor technology, Xilinx progressively introduced FPGAs that are used in control logic, complex control, embedded processing, co-processing and ASSP/FPGA integration, as shown in Fig. 3.32.

3.13 SPARTAN FAMILY

The Spartan® Generation of FPGAs offers a choice of platforms, each delivering a unique cost-optimized balance of programmable logic, connectivity, and dedicated hard IP for low-cost, low-power applications.

3.13.1 Spartan-3

The Spartan-3 was the first FPGA produced on an advanced 90 nm process technology, which resulted in the industry’s lowest-cost FPGA at the time. Feature enhancements combined with advanced process technology deliver more functionality and bandwidth than previously possible. Architecture of Spartan-3 is shown in Fig. 3.33.

Fig. 3.33 Spartan 3 Architecture. (Blocks shown: IOBs, DCMs, CLBs, block RAM, multipliers.)

The Spartan-3 family consists of eight members with block RAM, clock management (DCM), and multipliers. Device density ranges from 50,000 to 5,000,000 system gates, depending on the size of the device. The Spartan-3™ family provides the lowest cost per gate and lowest cost per I/O of any FPGA. Spartan-3 also contains numerous platform features such as embedded 18 × 18 multipliers, up to 1.8 Mb of block RAM, and embedded 32-bit and 8-bit soft processors. In addition, digitally controlled impedance, digital clock managers and 24 supported I/O standards are all included in this low-cost solution. The Spartan-3 family is for I/O-centric designs; the Spartan-3L is for low-power designs.

The Spartan®-3 Generation of FPGAs offers a choice of five platforms, each delivering a unique cost-optimized balance of programmable logic, connectivity, and dedicated hard IP for low-cost applications:



● Spartan-3A DSP: DSP Optimized
– For applications where integrated DSP MACs and expanded memory are required
– Ideal for designs requiring low-cost FPGAs for signal processing applications such as military radio, surveillance cameras, medical imaging, etc.
● Spartan-3AN: Non-volatile
– For applications where non-volatile storage, system integration, security, and large user flash are required
– Ideal for space-critical or secure applications as well as low-cost embedded controllers
● Spartan-3A: I/O Optimized



– For applications where I/O count and capabilities matter more than logic density



– Ideal for bridging, differential signaling and memory interfacing applications, requiring wide or multiple interfaces and modest processing



● Spartan-3E: Logic Optimized



– For applications where logic densities matter more than I/O count



– Ideal for logic integration, DSP co-processing and embedded control, requiring significant processing and narrow or few interfaces



● Spartan-3: For Highest Density and Pin-Count Applications



– For applications where both high logic density and high I/O count are important



– Ideal for highly-integrated data-processing applications

3.13.2 Spartan-6

The Spartan®-6 FPGA delivers an optimal balance of low risk, low cost, and low power for cost-sensitive applications, with 42% less power consumption and 12% higher performance than previous-generation devices. Part of Xilinx’s All Programmable low-end portfolio, Spartan-6 FPGAs offer advanced power management technology, up to 150K logic cells, integrated PCI Express® blocks, advanced memory support, 250 MHz DSP slices, and 3.2 Gb/s low-power transceivers. Architecture of Spartan-6 is shown in Figs. 3.34(a) and 3.34(b).


Fig. 3.34 (a) Spartan-6 Architecture. (Labels shown: hardened memory controllers, 3.3 V compatible I/O.)

Fig. 3.34 (b) Spartan-6 Architecture. (Blocks shown: CLB, memory controller, I/O, MGT, CMT, BUFG, PCIe Endpoint, BUFIO, block RAM, DSP48.)

Summary of Spartan-6 FPGA Features

Common resources for Virtex-6 and Spartan-6 are:

● LUT-6 CLB
● Block RAM
● DSP slices
● High-performance clocking
● Parallel I/O
● Gigabit transceivers*
● PCIe® interface

Spartan-6 Family main features:

● Spartan-6 family
– Spartan-6 LX FPGA: Logic optimized
– Spartan-6 LXT FPGA: High-speed serial connectivity
● Designed for low cost
– Multiple efficient integrated blocks
– Optimized selection of I/O standards
– Staggered pads
– High-volume plastic wire-bonded packages
● Low static and dynamic power
– 45 nm process optimized for cost and low power
– Hibernate power-down mode for zero power
– Suspend mode maintains state and configuration with multi-pin wake-up, control enhancement
– Lower-power 1.0V core voltage (LX FPGAs, -1L only)
– High performance 1.2V core voltage (LX and LXT FPGAs, -2, -3, and -3N speed grades)
● Multi-voltage, multi-standard SelectIO™ interface banks
– Up to 1,080 Mb/s data transfer rate per differential I/O
– Selectable output drive, up to 24 mA per pin
– 3.3V to 1.2V I/O standards and protocols
– Low-cost HSTL and SSTL memory interfaces
– Hot swap compliance
– Adjustable I/O slew rates to improve signal integrity
● High-speed GTP serial transceivers in the LXT FPGAs
– Up to 3.2 Gb/s
– High-speed interfaces including: Serial ATA, Aurora, 1G Ethernet, PCI Express, OBSAI, CPRI, EPON, GPON, DisplayPort, and XAUI
● Integrated Endpoint block for PCI Express designs (LXT)
● Low-cost PCI® technology support compatible with the 33 MHz, 32- and 64-bit specification
● Efficient DSP48A1 slices
– High-performance arithmetic and signal processing
– Fast 18 × 18 multiplier and 48-bit accumulator
– Pipelining and cascading capability
– Pre-adder to assist filter applications
● Integrated Memory Controller blocks
– DDR, DDR2, DDR3, and LPDDR support
– Data rates up to 800 Mb/s (12.8 Gb/s peak bandwidth)
– Multi-port bus structure with independent FIFO to reduce design timing issues
● Abundant logic resources with increased logic capacity
– Optional shift register or distributed RAM support
– Efficient 6-input LUTs improve performance and minimize power
– LUT with dual flip-flops for pipeline-centric applications
● Block RAM with a wide range of granularity
– Fast block RAM with byte write enable
– 18 Kb blocks that can be optionally programmed as two independent 9 Kb block RAMs
● Clock Management Tile (CMT) for enhanced performance
– Low noise, flexible clocking
– Digital Clock Managers (DCMs) eliminate clock skew and duty cycle distortion
– Phase-Locked Loops (PLLs) for low-jitter clocking
– Frequency synthesis with simultaneous multiplication, division, and phase shifting
– Sixteen low-skew global clock networks
● Simplified configuration, supports low-cost standards
– 2-pin auto-detect configuration
– Broad third-party SPI (up to x4) and NOR flash support
– Feature-rich Xilinx Platform Flash with JTAG
– MultiBoot support for remote upgrade with multiple bitstreams, using watchdog protection
● Enhanced security for design protection
– Unique Device DNA identifier for design authentication
– AES bitstream encryption in the larger devices
● Faster embedded processing with enhanced, low-cost MicroBlaze™ soft processor



● Industry-leading IP and reference designs

The Spartan-6 family summary is given in Table 3.3.

Table 3.3: Spartan-6 FPGA Feature Summary by Device

Device | Logic Cells (1) | CLB Slices (2) | CLB Flip-Flops | CLB Max Distributed RAM (Kb) | DSP48A1 Slices (3) | Block RAM 18 Kb Blocks (4) | Block RAM Max (Kb) | CMTs (5) | Memory Controller Blocks (Max) (6) | Endpoint Blocks for PCI Express | Maximum GTP Transceivers | Total I/O Banks | Max User I/O
XC6SLX4 | 3,840 | 600 | 4,800 | 75 | 8 | 12 | 216 | 2 | 0 | 0 | 0 | 4 | 132
XC6SLX9 | 9,152 | 1,430 | 11,440 | 90 | 16 | 32 | 576 | 2 | 2 | 0 | 0 | 4 | 200
XC6SLX16 | 14,579 | 2,278 | 18,224 | 136 | 32 | 32 | 576 | 2 | 2 | 0 | 0 | 4 | 232
XC6SLX25 | 24,051 | 3,758 | 30,064 | 229 | 38 | 52 | 936 | 2 | 2 | 0 | 0 | 4 | 266
XC6SLX45 | 43,661 | 6,822 | 54,576 | 401 | 58 | 116 | 2,088 | 4 | 2 | 0 | 0 | 4 | 358
XC6SLX75 | 74,637 | 11,662 | 93,296 | 692 | 132 | 172 | 3,096 | 6 | 4 | 0 | 0 | 6 | 408
XC6SLX100 | 101,261 | 15,822 | 126,576 | 976 | 180 | 268 | 4,824 | 6 | 4 | 0 | 0 | 6 | 480
XC6SLX150 | 147,443 | 23,038 | 184,304 | 1,355 | 180 | 268 | 4,824 | 6 | 4 | 0 | 0 | 6 | 576
XC6SLX25T | 24,051 | 3,758 | 30,064 | 229 | 38 | 52 | 936 | 2 | 2 | 1 | 2 | 4 | 250
XC6SLX45T | 43,661 | 6,822 | 54,576 | 401 | 58 | 116 | 2,088 | 4 | 2 | 1 | 4 | 4 | 296
XC6SLX75T | 74,637 | 11,662 | 93,296 | 692 | 132 | 172 | 3,096 | 6 | 4 | 1 | 8 | 6 | 348
XC6SLX100T | 101,261 | 15,822 | 126,576 | 976 | 180 | 268 | 4,824 | 6 | 4 | 1 | 8 | 6 | 498
XC6SLX150T | 147,443 | 23,038 | 184,304 | 1,355 | 180 | 268 | 4,824 | 6 | 4 | 1 | 8 | 6 | 540

Notes:
1. Spartan-6 FPGA logic cell ratings reflect the increased logic cell capability offered by the new 6-input LUT architecture.
2. Each Spartan-6 FPGA slice contains four LUTs and eight flip-flops.
3. Each DSP48A1 slice contains an 18 × 18 multiplier, an adder, and an accumulator.
4. Block RAMs are fundamentally 18 Kb in size. Each block can also be used as two independent 9 Kb blocks.
5. Each CMT contains two DCMs and one PLL.
6. Memory Controller Blocks are not supported in the -3N speed grade.
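The DSP48A1 feature list above mentions a pre-adder "to assist filter applications". The idea can be illustrated with a small software model: in a FIR filter with symmetric coefficients, the two samples that share a coefficient are added first, so each coefficient pair costs one multiplication instead of two. The sketch below is plain Python, not vendor code; the function names are hypothetical and it does not model the 18 × 18 operand widths or the 48-bit accumulator.

```python
def fir_direct(samples, coeffs):
    """Reference FIR filter: one multiplication per tap."""
    n_taps = len(coeffs)
    out = []
    for n in range(n_taps - 1, len(samples)):
        acc = 0
        for k in range(n_taps):
            acc += coeffs[k] * samples[n - k]
        out.append(acc)
    return out


def fir_symmetric_preadd(samples, coeffs):
    """Symmetric FIR filter: pre-add mirrored samples, then multiply once per pair.

    Returns the outputs and the number of multiplications performed.
    """
    n_taps = len(coeffs)
    if coeffs != coeffs[::-1]:
        raise ValueError("coefficients must be symmetric")
    half = n_taps // 2
    out = []
    multiplies = 0
    for n in range(n_taps - 1, len(samples)):
        acc = 0
        for k in range(half):
            # Pre-adder stage: the two samples that share coefficient k
            pre = samples[n - k] + samples[n - (n_taps - 1 - k)]
            acc += coeffs[k] * pre          # multiplier + accumulator stage
            multiplies += 1
        if n_taps % 2:                      # unpaired centre tap
            acc += coeffs[half] * samples[n - half]
            multiplies += 1
        out.append(acc)
    return out, multiplies


coeffs = [1, 2, 3, 2, 1]                    # illustrative symmetric coefficients
samples = list(range(10))                   # illustrative input data
direct = fir_direct(samples, coeffs)
folded, mult_count = fir_symmetric_preadd(samples, coeffs)
```

For this 5-tap example the folded form needs three multiplications per output sample instead of five, which is why a pre-adder in front of the multiplier roughly halves the number of DSP slices a symmetric filter needs.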


3.14 BOUNDARY SCAN

Testing PCBs can be done using a bed-of-nails tester. This approach becomes very difficult with closer IC pin spacing and more sophisticated assembly methods using surface-mount technology and multilayer boards. To simplify the problem of testing at the board level, the IEEE adopted the boundary-scan standard 1149.1. The Joint Test Action Group (JTAG) developed the standard; thus the terms JTAG boundary scan or just JTAG are commonly used. Many FPGAs contain a standard boundary-scan test logic structure with a four-pin interface (TCK, TMS, TDI, and TDO). By using these four signals, one can program the chip using in-system programming (ISP), as well as serially load commands and data into the chip to control the outputs and check the inputs. This is a great improvement over bed-of-nails testing.
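The four-pin interface is driven by a small state machine, the Test Access Port (TAP) controller defined in IEEE 1149.1: on each TCK edge, the value of TMS selects the next state, while TDI and TDO carry the serial data in the shift states. The following is a minimal Python model of that 16-state machine, offered as an illustration only; the dictionary and function names are hypothetical, and no data registers are modeled.

```python
# Next-state table of the IEEE 1149.1 TAP controller.
# Each entry maps a state to (next state if TMS = 0, next state if TMS = 1).
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN":   ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR":       ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN":   ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}


def tap_walk(state, tms_bits):
    """Apply one TMS value per TCK cycle and return the resulting TAP state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# Holding TMS high for five clocks resets the controller from any state.
reset_ok = all(tap_walk(s, [1] * 5) == "TEST_LOGIC_RESET" for s in TAP_NEXT)

# From reset, TMS = 0,1,0,0 reaches the state in which data is shifted via TDI/TDO.
to_shift_dr = tap_walk("TEST_LOGIC_RESET", [0, 1, 0, 0])

# From reset, TMS = 0,1,1,0,0 reaches the state in which an instruction is shifted in.
to_shift_ir = tap_walk("TEST_LOGIC_RESET", [0, 1, 1, 0, 0])
```

Because the whole protocol hangs off a single mode pin and a clock, the same four pins can serve board test, in-system programming, and on-chip debug.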

3.15  FPGA DESIGN FLOW

Fig. 3.35 FPGA Design Flow


The architectural features of the Xilinx Virtex and Spartan FPGA families are discussed in this chapter. To design systems using these FPGAs, Xilinx provides software tools (ISE/Vivado) together with the associated design concepts and procedures. An FPGA design flow is shown in Fig. 3.35. The tool finally generates a bitstream containing the configuration data, which is stored in a PROM/EPROM/EEPROM/Flash memory and must be loaded into the FPGA, either serially or in parallel, at power-on. The detailed format of the bitstream for a particular FPGA chip is usually considered proprietary to the FPGA vendor. When Virtex-4 or Virtex-5 FPGAs, or other FPGAs with a processor, are used, the design flow is as shown in Fig. 3.36, since apart from the FPGA design, software also needs to be written for the hard-core or soft-core processor.

Fig. 3.36 Design flow for an FPGA with either Hard core or Soft core processor.

3.16 FPGA VS ASIC

When a user wants to design an electronic product, such as an embedded system, that may go into mass production, three factors have to be considered before deciding whether to go for an FPGA or an ASIC.

1. Reprogrammability: Once an ASIC is designed and fabricated, it can be used only for that purpose, and no design changes for upgrades are possible. An FPGA-based product permits changes for upgrades through reprogrammability. This is an important factor.



2. Time to market the product: From the design concept to marketing the product, there are several phases that take time, as shown in Fig. 3.37.

Fig. 3.37 Time to market a Product. (FPGA vs. ASIC time-to-market, annotated "FPGA time-to-market is 9 months vs 2-3 years for ASIC" and "55% less time". ASIC phases shown: spec, design and verification, silicon prototype, system integration, silicon first production ship. FPGA phases shown: spec, design and verification, system integration. Each flow marks a "freeze design here" point. FPGA flexibility allows late changes and a higher chance of meeting customer needs.)

From Fig. 3.37 one can see that it takes much more time to market an ASIC-based product than an FPGA-based product. The figure also indicates that an FPGA-based product permits flexibility and allows late changes.

3. Cost: It is normally said that if a product is mass produced, it is cheaper to go for an ASIC.

Fig. 3.38 Cost vs Volume. (FPGA vs. ASIC cost: total cost plotted against volume, with curves labelled FPGA.09, FPGA.13, ASIC.09 and ASIC.13. Annotations: high volumes are needed for an ASIC to recover its design cost; ASIC design cost is much higher and increasing; ASIC cost per part is lower; for each technology advance, the crossover volume moves higher.)

From Fig. 3.38 it can be seen that at the 130 nm technology node the crossover point between FPGA and ASIC is at low volumes, and at 90 nm the crossover point has moved to a higher volume. Now that FPGAs are being manufactured at 28 nm, the crossover will have moved to such an extent that one may prefer an FPGA to an ASIC even for mass production.
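The crossover argument can be made concrete with a simple linear cost model: total cost = one-time design (NRE) cost + per-part cost × volume. The ASIC has the higher design cost and the lower per-part cost, so the two lines cross at one volume. The sketch below is illustrative only; every number in it is invented for the example and is not taken from Fig. 3.38 or from any vendor data.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost of producing `volume` parts: one-time design cost plus per-part cost."""
    return nre + unit_cost * volume


def crossover_volume(fpga_nre, fpga_unit, asic_nre, asic_unit):
    """Volume above which the ASIC becomes cheaper than the FPGA.

    Returns None if the ASIC never becomes cheaper (its per-part cost is not lower).
    """
    if fpga_unit <= asic_unit:
        return None
    return max(0.0, (asic_nre - fpga_nre) / (fpga_unit - asic_unit))


# Hypothetical figures for an older process node (assumed values).
older = crossover_volume(fpga_nre=50_000, fpga_unit=30.0,
                         asic_nre=1_000_000, asic_unit=5.0)

# Hypothetical figures for a newer node: ASIC design cost rises sharply,
# while both per-part costs fall (assumed values).
newer = crossover_volume(fpga_nre=50_000, fpga_unit=24.0,
                         asic_nre=3_000_000, asic_unit=4.0)
```

With these assumed inputs the crossover moves from 38,000 units to 147,500 units, which is the qualitative trend the figure describes: as ASIC design cost grows with each process generation, the volume needed to justify an ASIC grows with it.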

3.17 CONCLUSIONS

In this chapter, the architectural features of Xilinx FPGAs are presented and the comparison of FPGAs versus ASICs for applications is explained. Designing systems and products using Xilinx FPGAs will be detailed in chapter 8. Architectural features of other FPGAs, from ALTERA and ACTEL, will be discussed in chapters 4 and 5. Selected applications using Xilinx FPGAs with complete design details will form chapter 8.

REFERENCE Xilinx website www.xilinx.com

4 Altera FPGAs

In chapter 3, architectural details of Xilinx FPGAs were discussed in detail. The other FPGAs that are used extensively are those from ALTERA and ACTEL/Microsemi. In this chapter, only an overview of ALTERA FPGAs is presented, along with the developments that took place in the second and third decades. In the next chapter, an overview of ACTEL/Microsemi FPGAs is presented.

4.1 INTRODUCTION

Fig. 4.1 Flex 10K Block Diagram. (Blocks shown: I/O elements (IOEs), row and column interconnect, logic array blocks (LABs) containing logic elements (LEs) and local interconnect, and embedded array blocks (EABs) forming the embedded array.)

From their work on CPLDs, Altera designers concluded that the market needed a new kind of product. It should employ fine-grained logic elements, each containing a look-up table (LUT) and a register. But like CPLDs, it should use a hierarchy of deterministic interconnect to keep timing predictable and simple. Altera also added a third element, borrowed from certain types of gate arrays: a set of SRAM blocks embedded in the architecture, designed to serve as buffer or scratchpad memory, as register files, or as look-up tables for complex function generators. The result of this thinking was, in 1995, Altera’s first FPGA-like family, the FLEX® 10K device, which is shown in Fig. 4.1. The basic building block of both Altera and Xilinx devices is a 4-input look-up table (LUT), a flip-flop and some additional circuitry, which Altera calls a logic element (LE) and Xilinx calls a logic cell (LC). The FLEX 10K family is given in Tables 4.1 and 4.1(A).

Table 4.1: FLEX 10K Device Features.

Feature | EPF10K10, EPF10K10A | EPF10K20 | EPF10K30, EPF10K30A | EPF10K40 | EPF10K50, EPF10K50V
Typical gates (logic and RAM) (1) | 10,000 | 20,000 | 30,000 | 40,000 | 50,000
Maximum system gates | 31,000 | 63,000 | 69,000 | 93,000 | 116,000
Logic elements (LEs) | 576 | 1,152 | 1,728 | 2,304 | 2,880
Logic array blocks (LABs) | 72 | 144 | 216 | 288 | 360
Embedded array blocks (EABs) | 3 | 6 | 6 | 8 | 10
Total RAM bits | 6,144 | 12,288 | 12,288 | 16,384 | 20,480
Maximum user I/O pins | 150 | 189 | 246 | 189 | 310

Table 4.1(A): FLEX 10K Device Features.

Feature | EPF10K70 | EPF10K100, EPF10K100A | EPF10K130V | EPF10K250A
Typical gates (logic and RAM) (1) | 70,000 | 100,000 | 130,000 | 250,000
Maximum system gates | 118,000 | 158,000 | 211,000 | 310,000
Logic elements (LEs) | 3,744 | 4,992 | 6,656 | 12,160
Logic array blocks (LABs) | 468 | 624 | 832 | 1,520
Embedded array blocks (EABs) | 9 | 12 | 16 | 20
Total RAM bits | 18,432 | 24,576 | 32,768 | 40,960
Maximum user I/O pins | 358 | 406 | 470 | 470
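The logic element described above, a 4-input LUT followed by a flip-flop, can be modeled in a few lines. A 4-input LUT is simply a 16-entry truth table addressed by the four inputs; "programming" the element means filling in those 16 bits. The class below is an illustrative Python sketch with hypothetical names; it leaves out the "additional circuitry" (carry and cascade chains, clock enables, and so on) that real LEs and logic cells contain.

```python
class LogicElement:
    """Toy model of a logic element: a 4-input LUT feeding a D flip-flop."""

    def __init__(self, truth_table):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 configuration bits")
        self.lut = list(truth_table)   # configuration bits, as loaded from a bitstream
        self.q = 0                     # flip-flop output, cleared at start-up

    def lut_out(self, a, b, c, d):
        """Combinational output: the four inputs form the address into the table."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.lut[index]

    def clock(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the current LUT output."""
        self.q = self.lut_out(a, b, c, d)
        return self.q


# Configure the LUT as a 4-input XOR (odd-parity) function.
parity_table = [bin(i).count("1") % 2 for i in range(16)]
le = LogicElement(parity_table)

comb = le.lut_out(1, 0, 1, 1)      # three inputs high -> parity is 1
before = le.q                      # flip-flop still holds its reset value
after = le.clock(1, 0, 0, 0)       # one input high -> 1 is registered
```

Any function of four variables fits in one such element by changing only the table contents, which is what makes the LUT a convenient universal building block.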

In another break from company tradition, Altera engineers used SRAM cells, rather than their trusted EEPROM technology, to hold the FLEX chip’s configuration data. This choice meant that FLEX, like SRAM-based FPGAs, had to be configured from an external memory each time the system power came on. But it also meant that, without the need for space-gobbling EEPROM cells, FLEX logic elements would be smaller. And, an implication that would take a decade to bear fruit, the switch to SRAM meant that Altera could build its devices on leading-edge processes as soon as the new process was available, instead of waiting two years or more for the foundry to develop embedded EEPROM for the process. That change would eventually make FPGAs among the first designs in the semiconductor industry to use each new process node.

4.2 NEW APPLICATION, NEW TECHNIQUES

The FLEX 10K device and its successors added momentum to the PLD industry’s eternal wheel of evolution shown in Fig. 4.2. New architecture made possible new applications, which demanded new tools and techniques, which increased the demand for even newer architectures. With as many as 250K gate-equivalents, up to 40 Kbits of SRAM, and the ability to implement some quite complex state machines and arithmetic functions using its SRAM blocks, the FLEX device family went far beyond traditional interface and glue applications. Users could contemplate an entire subsystem, such as an Ethernet interface, its media access controller, and a protocol off-load engine; or perhaps an entire signal-processing accelerator with its bus interface, local storage, and hardware finite-impulse-response (FIR) filter, all on one FLEX chip. Despite Altera’s initial desire to differentiate FLEX devices from FPGAs, FPGA was becoming the generic term for large PLDs, and Altera increasingly adopted the term too.

Fig. 4.2 Circle of FPGA Evolution. (Cycle shown: architecture, applications, tools.)

As users began to think in subsystems, they found new needs. Subsystems often required multiple clocks, each clock synchronized to its own phase-locked loop (PLL). But discrete PLLs were expensive and ate up board space. So in 1996 Altera introduced a FLEX device with internal, programmable PLLs. The capability saved board space and could improve clock quality on the chip. But more important, it foreshadowed an enormous trend: as FPGAs became subsystems, they began to integrate blocks of commonly-used, non-PLD hardware to improve the overall system design.


Design techniques were also changing. In the late 1990s, FLEX devices with over 100K gates had simply outgrown traditional, clean-sheet-of-paper design techniques. Design teams broke their work into blocks, and tried to reuse previously-designed blocks instead of creating new ones. And teams began to license blocks from third-party intellectual-property (IP) developers. Altera began building IP libraries. And design tools adapted, adding features to help with IP reuse, such as the ability to set the parameters on a reusable IP block through a simple user interface, or the ability for a third-party vendor to provide an IP block in encrypted form.

The added capacity brought with it another issue: debug. For small PLDs, the most common debug technique had been the smoke test: plug in the chip and turn on the power. At 100K gates, the probability that the chip would work on the first try, or that the designer could learn anything useful by watching the external pins with a logic analyzer, was about nil. The smoke test had become useless. Designers began to mimic their relatives in the ASIC world, testing their register-transfer-level (RTL) code with an RTL simulator before they synthesized it into a netlist for the FPGA. But RTL that worked in simulation could still break in practice. Designers needed a way to observe the internal workings of the chip while it was running. With SRAM-programmed FPGAs it was possible to write ad-hoc debug circuitry into the RTL, resynthesize, and retest. But this was laborious and time-consuming. So in 1999, Altera introduced the SignalTap™ logic analyzer (similar to the ChipScope/Integrated Logic Analyzer (ILA) of Xilinx), a hardware feature that allowed users to monitor every register in a FLEX device while the system was running.

The SignalTap megafunction is a parameterized embedded logic analyzer that provides access to signals inside a device. The SignalTap architecture is based on a set of scalable logic analyzer primitives that provide input channels with a multi-level triggering sequence. The SignalTap megafunction can store acquisition data. Data captured by the logic analyzer can be transferred to a host computer using the MasterBlaster communications cable via a serial or universal serial bus (USB) port.
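The behaviour of an embedded logic analyzer with a multi-level trigger can be sketched in software: samples stream through a small circular buffer until a sequence of trigger conditions has been met in order, after which a fixed number of further samples is stored. The Python model below illustrates that general idea only; it is not SignalTap's actual implementation, and the function name and parameters are hypothetical.

```python
from collections import deque


def capture(samples, trigger_levels, pre_depth, post_depth):
    """Capture samples around a multi-level trigger.

    trigger_levels is a non-empty list of predicates that must be satisfied
    in order (level 1 first, then level 2, ...), each by a later sample than
    the previous level. Returns up to pre_depth samples before the final
    trigger, the triggering sample itself, and up to post_depth samples
    after it, or None if the trigger sequence never completes.
    """
    history = deque(maxlen=pre_depth)      # circular pre-trigger buffer
    level = 0                              # index of the level currently armed
    stream = iter(samples)
    for sample in stream:
        if trigger_levels[level](sample):
            level += 1
            if level == len(trigger_levels):
                # Final level hit: store the post-trigger samples and stop.
                post = [s for _, s in zip(range(post_depth), stream)]
                return list(history) + [sample] + post
        history.append(sample)
    return None


# Illustrative data: trigger when the value 3 is seen and, later, the value 7.
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 7, 1, 2, 3]
window = capture(data,
                 trigger_levels=[lambda x: x == 3, lambda x: x == 7],
                 pre_depth=2, post_depth=2)
```

In hardware, the circular buffer would be built from on-chip RAM and the predicates from comparators on the monitored signals; the stored window is what is later read out to the host.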

4.3 ENTER THE PROCESSORS, AND THE GREAT CRASH

One of the most requested IP blocks for these large FPGAs was some kind of microprocessor core. CPU core IP was commonly available, both as hardened netlists and as RTL, for ASIC designs. But CPUs had proved a challenge for FPGAs. Some structures vital to CPU cores, such as arithmetic units and multi-port register files, fit poorly into FPGA hardware. And timing closure on a CPU core’s critical paths could be difficult. So in 2000, Altera introduced Nios®, a microprocessor core designed from the ground up, by FPGA designers, to be implemented in a FLEX device. Unlike cores intended for cell-based ASICs, the 16-bit RISC Nios processor could be small, fast (50 MHz was blazingly fast for a CPU in FPGA fabric then), and easy to implement. It featured a 16-bit instruction set, user-selectable 16- or 32-bit data paths, and a library of standard soft peripherals configurable for a wide array of applications. An ecosystem of bus interfaces, peripherals, and software quickly began to form around the core.

Altera now recommends the Nios II soft processor, the successor to Nios, for its devices. The Nios II processor architecture is shown in Fig. 4.3. The Nios II embedded processor provides the flexibility to integrate memory, processors, peripherals, and other intellectual property (IP) for SOPC designs. SOPC Builder is a powerful system development tool that enables one to define and generate a complete system-on-a-programmable-chip (SOPC) in much less time than traditional, manual integration methods. The Nios II processor is a configurable, general-purpose 32-bit RISC processor that can be easily integrated with user logic and programmed into an Altera FPGA.

Nios II processors can implement up to 256 custom instructions that allow critical software subroutines to be implemented in hardware, under operational control of the Nios II embedded processor. One can use custom instructions to implement functions that would take many clock cycles in software but, in hardware, complete in as little as one clock cycle, increasing system performance and data throughput. To increase the efficiency of these custom instructions even further, Altera devices include M4K memory blocks for code and/or data storage and embedded 18 × 18 multipliers that can implement DSP functions.

M4K is a synchronous, true dual-port memory block, with registered inputs and optionally registered outputs, available in the Arria GX, Cyclone, Cyclone II, HardCopy II, Stratix, Stratix II, Stratix II GX, and Stratix GX device families. The M4K block is useful for storing processor code, implementing lookup schemes, and implementing large memory applications. Each block is a 128 × 36 RAM block and contains 4,608 programmable bits, including parity bits.
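The figure of 4,608 bits follows directly from the 128 × 36 organisation (128 × 36 = 4,608). The short sketch below works through the arithmetic for other port widths and gives a rough estimate of how many blocks a larger memory would need. It rests on two assumptions that are not stated in the text above: that 512 of the 4,608 bits are parity bits (one per data byte, leaving 4,096 data bits), and that the supported widths are ×1, ×2, ×4, ×8, ×9, ×16, ×18, ×32 and ×36. The function names are hypothetical, and the block estimate is a simplification, not the algorithm used by Altera's tools.

```python
from math import ceil

TOTAL_BITS = 128 * 36        # 4,608 bits per M4K block, from the text
DATA_BITS = 4096             # assumed: 4,608 minus 512 parity bits


def m4k_depth(width):
    """Number of words in one block at a given port width.

    Widths that are multiples of 9 are assumed to use the parity bits as data.
    """
    capacity = TOTAL_BITS if width % 9 == 0 else DATA_BITS
    return capacity // width


# Assumed list of supported port widths and the resulting depths.
CONFIGS = {w: m4k_depth(w) for w in (1, 2, 4, 8, 9, 16, 18, 32, 36)}


def blocks_needed(depth, width):
    """Rough estimate: fewest blocks needed to tile a depth x width memory."""
    return min(ceil(width / w) * ceil(depth / d) for w, d in CONFIGS.items())


code_ram_blocks = blocks_needed(1024, 32)    # e.g. a 1K x 32 (4 KB) code memory
```

Under these assumptions a 1K × 32 processor code memory occupies eight blocks, which gives a feel for how quickly on-chip memory is consumed by a soft processor's code and data storage.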
You can configure the M4K block as true dual-port, simple dual-port, or single-port RAM, as a FIFO buffer, or as ROM. A Memory Initialization File (.mif) or Hexadecimal (Intel-Format) File (.hex) can be used to pre-load the memory contents when the M4K memory block is configured as a RAM or ROM.

Fig. 4.3 Architecture of Nios II 32-bit RISC Soft Processor. (Blocks shown: instruction and data tightly coupled memories (TCM), instruction and data caches, interrupt controller, exception controller, custom instruction interface, MMU, MPU, JTAG debug, hardware breakpoints, instruction and data trace, trace port.)

The Nios II processor includes a library of peripherals that enable designers to turn a concept into a working design within minutes. These peripherals include:
● Serial interfaces (UART, SPI, JTAG UART)
● On-chip RAM and ROM, and interfaces to off-chip SRAM, flash, SSRAM, and SDRAM memories
● General-purpose parallel I/O (GPIO)
● Direct memory access
● Joint Test Action Group (JTAG) debug interface

Then, in mid-2000, the lights went out. The dot-com bubble, which had fed not only ridiculous stock prices but a huge build-out of Internet capacity, began its catastrophic collapse, carrying away much of the demand for semiconductors as it fell. FPGA companies were particularly hard-hit, as the bubble had made network-equipment vendors dominant users of FPGAs. Along with retrenching financially, FPGA vendors became more aggressive in pursuing applications outside the networking community.

In 2001, amid the rubble of the burst bubble, Altera chose the Embedded Systems Conference to introduce a new device, destined to be more important as a harbinger than long-lived as a product. The Excalibur™ processor united an APEX™ device family FPGA fabric (an evolution of the FLEX device architecture) with a cell-based ARM922T CPU core running at up to 200 MHz. Excalibur devices integrate an industry-standard ARM922T™ processor with debugging modules, on-chip memory, and peripherals with an APEX™ 20KE device-like architecture. This combination provides system performance of up to 200 MHz (210 Dhrystone MIPS) and an FPGA with embedded RAM, phase-locked loops (PLLs), and advanced I/O capabilities. The device represented a conscious attempt to tune an FPGA for embedded applications, and a major step forward in embedding key IP blocks into FPGA hardware.

4.4 TRENDS IN IP

As FPGAs grew and IP reuse became more established, more trends began to emerge. Certainly design teams wanted processor cores they could drop into a design. Also, they wanted cores for industry-standard interfaces. But new needs were showing up as well. For example, starting in the bandwidth-starved communications industry, system designers were beginning to abandon parallel interfaces with separate clocks, in favor of high-speed, self-clocking serial I/O. Signals between chips on a board were beginning to resemble the signals coming from a disk read-amplifier or a satellite receiver. And the transceiver circuits on these chips were correspondingly complex mixed-signal blocks, often running at gigahertz speeds. These transceivers were both specialized design tasks (outside the expertise of most FPGA users) and unsuited for implementation in programmable logic. Accordingly, in 2001 Altera announced the Mercury™ device family of FPGAs, with 1.25 Gbps transceivers built in as hard IP. The blocks included both the 1.25 GHz analog drivers and receivers and the mixed-signal clock-data recovery circuits that recreated the original data from the received waveform.


A similar process was taking place in the world of signal processing, both for wireless communications and in military applications. The most critical building block in these designs, the multiply-accumulator, was particularly taxing for programmable logic. Beginning with dedicated 8 × 8 multiplier sub-blocks in Mercury devices, Altera moved on to embed full digital signal processing (DSP) building blocks when it introduced the Stratix® device architecture in 2002.

Altera offered new levels of system integration with the Mercury™ device family, the world’s first programmable Application Specific Standard Product (ASSP). Mercury devices combine the I/O capability of an advanced clock data recovery (CDR)-enabled transceiver with a performance-optimized core. The Mercury device family also supports a variety of high-speed I/O standards, external memory interfacing, enhanced phase-locked loops (PLLs), and quad-port capable embedded system block (ESB) RAM.

Fig. 4.4 A Mercury Device Incorporated with Clock-Recovery Circuits for High-Speed Serial Interconnect. (Transmit side: the clock is embedded in the data and sent as a single stream over one differential pair. Receive side: a clock recovery unit recovers the clock from the data.)

Fig. 4.5 Altera Mercury Device family.


The Stratix® series of high-density, high-performance FPGAs uses Altera’s adaptive logic module (ALM) logic structure to provide a highly efficient logic fabric. Stratix V FPGAs combine an enhanced adaptive logic module with MultiTrack interconnect to provide a highly efficient, high-performance FPGA. After the FLEX 10K FPGA, Altera developed the Stratix, Cyclone and Arria FPGA families. These FPGAs are discussed in the following sections.

4.5 STRATIX FPGA

Architecture of Stratix IV is shown in Fig. 4.6.

Fig. 4.6 Stratix IV FPGA Architecture. (Blocks shown: DLLs, I/O banks, M9K blocks, DSP blocks, M144K blocks, PLLs, ALMs/MLABs, high-speed transceivers, hard IP for PCI Express.)

Table 4.2 presents the Stratix devices and Table 4.3 gives the features of the Stratix series.

Table 4.2: Stratix Series Introduction

Device Family | Stratix | Stratix GX | Stratix II | Stratix II GX | Stratix III | Stratix IV | Stratix V | Stratix 10
Year of introduction | 2002 | 2003 | 2004 | 2005 | 2006 | 2008 | 2010 | 2013
Process technology | 130 nm | 130 nm | 90 nm | 90 nm | 65 nm | 40 nm | 28 nm | 14 nm Tri-Gate

Table 4.3: Stratix Series Common Features

Devices compared: Stratix IV, Stratix V, Stratix 10

Features listed:
● Adaptive Logic Modules
● Transceivers
● Power technology
● DSP blocks
● External memory interfaces
● Embedded memory
● I/O performance
● Design security
● Single-event upset mitigation
● Remote system upgrades
● Partial reconfiguration
● 3D Tri-Gate Transistor Technology
● Hard Processor System
● HyperFlex™ architecture
● Heterogeneous 3D Solutions (SRAM, DRAM, and ASICs)
● Hard IEEE 754 single precision floating point

Key New Features in Stratix are:

● Random Logic: synchronous load/clear to each LE, dedicated cascade routing for LUTs, XOR gate for add/subtract capability
● Fast Add/Subtraction: XOR gate for add/subtract capability in one LE, carry select logic for architectural speedup of addition/subtraction
● High Level Arithmetic Functions: dedicated multiplier blocks in Stratix (18x18, signed or unsigned)
● DSP Support: configurable datapaths with dedicated multipliers and adder/accumulator (up to 56 bits)
● Memory: true dual port RAM blocks
● I/O: support for new differential I/O standards
● Clocking: hierarchical clock network instead of flat global clock network
● Vdd supply: Flex (versions at 5.0 V, 3.3 V, and 2.5 V); Stratix: 1.5 V
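The DSP support item above (18x18 signed multipliers feeding an accumulator of up to 56 bits) describes the multiply-accumulate operation at the heart of most signal-processing datapaths. The following is a minimal bit-accurate software sketch of that operation, not a description of the Stratix DSP block itself. The wrap-around behaviour on overflow and the function names are assumptions made for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(samples, coeffs, operand_bits=18, acc_bits=56):
    """Multiply-accumulate with 18-bit signed operands and a 56-bit accumulator.

    Assumption: the accumulator wraps around on overflow; a real DSP block may
    saturate or flag overflow instead.
    """
    acc = 0
    for x, c in zip(samples, coeffs):
        product = to_signed(x, operand_bits) * to_signed(c, operand_bits)
        acc = to_signed(acc + product, acc_bits)
    return acc
```

For example, `mac([1, 2, 3], [4, 5, 6])` accumulates 1·4 + 2·5 + 3·6. The wide accumulator matters because a single 18x18 product already needs 36 bits, so the extra 20 bits leave headroom for summing many products, as in a long FIR filter.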

4.6 CYCLONE FPGAS

Altera introduced the Cyclone® family of FPGAs to meet low-power, cost-sensitive design needs and to help designers get to market faster. Each generation of Cyclone FPGAs addresses the technical challenges of increased integration, increased performance, lower power, and faster time to market while meeting cost-sensitive requirements. Table 4.4 gives a brief introduction to the Cyclone FPGA series.

Table 4.4: Cyclone Series Introduction

Cyclone Device Family | Process Technology | Year Introduced | Recommended for New Designs
Cyclone V | 28 nm | 2011 | Yes
Cyclone IV | 60 nm | 2009 | Yes
Cyclone III | 65 nm | 2007 | Yes
Cyclone II | 90 nm | 2004 | Yes
Cyclone | 130 nm | 2002 | Yes

Architectures of Cyclone FPGAs: The architectures of Cyclone II and Cyclone III FPGAs are shown in Fig. 4.7 (a) and Fig. 4.7 (b).

Fig. 4.7 (a) Cyclone II FPGA Block Diagram.

Blocks shown in Fig. 4.7 (b): phase-locked loops, M9K memory blocks, logic array, embedded 18-bit × 18-bit multipliers, side I/O cells with support for LVDS signals up to 875 Mbps, and top and bottom I/O cells for memory interfaces up to 400 Mbps.

Fig. 4.7 (b) Cyclone III FPGA.

Table 4.5 compares Cyclone and Cyclone II, and Table 4.6 compares the features of Cyclone II and Cyclone III.

Table 4.5: Cyclone II & Cyclone Feature Comparison

Feature | Cyclone II | Cyclone
Cost-Optimized Architecture | 30 percent lower cost than Cyclone FPGAs (on a cost-per-LE basis) | Offers a mix of features, density, and performance at low cost
Process Technology | 90-nm, low-k dielectric process; built on 300-mm wafers | 0.13-µm, FSG dielectric process; built on 300-mm wafers
Core Voltage | 1.2 V | 1.5 V
I/O Voltage | 1.5 V, 1.8 V, 2.5 V, 3.3 V | 1.5 V, 1.8 V, 2.5 V, 3.3 V
Logic Density | 4,608 to 68,416 LEs | 2,910 to 20,060 LEs
I/O Pin Count | 85 to 622 | 65 to 301
Embedded Memory | 26 to 250 M4K RAM blocks, including 512 parity bits per block; up to 1.1 Mbits of on-chip memory | 13 to 64 M4K RAM blocks, including 512 parity bits per block; up to 288 Kbits of on-chip memory
External Memory Interface Support | Single data rate (SDR), double data rate (DDR), DDR2, QDRII | DDR, SDR
Digital Signal Processing (DSP) Implementation | Up to 150 embedded 18 × 18 multipliers (implemented using dedicated circuitry) | Up to 25 18 × 18 soft multipliers (implemented using LEs)
PLLs | 2 to 4 PLLs per device with up to 12 PLL outputs | 1 to 2 PLLs per device with up to 6 PLL outputs
Clock Networks | Up to 16 dedicated global clocks (GCLK) and 20 dual-purpose clocks per device | Up to 8 global clocks per device
I/O Standards Support | LVDS, mini-LVDS, LVPECL, RSDS, SSTL, HSTL, PCI, PCI-X, LVTTL, LVCMOS | LVDS, RSDS, SSTL, PCI, LVTTL, LVCMOS
Nios® II Embedded Processor Support | Yes | Yes
Packages | 144-pin TQFP; 208-pin PQFP; 240-pin PQFP; 256-pin FineLine BGA®; 484-pin Ultra FineLine BGA; 484-pin FineLine BGA; 672-pin FineLine BGA; 896-pin FineLine BGA | not listed

Table 4.6: Cyclone® II and Cyclone III Feature Comparison

Feature | Cyclone II | Cyclone III
Cost-Optimized Architecture | 30% lower cost than Cyclone FPGAs (on a cost-per-logic-element basis) | 20% lower cost than Cyclone II FPGAs (on a cost-per-logic-element basis)
Process Technology | 90-nm; low-k dielectric process; built on 300-mm wafers | TSMC's 65-nm low-power (LP) process; low-k dielectric process; built on 300-mm wafers
Core Voltage | 1.2 V | 1.2 V
I/O Voltage | 1.5 V, 1.8 V, 2.5 V, 3.3 V | 1.5 V, 1.8 V, 2.5 V, 3.3 V
Logic Density | 4,608 to 68,416 logic elements (LEs) | 5,136 to 119,088 LEs
I/O Pin Count | 85 to 622 | 82 to 535
Embedded Memory | M4K RAM blocks; up to 1.1 Mbits of on-chip memory; 216-MHz performance | M9K RAM blocks; up to 4 Mbits of on-chip memory; 260-MHz performance
External Memory Interface Support | SDR, DDR, DDR2, QDRII; 167-MHz DDR2 | SDR, DDR, DDR2, QDRII; 200-MHz DDR2
Digital Signal Processing (DSP) Implementation | Up to 150 18 × 18 multipliers | Up to 288 18 × 18 multipliers
PLLs | 2 to 4 phase-locked loops (PLLs) per device with up to 12 PLL outputs | 2 to 4 PLLs per device with up to 20 PLL outputs; PLLs can be cascaded; PLLs dynamically configurable
Clock Networks | Up to 16 dedicated global clocks (GCLK) and 20 dual-purpose clocks per device | Up to 20 dedicated global clocks
I/O Standards Support | LVDS, mini-LVDS, LVPECL, RSDS, SSTL, HSTL, PCI, PCI-X, LVTTL, LVCMOS | LVCMOS, LVPECL, LVDS, mini-LVDS, RSDS, PPDS, SSTL, HSTL, PCI-X, LVTTL; all standards supported on all banks; dedicated LVDS output buffers; LVDS TX 840 Mbps; LVDS RX 875 Mbps
Nios® II Embedded Processor Support | Yes | Yes
Packages | 144-pin TQFP; 208-pin PQFP; 240-pin PQFP; 256-pin FineLine BGA; 484-pin Ultra FineLine BGA; 484-pin FineLine BGA; 672-pin FineLine BGA; 896-pin FineLine BGA | 144-pin EQFP; 240-pin PQFP; 256-pin 1 mm pitch FBGA; 324-pin 1 mm pitch FBGA; 484-pin 1 mm pitch FBGA; 780-pin 1 mm pitch FBGA; 256-pin 0.8 mm pitch UBGA; 484-pin 0.8 mm pitch UBGA


Figure 4.8 shows the architecture of Cyclone IV.

Fig. 4.8 Architecture of Cyclone IV



Cyclone IV FPGAs are positioned as low-cost, low-power FPGAs, now with a transceiver variant. The Cyclone IV FPGA family is targeted at high-volume, cost-sensitive applications that must meet increasing bandwidth requirements while lowering costs. Cyclone IV GX FPGAs have up to eight integrated 3.125-Gbps transceivers, while Cyclone IV E FPGAs serve a wide spectrum of general logic applications.

The architecture of Cyclone V is shown in Fig. 4.9.

Fig. 4.9 Cyclone V Architecture

Cyclone V FPGAs are aimed at low system cost and low power for applications in the industrial, wireless, wireline, broadcast, and consumer markets. The family integrates an abundance of hard intellectual property (IP) blocks to enable you to do more with less overall system cost and design time. The SoC FPGAs in the Cyclone V family add a hard processor system (HPS) centered around the dual-core ARM® Cortex™-A9 MPCore™ processor, with a rich set of hard peripherals to reduce system power, system cost, and board size.

4.7 ARRIA FPGA

Altera's Arria® family targets performance and power efficiency in the midrange. The Arria family has a rich feature set of memory, logic, and digital signal processing (DSP) blocks, combined with transceivers of up to 28.05 Gbps that allow you to integrate more functions and maximize system bandwidth. An introduction to the Arria family is given in Table 4.7.

Table 4.7: Arria Family Introduction

Family | Year of introduction | Process technology
Arria GX | 2007 | 90 nm
Arria II GX | 2009 | 40 nm
Arria II GZ | 2010 | 40 nm
Arria V GX, GT, SX | 2011 | 28 nm
Arria V GZ | 2012 | 28 nm
Arria 10 GX, GT, SX | 2013 | 20 nm

The architectures of the Arria V and Arria 10 FPGAs are shown in Fig. 4.10 and Fig. 4.11.

Blocks shown in Fig. 4.10: HPS I/Os; ALMs; ARM Cortex-A9 MPCore HPS; distributed memory; variable-precision DSP blocks; PCIe Gen2 × 4 hard IP (GX, GT) and PCIe Gen3 × 8 hard IP (GZ); M10K internal memory blocks (GX, GT) and M20K internal memory blocks (GZ); fractional PLLs; hard IP per transceiver (PCS); high-speed serial transceivers; general-purpose I/Os (LVDS, memory interfaces); integrated multiport memory controllers (GX and GT only).

Fig. 4.10 Architecture of Arria V FPGA.

Blocks shown in Fig. 4.11: variable-precision DSP blocks with hardened floating point; M20K internal memory blocks; transceiver channels; hard IP per transceiver (8b/10b PCS, 64b/66b PCS, 10GBase-KR FEC, Interlaken PCS); I/O PLLs; fractional PLLs; PCI Express Gen3 hard IP; hard memory controllers, general-purpose I/O cells, LVDS; core logic fabric.

Fig. 4.11 Architecture of Arria 10 FPGA.

The Arria 10 family consists of the 10 GT, 10 GX and 10 SX devices. Their main features are:

● Arria 10 GT FPGAs: up to 96 full-duplex transceivers with data rates up to 28.3 Gbps chip-to-chip and 17.4 Gbps backplane, and up to 1,150K equivalent LEs
● Arria 10 GX FPGAs: up to 96 full-duplex transceivers with data rates up to 17.4 Gbps chip-to-chip and 16.0 Gbps backplane, and up to 1,150K equivalent LEs
● Arria 10 SX SoCs: a dual-core ARM Cortex-A9 HPS, up to 48 full-duplex transceivers with data rates up to 17.4 Gbps chip-to-chip and 16.0 Gbps backplane, and up to 660K equivalent LEs

Arria 10 is used in wireless backhaul, Optical Transport Network, military radar/FlexDAR and broadcast video-audio equipment applications.

A brief comparison of Cyclone, Arria and Stratix FPGAs is given in Table 4.8.

Table 4.8: Comparison of Cyclone, Arria and Stratix FPGAs

Cyclone FPGAs: Cyclone® series FPGAs are the industry's lowest cost, lowest power FPGAs, ideal for high-volume, cost-sensitive applications. Use a Cyclone series FPGA alone, as a digital signal processor, or as a cost-effective embedded processing solution. Cyclone series FPGAs offer a wide range of density, memory, embedded multiplier, and packaging options. Newer families include integrated transceiver options (at data rates up to 5G).

Arria FPGAs: Arria® series FPGAs provide an optimal balance of performance, power, and price for midrange transceiver-based applications. You'll find a rich feature set of functions (memory, logic, and DSP) combined with superior signal integrity in the devices. Arria series FPGAs feature on-chip transceivers that allow you to integrate more functions and maximize system bandwidth (at data rates up to 10G).

Stratix FPGAs: Stratix® series FPGAs are the industry's highest bandwidth, highest density FPGAs, ideal for high-end applications. Newer families come with integrated transceiver options (at data rates up to 28G). Stratix series FPGAs simplify the challenges of signal integrity by providing transceivers with best-in-class jitter characteristics. Features such as Programmable Power Technology keep total power in check.

4.8 AN EVOLVING METHODOLOGY

FPGAs were growing in capacity, and increasingly they included significant blocks of hard IP, such as PLLs, debug controllers, serial transceivers, and, in some applications, CPU cores. Customers increasingly created their designs by tying together blocks of previously designed IP. Some of these designs were application-specific accelerators, often for packet processing or signal processing, that implemented one powerful pipeline. But now another architecture was emerging as well: designs centered on a CPU core, with a system bus emanating from the CPU as the backbone upon which the other blocks hung. This design approach emphasized selection and verification of IP, and making the right connections, over creating new subsystems in Verilog. Altera responded with SOPC Builder, an interactive, guided user interface for constructing CPU-based systems on an FPGA. The user indicated which blocks to assemble where, and the tool generated the necessary RTL.

Once again evolutionary change was setting the stage for revolution. PLDs had grown from glue logic and bus interface components to self-contained packet-processing, signal-processing, or CPU-based subsystems. With enough logic capacity, the right IP, and the appropriate tools, FPGAs were ready to move from their subsystem role to become the heart of the system.

In the world of system design, the elements had quietly aligned for the next stage in the evolution of FPGAs. Use of the most aggressive CMOS technology had given FPGAs the logic density and speed to implement a CPU core and its peripherals in a single chip. Altera had released Nios II, a RISC CPU core optimized for FPGAs, and partners had developed FPGA implementations of other popular CPU cores as well. Avalon, a multi-master bus architecture tuned for use in programmable logic, normalized interconnects between CPUs and subsystems on the chip. And SOPC Builder, a tool that automated the process of assembling intellectual property (IP) into an FPGA-based SoC, reached the market.

This groundwork enabled an entirely new way of thinking about programmable logic. Designers would continue to build glue logic in CPLDs. Seekers of high performance would continue to implement faster, ever more powerful accelerators and subsystems in packet-switching, signal-processing, and related applications. But in addition, Altera's third decade would be the dawn of the FPGA as system on chip.

4.9 THE CPU-CENTRIC PHASE

Early in the decade, SoCs tended to follow a simple pattern, based on the board-level computers they were replacing. A SoC (Fig. 4.12) typically comprised a single CPU core, a local cache or tightly-coupled SRAM, a DRAM controller, an on-chip version of a microprocessor bus, and whatever peripheral controllers the application required. Applications might include in this picture a DMA controller or an application accelerator for some frequent but taxing task, such as data movement, cryptographic computations, or Fast Fourier Transforms (FFTs).

Blocks shown in Fig. 4.12: a CPU with its cache, a DRAM controller, and a DMA controller, all attached to a system bus that also serves three peripheral controllers.

Fig. 4.12 A Typical CPU-Centric SoC Design.

Implementing the SoC in an FPGA offered some valuable benefits. Designers could select just the hardware blocks they needed in the CPU core. Numerical accelerators could use the fast digital signal processing (DSP) blocks in Altera FPGAs to achieve arithmetic performance well beyond what the combination of a microprocessor and a DSP chip could reach. And a designer could implement custom accelerators using the programmable logic, DSP blocks, and RAM blocks embedded in the FPGA fabric. These accelerators could be designed either as units on the microprocessor bus or as independent flow-through processors, creating a data plane separate from the microprocessor's control plane.

An important advantage of this increased integration was energy efficiency. Hard functions such as RAM and DSP blocks in the FPGA were at least as energy-efficient as an equivalent ASIC or off-the-shelf function. Functions implemented in the programmable logic would generally, but not always, consume more power than their standard-product equivalents. But during this period I/O dominated energy consumption in many systems, and moving data through the FPGA fabric was not only vastly faster but also far more efficient than moving it across chip boundaries. By confining high-bandwidth data transfers inside the FPGA, system designers could often achieve very substantial net energy savings at the system level.

With the hardware and IP to support CPU-centric SoCs already in place, Altera focused on the tool flow. It was quickly apparent that the tool needs of SoC developers were different from those of traditional logic designers. Traditionally, designers of interfaces or data path components would express every detail of their design in VHDL or Verilog, and then follow each element through the steps of logic verification, mapping to the FPGA resources, and timing closure. But SoC designers focused at a more abstract level. Was the hardware fast enough and the on-chip RAM large enough? Were the bus and memory bandwidths adequate? Did the bus interfaces interoperate? With heavy IP reuse, the focus of design effort shifted from the overall SoC logic to writing software, and to creating one or two new blocks to drop into a design assembled from existing IP. In other words, SoC developers were thinking like system designers, not like chip designers.

One result of this shift in emphasis was Incremental Compilation, first introduced by Altera in 2005. Often, design effort would focus on one or two blocks in an SoC, while the majority of the hardware remained unchanged. Altera's Incremental Compilation feature allowed designers to rework one portion of a design, subject to fixed location and pin constraints, without having to run the entire design back through the tool chain. It not only saved compilation time, but also removed the risk of disturbing the portion of the hardware that was already working.

SoC designs also introduced a shift in the use of FPGA I/O pins. As bus bridges or accelerators, FPGAs tended to have data flowing through the chip in bursts or streams, usually from one standard bus into another. Typically there would be only a few clock domains, mostly defined by the busses. CPU-centric SoCs presented new requirements. There would often be a standard external bus, such as PCI or USB. But now the FPGA would be the originator of the bus, not simply a client on the bus. There would also, almost certainly, be a DRAM port, drawing FPGAs onto the challenging trajectory of DDR SDRAM interface technology. And there would likely be a number of serial or parallel connections between on-chip peripheral controllers and their external devices. This diversity could mean more pins, more signaling and voltage variety in the I/O, and more clock domains. These changes were reflected in the increasing complexity of FPGA I/O cells and clock networks.
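The idea behind Incremental Compilation in this section, reworking one block without sending the whole design back through the tool chain, can be sketched as a cache keyed on each partition's source. This is only a conceptual model under stated assumptions: it is not how Altera's tools are implemented, and the names `fingerprint`, `incremental_compile`, and `compile_partition` are hypothetical.

```python
import hashlib


def fingerprint(source):
    """Content hash used to detect whether a partition's source has changed."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def incremental_compile(partitions, cache, compile_partition):
    """Recompile only the partitions whose source changed since the last run.

    partitions: dict mapping partition name -> source text
    cache: dict mapping partition name -> (fingerprint, compiled result),
           updated in place so it can be reused on the next run
    compile_partition: callable(name, source) -> compiled result (hypothetical)
    Returns (results, recompiled_names).
    """
    results = {}
    recompiled = []
    for name, source in partitions.items():
        digest = fingerprint(source)
        cached = cache.get(name)
        if cached is not None and cached[0] == digest:
            results[name] = cached[1]  # unchanged: reuse the preserved result
        else:
            result = compile_partition(name, source)
            cache[name] = (digest, result)
            results[name] = result
            recompiled.append(name)
    return results, recompiled
```

In this model, editing only an accelerator block causes just that partition to be recompiled, while the CPU and DMA partitions keep their previous results untouched, mirroring the benefit the text describes of not disturbing hardware that already works.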

4.10 CORE AND MULTICORE

The treadmill of semiconductor process improvements continued to run, grinding out ever higher transistor densities. But during Altera's third decade, the mill became less and less able to produce higher circuit speeds. Accordingly, CPU manufacturers refocused: from ever-higher core clock frequencies to two, then four, and then more CPU cores on one die, that is, multicore architectures. SoC designers quickly followed, both in ASIC designs and in FPGAs.

Multicore thinking had two significant threads in FPGA use. One thread simply replicated CPU cores. By this time it was relatively easy to compile several processor cores into one FPGA. It was less simple, though, to figure out how to connect them. Here, programmable logic offered an embarrassment of riches, as architects could implement virtually anything from arrays of tightly-coupled cores to shared L2 cache architectures to independent CPUs on the multimaster Avalon® bus.

The second line of multicore thinking led down a different path: heterogeneous systems. The same bus, IP, and tools that made multiple instances of one CPU core feasible made combinations of a CPU core and multiple, peer-level accelerators just as possible. And this, in turn, led to an entirely different way of thinking about multicore design: a software-centric approach (Fig. 4.13).

Blocks shown in Fig. 4.13: a CPU and two accelerators, each with its own cache, together with a DMA controller and a DRAM controller, all attached to a system bus that also serves three peripheral controllers.

Fig. 4.13 A Heterogeneous Multicore SoC Design.

Planning a homogeneous multicore system can be, to vastly oversimplify, pretty straightforward. Figure out how many times faster than single-CPU speed you need to go. Put in that many more CPUs, and maybe an extra one or two to account for inefficiencies. Choose an interconnect architecture based on the level of memory sharing between threads that you expect. Divide up your software threads among the CPUs, simulate the system, and repeat until it works within specs. This process remains firmly hardware-centric: selecting an architecture, implementing it, and then dividing up the code to fit the hardware.

But the ability to create one's own accelerators opens up a whole new methodology. It goes like this. Profile your code to find the hot spots. For the nastiest code segments, create custom accelerators that will save both CPU cycles and energy. Simulate the system, and return to the profiling step and repeat, until the performance requirements are met. This approach starts with working software on one CPU core, and generates a constellation of hardware accelerators customized to the actual system software. For the first time, the system becomes a reflection of the software requirements, rather than a Procrustean bed into which the software must be forced.

In 2006, Altera introduced two innovations that supported this heterogeneous multicore design style. One was a compiler that would transform a block of executable ANSI C code into an accelerator optimized to work with a Nios® CPU core in an Altera FPGA. This C-to-Hardware Acceleration (C2H) compiler tool automated one of the most time-consuming and error-prone steps in software-centric design: generation of the accelerators.

The second innovation was less obvious. If you compare the power consumption of a fast single-core processor to that of an equivalent cluster of slower-clocked processors and accelerators, dynamic power should go down sharply because of the efficiencies of the accelerators. But leakage, a growing problem throughout the decade, increases with the total number of transistors, regardless of circuit activity. So leakage currents could take away much of the energy efficiency that multicore design provided. Altera responded to this problem with Programmable Power. This combination of hardware and software-tool features selects slower, low-leakage circuits for non-critical timing paths, minimizing leakage current in the FPGA while still achieving timing closure. The result could recapture the big energy gains that heterogeneous multicore design had to offer, despite the higher leakage of deep-submicron processes.
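The software-centric loop described in this section (profile, accelerate the hottest segment, re-evaluate, repeat until the requirement is met) can be sketched as a simple greedy procedure. The sketch below makes several assumptions for illustration: the profile is a table of per-segment run times, each candidate accelerator has a known speedup factor, and the effect of an accelerator is estimated by dividing the segment time rather than by re-simulating the system. All names and numbers are hypothetical.

```python
def accelerate_until_fast_enough(profile, speedups, target_time):
    """Greedy model of the profile/accelerate/repeat loop.

    profile: dict mapping code segment -> run time on the CPU (seconds)
    speedups: dict mapping segment -> assumed speedup of a custom accelerator
    target_time: required total run time (seconds)
    Returns (segments accelerated in order, estimated total run time).
    """
    times = dict(profile)
    accelerated = []
    while sum(times.values()) > target_time:
        # "Profile" step: find the hottest segment not yet accelerated.
        candidates = [s for s in times if s in speedups and s not in accelerated]
        if not candidates:
            break  # nothing left to accelerate; the requirement cannot be met
        hot = max(candidates, key=lambda s: times[s])
        # "Accelerate" step: estimate the new time for that segment.
        times[hot] = times[hot] / speedups[hot]
        accelerated.append(hot)
    return accelerated, sum(times.values())
```

With a profile of `{"fft": 6.0, "crc": 3.0, "ui": 1.0}`, assumed speedups of 10 for `fft` and 5 for `crc`, and a 2.5-second target, the loop accelerates `fft` first and then `crc`. The untouched `ui` segment then dominates the remaining time, which is the usual reason the loop returns to profiling rather than assuming the first hot spot is the only one.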

4.11 CONSENSUS AND HARDENING

A final phase marked the closing years of Altera's third decade: the growth of consensus on IP selection. Gradually, the system design community is tightening its focus on specific solutions to some of its most pressing problems. In particular, C has become nearly ubiquitous among embedded-system developers, ARM® cores are coming to dominate embedded computing, and a relatively small number of interface standards are coming to dominate specific uses, such as high-speed system busses, backplane connections, and inter-chip connections. That focus is allowing Altera to innovate in its support of these solutions.

One example is in the way programmers express parallelizable chunks of code. C, while it is sufficient to define a sequential procedure to implement a task, cannot express the opportunities for parallelism that a skilled programmer can find. But a C derivative called OpenCL™ can. In 2011, Altera introduced a set of tools that allowed programmers to write parallel algorithms in the increasingly popular OpenCL, and translate them, without specialized knowledge of FPGA design, into parallel hardware in the FPGA and control code on a conventional CPU.

The growing consensus around the use of ARM Cortex™-A-class CPU cores in multicore SoCs enabled a second innovation. As long as every design team wanted a different CPU, FPGA vendors had to meet these needs with soft cores implemented in the programmable logic. But that flexibility had its costs: logic-element consumption, power consumption, and lower speed.
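The contrast drawn in this section between sequential C and OpenCL can be made concrete with a small model. In OpenCL, the programmer writes the body of one work-item, indexed by a global ID (obtained with `get_global_id(0)` in real OpenCL C), and declares how many work-items exist; because the work-items are independent, the hardware is free to run them in any order or all at once. The sketch below imitates that structure in plain Python purely for illustration. It is not OpenCL code and not Altera's tool flow, and `run_ndrange` is a hypothetical stand-in for the runtime.

```python
import random


def saxpy_sequential(a, x, y):
    """Sequential form: the loop fixes an order the algorithm does not need."""
    out = []
    for i in range(len(x)):
        out.append(a * x[i] + y[i])
    return out


def saxpy_kernel(gid, a, x, y, out):
    """Kernel form: the work of ONE work-item, which touches only index gid."""
    out[gid] = a * x[gid] + y[gid]


def run_ndrange(kernel, global_size, *args, shuffle=True):
    """Hypothetical stand-in for a runtime launching global_size work-items.

    The IDs are visited in random order to show that the result does not
    depend on execution order, which is what allows parallel hardware.
    """
    ids = list(range(global_size))
    if shuffle:
        random.shuffle(ids)
    for gid in ids:
        kernel(gid, *args)
```

Running `run_ndrange(saxpy_kernel, len(x), a, x, y, out)` fills `out` with the same values as `saxpy_sequential(a, x, y)` regardless of the order in which the work-items execute. That order independence is the information a sequential loop hides and a kernel makes explicit.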


Altera responded to a specific trend: the use of the Cortex-A9 in a growing number of embedded and wireless applications. In 2012, the company began sampling an FPGA with an on-die Hard Processor System (HPS): a dual-core Cortex-A9 cluster with its own caches, local RAM, optimized memory controller, and selected peripheral controllers, all in ASIC-style cell-based hardware. The block diagram of this Hard Processor System is shown in Fig. 4.14.

Fig. 4.14 Hard Processor System (HPS) Features.

HPS Features:

● 925 MHz, dual-core ARM® Cortex™-A9 MPCore™ processor



● Each processor core includes:



o 32 KB of L1 instruction cache, 32 KB of L1 data cache



o Single- and double-precision floating-point unit and NEON™ media engine



o CoreSight™ debug and trace technology



● 512 KB of shared L2 cache



● 64 KB of scratch RAM



● Multiport SDRAM controller with support for DDR2, DDR3, and LPDDR2 and optional error correction code (ECC) support



● 8-channel direct memory access (DMA) controller



● QSPI flash controller



● NAND flash controller with DMA



● SD/SDIO/MMC controller with DMA




● 2 × 10/100/1000 Ethernet media access control (MAC) with DMA



● 2 × USB On-The-Go (OTG) controller with DMA



● 4 × I2C controller



● 2 × UART



● 2 × serial peripheral interface (SPI) master peripherals, 2 × SPI slave peripherals



● Up to 134 general-purpose I/O (GPIO)



● 7 × general-purpose timers



● 4 × watchdog timers

The chip architects took particular care to optimize the interconnect between the processor subsystem and the programmable logic fabric for implementing heterogeneous multicore systems.

This increasing convergence between multicore processor systems and FPGAs led to one more major innovation. In 2013, Altera announced that its next generation of high-end FPGAs would be fabricated not by a traditional foundry partner, but by Intel Corp., using a 14 nm Tri-Gate process whose heritage was Intel's own CPUs and SoCs. This shift from the ASIC-oriented foundry market to the foundry arm of a CPU specialist in effect put Altera's FPGAs on a separate power-performance trajectory, optimizing the semiconductor process characteristics that are vital to processing elements, local RAM, and high-speed interconnects, rather than optimizing across the much wider space that a broad-market ASIC foundry must serve. Altera believes that the result of this choice will be a discontinuity in the performance and energy consumption patterns that have dominated the FPGA industry for years. It is a promising way to start a new decade.

4.12 CONCLUSIONS

Altera offers customers a broad spectrum of FPGAs geared towards diverse markets and applications. This chapter presented an overview of Altera's progression from glue-logic FPGAs to subsystems, and of the developments in its second and third decades. At the time of writing this chapter, news broke that Intel Corporation had acquired Altera.

REFERENCES

1. http://www.altera.com/technology/system-design/articles/2013/glue-logic-subsystem.html



2. http://www.altera.com/technology/system-design/articles/2013/third-decade-fpga-soc.html

5 Microsemi FPGAs

5.1 INTRODUCTION

Historically, Microsemi FPGAs were sold under the brand name Actel, a company that was later acquired by Microsemi Corporation. Actel was founded in 1985 and became known for its high-reliability, antifuse-based FPGAs, dominating the military and aerospace markets. In 2000, Actel acquired GateField, which expanded Actel's antifuse FPGA offering to include flash-based FPGAs. In 2004, Actel announced it had shipped the one-millionth unit of its flash-based ProASICPLUS FPGA.

In 2005, Actel introduced a new technology known as Fusion to bring FPGA programmability to mixed-signal solutions. Fusion was the first technology to integrate mixed-signal analog capabilities with flash memory and FPGA fabric in a monolithic device. In 2006, to address the tight power budgets of the portable market, Actel introduced the IGLOO FPGA. The IGLOO family of FPGAs was based on Actel's nonvolatile flash technology and the ProASIC3 FPGA architecture. Two new IGLOO derivatives were added in 2008: IGLOO PLUS FPGAs with enhanced I/O capabilities, and IGLOO nano FPGAs, the industry's lowest power solution at 2 µW. A nano version of ProASIC3 also became available in 2008.

In 2010, Actel introduced the SmartFusion line of FPGAs. SmartFusion includes both analog components and a programmable flash-based logic fabric within the same chip. SmartFusion was the first FPGA product to additionally include a hard ARM processor core. In November 2010, Actel was acquired by Microsemi Corporation. Hence, in the remaining part of the chapter, all the antifuse and flash based FPGAs are referred to as Microsemi FPGAs.

After Microsemi acquired Actel, it added the next-generation FPGAs and SoC FPGAs SmartFusion2 and IGLOO2 to its portfolio. SmartFusion2 SoC FPGAs integrate a mainstream FPGA fabric and an ARM Cortex-M3 microcontroller with other subsystems on a single chip. IGLOO2 FPGAs are mainstream FPGAs with best-in-class security, low power and immunity to single event upset (SEU).

5.2 TECHNOLOGIES

Microsemi's portfolio of FPGAs is based on two types of technologies: antifuse-based FPGAs (Axcelerator, SX-A, eX, and MX families) and flash-based FPGAs (Fusion, IGLOO, ProASIC3 and SmartFusion families). The latest flash-based devices include SmartFusion2 and IGLOO2. Microsemi's antifuse FPGAs have been known for their nonvolatility, live-at-power-up operation, single-chip form factor and security. Microsemi's flash-based FPGA families share these characteristics and are also reprogrammable and low-power. Microsemi also develops system-critical FPGAs (RTAX and ProASIC3 families), including extended temperature, automotive, military, and aerospace FPGAs, plus a wide variety of space-class radiation-tolerant devices. These flash and antifuse FPGAs have high levels of reliability and firm-error immunity.

5.3 THE ANTIFUSE/FLASH ADVANTAGE OVER SRAM BASED FPGAs

Microsemi's antifuse/flash-based devices are low-cost, high-performance solutions for today's logic designer. Ideal for integrating logic typically implemented in multiple CPLDs, PALs and FPGAs, antifuse devices offer significant cost savings while maintaining high performance. In addition, the Microsemi antifuse/flash technology ensures design security and gets a design to market faster than an ASIC. Microsemi antifuse/flash FPGAs combine the benefits of programmable logic and ASICs to form a programmable ASIC solution.

Microsemi FPGAs offer several advantages over SRAM based FPGAs. These advantages are inherited from the architecture of the flash based configuration cells. They are:

● Lower Total-Cost-of-Ownership
● Instant-on
● Low-Power
● Reliability

5.3.1 Lower-Total-Cost-of-Ownership

Microsemi's antifuse/flash FPGAs are nonvolatile, enabling them to retain their configuration indefinitely and without an external storage device. With no external storage device necessary to hold the configuration data, the need for a PROM or microprocessor and the associated board space is eliminated, providing additional cost savings and a smaller system footprint.


5.3.2 Instant-on

Low cost does not mean low performance. With its low resistance and low capacitance properties, Microsemi's antifuse technology offers very high speeds. And since Microsemi devices hold their configuration in non-volatile configuration cells, they are operational immediately on power-up. There is no boot-up period while data is being downloaded from an external device. FPGA programming technologies (SRAM and antifuse) are shown in Fig. 5.1(a) and Fig. 5.1(b).

SRAM A

SRAM cell

Un-programmed Anti-fuse B (open)

Conductor A

A

B

B A

Programmed Anti-fuse

B

(closed)

SRAM programed with 1/0 to enable/disable connection between A and B

Insulator (oxide)

Anti-fuse programmed to enable connection between A and B

Fig. 5.1(a) SRAM, Antifuse, Flash FPGA Programming Technologies

Antifuse Switch

● Antifuses are originally open circuits that take on low resistance only when programmed.



● Antifuses are manufactured using modified CMOS technology.

Fig. 5.1(b) Microsemi Antifuse Switch.

Antifuse Advantages

Table 5.1 Advantages of Antifuse/flash technologies.

● Highest density
– a simple cross point
– 10X the density of SRAM

● Lowest switch resistance
– ~25 Ohms

● Very low capacitance
– ~1 fF per node
– approaching the metal line capacitance

● Nearly impossible to reverse engineer

● Radiation hard

● Live within 1 millisecond of the power supply reaching spec voltage

● Software is easy to place and route

● Non-volatile

Table 5.2 shows comparisons for the three major kinds of FPGA technologies:

Table 5.2 Comparison of SRAM, Antifuse, Flash Technology FPGAs

| Feature | SRAM | Antifuse | E2PROM/Flash |
|---|---|---|---|
| Technology node | State-of-the-art | One or more generations behind | One or more generations behind |
| Reprogrammable | Yes (in system) | No | Yes (in-system or offline) |
| Reprogramming speed (inc. erasing) | Fast | - | 3 × slower than SRAM |
| Volatile (must be programmed on power-up) | Yes | No | No (but can be if required) |
| Requires external configuration file | Yes | No | No |
| Good for prototyping | Yes (very good) | No | Yes (reasonable) |
| Instant-on | No | Yes | Yes |
| IP Security | Acceptable (especially when using bitstream encryption) | Very Good | Very Good |
| Size of configuration cell | Large (six transistors) | Very small | Medium-small (two transistors) |
| Power consumption | Medium | Low | Medium |
| Rad Hard | No | Yes | Not really |

5.3.3 Low-Power As the configuration is already stored inside the device and does not need to be loaded from external memory, antifuse and flash based devices have zero inrush current and zero configuration current. Flash cells also offer lower static power than SRAM cells. In the majority of applications that use duty-cycle based operation (the device performs a task and then goes to a low-power mode), static power largely determines overall system power consumption. Microsemi devices support a Flash*Freeze mode that further reduces power when the device goes to low-power mode.
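The weight of static power in a duty-cycled design can be seen with a short calculation. The following Python sketch is illustrative only: the function name and all power figures are assumptions chosen for the example and are not taken from any Microsemi datasheet.

```python
def average_power_mw(active_mw: float, sleep_mw: float, duty_cycle: float) -> float:
    """Time-weighted average power for a device that is active for a
    fraction `duty_cycle` of the time and in low-power mode otherwise."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw


# Hypothetical numbers: 50 mW while active, active 1% of the time.
high_static = average_power_mw(active_mw=50.0, sleep_mw=20.0, duty_cycle=0.01)
low_static = average_power_mw(active_mw=50.0, sleep_mw=0.5, duty_cycle=0.01)

print(f"high static power device: {high_static:.3f} mW average")
print(f"low static power device:  {low_static:.3f} mW average")
```

With a 1% duty cycle the average is dominated by the low-power-mode term, which is why a lower static power translates almost directly into lower system power.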

5.3.4 Reliability Flash and antifuse devices have a zero failure-in-time (FIT) rate for configuration upsets, as flash cells are immune to single event upsets caused by neutron strikes. That makes them the most reliable FPGAs for aviation applications and for systems that need to operate in harsh environments.

5.4 FPGA SECURITY Security in FPGAs can be classified into two parts:

● Design Security

● Data Security

5.4.1 Design Security Design security is a necessity in today’s highly competitive technology market to protect designs from theft. SRAM-based FPGAs lack the key capabilities required to create a trusted and secure hardware platform for a secure embedded system, making them vulnerable to cloning, copying, and reverse engineering. Sensitive customer data can be attacked and embedded systems compromised. Microsemi’s FPGAs offer that security. Microsemi devices do not need a start-up bitstream, eliminating the possibility of configuration data being intercepted. This also prevents the in-system errors and accidental data erasures that can otherwise occur during download. Added to that is the inherent security of the antifuse technology itself: the antifuses that form the interconnections within a Microsemi FPGA are extremely small, are densely distributed throughout the device (over 6.5 million on the largest Microsemi device), and do not leave an observable signature that can be electrically probed or visually inspected. With these safeguards, Microsemi devices are virtually immune to copying and reverse engineering.

5.4.2  Data Security Data security is primarily about the protection of user data from unauthorized access. Microsemi devices provide various encryption methods that are useful in protecting user data.

The Zeroization feature on IGLOO2 and SmartFusion2 devices erases user data and configuration if tampering is detected. Various protection levels can be set by the designer based on the security needs. Security in Xilinx, Altera and Microsemi FPGAs is summarised in Table 5.3.

Table 5.3: Security Provided in Xilinx, Altera and Microsemi FPGAs

| Feature | Microsemi | Xilinx | Altera |
|---|---|---|---|
| Data Security | | | |
| Licensed Patent Protected DPA Pass Through License | Yes | No | No |
| Key Storage Using Physically Uncloneable Function (PUF) | Yes | No | No |
| Hardened Security for ECC, AES, True RNG, SHA and HMAC | Yes | No | No |
| Design Security | | | |
| X.509 Signed Digital Certificate for Supply Chain Assurance | Yes | No | No |
| Tamper Detection with an Active Mesh and Countermeasures | Yes | No | No |
| Key Storage | Secure Flash | Fuse or battery backed | Fuse or battery backed |
| Bitstreams exposed to Monitoring | Only during programming | On every power-up | On every power-up |
| Bitstream Authentication | Yes | Yes | No |
| Secure Hardware | | | |
| Licensed Patent Protected DPA Countermeasures | Yes | No | No |
| Random Number, ECC and PUF | Yes | No | No |
| NIST Certification for ECC, SHA, AES, DRBG and HMAC | Yes | AES, SHA, HMAC | AES only |

ECC: Elliptic Curve Cryptography, SHA: Secure Hash Algorithm, AES: Advanced Encryption Standard, DRBG: Deterministic Random Bit Generation, HMAC: Hash-based Message Authentication Code, DPA: Differential Power Analysis.

FPGA security is further discussed in chapter 9.


5.5 FASTER TIME-TO-MARKET The Microsemi SmartFusion2 and IGLOO2 devices offer a host of other benefits that simplify the design cycle and speed the design to market. Using Microsemi’s automatic place-and-route tools, 100% logic utilization is possible, shortening design time. Microsemi’s general and local routing structure allows 100% pin-locking even at full logic utilization, so the PCB can be developed concurrently with the FPGA design. During verification, Microsemi devices can be observed on the board and in real time using the Silicon Explorer diagnostic tool, decreasing verification time.

5.6 ANTIFUSE FPGAS OF MICROSEMI Antifuse products of Microsemi are shown in Fig. 5.2.

[Figure: Actel antifuse products, plotted as system performance (MHz) against system gates (10k, 50k, 100k) for MX FPGAs, SX-A/SX FPGAs and future antifuse products.]

Fig. 5.2 Antifuse products of Microsemi.

Ideal for integrating designs that were previously segmented between CPLDs for speed and FPGAs for capacity, SX-A/SX devices offer a low-cost, single-chip solution. Designs can now be easily integrated into a single SX-A/SX FPGA, improving system reliability and simplifying system integration. Integration also typically results in 1/3 the power consumption of comparable CPLDs and FPGAs. And, of course, fewer components mean reduced production costs associated with component counts and board space.


5.6.1 SX-A/SX Architecture The SX-A/SX architecture is shown in Fig. 5.3. The C and R cells are connected by three kinds of routing resource:

● Direct Connect: no antifuses, 0.1 ns routing delay

● Fast Connect: one antifuse, 0.4 ns routing delay

● Routing Segments: typically 2 antifuses, maximum 5 antifuses

Fig. 5.3 SX-A/SX Architecture

Cells C and R are explained in Fig. 5.4 and 5.5.

Actel SX-A and SX Family

● The C-cell implements a range of combinatorial functions of up to 5 inputs.

● Inclusion of the DB input and its associated inverter function dramatically increases the number of combinatorial functions that can be implemented in a single module, from 800 options in previous architectures to more than 4,000 in the SX architecture.

Fig. 5.4 C cell (signals labelled in the figure: D0, D1, D2, D3, DB, A0, B0, A1, B1, Sa, Sb and output Y)

● The R-cell contains a flip-flop featuring
– asynchronous clear
– asynchronous preset
– clock enable (using the S0 and S1 lines) control signals

● The R-cell registers feature programmable clock polarity, selectable on a register-by-register basis.

Fig. 5.5: R Cell (signals labelled in the figure: Routed Data Input, Direct Connect Input, S0, S1, PSETB, CLRB, HCLK, CLKA, CLKB, Internal Logic, CKS, CKP, D, Q and output Y)

5.6.2 MX FPGAs With capacities ranging from 3,000 to 54,000 system gates, 250 MHz system performance, and clock-to-out delays as low as 5.6 ns, Microsemi’s MX family of FPGAs offers high-performance 5.0 V solutions. The Microsemi MX family is shown in Fig. 5.6.

Actel MX Family

● Low power consumption

● 5.0V, 3.3V and mixed voltage systems compatible

● Design security

MX Selector Guide

| | MX02 | MX04 | MX09 | MX16 | MX24 | MX36 |
|---|---|---|---|---|---|---|
| System Gates | 3,000 | 6,000 | 13,500 | 24,000 | 36,000 | 54,000 |
| Typical Gates | 2,000 | 4,000 | 9,000 | 16,000 | 24,000 | 36,000 |
| Max I/O | 57 | 69 | 104 | 140 | 176 | 202 |
| Max Flip-Flops | 147 | 273 | 516 | 928 | 1419 | 1822 |
| Logic Modules | 295 | 547 | 684 | 1232 | 1890 | 2438 |

Fig. 5.6 Microsemi MX Family

MX Family devices are given in Table 5.4.

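Table 5.4: Microsemi MX Family Devices.

Actel MX Family: 40MX and 42MX devices

| Device | A40MX02 | A40MX04 | A42MX09 | A42MX16 | A42MX24 | A42MX36 |
|---|---|---|---|---|---|---|
| Capacity: System Gates | 3,000 | 6,000 | 14,000 | 24,000 | 36,000 | 54,000 |
| Capacity: SRAM Bits | - | - | - | - | - | 2,560 |
| Logic Modules: Sequential | - | - | 348 | 624 | 954 | 1,230 |
| Logic Modules: Combinational | 295 | 547 | 336 | 608 | 912 | 1,184 |
| Logic Modules: Decodes | - | - | - | - | 24 | 24 |
| Clock-to-Out | 9.5 ns | 9.5 ns | 5.6 ns | 6.1 ns | 6.1 ns | 6.3 ns |
| SRAM Modules (64×4 or 32×8) | - | - | - | - | - | 10 |
| Dedicated Flip-Flops | - | - | 348 | 624 | 954 | 1,230 |
| Maximum Flip-Flops | 147 | 273 | 516 | 928 | 1,410 | 1,822 |
| Clocks | 1 | 1 | 2 | 2 | 2 | 6 |
| User I/O (maximum) | 57 | 69 | 104 | 140 | 176 | 202 |
| PCI | - | - | - | - | Yes | Yes |
| Boundary Scan Test (BST) | - | - | - | - | Yes | Yes |
| Packages (by pin count): PLCC | 44, 68 | 44, 68, 84 | 84 | 84 | 84 | - |
| PQFP | 100 | 100 | 100, 160 | 100, 160, 208 | 160, 260 | 208, 240 |
| VQFP | 80 | 80 | 100 | 100 | - | - |
| TQFP | - | - | 176 | 176 | 176 | - |
| CQFP | - | - | - | - | - | 208, 256 |
| PBGA | - | - | - | - | - | 272 |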

We now discuss Microsemi’s flash-technology FPGAs.

5.7 PROASIC3 FPGA OVERVIEW The ProASIC®3 series of low cost, low power FPGAs, which includes ProASIC3/E, ProASIC3 nano, and ProASIC3L, offers a breakthrough in power, price, performance, density, and features for today’s most demanding high-volume applications. ProASIC3 devices support the ARM Cortex-M1 soft processor IP core, offering the benefits of programmability and time-to-market. The ProASIC3 families are based on nonvolatile flash technology and support 10,000 to 3,000,000 gates and up to 620 I/Os. In addition to supporting portable, consumer, industrial, communications and medical applications with commercial and industrial temperature devices, Microsemi also offers ProASIC3 FPGAs with specialized screening for automotive and military systems.

Key Features

● Low-power consumption



● Lowest total system cost



● 1.2 V or 1.5 V support



● Cost-optimized, reprogrammable, and nonvolatile



● Supports 128-bit AES decryption for device configuration



● Single chip and live at power-up



● 1,024 bits of user flash memory



● Advanced I/O structure



● ARM Cortex-M1 processor support



● Immune to configuration loss due to atmospheric neutrons (firm errors)



● Available in automotive (T-Grade) and military temperature grade



● ISO/TS 16949:2002 certified

5.7.1 The ProASIC3 series

● ProASIC3/E with ARM Cortex-M1 processor

● ProASIC3 nano

● ProASIC3L with ARM Cortex-M1 processor

Device Comparison

| ProASIC3 Series | ProASIC3/E | ProASIC3 nano | ProASIC3L |
|---|---|---|---|
| ARM Enabled | M1 ProASIC3/E | - | M1 ProASIC3L |
| Overview | The low-power, low-cost FPGA solution | Lowest cost solution with enhanced I/O capabilities | The FPGA that balances low power, performance, and low cost |
| System Gates | 15,000 - 3,000,000 | 10,000 - 250,000 | 250,000 - 3,000,000 |
| Max User I/Os | 620 | 71 | 620 |
| Power Consumption | 3 mW | 0.9 mW | 0.4 mW |

5.8 THIRD-GENERATION (PROASIC3) FPGA ARCHITECTURE (FIG. 5.7)

Fig. 5.7 Third Generation Architecture.



1. SRAM and FIFOs: ProASIC3 devices have embedded dual-port SRAM and FIFO blocks along the north and south sides of the device. Each variable-aspect-ratio SRAM block is 4,608 bits in size. Available memory configurations are: 256x18, 512x9, 1kx4, 2kx2, or 4kx1 bits. The individual blocks have independent read and write ports that can be configured with different bit widths on each port. Dedicated FIFO control logic enables flexible and efficient FIFO implementations.



2. VersaTile: The ProASIC3 low-power VersaTile elements allow synthesis and mapping tools to use any tile as a three-input look-up table equivalent, a D flip-flop, or a latch (with or without enable). ProASIC3 devices with VersaTiles offer an abundance of registers, so a smaller device can often be chosen while still meeting register requirements.



3. Advanced I/O Standards: ProASIC3 devices support up to 19 advanced I/O standards:



● Cold sparing I/Os



● 700 Mbps LVDS-capable DDR I/Os



● Up to 8 different I/O banks per chip



● Single-Ended I/O Standards: LVTTL, LVCMOS 3.3 V / 2.5 V / 1.8 V / 3.3 V PCI / 3.3 V PCI-X, and LVCMOS 2.5 V / 5.0 V input



● Differential I/O Standards: LVPECL and LVDS, BLVDS, M-LVDS support

● Voltage-Referenced I/O Standards (ProASIC3E only): GTL+ 2.5 V / 3.3 V, GTL 2.5 V / 3.3 V, HSTL Class 1 and 2, SSTL2 Class 1 and 2, SSTL3 Class 1 and 2



● Registered I/Os



● Hot-swap compliant I/Os



● Programmable slew rate and drive strength on outputs



● Programmable delay, weak pull-up/down



● Schmitt trigger option on inputs (ProASIC3E only)



● Pin compatibility across a given package



4. Charge Pumps: ProASIC3 devices can be programmed from a single 3.3 V supply voltage. If remote programming is not required, ProASIC3 devices can be run off a single 1.5 V supply.



5. FlashROM (FROM): ProASIC3 flash FPGAs include user flash memory. One kbit of flash memory, arranged in eight 128-bit pages, allows for diverse applications support, such as device serialization, secure application key storage, revision control, and selective feature enabling.



6. Routing Structure: ProASIC3 provides millions of flash cell switches and an abundance of hierarchical routing resources, enabling extensive design and routing flexibility. VersaNet (segmented global) routing allows high-fanout nets to traverse small or large areas of the ProASIC3 devices with low skew. The VersaNet network is used automatically by the software tools to route clocks and high-fanout nets.



7. JTAG (IEEE 1532): ProASIC3 devices use industry-standard JTAG programming (IEEE 1532). In addition, ProASIC3 devices support board-level JTAG (IEEE 1149) I/O boundary scan.



8. PLL and CCC: ProASIC3 devices have six Clock Conditioning Circuits (CCCs) with up to six PLLs.



● Wide input frequency range (fIN_CCC) = 1.5 to 350 MHz



● Output frequency range (fOUT_CCC) = 0.75 to 350 MHz



● Output phase shift = 0°, 90°, 180°, and 270°
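The CCC limits listed above can be captured in a small range check. This is a sketch in Python under stated assumptions: the function and constant names are invented for illustration, and only the three limits quoted in the text are checked. Divider ratios and any internal oscillator limits of a real CCC are not modelled.

```python
F_IN_RANGE_MHZ = (1.5, 350.0)      # fIN_CCC limits quoted in the text
F_OUT_RANGE_MHZ = (0.75, 350.0)    # fOUT_CCC limits quoted in the text
PHASE_SHIFTS_DEG = (0, 90, 180, 270)


def check_ccc_request(f_in_mhz, f_out_mhz, phase_deg=0):
    """Return a list of problems; an empty list means the request is
    inside the limits listed above."""
    problems = []
    if not F_IN_RANGE_MHZ[0] <= f_in_mhz <= F_IN_RANGE_MHZ[1]:
        problems.append(f"input {f_in_mhz} MHz outside {F_IN_RANGE_MHZ}")
    if not F_OUT_RANGE_MHZ[0] <= f_out_mhz <= F_OUT_RANGE_MHZ[1]:
        problems.append(f"output {f_out_mhz} MHz outside {F_OUT_RANGE_MHZ}")
    if phase_deg not in PHASE_SHIFTS_DEG:
        problems.append(f"phase {phase_deg} deg not in {PHASE_SHIFTS_DEG}")
    return problems


print(check_ccc_request(50.0, 100.0, 90))   # request within all three limits
print(check_ccc_request(1.0, 400.0, 45))    # violates all three limits
```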

5.9  IGLOO LOW POWER FPGA FAMILY The IGLOO low-power FPGA families support up to 3 million system gates with up to 504 kbits of true dual-port SRAM, up to 6 embedded PLLs, and up to 620 user I/Os. Low-power applications that require 32-bit processing can use the ARM ® Cortex™-M1 processor without license fee or royalties in M1 IGLOO devices. Developed specifically for implementation in FPGAs, Cortex-M1 offers an optimal balance between performance and size to minimize power consumption.


Key Features

● Ultra-low power in Flash*Freeze mode



● Low power active capability



● Variety of small footprint packages as small as 3x3 mm



● Many products under $0.99



● Known good die supported



● Reprogrammable flash technology



● 1.2 V to 1.5 V operation



● High capacity, advanced I/Os



● Clock conditioning circuit (CCCs) and PLLs



● Embedded SRAM and nonvolatile memory (NVM)



● In-system programming (ISP) and security

5.10 IGLOO FPGA FAMILY OVERVIEW The Industry’s Lowest-Power FPGAs The Microsemi IGLOO® series of low-power FPGAs includes IGLOO/e, IGLOO nano, and IGLOO PLUS devices, the industry’s lowest-power FPGAs. IGLOO devices are reprogrammable, full-featured, flash-based, low-power FPGAs designed to meet the demanding power and area requirements of today’s portable and power-conscious electronics. Based on Microsemi’s nonvolatile flash technology and single-chip FPGA architecture, the 1.2 V / 1.5 V operating voltage family offers the industry’s lowest power consumption, smallest footprint, and competitive prices. The IGLOO series includes:

● IGLOO/e with ARM Cortex-M1



● IGLOO nano



● IGLOO PLUS

Device Comparison

| IGLOO Family | IGLOO/e | IGLOO nano | IGLOO PLUS |
|---|---|---|---|
| ARM-Enabled | M1 IGLOO/e | - | - |
| Overview | The ultra-low-power, programmable solution | The industry’s lowest-power, smallest-size solution | The low-power FPGA with enhanced I/O capabilities |
| System Gates | 15,000 - 3,000,000 | 10,000 - 250,000 | 30,000 - 125,000 |
| Max User I/Os | 620 | 71 | 212 |
| Power Consumption | 5 µW | 2 µW | 5 µW |

5.11 FLASH*FREEZE TECHNOLOGY The Flash*Freeze technology used in IGLOO devices enables easy entry to and exit from an ultra-low-power mode, which consumes as little as 2 µW while retaining SRAM and register data. Flash*Freeze technology simplifies power management through I/O and clock management, without a need to turn off voltages, I/Os, or clocks at the system level. No additional components are required to turn off I/Os or clocks, and the design information, SRAM content, and registers are preserved. I/Os can maintain their state during Flash*Freeze mode. Entering and exiting Flash*Freeze mode takes less than 1 µs. Additionally, the Low Power Active capability (static idle) allows ultra-low power consumption while the IGLOO device remains completely functional in the system, maintaining I/Os, SRAM, registers and logic functions. This allows the IGLOO device to control the system power management based on external inputs (e.g., scanning for keyboard stimulus) while consuming minimal power.

5.12 SMALL FOOTPRINT PACKAGES Microsemi IGLOO low power FPGAs are true single-chip devices, do not require configuration or other support components, and offer a variety of small-footprint packages with high I/O pin counts to match design needs. The Microsemi IGLOO family is offered in small form factor (3 × 3, 4 × 4, 5 × 5, 6 × 6, and 8 × 8 mm), high-density chip-scale packages and quad flat no-lead packages.

5.13 SMARTFUSION2 SOC FPGA Microsemi’s SmartFusion®2 SoC FPGAs integrate fourth generation flash-based FPGA fabric, an ARM® Cortex®-M3 processor, and high-performance communications interfaces on a single chip. The SmartFusion2 family is the industry’s lowest power, most reliable and highest security programmable logic solution. SmartFusion2 FPGAs offer up to 3.6X the gate density and up to 2X the performance of previous flash-based FPGA families, and include multiple memory blocks and multiply-accumulate blocks for DSP processing. The 166 MHz ARM Cortex-M3 processor is enhanced with an embedded trace macrocell (ETM), memory protection unit (MPU), 8 Kbyte instruction cache, and additional peripherals, including controller area network (CAN), Gigabit Ethernet, and high speed universal serial bus (USB). High speed serial interfaces include PCI EXPRESS® (PCIe®), 10 Gbps attachment unit interface (XAUI) / XGMII extended sublayer (XGXS) plus native serialization/deserialization (SERDES) communication, while DDR2/DDR3 memory controllers provide high speed memory interfaces.


SmartFusion2 SoC FPGA Block Diagram is shown in Fig 5.8.

Fig. 5.8 SmartFusion2 SoC FPGA.

The SmartFusion2 chip layout is shown in Fig. 5.9.

Fig. 5.9 SmartFusion2 Chip Layout (blocks labelled in the figure: PLLs, SERDES, MSS and DDR, uSRAM (1 Kb), LSRAM (18 Kb), math blocks, FPGA fabric, eNVM, oscillators, crystal, system controller, fabric DDR, east and west I/Os).

5.14 RELIABILITY SmartFusion2 flash-based fabric has a zero FIT (Failure In Time) configuration rate because of its immunity to single event upsets (SEUs), which is critical in high-reliability applications. The flash fabric also has the advantage that no external configuration memory is required, making the device instant-on; it retains its configuration when powered off. To complement this FPGA capability, SmartFusion2 devices add reliability to many other aspects of the device. Single Error Correct Double Error Detect (SECDED) protection is implemented on the Cortex-M3 embedded scratch pad memory and on the Ethernet, CAN and USB buffers, and is optional on the DDR memory controllers. This means that a one-bit error is detected and corrected, while a two-bit error is detected but not corrected. SECDED error signals are brought to the FPGA fabric to allow the user to monitor the status of these protected internal memories. Other areas of the architecture are implemented with latches, which are more resistant to SEUs, so no correction is needed in these locations: DDR bridges (MSS, MDDR, FDDR), instruction cache, and the MMUART (Multi Mode UART), SPI, and PCIe FIFOs.
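The SECDED behaviour described above (correct one flipped bit, detect but not correct two) can be demonstrated with a small extended Hamming code. The sketch below, in Python, protects only 4 data bits with an (8,4) code to keep the example short; the code width and bit layout actually used inside SmartFusion2 are not given in the text, so both are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword.
    Index 0 is the overall parity bit; indices 1..7 are Hamming(7,4)
    positions, with check bits at 1, 2, 4 and data bits at 3, 5, 6, 7."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2          # makes the total parity even
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data). Status is 'ok', 'corrected' or 'double-error'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of positions holding a 1
    overall = sum(code) % 2
    word = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        word[syndrome] ^= 1              # single error; syndrome 0 is bit 0
        status = "corrected"
    else:
        return "double-error", None      # even parity but non-zero syndrome
    data = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, data


stored = encode(0b1011)
one_flip = list(stored)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(stored), decode(one_flip), decode(two_flips))
```

The overall parity bit is what separates the two cases: a single flip makes the total parity odd, whereas two flips leave it even while the Hamming syndrome is non-zero.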

5.15 HIGHEST SECURITY DEVICES Building further on the intrinsic security benefits of flash nonvolatile memory technology, the SmartFusion2 family incorporates essentially all the legacy security features that made the original SmartFusion®, Fusion®, IGLOO®, and ProASIC®3 third-generation flash FPGAs and cSoCs (Customised System-on-Chip) the gold standard for secure devices in the PLD industry. In addition, the fourth-generation flash-based SmartFusion2 SoC FPGAs add many unique design and data security features and use models new to the PLD industry.

5.16  DESIGN SECURITY Design security is protecting the intent of the owner of the design, such as keeping the design and associated bitstream keys confidential, preventing design changes (for example, insertion of Trojan Horses), and controlling the number of copies made throughout the device life cycle. Design security may also be known as intellectual property (IP) protection. It is one aspect of anti-tamper (AT) protection. Design security applies to the device from initial production, includes any updates such as in-the-field upgrades, and can include decommissioning of the device at the end of its life, if desired. Good design security is a prerequisite for good data security.

The following are the main design security features supported.

● Software Memory Protection Unit (MPU)



● FlashLock™ Passcode Security (256-bit)



● Flexible security settings using flash lock-bits



● Encrypted/Authenticated Design Key Loading



● Symmetric Key Design Security (256-bit)




● Design Key Verification Protocol



● Encrypted/Authenticated Configuration Loading



● Certificate-of-Conformance (C-of-C)



● Back-Tracking Prevention (also known as, Versioning)



● Device Certificate(s) (Anti-Counterfeiting)



● Support for Configuration Variations



● Fabric NVM and eNVM Integrity Tests



● Information Services (S/N, Cert., USERCODE, and others)



● Tamper Detection



● Tamper Response (includes Zeroization)



● ECC Public Key Design Security (384-bit)



● Hardware Intrinsic Design Key (SRAM-PUF)

5.17 DATA SECURITY Data security is protecting the information the FPGA is storing, processing, or communicating in its role in the end application. If, for example, the configured design is implementing the key management and encryption portion of a secure military radio, data security could entail encrypting and authenticating the radio traffic, and protecting the associated application-level cryptographic keys. Data security is closely related to the terms information assurance (IA) and information security. All SmartFusion2 devices incorporate enhanced design security, making them the most secure programmable logic devices ever made. Select SmartFusion2 models also include an advanced set of on-chip data security features that make designing secure information assurance applications easier and better than ever before. The following are the main data security features supported.

● CRI Pass-through DPA Patent License



● Hardware Firewalls protecting access to memories



● Non-Deterministic Random Bit Generator Service



● AES-128/256 Service (ECB, OFB, CTR, CBC modes)



● SHA-256 Service



● HMAC-SHA-256 Service



● Key Tree Service



● PUF Emulation (Pseudo-PUF)



● PUF Emulation (SRAM-PUF)



● ECC Point-Multiplication Service



● ECC Point-Addition Service



● User SRAM-PUF Enrollment Service

● User SRAM-PUF Activation Code Export Service

● SRAM-PUF Intrinsic Key Gen. & Enrollment Service

● SRAM-PUF Key Import & Enrollment Service

● SRAM-PUF Key Regeneration Service

5.18  LOW POWER Microsemi’s flash-based FPGA fabric results in extremely low power design implementation with static power as low as 7 mW. Flash*Freeze (F*F) technology provides an ultra-low power static mode (Flash*Freeze mode) for SmartFusion2 devices, with power less than 7 mW for the largest device. F*F mode entry retains all the SRAM and register information and the exit from F*F mode achieves rapid recovery to active mode.

5.19  HIGH-PERFORMANCE FPGA FABRIC Built on 65 nm process technology, the SmartFusion2 FPGA fabric is composed of 4 building blocks: the logic module, the large SRAM, the micro SRAM and the math block. The logic module is the basic logic element and has advanced features:

● A fully permutable 4-input LUT (look-up table) optimized for lowest power



● A dedicated carry chain based on the carry look-ahead technique



● A separate flip-flop which can be used independently from the LUT

The 4-input look-up table can be configured either to implement any 4-input combinatorial function or to implement an arithmetic function where the LUT output is XORed with carry input to generate the sum output.
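The two LUT modes described above can be modelled in a few lines of Python. This is a behavioural sketch only: the names `lut4`, `make_init` and `add_bit` are invented for the example, the truth-table bit ordering is an assumption, and the carry is rippled bit by bit for simplicity rather than computed by a look-ahead circuit as in the actual carry chain.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Look up one output bit in a 16-entry truth table.
    Bit i of `init` is the output for input index i = a + 2b + 4c + 8d."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> index) & 1


def make_init(func) -> int:
    """Build the 16-bit truth table for any 4-input Boolean function."""
    init = 0
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        init |= (func(a, b, c, d) & 1) << i
    return init


# Combinatorial mode: any 4-input function, here a 4-input AND.
AND4 = make_init(lambda a, b, c, d: a & b & c & d)

# Arithmetic mode: the LUT produces the propagate term a XOR b,
# and the sum is the LUT output XORed with the carry input.
PROPAGATE = make_init(lambda a, b, c, d: a ^ b)


def add_bit(a: int, b: int, carry_in: int):
    p = lut4(PROPAGATE, a, b, 0, 0)
    total = p ^ carry_in
    carry_out = (a & b) | (p & carry_in)
    return total, carry_out


def add(x: int, y: int, width: int = 4):
    """Add two unsigned numbers one bit at a time; returns (sum, carry)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = add_bit((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(lut4(AND4, 1, 1, 1, 1), lut4(AND4, 1, 0, 1, 1))
print(add(9, 5))
```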

5.20 DUAL-PORT LARGE SRAM (LSRAM) Large SRAM (RAM1Kx18) is intended for storing large blocks of data. Each LSRAM block can store up to 18,432 bits. Each RAM1Kx18 block contains two independent data ports: Port A and Port B. The LSRAM is synchronous for both Read and Write operations. Operations are triggered on the rising edge of the clock. The data output ports of the LSRAM have pipeline registers whose control signals are independent of the SRAM’s control signals.

5.21 THREE-PORT MICRO SRAM (USRAM) Micro SRAM (RAM64x18) is the second type of SRAM embedded in the fabric of SmartFusion2 devices. The RAM64x18 uSRAM is a 3-port SRAM; it has two read ports (Port A and Port B) and one write port (Port C). The two read ports are independent of each other and can perform Read operations in both synchronous and asynchronous modes. The write port is always synchronous. The uSRAM block is approximately 1 Kbit (1,152 bits) in size.


These uSRAM blocks are primarily targeted for building embedded FIFOs to be used by any embedded fabric masters.

5.22 MATHBLOCKS FOR DSP APPLICATIONS The fundamental building block in any digital signal processing algorithm is the multiply-accumulate function. SmartFusion2 FPGAs implement a custom 18 × 18 Multiply-Accumulate (18x18 MACC) block for efficient implementation of complex DSP algorithms such as finite impulse response (FIR) filters, infinite impulse response (IIR) filters, and the Fast Fourier Transform (FFT) for filtering and image processing applications. Each math block has the following capabilities:

● Supports 18x18 signed multiplications natively (A[17:0] x B[17:0])



● Supports dot product; the multiplier computes:

(A[8:0] x B[17:9] + A[17:9] x B[8:0]) x 2^9

● Built-in addition, subtraction, and accumulation units to combine multiplication results efficiently
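The arithmetic of the math block can be reproduced in software to check a design’s expected results. The Python sketch below models the 18x18 signed multiply, the dot-product expression, and accumulation. It is a numerical illustration rather than a description of the hardware: the function names are invented, the 9-bit halves are assumed to be signed, the scale factor is taken to be 2^9, and accumulator width and overflow are not modelled.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mult18(a: int, b: int) -> int:
    """18x18 signed multiplication of A[17:0] and B[17:0]."""
    return to_signed(a, 18) * to_signed(b, 18)


def dot_product(a: int, b: int) -> int:
    """(A[8:0] x B[17:9] + A[17:9] x B[8:0]) x 2^9, halves assumed signed."""
    a_lo, a_hi = to_signed(a, 9), to_signed(a >> 9, 9)
    b_lo, b_hi = to_signed(b, 9), to_signed(b >> 9, 9)
    return (a_lo * b_hi + a_hi * b_lo) * (1 << 9)


def fir_sample(coefficients, samples) -> int:
    """Multiply-accumulate over one window, as a FIR filter tap sum."""
    accumulator = 0
    for c, x in zip(coefficients, samples):
        accumulator += mult18(c, x)
    return accumulator


print(mult18(0x3FFFF, 2))                      # 0x3FFFF is -1 in 18 bits
print(dot_product((2 << 9) | 3, (5 << 9) | 7))
print(fir_sample([1, 2, 3], [4, 5, 6]))
```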

In addition to the basic MACC function, DSP algorithms typically need small amounts of RAM for coefficients and larger RAMs for data storage. SmartFusion2 micro RAMs are ideally suited to serve the needs of coefficient storage while the large RAMs are used for data storage.

5.23 MICROCONTROLLER SUBSYSTEM (MSS) The microcontroller subsystem (MSS) contains a high-performance integrated Cortex-M3 processor, running at up to 166 MHz. The MSS contains an 8 Kbyte instruction cache to provide low latency access to internal eNVM (embedded Non Volatile Memory) and external DDR memory. The MSS provides multiple interfacing options to the FPGA fabric in order to facilitate tight integration between the MSS and user logic in the fabric.

5.23.1 ARM Cortex-M3 Processor The MSS uses the latest revision (r2p1) of the ARM Cortex-M3 processor. Microsemi’s implementation includes the optional embedded trace macrocell (ETM) features for easier development and debug and the memory protection unit (MPU) for real-time operating system support.

5.23.2 Cache Controller In order to minimize latency for instruction fetches when executing firmware out of off-chip DDR or on-chip eNVM, an 8 Kbyte, 4-way set associative instruction cache is implemented. This provides zero wait state access for cache hits and is shared by both the I and D code buses of the Cortex-M3 processor. In the event of a cache miss, a cache line is filled, replacing an existing cache entry chosen by a least recently used (LRU) algorithm. There is a configurable option to operate the cache in a locked mode, whereby a fixed segment of code from either the DDR or eNVM is copied into the cache and locked there, so that it is not replaced when cache misses occur. This would be used for performance-critical code. It is also possible to disable the cache altogether, which is desirable in systems requiring very deterministic execution times. The cache is implemented with SEU tolerant latches.
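The hit, miss and LRU replacement behaviour of a 4-way set associative cache can be sketched in Python as follows. The 8 Kbyte size and 4 ways come from the text; the 32-byte line size and the class and method names are assumptions made for illustration, and the locked mode is not modelled.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Behavioural model of a set associative cache with LRU replacement."""

    def __init__(self, size_bytes=8 * 1024, ways=4, line_bytes=32):
        self.ways = ways
        self.line_bytes = line_bytes                      # assumed line size
        self.num_sets = size_bytes // (ways * line_bytes)
        # One ordered map per set: oldest (least recently used) entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)                      # mark most recent
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)                   # evict the LRU line
        entries[tag] = True
        return False


cache = SetAssociativeCache()
stride = cache.num_sets * cache.line_bytes                # same set, new tag
for n in range(4):
    cache.access(n * stride)                              # four cold misses
cache.access(0)                                           # hit, refreshes line 0
cache.access(4 * stride)                                  # miss, evicts line 1
print(cache.hits, cache.misses)
```

Five distinct lines mapping to one set exceed its four ways, so the fifth fill evicts whichever line was used least recently.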

5.23.3  DDR Bridge The DDR bridge is a data bridge between four AHB bus masters and a single AXI bus slave. The DDR bridge accumulates AHB writes into write combining buffers prior to bursting out to external DDR memory. The DDR bridge also includes read combining buffers, allowing AHB masters to efficiently read data from the external DDR memory from a local buffer. The DDR bridge optimizes reads and writes from multiple masters to a single external DDR memory. Data coherency rules between the four masters and the external DDR memory are implemented in hardware. The DDR bridge contains three write combining / read buffers and one read buffer. All buffers within the DDR bridge are implemented with SEU tolerant latches and are not subject to the single event upsets (SEUs) that SRAM exhibits. SmartFusion2 devices implement three DDR bridges in the MSS, FDDR, and MDDR subsystems.

5.23.4  AHB Bus Matrix (ABM) The AHB bus matrix (ABM) is a non-blocking, AHB-Lite multi-layer switch, supporting 10 master interfaces and 7 slave interfaces. The switch decodes access attempts by masters to various slaves, according to the memory map and security configurations. When multiple masters are attempting to access a particular slave simultaneously, an arbiter associated with that slave decides which master gains access, according to a configurable set of arbitration rules. These rules can be configured by the user to provide different usage patterns to each slave. For example, a number of consecutive access opportunities to the slave can be allocated to one particular master, to increase the likelihood of same type accesses (all reads or all writes), which makes more efficient usage of the bandwidth to the slave.
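The arbitration idea described above, in which one master may be given several consecutive access opportunities to a slave, can be illustrated with a weighted round-robin arbiter. This Python sketch is only one possible reading of the scheme: the class name, the master names and the slot counts are hypothetical, and the actual arbitration rules of the AHB bus matrix are not specified in the text.

```python
class SlaveArbiter:
    """Round-robin arbiter for one slave. Each master may keep the slave for
    up to `slots[master]` consecutive grants while it keeps requesting."""

    def __init__(self, slots):
        self.slots = dict(slots)          # master name -> consecutive grants
        self.order = list(self.slots)     # fixed round-robin order
        self.current = None
        self.used = 0

    def grant(self, requesting):
        """Return the master granted access this cycle, or None."""
        active = [m for m in self.order if m in requesting]
        if not active:
            self.current, self.used = None, 0
            return None
        # Keep the current master while it still has slots left.
        if self.current in active and self.used < self.slots[self.current]:
            self.used += 1
            return self.current
        # Otherwise move to the next requesting master in round-robin order.
        start = self.order.index(self.current) + 1 if self.current in self.order else 0
        for offset in range(len(self.order)):
            candidate = self.order[(start + offset) % len(self.order)]
            if candidate in active:
                self.current, self.used = candidate, 1
                return candidate


# Hypothetical configuration: the processor gets two slots for each DMA slot.
arbiter = SlaveArbiter({"cpu": 2, "dma": 1})
print([arbiter.grant({"cpu", "dma"}) for _ in range(6)])
```

Grouping grants in this way keeps accesses from one master together, which is the effect the text describes as making more efficient use of the bandwidth to the slave.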

5.23.5 System Registers The MSS System registers are implemented as an AHB slave on the AHB bus matrix. This means the Cortex-M3 processor or a soft master in the FPGA fabric may access the registers and therefore control the MSS. The System registers can be initialized by user-defined flash configuration bits on power-up. Each register also has a flash bit that enables write protection of its contents. This allows the MSS system configuration to be reliably fixed for a given application.

5.23.6  Fabric Interface Controller (FIC) The FIC block provides two separate interfaces between the MSS and the FPGA fabric: the MSS master (MM) and fabric master (FM). Each of these interfaces can be configured to operate as AHB-Lite or APB3. Depending on device density, there are up to two FIC blocks present in the MSS (FIC_0 and FIC_1).

5.23.7  Embedded SRAM (eSRAM) The MSS contains two blocks of 32 KB eSRAM, giving a total of 64 KB. Having the eSRAM arranged as two separate blocks allows the user to take advantage of the Harvard architecture of the Cortex-M3 processor. For example, code could be located in one eSRAM, while data, such as the stack, could be located in the other. The eSRAM is designed for Single Error Correct Double Error Detect (SECDED) protection. When SECDED is disabled, the SRAM usually used to store SECDED data may be reused as an extra 16 KB of eSRAM.
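The figures above imply one check bit for every four data bits (16 KB of check storage for 64 KB of data). The principle of single error correction with double error detection can be shown on a much smaller word. The sketch below uses an extended Hamming (8,4) code, which is an illustrative choice; the text does not state which code the eSRAM actually uses.

```python
def secded_encode(nibble):
    """Encode 4 data bits as 8 bits: Hamming(7,4) plus an overall parity bit."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]            # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]            # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]            # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]   # list index i is position i + 1
    overall = 0
    for bit in bits:
        overall ^= bit
    return bits + [overall]


def secded_decode(code):
    """Return (nibble, status); nibble is None when a double error is detected."""
    bits = list(code)
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position       # XOR of the positions that hold a 1
    overall = 0
    for bit in bits:
        overall ^= bit                 # 0 when the overall parity still holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat as a single error and repair it.
        bits[syndrome - 1 if syndrome else 7] ^= 1
        status = "corrected"
    else:
        return None, "double error detected"
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, status


word = secded_encode(0b1010)
word[5] ^= 1                           # inject a single-bit upset
recovered = secded_decode(word)
```

A single flipped bit is located by the syndrome and repaired, while two flipped bits leave the overall parity unchanged but the syndrome non-zero, which is reported as an uncorrectable error.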

5.23.8  Embedded NVM (eNVM) The MSS contains up to 512 KB of eNVM (64 bits wide). Accesses to the eNVM from the Cortex-M3 processor are cacheable.

5.23.9  DMA Engines Two DMA engines are present in the MSS: high-performance DMA and peripheral DMA.

High-Performance DMA (HPDMA): The high-performance DMA (HPDMA) engine provides efficient memory-to-memory data transfers between an external DDR memory and internal eSRAM. This engine has two separate AHB-Lite interfaces, one to the MDDR bridge and the other to the AHB bus matrix. All transfers by the HPDMA are full-word transfers.

Peripheral DMA (PDMA): The peripheral DMA engine (PDMA) is tuned for offloading byte-intensive operations, involving MSS peripherals, to and from the internal eSRAMs. Data transfers can also be targeted to user logic/RAM in the FPGA fabric.

5.23.10 APB Configuration Bus On every SmartFusion2 device, an APB configuration bus is present to allow the user to initialize the SERDES ASIC blocks, the fabric DDR memory controller, and user-instantiated peripherals in the FPGA fabric.

5.23.11 Peripherals A large number of communications and general purpose peripherals are implemented in the MSS. USB Controller: The MSS contains a high speed USB 2.0 On-The-Go (OTG) controller with the following features:

● Operates either as the function controller of a high-speed/full-speed USB peripheral or as the host/peripheral in point-to-point or multi-point communications with other USB functions.

● Complies with the USB 2.0 standard for high-speed functions and with the On-The-Go supplement to the USB 2.0 specification.

● Supports OTG communications with one or more high-speed, full-speed, or low-speed devices.

TSE Ethernet MAC: The triple speed Ethernet (TSE) MAC supports IEEE 802.3 10/100/1000 Mbps Ethernet operation. The following PHY interfaces are directly supported by the MAC:

● GMII



● MII



● TBI

The Ethernet MAC hardware implements the following functions:



● 4 KB internal transmit FIFO and 8 KB internal receive FIFO



● IEEE 802.3X full-duplex flow control



● DMA of Ethernet frames between internal FIFOs and system memory (such as eSRAM or DDR)



● Cut-through operation



● SECDED protection on internal buffers

SGMII PHY Interface: SGMII mode is implemented by configuring the MAC for 10-bit interface (TBI) operation, allocating one of the high-speed serial channels to SGMII, and implementing custom logic in the fabric.

10 Gbps Ethernet: Support for 10 Gbps Ethernet is achieved by programming the SERDES interface to XAUI mode. In this mode, a soft 10G EMAC with an XGMII interface can be directly connected to the SERDES interface.

Communication Block (COMM_BLK): The COMM block provides a UART-like communications channel between the MSS and the system controller. System services are initiated through the COMM block.

SPI: The serial peripheral interface controller is compliant with the Motorola SPI, Texas Instruments synchronous serial, and National Semiconductor MICROWIRE™ formats. In addition, the SPI supports interfacing to large SPI flash and EEPROM devices by way of the slave protocol engine. The SPI controller supports both Master and Slave modes of operation. The SPI controller embeds two 4 × 32 (depth × width) FIFOs for receive and transmit. These FIFOs are accessible through the Rx data and Tx data registers. Writing to the Tx data register causes the data to be written to the transmit FIFO, which is emptied by the transmit logic. Similarly, reading from the Rx data register causes data to be read from the receive FIFO.

Multi-Mode UART (MMUART): SmartFusion2 devices contain two identical multi-mode universal asynchronous/synchronous receiver/transmitter (MMUART) peripherals that provide software compatibility with the popular 16550 (standard UART) device. They perform serial-to-parallel conversion on data originating from modems or other serial devices, and perform parallel-to-serial conversion on data from the Cortex-M3 processor to these devices.

The following are the main features supported:

● Fractional baud rate capability



● Asynchronous and synchronous operation



● Fully programmable serial interface characteristics



– Data width is programmable to 5, 6, 7, or 8 bits



– Even, odd, or no-parity bit generation/detection



– 1, 1½, and 2 stop bit generation



● 9-bit address flag capability used for multidrop addressing topologies
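The programmable frame characteristics listed above determine how many bit times each character occupies on the line. The following Python sketch works this out; the function names are hypothetical and the model describes a generic asynchronous frame (one start bit, data bits, optional parity bit, stop bits), not the MMUART register interface.

```python
def uart_frame_bits(data_bits=8, parity="none", stop_bits=1):
    """Bit times per character: start bit + data bits + parity bit + stop bits."""
    if data_bits not in (5, 6, 7, 8):
        raise ValueError("data_bits must be 5, 6, 7, or 8")
    if parity not in ("none", "even", "odd"):
        raise ValueError("parity must be 'none', 'even', or 'odd'")
    if stop_bits not in (1, 1.5, 2):
        raise ValueError("stop_bits must be 1, 1.5, or 2")
    parity_bits = 0 if parity == "none" else 1
    return 1 + data_bits + parity_bits + stop_bits


def parity_bit(value, data_bits=8, parity="even"):
    """Parity bit that makes the count of 1s even (or odd) across data + parity."""
    ones = bin(value & ((1 << data_bits) - 1)).count("1")
    if parity == "even":
        return ones % 2
    if parity == "odd":
        return 1 - ones % 2
    return None  # no parity bit is generated


def characters_per_second(baud, data_bits=8, parity="none", stop_bits=1):
    """Upper bound on character throughput for back-to-back frames."""
    return baud / uart_frame_bits(data_bits, parity, stop_bits)


rate_8n1 = characters_per_second(115200)          # 8 data bits, no parity, 1 stop bit
frame_5e15 = uart_frame_bits(5, "even", 1.5)      # shortest data field, 1.5 stop bits
```

For the common 8-N-1 setting each character costs ten bit times, so only 80% of the baud rate carries data.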

I2C: SmartFusion2 devices contain two identical master/slave I2C peripherals that perform serial-to-parallel conversion on data originating from serial devices, and perform parallel-to-serial conversion on data from the ARM Cortex-M3 processor, or any other bus master, to these devices. The following are the main features supported:

● I2C v2.1



– 100 Kbps



– 400 Kbps



● Dual-slave addressing



● SMBus v2.0



● PMBus v1.1
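To put the two I2C bit rates in perspective, the time needed for a simple transfer can be estimated as follows. The sketch assumes 7-bit addressing, nine clock periods per byte (eight bits plus the acknowledge bit), and ignores START/STOP conditions and clock stretching; the function names are illustrative only.

```python
def i2c_transfer_clocks(data_bytes):
    """Clock periods for one address byte plus the data bytes (9 per byte)."""
    return 9 * (1 + data_bytes)


def i2c_transfer_time_us(data_bytes, bit_rate_kbps):
    """Approximate transfer time in microseconds at the given bit rate."""
    return i2c_transfer_clocks(data_bytes) * 1000 / bit_rate_kbps


standard_mode_us = i2c_transfer_time_us(4, 100)   # 4 data bytes at 100 Kbps
fast_mode_us = i2c_transfer_time_us(4, 400)       # 4 data bytes at 400 Kbps
```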

5.24  CLOCK SOURCES: ON-CHIP OSCILLATORS, PLLS, AND CCCS SmartFusion2 devices have two on-chip RC oscillators (a 1 MHz RC oscillator and a 50 MHz RC oscillator) and up to two main crystal oscillators (32 kHz–20 MHz). These are available to the user for generating clocks to the on-chip resources and the logic built on the FPGA fabric array. The second crystal oscillator available on the SmartFusion2 devices is dedicated to RTC clocking. These oscillators (except the RTC crystal oscillator) can be used in conjunction with the integrated user phase-locked loops (PLLs) and fabric clock conditioning circuits (FAB_CCC) to generate clocks of varying frequency and phase. In addition to being available to the user, these oscillators are also used by the system controller, power-on reset circuitry, MSS during Flash*Freeze mode, and the RTC. SmartFusion2 devices have up to eight fabric CCC (FAB_CCC) blocks and a dedicated PLL associated with each CCC to provide flexible clocking to the FPGA fabric portion of the device. The user has the freedom to use any of the eight PLLs and CCCs to generate the fabric clocks and the internal MSS clock from the base fabric clock (CLK_BASE). There is also a dedicated CCC block for the MSS (MSS_CCC) and an associated PLL (MPLL) for MSS clocking and de-skewing the CLK_BASE clock. The fabric alignment clock controller (FACC), part of the MSS CCC, is responsible for generating various aligned clocks required by the MSS for correct operation of the MSS blocks and synchronous communication with the user logic in the FPGA fabric.

5.25  HIGH SPEED SERIAL INTERFACES SERDES Interface: SmartFusion2 has up to four 5 Gbps SERDES transceivers, each supporting the following:

● 4 SERDES lanes

● The native SERDES interface facilitates implementation of Serial RapidIO (SRIO) in fabric or an SGMII interface for the Ethernet MAC in MSS

PCI Express (PCIe): PCIe is a high speed, packet-based, point-to-point, low pin count, serial interconnect bus. The SmartFusion2 family has two hard high-speed serial interface blocks. Each SERDES block contains a PCIe system block. The PCIe system is connected to the SERDES block and the following are the main features supported:

● Supports x1, x2, and x4 lane configuration

● Endpoint configuration only

● PCI Express Base Specification Revision 2.0

● 2.5 and 5.0 Gbps compliant

● Embedded receive (2 KB), transmit (1 KB) and retry (1 KB) buffer dual-port RAM implementation

● Up to 2 Kbytes maximum payload size

● 64-bit AXI or 32-bit AHB-Lite Master and Slave interface to the application layer

● 32-bit APB interface to access configuration and status registers of PCIe system

● Up to 3 × 64 bit base address registers

● 1 virtual channel (VC)

XAUI/XGXS Extension: The XAUI/XGXS extension allows the user to implement a 10 Gbps (XGMII) Ethernet PHY interface by connecting the Ethernet MAC fabric interface through an appropriate soft IP block in the fabric.
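The lane rates and lane counts listed for PCIe translate into usable bandwidth as sketched below. PCIe at 2.5 and 5.0 Gbps uses 8b/10b line coding, so only eight of every ten line bits carry data. The function name is hypothetical, and the estimate is an upper bound that ignores packet headers and link-layer traffic.

```python
def pcie_data_rate_mbytes(line_rate_mbps, lanes):
    """Per-direction data rate in MB/s after removing the 8b/10b coding overhead."""
    if lanes not in (1, 2, 4):
        raise ValueError("the text lists x1, x2, and x4 configurations only")
    data_mbps_per_lane = line_rate_mbps * 8 // 10   # 8 data bits per 10 line bits
    return data_mbps_per_lane * lanes // 8          # bits to bytes


gen1_x1 = pcie_data_rate_mbytes(2500, 1)   # 2.5 Gbps, one lane
gen2_x4 = pcie_data_rate_mbytes(5000, 4)   # 5.0 Gbps, four lanes
```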

5.26  HIGH SPEED MEMORY INTERFACES: DDRX MEMORY CONTROLLERS There are up to two DDR subsystems, MDDR (MSS DDR) and FDDR (fabric DDR), present in SmartFusion2 devices. Each subsystem consists of a DDR controller, PHY, and a wrapper. The MDDR has an interface from the MSS and fabric, and FDDR provides an interface from the fabric. The following are the main features supported by the FDDR and MDDR:

● Support for LPDDR, DDR2, and DDR3 memories

● Simplified DDR command interface to standard AMBA AXI/AHB interface

● Up to 667 Mbps (333 MHz double data rate) performance

● Supports 1, 2, or 4 ranks of memory

● Supports different DRAM bus width modes: x8, x9, x16, x18, x32, and x36

● Supports DRAM burst length of 2, 4, or 8 in full bus-width mode; supports DRAM burst length of 2, 4, 8, or 16 in half bus-width mode

● Supports memory densities up to 4 GB

● Supports a maximum of 8 memory banks

● SECDED enable/disable feature

● Embedded physical interface (PHY)

● Read and Write buffers in fully associative CAMs, configurable in powers of 2, up to 64 Reads plus 64 Writes

● Support for dynamically changing clock frequency while in self-refresh

● Supports command reordering to optimize memory efficiency

● Supports data reordering, returning critical word first for each command
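The data rate and bus width figures above can be combined into a peak bandwidth estimate. The sketch below takes 667 Mbps as the per-pin data rate and takes the number of data pins as a parameter; the function names are hypothetical, and refresh, bank conflicts and bus turnaround are ignored, so real throughput is lower.

```python
def ddr_peak_mbytes_per_s(data_width_bits, rate_mbps_per_pin=667):
    """Theoretical peak bandwidth in MB/s for the given number of data pins."""
    return rate_mbps_per_pin * data_width_bits / 8


def burst_bytes(data_width_bits, burst_length):
    """Bytes moved by one DRAM burst of the given length."""
    return data_width_bits * burst_length // 8


peak_x32 = ddr_peak_mbytes_per_s(32)   # 32 data pins at 667 Mbps each
peak_x16 = ddr_peak_mbytes_per_s(16)   # 16 data pins at 667 Mbps each
bytes_per_bl8 = burst_bytes(32, 8)     # burst length 8 on a 32-bit bus
```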

5.27  MDDR SUBSYSTEM The MDDR subsystem has two interfaces to the DDR. One is an AXI 64-bit bus from the DDR bridge within the MSS. The other is a multiplexed interface from the FPGA fabric, which can be configured as either a single AXI 64-bit bus or two 32-bit AHB-Lite buses. There is also a 16-bit APB configuration bus, which is used to initialize the majority of the internal registers within the MDDR subsystem after reset. This APB configuration bus can be mastered by the MSS directly or by a master in the FPGA fabric. Support for 3.3 V Single Data Rate DRAMs (SDRAM) can be obtained by using the SMC_FIC interface in the MDDR subsystem. Users would then instantiate a soft AHB or AXI SDRAM memory controller in the FPGA fabric and connect I/O ports to 3.3 V MSIO.

5.28  FDDR SUBSYSTEM The FDDR subsystem has one interface to the DDR. This is a multiplexed interface from the FPGA fabric, which can be configured as either a single AXI 64-bit bus or two 32-bit AHB-Lite buses. There is also a 16-bit APB configuration bus, which is used to initialize the majority of the internal registers within the FDDR subsystem after reset. This APB configuration bus can be mastered by the MSS or a master in the FPGA fabric.

5.29  IGLOO2 FPGAS The Microcontroller Subsystem (MSS) in SmartFusion2 and the High-Performance Memory Subsystem (HPMS) in IGLOO2 share several units with the same description and function. These units are described again under IGLOO2 for completeness. Microsemi’s IGLOO®2 FPGAs integrate fourth-generation flash-based FPGA fabric and high-performance communications interfaces on a single chip. Microsemi positions the IGLOO2 family as the industry’s lowest-power, most reliable and highest-security programmable logic solution. This next-generation IGLOO2 architecture offers up to 3.6X gate count implemented with 4-input look-up table (LUT) fabric with carry chains, giving 2X performance, and includes multiple embedded memory options and mathblocks for digital signal processing (DSP). High speed serial interfaces include PCI EXPRESS® (PCIe®), 10 Gbps attachment unit interface (XAUI) / XGMII extended sublayer (XGXS) plus native serialization/deserialization (SERDES) communication, while double data rate 2 (DDR2)/DDR3 memory controllers provide high speed memory interfaces. The IGLOO2 FPGA block diagram is shown in Fig. 5.10.

Fig. 5.10 IGLOO2 FPGA

IGLOO2 FPGAs offer the same security, reliability and low-power features as SmartFusion2 devices; refer to the SmartFusion2 sections for details. Functionally, the primary difference between the two families is that IGLOO2 devices omit the microcontroller subsystem (MSS) and include a high-performance memory subsystem (HPMS) instead. Although most of the functional blocks have specifications similar to those of SmartFusion2, they are covered again in the following sections for completeness.

5.29.1  High-Performance FPGA Fabric Built on 65 nm process technology, the IGLOO2 FPGA fabric is composed of four building blocks: the logic module, the large SRAM, the micro SRAM and the mathblock. The logic module is the basic logic element and has advanced features:

● A fully permutable 4-input LUT (look-up table) optimized for lowest power



● A dedicated carry chain based on carry look-ahead technique



● A separate flip-flop which can be used independently from the LUT

The 4-input look-up table can be configured either to implement any 4-input combinatorial function or to implement an arithmetic function where the LUT output is XORed with carry input to generate the sum output.
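Both uses of the LUT can be modelled in a few lines of Python. The class and function names are hypothetical. In the arithmetic example the LUT produces the propagate signal, which is XORed with the carry input to form the sum as described above; the carry is rippled bit by bit here for simplicity, which is an assumption of the sketch and not the look-ahead circuit used in the device.

```python
class Lut4:
    """A 4-input LUT modelled as a 16-entry truth table held in an integer."""

    def __init__(self, init):
        self.init = init & 0xFFFF          # bit i is the output for input index i

    @classmethod
    def from_function(cls, function):
        """Build the truth table by evaluating a Boolean function of a, b, c, d."""
        init = 0
        for index in range(16):
            a, b, c, d = ((index >> k) & 1 for k in range(4))
            if function(a, b, c, d):
                init |= 1 << index
        return cls(init)

    def evaluate(self, a, b, c, d):
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.init >> index) & 1


def add_with_carry_chain(x, y, width=8):
    """Add two unsigned numbers using one LUT plus a carry bit per position."""
    propagate_lut = Lut4.from_function(lambda a, b, c, d: a ^ b)
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        propagate = propagate_lut.evaluate(a, b, 0, 0)
        result |= (propagate ^ carry) << i          # sum = LUT output XOR carry in
        carry = (a & b) | (propagate & carry)       # carry out (rippled in this model)
    return result, carry


and4 = Lut4.from_function(lambda a, b, c, d: a & b & c & d)
total, carry_out = add_with_carry_chain(200, 100, width=8)
```

The 4-input AND occupies a single truth-table bit (index 15), and the 8-bit addition of 200 and 100 wraps to 44 with the carry output set.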

5.29.2  Dual-Port Large SRAM (LSRAM) Large SRAM (RAM1Kx18) is targeted at storing large blocks of data for use with various operations. Each LSRAM block can store up to 18,432 bits. Each RAM1Kx18 block contains two independent data ports: Port A and Port B. The LSRAM is synchronous for both Read and Write operations. Operations are triggered on the rising edge of the clock. The data output ports of the LSRAM have pipeline registers which have control signals that are independent of the SRAM’s control signals.

5.29.3  Three-Port Micro SRAM (uSRAM) Micro SRAM (RAM64x18) is the second type of SRAM embedded in the fabric of IGLOO2 devices. The RAM64x18 uSRAM is a 3-port SRAM; it has two read ports (Port A and Port B) and one write port (Port C). The two read ports are independent of each other and can perform Read operations in both synchronous and asynchronous modes. The write port is always synchronous. The uSRAM block is approximately 1 Kbit (1,152 bits) in size. These uSRAM blocks are primarily targeted for building embedded FIFOs to be used by any embedded fabric masters.

5.29.4 Mathblocks for DSP Applications The fundamental building block in any digital signal processing algorithm is the multiply-accumulate function. The IGLOO2 device implements a custom 18 × 18 multiply-accumulate (18 × 18 MACC) block for efficient implementation of complex DSP algorithms such as finite impulse response (FIR) filters, infinite impulse response (IIR) filters, and fast Fourier transform (FFT) for filtering and image processing applications. Each mathblock has the following capabilities:

● Supports 18x18 signed multiplications natively (A[17:0] × B[17:0])



● Supports dot product; the multiplier computes:

(A[8:0] × B[17:9] + A[17:9] × B[8:0]) × 2^9

● Built-in addition, subtraction, and accumulation units to combine multiplication results efficiently
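The multiplier modes listed above can be mimicked in Python to check results by hand. The function names are hypothetical, and treating each 9-bit half as a signed two's-complement value in dot-product mode is an assumption of this sketch.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mult18(a, b):
    """Signed 18 x 18 multiplication: A[17:0] x B[17:0]."""
    return to_signed(a, 18) * to_signed(b, 18)


def dot_product(a, b):
    """Dot-product mode: (A[8:0] x B[17:9] + A[17:9] x B[8:0]) x 2^9."""
    a_low, a_high = to_signed(a, 9), to_signed(a >> 9, 9)
    b_low, b_high = to_signed(b, 9), to_signed(b >> 9, 9)
    return (a_low * b_high + a_high * b_low) << 9


def macc(accumulator, a, b):
    """One multiply-accumulate step, as used for a FIR filter tap."""
    return accumulator + mult18(a, b)


a_word = (3 << 9) | 2              # A[17:9] = 3, A[8:0] = 2
b_word = (5 << 9) | 7              # B[17:9] = 5, B[8:0] = 7
dotp = dot_product(a_word, b_word)  # (2*5 + 3*7) * 512

fir = 0
for sample, coefficient in [(10, 3), (20, 0x3FFFF), (30, 2)]:  # 0x3FFFF is -1
    fir = macc(fir, sample, coefficient)
```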

In addition to the basic MACC function, DSP algorithms typically need small amounts of RAM for coefficients and larger RAMs for data storage. IGLOO2 micro RAMs are ideally suited to serve the needs of coefficient storage while the large RAMs are used for data storage.

5.29.5  High-Performance Memory Subsystem (HPMS) The high-performance memory subsystem (HPMS) embeds two separate 32 Kbyte SRAM blocks that have optional SECDED capabilities (32 Kbytes each with SECDED enabled, 40 Kbytes each with SECDED disabled), up to two separate 256 Kbyte eNVM (flash) blocks, and two separate DMA controllers for fast DMA user logic offloading. The HPMS provides multiple interfacing options to the FPGA fabric in order to facilitate tight integration between the HPMS and user logic in the fabric.

5.29.6  DDR Bridge The DDR bridge is a data bridge between two AHB bus masters and a single AXI bus slave. The DDR bridge accumulates AHB writes into write combining buffers prior to bursting out to external DDR memory. The DDR bridge also includes read combining buffers, allowing AHB masters to efficiently read data from the external DDR memory from a local buffer. The DDR bridge optimizes reads and writes from multiple masters to a single external DDR memory. Data coherency rules between the masters and the external DDR memory are implemented in hardware. The DDR bridge contains two write combining / read buffers. All buffers within the DDR bridge are implemented with SEU-tolerant latches and are not subject to the single event upsets (SEUs) that SRAM exhibits. IGLOO2 devices implement three DDR bridges in the HPMS, FDDR, and MDDR subsystems.

5.29.7  AHB Bus Matrix (ABM) The AHB bus matrix (ABM) is a non-blocking, AHB-Lite multi-layer switch, supporting 4 master interfaces and 8 slave interfaces. The switch decodes access attempts by masters to various slaves, according to the memory map and security configurations. When multiple masters are attempting to access a particular slave simultaneously, an arbiter associated with that slave decides which master gains access, according to a configurable set of arbitration rules. These rules can be configured by the user to provide different usage patterns to each slave. For example, a number of consecutive access opportunities to the slave can be allocated to one particular master, to increase the likelihood of same type accesses (all reads or all writes), which makes more efficient usage of the bandwidth to the slave.

5.29.8  Fabric Interface Controller (FIC) The FIC block provides two separate interfaces between the HPMS and the FPGA fabric: the HPMS master (MM) and fabric master (FM). Each of these interfaces can be configured to operate as AHB-Lite or APB3. Depending on device density, there are up to two FIC blocks present in the HPMS (FIC_0 and FIC_1).

5.29.9 Embedded SRAM (eSRAM) The HPMS contains two blocks of 32 KB eSRAM, giving a total of 64 KB. Having the eSRAM arranged as two separate blocks allows the user to take advantage of the parallelism that exists in the HPMS. The eSRAM is designed for Single Error Correct Double Error Detect (SECDED) protection. When SECDED is disabled, the SRAM usually used to store SECDED data may be reused as an extra 16 KB of eSRAM.

5.29.10 Embedded NVM (eNVM) The HPMS contains up to 512 KB of eNVM (64 bits wide).

5.29.11 DMA Engines Two DMA engines are present in the HPMS: high-performance DMA and peripheral DMA. High-Performance DMA (HPDMA): The high-performance DMA (HPDMA) engine provides efficient memory to memory data transfers between an external DDR memory and internal eSRAM. This engine has two separate AHB-Lite interfaces—one to the MDDR bridge and the other to the AHB bus matrix. All transfers by the HPDMA are full word transfers. Peripheral DMA (PDMA): The peripheral DMA engine (PDMA) is tuned for offloading byte-intensive operations, involving HPMS peripherals, to and from the internal eSRAMs. Data transfers can also be targeted to user logic/RAM in the FPGA fabric.

5.29.12 APB Configuration Bus On every IGLOO2 device, an APB configuration bus is present to allow the user to initialize the SERDES ASIC blocks, the fabric DDR memory controller, and user-instantiated peripherals in the FPGA fabric.

5.29.13 Peripherals A large number of communications and general purpose peripherals are implemented in the HPMS.

Communication Block (COMM_BLK): The COMM block provides a UART-like communications channel between the HPMS and the system controller. System services, such as Enter Flash*Freeze Mode, are initiated through this block.

SPI: The serial peripheral interface controller is compliant with the Motorola SPI, Texas Instruments synchronous serial, and National Semiconductor MICROWIRE™ formats. In addition, the SPI supports interfacing to large SPI flash and EEPROM devices by way of the slave protocol engine. The SPI controller supports both Master and Slave modes of operation. The SPI controller embeds two 4 × 32 (depth × width) FIFOs for receive and transmit. These FIFOs are accessible through the RX data and TX data registers. Writing to the TX data register causes the data to be written to the transmit FIFO, which is emptied by the transmit logic. Similarly, reading from the RX data register causes data to be read from the receive FIFO.
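The relationship between the data registers and the two FIFOs can be sketched as a small Python model. All class and method names are hypothetical, and the behaviour on overflow (the extra word is rejected) is an assumption of the sketch, not a statement about the hardware.

```python
from collections import deque


class SpiFifo:
    """A FIFO of `depth` words, each `width` bits wide (4 x 32 by default)."""

    def __init__(self, depth=4, width=32):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.words = deque()

    def push(self, word):
        if len(self.words) == self.depth:
            return False                    # FIFO full: the word is not stored
        self.words.append(word & self.mask)
        return True

    def pop(self):
        return self.words.popleft() if self.words else None


class SpiControllerModel:
    """Software view of the data path: TX/RX data registers backed by FIFOs."""

    def __init__(self):
        self.tx_fifo = SpiFifo()
        self.rx_fifo = SpiFifo()

    def write_tx_data(self, word):
        """Writing the TX data register places the word in the transmit FIFO."""
        return self.tx_fifo.push(word)

    def read_rx_data(self):
        """Reading the RX data register takes the oldest word from the receive FIFO."""
        return self.rx_fifo.pop()

    def shift_one_frame(self, received_word):
        """Transmit logic: send the oldest TX word and store the word received."""
        sent = self.tx_fifo.pop()
        self.rx_fifo.push(received_word)
        return sent


spi = SpiControllerModel()
accepted = [spi.write_tx_data(word) for word in range(5)]  # fifth write finds the FIFO full
first_sent = spi.shift_one_frame(0xAB)
first_received = spi.read_rx_data()
```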

5.30  CLOCK SOURCES: ON-CHIP OSCILLATORS, PLLS, AND CCCS IGLOO2 devices have two on-chip RC oscillators (a 1 MHz RC oscillator and a 50 MHz RC oscillator) and the main crystal oscillator (32 kHz–20 MHz). These are available to the user for generating clocks to the on-chip resources and the logic built on the FPGA fabric array. These oscillators can be used in conjunction with the integrated user phase-locked loops (PLLs) and FAB_CCCs to generate clocks of varying frequency and phase. In addition to being available to the user, these oscillators are also used by the system controller, power-on reset circuitry, and HPMS during the Flash*Freeze mode. IGLOO2 devices have up to eight fabric CCC (FAB_CCC) blocks and a dedicated PLL associated with each CCC to provide flexible clocking to the FPGA fabric portion of the device. The user has the freedom to use any of the eight PLLs and CCCs to generate the fabric clocks and the internal HPMS clock from the base fabric clock (CLK_BASE). There is also a dedicated CCC block for the HPMS (HPMS_CCC) and an associated PLL (MPLL) for HPMS clocking and de-skewing the CLK_BASE clock. The fabric alignment clock controller (FACC), part of the HPMS CCC, is responsible for generating various aligned clocks required by the HPMS for correct operation of the HPMS blocks and synchronous communication with the user logic in the FPGA fabric.

5.31  HIGH SPEED SERIAL INTERFACES SERDES Interface: IGLOO2 FPGAs have up to four 5 Gbps SERDES transceivers, each supporting the following:

● 4 SERDES/PCS lanes



● The native SERDES interface facilitates implementation of Serial RapidIO (SRIO) in fabric or an SGMII interface for a soft Ethernet MAC

PCI Express (PCIe): PCIe is a high speed, packet-based, point-to-point, low pin count, serial interconnect bus. The IGLOO2 family has two hard high-speed serial interface blocks. Each SERDES block contains a PCIe system block. The PCIe system is connected to the SERDES block and following are the main features supported:

● Supports x1, x2, and x4 lane configuration

● Endpoint configuration only



● PCI Express Base Specification Revision 2.0



● 2.5 and 5.0 Gbps compliant



● Embedded receive (2 KB), transmit (1 KB) and retry (1 KB) buffer dual-port RAM implementation



● Up to 2 kbytes maximum payload size



● 64-bit AXI or 32-bit/64-bit AHBL Master and Slave interface to the application layer



● 32-bit APB interface to access configuration and status registers of PCIe system



● Up to 3 × 64 bit base address registers



● 1 virtual channel (VC)

XAUI/XGXS Extension: The XAUI/XGXS extension allows the user to implement a 10 Gbps (XGMII) Ethernet PHY interface by connecting the XGMII fabric interface through an appropriate soft IP block in the fabric.

5.32 HIGH SPEED MEMORY INTERFACES: DDRX MEMORY CONTROLLERS There are up to two DDR subsystems, MDDR (HPMS DDR) and FDDR (fabric DDR) present in IGLOO2 devices. Each subsystem consists of a DDR controller, PHY, and a wrapper. The MDDR has an interface to/from the HPMS and fabric, and FDDR provides an interface to/from the fabric. The following are the main features supported by the FDDR and MDDR:

● Support for LPDDR, DDR2, and DDR3 memories



● Simplified DDR command interface to standard AMBA AXI/AHB interface



● Up to 667 Mbps (333 MHz double data rate) performance



● Supports 1, 2, or 4 ranks of memory



● Supports different DRAM bus width modes: x8, x9, x16, x18, x32, and x36



● Supports DRAM burst length of 2, 4, or 8 in full bus-width mode; supports DRAM burst length of 2, 4, 8, or 16 in half bus-width mode



● Supports memory densities up to 4 GB



● Supports a maximum of 8 memory banks



● SECDED enable/disable feature



● Embedded physical interface (PHY)



● Read and Write buffers in fully associative CAMs, configurable in powers of 2, up to 64 Reads plus 64 Writes



● Support for dynamically changing clock frequency while in self-refresh



● Supports command reordering to optimize memory efficiency



● Supports data reordering, returning critical word first for each command

5.33  MDDR SUBSYSTEM The MDDR subsystem has two interfaces to the DDR. One is an AXI 64-bit bus from the DDR bridge within the HPMS. The other is a multiplexed interface from the FPGA fabric, which can be configured as either a single AXI 64-bit bus or two 32-bit AHB-Lite buses. There is also a 16-bit APB configuration bus, which is used to initialize the majority of the internal registers within the MDDR subsystem after reset. This APB configuration bus is mastered by a master in the FPGA fabric. Support for 3.3 V Single Data Rate DRAMs (SDRAM) can be obtained by instantiating a soft AHB or AXI SDRAM memory controller in the FPGA fabric and connecting I/O ports to 3.3 V MSIO.

5.34  FDDR SUBSYSTEM The FDDR subsystem has one interface to the DDR. This is a multiplexed interface from the FPGA fabric, which can be configured as either a single AXI 64-bit bus or two 32-bit AHB-Lite buses. There is also a 16-bit APB configuration bus, which is used to initialize the majority of the internal registers within the FDDR subsystem after reset. This APB configuration bus can be mastered by a master in the FPGA fabric.

5.35  SOFTWARE DEVELOPMENT TOOLS Microsemi’s software ecosystem for its FPGAs and SoC FPGAs has two offerings:

● Libero System-on-Chip (SoC)

● Libero Integrated Development Environment (IDE)

5.35.1 Libero SoC Microsemi’s Libero System-on-Chip (SoC) is a comprehensive software toolset for designing with Microsemi FPGAs and SoC FPGAs. Libero SoC supports Microsemi’s IGLOO2, SmartFusion2, SmartFusion, IGLOO, ProASIC3 and Fusion families and manages the entire design flow from design entry, synthesis and simulation, through place-and-route, timing and power analysis, with enhanced integration of the embedded design flow. Libero SoC comes with a free Gold licence that allows application development using most of the devices offered by Microsemi. Libero SoC offers a number of features that make development with Microsemi FPGAs and SoC FPGAs easier and faster:

● Full suite of integrated design entry tools and methodologies:



o Create your design based on high level design specifications using SystemBuilder



o Configure built-in microcontroller components using Configure MSS



o Integrate DirectCore & Customer IPs to generate a Top level hierarchical HDL module with SmartDesign



o Generate your Top level HDL hierarchical module from scratch using Create HDL



● Optimize FPGA performance and area using Synplify Pro ME



● Optimize FPGA performance and area for Simulink® block sets using Synphony Model Compiler ME



● Simulate HDL Behavioral, post-synthesis and post-layout designs with Modelsim ME



● Assign your IO ports, Package pins and view package layout using IO Editor



● Apply timing constraints such as Clock, Set-up, Hold, Multicycle path and False path using SmartTime

● Generate a comprehensive power estimate of your design using SmartPower

● Program your FPGA using FlashPro and FlashPro Express



● Debug your design with integrated Microsemi on-chip debug tools such as SmartDebug and with Synopsys Identify



● Find and fix design errors faster using new Message Wizard



● Optimize IO power consumption with new IO Advisor



● Easy device selection with Project Wizard



● Access MSS/HPMS peripherals using AXI interface

The Libero SoC Project Manager with SmartDesign and the System Builder are shown in Fig. 5.11 and Fig. 5.12, respectively.

Fig. 5.11 Libero SoC Project Manager and SmartDesign.

Fig. 5.12 System Builder.

Device Support: The device families supported by Libero SoC are listed below:

● ProASIC3

● ProASIC3E

● ProASIC3L (including RT3PEL)

● IGLOO

● IGLOOe

● IGLOO PLUS

● Fusion

● SmartFusion

● SmartFusion2

● IGLOO2

5.35.2 Libero IDE Microsemi’s Libero® IDE software is for designing with Microsemi rad-tolerant FPGAs, antifuse FPGAs, and legacy and discontinued flash FPGAs. It manages the entire design flow from design entry, synthesis and simulation, through place-and-route, timing and power analysis.

Libero IDE Software Features:

● Powerful project and design flow management



● Full suite of integrated design entry tools and methodologies:



● SmartDesign graphical SoC design creation with automatic abstraction to HDL



● IP Core Catalog and configuration



● User-defined block creation flow for design re-use



● Synplify Pro ME synthesis fully optimizes Microsemi FPGA device performance and area utilization



● Synphony Model Compiler ME performs high-level synthesis optimizations within a Simulink® environment



● Modelsim ME VHDL or Verilog behavioral, post-synthesis and post-layout simulation capability



● Physical design implementation, floor planning, physical constraints, and layout



● Timing-driven and power-driven place-and-route

Fig. 5.13 Libero IDE Project Manager.



● SmartTime environment for timing constraint management and analysis



● SmartPower provides comprehensive power analysis for actual and “what if” power scenarios



● Interface to FlashPro programmers



● Post-route On Chip Debug Tools and Identify ME debugging software for Microsemi flash designs



● Silicon Explorer II debugging software for Microsemi antifuse designs

5.36  PROGRAMMING AND DEBUGGING Microsemi offers many programming options including Silicon Sculptor single-site and multisite device programmers that support all Microsemi device families.

5.36.1 FlashPro Hardware Programmer The FlashPro series of hardware programmers saves board space because a single JTAG chain can be used for all JTAG devices. All FlashPro programmers use JEDEC-standard STAPL files, meaning there are no algorithms built into the software. Hence, all FlashPro hardware programmers are supported by the FlashPro software and its user interface. The FlashPro series of programmers can also be used for interactive debug of designs using embedded IP in the flash FPGAs, in conjunction with FlashPro’s on-chip debug or Synopsys software. FlashPro’s on-chip debug feature allows access to SmartFusion and Fusion-specific peripherals such as flash memory and the analog-to-digital converter (ADC), and to other FPGA and design-specific information used for verifying the implemented design.

Key Features

● Supports in-system programming



● Supports IEEE 1149 JTAG programming through STAPL



● Supports IEEE 1532



● Connections to parallel port and USB port available



● Self-test option



● Uses Microsemi FlashPro software, available as part of Libero® System-on-Chip (SoC) or Libero® Integrated Development Environment (IDE), and also available standalone



● Free software update

Features comparison of Hardware Programmers

Feature                     FlashPro5 (note 1)       FlashPro4 (note 2)       FlashProLite
FlashPro Software           Windows only             Windows only             Windows only
FlashPro Express Software   Windows and Linux        Windows only             Windows only
Supported Devices           SmartFusion2, IGLOO2,    SmartFusion2, IGLOO2,    ProASICPLUS
                            SmartFusion, IGLOO,      SmartFusion, IGLOO,
                            ProASIC3, Fusion,        ProASIC3, Fusion,
                            RT ProASIC3              RT ProASIC3

Other features compared: JTAG programming; SmartDebug support; Synopsys Identify support; SoftConsole support; USB 2.0 (high speed); USB 1.1; parallel port. Entries in these rows include "Coming Soon" and "Available with SoftConsole v4.0".

Notes:

1. Libero SoC v11.3 or FlashPro v11.3 software, or a later version, is required to use FlashPro5.
2. Libero IDE v8.6 SP1 or FlashPro v8.6 SP1 software, or a later version, is required to use FlashPro4.
3. FlashPro and FlashPro Express software also support the earlier, discontinued FlashPro3 hardware programmer. For more information, read the product discontinuation notification.

5.36.2 FlashPro5

FlashPro5 is the newest programmer. In addition to Windows, it supports Linux platforms such as Red Hat Enterprise Linux 6 and CentOS 6 when used with FlashPro Express software. It supports all FPGA devices in the SmartFusion2, IGLOO2, SmartFusion, Fusion, IGLOO, ProASIC3 and RT ProASIC3 series. It is completely backward compatible and complies with the requirements of EMC Directive 2004/108/EC and RoHS Directive 2011/65/EU. The minimum software versions required to run FlashPro5 are Libero SoC v11.3 or FlashPro v11.3 on Windows, and Libero SoC v11.4 or FlashPro v11.4 on Linux.


5.36.3 FlashPro4

FlashPro4 is a programmer supporting all FPGAs in the IGLOO series and ProASIC3 series (including RT ProASIC3), the SmartFusion and Fusion families, and future-generation flash FPGAs. FlashPro4 uses USB 2.0 and is high-speed compliant, so it can make full use of the 480 Mbps bandwidth. Powered exclusively via USB, FlashPro4 provides a VPUMP voltage of 3.3 V for programming these devices. For IGLOO nano FPGAs, programming at a 1.2 V core voltage is supported. For SmartFusion designs, FlashPro4 supports device programming both for the hardware design generated by Libero IDE and for the software design coming from Microsemi's SoftConsole embedded software design and debug tool.

FlashPro4 connects to any PC with a USB port and operates with USB 1.1 (full speed) or USB 2.0 (both high-speed and full-speed modes). Multiple FlashPro4 programmers can be connected to a single PC using USB hubs, enabling the end user to set up a small-scale production environment in which the FlashPro software performs concurrent in-system programming (ISP) across multiple boards. FlashPro4 replaces FlashPro3 and FlashPro3X and is completely backward compatible, while adding lower cost, a smaller form factor and support for the latest flash FPGA families. Libero IDE v8.6 SP1 or FlashPro v8.6 SP1 is the minimum software version required to use FlashPro4.

5.36.4 FlashProLite

FlashProLite is used exclusively with the ProASICPLUS family and provides all required programming voltages. The programming connection is a 26-pin SAMTEC micro header on the target board, to which FlashProLite is attached through a replaceable programming cable. FlashProLite is conveniently powered from the target board. If the PC or laptop does not have a parallel port, a PCMCIA-to-parallel-port converter can be used as an alternative. To program an APA device via USB, the QuickFlash Programmer can be used instead; Microsemi has tested the functionality of the QuickFlash Programmer.


FlashPro Hardware Programmer Operating Systems

Programmers compared: FlashPro5, FlashPro4 (notes 1 and 3), FlashProLite (note 4).

Operating systems listed: Linux (RHEL 6 64-bit, CentOS 6 64-bit) and Microsoft Windows (Windows 8 64-bit, Windows 7 Professional, Windows XP Pro SP3 (cumulative)). Individual entries carry references to notes 1 and 2.

Notes:

1. Both x86 32-bit and x64 operating systems are supported for USB.
2. x86 32-bit operating systems only.
3. FlashPro3 was discontinued in 2009 and replaced with FlashPro4. For more information, read the product discontinuation notification.
4. FlashProLite supports only the ProASICPLUS family.

5.37 EVALUATION AND DEVELOPMENT KITS

Microsemi offers various evaluation and development kits for all device families. This section covers one of the evaluation kits for SmartFusion2 devices.

5.37.1 SmartFusion2 Security Evaluation Kit

Microsemi's SmartFusion®2 Security Evaluation Kit is intended to make it easy to develop secure embedded systems. It addresses both design security (when protecting your design IP is critical) and data security (when protecting application data is necessary). The kit provides a cost-effective platform for developing SoC FPGA designs using Microsemi's SmartFusion2 System-on-Chip (SoC) FPGAs. The SmartFusion2 Security Evaluation Kit comes with a large 90K LE device, which makes it easy to develop transceiver I/O-based FPGA designs for PCI Express® and Gigabit Ethernet-based systems and still leaves room for large IP blocks to complement the design. The board is also small form-factor PCIe compliant, which allows quick prototyping and evaluation using any desktop PC or laptop with a PCIe slot.

The kit enables users to:

● Evaluate the data security features of SmartFusion2 SoC FPGAs, including:
   – Elliptic Curve Cryptography (ECC)
   – SRAM-PUF (Physically Unclonable Function)
   – Random Number Generator (RNG)
   – AES/SHA
   – Anti-Tamper

● Develop and test PCI Express Gen2 x1 lane designs

● Test the signal quality of the FPGA transceiver using full-duplex SERDES SMA pairs

● Measure the low power consumption of the SmartFusion2 SoC FPGA

● Quickly create a working PCIe link with the included PCIe Control Plane Demo

The board includes an RJ45 interface to 10/100/1000 Ethernet, 512 MB of LPDDR, 64 MB of SPI flash, and USB-UART connections, as well as I2C, SPI, and GPIO headers. The kit includes a 12 V power supply, but the board can also be powered through the PCIe edge connector.

Components labeled in Fig. 5.14: 50 MHz oscillator, GPIO header, LPDDR, switches SW1 to SW5, LEDs, JTAG programming header, reset switch, ETM trace debug header, on/off switch, RVI/IAR debug header, 12 V power supply input, 10/100/1000 Ethernet RJ45 connector, Tx/Rx SERDES SMA pairs, USB-UART terminal, I2C header, Micro-USB OTG, SERDES reference clock (on-board 125 MHz), SmartFusion2 device, x1 PCIe edge connector, current measurement points, SPI flash and LP crystals.

Fig. 5.14 SmartFusion2 Security Evaluation Kit.


6 Hardware Description Languages (Verilog and VHDL)

6.1 INTRODUCTION

The aim of this chapter is to enable the reader to understand the basic terminology and concepts of the two most widely used hardware description languages (HDLs), namely Verilog and VHDL. Basic digital circuits are used as examples for better understanding; many of them should be familiar to a hardware engineer. Although prior knowledge of a programming language helps, the reader has to keep in mind that HDLs are different: they are used to model the flow of data with respect to time in an electronic circuit, unlike programming languages such as C, which are procedural (a program is a series of steps to be executed). That is, HDLs include explicit constructs for expressing the concurrency of hardware circuits. We will start with Verilog HDL, followed by VHDL.
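As a small illustration of the concurrency described above, consider the following sketch. It is not taken from the text, and the module and signal names are hypothetical. The two continuous assignments describe hardware that operates in parallel, so the assignment to y may appear before the assignment to t on which it depends; in a procedural language such as C, the same ordering would read t before it had been computed.

```verilog
// Hypothetical sketch: the two continuous assignments below are
// concurrent, so swapping their textual order does not change the circuit.
module concurrency_sketch (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    wire t;                // internal net driven by the second assignment

    assign y = t & c;      // uses t although t is described further down
    assign t = a | b;      // re-evaluated whenever a or b changes
endmodule
```

For any input combination the output settles to (a | b) & c, regardless of the order in which the two assign statements are written.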

6.2 VERILOG HDL

Verilog is a hardware description language used to describe a digital logic circuit at different levels of abstraction. The most commonly used levels are the behavioral level, the register-transfer level and the gate level. The behavioral level describes a digital system using functions, tasks and always blocks; the statements inside these elements are executed sequentially. The register-transfer level models a design by specifying the flow of data between registers and the logical operations performed on that data. Gate-level modeling uses predefined primitives such as and, or, etc. to model the operations.
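The three levels named above can be sketched with a 2-to-1 multiplexer. This is a minimal illustration, not an example from the text: all module, instance and signal names are hypothetical. The register-transfer version additionally stores the selected value in a register, so its output q follows the inputs only at the rising clock edge.

```verilog
// Hypothetical 2-to-1 multiplexer at three abstraction levels.
// In each module the result is b when sel is 1, otherwise a.

// Gate level: built from the predefined primitives not, and, or.
module mux2_gate (input wire a, input wire b, input wire sel, output wire y);
    wire sel_n, a_path, b_path;
    not u_inv  (sel_n, sel);
    and u_and0 (a_path, a, sel_n);
    and u_and1 (b_path, b, sel);
    or  u_or   (y, a_path, b_path);
endmodule

// Register-transfer level: the selected value is transferred into
// the register q on every rising edge of clk.
module mux2_rtl (input wire clk, input wire a, input wire b,
                 input wire sel, output reg q);
    always @(posedge clk)
        q <= sel ? b : a;
endmodule

// Behavioral level: an always block whose statements run sequentially,
// so the later assignment overrides the default when sel is 1.
module mux2_behav (input wire a, input wire b, input wire sel, output reg y);
    always @* begin
        y = a;
        if (sel)
            y = b;
    end
endmodule
```

The gate-level and behavioral modules are purely combinational and give the same output for every input combination. The choice of a multiplexer is only for brevity; any small function could be used to compare the three styles.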


6.2.1 Terminology and Basic Concepts of Verilog

(a) Module declaration: The basic design block in Verilog is called a module. As seen from Example 1, the module declaration has three parts: the module name, the port list and the functionality. The declaration starts with the keyword module followed by an identifier, lists the input and output ports of the design, continues with the implementation of the design functionality, and ends with the endmodule statement. Example 1 describes how to model an inverter in Verilog.

    module ex1_inverter ( data_in, data_out );
      input data_in;
      output data_out;
      assign data_out = ~data_in;
    endmodule

Example 1 Inverter

(b) Comments: Comments are descriptions added to improve the readability of the code. They are also needed for proper maintenance of the code. The two comment styles are shown below.

    // single line comment starts with two slashes in Verilog
    /* A multi-line comment starts with a slash followed by an asterisk
       And ends with an asterisk followed by a second slash */

(c) Ports: Ports are the input/output interfaces of a design and are used to connect it to other modules. Each port must be declared with one of the keywords input, output or inout, which indicates its direction. The port directions can be declared in two ways: either after the port list, as in Example 1, or within the port list itself, as in Example 2.

(d) Parameter: Parameters are constant values declared within a module. The declaration starts with the keyword parameter, followed by the parameter name (identifier) and its default value. The default values can be overridden at compile time, either by passing a value for each parameter when the module is instantiated or by using a defparam statement. Example 2 shows a multiplexer design in which a parameter is declared.

    module ex2_mux #(
      //Parameter Declaration
      parameter DATA_WIDTH = 2
    )
    (
      //Port Declaration
      input wire [DATA_WIDTH-1:0] data_in1,
      input wire [DATA_WIDTH-1:0] data_in2,
      input wire sel_in,
      output wire [DATA_WIDTH-1:0] data_out
    );
      //Functionality
      assign data_out = ( sel_in == 1 ) ? data_in2 : data_in1;
    endmodule

Example 2 Multiplexer

(e) Operators and Operands: Verilog provides various types of operators, which are classified into the categories shown in the table below. Operands can be of different data types.

Operator type                                Syntax               Explanation
Arithmetic                                   A * B                Multiply A and B
                                             A / B                Divide A by B
                                             A + B                Add A and B
                                             +8                   Unary operation indicating positive sign
                                             -7                   Unary operation indicating negative sign
                                             A - B                Subtract B from A
                                             A % B                Modulus
Bitwise                                      ~A                   Negate A bitwise
                                             A & B                Bitwise and of A and B
                                             A | B                Bitwise or of A and B
                                             A ^ B                Bitwise xor of A and B
                                             A ^~ B               Bitwise xnor of A and B
Concatenation                                C = {A, B}           If A=2'b10 and B=2'b01, C=4'b1001
Conditional (three operands)                 B = (A==1) ? C : D   If A equals 1, then result B is C, else D
Equality                                     A == B               Logical 1 if A is equal to B, else logical 0
                                             A != B               Logical 1 if A is not equal to B, else logical 0
Logical                                      !A                   Logical negation (unary operator); produces a 1-bit value; if A=1, result is 0
                                             A && B               Logical and; if A=0 and B=1, result is 0
                                             A || B               Logical or; if A=0 and B=1, result is 1
Reduction (unary operand; result is 1-bit)   &A                   and on bits of A
                                             |A                   or on bits of A
                                             ^A                   xor on bits of A
Relational                                   A > B                Result is logical 1 if A is greater than B
                                             A >= B               Result is logical 1 if A is greater than or equal to B
Shift                                        A >> 1               Result is A shifted right by 1 bit

A