Power Electronics and Design 2004 1-4020-8076-X

Book: Power Electronics and Design 2004. Category: Books, Electrical engineering and communications. Author: Kluwer Ultr

English Pages 290 Year 2004

Table of contents :
Contents
Contributors
Preface
Introduction
1.1 INTRODUCTION
1.2 POWER CONSUMPTION BECOMES CRITICAL
1.3 TRADITIONAL APPROACHES TO POWER REDUCTION
1.4 ZERO-VTH DEVICES
1.5 DESIGN APPROACHES TO POWER REDUCTION
Acknowledgement
References
2.1 INTRODUCTION
2.2 OPTICAL INTERCONNECT TECHNOLOGY
2.3 AN OPTICAL CLOCK DISTRIBUTION NETWORK
2.4 QUANTITATIVE POWER COMPARISON BETWEEN ELECTRICAL AND OPTICAL CLOCK DISTRIBUTION NETWORKS
2.5 OPTICAL NETWORK ON CHIP
References
3.1 INTRODUCTION
3.2 SINGLE ELECTRONICS
3.3 MOLECULAR ELECTRONICS
3.4 DISCUSSION
References
4.1 INTRODUCTION
4.2 LEAKAGE MODEL AND CHARACTERISTICS
4.3 SUBTHRESHOLD LEAKAGE REDUCTION
4.4 LEAKAGE REDUCTION METHOD FOR BOTH SUBTHRESHOLD AND GATE LEAKAGE CURRENT
4.5 RESULTS
4.6 CONCLUSIONS
References
5.1 INTRODUCTION
5.2 BACKGROUND
5.3 PARTITIONED SHARED MEMORY ARCHITECTURE
5.4 PERFORMANCE AND ENERGY CHARACTERIZATION
5.5 EXPLORATION FRAMEWORK
5.6 EXPERIMENTAL RESULTS
5.7 CONCLUSIONS
References
6.1 INTRODUCTION
6.2 BACKGROUND – TUNABLE CACHE PARAMETERS
6.3 A SELF-TUNING LEVEL ONE CACHE ARCHITECTURE
6.4 AUTOMATIC TUNING OF A TWO-LEVEL CACHE ARCHITECTURE - THE TCAT
6.5 USING A VICTIM BUFFER IN AN APPLICATION SPECIFIC MEMORY HIERARCHY
6.6 LOW STATIC-POWER FREQUENT-VALUE DATA CACHES
References
7. REDUCING ENERGY CONSUMPTION IN CHIP MULTIPROCESSORS USING WORKLOAD VARIATIONS
7.1 INTRODUCTION
7.2 CHIP MULTIPROCESSOR ARCHITECTURE AND EXECUTION MODEL
7.3 LOAD IMBALANCE IN PARALLEL EXECUTION
7.4 COMPILER SUPPORT
7.5 ADDITIONAL OPTIMIZATIONS
7.6 EXPERIMENTS
References
8.1 INTRODUCTION
8.2 ENERGY EFFICIENT HETEROGENEOUS SOC’S
8.3 ULTRA LOW POWER COMPONENTS
8.4 DESIGN & ARCHITECTURE EXPLORATION
8.5 DOMAIN-SPECIFIC CO DESIGN ENVIRONMENTS
References
9.1 INTRODUCTION
9.2 TRANSFORMATIONS OVERVIEW
9.3 METHODOLOGY
9.4 CASE STUDIES
9.5 EXPERIMENTAL RESULTS
9.6 CONCLUSIONS
References
10.1 INTRODUCTION
10.2 PRELIMINARIES
10.3 BACKLIGHT AND TRANSMITTANCE SCALING
References
11.1 INTRODUCTION
11.2 REMOTE STORAGE SPACE
11.3 SWAP DEVICES
11.4 EXPERIMENTAL SETUP
11.5 CHARACTERIZATION OF SWAPPING COSTS
11.6 POWER OPTIMIZATION
11.7 CASE STUDY
11.8 CONCLUSION
References
12.1 INTRODUCTION
12.2 PHYSICAL LAYER
12.3 SYSTEM INTERCONNECT ARCHITECTURE
12.4 DATA LINK LAYER
12.5 NETWORK LAYER
12.6 TRANSPORT LAYER
12.7 SYSTEM AND APPLICATION LAYERS
12.8 APPLICATION-SPECIFIC NETWORKS-ON-CHIP
12.9 CONCLUSIONS
References
13. SYSTEM LEVEL POWER MODELING AND SIMULATION OF HIGH-END INDUSTRIAL NETWORK-ON-CHIP
13.2 BACKGROUND
13.3 ON-CHIP NETWORK: STBUS INTERCONNECT
13.4 ENABLING ENERGY EXPLORATION FOR NOC
13.5 STBUS ENERGY MODEL
13.6 OPTIMAL DESIGN OF EXPERIMENTS
13.7 STBUS POWER MODEL VALIDATION AND EXPERIMENTAL RESULTS
13.8 LOW EFFORT, HIGH ACCURACY POWER MACRO MODELING
13.9 CONCLUSIONS
References
14. ENERGY-AWARE ADAPTATIONS FOR END-TO-END VIDEO STREAMING TO MOBILE HANDHELD DEVICES
14.1 MOTIVATION
14.2 RELATED WORK
14.3 SYSTEM MODEL
14.4 HARDWARE/ARCHITECTURAL LEVEL OPTIMIZATIONS
14.5 OS/MIDDLEWARE LEVEL OPTIMIZATIONS
14.6 APPLICATION LAYER ADAPTATION
14.7 SUMMARY
References

ULTRA LOW-POWER ELECTRONICS AND DESIGN

Ultra Low-Power Electronics and Design Edited by

Enrico Macii Politecnico di Torino, Italy

KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW

eBook ISBN: 1-4020-8076-X
Print ISBN: 1-4020-8075-1

©2004 Springer Science + Business Media, Inc.

Print ©2004 Kluwer Academic Publishers Dordrecht All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com

Contents

CONTRIBUTORS
PREFACE
INTRODUCTION

1. ULTRA-LOW-POWER DESIGN: DEVICE AND LOGIC DESIGN APPROACHES
2. ON-CHIP OPTICAL INTERCONNECT FOR LOW-POWER
3. NANOTECHNOLOGIES FOR LOW POWER
4. STATIC LEAKAGE REDUCTION THROUGH SIMULTANEOUS Vt/Tox AND STATE ASSIGNMENT
5. ENERGY-EFFICIENT SHARED MEMORY ARCHITECTURES FOR MULTI-PROCESSOR SYSTEMS-ON-CHIP
6. TUNING CACHES TO APPLICATIONS FOR LOW-ENERGY EMBEDDED SYSTEMS
7. REDUCING ENERGY CONSUMPTION IN CHIP MULTIPROCESSORS USING WORKLOAD VARIATIONS
8. ARCHITECTURES AND DESIGN TECHNIQUES FOR ENERGY EFFICIENT EMBEDDED DSP AND MULTIMEDIA PROCESSING
9. SOURCE-LEVEL MODELS FOR SOFTWARE POWER OPTIMIZATION
10. TRANSMITTANCE SCALING FOR REDUCING POWER DISSIPATION OF A BACKLIT TFT-LCD
11. POWER-AWARE NETWORK SWAPPING FOR WIRELESS PALMTOP PCS
12. ENERGY EFFICIENT NETWORK-ON-CHIP DESIGN
13. SYSTEM LEVEL POWER MODELING AND SIMULATION OF HIGH-END INDUSTRIAL NETWORK-ON-CHIP
14. ENERGY AWARE ADAPTATIONS FOR END-TO-END VIDEO STREAMING TO MOBILE HANDHELD DEVICES

Contributors

A. Acquaviva            Università di Urbino
L. Benini               Università di Bologna
D. Bertozzi             Università di Bologna
D. Blaauw               University of Michigan, Ann Arbor
A. Bogliolo             Università di Urbino
A. Bona                 STMicroelectronics
C. Brandolese           Politecnico di Milano
W.C. Cheng              University of Southern California
G. De Micheli           Stanford University
N. Dutt                 University of California, Irvine
W. Fornaciari           Politecnico di Milano
F. Gaffiot              Ecole Centrale de Lyon
J. Gautier              CEA-DRT–LETI/D2NT–CEA/GRE
A. Gordon-Ross          University of California, Riverside
R. Gupta                University of California, San Diego
C. Heer                 Infineon Technologies AG
M. J. Irwin             Pennsylvania State University
I. Kadayif              Canakkale Onsekiz Mart University
M. Kandemir             Pennsylvania State University
B. Kienhuis             Leiden
I. Kolcu                UMIST
E. Lattanzi             Università di Urbino
D. Lee                  University of Michigan, Ann Arbor
A. Macii                Politecnico di Torino
S. Mohapatra            University of California, Irvine
I. O’Connor             Ecole Centrale de Lyon
K. Patel                Politecnico di Torino
M. Pedram               University of Southern California
C. Pereira              University of California, San Diego
C. Piguet               CSEM
M. Poncino              Università di Verona
F. Salice               Politecnico di Milano
P. Schaumont            University of California, Los Angeles
U. Schlichtmann         Technische Universität München
D. Sylvester            University of Michigan, Ann Arbor
F. Vahid                University of California, Riverside and University of California, Irvine
N. Venkatasubramanian   University of California, Irvine
I. Verbauwhede          University of California, Los Angeles and K.U.Leuven
N. Vijaykrishnan        Pennsylvania State University
V. Zaccaria             STMicroelectronics
R. Zafalon              STMicroelectronics
B. Zhai                 University of Michigan, Ann Arbor
C. Zhang                University of California, Riverside

Preface

Today we are beginning to have to face up to the consequences of the stunning success of Moore’s Law, that astute observation by Intel’s Gordon Moore which predicts that integrated circuit transistor densities will double every 12 to 18 months. This observation has now held true for the last 25 years or more, and there are many indications that it will continue to hold true for many years to come. This book appears at a time when the first examples of complex circuits in 65nm CMOS technology are beginning to appear, and these products already must take advantage of many of the techniques to be discussed and developed in this book.

So why then should our increasing success at miniaturization, as evidenced by the success of Moore’s Law, be creating so many new difficulties in power management in circuit designs? The principal source and the physical origin of the problem lies in the differential scaling rates of the many factors that contribute to power dissipation in an IC – transistor speed/density product goes up faster than the energy per transition comes down, so the power dissipation per unit area increases in a general sense as the technology evolves.

Secondly, the “natural” transistor switching speed increase from one generation to the next is being eroded by the greater parasitic losses in the wiring of the devices. The technologists are offsetting this problem to some extent by introducing lower permittivity dielectrics (“low-k”) and lower resistivity conductors (copper) – but nonetheless, to get the needed circuit performance, higher speed devices using techniques such as silicon-on-insulator (SOI) substrates, enhanced carrier mobility (“strained silicon”) and higher field (“overdrive”) operation are driving power densities ever upwards. In many cases, these new device architectures are increasingly leaky, so static power dissipation becomes a major headache in power management, especially for portable applications.

A third factor is system or application driven – having all this integration capability available encourages us to combine many different functional blocks into one system IC. This means that in many cases, a large part of the chip’s required functionality will come from software executing on and between multiple on-chip execution units; how the optimum partitioning between hardware architecture and software implementation is obtained is a vast subject, but clearly some implementations will be more energy efficient than others. Given that, in many of today’s designs, more than 50% of the total development effort is on the software that runs on the chip, getting this partitioning right in terms of power dissipation can be critical to the success of (or instrumental in the failure of!) the product.

A final motivation comes from the practical and environmental consequences of how we design our chips – state-of-the-art high performance circuits are dissipating up to 100W per square centimeter – we only need 500 square meters of such silicon to soak up the output of a small nuclear power station. A related argument, based on battery lifetime, shows that the “converged” mobile phone application combining telephony, data transmission, multimedia and PDA functions that will appear shortly is demanding power at the limit of lithium-ion or even methanol-water fuel cell battery technology.

We have to solve the power issue by a combination of design and process technology innovations; examples of current approaches to power management include multiple transistor thresholds, triple gate oxide, dynamic supply voltage adjustment and memory architectures.

The use of multiple transistor thresholds is a technique, practiced for several years now, that allows the designer to use high performance (low Vt) devices where speed is needed, and low leakage (high Vt) devices elsewhere. This benefits both static power consumption (through less sub-threshold leakage) and dynamic power consumption (through lower overall switching currents). High threshold devices can also be used to gate the supplies to different parts of the circuit, allowing blocks to be put to sleep until needed.

Similar to the previous technique, triple gate oxide (TGO) allows circuit partitioning between those parts that need performance and other areas of the circuit that don’t. It has the additional benefit of acting on both sub-threshold leakage and gate leakage. The third oxide is used for I/O and possibly mixed-signal. It is expected that over the next few years the process technologists will eventually replace the traditional silicon dioxide gate dielectric of the CMOS devices with new materials, such as rare earth oxides, whose much higher dielectric constants will allow the gate leakage problem to be completely suppressed.

Dynamic supply voltage adjustment allows the supply voltage to different blocks of the circuit to be adjusted dynamically in response to the immediate performance needs of the block – this very sophisticated technique will take some time to mature.

Finally, many, if not most, advanced devices use very large amounts of memory whose contents may have to be maintained during standby; this consumes a substantial amount of power, either through refreshing dynamic RAM or through the array leakage for static RAM. Traditional nonvolatile memories have writing times that are orders of magnitude too slow to allow them to substitute for these on-chip memories. New developments, such as MRAM, offer the possibility of SRAM-like performance coupled with unlimited endurance and data retention, making them potential candidates to replace the traditional on-chip memories and remove this component of standby power consumption.

Most of the approaches to power management described briefly above will be employed in 65nm circuits, but there are a lot more good ideas waiting to be applied to the problem, many of which you will find clearly and concisely explained in this book.

Mike Thompson, Philippe Magarshack STMicroelectronics, Central R&D Crolles, France

Introduction

ULTRA LOW-POWER ELECTRONICS AND DESIGN

Enrico Macii
Politecnico di Torino

Power consumption is a key limitation in many electronic systems today, ranging from mobile telecom to portable and desktop computing systems, especially when moving to nanometer technologies. Power is also a showstopper for many emerging applications like ambient intelligence and sensor networks. Consequently, new design techniques and methodologies are needed to control and limit power consumption.

The 2004 edition of the DATE (Design Automation and Test in Europe) conference devoted an entire Special Focus Day to the power problem and its implications on the design of future electronic systems. In particular, keynote presentations and invited talks by outstanding researchers in the field of low-power design, as well as several technical papers from the regular conference sessions, addressed the difficulties ahead and advanced strategies and principles for achieving ultra low-power design solutions. The purpose of this book is to integrate into a single volume a selection of these contributions, duly extended and transformed by the authors into chapters proposing a mix of tutorial material and advanced research results.

The manuscript consists of a total of 14 chapters, addressing different aspects of ultra low-power electronics and design.

Chapter 1 opens the volume by providing insight into innovative transistor devices that are capable of operating with a very low threshold voltage, thus contributing to a significant reduction of the dynamic component of power consumption. Solutions for limiting leakage power during stand-by mode are also discussed. The chapter closes with a quick overview of low-power design techniques applicable at the logic level, including multi-Vdd, multi-Vth and hybrid approaches.

Chapter 2 focuses on the problem of reducing power in the interconnect network by investigating alternatives to traditional metal wires. In fact, according to the 2003 ITRS roadmap, metallic interconnections may not be able to provide enough transmission speed and to keep power under control for the upcoming technology nodes (65nm and below). A possible solution, explored in the chapter, consists of the adoption of optical interconnect networks. Two applications are presented: clock distribution and data communication using wavelength division multiplexing.

In Chapter 3, the power consumption problem is faced from the technology point of view by looking at innovative nano-devices, such as single-electron or few-electron transistors. The low-power characteristics and potential of these devices are reviewed in detail. Other devices, including carbon nanotube transistors, resonant tunnelling diodes and quantum cellular automata, are also treated.

Chapter 4 is entirely dedicated to advanced design methodologies for reducing sub-threshold and gate leakage currents in deep-submicron CMOS circuits by properly choosing the states to which gates have to be driven when in stand-by mode, as well as the values of the threshold voltage and of the gate oxide thickness. The authors formulate the optimization problem for simultaneous state/Vth and state/Vth/Tox assignments under delay constraints and propose both an exact method for its optimal solution and two practical heuristics with reasonable run-time. Experimental results obtained on a number of benchmark circuits demonstrate the viability of the proposed methodology.

Chapter 5 is concerned with the issue of minimizing power consumption of the memory subsystem in complex, multi-processor systems-on-chip (MPSoCs), such as those employed in multi-media applications. The focus is on design solutions and methods for synthesizing memory architectures containing both single-ported and multi-ported memory banks. Power efficiency is achieved by casting the memory partitioning design paradigm to the case of heterogeneous memory structures, in which data need to be accessed in a shared manner by different processing units.

Chapter 6 addresses the relevant problem of minimizing the power consumed by the cache hierarchy of a microprocessor. Several design techniques are discussed, including application-driven automatic and dynamic cache parameter tuning, adoption of configurable victim buffers and frequent-value data encoding and compression.

Power optimization for parallel, variable-voltage/frequency processors is the subject of Chapter 7. Given a processor with such an architecture, this chapter investigates the energy/performance tradeoffs that can be spanned in parallelizing array-intensive applications, taking into account the possibility that individual processing units can operate at different voltage/frequency levels. In assigning voltage levels to processing units, compiler analysis is used to reveal heterogeneity between the loads of the different units in parallel execution.

Chapter 8 provides guidelines for the design and implementation of DSP and multi-media applications onto programmable embedded platforms. The RINGS architecture is first introduced, followed by a detailed discussion on power-efficient design of some of the platform components, namely, the DSPs. Next, design exploration, co-design and co-simulation challenges are addressed, with the goal of offering to the designers the capability of including into the final architecture the right level of programmability (or reconfigurability) to guarantee the required balance between system performance and power consumption.

Chapter 9 targets software power minimization through source code optimization. Different classes of code transformations are first reviewed; next, the chapter outlines a flow for the estimation of the effects that the application of such transformations may have on the power consumed by a software application. At the core of the estimation methodology there is the development of power models that allow the decoupling of processor-independent analysis from all the aspects that are tightly related to processor architecture and implementation. The proposed approach to software power minimization is validated through several experiments conducted on a number of embedded processors for different types of benchmark applications.

Reduction of the power consumed by TFT liquid crystal displays, such as those commonly used in consumer electronic products, is the subject of Chapter 10. More specifically, techniques for reducing power consumption of transmissive TFT-LCDs using a cold cathode fluorescent lamp backlight are proposed. The rationale behind such techniques is that the transmittance function of the TFT-LCD panel can be adjusted (i.e., scaled) while meeting an upper bound on a contrast distortion metric. Experimental results show that significant power savings can be achieved for still images with very little penalty in image contrast.

Chapter 11 addresses the issue of efficiently accessing remote memories from wireless systems. This problem is particularly important for devices such as palmtops and PDAs, for which local memory space is at a premium and networked memory access is required to support virtual memory swapping. The chapter explores performance and energy of network swapping in comparison with swapping on local microdrives and FLASH memories. Results show that remote swapping over power-manageable wireless network interface cards can be more efficient than local swapping and that both energy and performance can be optimized by means of power-aware reshaping of data requests. In other words, dummy data accesses can be preemptively inserted in the source code to reshape page requests in order to significantly improve the effectiveness of dynamic power management.

Chapter 12 focuses on communication architectures for multi-processor SoCs. The network-on-chip (NoC) paradigm is reviewed, touching upon several issues related to power optimization of such kinds of communication architectures. The analysis proceeds on a layer-by-layer basis, and particular emphasis is given to customized, domain-specific networks, which represent the most promising scenario for communication-energy minimization in multi-processor platforms.

Chapter 13 provides a natural follow-up to the theory of NoCs covered in the previous chapter by describing an industrial application of this type of communication architecture. In particular, the authors introduce an innovative methodology for automatically generating the power models of a versatile and parametric on-chip communication IP, namely the STBus by STMicroelectronics. The methodology is validated on a multi-processor hardware platform including four ARM cores accessing a number of peripheral targets, such as SRAM banks, interrupt slaves and ROM memories.

The last contribution, offered in Chapter 14, proposes an integrated end-to-end power management approach for mobile video streaming applications that unifies low-level architectural optimizations (e.g., CPU, memory, registers), OS power-saving mechanisms (e.g., dynamic voltage scaling) and adaptive middleware techniques (e.g., admission control, trans-coding, network traffic regulation). Specifically, interaction parameters between the different levels are identified and optimized to achieve a reduction in the power consumption.

Closing this introductory chapter, the editor would like to thank all the authors for their effort in producing their outstanding contributions in a very short time. Special thanks go to Mike Thompson and Philippe Magarshack of STMicroelectronics for their keynote presentation at DATE 2004 and for writing the foreword to this book. The editor would also like to acknowledge the support offered by Mark De Jongh and the Kluwer staff during the preparation of the final version of the manuscript. Last, but not least, the editor is grateful to Agnieszka Furman for taking care of most of the “dirty work” related to book editing, paging and preparation of the camera-ready material.

Chapter 1

ULTRA-LOW-POWER DESIGN: DEVICE AND LOGIC DESIGN APPROACHES

Christoph Heer (1) and Ulf Schlichtmann (2)

(1) Infineon Technologies AG; (2) Technische Universität München

Abstract

Power consumption is increasingly becoming the bottleneck in the design of ICs in advanced process technologies. We give a brief introduction to the major causes of power consumption. Then we report on experiments in an advanced process technology with ultra-low threshold voltage (Vth) devices. It turns out that, in contrast to older process technologies, this approach is becoming less suitable for industrial usage in advanced process technologies. Next, we describe methodologies to reduce power consumption by optimizations in logic design, specifically by utilizing multiple levels of supply voltage Vdd and threshold voltage Vth. We evaluate them from an industrial product development perspective. We also give a brief outlook on proposals at other levels of the design flow and on future work.

Keywords:

Low-power design, dynamic power reduction, leakage power reduction, ultra-low-Vth devices, multi-Vdd, multi-Vth, CVS

1.1 INTRODUCTION

The progress of silicon process technology marches on relentlessly. As predicted by Gordon Moore decades ago, silicon process technology continues to achieve improvements at an astonishing pace [1]. The number of transistors that can be integrated on a single IC approximately doubles every 2 years [2,3]. This engineering success has created innovative new industries (e.g. personal computers and peripherals, consumer electronics) and revolutionized other industries (e.g. communications).

Today, however, it is becoming increasingly difficult to achieve improvements at the pace that the industry has become accustomed to. More and more technical challenges appear that require increasing resources to be solved [4]. One such problem is the increasing power consumption of integrated circuits. It becomes even more critical as an increasing number of today’s high-volume consumer products are battery-powered.

In the following, we will consider the sources of power consumption and their development over time. We will show why reduction of power consumption is increasingly becoming critical to product success and will review traditional approaches in Sections 1.2 and 1.3. In Section 1.4 we will then analyze a potential solution based on introduction of an optimized transistor with a very low threshold voltage Vth. Thereafter, we will present and discuss logic-level design optimizations for power reduction in Section 1.5. Also, we will briefly point out potential optimizations on higher levels.

Our observations are made from the perspective of industrial IC product development, where technical optimizations must be carefully evaluated against the cost associated with achieving and implementing them. Most of the presented methodologies are already being utilized in leading-edge industrial ICs.

1.2 POWER CONSUMPTION BECOMES CRITICAL

Depending on the type of end-product and its application, different aspects of power consumption are the primary concern: dynamic power or leakage power.

Reduction of dynamic power consumption is a concern for almost all IC products today. For battery-powered products, reduced power consumption directly results in longer operating time for the product, which is a very desirable characteristic. Even for non-battery-powered products, reduced power consumption brings many advantages, such as reduced cost because of cheaper packaging or higher performance because of lower temperatures. Finally, reduced power consumption often leads to lower system cost (no fans required; no or cheaper air conditioning for data / telecom centers, etc.).

Dynamic power consumption is caused by the charging and discharging of capacitances when a circuit switches. In addition, a short-circuit current flows during switching, but this current is typically much smaller and will therefore be neglected in the following. The dynamic power due to capacitance charging and discharging is determined by the following well-known relationship:

$$P_{dyn} \sim f \cdot C_L \cdot V_{dd}^2$$

Based on constant electrical field scaling, Vdd and CL each are reduced by 30% in each successive process generation. Also, delay decreases by 30%, resulting in a 43% increase in frequency. Therefore, the dynamic power consumption per device is reduced by 50% from one process generation to the next. As scaling also doubles the number of devices that can be implemented in a given die area, dynamic power consumption per area should stay roughly identical. However, historically frequency has increased by significantly more than 43% from one process generation to the next (e.g. in microprocessors, it has roughly doubled, due to architectural optimizations such as deeper pipelines), and in addition, die sizes have increased with each new process technology, further increasing the power consumption due to an increased number of active devices [5]. For these reasons, dynamic power consumption has increased exponentially, as is shown in Figure 1-1 for the example of microprocessors.

Reduction of leakage power consumption today is primarily a concern for products that are powered by battery and spend most of their operating hours in some type of standby mode, such as cell phones. For many process generations, however, leakage has increased roughly by a factor of 10 for every two process nodes [6]. Due to this dramatic increase with newer process generations, leakage is becoming a significant contribution to overall IC power consumption even in normal operating mode, as can be seen in Figure 1-1 as well. Leakage was estimated to increase from 0.01% of overall power consumption in a 1.0µm technology to 10% in a 0.1µm technology [6]. For a microprocessor, Intel estimated leakage power consumption at more than 50W for a 100nm technology node [3]. This figure probably is extreme, and leakage depends strongly on a number of factors, such as threshold voltage (Vth) of the transistor, gate oxide thickness and environmental operating conditions (supply voltage Vdd, temperature T). Nevertheless, for an increasing number of products leakage power consumption is turning into a problem, even when they are not battery-powered.
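The constant-field scaling arithmetic given earlier in this section can be checked with a few lines of Python. This is only a sketch: the 0.7 scale factor and the frequency-doubling case come from the text, while the function and variable names are illustrative assumptions.

```python
# Sketch: constant-electric-field scaling of dynamic power, P_dyn ~ f * C_L * Vdd^2.
# The 0.7 scale factor per generation is taken from the text; names are illustrative.

def scale_dynamic_power(s=0.7, freq_gain=None):
    """Return the relative per-device dynamic power after one process generation.

    s         -- scale factor applied to both Vdd and C_L (0.7 means a 30% reduction)
    freq_gain -- frequency multiplier; defaults to 1/s because delay shrinks by s
    """
    if freq_gain is None:
        freq_gain = 1.0 / s
    # f scales by freq_gain, C_L by s, Vdd^2 by s^2
    return freq_gain * s * s ** 2


ideal = scale_dynamic_power()                 # frequency rises by about 43%
doubled = scale_dynamic_power(freq_gain=2.0)  # frequency doubles per generation

print(f"frequency gain under ideal scaling: {1 / 0.7:.2f}x")
print(f"per-device power, ideal scaling:    {ideal:.2f}x")
# Device count per unit area doubles with each generation, hence the factor 2.
print(f"per-area power, ideal scaling:      {2 * ideal:.2f}x")
print(f"per-area power, frequency doubled:  {2 * doubled:.2f}x")
```

Under ideal scaling the per-area figure stays close to 1, matching the statement that power density should remain roughly constant; once frequency is assumed to double instead, the per-area figure exceeds 1 in every generation, before even accounting for growing die sizes.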

Figure 1-1. Development of dynamic and leakage power consumption over time [3,7]

1.3 TRADITIONAL APPROACHES TO POWER REDUCTION

As outlined above, dynamic power consumption is governed by:

$$P_{dyn} \sim f \cdot C_L \cdot V_{dd}^2$$

with f denoting the switching frequency, CL the capacitance being switched, and Vdd the supply voltage. This formula immediately identifies the key levers to reduce dynamic power:

• Reduce operating frequency
• Reduce driven capacitance
• Reduce supply voltage

Traditionally, reduction in supply voltage Vdd has been the most often followed strategy to reduce power consumption. Unfortunately, lowering Vdd has the side effect of reducing performance as well, primarily because gate overdrive (the difference between Vdd and Vth) diminishes if the threshold voltage Vth is kept constant. Based on the alpha power law model [8], the delay td of an inverter is given by

td =

CL • Vdd (Vdd − Vth )α

with α denoting a fitting constant. As supply voltages are driven below 1.0V, the reductions in gate overdrive are more pronounced than previously. In addition, newer process technologies give significantly less of a performance boost compared to the previous process generation than has traditionally been the case, therefore a further reduction in performance is highly undesirable. Finally, the power reduction achieved by moving to a new process generation has trended down over time, since supply voltages have been scaled by increasingly less than the 30% prescribed by the constant electrical field scaling paradigm. Consequently, more advanced approaches are required. In the following, our main focus will be on dynamic power consumption, but we will also consider leakage power consumption.
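To make the effect of a shrinking gate overdrive concrete, the following sketch evaluates the alpha power law delay for the same relative Vdd reduction at two starting voltages. The values Vth = 0.3V and α = 1.5 are assumptions chosen for illustration only; they are not taken from the text.

```python
def relative_delay(vdd: float, vdd_ref: float, vth: float = 0.3, alpha: float = 1.5) -> float:
    """Delay at `vdd` divided by delay at `vdd_ref`, using td = CL*Vdd/(Vdd-Vth)^alpha.

    CL cancels in the ratio. vth and alpha are illustrative assumptions.
    """
    def td(v: float) -> float:
        return v / (v - vth) ** alpha

    return td(vdd) / td(vdd_ref)


# The same 20% supply reduction, applied at a high and at a low starting voltage.
penalty_high = relative_delay(vdd=1.2, vdd_ref=1.5)
penalty_low = relative_delay(vdd=0.8, vdd_ref=1.0)

print(f"1.5V -> 1.2V: delay x{penalty_high:.2f}")
print(f"1.0V -> 0.8V: delay x{penalty_low:.2f}")
```

With a fixed Vth, the delay penalty of the same relative Vdd step grows as Vdd approaches Vth, which is the behaviour described above for supply voltages below 1.0V.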

1.4 ZERO-VTH DEVICES

The concept of zero-Vth devices was developed in the mid-1990s. It overcomes the diminishing gate overdrive by radically setting the threshold voltage of the active devices to zero. It has been shown [9] that the optimum power dissipation is obtained if Pleak (leakage contribution) is of the same order of magnitude as Pdyn (dynamic switching contribution). This can be achieved for transistors with Vth close to 0V (‘zero-Vth transistors’). As a consequence, the devices never switch off completely, but from an overall power perspective the gain in active power consumption is tremendous. Using these transistors, the supply voltage of 130nm circuits can be reduced to values below 0.3V to achieve a Pdyn reduction of 90% without performance degradation. Alternatively, the circuit can be operated at twice the clock frequency when keeping the supply voltage at 1.2V, as shown in Figure 1-2. The corresponding Ion/Ioff ratio for the zero-Vth transistor is about 10-100, instead of >10^5 for the standard transistor options.

During standby, the complete circuits are switched off or are set into a low leakage mode to cope with the very high leakage contribution. The low leakage mode is achieved by ‘active well’ control, which denotes the use of the body effect: the well potentials of the PFETs and NFETs are altered to change Vth. To achieve a lower leakage current, the absolute value of Vth is increased by reverse back biasing: a negative well-to-source voltage Usb is used. Therefore, voltages below Vss for NFETs and above Vdd for PFETs have to be generated. Furthermore, active well is required to compensate for the lot-to-lot or wafer-to-wafer variations of Vth.

The initial ‘zero-Vth’ concept assumed constant junction temperatures Tj below 40°C. For some high-end computer equipment, the costs of active chip cooling to achieve this junction temperature are affordable. This is definitely not the case for cost-driven consumer products. For this application domain, Tj in active mode ranges between 85°C and 125°C, and in some applications the specified worst-case ambient temperature is even 80°C. The proposed zero-Vth concept is therefore not applicable without changes and adaptations.

Figure 1-2. Simulated performance curves of transistors with ultra-low Vth. Compared to low-Vth, either a performance gain or a Vdd reduction can be achieved. Curves for reg-Vth and high-Vth transistors of a 130nm technology are included

A more conservative approach with respect to zero-Vth, but still aggressive compared to current devices, had to be chosen. An ultra-low-Vth device with about 150mV threshold voltage proved to be the best compromise between zero-Vth and the current low-Vth of about 300mV within a 130nm CMOS technology. To identify the optimal choice of Vth and Vdd in combination with the higher junction temperature Tj, simulations with modified parameters of the 130nm low-Vth transistor were performed. Figure 1-3 shows the power dissipation for a high-activity circuit (switching activity of 20%) with various options for the transistor threshold voltages: reg-Vth, low-Vth, and transistors whose Vth is reduced to 200mV, 150mV, 100mV and 50mV. The reg-Vth circuit performance was used as the reference (Vdd = 1.5V), and the supply voltages for the other transistor options were reduced to meet that reference performance.

[Figure 1-3 plot: power in W (vertical axis, 0 to 3.5E-05) versus device option / Vth in mV (horizontal axis: reg-Vt, low-Vt, 200mV, 150mV, 100mV, 50mV) at T = 125°C, with curves for fast, nominal (= target) and slow process conditions. The reg-Vt reference is marked Vdd = 1.5V; the other options are annotated with reduced supply voltages between 1.2V and 0.6V.]

Figure 1-3. Power dissipation at T=125°C in active mode for several transistor options with reduced Vth. A minimum power consumption is achieved at 150mV Vth. (At T=55°C the minimum is achieved for the same option but process variations show less impact).

The reduced supply voltage leads to lower overall active power consumption Pactive. A minimum power consumption is reached at Vth = 150mV. With even lower threshold voltages, Pactive starts to increase again because of the increase in leakage current. The steep rise of Pactive originates from the exponential relation between Vth and leakage current. As a rule of thumb, a 100mV reduction of the threshold voltage allows for a Vdd reduction of ≈ 0.15V, but on the other hand results in a tenfold increase of the leakage current.

Figure 1-3 also shows the impact of technology variations. Due to the high leakage contribution, a power reduction of only 25% is achieved under fast process conditions. Using back biasing in reverse mode, the high performance of fast transistors can be reduced by increasing Vth. The corresponding leakage current therefore decreases and allows a power reduction of 50% (stippled arrow).

A process modification has been developed to manufacture devices with a threshold voltage of 150mV, which proves to be the most efficient for the target application domain of mobile consumer products [10]. Table 1-1 lists the key transistor parameters of our ultra-low-Vth FETs (ulv) and of the standard low-Vth transistor. The Vth values are 165mV and 161mV for the ulv-NFET and ulv-PFET respectively; Ion increases by 35% and 22%, which translates into an average decrease of the CV/I-metric delay by 29%. Circuit simulations showed a performance increase of 25%. Concerning Vth, performance, and Ioff, the target values have been nearly met.

Table 1-1. Extracted key parameters of the ulv-FETs in comparison with the target values and the low-Vth FETs

Parameter                             | 130nm low-Vt (NFET / PFET) | 130nm ulv-FET (NFET / PFET) | Target
Ion [µA/µm]                           | 560 / 240                  | 755 / 295                   |
Ioff [nA/µm]                          | 1.2 / 1.2                  | 48 / 17                     | ≈35
Vth [mV]                              | 295 / 260                  | 165 / 160                   | 150
Body effect [mV/V]                    | 150 / 135                  | 60 / 65                     | 90
Vth@ΔL=10nm [mV]                      | 35 / 30                    | 65 / 30                     |
Vth@ΔL=15nm [mV]                      | 65 / 70                    | 100 / 90                    |
Simulated gate delay [relative units] | 1                          | 0.8                         | 0.75
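The rule of thumb stated before Table 1-1 (each 100mV of Vth reduction allows about 0.15V less Vdd but costs a tenfold leakage increase) already explains why the active power has a minimum. The sketch below applies that rule to a normalized circuit. The reference supply of 1.2V, the starting leakage share of 0.1% and the linear scaling of leakage power with Vdd are assumptions made for illustration, so the position of the minimum is not expected to reproduce Figure 1-3.

```python
def active_power(delta_vth_mv: float,
                 vdd_ref: float = 1.2,
                 p_dyn_ref: float = 1.0,
                 p_leak_ref: float = 0.001) -> float:
    """Normalized active power after lowering Vth by `delta_vth_mv` at constant performance.

    Rule of thumb from the text: per 100mV of Vth reduction, Vdd drops by about 0.15V
    and the leakage current grows tenfold. Reference values are illustrative assumptions.
    """
    steps = delta_vth_mv / 100.0
    vdd = vdd_ref - 0.15 * steps
    p_dyn = p_dyn_ref * (vdd / vdd_ref) ** 2              # Pdyn ~ Vdd^2 at fixed f and CL
    p_leak = p_leak_ref * 10.0 ** steps * (vdd / vdd_ref)  # Ileak * Vdd, Ileak x10 per step
    return p_dyn + p_leak


sweep = list(range(0, 301, 50))           # Vth reductions in mV
best = min(sweep, key=active_power)       # reduction with the lowest total power

for delta in sweep:
    print(f"Vth lowered by {delta:3d} mV: P = {active_power(delta):.3f}")
print(f"minimum at a Vth reduction of {best} mV")
```

The quadratic gain from the lower supply dominates at first, until the exponentially growing leakage term takes over. A larger starting leakage share (e.g. at high temperature or under fast process conditions) moves the minimum towards higher Vth.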

The sensitivity of Vth to gate length variation (roll-off) is expressed as the Vth shift per 10nm or 15nm gate length decrease. A comparison with low-Vth FETs shows a pronounced increase. Therefore, in addition to temperature compensation, back biasing also has to be used to compensate for this strong technology variation.

The values of the body effect are also included in Table 1-1. The body effect is expressed as the Vth shift per 1V well bias. The ulv-FETs yield values that are lower by more than 50% compared to the low-Vth transistors. The decrease of the body effect in combination with the increased roll-off reduces the leverage of back biasing for ulv-FETs very significantly. The leverage is not even sufficient to compensate for the technology variation, since the value of the roll-off is higher than that of the body effect. As an example, the ulv-NFET shows roll-off values of 65mV/10nm and 100mV/15nm and a body effect of only 60mV/V.

To investigate the migration potential of the ulv-FETs for future technology generations, Ioff measurement results obtained from recent 90nm hardware were used. Based on this measurement data, the leverage of active well with the standard reg-Vth and low-leakage transistor options has been analyzed. For supply voltages of 1.2V and 0.75V, a reverse back biasing voltage of 0.5V has been applied. For the NFET, the back biasing results in a leakage reduction of 50% to 70% for all transistor widths and for both values of Vdd. In the case of the PFET, the leakage reduction values are similar (60% to 80%) for transistors with W > 0.5µm. For very narrow PFETs with Vdd = 1.2V, the reduction is only 20% or even less. Since narrow FETs are used within SRAMs, which contribute a major part of the circuit’s standby current, this small reduction for narrow transistors further reduces the leverage of active well significantly. The root cause is an additional leakage mechanism based on tunnelling currents across the drain-well junction, which limits the reverse back biasing to 0.5V. This tunnelling current depends exponentially on the drain-well voltage and works against any reduction of the sub-threshold current via active well. At Vdd = 0.75V the drain-well voltage is reduced and the tunnelling current is therefore lower. In this case the effect of back biasing is not compensated by a rising tunnelling current, and a leakage current reduction of 70% is still achieved.

For a 90nm technology, the limit of 0.5V for the well potential swing limits the reduction of the leakage currents to a factor between 2 and 4. This is still a major contribution among all feasible measures to reduce standby power consumption, but the leverage becomes quite small compared to the reduction ratios of several orders of magnitude obtained in previous technologies [11,12]. In future technologies, Ileak will become more strongly affected by the emerging tunnelling current Igate through the gate of the FET. This is due to the ever-decreasing gate oxide thickness and also due to the fact that even on-state transistors show gate leakage. Igate is not affected by well biasing, which reduces the leverage of active well even further.
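The leverage argument above can be turned into simple arithmetic with the NFET values from Table 1-1. The sketch computes the well bias that would be needed to cancel the Vth roll-off of a 10nm gate length decrease, and converts percentage leakage reductions into reduction factors. The function names are illustrative; a linear body effect is assumed, which is a simplification.

```python
def well_bias_needed(vth_shift_mv: float, body_effect_mv_per_v: float) -> float:
    """Well bias in V required to cancel a given Vth shift, assuming a linear body effect."""
    return vth_shift_mv / body_effect_mv_per_v


def reduction_to_factor(reduction_percent: float) -> float:
    """Convert 'leakage reduced by X%' into 'leakage divided by a factor of N'."""
    return 1.0 / (1.0 - reduction_percent / 100.0)


# NFET values from Table 1-1: roll-off per 10nm [mV] and body effect [mV/V].
bias_low_vt = well_bias_needed(vth_shift_mv=35.0, body_effect_mv_per_v=150.0)
bias_ulv = well_bias_needed(vth_shift_mv=65.0, body_effect_mv_per_v=60.0)

print(f"low-Vt NFET: {bias_low_vt:.2f} V of well bias per 10nm of gate length variation")
print(f"ulv NFET:    {bias_ulv:.2f} V of well bias per 10nm of gate length variation")
print(f"50% reduction = factor {reduction_to_factor(50):.1f}, "
      f"75% reduction = factor {reduction_to_factor(75):.1f}")
```

For the ulv-NFET the required bias exceeds 1V for a 10nm variation, whereas the low-Vt device needs only a fraction of a volt. The second helper shows that a reduction factor between 2 and 4 corresponds to leakage reductions between 50% and 75%.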

In summary, the zero-Vth devices have become very susceptible to process and temperature variations. Significant yield is only achievable with back biasing via active well control and with active cooling. The latter approach is not feasible for mobile applications. Therefore a more conservative approach with respect to zero-Vth, but still aggressive compared to current devices, had to be chosen. An ultra-low-Vth device with about 150mV threshold voltage proved to be the best compromise between zero-Vth and the current low-Vth of about 300mV within a 130nm CMOS technology. But even though fabrication of this ultra-low-Vth device is possible, it affects some standard methods to overcome short-channel effects. The so-called halo or pocket implantation had to be removed to bring the threshold voltage down. Unfortunately, short-channel effects are thereby heavily increased, leading, as shown, to a very strong Vth roll-off at slight variations of the channel length. Finally, this effect was prohibitive for the overall approach and led to the cancellation of many zero-Vth projects in the industry [13].

1.5 DESIGN APPROACHES TO POWER REDUCTION

As outlined above, solutions from process technology alone will not suffice to provide sufficient power reduction. Therefore, solutions must be found in algorithms, product architecture and logic design. Increasingly, differentiated device options provided by process technology are utilized on these levels in the search for optimization of power consumption. For leading-edge products which need to optimize both power consumption and system performance, optimization techniques on architecture and design level have been proposed and partly already been implemented.

While academic research often focuses on the tradeoff between power consumption and performance, industrial product development must also take other variables into consideration.
• Product cost: often, power optimization design techniques increase die area, directly affecting manufacturing cost. Also, utilization of additional devices (e.g. different Vth devices) increases mask count and consequently manufacturing cost, and additionally requires up-front expenditures for the development of such devices. Finally, increased manufacturing complexity poses the risk of lowered manufacturing yield.
• Product robustness: it must be ensured that optimized products still work across the specified range of operating conditions, also taking manufacturing variations into account.

1.5.1 Multi-Vdd Design

As outlined in the introduction, the supply voltage Vdd quadratically impacts dynamic switching power consumption. Thus, lowering Vdd is the preferred option to reduce dynamic power consumption. However, as discussed in Section 1.3, lowering Vdd reduces the system performance. Thus, the incentive to lower Vdd to reduce power consumption is kept in check by the need to maintain performance.

Reduction of Vdd can be applied on different abstraction levels of a design. Lowering Vdd for an entire IC is most effective regarding power reduction and also easiest to implement. As this will directly impact the performance of the IC design, it often is not an option. On a lower abstraction level, it is possible to lower Vdd for an entire module. This is still rather simple to implement, but if only those modules are chosen for which overall IC performance is not impacted, the achieved gains in power reduction will often be very moderate. Finally, a reduction in supply voltage can be applied specifically to individual gates, such that the overall system performance is not reduced. This approach, as shown in Figure 1-4, recognizes that in a typical design most logic paths are not critical. They can be slowed down, often significantly, without reducing the overall system performance. This slowing down is achieved by lowering the supply voltage Vdd for gates on the non-critical paths, which results in lowered power consumption.


[Figure 1-4 schematic: flip-flop-bounded logic with a 10ns path and a non-critical path. Top: the non-critical path takes 5ns ("Non-critical path may be delayed"). Bottom: the gates on the non-critical path are supplied by Vdd_low and the path takes 8ns ("Non-critical path runs with reduced supply voltage").]

Figure 1-4. Multi-Vdd design
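The idea illustrated in Figure 1-4 can be sketched as a small timing check: a path may run at a lower supply voltage as long as its slowed-down delay still fits into the cycle time. The sketch below reuses the alpha power law from Section 1.3 to scale the path delay. The nominal Vdd of 1.2V, Vth = 0.3V, α = 1.5 and the list of candidate voltages are assumptions for illustration, so the resulting delays are not meant to match the 8ns shown in the figure.

```python
def scaled_delay(delay_ns: float, vdd_low: float,
                 vdd: float = 1.2, vth: float = 0.3, alpha: float = 1.5) -> float:
    """Path delay after lowering the supply from `vdd` to `vdd_low` (alpha power law).

    vdd, vth and alpha are illustrative assumptions, not values from the text.
    """
    def td(v: float) -> float:
        return v / (v - vth) ** alpha

    return delay_ns * (td(vdd_low) / td(vdd))


def lowest_feasible_vdd(delay_ns: float, period_ns: float, candidates):
    """Lowest candidate supply for which the path still meets the cycle time, or None."""
    feasible = [v for v in candidates if scaled_delay(delay_ns, v) <= period_ns]
    return min(feasible) if feasible else None


CANDIDATES = [1.2, 1.0, 0.84, 0.7, 0.6]   # hypothetical supply options in V
PERIOD_NS = 10.0

vdd_noncritical = lowest_feasible_vdd(5.0, PERIOD_NS, CANDIDATES)   # 5ns path of Figure 1-4
vdd_critical = lowest_feasible_vdd(10.0, PERIOD_NS, CANDIDATES)     # 10ns path of Figure 1-4

print(f"non-critical path: Vdd_low = {vdd_noncritical} V, "
      f"delay {scaled_delay(5.0, vdd_noncritical):.1f} ns")
print(f"critical path:     Vdd     = {vdd_critical} V")
```

In this toy setting the 5ns path can be moved to a considerably lower supply, while the 10ns path has no slack and must stay at the nominal voltage. A real flow works on gates rather than whole paths, since gates are usually shared between paths with different slack.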

This technique will modify the distribution of path delays in a design to a distribution skewed towards paths with higher delay, as indicated in Figure 1-5 [14].

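[Figure 1-5 plot: distribution of path delays td for a single supply voltage (SSV) and for multiple supply voltages (MSV), with the cycle time 1/f and the critical paths marked on the td axis.]

Figure 1-5. Distribution of path delays under single and multiple supply voltages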

A number of studies have shown significant variation in dynamic power reduction results from implementing a multi-Vdd design strategy, ranging from less than 10% up to almost 50%, with 40% being the average [15,16]. Rules of thumb for selecting appropriate supply voltage levels have been developed. When using two supply voltages, the lower Vdd was proposed to be 0.6x-0.7x of the higher Vdd [17]. The optimal supply voltage level also depends on Vth [18].

The benefit of using multiple supply voltages quickly saturates. The major gain is obtained by moving from a single Vdd to dual-Vdd. Extending this to ever more supply voltage levels yields only small incremental benefits [18,19], even when the overhead introduced by multiple supply voltages (see below) is not taken into consideration.

The power reduction achieved by this technique roughly depends on two parameters: the difference between the regular supply voltage Vdd and the lowered supply voltage Vdd_low, and the percentage of gates to which Vdd_low is applied.

Regarding the first parameter, it was pointed out some years ago that the leverage of this concept decreases as process technologies are scaled down further [18]. Recent work has analyzed this in more detail [14]. At least for high-Vth devices, which are essential for low standby power design due to their lower leakage current, Vth has recently scaled much more slowly than Vdd. Therefore, gate overdrive (Vdd - Vth) is diminished, negatively impacting performance. Thus, even a small reduction in Vdd will have a very significant impact on performance, and the potential to lower Vdd while maintaining overall system performance is greatly reduced. It is shown that from 0.25µm down to 0.09µm, the effectiveness of dual-Vdd decreases by a factor of 2 (from 60% dynamic power reduction to 30%) for high-Vth designs, whereas it stays about constant for low-Vth designs. This can however be countered by the introduction of variable threshold voltages, as will be seen later.

Regarding the second parameter, experience has shown that especially in designs using the multi-Vth technique outlined below, path delays tend to be skewed to higher delays already, thus reducing the number of gates that can be slowed down further [14].

For the selection of those gates which will receive the lower supply voltage Vdd_low, a number of techniques have been proposed. Most prevalent is the concept of clustered voltage scaling (CVS). It recognizes that it is desirable to have clusters of gates assigned to the same voltage, since a level shifter is required between the output of a gate supplied by Vdd_low and the input of a gate supplied by Vdd to avoid static current flow [20]. This concept has been enhanced by extended clustered voltage scaling (ECVS) [17], which essentially allows an arbitrary assignment of supply voltage levels to gates. This strategy implies more frequent insertion of level shifters into the design. However, usually only power consumption and delay are considered in the literature; the additional area cost is neglected. In industry, this certainly is not feasible.

While conceptually simple, the implementation of a multi-Vdd concept poses a number of challenges.
• The additional supply voltage Vdd_low needs to be created on-chip by a dc-to-dc converter, unless the voltage already exists externally. This results in area overhead, and in power consumption for the converter.
• The additional supply voltage Vdd_low must be distributed across the chip.
• Level shifters are required between different supply domains. It is feasible to integrate level shifters into flip-flops [21].

The penalties in area, power consumption and delay resulting from these effects are not always taken into account by work published in the literature. Studies indicate that a 10% area overhead will result from implementing a dual-Vdd design [22].

An additional consideration for industrial IC product development is that EDA tool support for implementing a dual-Vdd design is still only rudimentary. It is not sufficient to have a single point tool which can perform power-performance tradeoffs. Instead, this methodology needs to encompass the entire design flow (e.g. power distribution in layout; automated insertion of level shifters etc.).
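Two of the quantities discussed in this section lend themselves to a quick estimate: the number of level shifters implied by a given supply assignment, and the dynamic power saved by moving a share of the gates to Vdd_low. The sketch below is a toy model under stated assumptions: the four-gate netlist and both assignments are hypothetical, all gates are assumed to switch the same capacitance with the same activity, and the overhead of converters and level shifters is ignored.

```python
def level_shifters_needed(edges, supply) -> int:
    """Count connections driven from a Vdd_low gate into a gate supplied by the high Vdd."""
    return sum(1 for src, dst in edges
               if supply[src] == "low" and supply[dst] == "high")


def dynamic_power_saving(fraction_low: float, vdd_ratio: float) -> float:
    """Fraction of dynamic power saved if `fraction_low` of the (equal) gates run at
    vdd_ratio * Vdd, using Pdyn ~ Vdd^2. Overheads are not modelled."""
    return fraction_low * (1.0 - vdd_ratio ** 2)


# Hypothetical chain of three gates ending in a flip-flop that stays at the high Vdd.
EDGES = [("g1", "g2"), ("g2", "g3"), ("g3", "ff")]

clustered = {"g1": "high", "g2": "low", "g3": "low", "ff": "high"}   # CVS-like cluster
scattered = {"g1": "low", "g2": "high", "g3": "low", "ff": "high"}   # arbitrary assignment

shifters_clustered = level_shifters_needed(EDGES, clustered)
shifters_scattered = level_shifters_needed(EDGES, scattered)
saving = dynamic_power_saving(fraction_low=0.5, vdd_ratio=0.7)

print(f"clustered assignment: {shifters_clustered} level shifter(s)")
print(f"scattered assignment: {shifters_scattered} level shifter(s)")
print(f"half of the gates at 0.7 x Vdd: about {saving:.0%} dynamic power saved")
```

Both assignments place two nodes on Vdd_low, but the scattered one needs more level shifters, which is the cost that clustering tries to avoid. The saving estimate shows why results depend so strongly on the share of gates that can be moved to Vdd_low.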

1.5.2 Multi-Vth Design

Another essential technique is the use of different transistor threshold voltages (multi-Vth design). Primarily this technique reduces leakage power consumption, thus increasing standby time of battery-powered ICs. As leakage power consumption becomes an increasingly important component of overall power consumption in modern process technologies, this technique increasingly also helps to reduce overall power consumption significantly, as design moves to more advanced process technologies. The idea is similar to multi-Vdd design: paths that do not need highest performance are implemented with special leakage-reduced transistors (typically higher Vth transistors, but also thicker gate-oxide Tox), as shown in Figure 1-6.


[Figure 1-6 schematic: flip-flop-bounded logic with a 10ns path and a non-critical path. Top: the non-critical path takes 5ns ("Non-critical path may be delayed"). Bottom: the gates on the non-critical path use high-Vt transistors and the path takes 8ns ("Non-critical path runs with increased threshold voltage").]

Figure 1-6. Multi-Vth design

A typical industrial approach today is to first create a design using lower Vth transistors to achieve the required performance and then to selectively replace gates off the critical path with higher Vth (or thicker Tox) transistors to reduce leakage. Studies in the literature have reported reductions in leakage of around 50% up to 80%. Some approaches assume that different Vth levels are provided by the process technology (through doping variations) and propose algorithms to optimally assign Vth levels to transistors, ensuring that performance is not compromised [23, 24]. Recently, it has also been proposed to achieve modifications in Vth by modifying transistor length or gate oxide thickness Tox [25]. Design-tool support for this technique is also rudimentary at best. While it is becoming established to design different modules of an IC with different Vth transistors, it is very challenging to do this on the level of individual transistors within a module. The primary reason is that the entire design flow must be able to handle cells with identical functionality and size, which differ in their electrical properties. This poses no principal algorithmic problems, but must be consistently implemented in all EDA tools within a design flow.
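The replacement flow described above (start from an all-low-Vth design, then swap gates off the critical path) can be sketched as a greedy loop with a timing check after every tentative swap. This is a minimal illustration, not one of the published assignment algorithms [23, 24]: the netlist, the gate delays and the 30% delay penalty of a high-Vth gate are hypothetical.

```python
def path_delay(path, delay) -> float:
    """Sum of the gate delays along one path."""
    return sum(delay[gate] for gate in path)


def assign_high_vth(delay_low, paths, period_ns: float, penalty: float = 1.3):
    """Greedily swap gates to high Vth while every path still meets the cycle time.

    delay_low: gate name -> delay in ns with low-Vth transistors
    paths:     list of gate-name lists (all timing paths of the toy design)
    penalty:   assumed delay multiplier of a high-Vth gate (illustrative)
    """
    delay = dict(delay_low)
    high_vth = set()
    for gate in sorted(delay_low):
        trial = dict(delay)
        trial[gate] = delay_low[gate] * penalty
        if all(path_delay(p, trial) <= period_ns for p in paths):
            delay = trial          # timing still met: keep the swap
            high_vth.add(gate)
    return high_vth


# Hypothetical design: a-b is the 10ns critical path, c-d a 5ns non-critical path,
# and a-d a path that shares gates with both.
GATE_DELAY_NS = {"a": 5.0, "b": 5.0, "c": 2.5, "d": 2.5}
PATHS = [["a", "b"], ["c", "d"], ["a", "d"]]

swapped = assign_high_vth(GATE_DELAY_NS, PATHS, period_ns=10.0)
print("gates moved to high Vth:", sorted(swapped))
```

Gates on the critical path keep their low-Vth transistors, while the gates with slack are swapped. The order in which gates are tried matters once paths share gates, which is one reason the published approaches use more careful selection criteria than this alphabetical sweep.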

1.5.3 Hybrid Approaches

Recently, approaches have been suggested in the literature which combine the implementation of multiple supply voltages and multiple threshold voltages for further power reduction. Especially for designs where minimization of total power consumption is key (as compared to e.g. minimization of standby power for mobile products), it is possible to trade off leakage and dynamic power, as originally proposed in the zero-Vth concept. Studies in the literature indicate a total power optimum when leakage power contributes 10% to 30% [26,12]. This ratio depends significantly on the process technology, operating environment, and clock frequency of a design. For applications where leakage power minimization is critical (e.g. mobile products), this approach usually is not feasible, as it requires a relatively low Vth which causes high leakage currents [14]. With the increasing significance of gate leakage currents, variations of gate oxide thickness Tox have also been proposed.

An overall framework for using two supply voltages and two threshold voltages has been presented [19]. Theoretically, it is shown that more than 60% of total power consumption can be saved this way (not considering required overhead such as level shifters, routing etc.). Rules of thumb are proposed and it is shown that the optimal second Vdd is about 50% of the original Vdd in this case. It is also argued that the usefulness of multi-Vdd strategies is not diminished, but actually increased in more advanced technologies if a multi-Vth strategy is also followed, since this strategy makes it possible to trade off leakage vs. dynamic power consumption by changing Vth and Vdd to optimize power consumption while maintaining the required timing performance. This approach has been applied to the practical example of an ARM processor in [27]. Due to specific layout considerations it was not possible to implement all four intended combinations of Vdd and Vth. Instead, three different libraries were implemented.
Using a CVS algorithm, a reduction in dynamic power by 15% was achieved for a 0.18µm process technology. Leakage power was reduced by 40%. As leakage power was more than 1000x smaller than dynamic power, overall active power reduction was 15%. To achieve this, a 14% increase in area was required. A very recent approach considers also transistor width sizing in addition to Vdd and Vth assignment [28]. Using a two stage, sensitivity-based approach, total power savings of 37% on average over a suite of benchmark circuits are reported. In this study, the threshold voltage is chosen rather low, so that leakage represents 20-50% of total power consumption. Therefore, optimization of both leakage and dynamic power consumption is essential, which is achieved with the presented approach.
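The statement that the overall active power reduction in the ARM example equals the dynamic reduction follows from simple weighting, as the sketch below shows. The 15% and 40% reductions are the figures quoted above; the leakage-to-dynamic ratio of exactly 1/1000 is taken as the bound stated in the text, and the second case with a 30% leakage share is a hypothetical comparison, not a result from [27].

```python
def overall_reduction(leak_to_dyn_ratio: float,
                      dyn_reduction: float,
                      leak_reduction: float) -> float:
    """Total power reduction as the share-weighted sum of both component reductions."""
    dyn_share = 1.0 / (1.0 + leak_to_dyn_ratio)
    leak_share = 1.0 - dyn_share
    return dyn_share * dyn_reduction + leak_share * leak_reduction


# ARM example: leakage about 1000x smaller than dynamic power.
arm_case = overall_reduction(1.0 / 1000.0, dyn_reduction=0.15, leak_reduction=0.40)

# Hypothetical design in which leakage makes up 30% of total power.
leaky_case = overall_reduction(0.3 / 0.7, dyn_reduction=0.15, leak_reduction=0.40)

print(f"leakage 1000x smaller than dynamic: {arm_case:.1%} total reduction")
print(f"leakage at 30% of total power:      {leaky_case:.1%} total reduction")
```

With negligible leakage, the 40% leakage reduction contributes almost nothing to the total. The same reductions applied to a design with a substantial leakage share would yield a noticeably larger overall saving.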

An enhanced approach for leakage power consumption considers multiple gate oxide thicknesses Tox in addition to multi-Vth [29]. It is motivated by the fact that gate leakage increases very dramatically with newer process technologies. Gate leakage is of the same order of magnitude as subthreshold leakage at the 90nm process node. Their relationship also depends significantly on the operating temperature T. The key observation that an OFF transistor suffers from subthreshold leakage, and an ON transistor from gate leakage, motivates the approach to analyze transistor states in standby mode and assign Vth and Tox such that leakage power consumption is minimized. Leakage reductions of 5-6x are obtained on benchmark circuits, compared to designs using a single Vth and Tox. Previous approaches that included Tox into the optimization varied Tox only for different design modules, not on critical paths within modules.

These newer approaches promise further reductions in power consumption. This will come, however, at a price (as seen e.g. in the ARM example). Design complexity increases significantly when variations in many parameters are made available at the same time. In some studies, the resulting overhead is not considered.

1.5.4 Cost Tradeoffs

This overhead must be considered, however, since it is quite significant:
• Multi-Vdd: level shifters (area, power consumption, delay), routing of additional supply voltages (area).
• Multi-Vth: additional masks (manufacturing costs); potentially special design rules at the boundary between different Vth devices (area).
• Multi-Tox: additional masks (manufacturing costs).
• In addition, IC development costs increase due to more complex design flows. Also, special process options (Vth, Tox) must be developed, qualified and continuously monitored. For each such option, the design library must be electrically characterized, modelled for all EDA tools, and potentially optimized regarding circuit design and layout. It must be maintained and regularly updated (changes in electrical parameters, changes in tools in the design flow) over a long period of time as well.

If a very specialized manufacturing flow is developed to fully optimize a given product, it will be very difficult to shift manufacturing of this product to a different fab (e.g. a foundry in case additional capacity is required). For these and potentially other reasons, we are not yet aware of industrial products that have implemented such proposals in a fine-grained manner (i.e. different Vth, Vdd and Tox combined within one design module).

Some approaches in the literature also determine optimum levels of threshold voltages depending on a given design. In industry, this is rarely feasible. Typically, a manufacturing process has to be taken as given, with only predefined values of Vth (and Tox) being available.

1.6 APPROACHES ON HIGHER ABSTRACTION LEVELS

The approaches outlined above on gate level and device level can be (and often must be) supported by measures on higher levels of abstraction. Some of the most promising concepts are as follows:
• partitioning the system such that large areas can be powered off for significant periods of time (block turnoff)
• especially partitioning memory systems such that large parts can be turned off in standby mode
• clock gating is an essential method which reduces dynamic power consumption by local off-switching of non-active gates
• coding strategies (e.g. for buses) can reduce switching and thus dynamic power consumption
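As a small illustration of the last item, the sketch below counts the bit toggles on a 4-bit bus that carries a counting sequence, once in plain binary and once Gray-coded. Gray coding is chosen here only as a familiar example of a coding strategy; the text does not name a specific code, and the bus width and sequence are assumptions.

```python
def toggles(sequence) -> int:
    """Total number of bit flips between consecutive words sent over a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sequence, sequence[1:]))


def to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


binary_seq = list(range(16))                   # 4-bit counter values 0..15
gray_seq = [to_gray(n) for n in binary_seq]    # same values, Gray-coded

binary_toggles = toggles(binary_seq)
gray_toggles = toggles(gray_seq)

print(f"binary counting: {binary_toggles} toggles")
print(f"Gray counting:   {gray_toggles} toggles")
```

For sequential values the Gray-coded bus flips exactly one line per transfer, so the switched capacitance (and with it the dynamic power) drops accordingly. For data without such regularity, other codes are needed and the encoder and decoder overhead has to be weighed against the saving.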

1.7 CONCLUSION AND FUTURE CHALLENGES

There is no single “silver bullet” to solve the challenge of power reduction. While ultra-low voltage logic based on special ultra-low-Vth devices is a conceptually very convincing concept, its widespread implementation is hindered by manufacturing concerns. An extrapolation of current technology trends indicates that such a concept will become even more difficult in the future. Today, design techniques are the most promising approach to reduce power – both dynamic and leakage. The concepts outlined here can be further extended. It is feasible to dynamically adjust supply and threshold voltages. These are theoretically promising concepts which however still require more investigation especially with regard to feasibility under industrial boundary conditions. Quite likely, in the future even more emphasis than today will have to be placed on power reduction schemes on algorithmic and system level. On these levels, the levers to reduce power consumption are largest.

Acknowledgement

The authors wish to acknowledge and thank Jörg Berthold and Tim Schönauer for their contributions and fruitful discussions.


References

[1] G. Moore, Cramming More Components onto Integrated Circuits, Electronics Magazine, Vol. 38, No. 8, 1965, pp. 114-117.
[2] ITRS, International Technology Roadmap for Semiconductors, 2003, http://public.itrs.net.
[3] F. Pollack, New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies, Micro32 Keynote, 1999.
[4] U. Schlichtmann, Systems are Made from Transistors: UDSM Technology Creates New Challenges for Library and IC Development, IEEE Euromicro Symposium on Digital System Design, 2002, pp. 1-2.
[5] S. Borkar, Design Challenges of Technology Scaling, IEEE Micro, July/August 1999, pp. 23-29.
[6] S. Thompson, P. Packan, and M. Bohr, MOS Scaling: Transistor Challenges for the 21st Century, Intel Technology Journal, Q3 1998.
[7] N. Kim et al., Leakage Current: Moore's Law Meets Static Power, IEEE Computer, Vol. 36, No. 12, December 2003, pp. 68-75.
[8] T. Sakurai, A. R. Newton, Alpha-Power Law MOSFET Model and its Application to CMOS Inverter Delay and Other Formulas, IEEE Journal of Solid-State Circuits, Vol. 25, No. 2, 1990, pp. 584-594.
[9] J. B. Burr, J. Schott, A 200 mV Self-Testing Encoder/Decoder Using Stanford Ultra-Low-Power CMOS, 1994 IEEE International Solid-State Circuits Conference.
[10] J. Berthold, R. Nadal, C. Heer, Optionen für Low-Power-Konzepte in den sub-180-nm-CMOS-Technologien (in German), U.R.S.I. Kleinheubacher Tagung 2002.
[11] V. Svilan, M. Matsui, J. B. Burr, Energy-Efficient 32 x 32-bit Multiplier in Tunable Near-Zero Threshold CMOS, ISLPED 2000, pp. 268-272.
[12] V. Svilan, J. B. Burr, L. Tyler, Effects of Elevated Temperature on Tunable Near-Zero Threshold CMOS, ISLPED 2001, pp. 255-258.
[13] C. Heer, Designing Low-Power Circuits: An Industrial Point of View, PATMOS 2001.
[14] T. Schoenauer, J. Berthold, C. Heer, Reduced Leverage of Dual Supply Voltages in Ultra Deep Submicron Technologies, International Workshop on Power And Timing Modeling, Optimization and Simulation PATMOS 2003, pp. 41-50.
[15] K. Usami, M. Igarashi, Low-Power Design Methodology and Applications Utilizing Dual Supply Voltages, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp. 123-128.
[16] M. Donno, L. Macchiarulo, A. Macii, E. Macii, M. Poncino, Enhanced Clustered Voltage Scaling for Low Power, Proceedings of the 12th ACM Great Lakes Symposium on VLSI, 2002, pp. 18-23.
[17] K. Usami et al., Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor, IEEE Journal of Solid-State Circuits, Vol. 33, No. 3, March 1998, pp. 463-472.
[18] M. Hamada, Y. Ootaguro, T. Kuroda, Utilizing Surplus Timing for Power Reduction, Proceedings IEEE Custom Integrated Circuits Conference CICC, 2001, pp. 89-92.
[19] A. Srivastava, D. Sylvester, Minimizing Total Power by Simultaneous Vdd/Vth Assignment, Proceedings of the Asia and South Pacific Design Automation Conference 2003, pp. 400-403.
[20] K. Usami, M. Horowitz, Clustered Voltage Scaling Technique for Low-Power Design, Proceedings of the International Symposium on Low Power Design ISLPD, 1995, pp. 38.
[21] K. Usami et al., Design Methodology of Ultra Low-power MPEG4 Codec Core Exploiting Voltage Scaling Techniques, Proceedings of the 35th Design Automation Conference 1998, pp. 483-488.
[22] C. Yeh, Y.-S. Kang, Layout Techniques Supporting the Use of Dual Supply Voltages for Cell-Based Designs, Proceedings of the 36th Design Automation Conference 1999, pp. 62-67.
[23] Q. Wang, S. Vrudhula, Algorithms for Minimizing Standby Power in Deep Submicrometer, Dual-Vt CMOS Circuits, IEEE Transactions on CAD, Vol. 21, No. 3, March 2002, pp. 306-318.
[24] L. Wei, Z. Chen, K. Roy, M. Johnson, Y. Ye, V. De, Design and Optimization of Dual-Threshold Circuits for Low-Voltage Low-Power Applications, IEEE Transactions on Very Large Scale Integration (VLSI), Vol. 7, No. 1, March 1999, pp. 16-24.
[25] N. Sirisantana, K. Roy, Low-Power Design Using Multiple Channel Lengths and Oxide Thicknesses, IEEE Design & Test of Computers, January-February 2004, pp. 56-63.
[26] K. Nose, T. Sakurai, Optimization of VDD and VTH for Low-Power and High-Speed Applications, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp. 469-474.
[27] R. Bai, S. Kulkarni, W. Kwong, A. Srivastava, D. Sylvester, D. Blaauw, An Implementation of a 32-bit ARM Processor Using Dual Power Supplies and Dual Threshold Voltages, IEEE International Symposium on VLSI, 2003, pp. 149-154.
[28] A. Srivastava, D. Sylvester, D. Blaauw, Concurrent Sizing, Vdd and Vth Assignment for Low-Power Design, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp. 718-719.
[29] D. Lee, H. Deogun, D. Blaauw, D. Sylvester, Simultaneous State, Vt and Tox Assignment for Total Standby Power Minimization, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp. 494-499.

TLFeBOOK 21

Chapter 2
ON-CHIP OPTICAL INTERCONNECT FOR LOW-POWER
Ian O’Connor and Frédéric Gaffiot
Ecole Centrale de Lyon

Abstract

It is an accepted fact that process scaling and operating frequency both contribute to increasing integrated circuit power dissipation due to interconnect. Extrapolating this trend leads to a red brick wall which only radically different interconnect architectures and/or technologies will be able to overcome. The aim of this chapter is to explain how, by exploiting recent advances in integrated optical devices, optical interconnect within systems on chip can be realised. We describe our vision for heterogeneous integration of a photonic “above-IC” communication layer. Two applications are detailed: clock distribution and data communication using wavelength division multiplexing. For the first application, a design method will be described, enabling quantitative comparisons with electrical clock trees. For the second, more long-term, application, our views will be given on the use of various photonic devices to realise a network on chip that is reconfigurable in terms of the wavelength used.

Keywords:

Interconnect technology, optical interconnect, optical network on chip

2.1 INTRODUCTION

In the 2003 edition of the ITRS roadmap [17], the interconnect problem was summarised thus: “For the long term, material innovation with traditional scaling will no longer satisfy performance requirements. Interconnect innovation with optical, RF, or vertical integration ... will deliver the solution”. Continually shrinking feature sizes, higher clock frequencies, and growth in complexity are all negative factors as far as switching charges on metallic interconnect are concerned. Even with low-resistance metals such as copper and low dielectric constant materials, bandwidths for long interconnect will be insufficient for future operating frequencies. Already the use of metal tracks to transport a signal over a chip has a high cost in terms of power: clock distribution, for instance, requires a significant part (30-50%) of total chip power in high-performance microprocessors.

A promising approach to the interconnect problem is the use of an optical interconnect layer, which could enable an increase in the ratio between data rate and power dissipation. At the same time it would enable synchronous operation within the circuit and with other circuits, relax constraints on thermal dissipation and sensitivity, signal interference and distortion, and also free up routing resources for complex systems. However, this comes at a price. Firstly, high-speed and low-power interface circuits are required; their design is not easy and has a direct influence on the overall performance of optical interconnect. Another important constraint is that all fabrication steps have to be compatible with future IC technology and that the additional cost incurred must remain affordable. Additionally, predictive design technology is required to quantify the performance gain of optical interconnect solutions, where information is scant and disparate concerning not only the optical technology, but also the CMOS technologies for which optics could be used (post-45nm node).

In section 2.2, we will describe the “above-IC” optical technology. Sections 2.3 and 2.4 describe an optical clock distribution network and a quantitative electrical-optical power comparison respectively. A proposal for a novel optical network on chip is discussed in section 2.5.

2.2 OPTICAL INTERCONNECT TECHNOLOGY

Various technological solutions may be proposed for integrating an optical transport layer in a standard CMOS system. In our opinion, the most promising approach makes use of hybrid (3D) integration of the optical layer above a complete CMOS IC, as shown in fig. 2.1. The basic CMOS process remains the same, since the optical layer can be fabricated independently. The weakness of this approach is the complex electrical link between the CMOS interface circuits and the optical sources (via stack and advanced bonding).

In the system shown in fig. 2.1, a CMOS source driver circuit modulates the current flowing through a biased III-V microsource; a via stack makes the electrical connection between the CMOS devices and the optical layer. III-V active devices are chosen in preference to Si-based optical devices for high-speed and long-wavelength operation. The microsource is coupled to the passive waveguide structure, where silicon is used as the core and SiO2 as the cladding material. Si/SiO2 structures are compatible with conventional silicon technology and silicon is an excellent material for transmitting wavelengths above 1.2µm (mono-mode waveguiding with attenuation as low as 0.8 dB/cm has been demonstrated [10]). The waveguide structure transports the optical signal to a III-V photodetector (or possibly to several, as in the case of a broadcast function) where it is converted to an electrical photocurrent, which flows through another via stack to a CMOS receiver circuit which regenerates the digital output signal. This signal can then, if necessary, be distributed over a small zone by a local electrical interconnect network.

Figure 2.1. Cross-section of hybridised interconnection structure (labelled elements: III-V laser source, III-V photodetector, Si photonic waveguide (n=3.5), SiO2 waveguide cladding (n=1.5), electrical contact, metallic interconnect structure, and the driver and receiver circuits in the CMOS IC)

2.3 AN OPTICAL CLOCK DISTRIBUTION NETWORK

In this section we present the structure of the optical clock distribution network, and detail the characteristics of each component part in the system: active optoelectronic devices (external VCSEL source and PIN detector), passive waveguides, and interface (driver and receiver) circuits. The latter are extremely critical to the operation of the overall link and require particularly careful design.

An optical clock distribution network, shown in fig. 2.2, requires a single photonic source coupled to a symmetrical waveguide structure routing to a number of optical receivers. At the receivers the high-speed optical signal is converted to an electrical one and provided to local electrical networks. Hence the primary tree is optical, while the secondary tree is electrical. It is not feasible to route the optical signal all the way down to the individual gate level since each drop point requires a receiver circuit which consumes area and power. The clock signal is thus routed optically to a number of drop points, each covering a zone over which the last part of the clock distribution is carried out by the electrical secondary clock tree. The size of the zones is determined by calculating the power required to continue in the optical domain and comparing it to the power required to distribute over the zone in the electrical domain. The number of clock distribution points (64 in the figure) is a particularly crucial parameter in the overall system.

The global optical H-tree was optimised to achieve minimal optical losses by designing the bend radii to be as large as possible. For 20mm die width and 64 output nodes in the H-tree at the 70nm technology node, the smallest radius of curvature (r3 in fig. 2.2) is 625µm, which leads to negligible pure bending loss.

Figure 2.2. Optical H-tree clock distribution network (OCDN) with 64 output nodes. r1−3 are the bend radii linked to the chip width D (r1 = D/8, r2 = D/16, r3 = D/32). Labelled elements: optical source, optical waveguides, optical receivers, electrical clock trees. Loss legend: LCV, source-waveguide coupling loss; LW, waveguide transmission loss; LB, bending loss; LY, Y-coupler loss; LCR, waveguide-receiver coupling loss.
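The relation between die width and bend radii in fig. 2.2 can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and the closed form r_k = D / 2^(k+2) is an assumption that simply generalises the three radii given in the figure.

```python
def h_tree_bend_radii_um(die_width_mm: float, levels: int = 3) -> list[float]:
    """Bend radii r1..r_levels in micrometres.

    Assumes r_k = D / 2**(k + 2), which reproduces r1 = D/8, r2 = D/16
    and r3 = D/32 from fig. 2.2 (the general form is an assumption).
    """
    die_width_um = die_width_mm * 1000.0
    return [die_width_um / 2 ** (k + 2) for k in range(1, levels + 1)]


# 20 mm die width, as in the example discussed in the text
for k, radius in enumerate(h_tree_bend_radii_um(20.0), start=1):
    print(f"r{k} = {radius:.0f} um")
```

For a 20mm die the smallest radius comes out at 625µm, in agreement with the figure quoted above.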

2.3.1 VCSEL sources

VCSELs (Vertical Cavity Surface Emitting Lasers) are certainly the most mature emitters for on-chip or chip-to-chip interconnections. Commercial VCSELs, when forward biased at a voltage well above 1.5V, can emit optical power of the order of a few mW around 850nm, with an efficiency of some 40%. Threshold currents are typically in the mA range. However, fundamental requirements for integrated semiconductor lasers in optical interconnect applications are small size, low-threshold lasing operation and single-mode operation (i.e. only one mode is allowed in the gain spectrum). Additionally, the fact that VCSELs emit light vertically makes coupling less easy. It is clear that significant effort is required from the research community if VCSELs are to compete seriously in the on-chip optical interconnect arena: wavelength and efficiency must be increased, and threshold current reduced, in the same device. Long-wavelength and low-threshold VCSELs are only just beginning to emerge (for example, a 1.5µm, 2.5Gb/s tuneable VCSEL [5], and an 850nm, 70µA threshold current, 2.6µm diameter CMOS compatible VCSEL [11] have been reported). Ultimately however, optical interconnect is more likely to make use of integrated microsources as described in section 2.5, as these devices are intrinsically better suited to this type of application.

2.3.2 PIN photodetectors

In order to optimise the frequency and power dissipation performance of the overall link, photodetectors must exhibit high quantum efficiency, large intrinsic bandwidth and small parasitic capacitance. The photodetector performance is measured by the bandwidth-efficiency product. Conventional III-V PIN devices suffer from two main limitations. On one hand, their relatively high capacitance per unit area leads to limitations in the design of the transimpedance amplifier interface circuit. On the other hand, due to their vertical structure, there is a tradeoff between frequency performance and efficiency (the quantum efficiency increases and the bandwidth decreases with the thickness of the intrinsic absorption layer) [9].

Metal-semiconductor-metal (MSM) photodetectors offer an alternative to conventional PIN photodetectors. An MSM photodetector consists of interdigitated metal contacts on top of an absorption layer. Because of their lateral structure, MSM photodetectors have very high bandwidths due to their low capacitance and the possibility of reducing the carrier transit time. However, the responsivity is usually low compared to PIN photodetectors [4]. MSM photodiodes with bandwidth greater than 100GHz have been reported.

2.3.3 Waveguides

Optical waveguides are at the heart of the optical interconnect concept. In the Si/SiO2 approach, the high relative refractive index difference $\Delta = (n_1^2 - n_2^2)/(2n_1^2)$ between the core ($n_1 \approx 3.5$ for Si) and cladding ($n_2 \approx 1.5$ for SiO2) allows the realisation of a compact optical circuit with dimensions compatible with DSM technologies. For example, it is possible to realise monomode waveguides less than 1µm wide (waveguide width of 0.3µm for wavelengths of 1.55µm), with bend radii of the order of a few µm [15]. However, the performance of the complete optical system depends on the minimum optical power required by the receiver and on the efficiency of passive optical devices used in the system. The total loss in any optical link is the sum of losses (in decibels) of all optical components:

$$L_{total} = L_{CV} + L_W + L_B + L_Y + L_{CR} \qquad (2.1)$$

$L_{CV}$ is the coupling loss between the photonic source and the optical waveguide. There are currently several methods to couple the beam emitted from the laser into the optical waveguide. In this analysis we assumed 50% coupling efficiency (i.e. $L_{CV}$ = 3dB) from the source to a single-mode waveguide.

$L_W$ is the rectangular waveguide transmission loss per unit distance of the optical power. Due to the small waveguide dimensions and the large index change at the core/cladding interface in the Si/SiO2 waveguide, side-wall scattering is the dominant source of loss (fig. 2.3a). For the waveguide fabricated by Lee [10] with roughness of 2nm the calculated transmission loss is 1.3dB/cm.

$L_B$ is the bending loss, highly dependent on the refractive index difference $\Delta$ between the core and cladding medium. In Si/SiO2 waveguides, $\Delta$ is relatively high, and due to this strong optical confinement bend radii as small as a few µm may be realised. As can be seen from fig. 2.3b, the bending losses associated with a single-mode strip waveguide are negligible if the radius of curvature is larger than 3µm.

$L_Y$ is the Y-coupler loss, and depends on the reflection and scattering attenuation into the propagation path and surrounding medium. For high index difference waveguides the losses for the Y-branch are significantly smaller than for low $\Delta$ structures, and the simulated losses are less than 0.2dB per split [14].

$L_{CR}$ is the coupling loss from the waveguide to the optical receiver. Using currently available materials and methods it is possible to achieve an almost 100% coupling efficiency from waveguide to optical receiver. In this analysis the coupling efficiency is assumed to be 87% ($L_{CR}$ = 0.6dB) [16].
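The numerical values quoted above can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, and the efficiency-to-decibel conversion is the usual $-10\log_{10}(\eta)$ definition rather than anything specific to this chapter.

```python
import math


def index_contrast(n_core: float, n_clad: float) -> float:
    """Relative refractive index difference, (n1^2 - n2^2) / (2 n1^2)."""
    return (n_core ** 2 - n_clad ** 2) / (2.0 * n_core ** 2)


def efficiency_to_loss_db(efficiency: float) -> float:
    """Convert a power coupling efficiency (0..1] to a loss in dB."""
    return -10.0 * math.log10(efficiency)


def total_loss_db(l_cv: float, l_w: float, l_b: float, l_y: float, l_cr: float) -> float:
    """Sum of the individual losses in dB, as in equation (2.1)."""
    return l_cv + l_w + l_b + l_y + l_cr


delta = index_contrast(3.5, 1.5)      # Si core, SiO2 cladding
l_cv = efficiency_to_loss_db(0.50)    # source-to-waveguide coupling
l_cr = efficiency_to_loss_db(0.87)    # waveguide-to-receiver coupling
print(f"delta = {delta:.3f}, L_CV = {l_cv:.2f} dB, L_CR = {l_cr:.2f} dB")
```

With the indices used in the text the contrast is roughly 0.41, and the 50% and 87% coupling efficiencies correspond to about 3dB and 0.6dB respectively.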

2.3.4 Interface circuits

High-speed CMOS optoelectronic interface circuits are crucial building blocks to the optical interconnect approach. The electrical power dissipation of the link is defined by these circuits, but it is the receiver circuit that poses the most serious design challenges. The power dissipated by the source driver is mainly determined by the source bias current and is therefore device-dependent. On the receiver side however, most of the receiver power is due to the circuit, while only a small fraction is required for the photodetector device.

Figure 2.3a. Simulated transmission loss for varying sidewall roughness in a 0.5µm × 0.2µm Si/SiO2 strip waveguide (axes: sidewall roughness in nm, transmission loss in dB/cm)

Figure 2.3b. Simulated pure bending loss for various bend radii in a 0.5µm × 0.2µm Si/SiO2 strip waveguide (axes: bend radius in µm, pure bending loss in dB on a logarithmic scale)

2.3.4.1 Driver circuits. Source driver circuits generally use a current modulation scheme for high-speed operation. The source always has to be biased above its threshold current by a MOS current sink to eliminate turn-on delays, which is why low-threshold sources are so important (figures of the order of 40µA [7] have been reported). A switched current sink modulates the current flowing through the source, and consequently the output optical power injected into the waveguide. As with most current-mode circuits, high bandwidth can be achieved since the voltage over the source is held relatively constant and parasitic capacitances at this node have reduced influence on the speed.

2.3.4.2 Receiver circuits. A typical structure for a high-speed photoreceiver circuit consists of: a transimpedance amplifier (TIA) to convert the photocurrent of a few µA into a voltage of a few mV; a comparator to generate a rail-to-rail signal; and a data recovery circuit to eliminate jitter from the restored signal. Of these, the TIA is arguably the most critical component for high-speed performance, since it has to cope with a generally large photodiode capacitance situated at its input. The basic transimpedance amplifier structure in a typical configuration is shown in fig. 2.4 [8]. The bandwidth/power ratio of this structure can be maximised by using small-signal analysis and mapping of the individual component values to a filter approximation of Butterworth type. It is then possible to develop a synthesis procedure which, from desired transimpedance performance criteria (gain $Z_{g0}$, bandwidth and pole quality factor $Q$) and operating conditions (photodiode and load capacitances, $C_d$ and $C_l$ respectively), generates component values for the feedback resistance $R_f$ and the voltage amplifier (voltage gain $A_v$ and output resistance $R_o$). Circuits with a high $R_o/A_v$ ratio ($\approx 1/g_m$) require the least quiescent current and area, and this quantity therefore constitutes an important figure of merit in design space exploration (fig. 2.5a). To reach a sized transistor-level circuit, approximate equations for the small-signal characteristics and bias conditions of the circuit are sufficient to allow a first-cut sizing of the amplifier, which can then be fine-tuned by numerical or manual optimisation, using simulation for exact results. The complete process is described in [13].

Figure 2.4. CMOS transimpedance amplifier structure (schematic labels: input current Ii, feedback resistance Rf, capacitances Cd, Ci, Cm, Co and Cl, transistors M1 to M3, supply Vdd, input and output nodes Vi and Vo). The expressions given with the schematic are:

$$Z_{g0} = -\frac{R_f - R_o/A_v}{1 + 1/A_v}$$

$$\omega_0 = \frac{1}{R_o C_y}\sqrt{\frac{1 + A_v}{M_f\,(M_x + M_m(1 + M_x))}}$$

$$Q = \frac{\sqrt{M_f\,(M_x + M_m(1 + M_x))(1 + A_v)}}{1 + M_x(1 + M_f) + M_m M_f (1 + A_v)}$$

$$M_f = R_f/R_o, \quad M_x = C_x/C_y, \quad M_m = C_m/C_y, \quad C_x = C_d + C_i, \quad C_y = C_l + C_o$$

Figure 2.5a. TIA Ro/Av design space with varying bandwidth and transimpedance gain requirements (amplifier Ro/Av requirement for Ci = 500fF, Cl = 100fF; axes: bandwidth requirement in GHz, transimpedance gain requirement in ohms)

Figure 2.5b. Evolution of TIA characteristics (power, area, noise) with technology node (1THzΩ transimpedance amplifier, Cd = 400fF, Cl = 150fF; axes: technology node in nm, area in µm2 and quiescent power in units of 100µW)
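To make the expressions of fig. 2.4 concrete, the following Python sketch evaluates the transimpedance gain, the characteristic frequency and the pole quality factor from a set of component values. It is an illustration under stated assumptions, not the synthesis procedure of [13]: the function name is hypothetical, the component values in the usage lines are invented for the example, and the square roots follow from a standard second-order small-signal analysis of the structure.

```python
import math


def tia_characteristics(r_f, r_o, a_v, c_d, c_i, c_m, c_o, c_l):
    """Return (Z_g0 in ohms, omega_0 in rad/s, Q) for the TIA of fig. 2.4.

    r_f: feedback resistance, r_o: amplifier output resistance,
    a_v: amplifier voltage gain, c_*: capacitances in farads.
    """
    c_x = c_d + c_i            # total capacitance at the input node
    c_y = c_l + c_o            # total capacitance at the output node
    m_f = r_f / r_o
    m_x = c_x / c_y
    m_m = c_m / c_y
    k = m_x + m_m * (1.0 + m_x)

    z_g0 = -(r_f - r_o / a_v) / (1.0 + 1.0 / a_v)
    omega_0 = math.sqrt((1.0 + a_v) / (m_f * k)) / (r_o * c_y)
    q = math.sqrt(m_f * k * (1.0 + a_v)) / (
        1.0 + m_x * (1.0 + m_f) + m_m * m_f * (1.0 + a_v)
    )
    return z_g0, omega_0, q


# Hypothetical component values, chosen only to exercise the formulas
z_g0, omega_0, q = tia_characteristics(
    r_f=2e3, r_o=500.0, a_v=10.0,
    c_d=400e-15, c_i=50e-15, c_m=5e-15, c_o=20e-15, c_l=150e-15,
)
print(f"Zg0 = {z_g0:.0f} ohm, f0 = {omega_0 / (2 * math.pi) / 1e9:.2f} GHz, Q = {q:.2f}")
```

A Butterworth-type response corresponds to Q = 1/√2, so a synthesis loop would adjust the component values until the returned Q reaches that target while the gain and bandwidth criteria are met.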

Using this methodology with industrial transistor models for technology nodes from 350nm to 180nm and predictive BSIM3v3/BSIM4 models for technology nodes from 130nm down to 45nm [3], we generated design parameters for 1THzΩ transimpedance amplifiers to evaluate the evolution in critical characteristics with technology node. Fig. 2.5b shows the results of transistor-level simulation of fully generated photoreceiver circuits at each technology node.

2.4 QUANTITATIVE POWER COMPARISON BETWEEN ELECTRICAL AND OPTICAL CLOCK DISTRIBUTION NETWORKS

2.4.1 Design methodology

In an optical link there are two main sources of electrical power dissipation: (i) power dissipated by the optical receiver(s) and (ii) power needed by the optical source(s) to provide the required optical output power. To estimate the electrical power dissipated in the system we developed the methodology shown in fig. 2.6.

Figure 2.6. Methodology used to estimate the electrical power dissipation in an optical clock distribution network (flow: the BER specification (SNR requirement), the photodiode characteristics (R, Cd, Idark) and the transimpedance amplifier give the minimum optical power at the receiver; the losses in the passive waveguide network then give the minimum optical power at the source; the source efficiency gives the emitter power, which is added to the receiver power to obtain the electrical power dissipated in the optical system)

The first criterion for defining the performance of the optoelectronic link is the required signal transmission quality, represented by the bit error rate (BER) and directly linked to the photoreceiver signal to noise ratio. For an on-chip interconnect network, a BER of $10^{-15}$ is acceptable. To calculate the required signal power at the receiver, the characteristics of the receiver circuit have to be extracted from the transistor-level schematic, which is generated from the photodetector characteristics (responsivity R, Cd, dark current Idark) and from the required operating frequency using the method described in section 2.3. For the given BER and for the noise signal associated with the photodiode and transimpedance circuit, the minimum optical power required by the receiver to operate at the given error probability can be calculated using the Morikuni formula [12]. With this figure, and knowing the layout and therefore the optical losses that will be incurred in the waveguides, the minimum required optical power at the source can be estimated. The total electrical power dissipated in the optical link is the sum of the power dissipated by the optical receivers and the power needed by the source to provide the required optical power. The electrical power dissipated by the receivers can be extracted from transistor-level simulations. To estimate the power needed by the optical source, laser light-current characteristics given by Amann [1] were used.
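The link between a BER specification and a signal to noise requirement can be illustrated with the textbook Gaussian-noise relation BER = ½·erfc(Q/√2), where Q is the ratio of signal swing to total noise at the decision point. This is a generic approximation used here only as a sketch; it is not the Morikuni formula of [12], and the function names are hypothetical.

```python
import math


def ber_from_q(q: float) -> float:
    """Bit error rate for a given Q factor, assuming Gaussian noise."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_for_ber(target_ber: float, lo: float = 0.0, hi: float = 20.0) -> float:
    """Find the Q factor giving target_ber by bisection (BER falls as Q rises)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target_ber:
            lo = mid   # error rate still too high: need a larger Q
        else:
            hi = mid
    return 0.5 * (lo + hi)


q_required = q_for_ber(1e-15)
print(f"Q factor required for BER = 1e-15: {q_required:.2f}")
```

Under this approximation a BER of $10^{-15}$ calls for a Q factor just under 8; the receiver noise extracted from simulation then sets the minimum photocurrent, and hence the minimum optical power, for that Q.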

2.4.2 Design performance

Our aim in this work was to quantitatively compare the power dissipation in electrical and optical clock distribution networks for a number of cases, including technology node prediction. For both electrical and optical cases we used technology parameters from the ITRS roadmap (wire geometry, material parameters). For transistor models we used predictive model parameters from Berkeley (BSIM3V3 down to 70nm and BSIM4 down to 45nm). The power dissipated in the electrical system can be attributed to the charging and discharging of the wiring and load capacitance and to the static power dissipated by the buffers. In order to calculate the power we used an internally developed simulator, which allows us to model and calculate the electrical parameters of clock networks for future technology nodes [18]. For optical performance predictions we used existing technology characteristics, while for the optoelectronic devices we took figures from the datasheets of two real devices. The power dissipated in clock distribution networks was analysed in both systems at the 70nm technology node. Power dissipation figures for electrical and optical CDNs were calculated based on the system performance summarised in tables 2.1a and 2.1b.

Table 2.1a. Electrical CDN characteristics

Electrical system parameter        Value
Technology (nm)                    70
Vdd (V)                            0.9
Tox (nm)                           1.6
Chip size (mm2)                    400
Global wire width (µm)             1
Metal resistivity (µΩ-cm)          2.2
Dielectric constant                3
Optimal segment length (mm)        1.7
Optimal buffer size (µm)           90

Table 2.1b. Optical CDN characteristics

Optical system parameter           Value
Wavelength λ (nm)                  1550
Waveguide core index (Si)          3.47
Waveguide cladding index (SiO2)    1.44
Waveguide thickness (µm)           0.2
Waveguide width (µm)               0.5
Transmission loss (dB/cm)          1.3
Loss per Y-junction (dB)           0.2
Input coupling coefficient (%)     50
Photodiode capacitance (fF)        100
Photodiode responsivity (A/W)      0.95

What follows are the results of comparisons of the power dissipation in electrical and optical clock distribution networks. The comparison was carried out quantitatively for varying chip size, operating frequency, number of clock distribution points, technology node, and finally sidewall roughness. This last characteristic is the only one that is not system-driven, but it gives some important design information to technology groups working on optical interconnect.

Fig. 2.7a shows a power comparison where we vary square die size from 10 to 37 mm width. This analysis was carried out for the 70nm node at a distribution frequency of 5.6GHz (which is the clock frequency associated with this node) and 256 drop points. Electrical CDN power rises almost linearly with die size, which is understandable since the line lengths increase and therefore require more buffers to drive them. Optical CDN power rises much more slowly, since all that is really changing is transmission loss and this has quite a minor effect on the overall power dissipation.

When we vary clock frequency for constant chip width (fig. 2.7b), we observe a similar effect for the electrical CDN. Again, the number of buffers has to increase since the segment lengths have to be reduced in order to attain the lower RC time constants. For the optical CDN, what is changing is the receiver power dissipation. The transimpedance amplifier requires a lower output resistance in order to operate at higher frequencies and this translates to a higher bias current.

In fig. 2.7c, we vary the number of drop points and see that both electrical and optical CDN power dissipation rises, but optical rises much faster than electrical. There are two reasons for this: firstly, every time the number of drop points is doubled, so is the number of receivers, and this accounts for a large part of the power dissipation; secondly, the number of splitters is doubled, which in turn means that the power at emission also has to be doubled. These two factors cause the optical power to catch up with the electrical power at around 4000 drop points.

Fig. 2.7e shows a comparison for varying technology node. Not only is the technology changing here; the clock frequency associated with each node changes as well. We can see that at the 70nm node there is a five-fold difference between electrical and optical clock distribution. As the technology node advances, this difference becomes even more marked.

A final analysis, fig. 2.7f, shows how technological advances are required to improve system performance, concerning in this case waveguide sidewall roughness. A roughness of 5nm translates to a transmission loss of around 8dB/cm, which in turn corresponds to a power dissipation figure of around 500mW for the 70nm node at 5.6GHz and 20mm chip width. At the 2nm roughness point, achieved at MIT [10] and corresponding to a transmission loss of 1.3dB/cm, we obtain a power dissipation figure of about 10mW: a fiftyfold decrease in the overall power dissipation by going from 5nm roughness to 2nm roughness. This demonstrates the importance of optimising the passive waveguide technology for the whole system.

Figure 2.7a. Comparison of power dissipation in electrical and optical clock distribution networks for varying chip size (70nm technology, 5.6GHz, 256 drop points; axes: die size in mm2, power dissipation in mW)

Figure 2.7b. Comparison of power dissipation in electrical and optical clock distribution networks for varying clock frequency (70nm technology, 400mm2, 256 drop points; curves for 128 and 256 drop points; axes: clock frequency in GHz, power dissipation in mW)

Figure 2.7c. Comparison of power dissipation in electrical and optical clock distribution networks for varying number of drop points (70nm technology, 5.6GHz, 400mm2; axes: number of drop points, power dissipation in mW)

Figure 2.7d. Comparison of power dissipation in electrical and optical clock distribution networks for varying number of drop points (70nm technology, 5.6GHz, 400mm2; axes: number of drop points, power gain optical/electrical in %)

Figure 2.7e. Comparison of power dissipation in electrical and optical clock distribution networks for varying technology nodes (axes: technology node in nm, power dissipation in mW)

Figure 2.7f. Evaluation of power dissipation in optical clock distribution networks for varying waveguide sidewall roughness (70nm technology, 5.6GHz, 400mm2; curves for 128 and 256 drop points; axes: waveguide transmission loss in dB/cm, power dissipation in mW)

For a BER of $10^{-15}$ the minimal power required by the receiver is -22.3dBm (at 3GHz). Losses incurred by passive components for various numbers of nodes in the H-tree are summarised in table 2.2.

Table 2.2. Optical power budget for 20mm die width at 3GHz

Number of nodes in H-tree       16       32       64       128
Loss in straight lines (dB)     1.3      1.3      1.3      1.3
Loss in curved lines (dB)       1.53     1.66     1.78     1.85
Loss in Y-dividers (dB)         12       15       18       21
Loss in Y-couplers (dB)         0.8      1        1.2      1.4
Output coupling loss (dB)       0.6      0.6      0.6      0.6
Input coupling loss (dB)        3        3        3        3
Total optical loss (dB)         19.2     22.5     25.8     29.1
Min. receiver power (dBm)       -22.3    -22.3    -22.3    -22.3
Laser optical power (mW)        0.5      1.1      2.30     4.85
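The structure of table 2.2 can be approximated with a small Python sketch. It assumes, as the table entries suggest, that each level of the H-tree contributes a 3dB power division plus 0.2dB of Y-coupler excess loss, and that the laser power is simply the receiver sensitivity raised by the total loss. The function name and argument names are hypothetical, and the curved-line losses are taken directly from the table rather than computed.

```python
import math


def optical_power_budget(nodes, curved_db, straight_db=1.3, split_db=3.0,
                         y_excess_db=0.2, out_coupling_db=0.6,
                         in_coupling_db=3.0, rx_sensitivity_dbm=-22.3):
    """Return (total optical loss in dB, required laser optical power in mW)."""
    levels = round(math.log2(nodes))          # number of successive 1:2 splits
    total_loss_db = (straight_db + curved_db
                     + levels * (split_db + y_excess_db)
                     + out_coupling_db + in_coupling_db)
    laser_dbm = rx_sensitivity_dbm + total_loss_db
    laser_mw = 10.0 ** (laser_dbm / 10.0)     # dBm to mW
    return total_loss_db, laser_mw


# Curved-line losses (dB) copied from table 2.2
for nodes, curved in [(16, 1.53), (32, 1.66), (64, 1.78), (128, 1.85)]:
    loss, power = optical_power_budget(nodes, curved)
    print(f"{nodes:4d} nodes: total loss {loss:.2f} dB, laser power {power:.2f} mW")
```

The values obtained this way agree with the tabulated totals and laser powers to within a few percent; the residual differences are presumably due to rounding in the table. The sketch also makes the scaling visible: each doubling of the number of nodes adds a little over 3dB of loss, so the required laser power slightly more than doubles.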

We can conclude from this analysis that power dissipation in optical clock distribution networks is lower than that of electrical clock distribution networks, by a factor of five for example at the 70nm technology node. This factor will become larger in the future for two reasons: firstly, improvements in optical fabrication technology; and secondly, the rise in operating frequencies. However, this figure is probably not sufficient to convince semiconductor manufacturers to introduce such large technological and methodological changes for this application. To improve the figure, weak points can be identified for each main part of an integrated optical link. For the source, the efficiency of electrical to optical power conversion is relatively low. This needs to be improved, and integrated microsources are one possible route. For the waveguide structures, most of the losses need to be reduced, especially transmission loss and coupling loss. Sidewall roughness in particular has a direct and considerable impact on the power dissipation in the global system. Finally, at the receiver end, the transimpedance amplifier power dissipation is too high. Better circuit structures must be devised, or the photodetector parasitic capacitance needs to be reduced.

2.5 OPTICAL NETWORK ON CHIP

In current SoC architectures, global data throughput between functional blocks can reach up to tens of gigabits per second, the load being shared by several communication buses. In the future the constraints acting on such data exchange networks will continue to increase: the number of IP blocks in an integrated system could be as high as several hundred and the global throughput could reach the Tb/s scale. To provide this level of performance, the communication system itself is designed as an IP block into which the various functional units will be connected. This type of standardised hardware communication architecture is called a network on a chip (NoC).

TLFeBOOK 34 Using wavelength division multiplexing (WDM) techniques, photonics and optoelectronics may offer new solutions to realise reconfigurable optical networks on chip (ONoC). An ONoC, as an electronic router with routing based on wavelength λ, is actually a circuit-switching based topology and can thus ensure data exchanges between IP blocks with very low contention. The advantages of using an optical network are many: independence of interconnect performance from distance and data rate, crosstalk reduction, connectivity increase, interconnect power dissipation reduction, increase in the size of isochronous tiles, use of communication protocols. Figure 2.8 shows a 4 × 4 ONoC with all electronic interfaces: photodetector and laser in III-V technology and optical network in SOI technology, using similar heterogeneous integration techniques as described in section 2.2. Intellectual property (IP) blocks shown can be processor cores, memory blocks, functional units etc. with standard interfaces to the communication network. This is a multi-domain device with high speed optoelectronic circuits (modulation of the laser current and photodetectors) and passive optics (waveguides and passive filters). In the figure, M are masters (processor, IP, ...) which can communicate with targets T (memory, ...). The network is comprised of 4 stages, each associated with a single resonant wavelength. The operation of the 4×4 network is summarised in the table of figure 2.3. This system is a fully passive circuit-switching network based on wavelength routing and is a non-blocking network. From Mi to Tj , there exists only one physical path associated with one wavelength. At any one time, singlewavelength emitters can make 4 connections and multi-wavelength emitters can make 12 connections. The network is in principle scalable to an infinite number of connections. In practice, this number is severely limited by lithography and etching precision. 
For a 5nm tolerance on the size of the microdisk, corresponding to state of the art CMOS process technology, the maximum size of the network is 8 × 8.

Table 2.3. Truth table for optical network on chip

      T1   T2   T3   T4
M1    λ2   λ3   λ1   λ4
M2    λ3   λ4   λ2   λ1
M3    λ1   λ2   λ4   λ3
M4    λ4   λ1   λ3   λ2
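The wavelength routing of Table 2.3 can be written as a simple lookup. The following is only a minimal sketch: the function names are hypothetical, and the check that every wavelength appears exactly once per master and once per target is our own reading of why such a table allows simultaneous connections without contention, not a statement taken from the text.

```python
# Table 2.3 as a lookup: ROUTING[master][target] is the wavelength index (1..4).
ROUTING = {
    "M1": {"T1": 2, "T2": 3, "T3": 1, "T4": 4},
    "M2": {"T1": 3, "T2": 4, "T3": 2, "T4": 1},
    "M3": {"T1": 1, "T2": 2, "T3": 4, "T4": 3},
    "M4": {"T1": 4, "T2": 1, "T3": 3, "T4": 2},
}


def wavelength_for(master, target):
    """Wavelength index a master must emit to reach a given target."""
    return ROUTING[master][target]


def target_reached(master, wavelength):
    """Target reached when `master` emits on wavelength index `wavelength`."""
    for target, lam in ROUTING[master].items():
        if lam == wavelength:
            return target
    raise ValueError(f"no route from {master} on wavelength {wavelength}")


def is_contention_free(table):
    """Assumed criterion: each wavelength occurs once per row and once per column."""
    masters = list(table)
    targets = list(table[masters[0]])
    all_lams = set(range(1, len(targets) + 1))
    rows_ok = all(set(table[m].values()) == all_lams for m in masters)
    cols_ok = all({table[m][t] for m in masters} == all_lams for t in targets)
    return rows_ok and cols_ok


if __name__ == "__main__":
    print(wavelength_for("M1", "T2"))
    print(target_reached("M3", 4))
    print(is_contention_free(ROUTING))
```

With this representation, a master selects its destination purely by choosing the emission wavelength; the passive network itself holds no state.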

The basic element of the network is an optical filter, described in the next section. The ports ♯1 − ♯4 correspond to inputs/outputs of the optical filter. Its operation is the same as an electronic cross-bar: the cross function (output in ♯4) is activated when the injected wavelength in ♯1 does not correspond to a resonant ring wavelength and the bar function is activated (output in ♯3) when the injected wavelength in ♯1 corresponds to a resonant ring wavelength.

[Figure 2.8 labels: master IP blocks IPM1 to IPM4; master interfaces M1 to M4 (driver, laser); passive optical network on chip; target interfaces T1 to T4 (detector, receiver); target IP blocks IPT1 to IPT4; inset: elementary optical filter operation with ports #1 to #4.]

Figure 2.8. Architecture of 4x4 optical network on chip

Operation is symmetrical: the same phenomenon occurs if the wavelength is injected at port ♯4.

2.5.1 Microresonators

Microring resonators are ideal device candidates for integrated photonic circuits. Because they make it possible to add or extract signals from a waveguide based on wavelength in a WDM flow, they can be considered as basic building blocks for complex communication networks. The use of standard SOI technology leads to high compactness (structures with radii as small as 4µm have been reported) and the possibility of low-cost photonic integration. Figure 2.9 shows the structure of an elementary add-drop filter based on microring resonators. The size of the structure is typically a few hundred µm². It consists of two identical disks evanescently side-coupled to two signal waveguides which are crossed at near right angles to facilitate signal directivity. The microdisks make up a selective structure: the electromagnetic field propagates in the rings for discrete propagation modes corresponding to specific wavelengths. The resonant wavelengths depend on geometric and structural parameters (indices of the substrate and of the microrings, thickness and diameter of the disks).

The basic function of a microresonator can be thought of as a wavelength-controlled switching function. If the wavelength of an optical signal passing through a waveguide in proximity to the resonator (for example injected at port ♯1) is close enough to a resonant wavelength λ1 (tolerance is of the order of a few nm, depending on the coupling strength between the disk and the waveguide), then the electromagnetic field is coupled into the microrings and then out along the second waveguide (in the example, the optical signal is transmitted to the output port ♯3, as shown in fig. 2.10a). If the wavelength of the optical signal does not correspond to the resonant wavelength, then the electromagnetic field continues to propagate along the waveguide and not through the structure (in the example, the optical signal would then be transmitted to the output port ♯4, as shown in fig. 2.10b). This device thus operates as an elementary router, the behaviour of which is summarised in the table in fig. 2.9.

Figure 2.9. Micro-disk realisation of an add-drop filter (ports #1 to #4; scale markers 10µm and 30µm)
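The switching behaviour of the add-drop filter described above can be captured in a very small behavioural model. This is a sketch under stated assumptions: the function names, the 1 nm default tolerance and the example wavelengths around 1550 nm are illustrative choices, not values given in the text (which only says the tolerance is of the order of a few nm).

```python
def output_port(injected_nm, resonant_nm, tolerance_nm=1.0):
    """Output port for a signal injected at port #1 of the add-drop filter.

    Returns 3 (bar function) when the injected wavelength matches the
    resonant wavelength within the assumed tolerance, otherwise 4 (cross).
    """
    if abs(injected_nm - resonant_nm) <= tolerance_nm:
        return 3
    return 4


def split_wdm(wavelengths_nm, resonant_nm, tolerance_nm=1.0):
    """Split a WDM flow into (dropped to port #3, passed on to port #4)."""
    dropped = [w for w in wavelengths_nm
               if output_port(w, resonant_nm, tolerance_nm) == 3]
    passed = [w for w in wavelengths_nm
              if output_port(w, resonant_nm, tolerance_nm) == 4]
    return dropped, passed


if __name__ == "__main__":
    # Hypothetical four-channel flow; only the resonant channel is extracted.
    print(split_wdm([1530.0, 1540.0, 1550.0, 1560.0], resonant_nm=1550.0))
```

A real device has a continuous, Lorentzian-like transmission curve rather than a hard threshold, so this two-state model only reflects the idealised router behaviour.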

Figure 2.10a. FDTD simulation of add-drop filter in on-state

Figure 2.10b. FDTD simulation of add-drop filter in off-state

First structures have been realised and preliminary results are promising. Fig. 2.11a shows an IR photograph of the structure in the cross state (top) and in the bar state (bottom), while fig. 2.11b represents the transmission coefficient on the cross output: the transmitted power on the cross output reaches 100% for wavelengths corresponding to the resonant frequencies of the microdisk.

Figure 2.11a. Infra-red photograph of structure in both cross (top) and bar (bottom) states

Figure 2.11b. Transmission coefficient on cross output for varying wavelength

2.5.2 Microsource lasers

From the viewpoint of mode field confinement and mirror reflection, microdisk lasers operate on the principle of total internal reflection, as opposed to multiple reflection, as is the case in VCSELs for example. This fact gives this type of source two distinct advantages over VCSELs for on-chip optical interconnect. Firstly, light emission is in-plane (as opposed to vertical), meaning that emitted light can be injected directly into a waveguide with minimum loss [6]. Secondly, for communication schemes requiring multiple wavelengths, it is easier from a technological point of view to control the radius of such a device than it is to control the thickness of an air gap in a VCSEL. In any case such devices, to be compatible with dense photonic integration, must satisfy the requirements of small volume and high optical confinement, with low threshold current and emitting in the 1.3-1.6µm range. Although these devices are not as mature as VCSELs, they seem extremely promising for optical interconnect applications. An overview of microcavity semiconductor lasers can be found in [2].

2.5.3 Demonstration of principle

Behavioural models enable us to verify the operation of the 4 × 4 ONoC in high-level simulation. The 4 wavelengths (λ1, λ2, λ3 and λ4) are injected at port ♯1 at the same moment (shown in figure 2.12). The input signal format is a matrix. Figure 2.12 is a 3-dimensional representation with wavelength on the X-axis (representing the 4 channels), time on the Y-axis and power (normalised) on the vertical axis. Each injected wavelength carries two Gaussian pulses in time. The behavioural simulation analyses the 4 outputs T1, T2, T3 and T4 (T2 shown in fig. 2.12). As predicted in table 2.3, only λ3 is detected at the output T2.
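The experiment described above can be mimicked with a few lines of code. The sketch below is not the authors' behavioural model: it assumes that port ♯1 corresponds to master M1 (the row of table 2.3 that predicts λ3 at T2), treats the network as an ideal lossless router, and uses arbitrary pulse positions and widths. All names are hypothetical.

```python
import math

# Row M1 of table 2.3, inverted: wavelength index -> target reached from M1.
M1_ROUTES = {1: "T3", 2: "T1", 3: "T2", 4: "T4"}


def gaussian_pulses(times, centres, width):
    """Normalised power samples: one Gaussian pulse per centre."""
    return [sum(math.exp(-((t - c) / width) ** 2) for c in centres)
            for t in times]


def route_from_m1(signal):
    """signal maps wavelength index -> samples.

    Returns target -> {wavelength index: samples} for an ideal network.
    """
    outputs = {target: {} for target in M1_ROUTES.values()}
    for lam, samples in signal.items():
        outputs[M1_ROUTES[lam]][lam] = list(samples)
    return outputs


# Input "matrix": four wavelength channels, each with two pulses in time.
times = [0.1 * i for i in range(101)]            # arbitrary time axis
pulses = gaussian_pulses(times, centres=(3.0, 7.0), width=0.5)
signal = {lam: pulses for lam in (1, 2, 3, 4)}   # injected simultaneously
outputs = route_from_m1(signal)

if __name__ == "__main__":
    print("wavelengths detected at T2:", sorted(outputs["T2"]))
```

In this idealised picture each target receives exactly one channel, with the two pulses preserved; losses, crosstalk and filter line shape would have to be added for a quantitative study.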

Figure 2.12. Simulation of 4x4 optical network on chip

2.6 CONCLUSION

Integrated optical interconnect is one potential technological solution to alleviate some of the more pressing issues involved in moving volumes of data between circuit blocks on integrated circuits. In this chapter, we have shown how novel integrated photonic devices can be fabricated above standard CMOS ICs, designed concurrently with EDA tools and used in clock distribution and NoC applications. The feasibility of on-chip optical interconnect is no longer really in doubt. We have given some partial results to quantitatively demonstrate the advantages of optical clock distribution. Although lower power can be achieved (of the order of a five-fold decrease), more work is required to explore new solutions that benefit from advances both at the architectural and at the technological level. Also the existing basic building blocks need to be integrated together to physically demonstrate on-chip optical links. Research is well under way in several research groups around the world to do this. Looking further ahead, the use of multiple wavelengths in on-chip communication networks and in reconfigurable computing is an extremely promising and exciting field of research.

References

[1] M. Amann, M. Ortsiefer, and R. Shau: 2002, ‘Surface-emitting Laser Diodes for Telecommunications’. In: Proc. Symp. Opto- and Microelectronic Devices and Circuits.
[2] T. Baba: 1997, ‘Photonic Crystals and Microdisk Cavities Based on GaInAsP-InP System’. IEEE J. Selected Topics in Quantum Electronics 3.
[3] Y. Cao, T. Sato, D. Sylvester, M. Orchansky, and C. Hu: 2000, ‘New Paradigm of Predictive MOSFET and Interconnect Modeling for Early Circuit Design’. In: Proc. Custom Integrated Circuit Conference.
[4] S. Cho et al.: 2002, ‘Integrated detectors for embedded optical interconnections on electrical boards, modules and integrated circuits’. IEEE J. Sel. Topics in Quantum Electronics 8.
[5] A. Filios et al.: 2003, ‘Transmission performance of a 1.5-µm 2.5-Gb/s directly modulated tunable VCSEL’. IEEE Phot. Tech. Lett. 15.
[6] M. Fujita, A. Sakai, and T. Baba: 1999, ‘Ultrasmall and ultralow threshold GaInAsP-InP microdisk injection lasers: Design, fabrication, lasing characteristics and spontaneous emission factor’. IEEE J. Sel. Topics in Quantum Electronics 5.
[7] M. Fujita, R. Ushigome, and T. Baba: 2000, ‘Continuous wave lasing in GaInAsP microdisk injection laser with threshold current of 40µA’. IEE Electron. Lett. 36.
[8] M. Ingels and M. S. J. Steyaert: 1999, ‘A 1-Gb/s, 0.7µm CMOS Optical Receiver with Full Rail-to-Rail Output Swing’. IEEE J. Solid-State Circuits 34(7).
[9] I. Kimukin et al.: 2002, ‘InGaAs-Based High-Performance p-i-n Photodiodes’. IEEE Phot. Tech. Lett. 26(3).
[10] K. Lee et al.: 2001, ‘Fabrication of ultralow-loss Si/SiO2 waveguides by roughness reduction’. Optics Letters 26.
[11] J. Liu et al.: 2002, ‘Ultralow-threshold sapphire substrate-bonded topemitting 850-nm VCSEL array’. IEEE Phot. Lett. 14.
[12] J. Morikuni et al.: 1994, ‘Improvements to the standard theory for photoreceiver noise’. IEEE J. Lightwave Technology 12.
[13] I. O’Connor, F. Mieyeville, F. Tissafi-Drissi, G. Tosik, and F. Gaffiot: 2003, ‘Predictive design space exploration of maximum bandwidth CMOS photoreceiver preamplifiers’. In: Proc. IEEE International Conference on Electronics, Circuits and Systems.
[14] A. Sakai, T. Fukazawa, and T. Baba: 2002, ‘Low Loss Ultra-Small Branches in a Silicon Photonic Wire Waveguide’. IEICE Tran. Electron. E85-C.
[15] A. Sakai, G. Hara, and T. Baba: 2001, ‘Propagation Characteristics of Ultrahigh-∆ Optical Waveguide on Silicon-on-Insulator Substrate’. Jpn. J. Appl. Phys. – Part 2 40.
[16] S. Schultz, E. Glytsis, and T. Gaylord: 2000, ‘Design, Fabrication, and Performance of Preferential-Order Volume Grating Waveguide Couplers’. Applied Optics-IP 39.
[17] Semiconductor Industry Association: 2003, ‘International Technology Roadmap for Semiconductors’.
[18] G. Tosik, F. Gaffiot, Z. Lisik, I. O’Connor, and F. Tissafi-Drissi: 2004, ‘Power dissipation in optical and metallic clock distribution networks in new VLSI technologies’. IEE Elec. Lett. 4(3).


Chapter 3 NANOTECHNOLOGIES FOR LOW POWER

Jacques Gautier CEA-DRT – LETI/D2NT – CEA/GRE

Abstract

The conventional approach to improve the performance of circuits is to scale down the devices and technologies. This is also convenient to lower the power consumption per function. In this chapter, we overview the potential of nanotechnologies for this purpose, with emphasis on few-electron devices in the case of room-temperature operation. Other devices, especially carbon nanotube transistors, resonant tunnelling diodes and quantum cellular automata, are briefly discussed.

Keywords:

nanotechnologies; Single Electron Transistor; SET; molecular electronics; RTD; QCA; low power; Coulomb blockade

3.1 INTRODUCTION

In addition to packing-density increase and speed improvement, the downscaling of technologies comes with a reduction of the power consumption per function. However this gain is offset by the tremendous increase in the number of transistors per chip. A possible solution is to go further towards nano-scale devices where a lower amount of charge is needed to code a bit. This is the basis of what is known as single electronics. The use of molecules could be a realistic way to fabricate these tiny devices and other useful nanostructures. In this chapter we overview the potential of nanodevices for low power electronics with emphasis on few-electron electronics in the case of room-temperature (RT) operation. Other devices, especially carbon nanotube transistors, resonant tunnelling diodes (RTD) and quantum cellular automata (QCA), are briefly discussed.

3.2 SINGLE ELECTRONICS

In CMOS circuits, the total power consumption is the sum of the dynamic power and of the contribution of leakages. For advanced technology generations the latter is rapidly rising, but it is still less than the former. So, we will focus on this dynamic power consumption, which is given by the usual expression

$$P_d = a \cdot N_{gate} \cdot (C_{gate} + C_{inter}) \cdot V_{DD}^2 \cdot f_c \qquad (1)$$

where a is the activity factor, Ngate is the number of gates, (Cgate + Cinter) is the load capacitance (gate and interconnect contributions), and fc is the clock frequency. This equation shows that the power is proportional to the amount of charge in transistors and interconnects for coding a bit of information. For dense circuits with local interconnects, the dominant contribution is usually the one related to the gate capacitance of transistors, which can also be expressed as Pd = a·Ngate·Q·VDD·fc, where Q is the channel charge. So there is a strong motivation to reduce it for power saving. This is currently obtained by the downscaling of technologies. From the extrapolation of the historical trend and from the ITRS roadmap anticipation [1], we can expect a value of only 10-20 electrons for sub-10nm MOSFETs. This is much less than the hundreds to thousands of electrons present in current devices.

Is it possible to go still further, towards only one electron, using what is called a single electron transistor or SET [2]? That would be advantageous for power consumption, knowing that the reduction of power per function due to the scaling is more or less balanced by the tremendous increase of the number of transistors per chip. However this gain would be effective only if the capacitances of interconnects are not too large.

Another factor in expression (1) is the electrostatic potential at which the charge Q is brought. At present, there is a strong incentive for reducing it. Whereas the supply voltage of current high performance circuits is in the range 1.2-1.8V, operation at only 0.3V on experimental circuits has already been demonstrated [3], which is close to the bottom limit anticipated by the ITRS. For a lower value the device is not in well defined On or Off states, which results in either leakage or poor performance. What can be expected from SET's? Before giving an answer to this question, their properties and modes of operation are briefly recalled.

3.2.1 Background on single electron transistors

A SET is a device which comprises a Source and a Drain reservoir of electrons and a control gate, like MOSFET's. In between, there is an island where carriers should be confined [2] (see Fig. 3-1). A common solution to obtain this effect is to insert tunnelling or potential barriers between the reservoirs and the island. This is the main structural difference from MOSFET's, but it is essential for the operation of SET's. Due to this confinement, there is always an integer number of electrons in the island. However, the probability of having a given amount of charge is a continuous function of the device bias, such that there is also a continuous variation of the average charge versus the external bias.

Figure 3-1. Schematics of a SET (labels: S, D, G, island, RT, Cj, Cg, Vg; RT>>RQ)

Provided that just one electron more or less has a significant effect on the electrostatic energy of the device, it is shown that, for a given device bias, there are limited possible states of charge in the island [2]. In particular, there are bias domains for which only one state of charge is possible. In this case, there is no exchange of charge with the electron reservoirs and the device is in the Off state. This is the Coulomb blockade effect. For the other cases, the number of electrons oscillates between the most probable states of charge, leading to a flux of carriers between source and drain. For instance, when the two states n and n+1 are possible, the current is due to the repetition of the sequence: one electron coming from the source to the island, then leaving the island to the drain. As shown in Fig. 3-2, the electrical characteristics of SET's are very different from those of MOSFET's. The ID(VG) curves have periodic oscillations of current and the output characteristics look like a resistance (or staircase for a non-symmetrical device) with a low drain voltage domain where the device is periodically Off and On as a function of VG. The period of Coulomb Blockade Oscillations, CBO, is given by e/Cg. Between two successive oscillations, the only difference is that the average number of electrons in the island is incremented or decremented by one. At a peak of current, two dominant states of charge have equal probability and, on average, there is a half-integer number of electrons in the island.
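As a quick worked example of the CBO period e/Cg, taking the gate capacitance Cg=0.2aF used for the simulations of Fig. 3-2 (the rounding of e is ours):

```latex
\Delta V_G = \frac{e}{C_g}
           = \frac{1.602 \times 10^{-19}\,\mathrm{C}}{0.2 \times 10^{-18}\,\mathrm{F}}
           \approx 0.80\,\mathrm{V}
```

A gate sweep of 2V would therefore cover roughly 2.5 oscillation periods for this device.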

Figure 3-2. Typical ID(VG) and ID(VD) characteristics of a SET (ID in A versus Vg and Vd in V; curve labels VD=20mV, 100mV, 200mV and 0.4V, and Vg=0.1V and 0.45V). They have been obtained by simulation with the following parameters: Cj=0.1aF, Cg=0.2aF, RT=1MΩ, T=300K

To observe the previous typical characteristics, there are two important conditions to meet. Firstly, the charging energy, which is the electrostatic energy increase due to the arrival of one electron in the island, should be large in comparison to the thermal energy kT:

$$E_C = \frac{e^2}{2 C_\Sigma} \gg kT \qquad (2)$$

where e is the electron charge (absolute value) and CΣ is the total capacitance of the island, CΣ=2Cj+Cg, where Cj is the junction capacitance and Cg is the gate to island capacitance. For room-temperature operation, CΣ should be less than 0.3 aF (Ec=10 kT, T=300K), which requires an island smaller than a few nm. The second condition is related to the confinement of the electron wave function in the island, which is essential to quantize the charge in this island. The resistance of the tunnel barriers should exceed the quantum resistance RK=h/e²≈25.8 kΩ. For the fabrication of SET’s, there are many different possibilities since any kind of conductive material can be used for the island, metallic as well as semiconductor and even molecular. However silicon is advantageous for CMOS compatibility and also for the stability of devices [4].
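The two conditions above are easy to check numerically. The following sketch evaluates condition (2) and the quantum resistance from the SI values of e, k and h; the function names are hypothetical and the factor of 10 between Ec and kT is the one quoted in the text.

```python
E_CHARGE = 1.602176634e-19   # elementary charge (C)
K_B = 1.380649e-23           # Boltzmann constant (J/K)
H_PLANCK = 6.62607015e-34    # Planck constant (J*s)


def charging_energy(c_sigma):
    """Charging energy Ec = e^2 / (2 * C_sigma), in joules."""
    return E_CHARGE ** 2 / (2.0 * c_sigma)


def max_total_capacitance(temperature, ratio=10.0):
    """Largest C_sigma (F) for which Ec >= ratio * kT at `temperature` (K)."""
    return E_CHARGE ** 2 / (2.0 * ratio * K_B * temperature)


def quantum_resistance():
    """R_K = h / e^2, in ohms."""
    return H_PLANCK / E_CHARGE ** 2


# Room-temperature limit for Ec = 10 kT.
c_max = max_total_capacitance(300.0)

# Device of Fig. 3-2: Cj = 0.1 aF, Cg = 0.2 aF, so C_sigma = 2*Cj + Cg.
c_sigma_fig = 2 * 0.1e-18 + 0.2e-18
ratio_fig = charging_energy(c_sigma_fig) / (K_B * 300.0)

if __name__ == "__main__":
    print(f"C_sigma max at 300 K: {c_max * 1e18:.2f} aF")
    print(f"Ec/kT for the Fig. 3-2 device: {ratio_fig:.1f}")
    print(f"R_K: {quantum_resistance() / 1e3:.1f} kOhm")
```

For the Fig. 3-2 parameters the ratio Ec/kT comes out somewhat below 10, which shows how tight the room-temperature budget on CΣ is.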

3.2.2 Designing a low VDD inverter

With regard to the power consumption of digital circuits, we consider in this part the case of a simple inverter, since this is a convenient reference to make comparisons with CMOS. The design of a SET inverter has been discussed by many authors [5,6,7]. They pointed out that, since there is only one kind of SET, the complementary action of the pull-up and pull-down devices is not as easy to obtain as in CMOS where two types of transistor exist. A first solution is to choose the supply voltage so that both of these devices are On or Off in a complementary way in the switching part of the transfer characteristic. An example of such a situation is shown in Fig. 3-3. The shaded area displays the Coulomb blockade domains of the pull-up and pull-down transistors at zero temperature. Based on that, the transfer characteristic has been schematically drawn. Contrary to CMOS, we can observe that the voltage swing is less than rail-to-rail and that the DC current is minimal at the transition point.

Figure 3-3. Theoretical Coulomb blockade domains, also known as Coulomb diamonds, (shaded areas) at 0K, for the pull-down and pull-up SET's of an inverter, plotted as Vout (V) versus Vin (V). At RT they are a little narrower. Cj=0.1aF, Cg=0.2aF, VDD=0.53V. The bold line is a drawing of the transfer characteristics.


Since a low VDD is advantageous for low power applications, we now discuss the possibility of minimizing it for this simple SET inverter, taking account of the design constraints and aiming at room-temperature operation:
• Cg + 2Cj < 0.3aF for RT operation (for Ec~10kT)
• Cg / Cj > 1 for voltage gain
• VDD = e / (Cg + Cj) for complementary action of transistors

As a result, a very low VDD and RT operation would be difficult to achieve simultaneously. In fact, with the previous equations and for a ratio of gate to junction capacitances of 2, the minimum VDD would be equal to 0.7V! However, for temperatures above 0K, the switching of the SET from Off to On state is not abrupt since there is an exponential variation of the current, equivalent to the subthreshold current of MOSFET’s. Consequently, the real Coulomb blockade diamonds are narrower than those shown in Fig. 3-3 and it is possible to reduce VDD. This is demonstrated in Fig. 3-4, where the DC voltage gain and the DC current at the transition point of an inverter have been plotted versus VDD. Note also that the constraint on CΣ has been a little relaxed. As thoroughly discussed by A. Korotkov [6], the acceptable VDD window is quite narrow. Too low a VDD value would be detrimental for the noise margin and for the speed, since the DC current at the transition point decreases exponentially with VDD. Conversely, a higher value would increase the power consumption.

Figure 3-4. DC voltage gain (solid line) and DC current Imin (dashed line) at the transition point of a SET inverter at room temperature, versus VDD (V). Cj=0.1aF, Cg=0.2aF, RT=1MΩ, T=300K; the value VDD=e/(Cj+Cg) is marked on the plot.

To go further in reducing VDD, a solution is to add control gates to each SET (Fig. 3-5). Based on this approach, NTT has demonstrated a quasi-CMOS operation inverter at a supply voltage as low as 20 mV [8], which is very advantageous for the power consumption, but in this case the temperature was only 27K. The bias of the control gates shifts the CBO, making it possible to select the optimal part of the ID(VG) characteristics of each SET for complementary action. In this way, the equivalent of two types of transistors can be obtained, as in CMOS. In addition, their equivalent threshold voltages can be tuned, balancing the influence of possible parasitic (background) charges in the neighbourhood of the SET:

$$\Delta V_g = -\frac{C_{gc}}{C_g} \, \Delta V_{gc} \qquad (3)$$

To get a symmetrical transfer characteristic, from the Coulomb diamonds, it can be easily demonstrated that the sum of the control gate voltages should be equal to VDD:

$$V_{gcss} + V_{gcdd} = V_{DD} \qquad (4)$$

As a result there is one more degree of freedom to design an inverter, in comparison to the case without control gates. That gives flexibility to fix the value of VDD. In fact, there is now one optimal supply voltage, leading to complementary states of pull-up and pull-down transistors, for each bias of the control gates. Taking equation (4) into account, it is given by:

$$V_{DDopt} = \frac{e - 2 C_{gc} V_{gcss}}{C_g + C_j} \qquad (5)$$

There is a significant reduction of VDDopt thanks to the control gates, but it is important to note that the constraint on the total capacitance (equation 2) should also take account of the contribution of Cgc: CΣ=2Cj+Cg+Cgc. A drawback of this approach is the requirement of extra lines to distribute the control gate voltages. However, this can be avoided in the particular case where Vgcss=VDD and Vgcdd=0V (VSS=0V). For this condition, the optimum value of VDD is given by:

$$V_{DDopt} = \frac{e}{C_g + C_j + 2 C_{gc}} \qquad (6)$$
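Expressions (3), (5) and (6) can be evaluated together to see how they relate. In the sketch below the function names are hypothetical and the capacitance values (Cj=0.05aF, Cg=0.1aF, Cgc=0.1aF) are assumed purely for illustration; the check at the end verifies that setting Vgcss=VDD in (5) reproduces (6).

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)


def threshold_shift(c_g, c_gc, delta_v_gc):
    """Equation (3): equivalent gate-voltage shift for a control-gate step."""
    return -(c_gc / c_g) * delta_v_gc


def vdd_opt(c_g, c_j, c_gc, v_gcss):
    """Equation (5): optimal supply for a given control-gate bias Vgcss."""
    return (E_CHARGE - 2.0 * c_gc * v_gcss) / (c_g + c_j)


def vdd_opt_tied(c_g, c_j, c_gc):
    """Equation (6): optimal supply when Vgcss = VDD and Vgcdd = 0 V."""
    return E_CHARGE / (c_g + c_j + 2.0 * c_gc)


def total_capacitance(c_g, c_j, c_gc):
    """Island capacitance including the control gate: 2*Cj + Cg + Cgc."""
    return 2.0 * c_j + c_g + c_gc


# Assumed illustrative values (not taken from the text).
C_J, C_G, C_GC = 0.05e-18, 0.1e-18, 0.1e-18
v_tied = vdd_opt_tied(C_G, C_J, C_GC)

if __name__ == "__main__":
    print(f"VDDopt without control gate: {vdd_opt(C_G, C_J, 0.0, 0.0):.2f} V")
    print(f"VDDopt with tied control gates: {v_tied:.2f} V")
    print(f"C_sigma: {total_capacitance(C_G, C_J, C_GC) * 1e18:.2f} aF")
```

The tied configuration lowers the optimal supply without extra bias lines, at the cost of the additional Cgc term in the CΣ budget of condition (2).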

[Figure 3-5 plot labels: DC voltage gain and propagation delay τp (s); Master Equation simulation with Cj=0.05aF, CG=0.1aF, Cload=0.5fF, RT=1MΩ, T=300K, VDD=0.3V; inset: inverter schematic with Vin, Vout, CL and control gates.]

As discussed previously for the case without control gates, at RT the Coulomb blockade area of a SET is narrower than at 0K, which makes a reduction of VDD, or a change of bias of the control gates for a given VDD, possible. However, it also results in a change of the SET current, which may affect the speed of circuits. Consequently, there is a design trade-off. To illustrate it, in Fig. 3-5 we have plotted the variations of the DC voltage gain and of the propagation delay along a chain of SET inverters versus the DC current at the transition point of the transfer characteristics. The shaded area shows the most advantageous design window. In this example, the load capacitance is equal to 0.5fF, but for another value the design window would be the same, since the propagation delay directly scales with this capacitance. This differs from CMOS, where the dominant load capacitance of dense logic is due to the gate capacitance of MOSFETs. Here, the gate capacitance of SETs is extremely small and the dominant load capacitance comes from the local interconnects. In fact the latter should be much larger than e/(2VDD) to avoid any detrimental effects of the shot noise. Regarding the dynamic power consumption of SET logic, as long as a CMOS output buffer is not implemented, the major contribution would also come from the load capacitance due to the interconnects.
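A few numbers help to put the role of the load capacitance in perspective. The sketch below uses the 0.5fF load and VDD=0.3V of the example; the function names, and the activity factor, gate count and clock frequency in the usage lines, are hypothetical values chosen only to exercise the formulas (the power expression is expression (1) with the load capacitance in place of Cgate + Cinter).

```python
E_CHARGE = 1.602176634e-19   # elementary charge (C)


def shot_noise_capacitance(vdd):
    """Capacitance scale e / (2 * VDD) that the load should greatly exceed."""
    return E_CHARGE / (2.0 * vdd)


def electrons_per_transition(c_load, vdd):
    """Number of electrons moved when c_load is charged to vdd."""
    return c_load * vdd / E_CHARGE


def dynamic_power(activity, n_gates, c_load, vdd, f_clk):
    """Dynamic power a * N * C * VDD^2 * f, with C the load per gate."""
    return activity * n_gates * c_load * vdd ** 2 * f_clk


C_LOAD, VDD = 0.5e-15, 0.3

if __name__ == "__main__":
    limit = shot_noise_capacitance(VDD)
    print(f"e/(2 VDD): {limit * 1e18:.2f} aF")
    print(f"load / limit: {C_LOAD / limit:.0f}")
    print(f"electrons per transition: {electrons_per_transition(C_LOAD, VDD):.0f}")
    # Hypothetical circuit: 10% activity, one million gates, 100 MHz clock.
    print(f"dynamic power: {dynamic_power(0.1, 1e6, C_LOAD, VDD, 1e8) * 1e3:.2f} mW")
```

Even though the island itself switches with about one electron, charging a 0.5fF interconnect to 0.3V still moves on the order of a thousand electrons, which is why the interconnect load dominates the dynamic power of SET logic.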


Figure 3-5. Variations of the DC voltage gain (solid line) and of the propagation delay along a chain of SET inverters (dashed line) versus the DC current at the transition point of the transfer characteristics. VDD=0.3V. The control gate voltages are varied as follows: 0.7V