English Pages 188 [184] Year 2010
Robust Computing with Nano-scale Devices
Lecture Notes in Electrical Engineering, Volume 58
Robust Computing with Nano-scale Devices, Chao Huang, ISBN 978-90-481-8539-9, 2010
Intelligent Automation and Computer Engineering, Sio-Iong Ao, Oscar Castillo, and Xu Huang, ISBN 978-90-481-3516-5, 2010
Advances in Communication Systems and Electrical Engineering, Xu Huang, Yuh-Shyan Chen, and Sio-Iong Ao, ISBN 978-0-387-74937-2, 2008
Time-Domain Beamforming and Blind Source Separation, Julien Bourgeois and Wolfgang Minker, ISBN 978-0-387-68835-0, 2007
Digital Noise Monitoring of Defect Origin, Telman Aliev, ISBN 978-0-387-71753-1, 2007
Multi-Carrier Spread Spectrum 2007, Simon Plass, Armin Dammann, Stefan Kaiser, and K. Fazel, ISBN 978-1-4020-6128-8, 2007
Chao Huang Editor
Robust Computing with Nano-scale Devices Progresses and Challenges
Editor Dr. Chao Huang Virginia Tech Bradley Dept. Electrical & Computer Engineering 302 Whittemore Hall Blacksburg VA 24061 USA [email protected]
ISSN 1876-1100, e-ISSN 1876-1119
ISBN 978-90-481-8539-9, e-ISBN 978-90-481-8540-5
DOI 10.1007/978-90-481-8540-5
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2010922792
© Springer Science+Business Media B.V. 2010
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Cover design: eStudio Calamar S.L.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Contents

Introduction
Chao Huang

Fault Tolerant Nanocomputing
Bharat Joshi, Dhiraj K. Pradhan, and Saraju P. Mohanty
  Introduction
  Principles of Fault Tolerant Nanocomputer Systems
  Hardware Redundancy
    N-Modular Redundancy
    NAND Multiplexing
    Reconfiguration
  Information Redundancy
  Time Redundancy
  Coverage
  Process Variations in Nanoscale Circuits
  Fault Tolerant Nanocomputer Applications
    General-Purpose Computing
    Long-Life Applications
    Critical-Computation Applications
    High-Availability Applications
    Maintenance Postponement Applications
  Trends and Future
  References

Transistor-Level Based Defect-Tolerance for Reliable Nanoelectronics
Aiman H. El-Maleh, Bashir M. Al-Hashimi, Aissa Melouki, and Ahmad Al-Yamani
  Introduction
  Previous Approaches
    Defect-Tolerant Techniques
    Defect Avoidance Techniques
    Defect-Tolerant FPGA Design Techniques
    Reliability Analysis
  Proposed Defect Tolerance Technique
  Experimental Results
    Stuck-Open and Stuck-Short Defect Analysis
    Bridging Defect Analysis
  Conclusion
  References

Fault-Tolerant Design for Nanowire-Based Programmable Logic Arrays
Yexin Zheng and Chao Huang
  Introduction
  Background
    Nanowire-Based PLA
    Defect Model
  Redundancy-Based Defect-Tolerant Techniques
    Conventional Redundancy-Based Techniques
    Logic Duplication in PLA
  Defect-Aware Logic Mapping for Nanowire-Based PLA
    Defect-Free Subset Identification
    Defect-Aware Mapping Through Bipartite Graph Embedding
    SAT-Based Defect-Aware Mapping
  Experimental Results
  Conclusions and Perspectives
  References

Built-In Self-Test and Defect Tolerance for Molecular Electronics-Based NanoFabrics
Mohammad Tehranipoor
  Introduction
    NanoFabric Architecture
    Fault Model
    Summary of the Proposed Method
  Related Prior Work
  BIST Procedure
    Test Architectures (TAs)
    Test Procedure
  Test Configurations
    Test Configurations for Detecting Faults in NanoBlocks
    Test Configurations for Detecting Faults in SwitchBlocks
    Number of Test Configurations
    Time Analysis for BIST
  Recovery Analysis
  Simulation and Results
  Discussion
  Conclusion
  References

The Prospect and Challenges of CNFET Based Circuits: A Physical Insight
Bipul C. Paul, Shinobu Fujita, Masaki Okajima, and Thomas Lee
  Introduction
  Fundamentals of CNFET
  Compact Model of CNFET
  Impact of Parasitics on Circuit Performance
    Geometry Dependent Parasitic Capacitance
    Analysis of Fringe Capacitance
  Impact of Process Variation
    A Physical Insight to CNFET Characteristics Under Process Variation
    CNFET Performance Under Process Variation
  Summary
  Appendix I: Polynomial Expression of QCNT
  References

Computing with Nanowires: A Self Assembled Neuromorphic Architecture
Supriyo Bandyopadhyay, Koray Karahaliloglu, and Sridhar Patibandla
  Introduction
  Self Assembly of Neuromorphic Networks
  Electrical Characterization of the Nanowires
    What Causes the Negative Differential Resistance?
    Other Electrical Parameters
  Simulation Results
  Conclusion
  References

Computational Opportunities and CAD for Nanotechnologies
Michael Nicolaidis and Eleftherios Kolonis
  Introduction
  A Holistic CAD Platform for Nanotechnologies
  High-Level Modeling and Simulation Tool (HLMS)
  Artificial Ecosystems
    Spontaneous Creation of Artificial Organisms
    Optimization of Existing Functions
    Acquisition of New Functions
  Systems of Particles and Emergence of Relativistic Space-Time
  Nano-Network Architecture Fit Tool (NAF)
  Circuit Synthesis Tool (CS)
  Architectures for Complex Systems Implementation
    Architectures for Systems Comporting Mobile Modules
    Generic Communication Architectures for Systems Comprising Mobile Modules
  Nonconventional Architectures
    Computational Model of Quantum Systems
    Implementing Complex Systems Composed of Quantum Objects
    Quantum Computing
    CAD for Non-conventional Architectures
  Conclusions
  References

Index
Introduction Chao Huang
Although complementary metal-oxide-semiconductor (CMOS) technology will continue to dominate digital electronic circuits for the next 10–15 years, a number of new challenges have already emerged as transistor sizes scale down. The rising costs of semiconductor masks and fabrication pose economic barriers to lithography. Quantum effects and increasing leakage power are beginning to set physical limits on the continued shrinking of CMOS transistors. Hence, the development of alternative technologies has become critical. Recent research advances in innovative nano-scale devices, which include nanowires, carbon nanotube transistors, programmable molecular switches, resonant tunnelling diodes, and quantum dots, have created great opportunities to surpass the physical barriers faced by CMOS technology. Although research on these nano-electronic devices is still in its infancy, they have demonstrated promising potential for high circuit density, low power consumption, and cost-efficient computing.

By integrating nano-scale components such as wires and switches into conventional CMOS circuits, novel nano-electronic or hybrid nano/CMOS architectures and systems, which exploit the advantages of the underlying nanotechnologies, have opened up a wide range of possibilities for future nano-computing systems. The popular choices are all-regular structures, such as programmable arrays, since they are relatively easy to build with bottom-up construction using available nano-scale and molecular devices. These regular structures can be configured to implement specific circuit systems.

The success of many emerging nanotechnologies relies on the self-assembly process used to fabricate circuits. Unfortunately, stochastic self-assembly fabrication has low reliability, with defect densities predicted to be several orders of magnitude higher than those of conventional CMOS technology.
In a recently fabricated 8 × 8 crossbar structure by Hewlett-Packard, a defect rate of 15% of the molecular switches at the crosspoints was reported. With such high intrinsic defect rates, defect-tolerance methods at different design levels have become key issues in future nano-electronic system design. Generally speaking, most defect-tolerance techniques for reliable nano-electronic system design are based on either masking faults with circuit redundancy or post-fabrication configuration to bypass defects. For permanent defects, detecting and configuring around them is arguably the most economical approach. However, the difficulties lie in how to efficiently detect the defects among billions of on-chip devices and how to effectively handle these defects afterwards. Transient faults present another challenging problem due to the small feature size of nano-electronics. Redundancy-based methods are good solutions to this challenge.

C. Huang, Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia, 24061, e-mail: [email protected]
C. Huang (ed.), Robust Computing with Nano-scale Devices: Progresses and Challenges, Lecture Notes in Electrical Engineering 58, DOI 10.1007/978-90-481-8540-5_1, © Springer Science+Business Media B.V. 2010

This book focuses on various issues of robust nano-computing, in other words, defect-tolerant design for nanotechnology. It is organized in two parts. The first part (second to fifth chapters) addresses both redundancy-based and configuration-based methods as well as fault detection techniques. The second part (sixth to eighth chapters) deals with device modeling and CAD tools, and also provides directions for future research.

The second chapter provides a brief overview of various techniques for handling faults in nano-electronics. It first introduces the design principles for good fault-tolerant systems. To quantitatively analyze fault-tolerance efficiency, reliability functions are formulated and used to provide theoretical comparisons. The chapter summarizes the existing methods in three categories: hardware redundancy, information redundancy, and time redundancy. Specifically, the techniques utilizing hardware redundancy, such as N-modular redundancy, NAND multiplexing, and reconfiguration, are discussed in detail. As error recovery capability is essential to both real-time and run-time fault detection and reconfiguration, this chapter also discusses the different strategies based on Markov models. Process variation is one of the major factors that affect leakage dissipation, power consumption, yield, defects, and hence the reliability of nano-system design.
To illustrate these impacts, an SRAM circuit is considered as an example, examining faithful operation in both the hold and read states. Because design requirements differ, the fault-tolerant design philosophy varies among applications. This chapter further addresses different strategies for general-purpose computing, long-life applications, critical-computation applications, high-availability applications, and maintenance postponement applications.

The third chapter investigates a specific redundancy-based technique at the transistor level for defect tolerance. It replaces each transistor with a quadded-transistor structure that implements redundant logic functions. The new design scheme is proved immune to all single permanent defects, including stuck-open, stuck-close, and bridging defects, and to a large number of multiple defects. The quadded-transistor structure is further generalized to an N²-transistor structure that guarantees tolerance of all defects with multiplicity up to N − 1. Theoretical analysis and simulation demonstrate that this transistor-level redundancy-based defect tolerance technique is more effective than the gate-level and unit-level approaches, in terms of both reliability and area overhead. It can also tolerate interconnect defects by adding parallel interconnect lines. The high-reliability quadded-transistor structure can serve as a critical component, such as the voter gate in the triple-modular redundancy method. This transistor-level defect-tolerant design initiates a new direction of hybrid techniques that integrate design at both the transistor level and higher levels.
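The defect-tolerance claim for the quadded-transistor and N²-transistor structures can be checked by brute force on a simplified switch-level model. The sketch below is an illustration under stated assumptions, not the chapter authors' analysis: each transistor is modeled as an ideal switch, the N² structure is assumed to be N series-connected groups of N parallel transistors, and only stuck-open and stuck-short defects are modeled (bridging defects are left out). All function and constant names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical defect labels for an idealized switch-level model.
OK, STUCK_OPEN, STUCK_SHORT = "ok", "stuck_open", "stuck_short"


def conducts(gate, state):
    """Return True if a single switch conducts for the given gate value."""
    if state == STUCK_OPEN:
        return False  # a stuck-open switch never conducts
    if state == STUCK_SHORT:
        return True   # a stuck-short switch always conducts
    return gate       # a healthy switch follows its gate input


def structure_conducts(gate, states, n):
    """Assumed N^2 structure: n groups in series, n parallel switches per group.

    The structure conducts only if every series group has at least one
    conducting switch.
    """
    return all(
        any(conducts(gate, states[g * n + i]) for i in range(n))
        for g in range(n)
    )


def tolerates(n, max_defects):
    """Exhaustively check every defect pattern with at most max_defects defects."""
    total = n * n
    for k in range(max_defects + 1):
        for positions in combinations(range(total), k):
            for kinds in product((STUCK_OPEN, STUCK_SHORT), repeat=k):
                states = [OK] * total
                for pos, kind in zip(positions, kinds):
                    states[pos] = kind
                for gate in (False, True):
                    # A tolerant structure behaves like one healthy switch.
                    if structure_conducts(gate, states, n) != gate:
                        return False
    return True


if __name__ == "__main__":
    print("quadded (N=2), single defects:", tolerates(2, 1))
    print("N=3, up to two defects:", tolerates(3, 2))
    print("quadded (N=2), two defects:", tolerates(2, 2))
```

In this model the structure fails only when all N switches of one group are stuck open, or when at least one switch in each of the N groups is stuck short. Either case needs N defects, which is consistent with the stated tolerance of multiplicity up to N − 1.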
The fourth chapter focuses on the reliability problem of a popular nano-electronic structure, the nanowire-based programmable logic array (PLA). The nanowire-based PLA structure is a crossbar containing two layers of parallel nanowires perpendicular to each other, with molecular switches at the crosspoints. The input/output signals are connected to the nanowires, while the switches are configured as on or off to implement arbitrary logic functions. However, this structure is vulnerable to various defects of the nano-devices, such as stuck-open and stuck-close defects at the crosspoints or broken nanowires. Several defect-tolerance techniques, such as logic duplication, defect-free subset identification, and defect-aware mapping, are discussed in detail. This chapter also introduces a defect-aware mapping approach using Boolean satisfiability (SAT). It formulates the PLA logic mapping problem as conjunctive normal form (CNF) formulae, with defects as additional constraints. The resulting CNF is solved by SAT solvers; any solution to the formulae is a feasible mapping. With the help of state-of-the-art SAT solvers, this configuration scheme is able to handle large logic functions efficiently at defect rates up to 20%.

The configuration-based defect-tolerance approaches depend heavily on accurate information about existing defects, including defect types and locations. Usually, before configuration, a test and diagnosis process is conducted on the defect-prone devices to locate defects. After the testing procedure, information about the detected faults is stored in a database called a defect map. The faulty function units can then be bypassed during configuration according to the defect map. In other words, efficient test and diagnosis methods for locating defects are essential prerequisites for defect-aware reconfiguration. The fifth chapter presents a built-in self-test (BIST) procedure for a configurable nano-electronic structure, nanoFabrics.
The nanoFabrics architecture consists of an array of clusters, where each cluster is an array of nanoBlocks and switchBlocks. The nanoBlock is a molecular logic array that serves as a function unit, while the switchBlock can be configured to provide interconnection among nanoBlocks. The proposed BIST method configures each nanoBlock as either a test pattern generator (PG) or a response generator (RG). A test group contains a PG, an RG, and a switchBlock between them. The PG tests itself and provides the test patterns for the RG in the same test group, while the RG receives the patterns, tests itself, and generates responses. The architecture is decomposed into small test groups, and the testing process is conducted in parallel to improve efficiency. By evaluating the responses, the test groups are identified as either fault-free or faulty. A recovery increase procedure is then conducted to identify the fault-free blocks (PG, RG, or switchBlock) in each faulty test group.

Due to the increasing hardware density and design complexity of nano-electronic circuits, abstraction-based design methodologies can enhance design simulation and verification at different design levels. For these methods to be effective and practical in the context of emerging nano-electronic circuit systems, accurate models of devices, circuits, and architectures should be established to ensure correct functionality throughout the entire abstraction hierarchy. Although molecular devices may behave like conventional devices, the ways in which they achieve logic functionality can be significantly different, thus requiring new modeling methods. Moreover, high-quality models can provide reliable cost metrics, such as performance, power, and area, to assist decision making during design space search. Hence, they are of great importance to the success of future computer-aided design (CAD) tools.

Another concern for nano-electronics is the development of CAD tools that accommodate the unique properties of new nanotechnologies. As the number of devices and defects increases significantly in nano-electronics, CAD tools that can leverage computing power to achieve high-quality system design take center stage. Although many parts of the CAD flows for current CMOS technology can be directly applied to nanotechnologies, there are unprecedented design concerns and challenges, especially those related to high defect rates. Taking defects into account, each chip is unique. The question then becomes how a CAD tool can uniformly handle millions of chips that each have their own fault patterns.

The sixth chapter focuses on the modeling, analysis, and simulation of carbon nanotube FETs (CNFETs), an alternative to the silicon MOSFET, from the circuit and system design perspectives. It first introduces the basics of CNFETs and then discusses a circuit-friendly model for analysis, specifically the model of intrinsic and parasitic capacitance. The impact of capacitance on circuit performance is examined through a quantitative analysis and comparison with the MOSFET counterpart. The relation between device parameters and circuit performance suggests a direction for CNFET-based circuit design and optimization. This chapter also addresses CNFET performance under process variation. Through analysis and simulation of how various CNFET characteristics, such as drive current and capacitances, change under process variation, the CNFET is shown to be less sensitive to process variation than the MOSFET. Therefore, the CNFET is a natural fit for future fault-tolerant design.
The seventh chapter is dedicated to the electrical characterization of another promising nano-device, the nanowire. This chapter describes the self-assembly process of a nanowire network. Because of the bottom-up self-assembly fabrication process, nanowires exhibit a special non-linearity in the current–voltage (I–V) curve, which is called negative differential resistance (NDR). The NDR behavior can be modeled using measured I–V characteristics. By exploiting the NDR feature, an array of nanowires exhibits powerful computation capabilities, especially for signal processing applications. The simulation results demonstrate two interesting application examples in image processing: smoothing and edge-enhancement detection.

The eighth chapter gives an outlook on novel design paradigms for nanotechnology. The new design philosophy emphasizes mapping complex systems directly into nano-scale circuits, without building computational units such as processors. The corresponding holistic CAD flow is proposed, consisting of high-level modeling and simulation, nano-network architecture fit, and circuit synthesis. These tool sets help bridge the gaps between the high-level description, the nano-network architecture, and the final circuit implementation. Two complex systems, artificial ecosystems and artificial universes, are used as examples to illustrate the proposed CAD flow. Finally, the quantum computing system is discussed, including the computational model, architecture, and potential computing capability.

In summary, this book provides novel research initiatives to address two major problems confronting future nano-electronic system design: the reliability issues and the development of accurate computation models and tools. Many challenging questions remain open in this emerging yet promising area. We hope that this book presents an insightful overview of ongoing research in nano-electronic devices, circuits, architectures, and design methods, and serves as helpful material for researchers who are interested in nano-computing.
Fault Tolerant Nanocomputing Bharat Joshi, Dhiraj K. Pradhan, and Saraju P. Mohanty
Abstract This chapter provides an overview of fault tolerant nanocomputing. In general, fault tolerant computing can be defined as the process by which a computing system continues to perform its specified tasks correctly in the presence of faults, with the goal of improving the dependability of the system. Principles of fault tolerant nanocomputing as well as applications of fault tolerant nanocomputers are discussed. The chapter concludes by observing trends in fault tolerant nanocomputing.

Keywords Availability · Coverage · Dependability · Fault tolerance · Redundancy · Reliability
Introduction
Fault tolerant computing can be defined as the process by which a computing system continues to perform its specified tasks correctly in the presence of faults. These faults can be transient, permanent, or intermittent. A fault is said to be transient if it occurs for a very short duration of time. Permanent faults continue to exist in the system, whereas intermittent faults appear repeatedly for a period of time. Faults can be hardware or software faults caused by errors in specification, design, or implementation, or by manufacturing defects. They can also be caused by external disturbances or simply by ageing of the components in a system.
B. Joshi, University of North Carolina-Charlotte, USA. D.K. Pradhan, University of Bristol, UK. S.P. Mohanty, University of North Texas, USA.
C. Huang (ed.), Robust Computing with Nano-scale Devices: Progresses and Challenges, Lecture Notes in Electrical Engineering 58, DOI 10.1007/978-90-481-8540-5_2, © Springer Science+Business Media B.V. 2010
The goal of fault tolerant computing is to improve the dependability of a system, where dependability can be defined as the ability of a system to deliver service at an acceptable level of confidence in either the presence or absence of faults. Among all the attributes of dependability, reliability, availability, fault coverage, and safety are commonly used to measure the dependability of a system. Reliability of a system is defined as the probability that the system performs its tasks correctly throughout a given time interval [18, 49, 52]. Availability of a system is the probability that the system is available to perform its tasks correctly at time t. Fault coverage, in general, implies the ability of the system to recover from faults and continue to operate correctly given that it still contains a sufficient complement of functional hardware and software. Finally, safety of a system is the probability that the system will either operate correctly or switch to a safe mode if it operates incorrectly in the event of a fault. The concept of improving the dependability of computing systems by incorporating strategies to tolerate faults is not new. Early computers, such as the Bell Relay Computer built in 1944, used ad-hoc techniques to improve reliability. The first systematic approach to fault tolerant design for improving system reliability was published by von Neumann in 1956 [58]. Early designers improved computer system reliability by using fault tolerance techniques to compensate for unreliable components, including vacuum tubes and electromechanical relays, that had a high propensity to fail. With the advent of more reliable transistor technology, the focus shifted from improving reliability to improving performance [18, 38, 50]. While component reliability has drastically improved over the past 40 years, increases in device densities have led to the realization of complex computer systems that are more prone to failures.
Researchers in industry and academia have proposed techniques to reduce the number of faults. However, with the advent of nanotechnology, where the reliability of nanoelectronic devices is not fully assured, it has been recognized that a shift in design paradigm to incorporate strategies to tolerate faults is much more important and necessary in order to improve dependability. The invention of nanometer-scale devices suggests that extremely large scale integration of these devices, on the order of $10^{12}$ devices/cm$^2$, will potentially be feasible in the future. A chip with 2.9 billion 32 nm nanoscale CMOS (complementary metal oxide semiconductor) devices has already been demonstrated [17]. However, to fully realize the benefits offered by such systems it is prudent to address the uncertainties introduced by the inherent characteristics of the devices, the stochastic nature of the fabrication processes, and external disturbances. These uncertainties will lead to faulty devices and interconnects. The high level of unreliability common to nanoscale devices stems from two different origins. The probabilistic nature of the self-assembly fabrication process will inevitably result in a fairly large percentage of defective devices. For example, while the manufacturing failure rate of a conventional CMOS device is approximately $10^{-7}$ to $10^{-6}$, the failure rate of a typical nanoscale device is just under unity. Moreover, these devices are more sensitive to external disturbances, such as radiation effects, electromagnetic influence, parameter fluctuations, and high temperatures. In addition,
process variations introduced due to the uncertainty and complexity of the nanoscale device fabrication process can affect leakage dissipation, power consumption, reliability, and defects [30, 32, 33]. Hence permanent and transient [28] faults or defects will occur during the manufacturing phase and during field operation due to such factors as aging, while transient faults will appear in the field due to external disturbances. Thus the successful realization of effective systems consisting of a trillion unreliable devices per square centimeter has created interesting opportunities and unique challenges in incorporating fault tolerance techniques in nano-computing. In current VLSI processes the defect (fault) density is in the range of 1 part per billion, and thus the manufacturer can afford to discard a defective chip. Some manufacturers also add redundancy that can replace a defective (faulty) resource, with the goal of increasing the yield. While the exact levels of fault densities in nanoelectronic systems are not known, researchers have assumed that up to 15% of the resources on a chip will be faulty at the end of the manufacturing cycle. Hence, discarding such a large percentage would overburden the cost of the healthy chips that are eventually sold to customers. With this high level of fault density, none of the techniques used in current VLSI processes is applicable. Thus, reliability issues need to be addressed at each level of circuit abstraction, from system level to device level. Handling reliability issues may be preferred at higher levels of design abstraction, such as the system or architectural level, for detection and prevention of faults at an early stage of the design cycle. Current fault tolerance research for nanoelectronic systems can be broadly divided into three classes, namely: (1) masking techniques, (2) reconfiguration techniques, and (3) designs that are inherently fault tolerant.
Principles of Fault Tolerant Nanocomputer Systems
The goal of fault tolerant nano-computer design is to improve the dependability of the system by improving the reliability during the manufacturing phase as well as field operation. Any fault tolerance technique requires the use of some form of redundancy, which increases both the cost and the development time. Furthermore, redundancy can also have an impact on the performance, power dissipation, weight, and size of the system. Thus a good fault tolerant design is a trade-off between the level of dependability provided and the amount of redundancy used. For nanocomputers the redundancy could take various forms, including: (1) hardware, (2) information, and (3) temporal. One of the most challenging tasks in the design of fault tolerant systems is the evaluation of the design against the dependability requirements [18,21,39,42,48,50,59]. All the evaluation methods can be divided into two main groups, namely qualitative and quantitative methods. Qualitative techniques, which are typically subjective, are used when certain parameters used in the design process cannot be quantified. As the name implies, in quantitative methods numbers are derived for a certain dependability attribute of the system, and different systems
can be compared with respect to this attribute by comparing the numerical values. However, this requires development of appropriate models. At present, several probabilistic models have been developed based on combinatorial techniques and Markov and semi-Markov stochastic processes. Fundamental to all these models are the probability of failure, $p_m$, of a nanoscale device during manufacturing and the failure rate, defined as the expected number of failures per time interval during field operation. Failure rates of most electronic devices follow the bathtub curve shown in Fig. 1. Usually the useful life phase is the most important phase in a system's life, and during this time the failure rate is assumed to be constant and is typically denoted by $\lambda$. During this phase the reliability of a system and the failure rate are related by the exponential failure law, expressed as:
$R(t) = e^{-\lambda t}$.
Figure 2 shows the reliability function with a constant failure rate. The failure rate is related to the mean time to failure (MTTF) as follows:
$\mathrm{MTTF} = 1/\lambda$,
where MTTF is defined as the expected time until the occurrence of the first failure of the system.
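The exponential failure law and its relation to the MTTF can be checked numerically. The following is a minimal Python sketch; the function names and the failure rate of 1e-4 failures per hour are illustrative assumptions, not values from the chapter.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Exponential failure law R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate * t)


def mttf(failure_rate: float) -> float:
    """Mean time to failure for a constant failure rate: MTTF = 1 / lambda."""
    return 1.0 / failure_rate


# Hypothetical failure rate (failures per hour), chosen only for illustration.
lam = 1e-4
print("MTTF (hours):", mttf(lam))
# At t = MTTF the reliability has dropped to exp(-1), roughly 0.368.
print("R(MTTF):", reliability(lam, mttf(lam)))
```

Note that under this law a system has only about a 37% chance of surviving until its own MTTF, which is one reason reliability over a mission time is often a more useful figure than the MTTF alone.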
Fig. 1 Bathtub curve
Fig. 2 Reliability function with constant failure rate
Researchers have developed computer-aided design (CAD) tools based on quantitative methods, such as CARE III (NASA's Langley Research Center), SHARPE (Duke University), DEPEND (University of Illinois at Urbana-Champaign), and HIMAP (Iowa State University), to evaluate dependability requirements. In the last several years there has been considerable interest in the experimental analysis of computer system dependability. In the design phase, CAD tools are used to evaluate the design by extensive simulations that include simulated fault injection. During the prototyping phase, on the other hand, the system is allowed to run in a controlled environment, and physical fault injection is used for evaluation purposes. Several CAD tools are available, including FTAPE developed at the University of Illinois at Urbana-Champaign, NANOLAB developed at Virginia Polytechnic Institute and State University, and Ballista developed at Carnegie Mellon University. In the following sections several redundancy techniques for fault tolerance will be briefly described.
Hardware Redundancy
Hardware redundancy is one of the most common forms of redundancy, in which functionally identical modules are replicated so that a faulty module can be replaced by a fault-free module. Since each module acts as a fault-containment region, the size of the fault-containment region depends on the complexity of the module. These modules can take various forms, such as logic gates, logic blocks, functional units, or subsystems.
N-Modular Redundancy
The basic hardware redundancy techniques are known as passive or static techniques, in which faults are masked or hidden by preventing them from producing errors [34, 39]. Fault masking is accomplished by using voting mechanisms, typically majority voting, and the technique does not require any fault detection or system reconfiguration. The most common form of this type of redundancy is triple modular redundancy (TMR). As shown in Fig. 3, three identical modules are used and majority voting is performed to determine the output. Such a system can tolerate one fault. However, the voter is a single point of failure; that is, the failure of the voter leads to the complete failure of the system.
Fig. 3 Basic triple modular redundancy
A generalization of the TMR technique is N-modular redundancy (NMR), in which N modules are used, where N is typically an odd number. Such a system can mask up to $\lfloor (N-1)/2 \rfloor$ faulty modules. In general, an NMR system is a special case of an M-of-N system, which consists of N modules of which at least M must work correctly for the system to operate correctly. Thus, the reliability of an M-of-N system is given by
$R_{M\text{-of-}N}(t) = \sum_{i=M}^{N} \binom{N}{i} R^{i}(t)\,[1 - R(t)]^{N-i}$,
where $R(t)$ on the right-hand side is the reliability of a single module.
For a TMR system, N = 3 and M = 2, and if a non-ideal voter is assumed ($R_{\mathrm{voter}} < 1$), the reliability is given by:
$R_{\mathrm{TMR}}(t) = R_{\mathrm{voter}}(t) \sum_{i=2}^{3} \binom{3}{i} R^{i}(t)\,[1 - R(t)]^{3-i} = R_{\mathrm{voter}}(t)\,[3R^{2}(t) - 2R^{3}(t)]$.
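The M-of-N and TMR expressions can be evaluated directly. The sketch below is a minimal Python illustration; the function names are hypothetical, and module failures are assumed to be statistically independent, as the formulas require.

```python
from math import comb


def m_of_n_reliability(r: float, m: int, n: int) -> float:
    """Probability that at least m of n independent modules, each with reliability r, work."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(m, n + 1))


def nmr_reliability(r: float, n: int, r_voter: float = 1.0) -> float:
    """NMR with majority voting: an M-of-N system with M = floor(n/2) + 1, times the voter reliability."""
    majority = n // 2 + 1
    return r_voter * m_of_n_reliability(r, majority, n)


# Illustrative module reliabilities (assumed values).
for r in (0.9, 0.4):
    print(r, nmr_reliability(r, 3), nmr_reliability(r, 5), nmr_reliability(r, 7))
```

For a module reliability above 0.5 the redundant configurations beat the simplex system, while below 0.5 they are worse, which is consistent with the crossover behavior of the NMR reliability curves late in the system's life.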
Figure 4 shows reliability plots of a simplex, TMR, and NMR when N = 5 and 7. It can be seen that higher redundancy leads to higher reliability in the beginning. However, the system reliability falls sharply at the end for higher redundancies. An extension of the simple TMR system is cascaded TMR (CTMR). A simple CTMR scheme with simplex voters is shown in Fig. 5. The reliability of the scheme is expressed by the following:
$R_{\mathrm{CTMR}}(t) = \left[ R_{\mathrm{Voter}}(t) \left( R^{3}(t) + \binom{3}{2} R^{2}(t)\,(1 - R(t)) \right) \right]^{n}$
The NMR scheme for safety usually requires a trade-off between reliability and safety. For example, one can use a k-out-of-n system in which a voter produces the output y only if at least k modules agree on the output y. Otherwise, the voter asserts the unsafe flag. Note that reliability refers to the probability that the
Fig. 4 Reliability plots of NMR system with different values of N. Note that N = 1 implies a simplex system
Fig. 5 Cascaded TMR with simplex voters
system produces the correct output and safety refers to the probability that the system either produces the correct output or the error is detectable. Thus, high reliability implies high safety. However, the reverse is not true. Voting in an NMR system becomes complicated when non-classical faults are present. For systems requiring ultra-high reliability, the voting mechanism should be able to handle arbitrary failures, including malicious faults in which more than one faulty unit may collaborate to disable the system. The Byzantine [24] failure model was proposed to handle such faults. The reliability equations presented above can be used if the faulty devices are independent and uniformly distributed. For example, these equations are reasonable for the transient faults that are expected to dominate during the useful lifetime of a nanoelectronic chip. However, they are not applicable to manufacturing defects and permanent faults, since these tend to cluster on a chip rather than being statistically independent. Although it is premature to predict the processes that will be used for manufacturing nanocomputers, many researchers, such as in [15, 34], have attempted to provide insight into reliability issues based on current processes.
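The k-out-of-n voter with an unsafe flag described in this section can be sketched in a few lines of Python. This is only an illustration: the function name and return convention are assumptions, and the sketch restricts k to a strict majority so that at most one output value can reach the threshold.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def k_out_of_n_vote(outputs: Sequence[Hashable], k: int) -> Tuple[Optional[Hashable], bool]:
    """Return (value, unsafe). The value is produced only if at least k modules agree on it;
    otherwise value is None and the unsafe flag is asserted."""
    n = len(outputs)
    if not (n / 2 < k <= n):
        raise ValueError("this sketch assumes n/2 < k <= n so the agreed value is unique")
    value, count = Counter(outputs).most_common(1)[0]
    if count >= k:
        return value, False
    return None, True


print(k_out_of_n_vote([1, 1, 0], k=2))  # two modules agree: output produced
print(k_out_of_n_vote([1, 1, 0], k=3))  # unanimity required: unsafe flag asserted
```

Raising k makes it less likely that a wrong value is released (higher safety) but more likely that no output is produced at all (lower reliability), which is the trade-off noted in the text.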
NAND Multiplexing
In 1956 von Neumann [58] proposed a scheme, called the multiplexing technique, in which redundant unreliable components were used to create a reliable system. There has been renewed interest in this concept since researchers are attempting to develop reliable systems based on unreliable nanoelectronic devices. The problem of enabling reliable computation using unreliable components is typically termed probabilistic computation. NAND multiplexing refers to the construction shown in Fig. 6. It consists of N NAND gates and a randomization unit that randomly pairs each input from the first set of inputs with an input from the second set of inputs to form a pair of inputs to one of the NAND gates. In any system based on such a construction, each signal is carried on a bundle of N wires and each logic computation is performed by N gates. The state of a bundle (0 or 1) is decided by whether the fraction of excited wires is below or above a predetermined threshold value.
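A single NAND multiplexing unit as described above can be simulated with a short Monte Carlo sketch in Python. The fault model (each gate independently inverts its output with probability `gate_error`), the single threshold of 0.5, and all names are assumptions made for illustration; they are not taken from the chapter or from von Neumann's analysis.

```python
import random
from typing import List


def nand_multiplex(x_bundle: List[int], y_bundle: List[int],
                   gate_error: float, rng: random.Random) -> List[int]:
    """One NAND multiplexing unit: random pairing of the two input bundles, then N unreliable NAND gates."""
    permuted = list(y_bundle)
    rng.shuffle(permuted)  # randomization unit: pairs each x wire with a random y wire
    output = []
    for a, b in zip(x_bundle, permuted):
        z = 0 if (a and b) else 1       # ideal NAND
        if rng.random() < gate_error:   # assumed fault model: output inversion
            z ^= 1
        output.append(z)
    return output


def bundle_state(bundle: List[int], threshold: float = 0.5) -> int:
    """Interpret a bundle as logic 1 if the fraction of excited wires exceeds the threshold."""
    return 1 if sum(bundle) / len(bundle) > threshold else 0


rng = random.Random(0)
n_wires = 1000
ones, zeros = [1] * n_wires, [0] * n_wires
print(bundle_state(nand_multiplex(ones, ones, 0.05, rng)))    # NAND(1, 1) -> 0
print(bundle_state(nand_multiplex(zeros, zeros, 0.05, rng)))  # NAND(0, 0) -> 1
```

Even though roughly 5% of the individual gates produce a wrong value in this run, the bundle as a whole still carries the correct logic value, which is the essence of the multiplexing idea.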
Fig. 6 NAND multiplexing unit
Fig. 7 NAND multiplexing system
As shown in Fig. 7, the computation is carried out by three cascaded NAND multiplexers. The executive unit performs the NAND operation, while the restorative unit compensates for the degradation caused in the executive unit for certain types of input configurations. von Neumann concluded that, by increasing the bundle size, computation by such constructions can be performed such that the probability of system failure is less than the probability of each NAND gate failing. Recently several researchers have extensively analyzed and proposed designs based on NAND multiplexing, such as the ones in [3,4,15,41,44]. The use of NAND instead of other logic gates can be of further advantage from the leakage dissipation point of view. It has recently been observed (Fig. 8) that NAND has minimal average gate-oxide leakage and propagation delay compared to other logic gates with the same number of inputs [31–33]. Leakage is a major issue in nanoscale circuits and systems, in which gate-oxide leakage predominates for sub-65 nm CMOS technology [30, 31, 33].
Fig. 8 Gate-oxide leakage and delay of 45 nm CMOS logic gates (bar chart of average gate leakage current in nA/micron and propagation delay in ps for the NAND2, NOR2, AND2, and OR2 gates)
Reconfiguration
The main motivations for incorporating fault tolerance at the chip level are yield enhancement and enabling real-time and compile-time reconfiguration, with the goal of isolating a faulty unit, such as a logic unit, processor, or memory cell, during field operation. Strategies for manufacturing-time and compile-time reconfiguration are similar, as they do not affect the normal operation of the system and no real-time constraints are imposed on the reconfiguration time. Real-time reconfiguration schemes are difficult to design since the reconfiguration has to be performed without affecting the operation of the system. If erroneous results are not acceptable at all, then static or hybrid techniques are typically used; otherwise cheaper dynamic techniques suffice. The effectiveness of any reconfiguration scheme is measured by two aspects: (1) the probability that a redundant unit can replace a faulty unit and (2) the amount of reconfiguration overhead involved. Numerous schemes have been proposed in the literature for all three types of reconfiguration [2–9,16,20,22,38,45,46]. One of the first schemes proposed for an array of processing elements allowed each processing element to be converted into a connecting element for signal passing, so that a failed processor resulted in the other processors in the corresponding row and column being converted to connecting elements that ceased to perform any computation. This simple scheme is not
practical for multiple faults since an entire row and column must be disabled for each fault. This problem has been addressed by several reconfiguration schemes that either use spare rows and columns or disperse redundant processing elements throughout the chip, such as the interstitial redundancy scheme. Hewlett-Packard has recently demonstrated an 8 × 8 crossbar using molecular switches at the cross-points. It was observed that 15% of the switches were defective. Such a high defect rate has drawn the attention of several researchers to designing effective strategies to bypass the defective switches in order to perform reliable computation on the remaining fault-free structure. As shown in Fig. 9 [56], a 2-D crossbar system can be represented by a bipartite graph $B = (U, V, E)$, where U is the set of inputs, V is the set of outputs, and E is the set of switches of the crossbar system. When the system is defect-free, the corresponding bipartite graph is complete. Given a defective n × n crossbar, the problem is to find the largest k × k defect-free subset. This corresponds to finding a balanced complete bipartite subgraph, a problem that has been proven to be NP-complete. Thus researchers have attempted to design efficient heuristics to find such subsets [1, 56]. Compile-time and run-time reconfiguration requires built-in fault detection capability. Such strategies typically follow three basic steps: (1) fault detection and location, (2) error recovery, and (3) reconfiguration of the system. When a fault is detected and a diagnosis procedure locates the fault [19, 23, 25, 26, 36], the system is usually reconfigured [37,39,50] by activating a spare module; if a spare module is not available, the reconfigured system may have to operate in a degraded mode with respect to the resources available. Finally, error recovery is performed, in which the spare unit takes over the functionality of the faulty unit from where it left off.
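To make the defect-free sub-crossbar problem concrete, the following Python sketch implements a simple greedy heuristic: it repeatedly discards the row or column that currently contains the most defective switches until no defects remain, then trims the result to a square. This is a generic illustration and not the heuristic of [1] or [56]; the function name and the boolean defect-matrix representation are assumptions, and the result is not guaranteed to be the largest possible subset.

```python
from typing import List, Tuple


def greedy_defect_free_subcrossbar(defects: List[List[bool]]) -> Tuple[List[int], List[int]]:
    """defects[i][j] is True if the switch joining input i and output j is defective.
    Returns equally sized lists of inputs and outputs whose crosspoints are all defect-free."""
    n = len(defects)
    rows, cols = set(range(n)), set(range(n))
    while rows and cols:
        row_cnt = {i: sum(defects[i][j] for j in cols) for i in rows}
        col_cnt = {j: sum(defects[i][j] for i in rows) for j in cols}
        worst_row = max(row_cnt, key=row_cnt.get)
        worst_col = max(col_cnt, key=col_cnt.get)
        if row_cnt[worst_row] == 0 and col_cnt[worst_col] == 0:
            break  # the remaining sub-crossbar is defect-free
        # Drop the line with more defects; on a tie, shrink the larger side to stay balanced.
        if (row_cnt[worst_row], len(rows)) >= (col_cnt[worst_col], len(cols)):
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    k = min(len(rows), len(cols))
    return sorted(rows)[:k], sorted(cols)[:k]


# Hypothetical 4 x 4 defect map (True marks a defective switch).
defect_map = [
    [False, True,  False, False],
    [False, False, False, False],
    [True,  True,  False, False],
    [False, False, False, False],
]
print(greedy_defect_free_subcrossbar(defect_map))
```

In graph terms, removing a row or column deletes a vertex of the bipartite graph, and the loop stops once the remaining induced subgraph is complete.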
While various approaches have been proposed for these steps, some of them have been successfully implemented on real systems. For example, among the recovery strategies the most widely adopted is rollback recovery using checkpoints [13, 47, 60]. The straightforward recovery strategy is to terminate execution of the program and re-execute the complete program from the beginning on all the processors, known as the global restart. However, this leads to severe
Fig. 9 4 × 4 crossbar system and its corresponding bipartite graph: (a) 4 × 4 nanoscale crossbar switch with inputs i1 to i4 and outputs O1 to O4; (b) bipartite graph of the crossbar switch of (a) with partitions U and V
performance degradation and increased power requirements since several tasks may have to be repeated. Thus, the goal of all recovery strategies is to perform effective recovery from faults with minimum overhead. Rollback recovery using checkpoints involves storing an adequate amount of processor state information at discrete points during the execution of the program so that the program can be rolled back to these points and restarted from there in the event of a failure. The challenge is when and how these checkpoints are created. For interacting processes, inadequate check-pointing may lead to a domino effect in which the system is forced into a global restart. Over the years researchers have proposed strategies that effectively address this issue. Although rollback recovery schemes are effective in most systems, they all suffer from substantial inherent latency, which may not be acceptable for real-time systems, and from increased power requirements. For such systems forward recovery schemes are typically used [40, 57]. The basic concept common to all forward recovery schemes is that when a failure is detected the system discards the current erroneous state and determines the correct state without any loss of computation. These schemes are based on either hardware or software redundancy. The hardware-based schemes can be further classified as static or dynamic redundancy. Several schemes based on dynamic redundancy and check-pointing that avoid rollback have been proposed in the literature. Unlike rollback schemes, when a failure is detected the roll-forward check-pointing schemes (RFCS) attempt to determine the faulty module, and the correct state of the system is restored once the fault diagnosis is done. For example, in one RFCS the faulty processing module is identified by retrying the computation on a spare processing module. During this retry the duplex modules continue to operate normally.
Figure 10 shows the execution of two copies, A and B, of a task. Assume that B fails in the check-pointing interval i. The checkpoints of A and B will not match at the end of the check-pointing interval i at time $t_i$. This mismatch will trigger a concurrent retry of the checkpoint interval i on a spare module while A and B continue to execute into the check-pointing interval i + 1. The previous checkpoint taken at $t_{i-1}$ together with the executable code is loaded
Fig. 10 A RFCS scheme
into the spare, and the checkpoint interval i in which the fault occurred is retried on the spare module. When the spare module completes the execution at time $t_{i+1}$, the state of the spare module is compared with the states of A and B at time $t_i$, and B is identified as faulty. Then, the state of A is copied to B. A rollback is required if both A and B have failed. In this scheme one fault is tolerated without paying the penalty of the rollback scheme. Several versions of this scheme have been proposed in the literature.
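The diagnosis step of the roll-forward scheme can be sketched as follows in Python. The toy task, the way a fault corrupts the state, and all names are hypothetical. The sketch also assumes that the spare module is fault-free and, for simplicity, runs the retry sequentially, whereas in the scheme described above the retry proceeds concurrently with A and B.

```python
from typing import Optional, Tuple


def run_interval(state: int, faulty: bool = False) -> int:
    """Toy task for one checkpoint interval: increment the state; a fault corrupts the result."""
    new_state = state + 1
    return new_state ^ 0xFF if faulty else new_state


def rfcs_diagnose(prev_checkpoint: int, state_a: int, state_b: int) -> Tuple[Optional[str], int]:
    """Compare the checkpoints of A and B; on a mismatch, retry the interval on a spare.
    Returns (faulty module or None, state from which execution continues)."""
    if state_a == state_b:
        return None, state_a                    # checkpoints match: no fault detected
    spare_state = run_interval(prev_checkpoint)  # retry from the previous checkpoint on the spare
    if spare_state == state_a:
        return "B", state_a                      # B is faulty: copy the state of A to B
    if spare_state == state_b:
        return "A", state_b                      # A is faulty: copy the state of B to A
    return "both", prev_checkpoint               # no agreement: roll back to the previous checkpoint


checkpoint = 10
a = run_interval(checkpoint)
b = run_interval(checkpoint, faulty=True)
print(rfcs_diagnose(checkpoint, a, b))
```

Only the last branch pays the rollback penalty; a single faulty module is identified and repaired while the computation moves forward.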
Information Redundancy
Hardware redundancy is very effective; however, it is expensive. In the information redundancy approach, redundant information is added to enable fault detection and sometimes fault tolerance by correcting the affected information. For certain systems, such as memories and buses, error-correcting codes are cost effective and efficient. Information redundancy includes all error-detecting codes, such as various parity codes, m-of-n codes, duplication codes, checksums, cyclic codes, arithmetic codes, and Berger codes, as well as Hamming and Bose-Chaudhuri-Hocquenghem (BCH) error-correcting codes. Selection of a particular coding technique involves trade-offs among several factors. For example, encoding and decoding require time and hardware, while incorporation of extra bits requires extra storage and circuitry. An important issue is to select a code that meets the requirements for error detection and/or correction and at the same time keeps the cost within an acceptable level [17]. Important factors that have to be considered in the selection of a code include whether the code should be separable, whether the code should be able to detect errors, correct errors, or both, and the number of bit errors that need to be detected or corrected. Several researchers have addressed these issues for nanoelectronic digital memories [11, 53–55].
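As a concrete instance of an error-correcting code, the sketch below implements the standard Hamming (7,4) code in Python: three parity bits protect four data bits, and any single-bit error can be located and corrected. The choice of this particular code and the function names are illustrative assumptions; the chapter does not prescribe a specific code.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> Tuple[List[int], int]:
    """Correct up to one bit error. Returns (data bits, syndrome), where the syndrome is the
    1-based position of the corrected bit, or 0 if no error was detected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # inject a single-bit error at position 5
print(hamming74_decode(word))
```

The cost side of the trade-off is visible here: three extra bits of storage per four data bits, plus the XOR logic for encoding and syndrome computation.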
Time Redundancy
Both hardware and information redundancy methods require a substantial amount of extra hardware, which tends to increase the size, weight, cost, and power consumption of the system. Time redundancy attempts to reduce the hardware overhead by using extra time. The basic concept is to repeat the execution of a software module to detect faults. The computation is repeated more than once and the results are compared. A discrepancy suggests a fault, and the computations are repeated once more to determine whether the discrepancy still exists. Earlier this scheme was used for transient fault detection. However, with a bit of extra hardware, permanent faults can also be detected. In one such scheme, the computation or transmission of data is first done in the usual way. In the second round of computation or transmission the data is encoded and the results are decoded before comparing them to the previous
results. The encoding scheme should be such that it enables detection of the faults. Such schemes may include complementation and arithmetic shifts.
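The idea of repeating a computation on encoded data can be illustrated with shifted operands. In the Python sketch below, an adder with a hypothetical permanent fault (one output bit stuck at 0) is used twice: once on the original operands and once on operands shifted left by one bit, with the second result shifted back before comparison. The fault model and function names are assumptions made for illustration, and the sketch applies to non-negative integers.

```python
from typing import Optional, Tuple


def faulty_add(a: int, b: int, stuck_bit: Optional[int] = None) -> int:
    """Adder whose output bit `stuck_bit` is stuck at 0 (assumed permanent fault model)."""
    result = a + b
    if stuck_bit is not None:
        result &= ~(1 << stuck_bit)
    return result


def add_with_time_redundancy(a: int, b: int, stuck_bit: Optional[int] = None) -> Tuple[int, bool]:
    """Compute a + b twice on the same hardware: normally, then with operands encoded by a left shift.
    Returns (first result, fault_detected)."""
    first = faulty_add(a, b, stuck_bit)
    # Second pass: encode (shift left), compute, decode (shift right).
    second = faulty_add(a << 1, b << 1, stuck_bit) >> 1
    return first, first != second


print(add_with_time_redundancy(5, 6))               # fault-free adder
print(add_with_time_redundancy(5, 6, stuck_bit=1))  # permanent fault on output bit 1
```

Because the shift moves every bit of the result to a different physical position, the stuck bit corrupts a different bit of the sum in each pass, so the two results disagree. A plain repetition without encoding would reproduce the same wrong answer twice and miss the permanent fault.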
Coverage
The coverage probability varies depending on the circumstances. In systems in which the failure rate is dominated by the reliability of the hardware modules being protected through redundancy, the coverage probability may be of relatively minor importance, and treating it as a constant may be adequate. However, if the goal is to provide extremely high reliability, then it is easy to provide sufficient redundancy to guarantee that the probability of a system failure due to the exhaustion of all hardware modules is arbitrarily small. In this case, failures from imperfect coverage begin to dominate and more sophisticated models are needed to account for this fact. The fault coverage consists of three components: (1) the probability that the fault is detected, (2) the probability that it is isolated to the defective module, and (3) the probability that normal operation can be resumed successfully on the remaining healthy modules. These probabilities are often time dependent, particularly in cases in which uninterrupted operation is required or in which loss of data or program continuity cannot be tolerated. If too much time elapses before a fault is detected, for example, erroneous data may be released with potentially unacceptable consequences. Markov models can be extended to account for coverage effects by inserting additional states. Following a fault, three transitions occur: (1) the system moves to a state in which a fault has occurred but has not yet been detected, (2) from there to a state in which it has been detected but not yet isolated, and (3) from there to a state in which it has been isolated but normal operation has not yet resumed successfully, and finally to a successful recovery state. Transitions out of each of those states can be either to the next state in the chain or to a failed state, with the probability of each of these transitions most likely dependent on the time spent in the state in question.
State transition models in which the transition rates depend on the state dwelling time are called semi-Markov processes. These models can be analyzed using techniques similar to those used in Markov analysis, but the analysis is complicated by the fact that the transition rates that pertain to hardware or software faults and those that are associated with coverage faults can be vastly different. The mean time between hardware module failures, for example, is typically on the order of thousands of hours, whereas the time between the occurrence of a fault and its detection may well be on the order of microseconds. The resulting state-transition matrix is called "stiff", and an attempt to solve it numerically may not converge. For this reason, some of the aforementioned reliability models use a technique referred to as behavioral decomposition. Specifically, the model is broken down into its component parts, with one subset of models used to account for coverage failures and a second incorporating the outputs of those models into a model accounting for hardware and software faults.
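The influence of imperfect coverage can be illustrated with a very small Markov model. The Python sketch below considers a hypothetical duplex system with module failure rate `lam` and a constant coverage probability `c`: from the state with two good modules, a covered fault (rate 2·lam·c) leads to a one-module state, an uncovered fault (rate 2·lam·(1 − c)) leads directly to system failure, and the remaining module fails at rate `lam`. This is a plain Markov model with constant coverage, not the semi-Markov or behavioral-decomposition models discussed above, and all names and values are assumptions.

```python
import math


def duplex_reliability_closed_form(lam: float, c: float, t: float) -> float:
    """R(t) = P2(t) + P1(t) with P2 = exp(-2*lam*t) and P1 = 2c(exp(-lam*t) - exp(-2*lam*t))."""
    return math.exp(-2 * lam * t) + 2 * c * (math.exp(-lam * t) - math.exp(-2 * lam * t))


def duplex_reliability_numeric(lam: float, c: float, t: float, steps: int = 100_000) -> float:
    """Forward-Euler integration of the state equations of the same model."""
    dt = t / steps
    p2, p1 = 1.0, 0.0  # start with both modules working
    for _ in range(steps):
        dp2 = -2 * lam * p2                # any first fault leaves the two-module state
        dp1 = 2 * lam * c * p2 - lam * p1  # covered faults enter, second failures leave
        p2 += dp2 * dt
        p1 += dp1 * dt
    return p2 + p1


lam, t = 1e-3, 1000.0  # hypothetical failure rate (per hour) and mission time (hours)
for c in (1.0, 0.95, 0.5):
    print(c, duplex_reliability_closed_form(lam, c, t), duplex_reliability_numeric(lam, c, t))
```

Here the two rates are of similar magnitude, so a fixed-step integrator works well. Once microsecond-scale detection and isolation states are added next to failure rates measured in thousands of hours, the step size needed for stability becomes tiny relative to the mission time, which is the stiffness problem mentioned above.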
Process Variations in Nanoscale Circuits
A single important factor that affects leakage dissipation, power consumption, reliability, defects, yield, and circuit optimization in nanoscale devices, circuits, and systems is process variation [12, 14, 30, 32, 33, 51]. For the fabrication of circuits using nanoscale devices, more and more sophisticated lithographic, chemical, and mechanical processing steps are adopted. The uncertainty in the processes involved in nanoscale device fabrication, such as ion implantation, chemical mechanical polishing (CMP), and chemical vapor deposition (CVD), causes variations in process parameters [12, 14] such as channel length, gate-oxide thickness, threshold voltage, metal wire thickness, and via resistance. These variations are categorized as inter-die and intra-die variations, and may include systematic as well as random variations. The process variations are treated as global or local variations, and they can be either temporal or spatial in nature [14]. They ultimately affect design margins and yield, and may lead to loss of money in the ever-shrinking time-to-market scenario. It has been observed that the major components of power dissipation, namely gate-oxide leakage ($I_{gate}$), subthreshold leakage ($I_{sub}$), and dynamic current ($I_{dyn}$), as well as propagation delay, depend predominantly on device parameters such as gate-oxide thickness ($T_{gate}$), device length ($L_{eff}$), threshold voltage ($V_{th}$), and supply voltage ($V_{DD}$). Thus, in order to incorporate process variation effects, statistical modeling of the leakage current components is a key step in characterizing nano-CMOS. Monte Carlo simulations provide a method to analyze the effect of process variation exhaustively. Via Monte Carlo simulations, the process and design variations are translated into probability density distributions of gate-oxide leakage, subthreshold leakage, dynamic current, and propagation delay for a two-input NAND logic gate.
For Gaussian input process and design variations, it is observed that gate-oxide leakage, subthreshold leakage, and dynamic currents are lognormally distributed [30, 32, 51]. On the other hand, the delay follows a normal distribution. This is demonstrated in Fig. 11 for a 45 nm CMOS logic gate. In addition to affecting the power, leakage, and delay profiles of devices and circuits, the variations can have an impact on circuit operation. One important example is the read failure that can develop in an SRAM circuit realized using nano-CMOS devices. With increasing device density, a larger fraction of system-on-a-chip (SoC) area is devoted to SRAM, because on-chip memory offers considerably higher system performance in exchange for the space and power it consumes. Correct operation of the SRAM circuit in both hold and read states is important for its faithful operation. Let us consider an SRAM cell with two cross-coupled inverters $M_{1,2}$ and $M_{3,4}$, whose inputs are Q and QB, respectively. From the plot of the voltage transfer characteristics (VTC) shown in Fig. 12, it is evident that a read stability failure occurs because there is no stable steady state. This happens when variations are introduced in the threshold voltage ($V_{th}$) of the transistors. Thus, a 6-transistor SRAM cell is not stable during the read operation under process variations.
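The link between Gaussian parameter variation and lognormal leakage can be reproduced with a toy Monte Carlo experiment. The Python sketch below assumes a first-order textbook model in which subthreshold leakage depends exponentially on the threshold voltage; the model, the parameter values, and the names are illustrative assumptions and do not reproduce the simulation setup behind Fig. 11.

```python
import math
import random
import statistics
from typing import List

THERMAL_VOLTAGE = 0.02585  # kT/q in volts near room temperature


def subthreshold_leakage(vth: float, i0: float = 1e-6, slope_factor: float = 1.5) -> float:
    """Assumed first-order model: I_sub = I0 * exp(-Vth / (n * kT/q))."""
    return i0 * math.exp(-vth / (slope_factor * THERMAL_VOLTAGE))


def monte_carlo_leakage(n_samples: int, vth_mean: float, vth_sigma: float, seed: int = 0) -> List[float]:
    """Draw Gaussian threshold voltages and map each sample to a leakage current."""
    rng = random.Random(seed)
    return [subthreshold_leakage(rng.gauss(vth_mean, vth_sigma)) for _ in range(n_samples)]


# Hypothetical threshold voltage of 0.22 V with a 10% standard deviation.
samples = monte_carlo_leakage(20_000, vth_mean=0.22, vth_sigma=0.022)
log_samples = [math.log(i) for i in samples]
print("mean and std of ln(I):", statistics.mean(log_samples), statistics.stdev(log_samples))
# A lognormal distribution is skewed to the right, so its mean exceeds its median.
print("mean > median:", statistics.mean(samples) > statistics.median(samples))
```

Because the current is an exponential function of a Gaussian variable, its logarithm is Gaussian, which is why the frequency plots of the currents are drawn against ln I. A quantity that depends roughly linearly on the varying parameters would instead stay close to normal.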
Fig. 11 Effects of statistical process variations on gate-oxide leakage, subthreshold leakage, dynamic power, and propagation delay in a two-input 45 nm CMOS NAND logic gate. Panels (a)-(d) are frequency plots of $\ln I_G$ (gate-oxide leakage; μ = –18.6, σ = 1.22), $\ln I_S$ (subthreshold leakage; μ = –19.18, σ = 1.79), and $\ln I_D$ (dynamic current; μ = –17.4, σ = 2.20), each in ln A, and of $t_d$ (propagation delay; μ = 265.6 psec, σ = 104.3 psec)
Fig. 12 Voltage Transfer Characteristic (VTC) in read state of a 45 nm CMOS 6-Transistor static random-access memory (SRAM) cell showing read failure introduced due to process variation. Both panels plot $V_{QB}$ (V) against $V_Q$ (V) for inverters M1,2 and M3,4: (a) no process variations, steady state achieved; (b) process variations, steady state not achieved
Fault Tolerant Nanocomputer Applications Incorporating fault tolerance in nanocomputers will have a tremendous impact on the design philosophy. A good design is a trade-off between the cost of incorporating fault tolerance and the cost of errors, which includes losses due to downtime and the cost of erroneous results.
Historically, fault tolerant computers were confined to military, communications, industrial, and aerospace applications, where the failure of a computer could result in substantial financial losses and possibly loss of life. Currently, most fault tolerant computers can be grouped into five general categories based on their design requirements and challenges. As nanocomputers pervade these areas, innovative fault tolerance strategies will have to be developed, since these computers will be prone to different types and patterns of faults.
General-Purpose Computing General-purpose computers include workstations and personal computers. These computers require a minimum level of fault tolerance, since errors that disrupt processing for a short period of time are acceptable provided the system recovers from them. The goal is to reduce the frequency of such outages and to increase the reliability, availability, and maintainability of the system. Since the attributes of each of the main components, namely the processor, memory, input, and output, are unique, the fault tolerance strategies employed for these components are in general different. Moreover, systems based on nanoelectronic devices will be more prone to transient failures than to other failures, so it will be desirable to facilitate rapid recovery. One of the most effective approaches to rapid recovery from transient faults would be instruction retry, in which the appropriate instruction is retried once the error is detected [35].
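The idea of instruction retry can be sketched in software as follows. This is a minimal illustration, not the mechanism of any particular processor: it assumes that errors are detected by executing the operation twice and comparing the results, and the names `run_with_retry` and `FlakyIncrement` are hypothetical helpers introduced only for this example.

```python
def run_with_retry(operation, operand, max_retries=3):
    """Execute `operation` twice, compare, and retry on a mismatch.

    Duplicated execution is used here as the error detector; this is an
    assumption for illustration, not the scheme of a specific processor.
    """
    for attempt in range(max_retries + 1):
        first = operation(operand)
        second = operation(operand)
        if first == second:          # no error detected: commit the result
            return first, attempt
    raise RuntimeError("error persisted after retries; likely not transient")


class FlakyIncrement:
    """Hypothetical unit that returns a corrupted result on scripted calls."""

    def __init__(self, faulty_calls):
        self.faulty_calls = set(faulty_calls)
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        result = x + 1
        if self.calls in self.faulty_calls:
            result ^= 0b100          # flip one bit to mimic a transient upset
        return result


unit = FlakyIncrement(faulty_calls={2})      # second execution is corrupted
value, retries = run_with_retry(unit, 41)
print(value, retries)
```

A transient upset disappears on the retry, so the correct value is committed after one extra attempt; a fault that persists through every retry is reported instead of being masked.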
Long-Life Applications Applications typical of unmanned space flight and satellites require that the computers have a reliability of 0.95 or greater at the end of a 10-year period. Computers in these environments are expected to use the hardware efficiently because of limited power availability and constraints on weight and size. Long-life systems can sometimes allow extended outages provided the system becomes operational again.
Critical-Computation Applications In these applications, errors in computations can be catastrophic, for example by endangering human safety. Applications include certain types of industrial controllers, space applications such as the space shuttle, aircraft flight control systems, and military applications. A typical requirement for such a system is a reliability of 0.9999999 at the end of a 3-hour period. One of the most important applications of nanocomputing would be in healthcare. It has been envisioned that in the future it would be possible to deploy large numbers of nanorobots to eliminate diseases and relieve a patient of physical trauma. This will require a high level of reliability and safety involving innovative strategies, such as a nanorobot with self-destruction capability to ensure the safety of a patient in the event of a nanorobot failure.
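To compare the long-life requirement (0.95 after 10 years) with the critical-computation requirement (0.9999999 after 3 hours), both can be converted into a maximum tolerable failure rate. The sketch below assumes a constant failure rate, so that reliability follows $R(t) = e^{-\lambda t}$; this exponential model and the function name are assumptions made for illustration, not part of the original text.

```python
import math

HOURS_PER_YEAR = 8760  # 365-day year, an approximation


def max_failure_rate(reliability, mission_hours):
    """Largest constant failure rate (per hour) meeting R(t) = exp(-lambda*t)."""
    return -math.log(reliability) / mission_hours


long_life = max_failure_rate(0.95, 10 * HOURS_PER_YEAR)
critical = max_failure_rate(0.9999999, 3)
print(f"long-life: {long_life:.2e} per hour")
print(f"critical:  {critical:.2e} per hour")
```

Under this assumption the two limits are roughly 5.9e-7 and 3.3e-8 failures per hour, so the critical-computation requirement is more than an order of magnitude stricter even though its mission time is far shorter.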
High-Availability Applications The applications in this category include time-shared systems, such as banking and reservation systems, where providing prompt service to users is critical. In these applications, system-wide outages are unacceptable, but occasional loss of service to individual users is acceptable provided the service is restored quickly so that downtime is minimized. In fact, sometimes even the downtime needed to update software and upgrade hardware may not be acceptable, so well-coordinated approaches are used to minimize the impact on users.
Maintenance Postponement Applications Applications where maintenance of computers is costly, difficult, or perhaps impossible to perform, such as some space applications, fall under this category. Breakdown maintenance costs are usually extremely high for systems located in remote areas. Maintenance crews can visit these sites at a frequency that is cost-effective. Fault tolerance techniques are used between these visits to ensure that the system is working correctly. Applications where the systems may be remotely located include telephone switching systems, certain renewable energy generation sites, and remote sensing and communication systems.
Trends and Future While hardware and software costs have been dropping, the cost of computer downtime has been increasing because of the increased complexity of the systems. This problem will be compounded by nanoelectronic devices because of their high device densities and high failure rates. Studies suggest that state-of-the-art fault tolerance techniques are not adequate to address this problem. Thus, a shift in the design paradigm may be needed. Power, leakage, variability, performance, yield, and fault tolerance have to be considered simultaneously when realizing fault-tolerant systems with nano-devices. This paradigm shift is driven in large part by the recognition that hardware is becoming increasingly reliable for any given level of complexity. This is because, based on empirical evidence, the failure rate of an integrated circuit increases roughly as the 0.6th power of the number of its equivalent gates. Thus, for a given level of complexity, the system fault rate decreases as the
level of integration increases. If, say, the average number of gates per device in a system is increased by a factor of 1,000, the fault rate for that system decreases by a factor of roughly 16. In contrast, the rapid increase in the complexity of software over the years, due to the enormous increase in the number and variety of applications implemented on computers, has not been accompanied by a corresponding reduction in software bugs. Significant improvements have indeed been made in software development [27, 43], but these are largely offset by the increase in software complexity and by the increasing opportunity for software modules to interact in unforeseen ways [29] as the number of applications escalates. As a result, it is expected that the emphasis in future fault-tolerant computer design will shift from techniques for circumventing hardware faults to mechanisms for surviving software bugs. A viable alternative approach to fault tolerance would be to mimic biological systems. There is some activity in this area, but it is still in its infancy. For example, an approach to designing complex computing systems with inherent fault tolerance capabilities based on embryonics was recently proposed. Similarly, a fault detection, location, and reconfiguration strategy based on cell signaling, membrane trafficking, and cytokinesis was proposed last year [10]. Furthermore, the existing external infrastructure, such as Automatic Test Equipment (ATE), is not capable of dealing with the new defect levels that nanometer-scale technologies create in terms of fault diagnosis, test time, and handling the amount of test data generated. In addition, continual development of ATE facilities that can deal with the new manufacturing issues is not realistic. This has considerably slowed down various system-on-a-chip design efforts [61].
For example, while embedded memory typically occupies half of an integrated circuit's area, its defect density tends to be twice that of logic. Thus, achieving and maintaining cost advantages requires improving memory yields. As described earlier, one commonly used strategy to increase yield is to incorporate redundant elements that can replace the faulty elements during the repair phase. The existing external infrastructure cannot perform these tasks in a cost-effective way. Thus, there is a critical need for on-chip (internal) infrastructure to resolve these issues effectively. The semiconductor industry has attempted to address this problem by introducing embedded intellectual property blocks called infrastructure intellectual property (IPP). For embedded memory, the IPP may consist of various types of test and repair capabilities.
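The gate-count scaling argument made earlier in this section, that a 1,000-fold increase in gates per device lowers the system fault rate by a factor of roughly 16, can be checked numerically. The sketch below takes the 0.6-power relation from the text at face value; the function name, the proportionality constant, and the total gate count are illustrative assumptions, and the constant cancels in the ratio.

```python
def system_fault_rate(total_gates, gates_per_device, exponent=0.6, scale=1.0):
    """Fault rate of a system built from equal devices.

    Each device's failure rate is modeled as scale * gates_per_device**exponent,
    the empirical 0.6-power relation quoted in the text; `scale` is an
    arbitrary constant that cancels in ratios.
    """
    devices = total_gates / gates_per_device
    return devices * scale * gates_per_device ** exponent


total = 1e9                                  # fixed system complexity (assumed)
before = system_fault_rate(total, 1e3)       # 1,000 gates per device
after = system_fault_rate(total, 1e6)        # 1,000 times more gates per device
print(f"improvement factor: {before / after:.1f}")
```

For a fixed total gate count the system rate scales as the gates per device raised to the power of 0.6 - 1 = -0.4, so the improvement is $1000^{0.4} \approx 15.8$, consistent with the factor of roughly 16 stated above.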
References
1. Al-Yamani, A., Ramsundar, S., and Pradhan, D., “A Defect Tolerance Scheme for Nanotechnology Circuits,” IEEE Transactions on Circuits and Systems, vol. 54, no. 11, November 2007, pp. 2402–2409.
2. Bertoni, G., Breveglieri, L., Koren, I., Maistri, P., and Piuri, V., “Detecting and Locating Faults in VLSI Implementations of the Advanced Encryption Standard,” in Proc. of the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 105–113, Nov. 2003.
3. Bhaduri, D. and Shukla, S., “NANOLAB – A tool for evaluating reliability of defect-tolerant nanoarchitectures,” IEEE Transactions on Nanotechnology, vol. 4, no. 4, July 2005, pp. 381–394.
4. Bhaduri, D., Shukla, S., Graham, P., and Gokhale, M., “Comparing reliability-redundancy tradeoffs for two von Neumann multiplexing architectures,” IEEE Transactions on Nanotechnology, vol. 6, no. 3, May 2007, pp. 265–279.
5. Bowen, N. and Pradhan, D., “Issues in Fault Tolerant Memory Management,” IEEE Transactions on Computers, August 1996, pp. 860–880.
6. Bowen, N. and Pradhan, D., “Processor and memory-based checkpoint and rollback recovery,” IEEE Computer, vol. 26, no. 2, Feb. 1993, pp. 22–31.
7. Chatterjee, M. and Pradhan, D., “A GLFSR-Based Pattern Generator for Near-Perfect Fault Coverage in BIST,” IEEE Transactions on Computers, December 2003, pp. 1535–1542.
8. Chen, C. and Somani, A., “Fault Containment in Cache Memories for TMR Redundant Processor Systems,” IEEE Transactions on Computers, vol. 48, no. 4, March 1999, pp. 386–397.
9. Chillarege, R. and Iyer, R., “Measurement-Based Analysis of error latency,” IEEE Transactions on Computers, pp. 529–537, May 1987.
10. Datta, K., Karanam, R., Mukherjee, A., Joshi, B., and Ravindran, A., “A Bio-Inspired Self-Repairable Distributed fault tolerant Design Methodology with Efficient Redundancy Insertion Technique,” in Proceedings of the 16th IEEE NATW, 2006.
11. DeHon, A., Goldstein, S., Kuekes, P., and Lincoln, P., “Nanophotolithographic nanoscale memory density prospects,” IEEE Transactions on Nanotechnology, vol. 4, no. 2, May 2005, pp. 215–228.
12. Drennan, P. and McAndrew, C., “Understanding MOSFET mismatch for analog design,” IEEE Journal of Solid-State Circuits, vol. 38, no. 3, pp. 450–456, March 2003.
13. Elnozahy, E., Alvisi, L., Wang, Y., and Johnson, D., “A survey of rollback-recovery protocols in message-passing systems,” ACM Computing Surveys, vol. 34, no. 3, September 2002, pp. 375–408.
14. Gattiker, A., Haensch, W., Ji, B., Nassif, S., Nowak, E., Pearson, D., Bernstein, K., Frank, D., and Rohrer, N., “High-performance CMOS variability in the 65-nm regime and beyond,” IBM Journal of Research and Development, vol. 50, no. 4/5, pp. 433–449, July–September 2006.
15. Han, J. and Jonker, P., “A system architecture solution for unreliable nanoelectronic devices,” IEEE Transactions on Nanotechnology, vol. 1, no. 4, December 2002, pp. 201–208.
16. Hayne, R. and Johnson, B., “Behavioral Fault Modeling in a VHDL Synthesis Environment,” Proceedings of the IEEE VLSI Test Symposium, Dana Point, California, April 25–29, 1999, pp. 333–340.
17. Intel chips to shrink to 32-nanometer process, http://www.infoworld.com/article/07/09/18/Intel-chips-shrink-to-32-nanometer-process_1.html or http://www.intel.com/idf/.
18. Johnson, B., “Design and Analysis of Fault Tolerant Digital Systems,” Addison-Wesley, 1989.
19. Joshi, B. and Hosseini, S., “Diagnosis Algorithms for Multiprocessor Systems,” in Proc. IEEE Workshop on Embedded Fault-Tolerant Systems, Boston, MA, 1998.
20. Kaufman, L., Bhide, S., and Johnson, B., “Modeling of Common-Mode Failures in Digital Embedded Systems,” in Proceedings of the IEEE Annual Reliability and Maintainability Symposium, Los Angeles, California, January 24–27, 2000, pp. 350–357.
21. Koren, I. and Mani Krishna, C., “Fault-Tolerant Systems,” Morgan Kaufmann Publishers, 2007.
22. Krishnaswamy, S., Markov, I., and Hayes, J., “Logic circuit testing for transient faults,” in Proceedings of the European Test Symposium (ETS’05), 2005.
23. Lala, P., “Self-Checking and Fault Tolerant Digital Design,” Morgan Kaufmann, 2000.
24. Lamport, L., Shostak, S., and Pease, M., “The Byzantine Generals Problem,” ACM Transactions of Programming Languages and System, July 1982, pp. 382–401.
25. Li, M., Goldberg, D., Tao, W., and Tamir, Y., “Fault-Tolerant Cluster Management for Reliable High-Performance Computing,” in Proc. of the 13th International Conference on Parallel and Distributed Computing and Systems, Anaheim, CA, pp. 480–485, August 2001.
26. Liu, C. and Pradhan, D., “EBIST: A Novel Test Generator with Built-In-Fault Detection Capability,” IEEE Transactions on CAD, vol. 24, no. 8, August 2005.
27. Lyu, M., editor, “Software Reliability Engineering,” McGraw Hill, 1996.
28. Maheshwari, A., Koren, I., and Burleson, W., “Techniques for Transient Fault Sensitivity Analysis and Reduction in VLSI Circuits,” Defect and Fault Tolerance in VLSI Systems, 2003.
29. Messer, A., Bernadat, P., Fu, G., Chen, D., Dimitrijevic, Z., Lie, D., Mannaru, D., Riska, A., and Milojicic, D., “Susceptibility of commodity systems and software to memory soft errors,” IEEE Transactions on Computers, vol. 53, no. 12, pp. 1557–1568, Dec. 2004.
30. Mohanty, S. and Kougianos, E., “Simultaneous Power Fluctuation and Average Power Minimization during Nano-CMOS Behavioral Synthesis,” in Proceedings of the 20th IEEE International Conference on VLSI Design (VLSID), pp. 577–582, 2007.
31. Mohanty, S. and Kougianos, E., “Steady and Transient State Analysis of Gate Leakage Current in Nanoscale CMOS Logic Gates,” in Proceedings of the 24th IEEE International Conference on Computer Design (ICCD), pp. 210–215, 2006.
32. Mohanty, S., Kougianos, E., Ghai, D., and Patra, P., “Interdependency Study of Process and Design Parameter Scaling for Power Optimization of Nano-CMOS Circuits under Process Variation,” in Proceedings of the 16th ACM/IEEE International Workshop on Logic and Synthesis (IWLS), pp. 207–213, 2007.
33. Mohanty, S., Ranganathan, N., Kougianos, E., and Patra, P., Low-Power High-Level Synthesis for Nanoscale CMOS Circuits, 1st Edition, Springer, 2008.
34. Nikolic, K., Sadek, A., and Forshaw, M., “Fault-tolerant techniques for nanocomputers,” Nanotechnology, vol. 13, 2002, pp. 357–362.
35. Oh, N., Shirvani, P., and McCluskey, E., “Error detection by duplicated instructions in superscalar processors,” IEEE Transactions on Reliability, vol. 51, pp. 63–75.
36. Omari, R., Somani, A., and Manimaran, G., “An adaptive scheme for fault-tolerant scheduling of soft real-time tasks in multiprocessor systems,” Journal of Parallel and Distributed Computing, vol. 65, no. 5, pp. 595–608, May 2005.
37. Papadopoulou, E., “Critical Area Computation for Missing material Defects in VLSI Circuits,” IEEE Transactions on CAD, vol. 20, pp. 503–528, May 2001.
38. Peper, F., Lee, J., Abo, F., Isokawa, T., Adachi, S., Matsui, N., and Mashiko, S., “Fault tolerance in nanocomputers: a cellular array approach,” IEEE Transactions on Nanotechnology, vol. 3, no. 1, March 2004, pp. 187–201.
39. Pradhan, D. (editor), “Fault-Tolerant Computer System Design,” Prentice Hall, 1996.
40. Pradhan, D. and Vaidya, N., “Roll-Forward Checkpointing Scheme: A Novel fault-Tolerant Architecture,” IEEE Transactions on Computers, vol. 43, no. 10, October 1994, pp. 1163–1174.
41. Qi, Y., Gao, J., and Fortes, J., “Markov chains and probabilistic computation – a general framework for multiplexed nanoelectronic systems,” IEEE Transactions on Nanotechnology, vol. 4, no. 2, March 2005, pp. 194–205.
42. Ramesh, A., Twigg, D., Sandadi, U., Sharma, T., Trivedi, K., and Somani, A., “Integrated Reliability Modeling Environment,” Reliability Engineering and System Safety, Elsevier Science Limited, UK, vol. 65, no. 1, March 1999, pp. 65–75.
43. Reis, G., Chang, J., Vachharajani, N., Rangan, R., August, D., and Mukherjee, S., “Software-Controlled Fault Tolerance,” ACM Transactions on Architecture and Code Optimization, pp. 1–28, December 2005.
44. Roy, S. and Beiu, V., “Majority multiplexing – economical redundant fault-tolerant designs for nanoarchitectures,” IEEE Transactions on Nanotechnology, vol. 4, no. 4, July 2005, pp. 441–451.
45. Schmid, A. and Leblebici, Y., “A Highly Fault Tolerant PLA Architecture for Failure-Prone Nanometer CMOS and Novel Quantum Device Technologies,” Defect and Fault Tolerance in VLSI Systems, 2004.
46. Schwab, A., Johnson, B., and Bechta-Dugan, J., “Analysis Techniques for Real-Time, Fault-Tolerant, VLSI Processing Arrays,” in Proceedings of the 1995 IEEE Annual Reliability and Maintainability Symposium, Washington, DC, January 17–19, 1995, pp. 137–143.
47. Sharma, D. and Pradhan, D., “An Efficient Coordinated Checkpointing Scheme for Multicomputers,” Fault-Tolerant Parallel and Distributed Systems, IEEE Computer Society Press, 1995.
48. Shooman, M., “Reliability of Computer Systems and Networks,” Wiley Inter-Science, 2002.
49. Siewiorek, D. (editor), “Fault Tolerant Computing Highlights from 25 Years,” in Special volume of the 25th International Symposium on Fault-Tolerant Computing FTCS-25, Pasadena, CA.
50. Siewiorek, D. and Swartz, R., “The Theory and Practice of Reliable System Design,” A. K. Peters, 1999.
51. Singh, J., Mathew, J., Mohanty, S., and Pradhan, D., “Statistical Analysis of Steady State Leakage Currents in Nano-CMOS Devices,” in Proceedings of the 25th IEEE Norchip Conference (NORCHIP), 2007.
52. Somani, A. and Vaidya, N., “Understanding fault tolerance and reliability,” IEEE Computer, April 1997, pp. 45–50.
53. Strukov, D. and Likharev, K., “Defect-tolerant architectures for nanoelectronic crossbar memories,” Journal of Nanoscience and Nanotechnology, vol. 7, no. 1, January 2007.
54. Strukov, D. and Likharev, K., “Prospects for terabit-scale nanoelectronic memories,” Nanotechnology, vol. 16, January 2005, pp. 137–148.
55. Sun, F. and Zhang, T., “Defect and transient fault-tolerant system design for hybrid CMOS/nanodevice digital memories,” IEEE Transactions on Nanotechnology, vol. 6, no. 3, May 2007, pp. 341–351.
56. Tahoori, M., “Defects, Yield, and Design in Sublithographic Nano-Electronics,” Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2005.
57. Vaidya, N. and Pradhan, D., “Fault-Tolerant Design Strategies for High Reliability and Safety,” IEEE Transactions on Computers, vol. 42, no. 10, October 1993.
58. von Neumann, J., “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” Automata Studies, Annals of Mathematical Studies, Princeton University Press, no. 34, 1956, pp. 43–98.
59. Welke, S., Johnson, B., and Aylor, J., “Reliability Modeling of Hardware-Software Systems,” IEEE Transactions on Reliability, vol. 44, no. 3, September 1995, pp. 413–418.
60. Zhang, Y. and Chakrabarty, K., “Fault Recovery Based on Checkpointing for Hard Real-Time Embedded Systems,” in Proceedings of the 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’03), 2003.
61. Zorian, Y. and Chandramouli, M., “Manufacturability with Embedded Infrastructure IPs,” http://www.evaluationengineering.com/archive/articles/0504/0504embedded_IP.asp.
Transistor-Level Based Defect-Tolerance for Reliable Nanoelectronics Aiman H. El-Maleh, Bashir M. Al-Hashimi, Aissa Melouki, and Ahmad Al-Yamani
Abstract Nanodevice-based circuit design will rest on the acceptance that a high percentage of the devices in a design will be defective. A promising defect tolerance approach is one combining expensive but reliable gates with cheap but unreliable gates. One approach to increasing the reliability of gates is to add redundancy at the transistor-level implementation. In this chapter, we investigate a defect tolerance technique that adds redundancy at the transistor level and provides built-in immunity to permanent defects (stuck-open, stuck-short, and bridges). The proposed technique is based on replacing each transistor by a quadded-transistor structure that guarantees tolerance of all single defects and of a large number of multiple defects, as validated by theoretical analysis and simulation. As demonstrated by extensive simulation results using ISCAS 85 and 89 benchmark circuits, the investigated technique achieves significantly higher defect tolerance than recently reported nanoelectronics defect-tolerant techniques (even with up to four to five times higher transistor defect probability) and at reduced area overhead. The proposed approach can be effective in enhancing the reliability of triple-modular-redundancy based defect tolerance approaches by enhancing the reliability of the voter gate implementation.
Keywords Defect tolerance · Fault tolerance · Reliability · Failure rate · Nanotechnology
A.H. El-Maleh and A. Al-Yamani: King Fahd University of Petroleum & Minerals, P.O. Box 1063, Dhahran, 31261, Saudi Arabia
B.M. Al-Hashimi and A. Melouki: University of Southampton, Southampton, SO17 1BJ, UK
C. Huang (ed.), Robust Computing with Nano-scale Devices: Progresses and Challenges, Lecture Notes in Electrical Engineering 58, DOI 10.1007/978-90-481-8540-5_3, © Springer Science+Business Media B.V. 2010
Introduction As CMOS technology reaches its scaling limits, alternative technologies have become necessary. Nanotechnology-based fabrication is expected to offer the extra density and potential performance needed to take electronic circuits to the next step.
It is estimated that molecular electronics can achieve very high densities ($10^{12}$ devices per cm$^2$) and operate at very high frequencies (of the order of THz) [1]. Several successful nano-scale electronic devices have been demonstrated by researchers, some of the most promising being carbon nanotubes (CNT) [2], silicon nano-wires (NW) [3, 4], and quantum dot cells [5]. It has been shown that self-assembly techniques can overcome the limitations that lithography imposes on the smallest feature size [1]. Currently, because of the fabrication regularity imposed by the self-assembly process, only regular structures can be built. Hewlett-Packard has recently fabricated 8 × 8 crossbar switches using molecular switches at the crosspoints [6]. They observed that only 85% of the switches were programmable while the other 15% were defective. Whether nanotechnology circuits are implemented using self-assembly or lithography processes, defect rates are rising substantially. At these nanometer scales, the small cross-section areas of wires make them fragile, increasing the likelihood that they will break during assembly. Moreover, the contact area between nanowires, and between nanowires and devices, depends on a few atomic-scale bonds, resulting in some connections being poor and effectively unusable [4, 6, 7]. With such high defect rates, defect-tolerant methods have to be devised for the emerging nanotechnology devices. Discarding all parts with any defects in them is no longer affordable, as the yield hit will be substantial. Therefore, the necessity to cope with intrinsic defects at the circuit level must be recognized as a key aspect of nanodevice-based designs. To implement such robustness and defect tolerance, circuit design techniques that can absorb a number of defects and still perform their functions need to be investigated. Typical approaches to reliable system design include defect-tolerant and defect avoidance techniques [8].
Defect-tolerant techniques are based on adding redundancy to the design to tolerate defects or faults. Defect avoidance techniques, in contrast, are based on identifying defects and bypassing them through reconfiguration. Examples of defect-tolerant techniques are the multiplexed logic approach [9], N-tuple modular redundancy (NMR), triple-modular redundancy (TMR) and triple interwoven redundant logic (TIR) [10, 11], cascaded triple modular redundancy (CTMR) or recursive triple modular redundancy (RTMR) [12, 13], and quadded logic (QL) [10, 11, 14, 15]. Examples of defect avoidance techniques are given in [7, 8, 16–19]. While both approaches address the defect-tolerance issue, it is unclear from the literature which is more effective: the effectiveness of defect-tolerant approaches based on classical techniques such as TMR is limited by the arbitration unit, whilst defect avoidance techniques require extensive defect mapping and reconfiguration infrastructure in addition to the large number of spare parts needed because of the high defect rate. Redundancy-based techniques have the advantage of tolerating both permanent and transient faults, while reconfiguration-based techniques deal only with permanent defects. It is expected that an effective design style is to make individual modules more reliable by using redundancy internal to each module, and to choose an optimal combination of inter-module reconfigurability and intra-module redundancy [20].
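The observation that discarding every part with a defect is unaffordable can be made concrete with a short calculation. The sketch below assumes that switch defects are independent and applies the 15% defect rate reported for the crossbar experiment to all 64 crosspoints of an 8 × 8 array; the independence assumption and the function name are introduced here for illustration only.

```python
def defect_free_probability(device_count, defect_rate):
    """Probability that every device works, assuming independent defects."""
    return (1.0 - defect_rate) ** device_count


# 8 x 8 crossbar with the 15% switch defect rate reported in the text.
p_crossbar = defect_free_probability(8 * 8, 0.15)
print(f"defect-free 8 x 8 crossbar: {p_crossbar:.1e}")
```

Under these assumptions only about three crossbars in 100,000 would be entirely defect-free, which is why the design itself has to absorb defects.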
To tolerate defects, redundancy can be added at the transistor level, gate level, or functional block level. With the high defect densities predicted for nanotechnology, we believe that defect tolerance techniques should be developed and applied at a low level of design abstraction. Making the most fundamental component of a design (i.e., the transistor) more robust through redundancy improves design reliability compared with defect tolerance techniques applied at a higher level of abstraction (block or functional level). We will show that adding redundancy at the transistor level produces designs that can tolerate higher defect rates than higher-level defect tolerance techniques. We will also show that adding redundancy at the transistor level is less costly in terms of overhead than the same defect tolerance techniques applied at a higher level (block or functional). Because of these two attractive features (defect tolerance effectiveness in the presence of high defect density, and low area overhead), we believe that adding redundancy at a lower level of design abstraction is a promising technique that merits further consideration and research. The recent work in [20] has demonstrated the importance of having a highly reliable implementation of certain parts of the design through a heterogeneous defect tolerance approach combining cheap but unreliable nanodevices and reliable but expensive CMOS devices. Such reliable CMOS devices can be achieved through the proposed transistor-level defect tolerance technique.
Previous Approaches Defect-Tolerant Techniques The multiplexed logic approach, motivated by the pioneering work of John von Neumann [9], began as an attempt to build early digital computers out of unreliable components. This approach and subsequent derivatives [10, 14, 15] have provided insight into how to design reliable nanoelectronic systems out of components that might fundamentally be less reliable than those of currently available technologies. In the multiplexed logic approach, each gate is implemented with an “executive” unit and a “restorative” unit. The “executive” unit is a multiplexed logic unit in which each logic gate is replicated N times and each input and output is also replicated N times. The inputs are paired randomly to feed the N gates. The “restorative” unit comprises two cascaded multiplexed logic units, as shown in Fig. 1 for a two-input NAND gate with N = 4. The “restorative” unit is intended to restore some of the incorrect gate values to their correct values. It is assumed that each gate has a probability of failure ε and that the logic state of the bundle is decided by whether the fraction of wires carrying a specific value is above or below a preset threshold. In general, von Neumann’s construction requires a large amount of redundancy and a low error rate for individual gates. It is shown in [21] that for deep logic with a gate failure probability ε = 0.01 and N = 100, it is possible to achieve a circuit failure probability on the order of $10^{-6}$. This amount of redundancy is excessive and is considered impractical. To reduce it, the work in [22, 23] combines NAND multiplexing with reconfiguration.
Fig. 1 (a) Original two-input NAND gate, (b) von Neumann’s multiplexed logic for the two-input NAND gate with N = 4, built from randomizing units, an executive unit, and a restorative unit
Another defect-tolerant approach is N-tuple modular redundancy (NMR), in which each gate is replicated N times (where N must be an odd number) and followed by an arbitration unit that decides the correct value by majority. Triple-modular redundancy (TMR) is a special case of NMR. The reliability of such designs is limited by that of the final arbitration unit, making the approach difficult in the context of highly integrated nanosystems [8]. A TMR circuit can be further triplicated. The resulting circuit has nine copies of the original module and two layers of majority gates. This process can be repeated if necessary, resulting in a technique called cascaded triple modular redundancy (CTMR) or recursive triple modular redundancy (RTMR). Spagocci and Fountain [12] have shown that using CTMR in a nanochip with large nanoscale devices would require an extremely low device error rate. In [13], it is shown that recursive voting leads to a double exponential decrease in a circuit’s failure probability. However, a single error in the last majority gate can cause an incorrect result, hampering the technique’s effectiveness. Despite the high redundancy overhead, the nanocomputing community has proposed the use of triple- and N-modular redundancy techniques and shown that they provide high reliability even in the presence of high device failure or soft error rates [24]. Such techniques increase the reliability of designs, but the main assumption they make is that the voter block is free from the failures that the rest of the design is facing.
Whilst such an assumption appears to hold for designs based on traditional silicon technologies, it will not hold for designs based on nanotechnology, hindering the applicability of triple- and N-modular redundancy techniques to nano designs. The quadded-transistor technique described in this chapter can improve the reliability of triple-modular redundancy by voter hardening. This is achieved by employing the Quadded-Transistor (QT) technique in the voter block.
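The way an unreliable voter limits TMR and cascaded TMR can be sketched numerically. The model below is a deliberate simplification and an assumption of this example: modules fail independently, a majority of correct modules always yields a correct vote, and each TMR group is followed by a single voter of reliability `voter_r`. The function names are hypothetical.

```python
from math import comb


def nmr_reliability(module_r, voter_r, n=3):
    """Reliability of N-modular redundancy with a single, fallible voter."""
    majority = n // 2 + 1
    modules_ok = sum(
        comb(n, k) * module_r ** k * (1 - module_r) ** (n - k)
        for k in range(majority, n + 1)
    )
    return voter_r * modules_ok


def cascaded_tmr_reliability(module_r, voter_r, levels):
    """Apply TMR recursively; each level adds one more layer of voters."""
    r = module_r
    for _ in range(levels):
        r = nmr_reliability(r, voter_r, n=3)
    return r


for voter_r in (1.0, 0.99):
    chain = [cascaded_tmr_reliability(0.95, voter_r, lv) for lv in range(4)]
    print(voter_r, [round(r, 5) for r in chain])
```

With a perfect voter each additional level drives the reliability rapidly toward 1, whereas with a voter of reliability 0.99 the result can never exceed 0.99 however many levels are added. This is the motivation for hardening the voter itself.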
Transistor-Level Based Defect-Tolerance for Reliable Nanoelectronics
Pierce [10] introduced a fault-tolerant technique called interwoven redundant logic. Quadded logic [11, 14, 15] is an ad hoc configuration of the interwoven redundant logic. A quadded circuit implementation based on NAND gates replaces each NAND gate with a group of four NAND gates, each of which has twice as many inputs as the one it replaces. The four outputs of each group are divided into two sets of outputs, each providing inputs to two gates in a succeeding stage. The interconnections in a quadded circuit are eight times as many as those used in the non-redundant form. In a quadded circuit, a single critical error ($1 \rightarrow 0$) is corrected after passing through two stages of logic, and a single sub-critical error ($0 \rightarrow 1$) is corrected after passing through a single stage. In quadded logic, it must be guaranteed that the interconnect pattern at the output of a stage differs from the interconnect patterns of any of its input variables. While quadded logic guarantees tolerance of most single errors, errors occurring at the last two stages of logic may not be corrected. Figure 2 shows an example of TMR and quadded logic circuits.
In [20], a defect-tolerant computational architecture is proposed based on a heterogeneous CMOS-carbon nanotube fabric. A methodology for realizing coded Boolean functions implemented in nano-gates is introduced. It is shown that the yield of nano-circuits is significantly increased in the presence of high defect density. Moore et al. [25] have analyzed the use of series-parallel and bridge configurations for the application of redundancy to relay networks. Suran [26] has evaluated the reliability of the quad-relay structure shown in Fig. 3b and has shown its application using bipolar junction transistors. However, only the reliability of a single structure is evaluated and no extensive circuit reliability analysis is made.
Defect Avoidance Techniques
Unlike defect-tolerant techniques, which are designed to work properly despite the presence of defects, defect avoidance techniques identify defective modules and replace them with redundant modules through configuration. Mishra and Goldstein [17] have proposed a defect avoidance approach to nanosystem design based on a large reconfigurable grid of nanoblocks. Each of these blocks can be configured as one of the basic logic building blocks like an AND, an OR, an XOR, a half-adder, and so forth. The approach is based on mapping defects on nanoblocks and synthesizing a feasible configuration realizing the application for each nanofabric instance. The main limitation of this approach is that it requires mapping, synthesis, and configuration at a very fine granularity, which makes it unscalable for large nanosystems [8]. Furthermore, extensive connectivity among nanoblocks is required for defect mapping. To address the density, scalability, and reliability challenges of emerging nanotechnologies, He et al. [8] proposed a hierarchy of design abstractions, constructed as reconfigurable fabric regions, whereby designers assign small functional flows to
A.H. El-Maleh et al.
Fig. 2 (a) Original circuit, (b) TMR circuit, (c) quadded logic circuit
Fig. 3 (a) Transistor in original gate implementation, (b) first quadded-transistor structure, (c) second quadded-transistor structure
each region. The approach demonstrates a tradeoff between yield, delay, and cost in defect-prone nanotechnology-based computing systems.
Various recently proposed nano-architectures are based on a two-dimensional (2D) nano-scale crossbar. The work in [19] has studied the impact of defects on the routability of a crossbar. It is shown that a defective crossbar of size $N \times N$ can still be utilized as a smaller $K \times K$ defect-free crossbar. It is also shown that switch stuck-short faults have a significantly higher impact on the manufacturing yield than switch stuck-open faults. For this technique to be practical, effective testing and diagnostic techniques are needed to identify defective elements in the crossbar so that they can be bypassed by configuration. In [27], Tahoori presents an application-independent defect-tolerant design flow in which higher-level design steps need not be aware of the existence and location of defects in the chip. Only a final mapping step is required to be defect aware, which minimizes the number of per-chip design steps and makes the flow appropriate for high-volume production. Two algorithms for identifying the maximum defect-free crossbar within the partially defective fabricated crossbar are given. In [28], Hogg and Snider examined the implementation of binary adders based on defective crossbars. They showed a tradeoff between defect tolerance and circuit area for different implementations. It is shown that the likelihood that defects are tolerable changes abruptly from near one to near zero over a small range of defect rates for a given crossbar size. Other work has explored defect tolerance in systems composed of multiple crossbars [29, 30].
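To make the idea of using a defective crossbar as a smaller defect-free one concrete, here is a minimal greedy sketch. It is not one of the algorithms of [27]; it is a simple heuristic that assumes defects are isolated defective crosspoints given as `(row, col)` pairs, and the function name is hypothetical. It repeatedly discards the row or column covering the most remaining defects, so the result is defect-free but not guaranteed to be the largest possible.

```python
def greedy_defect_free_subcrossbar(defects, n_rows, n_cols):
    """Return (rows, cols) such that no defective crosspoint lies in rows x cols.

    defects: set of (row, col) pairs marking defective crosspoints.
    Heuristic only: the surviving sub-crossbar is defect-free, not maximal.
    """
    rows, cols = set(range(n_rows)), set(range(n_cols))
    remaining = set(defects)
    while remaining:
        row_hits, col_hits = {}, {}
        for r, c in remaining:
            row_hits[r] = row_hits.get(r, 0) + 1
            col_hits[c] = col_hits.get(c, 0) + 1
        best_row = max(row_hits, key=row_hits.get)
        best_col = max(col_hits, key=col_hits.get)
        rh, ch = row_hits[best_row], col_hits[best_col]
        # On a tie, shrink the larger dimension to keep the result close to square.
        if rh > ch or (rh == ch and len(rows) >= len(cols)):
            rows.discard(best_row)
            remaining = {(r, c) for r, c in remaining if r != best_row}
        else:
            cols.discard(best_col)
            remaining = {(r, c) for r, c in remaining if c != best_col}
    return sorted(rows), sorted(cols)
```

For a 4 x 4 crossbar with defects at (0, 0), (0, 3) and (2, 1), the sketch drops row 0 and column 1 and leaves a 3 x 3 defect-free region. Stuck-short switches, which can disable a whole row and column, would need a different defect model.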
Defect-Tolerant FPGA Design Techniques
The regularity of FPGAs makes them suitable candidates for nanotechnology implementation. Reconfigurability in FPGAs has historically been used as a means of mitigating the effect of defects. For permanent defects, a recent common practice of the two largest FPGA manufacturers has been to try mapping available designs onto the working blocks of the FPGA and to use it as an “application-specific” FPGA.
The far more widely used application of fault tolerance in FPGAs targets transient faults due to single event upsets (SEUs). Until recently, SEUs due to charged particles have been a threat mostly in remote and space computers. With device dimensions shrinking to nanometer scales, the threat is now very significant even for terrestrial applications. Asadi et al. presented an evaluation of different SEU fault tolerance schemes implemented on FPGAs [31]. The bottleneck in triple modular redundancy (TMR) implementations is the voter. The reliability of the system is always limited by that of the voter. Samudrala et al. [32] proposed implementing TMR on FPGAs. They suggest using tri-state buffers (available on Xilinx Virtex FPGAs) to build SEU-tolerant voters. By doing so, they deal with the bottleneck in the reliability of the TMR. They use a program to evaluate the nodes with a high probability of errors due to SEUs, and selectively implement TMR at the potential gates. The work in [33] investigates the ideal positioning of the voters for a TMR system to achieve maximal robustness with minimal overhead.
TMR is not the only way to deal with SEUs in FPGAs. Some work in the past has compared the efficiency of TMR with that of error detection (using duplication) followed by recomputing. At low error rates, it is more efficient to use duplication with recomputing. Tiwari et al. [34] proposed protecting against SEUs using parity bits in the memory blocks of FPGAs. If an error is detected, the memory is rewritten. The technique shows a significant power improvement compared to TMR. Sterpone et al. suggest a place and route algorithm to reduce the susceptibility to SEUs in FPGAs [35]. Huang et al. [36, 37] presented a scheme for evaluating the fault tolerance of different FPGAs based on reconfigurability of routing resources in the presence of faulty switches.
Routability (a realizable route between input and output endpoints) is used as a measure of fault tolerance. As switches fail, the probability of finding a route between endpoints decreases. The paper uses open and short switches as faults. The work in [38] also investigates routability. The paper proposes a fault-tolerant design for the switch blocks for yield enhancement. The technique is based on wrapping additional multiplexers around the switch blocks to allow different routes for signals. The additional multiplexers lead to a higher probability of finding a route between endpoints.
In [39], duplication and concurrent error detection are used to tolerate transient and permanent faults in FPGAs. The paper classifies fault tolerance options in FPGAs as (1) using fault-tolerant blocks, (2) replacing the architecture with a more robust one, or (3) protecting the high-level description using redundancy such as TMR, which has been used in the past for such fault tolerance. The authors use time redundancy as well as duplication with comparison. They also apply recomputation with shifted operands and swapped operands. Their technique requires fewer IO pads and consumes less power than TMR. However, it takes more time and requires more flip-flops.
In [40], the authors compare coarse-grain and fine-grain redundancy in FPGAs to tolerate defects. For coarse-grain redundancy, they use spare rows and columns, and for fine-grain redundancy they use spare wires. Their findings support using fine-grain redundancy since it offers much higher fault tolerance. They argue that matching the fault tolerance of fine-grain redundancy with coarse-grain redundancy requires double the hardware overhead of fine-grain redundancy, which is about 50%.
Reliability Analysis
For proper evaluation of defect-tolerant architectures, it is important to rely on reliability analysis techniques that are both accurate and efficient. Recently, several reliability evaluation approaches have been proposed [41–43]. In [41], a probabilistic design methodology for nanoscale computer architectures based on Markov Random Fields (MRF) is proposed. Arbitrary logic circuits can be expressed by MRF, and logic operation is achieved by maximizing the probability of state configurations in the logic network. The Markov random network, the belief propagation algorithm, and the Gibbs energy distribution are selected as the basis for the nanoarchitectural approach, since its operation does not depend on perfect devices or perfect connections. Successful operation requires that the energy of correct states is lower than the energy of errors. In [42], the computational scheme based on Markov random fields and belief propagation is automated for the evaluation of reliability measures of combinational logic blocks. Various reliability results for defect-tolerant architectures, such as triple modular redundancy (TMR), cascaded TMR, and multistage iterations of these, are derived. In [43], a general computational framework based on probabilistic transfer matrices (PTMs) is proposed for accurate reliability evaluation. Algebraic decision diagrams (ADDs) are used to improve the efficiency of PTM operations. The proposed approach can be effectively used in exact reliability calculations for analyzing fault-tolerant architectures and in calculating gate error susceptibility. In [44], a method is proposed that enhances the probabilistic gate models (PGMs) for reliability estimation to enable accurate evaluation of circuit reliability. It significantly reduces the PGM method’s complexity, making it suitable for practical design-for-reliability applications.
Proposed Defect Tolerance Technique
In this chapter, we investigate defect tolerance based on adding redundancy at the transistor level for electronic circuits. The work focuses on transistor stuck-open and stuck-short defects and on bridges between gates of transistors. A transistor is considered defective if its expected behavior changes, regardless of the type of defect causing it. In order to tolerate single defective transistors, each transistor, A, is replaced by a quadded-transistor structure implementing either the logic function $(A + A)(A + A)$ or the logic function $(AA) + (AA)$, as shown in Fig. 3. In both of the quadded-transistor structures shown in Fig. 3b and c, any single transistor defect (stuck-open, stuck-short, AND/OR-bridge) will not change the logic behavior, and hence the defect is tolerated. It should be observed that for NMOS transistors, OR-bridge and stuck-short defects produce the same behavior, while AND-bridge and stuck-open defects have the same behavior. Similarly, for PMOS transistors, OR-bridge and stuck-open defects produce the same behavior, while AND-bridge and stuck-short defects have the same behavior. Double stuck-open (or their corresponding bridge) defects are tolerated as long as they do not occur in any two parallel transistors (T1 & T2 or T3 & T4 for the structure in Fig. 3b, and T1 & T2, T1 & T4, T3 & T2 or T3 & T4 for the structure in Fig. 3c). Double stuck-short (or their corresponding bridge) defects are tolerated as long as they do not occur in any two series transistors (T1 & T3, T1 & T4, T2 & T3 or T2 & T4 for the structure in Fig. 3b, and T1 & T3 or T2 & T4 for the structure in Fig. 3c). In addition, any triple defect that does not include two parallel stuck-open defects or two series stuck-short defects or their corresponding bridging defects is tolerated. Thus, using either of the quadded-transistor structures significantly improves the reliability of the gate implementation.
It should be observed that the effective resistance of each defect-free quadded-transistor structure equals that of the original transistor. However, in the presence of a single defect, the worst-case effective resistance of the first quadded-transistor structure (Fig. 3b) is 1.5R while that of the second quadded-transistor structure (Fig. 3c) is 2R, where R is the effective resistance of a transistor. This occurs in the case of single stuck-open (or corresponding bridge) defects. For tolerable multiple defects, the worst-case effective resistance of both structures is 2R.
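The resistance figures above can be checked with a small series/parallel calculation. The sketch below is illustrative only: it assumes every healthy transistor has the same on-resistance R, models a stuck-open transistor as an infinite resistance and a stuck-short transistor as an ideal 0 Ω short, and takes the Fig. 3c topology to be the series pairs T1-T3 and T2-T4 in parallel, as implied by the text.

```python
import math

R = 1.0                       # on-resistance of one healthy transistor (arbitrary unit)
OPEN, SHORT = math.inf, 0.0   # idealised stuck-open and stuck-short resistances


def series(a, b):
    return a + b


def parallel(a, b):
    if a == 0.0 or b == 0.0:
        return 0.0            # an ideal short bypasses the other branch
    if math.isinf(a) and math.isinf(b):
        return math.inf       # both branches open
    return 1.0 / (1.0 / a + 1.0 / b)   # 1/inf evaluates to 0.0


def structure_b(t1, t2, t3, t4):
    """Fig. 3b: (T1 || T2) in series with (T3 || T4)."""
    return series(parallel(t1, t2), parallel(t3, t4))


def structure_c(t1, t2, t3, t4):
    """Fig. 3c (assumed wiring): (T1 - T3) in parallel with (T2 - T4)."""
    return parallel(series(t1, t3), series(t2, t4))


def worst_single_defect(structure):
    """Largest on-resistance over all single stuck-open or stuck-short defects."""
    worst = 0.0
    for position in range(4):
        for defect in (OPEN, SHORT):
            t = [R] * 4
            t[position] = defect
            worst = max(worst, structure(*t))
    return worst
```

With these assumptions both defect-free structures evaluate to R, `worst_single_defect(structure_b)` gives 1.5R and `worst_single_defect(structure_c)` gives 2R, so the structure of Fig. 3b degrades less under a single defect.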
Because its worst-case effective resistance under a single defect is lower (1.5R versus 2R), the first quadded-transistor structure (Fig. 3b) is adopted in this chapter. Next, we determine the probability of circuit failure given a transistor defect probability using quadded-transistor structures. A transistor is considered defective if it does not function properly due to manufacturing defects. Given a transistor-defect probability, $P$, the probability of quadded-transistor structure failure is $P_q = \frac{3}{2}P^2 - \frac{1}{2}P^3$. This is proved with respect to stuck-open and stuck-short defects, as bridge defects have equivalent behaviors to them, as explained earlier.
If there are only two defective transistors in a quadded-transistor structure, then we have four possible pairs of stuck-open and stuck-short defects. In all cases, only one of those pairs of defects produces an error. Thus, the probability of failure in this case is
$$\frac{1}{4}\binom{4}{2}P^2(1-P)^2 = \frac{3}{2}P^2(1-P)^2.$$
If we assume that three transistors are defective, then we have eight possible combinations of stuck-open and stuck-short defects. In all cases, five out of those combinations produce an error. Thus, the probability of failure in this case is
$$\frac{5}{8}\binom{4}{3}P^3(1-P) = \frac{5}{2}P^3(1-P).$$
If four transistors are assumed defective, then there will always be an error and the probability of failure is
$$1 \cdot \binom{4}{4}P^4 = P^4.$$
Thus, the probability of quadded-transistor structure failure is
$$P_q = \frac{3}{2}P^2(1-P)^2 + \frac{5}{2}P^3(1-P) + P^4$$
$$= \frac{3}{2}P^2 - 3P^3 + \frac{3}{2}P^4 + \frac{5}{2}P^3 - \frac{5}{2}P^4 + P^4$$
$$= \frac{3}{2}P^2 - \frac{1}{2}P^3.$$
Given a transistor-defect probability, $P$, and a circuit with $N$ quadded-transistor structures, the probability of circuit failure and circuit reliability are
$$P_f = \sum_{i=1}^{N} (-1)^{i+1}\binom{N}{i}(P_q)^i$$
$$R = 1 - P_f = 1 - \sum_{i=1}^{N} (-1)^{i+1}\binom{N}{i}(P_q)^i.$$
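As a numerical check of the formulas above, the following sketch evaluates $P_q$ and the inclusion-exclusion sum for $P_f$, and compares the sum with the equivalent closed form $1 - (1 - P_q)^N$. The function names are illustrative. The worked case assumes an 8-input NAND gate built from 16 transistors (8 pull-up, 8 pull-down), and that a conventional gate fails whenever any one of its transistors is defective.

```python
from math import comb


def quad_failure_probability(p: float) -> float:
    """P_q = (3/2) P^2 - (1/2) P^3 for one quadded-transistor structure."""
    return 1.5 * p**2 - 0.5 * p**3


def circuit_failure_probability(p_unit: float, n_units: int) -> float:
    """Inclusion-exclusion sum over i = 1..N, as written in the text."""
    return sum((-1) ** (i + 1) * comb(n_units, i) * p_unit**i
               for i in range(1, n_units + 1))


def circuit_failure_closed_form(p_unit: float, n_units: int) -> float:
    """Equivalent closed form: at least one of N independent units fails."""
    return 1.0 - (1.0 - p_unit) ** n_units


if __name__ == "__main__":
    p = 0.1                      # transistor defect probability
    n = 16                       # transistors in a conventional 8-input NAND
    pq = quad_failure_probability(p)
    print("quadded NAND8 failure:     ", round(circuit_failure_probability(pq, n), 4))
    print("conventional NAND8 failure:", round(circuit_failure_probability(p, n), 4))
```

With P = 0.1 this gives roughly 0.21 for the quadded-transistor gate and 0.81 for the conventional gate.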
We assume in this chapter that circuit reliability represents the probability that the circuit will function correctly in the presence of defects. It should be observed that while the result above represents the exact circuit failure probability for stuck-open and stuck-short defects, it represents an upper bound for bridging defects. This is because not all bridging defects that result in a faulty quadded structure result in faulty gate behavior. For example, AND-bridging defects between gates of transistors within the same NAND gate do not change the gate behavior regardless of their multiplicity. Similarly, OR-bridging defects between gates of transistors within the same NOR gate do not change the gate behavior regardless of their multiplicity.
Figure 4 compares the reliability of NAND gates with 2 to 8 inputs implemented using the quadded-transistor structure and the conventional complementary (pull-up, pull-down) CMOS implementation for stuck-open and stuck-short defects. As can be seen, the reliability of gates implemented using the quadded-transistor structure is significantly higher than the reliability of the conventional gate implementation. For example, for an 8-input NAND gate with a transistor failure probability of 10%, the probability of failure for the quadded-transistor structure-based design is 21% (reliability 79%), while the probability of failure for the conventional CMOS implementation is 81% (reliability 19%). Furthermore, as the number of inputs increases, the probability of gate failure increases and reliability decreases, as expected.
Fig. 4 Gate reliability comparison between quadded-transistor structure (Q) and complementary CMOS (reliability versus probability of transistor failure, 0.0001 to 0.1, for NAND2, NAND4, NAND6 and NAND8 gates)
Fig. 5 Defect-tolerant $N^2$-transistor structure
The quadded-transistor structure, given in Fig. 3b, can be generalized to an $N^2$-transistor structure, where $N = 2, 3, \ldots, k$. An $N^2$-transistor structure is composed of N blocks connected in series, with each block composed of N parallel transistors, as shown in Fig. 5. An $N^2$-transistor structure guarantees tolerance of all defects of multiplicity less than or equal to $(N - 1)$ in the structure. Hence, a large number of multiple defects can be tolerated in a circuit implemented based on these structures. Furthermore, it can be shown that the probability of failure for an $N^2$-transistor structure is $O(P^N)$, assuming a transistor-defect probability, $P$. The $N^2$-transistor structure, for $N > 2$, may be applied selectively to critical gates because of its increased overhead.
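The tolerance claim for the $N^2$-transistor structure can be checked by brute force for small N. The sketch below is an illustration rather than the chapter's method: it enumerates every combination of good, stuck-open and stuck-short transistors, assuming that a defective transistor is stuck-open or stuck-short with equal likelihood (the same assumption that underlies the $P_q$ derivation) and that the structure fails when it is stuck off (some block fully open) or stuck on (every block contains a short).

```python
from itertools import product

GOOD, OPEN, SHORT = "good", "open", "short"


def structure_fails(states, n):
    """states: tuple of n*n transistor states, listed block by block."""
    blocks = [states[i * n:(i + 1) * n] for i in range(n)]
    # A block of parallel transistors is stuck off only if all of them are open.
    any_block_open = any(all(s == OPEN for s in b) for b in blocks)
    # A block is stuck on as soon as one of its transistors is shorted.
    all_blocks_short = all(any(s == SHORT for s in b) for b in blocks)
    return any_block_open or all_blocks_short


def failure_probability(p, n):
    """Exact failure probability by enumerating all 3**(n*n) defect patterns."""
    p_state = {GOOD: 1 - p, OPEN: p / 2, SHORT: p / 2}
    total = 0.0
    for states in product(p_state, repeat=n * n):
        if structure_fails(states, n):
            prob = 1.0
            for s in states:
                prob *= p_state[s]
            total += prob
    return total


def min_defects_to_fail(n):
    """Smallest number of defective transistors that can break the structure."""
    return min(sum(s != GOOD for s in states)
               for states in product((GOOD, OPEN, SHORT), repeat=n * n)
               if structure_fails(states, n))
```

For N = 2 the enumeration reproduces $P_q = \frac{3}{2}P^2 - \frac{1}{2}P^3$, and `min_defects_to_fail` returns 2 and 3 for N = 2 and N = 3, consistent with tolerance of up to N - 1 defects. The enumeration grows as $3^{N^2}$, so it is only practical for very small N.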
An interesting advantage of the $N^2$-transistor structure is that it fits well in existing design and test methodologies. In synthesis, a library of gates implemented based on the quadded-transistor structure will be used in the technology mapping process. The same testing methodology will be used, assuming testing is done at the gate level based on the single stuck-at fault model. So, the same test set derived for the original gate-level structure can be used without any change. The gate capacitance that the quadded-transistor structure induces on the gate connected to the input A is four times the original gate capacitance. This has an impact on both delay and power dissipation. However, as shown in [45], a gate with higher load capacitance has better noise rejection curves and hence is more resistant to soft errors resulting in noise glitches.
To determine the area, delay and power impact of the quadded-transistor structure, we have designed, using Magic, two libraries based on the 0.5 µm CMOS Alcatel process. The libraries are composed of three basic cells, inverter (INV), two-input NAND gate (NAND2), and two-input NOR gate (NOR2), based on the quadded-transistor structure and the conventional CMOS implementation. Then, we obtained delay and power characteristics using SPICE simulations based on the extracted netlists. Delay characteristics were calculated after supplying proper load and drive conditions. For all the cells, the drive was composed of two inverters in series and the load was composed of two inverters in parallel. The inverters were chosen from the same library. Dynamic power was measured using the .measure command in SPICE for the same period of time in both libraries. Table 1 summarizes delay, power and area characteristics of the two libraries. The delay and power consumption of cells designed based on the quadded-transistor structure are in the worst case 3.65 times those of the conventional cells, and the cell area is about three times larger.
While the quadded-transistor structure increases the area, this increase is less than that of other gate-level defect tolerance techniques, as will be shown in the experimental results. As with all defect tolerance techniques, the increase in area, power and delay is traded off against higher circuit reliability. This is justified given that nanotechnology is predicted to provide much higher integration densities, speed and power advantages.
Table 1 Area, delay and power values of basic 0.5 µm cells designed using quadded-transistor structure (Fig. 3b) and complementary (pull-up, pull-down) CMOS

Characteristics             INV               NAND2             NOR2
                            CMOS     QT       CMOS     QT       CMOS     QT
Delay (ps)        Fall      270.8    763.0    416.6    1143     285.7    902.5
                  Rise      566.6    1775     606.9    2217     1124     3986
                  TPHL      169.6    469.0    239.1    604.9    180.7    557.6
                  TPLH      300.3    973.3    324.9    1182     548.2    1965
Dyn. power (mW)   Avg.      0.120    0.340    0.175    0.533    0.180    0.542
                  Max.      1.469    2.602    1.709    2.602    1.691    2.606
                  RMS       0.355    0.665    0.431    0.815    0.432    0.810
Area (µm²)                  89       208      128      402      126      397
Experimental Results
To demonstrate the effectiveness of the quadded-transistor structure technique, we have performed experiments on a number of the largest ISCAS85 and ISCAS89 benchmark circuits (replacing flip-flops by inputs and outputs). Two types of permanent defects are analyzed separately: transistor stuck-open and stuck-short defects, and AND/OR bridging defects.
Stuck-Open and Stuck-Short Defect Analysis
For evaluating circuit failure probability and reliability, we adopt the simulation-based reliability model used in [11]. We compare circuit reliability based on the quadded-transistor structure with the approaches compared in [11], including Triple Interwoven Redundancy (TIR) and quadded logic. We use a complete test set T that detects all detectable single stuck-at faults in a circuit. We have used test sets generated by the Mintest ATPG tool [46]. To compute the circuit failure probability, $F_m$, resulting from injecting m defective transistors, we use the following procedure:
1. Set the number of iterations to be performed, I, to 1,000 and the number of failed simulations, K, to 0.
2. Simulate the fault-free circuit by applying the test set T.
3. Randomly inject m transistor defects.
4. Simulate the faulty circuit by applying the test set T.
5. If the outputs of the fault-free and faulty circuits are different, increment K by 1.
6. Decrement I by 1 and if I is not 0 go to step 3.
7. Failure rate $F_m = K/1{,}000$.
Assuming that every transistor has the same defect probability, $P$, and that defects are randomly and independently distributed, the probability of having m defective transistors in a circuit with N transistors follows the binomial distribution [11]:
$$P(m) = \binom{N}{m} P^m (1-P)^{N-m}$$
Treating the number of transistor defects, m, as a random variable and using the circuit failure probability $F_m$ as a failure distribution in m, the probability of circuit failure, F, and circuit reliability, R, are computed as follows [11]:
$$F = \sum_{m=0}^{N} F_m P(m)$$
$$R = 1 - F = 1 - \sum_{m=0}^{N} F_m P(m)$$
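The procedure and formulas above can be sketched in a few lines. This is a simplified stand-in, not the authors' simulator: instead of simulating a benchmark netlist with a test set T, it models the circuit as independent quadded-transistor structures of the Fig. 3b form and declares the circuit failed as soon as any structure is stuck on or stuck off. The function names, the equal split between stuck-open and stuck-short defects, and the fixed random seed are assumptions made for illustration.

```python
import random
from math import comb


def circuit_fails(defects, n_structures):
    """defects maps a transistor index to 'open' or 'short'.

    Transistors 4k..4k+3 form structure k as (T1 || T2) in series with (T3 || T4).
    """
    for k in range(n_structures):
        t = [defects.get(4 * k + j) for j in range(4)]
        blocks = (t[0:2], t[2:4])
        stuck_off = any(all(s == "open" for s in b) for b in blocks)
        stuck_on = all(any(s == "short" for s in b) for b in blocks)
        if stuck_off or stuck_on:
            return True
    return False


def estimate_fm(m, n_structures, iterations, rng):
    """Steps 1-7: fraction of trials in which m random defects break the circuit."""
    n_trans = 4 * n_structures
    failed = 0
    for _ in range(iterations):
        chosen = rng.sample(range(n_trans), m)
        defects = {t: rng.choice(("open", "short")) for t in chosen}
        if circuit_fails(defects, n_structures):
            failed += 1
    return failed / iterations


def circuit_failure(p, n_structures, iterations=1000, seed=0):
    """F = sum over m of F_m * P(m), with P(m) the binomial probability."""
    rng = random.Random(seed)
    n_trans = 4 * n_structures
    total = 0.0
    for m in range(n_trans + 1):
        weight = comb(n_trans, m) * p**m * (1 - p) ** (n_trans - m)
        if weight < 1e-9:
            continue  # negligible contribution, skip the simulation
        total += estimate_fm(m, n_structures, iterations, rng) * weight
    return total
```

For this simplified circuit model the estimate should land close to the theoretical value $1 - (1 - P_q)^N$, with N the number of structures, which mirrors the comparison of theoretical and experimental curves in Fig. 6.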
Fig. 6 Reliability obtained both theoretically (t) and experimentally (e) based on quadded-transistor structure and stuck-open and stuck-short defects (reliability versus transistor failure probability, 0.0001 to 0.1, for c880, c1908, c2670, c6288 and c7552)
Fig. 7 Comparison of circuit failure probability for an eight-stage cascaded half-adder circuit for stuck-open and stuck-short defects (probability of failure versus number of defects, 1 to 19, for TIR, QL, QT and TIR-MQT)
Figure 6 shows the reliability of some of the ISCAS85 benchmark circuits obtained both theoretically and experimentally based on the above simulation procedure and formulas for stuck-open and stuck-short defects. As can be seen, there is an almost identical match, clearly validating the derived theoretical results. In Fig. 7, we compare the probability of circuit failure for a given number of stuck-open and stuck-short defects between the quadded-transistor structure (QT), quadded logic (QL) [11] and TIR logic [11]. It should be observed that TIR is a generalization of TMR logic. The comparison is made based on an 8-stage cascaded half-adder circuit used in [11]. TIR logic is implemented by adding a majority gate for each sum and carry-out signal at each stage. The majority gate is also implemented as a single gate. As can be seen, adding transistor-level defect tolerance generates circuits with lower probability of circuit failure than those that add defect tolerance at gate level (QL) and unit level (TMR). This is in addition to a smaller area overhead in terms of the number of transistors used. The number of transistors in the quadded-transistor structure implementation is 512, while it is 608 in TIR logic and 1,024 in quadded logic. It should be observed that the total number of transistors in the circuit should be taken into account when making the comparison for a given number of defects, as the percentage of injected defects varies.
The probability of circuit failure for TIR and TMR logic can be significantly improved by improving the reliability of majority gates. We have implemented the majority gates in the 8-stage cascaded half-adder TIR logic circuit based on the quadded-transistor structure (TIR-MQT). As shown in Fig. 7, the reliability of the implemented circuit is significantly improved compared to the TIR circuit at the expense of an increased number of transistors (number of transistors = 1,280). This shows an interesting potential application of the $N^2$-transistor structure in improving the reliability of voter-based redundancy techniques, which will be more thoroughly investigated in future work. For TMR to be effective, a careful balance between the module size and the number of majority gates used needs to be made. For this reason, we focus the comparison of the reliability of ISCAS benchmark circuits on the quadded-transistor structure and quadded logic.
A comprehensive comparison of the probability of circuit failure between the quadded-transistor structure and the quadded logic is given in Table 2 for several percentages of injected stuck-open and stuck-short defects. For all the circuits, the quadded-transistor technique achieves significantly lower circuit failure probability than the quadded logic technique for the same and for twice the percentage of injected defects. For ten out of 12 circuits, it achieves lower failure probability with four times the percentage of injected defects. In Table 3, we report the reliability results obtained based on the simulation procedure outlined above for the quadded-transistor structure and quadded logic approaches for several transistor defect probabilities based on stuck-open and stuck-short defects. The effectiveness of the quadded-transistor structure technique is clearly demonstrated by the results, as it achieves higher circuit reliability even at four to five times the transistor defect probability. This is in addition to the observation that it requires nearly half the area of quadded logic, as indicated by the number of transistors.

Table 2 Comparison of circuit failure probability between quadded-transistor structure and quadded logic approaches for stuck-open and stuck-short defects

          Quadded-transistor structure                  Quadded logic
Circuit   #Trans.   0.25%   0.5%    0.75%   1%          #Trans.   0.25%   0.5%    0.75%   1%
c880      7,208     0.015   0.060   0.135   0.237       13,616    0.452   0.783   0.905   0.978
c1355     9,232     0.023   0.082   0.176   0.287       18,304    0.531   0.846   0.975   0.995
c1908     13,784    0.030   0.115   0.248   0.400       24,112    0.673   0.940   0.984   1.000
c2670     22,672    0.047   0.188   0.375   0.569       36,064    0.958   0.999   1.000   1.000
c3540     30,016    0.067   0.238   0.457   0.674       46,976    0.590   0.901   0.996   0.999
c5315     45,048    0.095   0.341   0.614   0.816       74,112    0.991   1.000   1.000   1.000
c6288     40,448    0.085   0.307   0.576   0.787       77,312    0.685   0.962   0.999   1.000
c7552     61,600    0.136   0.441   0.732   0.909       96,816    0.985   1.000   1.000   1.000
s5378     35,608    0.081   0.282   0.521   0.737       59,760    1.000   1.000   1.000   1.000
s9234     74,856    0.166   0.510   0.791   0.939       103,488   0.999   1.000   1.000   1.000
s13207    103,544   0.212   0.625   0.888   0.980       150,448   1.000   1.000   1.000   1.000
s15850    128,016   0.257   0.697   0.936   0.992       171,664   1.000   1.000   1.000   1.000

Table 3 Comparison of circuit reliability between quadded-transistor structure and quadded logic approaches for stuck-open and stuck-short defects

          Quadded-transistor structure                          Quadded logic
Circuit   #Trans.   0.0001  0.001   0.002   0.005   0.01        #Trans.   0.0001  0.001   0.002   0.005   0.01
c880      7,208     0.999   0.997   0.989   0.934   0.767       13,616    0.979   0.822   0.651   0.283   0.042
c1355     9,232     0.999   0.996   0.986   0.917   0.713       18,304    0.975   0.765   0.575   0.187   0.008
c1908     13,784    0.999   0.994   0.979   0.879   0.596       24,112    0.975   0.755   0.558   0.261   0.001
c2670     22,672    0.999   0.991   0.967   0.809   0.427       36,064    0.904   0.350   0.112   0.001   0.000
c3540     30,016    0.999   0.989   0.956   0.755   0.327       46,976    0.981   0.805   0.614   0.237   0.000
c5315     45,048    0.999   0.984   0.935   0.656   0.185       74,112    0.853   0.227   0.034   0.001   0.000
c6288     40,448    0.999   0.986   0.941   0.685   0.222       77,312    0.971   0.718   0.465   0.024   0.000
c7552     61,600    0.999   0.978   0.912   0.562   0.101       96,816    0.874   0.292   0.077   0.000   0.000
s5378     35,608    0.999   0.985   0.948   0.717   0.263       59,760    0.811   0.134   0.015   0.001   0.000
s9234     74,856    0.999   0.972   0.894   0.496   0.061       103,488   0.821   0.140   0.001   0.000   0.000
s13207    103,544   0.999   0.961   0.856   0.379   0.023       150,448   0.518   0.008   0.000   0.000   0.000
s15850    128,016   0.999   0.953   0.825   0.302   0.008       171,664   0.576   0.009   0.000   0.000   0.000
Bridging Defect Analysis
In order to analyze the tolerance of the quadded-transistor structure technique to bridging defects, the same simulation-based model was used. The experiments were performed on the same set of ISCAS circuits. The bridging defects were injected randomly between the gates of the defective transistor and one of its neighbors, located within a window of local transistors (±8 transistors). AND and OR bridging defects were injected in equal numbers. It should be observed that for injecting m defective transistors due to bridges, only m/2 bridges need to be injected.
Table 4 Comparison of circuit failure probability between quadded-transistor structure and quadded logic approaches for bridging defects
Table 4 shows the results obtained for several percentages of injected bridging defects. As can be seen, the quadded-transistor technique exhibits a much lower failure probability than the quadded-logic technique. It achieves lower failure rates than quadded logic for the same and for twice the percentage of injected bridging faults. For 0.25% of injected defects, it achieves failure rates nine times lower than quadded logic, and three times lower for 0.5% of injected defects, in most of the circuits. It should be observed that for the same percentage of defective transistors, the failure rate for bridging defects is less than that of stuck-open and stuck-short defects. This is because not all bridging defects result in faulty gate behavior.
Conclusion

In this chapter, we have investigated a defect-tolerant technique based on adding redundancy at the transistor level. The proposed technique provides defect tolerance against a large number of permanent defects, including stuck-open, stuck-short and bridging defects. Experimental results have demonstrated that the proposed technique provides significantly lower circuit failure probability and higher reliability than recently investigated techniques based on gate-level redundancy (quadded logic) and unit-level redundancy (triple modular redundancy). This improvement is achieved with less area overhead, for example 50% fewer transistors than quadded logic. The results have been investigated theoretically and by simulation using large ISCAS 85 and 89 benchmark circuits.

Whilst the chapter focused on tolerance of transistor defects, the proposed technique is also capable of tolerating defects in interconnect, which is seen by many researchers as a source of unreliability in nanodevices. This can be achieved by using four parallel interconnect lines to connect the driving gate to the four transistors in a quadded-transistor structure. This attractive feature adds to the credibility of the proposed approach for reliable nanoelectronics. In order to reduce overhead while maintaining design reliability, the proposed transistor-level technique can be applied effectively in hybrid design approaches combining expensive but reliable gates with cheap and less reliable gates. For example, in a TMR-based design the reliability of voter gates can be significantly enhanced using the proposed quadded-transistor approach, while other gates are implemented using a less reliable implementation.

Acknowledgments This work is supported by King Fahd University of Petroleum & Minerals under project COE/Nanoscale/388 for Dr. El-Maleh and Dr. Al-Yamani. It is also funded in part by the EPSRC/UK grant no. EP/E035965/1 for Prof. Al-Hashimi. The authors would like to thank Mr. Nadjib Mammeri for his help in this work.
Fault-Tolerant Design for Nanowire-Based Programmable Logic Arrays

Yexin Zheng and Chao Huang
Abstract As complementary metal-oxide semiconductor (CMOS) technology faces increasingly severe physical barriers on the path of continuous feature-size scaling, innovative nano-scale devices have been developed to enhance future circuit design and computation. Among them, programmable logic arrays (PLAs) using self-assembled nanowire crossbars have shown promising potential for future nano-scale circuit design. However, due to the density and size of nanowires and molecular switches, the fabrication fault densities are much higher than in conventional silicon technology, and hence pose greater design challenges. To improve the reliability of nanoelectronic PLA designs, various electronic design automation tools with an emphasis on defect tolerance have been developed. In this chapter, we discuss some promising examples of defect-tolerant techniques for nano-PLAs, specifically the approaches based on information redundancy and defect-aware configuration. We summarize some of our own results as well as other works in this area and provide some future research directions.

Keywords Fault tolerance · Nanowire · Programmable logic arrays · Design method · Satisfiability
Introduction

Novel nano-scale electronic devices have been proposed to enhance or possibly replace the conventional complementary metal-oxide semiconductor (CMOS) technology for future circuit system design [1, 2]. For these emerging devices, however, the traditional top-down fabrication process becomes difficult and costly as feature sizes keep shrinking. The alternative approach, a bottom-up self-assembly process, shows promising potential to fabricate nanoelectronic circuits more economically and precisely [3, 4].

Y. Zheng and C. Huang, Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia, 24061
C. Huang (ed.), Robust Computing with Nano-scale Devices: Progresses and Challenges, Lecture Notes in Electrical Engineering 58, DOI 10.1007/978-90-481-8540-5_4, © Springer Science+Business Media B.V. 2010
The lithography-independent self-assembly process features fabrication regularity, which is well suited to implementing reconfigurable structures [5]. Architectures such as the nanowire-based programmable logic array (PLA) have thus recently become active research topics [6–8].

Due to the density and size of nanowires and molecular switches, however, PLA circuit defects are inevitable during the non-deterministic self-assembly process. The defect rates can be relatively high: about 15% of the molecular-switch crosspoints were observed to be defective in a recently fabricated 8 × 8 crossbar [9]. This figure is orders of magnitude larger than in conventional silicon technologies. Therefore, advanced fault-tolerant design methods are essential to take full advantage of emerging nanotechnologies.

In this chapter, we focus on the defect-tolerance techniques and tools that aid reliable implementation on defective nanoelectronic PLA structures. There are two basic approaches to dealing with high defect rates: design with redundant logic, and defect-aware configuration to bypass defects. As the smaller feature size makes it relatively easy to place a larger number of computation units on chip, redundancy-based defect-tolerant techniques can obtain higher reliability by adding redundant circuitry, for example the conventional approach of triple modular redundancy (TMR) and approaches specially tuned for nanodevices. Techniques of defect-aware configuration, which rely on information about the defective devices, can configure a reliable implementation by utilizing only defect-free units.

We first introduce the background material on the PLA structure and defect models in section Background. Sections Redundancy-based defect-tolerant techniques and Defect-aware logic mapping for nanowire-based PLA discuss the redundancy-based and defect-aware defect-tolerance methodologies for nanoelectronic PLAs, respectively. The experimental results are presented in section Experimental results. Finally, section Conclusions and perspectives concludes the chapter.
Background

Nanowire-Based PLA

Nanowire-based PLA structures [10, 11] are widely used as the essential computation units in novel nanoelectronic architectures or hybrid CMOS/nano architectures. The nanoFabric [7] provides one example of a programmable architecture based on nanowire PLAs. It is a 2-D mesh of nanoBlocks, the basic structures implementing logic functions or interconnections with nanowire crossbars. The nanoPLA [6] consists of crossed sets of parallel nanowires serving as the basic logic structure to provide a programmable NOR plane. Two back-to-back NOR planes can implement arbitrary logic functions, since by De Morgan's laws the NOR-NOR structure is equivalent to the traditional AND-OR PLA [6].

In these works, the nanoelectronic PLA structures based on molecular crossbars are the main building blocks. Generally speaking, a crossbar contains two groups of parallel nanowires, which are perpendicular to each other, and molecules at the intersections (crosspoints) that act as programmable switches. By using pullup or pulldown resistors to configure the crosspoint switches, the crossbars can implement logic AND or OR functions.

A nanowire-based PLA circuit is shown in Fig. 1. It is an AND/OR PLA implementing Boolean functions in sum-of-products form. The AND plane selectively generates the product terms of the desired Boolean functions, while the OR plane sums the corresponding product terms to produce the circuit outputs. This selection process, i.e., the PLA mapping process, is accomplished by programming the nanodevices at the crosspoints. Mapping an arbitrary set of Boolean functions onto a defect-free PLA is straightforward. Figure 1 demonstrates an example of realizing the functions $f_1 = \bar{a}\bar{b} + b\bar{c} + a\bar{c}$ and $f_2 = \bar{a}\bar{b} + ac$.

Fig. 1 Logic function mapping on defect-free PLA
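As a rough illustration of the AND/OR mapping described above, the following Python sketch evaluates a defect-free PLA from two programming matrices. The function name, the Boolean matrix encoding and the small test function f = ab + cd are assumptions chosen for this example, not the configuration shown in Fig. 1.

```python
def evaluate_pla(and_plane, or_plane, literals):
    """Evaluate a two-level AND/OR PLA (hypothetical encoding).

    and_plane[p][k] is True when literal k is programmed into product term p.
    or_plane[o][p] is True when product term p is programmed into output o.
    literals holds the Boolean values driving the AND-plane input wires.
    """
    # Each product term is the AND of the literals programmed on its row.
    terms = [all(value for value, programmed in zip(literals, row) if programmed)
             for row in and_plane]
    # Each output is the OR of the product terms programmed on its row.
    return [any(term for term, programmed in zip(terms, row) if programmed)
            for row in or_plane]


# Assumed example: f = ab + cd with input wires ordered a, b, c, d.
AND_PLANE = [[True, True, False, False],   # product term ab
             [False, False, True, True]]   # product term cd
OR_PLANE = [[True, True]]                  # f sums both product terms

outputs = evaluate_pla(AND_PLANE, OR_PLANE, [True, True, False, False])
```

Programming a crosspoint then corresponds to setting one matrix entry, which is why mapping onto a defect-free array reduces to filling in the two matrices.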
Defect Model

Due to the density and small size of nanowires and molecular switches, circuit defects are unfortunately inevitable and pose great challenges to designers. In general, the molecular PLA defects can be categorized as follows.

Switch Stuck-Open Fault

The switch connecting two perpendicular nanowires is stuck-open, that is, the switch is preprogrammed as "OFF". Although the configurability of this specific junction is lost, the two related nanowires can still be used, provided the faulty switch is left open during the mapping. A stuck-open example, marked in Fig. 2a, is the faulty switch at the crosspoint of horizontal nanowire h2 and vertical nanowire v1. We can still map a product term and a variable to nanowires h2 and v1, respectively, if the variable mapped to v1 does not belong to the product term mapped to h2.

Switch Stuck-Closed Fault

The switch connecting two perpendicular nanowires is stuck-closed, that is, the switch is preprogrammed as "ON". In this case, we can either generate a mapping with this switch turned on, or use the vertical (horizontal) nanowire and leave the horizontal (vertical) nanowire unused in the AND (OR) plane. For the switch stuck-closed fault marked in Fig. 2a at the crosspoint of v3 and h3, we can either leave h3 unused, or map a variable to v3 and a product term containing this variable to h3.

Nanowire Broken Fault

A nanowire in the PLA is broken into two or more segments. In this situation, the segment that connects to the PLA input/output can still be used, such as the crosspoint at the intersection of the broken nanowire v7 and horizontal nanowire h1 shown in Fig. 2a.

Other Faults That Result in Unusable Nanowires

Since a completely unusable nanowire can be removed from the PLA model, these faults are not considered in the mapping algorithms.

Figure 2b illustrates a possible mapping of the same functions as in Fig. 1 onto the defective PLA given in Fig. 2a, in which the various aforementioned defects are present.

Fig. 2 Defective PLA: (a) defect model, and (b) defect-aware mapping
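The stuck-open and stuck-closed rules above can be expressed as a small feasibility check for an AND-plane assignment. This is a minimal sketch under assumed names and encodings (the dictionaries, the constants and the function are hypothetical), and it takes a conservative reading of the stuck-closed rule: a used horizontal wire must actually need the closed switch.

```python
STUCK_OPEN, STUCK_CLOSED = "open", "closed"


def and_plane_assignment_ok(row_terms, col_vars, defects):
    """Check an AND-plane assignment against switch defects (illustrative only).

    row_terms: dict horizontal wire -> set of literals of the product term
               mapped to it (unused wires are absent).
    col_vars:  dict vertical wire -> literal mapped to it (unused wires absent).
    defects:   dict (h, v) -> STUCK_OPEN or STUCK_CLOSED.
    """
    for (h, v), kind in defects.items():
        if h not in row_terms:
            continue  # an unused horizontal wire makes either defect harmless
        # Does the mapping need a connection at this crosspoint?
        needed = v in col_vars and col_vars[v] in row_terms[h]
        if kind == STUCK_OPEN and needed:
            return False  # required switch cannot be turned on
        if kind == STUCK_CLOSED and not needed:
            return False  # switch is on although the term must not see v
    return True


# Defects taken from the text: stuck-open at (h2, v1), stuck-closed at (h3, v3).
DEFECTS = {("h2", "v1"): STUCK_OPEN, ("h3", "v3"): STUCK_CLOSED}
```

For instance, mapping a term that does not contain the variable on v1 to h2 passes the check, while mapping a term that does contain it fails.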
Redundancy-Based Defect-Tolerant Techniques

The high defect density of nano-PLA structures requires novel approaches to tolerate defects in the design process. One design approach to tackling high defect rates is to use fault masking by introducing redundant information. These techniques, which are efficient in improving design reliability, are defect-unaware and do not require a fault diagnosis preprocess.
Conventional Redundancy-Based Techniques

The early progress of fault masking via redundancy can be traced back to John von Neumann's work on creating reliable systems from unreliable components [12]. Based on von Neumann's concept, techniques such as triple modular redundancy (TMR) and N-modular redundancy (NMR) have been proposed, which can also be employed in nanocomputing systems.

TMR, as shown in Fig. 3a, is a structure with three identical functional units working on the same inputs in parallel, whose outputs are compared by a majority gate (a voter). Ideally, the three copies of hardware produce the same outputs if they are defect-free. Since the majority voter selects the most common output among the three units, at most one of them can suffer from an invalid output due to circuit defects. That is to say, the TMR structure can tolerate multiple defects as long as they exist in only a single copy of the function unit. Defects that exist simultaneously in more than one function unit will lead to faulty outputs. The TMR structure can be easily adapted to nanoelectronic PLAs with the majority voter implemented as $f = f_1 f_2 + f_1 f_3 + f_2 f_3$. The major overhead of TMR is the increased circuit area, which can be absorbed by the ample hardware resources that nanoelectronic PLAs provide.

NMR is an extension of TMR. As illustrated in Fig. 3b, it uses N rather than three copies of the hardware to tolerate multiple defects in $\lfloor N/2 \rfloor$ copies of the function unit. Some other variations on the TMR approach have been proposed, such as cascaded TMR structures [13]. However, the fault tolerance capability of these structures may not be sufficient for nanotechnologies with high defect rates. Additionally, the reliability of the voters becomes a dominant concern in these fault tolerance methods.

Fig. 3 Conventional redundancy-based fault tolerant techniques: (a) TMR and (b) NMR
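The voter expression above can be illustrated with a short Python sketch. The functional units here are hypothetical stand-ins (a two-input AND and a copy whose output is stuck at 0); only the majority expression itself comes from the text.

```python
def majority(f1, f2, f3):
    """Voter f = f1 f2 + f1 f3 + f2 f3 on Boolean values."""
    return (f1 and f2) or (f1 and f3) or (f2 and f3)


def tmr(units, inputs):
    """Run three copies of a functional unit on the same inputs and vote."""
    f1, f2, f3 = (unit(inputs) for unit in units)
    return majority(f1, f2, f3)


def good_unit(inputs):
    """Assumed defect-free functional unit: a two-input AND."""
    a, b = inputs
    return a and b


def stuck_at_zero(inputs):
    """Assumed defective copy whose output is stuck at 0."""
    return False


# One defective copy is masked; two defective copies are not.
masked = tmr([good_unit, good_unit, stuck_at_zero], (True, True))
unmasked = tmr([good_unit, stuck_at_zero, stuck_at_zero], (True, True))
```

The sketch treats the voter itself as perfect, which is exactly the assumption the text flags as a concern at high defect rates.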
Logic Duplication in PLA

A class of fault masking approaches that exploit logic duplication is proposed in [14]. This method targets the missing device fault (switch stuck-open fault) at PLA crosspoints. For a logic function implemented on a two-level PLA, a missing device fault in the AND plane results in a variable missing from a product term (e.g., $f = ab + cd$ becomes $f = b + cd$), while a missing device fault in the OR plane leads to a product term missing from the function (e.g., $f = ab + cd$ becomes $f = cd$). Consider the example logic function $f = ab + cd$ shown in Fig. 4a. To mask variables or terms lost to missing device faults, the implementation with duplicated input variables $f_{AND} = abab + cdcd$ (Fig. 4b) or duplicated product terms $f_{OR} = ab + cd + ab + cd$ (Fig. 4c) can be utilized for the AND plane and OR plane, respectively. A totally redundant form $\hat{f} = abab + cdcd + abab + cdcd$, illustrated in Fig. 4d, can mask the missing device faults in both the AND and OR planes.

Fig. 4 Fault masking of two-level PLA: (a) $f = ab + cd$, (b) $f_{AND} = abab + cdcd$, (c) $f_{OR} = ab + cd + ab + cd$, and (d) $\hat{f} = abab + cdcd + abab + cdcd$

The logic duplication approach for PLA structures can be further extended by implementing multiple copies of the original function. Theoretically, if one of the copies is implemented on a defect-free portion of the PLA, the redundant function will correctly represent the desired functionality. Therefore, the redundancy-based approach proposed in [14] achieves higher defect tolerance than conventional fault masking approaches such as TMR or NMR. Moreover, the proposed method is more efficient for PLA structures in terms of hardware requirements. The additional hardware cost of the majority voter, the critical component in TMR or NMR, is avoided in this PLA masking scheme.

Fault masking techniques are fault-independent, so they do not need a fault diagnosis step to locate all the defects before circuit configuration. They are scalable and demonstrate good reliability and performance under relatively low defect rates. However, they cannot guarantee a fault-free implementation for all defect patterns, especially under relatively high defect rates.
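To make the masking argument concrete, the sketch below injects every possible single missing-device fault into a PLA for f = ab + cd and checks whether the function is preserved. The set-based encoding of programmed crosspoints and all function names are assumptions made for this illustration; it is not the evaluation procedure of [14].

```python
from itertools import product


def evaluate(and_plane, or_row, inputs):
    """and_plane: list of sets of programmed input columns, one per product row.
    or_row: set of product rows programmed into the single output."""
    terms = [all(inputs[k] for k in row) for row in and_plane]
    return any(terms[p] for p in or_row)


def single_fault_variants(and_plane, or_row):
    """Yield every PLA obtained by dropping exactly one programmed device."""
    for p, row in enumerate(and_plane):
        for k in row:
            faulty = [set(r) for r in and_plane]
            faulty[p].discard(k)          # missing device in the AND plane
            yield faulty, set(or_row)
    for p in or_row:
        yield [set(r) for r in and_plane], set(or_row) - {p}  # OR plane


def tolerates_single_missing_device(and_plane, or_row, copies):
    """True if f = ab + cd survives every single missing-device fault.
    copies is the number of times the inputs a, b, c, d are duplicated."""
    for faulty_and, faulty_or in single_fault_variants(and_plane, or_row):
        for a, b, c, d in product([False, True], repeat=4):
            inputs = [a, b, c, d] * copies
            if evaluate(faulty_and, faulty_or, inputs) != ((a and b) or (c and d)):
                return False
    return True


# Plain form f = ab + cd (columns a, b, c, d).
PLAIN_AND, PLAIN_OR = [{0, 1}, {2, 3}], {0, 1}
# Totally redundant form abab + cdcd + abab + cdcd (columns a, b, c, d, a, b, c, d).
RED_AND = [{0, 1, 4, 5}, {2, 3, 6, 7}, {0, 1, 4, 5}, {2, 3, 6, 7}]
RED_OR = {0, 1, 2, 3}
```

Under this encoding the plain form fails the check while the totally redundant form passes it, which mirrors the examples $f = b + cd$ and $f = cd$ given in the text.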
Defect-Aware Logic Mapping for Nanowire-Based PLA

For molecular PLAs, malfunctioning crosspoints and broken nanowires impose many topological constraints on logic synthesis. Apart from the redundancy-based defect-tolerance techniques discussed in section Redundancy-based defect-tolerant techniques, other research focuses on mapping logic functions onto a defective PLA without logic duplication. Such design methods are based on prior knowledge of the existing defects, which is usually provided as a defect map by defect diagnosis or testing procedures. The resulting implementations, which depend on the defect types and locations on the chip, can vary greatly from one to another.

As demonstrated by the example shown in Fig. 2, mapping logic functions exactly onto a defective PLA is nontrivial. It can be difficult to find a feasible solution if the defect rate is high. Heuristics have been developed to address this design automation problem [15–17]. In [16], the logic mapping is performed on defect-free subsets identified in a defective crossbar, so that defect information is transparent to high-level designs. The work in [17] models the PLA synthesis problem as embedding a logic function bipartite graph into a crossbar bipartite graph, and develops heuristics to help prune impossible mappings. In this section, we present these methods in detail, as well as defect-aware logic mapping on nanowire-based PLAs via satisfiability (SAT).
Defect-Free Subset Identification

The design flow proposed in [16] essentially identifies defect-free subsets in a defective crossbar, so that the high-level configuration steps are relieved of the existing defects. Specifically for the nanoelectronic PLA structure, a greedy mapping algorithm is constructed to find a maximum $k \times k$ defect-free subset within a defective $N \times N$ crossbar for logic function mapping. The problem is modeled as the maximum biclique problem in a bipartite graph.

A bipartite graph, denoted as $B(X, Y, E)$, is a graph whose vertex set can be partitioned into two disjoint sets $X$ and $Y$ such that every edge connects a vertex in $X$ and a vertex in $Y$. A biclique is a complete bipartite graph, in which every vertex in $X$ is connected to every vertex in $Y$; it is denoted as $K_{|X|,|Y|}$. An $N \times N$ crossbar can be represented by a bipartite graph $B(H, V, E)$, where $H$ and $V$ are the sets of horizontal and vertical nanowires, respectively, and $E$ is the set of defect-free programmable switches at the crosspoints. An edge $e_{hv}$ belongs to set $E$ if the switch at the crosspoint of horizontal nanowire $h$ ($h \in H$) and vertical nanowire $v$ ($v \in V$) is programmable. Specifically, if the crossbar is defect-free, we have $|E| = |H| \cdot |V|$ and the graph is a complete bipartite graph, in other words, a biclique.

In a bipartite graph representation of the $N \times N$ crossbar, a $k \times k$ ($k < N$) biclique corresponds to a defect-free $k \times k$ subset. Therefore, identifying a defect-free subset can be modeled as finding the maximum biclique in a bipartite graph, which is NP-complete. To solve this intractable problem, the work in [16] converts it to finding the maximum independent set in the complement bipartite graph, which is then solved by a greedy heuristic. The complement bipartite graph of $B(H, V, E)$ is defined as $\bar{B}(H, V, \bar{E})$, where $\bar{E} = K_{|H|,|V|} - E$. An independent set of a graph is a set of vertices no two of which are connected to each other.

The greedy heuristic to find the maximum independent set in $\bar{B}$ works as follows: the vertices with zero degree (isolated vertices) are added to the independent set if available; otherwise the vertex with maximum degree is removed, alternating between set $H$ and set $V$, until new isolated vertices appear in the remaining graph. The heuristic terminates when the remaining graph is empty. The defect-free subset obtained by this heuristic is a $k_1 \times k_2$ crossbar rather than $k \times k$. However, because vertices are removed from set $H$ and set $V$ alternately, the values of $k_1$ and $k_2$ are close to each other, resulting in a nearly square defect-free crossbar subset.

Let us take the AND plane of the defective PLA in Fig. 2 as an example, illustrated in Fig. 5. For the switch stuck-closed fault at the crosspoint of h3 and v3, the heuristic [16] treats both nanowires h3 and v3 as unusable, and therefore the corresponding vertices are not included in the bipartite graph shown in Fig. 5b. The bipartite graph $B$ is then converted to its complement $\bar{B}$ (Fig. 5c). In $\bar{B}$, both h1 and v4 are isolated vertices and are added to the independent set. Thereafter, all the remaining vertices have a degree of one, and h2 is picked for removal, resulting in a new isolated vertex v1. Then we continue by removing the vertex of maximum degree in set $V$, v2 for instance, and h5 is added to the independent set. By following this process, we eventually obtain the independent set {h1, h5, v1, v4, v5}, representing a 2 × 3 defect-free crossbar.

Fig. 5 Example of defect-free subset identification: (a) defective PLA, (b) bipartite graph $B(H, V, E)$, and (c) complement bipartite graph $\bar{B}(H, V, \bar{E})$

The greedy heuristic [16] can generate a biclique whose size is comparable to those obtained by exact algorithms, while the time complexity of the heuristic is only $O(n \log n)$. By storing only the information on defect-free subsets rather than the locations of all the defects, the size of the defect map is reduced from $O(n^2)$ to $O(n)$. More importantly, it provides a defect-unaware design flow for high-level designs. One major limitation of this approach, however, is that considering only the defect-free subset crossbars greatly reduces the hardware utilization ratio.
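The greedy procedure described above can be sketched in Python as follows. This is an illustrative reading of the heuristic, not the implementation of [16]: the function name, the tie-breaking rule (first wire in list order) and the defect list, which was chosen so that the run mirrors the walkthrough in the text, are all assumptions, and this simple version does not aim at the stated $O(n \log n)$ complexity.

```python
def greedy_defect_free_subset(h_wires, v_wires, defects):
    """Greedy independent set in the complement bipartite graph.

    defects is a collection of (h, v) crosspoints that are NOT programmable;
    each one is an edge of the complement graph. Returns (rows, cols) of a
    defect-free sub-crossbar.
    """
    adj = {("H", h): set() for h in h_wires}
    adj.update({("V", v): set() for v in v_wires})
    for h, v in defects:
        adj[("H", h)].add(("V", v))
        adj[("V", v)].add(("H", h))

    independent = []
    side = "H"  # side from which the next max-degree vertex is removed
    while adj:
        isolated = [node for node, nbrs in adj.items() if not nbrs]
        if isolated:
            # Isolated vertices join the independent set.
            for node in isolated:
                independent.append(node)
                del adj[node]
            continue
        # No isolated vertex: drop the max-degree vertex on the current side.
        candidates = [node for node in adj if node[0] == side]
        victim = max(candidates, key=lambda node: len(adj[node]))
        for nbr in adj.pop(victim):
            adj[nbr].discard(victim)
        side = "V" if side == "H" else "H"

    rows = sorted(w for kind, w in independent if kind == "H")
    cols = sorted(w for kind, w in independent if kind == "V")
    return rows, cols


# Assumed defect list (h3 and v3 already removed as unusable).
rows, cols = greedy_defect_free_subset(
    ["h1", "h2", "h4", "h5"], ["v1", "v2", "v4", "v5"],
    [("h2", "v1"), ("h5", "v2"), ("h4", "v5")])
```

With this assumed defect list the sketch removes h2, v2 and h4 in turn and keeps two rows and three columns, matching the 2 × 3 outcome of the walkthrough.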
Defect-Aware Mapping Through Bipartite Graph Embedding To take full advantage of the partially functional portions of defective PLAs, a defect-aware mapping algorithm is proposed in [17] by modeling and solving the
Fault-Tolerant Design for Nanowire-Based Programmable Logic Arrays
a
v1 v2 v3 v4 v5 h1 h2
b
59
c
vertex set H
vertex set V
vertex set I
vertex set P
h1
v1
a
ac
h2
v2
a
ab
h4
v4
b
ac
h5
v5
c
bc
h3 h4 h5
AND plane Switch stuck open Switch stuck closed Nanowire broken
Fig. 6 Example of bipartite graph: (a) defective PLA, (b) nanowire crossbar bipartite graph B.H; V; E/, and (c) logic function bipartite graph B 0 .P; I; E 0 /
logic mapping as a bipartite graph embedding problem. As discussed in section Defect-free subset identification, the bipartite graph model is an accurate and efficient representation of defective PLA topologies. The authors of [17] also use a bipartite graph, called the nanowire crossbar bipartite graph, to model the PLA structure. Logic functions are modeled as bipartite graphs as well. A two-level logic function in sum-of-products form can be represented as a logic function bipartite graph $B'(P, I, E')$, where $P$ and $I$ are the sets of product terms and input variables of the logic function, respectively. An edge $e_{pi}$ is added to the graph if and only if product $p$ contains input variable $i$.

Figure 6 demonstrates an example of realizing two sum-of-products functions on a defective PLA: $f_1$, with product terms over $\{a, b\}$ and $\{b, c\}$, and $f_2$, with product terms over $\{a, b\}$ and $\{a, c\}$; the literals of each product term are those shown in Fig. 6c. Here, only the AND plane configuration is considered. Figure 6b is the bipartite graph of the nanowire crossbar, while Fig. 6c shows the logic function bipartite graph representing the relationship between input variables and product terms of $f_1$ and $f_2$.

Configuring a logic function on a nanowire-based PLA means mapping the product terms and input variables to the horizontal and vertical nanowires, respectively, such that the relationship between the inputs and products can be realized by programming the corresponding switches at the nanowire crosspoints. This process is equivalent to embedding the logic function bipartite graph $B'(P, I, E')$ into the crossbar bipartite graph $B(H, V, E)$. The embedding can be formalized as finding a vertex mapping $M: P \to H,\ I \to V$ such that for all $e_{pi} \in E'$, we have $e_{M(p)M(i)} \in E$. The algorithm proposed in [17] explores the search space recursively: at each step it picks an unmapped element of $P$ or $I$ and maps it to an unused nanowire in $H$ or $V$. If a decision leads to an infeasible mapping, it backtracks and tries another alternative, until a mapping solution is found or the whole search space has been explored. Several pruning techniques based on vertex degrees are used to reduce the search space.
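The recursive embedding described above can be sketched as a plain backtracking search. This is a minimal illustration, not the algorithm of [17]: the function name `embed`, the degree-first ordering, and the toy crossbar are assumptions, and the edges of the crossbar graph are simply taken to be the usable crosspoints. Product, input, and nanowire names are assumed to be distinct strings.

```python
from typing import Dict, List, Optional, Set, Tuple


def embed(products: List[str], inputs: List[str], func_edges: Set[Tuple[str, str]],
          h_wires: List[str], v_wires: List[str],
          bar_edges: Set[Tuple[str, str]]) -> Optional[Dict[str, str]]:
    """Return a mapping M (products -> h_wires, inputs -> v_wires) such that
    (M[p], M[i]) is in bar_edges for every (p, i) in func_edges, or None."""
    # Visit high-degree vertices first: they are the hardest to place.
    degree = {x: 0 for x in products + inputs}
    for p, i in func_edges:
        degree[p] += 1
        degree[i] += 1
    order = sorted(degree, key=lambda x: -degree[x])
    mapping: Dict[str, str] = {}
    used: Set[str] = set()

    def fits(vertex: str, wire: str) -> bool:
        # Every edge to an already-mapped neighbour needs a usable crosspoint.
        for p, i in func_edges:
            if p == vertex and i in mapping and (wire, mapping[i]) not in bar_edges:
                return False
            if i == vertex and p in mapping and (mapping[p], wire) not in bar_edges:
                return False
        return True

    def search(k: int) -> bool:
        if k == len(order):
            return True
        vertex = order[k]
        for wire in (h_wires if vertex in products else v_wires):
            if wire in used or not fits(vertex, wire):
                continue
            mapping[vertex] = wire
            used.add(wire)
            if search(k + 1):
                return True
            del mapping[vertex]      # backtrack and try the next nanowire
            used.discard(wire)
        return False

    return dict(mapping) if search(0) else None


# Hypothetical 3x3 AND plane with two unusable crosspoints.
H, V = ["h1", "h2", "h3"], ["v1", "v2", "v3"]
usable = {(h, v) for h in H for v in V} - {("h1", "v1"), ("h2", "v2")}
edges = {("p1", "a"), ("p1", "b"), ("p2", "b"), ("p2", "c")}
print(embed(["p1", "p2"], ["a", "b", "c"], edges, H, V, usable))
```

Checking each candidate only against already-mapped neighbours keeps the feasibility test local; a real implementation would add the stronger degree-based pruning mentioned in the text.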
Fig. 7 Example of bipartite graph embedding: (a) embedding result, and (b) corresponding PLA configuration (AND plane; legend: switch stuck open, switch stuck closed, nanowire broken)
The embedding result of the logic function bipartite graph $B'$ (Fig. 6c) into the crossbar bipartite graph $B$ (Fig. 6b) is shown in Fig. 7a, with the edges of $B'$ highlighted in bold. This embedding yields the feasible mapping solution shown in Fig. 7b.
SAT-Based Defect-Aware Mapping

SAT is the problem of deciding whether there exists an assignment to the variables of a propositional formula that makes the formula true. Such problems are usually formulated in conjunctive normal form (CNF), which is a conjunction (logical AND) of clauses, where each clause is a disjunction (logical OR) of one or more literals. SAT-based methods have been widely used to solve complex problems in electronic design automation, including combinational equivalence checking [18], model checking [19], and routing [20]. Given the efficiency and scalability of current SAT solvers [21–23], it is beneficial to employ SAT-based methods in the context of emerging nano-technology design.

We have developed techniques to efficiently formulate PLA logic mapping as Boolean CNF formulas. The PLA defects, including switch stuck-open faults, switch stuck-closed faults, nanowire broken faults, and other faults that result in unusable nanowires, are integrated as constraints alongside the covering and closure constraints. Compared with the prior works on defect-tolerant mapping, which tackle a single crossbar at a time [16, 17], our technique considers defects on both the input (AND) and output (OR) planes at the same time, and generates the mapping on these two crossbars jointly. This comprehensive approach can help to solve mapping problems with higher defect rates. This work is also among the first to consider assignments on broken nanowires and stuck-closed switches, which are treated as completely defective/unusable
in the prior works. Moreover, the proposed method is not limited to the AND/OR PLA structure but is inherently suitable for other nano-scale structures, such as the NOR/NOR PLAs [6]. We further investigate the impact of different defects on PLA mapping, which provides an initial basis for yield estimation and utilization of defective PLAs.

A nanowire-based PLA architecture (Fig. 1), with an $|H| \times |V_a|$ crossbar for the AND plane and an $|H| \times |V_o|$ crossbar for the OR plane, is a 3-tuple $C(H, V_a, V_o)$, where $H$ is the set of horizontal nanowires and $V_a$ and $V_o$ are the sets of vertical nanowires in the AND and OR planes, respectively. A set of logic functions can also be expressed as a 3-tuple $F(I, P, O)$, where $I$, $P$, and $O$ are the sets of input variables, product terms, and outputs, respectively. Given a PLA $C(H, V_a, V_o)$ and a function set $F(I, P, O)$, the PLA mapping problem can be defined as finding a mapping between $C$ and $F$ such that there exist injective mappings $g_1: I \to V_a$, $g_2: P \to H$, and $g_3: O \to V_o$.

In order to model the PLA mapping as a satisfiability problem, we encode the mapping by introducing Boolean variables and formulating the constraints as SAT clauses. For the given input variables $I$ and vertical nanowires $V_a$ of the AND plane, we use $|I| \times |V_a|$ Boolean variables $X_I^{V_a} = \{x_i^v \mid i \in I, v \in V_a\}$ to represent the injective mapping $g_1: I \to V_a$. If the $i$th input variable is mapped to the $v$th vertical nanowire, variable $x_i^v$ is true; otherwise it is false. Similarly, we introduce $|P| \times |H|$ Boolean variables $Y_P^H = \{y_p^h \mid p \in P, h \in H\}$ to encode the mapping $g_2: P \to H$ from the product set to the horizontal nanowire set, and $|O| \times |V_o|$ Boolean variables $Z_O^{V_o} = \{z_o^v \mid o \in O, v \in V_o\}$ for the mapping $g_3: O \to V_o$.

To encode the constraints, let us consider the requirements for a feasible mapping solution. The requirements fall into two types: covering constraints and closure constraints.
The covering constraints ensure that each input, product term, or output is mapped to at least one nanowire. The closure constraints ensure that no input, product term, or output of the function and no nanowire is mapped more than once. For the injective mapping $g_1: I \to V_a$, the constraints can be summarized as follows.

Covering Constraints Each input must be assigned to at least one vertical nanowire:

$$\bigwedge_{i \in I} \Big( \bigvee_{v \in V_a} x_i^v \Big)$$

Closure Constraints The first closure constraint is that each input must be assigned to at most one vertical nanowire. In other words, for each pair of variables $x_i^{v_1}$ and $x_i^{v_2}$, at least one is assigned to zero (false).

$$\bigwedge_{i \in I} \ \bigwedge_{\substack{v_1 \neq v_2 \\ v_1, v_2 \in V_a}} \big( \neg x_i^{v_1} \vee \neg x_i^{v_2} \big)$$
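Another closure constraint ensures that at most one input is assigned to each nanowire. In other words, for each variable pair $x_{i_1}^v$ and $x_{i_2}^v$, at least one is assigned to zero.

$$\bigwedge_{v \in V_a} \ \bigwedge_{\substack{i_1 \neq i_2 \\ i_1, i_2 \in I}} \big( \neg x_{i_1}^v \vee \neg x_{i_2}^v \big)$$

Similarly, the constraints for $g_2: P \to H$ are:

$$\begin{aligned}
&\bigwedge_{p \in P} \Big( \bigvee_{h \in H} y_p^h \Big) \\
\wedge\ &\bigwedge_{p \in P} \ \bigwedge_{\substack{h_1 \neq h_2 \\ h_1, h_2 \in H}} \big( \neg y_p^{h_1} \vee \neg y_p^{h_2} \big) \\
\wedge\ &\bigwedge_{h \in H} \ \bigwedge_{\substack{p_1 \neq p_2 \\ p_1, p_2 \in P}} \big( \neg y_{p_1}^h \vee \neg y_{p_2}^h \big)
\end{aligned}$$

The constraints for $g_3: O \to V_o$ are:

$$\begin{aligned}
&\bigwedge_{o \in O} \Big( \bigvee_{v \in V_o} z_o^v \Big) \\
\wedge\ &\bigwedge_{o \in O} \ \bigwedge_{\substack{v_1 \neq v_2 \\ v_1, v_2 \in V_o}} \big( \neg z_o^{v_1} \vee \neg z_o^{v_2} \big) \\
\wedge\ &\bigwedge_{v \in V_o} \ \bigwedge_{\substack{o_1 \neq o_2 \\ o_1, o_2 \in O}} \big( \neg z_{o_1}^v \vee \neg z_{o_2}^v \big)
\end{aligned}$$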
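The covering and closure constraints have the same shape for $g_1$, $g_2$, and $g_3$, so one generator can serve all three mappings. The sketch below is only an illustration: the function names and the integer-literal clause format (positive for a variable, negative for its negation, as in DIMACS CNF) are assumptions, not part of the original formulation.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Clause = List[int]
VarMap = Dict[Tuple[Hashable, Hashable], int]


def number_variables(items: Sequence, wires: Sequence, first_id: int = 1) -> VarMap:
    """Give every (item, wire) pair its own positive variable index."""
    pairs = [(it, w) for it in items for w in wires]
    return {pair: first_id + k for k, pair in enumerate(pairs)}


def injective_mapping_clauses(items: Sequence, wires: Sequence, var: VarMap) -> List[Clause]:
    """CNF clauses stating that items are mapped injectively onto wires."""
    clauses: List[Clause] = []
    # Covering: each item is assigned to at least one wire.
    for it in items:
        clauses.append([var[(it, w)] for w in wires])
    # Closure 1: each item is assigned to at most one wire.
    for it in items:
        for w1, w2 in combinations(wires, 2):
            clauses.append([-var[(it, w1)], -var[(it, w2)]])
    # Closure 2: each wire carries at most one item.
    for w in wires:
        for i1, i2 in combinations(items, 2):
            clauses.append([-var[(i1, w)], -var[(i2, w)]])
    return clauses


# 10 inputs on 15 vertical nanowires: 10 + 10*105 + 15*45 clauses.
x = number_variables(range(10), range(15))
print(len(x), len(injective_mapping_clauses(range(10), range(15), x)))
```

Calling the generator once per mapping, with disjoint variable indices (`first_id`), and concatenating the results gives the defect-free formula.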
We construct a satisfiability formula by conjoining all the above constraints. Any assignment that satisfies all the clauses is a possible mapping of function set $F(I, P, O)$ onto a defect-free PLA $C(H, V_a, V_o)$. To take into account the PLA defects discussed in section Defect model, we formulate these defects as additional satisfiability constraints, such that a solution of these constraints in conjunction with the formula for the defect-free assignment is a feasible defect-aware mapping.

Switch Stuck-Open Fault If the switch at the crosspoint of vertical nanowire $v$ and horizontal nanowire $h$ on the AND plane is stuck open, we cannot map both input $i$ to nanowire $v$ and product term $p$ to nanowire $h$ when $i$ is a literal contained in $p$.

$$\bigwedge_{p \in P} \ \bigwedge_{i \in p} \big( \neg x_i^v \vee \neg y_p^h \big)$$

If such a fault is present on the OR plane, we cannot map both output $o$ to nanowire $v$ and product $p$ to nanowire $h$ when $p$ is a product term of output $o$.

$$\bigwedge_{o \in O} \ \bigwedge_{p \in o} \big( \neg y_p^h \vee \neg z_o^v \big)$$
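To make the stuck-open constraint concrete, the following sketch encodes a tiny AND plane and solves it. It is an illustration under stated assumptions: the helper names are invented here, only the AND plane and stuck-open faults are modelled, and a small textbook DPLL routine stands in for a production SAT solver such as the one used in the experiments.

```python
from itertools import combinations


def one_to_one(items, wires, var):
    """Covering and closure clauses for an injective item -> wire mapping."""
    clauses = [[var[(it, w)] for w in wires] for it in items]          # covering
    for it in items:                                                   # at most one wire per item
        clauses += [[-var[(it, a)], -var[(it, b)]] for a, b in combinations(wires, 2)]
    for w in wires:                                                    # at most one item per wire
        clauses += [[-var[(a, w)], -var[(b, w)]] for a, b in combinations(items, 2)]
    return clauses


def encode(inputs, products, v_wires, h_wires, stuck_open):
    """products: {product: set of inputs}; stuck_open: set of (h, v) crosspoints."""
    pairs = [(i, v) for i in inputs for v in v_wires]
    pairs += [(p, h) for p in products for h in h_wires]
    var = {pair: n for n, pair in enumerate(pairs, start=1)}
    clauses = one_to_one(inputs, v_wires, var) + one_to_one(list(products), h_wires, var)
    for h, v in stuck_open:
        for p, members in products.items():
            for i in members:          # i in p: crosspoint (h, v) would have to be closed
                clauses.append([-var[(i, v)], -var[(p, h)]])
    return var, clauses


def dpll(clauses, model=None):
    """Minimal DPLL: returns a (partial) satisfying assignment or None."""
    model = dict(model or {})
    remaining = []
    for clause in clauses:
        if any(model.get(abs(lit)) == (lit > 0) for lit in clause):
            continue                                   # clause already satisfied
        rest = [lit for lit in clause if abs(lit) not in model]
        if not rest:
            return None                                # clause falsified
        remaining.append(rest)
    if not remaining:
        return model
    unit = next((c[0] for c in remaining if len(c) == 1), None)
    lit = unit if unit is not None else remaining[0][0]
    for choice in ([lit] if unit is not None else [lit, -lit]):
        result = dpll(remaining, {**model, abs(choice): choice > 0})
        if result is not None:
            return result
    return None


def map_and_plane(inputs, products, v_wires, h_wires, stuck_open):
    var, clauses = encode(inputs, products, v_wires, h_wires, stuck_open)
    model = dpll(clauses)
    if model is None:
        return None
    return {item: wire for (item, wire), n in var.items() if model.get(n)}


# Hypothetical 2x2 AND plane: p1 = a.b, p2 = b, switch (h1, v1) stuck open.
print(map_and_plane(["a", "b"], {"p1": {"a", "b"}, "p2": {"b"}},
                    ["v1", "v2"], ["h1", "h2"], {("h1", "v1")}))
```

In this toy instance the stuck-open switch forces `p1` away from `h1` and `b` away from `v1`, so exactly one mapping remains.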
Switch Stuck-Closed Fault We use the AND plane as an example. If the switch at the crosspoint of vertical nanowire $v$ and horizontal nanowire $h$ is stuck closed, and a product $p$ is mapped to nanowire $h$, then either an input belonging to $p$ or a logic one must be mapped to nanowire $v$. Therefore, we introduce $|V_a|$ additional variables $x_{\mathrm{one}}^v$ ($v \in V_a$) to represent mapping a logic one to the vertical nanowires. The corresponding constraints for the stuck-closed fault are formulated as:

$$\bigwedge_{p \in P} \Big( \neg y_p^h \vee x_{\mathrm{one}}^v \vee \bigvee_{i \in p} x_i^v \Big)$$

If such a defect is present on the OR plane, and an output $o$ is mapped to nanowire $v$, a product of output $o$ or a logic one must be mapped to nanowire $h$. Similarly, a group of new variables $y_{\mathrm{one}}^h$ ($h \in H$) is introduced to formulate the constraints as:

$$\bigwedge_{o \in O} \Big( \neg z_o^v \vee y_{\mathrm{one}}^h \vee \bigvee_{p \in o} y_p^h \Big)$$
Because new variables are introduced, the closure constraints are extended accordingly to guarantee that a logic one and another variable are never mapped to the same nanowire.

Nanowire Broken Fault If a horizontal nanowire is broken, we treat the whole nanowire as defective. For a broken vertical nanowire, the only usable segment is the one that connects to the PLA input/output. We define the set $B_v$ as the horizontal nanowires that intersect the defective vertical nanowire $v$ at unusable crosspoints. For example, for the broken nanowire $v_7$ in the OR plane in Fig. 2a, the horizontal nanowires other than $h_1$ are in set $B_v$. Therefore, if an input/output is mapped to the broken nanowire $v$, its related product terms cannot be assigned to the horizontal nanowires in set $B_v$. We thus derive the following constraints for a broken nanowire in the AND and OR plane, respectively.

$$\bigwedge_{i \in I} \ \bigwedge_{h \in B_v} \ \bigwedge_{p \ni i} \big( \neg x_i^v \vee \neg y_p^h \big)$$

$$\bigwedge_{o \in O} \ \bigwedge_{h \in B_v} \ \bigwedge_{p \in o} \big( \neg z_o^v \vee \neg y_p^h \big)$$
The computational complexity of SAT solving is related to the size of the SAT instance, specifically the number of variables and clauses. Given a PLA of $N \times N$ crossbars, the numbers of variables and clauses of the SAT formula for a defect-free mapping are $O(kN)$ and $O(kN^2)$, respectively, where $k = |I| + |P| + |O|$. With this formulation, the defect-aware constraints do not increase the order of either the variable count or the clause count. Thanks to the achievements of current SAT solvers, a feasible solution can be generated in reasonable time for $N$ up to several hundred.
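The instance size can be checked by counting the variables and clauses of the covering and closure constraints directly. The sketch below assumes a PLA that is 1.5 times the function size with nanowire counts rounded up (the rounding is an assumption; the function names are illustrative). With that assumption the counts reproduce the defect-free Var and Cls entries of Table 1 for rd53 and inc, and adding one logic-one variable per vertical AND-plane nanowire and per horizontal nanowire reproduces the defect-aware Var entry.

```python
from math import comb


def scaled(n: int) -> int:
    """Nanowires provided for n items at 1.5x redundancy, rounded up (assumed)."""
    return (3 * n + 1) // 2


def mapping_counts(items: int, wires: int):
    """(variables, clauses) of the encoding of one injective mapping."""
    variables = items * wires
    clauses = (items                        # covering: one clause per item
               + items * comb(wires, 2)     # closure: at most one wire per item
               + wires * comb(items, 2))    # closure: at most one item per wire
    return variables, clauses


def defect_free_counts(n_in: int, n_out: int, n_prod: int):
    """Totals over g1 (inputs), g2 (products) and g3 (outputs)."""
    counts = [mapping_counts(n, scaled(n)) for n in (n_in, n_prod, n_out)]
    return sum(v for v, _ in counts), sum(c for _, c in counts)


def defect_aware_variables(n_in: int, n_out: int, n_prod: int) -> int:
    """Adds x_one for each wire in V_a and y_one for each wire in H."""
    base, _ = defect_free_counts(n_in, n_out, n_prod)
    return base + scaled(n_in) + scaled(n_prod)


print(defect_free_counts(10, 3, 32))       # rd53 -> (1701, 61719)
print(defect_free_counts(14, 9, 34))       # inc  -> (2154, 78192)
print(defect_aware_variables(10, 3, 32))   # rd53 -> 1764
```

The quadratic terms `comb(wires, 2)` and `comb(items, 2)` are what make the clause count grow much faster than the variable count as the PLA gets larger.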
Experimental Results

We evaluate the proposed defect-aware PLA mapping methodology using the PLA benchmarks from the LGSynth93 benchmark set [24] and compare the results at different PLA defect rates. The SAT solver Berkmin561 [23] is employed to solve the formulated CNF instances. The experiments are performed on an Intel Xeon 3 GHz workstation with 2 GB of memory running the Linux operating system.

Table 1 summarizes the experimental results of solving the CNF instances generated for mapping problems at different defect rates. $P_o$, $P_c$, and $P_b$ represent the defect rates of the switch stuck-open fault, switch stuck-closed fault, and broken nanowire fault, respectively. We randomly generate these three types of faults with a ratio of 3:1:1, since the switch stuck-open fault is the most common. The size of the PLA structure is related to the size of the benchmark circuit. For the results in Table 1, the PLA size is 1.5X the size of the benchmark circuit; in other words, $|V_a| = 1.5|I|$, $|V_o| = 1.5|O|$, and $|H| = 1.5|P|$. The numbers of variables and clauses of each CNF instance are given under columns Var and Cls, respectively. Since the number of variables is the same for all defect-aware mapping problems of a circuit, it is listed only once. Column SAT lists whether the SAT solver finds a solution within a time limit of 1,200 s. The actual run time is shown under column Time.

As demonstrated in Table 1, the proposed defect-aware PLA mapping method can find a feasible mapping efficiently even at a high defect rate. Compared with the work in [17], our method can be used for larger circuits with more defects, yet with great improvements in solving performance. The SAT solving time of defect-aware PLA mapping is comparable to that of defect-free mapping for most of the cases at total defect rates of both 10% and 15%. As the defect rate increases, the number of constraints on a feasible mapping grows. Finding a mapping solution, or proving that no solution exists, may then require longer computation time, possibly exceeding the time limit. However, the proposed method has demonstrated its capability to efficiently solve PLA mapping problems for large circuits, such as clip (with 167 product terms), at a total defect rate of 20%.

We further provide experimental results to analyze the impact of different types of defects on defect-aware PLA mapping. We use benchmark circuit 5xp1 as an example; other benchmarks show a similar trend. Figure 8 illustrates the probability of finding a mapping solution at different defect rates. For each defect rate, 20 defective PLA structures are randomly generated. The PLA size is again proportional to the benchmark circuit size. For example, 1.5X in the legend represents $|V_a| = 1.5|I|$, $|V_o| = 1.5|O|$, and $|H| = 1.5|P|$. For defective PLAs with $P_c$ less than the value shown in Fig. 8, we can always find a feasible solution. Figure 9 gives the corresponding average SAT solving time. As expected, PLAs with higher defect rates require longer SAT solving time and lead to a lower probability of finding a feasible solution. In particular, when the defect rates are high enough to limit the feasible solutions, the solving time grows exponentially. Compared with the switch stuck-open fault, the switch stuck-closed fault contributes more to limiting performance. We can observe from Figures 8 and 9
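Table 1 SAT solving comparisons at different defect rates. The defect-aware columns correspond to total defect rates of 10% ($P_o$ = 6%, $P_c$ = 2%, $P_b$ = 2%), 15% ($P_o$ = 9%, $P_c$ = 3%, $P_b$ = 3%), and 20% ($P_o$ = 12%, $P_c$ = 4%, $P_b$ = 4%); the defect-aware Var column applies to all three.

| Circuit | Inputs/Outputs/Products | Var (defect-free) | Cls (defect-free) | Time (s) (defect-free) | Var (defect-aware) | Cls (10%) | Time (s) (10%) | SAT (10%) | Cls (15%) | Time (s) (15%) | SAT (15%) | Cls (20%) | Time (s) (20%) | SAT (20%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rd53 | 10/3/32 | 1,701 | 61,719 | 0.017 | 1,764 | 70,505 | 0.046 | Y | 73,986 | 0.039 | Y | 77,608 | 0.042 | Y |
| inc | 14/9/34 | 2,154 | 78,192 | 0.018 | 2,226 | 97,314 | 0.054 | Y | 105,977 | 0.063 | Y | 114,507 | 0.056 | Y |
| misex2 | 50/18/29 | 5,512 | 286,469 | 0.068 | 5,631 | 333,131 | 0.146 | Y | 353,956 | 0.483 | Y | 374,828 | 111.2 | Y |
| sao2 | 20/4/58 | 5,670 | 375,367 | 0.087 | 5,787 | 452,475 | 0.176 | Y | 488,167 | 0.214 | Y | 524,370 | 0.312 | Y |
| bw | 10/28/65 | 7,696 | 554,597 | 0.131 | 7,809 | 614,708 | 0.331 | Y | 641,679 | 0.724 | Y | 668,442 | 16.57 | Y |
| 5xp1 | 14/10/75 | 8,919 | 794,850 | 0.179 | 9,053 | 857,081 | 0.411 | Y | 883,892 | 0.488 | Y | 910,628 | 0.618 | Y |
| 9sym | 18/1/87 | 11,885 | 1,241,432 | 0.297 | 12,043 | 1,371,379 | 0.738 | Y | 1,430,541 | 0.630 | Y | 1,489,626 | 0.506 | Y |
| rd73 | 14/3/141 | 30,201 | 5,251,100 | 1.463 | 30,434 | 5,527,061 | 2.219 | Y | 5,649,527 | 2.288 | Y | 5,772,977 | 2.456 | Y |
| table5 | 34/15/158 | 39,525 | 7,436,517 | 2.120 | 39,813 | 9,088,189 | 188.8 | Y | 9,893,247 | >1,200 | NA | 10,700,839 | >1,200 | NA |
| clip | 18/5/167 | 42,443 | 8,729,595 | 2.380 | 42,721 | 9,175,331 | 4.107 | Y | 9,377,051 | 6.582 | Y | 9,579,688 | 108.5 | Y |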
Fig. 8 Circuit 5xp1: probability of mapping at different PLA sizes and defect rates (x-axis: $P_o$ from 0% to 20%; y-axis: probability of finding a mapping, 0% to 100%; series: 1x with $P_c$ = 2%, 1.25x with $P_c$ = 6%, 1.5x with $P_c$ = 8%)
Fig. 9 Circuit 5xp1: average SAT solving time at different PLA sizes and defect rates (x-axis: $P_o$ from 0% to 20%; y-axis: time in ms, 100 to 1,000,000; series: 1x with $P_c$ = 0% and 2%, 1.25x with $P_c$ = 2%, 4%, and 6%, 1.5x with $P_c$ = 4%, 6%, and 8%)
that the solving performance degrades more when $P_c$ is increased by 2% than when $P_o$ is increased by 2%. This is because a switch stuck-closed defect may affect the use of a complete horizontal nanowire, while a switch stuck-open defect only affects a single crosspoint.

The probability of a feasible mapping on a given defective PLA depends not only on the defect rate but also on the PLA size. For the same defect rate, increasing the PLA size improves the probability of successful mapping. As demonstrated in Fig. 9, for a total defect rate of 15% on crosspoints under the current fabrication technology [9], a 50% redundancy of PLA size is sufficient. These relations among defect rate, PLA size, and solving performance, demonstrated by our experimental results, provide guidance for yield estimation and improvement for future nano-scale PLA implementations. Given a defect rate, a properly selected PLA size can help to achieve a high yield within a reasonable design time.
Conclusions and Perspectives

The high defect density of nano-PLA structures requires novel defect-tolerance approaches to improve reliability. Two types of promising techniques for handling the high defect rates, namely design with redundant logic and defect-aware configuration, are discussed. Fault-masking methods, which introduce redundant information, improve design reliability efficiently but cannot guarantee correct operation under all defect conditions. Defect-aware design methodologies attempt to avoid defective devices based on prior knowledge of the existing defects. The resulting implementations, which depend on the defect types and locations on the chip, can vary greatly from one to another.

Although these techniques are fairly efficient, there is still plenty of room to improve the defect-tolerant design of nanoelectronic systems. The defect-aware configuration approaches, for example, rely heavily on information about defect locations. This calls for an effective diagnosis procedure that can handle the increasing size of the available hardware. A compact defect map design is another important issue that has not yet been fully explored. The defect-aware approaches handle permanent defects efficiently but may fail if unexpected transient faults appear. The redundancy-based methods, on the other hand, have a better tolerance for transient faults. A hybrid design method that integrates both types of techniques may be a practical way to cover more failure conditions while alleviating the extensive diagnosis process. Furthermore, the works summarized in this chapter operate at the logic level. To solve the problem efficiently and effectively, defect tolerance considered at all design levels remains a fundamental topic for future research.
References

1. A. DeHon and K. K. Likharev, “Hybrid CMOS/nanoelectronic digital circuits: Devices, architectures, and design automation,” in Proc. Int. Conf. Computer-Aided Design, Nov. 2005, pp. 375–382.
2. S. K. Shukla and R. I. Bahar, Nano, Quantum and Molecular Computing: Implications to High Level Design and Validation, Kluwer Academic Publishers, Boston, MA, 2004.
3. Y. Huang, X. Duan, Q. Wei, and C. M. Lieber, “Directed assembly of one-dimensional nanostructures into functional networks,” Science, vol. 291, no. 5504, pp. 630–633, Jan. 2001.
4. V. V. Zhirnov and D. J. C. Herr, “New frontiers: Self-assembly and nanoelectronics,” IEEE Computer, vol. 34, no. 1, pp. 34–43, Jan. 2001.
5. J. R. Heath, P. J. Kuekes, G. S. Snider, and R. S. Williams, “A defect-tolerant computer architecture: Opportunities for nanotechnology,” Science, vol. 280, no. 5370, pp. 1716–1721, June 1998.
6. A. DeHon and M. J. Wilson, “Nanowire-based sublithographic programmable logic arrays,” in Proc. Int. Symp. Field-Programmable Gate Arrays, Feb. 2004, pp. 123–132.
7. S. C. Goldstein and M. Budiu, “NanoFabric: Spatial computing using molecular electronics,” in Proc. Int. Symp. Computer Architecture, June 2001, pp. 178–189.
8. G. S. Rose and M. R. Stan, “A programmable majority logic array using molecular scale electronics,” IEEE Trans. Circuits & Systems I, vol. 54, no. 11, pp. 2380–2390, Nov. 2007.
9. Y. Chen, G.-Y. Jung, D. A. A. Ohlberg, X. Li, D. R. Stewart, J. O. Jeppesen, K. A. Nielsen, J. F. Stoddart, and R. S. Williams, “Nanoscale molecular-switch crossbar circuits,” Nanotechnology, vol. 14, no. 4, pp. 462–468, Apr. 2003.
10. M. R. Stan, P. D. Franzon, S. C. Goldstein, J. C. Lach, and M. M. Ziegler, “Molecular electronics: From devices and interconnect to circuits and architecture,” Proc. IEEE, vol. 91, no. 11, pp. 1940–1957, Nov. 2003.
11. T. Hogg and G. S. Snider, “Defect-tolerant adder circuits with nanoscale crossbars,” IEEE Trans. Nanotechnology, vol. 5, no. 2, pp. 97–100, Mar. 2006.
12. J. von Neumann, “Probabilistic logics and the synthesis of reliable organisms from unreliable components,” Automata Studies, no. 34, pp. 43–98, 1956.
13. K. Nikolic, A. Sadek, and M. Forshaw, “Fault-tolerant techniques for nanocomputers,” Nanotechnology, vol. 13, no. 3, pp. 357–362, June 2002.
14. W. Rao, A. Orailoglu, and R. Karri, “Logic level fault tolerance approaches targeting nanoelectronics PLAs,” in Proc. Design Automation & Test Europe Conf., Apr. 2007, pp. 865–869.
15. H. Naeimi and A. DeHon, “A greedy algorithm for tolerating defective crosspoints in NanoPLA design,” in Proc. Int. Conf. Field-Programmable Technology, Dec. 2004, pp. 49–56.
16. M. B. Tahoori, “A mapping algorithm for defect-tolerance of reconfigurable nanoarchitectures,” in Proc. Int. Conf. Computer-Aided Design, Nov. 2005, pp. 667–671.
17. W. Rao, A. Orailoglu, and R. Karri, “Topology aware mapping of logic functions onto nanowire-based crossbar architectures,” in Proc. Design Automation Conf., July 2006, pp. 723–726.
18. E. I. Goldberg, M. R. Prasad, and R. K. Brayton, “Using SAT for combinational equivalence checking,” in Proc. Design Automation & Test Europe Conf., Mar. 2001, pp. 114–121.
19. A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, and Y. Zhu, “Symbolic model checking using SAT procedures instead of BDDs,” in Proc. Design Automation Conf., June 1999, pp. 317–320.
20. R. G. Wood and R. A. Rutenbar, “FPGA routing and routability estimation via Boolean satisfiability,” IEEE Trans. Very Large Scale Integration Systems, vol. 6, no. 2, pp. 222–231, June 1998.
21. J. P. Marques-Silva and K. A. Sakallah, “GRASP: A search algorithm for propositional satisfiability,” IEEE Trans. Computers, vol. 48, no. 5, pp. 593–595, May 1999.
22. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, “Chaff: Efficient SAT solver,” in Proc. Design Automation Conf., June 2001, pp. 530–535.
23. E. Goldberg and Y. Novikov, “Berkmin: A fast and robust SAT solver,” in Proc. Design Automation & Test Europe Conf., Mar. 2002, pp. 142–149.
24. ACM/SIGDA benchmarks: 1993 LGSynth Benchmarks, http://www.cbl.ncsu.edu/benchmarks/LGSynth93/.
Built-In Self-Test and Defect Tolerance for Molecular Electronics-Based NanoFabrics

Mohammad Tehranipoor
Abstract In this chapter, a built-in self-test (BIST) procedure is presented for testing and fault tolerance of molecular electronics-based nanoFabrics. The nanoFabrics are assumed to include up to $10^{12}$ devices/cm$^2$ with defect densities as high as 10%; this requires new test strategies that can efficiently test and diagnose the nanoFabrics in a reasonable time. The BIST procedure shown in this chapter uses the nanoFabric's components as small test groups, each containing a test pattern generator and a response analyzer. Using small test groups results in higher diagnosability and recovery. The technique applies the tests in parallel with a low number of test configurations, resulting in a manageable test time for such large devices. Due to the high defect density of nanoFabrics, an efficient diagnosis procedure, called the recovery increase procedure, must be run after the BIST procedure to achieve high recovery. It increases the number of fault-free components identified in a nano-chip. Finally, a defect database called the defect map is created, to be used by compilers when configuring the nanoFabrics so that defective components are avoided. This results in a reliable system constructed from unreliable components.

Keywords Fault tolerance, Fault modeling, BIST, NanoFabrics, Molecular electronics
Introduction

There are various theoretical, practical, and economic obstacles to further scaling of the current CMOS technology, and these challenges have caused new paradigms in electronic devices, circuits, and design methods to emerge. One alternative to CMOS-based computing seems to be chemically assembled electronic nanotechnology (CAEN) [11]. CAEN uses direct self-assembly and self-alignment to construct electronic circuits out of nanometer-scale devices. A CAEN-based device, also
M. Tehranipoor, University of Connecticut, e-mail: [email protected]
C. Huang (ed.), Robust Computing with Nano-scale Devices: Progresses and Challenges, Lecture Notes in Electrical Engineering 58, DOI 10.1007/978-90-481-8540-5_5, © Springer Science+Business Media B.V. 2010
called nanoFabric, is an array of molecular electronic devices. It is estimated that nanoFabrics can include up to $10^{12}$ devices/cm$^2$, which is significantly higher than the density of CMOS-based devices. However, defect densities as high as 10% are expected in these devices as well [4, 22]. This includes defects in all wires, logic blocks, and switches. Such high defect densities require completely new approaches to manufacturing and testing these new nano devices.

Increasing defect density increases yield loss. With such a high defect density in nanoFabrics, the manufacturing test cost can be prohibitively high, and discarding a faulty nano-chip will no longer be possible. As a result, a nanoFabric must be tested and diagnosed, and the locations of its defects must be found. To create a reliable system, a method of using defective devices must be devised. A natural solution to this problem is to design a reconfigurable nanoFabric device. The architectures can be derived from field-programmable gate arrays (FPGAs) [3] or from the work on the Teramac custom computer [6, 13]. Prior research has shown that reconfigurable devices are fault tolerant in the sense that faults can be detected and their locations stored in a defect map [6]. The defect map is a database that stores defect information (such as location and defect type) of the chip under test and is used during reconfiguration. The faulty blocks can be avoided during the reconfiguration of the device by referring to the defect map. A defect map must be constructed for each reconfigurable device.

Testing and diagnosis of nanoFabrics are required to achieve fault tolerance through reconfigurability. Similar to an FPGA, which is an interconnected set of configurable logic blocks (CLBs), nanoFabrics also include programmable logic blocks and switches to implement combinational or sequential circuits. This chapter presents a built-in self-test (BIST) procedure that tests the reconfigurable blocks and identifies the faulty ones. The method creates a defect map which can be used by the programming device for defect avoidance. A relatively low number of test configurations is required to test all the blocks in a nanoFabric. The proposed test method is fine-grained, i.e., the test groups used in the BIST procedure are very small. Finer granularity results in higher diagnosis resolution and recovery. Recovery, in this chapter, is defined as the percentage of blocks identified as fault-free after testing the nanoFabric among all actual fault-free blocks in the nanoFabric [16]. When a test group fails during test, all its blocks are assumed faulty, even though the group may contain fault-free components.
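The effect of test-group size on recovery can be illustrated with a small calculation. This is a simplified sketch, not the chapter's BIST procedure: blocks are placed in a one-dimensional list, groups are consecutive slices, and a group is assumed to fail whenever it contains at least one faulty block. The function name and the defect pattern are hypothetical.

```python
from typing import List


def recovery(faulty: List[bool], group_size: int) -> float:
    """Fraction of the truly fault-free blocks that testing identifies as fault-free."""
    groups = [faulty[k:k + group_size] for k in range(0, len(faulty), group_size)]
    # Only blocks in passing groups are identified as fault-free.
    identified = sum(len(g) for g in groups if not any(g))
    actual = faulty.count(False)
    return identified / actual if actual else 0.0


blocks = [False] * 12
blocks[2] = blocks[8] = True        # two defective blocks (hypothetical)
for size in (1, 2, 3, 6):
    print(size, recovery(blocks, size))
```

With two defects among twelve blocks, groups of one recover every fault-free block, while groups of six recover none, which is the motivation for the fine-grained test groups.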
NanoFabric Architecture

The nanoFabric architecture was first proposed in [11], and an island style interconnect was assumed to provide the interconnection between clusters. Figure 1a shows the island style architecture as an $M \times M$ array of clusters [3]. In Fig. 1b each cluster is shown as an $N \times N$ array of nanoBlocks and switchBlocks. The nanoBlock and the switchBlock are the two main components of a nanoFabric. A nanoBlock is similar to a configurable logic block (CLB) in FPGAs. The nanoBlock is based on a molecular logic array (MLA) of size $K \times K$, as shown in Fig. 1c [11]. Molecular layer on
Fig. 1 (a) Island style architecture, (b) cluster structure in a nanoFabric architecture and (c) NanoBlock architecture [29]. Panel labels: (a) M × M array of clusters with inter-cluster interconnect (long wires); (b) N × N cluster of nanoBlocks and switchBlocks; (c) K × K MLA with inputs on north and east and outputs on south and west, where the outputs are latched by inline NDRs that also restore signals with power from the clock; striped regions indicate connections from the CMOS layer
Fig. 2 Internal structure of a switchBlock [28]. Programmable switches in a switchBlock provide routing between wires of neighbouring nanoBlocks (from the north and west nanoBlocks to the east and south nanoBlocks)
the crosspoints of horizontal and vertical nanowires can be programmed as diodes. In this case, they provide the connections between the nanowires. The molecular layer can also be programmed as a disconnection (open). These arrays can be created using chemical processes such as self-assembly and self-alignment of nanowires.

The region between a set of nanoBlocks, which provides the interconnections between the nanoBlocks, is known as a switchBlock. Like a nanoBlock, a switchBlock is reconfigurable and serves to connect wire segments of adjacent nanoBlocks. The configuration switches of the switchBlock determine the direction of data flow between the nanoBlocks. Figure 2 shows the internal structure of a switchBlock.

Diode-resistor logic can be used to perform logical operations. Both signals and their complements are created to produce a complete logic family. Figure 3 shows the implementation of two simple logic gates, AND and OR. Logic values are restored using molecular latches [12]. The molecular latches provide a mechanism for latching the values (outputs of nanoBlocks) as well as isolation between the outputs of a nanoBlock and the inputs of adjacent nanoBlocks.

To completely test a nanoFabric, its clusters and the interconnections between them must also be tested. Since the interconnect structure is very similar to island style
Fig. 3 Simple logic functions using diode-resistor logic. (a) An AND gate (F = A.B) implemented on a nanoBlock and its diode-resistor equivalent circuitry and (b) an OR gate (F = A+B) implemented on a nanoBlock and its diode-resistor equivalent circuitry [28]
FPGAs, the test methods used in FPGA interconnect testing can be used for testing the nanoFabric interconnects as well [24]. Hence, our approach focuses on testing the internal structure of clusters, i.e., nanoBlocks and switchBlocks. Due to high defect rates, employing a fine-grained nanoFabric architecture will improve test quality. A fine-grained nanoFabric is defined to be a large array made of small clusters (M is large but N is small in Fig. 1). Such fine granularity provides good access to the small clusters of the nanoFabric, so that the nanoBlocks and switchBlocks inside each cluster can be effectively configured as test groups and the test results can be read from the blocks. Designing fine-grained nano architectures improves diagnosis capability and recovery by allowing smaller test groups.
Fault Model

A complete fault model for a nanoFabric includes stuck-at faults, stuck-open faults, forward-biased diode faults, reverse-biased diode faults, and bridging (AND, OR) faults among vertical lines, horizontal lines, and vertical/horizontal lines. These faults are classified based on different scenarios that occur during self-assembly of nanowires in nanoBlocks and switchBlocks [22, 31] and are discussed in the following:

1. Stuck-at Faults Any horizontal or vertical line is stuck at 0 (Gnd) or 1 (Vdd).
2. Stuck-Open Faults A single line is broken, which may result in a memory effect.
3. Forward and Reverse-Biased Diode Faults When a 1 is applied to one of the vertical lines and a 0 is applied to a horizontal line, a forward-biased diode is formed between these two lines. A defective forward-biased diode will change the output line value.
4. Bridging Faults Any two lines shorted to each other create a bridging fault of one of two types: AND-bridging or OR-bridging.

The test procedure targets these faults and provides test architectures and configurations for detecting them.
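The stuck-at and bridging faults in the list above can be captured in a small behavioural model that shows which input patterns expose each fault. This is a sketch under simplifying assumptions: line values are taken to be directly observable, stuck-open faults (with their memory effect) and diode faults are left out, and all names are illustrative.

```python
from enum import Enum
from itertools import product
from typing import List, Optional, Tuple


class Fault(Enum):
    STUCK_AT_0 = "stuck-at-0"
    STUCK_AT_1 = "stuck-at-1"
    AND_BRIDGE = "AND-bridging"
    OR_BRIDGE = "OR-bridging"


def observe(lines: Tuple[int, ...], fault: Fault, a: int,
            b: Optional[int] = None) -> Tuple[int, ...]:
    """Line values seen when `fault` affects line a (and line b for bridges)."""
    out = list(lines)
    if fault is Fault.STUCK_AT_0:
        out[a] = 0
    elif fault is Fault.STUCK_AT_1:
        out[a] = 1
    elif fault is Fault.AND_BRIDGE:
        out[a] = out[b] = lines[a] & lines[b]   # shorted lines both read the AND
    elif fault is Fault.OR_BRIDGE:
        out[a] = out[b] = lines[a] | lines[b]   # shorted lines both read the OR
    return tuple(out)


def detecting_patterns(n_lines: int, fault: Fault, a: int,
                       b: Optional[int] = None) -> List[Tuple[int, ...]]:
    """Input patterns whose faulty response differs from the fault-free one."""
    return [pat for pat in product((0, 1), repeat=n_lines)
            if observe(pat, fault, a, b) != pat]


print(detecting_patterns(2, Fault.STUCK_AT_0, 0))      # patterns with line 0 at 1
print(detecting_patterns(2, Fault.AND_BRIDGE, 0, 1))   # patterns with differing lines
```

A bridging fault between two lines is only visible when the lines carry opposite values, which is why test patterns must drive neighbouring lines to different levels.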
Summary of the Proposed Method

Due to high defect density concerns and the large number of components in nanoFabrics, an efficient test methodology is required to achieve high fault coverage, and an effective diagnostic procedure is required to provide high recovery. In this chapter, we propose a new BIST procedure to effectively test and diagnose the nanoFabrics. The proposed technique is based on a fine-grained approach, i.e., it uses small test groups. The BIST procedure targets the fault types defined as the set of possible faults for nanoscale crossbars. New and efficient test groups, test architectures, and test configurations are proposed to target these faults. Each test group includes a test pattern generator (PG), a response generator (RG), and the switchBlock between PG and RG. Both PG and RG test themselves as they generate test patterns and responses, respectively. PG tests itself and also provides patterns for RG. The RG block receives the patterns to test itself and also provides responses that can be read back by the tester for evaluation and recovery analyses. Choosing small test groups improves test recovery.

A fine-grained nanoFabric architecture is capable of providing the required access mechanism for reading the test results from the small test groups of the cluster. Different approaches for providing such access mechanisms for nanoscale clusters are proposed in the literature [9, 17]. The blocks of the clusters can be tested in parallel with a low number of test configurations, which in turn reduces the overall test time. A recovery increase (RI) procedure is also proposed as a means to achieve higher recovery (i.e., post-test processing that finds more fault-free blocks in nano-architectures). Our simulation results on a few running examples demonstrate the effectiveness of the proposed BIST and RI procedures.
Related Prior Work
Integrated circuits are usually discarded if manufacturing test detects a fault in their components. Since the defect density of CMOS-based devices is still in a reasonable range, discarding a faulty chip is not very costly. However, there are static and dynamic fault tolerance techniques that help a chip tolerate faults [14, 21]. These methods use approaches such as TMR, CTMR and sparing. Among the various architectures, reconfigurable architectures such as FPGAs are particularly well suited to fault tolerance techniques: a test procedure can find faulty blocks and the programming device can avoid them during reconfiguration. Manufacturing test and application-dependent testing of FPGAs have been reported in [23–30]. The authors of [1, 23] proposed an FPGA BIST that tests configurable logic blocks and switches and diagnoses the faulty ones. This approach divides the logic blocks of an FPGA into three groups, i.e., test pattern generator (TPG), block under test (BUT) and response analyzer (RA). TPG generates and applies exhaustive test patterns to BUT. RA receives the responses to compare and determine
M. Tehranipoor
whether there is a defect. If one BUT is faulty, its output differs from that of the other BUTs, and this can be detected by RA. To avoid receiving faulty responses due to TPG faults, each RA receives the output from a set of identical TPGs, and different TPGs are used in each test session. Diagnostic capability can also be added to the methodology by configuring the FPGA's blocks first vertically and then horizontally; a faulty block located at the intersection of failing vertical and horizontal configurations can then be identified.

Application-dependent testing of FPGAs has also been proposed in [25, 27]. Such techniques rely on the fact that a defective FPGA may pass the test for a specific application. This is a one-time configuration; in other words, the FPGA will no longer be reconfigurable. This reduces yield loss and leads to manufacturing cost savings.

In recent years, many sparing techniques have been proposed for memories [19, 20, 32]. Extra rows and columns are fabricated for each memory; if a defective row or column is detected, a spare row or column is used instead. A yield analysis is required to determine how many extra rows or columns are needed to completely repair a defective memory. Sparing techniques have been suggested for crossbars of nanowires [26] as well. However, sparing cannot be applied to a cluster in a nanoFabric because each cluster is composed of an array ($N \times N$) of crossbars, and a high percentage of its columns and rows (i.e., blocks) is expected to be defective due to high defect probabilities. In other words, sparing can be applied at the wire/switch level to crossbars (as in [26]) but it may not be suitable for rows and columns of nanoBlocks and switchBlocks.

A testing and defect mapping strategy is used in the Teramac custom computer [6, 13]. Teramac is an FPGA-based system with a high defect density of nearly 3%. Testing is performed by configuring the components as linear feedback shift registers (LFSRs).
If the final bit stream generated by an LFSR is correct, all of its components are assumed to be defect-free. Otherwise, there is at least one faulty component in that LFSR. To locate the faulty ones, the components of the LFSR and other components are used to configure a new set of finer-granularity LFSRs. If one of the new LFSRs is faulty, the components at the intersection of the faulty LFSRs are identified as defective. A defect database is created after completing the test and diagnosis procedure. The compiler that generates FPGA configurations then uses this defect database to avoid configuring defective components. In this way, a reliable system is constructed from unreliable components.

Defect tolerance methodologies are presented in [15, 16] in which components of a nanoFabric are configured as test circuits to infer the defect status of individual components. Since the fabrication technology of crossbars can be tuned to reduce the probability of stuck-closed faults, only stuck-at and stuck-open faults are targeted in these methods. In [16], test circuits are assumed to be capable of counting the number of faults occurring in a block under test. LFSRs or counters are suggested for these test circuits. The fault count is used to assign a probability of having none, some or many faults in the block under test. This probability is used to find fault-free nanoBlocks in blocks with a lower probability of containing faulty nanoBlocks. Implementing complex test circuits like LFSRs and counters on simple crossbars will require considerable effort and results
in increasing test time. Also, very rich interconnections are assumed in [16] for implementing these test circuits and connecting blocks under test to external testers. Since the test circuits and blocks under test consist of a large number of nanoBlocks in this method, the required interconnects must provide global routing. This technique may not provide high recovery, but it executes quickly using a tester.

The technique proposed in [2] assumes that nanoBlocks can be configured as test logic and that a memory in each nanoBlock is used for testing purposes. The memory stores the status (faulty/fault-free) of adjacent blocks. The assumption of memory cells in crossbars may face difficulties because of limitations in using NDR latches as free storage elements in each crossbar. However, the testing procedure is relatively fast and the achieved recovery is high.

The authors of [31] proposed a BIST procedure that tests nanoFabrics for a larger number of faults. A test group contains a test pattern generator, a block under test and an output response analyzer (ORA). Since the technique uses AND and OR as the ORA, it reads back only one bit and thereby loses diagnosis information. In contrast, the technique presented in this chapter uses test groups containing only two nanoBlocks: one configured as pattern generator (PG) and the other configured as response generator (RG). A single switchBlock connects PG to RG. Hence the number of different test architectures required to cover all the faults in a test group is significantly reduced compared to [31]. In addition, reading out all the outputs of the RG blocks identifies the exact location of faults.
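The intersection idea used by the FPGA and Teramac diagnosis schemes summarized above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the grid model, the function name `locate_defects`, and the choice of one test chain per row and per column are hypothetical and are not taken from [1, 23] or [6, 13].

```python
def locate_defects(n_rows, n_cols, defective):
    """Flag components lying on both a failing row chain and a failing column chain.

    `defective` is the (hidden) set of (row, col) defects. A chain fails if it
    contains at least one defective component (assumed chain organisation).
    """
    failing_rows = {r for r in range(n_rows)
                    if any((r, c) in defective for c in range(n_cols))}
    failing_cols = {c for c in range(n_cols)
                    if any((r, c) in defective for r in range(n_rows))}
    # Suspects are the intersections of failing rows with failing columns.
    return {(r, c) for r in failing_rows for c in failing_cols}


print(locate_defects(4, 4, {(1, 2)}))          # {(1, 2)}
print(sorted(locate_defects(4, 4, {(0, 0), (1, 1)})))
```

With a single defect the intersection is exact. With two defects in different rows and columns the suspect set grows to four positions, which is one reason finer-granularity retesting is needed at higher defect densities.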
BIST Procedure
In this section, we describe the BIST procedure used to test nanoFabrics and diagnose their defective components. We configure each nanoBlock either as a test pattern generator (PG) or as a response generator (RG). Because nanoFabrics are reconfigurable, no extra BIST hardware needs to be permanently fabricated on-chip. Moreover, dedicated on-chip nanoscale test hardware would itself be highly susceptible to defects. In this work, the test configurations are first generated externally and then delivered to the fabric for each test session. A test session refers to one particular test architecture with configured nanoBlocks and switchBlocks. The architecture used in the BIST procedure is called a test architecture (TA). Each TA includes test groups, and each test group contains one PG, one RG and the switchBlock associated with that PG and RG. PG tests itself and sends a pattern through the switchBlock to RG to test it, and the response is then generated. Each cluster is divided into test groups. Consider an $N \times N$ cluster including $N^2/2$ nanoBlocks and $N^2/2$ switchBlocks. In each test architecture, $N^2/4$ nanoBlocks are configured as PGs and the other $N^2/4$ nanoBlocks are configured as RGs. Overall, $N^2/4$ test groups are configured in each TA. Generally, TAs are generated based on (i) detection of faults in each nanoBlock and switchBlock and (ii) the direction of connections between the nanoBlocks and
switchBlocks. Therefore, a number of TAs are generated for each test group and a number of test configurations are generated to cover all the faults in our fault model. During BIST, the test groups in each TA are configured identically, i.e., the same faults are targeted within all test groups in each test session. In each test group, three components are configured, i.e., PG, RG and the associated switchBlock. A switchBlock in a test group is configured (i) to transfer the patterns from PG to RG while PG or RG is being tested and (ii) as a block under test when we target faults in the switchBlock. The switchBlock configuration depends on the test architecture used. The programming device can configure the nanoBlocks and switchBlocks as required by the test architectures and configurations. Test results are read back from the RG blocks by a tester (on- or off-chip). Interconnect resources provided in the nanoFabric architecture are used to transmit these results to the tester. Since we suggest a fine-grained architecture with small clusters, each cluster is composed of a small number of nanoBlocks. Therefore, the number of test groups in a cluster will be small, and the interconnect resources of the Fabric are assumed to be sufficient to implement the read-back mechanism. Hence, when a test session is done, the outputs of the RGs are read by the tester for evaluation and recovery analysis. The programming device configures $3N^2/4$ components in each TA and reads out the outputs of the RG blocks, so $N^2/4$ components are accessed for read-back. All nanoBlocks of a cluster are configured as test groups by the programming device, and the test can then be executed in parallel in all test groups. Therefore, the test application time of a test architecture in a cluster is independent of the size of the cluster.
However, parallel configuration of the blocks of a cluster as test groups and parallel reading of the test results of each test architecture depend on the interconnect and routing resources available for the cluster and on the programming device's bandwidth. The richer the interconnect, the higher the parallelism that can be obtained when testing clusters. Note that when testing a nanoFabric, one test architecture is used for the entire Fabric in each step. This allows clusters to be configured in parallel because the configuration voltages are the same for all clusters. The diagnostic and recovery analyses are performed after all test configurations have been applied and all required data have been collected. The BIST procedure can be performed using an on-chip tester (a microprocessor or dedicated BIST hardware) implemented at a reliable CMOS scale on the substrate of the nanoFabric. This reduces the test time, since external devices are generally slower than on-chip testers. The on-chip tester can execute the BIST procedure, collect the test results, perform recovery analysis and identify new test configurations to be used in the RI procedure. It may also eliminate the need to store a defect map on-chip, since it can find the faulty blocks in each cluster before configuration [7, 18].
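The per-cluster counts quoted above can be checked with a small helper. This is a sketch only: the function name `cluster_test_resources` and the requirement that N be even (so that the counts are integers) are assumptions made here, not statements from the chapter.

```python
def cluster_test_resources(n):
    """Counts for one N x N cluster under a single test architecture."""
    if n % 2:
        raise ValueError("this sketch assumes an even cluster dimension N")
    test_groups = n * n // 4                 # N^2/4 groups: one PG, one RG, one switchBlock
    return {
        "nanoBlocks": n * n // 2,            # N^2/2
        "switchBlocks": n * n // 2,          # N^2/2
        "test_groups": test_groups,
        "configured_components": 3 * test_groups,  # 3N^2/4 blocks configured per TA
        "read_back_components": test_groups,       # N^2/4 RG outputs read per TA
    }


print(cluster_test_resources(4))
```

For N = 4 this gives 8 nanoBlocks, 8 switchBlocks, 4 test groups, 12 configured components and 4 read-back components.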
Test Architectures (TAs)
Figure 4 shows four TAs in a nanoFabric, called TA11, TA12, TA21 and TA22, respectively. Note that the outputs in each test architecture must be located on the south or
Fig. 4 Test architectures TA11 to TA22 [29]. Panels: (a) TA11, (b) TA12, (c) TA21, (d) TA22
east of a nanoBlock and the inputs must be located on the north or west of a nanoBlock, as discussed earlier (see Fig. 1c). To simplify the testing and diagnostic procedures, we assume that each PG generates outputs on only one side (either south or east), whereas an RG can receive input data on one side (either north or west) or on both sides (north and west). To configure a PG to generate its output, physical 1s and 0s (Vdd and Gnd) must be used as inputs to the nanoBlock. Depending on the type of target fault, RG may or may not be connected to the physical Vdd and Gnd. TA12 is the complement of TA11, since a PG in TA11 is used as an RG in TA12 and vice versa. This guarantees that each block is configured as both PG and RG. TA21 and TA22 are related in the same way: TA22 is the complement of TA21. In TA11 and TA12, PGs generate their outputs on the east side and RGs receive the patterns on their west side. In TA21 and TA22, PGs generate the output patterns on the south side and the outputs are transferred to the inputs of RGs on the north side. Figure 5 shows an alternative set of test architectures (TA31, TA32, TA41, TA42). The difference between the TAs shown in Figs. 4 and 5 is that a different set of inputs and outputs is used in each test group. For instance, outputs on the south of PGs and inputs on the west of RGs are used in TA31. Either one of the sets {TA11, TA12, TA21, TA22} or {TA31, TA32, TA41, TA42} can be used in a test procedure. There are cases in which two PGs are required for testing some faults in a nanoBlock (see section Test configurations). Figure 6 shows three more TAs (TA51 to TA53) that use two PGs. Note that a combination of two PGs and one RG needs three test architectures. Using all seven TAs ({TA11, TA12, TA21, TA22, TA51, TA52, TA53} or {TA31, TA32, TA41, TA42, TA51, TA52, TA53}) ensures detection of all faults in nanoBlocks. Additional TAs must be applied to achieve full coverage for switchBlocks as well.
Note that each TA (e.g., TA11) can be applied more than once, depending on the type and number of faults in the fault model. Moreover, some faults may need more than one test configuration. A portion of each switchBlock is also tested in each TA while the nanoBlocks are tested. However, some faults in switchBlocks will remain undetected and must be tested separately using additional configurations. This is discussed further in section Test configurations.
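The complement relationship between paired architectures (for example TA11 and TA12) can be expressed as a simple role swap. The block names and the dictionary representation below are hypothetical and serve only to illustrate why applying both architectures exercises every nanoBlock in both roles.

```python
def complement(roles):
    """Swap PG and RG roles, as TA12 does relative to TA11."""
    swap = {"PG": "RG", "RG": "PG"}
    return {block: swap[role] for block, role in roles.items()}


# Hypothetical role assignment for four nanoBlocks in one architecture.
ta_a = {"NB11": "PG", "NB22": "RG", "NB13": "PG", "NB24": "RG"}
ta_b = complement(ta_a)

# After both architectures, every nanoBlock has served as PG and as RG.
roles_seen = {block: {ta_a[block], ta_b[block]} for block in ta_a}
print(all(seen == {"PG", "RG"} for seen in roles_seen.values()))  # True
```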
Fig. 5 Test architectures TA31 to TA42 [29]. Panels: (a) TA31, (b) TA32, (c) TA41, (d) TA42
Fig. 6 Test architectures TA51 to TA53 [29]. Panels: (a) TA51, (b) TA52, (c) TA53
Test Procedure
In contrast to the techniques proposed in [2, 31], there is no separate block under test in our BIST procedure. Both PG and RG test themselves in each test session of a test architecture. The 0s and 1s generated inside a nanoBlock test that nanoBlock, and its output is sent to another nanoBlock that is configured so that it tests itself. Together, these two blocks working as PG and RG provide complete fault coverage for the defined set of faults. Since both PG and RG detect faults, our test procedure requires fewer test configurations. A high-level description of the BIST procedure is shown below:
BIST Procedure
/* TA = {TA31, TA32, TA41, TA42, TA51, TA52, TA53} */
/* F = {f_j}, where f_j is the j-th fault in the fault list */
for (i = 1; i ≤ 7; i++) do
begin
    Select_TC(f_j)           /* Selects the appropriate test configuration (TC_j) for targeting fault f_j */
    Partition_Cluster(TA_i)  /* Partitions the clusters into test groups based on TA_i */
    Config_TG(TC_j)          /* Configures the test groups based on the selected TC_j */
    Apply_Pattern            /* PG applies the patterns and RG generates the responses */
    Read_Back(RG)            /* Reads back the responses generated by the RGs */
end
Identify_Faulty_TGs()        /* Identifies the location of faulty test groups (FTG) */
RI_Procedure(FTG)            /* Executes the RI procedure */
TA contains all seven test architectures used in the BIST procedure, and F consists of all the targeted faults listed in the fault model. The procedure starts by selecting a test configuration for targeting a fault from the fault list (Select_TC(f_j)). It then partitions the clusters into test groups based on the selected test architecture (Partition_Cluster(TA_i)). The test groups are then configured based on the selected test configuration (Config_TG(TC_j)), where TC_j denotes the configuration required for detecting fault f_j. Next, the patterns are applied and the responses are generated (Apply_Pattern). The responses are read back using an on-chip or external tester (Read_Back(RG)). The process continues until all the faults have been targeted (F = {}), and the defect information is then passed to the Identify_Faulty_TGs() function, which identifies the location of faulty test groups (FTGs) and builds the initial defect database used by the recovery increase procedure (RI_Procedure(FTG)). The RI procedure is discussed in section Recovery analysis.
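The control flow described above can be sketched in Python as a loop over test sessions. This is a hedged illustration, not the chapter's implementation: `run_bist`, `apply_session` and the nesting of the configuration loop inside the architecture loop are assumptions introduced here, and `apply_session` stands in for the configure, apply and read-back steps performed by the programming device and tester.

```python
def run_bist(test_architectures, test_configs, apply_session):
    """Run every (architecture, configuration) session and collect failing groups.

    `apply_session(ta, tc)` is a caller-supplied stand-in (hypothetical) for
    partitioning, configuring, applying patterns and reading back RG outputs.
    It returns a dict: test-group id -> True if the response was fault-free.
    """
    faulty_groups = set()
    for ta in test_architectures:
        for tc in test_configs.get(ta, []):
            responses = apply_session(ta, tc)
            faulty_groups |= {(ta, g) for g, ok in responses.items() if not ok}
    return faulty_groups


def fake_session(ta, tc):
    # Pretend test group 2 fails only under TA32 with the "sa0" configuration.
    return {g: not (ta == "TA32" and tc == "sa0" and g == 2) for g in range(4)}


tas = ["TA31", "TA32"]
tcs = {"TA31": ["sa0", "sa1"], "TA32": ["sa0", "sa1"]}
print(run_bist(tas, tcs, fake_session))  # {('TA32', 2)}
```

The returned set plays the role of the faulty test group list (FTG) that is handed to the recovery analysis.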
Test Configurations
Test Configurations for Detecting Faults in NanoBlocks
In this section, we present the test configurations required for testing faults in a nanoBlock. Stuck-at faults, stuck-open faults, forward-biased and reverse-biased diode faults, and AND- and OR-bridging faults are targeted. Each test group includes one PG and one RG. One might suggest that a single PG is sufficient to test a nanoBlock. This may test some of the faults in a
Fig. 7 Test configuration for detecting sa0 faults in a nanoBlock (a) on vertical lines when nanoBlock is used as PG (sa0-V-PG), (b) on vertical/horizontal lines when nanoBlock is used as RG (sa0-H-RG and sa0-V-RG), and (c) on horizontal lines when nanoBlock is used as PG (sa0-H-PG). (c) also detects forward-biased diode faults (FBDFs) [28]
nanoBlock, such as stuck-at faults, but not all of them. Note that some faults in a nanoBlock require input(s) from other blocks to be detected. An important feature of each nanoBlock is that vertical lines can be connected only to Vdd and horizontal lines only to Gnd, while both vertical and horizontal lines can be connected to the 1s and 0s generated at the outputs of surrounding nanoBlocks. Thus, PG is connected to Vdd or Gnd to generate the patterns, and RG may or may not be connected to Vdd and Gnd; in most cases RGs receive the output values of PGs. Figure 7a shows the test configuration for a nanoBlock that tests stuck-at-0 faults on vertical lines [31]. All the vertical lines are connected to Vdd, hence the output lines are 1. An output value will be 0 if the corresponding vertical line is stuck-at-0 or stuck-open. This configuration is used as PG, and the fault is therefore called sa0-V-PG (stuck-at-0 on vertical lines of a nanoBlock used as PG). If a fault exists, RG will receive the faulty output and detect the fault. As seen, PG tests itself because any stuck-at-0 and/or stuck-open fault on a vertical line changes the value of its corresponding output line. Stuck-at-0 and/or stuck-open faults on vertical and horizontal lines can also be detected by the configuration shown in Fig. 7b. PG provides inputs either on the horizontal lines or on the vertical lines of RG (depending on the TA used). Hence, either the horizontal or the vertical lines of RG are tested. The test configuration shown in Fig. 7b is used as an RG and the detected faults are called sa0-H-RG and sa0-V-RG. Figure 7c shows the detection of stuck-at-0 and/or stuck-open faults on horizontal lines. Only one vertical line is connected to Vdd and all the horizontal lines are connected to Gnd. A forward-biased diode is created at each junction between that vertical line and the horizontal lines. The horizontal output lines are 1 in the defect-free case.
An output will be 0 in the presence of a stuck-at-0 and/or stuck-open fault on horizontal lines. This test configuration is used as PG and the fault detected is called sa0-H-PG. The faulty/fault-free output is transferred to RG and is then detected.
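A behavioural sketch of the Fig. 7c style configuration is given below. It is a deliberately simplified model: the function names, the set-based fault description, and the idea of sweeping the driven vertical line across all K positions are assumptions of this sketch. A row reads 1 only if its diode to the driven column conducts and the row itself is neither stuck-at-0 nor open.

```python
def horizontal_outputs(k, driven_col, open_diodes=frozenset(), sa0_rows=frozenset()):
    """Horizontal outputs when only vertical line `driven_col` is tied to Vdd."""
    return [
        0 if (row in sa0_rows or (row, driven_col) in open_diodes) else 1
        for row in range(k)
    ]


def failing_positions(k, **faults):
    """Sweep the driven column and list every (row, col) whose output reads 0."""
    return {
        (row, col)
        for col in range(k)
        for row, bit in enumerate(horizontal_outputs(k, col, **faults))
        if bit == 0
    }


# A defective diode fails in exactly one configuration of the sweep ...
print(failing_positions(3, open_diodes={(1, 2)}))        # {(1, 2)}
# ... whereas a stuck-at-0 horizontal line fails in every configuration.
print(sorted(failing_positions(3, sa0_rows={0})))        # [(0, 0), (0, 1), (0, 2)]
```

In this model the pattern of failures across the sweep separates a line fault from a single-junction diode fault, which is the kind of information a read-back of all RG outputs preserves.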
Fig. 8 Test configuration for detecting sa1 faults in a nanoBlock (a) on horizontal lines when nanoBlock is used as PG (sa1-H-PG), (b) on vertical/horizontal lines when nanoBlock is used as RG (sa1-H-RG and sa1-V-RG) and (c) on vertical lines when nanoBlock is used as PG (sa1-V-PG) [28]
Fig. 9 Configuration for detecting reverse-biased diode faults (RBDFs) in nanoBlocks [28]
The configuration shown in Fig. 7c also tests forward-biased diode faults (FBDFs). A defective forward-biased diode may act as a very high-value resistor between the vertical line and its attached horizontal line, resulting in a 0 on that horizontal output line. K configurations are required to test all forward-biased diode faults, where K is the size of the nanoBlock ($K \times K$). As a result, K test configurations in total are required to test FBDFs, sa0-H-PG and stuck-open faults. Figures 8a–c show the nanoBlock configurations required for detecting stuck-at-1 faults on horizontal and vertical lines. As shown in Fig. 8a, all inputs are connected to Gnd, hence all horizontal output lines are 0. This configuration is used as PG. An output line value becomes 1 in the presence of a stuck-at-1 fault on a horizontal line (sa1-H-PG). Stuck-at-1 faults on horizontal and vertical lines of an RG block (sa1-H-RG and sa1-V-RG) can be detected using the configuration shown in Fig. 8b. The test configuration shown in Fig. 8c is used to test sa1-V-PG and generates patterns for the RG block. The test configuration shown in Fig. 9 is used to test reverse-biased diode faults (RBDFs) in nanoBlocks. All the diodes at the junctions are reverse-biased: all the vertical input lines are connected to 0 and the horizontal wires are connected to 1. An RBDF may change an output line value from 0 to 1 or from 1 to 0. Since 0 is applied on vertical wires and 1 is applied on horizontal wires, this configuration must be used as RG. PGs generate the required 1s and 0s for RBDFs in
Fig. 10 Configuration for detecting (a) AND-bridging faults among vertical lines (ANDBF-V), (b) AND-bridging faults among horizontal lines (ANDBF-H) and (c) OR-bridging faults among vertical lines (ORBF) [28]
Fig. 11 PG for detecting (a) ANDBF-V, (b) ANDBF-H and (c) ORBF-H faults in RGs [28]
RG to be tested. This test configuration requires two PGs in each test architecture, hence test architectures TA51, TA52 and TA53 are used for this purpose. Figure 10a shows the configuration required to detect AND-bridging faults among vertical lines (ANDBF-V). A bridging fault between the line connected to Vdd and another vertical line may change the output of the vertical line not connected to Vdd from 0 to 1, and the output of the line connected to Vdd from 1 to 0. K configurations are used for detecting all faults on K vertical lines. Figure 10b shows the configuration required to detect AND-bridging faults among horizontal lines (ANDBF-H). A bridging fault between two horizontal lines changes the output line value. K configurations are required to test all AND-bridging faults among horizontal lines. Finally, Fig. 10c shows the configuration required for detecting OR-bridging faults (ORBF). An OR-bridging fault among vertical lines may change the output of a vertical line from 0 to 1. All three configurations must be used as RG blocks. The configurations in Fig. 11a–c are used as PGs to test ANDBF-H, ANDBF-V and ORBF-H faults on PG, respectively.
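The reason K configurations suffice for bridging faults on K lines can be checked with a small model. The sketch below assumes a walking-one stimulus (exactly one line driven to 1 per configuration) and an idealized wired-AND or wired-OR short between two lines; the function names and this fault abstraction are illustrative assumptions rather than the chapter's circuit-level behaviour.

```python
from itertools import combinations


def walking_one(k):
    """K patterns, each driving exactly one of K lines to 1."""
    return [[1 if i == hot else 0 for i in range(k)] for hot in range(k)]


def bridged(pattern, i, j, kind):
    """Line values observed when lines i and j are shorted (wired-AND or wired-OR)."""
    merged = (pattern[i] & pattern[j]) if kind == "AND" else (pattern[i] | pattern[j])
    out = list(pattern)
    out[i] = out[j] = merged
    return out


def all_bridges_detected(k):
    """True if every pairwise AND/OR bridge changes the response to some pattern."""
    patterns = walking_one(k)
    return all(
        any(bridged(p, i, j, kind) != p for p in patterns)
        for i, j in combinations(range(k), 2)
        for kind in ("AND", "OR")
    )


print(all_bridges_detected(4))  # True
```

For any pair of lines, the pattern that drives one of them to 1 while the other stays at 0 exposes the short: an AND bridge pulls the driven line to 0 and an OR bridge pulls the undriven line to 1.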
Test Configurations for Detecting Faults in SwitchBlocks
In this section, we present the test configurations needed for testing switchBlocks in the Fabric. As shown in Fig. 2, a switchBlock in the Fabric is constructed from the
Fig. 12 (a) Test configuration for testing FBDFs on north–east diodes (FBDF-SB-NE). (b) Test configuration for testing FBDFs on west–south diodes (FBDF-SB-WS) [29]
intersection of the inputs and outputs of four neighboring nanoBlocks. That is, the south and east outputs of two diagonally adjacent nanoBlocks and the north and west inputs of the two other diagonally adjacent nanoBlocks together form a switchBlock. The test configurations used for PG and RG to test stuck-at and bridging faults completely cover these faults in switchBlocks as well, because the wires in a switchBlock are the same wires as those of the neighboring nanoBlocks. For complete coverage of switchBlock faults, additional test configurations must be designed to cover the remaining faults, namely forward- and reverse-biased diode faults (FBDFs and RBDFs). Figure 12 shows the two configurations required to detect FBDFs in diodes located at the intersection of the north and east wires of a switchBlock (FBDF-SB-NE) and in diodes located at the intersection of its west and south wires (FBDF-SB-WS). Inputs are applied from PG to SB so that the north–east (NE) or west–south (WS) diodes are forward-biased, and the results are sent to RG for evaluation. The configuration in Fig. 12a uses TA31 and TA32 and that in Fig. 12b uses TA41 and TA42 to detect all FBDF-SB-NE and FBDF-SB-WS faults in switchBlocks. Figure 13 shows two other configurations needed to detect FBDFs on the north–west (NW) and south–east (SE) diodes of a switchBlock. Since north and west lines are both inputs to the switchBlock, additional diodes must be configured to send the test results to RG (south lines are used for this purpose in Fig. 13a). Similarly, since both south and east lines are outputs towards RG, extra diodes must be configured to apply inputs from PG to the switchBlock in order to test the diodes at their intersection (west lines are used for this purpose, as shown in Fig. 13b). These two configurations use the TA11–TA12 and TA21–TA22 architectures. Figure 14 shows two more configurations that test RBDFs on the north–east and west–south diodes of switchBlocks. These use TA31–TA32 and TA41–TA42, respectively. Finally, Fig. 15 depicts the configurations required to test the north–west and south–east diodes of switchBlocks for reverse-biased diode faults. These use TA21–TA22 and TA11–TA12, respectively. The idea behind these test architectures
Fig. 13 (a) Test configuration for testing FBDFs on north–west diodes (FBDF-SB-NW). (b) Test configuration for testing FBDFs on south–east diodes (FBDF-SB-SE) [29]
Fig. 14 (a) Test configuration for testing RBDFs on north–east diodes (RBDF-SB-NE). (b) Test configuration for testing RBDFs on west–south diodes (RBDF-SB-WS) [29]
Fig. 15 (a) Test configuration for testing RBDFs on north–west diodes (RBDF-SB-NW). (b) Test configuration for testing RBDFs on south–east diodes (RBDF-SB-SE) [29]
is the same as in the previous cases: inputs are applied (through PG) so that the diodes are reverse-biased, and the results are sent out to RG. These four test configurations cover the reverse-biased diode faults of switchBlocks.
Number of Test Configurations
Since each test group contains a PG and an RG, we target one fault in PG and another in RG (and usually bridging or stuck-at faults in the switchBlock between them). If the output of RG differs from the fault-free response, a fault is detected. Figure 16 lists the test configurations required to cover all stuck-at, ANDBF and ORBF faults in nanoBlocks and switchBlocks. The first column shows which test architecture is used. The second and third columns give the test configurations used in that test architecture for PG and RG, respectively. The fourth column shows the faults detected in nanoBlocks and the fifth column shows the faults detected in switchBlocks. The number of test configurations used for a test architecture, the number of architectures and the total number of configurations are listed in columns six, seven and eight, respectively.
| Test Architecture | Configuration (PG) | Configuration (RG) | NanoBlock Faults | SwitchBlock Faults | # of Configs in a TA | # of Archs | Total # of Configs |
|---|---|---|---|---|---|---|---|
| TA31 and TA32 | 7(a) | 7(b) | sa0-V-PG, sa0-H-RG | sa0-N-SB, sa0-E-SB | 1 | 2 | 2 |
| TA31 and TA32 | 8(c) | 8(b) | sa1-V-PG, sa1-H-RG | sa1-N-SB, sa1-E-SB | 1 | 2 | 2 |
| TA31 and TA32 | 11(b) | 10(b) | ANDBF-V-PG, ANDBF-H-RG | ANDBF-N-SB, ANDBF-E-SB | K | 2 | 2K |
| TA31 and TA32 | 10(c) | 10(b) | ORBF-V-PG, ORBF-H-RG | ORBF-N-SB, ORBF-E-SB | K | 2 | 2K |
| TA41 and TA42 | 7(c) | 7(b) | sa0-H-PG, sa0-V-RG | sa0-W-SB, sa0-S-SB | 1 | 2 | 2 |
| TA41 and TA42 | 8(a) | 8(b) | sa1-H-PG, sa1-V-RG | sa1-W-SB, sa1-S-SB | 1 | 2 | 2 |
| TA41 and TA42 | 11(a) | 10(a) | ANDBF-H-PG, ANDBF-V-RG | ANDBF-W-SB, ANDBF-S-SB | K | 2 | 2K |
| TA41 and TA42 | 11(c) | 10(a) | ORBF-H-PG, ORBF-V-RG | ORBF-W-SB, ORBF-S-SB | K | 2 | 2K |

Total number of configs for stuck-at and bridging faults: 8K+8

Fig. 16 Number of test configurations required to test all the stuck-at and bridging faults in nanoBlocks and switchBlocks [29]
| Test Architecture | Configuration (PG) | Configuration (RG) | Faults | # of Configs in a TA | # of Archs | Total # of Configs |
|---|---|---|---|---|---|---|
| TA41 and TA42 | 7(c) | 7(b) | FBDF in all nanoBlocks configured as PGs | K | 2 | 2K |
| TA51–TA53 | 7(c), 8(c) | 9 | RBDF in all nanoBlocks configured as RGs | 1 | 3 | 3 |

Total number of configs for FBDFs and RBDFs in nanoBlocks: 2K+3

Fig. 17 Number of test configurations required to test all the FBDF and RBDF faults in nanoBlocks [29]
| Test Architecture | Configuration (PG) | Configuration (RG) | Configuration (SB) | SwitchBlock Faults | # of Configs in a TA | # of Archs | Total # of Configs |
|---|---|---|---|---|---|---|---|
| TA31 and TA32 | 11(a) | 7(b) | 12(a) | FBDF-NE | K | 2 | 2K |
| TA31 and TA32 | 11(b) | 7(b) | 14(a) | RBDF-NE | 1 | 2 | 2 |
| TA41 and TA42 | 11(c) | 8(b) | 12(b) | FBDF-WS | K | 2 | 2K |
| TA41 and TA42 | 11(c) | 8(b) | 14(b) | RBDF-WS | 1 | 2 | 2 |
| TA21 and TA22 | 11(b) | 7(b) | 13(a) | FBDF-NW | K | 2 | 2K |
| TA21 and TA22 | 11(b) | 7(b) | 15(a) | RBDF-NW | 1 | 2 | 2 |
| TA11 and TA12 | 11(a) | 7(b) | 13(b) | FBDF-SE | K | 2 | 2K |
| TA11 and TA12 | 11(a) | 7(b) | 15(b) | RBDF-SE | 1 | 2 | 2 |

Total number of configs for FBDFs and RBDFs in switchBlocks: 8K+8

Fig. 18 Number of test configurations required to test all the FBDF and RBDF faults in switchBlocks [29]
Figure 17 shows the same information for detecting FBDFs and RBDFs in nanoBlocks, and Fig. 18 lists the configurations for detecting FBDFs and RBDFs in switchBlocks. The main objective at this point is to minimize the total number of required test configurations. Assume that we use the seven TAs {TA31, TA32, TA41, TA42, TA51, TA52, TA53} to test a nanoFabric. As shown in Fig. 16, when using TA31, the configurations shown in Fig. 7a and b are used. Configuration 7(a), used as PG, detects stuck-at-0 faults on vertical lines (sa0-V-PG) and configuration 7(b), used as RG, detects stuck-at-0 faults on horizontal lines (sa0-H-RG). The number of configurations required to test all these faults, as shown in the last column, is two, since both TA31 and TA32 are applied.
The test architecture is chosen based on the test configurations used to detect a particular fault. For example, to test sa0-V-PG faults, test configuration 7(a) generates a 1 on its vertical (south) line, so we must use TA31 to transfer the pattern from the south side of the PG to the east side of the RG. Therefore, we need to use an RG like 7(b). As shown in Fig. 17, configurations 7(b) and (c) are used in test architectures TA41 and TA42 to test FBDFs in nanoBlocks. Overall, K configurations are required with each TA. To test RBDFs, one configuration is used for the three test architectures TA51, TA52 and TA53. Based on these figures, the required number of configurations is:
Number of configs for stuck-at and bridging faults = $8K + 8$
Number of configs for FBDFs and RBDFs in nanoBlocks = $2K + 3$
Number of configs for FBDFs and RBDFs in switchBlocks = $8K + 8$
Total number of configs = $18K + 19$
$18K + 19$ is the number of configurations required to achieve complete coverage of the faults listed in our fault model for test groups with nanoBlocks and switchBlocks of size K. It is also the total number of times a programming device configures the Fabric under test. Next, we estimate the total number of switches to be configured in BIST. Let $N_{sw\_config}$ be the average number of switches to be configured in each test group for each test session. Then $(18K + 19) \times N_{sw\_config}$ is the total number of switches to be configured during BIST in each test group. Suppose the Fabric has clusters of size N, an $M \times M$ array of clusters ($M^2$ clusters), and $N^2/4$ test groups in each cluster (see Fig. 1). The total number of switch configurations in the Fabric during BIST is then $M^2 \times (N^2/4) \times (18K + 19) \times N_{sw\_config}$. Under the same conditions, [31] reports $4K^2 + 4K + 6$ configurations for each test group. Since test groups in [31] contain three nanoBlocks, the total number of test groups in a cluster is $N^2/6$. Hence, the total number of switch configurations is $M^2 \times (N^2/6) \times (4K^2 + 4K + 6) \times N_{sw\_config1}$, where $N_{sw\_config1}$ in [31] is greater than $N_{sw\_config}$ in our method since the test groups are larger. It can be expected that $(N^2/6)\,N_{sw\_config1} \simeq (N^2/4)\,N_{sw\_config}$. Therefore, the total number of switches to be configured using [31] is significantly larger than with our technique, because of the $4K^2$ term in the number of configurations required for each test group in [31].
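The two configuration counts above can be compared numerically. The following is a minimal sketch, not part of the original analysis; the function names are hypothetical and only the expressions $18K + 19$ and $4K^2 + 4K + 6$ are taken from the text. Evaluating them suggests that the linear count becomes the smaller of the two from K = 5 upward, and the gap then widens quadratically.

```python
def configs_proposed(k: int) -> int:
    """Configurations per test group for the technique in the text: 18K + 19."""
    stuck_at_and_bridging = 8 * k + 8
    nanoblock_bdf = 2 * k + 3      # FBDFs and RBDFs in nanoBlocks
    switchblock_bdf = 8 * k + 8    # FBDFs and RBDFs in switchBlocks
    return stuck_at_and_bridging + nanoblock_bdf + switchblock_bdf


def configs_ref31(k: int) -> int:
    """Configurations per test group reported for [31]: 4K^2 + 4K + 6."""
    return 4 * k * k + 4 * k + 6


def smallest_k_with_fewer_configs(k_max: int = 1000) -> int:
    """Smallest block size K at which 18K + 19 drops below 4K^2 + 4K + 6."""
    for k in range(1, k_max + 1):
        if configs_proposed(k) < configs_ref31(k):
            return k
    raise ValueError("no crossover found up to k_max")


for k in (4, 8, 16, 32):
    print(k, configs_proposed(k), configs_ref31(k))
```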
M. Tehranipoor
Time Analysis for BIST
Test time (T) for reconfigurable devices consists of three main components: (i) configuration time ($T_{CT}$), (ii) test application time ($T_{ApT}$), and (iii) response read-back time ($T_{RBT}$), i.e., $T = T_{CT} + T_{ApT} + T_{RBT}$. The test application time ($T_{ApT}$) for the technique is very short, independent of the Fabric size, and negligible compared to $T_{CT}$ and $T_{RBT}$. Once the test groups are configured, test patterns are generated and applied in parallel in all test groups of the Fabric. An accurate analysis of test time for our technique would require detailed information about the routing and interconnect architecture of the nanoFabric. $T_{CT}$ depends on the programming device's configuration capability and bandwidth. Let us assume that the programming device can configure C switches in one clock cycle. The total number of clock cycles required for the BIST procedure is then given by:
$\left( M^2 \times (N^2/4) \times (18K + 19) \times N_{sw\_config} \right) / C$
As seen, the total number of clock cycles is proportional to the total number of test-group configurations ($(18K + 19)$ vs. $(4K^2 + 4K + 6)$). This demonstrates that our technique, because it uses fewer test-group configurations, offers a lower $T_{CT}$. A similar analysis can be performed for the RI procedure, with the difference that RI does not configure, test and read all the Fabric blocks. In RI, only susceptive and recoverable blocks are considered.
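The clock-cycle estimate can be evaluated directly. This is a minimal sketch under stated assumptions: the function name and all numeric parameter values in the example call are hypothetical and chosen only for illustration, while the formula itself is the one given above. The result is rounded up because a partially used configuration cycle still costs a full cycle.

```python
import math


def bist_clock_cycles(m: int, n: int, k: int, n_sw_config: float, c: int) -> int:
    """Clock cycles to configure all test groups during BIST.

    m: the Fabric is an m x m array of clusters
    n: cluster size (each cluster holds n^2 / 4 test groups)
    k: nanoBlock / switchBlock size
    n_sw_config: average switches configured per test group per test session
    c: switches the programming device can configure per clock cycle
    """
    test_groups = (m * m) * (n * n) / 4
    switch_configurations = test_groups * (18 * k + 19) * n_sw_config
    return math.ceil(switch_configurations / c)


# Illustrative values only; none of these numbers come from the text.
print(bist_clock_cycles(m=10, n=8, k=8, n_sw_config=20, c=64))
```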
Recovery Analysis
If a test group is fault-free after applying all the test configurations, all the components in that test group are identified as fault-free. If the test group is faulty, however, at least one component (PG, RG or switchBlock) in that test group is faulty. The PG and RG used in this test group are also used in other test groups in different architectures. Assume that each test group is represented as (NB_ij, SB_pq, NB_kl), where NB_ij denotes a nanoBlock (PG) located in row i and column j, NB_kl is the RG located in row k and column l, and SB_pq is the switchBlock located in row p and column q. For instance, Fig. 19a shows that NB22 and NB33 are used in TA32 and TA42 as PG and RG, respectively. As seen, NB22 is also used as RG in TA31, TA41 and TA51, and NB33 is used as PG in TA31 and TA41 for the RG block located at row 4 and column 4 (NB44). Figure 19b shows a case where the test group (NB22, SB32, NB33) is faulty only when testing sa0-V-PG and sa0-H-RG using TA32, while test groups (NB11, SB12, NB22) and (NB33, SB34, NB44) are fault-free for all the test configurations. Given this information about the fault-free status of these blocks, the only component in test group (NB22, SB32, NB33) with an unknown status is SB32. Since previous tests show that sa0-V-PG in NB22 is tested when testing test group (NB11, SB12, NB22) and sa0-H-RG is tested when testing test group (NB33, SB34, NB44), it is confirmed that NB22 and NB33 are fault-free and SB32 is faulty. This is the simplest case; if the neighboring test groups are not fault-free, an extra process is needed to determine which components of the susceptible test group are faulty or fault-free.
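The elimination argument of Fig. 19b can be sketched as a small set operation. This is an illustrative simplification, not the chapter's implementation: it collapses the per-fault-type status that the text tracks (e.g., sa0-V-PG, sa0-H-RG) into a single fault-free flag per block, and the function name is hypothetical.

```python
def unresolved_components(faulty_group, fault_free_groups):
    """Return the blocks of a faulty test group not cleared by any fault-free group.

    faulty_group: tuple of block names, e.g. (PG, switchBlock, RG)
    fault_free_groups: iterable of test groups that passed every configuration
    """
    confirmed_fault_free = {block for group in fault_free_groups for block in group}
    return [block for block in faulty_group if block not in confirmed_fault_free]


# The case of Fig. 19b: both neighboring test groups are fault-free.
suspects = unresolved_components(
    ("NB22", "SB32", "NB33"),
    [("NB11", "SB12", "NB22"), ("NB33", "SB34", "NB44")],
)
print(suspects)
```

When exactly one block remains unresolved, as here, it is the faulty component. When more than one remains, additional configurations of the kind described for Fig. 19c are needed.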
[Figure 19 graphic: panels (a), (b) and (c) show grids of blocks (rows and columns 1 to 5) labeled with test architectures TA11, TA12, TA21, TA22, TA31, TA32, TA41, TA42 and TA51; legend: faulty TG, fault-free TG, fault-free blocks, TAs used in recovery]
Fig. 19 To diagnose a susceptive block, extra configurations must be applied. In these configurations test group must contain the susceptive block and two other blocks that are confirmed to be fault-free based on previous tests or confirmed to be usable in the specific test configuration required for testing susceptive block: (a) different test architectures for blocks, (b) faulty test group with fault-free neighbors, (c) faulty test group with susceptive neighbors [29]
We continue with the above example and change our assumption about the fault-free status of the neighboring test groups (Fig. 19c). We assume that previous tests have confirmed that nanoBlocks NB31 and NB35 are not susceptible to horizontal faults, NB42 has no vertical faults, switchBlock SB12 is not susceptible to WS (west–south) faults and SB34 is not susceptible to WE (west–east) faults. To check the status of each component in test group (NB22, SB32, NB33), some more configurations must be applied. First, SB12 and NB22 should be configured in TA21. If the applied test shows no faults, NB22 has no vertical faults, because SB12 has already been shown to be free of south faults; if there is a fault, it must be in NB22. The next step is to configure (NB33, SB34, NB35) as a test group. If the result shows no faults, then NB33 is fault-free; if there is a fault, it must be in NB33, because the other two blocks in the test group are confirmed to have no horizontal faults. Diagnosing SB32 for east faults is possible if its eastern nanoBlock (NB33) has no horizontal faults. In this case, test group (NB31, SB32, NB33) can be configured in TA11, and if the result shows no faults, SB32 has no east faults. The same applies to testing SB32 for north faults: test group (NB22, SB32, NB42) should be configured in TA22, and if the result shows no faults, SB32 must be free of north faults. If the north neighbor of SB32 is still susceptible to vertical faults, then there is no way to confirm the status of this switchBlock for north faults.
If the total number of fault-free test blocks obtained after completing BIST is higher than a limit ($R_{limit}$) defined by the manufacturer, then the chip is ready for shipping. Otherwise, a Recovery Increase (RI) procedure needs to be performed to increase the recovery to $R_{limit}$.
To perform this procedure, we assume that the external or on-chip tester knows the faulty test groups and the fault that made each test group faulty. This information is obtained by running the BIST procedure proposed in the previous section. The RI procedure reconfigures the faulty test groups and/or their surrounding nanoBlocks to target particular fault(s). This increases the number of test configurations, and the increase depends on the required $R_{limit}$: the higher the $R_{limit}$, the more test configurations may be required.
One advantage of our additional test architectures (e.g., TA11, TA12, TA21 and TA22) is that they make it possible to locate the faulty component(s) in a defective test group. Note that TA11, TA12, TA21 and TA22 do not increase the fault coverage; these architectures are used solely for diagnosis and in the RI procedure. An algorithm is required to find the minimum number of additional test architectures/configurations needed to identify the faulty or fault-free components in each faulty test group. The algorithm takes as input the results obtained from applying TA31, TA32, TA41, TA42, TA51, TA52 and TA53 to the nanoFabric under test during the BIST procedure, and extracts a minimum set of additional test architectures and configurations to recover the susceptive test groups and improve the recovery. This is performed by our proposed RI procedure, which is shown in Fig. 20. The core of this procedure is a linear search in the defect map. It identifies each of the defective blocks and checks the status of its neighbor blocks. If a test group can be constructed using the defective block and its fault-free neighbors, then that test group is scheduled to be applied to the blocks. When all possible recovery test architectures have been extracted from the defect map, the BIST procedure is executed for the set of recovery architectures and configurations (RUN BIST(RTA)). The test results are then used to update the defect map (UPDATE DM()). The computational complexity of searching the defect map of the nanoFabric and identifying the TAs that can be implemented around each susceptive block to recover it is $O(M^2 N^2)$, so this search can be performed in an acceptable time.
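The linear search over the defect map can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the procedure of Fig. 20: the defect map is modeled as a plain 2D grid with one status per block, a recovery test group is assumed to be a suspect block flanked by two fault-free neighbors in the same row or column, and the names `plan_recovery`, `FAULT_FREE` and `SUSPECT` are hypothetical.

```python
FAULT_FREE = "ok"
SUSPECT = "suspect"


def plan_recovery(defect_map):
    """Scan the defect map once and schedule recovery test groups.

    Returns a list of (neighbor, suspect, neighbor) coordinate triples, one for
    each suspect block that has two fault-free neighbors on opposite sides.
    """
    rows, cols = len(defect_map), len(defect_map[0])

    def is_fault_free(r, c):
        return 0 <= r < rows and 0 <= c < cols and defect_map[r][c] == FAULT_FREE

    recovery_groups = []
    for r in range(rows):
        for c in range(cols):
            if defect_map[r][c] != SUSPECT:
                continue
            if is_fault_free(r, c - 1) and is_fault_free(r, c + 1):
                recovery_groups.append(((r, c - 1), (r, c), (r, c + 1)))
            elif is_fault_free(r - 1, c) and is_fault_free(r + 1, c):
                recovery_groups.append(((r - 1, c), (r, c), (r + 1, c)))
    return recovery_groups


example_map = [
    ["ok", "ok", "ok"],
    ["ok", "suspect", "ok"],
    ["suspect", "ok", "ok"],
]
print(plan_recovery(example_map))
```

Each block is visited once and only a constant number of neighbors is inspected per block, so the scan is linear in the number of blocks, in line with the complexity quoted above. In a full RI iteration the scheduled groups would be tested and the defect map updated before the next scan.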
The time required to configure the TAs needed for recovery of the blocks and to read the test results depends on the configuration time and routing resources of the Fabric (as discussed in section Test configurations). If these resources allow the TAs to be configured in parallel, the recovery procedure can be performed much faster. The total number of iterations required for the RI procedure depends on the test time limitations or on the acceptable number of fault-free blocks detected in the Fabric. Our simulation results, shown in the next section, demonstrate that a low number of iterations considerably improves the recovery and brings it into an acceptable range.
Simulation and Results
A C++ model of the cluster is created, faults are randomly injected, and the BIST and RI procedures are performed. The simulation process includes the following phases:
Phase 1: In this phase an array model of the Fabric is created and faults are randomly injected into it. Each block (nanoBlock/switchBlock) is modeled as an
# Define DM (Defect Map) /* Database used to store the fault locations and their types. */
# Define RTA (Recovery TAs) /* Database to store location and type of TAs used for recovery. */
# Define M (Fabric size) /* Fabric is an M × M array of clusters. */
# Define N (Cluster size) /* Clusters are N × N arrays of nanoBlocks and switchBlocks. */
# Define K (Block size) /* NanoBlocks are K × K arrays of nanowires. */
----------------------------------------
RUN BIST Procedure (TA);
UPDATE DM(); /* Location and type of faults are stored in the defect map. */
// RI Procedure //
DO: { for (i = 1; i
> 0, all terms in Eq. (A7) for m > 1 are negligible. Hence, the continuity at the threshold point will be maintained.
References
1. P. Avouris, “Supertubes: the unique properties of carbon nanotubes may make them the natural successor to silicon microelectronics,” IEEE Spectrum, pp. 40–45, Aug. 2004.
2. S. J. Tans, R. M. Verschueren, and C. Dekker, “Room temperature transistor based on a single carbon nanotube,” Nature, vol. 393, pp. 49–52, 1998.
3. J. Guo, “Carbon nanotube electronics: modeling, physics and applications,” Ph.D. Thesis, Purdue University, 2004.
4. J. Guo, S. Datta, and M. Lundstrom, “Assessment of silicon MOS and carbon nanotube FET performance limits using a general theory of ballistic transistors,” IEDM Tech. Digest, pp. 29.3.1–29.3.4, 2002.
5. B. C. Paul, S. Fujita, M. Okajima, and T. Lee, “Impact of Geometry Dependent Parasitic Capacitances on the Performance of CNFET Circuits,” IEEE Electron Device Letters, vol. 27, no. 5, pp. 380–382, May 2006.
6. B. C. Paul, S. Fujita, M. Okajima, T. Lee, H. S. P. Wong, and Y. Nishi, “Impact of Process Variation on Nanowire and Nanotube Device Performance,” IEEE Trans. on Electron Devices, vol. 54, no. 9, pp. 2369–2376, September 2007.
B.C. Paul et al.
7. J. Deng, N. Patil, K. Ryu, A. Badmaev, C. Zhou, S. Mitra, and H. S. P. Wong, “Carbon Nanotube Transistor Circuits: Circuit-Level Performance Benchmarking and Design Options for Living with Imperfections,” Intl. Solid State Circuit Conference (ISSCC), pp. 70–71, 2007.
8. J. Guo, A. Javey, H. Dai, and M. Lundstrom, “Performance analysis and design optimization of near ballistic carbon nanotube FETs,” IEDM Tech. Digest, pp. 703–706, 2004.
9. G. Pennington and N. Goldsman, “Semiclassical transport and phonon scattering of electrons in semiconducting carbon nanotubes,” Physical Review B, vol. 68, pp. 045426-1–045426-11, 2003.
10. C. Dwyer, M. Cheung, and D. J. Sorin, “Semi-empirical SPICE Models for Carbon Nanotube FET Logic,” IEEE Conf. on Nanotechnology, pp. 386–388, 2004.
11. A. Javey, J. Guo, D. B. Farmer, Q. Wang, E. Yenilmez, R. G. Gordon, M. Lundstrom, and H. Dai, “Self-aligned ballistic molecular transistors and electrically parallel nanotube arrays,” Nano Letters, vol. 4, no. 7, pp. 1319–1322, 2004.
12. A. Javey, J. Guo, Q. Wang, M. Lundstrom, and H. Dai, “Ballistic carbon nanotube field-effect transistor,” Nature, vol. 424, pp. 654–657, 2003.
13. Y. Lin, J. Appenzeller, Z. Chen, Z. G. Chen, H. M. Cheng, and P. Avouris, “High-performance dual-gate carbon nanotube FETs with 40-nm gate length,” IEEE Electron Device Lett., vol. 26, no. 11, 2005.
14. L. C. Castro and D. L. Pulfrey, “Extrapolated fmax for carbon nanotube field-effect transistors,” Nanotechnology, vol. 17, pp. 300–304, 2006.
15. K. Alam and R. Lake, “Performance of 2 nm gate length carbon nanotube field-effect transistors with source/drain underlaps,” Applied Physics Letters, vol. 87, pp. 073104-1–3, 2005.
16. D. L. John, L. C. Castro, and D. L. Pulfrey, “Quantum capacitance in nanoscale device modeling,” Journal of Applied Physics, vol. 96, no. 9, pp. 5180–5184, 2004.
17. S. Rosenblatt, Y. Yaish, J. Park, J. Gore, V. Sazonova, and P. L. McEuen, “High performance electrolyte gated carbon nanotube transistors,” Nano Letters, vol. 2, no. 8, pp. 869–872, 2002.
18. D. L. John and D. L. Pulfrey, “Switching-speed calculations for Schottky-barrier carbon nanotube field-effect transistors,” Journal of Vac. Sci. Technol., vol. A24, no. 3, pp. 708–712, 2006.
19. Z. Chen, et al., “An integrated logic circuit assembled on a single carbon nanotube,” Science, vol. 311, no. 5768, p. 1735, March 2006.
20. S. Iijima, “Helical microtubules of graphitic carbon,” Nature, vol. 354, pp. 56–58, 1991.
21. J. W. Mintmire and C. T. White, “Universal density of states for carbon nanotubes,” Physical Rev. Lett., vol. 81, pp. 2506–2509, 1998.
22. S. J. Wind, J. Appenzeller, and P. Avouris, “Lateral scaling in carbon nanotube field-effect transistors,” Physical Review Letters, vol. 91, pp. 058301-1–058301-4, Aug. 2003.
23. P. L. McEuen, M. S. Fuhrer, and H. Park, “Single walled carbon nanotube electronics,” IEEE Trans. on Nanotechnology, vol. 1, pp. 78–85, 2002.
24. K. Natori, Y. Kimura, and T. Shimizu, “Characteristics of a carbon nanotube field-effect transistor analyzed as a ballistic nanowire field-effect transistor,” Journal of Applied Physics, vol. 97, pp. 034306-1–034306-7, Jan. 2005.
25. B. C. Paul, S. Fujita, M. Okajima, and T. Lee, “Modeling and Analysis of Circuit Performance of Ballistic CNFET,” Design Automation Conference (DAC), pp. 717–722, 2006.
26. S. J. Wind, J. Appenzeller, R. Martel, V. Derycke, and P. Avouris, “Vertical scaling of carbon nanotube field-effect transistors using top gate electrodes,” Applied Physics Lett., vol. 80, p. 3817, 2002.
27. A. Bansal, B. C. Paul, and K. Roy, “Modeling and optimization of fringe capacitance of nanoscale DGMOS devices,” IEEE Trans. on Electron Devices, vol. 52, no. 2, pp. 256–262, 2005.
28. J. Chen, C. Klinke, A. Afzali, and P. Avouris, “Self-aligned carbon nanotube transistors with charge transfer doping,” Applied Physics Letters, vol. 86, pp. 123108-1–3, 2005.
29. S. Han, X. Liu, and C. Zhou, “Template-free directional growth of single-walled carbon nanotubes on a- and r-plane sapphire,” J. Am. Chem. Soc., vol. 127, pp. 5294–5295, 2005.
30. N. Patil, J. Deng, H. S. P. Wong, and S. Mitra, “Automated design of misaligned-carbon-nanotube-immune circuits,” Design Automation Conference (DAC), pp. 958–961, 2007.
The Prospect and Challenges of CNFET Based Circuits: A Physical Insight
31. Y. Lin, J. Appenzeller, J. Knoch, and P. Avouris, “High-performance carbon nanotube field-effect transistor with tunable polarities,” IEEE Trans. on Nanotechnology, vol. 4, no. 5, Sept. 2005.
32. http://www.eas.asu.edu/~ptm/
33. J. Deng and H. S. P. Wong, “A Circuit-Compatible SPICE Model for Enhancement Mode Carbon Nanotube Field Effect Transistors,” SISPAD 2006.
34. Helix Material Solutions; http://www.helixmaterial.com/
Computing with Nanowires: A Self Assembled Neuromorphic Architecture Supriyo Bandyopadhyay, Koray Karahaliloglu, and Sridhar Patibandla
Abstract
A two-dimensional array of nanowires, embedded in a semi-insulating medium, can elicit powerful computational and signal processing activity if the nanowires exhibit an N-type non-linearity in their current versus voltage characteristics. Such a system is relatively easy to self assemble using chemical routes and we have synthesized such systems using simple electrochemistry. The measured current–voltage characteristics of the nanowires exhibit the required N-type non-linearity. Based on the experimentally extracted parameters, we have simulated the dynamics of the arrays to show that they replicate the behavior of cellular non-linear networks. Specific examples of image processing functions are presented.
Keywords: Nanoarchitecture · Self assembly · Nanowire · Neuromorphic computing · Image processing
S. Bandyopadhyay, K. Karahaliloglu, and S. Patibandla
Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA 23284
e-mail: [email protected]
C. Huang (ed.), Robust Computing with Nano-scale Devices: Progresses and Challenges, Lecture Notes in Electrical Engineering 58, DOI 10.1007/978-90-481-8540-5_7, © Springer Science+Business Media B.V. 2010

Introduction
In a seminal work, Roychowdhury et al. demonstrated that a two-dimensional array of nanowires, nested in a semi-insulating matrix, can perform useful computational functions if the nanowires are resistively/capacitively linked with nearest neighbors and each nanowire exhibits a negative differential resistance [1]. Such a system can perform the role of associative memory [1], act as Boolean logic gates [2], and also carry out useful image processing functions [1, 3, 4]. This is among the first successful ideas for establishing alternate paradigms for computing with nanoscale devices, particularly two-terminal devices that possess neither gain nor isolation between the input and output nodes. The advantage of two-terminal devices is the relative ease of fabrication and the associated low production cost. Most importantly, these devices, and indeed an entire circuit, can be self assembled using inexpensive chemical routes, leading perhaps to a computer that is “grown” in a beaker as opposed to being fabricated in a billion-dollar semiconductor foundry [5]. If this approach bears fruit, it will be a remarkable advance in many areas of science including materials science, chemistry, physics and electrical engineering.
In the next section, we describe how the nanowire network proposed in ref. [1] is self assembled. In the following section, we provide measurements of the current–voltage characteristics of the nanowires to show that they exhibit an N-type non-linearity and a negative differential resistance. In the ensuing section, we provide some simulation results illustrating how these self assembled systems, possessing the properties measured by us, can perform useful signal processing work. Finally, in the last section, we present our conclusions.
Self Assembly of Neuromorphic Networks
The self assembly process starts with a bare foil of aluminum 0.1 mm thick. For integration with a semiconductor wafer like silicon, one could electroplate a layer of aluminum, or sputter/evaporate a thin layer (a few µm thick), on the wafer. If we start with a commercially available foil, then this foil is first electropolished in a solution of perchloric acid, butyl cellosolve, ethanol and distilled water to create a mirror-like surface [6]. For an evaporated or sputtered layer, the electropolishing step is omitted. Next, the aluminum layer is anodized in either 15% sulfuric or 3% oxalic acid to produce a nanoporous alumina film (~1 µm thick) on the surface. This anodic film contains a quasi-periodic hexagonal close packed array of pores. Figure 1 shows an atomic force micrograph of the pore arrangement viewed from the top. If the anodization is carried out in sulfuric acid, then the pore diameter is typically 10 nm. If the anodization is carried out in oxalic acid, then the pore diameter can be controlled by the anodization voltage. In oxalic acid, we have produced pores with diameter as small as 25 nm and as large as 50 nm by using anodization voltages of 10 and 40 V dc, respectively.
The pores can be selectively filled with a semiconductor by using ac electrodeposition. For example, if we wish to fill the pores with the II–VI semiconductor CdS, we proceed as follows. The porous film is immersed in an electrolyte consisting of a non-aqueous solution of dimethyl sulfoxide comprising 50 mM cadmium perchlorate, 10 mM lithium perchlorate, and 10 mM sulfur powder. Electrodeposition is carried out at 100 °C with an ac voltage of 20 V rms at 250 Hz applied between the aluminum substrate and a graphite counter electrode. During the negative cycle of the voltage, the Cd²⁺ ion in solution is reduced to zero-valent Cd and selectively deposited within the pores, which offer the least-impedance path for the displacement current to flow. During the positive cycle, the Cd atom is not re-oxidized to a Cd²⁺ ion since alumina is a valve metal oxide that conducts unidirectionally. The high temperature of the solution then allows Cd to react with sulfur in the solution to produce CdS.
Fig. 1 Atomic force micrograph of a porous alumina film. The dark areas are the pores and the surrounding light areas are alumina. The pore diameter is 50 nm. This sample was anodized in 0.3 M oxalic acid at room temperature with an anodizing voltage of 40 V dc
After the electrodeposition is complete, we have produced an array of CdS nanowires in a ceramic matrix (alumina). The cross-section diagram of a sample is shown in Fig. 2. The alumina is a semi-insulating material. Thus, we have realized a two-dimensional quasi-periodic array of vertically standing nanowires, each with diameter equal to the diameter of the pores and length ~1 µm, in a semi-insulating medium. If these nanowires exhibit negative differential resistance, then we will have realized the system visualized in ref. [1].
Electrical Characterization of the Nanowires
In order to measure the current versus voltage characteristic of the nanowires, we made 0.1 mm × 0.1 mm sized contacts on the surface. Gold wires were attached to the contact pad on the top and to the bottom aluminum foil (see Fig. 2). These contacts cover a large number of nanowires in parallel. The density of the nanowires is $10^{9}$ and $10^{10}$ cm$^{-2}$ in the case of oxalic acid and sulfuric acid anodization, respectively. Therefore, the contact pads cover $10^{7}$ or $10^{8}$ wires in the two cases. However, not all of these wires are electrically probed. Figure 2 shows two important features. First, the pores are filled non-uniformly (this is a well-known feature of ac electrodeposition) so that very few of the nanowires are actually contacted from both top and bottom. In fact, only about 1 in 1–100 million nanowires is probed [7]. This is actually advantageous: because of this feature, we probe anywhere between 1 and 100 nanowires in parallel, even when we use the large-area contact pads. Second, note that there is a “barrier layer” at the interface of the semiconductor and aluminum. It is the barrier to current flow along the length of the wires. This layer plays a central role in producing negative differential resistance in the current–voltage characteristics of the nanowires.

Fig. 2 Cross section of the porous template filled with semiconductors to produce a quasi-periodic array of vertically standing nanowires in a semi-insulating matrix. Note the location of the “barrier layer” at the interface of the semiconductor and aluminum. This layer is the barrier to current flow along the length of the wires

Figure 3 shows the current–voltage (I–V) characteristic measured at room temperature in two different samples with a Hewlett-Packard semiconductor parameter analyzer. These samples contain CdS nanowires of diameter 10 nm. An N-type non-linearity, with a clear negative differential resistance, is visible. These characteristics are completely repeatable from run to run in the same sample, but vary from sample to sample because of the inherent variations in electrochemical synthesis. In addition, we may be contacting different numbers of nanowires in different samples. Both samples, however, show a very similar peak to valley ratio of 2.5:1. This ratio does not vary much from sample to sample since it should not depend sensitively on the number of nanowires contacted.¹

[Figure 3 plots: current (nA) versus voltage (V), panels (a) and (b)]
Fig. 3 Measured current versus voltage characteristic of two different samples at room temperature. The peak to valley ratio in both samples is about 2.5:1. In (a), the voltage at which the current peaks is about 26 V and in (b) it is about 47 V

The voltage at which the peak current occurs varies significantly from sample to sample. In Fig. 3, we see that it varies by a factor of nearly 2. This is an unfortunate characteristic of nanostructures and is presently unavoidable. Because all nanostructures have large surface to volume ratios, their physical properties are significantly affected by surface variations. In the present nanostructures, there is a large concentration of surface states caused by defects and vacancies at the CdS/alumina interface. The surface charge density is estimated to be of the order of $10^{13}$ cm$^{-2}$ [8]. Because the surface states are random, there is a significant variation in the charge concentration in (and therefore the potential profile or conduction band bending across) different nanowires. This leads to the variation in the threshold voltage for the onset of negative differential resistance (or the voltage at which the peak current occurs). Concomitantly, there is also a significant variation in the peak current itself. In the two samples shown in Fig. 3, the peak current varies by a factor of nearly 5. Some of this variation accrues from the variation in the charge density (and therefore conductivity) of the nanowires, some of it accrues from the fact that the voltage at which the current peaks varies, while the rest accrues from the fact that the number of nanowires contacted in the two samples could be different even though the contact pad area is the same.
1 There should be some dependence of the peak to valley ratio on the number of nanowires contacted if the I–V characteristic varies from wire to wire. In that case, ensemble averaging over a large number of wires will suppress the peak to valley ratio.
An advantage of neuromorphic networks is that they rely on collective computational models where the collective activity of many devices working in unison results in network functionality. Therefore, variations among individual device characteristics are not critical. In other words, these networks are immensely forgiving of stochastic variations and that makes them eminently suitable for nanoelectronics. We are currently simulating scenarios where the device characteristics vary within the limits established by measurements over multiple samples. If the circuit functionality is not significantly impaired by such variations, then we will have established neuromorphic networks as a serious contender for nanoelectronic architectures. The results of these simulations will be published in a forthcoming publication.
What Causes the Negative Differential Resistance?
An important issue is what causes the negative differential resistance in the nanowires. The cause is not established conclusively, but there is a high likelihood that it is caused by tunneling through impurity states. In Fig. 4, we show the conduction band diagram along the length of the nanowires. Note that the barrier to current flow is the barrier layer depicted in Fig. 2. Current flows by tunneling through and/or thermionic emission over this barrier. It is known that there is an impurity or trap level approximately 300 meV below the conduction band edge of alumina, as verified by infrared absorption measurements in ref. [8]. This trap level is depicted in Fig. 4. As the voltage across the nanowires is increased, the conduction band bends further and the Fermi level in the injecting contact ultimately lines up with this impurity level, at which point resonant tunneling occurs and the current increases dramatically. As the voltage is increased even more, the Fermi level rises above the impurity level and tunneling goes off resonance. At this point, the current drops, giving rise to the negative differential resistance. The voltage at which the peak current occurs depends on the conduction band bending in the CdS under zero bias, as well as on the band bending across the barrier layer. These quantities vary from sample to sample because of the variations in the surface charge at the CdS/alumina interface,² unintentional dopants and fixed charges in the barrier layer, leading to a spread in the value of the threshold voltage for the onset of the negative differential resistance.
² The surface charge comes about mostly from interface states which can vary widely.

Fig. 4 Conduction band diagram along the length of the nanowires: (a) at zero bias, (b) at a bias below that corresponding to the peak current, (c) at a bias corresponding to the peak current, (d) at a bias past the peak current, and (e) schematic representation of the current–voltage characteristic showing the different regimes

Other Electrical Parameters
Other electrical parameters of interest that influence the network properties are the inter-wire capacitance, the capacitance of an individual nanowire and the inter-wire resistance [1]. These three quantities were discussed in ref. [3]. Past measurements have shown that the capacitance of an individual nanowire is about 0.5 aF [7]. Extensive electrical characterization carried out in ref. [8] indicates that conduction through the alumina is via Poole-Frenkel emission from traps. The current–voltage relation associated with this conduction mode is [9]

$$I = G V \exp\left[ \frac{q\,(q/\pi\epsilon d)^{1/2}\,V^{1/2}}{kT} - \frac{\phi}{kT} \right] \qquad (1)$$

where $G$ is the conductance, $\epsilon$ is the dielectric constant ($= 8 \times 8.854 \times 10^{-12}$ F/m), $d$ is the trap radius and $\phi$ is the potential barrier surrounding a trap. Ref. [8] found that $d = 2$ µm and $\phi = 0.17$ eV. Based on the data presented in ref. [8], we estimate from the above relation that $G = 2.27 \times 10^{-6}$ Siemens. At low voltages (10 V), the ratio $I/V \approx G \exp[-\phi/kT] = 3.28 \times 10^{-9}$ Siemens. This is the effective conductance of the samples used in ref. [8], which had cross-sectional areas of 10 × 10 µm and length 1 µm. Therefore, the effective conductivity of the alumina is $3.28 \times 10^{-9}$ S/m. Based on this value and the geometry and dimensions of the nanowires, we estimate that the inter-wire resistance is 2 GΩ. The inter-wire capacitance is estimated to be 5 aF. Table 1 gives the measured electrical parameters.

Table 1 Measured electrical parameters
Parameter                Value
Peak to valley ratio     2.5
Inter wire resistance    2 GΩ
Inter wire capacitance   5 aF
Wire capacitance         0.5 aF
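The low-voltage conductance quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a room-temperature thermal energy of kT = 0.026 eV (an assumption; the text does not state the value used); the function names are hypothetical. With that value, G·exp(−φ/kT) comes out close to the quoted 3.28 × 10⁻⁹ S, and the 2 GΩ inter-wire resistance corresponds to a coupling conductance of 0.5 nS.

```python
import math

G_PREFACTOR_S = 2.27e-6   # conductance prefactor G quoted in the text, in siemens
PHI_EV = 0.17             # trap barrier height quoted in the text, in eV
KT_EV = 0.026             # assumed room-temperature thermal energy, in eV


def low_voltage_conductance(g=G_PREFACTOR_S, phi_ev=PHI_EV, kt_ev=KT_EV):
    """Return I/V in the low-voltage limit, G * exp(-phi / kT), in siemens."""
    return g * math.exp(-phi_ev / kt_ev)


def conductance_from_resistance(r_ohm):
    """Convert a resistance in ohms to a conductance in siemens."""
    return 1.0 / r_ohm


g_eff = low_voltage_conductance()
g_interwire = conductance_from_resistance(2e9)  # 2 GOhm inter-wire resistance
print(f"effective conductance: {g_eff:.3e} S")
print(f"inter-wire conductance: {g_interwire:.3e} S")
```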
Simulation Results

A circuit-system level model was previously employed to study the system dynamics of the nanowire array. The state dynamics of the array is described by a set of coupled non-linear differential equations, each representing Kirchhoff's current law [3]

$$\Big(C_{si} + \sum_{j \neq i} C_{ij}\Big)\,\dot{v}_i = I_{Bi}(t) - J_{si}(v_i) + \sum_{j \neq i}\Big[G_{ij}\,(v_j - v_i) + C_{ij}\,\dot{v}_j\Big] \qquad (2)$$
Here, $v_i$ is the voltage dropped across the $i$-th nanowire, $J_{si}$ is the related negative differential resistance (NDR) current response of that nanowire, $I_B$ is an externally applied DC bias current, $C_{si}$ denotes the capacitance of the nanowire, $C_{ij}$ is the coupling capacitance between neighboring nanowires and $G_{ij}$ is the coupling conductance between nanowires $i$ and $j$. For numerical simulation, we group nanowires into clusters, each containing 7 × 7 nanowires in a square geometry. These clusters are considered as the basic processor units, which have resistive–capacitive coupling with each other and are arranged as a 2D processor array. The electrical parameters $C_{si}^{cluster}$, $C_{ij}^{cluster}$ and $G_{ij}^{cluster}$ for a cluster are derived from the experimentally measured values for individual nanowires. All nanowires exhibiting a proper NDR response within these clusters follow a cooperative state dynamics based on their local neighbor nanowire states. Therefore,
if a voltage state change is imposed on a particular cluster, the cooperative dynamics will tend to make nanowires in the cluster achieve similar states. Assuming that a cluster comprises 7 × 7 nanowires, $C_{ij}^{cluster} = 7 \times 5$ aF $= 35$ aF and similarly $C_{si}^{cluster} = 49 \times 0.5$ aF $= 24.5$ aF. These scaled capacitances are employed in all simulated results presented here.

In order to obtain useful computational dynamics, the two time constants $C_{ij}^{cluster}/G_{ij}^{cluster}$ and $C_{s}^{cluster}/G_{NDR}^{cluster}$ should have similar orders of magnitude, where $G_{NDR}^{cluster}$ is the positive slope of the N-type I–V characteristic for a cluster. The easiest way to enforce this near equality is to modify the cluster coupling conductance $G_{ij}^{cluster}$. This can be realized by intentionally decreasing $G_{ij}^{cluster}$, perhaps by thermal annealing, which will anneal the traps and increase the potential barrier surrounding a trap, or by optical irradiation, which fills/empties traps in the alumina, thereby permanently altering its resistivity. For simulation purposes, we will assume that $G_{ij}^{cluster} = 0.5$ nS.

The nonlinear response assumed in simulations corresponds to Fig. 3a. This response is approximated as a piecewise linear (PWL) curve within the simulation tool (Fig. 5). Furthermore, the bistable operation requires a DC current bias for all the clusters in the 2D array [3]. Given the NDR peak and valley current levels, a typical choice for bistable operation is $I_B \approx (I_{peak} + I_{valley})/2$. In the numerical examples, we assume such bias conditions can be achieved via suitable interfacing methods to pump current into the clusters.

Equation (2) shows that the state dynamics of the array realize a neuromorphic computational model. For image processing applications, pixel intensity levels are mapped to voltages across clusters, with white corresponding to the highest voltage and black to the lowest. Each nanowire cluster represents an image pixel [3].
Computation proceeds via the dynamical response of the 2D coupled cluster array to the initial (voltage) state distribution, interpreted as the image input. The final steady state voltages across the clusters correspond to the processed output image.
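The dynamics of Eq. (2) can be illustrated with a small numerical sketch. This is not the simulation tool of ref. [3]: it assumes a square array with identical 4-neighbour coupling, forward-Euler time stepping, and PWL breakpoints that are invented for illustration (they are not read from Fig. 5). The cluster capacitances and the bias current are the values quoted in the text.

```python
import numpy as np

# Units: volts, nanoamps, nanosiemens, attofarads, nanoseconds.
# These are mutually consistent: aF * V / ns = nA and nS * V = nA.

# Hypothetical PWL breakpoints (V, nA) for an N-type NDR curve.
# Illustrative only; NOT taken from Fig. 5.
V_PTS = np.array([20.0, 24.0, 28.0, 32.0])
J_PTS = np.array([0.0, 2.6, 1.0, 2.6])

C_S = 24.5    # cluster self-capacitance (aF), from the text
C_C = 35.0    # cluster coupling capacitance (aF), from the text
I_B = 1.6     # bias current per cluster (nA), from the text


def j_ndr(v):
    """PWL NDR current (nA); np.interp holds the end values outside 20-32 V."""
    return np.interp(v, V_PTS, J_PTS)


def grid_adjacency(n):
    """4-neighbour adjacency matrix of an n x n cluster array."""
    a = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n:
                a[i, i + n] = a[i + n, i] = 1.0
            if c + 1 < n:
                a[i, i + 1] = a[i + 1, i] = 1.0
    return a


def simulate(v0, g, t_end=1000.0, dt=0.5):
    """Forward-Euler integration of Eq. (2) on a square cluster array.

    v0 is an (n, n) array of initial cluster voltages and g is the coupling
    conductance in nS. Returns the (n, n) voltages at time t_end (ns).
    """
    n = v0.shape[0]
    adj = grid_adjacency(n)
    deg = adj.sum(axis=1)
    # Capacitive terms of Eq. (2) in matrix form: M @ dv/dt = rhs.
    m = np.diag(C_S + C_C * deg) - C_C * adj
    m_inv = np.linalg.inv(m)
    v = v0.astype(float).ravel()
    for _ in range(int(round(t_end / dt))):
        rhs = I_B - j_ndr(v) + g * (adj @ v - deg * v)
        v = v + dt * (m_inv @ rhs)
    return v.reshape(n, n)


# A uniform array carries no coupling currents, so every cluster relaxes to
# a stable intersection of the bias line with the PWL curve.
low = simulate(np.full((4, 4), 23.0), g=0.2)
high = simulate(np.full((4, 4), 28.0), g=0.2)
print(low[0, 0], high[0, 0])
```

With these invented breakpoints the two uniform runs settle near 22.46 V and 29.5 V, the two stable states of the bistable cluster. Replacing the uniform arrays with an image-derived voltage map turns the same loop into the image-processing experiment described above, although the outcome then depends on the actual NDR curve.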
Fig. 5 Piecewise linear (PWL) approximation of the experimentally observed NDR nanowire cluster response for simulation of large 2D cluster arrays. [Plot: current (nA, 0 to 3) versus voltage (V, 20 to 32), with J_peak and J_valley marked.]
One needs to first isolate the voltage range over which NDR behavior is observed. Referring to the NDR curve in the experimental data (Fig. 3a), $V_L \approx 26$ V and $V_H \approx 32$ V are chosen as this voltage range. The range also naturally follows from the intersections of the constant bias current $I_B$ with the NDR response. These intersections also approximately reveal the stable state values in the bistable system. The exact stable states are also governed by the dynamical contributions of inter-wire coupling. The chosen voltage range of 26 V to 32 V is mapped to 8-bit grey-scale pixel intensities, in order to visualize the voltage states in the 2D array as images. The dynamics of the arrays correspond to nonlinear low pass filtering (image smoothing) and an edge detection-enhancement response. The input images in the simulated examples all correspond to initial voltage states of the nanowire clusters. These states have to be programmed across the cluster array by modifying the biasing conditions of individual clusters to set voltage states based on the input image data. Figure 6 shows the visualization of the voltage states as pixel intensities, with the steady-state response from the system dynamics manifesting as a smoothed image. The cluster coupling used in this case is $G = 0.2$ nS (5 GΩ). This conductance value is about 18 times smaller than the experimentally extracted value, but as mentioned earlier, we can reduce the conductance by annealing. The figure also depicts a failure of the steady-state response for an increased cluster coupling conductance ($G = 0.5$ nS), which rapidly pulls the cluster states to low voltage levels before the nonlinear cluster responses can take effect. In both simulations, the bias current employed per cluster is $I_B = 1.6$ nA.
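The mapping between the 26 V to 32 V window and 8-bit grey levels can be sketched as follows. The text only states that the range is mapped to pixel intensities; the linear form and the function names below are assumptions made for illustration.

```python
import numpy as np

V_LOW, V_HIGH = 26.0, 32.0   # voltage window quoted in the text (V)


def image_to_voltages(img_u8):
    """Map 8-bit grey levels (0 = black, 255 = white) to cluster voltages."""
    img = np.asarray(img_u8, dtype=float)
    return V_LOW + (V_HIGH - V_LOW) * img / 255.0


def voltages_to_image(v):
    """Map cluster voltages back to 8-bit grey levels, clipping to the window."""
    scaled = (np.asarray(v, dtype=float) - V_LOW) / (V_HIGH - V_LOW)
    return np.rint(np.clip(scaled, 0.0, 1.0) * 255.0).astype(np.uint8)


levels = np.arange(256, dtype=np.uint8)
volts = image_to_voltages(levels)
restored = voltages_to_image(volts)
```

Clipping in the reverse direction matters for the failure case described above, where cluster voltages drift below the window and would otherwise wrap around when cast to 8 bits.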
Fig. 6 Smoothing response for an 80 × 80 cluster array and image input, from the simulated array dynamics as a steady-state output (G = 0.2 nS) and as failure of a steady-state case with increased coupling conductance (G = 0.5 nS). [Panels: input image (t = 0 ns); G = 0.2 nS, output image at t = 400 ns (steady-state response); G = 0.5 nS, images at t = 400 ns and t = 1000 ns (high states diminish in time).]
This example highlights an important lesson. Useful signal processing attributes are not found for arbitrary values of the electrical parameters. In the example just considered, useful image processing attributes emerge only when the coupling conductances are considerably lower (18 times lower) than what we measure. Thus, one may require additional processing steps (such as post-annealing) to make the self assembled arrays useful for specific functions. Irradiation by an optical source that fills or empties traps in the alumina can also permanently alter the coupling conductance.

The second example that we consider is one where a response similar to edge detection-enhancement is obtained. In this case, $G = 0.5$ nS is employed with a bias current $I_B = 1.45$ nA (Fig. 7). Here one concludes that the particular image features and the capacitive cluster coupling ($C_{ij}$) were sufficient to prevent the state failure demonstrated in the previous case, with the same value $G = 0.5$ nS. The sensitivity of the obtained steady-state responses to the electrical parameters indicates that the cluster coupling strength, particularly its resistive component, will be a critical design issue for such applications. As improvements are made in fabrication and process control, it may be possible to implement higher order functions, such as the spatial programming of cluster couplings for image comparison and adaptations to applications like motion detection [10].
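The factor of about 18 can be traced with a short calculation from the per-nanowire values in Table 1. The capacitance scaling is the one stated in the text; the conductance scaling (seven wire pairs in parallel along a shared cluster edge) is an assumption made here, chosen because it reproduces the quoted ratio.

```python
# Per-nanowire values from Table 1.
R_WIRE_OHM = 2e9            # inter-wire resistance (ohm)
C_WIRE_COUPLING_AF = 5.0    # inter-wire capacitance (aF)
C_WIRE_SELF_AF = 0.5        # wire capacitance (aF)
SIDE = 7                    # nanowires per cluster edge (7 x 7 cluster)

# Capacitance scaling stated in the text.
c_coupling_cluster_af = SIDE * C_WIRE_COUPLING_AF   # 35 aF
c_self_cluster_af = SIDE * SIDE * C_WIRE_SELF_AF    # 24.5 aF

# Assumption: coupling conductance scales with the edge length, like the
# coupling capacitance (SIDE wire pairs in parallel across a shared edge).
g_wire_ns = 1e9 / R_WIRE_OHM                        # 0.5 nS per wire pair
g_cluster_measured_ns = SIDE * g_wire_ns            # 3.5 nS per cluster pair

G_SIMULATED_NS = 0.2
ratio = g_cluster_measured_ns / G_SIMULATED_NS      # about 17.5

# Coupling time constant C/G; aF / nS = ns.
tau_coupling_ns = c_coupling_cluster_af / G_SIMULATED_NS   # about 175 ns

print(ratio, tau_coupling_ns)
```

Under this assumption the extracted cluster conductance is 3.5 nS, roughly 17.5 times the 0.2 nS used for the smoothing example, which is consistent with the "about 18 times" quoted in the text.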
Fig. 7 Edge detection-enhancement response for a 256 × 256 cluster array and image input, from the simulated array dynamics. Steady-state response is achieved in 500 ns. [Panels: input image (t = 0 ns), t = 100 ns, t = 300 ns, t = 500 ns (steady state).]
Conclusion

In this work, we have shown how an electrochemically self assembled system of semiconductor nanowires, nested in a ceramic matrix (alumina), can elicit useful image processing functions. This is a remarkable result in that a simple network, which can be chemically “grown” using essentially beaker chemistry, is capable of such sophisticated functions as smoothing and edge detection-enhancement.

Acknowledgement The work is supported by the National Science Foundation under grant CCF 0506710.
References
1. V. P. Roychowdhury, D. B. Janes, S. Bandyopadhyay and X. Wang, “Collective computational activity in self assembled arrays of quantum dots: A novel neuromorphic architecture for nanoelectronics”, IEEE Trans. Elec. Dev., Vol. 43, 1688 (1996).
2. V. P. Roychowdhury, D. B. Janes and S. Bandyopadhyay, “Nanoelectronic architecture for Boolean logic”, Proc. of the IEEE, Vol. 85, 574 (1997).
3. K. Karahaliloglu, S. Balkir, S. Bandyopadhyay and S. Pramanik, “A quantum dot image processor”, IEEE Trans. Elec. Dev., Vol. 50, 1610 (2003).
4. S. Bandyopadhyay, V. P. Roychowdhury and D. B. Janes, “Chemically self assembled nanoelectronic computing networks”, Int. J. High Speed Electron. Syst., Vol. 9, 1 (1998). Also in Quantum Based Devices and Systems, Eds. M. Dutta and M. A. Stroscio (World Scientific, Singapore, 1998), Chapter 1.
5. S. Bandyopadhyay, K. Karahaliloglu, S. Balkir and S. Pramanik, “Computational paradigm for nanoelectronics: Self assembled quantum dot cellular neural networks”, IEE Proc. - Circuits, Devices, Syst., Vol. 152, 85 (2005).
6. S. Bandyopadhyay, A. E. Miller, H-C Chang, G. Banerjee, V. Yuzhakov, D-F Yue, R. E. Ricker, S. Jones, J. A. Eastman, E. Baugher and M. Chandrasekhar, “Electrochemically assembled quasi periodic quantum dot arrays”, Nanotechnology, Vol. 7, 360 (1996).
7. N. Kouklin and S. Bandyopadhyay, “Capacitance voltage spectroscopy of self assembled ordered arrays of quantum dots”, 2000 IEEE International Symposium on Compound Semiconductors, IEEE Press, pg. 303.
8. V. Pokalyakin, S. Tereshin, A. Varfolomeev, D. Zaretsky, A. Baranov, A. Banerjee, Y. Wang, S. Ramanathan and S. Bandyopadhyay, “Proposed model for bistability in nanowire nonvolatile memory”, J. Appl. Phys., Vol. 100, 124306 (2005).
9. S. M. Sze, Physics of Semiconductor Devices, 2nd edition (John Wiley & Sons, New York, 1981).
10. K. Karahaliloglu and S. Balkir, “Quantum Dot Networks with Weighted Coupling”, in Proc. of IEEE Int. Symposium on Circuits and Systems, Vol. 4, pp. 896-899, Bangkok, May 2003.
Computational Opportunities and CAD for Nanotechnologies
Michael Nicolaidis and Eleftherios Kolonis

Abstract Silicon-based CMOS technologies are predicted to reach their ultimate limits by the end of the next decade. Research on nanotechnologies is actively conducted in a world-wide effort to develop new technologies able to maintain Moore's law. They promise to revolutionize computing systems by integrating tremendous numbers of devices at low cost. These trends will provide new computing opportunities, have a profound impact on the architectures of computing systems, and require a new paradigm of CAD. This chapter discusses computing architectures and related CAD tools aimed at fitting the requirements and constraints of nanotechnologies, in order to achieve efficient use of the unprecedented computing capabilities promised by them. To benefit from such computing capabilities we consider applications requiring massive computing power, such as the modeling and exploration of complex systems composed of huge numbers of relatively simple elements. We also discuss non-conventional computing based on q-bits and other quantum objects, like Bose–Einstein condensates, and the related CAD tools.

Keywords Nanotechnology · Computer-aided design · Complex systems · Nonconventional computing · Quantum computing · Computing with Bose-Einstein condensates
M. Nicolaidis (✉)
TIMA Laboratory (CNRS, Grenoble INP, UJF), Avenue Felix Viallet F38031, Grenoble, Cedex, France
E. Kolonis
University of Piraeus, 80-82 Zeas st., 18534 Peireas, Greece
C. Huang (ed.), Robust Computing with Nano-scale Devices: Progresses and Challenges, Lecture Notes in Electrical Engineering 58, DOI 10.1007/978-90-481-8540-5_8, © Springer Science+Business Media B.V. 2010

Introduction

Shrinking of silicon-based technologies will reach its ultimate limits in about one decade. The reasons are not only technical (leakage current, signal integrity, energy consumption, circuit heating, ...), but also economic (excessive cost of fabrication lines) [22]. Single-electron transistors, quantum cellular automata, nano-tubes, molecular logic devices and nano-crystals are some of the candidate alternative technologies [5, 11, 21, 23, 25, 36, 40, 50]. Most of the related fabrication processes emerging in various research centers are bottom-up, taking advantage of the self-assembling properties of atoms to form molecules and crystals. They should allow the fabrication of very complex circuits at low cost and low energy consumption [4, 12, 28]. Some of the characteristics of such circuits will be extraordinary complexity, high regularity, and low fabrication yield. One consequence is that these processes cannot produce highly sophisticated circuits with a high morphological differentiation, similar to MOS circuits. The fundamental reason for this kind of structure is that a self-assembling process consists of replicating a few “simple” basic modules a large number of times to form a regular structure (hereafter referred to as a nano-network). Of course, the basic modules may gain a certain complexity as these techniques gain in sophistication.

By integrating tremendous numbers of devices at low fabrication cost, self-assembling processes should allow revolutionizing computing systems. Furthermore, due to their intrinsic characteristics and constraints, these trends will have a profound impact on future architectures, possibly requiring a shift to a non-von Neumann, highly parallel, reconfigurable computing paradigm for their efficient exploitation [2, 12, 17, 19]. A pioneering work [20] carried out at Hewlett-Packard Laboratories illustrated an approach able to tackle some of the constraints related to these technologies. They developed the TERAMAC computer, a multiprocessor built by interconnecting a large number of FPGAs. They used defective FPGAs and error-prone technologies to interconnect them, as an illustration of hardware platforms composed of a large number of identical but error-prone modules.
The multiprocessor was created by programming conventional processor architectures into the regular structures of the FPGAs. Prior to this mapping, test and diagnosis procedures were applied to identify the defective elements of the system. Then, the mapping was done in such a manner that only fault-free resources were employed. This work illustrates the feasibility of implementing complex systems by programming a regular network composed of a large number of simple and identical modules affected by high defect densities. It provides an additional motivation for using regular, FPGA-like structures in nanotechnologies, since achieving fault tolerance for the expected defect densities would be much more difficult and costly for irregular circuit structures. This fault tolerance approach is very general, and could be used with various reconfigurable architectures. But, as discussed in Chapter 2, some other aspects must also be addressed to exploit efficiently the promise of nanotechnologies to produce low-cost, extraordinarily complex systems.

From section “A holistic CAD platform for nanotechnologies” to section “Architectures for complex systems implementation”, we describe architectural solutions and a CAD platform that would allow tomorrow's engineers to exploit efficiently the extraordinary computing power promised by nanotechnologies, in order to model and explore the dynamics of complex natural or artificial systems composed of huge numbers of simple elements. This platform is called holistic because it includes tools enabling a high-level description and simulation of the target application (e.g., an ecosystem, a set of interacting particles, ...), down to tools enabling efficient circuit-level
implementation. Most material in sections “Introduction”, “A holistic CAD platform for nanotechnologies”, “High-level modeling and simulation tool (HLMS)”, “Artificial ecosystems”, “Nano-network architecture fit tool (NAF)”, “Circuit synthesis tool (CS)” and “Architectures for complex systems implementation” comes from [26]. Material in section “Systems of particles and emergence of relativistic space-time” comes from [26, 29, 32, 33].

In section “Nonconventional architectures”, we discuss non-conventional computations. These computations involve the behaviour of quantum systems and are difficult to process in systems based on macroscopic devices. First we describe a computational model of quantum systems which does not involve quantum superposition. Such a model is implementable in conventional computers as it involves conventional computations. However, its implementation in such computers is not practical due to the related computational complexity. Then, we describe architectures of complex systems composed of elements exhibiting quantum behaviour. We comment on the huge computing power inherent to the way quantum systems evolve their state and we discuss the computing opportunities offered by quantum computers. This kind of computation is not limited to q-bit based computing. Computers based on other kinds of quantum systems could also be used to resolve complex problems not addressed by q-bit based computing. We describe an emerging paradigm of this kind of computation based on Bose–Einstein condensates. We also argue that the concept of quantum superposition and the related parallel computing, used to explain the huge computing power inherent to q-bits, leads to some paradoxes, in particular when it is applied to observables with continuous spectrum. Such observables would be in a superposition of an infinite number of states, leading to an infinite computing power.
Then, we argue that it is more adequate to explain the huge computing power of q-bits and of other types of quantum computing by the complex way quantum systems evolve the statistical distributions of their observables. We finish with a short description of our vision of the computing systems of the future. The underlying ideas in section “Nonconventional architectures” come from [29–31].
A Holistic CAD Platform for Nanotechnologies

This section targets CAD tools allowing efficient use of the hardware resources promised by nanotechnologies. It is motivated by the following converging factors:
(a) The availability of hardware platforms comprising huge numbers of identical modules interconnected into a regular network (the promise of nanotechnologies).
(b) The flexibility of these platforms, which allow implementing various computing architectures by programming them.
(c) The profound interest for science and technology in exploring the dynamics of complex natural and artificial systems composed of huge numbers of identical elements.
Point c reflects the fact that natural systems are often composed of huge numbers of simple elements. The ability to accurately simulate natural systems composed
of huge numbers of particles (e.g., atmosphere and oceans, chemical reactions, nuclear reactions, solar system formation, galactic interactions, ...) and biological systems (molecular interactions in the cell, cellular interactions in an organism, cellular differentiation in the embryo, neural systems, interactions between organisms in an ecosystem, ...) is a fundamental requirement for the computing systems of the future. In addition to natural systems, experimenting with invented systems composed of large numbers of simple modules is convenient for studying the impact of the laws governing the basic modules on the global dynamics of this kind of system. The availability of powerful hardware platforms, as promised in point a, will drastically increase the accuracy of our explorations of the natural and artificial systems discussed above, and drastically improve our understanding of such systems. By using a traditional approach, such as the one adopted in the TERAMAC project, we can implement such applications in two steps:
- First, program the nano-network to map a general-purpose multiprocessor architecture.
- Then, program this architecture to simulate the target regular system.
But this approach results in severe resource waste, since:
- Parallel computers based on conventional architectures make poor usage of their theoretical computational power.
- General-purpose architectures waste hardware resources with respect to architectures dedicated to target applications.
A drastic gain in efficiency could be obtained if we program the nano-networks to map the target application directly. In other words, we will map the basic module of the target system on a set of modules of the nano-network. This mapping will be repeated to map on the nano-network as many modules of the target system as allowed by the complexity of the nano-network. This direct implementation of the complex system could be done efficiently if the nano-network architecture supports certain reconfigurable structures and the mapping is done in a manner that establishes the required communication between the modules of the target system. Furthermore, to make the approach practical, a dedicated CAD platform has to be developed. Because this platform should enable the direct implementation of the final application in the nano-network, it should include high-level tools enabling easy software-level description and simulation of the target application, down to tools enabling its efficient implementation at circuit level, hence the term holistic. This is not the case for today's EDA platforms, as the current practice is to develop general-purpose computers on which the final applications are programmed or, in some particular cases, to implement specific circuits accelerating power-consuming computations for a class of applications. But, in the latter case again, the final application is implemented by software. The proposed CAD platform will comprise:
- A High-Level Modeling and Simulation Tool (HLMS)
  It enables easy generation of a software description (model) of the target complex system and its exploration through simulation on conventional computers.
- A Nano-Network Architecture Fit Tool (NAF)
  It transforms the description of the complex system generated by the HLMS tool into a description that fits the constraints of the nano-network architecture.
- A Circuit Synthesis Tool (CS)
  It maps the description generated by the NAF tool onto the configurable architecture of the nano-network.
These tools, together with the constraints imposed by the nano-network architectures, are discussed in the next sections.
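The idea of repeating the mapping of a basic target module onto groups of nano-network modules, while using only fault-free resources, can be sketched as follows. This is a hypothetical illustration, not the NAF or CS tool: the defect map, the square block shape, the fixed tiling and the function name `map_entities` are all assumptions.

```python
import numpy as np


def map_entities(defect_map, block, n_entities):
    """Assign target-system entities to fault-free blocks of a nano-network.

    defect_map: 2D boolean array, True where a nano-network module is defective.
    block: edge length of the square group of modules used per entity.
    n_entities: number of target-system entities to place.
    Returns a list of (row, col) block origins, one per placed entity.
    """
    rows, cols = defect_map.shape
    placed = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            if len(placed) == n_entities:
                return placed
            tile = defect_map[r:r + block, c:c + block]
            if not tile.any():          # use only fully fault-free tiles
                placed.append((r, c))
    return placed


# 4 x 4 nano-network with a single defective module at row 0, column 1.
defects = np.zeros((4, 4), dtype=bool)
defects[0, 1] = True
origins = map_entities(defects, block=2, n_entities=3)
print(origins)   # -> [(0, 2), (2, 0), (2, 2)]
```

A fixed tiling wastes an entire block for a single defect; a practical tool would presumably also have to place the communication links between neighbouring entities, which this sketch ignores.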
High-Level Modeling and Simulation Tool (HLMS)

This tool allows easy description of the target complex system by means of an interactive menu. Then, it generates a software description (model) of the complex system. Based on this description the tool can perform preliminary simulation on conventional general-purpose computers. Thus, before engaging in the complex task of implementing the complex system in the nano-network, the user can simulate on conventional computers a reduced version of the system to validate his/her various choices concerning the system parameters and the evolution laws of the system entities. More importantly, the HLMS tool is mandatory for generating the software description of the target application, which is the starting point for the other components of the synthesis platform. Consequently, our efforts are first concentrated on the development of this tool. The HLMS tool and its experimentation over two families of complex systems are presented below. A more detailed description will be provided in [27].

The complex systems considered are systems composed of a large number of interacting entities, in which a rich behavior emerges as a consequence of the large number of entities and of their interactions rather than as a consequence of the complexity of these entities. The entities composing the complex system can be identical or can belong to a small set of entity types. Each entity type will be described by:
- A set of state variables
- A set of communication signals between modules
- A set of functions that determine the evolution of the state variables of each type of module
The evolution of the state of the elementary entities and, thus, of the global system is produced in the following manner. At each simulation cycle, the evolution functions use the present state of the state variables and the incoming communication signals to produce the state of the entity's variables for the next simulation cycle (see Fig. 1). Thanks to a graphical interface, the user can specify the fundamental structure of her/his target complex system. This is done by specifying the set of internal variables of each entity type of the system, the inputs received from and the outputs sent to the entities with which the specified entity interacts, and the evolution rules that determine the next state of each internal variable and of each output of this entity
Fig. 1 Elementary entity. [Diagram: the state variables and the communication signals with the environment feed the evolution functions, which take the state of the entity at cycle t to its state at cycle t+1.]
as a function of its present state and of its inputs. To make the system intelligible, this interface also allows specifying some visualization parameters (e.g., associating different colors that reflect the type or the state of an entity, or specifying graphical representations of different parameters). The graphical interface allows modifying the visualization and the other parameters of the system even during simulation (on-line).

To implement the High-level Modeling and Simulation tool we adopted an object-oriented approach. The structure of the tool is shown in Fig. 2. It consists of a graphical interface, an application library, a generic component library and a simulation handler module. The tool also includes a central program not shown in the figure. As said earlier, the user specifies his/her target system by means of the graphical interface. The user can save this description in the application library for future use. Furthermore, the central program of the tool uses this description, as well as the description of some generic components from the library of generic components, to generate an internal representation of the application. This representation contains several classes:
- Classes of Elementary Entity Types
  For example, in the case of an ecosystem, these entities can correspond to the types of cells of an organism (stomach cells, thermal isolation cells, ...).
- Parent Class
  From which the classes of elementary entity types can inherit the variables and functions that are common to all. For instance, in the case of an ecosystem, this class can contain variables such as position and heat and the functions of movement and heat diffusion that are common to all the cells of an organism.
- A Managing Class
  This manages the evolution of the state of the elementary entities and can, during the simulation, create new entities if this is allowed by the laws governing the system.
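The class structure and the cycle-by-cycle evolution described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HLMS tool's internal representation: the class names `Entity`, `DiffusingCell` and `Manager`, the heat-relaxation rule and the neighbour table are all hypothetical.

```python
class Entity:
    """Parent class: state and behaviour shared by all entity types."""

    def __init__(self, position, heat):
        self.state = {"position": position, "heat": heat}

    def outputs(self):
        """Communication signals sent to neighbouring entities."""
        return {"heat": self.state["heat"]}

    def next_state(self, inputs):
        """Evolution function: present state + incoming signals -> next state."""
        raise NotImplementedError


class DiffusingCell(Entity):
    """Hypothetical entity type: relaxes its heat towards its neighbours'."""

    RATE = 0.25

    def next_state(self, inputs):
        new = dict(self.state)
        if inputs:
            mean = sum(sig["heat"] for sig in inputs) / len(inputs)
            new["heat"] += self.RATE * (mean - self.state["heat"])
        return new


class Manager:
    """Managing class: advances all entities by one simulation cycle."""

    def __init__(self, entities, neighbours):
        self.entities = entities
        self.neighbours = neighbours   # index -> list of neighbour indices

    def step(self):
        # Phase 1: read every output from the state at cycle t.
        signals = [e.outputs() for e in self.entities]
        # Phase 2: compute all next states before committing any of them,
        # so that every entity sees the same cycle-t snapshot.
        nxt = [e.next_state([signals[j] for j in self.neighbours[i]])
               for i, e in enumerate(self.entities)]
        for e, s in zip(self.entities, nxt):
            e.state = s


cells = [DiffusingCell(position=i, heat=h) for i, h in enumerate([0.0, 8.0, 0.0])]
mgr = Manager(cells, {0: [1], 1: [0, 2], 2: [1]})
mgr.step()
print([c.state["heat"] for c in cells])   # -> [2.0, 6.0, 2.0]
```

The two-phase update is the design choice that matters: committing states one entity at a time would let later entities read cycle t+1 values, which breaks the synchronous semantics of Fig. 1.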
Fig. 2 Structure of high-level modeling and simulation tool. [Diagram components: application library; library of generic components; graphical interface (on line / off line); creation of a new application; parent class; managing class; initialization; application file; simulation handler; execution of simulations.]
The user can employ the graphical interface to initialise the internal representation of the application. Then, the central program of the tool generates a file called the application file. Finally, the simulation handler uses this file to execute simulations. The adopted structure and implementation are generic and allow handling various kinds of systems composed of large numbers of interacting entities. To validate its versatility, we used the tool to experiment with two quite distinct families of complex systems. The first family consists of artificial ecosystems and the second consists of artificial “universes” composed of a set of interacting particles governed by laws that determine the geometry of space-time (relativistic space-time in the example considered here) [29, 32, 33]. In addition to the generic modules of the High-level Modeling and Simulation tool, it is useful to implement specific modules for each family of complex systems to ease the user's tasks of creating, experimenting with and analyzing systems belonging to this family.
Artificial Ecosystems

Our first experiments concern artificial ecosystems consisting of an artificial environment (the space of the ecosystem: the milieu, which may have properties like temperature and resistance to the motion of objects, food/energy sources, ...) in which virtual multi-cellular organisms evolve with properties inspired by the real world, such as cell replication and cell differentiation leading to an adult organism from a single cell (ontogeny), movement, food capturing, sexual reproduction, adaptation to the environment, and phylogeny through mutation, crossing-over, and natural selection. Similarly to natural organisms, the cells of an artificial organism all share the same genetic information (coded into “DNA” variables) and obey elementary functions similar to those described by molecular biology. From the computational point of view, each module (a cell of an organism) is viewed as an automaton. It has a certain number of inputs and outputs, a certain number of internal variables, and a function associated with each variable. This function computes the next state of the variable from the present state of all the internal variables and the present values of the cell inputs. A similar function is also associated with each cell output. The graphical interface of the High-level Modeling and Simulation tool is used to specify these variables and their associated functions and subsequently the other parameters of the ecosystem. Furthermore, in order to ease the user's tasks we implemented some functional modules. In particular:
- An Ontogeny Module
  This module assists the user in coding in the DNA variables the genetic information that guides the ontogeny process of an organism.
- A Module of Reproduction and Creation of Organisms
  This module manages the production of the new DNA of a child organism by the processes of crossing-over and genetic mutation.
- A Module of “Natural Selection”
  It allows the user to easily experiment with the “natural selection process” by selecting and eventually modifying on-line (i.e., during the simulation) parameters of this process. This is possible because the user graphical interface (Fig. 2) allows modifying the parameters of the system even during the execution of a simulation.
Thanks to these modules, the user can easily create and experiment with a large variety of artificial ecosystems, as illustrated below.
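The production of a child's DNA by crossing-over and mutation can be sketched as below. The chapter does not specify how the reproduction module encodes or recombines DNA, so everything here is an assumption for illustration: DNA as a list of integers, single-point crossing-over, per-gene random replacement, and the function names.

```python
import random


def crossing_over(dna_a, dna_b, rng):
    """Single-point crossing-over between two parent DNA vectors.

    Assumes both parents have the same length, at least 2 genes.
    """
    point = rng.randrange(1, len(dna_a))
    return dna_a[:point] + dna_b[point:]


def mutate(dna, rate, rng, low=0, high=255):
    """Replace each gene by a random value with probability `rate`."""
    return [rng.randint(low, high) if rng.random() < rate else g for g in dna]


def child_dna(dna_a, dna_b, rate, rng):
    """DNA of a child organism: crossing-over followed by mutation."""
    return mutate(crossing_over(dna_a, dna_b, rng), rate, rng)


rng = random.Random(0)          # seeded for reproducibility
mother = [1] * 8
father = [2] * 8
child = child_dna(mother, father, rate=0.0, rng=rng)
print(child)
```

With a mutation rate of zero the child is a prefix of the mother's genes followed by a suffix of the father's; a non-zero rate then perturbs individual genes, which is what gives natural selection new variants to act on.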
Spontaneous Creation of Artificial Organisms

In this experiment, we consider an environment with the potential for the appearance of DNA variables with arbitrary values associated with various functions contained in a library (library of DNA functions). In the beginning of the simulation, we observed the appearance of DNA combinations that led to the formation of organisms in an arbitrary way. In the majority of cases, the spontaneously created organisms lacked complexity and ingenuity and survived for a short time. As the simulation runs for
longer, some organisms randomly appear which survive longer because their DNA combinations are more efficient, and they begin to populate the ecosystem. In the long run, we observe an evolution towards forms which gain in complexity and are better adapted to the environmental conditions. Thus, we observed the emergence of organisms that could move and feed (from food sources randomly distributed in space and time), reproduce, capture another organism (prey) in order to feed, ...
Optimization of Existing Functions
These experiments concern the optimization of existing organs to adapt to the environmental conditions. As an example, we consider how the resistance factor (Fr) that the milieu opposes to the motion of objects influences the size of the motion organs. We observe that for higher values of Fr the size of the motion organs increases and the organism spends more energy on movement. This is necessary in order to capture enough food to cover its other energy needs (for basic metabolism and temperature maintenance). We also observe an increase of the population when Fr decreases. In this case, the organisms can spend more energy on reproduction (the energy saved from movement), and more organisms can survive on given energy (food) resources. Both results illustrate a good adaptability of the artificial organisms to the evolution of the environment. Numerous experiments on other genes also showed good adaptability.
Acquisition of New Functions
These experiments illustrate another ability of the tool to create pertinent artificial ecosystems: the evolution towards more efficient organisms by acquisition of new functions. The library of functions contains, among others, an olfaction function (sensitivity to the concentration of smell substances (cells) released by the food sources) and a motion-direction control function that reacts to the state of other organs. The exact manner in which the motion control function reacts to this state is determined by DNA values associated with a large number of functions. Initially, the organisms of the ecosystem do not possess olfactory and motion-control organs and move randomly. Thus, they discover food sources only by chance. After a certain time, we observed the emergence of organisms possessing olfaction organs, and of others possessing motion-control organs. Later, new mutations created organisms that possess both functions but whose motion control functions are not effective. Later still, organisms emerged that possess several olfactory organs placed at different parts of their body, as well as motion-control organs that direct the motion towards those olfactory organs that sense higher concentrations of smell substances. In environments with sparse food sources, these organisms quickly replace all other
organisms as they grasp almost all the food. This example illustrates an evolution-based learning that codes intelligent behavior in simple genetically programmed functions. A next step in our experiments will consist in exploring the emergence of ant-like collaborative societies based on genetically programmed functions. A further step will consist in introducing neural-type functions. The facility offered by the tool to create and experiment with an “infinity” of artificial ecosystems, by expanding the library of functions and by playing with the environmental and other parameters, allows exploring a vast universe of artificial ecosystems.
Systems of Particles and Emergence of Relativistic Space-Time
To illustrate the versatility of the High-level Modeling and Simulation tool, we have considered a second family of complex systems, referred to hereafter as artificial “universes”. They consist of large numbers of elementary particles whose interactions obey a set of laws specified by the user. The generic modules of the HLMS tool allow easy creation and simulation of such systems. However, our goal was to go one step further and study global properties of such systems, in particular the structure of their space-time. Such experiments are useful to support new interpretations of modern theories of physics, in particular certain computational interpretations. Indeed, for several millennia we have considered that our world is composed of a set of objects evolving in space with the progress of time. The evolution of objects and some of their properties should therefore conform to the structure of space and time. In particular, in special relativity, the structure of space-time is described by the Lorentz transformations and imposes the modification of the length of physical objects (length contraction), the reduction of the pace of evolution of physical processes (time dilation), and the loss of synchronization of distant events when observed from different inertial frames. As a consequence of this vision, two theories are needed to describe the laws governing the universe: one to describe the structure of space-time and another to describe the behavior of elementary particles (the elementary units that form the physical objects and the physical processes). Inspired by the theory of computing systems, we can view the universe as a system composed of elementary computing modules (elementary particles), which determine their next state as a function of their present state and the state of the modules with which they interact.
In this vision, the form of the space at any instant of the system's evolution is determined by the values of the position variables of the elementary modules. The structure of time is determined by the relationships between the paces of evolution of the different processes taking place in the system. Thus, this vision requires only a theory describing the laws governing the evolution of elementary particles, since these laws determine the evolution of the system, which in turn determines the structure of space and time. As shown in [29,32,33], the space-time emerging in such a system obeys the Lorentz transformations if and only if the laws of interaction of elementary particles obey a certain condition referred to as the Relativistic Constraint of Accelerations (RCA). The
central idea of this condition is that the intensity of the interactions is a function of the speed of the particles. As a consequence, the distances of mutual equilibrium of the particles forming a moving object are modified (length contraction), the pace of the evolution of a group of particles (a process) is modified (time dilation), and the synchronism between distant events is altered if these events take place in moving frames. The aim of the experiments presented in this section is to illustrate this vision. This is done by simulating artificial “universes” in which the laws of interaction between elementary particles obey the RCA condition and by showing that in this case the results of measurements performed in different inertial frames obey the Lorentz transformations. Thanks to the HLMS tool, we can easily create and simulate artificial “universes” governed by interaction laws obeying the RCA condition. However, we also need new modules that allow measuring object lengths and process durations in various inertial frames. We therefore implemented some new modules that facilitate such measurements. They include:
• A module for creating a length reference (length unit) composed of a set of particles forming a rigid object. This object is created in any inertial frame and at any orientation specified by the user.
• A module for creating a time reference (clock) composed of a set of particles that produce a periodic process of constant period. This clock is created in any inertial frame and at any position specified by the user.
• A module for synchronizing clocks placed in any inertial frames and at any locations specified by the user.
We used the tool to create and experiment with artificial universes whose laws of interaction obey the RCA condition. We experimented with laws which, for particles at rest (position variable constant), take various forms such as $r^{-2}$ (r being the distance between the interacting particles), or stranger forms such as P(r)/Q(r) or $e^{-Q(r)}$, where P(r) and Q(r) are arbitrary polynomials of r. For particles moving with arbitrary speeds, these laws were adapted to conform to the RCA condition. In all cases, as expected from the theoretical results in [29,32,33], all measurements were in conformity with the Lorentz transformations.
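For reference, the relations that such measurements are expected to satisfy are the standard Lorentz transformations of special relativity, written here for a frame moving at speed v along the x axis. The symbols v, c, γ, L₀ and Δt₀ are introduced for this summary and do not appear in the original text; in an artificial universe, c would presumably be the limit speed defined by its interaction laws, which is an assumption on our part.

```latex
\begin{align}
  \gamma &= \frac{1}{\sqrt{1 - v^2/c^2}}, \\
  x' &= \gamma\,(x - v t), \qquad t' = \gamma\left(t - \frac{v x}{c^2}\right), \\
  L &= \frac{L_0}{\gamma} \quad \text{(length contraction of a rod of rest length } L_0\text{)}, \\
  \Delta t &= \gamma\,\Delta t_0 \quad \text{(time dilation of a clock of proper period } \Delta t_0\text{)}, \\
  \Delta t' &= -\gamma\,\frac{v\,\Delta x}{c^2} \quad \text{(two events with } \Delta t = 0 \text{ separated by } \Delta x\text{)}.
\end{align}
```

The last line expresses the loss of synchronization: two events that are simultaneous in one frame are separated in time when observed from the moving frame.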
Nano-Network Architecture Fit Tool (NAF)
The HLMS tool generates a software description of the target application that is executable on conventional computers. The implementation and exploration of the target application in a nano-network is conditioned by the particular architecture of the nano-network. The aim of the NAF tool is to transform the description generated by the HLMS tool into a description that fits the constraints of this architecture (Fig. 3). Once the functions of the modules (inputs, outputs, internal variables and evolution rules) have been determined, the complex system can be programmed in
Fig. 3 The architectural exploration tool
a nano-network consisting of programmable (FPGA-like) hardware. However, efficient exploitation of the hardware resources depends on the architecture of this hardware and on the way a target application is mapped onto it. The NAF tool will generate a description that includes two parts: the description of the function of the basic modules of the complex system and the description of the communication between these modules. The former will be identical to the one generated by the HLMS tool. The generation of the latter has to take into account the communication resources of the nano-network and the strategies that could best solve the communication bottleneck, which is known to be the most challenging task in highly parallel systems. Thus, determining relevant nano-network architectures and communication strategies is mandatory for implementing an efficient NAF tool. These aspects are addressed in section Architectures for complex systems implementation. In many respects, the NAF tool is similar to the High-level Modeling and Simulation tool, as it generates descriptions similar to the ones generated by HLMS. The difference is that it will introduce several constraints into the descriptions of the system modules and of their communications. For instance, in case 3b of section Architectures for complex systems implementation, it will introduce the spatial constraints imposed on the position variable of each module. So, to develop the NAF tool, we can use the HLMS tool as a basis and modify it in order to introduce the constraints of the nano-network architectures and of the communication strategies described in section Architectures for complex systems implementation.
Circuit Synthesis Tool (CS)
This tool maps the target complex system on the configurable architecture of the nano-network (Fig. 4). The input of this tool is the description of the target complex system created by the NAF tool. From this description, the CS tool first creates an RTL description of the function executed by the basic module(s) composing the target complex system and of their interconnections. Then it maps this description on the configurable architecture of the nano-network. The transformation of the high-level description of the function of the basic modules and of their interconnections into an RTL description does not represent a new research challenge, as this task is similar to that addressed by high-level synthesis tools, like those described in [13, 14, 16].
Fig. 4 Circuit synthesis tool (diagram labels: model of the target system (NAF), Circuit Synthesis tool, RTL description, mapping, nano-network)
The second component of the synthesis tool is similar to the synthesis tools used today to map RTL descriptions on reconfigurable architectures of FPGA families [6, 10, 51]. Similar approaches would be used for developing a tool that maps RTL descriptions on reconfigurable architectures of nano-networks. This component does not represent a new challenge either.
Architectures for Complex Systems Implementation
For applications where the modules are not mobile and their function is stable in time, there are no particular architectural and communication challenges. For these applications we can map the function of the modules on the reconfigurable structures of the nano-network, similarly to the mapping of traditional hardware on FPGAs.
Architectures for Systems Comporting Mobile Modules
For applications using mobile modules, which may also have functions evolving over time, managing the interactions between modules is very challenging. To support the movement of modules we can use three options: (a) The first option uses a position variable to code the position of each module in the environment. The module moves in the environment by changing the value of this variable, but its position in the nano-network remains fixed. Thus, modules located in physically distant locations within the nano-network may be neighbors in
Fig. 5 The first option for supporting applications with moving modules (diagram: modules with position variables (x1, y1), (x2, y2) and (x3, y3), shown in the simulated environment and in the nanonetwork)
the environment and have to interact (see Fig. 5, modules [x2, y2] and [x3, y3]). To perform the interaction between such modules, the nano-network should either support interconnections of exponential complexity (in order to be able to connect any pair of locations at any time), or spend a number of computation cycles proportional to the size of the network to transfer the interaction information between a pair of distant modules using only local connections. These solutions will dramatically affect either the size of the hardware required to map a given application or the speed of the computation. Two solutions for avoiding these problems are described below. They implement the application in such a manner that only local interactions are used. (b) The second option does not use a position variable for each module. Instead, the position of each module within the environment is identical to its physical position in the nano-network. In this case, when a module moves within the environment, its implementation must also move within the nano-network. For applications where all modules are identical, all the areas of the nano-network implement the same function. Thus, module movement requires passing the values of the variables of a module to an unoccupied neighbor location of the nano-network, or exchanging the values of the variables of two modules occupying neighbor locations in the nano-network. This can be done easily with conventional FPGA-like architectures (non-reprogrammable or reprogrammable). However, if the application uses modules of several types, then moving a module requires passing to a neighbor location not only the values of its variables but also its function, resulting in an intense reconfiguration process. 
In conventional reprogrammable FPGA architectures this task is possible but very time consuming (since the reconfiguration information has to be transferred to an external system that controls the programming of the nano-network, and then written from this system to the configuration memory to reprogram the function of the locations concerned). Doing so for a large number of moving modules will dramatically affect the performance of the system. Thus, conventional FPGA-like architectures are not suitable in this case. Instead, the architecture of the nano-network should provide connections for exchanging, between neighbor cells, the values that program their function. This
solution also enables efficient hardware use in applications where the function of modules can vary over time as a result of their evolution. (c) In applications where the modules occupy a small part of the space in which they evolve, using the physical position of a module in the nano-network as its position in the simulated space, as suggested in point b, will waste a large amount of hardware resources. This happens because all unoccupied positions of the simulated space correspond to unused locations of the nano-network. To cope with this, we propose a solution that merges cases a and b above. With this solution, the position of each module is determined by an internal variable, as in case a. In order to move, the module increases or decreases its coordinates (e.g., the x and/or y component of the position variable in a two-dimensional space). However, to solve the communication problem between modules, we impose some constraints that maintain a spatial coherence between the positions of the modules in the simulated space and their positions in the nano-network. A module occupying position (a, b) in the nano-network will be referred to as module (a, b). Its neighbors in the nano-network occupy positions (a−1, b−1), (a−1, b), (a−1, b+1), (a, b−1), (a, b+1), (a+1, b−1), (a+1, b), (a+1, b+1). We impose the following spatial constraints between the position (a, b) in the nano-network and the position variable (x, y) in the environment:
• The x component of the position variable of module (a, b) never takes a value lower than the x component of the position variables of modules (a−1, b−1), (a−1, b), (a−1, b+1). Also, this component never takes a value higher than the x component of the position variables of modules (a+1, b−1), (a+1, b), (a+1, b+1).
• The y component of the position variable of module (a, b) never takes a value lower than the y component of the position variables of modules (a−1, b−1), (a, b−1), (a+1, b−1). Also, this component never takes a value higher than the y component of the position variables of modules (a−1, b+1), (a, b+1), (a+1, b+1).
With these constraints, the modules surrounding module (a, b) in the simulated space are the same as in the nano-network (i.e., (a−1, b−1), (a−1, b), (a−1, b+1), (a, b−1), (a, b+1), (a+1, b−1), (a+1, b), (a+1, b+1)). Thus, (a, b) will only interact with its eight neighbors in the nano-network, requiring only local communications. While this solution maintains local communications as in point b, it does not require as many modules as there are locations in the simulated space. In fact, since a module can move by modifying its position variable, we do not associate a location of the nano-network with each unoccupied position of the simulated space, which avoids wasting hardware resources. This solution is therefore suitable for applications where the modules occupy a small part of the simulated space (e.g., in an artificial ecosystem, living organisms occupy a small part of the environment). In return, however, we need to maintain the spatial constraints stated earlier. To do so, each module receives from its eight neighbors in the nano-network the value of their position variable. When the new values of the position variables of two modules do not conform to the spatial constraints, the mapping of the modules in the nano-network is modified to satisfy these constraints (an example is given in Fig. 6).
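The spatial constraints of case c can be checked mechanically. The following Python sketch is one possible reading of them, under assumptions that are ours: the nano-network is modeled as a fully populated rectangular block `grid[a][b]` holding the position variable (x, y) of module (a, b), and the function name `constraint_violations` is hypothetical. Since "never lower than the modules at a−1" and "never higher than the modules at a+1" are two views of the same pairwise relation, the sketch tests each pair once, from the side of the module with the larger index.

```python
def constraint_violations(grid):
    """List the pairs of neighboring modules that break the spatial constraints.

    grid[a][b] is the (x, y) position variable of the module mapped at
    nano-network location (a, b). Hypothetical helper for illustration.
    """
    n_a, n_b = len(grid), len(grid[0])
    violations = []
    for a in range(n_a):
        for b in range(n_b):
            x, y = grid[a][b]
            # x constraint: not lower than x of modules (a-1, b-1), (a-1, b), (a-1, b+1)
            if a >= 1:
                for db in (-1, 0, 1):
                    if 0 <= b + db < n_b and grid[a - 1][b + db][0] > x:
                        violations.append(((a, b), (a - 1, b + db), "x"))
            # y constraint: not lower than y of modules (a-1, b-1), (a, b-1), (a+1, b-1)
            if b >= 1:
                for da in (-1, 0, 1):
                    if 0 <= a + da < n_a and grid[a + da][b - 1][1] > y:
                        violations.append(((a, b), (a + da, b - 1), "y"))
    return violations


# A 3 x 3 block whose position variables grow with a and b: constraints hold.
ok_grid = [[(2 * a, 2 * b) for b in range(3)] for a in range(3)]

# Module (1, 1) moves to x = 5, overtaking the modules at a = 2 (x = 4).
moved_grid = [row[:] for row in ok_grid]
moved_grid[1][1] = (5, 2)
```

In this example the move of module (1, 1) produces three violations, one with each of its neighbors at a+1, which is the situation in which the text calls for a local remapping of the modules.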
Fig. 6 Example of maintaining the spatial constraints in case c (four panels, each a grid of module position variables): (i) the spatial constraints are respected, (ii) a module moves in the simulated environment and breaks conformity to the spatial constraints, (iii) two modules exchange their positions in the nano-network but the constraints are not yet restored, and (iv) another change is made in the nano-network and the spatial constraints are restored
The remapping is done locally, by using a nano-network architecture allowing local reprogramming, as described in point b. In some situations, this remapping may involve all the modules of a certain area of the nano-network and may take several cycles, since the remapping of some cells may trigger the remapping of further cells. But except in very special situations, the remapping will not involve large areas of the nano-network, since it does not propagate through the empty space surrounding a group of neighbor modules (e.g., the cells of a living organism or of a group of living organisms). During the remapping of the modules, when a location of the nano-network is being reprogrammed, it cannot compute and send valid outputs to its neighbors. Thus, it activates a signal to inform them. In response, the neighbors do not perform new computations until this signal is deactivated. In addition, at the next clock cycle they activate a similar signal to inform their own neighbors, which interrupt computation in turn, and so on. A wave of interruptions propagates through the nano-network, but it stops when it reaches the empty space surrounding the cells of an organism or of a group of organisms. Thus, these interruptions remain local and do not affect the operation of the rest of the network.
The above discussion determines three architectures:
1. For applications using modules without movement, conventional FPGA-like architectures (reprogrammable or non-reprogrammable) can be used.
2. For applications using a single module type capable of movement, conventional reprogrammable FPGA-like architectures can be used. The position of each module will be determined by a position variable.
3. For applications using multiple types of modules that are capable of movement, we will employ architectures allowing local transfer of the function of one location of the nano-network to a neighbor location.
Two different approaches will be used to determine the position of each module in the simulated space:
a. If the simulated space is of similar size to the nano-network, this position will be identical to its physical position in the nano-network.
b. If the simulated space is much larger than the nano-network, this position will be determined by a position variable. In addition, spatial constraints will be imposed on the position variable, as described earlier in point c.
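The wave of interruptions described above for the remapping phase can be illustrated with a small simulation. This is a sketch under our own assumptions rather than a description of an actual nano-network: occupied locations are given as a set of (a, b) pairs, the signal is passed to the eight surrounding locations, one hop per clock cycle, and only the spreading of the interruption is modeled (not its later deactivation). The function name `interruption_wave` is hypothetical.

```python
def interruption_wave(occupied, start):
    """Return {location: clock cycle at which it suspends computation}.

    occupied: set of (a, b) nano-network locations that implement a module.
    start:    location being reprogrammed (suspended at cycle 0).
    Empty locations never relay the signal, so the wave stays inside the
    group of neighboring modules that contains `start`.
    """
    suspended = {start: 0}
    frontier = [start]
    cycle = 0
    while frontier:
        cycle += 1
        next_frontier = []
        for a, b in frontier:
            for da in (-1, 0, 1):
                for db in (-1, 0, 1):
                    neighbor = (a + da, b + db)
                    if neighbor in occupied and neighbor not in suspended:
                        suspended[neighbor] = cycle
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return suspended


# Two groups of modules separated by empty space.
organism_1 = {(0, 0), (0, 1), (1, 1), (2, 1)}
organism_2 = {(5, 5), (5, 6)}
wave = interruption_wave(organism_1 | organism_2, start=(0, 0))
```

In this example the wave reaches every cell of the first group within two cycles and never reaches the second group, which illustrates why the interruptions remain local.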
Generic Communication Architectures for Systems Comprising Mobile Modules
The above approaches may solve the communication problem for various complex systems, which is known to be the most challenging problem in highly parallel architectures and which is exacerbated by the total parallelism of the approach considered here. However, these approaches have various limitations, such as the size of the application space versus the size of the nanonetwork, the enforcement of spatial constraints, and the freezing of the computation during the enforcement of these constraints. It is therefore worth investigating other architectures based on more general principles. Two such architectures are discussed below. In both architectures, the modules of the complex system have fixed positions in the nanonetwork. Thus, their position in the application space is determined by a position variable.
Associative Computer Approach
Figure 7 presents the first of these architectures. It comprises two layers. The first layer (Cell Computer) maps each module of the complex system into a set of cells of the nanonetwork. This layer implements the functions determining the next state
Fig. 7 A two layer architecture (diagram: Associative Computer layer and Cell Computer layer)
of the modules of the complex system. To compute this state, the second layer (Associative Computer) provides each module with the states of its neighbor modules (those having close values of position variables). Each area of the nanonetwork implementing a module of the complex system is connected to the Associative Computer through a parallel or a serial connection. This connection is bidirectional. In a first phase, it transfers the state of the module (the values of its state variables, including its position variable) to the Associative Computer. In a second phase, it transfers from the Associative Computer to the module the state of each of its neighbors. The number of connections needed between the Cell Computer and the Associative Computer is equal to the number of modules implemented in the nanonetwork. Thus, the complexity of these interconnections is manageable. The challenging part of this architecture is the Associative Computer, which ensures the communication between modules. It should determine the modules that have close values of position variables. The search for efficient architectures for this computer is a challenging topic that remains open. The related computations can be parallel, serial, or a mix of both. For instance, one can employ a mixed architecture:
• A decoder is used for each module (like the address decoders implemented in memories). It decodes the k most significant bits of the position variable of this module (we can use, for instance, k = 10).
• The Associative Computer is divided into $2^k$ (e.g., $2^{10}$) regions. The signals bringing the state of the modules are routed over all the $2^k$ regions of the Associative Computer.
• The nth output of each decoder is routed to the nth region of the Associative Computer. When such an output is active, it selects the signals bringing the state of the corresponding module.
• The nth region of the Associative Computer performs a serial comparison of the N − k least significant bits of the position variables of the modules selected by the active nth outputs of the decoders (where N is the total number of bits of the position variable).
We observe that this architecture is a compromise between a parallel and a serial computation. It uses a number of decoders equal to the number of modules of the complex system (parallelism) and a serial computation performed by each region of the Associative Computer over a reduced number of modules. This can lead to an interesting compromise between hardware complexity and computation time. However, both the hardware complexity and the computation time will remain significant.
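The mixed decoder-plus-serial-comparison scheme can be sketched in software as follows. This is a simplified illustration, not a hardware description: the position variable is reduced to a single unsigned integer of `n_bits` bits, closeness is an assumed threshold `radius` on the low-order bits, and the function name `find_neighbours` is hypothetical. The dictionary of regions plays the role of the decoders, and the inner double loop plays the role of the serial comparison inside each region.

```python
from collections import defaultdict


def find_neighbours(positions, n_bits, k, radius):
    """Group modules by the k most significant bits, then compare serially.

    positions: dict mapping a module id to its position variable (an
               unsigned integer of n_bits bits; 1-D for simplicity).
    Returns a dict mapping each module id to the list of its neighbours.
    Limitation of this sketch: modules that are close in space but fall
    on either side of a region boundary are not compared.
    """
    low_bits = n_bits - k
    mask = (1 << low_bits) - 1

    # "Decoder" step: the k most significant bits select one of 2**k regions.
    regions = defaultdict(list)
    for module, pos in positions.items():
        regions[pos >> low_bits].append(module)

    # Serial comparison of the remaining low-order bits inside each region.
    neighbours = {module: [] for module in positions}
    for members in regions.values():
        for i, m1 in enumerate(members):
            for m2 in members[i + 1:]:
                if abs((positions[m1] & mask) - (positions[m2] & mask)) <= radius:
                    neighbours[m1].append(m2)
                    neighbours[m2].append(m1)
    return neighbours


# 8-bit positions, k = 2: four regions of 64 positions each.
positions = {"A": 5, "B": 7, "C": 70, "D": 200}
result = find_neighbours(positions, n_bits=8, k=2, radius=3)
```

Here A and B share a region and are found to be neighbours, while C, whose low-order bits are also close to those of A, is never compared with it because it lies in another region. This shows both the saving (fewer comparisons per region) and the boundary limitation noted in the docstring.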
Hertzian Communication Approach
Clearly, managing the communication complexity in a system composed of large numbers of modules is very challenging. More efficient solutions should probably be sought in communication approaches different from those currently used in computer design. A fast communication architecture allowing all interacting
modules to exchange information in parallel and within a single cycle is the Holy Grail of this search. However, this seems unfeasible for the kind of systems we study here. Indeed, each module does not know the positions (or the identities) of the modules with which it has to exchange information, these modules can be at any position in the nanonetwork at any time, and they differ from one computation cycle to the next. Each module should therefore be able to search the whole nanonetwork to find the modules with which it has to interact. To do that in a single cycle, each module has to communicate in parallel with all other modules and also has to check their position variables in one cycle. This seems unrealistic. However, a speculative idea for doing so consists in using Hertzian communication. This idea is borrowed from another work [29–31], where it was suggested not for practical implementations but as an illustration for introducing the vision of a computational Universe. But Hertzian communication is not of interest for today's chips. So why could it become interesting for future ones? In today's architectures there is a limited number of blocks that need to exchange information. In this context, point-to-point physical interconnections can be used to transfer information at much higher speed than with Hertzian communication. However, what about a system comprising thousands or millions of units, in which any two units may need to exchange information, and in which a unit does not know with which unit it has to communicate before it receives some information from it (e.g., before receiving the value of its position variable)? In such a system, communication through physical point-to-point interconnections is unrealistic. 
On the other hand, Hertzian communication may offer a simple way to establish communication in systems where two units have to exchange information each time some of their internal variables take close values. Indeed, Hertzian communication can transfer information between any modules without needing a physical interconnection support. Furthermore, with Hertzian communication there is a simple way to check whether the position variables (or any kind of variables) of two modules have close values, and to exchange information when this is the case. This can be done in the following manner. Each area of the nanonetwork implementing a module of the complex system comprises one tiny emitter and one tiny receiver. At each computation cycle, each module emits its current state by modulating it at a frequency corresponding to the value of its position variable.¹ Furthermore, on the receiver side, it uses its position variable as the demodulation frequency.² Thus, the module will receive only the messages modulated at frequencies that are close to its own demodulation frequency. In this
¹ More exactly, for a 3-D space, three frequency ranges are allotted (one for the X, one for the Y and one for the Z coordinate). First, the state of each module is modulated to a frequency corresponding to the Z component of its position variable. Then, the resulting signal is modulated to a frequency corresponding to the Y component of its position variable. Finally, the resulting signal is modulated to a frequency corresponding to the X component of its position variable.
² More exactly, a 3-frequency demodulation (the inverse of the 3-frequency modulation of note 1) will be used. The receiving module will use a demodulation frequency around the value of each of the coordinates X, Y and Z of its position variable to receive the states coming from modules having position variables close to its own position variable.
manner, it will receive only the messages coming from the modules whose position variables have values close to the value of its own position variable. Thus, it will receive the states of its neighbour modules. Several problems have to be addressed for making practical this kind of communication: The power dissipation of a large numbers of emitters integrated in a single chip,
and the related power density
- The interferences that can affect the transferred information
- The development of technologies for implementing thousands or hundreds of thousands of tiny emitters/receivers in a single chip

These problems are very challenging, but the effort to solve them is worthwhile, since it would enable hundreds of thousands of parallel computations, resulting in a huge gain in computing power with respect to current computing systems. The good news is that the communication has to cover only very small distances (the size of the nanonetwork), so the required emission power could be very low. Also, if we isolate the nanonetwork from external disturbances and perform the communication during lapses of time in which no computations are performed in the nanonetwork, then interference on the emitted signals coming from external sources or from the computations performed in the chip can be eliminated. Concerning interference between the emitted signals, a good separation of the emission frequencies is mandatory. This separation will require a large frequency spectrum, given the large number of emission frequencies required. The separation can be improved if we do not perform all emissions at the same time. For instance, we could use a large number of emission cycles. In each cycle we make the emissions corresponding to position variables in which the k least significant bits of each coordinate take a certain value. Thus, for neighbouring positions having close emission frequencies, the emissions will be performed in different cycles, improving frequency separation within each emission cycle.

Developing the technologies for implementing thousands or hundreds of thousands of tiny emitters and receivers that occupy a very small area and are able to modify their modulation/demodulation frequency (following the value of the position variable of each module) is a very challenging task. However, recent developments indicate that the required technologies are already on the horizon. These developments concern the realisation of a nanotube radio recently announced by K. Jensen, J. Weldon, H. Garcia, and A. Zettl from the University of California, Berkeley [44]. It consists of a single nanotube which simultaneously realizes all essential components of a radio: antenna, tunable band-pass filter, amplifier, and demodulator. The demodulation frequency is selected by the voltage applied to the electrodes (tungsten and copper) between which the nanotube is placed. The successful operation of the nanotube radio was demonstrated by receiving and playing music from commercial radio stations in both FM and AM.
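The emission-cycle scheme described earlier in this section can be made concrete with a small sketch. It assumes a hypothetical two-dimensional 8 x 8 grid of modules with integer coordinates and k = 1; the function names and the grid size are illustrative choices, not taken from the text. Each module is assigned to the cycle labelled by the k least significant bits of its coordinates, so two modules that emit in the same cycle are at least 2^k positions apart along any coordinate in which they differ.

```python
from collections import defaultdict
from itertools import product


def emission_cycle(position, k):
    """Cycle label of a module: the k least significant bits of each coordinate."""
    mask = (1 << k) - 1
    return tuple(c & mask for c in position)


def schedule(positions, k):
    """Group module positions by the emission cycle in which they transmit."""
    cycles = defaultdict(list)
    for pos in positions:
        cycles[emission_cycle(pos, k)].append(pos)
    return dict(cycles)


# Hypothetical 8 x 8 grid of modules; k = 1 gives 2**(k * 2) = 4 emission cycles.
grid = list(product(range(8), repeat=2))
cycles = schedule(grid, k=1)

for label, members in sorted(cycles.items()):
    print(label, len(members))
```

With this grouping, immediate neighbours never emit in the same cycle, which is the property the text relies on to relax the frequency separation needed within each cycle. The price is a number of cycles that grows as 2^(k d) for d coordinates.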
Computational Opportunities and CAD for Nanotechnologies
Nonconventional Architectures

In the previous sections we considered that the laws determining the state of the modules composing the complex system are deterministic. However, the behavior of a complex system can in many cases be richer if its evolution is governed by stochastic laws. The approaches and architectures developed in the previous sections can be easily applied to the case where the computation of the state of the modules composing the complex system involves pseudorandom signals produced by pseudorandom generators. Also, instead of pseudorandom signals we can use stochastic signals produced by various sources of noise. Indeed, the use of pseudorandom or stochastic signals does not adversely increase the computational complexity as long as the functions manipulating them are not complex. However, the evolution of microscopic natural systems is governed by the laws of quantum mechanics, which on the one hand are known to be very complex and on the other hand are known to exhibit strange behavior not encountered in the macroscopic world. In the following we consider the case of complex systems composed of entities governed by this kind of law. Then, we extend the discussion of non-classical computations by considering the opportunities to achieve unprecedented computing power by exploiting the behavior of various natural systems to solve various complex problems.
Computational Model of Quantum Systems³

If we want to simulate a complex system composed of modules having quantum behavior, e.g., elementary particles governed by quantum evolution laws, we have to address the “strange” behavior of quantum systems, such as state superposition and non-locality, which is not encountered in the macroscopic world. So, the first question is: could we reproduce this behavior by means of conventional computation? To do so, we need a model of quantum systems based on conventional computing. Such a model is proposed in [29–31, 34]. Its goal was to show that the Universe, as it can be perceived by any observer being part of it, could be engendered by a computation performed on a meta-substrate inaccessible to our observational means. In what follows we are not interested in the non-local behavior of entangled particles. So, we consider a reduced model that takes into account the behavior of state superposition but not that of non-locality. The left side of Fig. 8 illustrates the behavior of a quantum system: the influence of the environment is determined by a potential function. The wave function is determined as a solution of the Schrödinger equation (or another equation such as Klein–Gordon or Dirac).

³ A model for quantum systems more refined than the one presented in this section can be found in [35].
M. Nicolaidis and E. Kolonis
Fig. 8 The concept of quantum superposition and its computational counterpart
This function determines the statistical distributions of the physical observables of the particle, such as its position, momentum, spin, and energy. More precisely, these distributions are determined by using the algebra of operators according to the following rules:
- To each observable A is associated an operator $\hat{A}$ (whose form has a certain relation with the expression of the observable in non-quantum physics, i.e., Newtonian or relativistic).
- The possible values $\alpha_1, \alpha_2, \ldots$ of an observable A are the eigenvalues of the operator $\hat{A}$ of this observable.
- The probabilities associated with these values are $P_1 = |c_1|^2$, $P_2 = |c_2|^2$, ... with $c_i = \langle \psi_i | \psi \rangle$ $\forall i \in \{1, 2, \ldots\}$ (the inner product of the eigenfunction $\psi_i$ corresponding to eigenvalue $\alpha_i$ and of the wave function $\psi$).

The above discussion concerns observables, like energy, having a discrete spectrum. Similar rules are used in the case of observables having a continuous spectrum, such as position or momentum. In this case the statistical distribution is described by a probability density function P(a). For instance, the probability density of the position is equal to $|\psi|^2$. In the illustration presented in the left part of Fig. 8, we consider the observables of position, momentum, energy and spin. P(r) and P(p) are the probability densities of position and momentum, E1, E2, ... are the eigenvalues of energy and Pe1, Pe2, ... the corresponding probabilities, etc. In the following, in order to simplify the discussion, our illustrations will use the conventions related to observables having a discrete spectrum, but the arguments used are also applicable to the case of observables having a continuous spectrum. When we measure an observable A, the obtained value is not the result of a deterministic process but can be any value among the eigenvalues $\alpha_1, \alpha_2, \ldots$ of operator $\hat{A}$. The probability of obtaining a value $\alpha_i$ among these eigenvalues is equal to $P_i$.
This measurement acts on the state of the particle by modifying its wave function $\psi$. Thus, after the measurement, the wave function becomes equal to $\psi_j$, where $\psi_j$ is the eigenfunction of $\hat{A}$ corresponding to the eigenvalue $\alpha_j$ obtained as the result of the measurement. This kind of state transition is often referred to as decoherence, while the state of the quantum system before decoherence happens is referred to as coherence. According to the dominant interpretation of quantum mechanics, during coherence the observable A of the quantum system is in a superposition of the values $\alpha_1, \alpha_2, \ldots$ However, we observe [29] that the concept of superposition:
- Is not necessary in order to relate the mathematical models to the behaviour of a quantum system, as the concepts discussed previously are sufficient for that.
- Describes a metaphysical state, since there is no way to observe that during coherence the system is simultaneously in several states: none of these states can be observed as long as the system is in coherence.

The right part of Fig. 8 illustrates a computational model that reproduces the same behaviour as the quantum system in the left part of this figure. It consists of a computation that transforms a set of stochastic signals $w_r, w_p, w_e, \ldots$ having an arbitrary but given statistical distribution (e.g., Gaussian) into a set of signals $r, p, E, \ldots$ having the statistical distributions of the observables of the quantum system as determined by the laws of quantum mechanics. This computation is composed of two parts corresponding to the two computing blocks in the right part of Fig. 8:
- The first computation determines several deterministic functions ($f_r, f_p, f_e, f_s, \ldots$).
- The second computation uses these functions to transform the signals $w_r, w_p, w_e, \ldots$ into the signals $r, p, E, \ldots$.

The first computation determines the function $f_a$ corresponding to an observable A in the following manner:
- Computation of the wave function $\psi$ (according to the rules of quantum mechanics)
- Computation of the statistical distribution of observable A (according to the rules of quantum mechanics)
- Computation of the function $f_a$ (according to the method described in [29–31]), which transforms the signal $w_a$ into the signal $\alpha$ having the statistical distribution of observable A computed in the previous step

The above computation steps generate a signal $\alpha$ that has the same statistical distribution as the distribution of observable A determined by the laws of quantum mechanics. That is, the possible values of $\alpha$ are the eigenvalues $\alpha_1, \alpha_2, \ldots$ of operator $\hat{A}$ corresponding to its eigenfunctions $\psi_1, \psi_2, \ldots$, and the probabilities corresponding to these values are $P_1 = |c_1|^2$, $P_2 = |c_2|^2$, ... with $c_i = \langle \psi_i | \psi \rangle$ $\forall i \in \{1, 2, \ldots\}$. If an observable A is measured, the result of the measurement is the current value $\alpha_i = f_a(w_{ai})$ of signal $\alpha$, where $w_{ai}$ is the current value of the stochastic signal $w_a$ (from the above discussion, $\alpha_i$ is always one of the eigenvalues of $\hat{A}$). Also, as a result of the measurement, the first computing block sets $\psi = \psi_i$ and recomputes the functions $f_r, f_p, f_e, \ldots$ accordingly. This influence on functions $f_a$ of
the value obtained from the measurement is represented in the right part of Fig. 8 by an arrow which brings the output value of the second computing block (signals $\alpha$: $r, p, E, s$) to the inputs of the first computing block. The relation $\psi = \psi_i$ implies that the computation of the function $f_a$ corresponding to the measured observable gives $f_a(w_a) \equiv \alpha_i$, i.e., a function that gives as result the value $\alpha_i$ for all values of $w_a$ (decoherence). Consequently, any new measurement of this observable will give as result the unique value $\alpha_i$ generated by the function $f_a(w_a) \equiv \alpha_i$. These steps can be done by conventional computations, that is, computations not involving the “strange” state of superposition (e.g., a superposition of the position observable r over an infinity of different positions), which does not exist in the macroscopic world and could not be reproduced by a classical computer. Except in some simple cases, these computations will be extremely complex, since quantum systems exhibit extremely complex behaviour related to a very complex computation of their wave function.

To illustrate this complexity, let us consider Thomas Young's double-slit experiment. In the original experiment, light diffracts through two slits and creates wave-like interference patterns on a screen. These patterns can be attributed to the “wave” nature of light. The same experiment can also be carried out by means of a beam firing a single particle at a time (e.g., a single electron). In this case each electron hits the screen at a given position, exhibiting a “corpuscular” nature. This is expected since the particles were launched one by one, so they cannot interfere with each other. The experimental evidence shows that the position at which each member of a series of electrons will hit the screen is completely unpredictable. However, after a large number of electrons have hit the screen, we again observe wave-like interference fringes. The shape of the fringes is perfectly predictable.
It reveals a perfectly predictable statistical distribution for the positions at which the electrons hit the screen, which corresponds to a wave function $\psi = (\psi_1 + \psi_2)$ such that $\psi_1$ corresponds to the wave function of an electron traversing slit 1 and $\psi_2$ corresponds to the wave function of an electron traversing slit 2. According to the superposition concept, each particle takes all possible trajectories (in the present case, the trajectories which pass through the two slits). The superposition of the wave functions $\psi_1$ and $\psi_2$ corresponding to these two trajectories gives as result the wave function $\psi = (\psi_1 + \psi_2)$. On the other hand, according to the computational interpretation presented previously, a veritable particle will not simultaneously take several veritable paths in a veritable space. Instead, through its interactions with the particles composing the wall in which the two slits are pierced, the electron receives information which is used by the first computational part (in the right part of Fig. 8) to compute a function $f_r$ that corresponds to the wave function $\psi = (\psi_1 + \psi_2)$. This will give the same statistical distribution for the position r of the electron as in the first interpretation. Thus, the two interpretations will give identical interference fringes. We also observe that in both interpretations, the state of the electron is determined in an extremely complex manner:
- Determining its wave function by taking all the possible trajectories, in the superposition interpretation
- Receiving information from all the particles of the environment and using it to compute the wave function $\psi$ and then the functions $f_\alpha$, in the computational interpretation
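The two-stage computation described in this section can be sketched in a few lines for an observable with a discrete spectrum. This is a minimal illustration under stated assumptions, not the method of [29–31]: the function $f_a$ is built here by the standard inverse-CDF construction, the stochastic signal $w_a$ is assumed to be uniform on [0, 1) rather than Gaussian, the eigenvalues are assumed to be distinct, and the names `build_f` and `QuantumObjectModel` are hypothetical.

```python
import bisect
import random


def build_f(eigenvalues, amplitudes):
    """First computation: build the deterministic function f_a from the c_i.

    f_a maps a uniform signal w in [0, 1) to an eigenvalue alpha_i so that
    alpha_i is produced with probability P_i = |c_i|^2.
    """
    probs = [abs(c) ** 2 for c in amplitudes]
    total = sum(probs)
    cumulative, running = [], 0.0
    for p in probs:
        running += p / total
        cumulative.append(running)
    cumulative[-1] = 1.0  # guard against floating-point rounding

    def f(w):
        return eigenvalues[bisect.bisect_right(cumulative, w)]

    return f


class QuantumObjectModel:
    """Toy counterpart of the right part of Fig. 8 for one observable."""

    def __init__(self, eigenvalues, amplitudes, rng):
        self.eigenvalues = list(eigenvalues)  # assumed distinct
        self.rng = rng
        self.f = build_f(self.eigenvalues, amplitudes)

    def measure(self):
        w = self.rng.random()   # current value of the stochastic signal w_a
        alpha = self.f(w)       # second computation: alpha = f_a(w_a)
        # Feedback arrow: psi is set to psi_i, so the recomputed f_a is constant.
        collapsed = [1.0 if e == alpha else 0.0 for e in self.eigenvalues]
        self.f = build_f(self.eigenvalues, collapsed)
        return alpha


energies = [1.0, 2.0, 3.0]
c = [0.5 ** 0.5, 0.5, 0.5]      # |c_i|^2 = 0.5, 0.25, 0.25
rng = random.Random(0)
counts = {e: 0 for e in energies}
for _ in range(10000):
    counts[QuantumObjectModel(energies, c, rng).measure()] += 1
print(counts)
```

Each freshly prepared object returns an unpredictable value, while the frequencies over many objects follow the $|c_i|^2$. After a measurement the same object keeps returning the same value, which mirrors the relation $f_a(w_a) \equiv \alpha_i$ discussed above. All the physics lies in obtaining the coefficients $c_i$, which this sketch simply takes as given.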
Implementing Complex Systems Composed of Quantum Objects

We observe that we could use the computational model described above to implement, in a nanonetwork performing conventional computations, complex systems whose elementary modules are quantum objects. However, except in some simple cases, the computational power required for each quantum object could be huge. Thus, each quantum object will require vast hardware and/or time resources, dramatically reducing the complexity of the implementable complex systems (the number of elementary modules that could be implemented) and/or dramatically increasing the time required to simulate a target complex system. The alternative solution is a mix of conventional and nonconventional computing. We can consider two cases of systems composed of quantum objects. The first case is the simpler one. In this case, the quantum objects do not interact with each other, or their interactions are very weak and can be ignored. Thus, the environment in which the quantum objects evolve is determined by a potential external to the simulated system of quantum objects. In this case, each elementary module will comprise:
- A quantum object
- A measurement unit able to measure the state of the target observables of the quantum object

In addition, the system will include digitally controlled means for creating the potential in which the quantum objects are immersed. In the second case, the interactions between quantum objects are strong, so that the environment in which each quantum object evolves is determined by the state of the other quantum objects. In this case, each elementary module of the complex system will be implemented by means of:
- A quantum object
- A measurement unit able to measure the state of the target observables of the quantum object
- Communication means for the modules to exchange the results of measurements
- Numerical computing means to determine the environment (e.g., potential function) in which the quantum object is immersed, as a function of the measurements of the states of the other quantum objects determined as above
- A digitally controlled environment to create around the quantum object the environment (potential) computed above

The results exchanged between elementary modules could be:
- The results of individual measurements of the observables (if at each computation cycle the environment seen by each quantum object is created from the individual measurements of the other quantum objects).
- The statistical distributions of these observables (if the environment is created on the basis of the statistical distributions of the observables). In the latter case, at each module, a large number of computation and measurement cycles will be repeated for the same environment of the quantum object, to obtain statistical distributions, before using these distributions to determine the changes in the environment of each quantum object and move to a new computation phase.

The realization of this kind of computing system will enable simulating complex systems composed of large numbers of quantum objects and advance the investigation and understanding of various natural systems to unprecedented levels. But realizing such systems is easier said than done. For one thing, quantum decoherence seems to make it impossible to realize reliable computations with a system which comprises thousands of quantum objects. However, while quantum computing using q-bits exploits the behaviour of quantum systems in order to perform deterministic computations, in the present discussion we are interested in reproducing the behavior of quantum systems, which includes decoherence. Thus, decoherence may not be the basic obstacle. What seems most problematic relates to the technologies required for miniaturizing and integrating in nanonetworks the various elements composing each module, as described above, and doing that in vast numbers. So, it should take a significant amount of time before we are able to realize such systems. So far, efforts are concentrated on the development of technologies for realizing and controlling systems comprising relatively small numbers of quantum objects, with the goal of using their inherent computing power to solve some problems intractable by conventional computing. This kind of quantum computing is discussed in the next section.
Quantum Computing⁴

The evolution of the state of any natural system can be associated with a computation [29–31]. Accordingly, any natural system could be used to perform computations. There is therefore an extremely wide range of potential computing systems, leaving a large field for future investigations. These investigations should address several issues. First, to perform useful computations we have to identify natural objects together with useful problems that could be solved by exploiting the behaviour of these objects. Then, we have to find practical technologies for building systems composed of such objects and provided with means for controlling their parameters in order to constrain them to perform the desired computation, as well as with means for reading out the observables that contain the computation's result. As discussed in section Computational model of quantum systems, the behaviour of quantum systems is extremely complex, revealing the huge computing power that

⁴ Arguments against the parallel computing interpretation of quantum computing that are more elaborate than the ones presented in this section can be found in [35].
Fig. 9 An inefficient computing system
could be delivered by such systems. However, in objects composed of large numbers of quantum systems, this computing power is masked by the averaging effects of the macroscopic world. This discussion illustrates that using systems that are not suited to solving a particular problem, and/or using macroscopic systems to perform computations, results in a huge waste of nature's tremendous computing power. For instance, a human, a blackboard and a piece of chalk (Fig. 9) form a very inefficient system for performing certain simple operations like multiplication. A computer is better suited and thus more efficient for performing this kind of computation. Due to their macroscopic nature, both systems waste a huge part of nature's computing power. Indeed, just the piece of chalk contains a tremendous computing power inherent in each one of the myriads of quantum systems that compose it, but their computing power cannot be exploited to deliver useful computations as long as they are organised to form a piece of chalk. Mastering matter at the microscopic level, and in particular controlling quantum systems in a manner that allows exploiting their huge computing power, is one of the goals of nanotechnologies. Quantum computing is usually associated with computing systems based on q-bits. But we should rather use the term quantum computing to designate any system using quantum objects to perform useful computations. The huge power of q-bit based computing is attributed to the superposition state and the related parallel computing. However, section Computational model of quantum systems describes a computational model of quantum systems which attributes the complex behavior of quantum systems to the complex manner in which they determine the statistical distributions of their observables, rather than to the state of superposition. In the following we use this vision to discuss two kinds of quantum computing, one based on q-bits and the other based on Bose–Einstein condensates.
Computing with q-Bits: Parallel or Not Parallel?

Quantum computing based on q-bits is an attempt to perform useful computations by exploiting the amazing computing power of quantum systems. The goal of this section is not to present recent and past developments in q-bit based computing (a few references extracted from the very rich literature related to this topic
are [9, 18, 42, 43, 47–49]), but rather to discuss the properties of quantum systems which make them powerful computing objects and to highlight the fact that quantum computing should be viewed as stochastic computing rather than as parallel computing [34]. This kind of interpretation is also more convenient for supporting quantum computing principles other than q-bit based computing. In a few words, a q-bit corresponds to an observable A of a quantum system which can take two possible states. We can associate the value 0 with one of these states and the value 1 with the other. According to the interpretation of quantum mechanics based on the concept of superposition, in its state of superposition a q-bit may be at the same time in the state 0 and in the state 1. For a quantum computer comprising n q-bits, in the state of superposition these n q-bits may be simultaneously in $2^n$ states. A quantum algorithm transforms the state of the n q-bits. Then, according to the interpretation using the superposition concept, the extraordinary power of quantum computing is due to the fact that this transformation manipulates at the same time all the $2^n$ values of the n q-bits (i.e., it is executed in parallel $2^n$ times), while a traditional algorithm manipulates only one of these values. The claimed parallelism of quantum computing is also supported by the mathematical formalism used to describe the transformation of the state of the n q-bits performed by quantum algorithms. This formalism comprises:
- The description of the state of the n q-bits by a wave function $|\psi(t)\rangle = \sum_{i=0}^{2^n-1} a_i(t)\,|l_i\rangle$, where $|l_i\rangle$, $i \in \{0, 1, \ldots, 2^n-1\}$, are the $2^n$ possible states of the n q-bits, and $a_i(t)$ is a complex number such that $a_i(t) = \langle l_i | \psi(t) \rangle$ and $|a_i(t)|^2$ is the probability of occurrence of state $|l_i\rangle$ during a measurement.
- A $2^n \times 2^n$ unitary matrix U(t,0), which transforms the initial state $|\psi(0)\rangle$ of the n q-bits into their final state $|\psi(t)\rangle$ given by $|\psi(t)\rangle = U(t,0)\,|\psi(0)\rangle$.
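The formalism above can be written out explicitly for a small case. The sketch below is only an illustration: it assumes n = 2 q-bits, takes the initial state to be $|l_0\rangle$, and chooses for U(t,0) a Hadamard gate applied to every q-bit, which is one convenient unitary and not one prescribed by the text. The helper names are hypothetical.

```python
import math


def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def apply_unitary(U, state):
    """Return U|psi> for a 2^n x 2^n matrix U and a list of 2^n amplitudes."""
    return [sum(U[r][c] * state[c] for c in range(len(state)))
            for r in range(len(U))]


def probabilities(state):
    """Probabilities |a_i|^2 of observing each state |l_i> in a measurement."""
    return [abs(a) ** 2 for a in state]


h = 1 / math.sqrt(2)
H = [[h, h], [h, -h]]            # single-q-bit Hadamard gate (assumed example)

n = 2
U = H
for _ in range(n - 1):
    U = kron(U, H)               # 2^n x 2^n unitary acting on all n q-bits

psi0 = [1, 0, 0, 0]              # |psi(0)> = |l_0>, i.e. a_0(0) = 1
psi_t = apply_unitary(U, psi0)   # |psi(t)> = U(t,0)|psi(0)>
print(probabilities(psi_t))      # each of the 2^n states has probability close to 0.25
```

On a conventional computer this matrix product costs on the order of $2^n \times 2^n$ multiplications, which is the cost that the parallel interpretation attributes to a single step of the quantum system.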
Thus, the quantum algorithm would perform the operation $|\psi(t)\rangle = U(t,0)\,|\psi(0)\rangle$, which transforms the initial $2^n$ coefficients $a_i(0)$ into the final ones $a_i(t)$. That is, it performs simultaneously an operation over $2^n$ coefficients, revealing a parallel manipulation over $2^n$ values. So, it seems that we should accept the superposition as a factual state and that we should reject the computational interpretation of quantum systems based on stochastic processes presented in section Computational model of quantum systems; or perhaps we should not? One argument against the idea of a veritable superposition and the related parallel computing is that the state of superposition of observables with a continuous spectrum, like position and momentum (but also of observables with a discrete spectrum, like energy), will comprise an infinity of values. Thus, a quantum system would evolve its state by manipulating simultaneously an infinity of values, revealing an infinite computing power! Of course, the construction of quantum computers using such observables seems improbable (due to their very high susceptibility to decoherence), but these technological limitations should not mask the absurdity of the infinite computing power that the concept of state superposition attributes to quantum systems. Also, we saw earlier that:
- The concept of superposition describes a metaphysical state, since there is no way to observe that during coherence the system is simultaneously in several states: none of these states can be observed as long as the system is in coherence.
- The concept of superposition is not necessary for modelling the behaviour of quantum systems (the Schrödinger equation together with the algebra of operators are sufficient for that).

All these reasons plead for abandoning the idea of a veritable superposition. However, as the efficiency of quantum algorithms is justified by a parallel computation executed simultaneously on each value of the superposition state, the paradigm of quantum computing seems to support the concept of superposition by attributing to it a factual status. To resolve this dilemma, let us discuss the transformations of the quantum system performed by a quantum algorithm by ignoring the interpretation based on the concept of superposition and by using only the strictly necessary concepts of quantum mechanics required to relate the mathematical models to the behaviour of a quantum system. According to these concepts, when an observable A of a quantum system is in coherence, the state of the observable is described by a statistical distribution which gives the possible values that can be obtained when the observable is measured, together with the corresponding probabilities of obtaining these values during a measurement. Accordingly, the quantum algorithm transforms the state of the n q-bits from an initial state, corresponding to a statistical distribution where the solution of the problem appears with a low probability, into a new state corresponding to a statistical distribution where the solution of the problem appears with a high probability. The quantum algorithm finds the solution because of this high probability. That is, by measuring the state of the n q-bits at the end of the algorithm, we have a large probability of obtaining as result the solution of the problem. By repeating the algorithm and the measurement several times and by selecting the value observed with the highest frequency, we have a very high probability of obtaining the solution of our problem.
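The steps just described can be followed on a small classical simulation. The sketch below uses Grover's search over $2^n$ values as a concrete stand-in, since the text does not name an algorithm at this point; the choice of n = 4, the marked value 11, the 20 repetitions and the function names are illustrative assumptions. The code only tracks how the probability of the solution changes and then applies the repeat-and-select-the-most-frequent step.

```python
import math
import random
from collections import Counter


def grover_distribution(n, solution, iterations):
    """Probabilities |a_i|^2 of the 2^n values after the given Grover iterations."""
    size = 2 ** n
    amps = [1 / math.sqrt(size)] * size        # uniform initial distribution
    for _ in range(iterations):
        amps[solution] = -amps[solution]       # oracle: flip the sign of the solution
        mean = sum(amps) / size
        amps = [2 * mean - a for a in amps]    # inversion about the mean
    return [a * a for a in amps]


def run_and_vote(probs, repetitions, rng):
    """Measure repeatedly and return the most frequently observed value."""
    outcomes = rng.choices(range(len(probs)), weights=probs, k=repetitions)
    return Counter(outcomes).most_common(1)[0][0]


n, solution = 4, 11
before = grover_distribution(n, solution, 0)
k_opt = int(math.pi / 4 * math.sqrt(2 ** n))   # 3 iterations for n = 4
after = grover_distribution(n, solution, k_opt)
answer = run_and_vote(after, repetitions=20, rng=random.Random(1))

print(before[solution], after[solution], answer)
```

The initial distribution gives the solution probability 1/16; after three iterations it exceeds 0.95, and the vote over repeated measurements then selects it. A single measurement can still return a wrong value, which is why the repetition step belongs to the procedure.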
These steps, which describe the way q-bit systems compute, reveal and/or involve transformations of the statistical distributions of a physical observable. They also involve the measurement of this observable a statistically significant number of times to obtain the result with sufficient confidence. There is nothing in these steps that involves or reveals a parallel computation over a plurality of values. Considering such a computation would make sense if it explained some capability of quantum computers that the concept of statistical distributions could not explain (e.g., a parallel computation would provide the solution with certainty, not just with a high probability). Accordingly, the computing power of q-bits is not due to a hypothetical parallel process which manipulates simultaneously a plurality of values in a hypothetical superposition, but to the ability of quantum systems to evolve their state (and therefore the related statistical distributions) in a very complex manner, as was illustrated in the previous section Computational model of quantum systems with the example of Young's double-slit experiment. In short, a quantum algorithm would consist of a judicious technique for constraining quantum processes to produce pertinent results by manipulating what is deterministic in these processes, that is, their statistical distributions. In light of the above discussion, let us consider quantum computing in the context of the computational model of quantum mechanics described in section Computational model of quantum systems. This interpretation replaces
the superposition of states by a computation which transforms a stochastic signal $w_\alpha$ into another stochastic signal $\alpha$ by means of a deterministic function $f_\alpha$ according to the relation $\alpha = f_\alpha(w_\alpha)$. This process is too simple to support a computation as powerful as that of quantum algorithms. Nevertheless, the computational interpretation of quantum processes comprises a second computation, the one which determines the functions $f_\alpha$. All the complexity of the behaviour of quantum systems lies in this second computation. Then, the subterfuge of quantum algorithms would consist in finding means to exploit this computation in order to force the quantum system to transform its wave function in a controlled way, and thus the function $f_\alpha$ corresponding to the observable A. The goal of this transformation will be to obtain a function $f_\alpha$ which produces a signal $\alpha = f_\alpha(w_\alpha)$ having a statistical distribution in which the value corresponding to the solution of the problem treated by the algorithm has a large probability. Thus, the quantum algorithm would induce a transformation of the initial function $f_\alpha$ into a new function $f_\alpha$ having this characteristic. This discussion pleads for abandoning the idea that quantum superposition corresponds to a veritable superposition over several states, which would support in this way a veritable parallel computation over these states. Instead, it suggests considering as quantum superposition the state where the statistical distribution of an observable allocates a nonzero probability to more than one value, and attributing the computing power of quantum systems to the complex way they can transform the statistical distributions of their observables. To close this section and introduce the next one, let us give a last argument supporting the above ideas.
Attributing the computing power of quantum systems to a veritable superposition and to the related parallel processing is limiting, since quantum systems other than the ones using q-bits could be employed to perform powerful computations that are not described by means of a parallel processing performed over a plurality of values in superposition. Thus, while the complex behaviour and the related computing power of all quantum systems is related to a unique principle, namely their capability to evolve their state (their wave function or, equivalently, the associated statistical distributions) in a complex manner, we would need to use two different concepts to interpret the computing power of various quantum computing systems:
- A parallel computation over $2^n$ states in superposition, to interpret the computing capabilities of q-bit based computing
- The complex manner in which quantum systems determine their wave function, to interpret other types of quantum computing
Computing with Bose–Einstein Condensates

While the field of quantum computing based on q-bits has become widely known among computer scientists, this is not the case for other types of quantum computing. Their development is promising for realizing computing systems which can solve problems that could be treated neither by conventional computers nor by
q-bit based quantum computers. Thanks to such developments quantum computing will not be limited to the problems known today to be treatable by fast q-bit based quantum algorithms (number factorisation, discrete logarithms, searching of unsorted data, analysis of large amounts of data). Successful accomplishment of such developments will require: Determining practical quantum systems (i.e., systems that are easy to realize and
control and allow easy reading of their results) Determining complex and useful problems that the behaviour of these quantum
systems could be used to solve A practical example of this kind of quantum computing has recently emerged with the progress of the technology related to the realisation and control of Bose– Einstein condensates (BECs). BECs correspond to a state of matter formed by Bosons confined by an external potential and cooled below a critical temperature, close to absolute 0. Under such conditions a large fraction of the Bosonic particles fall into the lowest quantum state of the external potential (and thus are described by a single wave function). The existence of this state of matter was predicted on 1924-1925 by S. N. Bose and A. Einstein. But the first condensate was realized experimentally only on 1995 [1]. Because they can comprise from several thousands to few millions atoms, these quantum systems are more stable and much easier to realize and control than q-bits. Thanks to the progress of laser cooling and trapping and of magnetic trapping and evaporative cooling technologies, we can today tune the parameters of BECs with high accuracy. This allows creating BECs which reproduce the behaviour of various complex systems. Thus, we can exploit the huge processing power of quantum systems for resolving problems intractable with conventional computers. Examples are: Simulation of solid state systems that are difficult or impossible to calculate: Use
of several laser beams to create optical lattices resembling at crystalline lattices (egg-carton like potential wells formed at the intersection of laser beams) and trap rubidium atoms in potential wells with controlled depth. This technique allows creating an artificial “optical crystal” with controllable almost all aspects of the underlying periodic structure and the interactions between the atoms [3]. It can therefore be used as a computer simulating solid state systems. Resolution of polynomial equations with stochastic coefficients: Random polynomials play an important role in various fields of physics. But until recently nobody had any idea about how their roots look like, neither was possible to compute them even by means of the most powerful computers. But the wave function of a rotating condensate can be described by a random polynomial. Thus, quantum vortices created by fast rotating condensates can be used to solve the problem: the vortices sit at the locations where the wave function “vanishes”, so they correspond to the roots of the random polynomial [7]. Simulation of black-holes: In dilute-gas Bose–Einstein condensates, there exist both dynamically stable and unstable configurations which, in the hydrodynamic limit, exhibit a behaviour resembling that of gravitational black holes [15].
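The correspondence between vortices and polynomial roots mentioned in the random-polynomial example above can be illustrated classically on a toy scale. The sketch below is only an illustration under stated assumptions: it draws a low-degree polynomial with independent complex Gaussian coefficients (an ensemble chosen for simplicity, not necessarily the one realized by a rotating condensate in [7]) and locates its zeros with NumPy's general-purpose root finder. The function names `random_polynomial` and `vortex_positions` are hypothetical.

```python
import numpy as np


def random_polynomial(degree, seed=0):
    """Return coefficients (highest power first) of a random polynomial.

    Assumption: i.i.d. complex Gaussian coefficients, used for illustration only.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(size=degree + 1) + 1j * rng.normal(size=degree + 1)


def vortex_positions(coeffs):
    """Zeros of the polynomial, i.e. points where the model wave function vanishes."""
    return np.roots(coeffs)


if __name__ == "__main__":
    coeffs = random_polynomial(degree=12)
    zeros = vortex_positions(coeffs)
    # Sanity check: the polynomial evaluated at its computed zeros should be tiny.
    residual = np.abs(np.polyval(coeffs, zeros)).max()
    print(len(zeros), "zeros; largest |P(z)| at a computed zero:", residual)
```

In the condensate, the same information would be read from the observed positions of the vortices rather than computed numerically.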
Other works show the possibility of using Bose–Einstein condensates to simulate neutron stars, white dwarfs and supernova explosions, and Bose–Einstein condensates have also been used to compute the value of a previously unknown key parameter of superconductivity in metals. Other applications of BECs will certainly come in this young and fast-emerging field, and the use of other quantum systems for computing purposes should also emerge in the future.
The unprecedented computing power exhibited in these examples is explained by the complex manner in which quantum systems determine their wave function (or the related statistical distributions of their observables). Today it becomes possible to exploit this computing power because we have the technology for tuning the parameters of BECs with high accuracy, and thus for tuning the evolution of their wave function to reproduce the complex behavior of various natural systems which are impossible to simulate even with the most powerful conventional computers. Thus, the computing power exhibited in the above applications can be attributed to the complex manner in which quantum systems determine their wave function rather than to a supposed parallel computation over a set of states in superposition. Knowing that the behavior of all quantum systems is determined by the same principles, is it reasonable to consider two different interpretations for explaining the computing power of q-bit based computing and of BEC based computing?
Existing achievements concern the realization and experimentation of single BECs in lab conditions. Developments for miniaturizing the related technology, like silicon lasers [8] and chip magnetic traps [39, 45], could enable creating complex systems comprising large numbers of BECs.
Given the huge computing power of a single BEC, we can imagine the tremendous computing power delivered by systems including large numbers of BECs, q-bits or other kinds of quantum computing elements, controlled numerically to perform computing tasks of a complexity difficult to imagine today. The development of nanotechnologies could bring, in the not too distant future, all the ingredients of the mixture of technologies needed to integrate such systems in a single device. This could seem today an extraordinary achievement, but it is only a beginning, as we are far from understanding which natural systems are best suited for solving one or another computational problem. This is not limited to the behavior of various kinds of quantum systems and the kind of problems they could be used to solve. Various other objects could be efficient for performing one or another kind of computation. As an example, the memristive behavior observed in some nanowire junctions could be used to efficiently implement the learning and classification processes of synapses in artificial neural networks [46]. A Promethean research effort will be required for progressing jointly on three interdependent fronts:
• Creating new computation paradigms
• Understanding the behavior of various natural systems that could be used to efficiently implement existing or new computation paradigms
• Developing the technologies for realizing them
Computers in the more distant future will probably be far different from today's model of universal computers, which is based on the von Neumann paradigm, realized by switching components, and offers a vast versatility that allows programming them to execute virtually any task but with “low” computing efficiency. The computing of the future could be based on a variety of computing paradigms, each one adapted to treat with unprecedented efficiency some particular classes of computations. The driving force in this direction will be the opportunity to treat very complex problems untreatable by other means. Its advent will be delayed by the time and investments needed for developing the underlying scientific knowledge and technologies, but also by the cost/benefit constraints of a fragmented market addressed by a variety of computing components, each targeting a particular class of problems.
CAD for Non-conventional Architectures
The CAD framework(s) needed for these kinds of systems will differ in many respects from today's CAD, and it is in any case very early to investigate them. Thus, for the moment we can only express some thoughts. Some important difficulties will concern the modeling and simulation of non-conventional computing systems, like quantum computers. For instance, the simulation of q-bit based computers of meaningful size will be unrealistic by means of classical computers. Q-bit based computers of one generation could be used to simulate q-bit based computers of the next generation. On the other hand, q-bit based algorithms can be easily described at a high level by using unitary matrices. They can then be synthesized into an implementable netlist by means of a quantum cell library comprising a universal set of quantum gates [24, 41]. Thus, the high-level description of a quantum algorithm could be easily handled by a CAD tool which encapsulates the knowledge of known quantum algorithms. Its transformation into a quantum-gate-level implementation could also be handled easily by decomposing the corresponding unitary matrices into a netlist of quantum logic gates. To extend the versatility of quantum computers, reconfiguration techniques have to be developed that allow digitally programming the configuration of a network of quantum gates, so as to selectively implement one or another quantum algorithm on a single substrate (a Field Programmable Quantum Array [FPQA], the quantum counterpart of FPGAs).
Using classical computers for simulating other kinds of quantum computers, like the ones based on Bose–Einstein condensates, will also be intractable. Again, quantum computers of one generation could be used to simulate quantum computers of the next generation. The high-level and low-level descriptions and modeling of such computers are today less obvious than for q-bit based computers.
Significant experience gained in lab implementations should be accumulated to create a comprehensive list of computing problems and corresponding BEC architectures able to solve them. It could then be possible to develop CAD tools encapsulating the required knowledge for associating a BEC system structure with each problem domain. This is necessary in order to create CAD tools able to generate the appropriate implementation description corresponding to each computing problem that could be treated by BECs. Fabrication technologies and digitally controlled BEC architectures, allowing the parameters of a BEC implementation to be controlled for treating various computing problems, also need to be developed, together with the relevant CAD tools.
The development of a variety of tools, each able to treat a particular domain of computing problems, will enable the development of a CAD framework able to master the design of complex heterogeneous nanometric systems. Such systems could include various parts (digital units, q-bit units, BEC units, neural network units, ...). Such tools will greatly benefit from recent progress on CAD for heterogeneous systems design [37, 38]. On the other hand, the simulation of such heterogeneous systems could be problematic due to the difficulty of simulating the quantum computing parts on conventional computers. But quantum computing systems of an earlier generation could help in simulating a later generation. Furthermore, validation of such heterogeneous systems through simulation on classical computers could also be handled to a certain extent by using high-level models of the quantum computing systems. Thus, the CAD framework could encapsulate knowledge concerning the computing problems that could be solved by each kind of quantum computing system, the quantum system architecture and its parameter specifications able to solve each of these problems, the structure of input and output data of the quantum systems, their timing characteristics, etc. Based on such knowledge, a classical computer could simulate the heterogeneous system to validate several of its characteristics, such as: the compatibility of the interfaces between digital and quantum systems; the capability of the implementations of the quantum parts to treat the problems they are designed for (thanks to prior encapsulation in the CAD tools of knowledge concerning the implementation aspects of quantum computing systems); the capability of the digital parts to control the quantum parts for solving the various problems the system is designed for; and the timing compatibility between the digital and the quantum parts. Such simulations could validate the system to a large extent without needing to simulate in detail the computations performed by its quantum parts.
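The gate-level path for q-bit algorithms described earlier in this section (a unitary matrix realized as a netlist over a universal gate library) can be sketched classically for very small sizes. The example below is a minimal illustration, not a CAD tool: the cell library (H, T, CNOT), the netlist format and the function names `netlist_to_unitary` and `statevector_bytes` are assumptions made for this sketch. It composes a two-q-bit netlist into its unitary matrix and also shows how the memory needed for a full classical state vector grows with the number of q-bits.

```python
import numpy as np

# Hypothetical "quantum cell library": a small, commonly used universal gate set.
I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1.0, np.exp(1j * np.pi / 4)])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)  # control: qubit 0, target: qubit 1
SINGLE_QUBIT_CELLS = {"H": H, "T": T}


def netlist_to_unitary(netlist):
    """Compose a two-qubit netlist [(gate_name, qubits), ...] into one 4x4 unitary.

    Gates are listed in time order; qubit 0 is the most significant bit.
    """
    unitary = np.eye(4, dtype=complex)
    for name, qubits in netlist:
        if name == "CNOT" and qubits == (0, 1):
            gate = CNOT
        elif name in SINGLE_QUBIT_CELLS and qubits == (0,):
            gate = np.kron(SINGLE_QUBIT_CELLS[name], I2)
        elif name in SINGLE_QUBIT_CELLS and qubits == (1,):
            gate = np.kron(I2, SINGLE_QUBIT_CELLS[name])
        else:
            raise ValueError(f"unsupported netlist entry: {name} on {qubits}")
        unitary = gate @ unitary  # later gates multiply from the left
    return unitary


def statevector_bytes(num_qubits):
    """Memory for a full state vector of complex128 amplitudes (16 bytes each)."""
    return 16 * 2 ** num_qubits


if __name__ == "__main__":
    bell = [("H", (0,)), ("CNOT", (0, 1))]  # netlist preparing an entangled pair
    u = netlist_to_unitary(bell)
    print(np.round(u @ np.array([1, 0, 0, 0], dtype=complex), 3))
    print(statevector_bytes(50), "bytes for 50 q-bits")
```

For 50 q-bits the state vector alone needs 2^54 bytes (about 18 petabytes), which illustrates why classical simulation of q-bit machines of meaningful size is unrealistic, while the netlist description itself remains compact.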
Conclusions
By considering the constraints for fabricating components including huge numbers of devices (nano-networks), as promised by nanotechnologies, we suggest using a non-von Neumann, highly parallel, reconfigurable computing paradigm for implementing applications of profound interest for science and technology. These applications consist of complex systems composed of large numbers of simple elements. Then, we describe a CAD platform aimed at efficiently exploiting nano-networks for implementing this kind of application. This platform includes:
• A High-level Modeling and Simulation (HLMS) tool, allowing complex systems composed of a large number of identical modules to be implemented in a traditional software approach.
• A Nano-network Architecture FIT (NAF) tool, allowing the high-level description generated by HLMS to be adapted to the nano-network architecture.
• A Circuit Synthesis tool, allowing the description generated by the NAF tool to be mapped onto the architecture of the nano-network.
The first of these tools has been finalized and was described briefly here. It allows generating, with little user effort, a software representation (model) of the target complex system and exploring its behavior through simulation experiments. The versatility of the tool was illustrated by experimenting with two quite different families of complex systems: artificial ecosystems and artificial universes. Our first experiments with artificial ecosystems showed their rich behavior and the interest of pursuing the exploration of more complex cases. The experiments with artificial universes illustrated the mechanisms of emergence of relativistic space-time geometry in such systems. Ongoing work concerns the development of the second (NAF) tool. We also discussed architectural solutions for implementing in nano-networks complex systems composed of large numbers of simple elements, and we proposed communication solutions for addressing the communication challenge in this kind of totally parallel system. In the last part of this chapter we discussed non-conventional computations, including the implementation of complex systems composed of large numbers of entities exhibiting quantum behavior, q-bit based quantum computing, as well as other kinds of quantum computing, illustrated with the case of quantum computing based on Bose–Einstein condensates. We also presented a computational model of quantum systems which could be used to replace the concept of a veritable state superposition and of a related veritable parallel processing, as these concepts could lead to the absurd situation of quantum systems exhibiting infinite computing power. We finished with a short discussion of our vision concerning the future of computing.
References
1. M. H. Anderson, J. R. Ensher, M. R. Matthews, C. E. Wieman and E. A. Cornell, “Observation of Bose–Einstein condensation in a dilute atomic vapor,” Science 269, 198–201 (1995).
2. Andre DeHon, “Array-Based Architecture for FET-based, Nanoscale Electronics”, IEEE Transactions on Nanotechnology, 2(1):23–32, March 2003.
3. I. Bloch, “Quantum Gases in Optical Lattices”, Physics World 17, April 2004.
4. N. Bowden, A. Terfort, J. Carbeck, and G. M. Whitesides, “Self-assembly of mesoscale objects into ordered two-dimensional arrays,” Science, v. 276, April 1997, pp. 233–235.
5. P. J. Burke, “AC Performance of Nanoelectronics: Towards a Ballistic THz Nanotube Transistor,” Solid-State Electronics 48, 1981 (2004).
6. Timothy J. Callahan, Philip Chong, Andre DeHon, and John Wawrzynek, “Fast module mapping and placement for datapaths in FPGAs,” Proc. of the 1998 ACM fifth international symposium on Field-programmable, pp. 123–132.
7. Y. Castin, Z. Hadzibabic, S. Stock, J. Dalibard, and S. Stringari, “Quantized Vortices in the Ideal Bose Gas: A Physical Realization of Random Polynomials”, Physical Review Letters, vol. 96, Issue 4, Feb. 2006.
8. H.-H. Chang et al, “1310nm silicon evanescent laser,” Optics Express, vol. 15, no 18, pp. 11466–11471, August 2007.
9. J. I. Cirac and P. Zoller, “Quantum Computation with Cold Trapped Ions,” Physical Review Letters, vol. 74, no. 20, pp. 4091–4094, 1995.
10. J. Cong and C. Wu, “Optimal FPGA Mapping and Retiming with Efficient Initial State Computation,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 11, pp. 1595–1607, November 1999.
11. D. Cory, A. Fahmy, and T. Havel, “Experimental Implementation of Fast Quantum Searching,” Proc. Nat. Acad. Sci. U.S.A. 94, 1634 (1997).
12. A. DeHon et al, “Sub-lithographic Semiconductor Computing Systems”, Appearing in HotChips 15, Stanford University, August 17–19, 2003.
13. J. P. Elliott, Understanding Behavioral Synthesis: A Practical Guide to High-Level Design. Kluwer Academic Publishers, 1999.
14. D. Gajski and L. Ramachandran, Introduction to high level synthesis. IEEE Design and Test Computer, October 1994.
15. L. J. Garay, “Black Holes in Bose–Einstein Condensates,” International Journal of Theoretical Physics, Volume 41, Number 11, Nov. 2002.
16. M. Genoe et al, On the use of VHDL-based behavioral synthesis for telecom ASIC design. In Proc. Intl. Symposium on System Synthesis ISSS’95, Feb. 1995.
17. S. Goldstein et al, “Reconfigurable computing and electronic nanotechnology”, ASAP 03, pp. 132–142, June 2003.
18. L. K. Grover, “Quantum Mechanics Helps in Searching for a Needle in a Haystack”, Physical Review Letters, vol 79, no 2, July 1997.
19. J. Han and P. Jonker, “A system architecture solution for unreliable nanoelectronic devices”, Nanotechnology, Vol. 14, No. 2, pp. 224–230, 2003.
20. J. R. Heath, P. J. Kuekes, G. S. Snider, and R. S. Williams, “A Defect-Tolerant Architecture: Opportunities for Nanotechnologies”, Science, June 1998, Vol. 280, pp. 1716–1721.
21. X. Huo, M. Zhang, P. C. H. Chan, Q. Liang, and Z. K. Tang, “High-frequency S-parameters characterization of back-gate carbon nanotube field-effect transistors,” in IEEE IEDM Tech. Digest, 2004, p. 691.
22. International Technology Roadmap for Semiconductors. http://public.itrs.net, 2004.
23. B. Kane, “A Silicon-based Nuclear Spin Quantum Computer”, Nature 393, 133 (1998).
24. P. Kaye, R. Laflamme, and M. Mosca, “An Introduction to Quantum Computing,” Oxford University Press, New York, 2007.
25. D. Klein et al, “A single electron transistor made from a cadmium selenide nanocrystal”, Nature, Vol. 389, pp. 699–701, Oct. 1997.
26. E. Kolonis and M. Nicolaidis, “Towards a Holistic CAD platform for Nanotechnologies”, to appear in Microelectronics Journal, Elsevier.
27. E. Kolonis and M. Nicolaidis, “Computational Opportunities and CAD for Nanotechnologies”, Keynote, Workshop on Robust Computing with Nano-scale Devices: Progresses and Challenges, DATE 2007, April 16–20, 2007, Nice, France.
28. V. Ng et al, “Nanostructure array fabrication with temperature-controlled self-assembly techniques”, Nanotechnology 13, 554–558, 2002.
29. M. Nicolaidis, “Une Philosophie Numerique des Univers”, TIMA EDITIONS, June 2005, ISBN: 2-916187-00-6.
30. M. Nicolaidis, “A Numerical Philosophy of the Universe and the Fundamental Nature of Computer Science”, 3rd Asia-Pacific Computing and Philosophy Conference, 2–4 November 2007, Singapore.
31. M. Nicolaidis, “A Numerical Philosophy of the Universe and the Fundamental Nature of Computer Science”, TIMA Laboratory, ISRN TIMA-RR–07/10-02FR, October 2007.
32. M. Nicolaidis, “Emergence of Relativistic Space-Time in a Computational Universe”, 3rd Asia-Pacific Computing and Philosophy Conference, 2–4 November 2007, Singapore.
33. M. Nicolaidis, “Emergence of Relativistic Space-Time in a Computational Universe”, TIMA Laboratory, ISRN TIMA-RR-07/10-03-FR, October 2007.
34. M. Nicolaidis, “On the State of Superposition and the Parallel or not Parallel Nature of Quantum Computing: a controversy raising view point”, 2008 European Computing and Philosophy Conference (ECAP08), June 16–18, 2008, Montpellier, France.
35. M. Nicolaidis, “Computational Space, Time and Quantum Mechanics,” chapter in Thinking Machines and the Philosophy of Computer Science: Concepts and Principles, editor Jordi Vallverdú, in print, IGI Global, 2010.
36. A. O. Orlov, R. Kummamuru, J. Timler, C. S. Lent, G. L. Snider, and G. H. Bernstein, “Experimental studies of quantum-dot cellular automata devices”, Mesoscopic Tunnelling Devices, 2004, ISBN: 81-271-0007-2.
37. H. D. Patel and S. K. Shukla, “Towards a heterogeneous simulation kernel for system level models: a SystemC kernel for synchronous data flow models,” VLSI, 2004. Proceedings. IEEE Computer Society Annual Symposium on, 19–20 Feb. 2004, pp. 241–242.
38. Ptolemy Group, Ptolemy II, Website: http://ptolemy.eecs.berkeley.edu/ptolemyII/
39. J. Reichel, “Microchip traps and Bose–Einstein Condensation”, Applied Physics B, Lasers and Optics, vol. 74, no 6, pp. 469–487, 2002.
40. S. Rosenblatt, Y. Yaish, J. Park, J. Gore, V. Sazonova, and P. L. McEuen, “High performance electrolyte-gated carbon nanotube transistors”, Nano Letters 2, 869 (2002).
41. V. V. Shende, S. S. Bullock, and I. L. Markov, “Synthesis of Quantum Logic Circuits,” IEEE Trans. on Computer-Aided Design, vol. 25, no. 6, pp. 1000–1010, June 2006.
42. P. W. Shor, “Scheme for Reducing Decoherence in Quantum Computer Memory”, Physical Review A, no 4, vol. 52, pp. R2493–2496, Oct. 1995.
43. P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer”, SIAM Journal of Computing 26, pp. 1484–1509, 1997.
44. C. D. J. Sinclair et al, “Preparation of a Bose–Einstein Condensate on a Permanent-Magnet Atom Chip”, Conference on Atoms and Molecules Near Surfaces, Heidelberg, 4–8 April 2005, Journal of Physics, Conference Series vol. 19, pp. 74–77, 2005.
45. Single nanotube makes world’s smallest radio, http://www.berkeley.edu/news/media/releases/2007/10/31_NanoRadio.shtml.
46. G. S. Snider, “Self-organized Computation with Unreliable, Memristive Nanodevices,” Nanotechnology no. 18 365202-365214, Issue 36, September 2007.
47. A. M. Steane, “Error Correcting Codes in Quantum Theory,” Physical Review Letters, vol. 77, no 5, pp. 793–797, 1996.
48. D. Stick et al, “Ion Trap in a Semiconductor Chip”, Nature Physics, pp. 36–39, January 2006.
49. D. Stick et al, “The Trap Technique, Toward a Chip-Based Quantum Computer”, IEEE Spectrum, vol. 44, no 6, pp. 30–37, August 2007.
50. S. J. Wind, J. Appenzeller, R. Martel, V. Derycke, and P. Avouris, “Vertical scaling of carbon nanotube field-effect transistors using top gate electrodes”, Appl. Phys. Lett. 80, 3817 (2002).
51. Xue-Jie Zhang, Kam-wing Ng, and Gilbert H. Young. High-level synthesis starting from RTL and using genetic algorithms for dynamically reconfigurable FPGAs. Proc. of the 1998 ACM fifth international symposium on Field-programmable.