Lecture Notes in Electrical Engineering 636
Alexander Barkalov Larysa Titarenko Kamil Mielcarek Sławomir Chmielewski
Logic Synthesis for FPGA-Based Control Units Structural Decomposition in Logic Design
Lecture Notes in Electrical Engineering Volume 636
Series Editors Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China Gianluigi Ferrari, Università di Parma, Parma, Italy Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt Torsten Kroeger, Stanford University, Stanford, CA, USA Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA Ferran Martin, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, 
Singapore Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, Singapore Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and application areas of electrical engineering. The series covers classical and emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please contact leontina. [email protected].
To submit a proposal or request further information, please contact the Publishing Editor in your country:
China: Jasmine Dou, Associate Editor ([email protected])
India, Japan, Rest of Asia: Swati Meherishi, Executive Editor ([email protected])
Southeast Asia, Australia, New Zealand: Ramesh Nath Premnath, Editor ([email protected])
USA, Canada: Michael Luby, Senior Editor ([email protected])
All other Countries: Leontina Di Cecco, Senior Editor ([email protected])
** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, MetaPress, Web of Science and Springerlink **
More information about this series at http://www.springer.com/series/7818
Alexander Barkalov Institute of Metrology, Electronics and Computer Science University of Zielona Góra Zielona Góra, Poland
Larysa Titarenko Institute of Metrology, Electronics and Computer Science University of Zielona Góra Zielona Góra, Poland
Vasyl Stus’ Donetsk National University (in Vinnytsia) Vinnytsia, Ukraine
Department of Infocommunications Kharkov National University of Radio Electronics Kharkiv, Ukraine
Kamil Mielcarek Institute of Metrology, Electronics and Computer Science University of Zielona Góra Zielona Góra, Poland
Sławomir Chmielewski Institute of Science and Technology Automatics and Robotics, Metallurgy State Higher Vocational School (PWSZ) Głogów, Poland
ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-3-030-38294-0 ISBN 978-3-030-38295-7 (eBook) https://doi.org/10.1007/978-3-030-38295-7 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
The book is dedicated to the blessed memory of Prof. Vladimir Popovskij
Preface
Our time is characterized by the very rapid development of computer science. Computers and embedded systems can be found in practically all fields of human activity. The current state of the art in this area is characterized by three major factors. The first factor is the development of ultra-complex VLSI chips such as the “system-on-a-programmable-chip”, with billions of transistors and hundreds of millions of equivalent gates. The second factor is the development of hardware description languages such as VHDL and Verilog, which make it possible to capture designs of tremendous complexity. The third factor is the wide application of computer-aided design (CAD) tools, which allow very complex projects to be designed in a satisfactory time. These three factors have significantly affected the process of hardware design. Hardware design is now very similar to the development of computer programs. The combined application of hardware description languages and CAD tools allows designers to concentrate their energy on the basic problems of design, whereas routine work remains the prerogative of computers.
Tremendous achievements in semiconductor electronics are turning microelectronics into nanoelectronics. We observe a real technical boom connected with achievements in nanoelectronics. It results in the development of very complex integrated circuits, particularly in the field of programmable logic devices. Our book targets FSM-based control units implemented with field-programmable gate arrays (FPGAs). The largest FPGA chips have over 7 billion transistors. They are so large that a single chip is enough to implement a very complex digital system including a data-path and a control unit. Because of the extreme complexity of modern microchips, it is very important to develop effective design methods targeting the particular properties of the logic elements in use.
As is known, any digital system can be represented as a composition of a data-path and a control unit (CU).
Logic circuits of the operational blocks forming a data-path have regular structures. This allows using standard library elements of CAD tools (such as counters, multibit adders, multipliers, multiplexers and decoders) for their design. A control unit coordinates the interplay of the other system blocks by producing a sequence of control signals. These control signals cause the data-path to execute operations. As a rule, control units have irregular structures, which makes their design very complicated. Many important features of a digital system, such as hardware amount, performance and power consumption, depend to a large extent on the characteristics of its CU. Therefore, to design competitive digital systems with FPGAs, a designer should have fundamental knowledge of logic synthesis and of the optimization of control unit logic circuits. As the experience of many scientists shows, the design methods used by standard industrial packages are far from optimal. This is especially true when designing complex control units. It means that a designer could be forced to develop their own design methods, then to program them and finally to combine them with standard packages to get a result with the desired characteristics. To help such a designer, this book is devoted to solving the problems of logic synthesis and hardware reduction in control units. We discuss the case where a control unit is represented by the model of a finite state machine (FSM). The book contains some original synthesis and optimization methods that take into account the peculiarities of a control algorithm and of the FSM model in use. Regular parts of these models can be implemented using library elements such as embedded memory blocks. This reduces the irregular part of the control unit, which is described by Boolean functions. It decreases the total number of look-up table (LUT) elements in comparison with logic circuits based on known FSM models. It also makes the place-and-route problem much simpler. The third benefit is reduced power consumption in comparison with FSM circuits implemented only with LUTs.
In our book, control algorithms are represented by graph-schemes of algorithms (GSA). This choice is based on the fact that this specification provides a simple explanation of the methods proposed by the authors.
To minimize the number of LUTs in FSM logic circuits, it is possible to use methods of structural decomposition. This increases the number of logic levels in the final FSM circuit, but it allows some parts of the FSM circuit to be represented by regular systems of Boolean functions. In turn, this allows using embedded memory blocks (EMBs) instead of LUTs. We propose two new approaches to the structural decomposition of FSM circuits. The first is called twofold state assignment. It is based on partitioning the set of states into classes such that any Boolean function inside a class can be implemented using a single LUT. Each internal state is encoded in two ways: as an element of the set of FSM states and as an element of some class of states. The final circuit has three levels of logic and regular interconnections. The second approach is mixed encoding of microoperations. Some microoperations are represented by one-hot codes, while the others are treated as elements of collections of microoperations. This reduces hardware in the part of the FSM circuit that generates microoperations. We combine the proposed methods with known approaches to structural decomposition such as the replacement of logical conditions, the encoding of collections of microoperations and the transformation of object codes.
The proposed methods are used for optimizing logic circuits of Mealy, Moore and combined FSMs. For Mealy FSMs, we show how to combine these new methods with the replacement of a state register by a state counter. These methods are based on creating linear chains of states. Previously, this approach was used only for Moore FSMs. It is known that using counters simplifies the system of input memory functions and, therefore, decreases the number of LUTs in the resulting FSM circuit. We combine this approach with the use of EMBs for implementing some parts of FSM circuits. This significantly decreases the number of LUTs and eliminates many interconnections in the FSM logic circuit. It saves the area occupied by the circuit and reduces the resulting power dissipation. Of course, it leads to a more sophisticated synthesis process than one based only on LUTs.
The process of FSM logic synthesis can be viewed as a transformation of a control algorithm into tables describing the behaviour of FSM blocks. These tables are used to find the systems of Boolean functions that implement the logic circuits of particular FSM blocks. To implement the corresponding circuits, this information should be transformed into the data formats of particular industrial CAD systems. We do not discuss this step in our book. Our book contains many examples illustrating the design of FSMs using the proposed methods. Some examples are illustrated by logic circuits.
The main part of the book contains eight chapters.
Chapter 1 provides some basic information. It is shown that control algorithms can be implemented using either microprogram control units or finite state machines. The language of GSA is introduced. Next, the connections between GSAs and the state transition graphs of Mealy, Moore and combined FSMs are shown. Classical principles of logic synthesis are discussed for Mealy and Moore FSMs. The FSM logic circuits are implemented using system gates. Methods for the preliminary assessment of hardware amount are discussed in the last part of the chapter.
Chapter 2 deals with the analysis of methods of structural decomposition.
The main idea of these methods is to reduce the number of literals in systems of Boolean functions at the cost of increasing the number of logic levels in FSM circuits. Methods of state assignment are analysed. Next, the basic features of field-programmable gate arrays are discussed. It is shown that embedded memory blocks allow implementing systems of regular Boolean functions. The modern design flow targeting FPGA-based projects is analysed. Different methods of structural decomposition are considered, such as the replacement of logical conditions, the encoding of collections of microoperations, the encoding of fields of compatible microoperations and the verticalization of the initial GSA. Methods based on classes of pseudoequivalent states are discussed for Moore FSMs. FPGA-based structural diagrams of FSM circuits based on structural decomposition are shown.
Chapter 3 begins with the general idea of twofold state assignment for Mealy FSMs. The method is based on partitioning the set of states into classes. Each internal state is encoded in two ways: as an element of the set of FSM states and as an element of some class of states. This allows any function for a given class to be implemented using only a single LUT. A structural diagram and a design method are proposed for FPGA-based Mealy FSMs with twofold state assignment. Next, a formal method is proposed for finding the partition with the minimum number of classes. It is shown that twofold state assignment can be combined with the encoding of collections of microoperations. The corresponding models and their synthesis methods are proposed. Methods of diminishing encoding of states and of collections of microoperations are proposed; they reduce the number of logic elements and their interconnections in LUT-based logic circuits. The last part of the chapter presents the results of investigations of the proposed methods of structural decomposition. Standard benchmarks are used for conducting the investigations.
Chapter 4 starts with the main idea of using twofold state assignment for optimizing the circuits of FPGA-based Moore FSMs. This method can be combined with refined state assignment, which minimizes the block of microoperations. It is shown that classes of pseudoequivalent states (PES) can be combined with twofold state assignment. A formal method is proposed for partitioning the classes of PES that leads to the minimum number of LUT-based blocks in the final FSM circuit. Next, it is shown how to combine twofold state assignment with the encoding of collections of microoperations. The last part of the chapter is devoted to combining twofold state assignment with the encoding of fields of compatible microoperations. Structural diagrams of FSM circuits are proposed. Examples of synthesis are given for the majority of the proposed FSM models.
Chapter 5 deals with the optimization of FSM logic circuits by combining twofold state assignment with the transformation of object codes. First, the idea of the transformation is discussed. Next, it is shown how to combine twofold state assignment with the transformation of microoperations into the states of a Mealy FSM. This approach removes the direct dependence between logical conditions and the input memory functions of a Mealy FSM. Further, methods are discussed that combine twofold state assignment with the transformation of states into the microoperations of a Mealy FSM.
This removes the direct dependence between logical conditions and the output functions of a Mealy FSM. The last part of the chapter is devoted to combining twofold state assignment with the transformation of microoperations into classes of pseudoequivalent states of a Moore FSM.
Chapter 6 is devoted to hardware reduction based on combining twofold state assignment with the replacement of logical conditions. Embedded memory blocks are used for executing the replacement. The replacement for Moore FSMs is based on encoding the classes of pseudoequivalent states. The possibility of transforming the initial GSA to decrease the number of additional variables is discussed. Next, these methods are discussed for both Mealy and Moore FSMs. It is also shown how to combine these two methods with the encoding of collections of microoperations. The last part of the chapter is devoted to synthesis methods based on the transformation of the initial GSA.
Chapter 7 is devoted to the method of mixed encoding of microoperations. The main idea is discussed for Mealy FSMs. A formal method is proposed for partitioning the set of microoperations into two sets. The elements of the first set are encoded by one-hot codes; the elements of the second set are combined into collections of microoperations. Next, the same approaches are discussed for FPGA-based Moore FSMs. Different structural diagrams of FSMs and the corresponding synthesis methods are proposed. The classes of pseudoequivalent states are used to optimize the hardware for Moore FSMs. Further, the proposed methods are discussed for combined FSMs. It is shown how to combine different methods of structural decomposition for the synthesis of FPGA-based combined FSMs. Finally, the mixed encoding of microoperations for LUT-based Mealy FSMs is discussed. It is proposed to form collections of microoperations for the elements of both parts of the partition of the set of microoperations.
Chapter 8 is devoted to using linear chains of states in Mealy FSMs. The known counter-based models of Moore FSMs are discussed together with the corresponding design methods. Then, models and design methods for counter-based Mealy FSMs are proposed. Synthesis methods based on natural and extended linear chains of states are proposed. Next, the case of synthesis for a regular GSA having a single chain of states is discussed. Different models of counter-based Mealy FSMs are proposed, based on combining known and proposed methods of structural decomposition.
We hope that our book will be interesting and useful for students and Ph.D. students in the area of Computer Science, as well as for designers of modern digital systems. We think that the proposed FSM models enlarge the class of models applied for the implementation of control units with modern FPGA chips.
Zielona Góra, Poland
April 2019
Alexander Barkalov Larysa Titarenko
Contents
1 FSM-Based Models of Control Units
  1.1 Background of Control Units
  1.2 Logic Synthesis of Moore FSM
  1.3 Logic Synthesis of Mealy FSM
  1.4 Logic Synthesis of Combined FSM
  1.5 Preliminary Assessment of Hardware Amount in FSM Circuits
  References
2 Structural Decomposition in FSM Synthesis
  2.1 General Characteristic of Structural Decomposition
  2.2 Characteristic of FPGAs
  2.3 Replacement of Logical Conditions
  2.4 Encoding of Microoperations
  2.5 Structural Decomposition for FPGA-Based Moore FSMs
  References
3 Twofold State Assignment for Mealy FSMs
  3.1 General Idea of the Method
  3.2 Synthesis of Mealy FSM with the Base Structure
  3.3 Synthesis of Mealy FSM with Encoding of Collections of Microoperations
  3.4 Investigation of Proposed Method
  References
4 Twofold State Assignment for Moore FSMs
  4.1 Analysis of Possible Solutions
  4.2 Synthesis of Moore FSM with the Base Structure
  4.3 Synthesis of Moore FSM with Encoding of Collections of Microoperations
  4.4 Encoding of the Fields of Compatible Microoperations
  References
5 Combining Twofold State Assignment with Transformation of Object Codes
  5.1 Introduction into Transformation of Object Codes
  5.2 Synthesis of Mealy FSMs with Transformation of Microoperations
  5.3 Synthesis of Mealy FSMs with Transformation of State Codes
  5.4 Synthesis of Moore FSMs with Transformation of Microoperations
  References
6 Combining Twofold State Assignment with Replacement of Logical Conditions
  6.1 Analysis of Possible Solutions
  6.2 Synthesis of Basic Model of Mealy FSM
  6.3 Synthesis of Basic Model of Moore FSM
  6.4 Synthesis Based on Transformation of GSA
  References
7 Mixed Encoding of Microoperations
  7.1 Mixed Encoding for Mealy FSMs
  7.2 Mixed Encoding for Moore FSMs
  7.3 Synthesis of FPGA-based Combined FSMs
  7.4 Mixed Encoding for Combined FSMs
  7.5 Mixed Encoding for LUT-based FSMs
  References
8 Synthesis of Mealy FSMs with Counters
  8.1 Using Counters in Control Units
  8.2 Using Counters in Mealy FSMs
  8.3 Structural Decomposition for Counter-Based Mealy FSMs
  References
Conclusion
Index
Abbreviations
BE      Basic element
BF      Block of functions
BIMF    Block of input memory functions
BMO     Block of microoperations
BMTS    Block of transformation of states into microoperations
BRAM    Block of random access memory
BRLC    Block of replacement of logical conditions
BTMS    Block of transformation of microoperations into states
CFSM    Combined finite state machine
CLB     Configurable logic block
CM      Control memory
CMCU    Compositional microprogram control unit
CMO     Collection of microoperations
CT      Counter
CU      Control unit
DST     Direct structure table
EMB     Embedded memory block
EMBer   Logic block consisting of EMBs
FCMO    Fields of compatible microoperations
FF      Flip-flop
FPGA    Field-programmable gate array
FSM     Finite state machine
GFT     Generalized formula of transition
GSA     Graph-scheme of algorithm
LCS     Linear chain of states
LUT     Look-up table
LUTer   Logic block consisting of LUTs
MCU     Microprogram control unit
MEMO    Mixed encoding of microoperations
MO      Microoperation
MPI     Matrix of programmable interconnections
MX      Multiplexer
PAL     Programmable array logic
PES     Pseudoequivalent states
PLA     Programmable logic arrays
PO      Primary object
PROM    Programmable read-only memory
RAM     Random access memory
RG      Register
RLC     Replacement of logical conditions
ROM     Read-only memory
SBF     System of Boolean functions
SFT     System of formulae of transitions
SG      System gate
SO      Secondary object
SOP     Sum of products
ST      Structure table
STG     State transition graph
TOC     Transformation of object codes
VLSI    Very large scale integration circuit
Chapter 1
FSM-Based Models of Control Units
Abstract The chapter provides some basic information. It is shown that control algorithms can be implemented using either microprogram control units or finite state machines. The language of GSA is introduced. Next, the connections between GSAs and the state transition graphs of Mealy, Moore and combined FSMs are shown. Classical principles of logic synthesis are discussed for Mealy and Moore FSMs. The FSM logic circuits are implemented using system gates. Methods for the preliminary assessment of hardware amount are discussed in the last part of the chapter.
1.1 Background of Control Units
As a rule, any digital system may be represented as a composition of a data-path and a control unit [8]. Such a representation is based on the principle of microprogram control proposed by M. Wilkes in 1951 [1]. In accordance with this principle, any complex operation is represented as a sequence of elementary operations (microoperations). To control the order of execution of microoperations, special status signals (logical conditions) are used. So, any complex operation is represented as a microprogram expressed in terms of microoperations and logical conditions. The microprogram is a particular form of a digital system’s specification. Using this principle, it is possible to represent a digital system as shown in Fig. 1.1 [2].
The control unit generates microoperations $y_n \in Y$, where $Y = \{y_1, \ldots, y_N\}$ is the set of microoperations, and system outputs. Each microoperation (MO) triggers the execution of some elementary operation by the data-path. The data-path executes operations on data and produces the results of these operations. Each cycle of operation is connected with producing logical conditions $x_e \in X$, where $X = \{x_1, \ldots, x_L\}$ is the set of logical conditions. The values of logical conditions show the state of an operation’s execution. To generate a time-distributed sequence of microoperations, a control unit (CU) analyses the values of logical conditions, as well as system inputs generated by the system’s environment. To communicate with the environment, a system generates special system outputs. So, a control unit can be viewed as the brain of a digital system.
© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_1
Fig. 1.1 Model of digital system based on principle of microprogram control (blocks: Control Unit, Data Path; signals: X, Y, System Inputs, System Outputs, Data, Results)
Fig. 1.2 Structural diagram of MCU (blocks: Sequencer, Control Memory; signals: X, Next Address, Current Address, Y, Start, Clock)
There are two main approaches to organizing microprogram control [3–5, 9]. In the first, a microprogram is implemented as a program in some high-level programming language. This way is used in microcontrollers. It is the most universal approach for implementing control, for example, in embedded systems [4]. However, the price of this universality is the rather low performance of such control units.
The second way is a hardware implementation of control algorithms. A microprogram could be kept in a special microprogram memory [1, 6]. This approach leads to microprogram control units [7]. Sometimes, these CUs are called automata with “programmed logic” [10].
A microprogram control unit (MCU) can be implemented using three main blocks: a sequencer (SQ), a register of microinstruction address (RAMI) and a control memory (CM). Its structural diagram is shown in Fig. 1.2 [2]. The SQ forms the transition address that points to the next microinstruction to be executed. The RAMI is controlled by the pulses Start and Clock. The pulse Start zeroes the RAMI; this corresponds to the beginning of operation. The pulse Clock marks the cycles of the MCU; it permits changing the content of the RAMI. A microprogram is represented as a list of microinstructions kept in the CM. Each microinstruction includes an operation part, containing information about the microoperations to be executed, and an address part, containing information about the next address. The SQ analyses a logical condition $x_e$ pointed to in the address part. Based on this analysis, the next address is generated.
We do not discuss formal methods of MCU design in this book. These methods can be found in [1, 5, 9, 10] and elsewhere. Let us only point out that many methods developed for the optimization of MCUs are now used for the optimization of control units implemented with VLSI chips.
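The interplay of the RAMI, the CM and the SQ described above can be illustrated by a small behavioural model. The following Python sketch is only an illustration under stated assumptions: the microinstruction field layout (`cond`, `next_if_0`, `next_if_1`), the `halt` flag, and the toy data-path (a counter cleared by `y1`, incremented by `y2`, reporting `x1` when it reaches a limit) are all hypothetical and are not taken from the book.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Microinstruction:
    """One word of the control memory (the field layout is an assumption)."""
    microops: FrozenSet[str]     # operation part: microoperations to issue
    cond: Optional[str] = None   # address part: logical condition x_e to test
    next_if_0: int = 0           # next address if x_e = 0 (or unconditional)
    next_if_1: int = 0           # next address if x_e = 1
    halt: bool = False           # hypothetical end-of-microprogram marker


def run_mcu(control_memory, datapath, max_cycles=100):
    """Return the list of (address, microoperations) pairs, one per cycle."""
    rami = 0                                 # the pulse Start zeroes the RAMI
    trace = []
    for _ in range(max_cycles):              # each pass models one Clock cycle
        mi = control_memory[rami]            # CM is read at the current address
        trace.append((rami, sorted(mi.microops)))
        if mi.halt:
            break
        x = datapath(mi.microops)            # data-path executes, reports conditions
        # Sequencer: choose the next address from the address part.
        if mi.cond is None:
            rami = mi.next_if_0
        else:
            rami = mi.next_if_1 if x[mi.cond] else mi.next_if_0
    return trace


def make_counter_datapath(limit):
    """Toy data-path: y1 clears a counter, y2 increments it, x1 = (count == limit)."""
    count = 0

    def execute(microops):
        nonlocal count
        if "y1" in microops:
            count = 0
        if "y2" in microops:
            count += 1
        return {"x1": count == limit}

    return execute


CONTROL_MEMORY = [
    Microinstruction(frozenset({"y1"}), next_if_0=1),
    Microinstruction(frozenset({"y2"}), cond="x1", next_if_0=1, next_if_1=2),
    Microinstruction(frozenset(), halt=True),
]

trace = run_mcu(CONTROL_MEMORY, make_counter_datapath(limit=3))
for address, microops in trace:
    print(address, microops)
```

In this sketch the microinstruction at address 1 is repeated until the data-path reports `x1`, which mirrors how the sequencer uses a single logical condition named in the address part to select the next address.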
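Fig. 1.3 Organization of MCU blocks: (a) multiplexer MX selecting the logical condition x0 from X by the code K(xe); (b) two-level circuit in which the CM outputs Z are decoded by the BMO into Y
Fig. 1.4 Structural diagram of FSM (blocks: Combinational Circuit, RG; signals: X, Y, Φ, T, Start, Clock)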
For example, there is a special multiplexer (MX) in the SQ. Its informational inputs are connected with logical conditions xe ∈ X and its control inputs are connected with bits representing a code K(xe) (Fig. 1.3a). The code K(xe) is kept in the address part. It indicates which logical condition should be analysed. The variable x0 is equal to the value of the logical condition represented by K(xe). This approach has turned into the method of replacement of logical conditions [11, 12] used for optimization of modern CUs. In the fifties, the volume of the control memory was extremely limited [6]. This necessitated reducing the length of the operation part of a microinstruction. To do so, the collections of microoperations (CMO) were encoded using additional variables from the set Z. Next, these codes were decoded by a block of microoperations (BMO) into microoperations yn ∈ Y [5, 9]. This leads to a two-level circuit for implementing the microoperations (Fig. 1.3b). This method is still used by the designers of modern control units. The second way of implementing CUs is connected with finite state machines (FSM) [13, 20]. An FSM is a sequential circuit: its outputs depend on the pre-history of operation [20]. The pre-history is represented by internal states forming the set A = {a1, ..., aM}. To implement an FSM logic circuit, the states are encoded by binary codes K(am) having R bits. To encode the states, internal state variables forming the set T = {T1, ..., TR} are used. These codes are kept in a special register (RG). As a rule, an RG consists of flip-flops having informational inputs of type D [11, 21, 22]. To change the content of the RG, special input memory functions Dr ∈ Φ are used, where Φ = {D1, ..., DR}. An FSM can be implemented as a composition of a combinational circuit (CC) and an RG (Fig. 1.4). The pulses Start and Clock have the same meaning as for the MCU.
So, the CC is represented by the following systems of Boolean functions (SBF):
$$\Phi = \Phi(T, X); \tag{1.1}$$
$$Y = Y(T, X). \tag{1.2}$$
Fig. 1.5 Types of vertices of GSA Γ: (a) initial vertex Start; (b) final vertex End; (c) operational vertex containing Yq; (d) conditional vertex containing xe with outputs 1 and 0
The system (1.2) corresponds to Mealy FSM [14]. In the case of Moore FSM [23], output functions are represented by the following system:
$$Y = Y(T). \tag{1.3}$$
So, in Mealy FSMs the outputs yn ∈ Y depend on both logical conditions xe ∈ X and state variables Tr ∈ T, whereas in Moore FSMs the outputs depend only on state variables. This difference leads to different methods of hardware reduction for their logic circuits. Our book is devoted to logic synthesis of FSMs. Logic synthesis is a process by which a specification of an FSM behaviour is turned into a design implementation in terms of logic elements. There are many ways of specifying the behaviour of an FSM. It can be specified by: (1) a state transition graph (STG) [15, 20]; (2) a state transition table [22]; (3) a program in a hardware description language such as Verilog or VHDL [16–18, 33–35, 37]. But in this book we use the language of graph-schemes of algorithm (GSA) [11]. As follows from our previous publications, this language is highly visual, which makes the discussed synthesis methods easier to understand. A GSA Γ includes four types of vertices (Fig. 1.5). Each GSA includes exactly one initial vertex corresponding to the starting point of a control algorithm (Fig. 1.5a) and exactly one final vertex (Fig. 1.5b). A GSA Γ contains a finite number of operational and conditional vertices. Each operational vertex (Fig. 1.5c) contains a CMO Yq ⊆ Y, which includes microoperations executed at the same instant. It has a single input and a single output. A conditional vertex (Fig. 1.5d) contains some logical condition xe ∈ X. It has two outputs marked by the logic values of this condition. The conditional vertices are used to organize branching in the control algorithm. A more formal definition of GSA can be found in [11]. An example is the GSA Γ1 (Fig. 1.6). The GSA Γ1 includes 4 operational vertices and 2 conditional vertices. The following sets can be found from Γ1: X = {x1, x2} with L = 2 and Y = {y1, ..., y4} with N = 4. But there are no marks of states in Fig. 1.6.
To synthesize an FSM circuit, it is necessary to execute the following actions [11, 19], based on the theory of structural automata developed by V. Glushkov [36]:
1. Marking the initial GSA Γ by states of FSM.
2. Encoding of the states am ∈ A (the state assignment).
3. Creating a direct structure table (DST) of FSM.
4. Creating systems of Boolean functions representing the FSM logic circuit.
5. Transformation of SBF to take into account the restrictions of used logic elements.
6. Creating the FSM logic circuit using transformed SBF.
Let us discuss this approach for three basic models of FSMs. The synthesis is executed using GSA Γ1.
Fig. 1.6 Initial GSA Γ1
1.2 Logic Synthesis of Moore FSM
There are the following rules for marking a GSA by states of Moore FSM [10, 11]:
1. The vertices Start and End are marked by a1.
2. Each operational vertex is marked by a unique state starting from a2.
Let us name this approach procedure P1. Applying P1 to GSA Γ1 leads to the marked GSA Γ1 shown in Fig. 1.7a. As follows from Fig. 1.7a, there are M = 5 states in the Moore FSM corresponding to GSA Γ1. So, there is a set A = {a1, ..., a5}. R state variables are enough for encoding M states [11, 20]:
$$R = \lceil \log_2 M \rceil. \tag{1.4}$$
In (1.4), ⌈a⌉ means the nearest integer greater than a if a is a fraction, or a itself if a is a whole number [20].
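Formula (1.4) and the trivial state assignment used in this section can be sketched in a few lines of Python. The function names `state_bits` and `trivial_codes` are illustrative assumptions, and the integer `bit_length` trick is simply one exact way of computing the ceiling of a base-2 logarithm.

```python
def state_bits(m: int) -> int:
    """Return R = ceil(log2(m)) from (1.4) for m >= 1 states, using integers only."""
    if m < 1:
        raise ValueError("an FSM needs at least one state")
    return (m - 1).bit_length()


def trivial_codes(m: int) -> dict[str, str]:
    """Assign the binary codes 0, 1, 2, ... to the states a1, a2, a3, ..."""
    r = state_bits(m)
    return {f"a{i + 1}": format(i, f"0{r}b") for i in range(m)}


if __name__ == "__main__":
    print(state_bits(5))      # number of state variables for M = 5
    print(trivial_codes(5))   # codes K(a1), ..., K(a5)
```

For M = 5 this gives R = 3 and the codes 000 to 100.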
Fig. 1.7 Marked GSA Γ1 (a) and STG corresponding to Γ1 (b)
In the discussed case, R = 3. It gives the sets T = {T1, T2, T3} and Φ = {D1, D2, D3}. Let us encode the states am ∈ A in the trivial way: K(a1) = 000, ..., K(a5) = 100. The DST of Moore FSM is constructed on the basis of the STG. The STG vertices correspond to FSM states, whereas the arcs correspond to transitions between the states am ∈ A. If a CMO Yq ⊆ Y is generated in the state am ∈ A, then it is written near the corresponding vertex of the STG. The arcs are marked by input signals causing transitions ⟨am, as⟩. Input signals are conjunctions of logical conditions corresponding to a path in GSA Γ leading from am into as. In the discussed case, the STG has M = 5 vertices and H = 7 arcs (Fig. 1.7b). Each arc corresponds to a single row of the DST. The hth arc of the STG determines a vector Vh = ⟨am, Xh, as⟩, where Xh is an input signal determining the transition ⟨am, as⟩. To get the system (1.1), it is necessary to expand the vector Vh by three additional components. These components are the following: the code K(am) of the current state am ∈ A; the code K(as) of the state of transition as ∈ A; the collection of input memory functions equal to 1 to load the code K(as) into RG. Let us denote this collection as Φh (h = 1, ..., H). So each line of the DST is determined by a vector Vh = ⟨am, K(am), as, K(as), Xh, Φh⟩. A DST is a list of vectors Vh (h = 1, ..., H) corresponding to an STG and state codes. The columns of the DST correspond to components of vectors Vh. Besides, the CMO Yq ⊆ Y generated in the state am ∈ A is shown in the column am of the DST. In the discussed case, the DST is represented by Table 1.1. The column h contains the number of the corresponding row. It is known [11] that functions Dr repeat values of K(as). Let lr be the value of the rth bit of K(as), where lr ∈ {0, 1} (r = 1, ..., R). If lr = 1 for K(as) from the hth row of the DST, then there is a symbol Dr in the column Φh for this row. For unconditional transitions, Xh = 1.

Table 1.1 Direct structure table of Moore FSM for GSA Γ1
am          K(am)  as  K(as)  Xh      Φh     h
a1          000    a2  001    1       D3     1
a2 (y1y2)   001    a3  010    x1      D2     2
a2 (y1y2)   001    a4  011    x̄1x2    D2D3   3
a2 (y1y2)   001    a1  000    x̄1x̄2    −      4
a3 (y3)     010    a5  100    1       D1     5
a4 (y1y2)   011    a5  100    1       D1     6
a5 (y1y4)   100    a1  000    1       −      7

Functions (1.1) depend on product terms Fh, determined as
$$F_h = A_m X_h \quad (h = 1, \ldots, H). \tag{1.5}$$
In (1.5), the symbol Am stands for the conjunction of state variables corresponding to the code K(am) from the hth row of DST:
$$A_m = \bigwedge_{r=1}^{R} T_r^{l_{mr}} \quad (m = 1, \ldots, M). \tag{1.6}$$
In (1.6), the symbol lmr stands for the value of the rth bit of K(am); lmr ∈ {0, 1}, $T_r^0 = \bar{T}_r$, $T_r^1 = T_r$ (r = 1, ..., R). Functions (1.1) are represented in the form of sum-of-products (SOP) [32]. They can be expressed as follows:
$$D_r = \bigvee_{h=1}^{H} C_{rh} F_h \quad (r = 1, \ldots, R). \tag{1.7}$$
In (1.7), Crh ∈ {0, 1} is a Boolean variable equal to 1 if and only if (iff) the symbol Dr is written in the column Φh. The following functions can be derived from Table 1.1:
$$D_1 = F_5 \vee F_6; \quad D_2 = F_2 \vee F_3; \quad D_3 = F_1 \vee F_3. \tag{1.8}$$
The functions (1.3) depend on the terms Am (m = 1, ..., M). They can be expressed as the following SOP:
$$y_n = \bigvee_{m=1}^{M} C_{nm} A_m \quad (n = 1, \ldots, N). \tag{1.9}$$
Fig. 1.8 State codes of Moore FSM
In (1.9), Cnm is a Boolean variable equal to 1 iff the MO yn ∈ Y is generated in the state am ∈ A. The following system can be derived from Table 1.1:
$$y_1 = A_2 \vee A_4 \vee A_5; \quad y_2 = A_2 \vee A_4; \quad y_3 = A_3; \quad y_4 = A_5. \tag{1.10}$$
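The derivation of systems (1.8) and (1.10) from the DST is mechanical, so it can be sketched as a short Python script. The data below restate Table 1.1. The encoding of rows as tuples, the textual form of the input signals, and the function names are assumptions made for illustration only.

```python
# Rows of Table 1.1: (h, current state a_m, state of transition a_s, input signal X_h).
DST = [
    (1, "a1", "a2", "1"),
    (2, "a2", "a3", "x1"),
    (3, "a2", "a4", "~x1 x2"),
    (4, "a2", "a1", "~x1 ~x2"),
    (5, "a3", "a5", "1"),
    (6, "a4", "a5", "1"),
    (7, "a5", "a1", "1"),
]
CODES = {"a1": "000", "a2": "001", "a3": "010", "a4": "011", "a5": "100"}
# Collections of microoperations generated in each state (column a_m of the DST).
OUTPUTS = {
    "a1": [],
    "a2": ["y1", "y2"],
    "a3": ["y3"],
    "a4": ["y1", "y2"],
    "a5": ["y1", "y4"],
}


def excitation_terms(dst, codes):
    """Map each D_r to the rows h whose code K(a_s) has bit r equal to 1, as in (1.7)."""
    r_bits = len(next(iter(codes.values())))
    terms = {f"D{r + 1}": [] for r in range(r_bits)}
    for h, _a_m, a_s, _x_h in dst:
        for r, bit in enumerate(codes[a_s]):
            if bit == "1":
                terms[f"D{r + 1}"].append(h)
    return terms


def output_terms(outputs):
    """Map each microoperation y_n to the states in which it is generated, as in (1.9)."""
    terms = {}
    for state, ys in outputs.items():
        for y in ys:
            terms.setdefault(y, []).append(state)
    return terms


if __name__ == "__main__":
    for name, rows in excitation_terms(DST, CODES).items():
        print(name, "=", " v ".join(f"F{h}" for h in rows))
    for name, states in sorted(output_terms(OUTPUTS).items()):
        print(name, "=", " v ".join(s.replace("a", "A") for s in states))
```

The script only collects which terms enter each disjunction; minimization of the terms is a separate step.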
Let us show the Karnaugh map [32] with the codes of states am ∈ A. In the discussed case, it is the map shown in Fig. 1.8. Using this map, we can optimize the system (1.10). Using the expansion law [32], we obtain the following system:
$$y_1 = T_2 \vee T_3; \quad y_2 = T_2; \quad y_3 = T_1 \bar{T}_2; \quad y_4 = T_3. \tag{1.11}$$
Let us use so-called system gates for implementing FSM logic circuits. A system gate is a NAND gate having S = 2 inputs [26, 31]. Let L(fi) be the number of literals in the function fi. A literal is a Boolean variable either with or without negation [19, 32]. The function fi should be transformed if the following condition holds:
$$L(f_i) > S. \tag{1.12}$$
The transformation is executed using three laws of Boolean algebra [19, 32]: double negation, De Morgan's law and the associative law. These laws are the following:
$$\bar{\bar{a}} = a; \quad \overline{a \vee b} = \bar{a} \cdot \bar{b}; \quad a(bc) = (ab)c; \quad a \vee (b \vee c) = (a \vee b) \vee c. \tag{1.13}$$
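How the laws (1.13) turn AND/OR expressions into networks of two-input NAND gates can be checked exhaustively with a small Python sketch. The helper names `nand`, `inv`, `or2` and `and3` are illustrative assumptions. An inverter is modelled here as a NAND gate with both inputs tied together.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Two-input NAND gate (a system gate with S = 2)."""
    return 1 - (a & b)


def inv(a: int) -> int:
    """Inverter built from one NAND gate with tied inputs."""
    return nand(a, a)


def or2(a: int, b: int) -> int:
    """a v b via double negation and De Morgan's law: NOT(NOT a AND NOT b)."""
    return nand(inv(a), inv(b))


def and3(a: int, b: int, c: int) -> int:
    """abc split by the associative law into (ab)c, each AND being NAND plus inverter."""
    ab = inv(nand(a, b))
    return inv(nand(ab, c))


if __name__ == "__main__":
    ok_or = all(or2(a, b) == (a | b) for a, b in product((0, 1), repeat=2))
    ok_and = all(and3(a, b, c) == (a & b & c) for a, b, c in product((0, 1), repeat=3))
    print(ok_or, ok_and)
```

When complemented inputs are already available (for example, from flip-flop outputs), the input inverters in `or2` are not needed and a disjunction of two terms costs a single gate.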
Using Fig. 1.8, we can represent the functions Am as follows: $A_1 = \bar{T}_1\bar{T}_2\bar{T}_3$; $A_2 = \bar{T}_1 T_2$; $A_3 = T_1\bar{T}_2$; $A_4 = T_1 T_2$ and $A_5 = T_3$. After transformation of the system (1.8), we get the following equations:
$$D_1 = \overline{\bar{F}_5 \cdot \bar{F}_6}; \quad D_2 = \overline{\bar{F}_2 \cdot \bar{F}_3}; \quad D_3 = \overline{\bar{F}_1 \cdot \bar{F}_3}. \tag{1.14}$$
So, it is enough to get the negated forms of the terms (1.5) to construct the logic circuit for functions (1.14). Taking into account the conjunctions Am, we find the following functions:
$$\bar{F}_1 = \overline{(\bar{T}_1\bar{T}_2)\bar{T}_3}; \quad \bar{F}_2 = \overline{\bar{T}_1 T_2 \cdot x_1}; \quad \bar{F}_3 = \overline{\bar{T}_1 T_2 \cdot \bar{x}_1 x_2}; \quad \bar{F}_5 = \overline{T_1\bar{T}_2}; \quad \bar{F}_6 = \bar{T}_3. \tag{1.15}$$
Analysis of systems (1.14) and (1.15) shows that there are four levels of logic gates (logic levels) in the circuit implementing functions Dr ∈ Φ. Now let us transform the system (1.11):
$$y_1 = \overline{\bar{T}_2 \cdot \bar{T}_3}; \quad y_2 = T_2; \quad y_3 = T_1\bar{T}_2; \quad y_4 = T_3. \tag{1.16}$$
There are two levels of system gates in the circuit corresponding to the system (1.16). The FSM circuit is shown in Fig. 1.9. NΦ = 15 system gates are necessary for implementing the system of input memory functions (1.14). There are NY = 2 system gates in the part of the circuit implementing the system of microoperations (1.16). So, there are NSG = 17 system gates in the Moore FSM circuit (Fig. 1.9). Three master-slave flip-flops form the register RG. These flip-flops have common inputs of clearing (Start) and synchronization (Clock).
Fig. 1.9 Logic circuit of Moore FSM for Γ1
Now, let us discuss how to synthesize the logic circuit of Mealy FSM. To compare Moore and Mealy FSMs, we should use the same GSA Γ1 . Let us name two FSMs equivalent if they are synthesized using the same GSA Γ .
1.3 Logic Synthesis of Mealy FSM
Marking a GSA Γ for Mealy FSM is executed using the rules from [11]. We name them procedure P2. It includes the following rules:
1. The output of the start vertex is marked as a1, as well as the input of the end vertex.
2. If the input of a vertex is connected with the output of an operational vertex, then this input is marked by a unique state am ∈ A.
3. If an input is marked, then it cannot be marked once more.
Applying P2 to GSA Γ1 produces the marked GSA Γ1 (Fig. 1.10a). As follows from Fig. 1.10a, there are M0 = 3 states in the Mealy FSM corresponding to Γ1. So, there is a set A = {a1, a2, a3}. R0 state variables are enough for encoding M0 states:
$$R_0 = \lceil \log_2 M_0 \rceil. \tag{1.17}$$
As follows from (1.17), R0 = 2 in the discussed case. It gives the sets T = {T1, T2} and Φ = {D1, D2}. Let us encode the states am ∈ A in the trivial way: K(a1) = 00, K(a2) = 01, K(a3) = 10.
Fig. 1.10 Marked GSA Γ1 (a) and STG (b) for Mealy FSM corresponding to Γ1

Table 1.2 Direct structure table of Mealy FSM for GSA Γ1
am   K(am)  as  K(as)  Xh      Yh     Φh   h
a1   00     a2  01     1       y1y2   D2   1
a2   01     a3  10     x1      y3     D1   2
a2   01     a3  10     x̄1x2    y1y2   D1   3
a2   01     a1  00     x̄1x̄2    −      −    4
a3   10     a1  00     1       y1y4   −    5
The STG of Mealy FSM is shown in Fig. 1.10b. Each arc of the STG is marked by the pair ⟨input signal Xh, collection of microoperations Yh⟩. The signal Xh determines a transition ⟨am, as⟩. The CMO Yh ⊆ Y is generated during the transition ⟨am, as⟩. The arc number h (h = 1, ..., H0) determines the vector Vh = ⟨am, K(am), as, K(as), Xh, Yh, Φh⟩. The vector Vh determines the hth row of the Mealy FSM's DST. In the discussed case, there are H0 = 5 arcs in the STG (Fig. 1.10b). So, the DST of the corresponding Mealy FSM includes H0 = 5 rows. Both functions (1.1) and (1.2) depend on product terms (1.5). It means that system (1.2) is the following:
$$y_n = \bigvee_{h=1}^{H_0} C_{nh} F_h \quad (n = 1, \ldots, N). \tag{1.18}$$
In (1.18), the Boolean variable Cnh is equal to 1 if the symbol yn is written in the hth row of DST (h = 1, ..., H0). The following systems can be derived from Table 1.2:
$$D_1 = F_2 \vee F_3; \quad D_2 = F_1. \tag{1.19}$$
$$y_1 = F_1 \vee F_3 \vee F_5; \quad y_2 = F_1 \vee F_3; \quad y_3 = F_2; \quad y_4 = F_5. \tag{1.20}$$
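The behaviour described by the DST of the Mealy FSM can be traced with a small simulation. The sketch below restates the rows of Table 1.2 as Python data. The representation of an input signal Xh as a dictionary of required input values, and the names `step` and `run`, are assumptions made for illustration.

```python
# Rows of Table 1.2: (current state, state of transition, condition, outputs Y_h).
# A condition maps each tested logical condition to its required value;
# an empty dictionary stands for the unconditional case X_h = 1.
MEALY_DST = [
    ("a1", "a2", {}, ["y1", "y2"]),
    ("a2", "a3", {"x1": 1}, ["y3"]),
    ("a2", "a3", {"x1": 0, "x2": 1}, ["y1", "y2"]),
    ("a2", "a1", {"x1": 0, "x2": 0}, []),
    ("a3", "a1", {}, ["y1", "y4"]),
]


def step(state, inputs):
    """Return (next state, outputs) for one Clock cycle of the Mealy FSM."""
    for a_m, a_s, cond, outs in MEALY_DST:
        if a_m == state and all(inputs[x] == v for x, v in cond.items()):
            return a_s, outs
    raise ValueError(f"no transition from {state} for inputs {inputs}")


def run(input_sequence, state="a1"):
    """Apply a sequence of input assignments, starting from the initial state a1."""
    trace = []
    for inputs in input_sequence:
        state, outs = step(state, inputs)
        trace.append((state, outs))
    return trace


if __name__ == "__main__":
    for next_state, outs in run([{"x1": 0, "x2": 1}] * 3):
        print(next_state, outs)
```

Because the outputs are attached to transitions rather than to states, the same state a2 produces different collections of microoperations for different input signals.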
The functions Fh (h = 1, ..., H0) are the following:
$$F_1 = \bar{T}_1\bar{T}_2; \quad F_2 = \bar{T}_1 T_2 x_1; \quad F_3 = \bar{T}_1 T_2 \bar{x}_1 x_2; \quad F_4 = \bar{T}_1 T_2 \bar{x}_1\bar{x}_2; \quad F_5 = T_1\bar{T}_2. \tag{1.21}$$
Recall that the FSM circuit is implemented using the system gates. It means that functions (1.19)–(1.21) should be transformed. Let us use the code 11 to minimize the conjunctions Am (m = 1, ..., M0). This code is not used for the state assignment. Using it, we get: $A_1 = \bar{T}_1\bar{T}_2$; $A_2 = T_2$; $A_3 = T_1$. It allows minimizing the system (1.21). Now we can express the functions (1.21) as follows:
$$F_1 = \bar{T}_1\bar{T}_2; \quad F_2 = T_2 x_1; \quad F_3 = T_2\bar{x}_1 x_2; \quad F_4 = T_2\bar{x}_1\bar{x}_2; \quad F_5 = T_1. \tag{1.22}$$
Fig. 1.11 Logic circuit of Mealy FSM for GSA Γ1
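As follows from Table 1.2, there is no need to implement the circuit for F4. The transformed functions (1.19), (1.20) and (1.22) are the following:
$$D_1 = \overline{\bar{F}_2 \cdot \bar{F}_3}; \quad D_2 = \overline{\bar{F}_1};$$
$$y_1 = \overline{\bar{F}_1\bar{F}_3 \cdot \bar{F}_5}; \quad y_2 = \overline{\bar{F}_1 \cdot \bar{F}_3}; \quad y_3 = \overline{\bar{F}_2}; \quad y_4 = \overline{\bar{F}_5};$$
$$\bar{F}_1 = \overline{\bar{T}_1\bar{T}_2}; \quad \bar{F}_2 = \overline{T_2 x_1}; \quad \bar{F}_3 = \overline{T_2\bar{x}_1 x_2}; \quad \bar{F}_5 = \bar{T}_1. \tag{1.23}$$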
The system (1.23) is the basis for designing the Mealy FSM logic circuit (Fig. 1.11). As follows from Fig. 1.11, NΦ = 7 system gates are necessary to implement the circuit for functions Dr ∈ Φ. Also, NY = 6 system gates are necessary to implement the system yn ∈ Y. So, there are NSG = 13 system gates in the circuit of Mealy FSM (Fig. 1.11), compared with NSG = 17 for the equivalent Moore FSM (Fig. 1.9). Besides, there are three flip-flops in the circuit of Moore FSM, but only two flip-flops in the circuit of Mealy FSM. There are four levels of logic in the circuit for functions Dr ∈ Φ and six levels of logic for yn ∈ Y. Let us point out that a Mealy FSM logic circuit should include additional flip-flops to be stable [29, 30]. To stabilize the Mealy FSM operation, it is possible to use either a register RY (Fig. 1.12a) or a register RX (Fig. 1.12b). In both cases, fluctuations of logical conditions do not change microoperations yn ∈ Y [30, 35]. The pulse Clk controls the corresponding additional register. The problems of synchronization are beyond the scope of our book.
Fig. 1.12 Stabilization of Mealy FSM using RY (a) or RX (b)
1.4 Logic Synthesis of Combined FSM
It is possible that a control unit generates two kinds of control signals. Some signals exist only for a short part of a CU cycle. They can be treated as the outputs of Mealy FSM. Other signals exist during the whole cycle. These signals are of the type of Moore FSM outputs. Such control units can be represented by a model of combined finite state machine (CFSM) [11]. From the theoretical point of view, a CFSM can be represented by the following vector:
$$S = \langle A, X, Y^1, Y^2, \delta, \lambda_1, \lambda_2, a_1 \rangle. \tag{1.24}$$
As in the previous sections, the set A includes M internal states and the set X consists of L logical conditions. The set Y^1 includes N1 microoperations of Mealy FSM. The set Y^2 includes N2 microoperations of Moore FSM. The set Y includes N = N1 + N2 elements. Of course, the following relations are true: Y^1 ∩ Y^2 = ∅; Y^1 ∪ Y^2 = Y. In (1.24), δ is a function of transitions [11]; it determines a state of transition as ∈ A as a function of the current state am ∈ A and logical conditions from the set X:
$$a_s = \delta(a_m, X). \tag{1.25}$$
The function δ determines, for example, the arcs in the STG of CFSM. The function λ1 is a function of outputs of Mealy FSM:
$$y_n = \lambda_1(a_m, X) \quad (y_n \in Y^1). \tag{1.26}$$
The function λ2 is a function of outputs of Moore FSM:
$$y_n = \lambda_2(a_m) \quad (y_n \in Y^2). \tag{1.27}$$
The function λ1 corresponds to the system (1.2), the function λ2 to (1.3). If Y^1 = ∅, then the CFSM turns into Moore FSM. In this case, elements Y^1 and λ1 should be eliminated from the vector (1.24). If Y^2 = ∅, then the CFSM turns into Mealy FSM. In this case, elements Y^2 and λ2 should be eliminated from the vector S. In the general case, a CFSM is represented by the structural diagram shown in Fig. 1.13. The block of functions (BF) implements the systems (1.1) and (1.28):
$$Y^1 = Y^1(T, X). \tag{1.28}$$
The block of microoperations (BMO) implements the system of functions (1.29):
$$Y^2 = Y^2(T). \tag{1.29}$$
Fig. 1.13 Structural diagram of CFSM
Both systems (1.1) and (1.28) include functions depending on the product terms (1.5). Boolean functions (1.29) depend on the conjunctions (1.6). Due to this, the operational vertices of GSA Γ should be marked by the states of Moore FSM. But it is possible to diminish the number of states in a CFSM in comparison with an equivalent Moore FSM [27, 28]. Let a GSA Γ include operational vertices such that: (1) their outputs are connected with the same input of some vertex of GSA Γ and (2) there are no MOs yn ∈ Y^2 in these vertices. These vertices are marked by the same state am ∈ A. Let us name this marking procedure P3. The GSA Γ2 shown in Fig. 1.14a is marked using the procedure P3. The MOs yn ∈ Y^1 are shown above the arcs of GSA Γ2. So, there is the set Y^1 = {y3, y4, y5}. The MOs yn ∈ Y^2 are shown inside the operational vertices of GSA Γ2. They form the set Y^2 = {y1, y2, y6, y7}. The set X includes L = 2 logical conditions: X = {x1, x2}. The states form the set A = {a1, ..., a5} with M = 5. The corresponding STG is shown in Fig. 1.14b. Its arcs are marked by the pairs ⟨Xh, Yh^1⟩, where Yh^1 ⊆ Y^1. Each arc corresponds to a vector ⟨am, as, Xh, Yh^1⟩. Using (1.4), the value R = 3 can be found. It gives the sets T = {T1, T2, T3} and Φ = {D1, D2, D3}. Let us encode the states am ∈ A in the way shown in Fig. 1.15. Let us construct the DST of this CFSM. It is constructed using vectors Vh = ⟨am, K(am), as, K(as), Xh, Yh^1, Φh⟩. Besides, the CMOs Yq ⊆ Y^2 are shown in the column am (Table 1.3). Let us find the systems (1.1), (1.28) and (1.29). Let us start from the system (1.1). The functions Dr ∈ Φ do not depend on the terms F6, F8, F9 (see Table 1.3). So, let us find the functions Fh required for functions Dr ∈ Φ:
$$F_1 = \bar{T}_1\bar{T}_2\bar{T}_3; \quad F_2 = T_1\bar{T}_3 x_1; \quad F_3 = T_1\bar{T}_3\bar{x}_1 x_2; \quad F_4 = T_1\bar{T}_3\bar{x}_1\bar{x}_2; \quad F_5 = T_2 x_1; \quad F_7 = T_1 T_3 x_1. \tag{1.30}$$
Fig. 1.14 Marked GSA Γ2 (a) and STG of CFSM (b)
Fig. 1.15 State codes of CFSM for GSA Γ2

Table 1.3 Direct structure table of CFSM for GSA Γ2
am          K(am)  as  K(as)  Xh      Yh^1   Φh     h
a1          000    a2  100    1       y3     D1     1
a2 (y1y2)   100    a3  010    x1      y4     D2     2
a2 (y1y2)   100    a3  010    x̄1x2    y3y5   D2     3
a2 (y1y2)   100    a4  101    x̄1x̄2    −      D1D3   4
a3 (−)      010    a5  001    x1      y3     D3     5
a3 (−)      010    a1  000    x̄1      −      −      6
a4 (y2y6)   101    a5  001    x1      y3     D3     7
a4 (y2y6)   101    a1  000    x̄1      −      −      8
a5 (y7)     001    a1  000    1       −      −      9
In the system (1.30), we used the minimized conjunctions Am (m = 1, ..., M). They can be found from the Karnaugh map (Fig. 1.15) as follows:
$$A_1 = \bar{T}_1\bar{T}_2\bar{T}_3; \quad A_2 = T_1\bar{T}_3; \quad A_3 = T_2; \quad A_4 = T_1 T_3. \tag{1.31}$$
The system (1.1) is represented as:
$$D_1 = F_1 \vee F_4; \quad D_2 = F_2 \vee F_3; \quad D_3 = F_4 \vee F_5 \vee F_7. \tag{1.32}$$
The system (1.28) is the following:
$$y_3 = F_1 \vee F_3 \vee F_5 \vee F_7; \quad y_4 = F_2; \quad y_5 = F_3. \tag{1.33}$$
Now, let us find the system (1.29):
$$y_1 = A_2; \quad y_2 = A_2 \vee A_4; \quad y_6 = A_4; \quad y_7 = A_5. \tag{1.34}$$
Using the state codes from Fig. 1.15, we can transform the system (1.34) into the following system:
$$y_1 = T_1\bar{T}_3; \quad y_2 = T_1; \quad y_6 = T_1 T_3; \quad y_7 = \bar{T}_1 T_3. \tag{1.35}$$
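A minimized conjunction Am is valid only if, among the codes actually used for the state assignment, it is equal to 1 for the code K(am) alone. This can be checked with a short Python sketch. The codes restate Table 1.3 and the conjunctions restate (1.31), with A5 taken from the equation for y7 in (1.35). The dictionary representation and the function names are assumptions made for illustration.

```python
# State codes K(a_m) from Table 1.3, written as T1 T2 T3.
CODES = {"a1": "000", "a2": "100", "a3": "010", "a4": "101", "a5": "001"}

# Minimized conjunctions: index r of a state variable -> required value of T_r.
MINIMIZED = {
    "a1": {1: 0, 2: 0, 3: 0},  # A1 = ~T1 ~T2 ~T3
    "a2": {1: 1, 3: 0},        # A2 = T1 ~T3
    "a3": {2: 1},              # A3 = T2
    "a4": {1: 1, 3: 1},        # A4 = T1 T3
    "a5": {1: 0, 3: 1},        # A5 = ~T1 T3
}


def covers(conjunction, code):
    """Return True if the conjunction is equal to 1 for the given state code."""
    return all(int(code[r - 1]) == value for r, value in conjunction.items())


def selected_states(conjunction):
    """List the states whose codes make the conjunction equal to 1."""
    return [state for state, code in CODES.items() if covers(conjunction, code)]


if __name__ == "__main__":
    for state, conjunction in MINIMIZED.items():
        print(state, selected_states(conjunction))
```

Each conjunction should select exactly one state; the unused codes 011, 110 and 111 act as don't-care combinations.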
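Let us construct the logic circuit for this CFSM. To use the system gates, we should transform the Eqs. (1.30), (1.32), (1.33) and (1.35). It is executed in the same manner as for either Mealy or Moore FSMs. Recall that a system gate is a NAND gate having S = 2 inputs. After the transformation, there are the following systems:
$$\bar{F}_1 = \overline{\bar{T}_1\bar{T}_2\bar{T}_3}; \quad \bar{F}_2 = \overline{T_1\bar{T}_3 \cdot x_1}; \quad \bar{F}_3 = \overline{T_1\bar{T}_3 \cdot \bar{x}_1 x_2}; \quad \bar{F}_4 = \overline{T_1\bar{T}_3 \cdot \bar{x}_1\bar{x}_2}; \quad \bar{F}_5 = \overline{T_2 x_1}; \quad \bar{F}_7 = \overline{T_1 T_3 \cdot x_1}. \tag{1.36}$$
$$D_1 = \overline{\bar{F}_1 \cdot \bar{F}_4}; \quad D_2 = \overline{\bar{F}_2 \cdot \bar{F}_3}; \quad D_3 = \overline{\bar{F}_4\bar{F}_5 \cdot \bar{F}_7}. \tag{1.37}$$
$$y_3 = \overline{\bar{F}_1\bar{F}_3 \cdot \bar{F}_5\bar{F}_7}; \quad y_4 = \overline{\bar{F}_2}; \quad y_5 = \overline{\bar{F}_3}. \tag{1.38}$$
$$y_1 = T_1\bar{T}_3; \quad y_2 = T_1; \quad y_6 = T_1 T_3; \quad y_7 = \bar{T}_1 T_3. \tag{1.39}$$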
Using systems (1.36)–(1.39), we can construct the circuit shown in Fig. 1.16. This circuit includes 30 system gates and three flip-flops. Of course, 4 additional flip-flops are necessary for stabilizing the microoperations yn ∈ Y^1. To minimize the number of gates, we used common parts of different functions. This approach is called factorization [11]. For example, y1 = A2. We do not implement a separate circuit for the equation $A_2 = T_1\bar{T}_3$. We just use the fact that the conjunction A2 is already formed in the circuit as a part of another equation (it corresponds to wire 20 of the circuit in Fig. 1.16). Let us point out that we do not try to get FSM circuits with the minimum number of gates. We just show how to design FSM logic circuits starting from GSAs. We discuss the optimization problems in the next chapters of this book. Obviously, to find the best solution, it is necessary to implement logic circuits based on different FSM models. As shown later, there are many approaches permitting optimization of FSM circuits.
Fig. 1.16 Logic circuit of CFSM for GSA Γ2
1.5 Preliminary Assessment of Hardware Amount in FSM Circuits
As follows from [12, 24, 25], there are hundreds of variants of FSM structural diagrams. The use of each of these variants for the same GSA Γ leads to FSM circuits with different characteristics. There are three main characteristics of FSM circuits: the hardware amount, the performance, and the consumed energy.
The hardware amount can be estimated either as the number of logic elements in the FSM circuit or as the chip area occupied by this circuit. The smaller the area, the shorter the interconnections between the circuit elements, the lower the energy consumption and the higher the operating frequency. In our book we propose various methods of organization of FSM circuits aimed at reducing the hardware amount. It is very important to evaluate the hardware amount without implementing the FSM circuit in hardware. This allows a designer to quickly compare different variants and saves time in selecting the best of them. In this section we consider how to evaluate the number of system gates in the FSM circuit and the consumed chip area. Let nS(fi) be the number of NAND gates having S inputs that is enough to implement the circuit for a term fi having i arguments. Let nS(f̄i) mean the same for the negation of fi. Let us form a table for terms fi and f̄i for system gates (S = 2). These numbers are shown in Table 1.4. As follows from Table 1.4, there are the following dependences:
$$n_2(f_i) = 2i - 2 \quad (i \ge 2); \tag{1.40}$$
$$n_2(\bar{f}_i) = n_2(f_i) - 1 \quad (i \ge 2). \tag{1.41}$$
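Formulae (1.40) and (1.41) are easy to wrap into helper functions. In the Python sketch below the function names are illustrative assumptions. The values for i = 1 (no gates for a single literal, one inverter for its negation) are taken from the first row of Table 1.4, because the formulae themselves are stated only for i ≥ 2.

```python
def n2_term(i: int) -> int:
    """Number of 2-input NAND gates for a product term f_i with i literals, per (1.40)."""
    if i < 1:
        raise ValueError("a term needs at least one literal")
    return 0 if i == 1 else 2 * i - 2


def n2_negated_term(i: int) -> int:
    """Number of 2-input NAND gates for the negation of f_i, per (1.41)."""
    if i < 1:
        raise ValueError("a term needs at least one literal")
    return 1 if i == 1 else n2_term(i) - 1


if __name__ == "__main__":
    for i in range(1, 16):
        print(i, n2_term(i), n2_negated_term(i))
```

The negated term is cheaper by one gate because the final inverter after the last NAND gate is not needed.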
So, we can use these formulae to assess the number of system gates in FSM circuits. Let us start from the Moore FSM discussed in Sect. 1.2. As follows from (1.14), functions Dr ∈ Φ depend on negations of terms Fh. The equation for y1 from (1.10) can be transformed as $y_1 = \overline{\bar{A}_2 \cdot \bar{A}_4 \cdot \bar{A}_5}$. So, the functions yn ∈ Y depend on negations of conjunctions Am. It is possible to represent this approach as a structural diagram having 4 blocks consisting of NAND gates (Fig. 1.17). Let us name this structure P Moore FSM.

Table 1.4 Numbers of system gates for fi and f̄i
i    n2(fi)  n2(f̄i)
1    0       1
2    2       1
3    4       3
4    6       5
5    8       7
6    10      9
7    12      11
8    14      13
9    16      15
10   18      17
11   20      19
12   22      21
13   24      23
14   26      25
15   28      27

Fig. 1.17 Structural diagram of P Moore FSM
Table 1.5 Preliminary assessment of P Moore FSM
Fh   L(Fh)  n2(F̄h)   Dr   L(Dr)  n2(Dr)   yn   L(yn)  n2(yn)
F1   3      3        D1   2      1        y1   2      1
F2   3      3        D2   2      1        y2   1      0
F3   4      5        D3   2      1        y3   2      1
F5   2      1        −    −      −        y4   1      0
F6   1      0        −    −      −
ΣF = 12; ΣΦ = 3; ΣY = 2; ΣTotal = 17
The block NAND1 generates the negations of product terms Fh forming the set F̄. The block NAND2 generates functions Dr ∈ Φ. The block NAND3 implements the negations of conjunctions Am forming the set Ā. The block NAND4 generates the microoperations yn ∈ Y. Let us use Eq. (1.15) to estimate n2(F̄h), Eq. (1.8) to estimate n2(Dr), and Eq. (1.16) to estimate n2(yn). Let us point out that there are no explicit forms of functions Am in the system (1.16). So, we do not calculate n2(Ām). The results are shown in Table 1.5. The following expressions are used to calculate the values of ΣF, ΣΦ, ΣY and ΣTotal:
$$\Sigma_F = \sum_{h=1}^{H} n_2(\bar{F}_h); \tag{1.42}$$
$$\Sigma_\Phi = \sum_{r=1}^{R} n_2(D_r); \tag{1.43}$$
$$\Sigma_Y = \sum_{n=1}^{N} n_2(y_n); \tag{1.44}$$
$$\Sigma_{Total} = \Sigma_F + \Sigma_\Phi + \Sigma_Y. \tag{1.45}$$
Let us point out that Table 1.5 gives NΦ = ΣF + ΣΦ = 15, NY = ΣY = 2, and NSG = 17. These numbers are the same as the ones following from the circuit (Fig. 1.9). Now let us evaluate the Mealy FSM from Sect. 1.3. As follows from (1.23), both functions Dr ∈ Φ and yn ∈ Y depend on the negations of terms Fh ∈ F. The term Fh depends on state variables and logical conditions. It allows representing P Mealy FSM as a two-level structure (Fig. 1.18). Using (1.19), (1.20) and (1.22), we can execute the preliminary assessment for P Mealy FSM. The results are shown in Table 1.6.
Fig. 1.18 Structural diagram of P Mealy FSM

Table 1.6 Preliminary assessment of P Mealy FSM
Fh   L(Fh)  n2(F̄h)   Dr   L(Dr)  n2(Dr)   yn   L(yn)  n2(yn)
F1   2      1        D1   2      1        y1   3      3
F2   2      1        D2   1      1        y2   2      1
F3   3      3        −    −      −        y3   1      1
F5   1      0        −    −      −        y4   1      1
ΣF = 5; ΣΦ = 2; ΣY = 6; ΣTotal = 13
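The totals in Tables 1.5 and 1.6 follow from simple summation over the table columns. The Python sketch below restates the gate counts from both tables and adds them up. The dictionary layout and the function name `assess` are assumptions made for illustration.

```python
# Gate counts copied from Table 1.5 (P Moore FSM) and Table 1.6 (P Mealy FSM).
P_MOORE = {
    "F": {"F1": 3, "F2": 3, "F3": 5, "F5": 1, "F6": 0},
    "D": {"D1": 1, "D2": 1, "D3": 1},
    "y": {"y1": 1, "y2": 0, "y3": 1, "y4": 0},
}
P_MEALY = {
    "F": {"F1": 1, "F2": 1, "F3": 3, "F5": 0},
    "D": {"D1": 1, "D2": 1},
    "y": {"y1": 3, "y2": 1, "y3": 1, "y4": 1},
}


def assess(table):
    """Return the column sums and the total number of system gates for one table."""
    sum_f = sum(table["F"].values())    # gates for the negated terms
    sum_phi = sum(table["D"].values())  # gates for the input memory functions
    sum_y = sum(table["y"].values())    # gates for the microoperations
    return {
        "N_Phi": sum_f + sum_phi,
        "N_Y": sum_y,
        "N_SG": sum_f + sum_phi + sum_y,
    }


if __name__ == "__main__":
    print("P Moore:", assess(P_MOORE))
    print("P Mealy:", assess(P_MEALY))
```

The per-function counts are taken from the tables rather than recomputed, since some of them account for signals that are available without extra gates, such as complemented flip-flop outputs.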
As follows from Table 1.6, there are NΦ = 7, NY = 6 and NSG = 13. These numbers are the same as the ones from Sect. 1.3. Blocks NANDi can be replaced by matrix circuits [30]. In this case, it is necessary to have both direct and complement values of input variables for matrices NAND1 and NAND3. The hardware amount can be calculated as a total circuit area measured in some square units [29]. Let the symbol Si stand for the area of the matrix NANDi. In the case of P Moore FSM, these areas can be represented as
$$S_1 = 2(L + R) \cdot H; \quad S_2 = H \cdot R; \quad S_3 = 2R \cdot M; \quad S_4 = M \cdot N. \tag{1.46}$$
The summation of S1–S4 gives the chip area occupied by P Moore FSM. It is determined as
$$S(P\,Moore) = (2L + 3R) \cdot H + (2R + N) \cdot M. \tag{1.47}$$
In the discussed case, there are L = 2, R = 3, H = 7, M = 5 and N = 4. It gives S(P Moore) = 141 square units. In the case of P Mealy FSM, we can find that:
$$S_1 = 2(L + R_0) \cdot H_0; \quad S_2 = (R_0 + N) \cdot H_0. \tag{1.48}$$
Now we can define the total area S(P Mealy) as the sum of S1 and S2:
$$S(P\,Mealy) = (2L + 3R_0 + N) \cdot H_0. \tag{1.49}$$
Because R0 = 2 and H0 = 5, we find that S(P Mealy) = 70 square units. Both ways of evaluation can be used to compare different models of FSMs. For example, it is obvious that it is better to use the model of P Mealy FSM for GSA Γ1. Most importantly, this conclusion has been made without implementing the FSM circuits shown in Figs. 1.9 and 1.11.
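The area estimates (1.46)–(1.49) can be reproduced with a few lines of Python. The function names are illustrative assumptions; the formulas and the parameter values are those given in this section.

```python
def area_p_moore(L: int, R: int, H: int, M: int, N: int) -> int:
    """Sum of the matrix areas S1..S4 from (1.46), which equals (1.47)."""
    s1 = 2 * (L + R) * H  # NAND1: direct and complement inputs by H terms
    s2 = H * R            # NAND2: H terms by R input memory functions
    s3 = 2 * R * M        # NAND3: direct and complement state variables by M states
    s4 = M * N            # NAND4: M conjunctions by N microoperations
    return s1 + s2 + s3 + s4


def area_p_mealy(L: int, R0: int, H0: int, N: int) -> int:
    """Sum of the matrix areas S1 and S2 from (1.48), which equals (1.49)."""
    s1 = 2 * (L + R0) * H0
    s2 = (R0 + N) * H0
    return s1 + s2


if __name__ == "__main__":
    print(area_p_moore(L=2, R=3, H=7, M=5, N=4))  # P Moore FSM for GSA Γ1
    print(area_p_mealy(L=2, R0=2, H0=5, N=4))     # P Mealy FSM for GSA Γ1
```

Keeping the partial areas as separate terms makes it easy to see which matrix dominates when comparing alternative FSM models.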
Chapter 2
Structural Decomposition in FSM Synthesis
Abstract The chapter analyses methods of structural decomposition. The main idea of these methods is to reduce the number of literals in systems of Boolean functions by increasing the number of logic levels in FSM circuits. Methods of state assignment are analysed. Next, the basic features of field-programmable gate arrays are discussed. It is shown that embedded memory blocks allow implementing systems of regular Boolean functions. The modern design flow targeting FPGA-based projects is analysed. Different methods of structural decomposition are considered, such as the replacement of logical conditions, encoding of the collections of microoperations, encoding of the fields of compatible microoperations and verticalization of the initial GSA. The methods based on classes of pseudoequivalent states are discussed for Moore FSMs. The FPGA-based structural diagrams of FSM circuits based on structural decomposition are shown.
2.1 General Characteristic of Structural Decomposition
Decomposition is the process of breaking a whole into parts while preserving the original properties of the whole. For example, a system is divided into subsystems whose interaction gives results that are indistinguishable from the results generated by the initial system without decomposition. Let us understand the structural decomposition of an FSM as a process of splitting the initial structural diagram into blocks, each implemented as a separate subcircuit. Each block has common input variables and is characterized by a functional purpose that distinguishes it from the other blocks of the decomposed circuit. The structural decomposition is related to the introduction of some additional variables, which are: (1) functions of the initial variables; (2) arguments of some additional functions and (3) arguments of output functions. Let us point out that the output functions of one block could be arguments for functions of other blocks. Let an FSM circuit (Fig. 2.1a) be decomposed into three interconnected blocks (Fig. 2.1b).
© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_2
Fig. 2.1 FSM circuit before (a) and after (b) the decomposition
Fig. 2.2 FSM as a “black box”
The initial circuit has inputs X, outputs Y and state variables T. If an FSM is represented as a “black box”, then the external observer sees only the inputs X and outputs Y (Fig. 2.2). The behaviour of the circuit before and after the decomposition remains unchanged. That is, the observer cannot distinguish between the original and decomposed circuits. The initial FSM is characterized by the SBFs (1.1) and (1.2). The decomposed FSM has two sets of additional variables, P and Z. The variables from the set P are generated as output functions of Block 1:
$$P = P(T, X). \tag{2.1}$$
These variables are treated as arguments of functions Z and Φ generated by Block 2:
$$Z = Z(T, P); \tag{2.2}$$
$$\Phi = \Phi(T, P). \tag{2.3}$$
The functions Z are input arguments of functions Y generated by Block 3:
$$Y = Y(Z). \tag{2.4}$$
After the decomposition, the functions Y still depend on the logical conditions $x_e \in X$ and state variables $T_r \in T$, but now they are represented as:
$$Y = Y(Z(T, P(T, X))). \tag{2.5}$$
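The composition (2.1)–(2.5) can be sketched in Python. This is only an illustration under stated assumptions: the three toy block functions, the one-bit state and the two logical conditions are hypothetical and are not taken from the book. The sketch compares a single-level circuit with its three-block decomposition over all inputs, which is what the external observer of Fig. 2.2 would see.

```python
from itertools import product

def make_decomposed_fsm(block1, block2_z, block2_phi, block3):
    """Chain three blocks as in (2.1)-(2.4); returns a function (T, X) -> (Y, Phi)."""
    def step(T, X):
        P = block1(T, X)         # (2.1): P = P(T, X)
        Z = block2_z(T, P)       # (2.2): Z = Z(T, P)
        Phi = block2_phi(T, P)   # (2.3): Phi = Phi(T, P)
        Y = block3(Z)            # (2.4): Y = Y(Z)
        return Y, Phi
    return step

# Hypothetical toy blocks: one state variable, two logical conditions.
step = make_decomposed_fsm(
    block1=lambda T, X: (X[T[0]],),          # the state selects one condition
    block2_z=lambda T, P: (T[0] ^ P[0],),
    block2_phi=lambda T, P: (P[0],),
    block3=lambda Z: (Z[0], 1 - Z[0]),
)

def monolithic(T, X):
    """The same behaviour written as a single-level circuit."""
    p = X[T[0]]
    z = T[0] ^ p
    return (z, 1 - z), (p,)

# The observer cannot distinguish the two circuits on any input.
indistinguishable = all(
    step((t,), (x1, x2)) == monolithic((t,), (x1, x2))
    for t, x1, x2 in product((0, 1), repeat=3)
)
print(indistinguishable)
```

The decomposed version passes only the intermediate variables P and Z between blocks, so each block sees fewer arguments than the single-level function.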
Analysis of Fig. 2.1 shows that the decomposed circuit has more structural levels than its initial counterpart. This could increase the propagation time in the decomposed circuit compared to the circuit without decomposition. However, functions (2.1)–(2.4) are much simpler than functions (1.1), (1.2), which can decrease the total number of logic levels. Thus, a paradox is quite possible: the decomposed circuit may have fewer logic levels than the initial circuit. Typically, the structural decomposition is used to reduce the hardware amount in an FSM circuit [16]. However, it is possible that the performance of the decomposed circuit differs only marginally from the performance of the single-level circuit (Fig. 2.1a).
The structural decomposition can be used in conjunction with three groups of methods [4, 15]:
1. The optimizing encoding of states and additional variables.
2. The heterogeneous implementation of circuits for different blocks of a decomposed circuit.
3. The functional decomposition of Boolean functions representing an FSM circuit as a whole or its individual blocks.
Let us briefly characterize these approaches. By the optimizing encoding we understand a replacement of states (or additional variables) by binary codes aimed at optimizing some parameters of an FSM circuit. As a rule, three main parameters could be optimized: (1) the hardware amount; (2) the performance; (3) the consumed energy [8, 15]. The vast majority of methods in this area are methods of state assignment. The term ‘state assignment’ refers to the replacement of states $a_m \in A$ by their binary codes $K(a_m)$ [6, 17, 25]. There are thousands of works devoted to the problem of state assignment. Many of the algorithms target the hardware reduction. The simplest algorithm could be found in [3]. It targets FSMs whose state registers consist of D flip-flops.
The principle of state encoding is very simple: the more times a state $a_m \in A$ appears in the DST, the more zeros its code $K(a_m)$ includes. Among the state assignment methods targeting the hardware reduction, one should note MUSTANG [13], MUSE [20], ALTO [19], NOVA [37] and many other algorithms [10, 12, 18, 31–36]. One of the most popular algorithms of state assignment is JEDI, which is distributed with the system SIS [39, 45]. There are two main approaches in JEDI: the input dominant and the output dominant approaches. The input dominant algorithm assigns higher weights to pairs of present states which assert similar inputs and produce sets of similar next states. It targets maximizing the size of common cubes in the implemented logic function. The output dominant algorithm assigns higher weights to pairs of next states which are generated by similar input combinations and similar sets of present states. It maximizes the number of common cubes in the logic function.
Low-power design is now one of the central points in the construction of integrated systems [41]. It is determined by two main issues. First, the power consumption is critical in the case of portable devices. Second, more expensive packaging is required as the VLSI density and clock frequency increase. FSMs are parts of modern digital systems, so it is very important to decrease the power consumption of FSM circuits [40]. Different state assignment methods have been developed targeting the decrease in power consumption of FSM circuits [42–44, 46–49, 51–54, 56, 89, 90]. In many works the energy is reduced by minimizing the total switching of the flip-flops [40, 55–58]. In those cases, the problem of minimum weighted Hamming distance arises. A simple solution of this problem could be found, for example, in [3]. The switching activity could also be reduced by disabling some flip-flops [59–61]. It could be done either by using an output enable signal or by gating the clock pulses [63].
There are state assignment methods targeting the optimization of more than one parameter. For example, the methods [64, 67] optimize area and performance. The methods [41, 46, 66, 90] optimize the chip area occupied by an FSM circuit and its power consumption. Also, there are methods targeting only the increase of FSM performance [4, 15].
The nature of a GSA Γ influences the methods of state assignment. For example, there are a lot of unconditional interstate transitions in FSMs designed for so-called linear GSAs [14]. A linear GSA has more than 75% of operational vertices [14]. In this case, the state register RG could be replaced by a state counter (CT). In such FSMs, an incremental state assignment is executed [65] in the following way. If there is an unconditional transition $\langle a_m, a_s \rangle$, then the following relation should take place:
$$K(a_s) = K(a_m) + 1. \tag{2.6}$$
The operation (2.6) is evoked by a special variable $y_0$. It leads to the LCS-based FSM (Fig. 2.3), where LCS means a linear chain of states. The block of input memory functions (BIMF) generates functions $D_r \in \Phi$.
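The incremental state assignment (2.6) can be illustrated by a minimal Python sketch. It assumes that the unconditional transitions have already been grouped into disjoint linear chains of states. The function name `assign_incremental` and the example chains are hypothetical and do not come from the book.

```python
def assign_incremental(chains):
    """Assign integer state codes so that K(a_s) = K(a_m) + 1
    for every unconditional transition (a_m, a_s) inside a chain.

    `chains` is a list of linear chains; each chain lists its states
    in the order of their unconditional transitions.
    """
    codes = {}
    next_code = 0
    for chain in chains:
        for state in chain:
            codes[state] = next_code
            next_code += 1
    return codes

# Hypothetical example: two linear chains of states.
chains = [["a1", "a2", "a3"], ["a4", "a5"]]
codes = assign_incremental(chains)

# Number of state variables R needed to hold the largest code.
width = max(1, (len(codes) - 1).bit_length())
for state, code in codes.items():
    print(state, format(code, f"0{width}b"))
```

With such codes, a transition inside a chain only needs the counter to be incremented, while a jump between chains loads a new code into the counter.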
These functions could change the content of CT for conditional transitions. The block of microoperations (BMO) forms functions $y_n \in Y$ and the variable $y_0$. As can be seen, it is a Moore FSM. This approach was proposed in [68, 70]. The work [69] coins the term ‘compositional microprogram control unit’ (CMCU) for LCS-based FSMs. The design methods of CMCU could be found in [71–74, 76–84]. The theory of CMCU design is represented in [14]. This approach could be used in the case of conditional transitions [65].
Fig. 2.3 Structural diagram of LCS-based FSM
Now let us consider the nature of heterogeneous implementation. As our experience shows [4, 15], three different types of SBFs are created during the structural decomposition. These functions could be multiplexed, regular or irregular. A multiplexed function depends on logical conditions $x_e \in X$ and state variables $T_r \in T$. But in any cycle of FSM operation, such a function is equal to the value of a single variable $x_e \in X$. This variable is selected by the corresponding combination of state variables. Let us consider the function
$$p_1 = x_1 \bar{T}_1 \bar{T}_2 \vee x_2 T_1 \bar{T}_2 \vee x_3 T_1 \bar{T}_2 \vee x_3 \bar{T}_1 T_2 \vee x_4 T_1 T_2. \tag{2.7}$$
Fig. 2.4 Logic circuit (a) and Karnaugh map (b) for function (2.7)
Obviously, this function could be implemented using a multiplexer (Fig. 2.4). The control inputs of the MX are connected with state variables $T_1$, $T_2$. Logical conditions $x_e \in X$ enter the informational inputs of the MX. The Karnaugh map (Fig. 2.4b) represents the law of operation for the circuit from Fig. 2.4a.
A regular function is determined for more than 50% of possible input assignments. For example, the BMO of a Moore FSM is represented by the system of regular functions (1.3). The best way of implementing such functions is using memory blocks [23, 24]. These blocks could be read-only memory (ROM), programmable read-only memory (PROM) or random-access memory (RAM).
If a Boolean function is determined for less than 50% of possible input assignments, then it is an irregular function. For example, the functions (1.1) and (1.2) are irregular. Consider the following numbers: L = 30, R = 8, H = 2000. It means that only 2000 input assignments are significant for functions (1.1), (1.2). But there are $2^{38} = 274877906944$ possible input assignments, because L + R = 38. It means that functions (1.1), (1.2) are determined only for about $73 \times 10^{-8}\,\%$ of possible input assignments. These functions could be implemented using, for example, either programmable logic arrays (PLA) or programmable array logic (PAL) chips.
The heterogeneous implementation of FSMs has been known since the mid-1970s [23, 24]. It is connected with the fact that both MXs and PROMs are considerably cheaper than PLAs. So, replacing PLAs by MXs or PROMs (for some parts of the FSM circuit) reduces the cost of the project.
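The arithmetic behind the regular/irregular distinction can be checked with a short Python sketch. The helper names are hypothetical. The 50% threshold is the one stated in the text, and treating the boundary case of exactly 50% as irregular is an assumption made here only to keep the function total.

```python
def defined_share_percent(num_defined, num_inputs):
    """Percentage of the 2**num_inputs input assignments
    for which a Boolean function is determined."""
    return 100.0 * num_defined / 2 ** num_inputs

def classify(num_defined, num_inputs):
    """'regular' if determined for more than 50% of assignments, else 'irregular'.
    Assumption: exactly 50% is treated as irregular."""
    share = defined_share_percent(num_defined, num_inputs)
    return "regular" if share > 50.0 else "irregular"

L, R, H = 30, 8, 2000
total = 2 ** (L + R)                     # number of possible input assignments
share = defined_share_percent(H, L + R)  # share of significant assignments, in percent

print(total)
print(f"{share:.2e} %")
print(classify(H, L + R))
```

The computed share is roughly 7.3e-07 percent, which matches the figure of about 73 × 10⁻⁸ % quoted for L = 30, R = 8 and H = 2000.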
Fig. 2.5 Implementation of FSM from Fig. 2.1b
Fig. 2.6 The principle of functional decomposition
Consider the circuit shown in Fig. 2.1b. Let functions (2.1) be multiplexed, functions (2.2), (2.3) irregular and functions (2.4) regular. In this case, the decomposed circuit could be implemented as shown in Fig. 2.5. The methods of heterogeneous implementation could be found in many works. For example, the PLA-based methods are shown in [23, 24, 85].
If the number of arguments of a Boolean function exceeds the number of inputs of a logic element, then the methods of functional decomposition could be applied [86, 88]. In general, the method of functional decomposition is based on representing a Boolean function F(X) in the following form:
$$F(X) = H(X_0, G_1(X_1), \ldots, G_I(X_I)). \tag{2.8}$$
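A small Python sketch can make the representation (2.8) concrete. The six-input function below is a hypothetical example chosen for illustration; it is split so that both the subfunction $G_1$ and the composition function $H$ need at most four inputs. Exhaustive enumeration confirms that the decomposed form computes the same function.

```python
from itertools import product

def F(x1, x2, x3, x4, x5, x6):
    """Hypothetical 6-input function, too wide for a single 4-input element."""
    return (x1 & x2 & x3 & x4) ^ x5 ^ x6

def G1(x1, x2, x3, x4):
    """Subfunction over the subset X1 = {x1, x2, x3, x4}."""
    return x1 & x2 & x3 & x4

def H(x5, x6, g1):
    """Composition function over X0 = {x5, x6} and the output of G1."""
    return g1 ^ x5 ^ x6

# Check F(X) = H(X0, G1(X1)) for all 64 input assignments.
equivalent = all(
    F(*bits) == H(bits[4], bits[5], G1(*bits[:4]))
    for bits in product((0, 1), repeat=6)
)
print(equivalent)
```

Here I = 1, so the circuit has two levels: one element for $G_1$ and one for $H$.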
Equation (2.8) determines a logic circuit whose structure is shown in Fig. 2.6. These methods are widely used in FSMs based on field-programmable gate arrays (FPGA). FPGAs include hundreds of thousands of look-up table elements (LUT) [87, 92, 93]. But each LUT has a very limited number of inputs (around 6). If L + R = 50, then the functional decomposition is the only way of implementing an FSM circuit. Most methods target LUT-based FSMs [94–98]. But FPGAs also include embedded memory blocks (EMB), which can be used for implementing FSM circuits. So, there are methods of functional decomposition targeting EMBs [99–102, 104–106].
The approaches discussed in this section are universal. It means that they could be used regardless of the type of FSM model (Mealy, Moore, combined FSM) or the logic elements used to implement an FSM circuit (NAND gates, PLA, PAL, FPGA). But it is possible to get the best results if a designer takes into account the following:
1. Specifics of the used logic elements.
2. Specifics of the used FSM model.
3. Specifics of the control algorithm which is implemented using some FSM model.
4. Optimization criteria that must be satisfied (minimizing hardware amount, minimizing energy consumption, maximizing performance or a trade-off between these characteristics).
This book is devoted to FPGA-based FSMs. We discuss three FSM models (Mealy, Moore, combined FSM). The main goal of all discussed methods is a reduction of hardware in the designed circuits. First, let us briefly discuss the specifics of modern FPGA chips. This material is mostly taken from our monographs published by Springer [4, 15, 65] and the official websites of the main manufacturers of FPGAs [87, 92, 93].
2.2 Characteristic of FPGAs
Field-programmable gate arrays were invented by designers of Xilinx in 1984 [26]. Their influence on different directions of engineering has been growing extremely fast. One of the most important reasons for this process is a relatively cheap development cost. These chips can replace billions of 2NAND gates (system gates) [22]. The first FPGAs were used for implementing simple and glue logic [7]. Now, they have up to 7 billion transistors [7], possess clock frequencies exceeding a gigahertz, and their most advanced technology is 17 nm [107]. The world’s first FPGA, XC2064 (Xilinx, 1985), offered 85000 transistors, 128 logic cells and 64 configurable logic blocks (CLB) based on three-input look-up table (LUT) elements, with a clock frequency of up to 50 MHz. In accordance with [18], from 1990 to 2005 FPGAs grew 200 times in capacity, became 40 times faster and 500 times cheaper, and reduced power consumption by a factor of 50. The analysis conducted by the authors of [22] shows that from 2005 till 2011 the capacity of FPGAs increased at least 10 times.
Six companies dominate the FPGA market: Xilinx, Altera, Atmel, Lattice Semiconductor, Microsemi and QuickLogic. All their products can be found on the corresponding homepages [87, 92, 93, 108–110]. Let us point out that Altera has since been purchased by Intel. In this chapter we discuss only the basic features of FPGAs relevant to implementing logic circuits of control units.
Let us analyse the peculiarities of LUT-based FPGAs. As a rule, typical FPGAs include four main elements: configurable logic blocks based on LUTs, a matrix of programmable interconnections (MPI), input-output blocks (IOB) and embedded memory blocks (EMB). The organization of an FPGA chip is shown in Fig. 2.7. As a rule, LUTs are based on RAM having a limited number of inputs S (S ≤ 6). A single LUT can implement an arbitrary Boolean function depending on L input variables (L ≤ S) represented by a truth table. A typical CLB includes a single LUT, a programmable flip-flop (FF), a multiplexer (MX) and logic of clock and set-reset (LCSR). The simplified structure of a CLB is shown in Fig. 2.8.
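The statement that a LUT implements an arbitrary Boolean function given by its truth table can be modelled in a few lines of Python. This is a behavioural sketch only: the class `LUT` and the majority-function example are hypothetical and say nothing about how a vendor actually builds the element.

```python
class LUT:
    """A look-up table with S inputs modelled as a 2**S-entry truth table."""

    def __init__(self, num_inputs, truth_table):
        if len(truth_table) != 2 ** num_inputs:
            raise ValueError("truth table must have 2**S entries")
        self.num_inputs = num_inputs
        self.truth_table = list(truth_table)

    def evaluate(self, *inputs):
        """Use the input bits as a memory address; the first input is the
        most significant address bit."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:
            address = address * 2 + (bit & 1)
        return self.truth_table[address]

# Configure a 3-input LUT as a majority function (hypothetical example).
majority = LUT(3, [0, 0, 0, 1, 0, 1, 1, 1])
print(majority.evaluate(1, 0, 1))
```

Changing the function means rewriting the stored table, while the structure of the element stays the same. The table size doubles with every added input, which is one way to see why S is kept small.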
Fig. 2.7 Simplified organization of FPGA
Fig. 2.8 Simplified structure of CLB
The output of the LUT is connected with the FF, which could be programmed as a D, JK or T flip-flop. The FF could be bypassed due to the programmable MX. So, the output $O_i$ of a CLB can be either combinational or registered. The existence of flip-flops allows organization of either registers or counters. Both these devices are used for FSM implementation.
To show the progress in FPGA characteristics, let us start from the family Spartan-3 by Xilinx [93]. They were introduced in 2002, were powered by 1.2 V and used the 90 nm technology. They included LUTs having 4 inputs. The chips of Spartan-3 included up to 104 EMBs with 18 Kb each. These blocks are named blocks of RAM (BRAM). So, the chips included up to 1.87 Mb of BRAM. The frequency of operation for these FPGAs was variable (from 25 MHz to 325 MHz). Some characteristics of the Spartan-3 family are shown in Table 2.1. The column SG contains the number of system gates for a chip. The column DRAM determines the capacity of memory created by LUTs. It is named a distributed random access memory (DRAM).
Table 2.1 Characteristics of Spartan-3 family
Device      Number of CLBs   SG (K)   BRAM capacity (Kbit)   DRAM capacity (Kbit)
XC3S50      1728             50       72                     12
XC3S200     4320             200      216                    30
XC3S400     8064             400      288                    56
XC3S1000    17280            1000     432                    120
XC3S1500    29952            1500     576                    208
XC3S2000    46080            2000     720                    320
XC3S4000    62208            4000     1728                   432
XC3S5000    74880            5000     1872                   520
The structure of CLBs has become more and more complex with the development of technology. For example, the CLB of Virtex-4 includes 4 slices having fast interconnections. A slice includes 2 LUTs, four multiplexers, arithmetic logic and two programmable flip-flops (Fig. 2.9).
Fig. 2.9 Structural diagram of a slice of Virtex-4 family
Each LUT of this slice has S = 4 inputs and can implement an arbitrary logic function depending on 4 variables. Using the multiplexer F5, both LUTs are viewed as a single LUT having S = 5. The multiplexer FX combines the outputs of F5 and FX from other slices. So, a slice can implement Boolean functions depending on 5 variables; two slices, on 6 variables; four slices (a CLB), on 7 variables. The arithmetic block allows organizing adders and multiplexers. Multiplexers Y and X determine input data for the programmable flip-flops. So, each CLB can include either RG or CT. The number of inputs per LUT is increased up to 5 for the Virtex-5 family, whereas CLBs of Virtex-6 and Virtex-7 include LUTs having S = 6. There are different modifications of FPGAs for each family. We do not discuss them. Some characteristics of modern FPGA chips by Xilinx are shown in Table 2.2. Analysis of Tables 2.1 and 2.2 proves our statement about the tremendous progress in FPGAs. Let us point out that modern chips include blocks of digital signal processors and central processing units. But these blocks are not used for FSM design, so we do not discuss them.
As follows from Table 2.2, modern FPGAs include huge blocks of memory. Let us name these blocks embedded memory blocks. Modern EMBs have a property of configurability. It means that they have the constant size ($V_0$) but both the numbers
of cells ($V$) and their outputs ($t_F$) can be changed. There are the following typical configurations of EMBs: 36K×1, 18K×2, 8K×4, 4K×8 (4K×9), 2K×16 (2K×18), 1K×32 (1K×36) and 512×64 (512×72) bits [1, 9, 22, 29, 34]. Let an EMB contain $V$ cells having $t_F$ outputs. Let $V_0$ be the number of cells for $t_F = 1$. Then the number $V$ can be determined as
$$V = V_0 / t_F. \tag{2.9}$$
Table 2.2 Characteristics of FPGAs by Xilinx
Family     Modification   Number of slices   BRAM capacity (Kbit)   DRAM capacity (Kbit)   Technology (nm)
Virtex-4   LX             10752 – 8908       1296 – 6048            168 – 1392             90
Virtex-4   SX             10240 – 24576      2304 – 5760            160 – 384              90
Virtex-4   FX             5472 – 63168       648 – 9936             89 – 987               90
Virtex-5   LX             4800 – 51840       1152 – 10368           320 – 3420             65
Virtex-5   LXT            3120 – 51840       936 – 11664            210 – 3420             65
Virtex-5   SXT            5440 – 37440       3024 – 18576           520 – 4200             65
Virtex-5   TXT            17280 – 24320      8208 – 11664           1500 – 2400            65
Virtex-5   FXT            5120 – 30720       2448 – 16416           390 – 2280             65
Virtex-6   LXT            11640 – 118560     5616 – 25920           1045 – 8280            40
Virtex-6   SXT            49200 – 74400      25344 – 28304          5090 – 7640            40
Virtex-6   HXT            39360 – 88560      18144 – 32832          3040 – 6370            40
Virtex-6   CXT            11640 – 37680      5616 – 14976           1045 – 3650            40
Virtex-7   T              44700 – 305400     14760 – 46512          3475 – 21550           28
Virtex-7   XT             64400 – 135000     31680 – 64800          6525 – 13275           28
Virtex-7   HT             45000 – 135000     21600 – 64800          4425 – 13275           28
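Relation (2.9) can be checked numerically with a short Python sketch. The value $V_0$ = 36K cells and the list of widths are assumptions chosen to reproduce some of the configurations listed in the text (36K×1, 18K×2, 4K×9, 2K×18, 1K×36, 512×72). The helper name `emb_cells` is hypothetical.

```python
def emb_cells(v0, t_f):
    """Number of cells V of an EMB configured with t_f outputs, V = V0 / t_F (2.9)."""
    if v0 % t_f != 0:
        raise ValueError("V0 must be divisible by t_F")
    return v0 // t_f

V0 = 36 * 1024  # assumed number of cells for t_F = 1
configurations = {t_f: emb_cells(V0, t_f) for t_f in (1, 2, 9, 18, 36, 72)}

for t_f, v in configurations.items():
    print(f"{v} x {t_f}")
```

Each doubling of the number of outputs halves the number of cells, so a wide system of functions can be stored only if it depends on few arguments.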
In a typical FPGA, 60% of power is consumed by the programmable interconnections and 16% by the programmable logic [111]. Replacement of LUTs by EMBs allows decreasing the number of interconnections. So, it is very important to use EMBs in implementing FSM circuits.
The exceptional complexity of FPGAs requires using computer-aided design (CAD) tools for designing logic circuits [27]. It assumes development of formal methods for synthesis and verification of control units [25, 67, 112, 113]. For example, a design process for FPGAs from Xilinx includes the following steps:
1. Specification of a project. A design entry can be executed by the schematic editor (if a design is represented by a circuit), by the state editor (if a design is represented by an STG) or as a program written in some hardware description language (HDL). The most popular HDLs are VHDL and Verilog [5, 114]. This initial specification is verified and corrected if necessary.
2. Logic synthesis. During this step, the package FPGA Express executes synthesis and optimization of an FSM logic circuit. As an outcome of this step, an FPGA netlist file is generated. This file is represented in either EDIF or XNF format. During this step, library cells from system and user libraries are used.
3. Simulation. The functional correctness of an FSM is checked. This step is executed without taking into account real propagation times in a chip. If the outcome of simulation is negative, then the previous steps should be repeated.
4. Implementation of logic circuit. Now the netlist is translated into an internal format of the CAD system. Such physical objects as CLBs and chip pins are assigned to the initial netlist elements. This step is named the packing. The step of mapping is the first stage of the packing. The mapping refers to the process of associating entities such as gate-level functions in the gate-level netlist with the LUT-level functions available on the FPGA [26]. It is not a one-to-one mapping because each LUT can be used to represent a number of logic gates [28]. The mapping step gives results for executing the packing, during which the LUTs and flip-flops are packed into the CLBs. Both mapping and packing are very difficult because there are many variants of their solutions. Following the packing, the step of place-and-route is executed. At this point it is known which CLBs are connected and which parts of the logic functions they implement. But there are many ways in which these CLBs could be placed in the FPGA. The placement problem is also very difficult because hundreds of thousands of CLBs should be placed. During the routing, it is necessary to decide how to connect all CLBs for a particular project. This step should be executed in a way giving the maximum possible performance. Obviously, the outcome of placement tremendously affects the outcome of routing. When routing is finished, the real performance could be found. Also, the bitstream is formed, which will be used for chip programming.
5. Project verification. The final simulation is performed using the actual values of delays among the physical elements of a chip. If the outcome of this step is negative (the actual performance of an FSM is lower than required), then the previous steps of the design process should be repeated.
6. Chip programming. This step consists in writing the final bitstream into the chip.
The step of logic synthesis plays one of the most important roles in the design process. Let us analyse this step for FPGA-based FSMs. The synthesis is a transformation of the initial specification of a project into the structural specification where elements of lower abstraction levels are used [115]. The synthesis process is repeated till each element to be assigned is represented by some library element. In the case of FPGA-based FSMs, the library elements are LUTs and EMBs. An FSM circuit includes LUTs and flip-flops. To get a structure of an FSM, the sequential synthesis is executed. It transforms specifications of an FSM (GSA, STG) into structure tables describing some parts of an FSM logic circuit. Next, the systems of Boolean functions are derived from those tables. The stage of logic synthesis follows the sequential synthesis. Now, the functions are transformed into smaller subsystems. Each of these subsystems could be implemented using either a LUT or an EMB of a particular FPGA chip. Both these steps are considered in our book. We combine them in a single stage of synthesis of the FSM logic circuit.
Now let us analyse the best-known methods of structural decomposition. In the next sections of this chapter we discuss general approaches and their implementation in FPGA-based FSMs.
2.3 Replacement of Logical Conditions
If an FSM structural diagram has only a single structural level, then let us name it P FSM. The structural diagram of P FSM is shown in Fig. 1.4. For P FSMs, functions (1.1), (1.2) could depend on up to R + L arguments. In many cases the value R + L exceeds the number of inputs of logic elements used for implementing an FSM circuit. As a result, the functions $D_r \in \Phi$ and $y_n \in Y$ are implemented as multilevel circuits. This leads to deterioration of FSM characteristics. To decrease the number of arguments in functions (1.1), (1.2), the method of replacement of logical conditions (RLC) could be used [15, 23, 24]. Let $X(a_m)$ be a set of logical conditions determining transitions from the state $a_m \in A$, where $X(a_m) \subseteq X$. Let us find the value of G determined as
$$G = \max(|X(a_1)|, \ldots, |X(a_M)|). \tag{2.10}$$
Let the following condition take place:
$$G \ll L. \tag{2.11}$$
In this case, the RLC could be used. The main idea of RLC is to find a set $P = \{p_1, \ldots, p_G\}$ whose elements replace the logical conditions $x_e \in X$. To replace the logical conditions, it is necessary to find the system (2.1). If it is found, then the systems (1.1), (1.2) are replaced by the systems (2.3) and (2.4), where the system (2.4) is the following:
$$Y = Y(T, P). \tag{2.12}$$
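The value G from (2.10) is simply the largest number of logical conditions tested in any single state. A minimal Python sketch follows. It uses the sets $X(a_m)$ of the Mealy FSM $S_1$ that is discussed as an example later in this section, while the function name `rlc_parameter` is hypothetical.

```python
def rlc_parameter(conditions_per_state):
    """G = max(|X(a_1)|, ..., |X(a_M)|), see (2.10)."""
    return max(len(conditions) for conditions in conditions_per_state.values())

# Sets X(a_m) of the example FSM S1 from this section.
X_by_state = {
    "a1": set(),
    "a2": {"x1", "x2"},
    "a3": {"x3"},
    "a4": {"x4"},
}

L = len(set().union(*X_by_state.values()))  # number of logical conditions
G = rlc_parameter(X_by_state)               # number of additional variables p_g

print(G, L)
```

For this small example G = 2 and L = 4, so two additional variables replace four logical conditions. The gain grows for FSMs with many conditions of which only a few are tested in each state.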
Fig. 2.10 Structural diagram of MP Mealy FSM
Fig. 2.11 Structural diagram of MP Moore FSM
The system (2.1) determines a block of RLC (BRLC). The systems (2.3) and (2.12) determine the block BF. The combination of BRLC, BF and RG determines the MP Mealy FSM (Fig. 2.10). In the case of MP Moore FSMs, there are three blocks of logic and the RG. The BRLC executes the operation of RLC, the BIMF generates functions (2.3), and the BMO generates functions (1.3). The structural diagram of the MP Moore FSM is shown in Fig. 2.11. Let us discuss the synthesis method of the MP Mealy FSM. The method includes the following steps:
1. Marking the initial GSA Γ and finding the set A.
2. Executing the state assignment.
3. Constructing the direct structure table of P FSM.
4. Executing the RLC.
5. Constructing the transformed DST.
6. Constructing the functions (2.1), (2.3), (2.12).
7. Implementing the FSM logic circuit.
Let us discuss an example of MP Mealy FSM synthesis, starting from the DST of Mealy FSM $S_1$ (Table 2.3), so steps 1–3 are already executed. The following sets and values can be found from Table 2.3: $X = \{x_1, \ldots, x_4\}$, $Y = \{y_1, \ldots, y_5\}$, $T = \{T_1, T_2\}$, $\Phi = \{D_1, D_2\}$, $L = 4$, $N = 5$, $R = 2$ and $H_0 = 8$. The following sets $X(a_m)$ can be derived from Table 2.3: $X(a_1) = \emptyset$, $X(a_2) = \{x_1, x_2\}$, $X(a_3) = \{x_3\}$, $X(a_4) = \{x_4\}$. Using (2.10), we find $G = 2$, which gives the set $P = \{p_1, p_2\}$.

Let us construct the table of RLC. The table has columns marked by the variables $p_g \in P$ and $M_0$ rows marked by the states $a_m \in A$. If a variable $p_g$ replaces a variable $x_e$ for the state $a_m \in A$, then the symbol $x_e$ is written at the intersection of the row $a_m$ and the column $p_g$. To minimize hardware in the BRLC, any variable $x_e \in X$ should always be placed in the same column $p_g$. There are formal methods for solving the RLC problem, but they are required only for complex FSMs [23, 24]. In this simple case, the table of RLC (Table 2.4) is constructed in the trivial way. We add the column $K(a_m)$ to Table 2.4; it is used for constructing the system (2.1). In the discussed case, it is the following system:

$$p_1 = \bar{T}_1 T_2 x_1 \vee T_1 \bar{T}_2 x_3; \qquad p_2 = \bar{T}_1 T_2 x_2 \vee T_1 T_2 x_4. \tag{2.13}$$
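The multiplexer behaviour of the BRLC can be illustrated with a minimal Python sketch. It assumes the state codes and the RLC assignments of Tables 2.3 and 2.4 ($a_2 = 01$, $a_3 = 10$, $a_4 = 11$); the names `RLC_TABLE`, `brlc_mux`, `brlc_sop` and `views_agree` are hypothetical and serve only this illustration.

```python
from itertools import product

# State codes K(a_m) and the RLC table, copied from Tables 2.3 and 2.4.
STATE_CODES = {"a1": (0, 0), "a2": (0, 1), "a3": (1, 0), "a4": (1, 1)}
RLC_TABLE = {
    "a1": {},
    "a2": {"p1": "x1", "p2": "x2"},
    "a3": {"p1": "x3"},
    "a4": {"p2": "x4"},
}

def rlc_size(table):
    """G from (2.10): the largest number of conditions tested in one state."""
    return max(len(row) for row in table.values())

def brlc_mux(state, x):
    """Multiplexer view: the state selects which x_e drives each p_g."""
    row = RLC_TABLE[state]
    return tuple(x[row[p]] if p in row else 0 for p in ("p1", "p2"))

def brlc_sop(t1, t2, x):
    """Sum-of-products view: one product term per filled cell of the RLC table."""
    p1 = ((1 - t1) & t2 & x["x1"]) | (t1 & (1 - t2) & x["x3"])
    p2 = ((1 - t1) & t2 & x["x2"]) | (t1 & t2 & x["x4"])
    return (p1, p2)

def views_agree():
    """Compare both views for every state and every input vector."""
    for state, (t1, t2) in STATE_CODES.items():
        for bits in product((0, 1), repeat=4):
            x = dict(zip(("x1", "x2", "x3", "x4"), bits))
            if brlc_mux(state, x) != brlc_sop(t1, t2, x):
                return False
    return True
```

Each product term in `brlc_sop` pairs the code of one state with the condition written in that state's row, which is why two multiplexers controlled by $T_1, T_2$ are enough for this block.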
Obviously, $p_1 = x_1$ for $K(a_2)$ and $p_1 = x_3$ for $K(a_3)$. So, functions (2.1) belong to the class of multiplexed functions, and two multiplexers (MXs) are necessary to implement the BRLC corresponding to (2.13). To construct the systems (2.3) and (2.12), it is necessary to construct the transformed DST of the MP Mealy FSM. In this table, the column $X_h$ of the initial DST is replaced by the column $P_h$, which includes the variables $p_g \in P$. For example, the conjunction $\bar{x}_1 \bar{x}_2$ (row 4 of Table 2.3) should be replaced by the conjunction $\bar{p}_1 \bar{p}_2$, as follows from Table 2.4. The transformed DST for the discussed case is represented by Table 2.5. The functions $D_r \in \Phi$ and $y_n \in Y$ depend on terms $F_h$ represented as

$$F_h = A_m P_h \quad (h = \overline{1, H_0}). \tag{2.14}$$

In (2.14), the symbol $P_h$ stands for the conjunction of the variables $p_g \in P$ from row $h$. For example, $F_1 = \bar{T}_1 \bar{T}_2$, $F_2 = \bar{T}_1 T_2 p_1$, $F_3 = \bar{T}_1 T_2 \bar{p}_1 p_2$ and so on.
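Table 2.3 DST of P Mealy FSM $S_1$

| $a_m$ | $K(a_m)$ | $a_s$ | $K(a_s)$ | $X_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|
| $a_1$ | 00 | $a_2$ | 01 | 1 | $D_2$ | 1 |
| $a_2$ | 01 | $a_2$ | 01 | $x_1$ | $D_2$ | 2 |
| | | $a_3$ | 10 | $\bar{x}_1 x_2$ | $D_1$ | 3 |
| | | $a_4$ | 11 | $\bar{x}_1 \bar{x}_2$ | $D_1 D_2$ | 4 |
| $a_3$ | 10 | $a_1$ | 00 | $x_3$ | − | 5 |
| | | $a_3$ | 10 | $\bar{x}_3$ | $D_1$ | 6 |
| $a_4$ | 11 | $a_1$ | 00 | $x_4$ | − | 7 |
| | | $a_4$ | 11 | $\bar{x}_4$ | $D_1 D_2$ | 8 |

Column $Y_h$, rows 1 to 8 read top to bottom: y1 y2 y3 y2 y4 y1 y3 y5 y1 y2 y1 y3 y5 y1 y4 y3

Table 2.4 Table of RLC for MP FSM $S_1$

| $a_m$ | $p_1$ | $p_2$ | $K(a_m)$ |
|---|---|---|---|
| $a_1$ | − | − | 00 |
| $a_2$ | $x_1$ | $x_2$ | 01 |
| $a_3$ | $x_3$ | − | 10 |
| $a_4$ | − | $x_4$ | 11 |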
Table 2.5 Transformed DST of MP Mealy FSM $S_1$

| $a_m$ | $K(a_m)$ | $a_s$ | $K(a_s)$ | $P_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|
| $a_1$ | 00 | $a_2$ | 01 | 1 | $D_2$ | 1 |
| $a_2$ | 01 | $a_2$ | 01 | $p_1$ | $D_2$ | 2 |
| | | $a_3$ | 10 | $\bar{p}_1 p_2$ | $D_1$ | 3 |
| | | $a_4$ | 11 | $\bar{p}_1 \bar{p}_2$ | $D_1 D_2$ | 4 |
| $a_3$ | 10 | $a_1$ | 00 | $p_1$ | − | 5 |
| | | $a_3$ | 10 | $\bar{p}_1$ | $D_1$ | 6 |
| $a_4$ | 11 | $a_1$ | 00 | $p_2$ | − | 7 |
| | | $a_4$ | 11 | $\bar{p}_2$ | $D_1 D_2$ | 8 |

Column $Y_h$, rows 1 to 8 read top to bottom: y1 y2 y3 y2 y4 y1 y3 y5 y1 y2 y1 y3 y5 y1 y4 y3

Fig. 2.12 Logic circuit of MP Mealy FSM $S_1$
The functions (2.3) are represented in the form (1.7). The functions (2.12) are represented as the SOP (1.9). For example, the following Boolean functions can be derived from Table 2.5: $D_1 = F_3 \vee F_4 \vee F_6 \vee F_8$, $y_1 = F_1 \vee F_4 \vee F_5 \vee F_6$. Suppose a designer can use MXs having 2 control inputs and PLAs having 4 inputs, 8 outputs and 8 product terms. If only PLAs are used, then 2 chips are necessary for implementing systems (1.1), (1.2). But a single PLA is enough for the MP Mealy FSM $S_1$ (Fig. 2.12). In this circuit, $MX_1$ implements the equation for $p_1$ and $MX_2$ the equation for $p_2$. The control inputs of the MXs are connected to the state variables. A single PLA implements the input memory functions $D_r \in \Phi$ and the microoperations $y_n \in Y$, which depend on terms (2.14). The register RG includes two D flip-flops with common inputs for clearing (R) and synchronization (C).

Analysis of Figs. 1.2 and 1.3 shows that the RLC was used by M. Wilkes in 1951. The same approach was used for hardware reduction in FSM circuits based on customized matrices [3, 117], PLAs [23, 24, 118], PALs [4] and nano-PLAs [119]. So, this approach is universal and can be used in the case of FPGA-based FSMs.
Fig. 2.13 Structural diagrams of RLC-based Mealy FSMs: (a) BRLC implemented as LUTer; (b) BRLC implemented as EMBer1
In FPGA-based FSMs, the RLC is used when EMBs implement FSM circuits. This approach is discussed in many articles, for example, in [15, 46, 89, 101, 104, 106, 123]. In all cases, the structural diagram includes a block consisting of LUTs and a block consisting of EMBs. Let us denote the first block as LUTer and the second as EMBer.

Let us discuss RLC-based Mealy FSMs. An FSM can be implemented using a single EMB if the following condition holds [124]:

$$2^{L+R_0}(N + R_0) \le V_0. \tag{2.15}$$

If (2.15) is violated, then two RLC-based models are possible (Fig. 2.13). As follows from Fig. 2.13, the BRLC can be implemented either as LUTer (Fig. 2.13a) or as EMBer1 (Fig. 2.13b). The BF is implemented as EMBer. We assume that EMBs can be synchronized [87, 93]; because of this, there is no RG in these circuits. We use the same approach throughout this book. Let the following conditions hold:

$$2^{G+R_0}(N + R_0) \le V_0; \tag{2.16}$$

$$R_0 + N \le t_F. \tag{2.17}$$

The value of $t_F$ in (2.17) is taken from the EMB configuration $\langle S_A, t_F \rangle$ for which the condition (2.16) holds. The symbol $S_A$ stands for the number of address inputs of the EMB. If conditions (2.16), (2.17) are true, then there is only a single EMB in EMBer. If condition (2.16) holds and condition (2.17) is violated, then $n_E$ EMBs are necessary in EMBer:

$$n_E = \left\lceil \frac{R_0 + N}{t_F} \right\rceil. \tag{2.18}$$

If condition (2.16) is violated, then there is no sense in using RLC. We do not discuss Moore FSMs in this section; they have some specifics discussed a bit further.

Let an EMB have a configuration $\langle S_A, t_F \rangle$ such that condition (2.17) holds, as well as the following condition:
Fig. 2.14 Splitting set of logical conditions for RLC-based Mealy FSMs
Fig. 2.15 Transformation of GSA (panels a, b, c)
$$S_A - (L + R) = \Delta_S > 0. \tag{2.19}$$
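The case analysis behind conditions (2.15)–(2.18) reduces to a few integer comparisons, sketched below in Python. The function names and the sample EMB parameters (a 4096-bit block in a 512 x 8 configuration) are hypothetical and chosen only to exercise each branch; they do not describe any particular FPGA family.

```python
from math import ceil

def fits_capacity(n_inputs, r0, n, v0):
    """Conditions (2.15) and (2.16): 2**(n_inputs + R0) * (N + R0) <= V0."""
    return 2 ** (n_inputs + r0) * (n + r0) <= v0

def emb_count(r0, n, t_f):
    """Formula (2.18): EMBs needed to supply R0 + N outputs, t_F per EMB."""
    return ceil((r0 + n) / t_f)

def choose_model(l, g, r0, n, v0, t_f):
    """Walk through the case analysis of conditions (2.15)-(2.18)."""
    if fits_capacity(l, r0, n, v0):       # (2.15) holds
        return "single EMB, no RLC needed"
    if not fits_capacity(g, r0, n, v0):   # (2.16) violated
        return "RLC does not help"
    return f"RLC-based FSM with {emb_count(r0, n, t_f)} EMB(s) in EMBer"

# Hypothetical FSM: L = 10, G = 3, R0 = 4, N = 12; hypothetical EMB: V0 = 4096, t_F = 8.
print(choose_model(10, 3, 4, 12, 4096, 8))
```

With these assumed numbers, (2.15) fails, (2.16) holds and (2.17) fails, so formula (2.18) gives two EMBs. The sketch leaves out condition (2.19), which concerns spare address inputs of the chosen EMB configuration and is the case considered next.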
If condition (2.19) holds, the block BRLC can be optimized [15, 125]. Let us split the set $X$ into the sets $X^1$ and $X^2$, where $|X^1| = L - \Delta_S$ and $|X^2| = \Delta_S$. Obviously, it is necessary to replace only the variables $x_e \in X^1$. The variables $x_e \in X^2$ enter EMBer directly (Fig. 2.14). Comparison of Fig. 2.13a and Fig. 2.14 shows that there should be fewer LUTs in the LUTer of an FSM with splitting of the set of logical conditions. If the splitting decreases $G$, then the process can be repeated.

Three methods are known [118] that lead to hardware reduction in the BRLC:

1. Transformation of the initial GSA $\Gamma$.
2. Special encoding of states.
3. Encoding of logical conditions.

Let us briefly discuss these methods. Consider Fig. 2.15. The transitions from state $a_6$ (Fig. 2.15a) depend on three logical conditions. Let the subgraph (Fig. 2.15a) be a part of some GSA $\Gamma_1$ with $M_0 = 20$. As follows from Fig. 2.15a, the value of $G$ cannot be less than 3. Let us introduce an operational vertex into this subgraph (Fig. 2.15b). Now $X(a_6) = \{x_1, x_3\}$, so $G = 2$ becomes possible. But a new state $a_{21}$ is added to the set $A$.
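Table 2.6 Table of RLC for Mealy FSM $S_2$

| $p_g \backslash a_m$ | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ | $a_9$ | $a_{10}$ | $a_{11}$ | $a_{12}$ | $a_{13}$ | $a_{14}$ | $a_{15}$ | $a_{16}$ | $a_{17}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $p_1$ | $x_1$ | − | − | − | $x_3$ | − | − | $x_5$ | − | − | − | − | − | − | $x_6$ | − | − |
| $p_2$ | $x_2$ | − | − | − | $x_4$ | − | − | − | − | − | − | − | − | − | $x_7$ | − | − |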
Fig. 2.16 Outcome of special state encoding for S2
If two vertices are introduced into the initial subgraph, then two states are added to $A$ (Fig. 2.15c), but now $X(a_6) = \{x_1\}$. This approach decreases the value of $G$. To apply it, we should transform all subgraphs where the cardinality of $X(a_m)$ exceeds the desired value of $G$. Decreasing $G$ leads, for example, to a larger set $X^2$ in the case of splitting $X$, which can reduce the amount of hardware in LUTer. But this approach has two negative features. First, $R_0$ can increase due to the additional states. Second, more cycles of FSM operation are necessary for executing the control algorithm than in the case of the initial GSA $\Gamma$.

Now let us discuss the special encoding of states. Consider Table 2.6, which shows the RLC for some Mealy FSM $S_2$. There is $M_0 = 17$ and $R_0 = 5$ for $S_2$, but only $M_A = 4$ states have $X(a_m) \ne \emptyset$. Let us use the code with all zeros for $a_1 \in A$; the next codes, starting from 00001, are reserved for the states with conditional transitions. The remaining codes are used for the other states (Fig. 2.16). Using Table 2.6, we can find the following equations: $p_1 = A_1 x_1 \vee A_5 x_3 \vee A_8 x_5 \vee A_{15} x_6$; $p_2 = A_1 x_2 \vee A_5 x_4 \vee A_{15} x_7$. Using the codes from the Karnaugh map (Fig. 2.16), these equations can be transformed into the system:

$$\begin{aligned} p_1 &= \bar{T}_2 \bar{T}_3 x_1 \vee \bar{T}_2 T_3 x_3 \vee T_2 \bar{T}_3 x_5 \vee T_2 T_3 x_6;\\ p_2 &= \bar{T}_2 \bar{T}_3 x_2 \vee \bar{T}_2 T_3 x_4 \vee T_2 T_3 x_7. \end{aligned} \tag{2.20}$$
As follows from (2.20), only the state variables $T_2, T_3$ enter the BRLC. Let us form the set $T' = \{T_2, T_3\}$. In the general case the FSM from Fig. 2.13a, for example, is represented by the structural diagram shown in Fig. 2.17. If $|T'| < |T|$, then this approach reduces the number of LUTs. We do not see any negative effect of this approach.
Fig. 2.17 Structural diagram of Mealy FSM with special state encoding
Fig. 2.18 Structural diagram of Mealy FSM with encoding of logical conditions
The encoding of logical conditions can be executed for rather small values of $G$ [118]. Let the set $X(p_g) \subseteq X$ include the logical conditions replaced by the variable $p_g \in P$, and let $|X(p_g)| = L_G$. Let us encode the logical conditions $x_e \in X(p_g)$ by binary codes $K(x_e)$ having $R_g$ bits:

$$R_g = \lceil \log_2 L_G \rceil. \tag{2.21}$$

Let us use the variables $z_r \in Z_g$ for the encoding, where $|Z_g| = R_g$. It leads to the circuit shown in Fig. 2.18, where the set $Z$ is determined as $Z_1 \cup Z_2 \cup \ldots \cup Z_G$. For example, in the case of $S_2$, there are $K(x_1) = K(x_2) = 00$, $K(x_3) = K(x_4) = 01$, $K(x_5) = 10$ and $K(x_6) = K(x_7) = 11$. Because the same codes $K(x_e)$ are always generated together, only two variables are enough for the encoding. So, there is $Z = \{z_1, z_2\}$. Now, the system (2.20) is transformed into the system (2.22):

$$\begin{aligned} p_1 &= \bar{z}_1 \bar{z}_2 x_1 \vee \bar{z}_1 z_2 x_3 \vee z_1 \bar{z}_2 x_5 \vee z_1 z_2 x_6;\\ p_2 &= \bar{z}_1 \bar{z}_2 x_2 \vee \bar{z}_1 z_2 x_4 \vee z_1 z_2 x_7. \end{aligned} \tag{2.22}$$

Using this method requires that the following condition holds:

$$t_F - (R_0 + N) > |Z|. \tag{2.23}$$

So, the EMBs implementing the circuit of EMBer should have free outputs.
2.4 Encoding of Microoperations

The main aim of these methods is to decrease the number of arguments in functions (1.2) as compared to $L + R_0$ or $G + R_0$ [4]. These methods were proposed in the mid-1950s to reduce the capacity of ROMs in computers with microprogrammed control [38, 126, 129–135]. There are two main approaches used for optimizing the FSM part implementing system (1.2):

1. The encoding of collections of microoperations.
2. The encoding of the fields of compatible microoperations.

Let us discuss these approaches using the DST of Mealy FSM $S_3$ (Table 2.7).

Table 2.7 Direct structure table of Mealy FSM $S_3$

| $a_m$ | $K(a_m)$ | $a_s$ | $K(a_s)$ | $X_h$ | $Y_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 000 | $a_2$ | 001 | $x_1$ | $y_1 y_2 y_3$ | $D_3$ | 1 |
| | | $a_3$ | 010 | $\bar{x}_1$ | $y_1 y_5 y_6$ | $D_2$ | 2 |
| $a_2$ | 001 | $a_4$ | 011 | $x_2 x_3$ | $y_7 y_8$ | $D_2 D_3$ | 3 |
| | | $a_5$ | 100 | $x_2 \bar{x}_3$ | $y_9$ | $D_1$ | 4 |
| | | $a_2$ | 001 | $\bar{x}_2$ | $y_2 y_8$ | $D_3$ | 5 |
| $a_3$ | 010 | $a_4$ | 011 | 1 | $y_5 y_7$ | $D_2 D_3$ | 6 |
| $a_4$ | 011 | $a_5$ | 100 | $x_3$ | $y_4 y_6 y_8$ | $D_1$ | 7 |
| | | $a_6$ | 101 | $\bar{x}_3 x_4$ | $y_7 y_8$ | $D_1 D_3$ | 8 |
| | | $a_6$ | 101 | $\bar{x}_3 \bar{x}_4$ | $y_1 y_2 y_3$ | $D_1 D_3$ | 9 |
| $a_5$ | 100 | $a_1$ | 000 | $x_5$ | − | − | 10 |
| | | $a_6$ | 101 | $\bar{x}_5$ | $y_1 y_5 y_6$ | $D_1 D_3$ | 11 |
| $a_6$ | 101 | $a_1$ | 000 | 1 | $y_9$ | − | 12 |

Fig. 2.19 Structural diagram of PY Mealy FSM

There are $Q = 8$ different collections of microoperations (CMOs) in Table 2.7: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_2, y_3\}$, $Y_3 = \{y_1, y_5, y_6\}$, $Y_4 = \{y_7, y_8\}$, $Y_5 = \{y_9\}$, $Y_6 = \{y_2, y_8\}$, $Y_7 = \{y_5, y_7\}$, $Y_8 = \{y_4, y_6, y_8\}$. Also, the following parameters can be found: $R_0 = 3$, $N = 9$, $H_0 = 12$, $L = 5$.

Let us discuss the methods of encoding of CMOs. Let us encode each CMO $Y_q \subseteq Y$ by a binary code $K(Y_q)$ having $R_Q$ bits:

$$R_Q = \lceil \log_2 Q \rceil. \tag{2.24}$$
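Finding the CMOs and the code width (2.24) is mechanical, as the following Python sketch suggests. It assumes the column $Y_h$ of Table 2.7 as input; the names `Y_COLUMN`, `collections_of_microoperations` and `code_width` are hypothetical.

```python
# Column Y_h of the DST of S3 (Table 2.7), one entry per row h = 1..12.
Y_COLUMN = [
    {"y1", "y2", "y3"}, {"y1", "y5", "y6"}, {"y7", "y8"}, {"y9"},
    {"y2", "y8"}, {"y5", "y7"}, {"y4", "y6", "y8"}, {"y7", "y8"},
    {"y1", "y2", "y3"}, set(), {"y1", "y5", "y6"}, {"y9"},
]

def collections_of_microoperations(y_column):
    """Return the distinct CMOs in order of first appearance, empty CMO first."""
    cmos = [frozenset()]
    for cell in y_column:
        if frozenset(cell) not in cmos:
            cmos.append(frozenset(cell))
    return cmos

def code_width(q):
    """R_Q from (2.24): ceil(log2(q)), computed with integers only."""
    return max(1, (q - 1).bit_length())

cmos = collections_of_microoperations(Y_COLUMN)
print(len(cmos), code_width(len(cmos)))
```

Listing the CMOs in order of first appearance reproduces the numbering $Y_1, \ldots, Y_8$ used in the text, so `cmos[5]` corresponds to $Y_6 = \{y_2, y_8\}$.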
Let us use the variables $z_r \in Z$ for the encoding of CMOs, where $|Z| = R_Q$. Now a Mealy FSM can be implemented as a PY Mealy FSM (Fig. 2.19). In the PY Mealy FSM, the block of functions implements the systems (1.1) and (2.25):

$$Z = Z(T, X). \tag{2.25}$$
Fig. 2.20 Codes of CMOs for FSM S3
The BMO implements the system (2.4). Functions (2.4) are regular, so the BMO can be implemented using memory blocks (ROM, RAM, EMB). The design method for a PY FSM includes the following steps:

1. Marking the initial GSA $\Gamma$ and finding the set $A$.
2. Executing the state assignment.
3. Constructing the DST of the P Mealy FSM.
4. Finding the collections of microoperations $Y_q \subseteq Y$.
5. Encoding of the CMOs $Y_q \subseteq Y$.
6. Constructing the transformed DST.
7. Constructing the systems (1.1), (2.4) and (2.25).
8. Implementing the FSM logic circuit.
Steps 1–4 are already executed for Mealy FSM $S_3$. Let us discuss step 5. If a memory block is used for implementing the BMO, then the codes $K(Y_q)$ do not affect the hardware amount. But if other logic elements are used, it is necessary to optimize the system (2.4). In the discussed case, the following SBF can be constructed for the MOs $y_n \in Y$:

$$\begin{aligned} y_1 &= Y_2 \vee Y_3; & y_4 &= Y_8; & y_7 &= Y_4 \vee Y_7;\\ y_2 &= Y_2 \vee Y_6; & y_5 &= Y_3 \vee Y_7; & y_8 &= Y_4 \vee Y_6 \vee Y_8;\\ y_3 &= Y_2; & y_6 &= Y_3 \vee Y_8; & y_9 &= Y_5. \end{aligned} \tag{2.26}$$

Let us encode the CMOs in a way that minimizes the number of product terms in (2.26). The initial system (2.26) has 16 terms. Using (2.24), the value $R_Q = 3$ can be found. It gives the set $Z = \{z_1, z_2, z_3\}$. Let us encode the CMOs as shown in the Karnaugh map (Fig. 2.20). Using the codes from Fig. 2.20, it is possible to transform (2.26) into the following system:

$$\begin{aligned} y_1 &= z_1 \bar{z}_2; & y_2 &= z_1 \bar{z}_3; & y_3 &= z_1 \bar{z}_2 \bar{z}_3;\\ y_4 &= z_1 z_2 z_3; & y_5 &= z_1 \bar{z}_2 z_3 \vee \bar{z}_1 z_2 z_3; & y_6 &= z_1 z_3;\\ y_7 &= \bar{z}_1 z_2; & y_8 &= z_2 \bar{z}_3 \vee z_1 z_2; & y_9 &= \bar{z}_1 \bar{z}_2 z_3. \end{aligned} \tag{2.27}$$

There are 11 terms in system (2.27), about 31% fewer than in system (2.26). In addition, some functions of (2.27) depend on fewer than three variables. Different algorithms for the encoding of CMOs can be found in [2, 23, 24].

The transformation of the initial DST is reduced to the replacement of the column $Y_h$ by the column $Z_h$. The column $Z_h$ contains the variables $z_r \in Z$ equal to 1 in the code $K(Y_q)$ of the CMO from the $h$-th row of the initial DST. For example, the CMO $Y_6$ is written in row 5 of Table 2.7. Because $K(Y_6) = 110$ (see Fig. 2.20), there are symbols $z_1$ and $z_2$ in row 5 of the transformed DST (Table 2.8).

Table 2.8 Transformed DST of Mealy FSM $S_3$

| $a_m$ | $K(a_m)$ | $a_s$ | $K(a_s)$ | $X_h$ | $Z_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 000 | $a_2$ | 001 | $x_1$ | $z_1$ | $D_3$ | 1 |
| | | $a_3$ | 010 | $\bar{x}_1$ | $z_1 z_3$ | $D_2$ | 2 |
| $a_2$ | 001 | $a_4$ | 011 | $x_2 x_3$ | $z_2$ | $D_2 D_3$ | 3 |
| | | $a_5$ | 100 | $x_2 \bar{x}_3$ | $z_3$ | $D_1$ | 4 |
| | | $a_2$ | 001 | $\bar{x}_2$ | $z_1 z_2$ | $D_3$ | 5 |
| $a_3$ | 010 | $a_4$ | 011 | 1 | $z_2 z_3$ | $D_2 D_3$ | 6 |
| $a_4$ | 011 | $a_5$ | 100 | $x_3$ | $z_1 z_2 z_3$ | $D_1$ | 7 |
| | | $a_6$ | 101 | $\bar{x}_3 x_4$ | $z_2$ | $D_1 D_3$ | 8 |
| | | $a_6$ | 101 | $\bar{x}_3 \bar{x}_4$ | $z_1$ | $D_1 D_3$ | 9 |
| $a_5$ | 100 | $a_1$ | 000 | $x_5$ | − | − | 10 |
| | | $a_6$ | 101 | $\bar{x}_5$ | $z_1 z_3$ | $D_1 D_3$ | 11 |
| $a_6$ | 101 | $a_1$ | 000 | 1 | $z_3$ | − | 12 |

Fig. 2.21 Structural diagrams of FPGA-based PY Mealy FSMs (panels a–d)

The functions (1.1) and (2.25) depend on terms (1.5). It is possible to use the unused state codes for minimizing these functions; in the discussed case these codes are 110 and 111. After minimizing, the following equations can, for example, be derived from Table 2.8: $D_1 = \bar{T}_1 \bar{T}_2 T_3 x_2 \bar{x}_3 \vee T_2 T_3 \vee T_1 \bar{T}_3 \bar{x}_5$; $z_1 = \bar{T}_1 \bar{T}_2 \bar{T}_3 \vee \bar{T}_1 \bar{T}_2 T_3 \bar{x}_2 \vee T_2 T_3 x_3 \vee T_2 T_3 \bar{x}_4 \vee T_1 \bar{T}_3 \bar{x}_5$. We do not discuss the last step; its execution depends on the logic elements used. If FPGAs are used for implementing PY FSMs, then four different implementations are possible (Fig. 2.21).

To decrease the number of arguments in the input memory functions, it is possible to use the RLC together with the encoding of CMOs. For example, using RLC for the FSM in Fig. 2.21c leads to the MPY Mealy FSM shown in Fig. 2.22a. If the splitting of logical conditions is used, then, for example, the FSM in Fig. 2.21d turns into the MPY FSM shown in Fig. 2.22b.
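The construction of the column $Z_h$ and the behaviour of the BMO can be checked with a short Python sketch. The codes $K(Y_q)$ below are an assumption: Fig. 2.20 is not reproduced here, so they are inferred from system (2.27) and from $K(Y_6) = 110$. The term for $y_8$ is taken as $z_2 \bar{z}_3 \vee z_1 z_2$, so that it covers exactly $Y_4$, $Y_6$ and $Y_8$. All names are hypothetical.

```python
# Column Y_h of Table 2.7 (rows h = 1..12).
Y_COLUMN = [
    {"y1", "y2", "y3"}, {"y1", "y5", "y6"}, {"y7", "y8"}, {"y9"},
    {"y2", "y8"}, {"y5", "y7"}, {"y4", "y6", "y8"}, {"y7", "y8"},
    {"y1", "y2", "y3"}, set(), {"y1", "y5", "y6"}, {"y9"},
]

# Assumed codes K(Y_q) = (z1, z2, z3), inferred from (2.27) and K(Y6) = 110.
CMO_CODES = {
    frozenset(): (0, 0, 0),
    frozenset({"y1", "y2", "y3"}): (1, 0, 0),
    frozenset({"y1", "y5", "y6"}): (1, 0, 1),
    frozenset({"y7", "y8"}): (0, 1, 0),
    frozenset({"y9"}): (0, 0, 1),
    frozenset({"y2", "y8"}): (1, 1, 0),
    frozenset({"y5", "y7"}): (0, 1, 1),
    frozenset({"y4", "y6", "y8"}): (1, 1, 1),
}

def z_column(y_column):
    """Column Z_h: the variables z_r equal to 1 in the code of each row's CMO."""
    return [
        [f"z{r + 1}" for r, bit in enumerate(CMO_CODES[frozenset(cell)]) if bit]
        for cell in y_column
    ]

def bmo(z1, z2, z3):
    """Block BMO: the set of microoperations generated for a code z1 z2 z3."""
    n1, n2, n3 = 1 - z1, 1 - z2, 1 - z3
    outputs = {
        "y1": z1 & n2, "y2": z1 & n3, "y3": z1 & n2 & n3,
        "y4": z1 & z2 & z3, "y5": (z1 & n2 & z3) | (n1 & z2 & z3),
        "y6": z1 & z3, "y7": n1 & z2,
        "y8": (z2 & n3) | (z1 & z2), "y9": n1 & n2 & z3,
    }
    return {y for y, value in outputs.items() if value}

def codes_decode_correctly():
    """Every code must decode back to exactly its own CMO."""
    return all(bmo(*code) == set(cmo) for cmo, code in CMO_CODES.items())
```

Under these assumed codes, `z_column(Y_COLUMN)` reproduces the column $Z_h$ row by row, and `codes_decode_correctly()` confirms that the BMO restores every CMO from its code.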
Fig. 2.22 Structural diagram of MPY Mealy FSMs without (a) and with (b) splitting logical conditions
The problem of encoding of the fields of compatible MOs was considered by S. Schwartz in 1968 [136]. Formal methods for solving this problem can be found in [137–139]. These methods are analyzed in [140]. Let us consider the problem.

The microoperations $y_n, y_m \in Y$ are compatible if the following relation holds:

$$y_n \in Y_q \rightarrow y_m \notin Y_q \quad (q = \overline{1, Q}). \tag{2.28}$$

So, compatible MOs never appear together in the same CMO. Let us find a partition $\Pi_Y = \{Y^1, \ldots, Y^I\}$ of the set $Y$ such that:

$$\begin{aligned} &Y^i \cap Y^j = \emptyset \quad (i \ne j;\ i, j \in \{1, \ldots, I\});\\ &Y = \bigcup_{i=1}^{I} Y^i;\\ &R_Q = \sum_{i=1}^{I} \lceil \log_2(|Y^i| + 1) \rceil \rightarrow \min. \end{aligned} \tag{2.29}$$

The third line of (2.29) means that the total number of bits in the codes of MOs should be minimal. Let us explain it. For each set $Y^i$, the microoperations are encoded separately. It is possible that some CMO $Y_q \subseteq Y$ does not contain MOs from the set $Y^i$. This situation corresponds to the code with all zeros. It means that $|Y^i| + 1$ objects should be encoded for each set $Y^i \subseteq Y$. Let us encode each MO $y_n \in Y^i$ by a binary code $K(y_n)$ having $R_i$ bits:

$$R_i = \lceil \log_2(|Y^i| + 1) \rceil. \tag{2.30}$$

Let us use the variables $z_r \in Z^i$ for such encoding, where $R_i = |Z^i|$. Let us form the set $Z = Z^1 \cup Z^2 \cup \ldots \cup Z^I$. A decoder (DC) with $R_i$ inputs is enough to generate the MOs $y_n \in Y^i$. So, the block BMO is organized as a set of decoders $DC_i$ $(i = \overline{1, I})$. Because of this, the symbol PD is used to denote FSMs with encoding of the fields of compatible MOs. The term "field" means that some bits of the microinstruction format [140] are used to represent the set $Y^i$ $(i = \overline{1, I})$.
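Table 2.9 Encoding of microoperations for PD Mealy FSM $S_3$

| $y_n \in Y^1$ | $K(y_n)$: $z_1 z_2$ | $y_n \in Y^2$ | $K(y_n)$: $z_3 z_4$ | $y_n \in Y^3$ | $K(y_n)$: $z_5 z_6$ |
|---|---|---|---|---|---|
| $\emptyset$ | 00 | $\emptyset$ | 00 | $\emptyset$ | 00 |
| $y_1$ | 01 | $y_2$ | 01 | $y_3$ | 01 |
| $y_4$ | 10 | $y_6$ | 10 | $y_5$ | 10 |
| $y_7$ | 11 | $y_9$ | 11 | $y_8$ | 11 |

Table 2.10 Encoding of microoperations for PD Mealy FSM $S_3$ (codes of CMOs)

| $Y_q$ | $K(Y_q)$: $z_1 z_2 z_3 z_4 z_5 z_6$ | $Y_q$ | $K(Y_q)$: $z_1 z_2 z_3 z_4 z_5 z_6$ | $Y_q$ | $K(Y_q)$: $z_1 z_2 z_3 z_4 z_5 z_6$ |
|---|---|---|---|---|---|
| $Y_1$ | 000000 | $Y_4$ | 110011 | $Y_7$ | 110010 |
| $Y_2$ | 010101 | $Y_5$ | 001100 | $Y_8$ | 101011 |
| $Y_3$ | 011010 | $Y_6$ | 000111 | − | − |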
The structural diagrams of PY and PD FSMs are identical. So, the structural diagram of the PD Mealy FSM is the same as shown in Fig. 2.19. The design method for a PD FSM differs from the one for a PY FSM in steps 4 and 5:

4. Finding the partition $\Pi_Y$ corresponding to (2.29).
5. Encoding of the fields of compatible microoperations.

Let us consider the FSM $S_3$ (Table 2.7). Steps 1–3 are already executed for the synthesis of the PD FSM. Using results from [138], the following partition can be found: $\Pi_Y = \{Y^1, Y^2, Y^3\}$, where $Y^1 = \{y_1, y_4, y_7\}$, $Y^2 = \{y_2, y_6, y_9\}$ and $Y^3 = \{y_3, y_5, y_8\}$. Using (2.30), the values $R_1 = R_2 = R_3 = 2$ can be found. It means that $R_Q = 6$ and $Z = \{z_1, \ldots, z_6\}$. Let us encode the MOs $y_n \in Y^i$ as shown in Table 2.9. These codes have no influence on the hardware of the BMO, so we encode them in the trivial way. Now it is possible to find the codes $K(Y_q)$ represented as concatenations of the codes $K(y_n)$. These codes are shown in Table 2.10. Let us explain them. For example, $Y_2 = \{y_1, y_2, y_3\}$, where $K(y_1) = 01$, $K(y_2) = 01$, $K(y_3) = 01$. So, there is $K(Y_2) = 010101$. The next example is $Y_5 = \{y_9\}$. Because $y_9 \in Y^2$ and $K(y_9) = 11$, we have $K(Y_5) = 001100$. And so on.

The transformed DST of Mealy FSM $S_3$ is constructed in the same way as for the PY FSM. Let us call this table the DST of the PD Mealy FSM. The DST of PD Mealy FSM $S_3$ is shown in Table 2.11. The column $Z_h$ is filled using the column $Y_h$ (Table 2.7) and the codes $K(Y_q)$ from Table 2.10. Of course, it is also possible to take the codes $K(y_n)$ from Table 2.9.

Table 2.11 DST of PD Mealy FSM $S_3$

| $a_m$ | $K(a_m)$ | $a_s$ | $K(a_s)$ | $X_h$ | $Z_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 000 | $a_2$ | 001 | $x_1$ | $z_2 z_4 z_6$ | $D_3$ | 1 |
| | | $a_3$ | 010 | $\bar{x}_1$ | $z_2 z_3 z_5$ | $D_2$ | 2 |
| $a_2$ | 001 | $a_4$ | 011 | $x_2 x_3$ | $z_1 z_2 z_5 z_6$ | $D_2 D_3$ | 3 |
| | | $a_5$ | 100 | $x_2 \bar{x}_3$ | $z_3 z_4$ | $D_1$ | 4 |
| | | $a_2$ | 001 | $\bar{x}_2$ | $z_4 z_5 z_6$ | $D_3$ | 5 |
| $a_3$ | 010 | $a_4$ | 011 | 1 | $z_1 z_2 z_5$ | $D_2 D_3$ | 6 |
| $a_4$ | 011 | $a_5$ | 100 | $x_3$ | $z_1 z_3 z_5 z_6$ | $D_1$ | 7 |
| | | $a_6$ | 101 | $\bar{x}_3 x_4$ | $z_1 z_2 z_5 z_6$ | $D_1 D_3$ | 8 |
| | | $a_6$ | 101 | $\bar{x}_3 \bar{x}_4$ | $z_2 z_4 z_6$ | $D_1 D_3$ | 9 |
| $a_5$ | 100 | $a_1$ | 000 | $x_5$ | − | − | 10 |
| | | $a_6$ | 101 | $\bar{x}_5$ | $z_2 z_3 z_5$ | $D_1 D_3$ | 11 |
| $a_6$ | 101 | $a_1$ | 000 | 1 | $z_3 z_4$ | − | 12 |

Fig. 2.23 Illustration of principle of verticalization of GSA (panels a, b)

The functions (1.1) and (2.25) are formed in the same way as for the PY FSM. The functions (2.4) are derived from Table 2.9:

$$\begin{aligned} y_1 &= \bar{z}_1 z_2; & y_4 &= z_1 \bar{z}_2; & y_7 &= z_1 z_2;\\ y_2 &= \bar{z}_3 z_4; & y_5 &= z_5 \bar{z}_6; & y_8 &= z_5 z_6;\\ y_3 &= \bar{z}_5 z_6; & y_6 &= z_3 \bar{z}_4; & y_9 &= z_3 z_4. \end{aligned} \tag{2.31}$$
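The field encoding can be traced with a small Python sketch. It assumes the CMOs of Table 2.7 and the partition $Y^1, Y^2, Y^3$ quoted from [138], with MOs numbered inside each field in the order of Table 2.9. The names `compatible`, `field_width`, `cmo_code` and `decode` are hypothetical.

```python
from itertools import combinations

# CMOs Y_1..Y_8 of FSM S3 and the fields Y^1, Y^2, Y^3 (order as in Table 2.9).
CMOS = [
    set(), {"y1", "y2", "y3"}, {"y1", "y5", "y6"}, {"y7", "y8"},
    {"y9"}, {"y2", "y8"}, {"y5", "y7"}, {"y4", "y6", "y8"},
]
FIELDS = [["y1", "y4", "y7"], ["y2", "y6", "y9"], ["y3", "y5", "y8"]]

def compatible(a, b, cmos):
    """Relation (2.28): two MOs never occur in the same CMO."""
    return not any(a in cmo and b in cmo for cmo in cmos)

def is_valid_partition(fields, cmos):
    """Every pair of MOs inside one field must be compatible."""
    return all(compatible(a, b, cmos)
               for field in fields for a, b in combinations(field, 2))

def field_width(field):
    """R_i from (2.30): ceil(log2(|Y^i| + 1)), which equals |Y^i|.bit_length()."""
    return len(field).bit_length()

def cmo_code(cmo, fields):
    """K(Y_q): concatenation of per-field codes, all zeros for an unused field."""
    code = ""
    for field in fields:
        present = [y for y in field if y in cmo]
        index = field.index(present[0]) + 1 if present else 0
        code += format(index, f"0{field_width(field)}b")
    return code

def decode(code, fields):
    """Decoders DC_i: recover the MOs from the bits of each field."""
    result, position = set(), 0
    for field in fields:
        width = field_width(field)
        index = int(code[position:position + width], 2)
        if index:
            result.add(field[index - 1])
        position += width
    return result

print([cmo_code(cmo, FIELDS) for cmo in CMOS])
```

Under these assumptions the printed codes coincide with Table 2.10, and `decode` plays the role of the three two-input decoders described by (2.31).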
Analysis of (2.31) shows that LUTs fit perfectly for implementing the system (2.4) of the PD Mealy FSM. So, only two structural diagrams from Fig. 2.21 can be used for the PD Mealy FSM: the ones shown in Fig. 2.21a and b.

It is possible to reduce the hardware amount in the BMO using the verticalization of GSA $\Gamma$ [11, 50, 122]. The method is reduced to splitting each operational vertex of a GSA $\Gamma$ into $|Y_q|$ vertices, where $Y_q \subseteq Y$ is the CMO from the initial vertex (Fig. 2.23). Let some GSA $\Gamma$ be marked using $M_0 = 20$ states of a Mealy FSM. Consider the part of GSA $\Gamma$ shown in Fig. 2.23a. There are three MOs in the operational vertex. After verticalization, there are three operational vertices (Fig. 2.23b) corresponding to the analyzed part of the initial GSA. Now all MOs are compatible and there is

$$R_Q = \lceil \log_2(N + 1) \rceil. \tag{2.32}$$

The variable $y_0$ initializes the start of the operational unit [122].
This approach increases the number of states. For example, the states $a_{21}$ and $a_{22}$ are added to GSA $\Gamma$ after verticalization (Fig. 2.23b). It can increase the hardware amount in BF. So, it is necessary to compare the total hardware amount before and after verticalization. The results of the comparison are used for choosing the method of encoding of CMOs. Let us consider the following conditions:

$$\max_i \lceil \log_2(|Y^i| + 1) \rceil \le S \quad (i = \overline{1, I}); \tag{2.33}$$

$$\lceil \log_2(N + 1) \rceil \le S. \tag{2.34}$$

If condition (2.33) holds, then each MO $y_n \in Y$ is implemented as a single LUT having $S$ inputs. In this case, there are only $N$ LUTs in the BMO of the PY Mealy FSM. If condition (2.34) is true, then there are $N$ LUTs in the BMO of the PD Mealy FSM. Let us point out that condition (2.34) should be added to (2.29) in the case of FPGA-based PY FSMs.
2.5 Structural Decomposition for FPGA-Based Moore FSMs

The main specific of the Moore FSM is represented by (1.3). Because the microoperations $y_n \in Y$ depend only on the states $a_m \in A$, each operator vertex should be marked by a unique state [3]. It is known that states $a_m$, $a_s$ are equivalent if:

$$\delta(a_m, X_h) = \delta(a_s, X_h) \quad (h = \overline{1, H}); \tag{2.35}$$

$$\lambda(a_m) = \lambda(a_s). \tag{2.36}$$
If the outputs of several operator vertices are connected to the input of the same vertex, then condition (2.35) holds for the states marking these vertices. Of course, the relation (2.36) should not hold for these states; otherwise they would be replaced by a single state [3]. So, there are pseudoequivalent states in a Moore FSM. For pseudoequivalent states (PES), condition (2.35) holds but condition (2.36) is violated. The existence of PES is used for the optimization of the Moore FSM logic circuit [4, 21].

Let us consider the GSA $\Gamma_3$ (Fig. 2.24). Its analysis gives: $L = 3$, $N = 6$, $M = 7$, $R = 3$. There are $H = 15$ rows in the DST of the P Moore FSM corresponding to $\Gamma_3$. Using the definition of PES, it is possible to find the partition $\Pi_A$ of the set $A$ into $I$ classes of PES: $\Pi_A = \{B_1, \ldots, B_I\}$. In the discussed case, there is the partition $\Pi_A = \{B_1, \ldots, B_4\}$, where $B_1 = \{a_1\}$, $B_2 = \{a_2, a_3\}$, $B_3 = \{a_4\}$ and $B_4 = \{a_5, a_6, a_7\}$.

Fig. 2.24 Initial GSA $\Gamma_3$

Two methods can be used for optimizing the Moore FSM logic circuit [4]:

1. Proper state assignment.
2. Transformation of state codes into class codes.

The simplest way to reduce hardware is a proper state assignment. There are different approaches, named optimal, refined and combined state assignment [30]. Let us discuss these approaches.

In the case of optimal state assignment, the code of each class $B_i \in \Pi_A$ is represented by the minimal possible number of generalized intervals of the $R$-dimensional Boolean space. In the best case, each class $B_i \in \Pi_A$ is represented by a single generalized interval. It follows from Fig. 2.25 that the class $B_1$ corresponds to the interval $\langle 0, 0, 0 \rangle$, the class $B_2$ to the interval $\langle 0, *, 1 \rangle$, the class $B_3$ to the interval $\langle 0, 1, 0 \rangle$, and the class $B_4$ to the interval $\langle 1, *, * \rangle$. Each class $B_i \in \Pi_A$ corresponds to exactly one generalized interval, so this is the best possible solution. Now, the following class codes can be found: $K(B_1) = 000$, $K(B_2) = 0{*}1$, $K(B_3) = 010$ and $K(B_4) = 1{*}{*}$. To get the system (1.1), a transformed structure table should be constructed [4]. The table is based on the system of generalized formulae of transitions (GFT) [4]. In the discussed case, it is the following system:

$$\begin{aligned} B_1 &\rightarrow x_1 a_2 \vee \bar{x}_1 a_3; & B_2 &\rightarrow x_2 a_4 \vee \bar{x}_2 x_3 a_6 \vee \bar{x}_2 \bar{x}_3 a_3;\\ B_3 &\rightarrow a_5; & B_4 &\rightarrow x_3 a_7 \vee \bar{x}_3 a_1. \end{aligned} \tag{2.37}$$
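Grouping states into classes of PES amounts to comparing their transition rows, as the following Python sketch illustrates. Since GSA $\Gamma_3$ (Fig. 2.24) is not reproduced here, the per-state transitions are an assumption: they are expanded from the GFT system (2.37) together with the classes listed above. The names `TRANSITIONS`, `pes_classes` and `class_code_width` are hypothetical.

```python
# Transition rows of the Moore FSM for GSA Γ3, expanded from system (2.37).
# Keys are input conjunctions ("~" marks negation), values are next states.
ROW_B1 = {"x1": "a2", "~x1": "a3"}
ROW_B2 = {"x2": "a4", "~x2 x3": "a6", "~x2 ~x3": "a3"}
ROW_B3 = {"1": "a5"}
ROW_B4 = {"x3": "a7", "~x3": "a1"}
TRANSITIONS = {
    "a1": ROW_B1, "a2": ROW_B2, "a3": ROW_B2, "a4": ROW_B3,
    "a5": ROW_B4, "a6": ROW_B4, "a7": ROW_B4,
}

def pes_classes(transitions):
    """Group states whose transition rows coincide (condition (2.35))."""
    classes = {}
    for state, row in transitions.items():
        classes.setdefault(frozenset(row.items()), []).append(state)
    return list(classes.values())

def class_code_width(class_count):
    """Bits needed to give every class its own code: ceil(log2(I))."""
    return max(1, (class_count - 1).bit_length())

classes = pes_classes(TRANSITIONS)
print(classes, class_code_width(len(classes)))
```

The sketch compares only the transition rows, which is exactly condition (2.35); the outputs of the grouped states are assumed to differ, as required for pseudoequivalent states.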
Let us use the symbol of the FSM structure (P, MP, ...) together with the name of the GSA $\Gamma_j$ to show that an FSM with a particular structure is synthesized for GSA $\Gamma_j$. So, the Karnaugh map (Fig. 2.25) represents the codes for Moore FSM $P(\Gamma_3)$. The structural diagram of the P Moore FSM is shown in Fig. 2.26. Let us use the symbol $P_0$ to show that the optimal state assignment is used for the Moore FSM. The system (2.37) is the basis for forming the transformed structure table of Moore FSM $P_0(\Gamma_3)$ (Table 2.12).

Fig. 2.25 Optimal state codes for Moore FSM $P(\Gamma_3)$
Fig. 2.26 Structural diagram of P Moore FSM

Table 2.12 Transformed DST of Moore FSM $P_0(\Gamma_3)$

| $B_i$ | $K(B_i)$ | $a_s$ | $K(a_s)$ | $X_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|
| $B_1$ | 000 | $a_2$ | 001 | $x_1$ | $D_3$ | 1 |
| | | $a_3$ | 011 | $\bar{x}_1$ | $D_2 D_3$ | 2 |
| $B_2$ | 0*1 | $a_4$ | 010 | $x_2$ | $D_2$ | 3 |
| | | $a_6$ | 101 | $\bar{x}_2 x_3$ | $D_1 D_3$ | 4 |
| | | $a_3$ | 011 | $\bar{x}_2 \bar{x}_3$ | $D_2 D_3$ | 5 |
| $B_3$ | 010 | $a_5$ | 100 | 1 | $D_1$ | 6 |
| $B_4$ | 1** | $a_7$ | 110 | $x_3$ | $D_1 D_2$ | 7 |
| | | $a_1$ | 000 | $\bar{x}_3$ | − | 8 |

The transformed table is used to construct the system (1.1). Now, the terms $F_h$ include conjunctions $A_m$ in which some variables can be insignificant [4]. Let us point out that the number of rows of the transformed DST is the same as for the equivalent P Mealy FSM. After minimizing, the following system can be derived from Table 2.12:

$$\begin{aligned} D_1 &= \bar{T}_1 T_3 \bar{x}_2 x_3 \vee \bar{T}_1 T_2 \bar{T}_3 \vee T_1 x_3;\\ D_2 &= \bar{T}_1 T_3 x_2 \vee \bar{T}_1 \bar{T}_2 \bar{T}_3 \bar{x}_1 \vee \bar{T}_1 T_3 \bar{x}_3 \vee T_1 x_3;\\ D_3 &= \bar{T}_1 \bar{T}_2 \bar{T}_3 \vee \bar{T}_1 T_3 \bar{x}_2. \end{aligned} \tag{2.38}$$
There are three different approaches for implementing the circuit of PY Moore FSM with FPGAs:

1. LUT-based implementation. In this case both BIMF and BMO are implemented using LUTs.
2. EMB-based implementation. In this case both BIMF and BMO are implemented using EMBs.
3. Heterogeneous implementation. In this case the BIMF is implemented with LUTs, whereas the BMO is implemented with EMBs.

Fig. 2.27 Refined state code of Moore FSM $P(\Gamma_3)$

If LUTs are used to implement the BMO, then it could be necessary to optimize the system (1.3). It is necessary if the following condition holds:

$$S < R. \tag{2.39}$$
In this case, the refined state assignment can be used. Let us explain its nature. The following system (1.3) can be found from Fig. 2.24:

$$\begin{aligned} y_1 &= A_2 \vee A_4 \vee A_6; & y_3 &= A_3 \vee A_4 \vee A_5; & y_5 &= A_5 \vee A_6;\\ y_2 &= A_3 \vee A_4; & y_4 &= A_6 \vee A_7; & y_6 &= A_5 \vee A_6 \vee A_7. \end{aligned} \tag{2.40}$$

There is $R = 3$ for $P(\Gamma_3)$. If $S = 2$, then it is necessary to use functional decomposition for implementing the BMO circuit. Let us encode the states of $P(\Gamma_3)$ as shown in Fig. 2.27. These codes target the optimization of the system (2.40). Such an approach is named refined state assignment [65]. Using these codes, it is possible to get the following system on the basis of (2.40):

$$\begin{aligned} y_1 &= T_3; & y_3 &= T_2; & y_5 &= T_1 T_3 \vee T_1 T_2;\\ y_2 &= \bar{T}_1 T_2; & y_4 &= T_1 \bar{T}_2; & y_6 &= T_1. \end{aligned} \tag{2.41}$$

Each function (2.41) depends on not more than two state variables. If $S = 2$, then only 3 LUTs are necessary to implement the circuit of BMO.

For LUT-based FSMs it is necessary to optimize the systems (1.3) and $B = B(A)$. To do it, the combined state assignment can be used [75, 91, 103, 116, 120, 121, 127, 128]. To execute it, it is possible to use the algorithm JEDI.

Let us encode each class $B_i \in \Pi_A$ by a binary code $C(B_i)$ having $R_B$ bits:

$$R_B = \lceil \log_2 I \rceil. \tag{2.42}$$

Let us use the variables $\tau_r \in \mathcal{T}$ for class encoding, where $|\mathcal{T}| = R_B$. It leads to the $P_C$ Moore FSM (Fig. 2.28). Here the block of code transformation (BCT) implements the functions

$$\mathcal{T} = \mathcal{T}(T). \tag{2.43}$$
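Fig. 2.28 Structural diagram of $P_C$ Moore FSM

Table 2.13 Table of the system (2.43) for Moore FSM $P_C(\Gamma_3)$

| $a_m$ | $K(a_m)$ | $B_i$ | $K(B_i)$ | $\tau_m$ | $m$ |
|---|---|---|---|---|---|
| $a_1$ | 000 | $B_1$ | 00 | − | 1 |
| $a_2$ | 001 | $B_2$ | 01 | $\tau_2$ | 2 |
| $a_3$ | 011 | $B_2$ | 01 | $\tau_2$ | 3 |
| $a_4$ | 010 | $B_3$ | 10 | $\tau_1$ | 4 |
| $a_5$ | 100 | $B_4$ | 11 | $\tau_1 \tau_2$ | 5 |
| $a_6$ | 101 | $B_4$ | 11 | $\tau_1 \tau_2$ | 6 |
| $a_7$ | 110 | $B_4$ | 11 | $\tau_1 \tau_2$ | 7 |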
The block BIMF implements the system $\Phi = \Phi(\mathcal{T}, X)$. There is $R_B = 2$ for Moore FSM $P_C(\Gamma_3)$. It gives the set $\mathcal{T} = \{\tau_1, \tau_2\}$. Let us encode the classes $B_i \in \Pi_A$ as follows: $K(B_1) = 00, \ldots, K(B_4) = 11$. The transformed DSTs are the same for $P_0(\Gamma_3)$ and $P_C(\Gamma_3)$, but there are the codes $00, \ldots, 11$ in the column $K(B_i)$ of the DST for $P_C(\Gamma_3)$. There are the following equations in the system $\Phi$ for $P_C(\Gamma_3)$:

$$\begin{aligned} D_1 &= \tau_1 \bar{\tau}_2 \vee \tau_1 \tau_2 x_3;\\ D_2 &= \bar{\tau}_1 \bar{\tau}_2 \bar{x}_1 \vee \bar{\tau}_1 \tau_2 \vee \tau_1 \tau_2 x_3;\\ D_3 &= \bar{\tau}_1 \bar{\tau}_2 \bar{x}_1 \vee \bar{\tau}_1 \tau_2 x_2. \end{aligned} \tag{2.44}$$

The system (2.43) can be found from Table 2.13. After minimizing, the following system can be derived from Table 2.13:

$$\tau_1 = \bar{T}_1 T_2 \bar{T}_3 \vee T_1; \qquad \tau_2 = \bar{T}_1 T_3 \vee T_1. \tag{2.45}$$
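System (2.45) can be checked against Table 2.13 with a few lines of Python. The sketch assumes the state codes, classes and class codes of Table 2.13; the names `bct` and `bct_matches_table` are hypothetical.

```python
# State codes K(a_m), classes B_i and class codes K(B_i) from Table 2.13.
STATE_CODES = {
    "a1": "000", "a2": "001", "a3": "011", "a4": "010",
    "a5": "100", "a6": "101", "a7": "110",
}
CLASS_OF = {
    "a1": "B1", "a2": "B2", "a3": "B2", "a4": "B3",
    "a5": "B4", "a6": "B4", "a7": "B4",
}
CLASS_CODES = {"B1": "00", "B2": "01", "B3": "10", "B4": "11"}

def bct(t1, t2, t3):
    """Block BCT: class code tau1 tau2 as a function of the state code."""
    tau1 = ((1 - t1) & t2 & (1 - t3)) | t1
    tau2 = ((1 - t1) & t3) | t1
    return f"{tau1}{tau2}"

def bct_matches_table():
    """Each state code must be mapped to the code of its own class."""
    return all(
        bct(*(int(bit) for bit in code)) == CLASS_CODES[CLASS_OF[state]]
        for state, code in STATE_CODES.items()
    )

print(bct_matches_table())
```

The unused state code 111 is a don't-care input of the BCT, which is what allows the single-literal term $T_1$ in both equations.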
The system (2.45) is used for implementing the circuit of BCT. Using this approach, it is possible to simplify the circuit of BRLC for the $MP_C$ Moore FSM (Fig. 2.29). In Fig. 2.29, the EMBer implements the systems (2.3), (1.3) and (2.43). The LUTer implements the functions (2.46):

$$P = P(\mathcal{T}, X). \tag{2.46}$$

This approach can be combined with the splitting of logical conditions [62].
Fig. 2.29 Structural diagram of FPGA-based $MP_C$ Moore FSM
Fig. 2.30 Structural diagram of PY Moore FSM
If the condition (2.39) holds, then it is possible to encode the CMOs of the Moore FSM. It leads to the PY Moore FSM (Fig. 2.30). In the PY FSM, the EMBer implements the system

$$Z = Z(T), \tag{2.47}$$

and the LUTer implements the microoperations represented as (2.4). Now, let us discuss new design methods targeting FPGA-based FSMs.
References 1. Adamski M, Barkalov A (2006) Architectural and sequential synthesis of digital devices. University of Zielona Góra Press, Zielona Góra 2. Atmel http://www.atmel.com. Accessed Jan 2019 3. Baranov S (1994) Logic synthesis of control automata. Kluwer Academic Publishers, Boston 4. Baranov S (2008) Logic and system design of digital systems. TUT Press, Tallinn 5. Barkalov A, Beleckij O, Nedal A (1999) Applying of optimization methods of Moore automaton for synthesis of compositional microprogram control unit. Autom Control Comput Sci 33(1):44–52 6. Barkalov A, Dzhaliashvili Z, Salomatin V, Starodubov K (1986) Optimization of a microinstruction address scheme for microprogram control unit with PLA and PROM. Autom Control Comput Sci 20(5):83–87 7. Barkalov A, Salomatin V, Starodubov K, Das K (1991) Optimization of Mealy automaton logic using programmable logic arrays. Cybern Syst Anal 27(5):789–793 8. Barkalov A, Shwec A (1994) Synthesis of compositional microprogram control unit with modified microinstruction addressing. Autom Control Comput Sci 28(5):22–30 9. Barkalov A, Titarenko L (2007) Design of control units with programmable logic devices. In: Korbicz J (ed) Measurements. methods, systems and design. Wydawnictwo Komunikacji i Ła˛czno´sci, Warsaw, Poland, pp 371–391 10. Barkalov A, Titarenko L (2008) Logic synthesis for compositional microprogram control units, vol 22. Springer, Berlin
11. Barkalov A, Titarenko L (2009) Logic synthesis for FSM-based control units, vol 53. Lecture notes in electrical engineering. Springer, Berlin
12. Barkalov A, Titarenko L (2009) Synthesis of operational and control automata. UNITECH, Donetsk
13. Barkalov A, Titarenko L, Barkalov A Jr (2007) Moore FSM synthesis with coding of compatible microoperations fields. In: Proceedings of IEEE east-west design and test symposium - EWDTS’07, Yerevan, Armenia, Kharkov, 2007. Kharkov National University of Radioelectronics, pp 644–646
14. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of logic circuit of Moore FSM on CPLD. Pomiary Automatyka Kontrola 53(5):18–20
15. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on CPLD. In: Proceedings of the sixth international conference CAD DD’07, vol 2. Minsk, pp 39–45
16. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on system-on-chip. In: Proceedings of IEEE east-west design and test symposium – EWDTS’07, Yerevan, Armenia, Kharkov, pp 105–109
17. Barkalov A, Titarenko L, Chmielewski S (2007) Reduction in the number of PAL macrocells in the circuit of a Moore FSM. Int J Appl Math Comput Sci 17(4):565–675
18. Barkalov A, Titarenko L, Chmielewski S (2008) Decrease of hardware amount in logic circuit of Moore FSM. Przegląd Telekomunikacyjny i Wiadomości Telekomunikacyjne 6:750–752
19. Barkalov A, Titarenko L, Chmielewski S (2008) Optimization of Moore control unit with refined state encoding. In: Proceedings of the 15th international conference MIXDES 2008, Poznań, Poland, 2008. Department of Microelectronics and Computer Science, Technical University of Łódź, pp 417–420
20. Barkalov A, Titarenko L, Chmielewski S (2008) Optimization of Moore FSM on system-on-chip using PAL technology. In: Proceedings of the international conference TCSET 2008, Lviv-Slavsko, Ukraine. Ministry of Education and Science of Ukraine. Lviv Polytechnic National University, Lviv, Publishing House of Lviv Polytechnic, pp 314–317
21. Barkalov A, Titarenko L, Chmielewski S (2014) Hardware reduction in CPLD-based Moore FSM. J Circuits Syst Comput 23(6):1450086–1–1450086–21
22. Barkalov A, Titarenko L, Kołopeńczyk M (2006) Optimization of control unit with code sharing. In: Proceedings of the 3rd international workshop of IFAC discrete-event system design (DESDES’06), Rydzyna, 2006. University of Zielona Góra Press, pp 195–200
23. Barkalov A, Titarenko L, Wiśniewski R (2006) Optimization of address circuit of compositional microprogram unit. In: Proceedings of the IEEE east-west design and test workshop (EWDTW’06), Sochi, Kharkov, 2006. Kharkov National University of Radioelectronics, pp 167–170
24. Barkalov A, Titarenko L, Wiśniewski R (2006) Synthesis of compositional microprogram control units with sharing codes and address decoder. In: Proceedings of the international conference mixed design of integrated circuits and systems – MIXDES 2006, Łódź, pp 397–400
25. Barkalov A, Węgrzyn A, Barkalov A Jr (2007) Synthesis of control units with transformation of the codes of objects. In: Proceedings of the IXth international conference CADSM 2007 (The experience of designing and application of CAD systems in microelectronics), Lviv - Polyana, Ukraine 2007, Lviv Polytechnic National University. Publishing House of Lviv Polytechnic National University, Lviv, pp 260–261
26. Barkalov A, Węgrzyn M (2006) Design of control units with programmable logic. University of Zielona Góra Press, Zielona Góra
27. Barkalov A, Wiśniewski R (2004) Design of compositional microprogram control units with maximal encoding of inputs. Radioelectron Inform 3:79–81
28. Barkalov A, Wiśniewski R (2004) Optimization of compositional microprogram control unit with elementary operational linear chains. Control Syst Comput 5:25–29
29. Barkalov A, Wiśniewski R (2005) Optimization of compositional microprogram control units implemented on system-on-chip. Theor Appl Inf 9:7–22
30. Barkalov A, Zelenjova I (2000) Optimization of replacement of logical conditions for an automaton with bidirectional transitions. Autom Control Comput Sci 34(5):48–53. Allerton Press Inc
31. Bolton M (1990) Digital system design with programmable logic. Addison-Wesley, Boston
32. Bomar BW (2002) Implementation of microprogrammed control in FPGAs. IEEE Trans Ind Electron 49(2):415–422
33. Borowik G (2007) Synthesis of sequential devices into FPGA with embedded memory blocks. PhD thesis, Warszawa, WUT
34. Brayton R, Hachtel G, McMullen C, Sangiovanni-Vincentelli A (1984) Logic minimization algorithms for VLSI synthesis. Kluwer Academic Publishers, Boston
35. Brayton R, Rudell R, Sangiovanni-Vincentelli A, Wang A (1987) MIS: a multi-level logic optimization system. IEEE Trans Comput-Aided Des 6(11):1062–1081
36. Brown S, Vranesic Z (2000) Fundamentals of digital logic with VHDL design. McGraw-Hill, New York
37. Bukowiec A (2008) Synthesis of finite state machines for programmable devices based on multi-level implementation. PhD thesis, University of Zielona Góra
38. Webb C, Liptay J (1997) A high-frequency custom CMOS S/390 microprocessor. IBM J Res Dev 41(4/5):463–473
39. Cao C, O’Nils B, Oelmann D (2004) A tool for low-power synthesis of FSMs with mixed synchronous/asynchronous state memory. In: Proceedings of norchip conference, pp 199–202
40. Chattopadhyay S (2005) Area conscious state assignment with flip-flop and output polarity selection for finite state machines synthesis - a genetic algorithm. Comput J 48(4):443–450
41. Chattopadhyay S, Chaudhuri P (1998) Genetic algorithm based approach for integrated state assignment and flipflop selection in finite state machines synthesis. In: Proceedings of the IEEE international conference on VLSI design, Los Alamitos, 1998. IEEE Computer Society, pp 522–527
42. Chu P (2006) RTL hardware design using VHDL: coding for efficiency, portability and scalability. Wiley-Interscience, New York
43. Chu YC (1972) Computer organization and microprogramming. Prentice Hall, Upper Saddle River
44. Ciesielski M, Jang S (1992) PLADE: a two-stage PLA decomposition. IEEE Trans Comput-Aided Des 11(8)
45. Clements A (2000) The principles of computer hardware. Oxford University Press Inc, New York
46. Cypress Semiconductor Corporation. http://www.cypress.com. Accessed Jan 2019
47. Cypress Semiconductor Corporation. Cypress programmable logic: delta 39K. Data sheet. http://cypress.com/pld/delta39k.html. Accessed Jan 2019
48. Czerwiński R, Kania D (2004) State assignment method for high speed FSM. In: Proceedings of programmable devices and systems, pp 216–221
49. Czerwiński R, Kania D (2005) State assignment for PAL-based CPLDs. In: Proceedings of 8th Euromicro symposium on digital system design, pp 127–134
50. Czerwinski R, Kania D (2013) Finite state machine logic synthesis for complex programmable logic devices, vol 231. Lecture notes in electrical engineering, Springer, Berlin
51. Dasgupta S (1979) The organization of microprogram stores. ACM Comput Surv 24:101–176
52. Debnath D, Sasao T (1999) Multiple-valued minimization to optimize PLA with output EXOR gates. In: Proceedings of IEEE international symposium on multiple-valued logic, pp 99–104
53. Debnath D, Sasao T (2005) Output phase optimization for AND-OR-EXOR PLAs with decoders and its application to design of adders. IEICE Trans Inf Syst E88-D(7):1492–1500
54. Deniziak S, Sapiecha K (1998) An efficient algorithm of perfect state encoding for CPLD based systems. In: Proceedings of IEEE workshop on design and diagnostic of electronic circuits and systems (DDECS’98), pp 47–53
55. Devadas S, Ma H (1990) Easily testable PLA-based finite state machines. IEEE Trans Comput-Aided Des Integr Circuits Syst 9(6):604–611
56. Devadas S, Ma H, Newton A, Sangiovanni-Vincentelli A (1988) MUSTANG: state assignment of finite state machines targeting multilevel logic implementation. IEEE Trans Comput-Aided Des 7(12):1290–1300
57. Devadas S, Newton A (1991) Exact algorithms for output encoding, state assignment, and four-level boolean minimization. IEEE Trans Comput-Aided Des 10(1):143–154
58. Du X, Hachtel G, Lin B, Newton A (1991) MUSE: a multilevel symbolic encoding algorithm for state assignment. IEEE Trans Comput-Aided Des Integr Circuits Syst 10(1):28–38
59. Escherman B (1993) State assignment for hardwired VLSI control units. ACM Comput Surv 25(4):415–436
60. Flynn MJ, Rosin RF (1971) Microprogramming: an introduction and a viewpoint. IEEE Trans Comput C–20(7):727–731
61. Gajski D (1997) Principles of digital design. Prentice Hall, New York
62. Garcia-Vargas I, Senhadji-Navarro R, Jiménez-Moreno G, Civit-Balcells A, Guerra-Gutierrez P (2007) ROM-based finite state machine implementation in low cost FPGAs. In: IEEE international symposium on industrial electronics ISIE 2007. IEEE, pp 2342–2347
63. Goren S, Ferguson F (2002) CHESMIN: a heuristic for state reduction of incompletely specified finite state machines. In: Proceedings of the design, automation and test in Europe conference and exhibition (DATE’02), pp 248–254
64. Gupta B, Narayanan H, Desai M (1999) A state assignment scheme targeting performance and area. In: Proceedings of 12th international conference on VLSI design, pp 378–383
65. Habib S (1988) Microprogramming and firmware engineering methods. Wiley, New York
66. Hassoun S, Sasao T (2002) Logic synthesis and verification. Kluwer Academic Publishers, Boston
67. Hachtel G, Somenzi F (2000) Logic synthesis and verification algorithms. Kluwer Academic Publishers, Boston
68. Hrynkiewicz E, Kania D (2003) Impact of decomposition direction on synthesis effectiveness. In: Proceedings of programmable devices and systems (PDS’03), pp 144–149
69. Hu H, Xue H, Bian J (2003) A heuristic state assignment algorithm targeting area. In: Proceedings of 5th international conference on ASIC, vol 1, pp 93–96
70. Huang J, Jou J, Shen W (2000) ALTO: an iterative area/performance algorithms for LUT-based FPGA technology mapping. IEEE Trans VLSI Syst 18(4):392–400
71. Husson S (1970) Microprogramming: principles and practices. Prentice Hall, Englewood Cliffs
72. Iranli A, Rezvani P, Pedram M (2003) Low power synthesis of finite state machines with mixed D and T flip-flops. In: Proceedings of the Asia and South Pacific – DAC, pp 803–808
73. Iwai H (2004) Future CMOS scaling. In: Proceedings of 11th conference mixed design of integrated circuits and systems, MIXDES 2004, Szczecin, Poland, 2004. Technical University of Łódź, Department of Microelectronics and Computer Science, pp 12–18
74. Jenkins J (1995) Design with FPGAs and CPLDs. Prentice Hall, New York
75. Kahng A (2011) VLSI physical design: from graph partitioning to timing closure. Springer, Berlin
76. Kam T, Villa T, Brayton R, Sangiovanni-Vincentelli A (1998) A synthesis of finite state machines: functional optimization. Kluwer Academic Publishers, Boston
77. Kania D (1999) Two-level logic synthesis on PAL-based CPLD and FPGA using decomposition. In: Proceedings of 25th Euromicro conference, pp 278–281
78. Kania D (1999) Two-level logic synthesis on PALs. Electron Lett 17:879–880
79. Kania D (2000) Coding capacity of PAL-based logic blocks included in CPLDs and FPGAs. In: Proceedings of IFAC workshop on programmable devices and systems (PDS’2000). Elsevier Science, pp 164–169
80. Kania D (2000) Decomposition-based synthesis and its application in PAL-oriented technology mapping. In: Proceedings of 26th Euromicro conference. IEEE Computer Society Press, Maastricht, pp 138–145
81. Kania D (2002) An efficient algorithm for output coding in PAL-based CPLDs. Int J Eng 15(4):325–328
82. Kania D (2002) Logic synthesis of multi-output functions for PAL-based CPLDs. In: Proceedings of IEEE international conference on field-programmable technology, pp 429–432
83. Kania D (2003) An efficient approach to synthesis of multi-output boolean functions on PAL-based devices. IEEE Proc - Comput Digital Tech 150:143–149
84. Kubatova H (2005) Design of embedded control systems. Chapter Finite state machine implementation in FPGAs. Springer, New York, pp 177–187
85. Łuba T, Rawski M, Jachna Z (2002) Functional decomposition as a universal method for logic synthesis of digital circuits. In: Proceedings of IX international conference MIXDES’02, pp 285–290
86. Maxfield C (2004) The design Warrior’s guide to FPGAs. Academic Press Inc, Orlando
87. Maxfield C (2008) FPGAs: Instant access. Newnes
88. McCluskey E (1986) Logic design principles. Prentice Hall, Englewood Cliffs
89. De Micheli G (1986) Symbolic design of combinational and sequential logic implemented by two-level macros. IEEE Trans Comput-Aided Des 5(9):597–616
90. De Micheli G (1994) Synthesis and optimization of digital circuits. McGraw-Hill, New York
91. Microsemi. http://www.microsemi.com. Accessed Jan 2019
92. Minns P, Elliot I (2008) FSM-based digital design using Verilog HDL. Wiley, New York
93. Navabi Z (2007) Embedded core design with FPGAs. McGraw-Hill, New York
94. Papachristou C (1981) Hardware microcontrol schemes using PLAs. In: Proceeding of 14th microprogramming workshop, vol 2, pp 3–15
95. Papachristou C, Gambhir S (1982) A microsequencer architecture with firmware support for modular microprogramming. ACM SIGMICRO Newsletters 13(4):105–113
96. Park S, Yang S, Cho S (2000) Optimal state assignment technique for partial scan designs. Electron Lett 36(18):1527–1529
97. Parnel K, Mechta N (2003) Programmable logic design quick start hand book. Xilinx
98. Patterson D, Hennessy J (1998) Computer organization and design: the hardware/software interface. Morgan Kaufmann, San Mateo
99. Pedram C, Despain A (1998) Low-power state assignment targeting two- and multilevel logic implementations. IEEE Trans Comput-Aided Des Integr Circuits Syst 17(12):1281–1291
100. Pedroni V (2004) Circuit design with VHDL. MIT Press, Cambridge
101. Pomerancz I, Cheng K (1993) STOIC: state assignment based on output/input functions. IEEE Trans Comput-Aided Des Integr Circuits Syst 12(8):1123–1131
102. Pugh E, Johnson L, Palmer J (1991) IBM’s 360 and early 370 systems. MIT Press, Cambridge
103. QuickLogic. http://www.quicklogic.com. Accessed Jan 2019
104. Rawski M, Łuba T, Jachna Z, Tomaszewicz P (2005) Design of embedded control systems. Chapter The influence of functional decomposition on modern digital design process. Springer, Boston, pp 193–203
105. Rawski M, Selvaraj H, Łuba T (2005) An application of functional decomposition in ROM-based FSM implementation in FPGA devices. J Syst Architect 51(6–7):423–434
106. Rho J, Hachtel F, Somenzi R, Jacoby R (1994) Exact and heuristic algorithms for the minimization of incompletely specified state machines. IEEE Trans Comput-Aided Des 13(2):167–177
107. Rudell R, Sangiovanni-Vincentelli A (1987) Multiple-valued minimization for PLA optimization. IEEE Trans Comput-Aided Des 6(5):727–750
108. Sakamura K (2002) Future SoC possibilities. IEEE Micro 22(5):7
109. Salcic Z (1998) VHDL and FPLDs in digital systems design, prototyping and customization. Kluwer Academic Publishers, Boston
110. Salisbury A (1976) Microprogrammable computer architectures. Elsevier Science, New York
111. Sasao T (1984) Input variable assignment and output phase optimization of PLA optimization. IEEE Trans Comput 33(10):879–894
112. Saucier G, Depaulet M, Sicard P (1987) ASYL: a rule-based system for controller synthesis. IEEE Trans Comput-Aided Des 6(11):1088–1098
113. Saucier G, Sicard P, Bouchet L (1990) Multi-level synthesis on programmable devices in the ASYL system. In: Proceedings of Euro ASIC, pp 136–141
114. Scholl C (2001) Functional decomposition with application to FPGA synthesis. Kluwer Academic Publishers, Boston
115. Schwartz S (1968) An algorithm for minimizing read-only memories for machine control. In: IEEE 10th annual symposium on switching and automata theory, pp 28–33
116. Senhadji-Navarro R, Garcia-Vargas I, Jiménez-Moreno G, Civit-Balcells A, Guerra-Gutierrez P (2004) ROM-based FSM implementation using input multiplexing in FPGA devices. Electron Lett 40(20):1249–1251
117. Sentovich E, Singh K, Lavagno L, Moon C, Murgai R, Saldanha A, Savoj H, Stephan P, Brayton R, Sangiovanni-Vincentelli A (1992) SIS: a system for sequential circuit synthesis. Technical report, University of California, Berkeley
118. Sentovich E, Singh K, Lavagno L, Moon C, Murgai R, Saldanha A, Savoj H, Stephan P, Brayton R, Sangiovanni-Vincentelli A (1992) SIS: a system for sequential circuit synthesis. In: Proceedings of the international conference of computer design (ICCD’92), pp 328–333
119. Shriver B, Smith B (1998) The anatomy of a high-performance microprocessor: a systems perspective. IEEE Computer Society Press, Los Alamitos
120. Skliarova I, Sklyarov V, Sudnitson A (2012) Design of FPGA-based circuits using hierarchical finite state machines. TUT Press, Tallinn
121. Sklyarov V (2000) Synthesis and implementation of RAM-based finite state machines in FPGAs. In: Proceedings of field-programmable logic and applications: the roadmap to reconfigurable computing, Villach, 2000. Springer, pp 718–728
122. Sklyarov V, Skliarova I, Barkalov A, Titarenko L (2014) Synthesis and optimization of FPGA-based systems, vol 294. Lecture notes in electrical engineering, Springer, Berlin
123. Smith M (1997) Application-specific integrated circuits. Addison-Wesley, Boston
124. Solovjev V, Czyzy M (1999) Refined CPLD macrocells architecture for effective FSM implementation. In: Proceedings of the 25th EUROMICRO conference, vol 1. Milan, Italy, pp 102–109
125. Solovjev V, Czyzy M (1999) The universal algorithm for fitting targeted unit to complex programmable logic devices. In: Proceedings of the 25th EUROMICRO conference, vol 1. Milan, Italy, pp 286–289
126. Solovjev V, Czyzy M (2001) Synthesis of sequential circuits on programmable logic devices based on new models of finite state machines. In: Proceedings of the EUROMICRO conference. Milan, pp 170–173
127. Sutter G, Todorovich E, López-Buedo S, Boemo E (2002) Low-power FSMs in FPGA: encoding alternatives. In: Integrated circuit design. Power and timing modeling, optimization and simulation. Springer, Berlin, pp 363–370
128. Tiwari A, Tomko K (2004) Saving power by mapping finite-state machines into embedded memory blocks in FPGAs. In: Proceedings of the conference on design, automation and test in Europe - volume 2. IEEE Computer Society, pp 916–921
129. Tucker S (1967) Microprogram control for system/360. IBM Syst J 6(4):222–241
130. Venkatamaran G, Reddy S, Pomerancz I (2003) GALLOP: genetic algorithm based low power FSM synthesis by simultaneous partitioning and state assignment. In: Proceedings of 16th international conference on VLSI design, pp 533–538
131. Villa T, Kam T, Brayton R, Sangiovanni-Vincentelli A (1998) A synthesis of finite state machines: logic optimization. Kluwer Academic Publishers, Boston
132. Villa T, Saldachna T, Brayton R, Sangiovanni-Vincentelli A (1997) Symbolic two-level minimization. IEEE Trans Comput-Aided Des 16(7):692–708
133. Villa T, Sangiovanni-Vincentelli A (1990) NOVA: State assignment of finite state machines for optimal two-level logic implementation. IEEE Trans Comput-Aided Des 9(9):905–924
134. Wilkes M (1951) The best way to design an automatic calculating machine. In: Proceedings of Manchester University computer inaugural conference
135. Wilkes M, Stringer J (1953) Microprogramming and the design of the control circuits in an electronic digital computer. Proc Camb Philos Soc 49:230–238
136. Wiśniewski R (2008) Synthesis of compositional microprogram control units for programmable devices. PhD thesis, University of Zielona Góra
137. Xia Y, Almani A (2002) Genetic algorithm based state assignment for power and area optimization. IEEE Proc Comput Digit Tech 149:128–133
138. Xilinx. http://www.xilinx.com. Accessed Jan 2019
139. Yang S (1991) Logic synthesis and optimization benchmarks user guide. Technical report, Microelectronic Center of North Carolina
140. Yanushkevich S, Shmerko V (2008) Introduction to logic design. CRC Press, Boca Raton
Chapter 3
Twofold State Assignment for Mealy FSMs
Abstract The chapter begins with the general idea of twofold state assignment for Mealy FSMs. The method is based on partitioning the set of states into classes. Each internal state is encoded in two ways: as an element of the set of FSM states and as an element of some class of states. This allows implementing any function for a given class using only a single LUT. The structural diagram and design method are proposed for FPGA-based Mealy FSMs with twofold state assignment. Next, a formal method is proposed for finding the partition with the minimum number of classes. It is shown that the twofold state assignment can be combined with encoding of collections of microoperations. The corresponding models and their synthesis methods are proposed. Methods of diminishing encoding of states and collections of microoperations are also proposed; they reduce the number of logic elements and their interconnections in LUT-based logic circuits. The last part of the chapter presents the results of investigations of the proposed methods of structural decomposition. Standard benchmarks are used for conducting the investigations.
3.1 General Idea of the Method
When FSM circuits are implemented with FPGAs, there is a contradiction between the large number of arguments in functions (1.1), (1.2) and the very small number of address inputs of a LUT [2, 3]. Let us consider the following function:
$$D_3 = T_1 \bar{T}_2 T_3 x_1 \bar{x}_2 \lor \bar{T}_1 \bar{T}_2 \bar{T}_3 x_3. \tag{3.1}$$
Let $N(f_i)$ be the number of arguments of a Boolean function $f_i$. So, $N(D_3) = 6$. If LUTs are used for implementing a circuit, then each argument occupies a single address input of a LUT [3]. If a LUT has $S = 6$ inputs, then a single LUT is enough to implement the circuit corresponding to (3.1). This circuit (Fig. 3.1a) has the minimal number of LUTs and interconnections and only a single level of LUTs. It is the most efficient circuit in terms of hardware amount, performance and energy consumption.
© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_3
Fig. 3.1 Implementing function D3 with LUTs having S = 6 (a) and S = 4 (b)
Let the following condition take place:
$$N(f_i) > S. \tag{3.2}$$
In this case, it is necessary to use functional decomposition [6] to implement a logic circuit. If $S = 4$, then the function $D_3$ should be transformed:
$$D_3 = T_1 (\bar{T}_2 T_3 x_1 \bar{x}_2) \lor \bar{T}_1 (\bar{T}_2 \bar{T}_3 x_3). \tag{3.3}$$
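The equivalence of (3.1) and (3.3) can be checked by exhaustive enumeration of the six arguments. The following Python sketch is only an illustration: the function names and the way the LUT inputs are counted are our assumptions and are not part of the original method.

```python
from itertools import product

def d3_direct(T1, T2, T3, x1, x2, x3):
    # Function (3.1): all six arguments enter one LUT with S = 6.
    return (T1 and not T2 and T3 and x1 and not x2) or \
           (not T1 and not T2 and not T3 and x3)

def d3_decomposed(T1, T2, T3, x1, x2, x3):
    # Function (3.3): two first-level LUTs feed one second-level LUT.
    g1 = (not T2) and T3 and x1 and (not x2)   # first-level LUT, 4 inputs
    g2 = (not T2) and (not T3) and x3          # first-level LUT, 3 inputs
    return (T1 and g1) or (not T1 and g2)      # second-level LUT, 3 inputs

# Compare both forms on all 2^6 = 64 input assignments.
all_inputs = list(product([False, True], repeat=6))
equivalent = all(bool(d3_direct(*v)) == bool(d3_decomposed(*v))
                 for v in all_inputs)

# Interconnections are counted here as the total number of used LUT inputs.
inputs_per_lut = [4, 3, 3]
interconnections = sum(inputs_per_lut)

print(equivalent, len(inputs_per_lut), interconnections)
```

None of the three LUTs of the decomposed form needs more than four inputs, so each of them satisfies the restriction imposed by $S = 4$.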
Now, three LUTs with $S = 4$ are necessary to implement this function (Fig. 3.1b). There are two levels of logic in this circuit. Let us denote as C1 the circuit from Fig. 3.1a and as C2 the circuit from Fig. 3.1b. There are 6 interconnections, one level of logic and a single LUT in the circuit C1. There are 10 interconnections, two levels of logic and three LUTs in the circuit C2. It means that the circuit C1 is two times faster and three times cheaper than the circuit C2. Besides, the signals $T_2$, $T_3$ need a higher drive strength in C2, because each of them is connected to two LUTs. More interconnections mean more parasitic capacitance, which requires power to charge. So, the circuit C2 consumes more energy than the circuit C1. This example shows the importance of reducing the number of literals in the terms (1.5). It could be done by transforming the GSA Γ. In this case some additional states are introduced [2], which increases the number of clock cycles required for algorithm execution. It is also necessary that the variables $x_e \in X$ and $T_r \in T$ be connected to the lowest possible number of LUTs. There is another requirement: the interconnections between the circuit elements should be ordered to give them a regular pattern. Let us represent a structural diagram of P Mealy FSM as shown in Fig. 3.2. It includes $R_0 + N$ LUTer blocks. Each block implements a single function from the systems (1.1), (1.2). If the condition (3.2) takes place, the circuit in Fig. 3.2 is irregular, multi-level and rather slow. Of course, this is true only in comparison with the circuit of an FSM for each function of which the following condition takes place:
Fig. 3.2 Structural diagram of P Mealy FSM implemented with LUTs
$$N(f_i) \le S. \tag{3.4}$$
The number of inputs $S$ is fixed for a given FPGA chip. Therefore, the only way to achieve compliance with the condition (3.4) is to reduce the number of arguments in functions (1.1), (1.2). To do it, we propose the following approach. Let us find a partition $\Pi_B = \{A^1, \ldots, A^K\}$ of the set $A$ such that the following condition takes place:
$$R_k + L_k \le S \quad (k \in \{1, \ldots, K\}). \tag{3.5}$$
In (3.5), the symbol $R_k$ stands for the number of state variables necessary to encode states $a_m \in A^k$. The symbol $L_k$ stands for the number of input variables $x_e \in X$ determining transitions from the states $a_m \in A^k$ (these variables form the set $X^k$). Let there be $M_k$ elements in the set $A^k$. Let us encode these states using the minimal possible number of state variables $\tau_r \in \mathcal{T}$. The situation $a_m \notin A^k$ should also be encoded. So, the number $R_k$ of state variables is determined as
$$R_k = \lceil \log_2 (M_k + 1) \rceil. \tag{3.6}$$
To encode the states $a_m \in A^1$, we use the first $R_1$ elements of $\mathcal{T}$, and so on. So, there are $R_A$ elements in the set $\mathcal{T}$:
$$R_A = \sum_{k=1}^{K} R_k. \tag{3.7}$$
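Relations (3.5)–(3.7) are easy to evaluate numerically. The sketch below is a minimal illustration under our own assumptions: the function names and the class sizes in the example are hypothetical and do not come from the original text.

```python
def class_code_width(m_k: int) -> int:
    """R_k from (3.6): one extra code marks 'state is not in this class'."""
    # For a non-negative integer m, m.bit_length() equals ceil(log2(m + 1)).
    return m_k.bit_length()

def satisfies_3_5(m_k: int, l_k: int, s: int) -> bool:
    """Check the restriction R_k + L_k <= S from (3.5) for one class."""
    return class_code_width(m_k) + l_k <= s

def total_width(class_sizes) -> int:
    """R_A from (3.7): the sum of R_k over all classes."""
    return sum(class_code_width(m) for m in class_sizes)

def max_class_size(l_k: int, s: int) -> int:
    """Largest M_k allowed by (3.5) and (3.6) for given L_k and S."""
    return 2 ** (s - l_k) - 1

sizes = [3, 4, 2]  # hypothetical class sizes M_1, M_2, M_3
print([class_code_width(m) for m in sizes])  # [2, 3, 2]
print(total_width(sizes))                    # 7
print(satisfies_3_5(4, 2, 5))                # True: 3 + 2 <= 5
```

The helper `max_class_size` follows from rewriting (3.5) as $R_k \le S - L_k$ and substituting (3.6), which gives $M_k \le 2^{S - L_k} - 1$.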
Each class $A^k \in \Pi_B$ determines a subtable $ST_k$ of an FSM structure table. Let us construct the sets $Y^k \subset Y$ and $\Phi^k \subset \Phi$. The set $Y^k$ includes microoperations $y_n \in Y$ from the column $Y_h$ of $ST_k$. The set $\Phi^k$ includes input memory functions $D_r \in \Phi$ from the column $\Phi_h$ of $ST_k$. So, each class $A^k \in \Pi_B$ determines the sets $X^k$, $Y^k$ and $\Phi^k$. The following relations are true:
$$X = \bigcup_{k=1}^{K} X^k; \tag{3.8}$$
$$Y = \bigcup_{k=1}^{K} Y^k; \tag{3.9}$$
$$\Phi = \bigcup_{k=1}^{K} \Phi^k. \tag{3.10}$$
Let us use two state codes for each state. We use the code $K(a_m)$ for $a_m \in A$ and the code $C(a_m)$ for $a_m \in A^k$ ($k = \overline{1, K}$). Because of this, we name this approach the twofold state assignment. Let us denote as PT FSM the P Mealy FSM based on the twofold state assignment. Each subtable $ST_k$ corresponds to a block LUTerk in the circuit of PT Mealy FSM. As follows from (3.5), a single LUT having $S$ inputs is enough to keep the truth table of any function $D_r \in \Phi^k$ and $y_n \in Y^k$. The structural diagram of PT Mealy FSM is shown in Fig. 3.3. The following functions are implemented by the block LUTerk ($k = \overline{1, K}$):
$$Y^k = Y^k(\mathcal{T}^k, X^k); \tag{3.11}$$
$$\Phi^k = \Phi^k(\mathcal{T}^k, X^k). \tag{3.12}$$
In (3.11), (3.12) the symbol $\mathcal{T}^k$ stands for the subset of the set $\mathcal{T}$ whose elements are used to encode the states $a_m \in A^k$. The block LUTerYT generates outputs $y_n \in Y$ and state variables $T_r \in T$. This block includes a distributed register RG keeping state codes $K(a_m)$. To control the process of state code changing, the pulses Start and Clock enter the LUTerYT. The block LUTerT transforms the state codes $K(a_m)$ into state codes $C(a_m)$. As a result, it generates the variables $\tau_r \in \mathcal{T}$. These variables are distributed among the blocks LUTerk ($k = \overline{1, K}$). Let the symbol $y_n^k$ mean that $y_n \in Y^k$. Let the symbol $D_r^k$ stand for the relation $D_r \in \Phi^k$. In this case, the following systems of Boolean functions represent the block LUTerYT:
$$y_n = \bigvee_{k=1}^{K} y_n^k \quad (n = \overline{1, N}); \tag{3.13}$$
$$D_r = \bigvee_{k=1}^{K} D_r^k \quad (r = \overline{1, R_0}). \tag{3.14}$$
Fig. 3.3 Structural diagram of PT Mealy FSM
The LUTerT implements the functions
$$\tau_r = \tau_r(T) \quad (r = \overline{1, R_A}). \tag{3.15}$$
At each instant of time, only a single block LUTerk is “active”. It means that ones may appear on some outputs of this block. At the same time, there are only zeros on the outputs of the other blocks forming functions (3.11), (3.12). These blocks are “idle”. Let us use the zero code to show that the block LUTerk is idle. It corresponds to the following relation:
$$\tau_r \in \mathcal{T}^k \rightarrow \tau_r = 0. \tag{3.16}$$
Let the following condition take place:
$$K \le S. \tag{3.17}$$
In this case, there are not more than $R_0 + N$ LUTs in the circuit of LUTerYT. Let the following condition take place:
$$R_0 \le S. \tag{3.18}$$
In this case, there are $R_A$ LUTs in the circuit of LUTerT. To reduce the number of LUTs in the circuits of LUTer1–LUTerK, it is necessary to find the partition $\Pi_B$ with the following properties:
(1) It includes the minimum possible number of blocks ($K \to \min$).
(2) It provides minimum intersection of the sets $X^k$, $\Phi^k$ and $Y^k$:
$$|X^i \cap X^j| \to \min \quad (i \ne j;\ i, j \in \{1, \ldots, K\}); \tag{3.19}$$
$$|Y^i \cap Y^j| \to \min \quad (i \ne j;\ i, j \in \{1, \ldots, K\}); \tag{3.20}$$
$$|\Phi^i \cap \Phi^j| \to \min \quad (i \ne j;\ i, j \in \{1, \ldots, K\}). \tag{3.21}$$
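One possible way to compare candidate partitions with respect to (3.19)–(3.21) is to sum the sizes of all pairwise intersections. The text does not prescribe how the three criteria are aggregated, so the scalar measure below is our assumption, and the example sets are hypothetical.

```python
from itertools import combinations

def total_overlap(sets_per_class):
    """Sum of |S_i ∩ S_j| over all pairs of classes with i != j."""
    return sum(len(a & b) for a, b in combinations(sets_per_class, 2))

# Hypothetical sets X^1, X^2, X^3 for two candidate partitions.
x_sets_first = [{"x1", "x2"}, {"x2", "x3"}, {"x4"}]
x_sets_second = [{"x1", "x2"}, {"x3"}, {"x4"}]

print(total_overlap(x_sets_first))   # 1: only x2 is shared
print(total_overlap(x_sets_second))  # 0: the sets are disjoint
```

Under this measure the second partition is preferable, because no logical condition has to be connected to more than one block.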
Let us discuss how to design P, PY and MP Mealy FSMs based on the twofold state assignment. We denote them as PT, PYT and MPT Mealy FSMs, respectively. Let us say that PT Mealy FSM is an FSM with the base structure.
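The interaction of the blocks LUTerT, LUTerk and LUTerYT can be illustrated by a small behavioural model. The sketch below is not the authors' implementation: the four-state FSM, its transitions, the class codes and all function names are hypothetical and serve only to show the zero code of idle blocks (3.16) and the disjunction of partial functions (3.13), (3.14).

```python
# Hypothetical Mealy FSM with states a1..a4, one input x1, outputs y1, y2.
K = {"a1": (0, 0), "a2": (0, 1), "a3": (1, 0), "a4": (1, 1)}  # codes K(a_m)
CLASSES = [["a1", "a2"], ["a3", "a4"]]                         # A^1, A^2

# Subtables ST_k: (state, x1) -> (next state, set of microoperations).
TABLES = [
    {("a1", 0): ("a2", {"y1"}), ("a1", 1): ("a3", {"y2"}),
     ("a2", 0): ("a3", {"y1", "y2"}), ("a2", 1): ("a3", {"y1", "y2"})},
    {("a3", 0): ("a4", {"y2"}), ("a3", 1): ("a1", set()),
     ("a4", 0): ("a1", {"y1"}), ("a4", 1): ("a1", {"y1"})},
]

def luter_t(k_code):
    """LUTerT: K(a_m) -> class codes; idle classes get the zero code (3.16)."""
    state = next(s for s, code in K.items() if code == k_code)
    taus = []
    for cls in CLASSES:
        width = len(cls).bit_length()            # R_k = ceil(log2(M_k + 1))
        index = cls.index(state) + 1 if state in cls else 0
        taus.append(tuple((index >> (width - 1 - i)) & 1 for i in range(width)))
    return taus

def luter_k(k, tau, x1):
    """Block LUTerk: partial functions D_r^k and y_n^k; zeros when idle."""
    index = int("".join(map(str, tau)), 2)
    if index == 0:
        return (0, 0), frozenset()
    next_state, ys = TABLES[k][(CLASSES[k][index - 1], x1)]
    return K[next_state], frozenset(ys)

def luter_yt(block_results):
    """Disjunction of the partial functions, as in (3.13) and (3.14)."""
    d = tuple(max(bits) for bits in zip(*(r[0] for r in block_results)))
    y = frozenset().union(*(r[1] for r in block_results))
    return d, y

def step(k_code, x1):
    """One clock cycle: returns the code loaded into RG and the outputs."""
    taus = luter_t(k_code)
    return luter_yt([luter_k(k, tau, x1) for k, tau in enumerate(taus)])

print(luter_t((0, 0)))   # [(0, 1), (0, 0)]: class A^2 is idle
print(step((0, 0), 1))   # ((1, 0), frozenset({'y2'}))
```

Because only one block is active at a time, the disjunction in `luter_yt` simply passes the outputs of the active block through.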
3.2 Synthesis of Mealy FSM with the Base Structure
Let us consider the GSA $\Gamma_4$ (Fig. 3.4). It is marked by states of Mealy FSM using the rules from [1]. The following sets can be derived from Fig. 3.4: $A = \{a_1, \ldots, a_9\}$, $X = \{x_1, \ldots, x_7\}$, $Y = \{y_1, \ldots, y_8\}$. So, there are $M_0 = 9$, $L = 7$ and $N = 8$. Using (1.17), we can find that $R_0 = 4$. It gives the sets $T = \{T_1, \ldots, T_4\}$ and $\Phi = \{D_1, \ldots, D_4\}$. Let us encode the states $a_m \in A$ in the trivial way: $K(a_1) = 0000, \ldots, K(a_9) = 1000$. Let us construct the DST of Mealy FSM P($\Gamma_4$). Because this chapter is devoted to Mealy FSMs, we will sometimes omit the name “Mealy”. It is done, for example, in Table 3.1. The table includes $H_0 = 17$ rows.
Fig. 3.4 Initial GSA Γ4 marked by states of Mealy FSM
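The trivial state encoding used above can be reproduced in a few lines. The sketch assumes that (1.17), which is not shown in this section, has the usual form $R_0 = \lceil \log_2 M_0 \rceil$; the function name is hypothetical.

```python
def trivial_state_codes(m0: int) -> dict:
    """Assign K(a_m) = binary representation of m - 1 on R0 bits."""
    # Assumption: R0 = ceil(log2(M0)); for m0 >= 2 this is (m0 - 1).bit_length().
    r0 = max(1, (m0 - 1).bit_length())
    return {f"a{m}": format(m - 1, f"0{r0}b") for m in range(1, m0 + 1)}

codes = trivial_state_codes(9)
print(codes["a1"], codes["a2"], codes["a9"])  # 0000 0001 1000
```

For $M_0 = 9$ the sketch gives four-bit codes, which agrees with $R_0 = 4$.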
Table 3.1 Direct structure table of FSM P(Γ4)
a_m   K(a_m)   a_s   K(a_s)   X_h       Φ_h        h
a1    0000     a2    0001     1         D4         1
a2    0001     a3    0010     x1 x2     D3         2
a2    0001     a3    0010     x1 x̄2     D3         3
a2    0001     a3    0010     x̄1 x3     D3         4
a2    0001     a3    0010     x̄1 x̄3     D3         5
a3    0010     a4    0011     x3        D3 D4      6
a3    0010     a4    0011     x̄3 x4     D3 D4      7
a3    0010     a3    0010     x̄3 x̄4     D3         8
a4    0011     a5    0100     1         D2         9
a5    0100     a6    0101     x7        D2 D4      10
a5    0100     a3    0010     x̄7        D3         11
a6    0101     a6    0110     x5        D2 D3      12
a6    0101     a3    0110     x̄5        D2 D3      13
a7    0110     a8    0111     x6        D2 D3 D4   14
a7    0110     a7    0110     x̄6        D1         15
a8    0111     a9    1000     1         D1         16
a9    1000     a1    0000     1         –          17
Y_h (column entries in row order): y1 y2 y1 y2 y3 y3 y4 y5 y6 y7 y3 y5 y5 y2 y3 y3 y6 y5 y1 y7 y8 y7 y8 y8 y7 y1
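For D flip-flops, the column $\Phi_h$ of a structure table lists exactly those functions $D_r$ for which the corresponding bit of $K(a_s)$ equals 1. The following sketch illustrates this rule; the function name is hypothetical.

```python
def input_memory_functions(next_code: str) -> list:
    """Return the functions D_r equal to 1 for loading the code K(a_s) into RG."""
    return [f"D{r}" for r, bit in enumerate(next_code, start=1) if bit == "1"]

print(input_memory_functions("0001"))  # ['D4']
print(input_memory_functions("0011"))  # ['D3', 'D4']
print(input_memory_functions("0000"))  # []
```

An empty list corresponds to the sign “–” in the column $\Phi_h$.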
This table is used for deriving functions (1.1), (1.2). For example, the following minimized Boolean functions could be derived from Table 3.1: D3 = T¯1 T¯2 T¯3 T4 ∨ T¯1 T¯2 T3 T¯4 ∨ T¯1 T2 T¯3 T¯4 x¯7 ∨ T¯1 T2 T3 T¯4 , y8 = T¯1 T2 T¯3 T4 x¯5 ∨ T¯1 T2 T3 T¯4 . Let us point out that it is possible to use insignificant assignments of state variables for minimization. Let us also point out, that it makes sense only if some variables Tr ∈ T are excluded from all terms of a minimized function. It is explained by the fact that a LUT implements perfect SOP [4] represented by a truth table. Therefore, if a variable Tr ∈ T matches at least one term Fh , it corresponds to a single input of a LUT. The number of LUTs in the circuit of PT FSM depends strongly on the result of constructing the partition Π B . In this chapter, we consider a simple sequential algorithm for finding the partition Π B . The problem is formulated as the following. It is necessary to find a partition Π B of the set A such as: (1) it includes a minimum number of classes Ak and (2) the restriction (3.5) takes place for each class of the Π B . Let us characterize each state am ∈ A by two sets. The set X (am ) includes logical conditions xe ∈ X determining transitions from the state am ∈ A. The set Y (am ) includes microoperations yn ∈ Y generated during transitions from the state am ∈ A. Obviously, if am ∈ Ak , then X (am ) ⊆ X k and Y (am ) ⊆ Y k . We use two evolutions in the sequential algorithm. The first of them determines how many new logical conditions will be added to the set X k due to including a state am ∈ A into the block Ak ∈ Π B . The second evaluation determines the number of microoperations shared by Y (am ) and Y k . Let us denote these evaluations by the
3 Twofold State Assignment for Mealy FSMs
symbols $N(a_m, X^k)$ and $N(a_m, Y^k)$, respectively. They are calculated in the following manner:

$$N(a_m, X^k) = |X(a_m) \setminus X^k|; \tag{3.22}$$

$$N(a_m, Y^k) = |Y(a_m) \cap Y^k|. \tag{3.23}$$
There are two stages in the generation of each block $A^k \in \Pi_B$. Let $A^* = A \setminus (A^1 \cup A^2 \cup \ldots \cup A^{k-1})$ be the set of states which are not yet distributed after creating the blocks $A^1, \ldots, A^{k-1}$. At the first stage, we choose a state $a_m \in A^*$ as a basic element (BE) for the class $A^k$. The BE should satisfy the following relation:

$$|X(a_m)| = \max |X(a_j)|, \quad a_j \in A^* \setminus \{a_m\}. \tag{3.24}$$
If condition (3.24) holds for states $a_m$ and $a_s$, then we choose the state $a_m$ if $m < s$. The second stage is a multistep one. At each step, the next state is added to the class $A^k$ in accordance with the rules given below. The process terminates if: (1) all states are already distributed among the classes ($A^* = \emptyset$) or (2) it is not possible to include any state into $A^k$ without violating (3.5).

We propose the following rules for including the next state into $A^k$. Let $A^*$ include all unallocated states $a_m \in A$. Let us choose all states $a_m \in A^*$ whose inclusion does not violate the restriction (3.5) and place them into the set $P(A^k)$. Let us select a state $a_m \in P(A^k)$ with the following property:

$$N(a_m, X^k) = \min N(a_j, X^k), \quad a_j \in P(A^k) \setminus \{a_m\}. \tag{3.25}$$

If more than one state attains the minimum in (3.25), then we choose a state with the following property:

$$N(a_m, Y^k) = \max N(a_j, Y^k), \quad a_j \in P(A^k) \setminus \{a_m\}. \tag{3.26}$$
If the evaluations (3.26) are equal for several states, let us choose the state with the minimum subscript. Next, we eliminate all elements from $P(A^k)$, which gives $P(A^k) = \emptyset$. Now, we can start the next step in generating the class $A^k \in \Pi_B$.

Let us use this algorithm to form the partition $\Pi_B$ for FSM $P(\Gamma_4)$. The process is shown in Table 3.2. We assume that $S = 5$. It means that the following pairs $\langle L_k, R_k \rangle$ are possible: $\langle 0, 5 \rangle$, $\langle 1, 4 \rangle$, $\langle 2, 3 \rangle$, $\langle 3, 2 \rangle$, $\langle 4, 1 \rangle$. So, each block could include from 31 to 1 states $a_m \in A$.

Let us explain the columns and notation of Table 3.2. The column $a_m$ lists the states $a_m \in A$. The second column shows the numbers of logical conditions for the states from the first column. The columns $BE_k$ ($k = 1, \ldots, 3$) mark the basic elements for stage $k$. The symbol I stands for the evaluation $N(a_m, X^k)$, the symbol II for $N(a_m, Y^k)$. The digits 1, 2, ... show the numbers of the steps in generating the class $A^k$. The sign “⊕” means that a particular state is chosen as a basic element or that it is included into the set $A^k$ during the step corresponding to a particular column. The sign “–” means that $a_m \notin A^*$, where $a_m$ is the state from the corresponding
3.2 Synthesis of Mealy FSM with the Base Structure
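Table 3.2 Process of generating $\Pi_B$ for FSM $P(\Gamma_4)$

| $a_m$ | $\lvert X(a_m) \rvert$ | $BE_1$ | I/II (step 1) | I/II (step 2) | $BE_2$ | I/II (step 1) | I/II (step 2) | $BE_3$ | I/II (step 1) | I/II (step 2) |
|---|---|---|---|---|---|---|---|---|---|---|
| $a_1$ | 0 | | 0/2 ⊕ | – | – | – | – | – | – | – |
| $a_2$ | 3 | ⊕ | – | – | – | – | – | – | – | – |
| $a_3$ | 2 | | 1/2 | 1/2 | ⊕ | – | – | – | – | – |
| $a_4$ | 0 | | 0/2 | 0/2 ⊕ | – | – | – | – | – | – |
| $a_5$ | 1 | | 1/2 | 1/2 | | 1/3 ⊕ | – | – | – | – |
| $a_6$ | 1 | | 1/1 | 1/1 | | 1/1 | 1/1 | ⊕ | – | – |
| $a_7$ | 1 | | 1/0 | 1/0 | | 1/1 | 1/1 | | 1/2 ⊕ | – |
| $a_8$ | 0 | | 0/0 | 0/0 | | 0/1 | 0/1 ⊕ | – | – | – |
| $a_9$ | 0 | | 0/1 | 0/1 | | 0/0 | 0/0 | | 0/0 | 0/0 ⊕ |
| $A^k$ | | $a_2$ | $a_1$ | $a_4$ | $a_3$ | $a_5$ | $a_8$ | $a_6$ | $a_7$ | $a_9$ |
| Class | | $A^1$ | $A^1$ | $A^1$ | $A^2$ | $A^2$ | $A^2$ | $A^3$ | $A^3$ | $A^3$ |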
row. The row $A^k$ lists the states $a_m \in A^k$ in the order of their selection. The last row shows the class $A^k \in \Pi_B$ to which each state belongs.

As follows from Table 3.2, there are $M_0 = 9$ steps in the process of state selection. As a result, we obtain the partition $\Pi_B = \{A^1, A^2, A^3\}$ with $K = 3$ and the classes $A^1 = \{a_1, a_2, a_4\}$, $A^2 = \{a_3, a_5, a_8\}$ and $A^3 = \{a_6, a_7, a_9\}$.

Using the rule (3.25) decreases the number of identical variables $x_e \in X$ appearing in different blocks $A^k \in \Pi_B$. In turn, it leads to a decrease in the number of LUTs in comparison with the case when logical conditions are duplicated in different classes $A^k \in \Pi_B$. Using the rule (3.26) decreases the number of identical microoperations $y_n \in Y$ appearing in different blocks $A^k \in \Pi_B$. It could result in fewer LUTs in the FSM circuit than when microoperations are shared among different classes $A^k \in \Pi_B$.

Let us point out that the condition (3.5) may be violated even for a single state $a_m \in A$. It corresponds to the following condition:

$$|X(a_m)| + 1 > S. \tag{3.27}$$
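The evaluations (3.22), (3.23) and the selection rules (3.25), (3.26) can be illustrated with a minimal Python sketch. The sets $X(a_m)$ and $Y(a_m)$ below are read off the structure table of $P(\Gamma_4)$; the function names are hypothetical, and the check of restriction (3.5) is deliberately omitted, so this is only one selection step under stated assumptions, not the complete algorithm.

```python
# Sets X(a_m) and Y(a_m) of FSM P(Gamma_4), read off the structure table.
X = {
    "a1": set(), "a2": {"x1", "x2", "x3"}, "a3": {"x3", "x4"},
    "a4": set(), "a5": {"x7"}, "a6": {"x5"}, "a7": {"x6"},
    "a8": set(), "a9": set(),
}
Y = {
    "a1": {"y1", "y2"}, "a2": {"y1", "y2", "y3", "y4", "y5"},
    "a3": {"y3", "y5", "y6", "y7"}, "a4": {"y2", "y3"},
    "a5": {"y3", "y5", "y6"}, "a6": {"y1", "y7", "y8"},
    "a7": {"y7", "y8"}, "a8": {"y7"}, "a9": {"y1"},
}


def evaluations(state, block):
    """Return (I, II) = (|X(a_m) \\ X^k|, |Y(a_m) & Y^k|), see (3.22), (3.23)."""
    xk = set().union(*(X[s] for s in block))
    yk = set().union(*(Y[s] for s in block))
    return len(X[state] - xk), len(Y[state] & yk)


def pick_next(block, candidates):
    """Rule (3.25): minimum I; rule (3.26): maximum II; then minimum subscript.

    Assumption: every candidate already satisfies restriction (3.5).
    """
    def key(state):
        i, ii = evaluations(state, block)
        return (i, -ii, int(state[1:]))
    return min(candidates, key=key)


block = ["a2"]  # basic element BE1 chosen by (3.24)
free = [s for s in X if s not in block]
first_step = {s: evaluations(s, block) for s in free}
chosen = pick_next(block, free)
print(first_step)
print(chosen)
```

For the block started by $a_2$, the computed pairs coincide with the first I/II column of Table 3.2, and the state $a_1$ is selected.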
If (3.27) takes place, then we see two ways for optimizing circuits of PT FSMs. The first way is to collect the states for which (3.27) is true into a single block $A^{K+1}$. This part of the circuit is implemented using functional decomposition [6]. Let us call the result a partially PT Mealy FSM. It has the same structural diagram as the one shown in Fig. 3.3, but it includes $K + 1$ blocks of LUTers. The condition (3.27) holds for each state $a_m \in A^{K+1}$. We do not discuss this approach in our book. The second way is to transform the GSA $\Gamma$. The transformation is executed for all states where the condition (3.27) takes place. It increases the number of states and the number of cycles of the algorithm's execution. We discussed this approach in Chap. 2.
Both approaches have their drawbacks. It is the task of a designer to choose the approach leading to an FSM circuit with the required characteristics. We propose the following method for designing PT Mealy FSM:

1. Constructing the marked GSA $\Gamma$ and finding the set $A$.
2. Executing the state assignment.
3. Constructing the structure table of P Mealy FSM.
4. Constructing the partition $\Pi_B$.
5. Executing the state encoding for states $a_m \in A^k$.
6. Constructing subtables $ST_k$ for classes $A^k \in \Pi_B$.
7. Constructing systems (3.11), (3.12) for each class $A^k \in \Pi_B$.
8. Constructing systems (3.13), (3.14) for LUTerYT.
9. Constructing the table for LUTerT.
10. Implementing the circuit of PT FSM with LUTs of a particular FPGA chip.
This method can be applied when the condition (3.27) does not hold for any state $a_m \in A$. Otherwise, we should use either functional decomposition or transformation of the GSA for the states $a_m \in A$ where condition (3.27) takes place.

Let us discuss an example of synthesis for Mealy FSM $PT(\Gamma_4)$. The first 4 steps of the design method have already been executed. Recall that $S = 5$. So, we have the partition $\Pi_B = \{A^1, A^2, A^3\}$ with the classes $A^1 = \{a_1, a_2, a_4\}$, $A^2 = \{a_3, a_5, a_8\}$ and $A^3 = \{a_6, a_7, a_9\}$. Using the DST (Table 3.1), we can find the following sets: $X^1 = \{x_1, x_2, x_3\}$, $Y^1 = \{y_1, \ldots, y_5\}$, $\Phi^1 = \{D_2, D_3, D_4\}$, $X^2 = \{x_3, x_4, x_7\}$, $Y^2 = \{y_3, y_5, y_6, y_7\}$, $\Phi^2 = \{D_1, \ldots, D_4\}$, $X^3 = \{x_5, x_6\}$, $Y^3 = \{y_1, y_7, y_8\}$, $\Phi^3 = \{D_2, D_3, D_4\}$.

Using (3.6), we can find that $R_k = 2$ ($k = 1, \ldots, 3$). It gives $R_A = 6$. So, there is the set $\mathcal{T} = \{\tau_1, \ldots, \tau_6\}$. Let it be $\mathcal{T}^1 = \{\tau_1, \tau_2\}$, $\mathcal{T}^2 = \{\tau_3, \tau_4\}$ and $\mathcal{T}^3 = \{\tau_5, \tau_6\}$. Obviously, if $S = 5$, then the condition (3.5) holds for each class $A^k \in \Pi_B$. So, it is possible to use the model $PT(\Gamma_4)$.

Each function $D_r^k$, $y_n^k$ ($r = 1, \ldots, R_0$; $n = 1, \ldots, N$; $k = 1, \ldots, K$) is implemented as a single LUT. So, the state encoding (step 5) has no influence on the hardware amount. Let us encode the states $a_m \in A^k$ in the following way: $C(a_1) = C(a_3) = C(a_6) = 01$, $C(a_2) = C(a_5) = C(a_7) = 10$ and $C(a_4) = C(a_8) = C(a_9) = 11$.

Each subtable $ST_k$ includes the following columns: $a_m$, $C(a_m)$, $a_s$, $K(a_s)$, $X_h^k$, $Y_h^k$, $\Phi_h^k$, $h$. The meaning of all columns is clear from the previous text. In the case of $PT(\Gamma_4)$, there are three subtables $ST_1$–$ST_3$. They are shown in Tables 3.3, 3.4 and 3.5, respectively. There are $H_1 = 6$ rows in $ST_1$, $H_2 = 6$ in $ST_2$ and $H_3 = 5$ in $ST_3$. So, these tables include $H_0 = 17$ rows. It means that there is the same number of interstate transitions in FSMs $P(\Gamma_4)$ and $PT(\Gamma_4)$. Systems (3.11), (3.12) are derived from subtables $ST_k$.
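In the discussed case, the following systems could be found:

$$\begin{aligned}
D_1^1 &= 0; & D_1^2 &= \tau_3 \tau_4; & D_1^3 &= 0;\\
D_2^1 &= \tau_1 \tau_2; & D_2^2 &= \tau_3 \bar\tau_4 x_7; & D_2^3 &= D_3^3;\\
D_3^1 &= \tau_1 \bar\tau_2; & D_3^2 &= \bar\tau_3 \tau_4 \vee \tau_3 \bar\tau_4 \bar x_7; & D_3^3 &= \bar\tau_5 \tau_6 \vee \tau_5 \bar\tau_6;\\
D_4^1 &= \bar\tau_1 \tau_2; & D_4^2 &= \bar\tau_3 \tau_4 x_3 \vee \bar\tau_3 \tau_4 x_4 \vee \tau_3 \bar\tau_4 x_7; & D_4^3 &= \tau_5 \bar\tau_6 x_6;
\end{aligned} \tag{3.28}$$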
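Table 3.3 Subtable $ST_1$ of Mealy FSM $PT(\Gamma_4)$

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $X_h^1$ | $Y_h^1$ | $\Phi_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 01 | $a_2$ | 0001 | 1 | $y_1^1 y_2^1$ | $D_4^1$ | 1 |
| $a_2$ | 10 | $a_3$ | 0010 | $x_1 x_2$ | $y_1^1 y_2^1$ | $D_3^1$ | 2 |
| | | $a_3$ | 0010 | $x_1 \bar x_2$ | $y_3^1$ | $D_3^1$ | 3 |
| | | $a_3$ | 0010 | $\bar x_1 x_3$ | $y_3^1 y_4^1$ | $D_3^1$ | 4 |
| | | $a_3$ | 0010 | $\bar x_1 \bar x_3$ | $y_5^1$ | $D_3^1$ | 5 |
| $a_4$ | 11 | $a_5$ | 0100 | 1 | $y_2^1 y_3^1$ | $D_2^1$ | 6 |

Table 3.4 Subtable $ST_2$ of Mealy FSM $PT(\Gamma_4)$

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $X_h^2$ | $Y_h^2$ | $\Phi_h^2$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_3$ | 01 | $a_4$ | 0011 | $x_3$ | $y_6^2 y_7^2$ | $D_3^2 D_4^2$ | 1 |
| | | $a_4$ | 0011 | $\bar x_3 x_4$ | $y_3^2 y_5^2$ | $D_3^2 D_4^2$ | 2 |
| | | $a_3$ | 0010 | $\bar x_3 \bar x_4$ | $y_5^2$ | $D_3^2$ | 3 |
| $a_5$ | 10 | $a_6$ | 0101 | $x_7$ | $y_3^2 y_6^2$ | $D_2^2 D_4^2$ | 4 |
| | | $a_3$ | 0010 | $\bar x_7$ | $y_5^2$ | $D_3^2$ | 5 |
| $a_8$ | 11 | $a_9$ | 1000 | 1 | $y_7^2$ | $D_1^2$ | 6 |

Table 3.5 Subtable $ST_3$ of Mealy FSM $PT(\Gamma_4)$

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $X_h^3$ | $Y_h^3$ | $\Phi_h^3$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_6$ | 01 | $a_7$ | 0110 | $x_5$ | $y_1^3 y_7^3$ | $D_2^3 D_3^3$ | 1 |
| | | $a_7$ | 0110 | $\bar x_5$ | $y_8^3$ | $D_2^3 D_3^3$ | 2 |
| $a_7$ | 10 | $a_8$ | 0111 | $x_6$ | $y_7^3 y_8^3$ | $D_2^3 D_3^3 D_4^3$ | 3 |
| | | $a_7$ | 0110 | $\bar x_6$ | $y_8^3$ | $D_2^3 D_3^3$ | 4 |
| $a_9$ | 11 | $a_1$ | 0000 | 1 | $y_1^3$ | – | 5 |

$$\begin{aligned}
y_1^1 &= \bar\tau_1 \tau_2 \vee \tau_1 \bar\tau_2 x_1 x_2;\\
y_2^1 &= \bar\tau_1 \tau_2 \vee \tau_1 \bar\tau_2 x_1 x_2 \vee \tau_1 \tau_2;\\
y_3^1 &= \tau_1 \bar\tau_2 x_1 \bar x_2 \vee \tau_1 \bar\tau_2 \bar x_1 x_3 \vee \tau_1 \tau_2;\\
y_4^1 &= \tau_1 \bar\tau_2 \bar x_1 x_3;\\
y_5^1 &= \tau_1 \bar\tau_2 \bar x_1 \bar x_3;\\
y_6^1 &= y_7^1 = y_8^1 = 0;\\
y_1^2 &= y_2^2 = y_4^2 = y_8^2 = 0;\\
y_3^2 &= \bar\tau_3 \tau_4 \bar x_3 x_4 \vee \tau_3 \bar\tau_4 x_7;\\
y_5^2 &= \bar\tau_3 \tau_4 \bar x_3 \vee \tau_3 \bar\tau_4 \bar x_7;\\
y_6^2 &= \bar\tau_3 \tau_4 x_3 \vee \tau_3 \bar\tau_4 x_7;\\
y_7^2 &= \bar\tau_3 \tau_4 x_3 \vee \tau_3 \tau_4;\\
y_1^3 &= \bar\tau_5 \tau_6 x_5 \vee \tau_5 \tau_6;\\
y_2^3 &= y_3^3 = y_4^3 = y_5^3 = y_6^3 = 0;\\
y_7^3 &= \bar\tau_5 \tau_6 x_5 \vee \tau_5 \bar\tau_6 x_6;\\
y_8^3 &= \bar\tau_5 \tau_6 \bar x_5 \vee \tau_5 \bar\tau_6;
\end{aligned} \tag{3.29}$$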
The systems (3.13), (3.14) are constructed in the trivial way. The LUTerYT uses outputs of LUTer1–LUTer3 as its inputs. If a function Drk = 0, then it is not used to form the system (3.14). If a function ynk = 0, then it is not used to form the system (3.13). For example, we could find the following equations:
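$$y_1 = y_1^1 \vee y_1^3; \quad y_2 = y_2^1; \quad D_1 = D_1^2; \quad D_2 = D_2^1 \vee D_2^2 \vee D_2^3. \tag{3.30}$$

Table 3.6 Table of LUTerT for Mealy FSM $PT(\Gamma_4)$

| $a_m$ | $K(a_m)$ | $T_m$ | $m$ |
|---|---|---|---|
| $a_1$ | 0000 | $\tau_2$ | 1 |
| $a_2$ | 0001 | $\tau_1$ | 2 |
| $a_3$ | 0010 | $\tau_4$ | 3 |
| $a_4$ | 0011 | $\tau_1 \tau_2$ | 4 |
| $a_5$ | 0100 | $\tau_3$ | 5 |
| $a_6$ | 0101 | $\tau_6$ | 6 |
| $a_7$ | 0110 | $\tau_5$ | 7 |
| $a_8$ | 0111 | $\tau_3 \tau_4$ | 8 |
| $a_9$ | 1000 | $\tau_5 \tau_6$ | 9 |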
There are $M_0 = 9$ rows in the table of LUTerT. It includes the following columns: $a_m$, $K(a_m)$, $T_m$, $m$. The meaning of these columns is clear from Table 3.6. Let us explain how to fill this table. For example, $a_1 \in A^1$ and $C(a_1) = 01$. Because $\mathcal{T}^1 = \{\tau_1, \tau_2\}$, there is the symbol $\tau_2$ in row 1 of Table 3.6. Next, $a_9 \in A^3$ and $C(a_9) = 11$. Because $\mathcal{T}^3 = \{\tau_5, \tau_6\}$, there are symbols $\tau_5$, $\tau_6$ in row 9 of Table 3.6.

To execute the last step of the design method, it is necessary to create truth tables for each function generated by LUTer1, ..., LUTerK, LUTerYT and LUTerT. Let us discuss this step for the given example. Because each function (3.11), (3.12) is implemented by a single LUT, each function $D_r^k$, $y_n^k$ is represented by a single truth table. In practice, each function from (3.28), (3.29) is given as a minimized SOP and has to be expanded into a perfect SOP. There is no problem with such a transition [4]. Let us discuss the function $D_3^2$. It depends on the arguments $\tau_3$, $\tau_4$, $x_7$. To get the perfect SOP, it is necessary to multiply the first term ($\bar\tau_3 \tau_4$) by the tautology ($x_7 \vee \bar x_7$). It gives the following equation:

$$D_3^2 = \bar\tau_3 \tau_4 (x_7 \vee \bar x_7) \vee \tau_3 \bar\tau_4 \bar x_7 = \bar\tau_3 \tau_4 x_7 \vee \bar\tau_3 \tau_4 \bar x_7 \vee \tau_3 \bar\tau_4 \bar x_7. \tag{3.31}$$
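The expansion (3.31) can be checked by brute-force enumeration. The following Python sketch is only an illustration (the function names are hypothetical): it compares the minimized SOP of $D_3^2$ with the three minterms of the perfect SOP over all combinations of $\tau_3$, $\tau_4$, $x_7$.

```python
from itertools import product


def d32_minimized(t3, t4, x7):
    # Minimized SOP: (not t3) t4  OR  t3 (not t4) (not x7)
    return bool(((not t3) and t4) or (t3 and (not t4) and (not x7)))


def d32_perfect(t3, t4, x7):
    # Perfect SOP (3.31): three full minterms over (t3, t4, x7)
    minterms = {(0, 1, 1), (0, 1, 0), (1, 0, 0)}
    return (t3, t4, x7) in minterms


rows = list(product((0, 1), repeat=3))  # 000, 001, ..., 111
column = [int(d32_minimized(*r)) for r in rows]
equivalent = all(d32_minimized(*r) == d32_perfect(*r) for r in rows)
print(column, equivalent)
```

The list `column` is the output column of an 8-row truth table ordered by $\tau_3 \tau_4 x_7$, which is the form loaded into a LUT.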
Since $S = 5$, each truth table includes 32 rows. Let us connect input 3 of the LUT for $D_3^2$ with $\tau_3$, input 4 with $\tau_4$ and input 5 with $x_7$. It gives the truth table shown in Table 3.7. Each row $h$ of Table 3.7 corresponds to 4 rows of a truth table having 32 rows. The numbers of these rows are shown in the column $h_c$. The sign “×” in a column means that the particular input could be equal to either 0 or 1.

Table 3.7 Truth table for LUT $D_3^2$

| Input 1 (–) | Input 2 (–) | Input 3 ($\tau_3$) | Input 4 ($\tau_4$) | Input 5 ($x_7$) | $D_3^2$ | $h$ | $h_c$ |
|---|---|---|---|---|---|---|---|
| × | × | 0 | 0 | 0 | 0 | 1 | 1, 9, 17, 25 |
| × | × | 0 | 0 | 1 | 0 | 2 | 2, 10, 18, 26 |
| × | × | 0 | 1 | 0 | 1 | 3 | 3, 11, 19, 27 |
| × | × | 0 | 1 | 1 | 1 | 4 | 4, 12, 20, 28 |
| × | × | 1 | 0 | 0 | 1 | 5 | 5, 13, 21, 29 |
| × | × | 1 | 0 | 1 | 0 | 6 | 6, 14, 22, 30 |
| × | × | 1 | 1 | 0 | 0 | 7 | 7, 15, 23, 31 |
| × | × | 1 | 1 | 1 | 0 | 8 | 8, 16, 24, 32 |

Table 3.8 Truth table for LUT $y_1$

| Input 1 (–) | Input 2 (–) | Input 3 (–) | Input 4 ($y_1^1$) | Input 5 ($y_1^3$) | $y_1$ | $h$ | $h_c$ |
|---|---|---|---|---|---|---|---|
| × | × | × | 0 | 0 | 0 | 1 | 1, 5, 9, 13, 17, 21, 25, 29 |
| × | × | × | 0 | 1 | 1 | 2 | 2, 6, 10, 14, 18, 22, 26, 30 |
| × | × | × | 1 | 0 | 1 | 3 | 3, 7, 11, 15, 19, 23, 27, 31 |
| × | × | × | 1 | 1 | * | 4 | 4, 8, 12, 16, 20, 24, 28, 32 |

Analysis of the systems (3.28), (3.29) shows that there are 21 LUTs in the blocks LUTer1–LUTer3 of FSM $PT(\Gamma_4)$. In the general case, there are $N_{\Phi Y}$ LUTs in this circuit, where

$$N_{\Phi Y} = K(R_0 + N). \tag{3.32}$$
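The correspondence between a compact 8-row table and the full 32-row LUT contents (column $h_c$ of Table 3.7) can be sketched in Python as follows. The sketch assumes that input 1 is the most significant address bit, which agrees with the row numbers listed in Table 3.7; the function names are hypothetical.

```python
from itertools import product


def d32(t3, t4, x7):
    # Function D_3^2 from system (3.28)
    return int(((not t3) and t4) or (t3 and (not t4) and (not x7)))


# Full contents of a 5-input LUT; inputs 1 and 2 are unused ("don't care").
lut = [d32(t3, t4, x7) for (_i1, _i2, t3, t4, x7) in product((0, 1), repeat=5)]
compact = lut[:8]  # the rows with inputs 1 and 2 both equal to 0


def rows_for(h):
    """1-based rows of the 32-row table covered by row h of the compact table."""
    return [h + 8 * u for u in range(4)]


print(compact)
print(rows_for(1))
```

Every row returned by `rows_for(h)` holds the same output value as row `h` of the compact table, because the two unused inputs do not affect the function.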
The expression (3.32) also determines the number of interconnections between LUTer1–LUTerK and LUTerYT. In the case of $PT(\Gamma_4)$, $N_{\Phi Y} = 3(4 + 8) = 36$. So, we saved 42% of LUTs and interconnections (21 instead of 36). This is due to the proposed strategy of constructing the partition $\Pi_B$. Due to this strategy, some functions $y_n \in Y$ are generated by only a single LUTerk. In this case, neither a LUT for $y_n$ in LUTerYT nor an interconnection with LUTerYT is needed for the function $y_n \in Y$. The same is true for input memory functions $D_r \in \Phi$.

Because $K \le S$ in the discussed case, the truth tables for LUTs of LUTerYT are constructed in the trivial way. For example, Table 3.8 is the truth table for the function $y_1$. The combination 11 is impossible for $y_1^1$ and $y_1^3$, because only a single LUTerk is “active” at a time. It means that $y_1^1 \cdot y_1^3 = 0$.

The table of LUTerT determines $R_A$ truth tables for variables $\tau_r \in \mathcal{T}$. All these tables have the same inputs, namely, the state variables $T_r \in T$. The truth table for the function $\tau_1$ is shown in Table 3.9.

Now it is possible to implement the circuit of Mealy FSM $PT(\Gamma_4)$. In this case, there are 21 LUTs in the circuits of LUTer1–LUTer3, 7 LUTs in LUTerYT and 6 LUTs in LUTerT. So, there are 34 LUTs in the circuit of Mealy FSM $PT(\Gamma_4)$.
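Table 3.9 Truth table for LUT $\tau_1$

| Input 1 (–) | Input 2 ($T_1$) | Input 3 ($T_2$) | Input 4 ($T_3$) | Input 5 ($T_4$) | $\tau_1$ | $h$ | $h_c$ |
|---|---|---|---|---|---|---|---|
| × | × | × | 0 | 0 | 0 | 1 | 1, 5, 9, 13, 17, 21, 25, 29 |
| × | 0 | 0 | 0 | 1 | 1 | 2 | 2, 18 |
| × | × | × | 1 | 0 | 0 | 3 | 3, 7, 11, 15, 19, 23, 27, 31 |
| × | 0 | 0 | 1 | 1 | 1 | 4 | 4, 20 |
| × | 0 | 1 | × | × | 0 | 5 | 5, 6, 7, 8, 21, 22, 23, 24 |
| × | 1 | 0 | × | × | 0 | 6 | 9, 10, 11, 12, 25, 26, 27, 28 |
| × | 1 | 1 | × | × | 0 | 7 | 13, 14, 15, 16, 29, 30, 31, 32 |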
In the general case, there are $N_{YT}$ LUTs in the circuit of LUTerYT and $N_\tau$ LUTs in LUTerT:

$$N_{YT} = R_0 + N; \tag{3.33}$$

$$N_\tau = R_A. \tag{3.34}$$

Using (3.32)–(3.34), it is possible to find the maximum number of LUTs in the circuit of PT Mealy FSM:

$$N(\text{Mealy } PT) = (R_0 + N)(K + 1) + R_A. \tag{3.35}$$

This formula could be used if conditions (3.5), (3.17) and (3.18) hold. In the discussed case, $N(\text{Mealy } PT) = 54$. So, our strategy saves about 37% of LUTs (34 instead of 54) as compared to the worst case represented by (3.35).
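The arithmetic behind (3.32) and (3.35) for $PT(\Gamma_4)$ can be reproduced with a few lines of Python. This is an illustrative sketch only; the function names are hypothetical, and the LUT counts 21, 7 and 6 are the values quoted in the text.

```python
def n_phi_y(K, R0, N):
    """Upper bound (3.32) on the number of LUTs in LUTer1..LUTerK."""
    return K * (R0 + N)


def n_max_pt(K, R0, N, RA):
    """Upper bound (3.35) on the number of LUTs in a PT Mealy FSM."""
    return (R0 + N) * (K + 1) + RA


def saving_percent(actual, bound):
    """Rounded percentage of LUTs saved with respect to a bound."""
    return round(100 * (1 - actual / bound))


K, R0, N, RA = 3, 4, 8, 6           # parameters of PT(Gamma_4)
actual_luters = 21                  # LUTs in LUTer1-LUTer3 (from the text)
actual_total = 21 + 7 + 6           # plus LUTerYT and LUTerT

print(n_phi_y(K, R0, N), saving_percent(actual_luters, n_phi_y(K, R0, N)))
print(n_max_pt(K, R0, N, RA), saving_percent(actual_total, n_max_pt(K, R0, N, RA)))
```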
3.3 Synthesis of Mealy FSM with Encoding of Collections of Microoperations

Let us find the partition $\Pi_B = \{A^1, \ldots, A^K\}$ determined in Sect. 3.1. Let it satisfy condition (3.5). Let us find CMOs $Y_q \subseteq Y$ for a given GSA $\Gamma$. Let us encode each CMO $Y_q$ ($q = 1, \ldots, Q$) by a binary code $K(Y_q)$ using $R_Q$ variables $z_r \in Z$, where $R_Q$ is determined by (2.24). A class $A^k \subseteq A$ determines the sets $X^k \subseteq X$, $\Phi^k \subseteq \Phi$ and $Z^k \subseteq Z$. The set $Z^k$ includes the variables $z_r \in Z$ equal to 1 in the codes $K(Y_q)$ of CMOs generated during transitions from the states $a_m \in A^k$. Let us encode states $a_m \in A^k$ by binary codes $C(a_m)$ having $R_k$ bits, where $R_k$ is determined by (3.6). Let us use variables $\tau_r \in \mathcal{T}^k$ for encoding of states $a_m \in A^k$. They form the set $\mathcal{T}$ having $R_A$ variables (3.7).
Fig. 3.5 Structural diagram of PYT Mealy FSM (diagram labels: LUTer1, ..., LUTerK with inputs $X^1, \ldots, X^K$ and outputs $Z^1, \ldots, Z^K$, $\Phi^1, \ldots, \Phi^K$; LUTerZT; LUTerY; LUTerT; signals Z, T, Y, Clock, Start)
Now, we propose the model of PYT Mealy FSM shown in Fig. 3.5. In PYT Mealy FSM, the block LUTerk implements the systems (3.12) and

$$Z^k = Z^k(X^k, \mathcal{T}^k) \quad (k = 1, \ldots, K). \tag{3.36}$$

The block LUTerZT implements system (3.14) and

$$z_r = \bigvee_{k=1}^{K} z_r^k \quad (r = 1, \ldots, R_Q). \tag{3.37}$$

The block LUTerY implements the system (2.4). The block LUTerT implements the system (3.15), which could be written as

$$\mathcal{T} = \mathcal{T}(T), \tag{3.38}$$

where $\mathcal{T}$ is the set of variables $\tau_r$ and $T$ is the set of state variables $T_r$.
Let us analyse the GSA $\Gamma_5$ (Fig. 3.6). The GSA $\Gamma_5$ has $M_0 = 10$ marks of states found using the rules [1]. It is possible to derive the following sets from GSA $\Gamma_5$: $A = \{a_1, \ldots, a_{10}\}$, $X = \{x_1, \ldots, x_6\}$, $Y = \{y_1, \ldots, y_8\}$. It gives $R_0 = 4$, $T = \{T_1, \ldots, T_4\}$, $\Phi = \{D_1, \ldots, D_4\}$, $L = 6$ and $N = 8$. Let us form an ST for Mealy FSM $P(\Gamma_5)$, represented by Table 3.10. We use trivial state codes $K(a_m)$: $K(a_1) = 0000$, ..., $K(a_{10}) = 1001$. The ST is a base for deriving functions $D_r \in \Phi$ and $y_n \in Y$.

Analysis of GSA $\Gamma_5$ allows constructing the following collections of microoperations: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_3\}$, $Y_4 = \{y_2, y_4\}$, $Y_5 = \{y_3, y_5\}$, $Y_6 = \{y_6, y_7\}$, $Y_7 = \{y_1, y_7\}$, $Y_8 = \{y_7\}$, $Y_9 = \{y_2, y_5\}$ and $Y_{10} = \{y_2, y_8\}$. The CMO $Y_1$ is generated during the transition from $a_{10}$ into $a_1$ (row 20 of Table 3.10). So, $Q = 10$. Using (2.24) leads to $R_Q = 4$ and $Z = \{z_1, \ldots, z_4\}$. Let us encode the CMOs $Y_q \subseteq Y$ in the trivial way: $K(Y_1) = 0000$, ..., $K(Y_{10}) = 1001$. These codes are shown in the Karnaugh map (Fig. 3.7). Recall that we use the symbol PY to underline that there is the encoding of CMOs in a particular Mealy FSM. So, we use the symbol $PY(\Gamma_5)$ for Fig. 3.7. To find the
Fig. 3.6 Initial GSA Γ5 marked by states of Mealy FSM
partition $\Pi_B$, it is necessary to construct a transformed ST of PY Mealy FSM. It is Table 3.11 in the case of FSM $PY(\Gamma_5)$. The column $Z_h$ of Table 3.11 contains the variables $z_r \in Z$ that are equal to 1 in the code $K(Y_q)$ of the CMO from the same row of Table 3.10. For example, $z_4 = 1$ in the code $K(Y_2)$ (Fig. 3.7). So, there is the variable $z_4$ in the first row of Table 3.11. All other rows are filled in the same way.

Before discussing the proposed design method, let us discuss how to construct the partition $\Pi_B$ to design PYT FSM. The problem is formulated in the same way as for PT Mealy FSM. So, the number $K$ of classes in $\Pi_B$ should be minimal. Also, we try to minimize the appearance of the same logical conditions $x_e \in X$ in different sets $X^k$ ($k = 1, \ldots, K$). Each state $a_m \in A$ is characterized by the sets $X(a_m)$ and $Z(a_m)$. The set $Z(a_m)$ includes variables $z_r \in Z$ equal to 1 in codes of CMOs $Y_q \subseteq Y$ generated during transitions from state $a_m \in A$. If $a_m \in A^k$, then $X(a_m) \subseteq X^k$ and $Z(a_m) \subseteq Z^k$.
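How the column $Z_h$ follows from the trivial CMO codes can be shown with a small Python sketch. It assumes that (2.24) has the usual form $R_Q = \lceil \log_2 Q \rceil$ (which gives $R_Q = 4$ for $Q = 10$, as in the text); the function names are hypothetical.

```python
from math import ceil, log2

# Collections of microoperations Y1..Y10 of GSA Gamma_5 (from the text).
cmos = [
    set(), {"y1", "y2"}, {"y3"}, {"y2", "y4"}, {"y3", "y5"},
    {"y6", "y7"}, {"y1", "y7"}, {"y7"}, {"y2", "y5"}, {"y2", "y8"},
]

Q = len(cmos)
RQ = ceil(log2(Q))  # assumed form of (2.24)


def code(q):
    """Trivial code K(Y_q): the binary representation of q - 1 on RQ bits."""
    return format(q - 1, f"0{RQ}b")


def z_vars(q):
    """Variables z_r equal to 1 in K(Y_q), i.e. an entry of the column Z_h."""
    return [f"z{i + 1}" for i, bit in enumerate(code(q)) if bit == "1"]


for q in range(1, Q + 1):
    print(f"Y{q}", code(q), z_vars(q))
```

For example, the CMO $Y_2 = \{y_1, y_2\}$ has the code 0001, so every row of the structure table that generates $y_1 y_2$ receives $z_4$ in the column $Z_h$.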
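Table 3.10 Structure table of Mealy FSM $P(\Gamma_5)$

| $a_m$ | $K(a_m)$ | $a_s$ | $K(a_s)$ | $X_h$ | $Y_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 0000 | $a_2$ | 0001 | 1 | $y_1 y_2$ | $D_4$ | 1 |
| $a_2$ | 0001 | $a_3$ | 0010 | $x_1$ | $y_3$ | $D_3$ | 2 |
| | | $a_3$ | 0010 | $\bar x_1$ | $y_2 y_4$ | $D_3$ | 3 |
| $a_3$ | 0010 | $a_4$ | 0011 | $x_2$ | $y_1 y_2$ | $D_3 D_4$ | 4 |
| | | $a_4$ | 0011 | $x_1 \bar x_2$ | $y_3 y_5$ | $D_3 D_4$ | 5 |
| | | $a_3$ | 0010 | $\bar x_1 \bar x_2$ | $y_2 y_4$ | $D_3$ | 6 |
| $a_4$ | 0011 | $a_5$ | 0100 | 1 | $y_6 y_7$ | $D_2$ | 7 |
| $a_5$ | 0100 | $a_6$ | 0101 | $x_3 x_4$ | $y_6 y_7$ | $D_2 D_4$ | 8 |
| | | $a_6$ | 0101 | $x_3 \bar x_4$ | $y_3$ | $D_2 D_4$ | 9 |
| | | $a_7$ | 0110 | $\bar x_3 x_5$ | $y_2 y_4$ | $D_2 D_3$ | 10 |
| | | $a_7$ | 0110 | $\bar x_3 \bar x_5$ | $y_7$ | $D_2 D_3$ | 11 |
| $a_6$ | 0101 | $a_8$ | 0111 | $x_1$ | $y_2 y_5$ | $D_2 D_3 D_4$ | 12 |
| | | $a_8$ | 0111 | $\bar x_1 x_3$ | $y_2 y_8$ | $D_2 D_3 D_4$ | 13 |
| | | $a_9$ | 1000 | $\bar x_1 \bar x_3$ | $y_3$ | $D_1$ | 14 |
| $a_7$ | 0110 | $a_9$ | 1000 | 1 | $y_3$ | $D_1$ | 15 |
| $a_8$ | 0111 | $a_{10}$ | 1001 | $x_5$ | $y_2 y_4$ | $D_1 D_4$ | 16 |
| | | $a_1$ | 0000 | $\bar x_5$ | $y_2 y_8$ | – | 17 |
| $a_9$ | 1000 | $a_1$ | 0000 | 1 | – | – | 18 |
| $a_{10}$ | 1001 | $a_4$ | 0011 | $x_6$ | $y_1 y_2$ | $D_3 D_4$ | 19 |
| | | $a_1$ | 0000 | $\bar x_6$ | – | – | 20 |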
Fig. 3.7 Codes K (Yq ) for FSM PY (Γ5 )
Two evaluations are used in the algorithm. The first of them is the difference between the number of shared logical conditions and the number of different logical conditions for a class $A^k \subseteq A$ and a state $a_m \in A$:

$$N(a_m, X^k) = |X(a_m) \cap X^k| - |X(a_m) \setminus X^k|. \tag{3.39}$$

The second evaluation is equal to the number of variables $z_r \in Z$ common to $Z(a_m)$ and $Z^k$:

$$N(a_m, Z^k) = |Z(a_m) \cap Z^k|. \tag{3.40}$$

As in the case of PT FSM, the basic element $BE_k$ is chosen using (3.24). Each step of adding a state into $A^k$ is connected with two evaluations. Let the set $P(A^k)$ include all states whose inclusion into $A^k$ does not violate condition (3.5). These states are analysed to find the one satisfying the following condition:

$$N(a_m, X^k) = \max_j N(a_j, X^k), \quad a_j \in P(A^k) \setminus \{a_m\}. \tag{3.41}$$

If there are two or more states with maximum $N(a_m, X^k)$, then we choose a state having the following property:

$$N(a_m, Z^k) = \max_j N(a_j, Z^k), \quad a_j \in P(A^k) \setminus \{a_m\}. \tag{3.42}$$

Table 3.11 Transformed ST of Mealy FSM $PY(\Gamma_5)$

| $a_m$ | $K(a_m)$ | $a_s$ | $K(a_s)$ | $X_h$ | $Z_h$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 0000 | $a_2$ | 0001 | 1 | $z_4$ | $D_4$ | 1 |
| $a_2$ | 0001 | $a_3$ | 0010 | $x_1$ | $z_3$ | $D_3$ | 2 |
| | | $a_3$ | 0010 | $\bar x_1$ | $z_3 z_4$ | $D_3$ | 3 |
| $a_3$ | 0010 | $a_4$ | 0011 | $x_2$ | $z_4$ | $D_3 D_4$ | 4 |
| | | $a_4$ | 0011 | $x_1 \bar x_2$ | $z_2$ | $D_3 D_4$ | 5 |
| | | $a_3$ | 0010 | $\bar x_1 \bar x_2$ | $z_3 z_4$ | $D_3$ | 6 |
| $a_4$ | 0011 | $a_5$ | 0100 | 1 | $z_2 z_4$ | $D_2$ | 7 |
| $a_5$ | 0100 | $a_6$ | 0101 | $x_3 x_4$ | $z_2 z_4$ | $D_2 D_4$ | 8 |
| | | $a_6$ | 0101 | $x_3 \bar x_4$ | $z_3$ | $D_2 D_4$ | 9 |
| | | $a_7$ | 0110 | $\bar x_3 x_5$ | $z_3 z_4$ | $D_2 D_3$ | 10 |
| | | $a_7$ | 0110 | $\bar x_3 \bar x_5$ | $z_2 z_3 z_4$ | $D_2 D_3$ | 11 |
| $a_6$ | 0101 | $a_8$ | 0111 | $x_1$ | $z_1$ | $D_2 D_3 D_4$ | 12 |
| | | $a_8$ | 0111 | $\bar x_1 x_3$ | $z_1 z_4$ | $D_2 D_3 D_4$ | 13 |
| | | $a_9$ | 1000 | $\bar x_1 \bar x_3$ | $z_3$ | $D_1$ | 14 |
| $a_7$ | 0110 | $a_9$ | 1000 | 1 | $z_3$ | $D_1$ | 15 |
| $a_8$ | 0111 | $a_{10}$ | 1001 | $x_5$ | $z_3 z_4$ | $D_1 D_4$ | 16 |
| | | $a_1$ | 0000 | $\bar x_5$ | $z_1 z_4$ | – | 17 |
| $a_9$ | 1000 | $a_1$ | 0000 | 1 | – | – | 18 |
| $a_{10}$ | 1001 | $a_4$ | 0011 | $x_6$ | $z_4$ | $D_3 D_4$ | 19 |
| | | $a_1$ | 0000 | $\bar x_6$ | – | – | 20 |
If the values of $N(a_m, Z^k)$ are equal for several states $a_m \in P(A^k)$, then we select the state with the minimum value of subscript $m$.

Let us find the partition $\Pi_B$ for Mealy FSM $PYT(\Gamma_5)$. Let $S = 5$. To do it, we should use Table 3.11. The process of creating $\Pi_B$ is represented by Table 3.12. The columns of Table 3.12 have the same meaning as for Table 3.2. But now the symbol “I” stands for (3.39), whereas “II” stands for (3.40).

Table 3.12 Process of generating $\Pi_B$ for FSM $PY(\Gamma_5)$

| $a_m$ | $\lvert X(a_m) \rvert$ | $BE_1$ | I/II (step 1) | I/II (step 2) | $BE_2$ | I/II (step 1) | I/II (step 2) | $BE_3$ | I/II (step 1) | I/II (step 2) | I/II (step 3) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $a_1$ | 0 | | 0/1 | 0/1 | | 0/1 | 0/1 | | 0/0 ⊕ | – | – |
| $a_2$ | 1 | | –1/2 | –1/2 | | 1/2 ⊕ | – | | – | – | – |
| $a_3$ | 2 | | –2/3 | –2/3 | ⊕ | – | – | | – | – | – |
| $a_4$ | 0 | | 0/2 | 0/2 ⊕ | | – | – | | – | – | – |
| $a_5$ | 3 | ⊕ | – | – | | – | – | | – | – | – |
| $a_6$ | 2 | | 0/2 | 0/2 | | 1/2 | 1/2 ⊕ | | – | – | – |
| $a_7$ | 0 | | 0/1 | 0/1 | | 0/1 | 0/1 | | 0/0 | 0/0 ⊕ | – |
| $a_8$ | 1 | | 1/2 ⊕ | – | | – | – | | – | – | – |
| $a_9$ | 0 | | 0/0 | 0/0 | | 0/0 | 0/0 | | 0/0 | 0/0 | 0/0 ⊕ |
| $a_{10}$ | 1 | | –1/0 | –1/0 | | –1/0 | –1/0 | ⊕ | – | – | – |
| $A^k$ | | $a_5$ | $a_8$ | $a_4$ | $a_3$ | $a_2$ | $a_6$ | $a_{10}$ | $a_1$ | $a_7$ | $a_9$ |

As follows from Table 3.12, there is the following partition: $\Pi_B = \{A^1, A^2, A^3\}$, where $A^1 = \{a_4, a_5, a_8\}$, $A^2 = \{a_2, a_3, a_6\}$ and $A^3 = \{a_1, a_7, a_9, a_{10}\}$. It is possible to find the following sets: $X^1 = \{x_3, x_4, x_5\}$, $X^2 = \{x_1, x_2, x_3\}$, $X^3 = \{x_6\}$, $Z^1 = \{z_1, \ldots, z_4\}$, $Z^2 = \{z_1, \ldots, z_4\}$, $Z^3 = \{z_3, z_4\}$. Using this information, it is possible to depict a block diagram of Mealy FSM $PYT(\Gamma_5)$ (Fig. 3.8).

Fig. 3.8 Block diagram of Mealy FSM $PYT(\Gamma_5)$ (diagram labels: LUTer1 with inputs $x_3, x_4, x_5$; LUTer2 with inputs $x_1, x_2, x_3$; LUTer3 with input $x_6$; LUTerZT; LUTerY; LUTerT; signals Z, T, Y, Clock, Start)

Using (3.6), we can find $R_1 = R_2 = 2$ and $R_3 = 3$. It gives $R_A = 7$, $\mathcal{T}^1 = \{\tau_1, \tau_2\}$, $\mathcal{T}^2 = \{\tau_3, \tau_4\}$, $\mathcal{T}^3 = \{\tau_5, \tau_6, \tau_7\}$ and $\mathcal{T} = \{\tau_1, \ldots, \tau_7\}$. There are 8 LUTs in LUTer1, 8 LUTs in LUTer2 and 5 LUTs in LUTer3. Because $S = 5 > K = 3$, each function of LUTerZT fits into a single LUT, so there are at most $R_0 + R_Q = 8$ LUTs in LUTerZT. Because $R_Q < 5$, there are $N = 8$ LUTs in LUTerY. Because $R_0 < 5$, there are $R_A = 7$ LUTs in LUTerT. So, there are not more than 44 LUTs in the circuit of FSM $PYT(\Gamma_5)$. As follows from Fig. 3.8, only $x_3$ is shared between LUTer1 and LUTer2. So, our approach produces circuits with more regular connections than in the case of P Mealy FSMs.

There are the following steps in the design of Mealy FSM PYT:

1. Constructing the marked GSA $\Gamma$ and finding the set of states $A$.
2. Executing the state assignment.
3. Encoding of collections of microoperations.
4. Constructing the structure table of P Mealy FSM.
5. Constructing the transformed structure table.
6. Constructing the partition $\Pi_B$.
7. Constructing tables $ST_k$ for classes $A^k \in \Pi_B$.
8. Finding systems (3.12) and (3.36) for each class $A^k$.
9. Finding systems (3.14) and (3.37) for LUTerZT.
10. Constructing tables for LUTerY and LUTerT.
11. Implementing the FSM circuit with particular FPGA chips.
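The signed evaluation (3.39) and the evaluation (3.40) used at step 6 can be sketched in Python for the first selection step of the class $A^1$. The sets $X(a_m)$ and $Z(a_m)$ below are read off the transformed structure table of $PY(\Gamma_5)$; the function names are hypothetical and the check of restriction (3.5) is omitted, so this is an illustration of a single step rather than the full procedure.

```python
# Sets X(a_m) and Z(a_m), read off the transformed structure table.
X = {
    "a1": set(), "a2": {"x1"}, "a3": {"x1", "x2"}, "a4": set(),
    "a5": {"x3", "x4", "x5"}, "a6": {"x1", "x3"}, "a7": set(),
    "a8": {"x5"}, "a9": set(), "a10": {"x6"},
}
Z = {
    "a1": {"z4"}, "a2": {"z3", "z4"}, "a3": {"z2", "z3", "z4"},
    "a4": {"z2", "z4"}, "a5": {"z2", "z3", "z4"}, "a6": {"z1", "z3", "z4"},
    "a7": {"z3"}, "a8": {"z1", "z3", "z4"}, "a9": set(), "a10": {"z4"},
}


def evaluations(state, block):
    """Return (I, II): I is (3.39), II is (3.40)."""
    xk = set().union(*(X[s] for s in block))
    zk = set().union(*(Z[s] for s in block))
    shared = len(X[state] & xk)
    different = len(X[state] - xk)
    return shared - different, len(Z[state] & zk)


def pick_next(block, candidates):
    """Rule (3.41): maximum I; rule (3.42): maximum II; then minimum subscript."""
    def key(state):
        i, ii = evaluations(state, block)
        return (-i, -ii, int(state[1:]))
    return min(candidates, key=key)


block = ["a5"]  # basic element BE1 (the state with the largest |X(a_m)|)
free = [s for s in X if s not in block]
chosen = pick_next(block, free)
print(evaluations("a8", block), chosen)
```

Unlike (3.22), the evaluation (3.39) can be negative: a state such as $a_3$, which shares no logical condition with the block and would add two new ones, gets the value –2.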
Let us design the Mealy FSM $PYT(\Gamma_5)$. We have already executed the first 6 steps for this example. To execute step 7, let us encode the states $a_m \in A^k$. The state codes are shown in Table 3.13. Let us construct only the table $ST_1$ (Table 3.14). It is constructed using Table 3.11 and the state codes from Table 3.13. Using Table 3.14, we can find equations (3.12) and (3.36) for the class $A^1 \in \Pi_B$. They are represented by the system (3.43).

$$\begin{aligned}
D_1^1 &= \tau_1 \tau_2 x_5; & D_2^1 &= \bar\tau_1 \tau_2 \vee \tau_1 \bar\tau_2;\\
D_3^1 &= \tau_1 \bar\tau_2 \bar x_3; & D_4^1 &= \tau_1 \bar\tau_2 x_3 \vee \tau_1 \tau_2 x_5;\\
z_1^1 &= \tau_1 \tau_2 \bar x_5; & z_2^1 &= \bar\tau_1 \tau_2 \vee \tau_1 \bar\tau_2 x_3 x_4 \vee \tau_1 \bar\tau_2 \bar x_3 \bar x_5;\\
z_3^1 &= \tau_1 \bar\tau_2 \bar x_4 \vee \tau_1 \bar\tau_2 \bar x_3 \vee \tau_1 \tau_2 x_5; & z_4^1 &= \bar\tau_1 \tau_2 \vee \tau_1 \bar\tau_2 x_4 \vee \tau_1 \bar\tau_2 \bar x_3 \vee \tau_1 \tau_2;
\end{aligned} \tag{3.43}$$

The system (3.43) is used to implement the circuit of LUTer1. The tables $ST_2$–$ST_3$ and the corresponding Boolean functions could be found in the same way.
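Table 3.13 State codes $C(a_m)$ for Mealy FSM $PYT(\Gamma_5)$

| $a_m \in A^1$ | $C(a_m)$: $\tau_1 \tau_2$ | $a_m \in A^2$ | $C(a_m)$: $\tau_3 \tau_4$ | $a_m \in A^3$ | $C(a_m)$: $\tau_5 \tau_6 \tau_7$ |
|---|---|---|---|---|---|
| $a_4$ | 01 | $a_2$ | 01 | $a_1$ | 001 |
| $a_5$ | 10 | $a_3$ | 10 | $a_7$ | 010 |
| $a_8$ | 11 | $a_6$ | 11 | $a_9$ | 011 |
| – | – | – | – | $a_{10}$ | 100 |

Table 3.14 Structure table $ST_1$ for Mealy FSM $PYT(\Gamma_5)$

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $X_h$ | $Z_h^1$ | $\Phi_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_4$ | 01 | $a_5$ | 0100 | 1 | $z_2^1 z_4^1$ | $D_2^1$ | 1 |
| $a_5$ | 10 | $a_6$ | 0101 | $x_3 x_4$ | $z_2^1 z_4^1$ | $D_2^1 D_4^1$ | 2 |
| | | $a_6$ | 0101 | $x_3 \bar x_4$ | $z_3^1$ | $D_2^1 D_4^1$ | 3 |
| | | $a_7$ | 0110 | $\bar x_3 x_5$ | $z_3^1 z_4^1$ | $D_2^1 D_3^1$ | 4 |
| | | $a_7$ | 0110 | $\bar x_3 \bar x_5$ | $z_2^1 z_3^1 z_4^1$ | $D_2^1 D_3^1$ | 5 |
| $a_8$ | 11 | $a_{10}$ | 1001 | $x_5$ | $z_3^1 z_4^1$ | $D_1^1 D_4^1$ | 6 |
| | | $a_1$ | 0000 | $\bar x_5$ | $z_1^1 z_4^1$ | – | 7 |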
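The class codes of Table 3.13 can be expanded into the full vector of variables $\tau_1, \ldots, \tau_7$ of each state with a short Python sketch. It assumes that (3.6) has the form $R_k = \lceil \log_2(M_k + 1) \rceil$, where $M_k$ is the number of states in the class and the all-zero code is reserved for "state is not in this class"; this assumption reproduces $R_1 = R_2 = 2$ and $R_3 = 3$. The function names are hypothetical.

```python
from math import ceil, log2

# State codes C(a_m) per class, copied from Table 3.13.
class_codes = {
    1: {"a4": "01", "a5": "10", "a8": "11"},
    2: {"a2": "01", "a3": "10", "a6": "11"},
    3: {"a1": "001", "a7": "010", "a9": "011", "a10": "100"},
}


def code_width(num_states):
    # Assumed form of (3.6): the all-zero code marks states outside the class.
    return ceil(log2(num_states + 1))


def tau_vector(state):
    """Concatenate the class codes; classes not containing the state give zeros."""
    parts = []
    for k in sorted(class_codes):
        codes = class_codes[k]
        parts.append(codes.get(state, "0" * code_width(len(codes))))
    return "".join(parts)


widths = [code_width(len(class_codes[k])) for k in sorted(class_codes)]
print(widths, sum(widths))
for state in ("a4", "a6", "a9"):
    print(state, tau_vector(state))
```

Each printed string lists the values of $\tau_1 \ldots \tau_7$ for one state; for $a_4$ only $\tau_2$ is equal to 1, because $a_4 \in A^1$ and $C(a_4) = 01$.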
Using outputs of LUTer1–LUTer3, we can form Eqs. (3.14) and (3.37). It is done in the trivial way. For example, $D_2^3 = 0$, so $D_2 = D_2^1 \vee D_2^2$. Next, there is the relation $z_1^3 = 0$. It means that $z_1 = z_1^1 \vee z_1^2$. And so on.

Functions (2.4) depend on variables $z_r \in Z$. It determines the following columns in the table of LUTerY: $Y_q$, $K(Y_q)$, $Y$, $q$. There are $Q = 10$ rows in the table of LUTerY for FSM $PYT(\Gamma_5)$ (Table 3.15). This table could be viewed as $N$ truth tables for functions $y_n \in Y$. Let us point out that all functions $y_n \in Y$ are equal to zero for codes 1010–1111.

Table 3.15 Table of LUTerY for Mealy FSM $PYT(\Gamma_5)$

| $Y_q$ | $K(Y_q)$: $z_1 z_2 z_3 z_4$ | $Y$: $y_1 y_2 y_3 y_4 y_5 y_6 y_7 y_8$ | $q$ |
|---|---|---|---|
| $Y_1$ | 0000 | 00000000 | 1 |
| $Y_2$ | 0001 | 11000000 | 2 |
| $Y_3$ | 0010 | 00100000 | 3 |
| $Y_4$ | 0011 | 01010000 | 4 |
| $Y_5$ | 0100 | 00101000 | 5 |
| $Y_6$ | 0101 | 00000110 | 6 |
| $Y_7$ | 0110 | 10000010 | 7 |
| $Y_8$ | 0111 | 00000010 | 8 |
| $Y_9$ | 1000 | 01001000 | 9 |
| $Y_{10}$ | 1001 | 01000001 | 10 |

The table of LUTerT is constructed on the base of the table of state codes $C(a_m)$. It has the following columns: $a_m$, $K(a_m)$, $\mathcal{T}$, $m$. In the discussed case, there are $M_0 = 10$ rows in this table (Table 3.16). Let us explain how to fill this table. For example, $C(a_4) = 01$ (Table 3.13). So, $\tau_1 = 0$, $\tau_2 = 1$, $\tau_3 = \tau_4 = 0$ ($a_4 \notin A^2$) and $\tau_5 = \tau_6 = \tau_7 = 0$ ($a_4 \notin A^3$). All other rows are filled in the same manner. The table of LUTerT corresponds to $R_A$ tables of LUTs.

Table 3.16 Table of LUTerT for Mealy FSM $PYT(\Gamma_5)$

| $a_m$ | $K(a_m)$: $T_1 T_2 T_3 T_4$ | $\mathcal{T}$: $\tau_1 \tau_2 \tau_3 \tau_4 \tau_5 \tau_6 \tau_7$ | $m$ |
|---|---|---|---|
| $a_1$ | 0000 | 0000001 | 1 |
| $a_2$ | 0001 | 0001000 | 2 |
| $a_3$ | 0010 | 0010000 | 3 |
| $a_4$ | 0011 | 0100000 | 4 |
| $a_5$ | 0100 | 1000000 | 5 |
| $a_6$ | 0101 | 0011000 | 6 |
| $a_7$ | 0110 | 0000010 | 7 |
| $a_8$ | 0111 | 1100000 | 8 |
| $a_9$ | 1000 | 0000011 | 9 |
| $a_{10}$ | 1001 | 0000100 | 10 |

We show results of investigation for these methods in this chapter. In the investigation we use the benchmarks from the library [10].

Let us point out that it is possible to use EMBs for implementing some parts of the FSM circuits. It is true for both PT and PYT Mealy FSMs. System (3.15) is a regular one. So, it is possible to replace the block LUTerT by EMBerT. It leads to the structural diagram of PT Mealy FSM shown in Fig. 3.9.

Fig. 3.9 Structural diagram of PT Mealy FSM with EMBerT (diagram labels: LUTer1, ..., LUTerK with inputs $X^1, \ldots, X^K$ and outputs $Y^1, \ldots, Y^K$, $\Phi^1, \ldots, \Phi^K$; LUTerYT; EMBerT; signals T, Y, Clock, Start)

Let us point out that the pulses Start and Clock could be connected with EMBerT. It diminishes the requirements for fan-out in comparison with the circuit from Fig. 3.3. By the way, it is also possible to connect Start and Clock with LUTerT of the circuit from Fig. 3.3. Using EMB is the better way for implementing an FSM circuit [9]. In the case of PT FSMs, it makes sense if the following conditions are true:

$$R_0 > S; \tag{3.44}$$

$$2^{R_0} \cdot R_A \le V_0. \tag{3.45}$$
If (3.44) is true, then it is necessary to apply functional decomposition to functions (3.15). If (3.45) takes place, then a single EMB is enough to implement all functions $\tau_r \in \mathcal{T}$. Obviously, it is possible to replace the block LUTerT by EMBerT in PYT Mealy FSM as well. The system (2.4) is a regular one. So, the LUTerY could be replaced by EMBerY in PYT FSMs. This approach makes sense if the following conditions are true:

$$R_Q > S; \tag{3.46}$$

$$2^{R_Q} \cdot N \le V_0. \tag{3.47}$$

If (3.46) is true, then the system (2.4) should be decomposed. In this case, the LUTerY is implemented using at least $2N$ LUTs. If (3.47) is true, then a single EMB is enough to implement the system (2.4).
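The two pairs of conditions, (3.44)–(3.45) and (3.46)–(3.47), have the same shape and can be captured by one small Python helper. This is an illustrative sketch: the function name is hypothetical, $V_0$ is taken to be the capacity of one EMB in bits, and the numeric values in the second call are made up for demonstration only.

```python
def emb_is_worthwhile(num_inputs, num_outputs, S, V0):
    """Check conditions of the form (3.44)-(3.45) or (3.46)-(3.47).

    num_inputs  : R0 for EMBerT, RQ for EMBerY
    num_outputs : RA for EMBerT, N for EMBerY
    S           : number of LUT inputs
    V0          : assumed EMB capacity in bits
    """
    needs_decomposition = num_inputs > S               # (3.44) / (3.46)
    fits_single_emb = (2 ** num_inputs) * num_outputs <= V0   # (3.45) / (3.47)
    return needs_decomposition and fits_single_emb


# PYT(Gamma_5): RQ = 4 is not greater than S = 5, so LUTs suffice for LUTerY.
print(emb_is_worthwhile(num_inputs=4, num_outputs=8, S=5, V0=18432))

# Hypothetical larger FSM: 7 code bits, 10 outputs, 6-input LUTs.
print(emb_is_worthwhile(num_inputs=7, num_outputs=10, S=6, V0=18432))
```

The first condition says that a LUT-based implementation would require functional decomposition, the second that the whole table fits into one memory block; only when both hold does the replacement pay off.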
It is known that EMBs of modern FPGA chips have the dual-port structure [7, 8]. It means that it is possible to implement both systems (2.4) and (3.15) using a single EMB. It is possible if the following conditions hold together:

$$\left(2^{R_0} \cdot R_A \le \frac{V_0}{2}\right) \;\&\; \left(R_A \le \frac{t_F}{2}\right) = 1; \tag{3.48}$$

$$\left(2^{R_0} \cdot N \le \frac{V_0}{2}\right) \;\&\; \left(N \le \frac{t_F}{2}\right) = 1. \tag{3.49}$$

If conditions (3.48), (3.49) are true, then the structural diagram of PYT Mealy FSM takes the form shown in Fig. 3.10. We do not discuss these approaches in our book. Let us point out that it is possible to use the same tables for blocks LUTerY and EMBerY. The same is true for the tables for LUTerT and EMBerT. But there is no need for the transformation of these tables if EMBs are used. Of course, to apply EMBs, we should have “free” blocks that are not used for implementing other parts of a digital system.

Fig. 3.10 Structural diagram of PYT Mealy FSM with a single EMB (diagram labels: LUTer1, ..., LUTerK with inputs $X^1, \ldots, X^K$ and outputs $Z^1, \ldots, Z^K$, $\Phi^1, \ldots, \Phi^K$; LUTerZT; EMBerY; signals Z, T, Y, Clock, Start)
3.4 Investigation of Proposed Method

To investigate the efficiency of the proposed method, we use standard benchmarks from the LGSynth93 library [11]. It includes 48 benchmarks taken from the practice of FSM design. These benchmarks are presented in the KISS2 format. The characteristics of the benchmarks are shown in Table 3.17. To use these benchmarks, we used the CAD tool named K2F. It translates a KISS2 file into a VHDL model of an FSM. To synthesize and simulate the FSM, we use the Active-HDL environment. To get the FSM circuit, we use the Xilinx ISE package. The investigation path used in our system is shown in Fig. 3.11. The Xilinx ISE 14.1 package was used for synthesis and implementation of the FSM for a given control algorithm.
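Table 3.17 Characteristics of benchmarks

| Benchmark | $L$ | $N$ | $H_0$ | $M_0$ | $R_0$ |
|---|---|---|---|---|---|
| bbara | 4 | 2 | 60 | 10 | 4 |
| bbsse | 7 | 7 | 56 | 16 | 4 |
| bbtas | 2 | 2 | 24 | 6 | 3 |
| bbcount | 3 | 4 | 28 | 7 | 3 |
| cse | 7 | 7 | 91 | 16 | 4 |
| dk14 | 3 | 5 | 56 | 7 | 3 |
| dk15 | 3 | 5 | 32 | 4 | 2 |
| dk16 | 2 | 3 | 108 | 27 | 5 |
| dk17 | 2 | 3 | 32 | 8 | 3 |
| dk27 | 1 | 2 | 14 | 7 | 3 |
| dk512 | 1 | 3 | 30 | 15 | 4 |
| donfile | 2 | 1 | 96 | 24 | 5 |
| ex1 | 9 | 19 | 138 | 20 | 5 |
| ex2 | 2 | 2 | 72 | 19 | 5 |
| ex3 | 2 | 2 | 36 | 10 | 4 |
| ex4 | 6 | 9 | 21 | 14 | 4 |
| ex5 | 2 | 2 | 32 | 9 | 4 |
| ex6 | 5 | 8 | 34 | 8 | 3 |
| ex7 | 2 | 2 | 36 | 10 | 4 |
| keyb | 7 | 7 | 170 | 19 | 5 |
| kirkman | 12 | 6 | 370 | 16 | 4 |
| lion | 2 | 1 | 11 | 4 | 2 |
| lion9 | 2 | 1 | 25 | 9 | 4 |
| mark1 | 5 | 16 | 22 | 15 | 4 |
| mc | 3 | 5 | 10 | 4 | 2 |
| modulo12 | 1 | 1 | 24 | 12 | 4 |
| opus | 5 | 6 | 22 | 10 | 4 |
| planet | 7 | 19 | 115 | 48 | 6 |
| planet1 | 7 | 19 | 115 | 48 | 6 |
| pma | 8 | 8 | 73 | 24 | 5 |
| s1 | 8 | 6 | 107 | 20 | 5 |
| s1488 | 8 | 19 | 251 | 48 | 6 |
| s1494 | 8 | 19 | 250 | 48 | 6 |
| s1a | 8 | 6 | 107 | 20 | 5 |
| s208 | 11 | 2 | 153 | 18 | 5 |
| s27 | 4 | 1 | 34 | 6 | 3 |
| s298 | 3 | 6 | 1096 | 218 | 8 |
| s386 | 7 | 7 | 64 | 13 | 4 |
| s8 | 4 | 1 | 20 | 5 | 3 |
| sand | 11 | 9 | 184 | 32 | 5 |
| shiftreg | 1 | 1 | 16 | 8 | 3 |
| sse | 7 | 7 | 56 | 16 | 4 |
| styr | 9 | 10 | 166 | 30 | 5 |
| tav | 4 | 4 | 49 | 4 | 2 |
| tbk | 6 | 3 | 1569 | 32 | 5 |
| tma | 7 | 6 | 44 | 20 | 5 |
| train11 | 2 | 1 | 25 | 11 | 4 |
| train4 | 2 | 1 | 14 | 4 | 2 |

Fig. 3.11 Typical investigation path based on K2F tool (diagram labels: KISS2, K2F, transformation, VHDL structures generation, synthesis, FPGA)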
We compared our approach with four other methods: (1) Auto of ISE 14.1; (2) Compact of ISE 14.1; (3) JEDI; (4) DEMAIN. The results of investigations are shown in Table 3.18. For each method, we found two characteristics of benchmark FSMs. They are the number of LUTs in the FSM circuit (columns “LUTs”) and the maximum operating clock frequency of the FSM (columns “Freq.”) measured in MHz. The row “Total” contains the sums of both the numbers of LUTs and the frequencies. We have taken the summarized characteristics of PT as 100%. The row “Percentage” shows the summarized characteristics of each method as a percentage of those of the benchmarks synthesized as PT.

As follows from Table 3.18, the proposed method reduces the number of LUTs in FSM circuits in comparison with the other investigated methods. The savings are the following: (1) 22% in comparison with P Auto; (2) 28% in comparison with P Compact; (3) 8% in comparison with JEDI-based FSMs and (4) 12% in comparison with FSMs designed by DEMAIN.
Table 3.18 Results of investigations for PT Mealy FSMs

           P Auto      P Compact   JEDI        DEMAIN      PT
Benchmark  LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq.
bbara      11    639   13    635   10    690   9     702   11    680
bbsse      29    559   29    582   24    592   26    580   22    604
bbtas      5     962   5     966   5     980   5     978   5     980
bbcount    7     952   7     952   6     989   5     1022  6     992
cse        49    480   46    463   42    498   44    482   38    521
dk14       8     545   8     945   7     982   6     996   7     980
dk15       7     1062  7     1062  6     1090  7     1066  6     1068
dk16       16    556   15    625   14    582   16    578   12    616
dk17       6     952   6     952   6     952   5     964   6     958
dk27       5     900   5     897   5     900   4     912   6     900
dk512      17    730   7     899   13    789   14    776   11    817
donfile    15    558   14    612   11    596   13    574   9     629
ex1        64    586   74    447   51    620   53    608   46    649
ex2        14    940   16    985   11    1002  12    988   10    1056
ex3        12    980   13    986   12    982   11    998   12    980
ex4        15    962   16    626   12    1003  13    996   10    1079
ex5        14    986   15    986   13    998   12    1003  13    996
ex6        29    553   20    621   26    579   24    599   25    586
ex7        14    988   15    990   12    1002  11    1060  11    1062
keyb       56    384   65    358   48    410   50    398   44    432
kirkman    51    874   53    569   43    901   49    898   40    977
lion       3     1084  3     1080  3     1080  3     1080  3     1080
lion9      6     980   5     996   5     996   5     998   5     996
mark1      27    726   19    708   22    798   24    749   20    810
mc         5     1071  5     1071  5     1071  5     1071  5     1071
modulo12   26    612   28    632   19    710   22    678   21    685
opus       22    596   21    628   17    688   20    642   18    670
planet     100   888   138   389   88    989   92    921   80    1058
planet1    100   888   138   389   88    989   92    921   80    1058
pma        73    554   72    438   67    596   68    574   64    614
s1         77    550   75    447   70    598   76    582   65    621
s1488      140   425   141   432   131   470   136   452   121   484
s1494      124   412   143   442   112   492   118   478   108   508
s1a        77    550   75    447   70    598   76    586   64    620
s208       28    559   23    669   23    670   25    582   20    692
s386       26    577   28    581   24    598   22    621   22    620
s8         4     962   4     962   4     962   4     962   4     962
sand       99    569   121   426   89    612   91    598   81    637
Table 3.18 (continued)

           P Auto      P Compact   JEDI        DEMAIN      PT
Benchmark  LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq.
shiftreg   3     1584  3     1584  3     1584  3     1584  3     1584
sse        29    559   28    543   24    610   26    588   20    624
styr       118   430   127   369   109   476   111   462   98    490
tav        6     1556  6     911   6     1560  6     1560  6     1560
tbk        55    406   71    465   48    492   49    484   41    517
tma        30    440   32    438   26    476   28    461   22    498
train11    28    560   26    580   25    598   27    572   26    582
train4     8     416   10    466   8     416   7     470   7     470
s27        4     962   4     962   4     962   4     962   4     962
s298       362   406   330   313   320   438   334   429   309   458
Total      2028  35870 2125  33526 1797  37686 1863  37245 1657  38493
Percentage 122%  93%   128%  87%   108%  98%   112%  97%   100%  100%
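The row “Percentage” of Table 3.18 can be reproduced directly from the row “Total”. The following Python sketch uses only the totals printed in the table; the function and variable names are illustrative and do not come from the original text.

```python
# Totals from Table 3.18: method -> (total LUTs, total frequency in MHz).
totals = {
    "P Auto":    (2028, 35870),
    "P Compact": (2125, 33526),
    "JEDI":      (1797, 37686),
    "DEMAIN":    (1863, 37245),
    "PT":        (1657, 38493),
}


def relative_to(table, base="PT"):
    """Express every total as a rounded percentage of the base method."""
    base_luts, base_freq = table[base]
    return {
        name: (round(100 * luts / base_luts), round(100 * freq / base_freq))
        for name, (luts, freq) in table.items()
    }


percentages = relative_to(totals)
for name, (luts_pct, freq_pct) in percentages.items():
    print(f"{name:10s} LUTs {luts_pct}%  Freq. {freq_pct}%")
```

With PT taken as the base, the printed values agree with the row “Percentage” of the table.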
The following conclusion can be made. FSM circuits designed by ISE 14.1 contain more LUTs than their counterparts designed using JEDI, DEMAIN or K2F. If M0 < 15, then the best results are obtained using JEDI. Our approach gives better results for rather complex FSMs having more than 15 states. Sometimes DEMAIN gives better results than JEDI (for rather simple FSMs). So, it makes sense to use the twofold state assignment for rather complex Mealy FSMs having more than 15 states. It yields circuits with fewer LUTs and a higher operating frequency than the other methods.

We also investigated the methods based on combining the twofold state assignment and encoding of collections of microoperations [5]. The same benchmarks were used for this research. The results are shown in Table 3.19. As follows from Table 3.19, the proposed method reduces the number of LUTs in FSM circuits in comparison with the other investigated methods. Relative to PTY, the other methods require the following additional LUTs: (1) 23% for P Auto; (2) 29% for P Compact; (3) 9% for JEDI-based FSMs; and (4) 13% for FSMs designed by DEMAIN.

The same conclusion can be made as before. FSM circuits designed by ISE 14.1 contain more LUTs than their counterparts designed using JEDI, DEMAIN or K2F. If M0 < 15, then the best results are obtained using JEDI. Our approach gives better results for rather complex FSMs having more than 15 states. Sometimes DEMAIN gives better results than JEDI (for rather simple FSMs).

To support this conclusion, we present Table 3.20. It contains the results for the 10 most complex benchmarks of [11]. As follows from Table 3.20, our approach requires 19% fewer LUTs in comparison with JEDI and 25% fewer in comparison with DEMAIN. So, the gain for the complex benchmarks of [11] is practically twice the average gain for all benchmarks.
Table 3.19 Results of investigations for PTY Mealy FSMs

           P Auto      P Compact   JEDI        DEMAIN      PTY
Benchmark  LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq.
bbara      11    639   13    635   10    690   9     702   12    650
bbsse      29    559   29    582   24    592   26    580   21    620
bbtas      5     962   5     966   5     980   5     978   8     860
bbcount    7     952   7     952   6     989   5     1022  9     920
cse        49    480   46    463   42    498   44    482   40    480
dk14       8     545   8     945   7     982   6     996   12    860
dk15       7     1062  7     1062  6     1090  7     1066  11    920
dk16       16    556   15    625   14    582   16    578   12    594
dk17       6     952   6     952   6     952   5     964   8     920
dk27       5     900   5     897   5     900   4     912   9     880
dk512      17    730   7     899   13    789   14    776   12    760
donfile    15    558   14    612   11    596   13    574   10    580
ex1        64    586   74    447   51    620   53    608   46    620
ex2        14    940   16    985   11    1002  12    988   12    980
ex3        12    980   13    986   12    982   11    998   14    960
ex4        15    962   16    626   12    1003  13    996   15    920
ex5        14    986   15    986   13    998   12    1003  16    890
ex6        29    553   20    621   26    579   24    599   28    580
ex7        14    988   15    990   12    1002  11    1060  14    992
keyb       56    384   65    358   48    410   50    398   42    420
kirkman    51    874   53    569   43    901   49    898   41    890
lion       3     1084  3     1080  3     1080  3     1080  6     920
lion9      6     980   5     996   5     996   5     998   7     910
mark1      27    726   19    708   22    798   24    749   26    744
mc         5     1071  5     1071  5     1071  5     1071  7     980
modulo12   26    612   28    632   19    710   22    678   24    640
opus       22    596   21    628   17    688   20    642   22    622
planet     100   888   138   389   88    989   92    921   64    940
planet1    100   888   138   389   88    989   92    921   64    940
pma        73    554   72    438   67    596   68    574   52    580
s1         77    550   75    447   70    598   76    582   61    570
s1488      140   425   141   432   131   470   136   452   101   460
s1494      124   412   143   442   112   492   118   478   93    472
s1a        77    550   75    447   70    598   76    586   64    580
s208       28    559   23    669   23    670   25    582   20    590
s386       26    577   28    581   24    598   22    621   24    590
s8         4     962   4     962   4     962   4     962   8     920
sand       99    569   121   426   89    612   91    598   81    602
Table 3.19 (continued)

           P Auto      P Compact   JEDI        DEMAIN      PTY
Benchmark  LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq. LUTs  Freq.
shiftreg   3     1584  3     1584  3     1584  3     1584  6     1220
sse        29    559   28    543   24    610   26    588   21    562
styr       118   430   127   369   109   476   111   462   92    440
tav        6     1556  6     911   6     1560  6     1560  8     1402
tbk        55    406   71    465   48    492   49    484   36    441
tma        30    440   32    438   26    476   28    461   21    458
train11    28    560   26    580   25    598   27    572   28    568
train4     8     416   10    466   8     416   7     470   9     401
s27        4     962   4     962   4     962   4     962   6     890
s298       362   406   330   313   320   438   334   429   286   410
Total      2028  35870 2125  33526 1797  37686 1863  37245 1650  35151
Percentage 123%  102%  129%  95%   109%  107%  113%  105%  100%  100%

Table 3.20 Results of investigations for the most complex benchmarks

                 JEDI        DEMAIN        PTY
Benchmark  M0    LUTs  Freq. LUTs  Freq.   LUTs  Freq.
s298       218   320   438   339   429     286   410
planet     48    88    989   92    921     64    940
planet1    48    88    989   92    921     64    940
s1488      48    131   470   136   452     101   460
s1494      48    112   492   118   478     93    472
sand       32    89    612   91    598     81    602
tbk        32    48    492   49    484     36    441
styr       30    109   976   111   462     92    440
dk16       27    14    582   16    578     12    594
donfile    24    11    596   13    574     10    580
Total            1010  6116  1062  5898    849   5879
Percentage       111%  104%  125%  100.3%  100%  100%
As follows from Table 3.19, our approach produces FSMs which are a bit slower than FSMs produced by P Auto (2%), JEDI (7%) and DEMAIN (5%). But this drawback is diminished for the complex benchmarks (Table 3.20). For them, our approach provides an operating frequency only 4% lower than JEDI and practically the same as DEMAIN. Let us point out that these conclusions are valid only for the benchmarks [11] and the device XC5VLX30FF324 used for implementing the FSM circuits. In the case of FPGA-based design, it is almost impossible to make precise predictions for the general case. However, it is evident from our investigations that our approach could give better results for Mealy FSMs with M0 > 15.
References
1. Baranov S (1994) Logic synthesis of control automata. Kluwer Academic Publishers, Boston
2. Baranov S (2008) Logic and system design of digital systems. TUT Press, Tallinn
3. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on CPLD. In: Proceedings of the sixth international conference CAD DD’07, vol 2. Minsk, pp 39–45
4. Barkalov A, Węgrzyn M, Wiśniewski R (2006) Partial reconfiguration of compositional microprogram control units implemented on FPGAs. In: Proceedings of IFAC workshop on programmable devices and embedded systems (Brno), pp 116–119
5. Barkalov O, Titarenko L, Mielcarek K (2018) Hardware reduction for LUT-based Mealy FSMs. Int J Appl Math Comput Sci 28(3):595–607
6. Maxfield C (2004) The design warrior’s guide to FPGAs. Academic Press Inc, Orlando
7. Maxfield C (2008) FPGAs: Instant access. Newnes
8. Navabi Z (2007) Embedded core design with FPGAs. McGraw-Hill, New York
9. Rudell R, Sangiovanni-Vincentelli A (1987) Multiple-valued minimization for PLA optimization. IEEE Trans Comput-Aided Des 6(5):727–750
10. Xilinx. XST user guide. V. 11.3. http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/xst.pdf. Accessed Jan 2019
11. Yang S (1991) Logic synthesis and optimization benchmarks user guide. Technical report, Microelectronic Center of North Carolina
Chapter 4
Twofold State Assignment for Moore FSMs
Abstract The chapter starts with the main idea of using the twofold state assignment for optimizing the circuits of FPGA-based Moore FSMs. This method can be combined with the refined state assignment, which minimizes the block of microoperations. It is shown that classes of pseudoequivalent states (PES) can be used together with the twofold state assignment. A formal method is proposed for partitioning the classes of PES so that the final FSM circuit contains the minimal number of LUT-based blocks. Next, it is shown how to combine the twofold state assignment with encoding of the collections of microoperations. The last part of the chapter is devoted to combining the twofold state assignment with encoding of the fields of compatible microoperations. Structural diagrams of FSM circuits are proposed. Examples of synthesis are given for the majority of the proposed FSM models.
4.1 Analysis of Possible Solutions

Two features distinguish Moore FSMs from Mealy FSMs: (1) the existence of pseudoequivalent states, which form classes Bi ∈ ΠA; (2) the dependence of microoperations only on states, which permits the refined state encoding (if R > S). So, two approaches are possible for dividing an initial DST into subtables STk (k = 1, ..., K): a partition can be found for either the set A or the set ΠA. In both cases, the structural diagram of PT Moore FSM is the same (Fig. 4.1). In PT Moore FSM, each block LUTerk implements only the functions $D_r^k \in \Phi^k$, where $\Phi^k \subseteq \Phi$. These functions are represented by system (3.12). So, it is necessary to implement only system (3.14). It is implemented by LUTerT. Because both systems T and Y depend only on state variables Tr ∈ T, both systems are generated by a single block LUTerYT. Systems (1.3) and (3.38) are regular. So, it is possible to replace the LUTerYT by EMBerYT (Fig. 4.2). If only LUTs are used in the FSM circuit, then it makes sense to use the refined state encoding [18, 19]. If the following relation takes place
© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_4
Fig. 4.1 Structural diagram of PT Moore FSM (blocks LUTer1, ..., LUTerK, LUTerT and LUTerY; signals X^1, ..., X^K, Φ^1, ..., Φ^K, Clock, Start, T and Y)

Fig. 4.2 Structural diagram of PT Moore FSM with EMBs (blocks LUTer1, ..., LUTerK, LUTerT and EMBerY; signals X^1, ..., X^K, Φ^1, ..., Φ^K, Clock, Start, T and Y)
$$R > S, \tag{4.1}$$
then it is necessary to use the functional decomposition for functions yn ∈ Y. Let us discuss the following example. Let the system (1.3) be represented as the system (4.2):
$$
\begin{aligned}
y_1 &= A_2 \lor A_3; & y_2 &= A_3 \lor A_4 \lor A_5;\\
y_3 &= A_3 \lor A_6; & y_4 &= A_2 \lor A_7;\\
y_5 &= A_3 \lor A_8 \lor A_9; & y_6 &= A_4 \lor A_7 \lor A_{10};\\
y_7 &= A_9 \lor A_{10} \lor A_{11}; & y_8 &= A_2 \lor A_3 \lor A_4 \lor A_7.
\end{aligned}
\tag{4.2}
$$
Let M = 11, so R = 4 and T = {T1 , ..., T4 }. If S = 3, then each equation of (4.2) requires 3 LUTs to be implemented. It gives 24 LUTs and 72 interconnections. Let us encode states am ∈ A as it is shown in Fig. 4.3. To do it, we use the algorithm from [20]. After minimizing, we can transform (4.2) into the following system:
Fig. 4.3 Outcome of refined state encoding for (4.2)

Fig. 4.4 Structural diagram of PYT Moore FSM (blocks LUTer1, ..., LUTerK, LUTerT, LUTerZ and LUTerY; signals X^1, ..., X^K, Φ^1, ..., Φ^K, Clock, Start, T, Z and Y)
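$$
\begin{aligned}
y_1 &= \bar{T}_1 T_2 T_3; & y_2 &= \bar{T}_3 T_4; & y_3 &= T_1 \bar{T}_2; & y_4 &= T_2 \bar{T}_3 \bar{T}_4;\\
y_5 &= \bar{T}_1 T_4; & y_6 &= T_1 \bar{T}_2; & y_7 &= T_2 T_3; & y_8 &= \bar{T}_2 \bar{T}_3.
\end{aligned}
\tag{4.3}
$$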
N = 8 LUTs are enough to implement (4.3). This circuit has 18 interconnections. There are two levels of logic for implementing (4.2), but a single level of logic is enough for implementing (4.3).

Let us encode each CMO Yq ⊆ Y by a binary code K(Yq) using RQ (2.27) elements of the set Z. The variables zr ∈ Z are functions of state variables Tr ∈ T. It is a system (2.46). This approach leads to PYT Moore FSM (Fig. 4.4). It makes sense if the condition (4.1) takes place together with the following condition:
$$R_Q \le S. \tag{4.4}$$
Let us point out that it is possible to encode CMOs in such an order that the number of interconnections is minimal for LUTerY. Let the following system represent the system Y = Y(Z):
$$
\begin{aligned}
y_1 &= Y_2 \lor Y_3 \lor Y_8 \lor Y_9; & y_2 &= Y_3 \lor Y_6; & y_3 &= Y_5 \lor Y_7 \lor Y_8 \lor Y_9;\\
y_4 &= Y_4 \lor Y_5; & y_5 &= Y_2 \lor Y_3; & y_6 &= Y_5 \lor Y_8;\\
y_7 &= Y_4.
\end{aligned}
\tag{4.5}
$$
Fig. 4.5 Refined encoding of CMOs for system (4.5)
If S = 3, then 3 LUTs are necessary to implement each function of (4.5). The resulting circuit includes 21 LUTs, 2 levels of logic and 63 interconnections. Let us use the algorithm [20] and encode the CMOs Yq ⊆ Y as shown in Fig. 4.5. Let us call this style a refined encoding of CMOs. After minimizing, the following system is created:
$$
\begin{aligned}
y_1 &= z_2; & y_2 &= \bar{z}_3 z_4; & y_3 &= z_3; & y_4 &= z_1;\\
y_5 &= z_2 \bar{z}_3; & y_6 &= z_3 \bar{z}_4; & y_7 &= z_1 \bar{z}_3.
\end{aligned}
\tag{4.6}
$$
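The LUT and interconnection counts quoted for systems (4.3) and (4.6) can be checked by counting literals. The sketch below is only an illustration under stated assumptions: every function is a single product term, a function equal to one uncomplemented variable is treated as a plain wire that needs no LUT, and every other function fits into one LUT. The string encoding of literals is a hypothetical convention introduced here.

```python
# Each function is one product term, written as a list of literals;
# a leading "!" marks a complemented variable (assumed notation).
SYSTEM_4_3 = {
    "y1": ["!T1", "T2", "T3"], "y2": ["!T3", "T4"],
    "y3": ["T1", "!T2"],       "y4": ["T2", "!T3", "!T4"],
    "y5": ["!T1", "T4"],       "y6": ["T1", "!T2"],
    "y7": ["T2", "T3"],        "y8": ["!T2", "!T3"],
}
SYSTEM_4_6 = {
    "y1": ["z2"],        "y2": ["!z3", "z4"],
    "y3": ["z3"],        "y4": ["z1"],
    "y5": ["z2", "!z3"], "y6": ["z3", "!z4"],
    "y7": ["z1", "!z3"],
}


def lut_cost(system):
    """Return (number of LUTs, number of LUT inputs) for a system."""
    luts = inputs = 0
    for literals in system.values():
        is_wire = len(literals) == 1 and not literals[0].startswith("!")
        if not is_wire:          # a real LUT is needed
            luts += 1
            inputs += len(literals)
    return luts, inputs


print(lut_cost(SYSTEM_4_3))
print(lut_cost(SYSTEM_4_6))
```

Under these assumptions, the sketch yields 8 LUTs with 18 interconnections for (4.3) and 4 LUTs with 8 interconnections for (4.6), which agrees with the values given in the text.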
To implement the circuit of LUTerY for (4.6), 4 LUTs are necessary. The circuit has a single level of logic and 8 interconnections. So, there is an obvious gain thanks to the refined encoding of CMOs. There are two ways of implementing PYT FSMs. The first is connected with partitioning the set A. The second is reduced to partitioning the set ΠA.

There are different methods of object transformation used in FSM design [7, 8, 11–17, 21, 22]. They can be used for optimizing PT Moore FSMs. Let us discuss one of the possible approaches. As a rule, the following relation takes place for Moore FSMs:
$$Q \le M. \tag{4.7}$$
It means that each state am ∈ A can be represented as a pair ⟨Yq, Im⟩, where Yq = Y(am) and Im is an identifier [8]. Let there be MI identifiers. So, RI variables vr ∈ V are necessary to encode the identifiers:
$$R_I = \lceil \log_2 M_I \rceil. \tag{4.8}$$
Each code K(am) is represented as
$$K(a_m) = K(Y_q) * K(I_m). \tag{4.9}$$
In (4.9), K(Im) is the code of the identifier Im, and “*” is the sign of concatenation. If CMOs are transformed into states, then an FSM is denoted as PY Y FSM [8]. The structural diagram of PY YT Moore FSM is shown in Fig. 4.6.
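Relations (4.8) and (4.9) can be illustrated by a short Python sketch. It assumes that the CMOs are encoded with ⌈log2 Q⌉ bits, that MI equals the largest number of states sharing one CMO, and that codes and identifiers are assigned in order of appearance. The five-state example and all names are hypothetical.

```python
def bits_for(n):
    """Return ceil(log2(n)) for n >= 1, i.e. the bits needed for n codes."""
    return (n - 1).bit_length()


def to_bits(value, width):
    """Binary string of the given width (empty string when width is 0)."""
    return format(value, "b").zfill(width) if width else ""


def state_codes(cmo_of_state):
    """Build K(am) = K(Yq) * K(Im) for every state, following (4.9)."""
    cmos = list(dict.fromkeys(cmo_of_state.values()))  # distinct CMOs, in order
    counters = {}    # CMO -> number of states seen so far
    identifier = {}  # state -> index of its identifier Im
    for state, cmo in cmo_of_state.items():
        identifier[state] = counters.get(cmo, 0)
        counters[cmo] = identifier[state] + 1
    r_q = bits_for(len(cmos))
    r_i = bits_for(max(counters.values()))  # (4.8) with MI = largest group
    return {
        state: to_bits(cmos.index(cmo), r_q) + to_bits(identifier[state], r_i)
        for state, cmo in cmo_of_state.items()
    }


# Hypothetical FSM: three states share the CMO Y2.
example = {"a1": "Y1", "a2": "Y2", "a3": "Y2", "a4": "Y3", "a5": "Y2"}
codes = state_codes(example)
print(codes)
```

In this example Q = 3 and MI = 3, so each code consists of two bits for K(Yq) followed by two bits for K(Im), and all five state codes are different.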
Fig. 4.6 Structural diagram of PY YT Moore FSM
The LUTerk implements the systems
$$Z^k = Z^k(T^k, X^k) \quad (k = \overline{1, K}); \tag{4.10}$$
$$V^k = V^k(T^k, X^k) \quad (k = \overline{1, K}). \tag{4.11}$$
The LUTerVZ implements the system (3.37) and
$$v_r = \bigvee_{k=1}^{K} v_r^k \quad (r = \overline{1, R_I}). \tag{4.12}$$
The LUTerY implements the system (2.4), and the LUTerT implements the system
$$T = T(Z, V). \tag{4.13}$$
It is possible to replace LUTerY and LUTerT by a single EMBerYT if the following condition takes place:
$$2^{R_Q + R_I} (N + R_A) \le V_0. \tag{4.14}$$
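Condition (4.14) is easy to check numerically. In the following sketch, V0 is assumed to be the capacity of a single EMB in bits, and the numeric values are hypothetical; neither comes from the original text.

```python
def fits_single_emb(r_q, r_i, n, r_a, v0):
    """Check condition (4.14): 2^(RQ + RI) * (N + RA) <= V0."""
    words = 2 ** (r_q + r_i)  # one memory word per code K(Yq) * K(Im)
    word_width = n + r_a      # microoperations plus class variables
    return words * word_width <= v0


# Hypothetical parameters and an assumed 18-Kbit memory block.
print(fits_single_emb(r_q=4, r_i=2, n=8, r_a=6, v0=18 * 1024))
```

Here the memory needs 64 words of 14 bits, i.e. 896 bits, so the condition holds for the assumed block size.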
The structural diagram (Fig. 4.7) corresponds to the PY YT Moore FSM with EMBerYT . Now let us discuss how to design logic circuits of PT Moore FSMs. Let us denote the structural diagram shown in Fig. 4.1 as a base structure of Moore FSM with twofold state assignment.
Fig. 4.7 Structural diagram of PY YT Moore FSM with EMBs (blocks LUTer1, ..., LUTerK, LUTerVZ and EMBerY; signals X^1, ..., X^K, Z^1, ..., Z^K, V^1, ..., V^K, Clock, Start, Z, V and Y)
4.2 Synthesis of Moore FSM with the Base Structure

Let us use the GSA Γ6 shown in Fig. 4.8. It is marked by M = 17 states of Moore FSM using the rules [5]. The following sets can be derived from GSA Γ6: X = {x1, ..., x7}, Y = {y1, ..., y9} and A = {a1, ..., a17}. It gives L = 7, N = 9 and M = 17. Using (1.4), we can find that R = 5. It gives the sets T = {T1, ..., T5} and Φ = {D1, ..., D5}. Let us start from the PT Moore FSM based on partitioning the set A. We will try to encode the states in a way that optimizes the circuits of LUTer1, ..., LUTerK and LUTerT (Fig. 4.1). So, we encode the states after partitioning the set A. Let us find the partition ΠB for GSA Γ6. It is done as for Mealy PT FSM. Let us try to minimize the number of logical conditions and states of transitions for classes $A^k \in \Pi_B$. It can be done using GSA Γ6. Let S = 5. The following partition $\Pi_B = \{A^1, \ldots, A^4\}$ can be found, where $A^1$ = {a2, a3, a4}, $A^2$ = {a1, a5, a6, a10, a11, a15}, $A^3$ = {a7, a8, a9, a13, a14}, $A^4$ = {a12, a16, a17}. It gives the sets $X^1$ = {x1, x2, x3}, $X^2$ = {x2, x3}, $X^3$ = {x5} and $X^4$ = {x4, x6, x7}. Let the symbol NΦ stand for the maximal number of LUTs in blocks LUTerk (k = 1, ..., K). For PT Moore FSM,
$$N_\Phi = K \cdot R. \tag{4.15}$$
In the discussed case, NΦ = 20. Let $A(A^k)$ be the set of states of transitions for states $a_m \in A^k$. Let the following condition take place for all states $a_s \in A(A^k)$:
$$T_r^k = 0. \tag{4.16}$$
In this case, $D_r^k = 0$. It means that there is no LUT for implementing the function $D_r^k$ in LUTerk.
Fig. 4.8 Initial GSA Γ6
Fig. 4.9 State codes for Moore FSM PT (Γ6 )
Let us encode the states am ∈ A in such a way that the number of equalities (4.16) is the maximal possible for a given FSM. It can be done using, for example, the methods from [9, 10]. In the discussed case, there are the following sets: $A(A^1)$ = {a3, ..., a9}, $A(A^2)$ = {a2, a7, a8, a9, a12, a16, a17}, $A(A^3)$ = {a10, a11, a15} and $A(A^4)$ = {a1, a9, a13, a14}. Let us encode the states am ∈ A as shown in Fig. 4.9. The following equalities can be extracted from Fig. 4.9: $T_2^1 = 0$; $T_3^2 = T_4^2 = 0$; $T_1^3 = T_2^3 = 0$; $T_1^4 = T_2^4 = T_3^4 = 0$. So, $D_2^1 = D_3^2 = D_4^2 = D_1^3 = D_2^3 = D_1^4 = D_2^4 = D_3^4 = 0$. It means that NΦ = 20 − 8 = 12: the number of functions $D_r^k = 0$ is subtracted from the maximal value of NΦ. Let us point out that the number of interconnections also decreases. Let us name this approach a “diminishing state assignment”. It can be used in all models of Mealy and Moore FSMs with twofold state assignment. Because only $T_2^2 \ne 0$ among the variables $T_2^k$, the function D2 ∈ Φ is generated by LUTer2. So, there are only 4 LUTs in LUTerT. Because condition (4.1) does not hold (R = S = 5), there are N + RA LUTs in LUTerYT. Analysis of the sets $A^k$ gives R1 = R4 = 2, R2 = R3 = 3. So, RA = 10. It gives NYT = 19. In total, there are NΦ + NT + NYT = 35 LUTs in the circuit of Moore FSM PTB(Γ6). The subscript B shows that the partition ΠB is constructed.

Now let us analyse the model PTC. The subscript C means that the design method is based on the partition $\Pi_C = \{B^1, \ldots, B^K\}$. Let us start from the method of finding the partition ΠC. Let us have a partition ΠA of the set A by classes of PES. The problem is formulated as follows. It is necessary to find a partition ΠC of the set ΠA such that: (1) it includes the minimum possible number of blocks and (2) the restriction (3.5) holds for each class $B^k \in \Pi_C$. Now Rk is the number of class variables used to encode the classes $B_i \in B^k$ (k = 1, ..., K).
The partition ΠC can be constructed using a sequential algorithm similar to the previous ones. Each class Bi ∈ ΠA is characterized by two sets. The set X(Bi) includes the logical conditions xe ∈ X determining transitions from states am ∈ Bi (i = 1, ..., I). The set A(Bi) includes the states of transitions from states am ∈ Bi. If $B_i \in B^k$, then $X(B_i) \subseteq X^k$. Let us form a set $A(B^k)$ such that
$$A(B^k) = \bigcup_{i=1}^{I} A(B_i), \quad B_i \in B^k. \tag{4.17}$$
We use two evaluations in this algorithm. The evaluation $N(B_i, X^k)$ shows how many new logical conditions would be included into the set $X^k$ due to including the class $B_i \in \Pi_A$ into the class $B^k \in \Pi_C$. The evaluation $N(B_i, A(B^k))$ shows the number of states of transition shared by $A(B_i)$ and $A(B^k)$. These evaluations are calculated as follows:
$$N(B_i, X^k) = |X(B_i) \setminus X^k|; \tag{4.18}$$
$$N(B_i, A(B^k)) = |A(B_i) \cap A(B^k)|. \tag{4.19}$$
Each class $B^k \in \Pi_C$ is generated in two stages. At the first stage, we take a class $B_i \in \Pi_A^*$ as a basic element (BE) of $B^k$. Here $\Pi_A^*$ is the set of undistributed classes $B_i \in \Pi_A$. It is determined as $\Pi_A^* = \Pi_A \setminus (B^1 \cup B^2 \cup \ldots \cup B^{k-1})$. The BE should satisfy the following condition:
$$|X(B_i)| = \max_j |X(B_j)|, \quad B_j \in \Pi_A^* \setminus \{B_i\}. \tag{4.20}$$
If condition (4.20) holds for both classes Bi and Bj, then the class with the smaller index is chosen.

The second stage is a multistep one. At each step, the next element of $\Pi_A^*$ is added to the block $B^k$. To do it, we use the rules given below. The process is terminated for a block $B^k$ if: (1) all classes Bi ∈ ΠA are already distributed or (2) it is not possible to include any class $B_i \in \Pi_A^*$ without violation of (3.5). There are the following rules for including the next class in $B^k$. First of all, we choose all classes $B_i \in \Pi_A^*$ whose inclusion into $B^k$ does not violate (3.5). Let us place these classes into the set $P(B^k)$. Let us select a class $B_i \in P(B^k)$ such that:
$$N(B_i, X^k) = \min_j N(B_j, X^k), \quad B_j \in P(B^k) \setminus \{B_i\}. \tag{4.21}$$
If there is more than one such class Bi, then let us choose a class with the following property:
$$N(B_i, A(B^k)) = \max_j N(B_j, A(B^k)), \quad B_j \in P(B^k) \setminus \{B_i\}. \tag{4.22}$$
If evaluations (4.22) are equal for several classes $B_i \in P(B^k)$, then any of them can be included into the class $B^k$. Next, we eliminate all elements from the set $P(B^k)$ to get $P(B^k) = \emptyset$. The process of forming the partition ΠC for Moore FSM PTC(Γ6) is represented by Table 4.1. We use LUTs with S = 5. It gives the following pairs ⟨Lk, Rk⟩: ⟨0, 5⟩, ⟨1, 4⟩, ⟨2, 3⟩, ⟨3, 2⟩, ⟨4, 1⟩. So, each class $B^k \in \Pi_C$ can include from 1 to 31 elements. In Table 4.1, the symbol “I” stands for (4.21) and “II” for (4.22).
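The two-stage procedure can be sketched in Python as follows. This is an illustration, not the authors' implementation. It assumes that restriction (3.5) has the form Rk + Lk ≤ S, where Lk = |X^k| and Rk = ⌈log2(Mk + 1)⌉ for a block of Mk classes (one code is reserved for classes outside the block). Remaining ties are broken by the smaller index. The sets X(Bi) and A(Bi) below are hypothetical: only their sizes |X(Bi)| follow Table 4.1.

```python
def class_bits(n_classes):
    """ceil(log2(n + 1)): class variables for n classes plus a reserved code."""
    return n_classes.bit_length()


def fits(n_classes, conditions, s):
    """Assumed form of restriction (3.5): R_k + L_k <= S."""
    return class_bits(n_classes) + len(conditions) <= s


def build_partition(x, a, s):
    """Greedy construction of the blocks of Pi_C from the classes of Pi_A."""
    remaining = sorted(x)
    blocks = []
    while remaining:
        # Stage 1: basic element by (4.20), ties broken by the smaller index.
        be = min(remaining, key=lambda i: (-len(x[i]), i))
        if not fits(1, x[be], s):
            raise ValueError(f"class B{be} alone violates the restriction")
        block, conds, succ = [be], set(x[be]), set(a[be])
        remaining.remove(be)
        # Stage 2: add classes one by one while the restriction holds.
        while True:
            candidates = [i for i in remaining
                          if fits(len(block) + 1, conds | x[i], s)]
            if not candidates:
                break
            # Rule I (4.21): fewest new logical conditions;
            # rule II (4.22): most shared states of transition.
            best = min(candidates,
                       key=lambda i: (len(x[i] - conds), -len(a[i] & succ), i))
            block.append(best)
            conds |= x[best]
            succ |= a[best]
            remaining.remove(best)
        blocks.append(sorted(block))
    return blocks


# Hypothetical input data; class Bi is represented by its index i.
X = {1: set(), 2: {"x1", "x2", "x3"}, 3: {"x2", "x3"}, 4: {"x5"}, 5: set(),
     6: {"x4", "x6", "x7"}, 7: set(), 8: {"x3"}, 9: set()}
A = {1: {"a2"}, 2: {"a3", "a4", "a5", "a6"}, 3: {"a7", "a8", "a9"},
     4: {"a10", "a11"}, 5: {"a12"}, 6: {"a9", "a13", "a14", "a15"},
     7: {"a15"}, 8: {"a16", "a17"}, 9: {"a1"}}

partition = build_partition(X, A, s=5)
print(partition)  # [[1, 2, 3], [5, 6, 7], [4, 8, 9]]
```

For this data the sketch produces three blocks of three classes each. The order in which classes join a block depends on the assumed sets and on the tie-breaking rule, so it need not coincide with the steps recorded in Table 4.1.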
Table 4.1 Process of forming the partition ΠC

Bi   |X(Bi)|  BE1  I/II (1)  I/II (2)  BE2  I/II (1)  I/II (2)  BE3  I/II (1)  I/II (2)
B1   0             0/0 ⊕     –         –    –         –         –    –         –
B2   3        ⊕    –         –         –    –         –         –    –         –
B3   2             0/0       0/0 ⊕     –    –         –         –    –         –
B4   1             1/0       1/0       –    1/0       1/0       ⊕    –         –
B5   0             0/0       0/0            0/0 ⊕     –         –    –         –
B6   3             3/0       3/0       ⊕    –         –         –    –         –
B7   0             0/0       0/0            0/0       0/0 ⊕     –    –         –
B8   1             0/0       0/0            1/0       1/0            1/0       1/0 ⊕
B9   0             0/0       0/0            0/0       0/0            0/0 ⊕     –
B^k           B2   B1        B3        B6   B5        B7        B4   B9        B8
Using Table 4.1, we can find the following sets: $B^1$ = {B1, B2, B3}, $B^2$ = {B5, B6, B7} and $B^3$ = {B4, B8, B9}. Using GSA Γ6, we can find that $X^1$ = {x1, x2, x3}, $X^2$ = {x4, x6, x7} and $X^3$ = {x3, x5}. Also, it is possible to find the sets $A(B^k)$: $A(B^1)$ = {a2, ..., a9}, $A(B^2)$ = {a9, a12, ..., a15}, $A(B^3)$ = {a1, a10, a11, a16, a17}. There are R1 = R2 = R3 = 2, RA = 6, $T^1$ = {τ1, τ2}, $T^2$ = {τ3, τ4}, $T^3$ = {τ5, τ6}, T = {τ1, ..., τ6}. Before further analysis, let us propose the method of synthesis for PTC Moore FSM. It includes the following steps:
1. Finding the set of states A for a GSA Γ.
2. Constructing the partition ΠA = {B1, ..., BI}.
3. Constructing the partition $\Pi_C = \{B^1, \ldots, B^K\}$.
4. Executing the state assignment.
5. Executing the class assignment.
6. Constructing subtables STk for classes $B^k \in \Pi_C$.
7. Constructing the systems (3.12).
8. Constructing the table of LUTerT.
9. Constructing the table of LUTerYT.
10. Implementing FSM circuit with particular LUTs.
Steps 1–3 are already executed for FSM PTC(Γ6). Let us encode the states am ∈ A to maximize the number of equalities (4.16). One of the possible solutions is shown in the Karnaugh map (Fig. 4.10). The following equalities can be obtained from Fig. 4.10 and the sets $A(B^k)$:
$$T_2^1 = 0; \quad T_4^2 = T_5^2 = 0; \quad T_1^3 = T_2^3 = 0. \tag{4.23}$$
It means that there are 4 LUTs in LUTer1, 3 in LUTer2 and 3 in LUTer3. Because $T_2^1 = T_2^3 = 0$, there is no LUT for function D2 in LUTerT of FSM PTC(Γ6). It gives NΦ = 10 for PTC(Γ6).
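The equalities (4.23) and the resulting LUT counts can be checked mechanically. The sketch below takes the state codes K(am) listed in Table 4.5 and the sets A(B^k) given above. It assumes that every non-zero function needs exactly one LUT and that LUTerYT contains N + RA LUTs, as stated in this section; the function names are illustrative.

```python
# State codes K(am) for Moore FSM PTC(Γ6), as listed in Table 4.5.
CODES = {
    "a1": "00000", "a2": "10000", "a3": "10001", "a4": "10010",
    "a5": "10011", "a6": "10111", "a7": "10101", "a8": "10110",
    "a9": "10100", "a10": "00001", "a11": "00010", "a12": "01000",
    "a13": "01100", "a14": "11000", "a15": "11100", "a16": "00011",
    "a17": "00100",
}
# States of transition A(B^k) for the three blocks of the partition.
SUCCESSORS = {
    1: ["a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"],
    2: ["a9", "a12", "a13", "a14", "a15"],
    3: ["a1", "a10", "a11", "a16", "a17"],
}


def used_bits(states, codes):
    """Indices r (1-based) with T_r = 1 in at least one successor code."""
    width = len(next(iter(codes.values())))
    return {r + 1 for r in range(width)
            if any(codes[s][r] == "1" for s in states)}


def lut_counts(successors, codes, n_outputs, r_a):
    """Return the non-zero D_r^k per block and the counts N_Phi, N_T, N_YT."""
    used = {k: used_bits(states, codes) for k, states in successors.items()}
    n_phi = sum(len(bits) for bits in used.values())
    width = len(next(iter(codes.values())))
    # LUTerT needs a LUT for D_r only if two or more blocks produce D_r^k.
    n_t = sum(1 for r in range(1, width + 1)
              if sum(r in bits for bits in used.values()) >= 2)
    n_yt = n_outputs + r_a  # assumed: one LUT per y_n and per class variable
    return used, n_phi, n_t, n_yt


used, n_phi, n_t, n_yt = lut_counts(SUCCESSORS, CODES, n_outputs=9, r_a=6)
print(used)
print(n_phi, n_t, n_yt, n_phi + n_t + n_yt)
```

The indices missing from each set correspond to the zero variables in (4.23). The counts NΦ = 10, NT = 4 and NYT = 15 give the total of 29 LUTs quoted for PTC(Γ6) in this section.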
Fig. 4.10 State codes for Moore FSM PTC(Γ6)

Table 4.2 Table ST1 of Moore FSM PTC(Γ6)

Bi   C(Bi)  as    K(as)   X_h^1      Φ_h^1                    h
B1   01     a2    10000   1          D_1^1                    1
B2   10     a3    10001   x1 x2      D_1^1 D_5^1              2
            a4    10010   x1 x̄2      D_1^1 D_4^1              3
            a5    10011   x̄1 x3      D_1^1 D_4^1 D_5^1        4
            a6    10111   x̄1 x̄3      D_1^1 D_3^1 D_4^1 D_5^1  5
B3   11     a7    10101   x2 x3      D_1^1 D_3^1 D_5^1        6
            a8    10110   x3 x̄2      D_1^1 D_3^1 D_4^1        7
            a9    10100   x̄1 x3      D_1^1 D_3^1              8

Table 4.3 Table ST2 of Moore FSM PTC(Γ6)

Bi   C(Bi)  as    K(as)   X_h^2      Φ_h^2                    h
B5   01     a12   01000   1          D_2^2                    1
B6   10     a13   01100   x4         D_2^2 D_3^2              2
            a14   11000   x̄4 x6      D_1^2 D_2^2              3
            a15   11100   x̄4 x̄6 x7   D_1^2 D_2^2 D_3^2        4
            a9    10100   x̄4 x̄6 x̄7   D_1^2 D_3^2              5
B7   11     a15   11100   1          D_1^2 D_2^2 D_3^2        6
Because R ≤ S = 5, each function yn ∈ Y and τr ∈ T is implemented by a single LUT. So, the codes C(Bi) have no influence on the number of LUTs. Let us encode the classes Bi ∈ ΠA in the trivial way: C(B1) = C(B4) = C(B5) = 01, C(B2) = C(B6) = C(B8) = 10 and C(B3) = C(B7) = C(B9) = 11. Having GSA Γ6, codes K(am) and codes C(Bi), we can form the tables STk (k = 1, ..., 3). Each table has the following columns: Bi, C(Bi), as, K(as), $X_h^k$, $\Phi_h^k$, h. These tables are represented by Tables 4.2, 4.3 and 4.4.
Table 4.4 Table ST3 of Moore FSM PTC (Γ6 ) Bi C(Bi ) as K(as ) B4
01
B8
10
B9
11
a10 a11 a16 a17 a1
00001 00010 00011 00100 00000
Xh1
Φh1
h
x5 x¯5 x6 x3 x¯3 1
D53 D43 D43 D53 D33
1 2 3 4 5
–
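Using Tables 4.2, 4.3 and 4.4, we can derive the following systems:
$$
\begin{aligned}
D_1^1 &= \bar{\tau}_1 \tau_2 \lor \tau_1; \qquad D_2^1 = 0; \qquad D_3^1 = \tau_1 \bar{\tau}_2 \bar{x}_2 \bar{x}_3 \lor \tau_1 \tau_2;\\
D_4^1 &= \tau_1 \bar{\tau}_2 \bar{x}_2 \lor \tau_1 \bar{\tau}_2 \bar{x}_1 \lor \tau_1 \tau_2 x_3 \bar{x}_2;\\
D_5^1 &= \tau_1 \bar{\tau}_2 \bar{x}_1 \lor \tau_1 \bar{\tau}_2 x_2 \lor \tau_1 \tau_2 x_3 x_2.
\end{aligned}
\tag{4.24}
$$
$$
\begin{aligned}
D_1^2 &= \tau_3 \bar{\tau}_4 \bar{x}_4 \lor \tau_3 \tau_4;\\
D_2^2 &= \tau_4 \lor \tau_3 \bar{\tau}_4 x_4 \lor \tau_3 \bar{\tau}_4 x_6 \lor \tau_3 \bar{\tau}_4 \bar{x}_4 \bar{x}_6 x_7;\\
D_3^2 &= \tau_3 \bar{\tau}_4 x_4 \lor \tau_3 \bar{\tau}_4 \bar{x}_6 \lor \tau_3 \tau_4;\\
D_4^2 &= D_5^2 = 0.
\end{aligned}
\tag{4.25}
$$
$$
\begin{aligned}
D_1^3 &= D_2^3 = 0; & D_3^3 &= \tau_5 \bar{\tau}_6 \bar{x}_3;\\
D_4^3 &= \tau_5 \bar{\tau}_6 x_3 \lor \bar{\tau}_5 \tau_6 \bar{x}_5; & D_5^3 &= \bar{\tau}_5 \tau_6 x_5 \lor \tau_5 \bar{\tau}_6 x_3.
\end{aligned}
\tag{4.26}
$$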
The system (4.24) allows constructing truth tables for the LUTs of LUTer1, the system (4.25) for LUTer2 and the system (4.26) for LUTer3. As follows from (4.24)–(4.26), there are 10 LUTs in the circuits of LUTer1–LUTer3. In the discussed case, the system (3.14) is the following one:
$$
\begin{aligned}
D_1 &= D_1^1 \lor D_1^2; & D_3 &= D_3^1 \lor D_3^2 \lor D_3^3;\\
D_4 &= D_4^1 \lor D_4^3; & D_5 &= D_5^1 \lor D_5^3.
\end{aligned}
\tag{4.27}
$$
There is no equation for D2 ∈ Φ in (4.27). It follows from the equalities $D_2^1 = D_2^3 = 0$. So, the function D2 is implemented by LUTer2. There are 4 LUTs in LUTerT of PTC(Γ6). The table of LUTerYT has the following columns: am, K(am), Y, T, m. Because R = 5, there are 32 rows in this table for PTC(Γ6); only the 17 rows corresponding to the states am ∈ A are shown in Table 4.5. In this table, we use codes from Fig. 4.10. If am ∈ Bi and $B_i \in B^k$, then the code C(Bi) is written in the column T. Table 4.5 corresponds to 15 truth tables of LUTs. It means that there are 15 LUTs in the circuit of LUTerYT. In general, this number can be diminished using the refined state assignment, but there is no such possibility in the discussed case.
Table 4.5 Table of LUTerYT for Moore FSM PTC(Γ6)

am    K(am)   Y (y1 ... y9)  T (τ1 ... τ6)  m
a1    00000   000000000      010000         1
a2    10000   110000000      100000         2
a3    10001   100000000      110000         3
a4    10010   011000000      110000         4
a5    10011   001100000      110000         5
a6    10111   000010000      110000         6
a7    10101   000001100      000001         7
a8    10110   001010000      000001         8
a9    10100   000100010      000001         9
a10   00001   000000101      000100         10
a11   00010   000000010      000100         11
a12   01000   011000000      001000         12
a13   01100   110000000      001100         13
a14   11000   000000101      001100         14
a15   11100   001100000      000010         15
a16   00011   100000000      000011         16
a17   00100   000010000      000011         17
So, there are 29 LUTs with S = 5 in the circuit of PTC(Γ6). There are 35 LUTs in the circuit of PTB(Γ6). So, using classes Bi ∈ ΠA instead of states am ∈ A saves 6 LUTs, i.e. about 17%. Of course, it is true only for this example. But we found that there are fewer blocks in ΠC than in ΠB (for equivalent FSMs). The synthesis method for PTB Moore FSM includes the following steps:
1. Finding the set of states A for GSA Γ.
2. Constructing the partition $\Pi_B = \{A^1, \ldots, A^K\}$.
3. Executing the state assignment.
4. Constructing the subtables STk for classes $A^k \in \Pi_B$.
5. Constructing the systems (3.12).
6. Constructing the table of LUTerT.
7. Constructing the table of LUTerYT.
8. Implementing FSM circuit with particular LUTs.
We discussed some of these steps for Moore FSM PTB (Γ6 ). We hope that a reader has no problems with this method.
4.3 Synthesis of Moore FSM with Encoding of Collections of Microoperations

Let us discuss the design method for LUT-based PYTC Moore FSM. The circuit is synthesized using the partition ΠC. The structural diagram of PYTC FSM is shown in Fig. 4.4. Of course, the same diagram represents PYTB Moore FSM. Let us discuss an example of synthesis using GSA Γ7 (Fig. 4.11). There are M = 17, L = 6 and N = 8 for Moore FSM P(Γ7). The proposed method of synthesis includes the following steps:
1. Finding the set of states A for GSA Γ.
2. Constructing the partition ΠA = {B1, ..., BI}.
3. Constructing the partition $\Pi_C = \{B^1, \ldots, B^K\}$.
4. Executing the state assignment.
5. Executing the class assignment.
6. Constructing the subtables STk for classes $B^k \in \Pi_C$.
7. Constructing the systems (3.12).
8. Constructing the table of LUTerT.
9. Executing the diminishing encoding of CMOs.
10. Constructing the table of LUTerZT.
11. Constructing the table of LUTerY.
12. Implementing FSM circuit with particular LUTs.
Analysis of GSA Γ7 shows that A = {a1, ..., a17}. Let us find the partition ΠA = {B1, ..., BI}. It is constructed in the trivial way using the definition of PES [2]. Let S = 5. To find the partition ΠC, we use the same approach as in Sect. 4.2. There is the following partition ΠA = {B1, ..., B9} with B1 = {a1}, B2 = {a2, ..., a5}, B3 = {a6}, B4 = {a7, a8, a12}, B5 = {a9}, B6 = {a10}, B7 = {a11}, B8 = {a13, a14, a15}, B9 = {a16, a17}. These classes are distributed among the following classes $B^k \in \Pi_C$: $B^1$ = {B1, B2, B5}, $B^2$ = {B3, B4, B6}, $B^3$ = {B7, B8, B9}. It gives the sets $X^1$ = {x1, x2}, $X^2$ = {x3, x4, x5}, $X^3$ = {x5, x6}, $A(B^1)$ = {a2, ..., a6, a10}, $A(B^2)$ = {a7, a8, a9, a11, a13, a14, a15}, $A(B^3)$ = {a1, a12, a15, a16, a17}. Let us use the diminishing state assignment for FSM PYTC(Γ7). One of the variants is shown in Fig. 4.12. Let us point out that R = 5. In the discussed case, M1 = M2 = M3 = 3. So, R1 = R2 = R3 = 2. It gives RA = 6 and the set T = {τ1, ..., τ6}. To find it, we use relations (3.6), (3.7). Let $T^1$ = {τ1, τ2}, $T^2$ = {τ3, τ4}, $T^3$ = {τ5, τ6}. Let us encode the classes $B_i \in B^k$ in the following way: C(B1) = C(B3) = C(B7) = 01, C(B2) = C(B4) = C(B8) = 10 and C(B5) = C(B6) = C(B9) = 11. As in the previous case, we use the code 00 to show that a class Bi does not belong to the class $B^k$ (i = 1, ..., I; k = 1, ..., K). Let us form the tables ST1–ST3 for the given example. They are shown in Tables 4.6, 4.7 and 4.8, respectively. Using the tables ST1–ST3, we can construct the system (3.12). It is represented by the following three systems:
Fig. 4.11 Initial GSA Γ7 marked by states of Moore FSM
Fig. 4.12 Outcome of diminishing state assignment for PYTC (Γ7 )
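Table 4.6 Table ST1 for Moore FSM PYTC(Γ7)
Bi  C(Bi)  as   K(as)  Xh^1    Φh^1               h
B1  01     a2   01100  x1      D_2^1 D_3^1        1
B1  01     a3   01000  x̄1      D_2^1              2
B2  10     a4   11000  x1 x2   D_1^1 D_2^1        3
B2  10     a5   01110  x1 x̄2   D_2^1 D_3^1 D_4^1  4
B2  10     a6   01010  x̄1      D_2^1 D_4^1        5
B5  11     a10  11010  1       D_1^1 D_2^1 D_4^1  6

Table 4.7 Table ST2 for Moore FSM PYTC(Γ7)
Bi  C(Bi)  as   K(as)  Xh^2    Φh^2               h
B3  01     a7   10100  x3 x4   D_1^2 D_3^2        1
B3  01     a8   10000  x3 x̄4   D_1^2              2
B3  01     a9   10101  x̄3      D_1^2 D_3^2 D_5^2  3
B4  10     a13  10110  x3 x5   D_1^2 D_3^2 D_4^2  4
B4  10     a14  10010  x̄3 x5   D_1^2 D_4^2        5
B4  10     a15  00100  x̄5      D_3^2              6
B6  11     a8   10000  x4      D_1^2              7
B6  11     a11  10001  x̄4      D_1^2 D_5^2        8

Table 4.8 Table ST3 for Moore FSM PYTC(Γ7)
Bi  C(Bi)  as   K(as)  Xh^3    Φh^3               h
B7  01     a12  00001  1       D_5^3              1
B8  10     a16  00101  x5 x6   D_3^3 D_5^3        2
B8  10     a17  00010  x5 x̄6   D_4^3              3
B8  10     a15  00100  x̄5      D_3^3              4
B9  11     a1   00000  1       –                  5

\[
\begin{aligned}
D_1^1 &= \tau_1\bar{\tau}_2 x_1 x_2; \qquad D_2^1 = \tau_1 \vee \tau_2;\\
D_3^1 &= \bar{\tau}_1\tau_2 x_1 \vee \tau_1\bar{\tau}_2 x_1\bar{x}_2;\\
D_4^1 &= \tau_1\bar{\tau}_2\bar{x}_2 \vee \tau_1\bar{\tau}_2\bar{x}_1 \vee \tau_1\tau_2; \qquad D_5^1 = 0.
\end{aligned}
\tag{4.28}
\]

\[
\begin{aligned}
D_1^2 &= \tau_4 \vee \tau_3\bar{\tau}_4 x_5; \qquad D_2^2 = 0;\\
D_3^2 &= \bar{\tau}_3\tau_4 x_4 \vee \bar{\tau}_3\tau_4\bar{x}_3 \vee \tau_3\bar{\tau}_4 x_5;\\
D_4^2 &= \tau_3\bar{\tau}_4 x_5; \qquad D_5^2 = \bar{\tau}_3\tau_4\bar{x}_3 \vee \tau_3\tau_4\bar{x}_4.
\end{aligned}
\tag{4.29}
\]

\[
\begin{aligned}
D_1^3 &= D_2^3 = 0; \qquad D_3^3 = \tau_5\bar{\tau}_6 x_6 \vee \tau_5\bar{\tau}_6\bar{x}_5;\\
D_4^3 &= \tau_5\bar{\tau}_6 x_5\bar{x}_6; \qquad D_5^3 = \bar{\tau}_5\tau_6 \vee \tau_5\bar{\tau}_6 x_6.
\end{aligned}
\tag{4.30}
\]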
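The count of nonzero functions in (4.28)–(4.30) can be cross-checked with a short Python sketch. It assumes the usual rule for D flip-flops, namely that D_r^k appears in a row of STk exactly when bit r of the code K(as) equals 1. The dictionary `ST` and the helper names are illustrative and do not come from the book; the codes are copied from the K(as) columns of Tables 4.6–4.8.

```python
# Codes K(a_s) from the K(as) columns of tables ST1-ST3 (Tables 4.6-4.8).
ST = {
    1: ["01100", "01000", "11000", "01110", "01010", "11010"],
    2: ["10100", "10000", "10101", "10110", "10010", "00100", "10000", "10001"],
    3: ["00001", "00101", "00010", "00100", "00000"],
}


def excited(code):
    """Indices r (1-based) of functions D_r that equal 1 for this code."""
    return [r + 1 for r, bit in enumerate(code) if bit == "1"]


def used_functions(codes):
    """Sorted indices of functions D_r^k that are not identically zero."""
    return sorted({r for code in codes for r in excited(code)})


# One LUT per nonzero function D_r^k, summed over LUTer1-LUTer3.
n_phi = sum(len(used_functions(codes)) for codes in ST.values())

for k, codes in ST.items():
    print(f"LUTer{k}: nonzero D_r for r in {used_functions(codes)}")
print("N_Phi =", n_phi)
```

Under these assumptions the sketch reports four nonzero functions for LUTer1, four for LUTer2 and three for LUTer3.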
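Let us form the system (3.14) instead of the table of LUTerT. In the discussed case, it is the following system:
\[
\begin{aligned}
D_1 &= D_1^1 \vee D_1^2; \qquad D_2 = D_2^1;\\
D_3 &= D_3^1 \vee D_3^2 \vee D_3^3; \qquad D_4 = D_4^1 \vee D_4^2 \vee D_4^3;\\
D_5 &= D_5^2 \vee D_5^3.
\end{aligned}
\tag{4.31}
\]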
Analysis of (4.28)–(4.30) shows that there is NΦ = 11. So, our approach to state assignment saves 4 LUTs. As follows from (4.31), there is NT = 4. It gives 15 LUTs in LUTer1–LUTer3 and LUTerT instead of 20. So, our approach saves 25% of LUTs in the discussed part of the circuit of FSM PYTC(Γ7).
There are CMOs Yq ⊆ Y written in the operational vertices of GSA Γ7. They are the following: Y1 = ∅, Y2 = {y1, y2, y7}, Y3 = {y3, y5}, Y4 = {y5}, Y5 = {y4, y7}, Y6 = {y1, y4, y7}, Y7 = {y2, y6, y7}, Y8 = {y6, y8}, Y9 = {y4, y6, y7}, Y10 = {y5, y8}, Y11 = {y2, y3, y7}, Y12 = {y3, y5, y8}. So, there is Q = 12. Using (2.24) gives RQ = 4 and Z = {z1, ..., z4}. If RQ ≤ S, then exactly N LUTs are necessary in the circuit of LUTerY. The circuit has NI = RQ · N interconnections. In the discussed example, there is N = 8. It gives NI = 32. Let us try to decrease these parameters using the diminishing encoding of CMOs. Each MO yn ∈ Y could be represented by the following formula:
\[
y_n = \bigvee_{q=1}^{Q} C_{nq} Y_q \quad (n = 1, \ldots, N).
\tag{4.32}
\]
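The Boolean variable Cnq = 1 if and only if (iff) yn ∈ Yq. Let us form such a system for GSA Γ7. It is the following:
\[
\begin{aligned}
y_1 &= Y_2 \vee Y_6; & y_2 &= Y_2 \vee Y_7 \vee Y_{11};\\
y_3 &= Y_3 \vee Y_{11} \vee Y_{12}; & y_4 &= Y_5 \vee Y_6 \vee Y_9;\\
y_5 &= Y_3 \vee Y_4 \vee Y_{10} \vee Y_{12}; & y_6 &= Y_7 \vee Y_8 \vee Y_9;\\
y_7 &= Y_2 \vee Y_5 \vee Y_6 \vee Y_7 \vee Y_9 \vee Y_{11}; & y_8 &= Y_8 \vee Y_{10} \vee Y_{12}.
\end{aligned}
\tag{4.33}
\]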
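System (4.33) follows mechanically from the list of CMOs via (4.32). The Python sketch below illustrates this. Two assumptions are made: Y9 is taken as {y4, y6, y7}, which is the content implied by (4.33), and RQ is computed as the ceiling of log2 Q, which is how (2.24) is understood here. The names `CMOS` and `mo_system` are illustrative.

```python
from math import ceil, log2

# CMOs of GSA Gamma_7 as sets of microoperation indices.
# Assumption: Y9 = {y4, y6, y7}, the content implied by system (4.33).
CMOS = {
    1: set(),
    2: {1, 2, 7},
    3: {3, 5},
    4: {5},
    5: {4, 7},
    6: {1, 4, 7},
    7: {2, 6, 7},
    8: {6, 8},
    9: {4, 6, 7},
    10: {5, 8},
    11: {2, 3, 7},
    12: {3, 5, 8},
}
N = 8  # number of microoperations


def mo_system(cmos, n_mos):
    """For each y_n, list the indices q with C_nq = 1, i.e. y_n in Y_q."""
    return {
        n: sorted(q for q, mos in cmos.items() if n in mos)
        for n in range(1, n_mos + 1)
    }


system = mo_system(CMOS, N)
r_q = ceil(log2(len(CMOS)))  # assumed reading of (2.24)

for n, qs in system.items():
    print(f"y{n} = " + " v ".join(f"Y{q}" for q in qs))
print("R_Q =", r_q)
```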
Using the approach from [20], it is possible to execute the diminishing encoding of CMOs. Its outcome is shown in Fig. 4.13.
Fig. 4.13 Outcome of diminishing encoding of CMOs for Moore FSM PYTC(Γ7)
Table 4.9 Table of LUTerZT for Moore FSM PYTC(Γ7)
am   K(am)  Bi  Yq   C(Bi) τ1...τ6  K(Yq) z1 z2 z3 z4  m
a1   00000  B1  Y1   010000         0000               1
a2   01100  B2  Y2   100000         0100               2
a3   01000  B2  Y3   100000         1000               3
a4   11000  B2  Y5   100000         1111               4
a5   01110  B2  Y6   100000         0101               5
a6   01010  B3  Y7   000100         0110               6
a7   10100  B4  Y10  001000         1011               7
a8   10000  B4  Y11  001000         1110               8
a9   10101  B5  Y4   110000         1001               9
a10  11010  B6  Y8   001100         0010               10
a11  10001  B7  Y9   000001         0111               11
a12  00001  B4  Y12  001000         1010               12
a13  10110  B8  Y11  000010         1110               13
a14  10010  B8  Y4   000010         1001               14
a15  00100  B8  Y5   000010         1111               15
a16  00101  B9  Y2   000011         0100               16
a17  00010  B9  Y4   000011         1001               17
After minimizing, the following system Y(Z) could be extracted from the Karnaugh map (Fig. 4.13):
\[
\begin{aligned}
y_1 &= z_2\bar{z}_3; & y_2 &= z_2\bar{z}_4; & y_3 &= z_1\bar{z}_4; & y_4 &= z_2 z_4;\\
y_5 &= z_1\bar{z}_2; & y_6 &= \bar{z}_1 z_3; & y_7 &= z_2; & y_8 &= \bar{z}_2 z_3.
\end{aligned}
\tag{4.34}
\]
Because y7 = z2, no LUT is needed to implement the function y7 in the block LUTerY. So, there are 7 LUTs in this circuit. The number NI can be found as the total number of literals in the LUT-implemented functions of system (4.34). It gives NI = 14. So, using the diminishing encoding of CMOs saves about 12% of LUTs and 56% of interconnections. There are the following columns in the table of LUTerZT: am, K(am), Bi, Yq, C(Bi), K(Yq), m. In the discussed case, this table has M = 17 rows (Table 4.9). This table corresponds to RA + RQ = 10 truth tables for LUTs of LUTerZT. There are 10 LUTs and 40 interconnections in this circuit. The number of interconnections could be diminished if we minimize functions zr ∈ Z and τr ∈ T. For example, it is possible to get the function τ5 = z̄2 z̄3 z4 ∨ z̄1 z̄2 z3. There are only 4 interconnections for the corresponding circuit. The table of LUTerY is constructed using the system (4.34). It is necessary to form 7 truth tables. It could be done in the trivial way.
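The claims about LUTerY can be checked by evaluating system (4.34) on the codes K(Yq). The following Python sketch does this under two assumptions: the codes are those in the K(Yq) column of Table 4.9, and Y9 = {y4, y6, y7} as implied by (4.33). Each function of (4.34) is stored as one product term; all names are illustrative.

```python
# Codes K(Y_q) = z1 z2 z3 z4, as read from the K(Yq) column of Table 4.9.
K = {1: "0000", 2: "0100", 3: "1000", 4: "1001", 5: "1111", 6: "0101",
     7: "0110", 8: "0010", 9: "0111", 10: "1011", 11: "1110", 12: "1010"}

# CMOs as sets of microoperation indices (assumption: Y9 = {y4, y6, y7}).
CMOS = {1: set(), 2: {1, 2, 7}, 3: {3, 5}, 4: {5}, 5: {4, 7}, 6: {1, 4, 7},
        7: {2, 6, 7}, 8: {6, 8}, 9: {4, 6, 7}, 10: {5, 8}, 11: {2, 3, 7},
        12: {3, 5, 8}}

# System (4.34): y_n -> {index of z: required value}, one product term each.
Y_OF_Z = {1: {2: 1, 3: 0}, 2: {2: 1, 4: 0}, 3: {1: 1, 4: 0}, 4: {2: 1, 4: 1},
          5: {1: 1, 2: 0}, 6: {1: 0, 3: 1}, 7: {2: 1}, 8: {2: 0, 3: 1}}


def decode(code):
    """Set of microoperations produced by (4.34) for the code z1 z2 z3 z4."""
    return {n for n, term in Y_OF_Z.items()
            if all(int(code[i - 1]) == v for i, v in term.items())}


# Every code must reproduce exactly the microoperations of its CMO.
consistent = all(decode(K[q]) == CMOS[q] for q in CMOS)

# A function equal to a single variable (y7 = z2) needs no LUT.
lut_terms = [t for t in Y_OF_Z.values() if len(t) > 1]
n_luts = len(lut_terms)
n_i = sum(len(t) for t in lut_terms)

print(consistent, n_luts, n_i)
print(f"LUT saving: {100 * (8 - n_luts) / 8:.1f}%")
print(f"Interconnection saving: {100 * (32 - n_i) / 32:.1f}%")
```

With these inputs the savings evaluate to 12.5% of LUTs and about 56% of interconnections relative to the baseline of 8 LUTs and 32 interconnections.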
This approach can be used for PYT FSMs based on the partition ΠB. There are the following steps in the proposed design method for PYTB Moore FSM:
1. Finding the set A for a GSA Γ.
2. Constructing the partition ΠB = {A1, ..., AK}.
3. Executing the state assignment.
4. Constructing the tables STk for classes Ak ∈ ΠB.
5. Constructing the system (3.12).
6. Constructing the system (3.14).
7. Executing the diminishing encoding of CMOs.
8. Constructing the table of LUTerZT.
9. Constructing the table of LUTerY.
10. Implementing FSM circuit with particular LUTs.
Let us discuss an example of synthesis for Moore FSM PYTB(Γ7). There is the set A = {a1, ..., a17}. Let us use LUTs with S = 5. The partition ΠB is constructed as in all previous cases. It is the following: ΠB = {A1, ..., A4} with A1 = {a1, ..., a5, a9, a11}, A2 = {a6, a10, a16, a17}, A3 = {a7, a8, a12} and A4 = {a13, a14, a15}. It gives the sets X^1 = {x1, x2}, X^2 = {x3, x4}, X^3 = {x3, x5} and X^4 = {x5, x6}. Let A(Ak) be the set of states of transition for states am ∈ Ak. There are the following sets in the discussed case: A(A1) = {a2, ..., a6, a10, a12}, A(A2) = {a1, a7, a8, a9, a11}, A(A3) = {a13, a14, a15} and A(A4) = {a16, a17}. It is interesting that the following condition is true:
\[
A(A_i) \cap A(A_j) = \emptyset \quad (i \neq j;\ i, j \in \{1, \ldots, K\}).
\tag{4.35}
\]
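Condition (4.35) is easy to verify for the sets listed above. The Python sketch below checks pairwise disjointness and, as an extra observation, that the four sets together cover all 17 states. The names `A_OF` and `pairwise_disjoint` are illustrative.

```python
from itertools import combinations

# Sets of states of transition A(A_k) for Moore FSM PYTB(Gamma_7),
# written as state indices.
A_OF = {
    1: {2, 3, 4, 5, 6, 10, 12},
    2: {1, 7, 8, 9, 11},
    3: {13, 14, 15},
    4: {16, 17},
}


def pairwise_disjoint(sets):
    """True iff no two of the given sets share an element, as in (4.35)."""
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))


disjoint = pairwise_disjoint(list(A_OF.values()))
covered = set().union(*A_OF.values())

print("condition (4.35) holds:", disjoint)
print("states covered:", len(covered))
```

Because the sets are disjoint, each state receives input memory functions from only one block LUTerk, which is what makes the zero functions in the following system possible.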
Let us encode the states as shown in Fig. 4.14. To do it, we use the approach from [20]. Analysis of Fig. 4.14 and sets A(Ak) shows that:
\[
D_3^1 = 0; \qquad D_1^2 = D_2^2 = 0; \qquad D_2^3 = D_3^3 = 0; \qquad D_1^4 = D_2^4 = 0.
\tag{4.36}
\]
In the worst case, there are K · R = 20 LUTs in the circuits of LUTer1–LUTer4 of PYTB(Γ7). But there are only 13 LUTs in these circuits due to the state assignment (Fig. 4.14).
Fig. 4.14 Outcome of state assignment for Moore FSM PYTB (Γ7 )
Table 4.10 Codes C(am) for Moore FSM PYTB(Γ7)
am    C(am) τ1τ2τ3   am    C(am) τ4τ5τ6   am    C(am) τ7τ8   am    C(am) τ9τ10
a1    001            a6    001            a7    01           a13   01
a2    010            a10   010            a8    10           a14   10
a3    011            a16   100            a12   11           a15   11
a4    100            a17   011            ∉A3   00           ∉A4   00
a5    101            ∉A2   000            –     –            –     –
a9    110            –     –              –     –            –     –
a11   111            –     –              –     –            –     –
∉A1   000            –     –              –     –            –     –

Table 4.11 Table ST2 for Moore FSM PYTB(Γ7)
am   C(am)  as   K(as)  Xh^2    Φh^2         h
a6   001    a7   00001  x3 x4   D_5^2        1
a6   001    a8   00010  x3 x̄4   D_4^2        2
a6   001    a9   00110  x̄3      D_3^2 D_4^2  3
a10  010    a8   00010  x4      D_4^2        4
a10  010    a11  00101  x̄4      D_3^2 D_5^2  5
a16  100    a1   00000  1       –            6
a17  011    a1   00000  1       –            7
To construct tables ST1–ST4, it is necessary to encode the states am ∈ Ak. There are R1 = 3, R2 = 3, R3 = 2 and R4 = 2. So, there is RA = 10. It gives the sets T^1 = {τ1, τ2, τ3}, T^2 = {τ4, τ5, τ6}, T^3 = {τ7, τ8}, T^4 = {τ9, τ10} and T = {τ1, ..., τ10}. The codes C(am) are shown in Table 4.10. Tables STk are constructed as before. For example, there are H2 = 7 rows in the table ST2 (Table 4.11). Let us find equations (3.12) following from Table 4.11. It is possible to use the insignificant input assignments 101, 110 and 111 for optimizing this system. Now we have the following system:
\[
\begin{aligned}
D_1^2 &= D_2^2 = 0; \qquad D_3^2 = \bar{\tau}_5\tau_6\bar{x}_3 \vee \tau_5\bar{\tau}_6\bar{x}_4;\\
D_4^2 &= \bar{\tau}_5\tau_6\bar{x}_4 \vee \bar{\tau}_5\tau_6\bar{x}_3 \vee \tau_5\bar{\tau}_6 x_4;\\
D_5^2 &= \bar{\tau}_5\tau_6 x_3 x_4 \vee \tau_5\bar{\tau}_6\bar{x}_4.
\end{aligned}
\tag{4.37}
\]
The system (4.37) determines the circuit of LUTer2. It includes 3 LUTs and 12 interconnections. In the worst case, it should include 5 LUTs and 20 interconnections. The economy is achieved due to the style of state assignment proposed in the book. There are no problems with constructing tables ST1, ST3 and ST4. We leave this task to the reader.
The system (3.14) is constructed in the trivial way. Using the system (4.36), we can get the following system:
\[
\begin{aligned}
D_1 &= D_1^1 \vee D_1^3; \qquad D_3 = D_3^2 \vee D_3^4;\\
D_4 &= D_4^1 \vee D_4^2 \vee D_4^3 \vee D_4^4; \qquad D_5 = D_5^1 \vee D_5^2 \vee D_5^3 \vee D_5^4.
\end{aligned}
\tag{4.38}
\]
Analysis of (4.38) shows that there are 4 LUTs in the circuit of LUTerT. Also, there are 12 interconnections. In the worst case, there are 5 LUTs and 20 interconnections. Tables of LUTerZT and LUTerY are constructed in the same way as for the previous case. Let us use codes K(Yq) from Fig. 4.13. In this case, there are 7 LUTs in LUTerY and RQ + RA = 14 LUTs in LUTerZT. So, there are NΦ + NT + NZT + NY = 32 LUTs in the circuit of FSM PYTC(Γ7). There are NΦ + NT + NZT + NY = 38 LUTs in the circuit of FSM PYTB(Γ7). Both circuits have 4 levels of logic. So, their performance is the same. So, using the partition ΠC saves 16% of LUTs without loss of performance. Of course, it is true only for GSA Γ7 and S = 5.
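The totals quoted above can be reproduced from the per-block counts given in this section. The Python sketch below is plain arithmetic; the helper name `lut_total` is illustrative, and the block counts are the ones stated in the text for both models.

```python
def lut_total(n_phi, n_t, n_zt, n_y):
    """Total LUT count N_Phi + N_T + N_ZT + N_Y of a PYT Moore FSM circuit."""
    return n_phi + n_t + n_zt + n_y


# PYTC(Gamma_7): 11 LUTs in LUTer1-LUTer3, 4 in LUTerT, 10 in LUTerZT, 7 in LUTerY.
pytc = lut_total(11, 4, 10, 7)
# PYTB(Gamma_7): 13 LUTs in LUTer1-LUTer4, 4 in LUTerT, 14 in LUTerZT, 7 in LUTerY.
pytb = lut_total(13, 4, 14, 7)

saving = 100 * (pytb - pytc) / pytb
print(pytc, pytb, f"{saving:.1f}%")
```

The saving evaluates to roughly 15.8%, which the text rounds to 16%.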
4.4 Encoding of the Fields of Compatible Microoperations
It is possible to use the twofold state assignment together with the encoding of the fields of compatible microoperations (FCMO). A microoperation yi ∈ Y is compatible with MO yj ∈ Y if yi ∈ Yq → yj ∉ Yq for q ∈ {1, ..., Q} [2, 6]. This approach could be used for both Mealy and Moore FSMs. Let us discuss the case when the encoding of FCMO is used in Moore FSM. In the case of twofold state assignment, it leads to PDT Moore FSM (Fig. 4.15).
Fig. 4.15 Structural diagram of PDT Moore FSM
In PDT FSM, any Block k corresponds to LUTerk (k = 1, ..., K), Block T to LUTerT. The circuit of Block ZT could be implemented as either LUTerZT or EMBerZT. The variables zr ∈ Z encode the FCMO. The Block D is implemented with LUTs. Each LUT of LUTerD decodes a single code K(yn) and forms the function yn ∈ Y. Let us discuss the method of synthesis for PDTC Moore FSM. It includes the following steps:
1. Finding the set A for GSA Γ.
2. Constructing the partition ΠA = {B1, ..., BI}.
3. Constructing the partition ΠC = {B^1, ..., B^K}.
4. Executing the state assignment.
5. Executing the class assignment.
6. Constructing tables STk for classes B^k ∈ ΠC.
7. Finding the system (3.12).
8. Finding the system (3.14).
9. Constructing the partition ΠY = {Y^1, ..., Y^V}.
10. Executing the encoding of compatible MOs.
11. Constructing the table of LUTerZT.
12. Constructing tables for LUTerD.
13. Implementing FSM circuit with particular LUTs.
Let us discuss an example of synthesis for Moore FSM PDTC(Γ8). The marked GSA Γ8 is shown in Fig. 4.16. Let us use LUTs with S = 5. There are the sets A = {a1, ..., a12}, X = {x1, ..., x6} and Y = {y1, ..., y9} in the discussed case. Using (1.4), the value R = 4 could be found. It gives the sets T = {T1, ..., T4} and Φ = {D1, ..., D4}. Using the definition of PES, it is possible to find the partition ΠA = {B1, ..., B6} with the classes B1 = {a1}, B2 = {a2}, B3 = {a3, ..., a6}, B4 = {a7, a8}, B5 = {a9, a10, a12} and B6 = {a11}. Using the methods discussed in this chapter, we can find the partition ΠC = {B^1, B^2} where B^1 = {B1, B2, B3} and B^2 = {B4, B5, B6}. It gives the sets X^1 = {x1, x2, x3}, X^2 = {x4, x5, x6}, A(B^1) = {a2, ..., a8} and A(B^2) = {a1, a9, ..., a12}.
Let us encode the states am ∈ A as shown in Fig. 4.17. There is R1 = R2 = 2. So, there are RA = 4, T = {τ1, ..., τ4}, T^1 = {τ1, τ2}, T^2 = {τ3, τ4}. Let us encode the classes Bi ∈ B^k in the trivial way: C(B1) = C(B4) = 01, C(B2) = C(B5) = 10 and C(B3) = C(B6) = 11. Using state codes (Fig. 4.17), class codes and GSA Γ8, we can construct tables ST1 (Table 4.12) and ST2 (Table 4.13). Analysis of Tables 4.12 and 4.13 shows that: (1) there are 4 LUTs in LUTer1; (2) there are 3 LUTs in LUTer2; (3) there are 3 LUTs in LUTerT (there is no LUT for T1). So, there is a saving in LUTs due to the diminishing state assignment (Fig. 4.17).
Using methods [1, 3, 4], we can find the following partition ΠY = {Y^1, Y^2, Y^3} where Y^1 = {y1, y5, y6}, Y^2 = {y2, y7, y9} and Y^3 = {y3, y4, y8}. Using (2.30), we can find R1 = R2 = R3 = 2. It gives RQ = 6, Z = {z1, ..., z6} with Z^1 = {z1, z2}, Z^2 = {z3, z4} and Z^3 = {z5, z6}. The codes K(yn) have no influence on the number of LUTs in LUTerD. Due to it, let us encode the microoperations in the trivial way (Table 4.14).
Fig. 4.16 Marked GSA Γ8
Fig. 4.17 State codes for Moore FSM PDTC(Γ8)
Table 4.12 Table ST1 for Moore FSM PDTC(Γ8)
Bi  C(Bi)  as  K(as)  Xh^1    Φh^1               h
B1  01     a2  1000   1       D_1^1              1
B2  10     a3  1001   x1      D_1^1 D_4^1        2
B2  10     a4  1010   x̄1 x2   D_1^1 D_3^1        3
B2  10     a5  1011   x̄1 x̄2   D_1^1 D_3^1 D_4^1  4
B3  11     a6  1100   x2      D_1^1 D_2^1        5
B3  11     a7  1101   x̄2 x3   D_1^1 D_2^1 D_4^1  6
B3  11     a8  1110   x̄2 x̄3   D_1^1 D_2^1 D_3^1  7

Table 4.13 Table ST2 for Moore FSM PDTC(Γ8)
Bi  C(Bi)  as   K(as)  Xh^2    Φh^2         h
B4  01     a9   0001   x4 x5   D_4^2        1
B4  01     a10  0010   x4 x̄5   D_3^2        2
B4  01     a11  0100   x̄4      D_2^2        3
B5  10     a9   0001   x6      D_4^2        4
B5  10     a1   0000   x̄6      –            5
B6  11     a12  0101   1       D_2^2 D_4^2  6

Table 4.14 Codes K(yn) for Moore FSM PDTC(Γ8)
Y^1   K(yn) z1z2   Y^2   K(yn) z3z4   Y^3   K(yn) z5z6
∉Y^1  00           ∉Y^2  00           ∉Y^3  00
y1    01           y2    01           y3    01
y5    10           y7    10           y4    10
y6    11           y9    11           y8    11
To construct the table of LUTerZT, it is necessary to find the codes of CMOs C(Yq). The CMOs Yq ⊆ Y are extracted from the operational vertices of GSA Γ8. There are Q = 7 CMOs. Their content is shown in Table 4.15 (column yn ∈ Yq) together with the codes C(Yq). These codes are used to construct the table of LUTerZT. There are M rows in the table of LUTerZT. Each row corresponds to a single state am ∈ A. If am ∈ Bi, then the mth row contains the variables τr ∈ T equal to 1 in the code C(Bi). If a CMO Yq ⊆ Y is generated in the state am ∈ A, then the mth row contains the variables zr ∈ Z equal to 1 in the code C(Yq). Using these rules, we construct Table 4.16. There are M = 12 rows in this table. There are the following columns in the table of LUTerZT: am, K(am), Bi, Yq, C(Bi), C(Yq), m. The meaning of each column is clear from Table 4.16.
Table 4.15 Codes C(Yq) for Moore FSM PDTC(Γ8)
Yq  yn ∈ Yq   C(Yq) z1 z2 z3 z4 z5 z6
Y1  –         000000
Y2  y1 y2 y3  010101
Y3  y1 y4 y7  011010
Y4  y2 y8     000111
Y5  y5 y9     101100
Y6  y2 y4 y6  110110
Y7  y6 y8 y9  111111

Table 4.16 Table of LUTerZT for Moore FSM PDTC(Γ8)
am   K(am)  Bi  Yq  C(Bi) τ1τ2τ3τ4  C(Yq) z1z2 z3z4 z5z6  m
a1   0000   B1  Y1  0100            00 00 00              1
a2   1000   B2  Y2  1000            01 01 01              2
a3   1001   B3  Y3  1100            01 10 10              3
a4   1010   B3  Y4  1100            00 01 11              4
a5   1011   B3  Y5  1100            10 11 00              5
a6   1100   B3  Y6  1100            11 01 10              6
a7   1101   B4  Y7  0001            11 11 11              7
a8   1110   B4  Y2  0001            01 01 01              8
a9   0001   B5  Y3  0010            01 10 10              9
a10  0010   B5  Y4  0010            00 01 11              10
a11  0100   B6  Y5  0011            10 11 00              11
a12  0101   B5  Y6  0010            11 01 10              12
We take the state codes from Fig. 4.17, class codes from Tables 4.12 and 4.13, and codes C(Yq) from Table 4.15. This table corresponds to RA + RQ truth tables. Each of them is constructed in the trivial way. Each LUT of LUTerD implements exactly a single function yn ∈ Y. To construct the corresponding truth tables, we should form the system (2.4). It is derived from Table 4.14 in the trivial way:
\[
\begin{aligned}
y_1 &= \bar{z}_1 z_2; & y_2 &= \bar{z}_3 z_4; & y_3 &= \bar{z}_5 z_6;\\
y_4 &= z_5\bar{z}_6; & y_5 &= z_1\bar{z}_2; & y_6 &= z_1 z_2;\\
y_7 &= z_3\bar{z}_4; & y_8 &= z_5 z_6; & y_9 &= z_3 z_4.
\end{aligned}
\tag{4.39}
\]
Each equation from (4.39) determines a truth table for particular LUT. Having all truth tables, we can implement the circuit of FSM.
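The relation between Table 4.14, the codes C(Yq) and system (4.39) can be illustrated with a small Python sketch. It assumes the field codes of Table 4.14 and takes Y3 = {y1, y4, y7}, which is the content consistent with the code 011010 used for Y3. The names `FIELDS`, `encode` and `decode` are illustrative. Decoding a field by matching its two bits is the same operation that each LUT of LUTerD performs according to (4.39).

```python
# Table 4.14: one dictionary per field Y^1, Y^2, Y^3, mapping the index n of
# a microoperation y_n to its 2-bit code; "00" means "no MO from this field".
FIELDS = [
    {1: "01", 5: "10", 6: "11"},   # z1 z2
    {2: "01", 7: "10", 9: "11"},   # z3 z4
    {3: "01", 4: "10", 8: "11"},   # z5 z6
]

# CMOs of GSA Gamma_8 (assumption: Y3 = {y1, y4, y7}, matching code 011010).
CMOS = {1: set(), 2: {1, 2, 3}, 3: {1, 4, 7}, 4: {2, 8},
        5: {5, 9}, 6: {2, 4, 6}, 7: {6, 8, 9}}


def encode(cmo):
    """Concatenate the field codes of the microoperations in one CMO."""
    code = ""
    for field in FIELDS:
        members = [n for n in field if n in cmo]
        if len(members) > 1:
            raise ValueError("microoperations of one field are incompatible")
        code += field[members[0]] if members else "00"
    return code


def decode(code):
    """Recover the microoperations from z1..z6, field by field."""
    mos = set()
    for k, field in enumerate(FIELDS):
        bits = code[2 * k: 2 * k + 2]
        mos |= {n for n, c in field.items() if c == bits}
    return mos


codes = {q: encode(cmo) for q, cmo in CMOS.items()}
for q, code in codes.items():
    print(f"C(Y{q}) = {code}")
```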
References
1. Altera. http://www.altera.com. Accessed Jan 2019
2. Amann R, Baitinger U (1989) Optimal state chains and states codes in finite state machines. IEEE Trans Comput-Aided Des 8(2):153–170
3. Atmel. http://www.atmel.com. Accessed Jan 2019
4. Bacchetta P, Daldos L, Sciuto D, Silvano C (2000) Low-power state assignment techniques for finite state machines. In: Proceedings of the 2000 IEEE international symposium on circuits and systems (ISCAS'2000), vol 2, Geneva, 2000. IEEE, pp 641–644
5. Baranov S (1994) Logic synthesis of control automata. Kluwer Academic Publishers, Dordrecht
6. Baranov S (2008) Logic and system design of digital systems. TUT Press, Tallinn
7. Barkalov A, Titarenko L, Barkalov A Jr (2012) Structural decomposition as a tool for the optimization of an FPGA-based implementation of a Mealy FSM. Cybern Syst Anal 48(2):313–322
8. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on CPLD. In: Proceedings of the sixth international conference CAD DD'07, vol 2, Minsk, pp 39–45
9. Barkalov A, Titarenko L, Wiśniewski R (2006) Optimization of address circuit of compositional microprogram unit. In: Proceedings of the IEEE east-west design and test workshop (EWDTW'06), Sochi, Kharkov, 2006. Kharkov National University of Radioelectronics, pp 167–170
10. Barkalov A, Titarenko L, Wiśniewski R (2006) Synthesis of compositional microprogram control units with sharing codes and address decoder. In: Proceedings of the international conference mixed design of integrated circuits and systems – MIXDES 2006, Łódź, pp 397–400
11. Benini L, De Micheli G (1995) State assignment for low power dissipation. IEEE J Solid-State Circuits 30(3):258–268
12. Chen C, Zhao J, Ahmadi M (2003) A semi-Gray encoding algorithm for low-power state assignment. In: Proceedings of the 2003 international symposium on circuits and systems, 2003. ISCAS'03, vol 5. IEEE, pp 389–392
13. El-Maleh A, Sait S, Khan F (2006) Finite state machine state assignment for area and power minimization. In: 2006 IEEE international symposium on circuits and systems, 2006. ISCAS 2006. Proceedings. IEEE, pp 5303–5306
14. Grout I (2008) Digital systems design with FPGAs and CPLDs. Elsevier Science, Oxford
15. Nöth W, Kolla R (1999) Spanning tree based state encoding for low power dissipation. In: Proceedings of the conference on design, automation and test in Europe. Association for Computing Machinery, p 37
16. Park S, Cho S, Yang S, Ciesielski M (2004) A new state assignment technique for testing and low power. In: Proceedings of the 41st annual design automation conference. Association for Computing Machinery, pp 510–513
17. Roy K, Prasad S (1992) SYCLOP: synthesis of CMOS logic for low power applications. In: Proceedings, IEEE 1992 international conference on computer design: VLSI in computers and processors, 1992. ICCD'92. IEEE, pp 464–467
18. Senhadji-Navarro R, Garcia-Vargas I, Jiménez-Moreno G, Civit-Balcells A, Guerra-Gutierrez P (2004) ROM-based FSM implementation using input multiplexing in FPGA devices. Electron Lett 40(20):1249–1251
19. Sklyarov V (2000) Synthesis and implementation of RAM-based finite state machines in FPGAs. In: Proceedings of field-programmable logic and applications: the roadmap to reconfigurable computing, Villach, 2000. Springer, pp 718–728
20. Tatalov E (2011) Synthesis of compositional microprogram control units for programmable devices. Master's thesis, Donetsk National Technical University, Donetsk
21. Tsui C, Pedram M, Despain A (1994) Exact and approximate methods for calculating signal and transition probabilities in FSMs. In: 31st conference on design automation, 1994. IEEE, pp 18–23
22. Zeidman B (2002) Designing with FPGAs and CPLDs. CMP Books, London
Chapter 5
Combining Twofold State Assignment with Transformation of Object Codes
Abstract The chapter deals with optimization of FSM logic circuits by combining the twofold state assignment with transformation of object codes. In the beginning, the idea of the transformation is discussed. Next, it is shown how to combine the twofold state assignment with transformation of microoperations into states of Mealy FSM. This approach allows removing the direct dependence between logical conditions and input memory functions of Mealy FSM. Further, methods are discussed that combine the twofold state assignment with transformation of states into microoperations of Mealy FSM. It allows removing the direct dependence between logical conditions and output functions of Mealy FSM. The last part of the chapter is devoted to combining the twofold state assignment with transformation of microoperations into classes of pseudoequivalent states of Moore FSM.
5.1 Introduction into Transformation of Object Codes
The main goal of the transformation of object codes (TOC) is diminishing the number of functions depending simultaneously on logical conditions xe ∈ X and state variables Tr ∈ T [2, 3]. To achieve this goal, codes of some objects are transformed into codes of other objects [6, 8]. There are two main objects of FSMs [8]. They are states am ∈ A and microoperations yn ∈ Y. Also, there are additional objects such as classes of compatible microoperations, classes of pseudoequivalent states, collections of microoperations [2, 3]. If counters are used instead of registers, then the following objects appear: operational linear chains, elementary operational linear chains, classes of pseudoequivalent operational linear chains [1], linear chains of states (elementary, natural, extended) and their classes [7]. The principle of TOC is clear from Fig. 4.6. We discuss two main approaches when: (1) states are represented by MOs (or CMOs) and identifiers; (2) MOs (or CMOs) are represented by states and identifiers:
\[ A = A(Y, I); \tag{5.1} \]
\[ A = A(Z, I); \tag{5.2} \]
© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_5
Fig. 5.1 Structural diagram of PY Mealy FSM
Fig. 5.2 Structural diagram of PY Y Mealy FSM
Fig. 5.3 Structural diagram of PA Mealy FSM
\[ Y = Y(A, I); \tag{5.3} \]
\[ Z = Z(A, I). \tag{5.4} \]
In (5.1)–(5.4), the symbol I stands for the set of identifiers. As always, variables zr ∈ Z encode the collections of microoperations (or FCMOs). In the case of Mealy FSM, the dependence (5.1) determines the model of PY FSM (Fig. 5.1), the dependence (5.2) the PY Y FSM (Fig. 5.2), the dependence (5.3) the PA FSM (Fig. 5.3) and the dependence (5.4) the PA Y FSM (Fig. 5.4).
Fig. 5.4 Structural diagram of PA Y Mealy FSM
There is a direct relation between the title of any block and the systems of functions generated by it. Let us define these functions. The BlockYI generates functions (1.2) and
\[ I = I(A, X). \tag{5.5} \]
The BlockΦ could implement either system (1.1) or the following systems:
\[ \Phi = \Phi(Y, I); \tag{5.6} \]
\[ \Phi = \Phi(Z, I). \tag{5.7} \]
The BlockZI implements the systems (2.24) and (5.5). The BlockΦI implements the systems (1.1) and (5.5). The BlockY implements either the system (2.4) or the following system:
\[ Y = Y(T, I). \tag{5.8} \]
The BlockZ generates the system:
\[ Z = Z(T, I). \tag{5.9} \]
To optimize hardware consumption, it is better to encode the identifiers Im ∈ I by binary codes K(Im) using RI bits. The value of RI is determined by (4.8). Now it is possible to replace the set I by the set V, where |V| = RI. To get structural diagrams of Mealy FSMs with encoding of identifiers, it is necessary to replace the letter I by V. The same should be done for formulae (5.5)–(5.9). It gives the following systems of Boolean functions:
\[ V = V(T, X); \tag{5.10} \]
\[ \Phi = \Phi(Y, V); \tag{5.11} \]
\[ \Phi = \Phi(Z, V); \tag{5.12} \]
\[ Y = Y(T, V); \tag{5.13} \]
\[ Z = Z(T, V). \tag{5.14} \]
All these functions are irregular. So, it is preferable to use LUTs for implementing them. If possible, EMBs could be used for implementing the regular functions Y = Y(Z). Because of dependence (1.3), states of Moore FSM are always transformed into microoperations (or codes of CMOs). So, it is necessary to discuss only structural diagrams based on the transformation of microoperations (or collections of microoperations) into codes of states (or codes of classes of PES). It gives the structural diagrams shown in Figs. 5.5, 5.6, 5.7 and 5.8. There is an interesting peculiarity of these Moore FSMs: the register keeps either microoperations yn ∈ Y or codes of CMOs Yq ⊆ Y. There are N flip-flops in the first case and RQ in the second. If state codes K(am) are transformed into class codes K(Bi), then functions Φ are represented as (1.1) and functions vr ∈ V as
\[ V = V(T, X). \tag{5.15} \]
Fig. 5.5 Structural diagram of PY Moore FSM
Fig. 5.6 Structural diagram of PY Y Moore FSM
Fig. 5.7 Structural diagram of PY C Moore FSM
Fig. 5.8 Structural diagram of PY C Y Moore FSM
Let us call the objects which are transformed primary objects (PO). Let us call the objects depending on POs secondary objects (SO). To design FSMs with TOC, it is necessary to express SOs as functions depending on POs and identifiers Im ∈ I. If functions depend on Tr ∈ T and xe ∈ X, then they are implemented by a LUTer. We propose to use the twofold state assignment to diminish hardware in this LUTer. Let us discuss these methods. We discuss only LUT-based FSM circuits. Let us use LUTs with S = 5.
5.2 Synthesis of Mealy FSMs with Transformation of Microoperations
This approach makes sense if the following condition holds:
\[ N + R_I \leq S. \tag{5.16} \]
Here RI is the number of functions vr ∈ V determined by (4.8).
Fig. 5.9 Structure diagram of PY T Mealy FSM
To get the structural diagram of LUT-based PY T Mealy FSM, it is enough to replace the BlockYI (Fig. 5.1) by LUTer1, ..., LUTerK and BlockΦ by LUTerT (Fig. 5.9). The register RG is distributed among the flip-flops of LUTerT. There are RA flip-flops in RG. In PY T FSM, the block LUTerk generates functions (3.11) and (4.11). The block LUTerYV implements functions yn ∈ Y as (3.13) and vr ∈ V as (4.12). The block LUTerT implements functions
\[ T = T(Y, V). \tag{5.17} \]
Two interesting conclusions could be drawn from the analysis of Fig. 5.9. Firstly, the joint application of two approaches of structural decomposition does not increase the number of logic levels. There are only three levels of logic in PY T Mealy FSM. It is the same as for PT Mealy FSM. Secondly, there is no need to generate the variables Tr ∈ T. It is possible to transform the microoperations into codes C(am) where am ∈ Ak and Ak ∈ ΠB. We propose the synthesis method for PY T Mealy FSMs. There are the following steps in this method:
1. Finding the set A for a given GSA Γ.
2. Constructing the partition ΠB = {A1, ..., AK}.
3. Constructing the set I.
4. Representing states as functions of yn ∈ Y and Im ∈ I.
5. Encoding the states am ∈ Ak.
6. Encoding the identifiers Im ∈ I.
7. Constructing tables ST1–STK.
8. Constructing systems (3.11) and (4.11).
9. Constructing table of LUTerYV.
10. Constructing table of LUTerT.
11. Implementing FSM circuit with particular LUTs.
Fig. 5.10 Marked GSA Γ9
Let us discuss an example of synthesis for Mealy FSM PY T(Γ9). The marked GSA Γ9 is shown in Fig. 5.10. There are the following sets extracted from GSA Γ9: X = {x1, ..., x5}, Y = {y1, ..., y4}, A = {a1, ..., a9}. Using (1.17), we can find that R0 = 4. It gives Φ = {D1, ..., D4}, T = {T1, ..., T4}. There are the following CMOs extracted from operational vertices of GSA Γ9: Y1 = ∅, Y2 = {y1, y2}, Y3 = {y1}, Y4 = {y2}, Y5 = {y3}, Y6 = {y4}, Y7 = {y1, y4}, Y8 = {y2, y4}, Y9 = {y1, y3}, Y10 = {y1, y2, y3}, Y11 = {y1, y2, y4}. Using methods from previous chapters, it is possible to find the partition ΠB = {A1, A2} where A1 = {a1, a2, a3} and A2 = {a4, ..., a9}. There are the sets X^1 = {x1, x2, x3} and X^2 = {x4, x5}. Using (3.6) and (3.7), we can find that R1 = 2, R2 = 3 and RA = 5. Let us point out that RA − R0 = 1. It means that the number of state variables Tr ∈ T and τr ∈ T is practically the same for FSMs P(Γ9) and PT(Γ9).
Table 5.1 Dependence of states on CMOs for FSM P(Γ9)
Yq   am  I     Yq   am  I     Yq   am  I
Y1   a1  −     Y2   a2  I1    Y3   a4  I1
Y4   a4  −     Y2   a6  I2    Y3   a5  I2
Y5   a3  I1    Y6   a3  I1    Y7   a1  −
Y5   a5  I2    Y6   a7  I2    Y8   a8  −
Y9   a9  −     Y10  a1  −     Y11  a1  −
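Let us construct a table showing the dependence between CMOs and states. It is Table 5.1 in the discussed case. The identifiers Im ∈ I are written in the column I of Table 5.1. So, there is MI = 2. It gives RI = 1 and V = {v1}. Using Table 5.1, it is possible to represent states am ∈ A by pairs ⟨Yq, Im⟩. There are the following dependences in the discussed case:
\[
\begin{aligned}
a_1 &\Rightarrow \langle Y_1, \emptyset\rangle \vee \langle Y_7, \emptyset\rangle \vee \langle Y_{10}, \emptyset\rangle \vee \langle Y_{11}, \emptyset\rangle; & a_2 &\Rightarrow \langle Y_2, I_1\rangle;\\
a_3 &\Rightarrow \langle Y_5, I_1\rangle \vee \langle Y_6, I_1\rangle; & a_4 &\Rightarrow \langle Y_4, \emptyset\rangle \vee \langle Y_3, I_1\rangle;\\
a_5 &\Rightarrow \langle Y_3, I_2\rangle \vee \langle Y_5, I_2\rangle; & a_6 &\Rightarrow \langle Y_2, I_2\rangle;\\
a_7 &\Rightarrow \langle Y_6, I_2\rangle; & a_8 &\Rightarrow \langle Y_8, \emptyset\rangle;\\
a_9 &\Rightarrow \langle Y_9, \emptyset\rangle.
\end{aligned}
\tag{5.18}
\]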
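The pairs of (5.18) and the value MI can be derived from Table 5.1 with a few lines of Python. In the sketch below, each cell of the table is a triple (q, m, identifier). Two assumptions are made: MI is the largest number of states sharing one CMO, and RI is the ceiling of log2 MI, which is how (4.8) is understood here. All names are illustrative.

```python
from collections import defaultdict
from math import ceil, log2

# Cells of Table 5.1 as (q, m, identifier): CMO Y_q, state a_m,
# identifier index, or None when no identifier is needed.
TABLE_5_1 = [
    (1, 1, None), (4, 4, None), (5, 3, 1), (5, 5, 2), (9, 9, None),
    (2, 2, 1), (2, 6, 2), (6, 3, 1), (6, 7, 2), (10, 1, None),
    (3, 4, 1), (3, 5, 2), (7, 1, None), (8, 8, None), (11, 1, None),
]

pairs_of_state = defaultdict(list)  # a_m -> list of pairs (q, identifier)
states_of_cmo = defaultdict(set)    # Y_q -> states reached with this CMO
for q, m, ident in TABLE_5_1:
    pairs_of_state[m].append((q, ident))
    states_of_cmo[q].add(m)

# A CMO shared by several states needs that many distinct identifiers.
m_i = max(len(states) for states in states_of_cmo.values())
r_i = ceil(log2(m_i)) if m_i > 1 else 0  # assumed reading of (4.8)

# Each pair (Y_q, I_m) must determine exactly one state.
unambiguous = all(
    len({i for qq, _, i in TABLE_5_1 if qq == q}) == len(states)
    for q, states in states_of_cmo.items()
)

n_pairs = sum(len(p) for p in pairs_of_state.values())
print(m_i, r_i, unambiguous, n_pairs)
```

The total number of pairs found this way is the number of rows that the table of LUTerT must contain.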
Let us explain the system (5.18). For example, the state a1 is written in four cells of Table 5.1. So, it is determined by four pairs ⟨Yq, Im⟩. In all these cases, there are no identifiers determining a1 ∈ A. We denote this situation by the symbol ∅ replacing Im. Next, the state a2 is written in a single cell of Table 5.1. So, it corresponds to the single pair ⟨Y2, I1⟩. If a state depends on more than a single pair, we connect these pairs by the sign of disjunction.
Let us encode the states am ∈ Ak in the trivial way. It gives the following codes: C(a1) = 01, C(a2) = 10, C(a3) = 11, C(a4) = 001, C(a5) = 010, C(a6) = 011, C(a7) = 100, C(a8) = 101 and C(a9) = 110. The codes K(Im) have no influence on the hardware amount in LUTer1–LUTer2. So, let us encode the identifiers in the trivial way: K(I1) = 0, K(I2) = 1. Now we have all information necessary to construct the tables of LUTerk. They are represented by Tables 5.2 and 5.3. In these tables, states as ∈ A are replaced by the pairs from (5.18). If a CMO Yq is generated during the transition into a state as, then this state is replaced by the CMO Yq and an identifier Im ∈ I. Now we can construct systems (3.11) and (4.11). They are represented by systems (5.19), (5.20) and (5.21), (5.22), respectively.
\[
\begin{aligned}
y_1^1 &= \bar{\tau}_1\tau_2 \vee \tau_1\bar{\tau}_2\bar{x}_1 x_3 \vee \tau_1\tau_2\bar{x}_1; & y_2^1 &= \bar{\tau}_1\tau_2 \vee \tau_1\bar{\tau}_2\bar{x}_1\bar{x}_3;\\
y_3^1 &= \tau_1\bar{\tau}_2 x_1 x_2 \vee \tau_1\tau_2 x_1; & y_4^1 &= \tau_1\bar{\tau}_2 x_1\bar{x}_2.
\end{aligned}
\tag{5.19}
\]
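Table 5.2 Table ST1 of Mealy FSM PY T(Γ9)
am  C(am)  Xh^1    Yh^1         Ih^1   Vh^1   h
a1  01     1       y_1^1 y_2^1  −      −      1
a2  10     x1 x2   y_3^1        I_1^1  −      2
a2  10     x1 x̄2   y_4^1        I_1^1  −      3
a2  10     x̄1 x3   y_1^1        I_1^1  −      4
a2  10     x̄1 x̄3   y_2^1        −      −      5
a3  11     x1      y_3^1        I_2^1  v_1^1  6
a3  11     x̄1      y_1^1        I_2^1  v_1^1  7

Table 5.3 Table ST2 of Mealy FSM PY T(Γ9)
am  C(am)  Xh^2  Yh^2               Ih^2   Vh^2   h
a4  001    x4    y_1^2 y_2^2        I_2^2  v_1^2  1
a4  001    x̄4    y_4^2              I_2^2  v_1^2  2
a5  010    x4    y_1^2 y_4^2        I_1^2  −      3
a5  010    x̄4    −                  −      −      4
a6  011    x5    y_1^2 y_2^2        I_2^2  v_1^2  5
a6  011    x̄5    y_1^2 y_2^2 y_3^2  −      −      6
a7  100    1     y_2^2 y_4^2        −      −      7
a8  101    1     y_1^2 y_3^2        −      −      8
a9  110    1     y_1^2 y_2^2 y_4^2  −      −      9

\[
\begin{aligned}
y_1^2 &= \bar{\tau}_3\bar{\tau}_4\tau_5 x_4 \vee \bar{\tau}_3\tau_4\bar{\tau}_5 x_4 \vee \bar{\tau}_3\tau_4\tau_5 \vee \tau_3\bar{\tau}_4\tau_5 \vee \tau_3\tau_4\bar{\tau}_5;\\
y_2^2 &= \bar{\tau}_3\bar{\tau}_4\tau_5 x_4 \vee \bar{\tau}_3\tau_4\tau_5 \vee \tau_3\bar{\tau}_5;\\
y_3^2 &= \bar{\tau}_3\tau_4\tau_5\bar{x}_5 \vee \tau_3\bar{\tau}_4\tau_5;\\
y_4^2 &= \bar{\tau}_3\bar{\tau}_4\tau_5\bar{x}_4 \vee \bar{\tau}_3\tau_4\bar{\tau}_5 x_4 \vee \tau_3\bar{\tau}_5.
\end{aligned}
\tag{5.20}
\]
\[ v_1^1 = \tau_1\tau_2. \tag{5.21} \]
\[ v_1^2 = \bar{\tau}_3\bar{\tau}_4\tau_5 \vee \tau_4\tau_5 x_5. \tag{5.22} \]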
Systems (5.19)–(5.22) determine the truth tables for LUTs of LUTer1 and LUTer2. The table of LUTerYV consists of N + RI truth tables for the functions vr ∈ V and yn ∈ Y. They are constructed using systems (3.13) and (4.12). They are represented by systems (5.23) and (5.24), respectively.

$$
y_1 = y_1^1 \vee y_1^2;\quad y_2 = y_2^1 \vee y_2^2;\quad y_3 = y_3^1 \vee y_3^2;\quad y_4 = y_4^1 \vee y_4^2. \tag{5.23}
$$

$$
v_1 = v_1^1 \vee v_1^2. \tag{5.24}
$$
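The way a table STk turns into a system of SOPs can be sketched in a few lines of Python. This is only an illustration: the row encoding, the names `ST1`, `code_to_literals` and `derive_sops`, and the textual notation (`~x1` for a complemented variable, `t1`, `t2` for τ1, τ2) are assumptions made here, not part of the original method. Each row contributes one product term (state code conjoined with the input term) to every function written in that row.

```python
from collections import defaultdict

# Rows of Table 5.2: (code C(am) on t1 t2, input term X_h, functions equal to 1).
# "~x1" denotes the complement of x1; an empty string means X_h = 1.
ST1 = [
    ("01", "", ["y1", "y2"]),
    ("10", "x1 x2", ["y3"]),
    ("10", "x1 ~x2", ["y4"]),
    ("10", "~x1 x3", ["y1"]),
    ("10", "~x1 ~x3", ["y2"]),
    ("11", "x1", ["y3", "v1"]),
    ("11", "~x1", ["y1", "v1"]),
]


def code_to_literals(code, names=("t1", "t2")):
    """Turn a state code such as '10' into the conjunction 't1 ~t2'."""
    return " ".join(n if bit == "1" else "~" + n for n, bit in zip(names, code))


def derive_sops(rows):
    """Collect, for every output function, the product terms of its rows."""
    sops = defaultdict(list)
    for code, inputs, outputs in rows:
        term = (code_to_literals(code) + " " + inputs).strip()
        for out in outputs:
            sops[out].append(term)
    return {out: " | ".join(terms) for out, terms in sops.items()}


SOPS = derive_sops(ST1)
for name in sorted(SOPS):
    print(name, "=", SOPS[name])
```

The terms obtained for y1–y4 coincide with (5.19). No minimization is performed, so v1 comes out as two terms (`t1 t2 x1 | t1 t2 ~x1`); merging them gives the single term τ1τ2 of (5.21).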
Table 5.4 Table of LUTerT for Mealy FSM PY T(Γ9)

| $a_m$ | $Y$ ($y_1 y_2 y_3 y_4$) | $V$ ($v_1$) | $C(a_m)$ ($\tau_1\tau_2\tau_3\tau_4\tau_5$) | $h$ |
|---|---|---|---|---|
| a1 | 0000 | 0 | 01000 | 1 |
| a1 | 1001 | 0 | 01000 | 2 |
| a1 | 1110 | 0 | 01000 | 3 |
| a1 | 1101 | 0 | 01000 | 4 |
| a2 | 1100 | 0 | 10000 | 5 |
| a3 | 0010 | 0 | 11000 | 6 |
| a3 | 0001 | 0 | 11000 | 7 |
| a4 | 0100 | 0 | 00001 | 8 |
| a4 | 1000 | 0 | 00001 | 9 |
| a5 | 1000 | 1 | 00010 | 10 |
| a5 | 0010 | 1 | 00010 | 11 |
| a6 | 1100 | 1 | 00011 | 12 |
| a7 | 0001 | 1 | 00100 | 13 |
| a8 | 0101 | 0 | 00101 | 14 |
| a9 | 1010 | 0 | 00110 | 15 |
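There are the following columns in the table of LUTerT: am, Y, V, C(am), h. The number of rows in this table is equal to the number of pairs ⟨Yq, Im⟩ representing the states am ∈ A. In the discussed case, there are 15 rows in Table 5.4. This number coincides with the number of terms in system (5.18). The table of LUTerT is used to construct the system (5.17). It is the system (5.25) in the discussed case. Next, each equation of (5.17) is transformed into a truth table for a particular LUT.

$$
\begin{aligned}
\tau_1 &= y_1y_2\bar{y}_3\bar{y}_4\bar{v}_1 \vee \bar{y}_1\bar{y}_2y_3\bar{y}_4\bar{v}_1 \vee \bar{y}_1\bar{y}_2\bar{y}_3y_4\bar{v}_1;\\
\tau_2 &= \bar{y}_1\bar{y}_2\bar{y}_3\bar{y}_4\bar{v}_1 \vee y_1\bar{y}_3y_4\bar{v}_1 \vee y_1y_2y_3\bar{y}_4\bar{v}_1 \vee \bar{y}_1\bar{y}_2y_3\bar{y}_4\bar{v}_1 \vee \bar{y}_1\bar{y}_2\bar{y}_3y_4\bar{v}_1;\\
\tau_3 &= \bar{y}_1\bar{y}_2\bar{y}_3y_4v_1 \vee \bar{y}_1y_2\bar{y}_3y_4\bar{v}_1 \vee y_1\bar{y}_2y_3\bar{y}_4\bar{v}_1;\\
\tau_4 &= y_1\bar{y}_3\bar{y}_4v_1 \vee \bar{y}_1\bar{y}_2y_3\bar{y}_4v_1 \vee y_1\bar{y}_2y_3\bar{y}_4\bar{v}_1;\\
\tau_5 &= \bar{y}_1y_2\bar{y}_3\bar{y}_4\bar{v}_1 \vee y_1\bar{y}_2\bar{y}_3\bar{y}_4\bar{v}_1 \vee y_1y_2\bar{y}_3\bar{y}_4v_1 \vee \bar{y}_1y_2\bar{y}_3y_4\bar{v}_1.
\end{aligned}
\tag{5.25}
$$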
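The step from an SOP to the truth table of a LUT (that is, to a perfect SOP) can be illustrated as follows. The sketch is a minimal one under stated assumptions: a product term is represented as a dictionary from variable names to required values, absent variables are treated as free, and the names `sop_to_truth_table`, `TAU1` and `SHORT_TERM` are introduced here for illustration only.

```python
from itertools import product

VARS = ["y1", "y2", "y3", "y4", "v1"]  # LUT inputs, y1 is the most significant bit


def sop_to_truth_table(variables, cubes):
    """Expand an SOP into a truth table with 2**len(variables) entries.

    Each cube maps a variable to its required value (1 = direct, 0 = complemented);
    variables absent from a cube are free, so the cube covers several minterms.
    """
    table = []
    for bits in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        covered = any(all(assignment[v] == b for v, b in cube.items()) for cube in cubes)
        table.append(int(covered))
    return table


# tau_1 from (5.25): every term already contains all five variables.
TAU1 = [
    {"y1": 1, "y2": 1, "y3": 0, "y4": 0, "v1": 0},
    {"y1": 0, "y2": 0, "y3": 1, "y4": 0, "v1": 0},
    {"y1": 0, "y2": 0, "y3": 0, "y4": 1, "v1": 0},
]
# A term without y2, such as y1 ~y3 y4 ~v1, expands into two minterms.
SHORT_TERM = [{"y1": 1, "y3": 0, "y4": 1, "v1": 0}]

tau1_minterms = [i for i, bit in enumerate(sop_to_truth_table(VARS, TAU1)) if bit]
short_minterms = [i for i, bit in enumerate(sop_to_truth_table(VARS, SHORT_TERM)) if bit]
print(tau1_minterms, short_minterms)
```

The 32-entry list returned for each function is exactly the content of a 5-input LUT; the indices printed are the minterm numbers of the perfect SOP.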
To construct the table of a LUT, it is necessary to transform SOPs (5.25) into perfect SOPs. We have shown how to do it in the previous chapters. For example, the function τ4 depends on the variables y1, y2, y3, v1. It could be represented by the Karnaugh map (Fig. 5.11). The transformation from the Karnaugh map into a truth table is executed in the trivial way.

Fig. 5.11 Karnaugh map for function τ4

| $y_3 v_1$ \ $y_1 y_2$ | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 00 | 0 | 0 | 0 | 0 |
| 01 | 0 | 0 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 |
| 10 | 0 | 0 | 1 | 1 |

There are 10 LUTs in LUTer1 and LUTer2 of PY T(Γ9). There are 5 LUTs in LUTerYV. Because the condition (5.16) is true, each function (5.25) is represented by a single LUT. So, there are 5 LUTs in the circuit of LUTerT. In total, there are NΦ + NYV + Nτ = 10 + 5 + 5 = 20 LUTs with S = 5 in the circuit of FSM PY T(Γ9).

Now let us discuss how to design a LUT-based PY YT Mealy FSM. Its structural diagram is shown in Fig. 5.12. In the LUT-based PY YT Mealy FSM, the blocks LUTer1–LUTerK implement systems (4.10), (4.11). The systems (3.37) and (4.12) are implemented by LUTerZV. The LUTerY implements the system (2.4). The system (4.13) is implemented by LUTerT.

Fig. 5.12 Structural diagram of PY YT Mealy FSM

This approach makes sense if the following condition holds:

$$
R_Q + R_I \le S. \tag{5.26}
$$
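Condition (5.26) is easy to check mechanically. The sketch below assumes that the code widths are the usual minimal ones, R = ⌈log2(count)⌉ with at least one bit; the function names `code_width` and `pyyt_fits` are hypothetical. The figures used in the example (11 CMOs, 2 identifiers, S = 5) are those of the GSA Γ9 example.

```python
def code_width(count):
    """Minimal number of bits that gives `count` objects distinct binary codes.

    Assumption: width = ceil(log2(count)), computed with integers, and at least 1.
    """
    return max(1, (count - 1).bit_length())


def pyyt_fits(num_cmos, num_identifiers, lut_inputs):
    """Check condition (5.26): R_Q + R_I <= S. Returns (holds, R_Q, R_I)."""
    r_q = code_width(num_cmos)
    r_i = code_width(num_identifiers)
    return r_q + r_i <= lut_inputs, r_q, r_i


print(pyyt_fits(num_cmos=11, num_identifiers=2, lut_inputs=5))
```

With these inputs every function of LUTerT depends on at most five variables, so it fits into a single LUT with S = 5.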
We propose the synthesis method for PY YT Mealy FSM. It includes the following steps:

1. Finding the set of states A.
2. Constructing the partition B = {A1, ..., AK}.
3. Constructing the set of identifiers I.
4. Constructing the collections of MOs Yq ⊆ Y.
5. Representing states of FSM as functions of CMOs and identifiers.
6. Encoding of the states am ∈ Ak, identifiers Im ∈ I and CMOs Yq ⊆ Y.
7. Constructing tables STk (k = 1, ..., K).
8. Constructing table of LUTerZV.
9. Constructing table of LUTerT.
10. Constructing table of LUTerY.
11. Implementing FSM circuit with particular LUTs.

Fig. 5.13 Codes of CMOs for Mealy FSM PY YT(Γ9)

| $z_3 z_4$ \ $z_1 z_2$ | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 00 | Y1 | Y4 | Y2 | Y3 |
| 01 | Y5 | * | Y10 | Y9 |
| 11 | * | * | * | * |
| 10 | Y6 | Y8 | Y7 | Y11 |
Let us discuss an example of synthesis for Mealy FSM PY YT(Γ9). The first five steps have already been executed during the previous example. The system (5.18) is the outcome of step 5. Let us use the same codes C(am) and K(Im) as for Mealy FSM PY T(Γ9). Let us encode CMOs Yq ⊆ Y in such a manner that the number of interconnections is minimal for LUTerY. Let us form the system (2.4). It is the following one:

$$
\begin{aligned}
y_1 &= Y_2 \vee Y_3 \vee Y_7 \vee Y_9 \vee Y_{10} \vee Y_{11}; & y_3 &= Y_5 \vee Y_9 \vee Y_{10};\\
y_2 &= Y_2 \vee Y_4 \vee Y_8 \vee Y_{10} \vee Y_{11}; & y_4 &= Y_6 \vee Y_7 \vee Y_8 \vee Y_{11}.
\end{aligned}
\tag{5.27}
$$

Let us use the algorithm [9] for encoding of the CMOs Yq ⊆ Y. The outcome of encoding is shown in Fig. 5.13. To encode CMOs, we use variables zr ∈ Z where |Z| = 4. Using the codes (Fig. 5.13), it is possible to transform the system (5.27) into the following system (corresponding to FSM from Fig. 5.14):

$$
\begin{aligned}
y_1 &= z_1; & y_2 &= \bar{z}_1z_2 \vee z_2\bar{z}_3 \vee z_1\bar{z}_2z_3;\\
y_3 &= z_4; & y_4 &= z_3.
\end{aligned}
\tag{5.28}
$$
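The transition from (5.27) to (5.28) can be checked by brute force over the assigned codes. In the sketch below the codes K(Yq) are read from the Karnaugh map of Fig. 5.13 with z1 z2 taken as the column variables and z3 z4 as the row variables; this reading of the map, and the names `CODES`, `SYSTEM_5_27`, `SYSTEM_5_28` and `systems_agree`, are assumptions of this illustration.

```python
# Codes K(Yq) = z1 z2 z3 z4 read from Fig. 5.13 (columns z1 z2, rows z3 z4).
CODES = {
    1: "0000", 2: "1100", 3: "1000", 4: "0100", 5: "0001", 6: "0010",
    7: "1110", 8: "0110", 9: "1001", 10: "1101", 11: "1010",
}

# System (5.27): indices q of the CMOs Yq that contain each microoperation.
SYSTEM_5_27 = {
    "y1": {2, 3, 7, 9, 10, 11},
    "y2": {2, 4, 8, 10, 11},
    "y3": {5, 9, 10},
    "y4": {6, 7, 8, 11},
}

# System (5.28): the same functions expressed through z1..z4.
SYSTEM_5_28 = {
    "y1": lambda z1, z2, z3, z4: z1,
    "y2": lambda z1, z2, z3, z4: (not z1 and z2) or (z2 and not z3) or (z1 and not z2 and z3),
    "y3": lambda z1, z2, z3, z4: z4,
    "y4": lambda z1, z2, z3, z4: z3,
}


def systems_agree():
    """True if (5.28) yields, for every assigned code, the value given by (5.27)."""
    for q, code in CODES.items():
        z = [bit == "1" for bit in code]
        for name, cmos in SYSTEM_5_27.items():
            if bool(SYSTEM_5_28[name](*z)) != (q in cmos):
                return False
    return True


print(systems_agree())
```

The five unused codes act as don't-care assignments, which is why y2 needs only three short terms.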
As follows from (5.28), there is only a single LUT in the LUTerY. It has three interconnections. The functions y1, y3 and y4 are generated by the block LUTerZV. Because RQ = 4 and RI = 1, the condition (5.26) holds. It means that it makes sense to use the model PY YT for GSA Γ9 and LUTs with S = 5.

The tables STk are constructed in the same way for PY T and PY YT FSMs, but in the latter case the column Yhk is replaced by Zhk. These tables are represented by Tables 5.5 and 5.6. Let us compare Tables 5.2 and 5.5. They differ only in the fourth column. In Table 5.2 this column includes microoperations yn ∈ Y, whereas it includes variables zr ∈ Z in Table 5.5. These variables are taken from Fig. 5.13: they are equal to 1 in the codes K(Yq) of the CMOs from the corresponding rows of Table 5.2. The same is true for Tables 5.3 and 5.6.

Fig. 5.14 Structural diagram of PY YT Mealy FSM with EMB

Table 5.5 Table ST1 of Mealy FSM PY YT(Γ9)

| $a_m$ | $C(a_m)$ | $X_h^1$ | $Z_h^1$ | $I_h^1$ | $V_h^1$ | $h$ |
|---|---|---|---|---|---|---|
| a1 | 01 | 1 | $z_1^1\, z_2^1$ | − | − | 1 |
| a2 | 10 | $x_1 x_2$ | $z_4^1$ | $I_1^1$ | − | 2 |
| a2 | 10 | $x_1 \bar{x}_2$ | $z_3^1$ | $I_1^1$ | − | 3 |
| a2 | 10 | $\bar{x}_1 x_3$ | $z_1^1$ | $I_1^1$ | − | 4 |
| a2 | 10 | $\bar{x}_1 \bar{x}_3$ | $z_2^1$ | − | − | 5 |
| a3 | 11 | $x_1$ | $z_4^1$ | $I_2^1$ | $v_1^1$ | 6 |
| a3 | 11 | $\bar{x}_1$ | $z_1^1$ | $I_2^1$ | $v_1^1$ | 7 |

Table 5.6 Table ST2 of Mealy FSM PY YT(Γ9)

| $a_m$ | $C(a_m)$ | $X_h^2$ | $Z_h^2$ | $I_h^2$ | $V_h^2$ | $h$ |
|---|---|---|---|---|---|---|
| a4 | 001 | $x_4$ | $z_1^2\, z_2^2$ | $I_2^2$ | $v_1^2$ | 1 |
| a4 | 001 | $\bar{x}_4$ | $z_3^2$ | $I_2^2$ | $v_1^2$ | 2 |
| a5 | 010 | $x_4$ | $z_1^2\, z_2^2\, z_3^2$ | $I_1^2$ | − | 3 |
| a5 | 010 | $\bar{x}_4$ | − | − | − | 4 |
| a6 | 011 | $x_5$ | $z_1^2\, z_2^2$ | $I_2^2$ | $v_1^2$ | 5 |
| a6 | 011 | $\bar{x}_5$ | $z_1^2\, z_2^2\, z_4^2$ | − | − | 6 |
| a7 | 100 | 1 | $z_2^2\, z_3^2$ | − | − | 7 |
| a8 | 101 | 1 | $z_1^2\, z_4^2$ | − | − | 8 |
| a9 | 110 | 1 | $z_1^2\, z_3^2$ | − | − | 9 |

Tables STk (k = 1, ..., K) are used to construct the systems (4.10) and (4.11). For example, the following equations could be derived from Table 5.5:

$$
z_1^1 = \bar{\tau}_1\tau_2 \vee \tau_1\bar{\tau}_2\bar{x}_1x_3 \vee \tau_1\tau_2\bar{x}_1;\qquad v_1^1 = \tau_1\tau_2. \tag{5.29}
$$
These equations are used to form truth tables for LUTs of LUTerk (k = 1, ..., K). We use systems (3.37) and (4.12) to construct truth tables for LUTs from LUTerZV. For example, the following equations could be derived:

$$
z_1 = z_1^1 \vee z_1^2;\quad z_2 = z_2^1 \vee z_2^2;\quad z_3 = z_3^1 \vee z_3^2;\quad z_4 = z_4^1 \vee z_4^2. \tag{5.30}
$$

$$
v_1 = v_1^1 \vee v_1^2. \tag{5.31}
$$
Equations (5.30) are used to form truth tables for LUTs generating functions zr ∈ Z. The equation (5.31) is the base for the truth table of the function v1 ∈ V. Let us point out that equations (5.31) and (5.24) are the same. The table of LUTerY is used to find functions (2.4). But we already have this system for the discussed case, so this step could be omitted. In the general case, there are the following columns in the table of LUTerY: Yq, K(Yq), Y, q. The table of LUTerT has the following columns: am, Yq, Z, V, C(am), h. It is practically the same as its counterpart for PY T Mealy FSM, but microoperations yn ∈ Yq are replaced by variables zr ∈ Z. It is Table 5.7 in the discussed case. The table of LUTerT makes it possible to find the functions

$$
T = T(Z, V). \tag{5.32}
$$
Next, each equation of this system is transformed into a truth table of the corresponding LUT. In the discussed case, there are NΦ = 10, NZV = 5, NY = 1 and Nτ = 5. So, there are 21 LUTs with S = 5 in the circuit of Mealy FSM PY YT(Γ9). So, there is practically the same amount of LUTs in FSMs PY T(Γ9) and PY YT(Γ9). It is connected with the fact that the following condition holds:

$$
R_Q = N. \tag{5.33}
$$

As a rule, the following condition is true:

$$
R_Q \ll N. \tag{5.34}
$$

If (5.33) is true, we can expect that PY T FSM has less hardware than the equivalent PY YT FSM. If (5.34) is true, then it is better to use the model of PY YT Mealy FSM.
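Table 5.7 Table of LUTerT for Mealy FSM PY YT(Γ9)

| $a_m$ | $Y_q$ | $Z$ ($z_1 z_2 z_3 z_4$) | $V$ ($v_1$) | $C(a_m)$ ($\tau_1\tau_2\tau_3\tau_4\tau_5$) | $h$ |
|---|---|---|---|---|---|
| a1 | Y1 | 0000 | 0 | 01000 | 1 |
| a1 | Y7 | 1110 | 0 | 01000 | 2 |
| a1 | Y10 | 1101 | 0 | 01000 | 3 |
| a1 | Y11 | 1010 | 0 | 01000 | 4 |
| a2 | Y2 | 1100 | 0 | 10000 | 5 |
| a3 | Y5 | 0001 | 0 | 11000 | 6 |
| a3 | Y6 | 0010 | 0 | 11000 | 7 |
| a4 | Y4 | 0100 | 0 | 00001 | 8 |
| a4 | Y3 | 1000 | 0 | 00001 | 9 |
| a5 | Y3 | 1000 | 1 | 00010 | 10 |
| a5 | Y5 | 0001 | 1 | 00010 | 11 |
| a6 | Y2 | 1100 | 1 | 00011 | 12 |
| a7 | Y6 | 0010 | 1 | 00100 | 13 |
| a8 | Y8 | 0110 | 0 | 00101 | 14 |
| a9 | Y9 | 1001 | 0 | 00110 | 15 |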
It is possible to replace some LUTs by EMB. If the following condition holds,

$$
2^{R_Q} \cdot N \le V_0, \tag{5.35}
$$

then LUTerY could be replaced by EMB. Let the following condition hold:

$$
2^{R_Q + R_I}(N + R_A) \le V_0. \tag{5.36}
$$

In this case, both LUTerY and LUTerT are replaced by EMB (Fig. 5.14). To design an FSM with EMB, it is necessary to execute steps 1–8 of the discussed method. The steps 9–10 are replaced by the step of constructing the table of EMB. It includes the following columns: Z, V (address of a cell), Y, T (content of a cell), h. This table is constructed in the trivial way. It is possible to replace LUTerT of PY T FSM by EMB. It could be done if the following condition holds:

$$
2^{R_Q + R_I} \cdot R_A \le V_0. \tag{5.37}
$$
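The three EMB conditions differ only in the number of address bits and in the word width, so they can be evaluated together. The following sketch is illustrative: the function name `emb_requirements` and the block capacity `V0 = 256` bits are hypothetical choices made here; the remaining parameters are those of the Γ9 example (RQ = 4, RI = 1, N = 4, RA = 5).

```python
def emb_requirements(r_q, r_i, n, r_a):
    """Memory volume (in bits) required by the left-hand sides of (5.35)-(5.37)."""
    return {
        "5.35": 2 ** r_q * n,                  # LUTerY -> EMB
        "5.36": 2 ** (r_q + r_i) * (n + r_a),  # LUTerY and LUTerT -> one EMB
        "5.37": 2 ** (r_q + r_i) * r_a,        # LUTerT of PY T FSM -> EMB
    }


def emb_feasible(requirements, v0):
    """Compare every requirement with the capacity V0 of a single EMB."""
    return {label: bits <= v0 for label, bits in requirements.items()}


REQ = emb_requirements(r_q=4, r_i=1, n=4, r_a=5)
V0 = 256  # hypothetical capacity, chosen only to show a mixed outcome
print(REQ)
print(emb_feasible(REQ, V0))
```

With this hypothetical V0, conditions (5.35) and (5.37) hold while (5.36) does not; any realistic EMB of several kilobits satisfies all three for this small example.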
Fig. 5.15 Structural diagram of P A T Mealy FSM
5.3 Synthesis of Mealy FSMs with Transformation of State Codes

It is possible to represent CMOs Yq ⊆ Y as functions of states am ∈ A and identifiers [1, 3]. If Qm different CMOs are generated during transitions into state am ∈ A, then Qm identifiers are necessary to distinguish these CMOs. So, the value of Im is determined as

$$
I_m = \max(Q_1, \ldots, Q_M). \tag{5.38}
$$

Now, microoperations yn ∈ Y could be determined by (5.13), whereas CMOs Yq ⊆ Y by (5.14). The structural diagram of P A T Mealy FSM is shown in Fig. 5.15. In P A T Mealy FSM, systems (3.12) and (4.11) are implemented by LUTer1–LUTerK. The LUTerTV implements functions (3.14) and (4.12). The system (5.13) is generated by LUTerY, the system (2.43) by LUTerT. We propose the method of synthesis for P A T Mealy FSMs. The method includes the following steps:

1. Constructing the set A for a given GSA Γ.
2. Constructing the partition B = {A1, ..., AK}.
3. Constructing the set of identifiers I.
4. Constructing the pairs ⟨am, Im⟩ for MOs yn ∈ Y.
5. Encoding of the states am ∈ A by codes K(am) and C(am).
6. Encoding of identifiers Im ∈ I.
7. Constructing tables for LUTer1–LUTerK.
8. Constructing table of LUTerTV.
9. Constructing table of LUTerY.
10. Constructing table of LUTerT.
11. Implementing FSM circuit with particular LUTs.
Fig. 5.16 Marked GSA Γ10
Let us discuss an example of synthesis for Mealy FSM P A T(Γ10). The GSA Γ10 is shown in Fig. 5.16. The following sets can be found from GSA Γ10: A = {a1, ..., a8}, X = {x1, ..., x4}, Y = {y1, ..., y9}. Let us find the partition B using methods from Chap. 3. Recall that S = 5. The partition B includes the classes A^1 = {a1, ..., a4, a6, ..., a8} and A^2 = {a5}. It gives the sets X^1 = {x1, x2} and X^2 = {x3, x4}. Because Qm ≤ 2 for any state am ∈ A, there is Im = 2. So, there is the set I = {I1, I2}.
Fig. 5.17 State codes for P A T(Γ10)

| $T_3$ \ $T_1 T_2$ | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | a1 | a2 | a7 | a4 |
| 1 | a8 | a3 | a6 | a5 |
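Let us construct the table showing the dependence among states, microoperations and identifiers for FSM P A T(Γ10). It is Table 5.8. This table is similar to Table 5.1. Having Table 5.8, it is possible to express MOs yn ∈ Y as functions of pairs ⟨am, Im⟩. This dependence is represented by system (5.39).

$$
\begin{aligned}
y_1 &\Rightarrow \langle a_2, \emptyset\rangle \vee \langle a_3, I_2\rangle \vee \langle a_8, I_2\rangle \vee \langle a_1, I_2\rangle;\\
y_2 &\Rightarrow \langle a_3, I_1\rangle \vee \langle a_3, I_2\rangle \vee \langle a_1, I_2\rangle;\\
y_3 &\Rightarrow \langle a_3, I_1\rangle \vee \langle a_4, \emptyset\rangle;\\
y_4 &\Rightarrow \langle a_3, I_1\rangle \vee \langle a_6, \emptyset\rangle \vee \langle a_8, I_1\rangle;\\
y_5 &\Rightarrow \langle a_3, I_1\rangle \vee \langle a_3, I_2\rangle \vee \langle a_7, \emptyset\rangle \vee \langle a_1, I_2\rangle;\\
y_6 &\Rightarrow \langle a_3, I_1\rangle \vee \langle a_5, \emptyset\rangle;\\
y_7 &\Rightarrow \langle a_4, \emptyset\rangle;\\
y_8 &\Rightarrow \langle a_2, \emptyset\rangle \vee \langle a_8, I_2\rangle;\\
y_9 &\Rightarrow \langle a_5, \emptyset\rangle \vee \langle a_6, \emptyset\rangle.
\end{aligned}
\tag{5.39}
$$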
Let us execute the diminishing state encoding. There are the following sets A(Ak): A(A1) = {a1, ..., a7}, A(A2) = {a8}. The outcome of state assignment is shown in Fig. 5.17. Because M0 = 8, we have R0 = 3. It gives the sets T = {T1, T2, T3} and Φ = {D1, D2, D3}. Obviously, there are three bits in the state codes for FSM P A T(Γ10). There are R1 = 3 and R2 = 1. It gives T^1 = {τ1, τ2, τ3} and T^2 = {τ4}. Let C(a1) = 001, ..., C(a4) = 100, C(a6) = 101, C(a7) = 110, C(a8) = 111 and C(a5) = 1. There is MI = 2. It gives RI = 1 and V = {v1}. Let us encode the identifiers Im ∈ I as follows: K(I1) = 0 and K(I2) = 1. Now, it is possible to construct the tables for LUTer1 (Table 5.9) and LUTer2 (Table 5.10). The following systems of Boolean functions could be derived from Tables 5.9 and 5.10:

$$
\begin{aligned}
D_1^1 &= \bar{\tau}_1\tau_2\bar{\tau}_3\bar{x}_1 \vee \bar{\tau}_1\tau_2\tau_3 \vee \tau_1\bar{\tau}_2\bar{\tau}_3 \vee \tau_1\bar{\tau}_2\tau_3x_1 \vee \tau_1\tau_2\bar{\tau}_3x_2;\\
D_2^1 &= \bar{\tau}_1\bar{\tau}_2\tau_3 \vee \bar{\tau}_1\tau_2\bar{\tau}_3x_1 \vee \bar{\tau}_1\tau_2\tau_3 \vee \tau_1\bar{\tau}_2\tau_3 \vee \tau_1\bar{\tau}_2\tau_3x_1 \vee \tau_1\tau_2\bar{\tau}_3x_2;\\
D_3^1 &= \bar{\tau}_1\tau_2\bar{\tau}_3x_1\bar{x}_2 \vee \bar{\tau}_1\tau_2\tau_3\bar{x}_2 \vee \tau_1\bar{\tau}_2\tau_3\bar{x}_1 \vee \tau_1\tau_2\bar{\tau}_3\bar{x}_2;\\
v_1^1 &= \bar{\tau}_1\tau_2\bar{\tau}_3x_1\bar{x}_2 \vee \bar{\tau}_1\tau_2\tau_3\bar{x}_2 \vee \tau_1\bar{\tau}_2\tau_3\bar{x}_1 \vee \tau_1\tau_2\bar{\tau}_3\bar{x}_2.
\end{aligned}
\tag{5.40}
$$

$$
D_1^2 = D_2^2 = 0;\qquad D_3^2 = \tau_4;\qquad v_1^2 = \tau_4\bar{x}_3. \tag{5.41}
$$

The system (5.40) determines 4 LUTs of LUTer1, the system (5.41) the single LUT of LUTer2. So, there is NΦ = 5 in the discussed case.
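Table 5.8 Dependence between objects for FSM P A T(Γ10)

| $y_n$ | $a_m$ | $I$ |
|---|---|---|
| y1 | a2 | − |
| y1 | a3 | I2 |
| y1 | a8 | I2 |
| y1 | a1 | I2 |
| y2 | a3 | I1 |
| y2 | a3 | I2 |
| y2 | a1 | I2 |
| y3 | a3 | I1 |
| y3 | a4 | − |
| y4 | a3 | I1 |
| y4 | a6 | − |
| y4 | a8 | I1 |
| y5 | a3 | I1 |
| y5 | a3 | I2 |
| y5 | a7 | − |
| y5 | a1 | I2 |
| y6 | a3 | I1 |
| y6 | a5 | − |
| y7 | a4 | − |
| y8 | a2 | − |
| y8 | a8 | I2 |
| y9 | a5 | − |
| y9 | a6 | − |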
Fig. 5.18 Structural diagram of PY T Mealy FSM
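Table 5.9 Table ST1 of Mealy FSM P A T(Γ10)

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $X_h^1$ | $I_h^1$ | $\Phi_h^1$ | $V_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|---|
| a1 | 001 | a2 | 010 | 1 | − | $D_2^1$ | − | 1 |
| a2 | 010 | a3 | 011 | $x_1 x_2$ | $I_1^1$ | $D_2^1\, D_3^1$ | − | 2 |
| a2 | 010 | a3 | 011 | $x_1 \bar{x}_2$ | $I_2^1$ | $D_2^1\, D_3^1$ | $v_1^1$ | 3 |
| a2 | 010 | a4 | 100 | $\bar{x}_1$ | − | $D_1^1$ | − | 4 |
| a3 | 011 | a6 | 111 | $x_2$ | $I_1^1$ | $D_1^1\, D_2^1\, D_3^1$ | − | 5 |
| a3 | 011 | a7 | 110 | $\bar{x}_2$ | $I_2^1$ | $D_1^1\, D_2^1$ | $v_1^1$ | 6 |
| a4 | 100 | a5 | 101 | 1 | − | $D_1^1\, D_3^1$ | − | 7 |
| a6 | 101 | a6 | 111 | $x_1$ | $I_1^1$ | $D_1^1\, D_2^1\, D_3^1$ | − | 8 |
| a6 | 101 | a1 | 000 | $\bar{x}_1$ | $I_2^1$ | − | $v_1^1$ | 9 |
| a7 | 110 | a7 | 110 | $x_2$ | $I_1^1$ | $D_1^1\, D_2^1$ | − | 10 |
| a7 | 110 | a1 | 000 | $\bar{x}_2$ | $I_2^1$ | − | $v_1^1$ | 11 |
| a8 | 111 | a1 | 000 | 1 | − | − | − | 12 |

Table 5.10 Table ST2 of Mealy FSM P A T(Γ10)

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $X_h^2$ | $I_h^2$ | $\Phi_h^2$ | $V_h^2$ | $h$ |
|---|---|---|---|---|---|---|---|---|
| a5 | 1 | a7 | 001 | $x_3$ | $I_1^2$ | $D_3^2$ | − | 1 |
| a5 | 1 | a8 | 001 | $\bar{x}_3 x_4$ | $I_1^2$ | $D_3^2$ | $v_1^2$ | 2 |
| a5 | 1 | a8 | 001 | $\bar{x}_3 \bar{x}_4$ | $I_2^2$ | $D_3^2$ | $v_1^2$ | 3 |

The following system determines the LUTerTV:

$$
D_1 = D_1^1;\qquad D_2 = D_2^1;\qquad D_3 = D_3^1 \vee D_3^2;\qquad v_1 = v_1^1 \vee v_1^2. \tag{5.42}
$$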
As follows from (5.42), there are 2 LUTs in the circuit of LUTerTV. To design this circuit, it is necessary to construct tables for the functions D3 and v1. The table of LUTerY is constructed on the base of system (5.39). The number of rows is equal to the number of pairs ⟨am, Im⟩ in the system (5.39). So, the LUTerY is represented by Table 5.11. Using Table 5.11, it is possible to derive the equations (5.13). For example, there are the following Boolean functions:

$$
\begin{aligned}
y_1 &= \bar{T}_1\bar{T}_2\bar{T}_3v_1 \vee \bar{T}_1T_2\bar{T}_3 \vee \bar{T}_1T_3v_1;\\
y_2 &= \bar{T}_1\bar{T}_2\bar{T}_3v_1 \vee \bar{T}_1T_2T_3;\\
y_9 &= T_1T_3.
\end{aligned}
\tag{5.43}
$$

Next, each equation (5.43) is represented by a truth table. The truth tables are used for finding bit-streams of particular LUTs [4, 5]. The table of LUTerT is constructed in the trivial way. We do not discuss this step for the given example.
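The equations (5.43) can be checked against the rows of Table 5.11. In the sketch below a row without an identifier (the dash in the table) is treated as valid for both values of v1; this interpretation, the tuple layout of `TABLE_5_11` and the helper name `mismatches` are assumptions made for the illustration.

```python
# Rows of Table 5.11: (K(am) = T1 T2 T3, v1 or None when no identifier is used,
# values of y1..y9 as a 9-character string).
TABLE_5_11 = [
    ("000", 1, "110010000"),
    ("010", None, "100000010"),
    ("011", 0, "011111000"),
    ("011", 1, "110010000"),
    ("100", None, "001010100"),
    ("101", None, "000001011"),
    ("111", None, "000100001"),
    ("110", None, "000010000"),
    ("001", 0, "000100000"),
    ("001", 1, "100000010"),
]

# The three functions of (5.43), keyed by the index n of y_n.
EQUATIONS = {
    1: lambda t1, t2, t3, v1: (not t1 and not t2 and not t3 and v1)
    or (not t1 and t2 and not t3)
    or (not t1 and t3 and v1),
    2: lambda t1, t2, t3, v1: (not t1 and not t2 and not t3 and v1)
    or (not t1 and t2 and t3),
    9: lambda t1, t2, t3, v1: t1 and t3,
}


def mismatches():
    """List every (code, v1, n) where an equation disagrees with the table."""
    bad = []
    for code, v1, outputs in TABLE_5_11:
        t = [bit == "1" for bit in code]
        for v in ((0, 1) if v1 is None else (v1,)):
            for n, func in EQUATIONS.items():
                if bool(func(*t, bool(v))) != (outputs[n - 1] == "1"):
                    bad.append((code, v, n))
    return bad


print(mismatches())
```

An empty list means that the minimized forms of y1, y2 and y9 reproduce the table on every listed pair ⟨am, Im⟩.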
Table 5.11 Table of LUTerY for Mealy FSM P A T(Γ10)

| $a_m$ | $K(a_m)$ ($T_1 T_2 T_3$) | $I_m$ | $K(I_m)$ ($v_1$) | $Y$ ($y_1 \ldots y_9$) | $h$ |
|---|---|---|---|---|---|
| a1 | 000 | I2 | 1 | 110010000 | 1 |
| a2 | 010 | − | − | 100000010 | 2 |
| a3 | 011 | I1 | 0 | 011111000 | 3 |
| a3 | 011 | I2 | 1 | 110010000 | 4 |
| a4 | 100 | − | − | 001010100 | 5 |
| a5 | 101 | − | − | 000001011 | 6 |
| a6 | 111 | − | − | 000100001 | 7 |
| a7 | 110 | − | − | 000010000 | 8 |
| a8 | 001 | I1 | 0 | 000100000 | 9 |
| a8 | 001 | I2 | 1 | 100000010 | 10 |

Fig. 5.19 Codes of CMOs for FSM P A YT(Γ10)
Now let us discuss the case when CMOs Yq ⊆ Y are encoded by binary codes K(Yq). It results in the P A YT Mealy FSM shown in Fig. 5.18. In P A YT FSM, the block LUTerZ implements the system (5.14). In this FSM, the system (2.4) is implemented by the block LUTerY. We propose the synthesis method for P A YT Mealy FSM. It includes the following steps:

1. Constructing the set A for a given GSA Γ.
2. Constructing the partition B of the set A.
3. Constructing the collections of MOs Yq ⊂ Y.
4. Constructing the set of identifiers I.
5. Finding pairs ⟨am, Im⟩ for CMOs Yq ⊂ Y.
6. Encoding of the states am ∈ A by codes K(am) and C(am).
7. Encoding of identifiers Im ∈ I.
8. Encoding of CMOs Yq ⊆ Y.
9. Constructing tables for LUTer1–LUTerK.
10. Constructing tables of LUTerTV, LUTerZ, LUTerT and LUTerY.
11. Implementing FSM circuit with particular LUTs.
Let us discuss an example of synthesis for Mealy FSM P A YT(Γ10). Steps 1, 2, 4, 6, 7 and 9 have already been executed in the previous example. Also, the tables of LUTerTV and LUTerT are the same for P A T(Γ10) and P A YT(Γ10). Let us show how to execute the steps unique for P A YT FSM. There are Q = 9 CMOs Yq ⊆ Y which could be extracted from the operational vertices of GSA Γ10. They are the following: Y1 = ∅, Y2 = {y1, y8}, Y3 = {y2, y3, y4, y5, y6}, Y4 = {y1, y2, y5}, Y5 = {y3, y5, y7}, Y6 = {y6, y8, y9}, Y7 = {y4, y9}, Y8 = {y5}, Y9 = {y4}. Let us find the dependence between CMOs Yq ⊆ Y and pairs ⟨am, Im⟩. It is represented by system (5.44):

$$
\begin{aligned}
Y_1 &\Rightarrow \langle a_1, I_1\rangle; & Y_2 &\Rightarrow \langle a_2, \emptyset\rangle \vee \langle a_8, I_2\rangle; & Y_3 &\Rightarrow \langle a_3, I_1\rangle;\\
Y_4 &\Rightarrow \langle a_3, I_2\rangle \vee \langle a_1, I_2\rangle; & Y_5 &\Rightarrow \langle a_4, \emptyset\rangle; & Y_6 &\Rightarrow \langle a_5, \emptyset\rangle;\\
Y_7 &\Rightarrow \langle a_6, \emptyset\rangle; & Y_8 &\Rightarrow \langle a_7, \emptyset\rangle; & Y_9 &\Rightarrow \langle a_8, I_1\rangle.
\end{aligned}
\tag{5.44}
$$

Let us represent MOs yn ∈ Y as functions of CMOs Yq ⊆ Y. It gives the system (5.45):

$$
\begin{aligned}
y_1 &= Y_2 \vee Y_4; & y_2 &= Y_3 \vee Y_4; & y_3 &= Y_3 \vee Y_5;\\
y_4 &= Y_3 \vee Y_7 \vee Y_9; & y_5 &= Y_3 \vee Y_4 \vee Y_5 \vee Y_8; & y_6 &= Y_3 \vee Y_6;\\
y_7 &= Y_5; & y_8 &= Y_2 \vee Y_6; & y_9 &= Y_6 \vee Y_7.
\end{aligned}
\tag{5.45}
$$
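System (5.45) follows mechanically from the list of CMOs: a microoperation yn is the disjunction of all CMOs that contain it. A minimal sketch of this inversion is given below; the dictionary `CMOS` and the function `mo_as_cmo_disjunctions` are names introduced here for illustration.

```python
# CMOs of GSA Gamma_10 as listed in the text: q -> set of indices n of MOs y_n.
CMOS = {
    1: set(),
    2: {1, 8},
    3: {2, 3, 4, 5, 6},
    4: {1, 2, 5},
    5: {3, 5, 7},
    6: {6, 8, 9},
    7: {4, 9},
    8: {5},
    9: {4},
}


def mo_as_cmo_disjunctions(cmos, n_mos):
    """For each MO y_n return the sorted indices q of the CMOs Yq containing it."""
    return {
        n: sorted(q for q, mos in cmos.items() if n in mos)
        for n in range(1, n_mos + 1)
    }


SYSTEM = mo_as_cmo_disjunctions(CMOS, n_mos=9)
for n, qs in SYSTEM.items():
    print(f"y{n} = " + " v ".join(f"Y{q}" for q in qs))
```

The printed lines reproduce (5.45) term by term.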
There is Q = 9, so RQ = 4. It gives the set Z = {z1, ..., z4}. Let us execute the diminishing encoding of CMOs for the discussed example. The outcome is shown in the Karnaugh map (Fig. 5.19). Using the codes from Fig. 5.19, it is possible to transform the system (5.45) into the following system:

$$
\begin{aligned}
y_1 &= \bar{z}_1z_2\bar{z}_3; & y_2 &= z_2z_4; & y_3 &= z_3z_4;\\
y_4 &= z_1; & y_5 &= z_4; & y_6 &= z_2z_3;\\
y_7 &= \bar{z}_1\bar{z}_2z_3; & y_8 &= \bar{z}_1z_2\bar{z}_4; & y_9 &= z_3\bar{z}_4.
\end{aligned}
\tag{5.46}
$$

Fig. 5.20 Structural diagram of PY T B and PY TC Moore FSMs

Table 5.12 Table of LUTerZ for Mealy FSM P A YT(Γ10)

| $a_m$ | $K(a_m)$ ($T_1 T_2 T_3$) | $I_m$ | $K(I_m)$ ($v_1$) | $Z$ ($z_1 z_2 z_3 z_4$) | $h$ | $Y_q$ |
|---|---|---|---|---|---|---|
| a1 | 000 | I2 | 1 | 0101 | 1 | Y4 |
| a2 | 010 | − | − | 0100 | 2 | Y2 |
| a3 | 011 | I1 | 0 | 1111 | 3 | Y3 |
| a3 | 011 | I2 | 1 | 0101 | 4 | Y4 |
| a4 | 100 | − | − | 0011 | 5 | Y5 |
| a5 | 101 | − | − | 0110 | 6 | Y6 |
| a6 | 111 | − | − | 1010 | 7 | Y7 |
| a7 | 110 | − | − | 0001 | 8 | Y8 |
| a8 | 001 | I1 | 0 | 1100 | 9 | Y9 |
| a8 | 001 | I2 | 1 | 0100 | 10 | Y2 |
Analysis of (5.46) shows that there are 7 LUTs and 17 interconnections in LUTerY. It gives 23% economy for LUTs and 53% saving for interconnections. The table of LUTerZ is similar to the table of LUTerY for P A T FSM, but the column Y (Table 5.11) is replaced by the column Z (Table 5.12). In Table 5.12 we use the codes from Fig. 5.19. We added the column Yq to Table 5.12 to explain how the column Z is filled. We took the CMOs Yq from the system (5.44). Using the table of LUTerZ, it is possible to form truth tables for functions (5.14). It is possible to minimize the number of interconnections in LUTerZ by minimizing the equations from (5.14). There are no problems with constructing tables and Boolean systems representing circuits of LUTers. So, we do not discuss these steps for the given examples.
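The LUT and interconnection counts quoted for (5.46) can be reproduced with a short calculation. The sketch assumes that a function equal to a single uncomplemented variable needs no LUT (it is a wire), and that the reference LUTerY of P A T(Γ10) uses one LUT with four inputs (T1, T2, T3, v1) for each of the N = 9 microoperations; both assumptions, and the names `ARGUMENTS` and `lut_cost`, belong to this illustration rather than to the original text.

```python
# Number of variables z_r in the support of each function of (5.46).
ARGUMENTS = {
    "y1": 3, "y2": 2, "y3": 2, "y4": 1, "y5": 1,
    "y6": 2, "y7": 3, "y8": 3, "y9": 2,
}


def lut_cost(arguments):
    """Return (LUT count, interconnection count); one-variable functions are wires."""
    real_luts = [k for k in arguments.values() if k > 1]
    return len(real_luts), sum(real_luts)


luts, wires = lut_cost(ARGUMENTS)

# Assumed baseline: 9 LUTs, each fed by T1, T2, T3 and v1.
base_luts, base_wires = 9, 9 * 4
lut_saving = 100 * (base_luts - luts) / base_luts
wire_saving = 100 * (base_wires - wires) / base_wires
print(luts, wires, round(lut_saving, 1), round(wire_saving, 1))
```

Under these assumptions the savings come out at roughly 22% for LUTs and 53% for interconnections, in line with the figures quoted in the text.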
5.4 Synthesis of Moore FSMs with Transformation of Microoperations

Two approaches are possible for Moore FSM. It is possible to express states as functions (5.1) or (5.2). It is also possible to represent classes of PES as functions depending either on MOs yn ∈ Y and identifiers Im ∈ I or on CMOs Yq ⊆ Y and identifiers:

$$
B = B(Y, I); \tag{5.47}
$$

$$
B = B(Z, I). \tag{5.48}
$$
Fig. 5.21 Structural diagram of PY YT B and PY YTC Moore FSMs
If functions (5.1), (5.2) are generated, then it is necessary to find the partition B of the set A. It leads to structural diagrams of either PY T B or PY YT B Moore FSMs. If functions (5.47), (5.48) are generated, then the partition C = {B^1, ..., B^K} should be found. It results in either PY TC or PY YTC Moore FSMs. The corresponding structural diagrams are shown in Figs. 5.20 and 5.21. As follows from the previous chapters, there are many similar steps in the synthesis of PT B and PTC Moore FSMs. So, let us discuss how to synthesize PY TC and PY YTC Moore FSMs. It makes sense to use the model of PY TC Moore FSM if the condition (5.16) is true. In this case, there is no need for functional decomposition of functions (5.17) generated by LUTerT. We propose the synthesis method for PY TC Moore FSMs. It includes the following steps:

1. Finding the set of states for GSA Γ.
2. Constructing the partition A = {B1, ..., BI}.
3. Representing classes Bi ∈ A as functions (5.47).
4. Constructing the partition C = {B^1, ..., B^K}.
5. Encoding of the classes Bi ∈ B^k (k = 1, ..., K).
6. Encoding of identifiers Im ∈ I.
7. Constructing tables ST1–STK.
8. Constructing systems (3.11) and (4.11).
9. Constructing table of LUTerYV.
10. Constructing table of LUTerT.
11. Implementing FSM circuit with particular LUTs.
Let us discuss an example of synthesis for Moore FSM PY TC(Γ11). The GSA Γ11 is shown in Fig. 5.22. As in the previous cases, let us use LUTs with S = 5. The following sets are obtained from the analysis of GSA Γ11: A = {a1, ..., a13}, Y = {y1, y2, y3}, X = {x1, ..., x4}. Using the definition of PES, we can find the partition A = {B1, ..., B7} with the classes B1 = {a1}, B2 = {a2, a3, a4}, B3 = {a5, a6}, B4 = {a7, a8, a10}, B5 = {a9}, B6 = {a11}, B7 = {a12, a13}. Let us find the CMOs written in the operational vertices of GSA Γ11. There are the following CMOs: Y1 = ∅, Y2 = {y1}, Y3 = {y2}, Y4 = {y3}, Y5 = {y1, y2}, Y6 = {y1, y3}, Y7 = {y2, y3} and Y8 = {y1, y2, y3}. Let us construct a table showing the dependence between CMOs Yq ⊆ Y and classes Bi ∈ A. It is Table 5.13 in the discussed case. As follows from Table 5.13, there is the set I = {I1, I2}. Let us represent classes Bi ∈ A as functions (5.47):

Fig. 5.22 Initial GSA Γ11
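$$
\begin{aligned}
B_1 &\Rightarrow \langle Y_1, \emptyset\rangle; & B_2 &\Rightarrow \langle Y_2, I_1\rangle \vee \langle Y_3, I_1\rangle \vee \langle Y_4, I_1\rangle;\\
B_3 &\Rightarrow \langle Y_5, I_1\rangle \vee \langle Y_6, I_1\rangle; & B_4 &\Rightarrow \langle Y_2, I_2\rangle \vee \langle Y_4, I_2\rangle \vee \langle Y_7, \emptyset\rangle;\\
B_5 &\Rightarrow \langle Y_8, \emptyset\rangle; & B_6 &\Rightarrow \langle Y_3, I_2\rangle;\\
B_7 &\Rightarrow \langle Y_5, I_2\rangle \vee \langle Y_6, I_2\rangle.
\end{aligned}
\tag{5.49}
$$

These pairs are taken from Table 5.13. Each CMO Yq ⊆ Y could be represented as a conjunction

$$
Y_q = \bigwedge_{n=1}^{N} y_n^{l_{nq}} \quad (q = 1, \ldots, Q). \tag{5.50}
$$

Fig. 5.23 Initial GSA Γ12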
Table 5.13 Dependence of classes on CMOs for FSM P(Γ11)

| $Y_q$ | $B_i$ | $I$ |
|---|---|---|
| Y1 | B1 | − |
| Y2 | B2 | I1 |
| Y2 | B4 | I2 |
| Y3 | B2 | I1 |
| Y3 | B6 | I2 |
| Y4 | B2 | I1 |
| Y4 | B4 | I2 |
| Y5 | B3 | I1 |
| Y5 | B7 | I2 |
| Y6 | B3 | I1 |
| Y6 | B7 | I2 |
| Y7 | B4 | − |
| Y8 | B5 | − |
In (5.50), the symbol lnq stands for a Boolean value equal to 1 iff yn ∈ Yq. For example, there are Y1 = ȳ1ȳ2ȳ3, Y2 = y1ȳ2ȳ3, Y3 = ȳ1y2ȳ3 and so on. If we use the representation (5.50) in the system (5.49), we will get the system (5.47). Using the approaches from the previous chapters, we can find the partition C = {B^1, B^2} with B^1 = {B2, B4, ..., B7}, B^2 = {B1, B3}. Using (3.6), (3.7), we can find that R1 = 3, R2 = 2, RA = 5. It gives T = {τ1, ..., τ5} with T^1 = {τ1, τ2, τ3} and T^2 = {τ4, τ5}. Let us encode the classes Bi ∈ A in the following manner: C(B1) = 01, C(B2) = 001, C(B3) = 10, C(B4) = 010, C(B5) = 011, C(B6) = 100, C(B7) = 101. Because MI = 2, using (4.8) gives RI = 1 and V = {v1}. Let K(I1) = 0, K(I2) = 1. There are the following columns in the tables ST1–STK of PY TC Moore FSM: Bi, C(Bi), Xhk, Yhk, Ihk, Vhk, h. In the discussed case, the LUTer1 is represented by Table 5.14, the LUTer2 by Table 5.15. Using Table 5.14, the following SOPs could be derived:

$$
\begin{aligned}
y_1^1 &= \bar{\tau}_1\bar{\tau}_2\tau_3\bar{x}_3 \vee \bar{\tau}_1\bar{\tau}_2\tau_3\bar{x}_2 \vee \tau_1\bar{\tau}_2\bar{\tau}_3;\\
y_2^1 &= \bar{\tau}_1\bar{\tau}_2\tau_3x_2\bar{x}_3 \vee \bar{\tau}_1\tau_2\bar{\tau}_3 \vee \tau_1\bar{\tau}_2\bar{\tau}_3x_2;\\
y_3^1 &= \bar{\tau}_1\bar{\tau}_2\tau_3x_3 \vee \bar{\tau}_1\bar{\tau}_2\tau_3\bar{x}_2 \vee \tau_2\tau_3 \vee \tau_1\bar{\tau}_2\bar{\tau}_3\bar{x}_2;\\
v_1^1 &= \bar{\tau}_1\tau_2 \vee \tau_1\bar{\tau}_2\bar{\tau}_3.
\end{aligned}
\tag{5.51}
$$

Using Table 5.15, the following SBF could be derived:

$$
\begin{aligned}
y_1^2 &= \tau_5x_1 \vee \tau_4x_1\bar{x}_4; & y_2^2 &= \tau_5\bar{x}_1 \vee \tau_4x_1x_4;\\
y_3^2 &= \tau_4x_4 \vee \tau_4\bar{x}_1; & v_1^2 &= \tau_4\bar{x}_4 \vee \tau_4\bar{x}_1.
\end{aligned}
\tag{5.52}
$$
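Table 5.14 Table ST1 of Moore FSM PY T(Γ11)

| $B_i$ | $C(B_i)$ | $X_h^1$ | $Y_h^1$ | $I_h^1$ | $V_h^1$ | $h$ |
|---|---|---|---|---|---|---|
| B2 | 001 | $x_2 x_3$ | $y_3^1$ | $I_1^1$ | 0 | 1 |
| B2 | 001 | $x_2 \bar{x}_3$ | $y_1^1\, y_2^1$ | $I_1^1$ | 0 | 2 |
| B2 | 001 | $\bar{x}_2$ | $y_1^1\, y_3^1$ | $I_1^1$ | 0 | 3 |
| B4 | 010 | 1 | $y_2^1$ | $I_2^1$ | $v_1^1$ | 4 |
| B5 | 011 | 1 | $y_3^1$ | $I_2^1$ | $v_1^1$ | 5 |
| B6 | 100 | $x_2$ | $y_1^1\, y_2^1$ | $I_2^1$ | $v_1^1$ | 6 |
| B6 | 100 | $\bar{x}_2$ | $y_1^1\, y_3^1$ | $I_2^1$ | $v_1^1$ | 7 |
| B7 | 101 | 1 | − | − | − | 8 |

Table 5.15 Table ST2 of Moore FSM PY T(Γ11)

| $B_i$ | $C(B_i)$ | $X_h^2$ | $Y_h^2$ | $I_h^2$ | $V_h^2$ | $h$ |
|---|---|---|---|---|---|---|
| B1 | 01 | $x_1$ | $y_1^2$ | $I_1^2$ | 0 | 1 |
| B1 | 01 | $\bar{x}_1$ | $y_2^2$ | $I_1^2$ | 0 | 2 |
| B3 | 10 | $x_1 x_4$ | $y_2^2\, y_3^2$ | $I_1^2$ | 0 | 3 |
| B3 | 10 | $x_1 \bar{x}_4$ | $y_1^2$ | $I_2^2$ | $v_1^2$ | 4 |
| B3 | 10 | $\bar{x}_1$ | $y_3^2$ | $I_2^2$ | $v_1^2$ | 5 |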
Systems (5.51), (5.52) are used to construct truth tables for LUTs of LUTer1–LUTer2. As follows from these systems, there is NΦ = 8. The table of LUTerYV determines 4 SOPs shown below:

$$
y_1 = y_1^1 \vee y_1^2;\quad y_2 = y_2^1 \vee y_2^2;\quad y_3 = y_3^1 \vee y_3^2;\quad v_1 = v_1^1 \vee v_1^2. \tag{5.53}
$$

As follows from (5.53), there are NYV = 4 LUTs in the circuit of LUTerYV. The table of LUTerT is constructed using the pairs ⟨Yq, Im⟩. In the discussed case, the system (5.49) represents this dependence. There are the following columns in the table of LUTerT: Bi, Y, V, C(Bi), h. It is Table 5.16 in the discussed case. Obviously, each pair ⟨Yq, Im⟩ determines a unique state am ∈ A. So, there are M = 13 rows in Table 5.16. The table of LUTerT determines RA functions τr ∈ T. In the discussed case, it is system (5.54).

$$
\begin{aligned}
\tau_1 &= y_2\bar{y}_3v_1 \vee y_1\bar{y}_2y_3v_1;\\
\tau_2 &= y_2y_3\bar{v}_1 \vee y_1\bar{y}_2\bar{y}_3v_1 \vee \bar{y}_1\bar{y}_2y_3v_1;\\
\tau_3 &= y_1\bar{y}_2\bar{y}_3\bar{v}_1 \vee \bar{y}_1y_2\bar{y}_3\bar{v}_1 \vee \bar{y}_1\bar{y}_2y_3\bar{v}_1 \vee y_1y_2y_3\bar{v}_1 \vee y_1y_2\bar{y}_3v_1 \vee y_1\bar{y}_2y_3v_1;\\
\tau_4 &= y_1y_2\bar{y}_3\bar{v}_1 \vee y_1\bar{y}_2y_3\bar{v}_1;\\
\tau_5 &= \bar{y}_1\bar{y}_2\bar{y}_3\bar{v}_1.
\end{aligned}
\tag{5.54}
$$
Table 5.16 Table of LUTerT of Moore FSM PY T(Γ11)

| $B_i$ | $Y$ ($y_1 y_2 y_3$) | $V$ ($v_1$) | $C(B_i)$ ($\tau_1\tau_2\tau_3\tau_4\tau_5$) | $h$ |
|---|---|---|---|---|
| B1 | 000 | 0 | 00001 | 1 |
| B2 | 100 | 0 | 00100 | 2 |
| B2 | 010 | 0 | 00100 | 3 |
| B2 | 001 | 0 | 00100 | 4 |
| B3 | 110 | 0 | 00010 | 5 |
| B3 | 101 | 0 | 00010 | 6 |
| B4 | 011 | 0 | 01000 | 7 |
| B4 | 100 | 1 | 01000 | 8 |
| B4 | 001 | 1 | 01000 | 9 |
| B5 | 111 | 0 | 01100 | 10 |
| B6 | 010 | 1 | 10000 | 11 |
| B7 | 110 | 1 | 10100 | 12 |
| B7 | 101 | 1 | 10100 | 13 |
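Functionally, the table of LUTerT is a lookup from the pair (Y, v1) to the class code, and it is well defined only if no pair occurs twice. The sketch below builds such a lookup from the 13 rows of Table 5.16 and extracts the on-set of each function τr; the names `TABLE_5_16`, `build_lookup` and `tau_onset` are introduced here for illustration.

```python
# Rows of Table 5.16: (class, Y = y1 y2 y3, v1, C(Bi) = tau1..tau5).
TABLE_5_16 = [
    ("B1", "000", 0, "00001"), ("B2", "100", 0, "00100"),
    ("B2", "010", 0, "00100"), ("B2", "001", 0, "00100"),
    ("B3", "110", 0, "00010"), ("B3", "101", 0, "00010"),
    ("B4", "011", 0, "01000"), ("B4", "100", 1, "01000"),
    ("B4", "001", 1, "01000"), ("B5", "111", 0, "01100"),
    ("B6", "010", 1, "10000"), ("B7", "110", 1, "10100"),
    ("B7", "101", 1, "10100"),
]


def build_lookup(rows):
    """Map each pair (Y, v1) to (class, code); reject tables with repeated pairs."""
    lookup = {}
    for cls, y, v1, code in rows:
        key = (y, v1)
        if key in lookup:
            raise ValueError(f"pair {key} is not unique")
        lookup[key] = (cls, code)
    return lookup


def tau_onset(lookup, r):
    """Pairs (Y, v1) for which the function tau_r equals 1."""
    return sorted(key for key, (_, code) in lookup.items() if code[r - 1] == "1")


LUTER_T = build_lookup(TABLE_5_16)
print(tau_onset(LUTER_T, 4))
```

The on-set obtained for τ4 consists of the pairs (101, 0) and (110, 0), which are the two terms of τ4 in (5.54).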
System (5.54) determines RA = 5 truth tables for LUTs of LUTerT. To construct them, it is necessary to obtain the perfect SOPs of functions (5.54). Let us discuss how to synthesize the circuit of PY YTC Moore FSM. Its structural diagram is shown in Fig. 5.21. Let us use the GSA Γ12 (Fig. 5.23) to illustrate the process of synthesis. The following sets could be derived from Γ12: A = {a1, ..., a13}, X = {x1, ..., x4}, Y = {y1, ..., y7}. It gives the values M = 13, L = 4 and N = 7. We propose the synthesis method for Moore FSMs having the structural diagram from Fig. 5.21. There are the following steps in the proposed method:

1. Finding the set of states A.
2. Constructing the partition A = {B1, ..., BI}.
3. Constructing the CMOs Yq ⊆ Y.
4. Constructing the set of identifiers I.
5. Representing the classes Bi ∈ A as pairs ⟨Yq, Im⟩.
6. Constructing the partition C = {B^1, ..., B^K}.
7. Encoding of the classes Bi ∈ B^k, CMOs Yq ⊆ Y and identifiers Im ∈ I.
8. Constructing tables of LUTer1–LUTerK.
9. Constructing table of LUTerZV.
10. Constructing table of LUTerT.
11. Constructing table of LUTerY.
12. Implementing FSM circuit with particular LUTs.
Let us execute steps 2–4 of the proposed method. There is the partition A = {B1, ..., B7} with B1 = {a1}, B2 = {a2, a3, a4}, B3 = {a5, a6}, B4 = {a7, a8, a10}, B5 = {a9}, B6 = {a11} and B7 = {a12, a13}. The following CMOs are written in the operational vertices of GSA Γ12: Y1 = ∅, Y2 = {y1, y7}, Y3 = {y1, y2, y5}, Y4 = {y2, y4}, Y5 = {y3, y4}, Y6 = {y2, y5}, Y7 = {y3, y7}, Y8 = {y3}, Y9 = {y3, y5, y6}. Analysis of GSA Γ12 shows that some CMOs are generated in the states from two different classes Bi ∈ A. So, there is the set I = {I1, I2}. Let us represent the classes Bi by pairs ⟨Yq, Im⟩:

$$
\begin{aligned}
B_1 &\Rightarrow \langle Y_1, \emptyset\rangle; & B_2 &\Rightarrow \langle Y_2, I_1\rangle \vee \langle Y_3, I_1\rangle \vee \langle Y_4, I_1\rangle;\\
B_3 &\Rightarrow \langle Y_5, I_1\rangle \vee \langle Y_6, I_1\rangle; & B_4 &\Rightarrow \langle Y_2, I_2\rangle \vee \langle Y_8, \emptyset\rangle \vee \langle Y_4, I_2\rangle;\\
B_5 &\Rightarrow \langle Y_9, \emptyset\rangle; & B_6 &\Rightarrow \langle Y_7, \emptyset\rangle;\\
B_7 &\Rightarrow \langle Y_5, I_2\rangle \vee \langle Y_3, I_2\rangle.
\end{aligned}
\tag{5.55}
$$
There is the following partition C = {B^1, B^2} with the classes B^1 = {B2, B4, B5, B6, B7} and B^2 = {B1, B3}. It is found for LUTs with S = 5. Now we have M1 = 5 and M2 = 2. Using (3.6) gives R1 = 3 and R2 = 2. So, there is RA = 5 and T = {τ1, ..., τ5}. Let T^1 = {τ1, τ2, τ3} and T^2 = {τ4, τ5}. Let us encode the classes Bi ∈ B^k in the trivial way: C(B1) = 01, C(B2) = 001, C(B3) = 10, C(B4) = 010, C(B5) = 011, C(B6) = 100 and C(B7) = 101. There is Q = 9. Using (2.24) gives RQ = 4 and Z = {z1, ..., z4}. Let us encode the CMOs Yq ⊆ Y in the diminishing way. Let us express MOs yn ∈ Y as functions of CMOs Yq ⊆ Y:

$$
\begin{aligned}
y_1 &= Y_2 \vee Y_3 \vee Y_7; & y_2 &= Y_3 \vee Y_4 \vee Y_6;\\
y_3 &= Y_5 \vee Y_8 \vee Y_9; & y_4 &= Y_4 \vee Y_5;\\
y_5 &= Y_3 \vee Y_6 \vee Y_9; & y_6 &= Y_9;\\
y_7 &= Y_2 \vee Y_7.
\end{aligned}
\tag{5.56}
$$

Let us encode the CMOs as shown in Fig. 5.24. Using the codes from Fig. 5.24, we can turn the system (5.56) into the following system:

$$
\begin{aligned}
y_1 &= z_2; & y_2 &= \bar{z}_1z_4; & y_3 &= z_1; & y_4 &= \bar{z}_3z_4;\\
y_5 &= z_3; & y_6 &= z_1z_3; & y_7 &= z_2\bar{z}_4.
\end{aligned}
\tag{5.57}
$$
Fig. 5.24 Diminishing codes K(Yq) for Moore FSM PY YT(Γ12)

| $z_3 z_4$ \ $z_1 z_2$ | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 00 | Y1 | Y2 | * | Y8 |
| 01 | Y4 | * | * | Y5 |
| 11 | Y6 | Y3 | * | Y9 |
| 10 | * | Y7 | * | * |
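Table 5.17 Table ST1 of Moore FSM PY YT(Γ12)

| $B_i$ | $C(B_i)$ | $X_h^1$ | $Y_q^1$ | $Z_h^1$ | $I_h^1$ | $V_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|
| B2 | 001 | $x_2 x_3$ | Y4 | $z_4^1$ | $I_1^1$ | 0 | 1 |
| B2 | 001 | $x_2 \bar{x}_3$ | Y5 | $z_1^1\, z_4^1$ | $I_1^1$ | 0 | 2 |
| B2 | 001 | $\bar{x}_2$ | Y6 | $z_3^1\, z_4^1$ | $I_1^1$ | 0 | 3 |
| B4 | 010 | 1 | Y7 | $z_2^1\, z_3^1$ | $I_2^1$ | $v_1^1$ | 4 |
| B5 | 011 | 1 | Y4 | $z_4^1$ | $I_2^1$ | $v_1^1$ | 5 |
| B6 | 100 | $x_2$ | Y5 | $z_1^1\, z_4^1$ | $I_2^1$ | $v_1^1$ | 6 |
| B6 | 100 | $\bar{x}_2$ | Y3 | $z_2^1\, z_3^1\, z_4^1$ | $I_2^1$ | $v_1^1$ | 7 |
| B7 | 101 | 1 | Y1 | − | − | − | 8 |

Table 5.18 Table ST2 of Moore FSM PY YT(Γ12)

| $B_i$ | $C(B_i)$ | $X_h^2$ | $Y_q^2$ | $Z_h^2$ | $I_h^2$ | $V_h^2$ | $h$ |
|---|---|---|---|---|---|---|---|
| B1 | 01 | $x_1$ | Y2 | $z_2^2$ | $I_1^2$ | 0 | 1 |
| B1 | 01 | $\bar{x}_1$ | Y3 | $z_2^2\, z_3^2\, z_4^2$ | $I_1^2$ | 0 | 2 |
| B3 | 10 | $x_1 x_4$ | Y2 | $z_2^2$ | $I_1^2$ | 0 | 3 |
| B3 | 10 | $x_1 \bar{x}_4$ | Y8 | $z_1^2$ | $I_2^2$ | $v_1^2$ | 4 |
| B3 | 10 | $\bar{x}_1$ | Y9 | $z_1^2\, z_3^2\, z_4^2$ | $I_2^2$ | $v_1^2$ | 5 |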
Analysis of system (5.57) shows that there are 4 LUTs in the circuit of LUTerY. Also, it has only 8 interconnections. It gives 42% economy in LUTs and 71% economy in interconnections. Because MI = 2, using (4.8) gives RI = 1 and V = {v1}. Let us encode the identifiers as follows: K(I1) = 0, K(I2) = 1. Now, we have executed everything connected with step 7. There are the following columns in a table STk: Bi, C(Bi), Xhk, Yqk, Zhk, Ihk, Vhk, h. In the discussed case, these tables are represented by Tables 5.17 and 5.18. The codes K(Yq) are taken from Fig. 5.24. Using these tables gives systems (4.10), (4.11). They are constructed in the well-known way. For example, the following equations could be obtained from Table 5.17:

$$
z_4^1 = \tau_1\bar{\tau}_2\bar{\tau}_3 \vee \bar{\tau}_1\tau_3;\qquad v_1^1 = \bar{\tau}_1\tau_2 \vee \tau_1\bar{\tau}_2\bar{\tau}_3.
$$

To construct truth tables for LUTerZV, it is necessary to find systems (3.37) and (4.12). They are represented by the following system in the discussed case:

$$
z_1 = z_1^1 \vee z_1^2;\quad z_2 = z_2^1 \vee z_2^2;\quad z_3 = z_3^1 \vee z_3^2;\quad z_4 = z_4^1 \vee z_4^2;\quad v_1 = v_1^1 \vee v_1^2. \tag{5.58}
$$
The table of LUTerT is constructed using a system similar to (5.55). In the discussed example, it is Table 5.19, which uses the codes K(Yq) from Fig. 5.24. This table is the basis for constructing SBF (5.33). To optimize this system, it is possible to use the insignificant assignments of variables $z_r \in Z$. Of course, the optimization makes sense only if some variable $z_r \in Z$ or $v_r \in V$ is deleted from all terms of some function (5.33).

Table 5.19 Table of LUTerT of Moore FSM PY T(Γ11)

| Bi | Yq | Im | Z (z1 z2 z3 z4) | V (v1) | C(Bi) (τ1 τ2 τ3 τ4 τ5) | h |
|---|---|---|---|---|---|---|
| B1 | Y1 | ∅ | 0000 | − | 00001 | 1 |
| B2 | Y2 | I1 | 0100 | 0 | 00100 | 2 |
| B2 | Y3 | I1 | 0111 | 0 | 00100 | 3 |
| B2 | Y4 | I1 | 0001 | 0 | 00100 | 4 |
| B3 | Y5 | I1 | 1001 | 0 | 00010 | 5 |
| B3 | Y6 | I2 | 0011 | 0 | 00010 | 6 |
| B4 | Y2 | I2 | 0100 | 1 | 01000 | 7 |
| B4 | Y8 | ∅ | 1000 | 0 | 01000 | 8 |
| B4 | Y4 | I2 | 0001 | 1 | 01000 | 9 |
| B5 | Y9 | ∅ | 1011 | − | 01100 | 10 |
| B6 | Y7 | ∅ | 0110 | − | 10000 | 11 |
| B7 | Y2 | I2 | 1001 | 1 | 10100 | 12 |
| B7 | Y3 | I2 | 0111 | 1 | 10100 | 13 |

The following system could be derived from Table 5.19:
\[
\begin{aligned}
\tau_1 &= \bar{z}_1 z_2 z_3 \bar{z}_4 \lor z_1 \bar{z}_2 \bar{z}_3 z_4 v_1 \lor \bar{z}_1 z_2 z_3 z_4; \\
\tau_2 &= \bar{z}_1 z_2 \bar{z}_3 \bar{z}_4 v_1 \lor z_1 \bar{z}_2 \bar{z}_3 \bar{z}_4 \bar{v}_1 \lor \bar{z}_1 \bar{z}_2 \bar{z}_3 z_4 v_1 \lor z_1 \bar{z}_2 z_3 z_4; \\
\tau_3 &= \bar{z}_1 z_2 \bar{z}_3 \bar{z}_4 \bar{v}_1 \lor \bar{z}_1 z_2 z_3 z_4 \bar{v}_1 \lor \bar{z}_1 \bar{z}_2 \bar{z}_3 z_4 \bar{v}_1 \lor z_1 \bar{z}_2 z_4 v_1 \lor \bar{z}_1 z_2 z_3 z_4 v_1; \\
\tau_4 &= z_1 \bar{z}_2 \bar{z}_3 z_4 \bar{v}_1 \lor \bar{z}_1 \bar{z}_2 z_3 z_4 \bar{v}_1; \\
\tau_5 &= \bar{z}_1 \bar{z}_2 \bar{z}_3 \bar{z}_4.
\end{aligned}
\tag{5.59}
\]
The table of LUTerY is constructed in the trivial way; we leave this step to the reader. If the condition
\[
2^{N+R_I} \cdot R_A \le V_0
\tag{5.60}
\]
holds, then both LUTerY and LUTerT could be replaced by a single EMB (for PY YT B and PY YTC Moore FSMs). Using an EMB slightly changes the synthesis methods: it is necessary to construct the table of EMB instead of the table of LUTerT, or instead of the tables of both LUTerY and LUTerT. This can be done in the trivial way.
References
1. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of logic circuit of Moore FSM on CPLD. Pomiary Autom Kontrola 53(5):18–20
2. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on CPLD. In: Proceedings of the sixth international conference CAD DD'07, vol 2. Minsk, pp 39–45
3. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on system-on chip. In: Proceedings of IEEE east-west design and test symposium – EWDTS'07. Yerevan, Armenia, Kharkov, pp 105–109
4. Barkalov A, Titarenko L, Kołopeńczyk M (2006) Optimization of control unit with code sharing. In: Proceedings of the 3rd international workshop of IFAC discrete–event system design (DESDES'06), Rydzyna, 2006. University of Zielona Góra Press, pp 195–200
5. Barkalov A, Węgrzyn M (2006) Design of control units with programmable logic. University of Zielona Góra Press
6. Grout I (2008) Digital systems design with FPGAs and CPLDs. Elsevier Science, Oxford
7. Habib S (1988) Microprogramming and firmware engineering methods. Wiley, New York
8. Park S, Cho S, Yang S, Ciesielski M (2004) A new state assignment technique for testing and low power. In: Proceedings of the 41st annual design automation conference. Association for Computing Machinery, pp 510–513
9. Tatalov E (2011) Synthesis of compositional microprogram control units for programmable devices. Master's thesis, Donetsk National Technical University, Donetsk
Chapter 6
Combining Twofold State Assignment with Replacement of Logical Conditions
Abstract The chapter is devoted to hardware reduction based on combining twofold state assignment with replacement of logical conditions. Embedded memory blocks are used for executing the replacement. The replacement for Moore FSMs is based on encoding of the classes of pseudoequivalent states. The possibility of transforming the initial GSA to decrease the number of additional variables is discussed. These methods are discussed for both Mealy and Moore FSMs. It is also shown how to combine these two methods with encoding of the collections of microoperations. The last part of the chapter is devoted to synthesis methods based on transformation of the initial GSA.
6.1 Analysis of Possible Solutions

The RLC is reduced to the replacement of logical conditions $x_e \in X$ by additional variables $p_g \in P$, where $|P| = G$ and $|X| = L$ [4–6]. The variables $p_g \in P$ are functions depending on $X$ and $T$. They are represented by (2.1). After the execution of RLC, functions $D_r \in \Phi$ are represented as (2.3). The microoperations of Mealy FSM also change: they are represented by (2.12). There is no change for functions $y_n \in Y$ of Moore FSM. If EMBs could be used for implementing FSM circuits, then it makes sense to implement the circuit of the block RLC as EMBer (Fig. 2.13b). In the best case, the following condition holds:
\[
2^{R_0+L} \cdot G \le V_0.
\tag{6.1}
\]
In this case, a single EMB is enough to implement the system (2.1) of Mealy FSM. In the case of Moore FSM, the following condition should hold:
\[
2^{R+L} \cdot G \le V_0.
\tag{6.2}
\]
In this chapter we discuss FSMs for which conditions (6.1), (6.2) hold. All other blocks will be implemented using LUTs.

© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_6
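Conditions (6.1) and (6.2) are simple capacity checks, so they can be evaluated before any synthesis step. The following minimal sketch assumes that $V_0$ is the number of memory bits of one EMB. The function names and the capacity value are hypothetical, and the numbers $R_0 = 3$, $L = 9$, $G = 2$ are those of the example with GSA Γ13 discussed in Sect. 6.2.

```python
def emb_bits_for_rlc(state_bits: int, num_conditions: int, num_p: int) -> int:
    """Bits of the RLC truth table: 2^(state_bits + L) rows, G columns."""
    return (1 << (state_bits + num_conditions)) * num_p


def fits_single_emb(state_bits: int, num_conditions: int, num_p: int, v0: int) -> bool:
    """Left-hand side of (6.1)/(6.2) compared with the EMB capacity V0."""
    return emb_bits_for_rlc(state_bits, num_conditions, num_p) <= v0


# R0 = 3, L = 9, G = 2 as in the example with GSA Gamma_13.
# V0 = 18 * 1024 bits is a hypothetical capacity used only for illustration.
print(emb_bits_for_rlc(3, 9, 2))
print(fits_single_emb(3, 9, 2, 18 * 1024))
```

For a Moore FSM, the same function is called with $R$ (or $R_B$ for condition (6.9)) instead of $R_0$.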
Fig. 6.1 Structural diagram of MPT Mealy FSM
Structural diagrams of MP Mealy and Moore FSMs are shown in Figs. 2.10 and 2.11, respectively. In the case of twofold state assignment, the BRLC is implemented as EMB, and it is necessary to transform the block BF. A structural diagram of MPT Mealy FSM is shown in Fig. 6.1. The EMB implements the BRLC, that is, the system $P = P(T, X)$. Buses $P$ and $T$ are combined into a single bus. It is possible that only some subset $P^k \subseteq P$ takes part in generating the functions
\[
\Phi^k = \Phi^k(P^k, \mathcal{T}^k) \quad (k = \overline{1, K});
\tag{6.3}
\]
\[
Y^k = Y^k(P^k, \mathcal{T}^k) \quad (k = \overline{1, K}).
\tag{6.4}
\]
The LUTerYT implements functions (3.11), (3.12), and the LUTerT implements functions (3.15). Also, there is a feedback from LUTerYT to EMB, which is necessary to implement functions (2.1).

To optimize the value of $K$, we propose to distribute the logical conditions $x_e \in X$ among the functions $p_g \in P$ in the following order. If transitions from $a_m \in A$ depend on a single logical condition $x_e \in X$, then $p_1 = x_e$. If transitions from $a_m \in A$ depend on logical conditions $x_e, x_s \in X$, where $e < s$, then $p_1 = x_e$ and $p_2 = x_s$. And so on. This approach differs from methods [5, 6, 8] targeting PLAs. In PLA-based MP FSMs, the distribution of logical conditions aims at reducing the number of functions $p_g \in P$ in which the same variable $x_e \in X$ appears. But in an EMB-based BRLC, the circuit is represented by a truth table whose size is the same for given values of $R_0$ and $L$: it always has $2^{R_0+L}$ rows and $G$ columns. So, the distribution of variables $x_e \in X$ has no influence on the hardware amount in the EMB-based BRLC.
Fig. 6.2 Structural diagram of $MPT_B$ and $MPT_C$ Moore FSMs
For Moore FSMs, two basic solutions are possible. The first is based on the partition $\Pi_B$, the second on $\Pi_C$. But the structural diagrams of $MPT_B$ and $MPT_C$ Moore FSMs are the same (Fig. 6.2). The following condition holds for Moore FSM:
\[
X(a_m) = X(B_i) \quad (a_m \in B_i,\ i = \overline{1, I}).
\tag{6.5}
\]
In (6.5), the symbol $X(a_m)$ stands for the set of logical conditions determining transitions from a state $a_m \in A$, and the symbol $X(B_i)$ for the set of logical conditions determining transitions from all states $a_m \in B_i$. Due to (6.5), it is possible to represent functions $p_g \in P$ as
\[
P = P(B, X).
\tag{6.6}
\]
It leads to Moore FSM $M_BP$ shown in Fig. 6.3. Its block RLC implements the system of functions
\[
P = P(\mathcal{T}, X).
\tag{6.7}
\]
This approach makes sense if the following condition holds:
\[
R > R_B,
\tag{6.8}
\]
where $R_B$ is determined as (2.42). If EMB implements the BRLC, then the condition (6.2) is replaced by the following condition:
\[
2^{R_B+L} \cdot G \le V_0.
\tag{6.9}
\]

Fig. 6.3 Structural diagram of $M_BP$ Moore FSM

Fig. 6.4 Structural diagram of $M_BPT_B$ and $M_BPT_C$ Moore FSMs
But it is necessary to transform codes $K(a_m)$ into $K(B_i)$. So, compared with Fig. 6.2, there are additional LUTs in the block LUTerYT of $M_BPT$ Moore FSM. Now, the set $\mathcal{T}$ includes $R_A + R_B$ elements. It leads to the structural diagram shown in Fig. 6.4. The set $\mathcal{T}^0$ shown in Fig. 6.4 includes the variables $\tau_r \in \mathcal{T}$ used for encoding the classes $B_i \in \Pi_A$. So, a twofold class assignment is used in $M_BPT$ Moore FSMs. This approach could be used if the condition (6.2) is violated but the condition (6.9) holds.

Using a special state assignment [2] could diminish the requirements for EMB in the case of both Mealy and Moore MP FSMs. The state assignment starts from the states $a_m \in A$ for which $X(a_m) \ne \emptyset$. These states form a set $A_0$. It is necessary to use $R_0$ variables to encode the states $a_m \in A_0$ for Mealy FSM and $R$ variables for Moore FSM:
\[
R_0 = \lceil \log_2 |A_0| \rceil;
\tag{6.10}
\]
\[
R = \lceil \log_2 |A_0| \rceil.
\tag{6.11}
\]
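Formulas (6.10) and (6.11) can be evaluated directly from the sets $X(a_m)$. The sketch below assumes that $A_0$ collects the states whose transitions depend on at least one logical condition, and it uses the sets $X(a_m)$ listed for GSA Γ13 in Sect. 6.2. The names `X_OF`, `ceil_log2` and `feedback_bits` are hypothetical.

```python
# Sets X(a_m) of GSA Gamma_13 as listed in Sect. 6.2.
X_OF = {
    "a1": [],
    "a2": ["x1", "x2"],
    "a3": ["x3"],
    "a4": ["x4", "x5"],
    "a5": ["x6"],
    "a6": ["x7", "x8"],
    "a7": ["x9"],
}


def ceil_log2(n: int) -> int:
    """Integer-exact ceiling of log2(n) for n >= 1; returns 0 for n <= 1."""
    return (n - 1).bit_length() if n > 1 else 0


def feedback_bits(x_of: dict) -> int:
    """Number of variables needed to encode the states of A0, as in (6.10)."""
    a0 = [state for state, conditions in x_of.items() if conditions]
    return ceil_log2(len(a0))


print(feedback_bits(X_OF))
```

For Γ13 the set $A_0$ contains six of the seven states, so the special state assignment does not reduce the number of feedback variables in this particular example.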
Fig. 6.5 Effect of special state assignment
Let us form the set $T' \subseteq T$ including the variables used for encoding the states $a_m \in A_0$. Only these variables create a feedback to EMB (Fig. 6.5). Now, functions $P$ are represented as
\[
P = P(T', X).
\tag{6.12}
\]
Let us point out that it makes sense to transform codes $K(a_m)$ into $K(B_i)$ only if $X(a_m) \ne \emptyset$. So, it is necessary to find a set $\Pi_{A0}$ including the classes $B_i \in \Pi_A$ such that $X(B_i) \ne \emptyset$. The value of $R_{B0}$ is determined as
\[
R_{B0} = \lceil \log_2 |\Pi_{A0}| \rceil.
\tag{6.13}
\]
So, there are $R_{B0}$ variables in the set $\mathcal{T}^0$. If $R_{B0} < R_B$, then (1) fewer LUTs are necessary in LUTerYT and (2) fewer EMB inputs are necessary, as compared with an FSM using $R_B$ variables $\tau_r \in \mathcal{T}^0$.

There is one more way of optimizing the BRLC: a transformation of GSA Γ [9]. Introducing additional vertices decreases the value of $G$. This principle is illustrated by Fig. 2.15. Using the transformation of GSA, it is possible to get $G$ from 1 to its maximum value determined by the GSA. So, it is possible to get, for example, such models as $M_1P$, $M_2P$, ..., $M_GP$ for the same GSA [3]. In this chapter, we discuss the case with $G = 1$. It resembles the classical approach of Wilkes [1, 13], where only a single logical condition could be checked during a single cycle of the control unit's operation. Let us point out that control units of computers by IBM were able to check up to three logical conditions during a cycle [7, 11, 12]. It is very simple to transform a GSA in a manner leading to $G = 1$. A structural diagram of $M_1PT$ Mealy FSM is shown in Fig. 6.6. In $M_1PT$ Mealy FSMs, the EMB implements only a single function
\[
p_1 = p_1(T, X).
\tag{6.14}
\]
This approach could be used if the following condition holds:
\[
2^{R_0+L} \le V_0.
\tag{6.15}
\]
In the case of $M_1PT$ Moore FSM, the following condition should hold:
\[
2^{R+L} \le V_0.
\tag{6.16}
\]
Fig. 6.6 Structural diagram of $M_1PT$ Mealy FSM
It is possible to diminish the required number of address inputs of EMB by encoding the logical conditions [9]. In this case, each variable $x_e \in X$ is encoded by a binary code $K(x_e)$ having $R_L$ bits:
\[
R_L = \lceil \log_2 (L + 1) \rceil.
\tag{6.17}
\]
The value 1 is added to $L$ in (6.17) to take into account unconditional transitions. Let us use variables $w_r \in W$ for encoding logical conditions. So, $|W| = R_L$. These variables should be generated by transforming state codes $K(a_m)$ into $K(x_e)$. Let us denote such FSMs as $M_1^LP$ FSMs. A structural diagram of $M_1^LPT$ Mealy FSM is shown in Fig. 6.7. In $M_1^LPT$ Mealy FSMs, the EMB generates the function
\[
p_1 = p_1(W, X).
\tag{6.18}
\]
The block LUTerTW implements functions (2.43) and
\[
W = W(T).
\tag{6.19}
\]
This approach could be used if the following condition holds:
\[
R_L < R_0.
\tag{6.20}
\]
Now, only $R_L + L$ address inputs of EMB are necessary. But this transformation increases the number of states ($M_0$) and, possibly, the value of $R_0$. In turn, it increases the number of LUTs in LUTer1–LUTerK and LUTerYT. Also, at least $R_L$ LUTs are necessary to implement functions (6.19).
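The trade-off expressed by (6.17) and (6.20) is easy to tabulate. The sketch below is only an illustration: the function names are hypothetical, and the values of $R_0$ passed to it are sample inputs rather than results taken from the text.

```python
def condition_code_bits(num_conditions: int) -> int:
    """R_L = ceil(log2(L + 1)), as in (6.17); equals the bit length of L."""
    return num_conditions.bit_length()


def emb_address_inputs(num_conditions: int) -> int:
    """Address inputs of the EMB generating p1 = p1(W, X): R_L + L."""
    return condition_code_bits(num_conditions) + num_conditions


def encoding_reduces_inputs(state_bits: int, num_conditions: int) -> bool:
    """Condition (6.20): encoding pays off only if R_L < R_0."""
    return condition_code_bits(num_conditions) < state_bits


# L = 9 logical conditions, as in GSA Gamma_13.
print(condition_code_bits(9), emb_address_inputs(9))
print(encoding_reduces_inputs(3, 9), encoding_reduces_inputs(6, 9))
```

With $L = 9$ the code $K(x_e)$ needs four bits, so the encoding helps only for FSMs whose state codes are longer than four bits.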
Fig. 6.7 Structural diagram of $M_1^LPT$ Mealy FSM
Let us discuss the synthesis methods based on the joint application of RLC and twofold state assignment. We discuss only some of the possible models of Mealy and Moore FSMs. Let us call the FSMs from Figs. 6.1 and 6.2 the basic models of FSMs with RLC and twofold state assignment. Let us start with the synthesis of basic models.
6.2 Synthesis of Basic Model of Mealy FSM

A structural diagram of MPT Mealy FSM is shown in Fig. 6.1. We propose a method of synthesis for Mealy FSM with the basic model. It includes the following steps:
1. Finding the set $A$ for initial GSA Γ.
2. Finding the set of additional variables $p_g \in P$.
3. Executing the replacement of logical conditions.
4. Transformation of initial GSA Γ.
5. Finding the partition $\Pi_B = \{A^1, ..., A^K\}$ based on RLC.
6. Executing the twofold state assignment.
7. Creating tables ST1–STK based on RLC.
8. Constructing the table of EMB.
9. Constructing the systems (3.13), (3.14).
10. Constructing the table of LUTerT.
11. Implementing FSM logic circuit.

Let us discuss an example of synthesis for Mealy FSM MPT(Γ13). The GSA Γ13 is shown in Fig. 6.8. The following sets could be found from GSA Γ13: $A = \{a_1, ..., a_7\}$ with $M_0 = 7$, $X = \{x_1, ..., x_9\}$ with $L = 9$ and $Y = \{y_1, ..., y_6\}$ with $N = 6$. Using (1.15) gives $R_0 = 3$. Therefore, there are the sets $T = \{T_1, T_2, T_3\}$ and $\Phi = \{D_1, D_2, D_3\}$. There are the following sets $X(a_m)$ in the discussed case: $X(a_1) = \emptyset$, $X(a_2) = \{x_1, x_2\}$, $X(a_3) = \{x_3\}$, $X(a_4) = \{x_4, x_5\}$, $X(a_5) = \{x_6\}$, $X(a_6) = \{x_7, x_8\}$, $X(a_7) = \{x_9\}$. Using (2.10) gives $G = 2$. So, there is the set $P = \{p_1, p_2\}$. Let us distribute the logical conditions as shown in Table 6.1. The distribution is based on the following rule: if $|X(a_m)| = 1$, then $x_e \in X(a_m)$ is replaced by $p_1 \in P$.

Fig. 6.8 Initial GSA Γ13
Table 6.1 Distribution of logical conditions for Γ13

| $p_g \backslash a_m$ | a1 | a2 | a3 | a4 | a5 | a6 | a7 |
|---|---|---|---|---|---|---|---|
| p1 | − | x1 | x3 | x4 | x6 | x7 | x9 |
| p2 | − | x2 | − | x5 | − | x8 | − |
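The distribution rule behind Table 6.1 can be sketched in a few lines. The code below is a minimal illustration under the assumption that conditions are assigned to $p_1, p_2, ...$ in increasing order of their indices. The names `X_OF` and `distribute` are hypothetical, and the input sets are the sets $X(a_m)$ of GSA Γ13.

```python
# Sets X(a_m) of GSA Gamma_13.
X_OF = {
    "a1": [],
    "a2": ["x1", "x2"],
    "a3": ["x3"],
    "a4": ["x4", "x5"],
    "a5": ["x6"],
    "a6": ["x7", "x8"],
    "a7": ["x9"],
}


def distribute(x_of: dict):
    """Return G and, per state, the condition replaced by p1, p2, ... (None = not used)."""
    g = max(len(conditions) for conditions in x_of.values())
    table = {}
    for state, conditions in x_of.items():
        # Conditions are ordered by index, so the smallest index goes to p1.
        ordered = sorted(conditions, key=lambda name: int(name[1:]))
        table[state] = ordered + [None] * (g - len(ordered))
    return g, table


g, table = distribute(X_OF)
for state, row in table.items():
    print(state, row)
```

Each printed row corresponds to one column of Table 6.1, with `None` standing for the sign "−".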
Let us transform the initial GSA Γ13. The transformation is reduced to the replacement of variables $x_e \in X$ by variables $p_g \in P$, which are written in the conditional vertices of GSA Γ13 (Fig. 6.9). If we use LUTs with $S = 5$, then there is only a single block $A^1 = A$ in the partition $\Pi_B$. So, let us use LUTs with $S = 4$. The partition $\Pi_B$ is constructed on the basis of the transformed GSA. In the discussed case, $\Pi_B = \{A^1, A^2\}$ with $A^1 = \{a_2, a_4, a_6\}$ and $A^2 = \{a_1, a_3, a_5, a_7\}$. As follows from Fig. 6.9, there are the sets $P^1 = \{p_1, p_2\}$, $Y^1 = \{y_1, y_2, y_3, y_5\}$, $P^2 = \{p_1\}$, $Y^2 = Y$.

Let us use an EMB having $S_A = 12$ and $t_F = 2$. So, the following relations hold for the EMB and FSM MPT(Γ13): $S_A = R_0 + L$, $t_F = G$. It is therefore possible to implement the table of RLC using this EMB without any special approach to state assignment. Because of this, let us execute the diminishing state assignment. There are the sets $A(A^1) = \{a_3, a_5, a_6, a_7\}$ and $A(A^2) = \{a_1, ..., a_4, a_6, a_7\}$. The state codes are shown in Fig. 6.10. Using (3.6) gives $R_1 = 2$ and $R_2 = 3$. So, $R_A = 5$. Let $\mathcal{T}^1 = \{\tau_1, \tau_2\}$ and $\mathcal{T}^2 = \{\tau_3, \tau_4, \tau_5\}$. It gives the set $\mathcal{T} = \{\tau_1, ..., \tau_5\}$. Let us encode the states $a_m \in A^k$ in the following way: $C(a_1) = 001$, $C(a_2) = 01$, $C(a_3) = 010$, $C(a_4) = 10$, $C(a_5) = 100$, $C(a_6) = 11$, $C(a_7) = 101$.

Now, it is possible to construct tables ST1 and ST2. A table ST$_k$ has the following columns: $a_m$, $C(a_m)$, $a_s$, $K(a_s)$, $P_h^k$, $Y_h^k$, $\Phi_h^k$, $h$. The only difference between a table ST$_k$ for PT and MPT Mealy FSMs is the replacement of the column $X_h^k$ by the column $P_h^k$. The table ST1 is represented by Table 6.2 and the table ST2 by Table 6.3. Let us find the systems (6.3), (6.4). The following systems could be derived from Table 6.2 (after minimization):
\[
D_1^1 = \tau_1 \lor \tau_2; \qquad
D_2^1 = \tau_1 \tau_2; \qquad
D_3^1 = \tau_1 \bar{\tau}_2 p_1 \lor \tau_1 \tau_2 \bar{p}_1.
\tag{6.21}
\]
\[
\begin{aligned}
y_1^1 &= \tau_1 \tau_2 \bar{p}_1 p_2; \\
y_2^1 &= \bar{\tau}_1 \tau_2 p_1 \bar{p}_2 \lor \tau_1 \bar{\tau}_2 p_1 \bar{p}_2 \lor \tau_1 \tau_2 \bar{p}_1 p_2; \\
y_3^1 &= \bar{\tau}_1 \tau_2 p_2 \lor \bar{\tau}_1 \tau_2 \bar{p}_1 \lor \tau_1 \bar{\tau}_2 p_2 \lor \tau_1 \bar{\tau}_2 \bar{p}_1 \lor \tau_1 \tau_2 \bar{p}_2 \lor \tau_1 \tau_2 p_1; \\
y_4^1 &= \bar{\tau}_1 \tau_2 p_1 \bar{p}_2 \lor \tau_1 \bar{\tau}_2 p_1 \bar{p}_2 \lor \tau_1 \tau_2 \bar{p}_1 \bar{p}_2; \\
y_5^1 &= \bar{\tau}_1 \tau_2 \bar{p}_1 \lor \tau_1 \bar{\tau}_2 \bar{p}_1 \lor \tau_1 \tau_2 p_1; \\
y_6^1 &= \tau_1 \bar{\tau}_2 p_1 p_2.
\end{aligned}
\tag{6.22}
\]
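A system such as (6.21) can be checked mechanically against the table it was derived from. The sketch below encodes the rows of Table 6.2 (current code $C(a_m)$, values of $p_1 p_2$, next code $K(a_s)$) and verifies that the three functions of (6.21) reproduce every next-state code. The data layout and the function names are assumptions made for this illustration only.

```python
from itertools import product

# Rows of Table 6.2: (C(a_m) in tau1 tau2, required (p1, p2), K(a_s)).
# None marks a variable that does not appear in the row's conjunction.
ST1 = [
    ("01", (1, 1), "100"), ("01", (1, 0), "100"), ("01", (0, None), "100"),
    ("10", (1, 1), "101"), ("10", (1, 0), "101"), ("10", (0, None), "100"),
    ("11", (1, None), "110"), ("11", (0, 1), "111"), ("11", (0, 0), "111"),
]


def d_functions(t1: int, t2: int, p1: int, p2: int) -> str:
    """Evaluate D1^1, D2^1, D3^1 from (6.21) and return them as a code string."""
    d1 = t1 or t2
    d2 = t1 and t2
    d3 = (t1 and not t2 and p1) or (t1 and t2 and not p1)
    return "".join(str(int(bool(v))) for v in (d1, d2, d3))


def system_matches_table() -> bool:
    """True if (6.21) yields K(a_s) for every input covered by every row."""
    for code, (c1, c2), next_code in ST1:
        t1, t2 = int(code[0]), int(code[1])
        for p1, p2 in product((0, 1), repeat=2):
            if c1 is not None and p1 != c1:
                continue
            if c2 is not None and p2 != c2:
                continue
            if d_functions(t1, t2, p1, p2) != next_code:
                return False
    return True


print(system_matches_table())
```

The same pattern applies to the functions $y_n^1$ of (6.22) if the column $Y_h^1$ is added to the row tuples.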
Fig. 6.9 Transformed GSA Γ13

Fig. 6.10 Outcome of diminishing state assignment

| T3 \ T1T2 | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | a1 | a4 | a6 | a3 |
| 1 | a2 | * | a7 | a5 |
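Table 6.2 Table ST1 of Mealy FSM MPT(Γ13)

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $P_h^1$ | $Y_h^1$ | $\Phi_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|
| a2 | 01 | a3 | 100 | $p_1 p_2$ | $y_3^1$ | $D_1^1$ | 1 |
| | | a3 | 100 | $p_1 \bar{p}_2$ | $y_2^1 y_4^1$ | $D_1^1$ | 2 |
| | | a3 | 100 | $\bar{p}_1$ | $y_3^1 y_5^1$ | $D_1^1$ | 3 |
| a4 | 10 | a5 | 101 | $p_1 p_2$ | $y_3^1 y_6^1$ | $D_1^1 D_3^1$ | 4 |
| | | a5 | 101 | $p_1 \bar{p}_2$ | $y_2^1 y_4^1$ | $D_1^1 D_3^1$ | 5 |
| | | a3 | 100 | $\bar{p}_1$ | $y_3^1 y_5^1$ | $D_1^1$ | 6 |
| a6 | 11 | a6 | 110 | $p_1$ | $y_3^1 y_5^1$ | $D_1^1 D_2^1$ | 7 |
| | | a7 | 111 | $\bar{p}_1 p_2$ | $y_1^1 y_2^1$ | $D_1^1 D_2^1 D_3^1$ | 8 |
| | | a7 | 111 | $\bar{p}_1 \bar{p}_2$ | $y_3^1 y_4^1$ | $D_1^1 D_2^1 D_3^1$ | 9 |

Table 6.3 Table ST2 of Mealy FSM MPT(Γ13)

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $P_h^2$ | $Y_h^2$ | $\Phi_h^2$ | $h$ |
|---|---|---|---|---|---|---|---|
| a1 | 001 | a2 | 001 | 1 | $y_1^2 y_2^2$ | $D_3^2$ | 1 |
| a3 | 010 | a3 | 100 | $p_1$ | $y_3^2$ | $D_1^2$ | 2 |
| | | a4 | 010 | $\bar{p}_1$ | $y_1^2 y_5^2$ | $D_2^2$ | 3 |
| a5 | 100 | a6 | 110 | $p_1$ | $y_3^2 y_5^2$ | $D_1^2 D_2^2$ | 4 |
| | | a3 | 100 | $\bar{p}_1$ | $y_3^2 y_5^2$ | $D_1^2$ | 5 |
| a7 | 101 | a7 | 111 | $p_1$ | $y_1^2 y_2^2$ | $D_1^2 D_2^2 D_3^2$ | 6 |
| | | a1 | 000 | $\bar{p}_1$ | − | − | 7 |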
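The following systems could be derived from Table 6.3:
\[
\begin{aligned}
D_1^2 &= \tau_3 \bar{\tau}_5 \lor \tau_4 \bar{\tau}_5 p_1 \lor \tau_3 \tau_5 p_1; \\
D_2^2 &= \tau_4 \bar{\tau}_5 \bar{p}_1 \lor \tau_3 p_1; \\
D_3^2 &= \bar{\tau}_3 \tau_5 \lor \tau_3 \tau_5 p_1.
\end{aligned}
\tag{6.23}
\]
\[
\begin{aligned}
y_1^2 &= \bar{\tau}_3 \tau_5 \lor \tau_4 \bar{\tau}_5 \bar{p}_1 \lor \tau_3 \tau_5 p_1; & y_2^2 &= \bar{\tau}_3 \tau_5 \lor \tau_3 \tau_5 p_1; \\
y_3^2 &= \tau_4 \bar{\tau}_5 p_1 \lor \tau_3 \bar{\tau}_5; & y_4^2 &= 0; \\
y_5^2 &= \tau_4 \bar{\tau}_5 \bar{p}_1 \lor \tau_3 \bar{\tau}_5; & y_6^2 &= 0.
\end{aligned}
\tag{6.24}
\]
Systems (6.21)–(6.24) are used to construct truth tables for LUTs of LUTer1 and LUTer2. We do not discuss this step.

The table of EMB has the following columns: $K(a_m)$, $X$, $P$, $h$. There are $H_E$ rows in this table, where
\[
H_E = 2^{R_0+L}.
\tag{6.25}
\]
In the discussed case, $H_E = 4096$. Let us present the EMB by Table 6.4. We added the column $m$ to show the correspondence between $K(a_m)$ and $a_m \in A$.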
Table 6.4 Table of EMB of Mealy FSM MPT(Γ13)

| $K(a_m)$ (T1 T2 T3) | $X$ (x1 ... x9) | $P$ (p1 p2) | $h$ | $m$ |
|---|---|---|---|---|
| 000 | ××××××××× | 00 | 1 | 1 |
| 001 | 00××××××× | 00 | 2 | 2 |
| 001 | 01××××××× | 01 | 3 | 2 |
| 001 | 10××××××× | 10 | 4 | 2 |
| 001 | 11××××××× | 11 | 5 | 2 |
| 010 | ×××00×××× | 00 | 6 | 4 |
| 010 | ×××01×××× | 01 | 7 | 4 |
| 010 | ×××10×××× | 10 | 8 | 4 |
| 010 | ×××11×××× | 11 | 9 | 4 |
| 011 | ××××××××× | 00 | 10 | − |
| 100 | ××0×××××× | 00 | 11 | 3 |
| 100 | ××1×××××× | 10 | 12 | 3 |
| 101 | ×××××0××× | 00 | 13 | 5 |
| 101 | ×××××1××× | 10 | 14 | 5 |
| 110 | ××××××00× | 00 | 15 | 6 |
| 110 | ××××××01× | 01 | 16 | 6 |
| 110 | ××××××10× | 10 | 17 | 6 |
| 110 | ××××××11× | 11 | 18 | 6 |
| 111 | ××××××××0 | 00 | 19 | 7 |
| 111 | ××××××××1 | 10 | 20 | 7 |
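Table 6.4 is a compressed description of the EMB content. A possible way to expand it into the full truth table with $H_E = 4096$ rows is sketched below. The state codes follow Fig. 6.10 and the sources of $p_1$, $p_2$ follow Table 6.1, while the dictionary names and the function `build_emb` are hypothetical.

```python
from itertools import product

# State codes K(a_m) from Fig. 6.10 (code 011 is not used).
STATE_OF_CODE = {
    "000": "a1", "001": "a2", "010": "a4", "100": "a3",
    "101": "a5", "110": "a6", "111": "a7",
}

# Index e of the condition x_e replaced by (p1, p2) for each state (Table 6.1).
P_SOURCES = {
    "a1": (None, None), "a2": (1, 2), "a3": (3, None), "a4": (4, 5),
    "a5": (6, None), "a6": (7, 8), "a7": (9, None),
}


def build_emb(r0: int = 3, num_conditions: int = 9) -> dict:
    """Map every address T1 T2 T3 x1 ... x9 to the output word p1 p2."""
    emb = {}
    for bits in product("01", repeat=r0 + num_conditions):
        address = "".join(bits)
        state = STATE_OF_CODE.get(address[:r0])
        x = address[r0:]
        sources = P_SOURCES.get(state, (None, None))
        # An unused p_g (or an unused state code) is filled with 0.
        emb[address] = "".join(x[e - 1] if e is not None else "0" for e in sources)
    return emb


emb = build_emb()
print(len(emb))
print(emb["001" + "10" + "0" * 7])
```

Each row of Table 6.4 with symbols "×" corresponds to all addresses obtained by substituting 0 and 1 for these symbols.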
If $|X(a_m)| = 0$, then there are only symbols "×" in the corresponding row of Table 6.4. For example, row 1 corresponds to 512 rows of EMB. If $|X(a_m)| = 1$, then two rows represent all possible combinations of logical conditions. For example, row 19 corresponds to 256 rows of EMB. If $|X(a_m)| = 2$, then 4 rows of Table 6.4 correspond to the values of logical conditions replaced by variables $p_1, p_2 \in P$. We hope that the connection between Table 6.1, the state codes from Fig. 6.10 and Table 6.4 is transparent. Table 6.4 is the basis for constructing the bit-stream for EMB.

Equations (3.13), (3.14) are constructed in the trivial way using Eqs. (6.21)–(6.24). For example, there are the following equations: $D_1 = D_1^1 \lor D_1^2$; $y_4 = y_4^1$ and so on. The table of LUTerT is constructed in the same way as for PT Mealy FSMs. Using RLC does not change the table of LUTerT.

Obviously, it is possible to combine RLC with encoding the CMOs $Y_q \subseteq Y$. It leads to MPYT Mealy FSM (Fig. 6.11). In MPYT FSMs, LUTerk implements systems (6.3) and
\[
Z^k = Z^k(\mathcal{T}^k, P^k) \quad (k = \overline{1, K}).
\tag{6.26}
\]
Fig. 6.11 Structural diagram of MPYT Mealy FSM

Fig. 6.12 Codes of CMOs for FSM MPYT(Γ13)

| z3 \ z1z2 | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | Y1 | Y6 | Y4 | Y2 |
| 1 | Y3 | Y5 | Y8 | Y7 |
To synthesize MPYT Mealy FSM, it is necessary to add the following steps to the method discussed above:
1. Constructing and encoding of CMOs $Y_q \subseteq Y$ (after step 2).
2. Constructing the system (3.37) instead of (3.13).
3. Constructing the table of LUTerY (after step 9).

Let us discuss these peculiarities using FSM MPYT(Γ13) as an example. There are the following CMOs in the discussed case: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_3\}$, $Y_4 = \{y_2, y_4\}$, $Y_5 = \{y_3, y_5\}$, $Y_6 = \{y_1, y_5\}$, $Y_7 = \{y_3, y_6\}$, $Y_8 = \{y_3, y_4\}$. Because $Q = 8$, there is $R_Q = 3$. It gives the set $Z = \{z_1, z_2, z_3\}$. Let us encode the CMOs in a manner that optimizes the system (2.4), as shown in Fig. 6.12. Using Fig. 6.12, the following system (2.4) could be found:
\[
\begin{aligned}
y_1 &= Y_2 \lor Y_6 = z_1 \bar{z}_2 \bar{z}_3 \lor \bar{z}_1 z_2 \bar{z}_3; & y_4 &= Y_4 \lor Y_8 = z_1 z_2; \\
y_2 &= Y_2 \lor Y_4 = z_1 \bar{z}_3; & y_5 &= Y_5 \lor Y_6 = \bar{z}_1 z_2; \\
y_3 &= Y_3 \lor Y_5 \lor Y_7 \lor Y_8 = z_3; & y_6 &= Y_7 = z_1 \bar{z}_2 z_3.
\end{aligned}
\tag{6.27}
\]
The system (6.27) is used to construct truth tables for LUTs of LUTerY.
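The system (6.27) can be cross-checked against the codes of Fig. 6.12: for each code $K(Y_q)$, a function $y_n$ must be equal to 1 exactly when $y_n \in Y_q$. The sketch below performs this check. The dictionaries `CODES`, `CMOS`, `FORMULAS` and the function name are hypothetical helpers introduced only for this illustration.

```python
# Codes K(Y_q) = z1 z2 z3 read from Fig. 6.12.
CODES = {1: "000", 2: "100", 3: "001", 4: "110", 5: "011", 6: "010", 7: "101", 8: "111"}

# CMOs Y_q as sets of microoperation indices n (y_n).
CMOS = {1: set(), 2: {1, 2}, 3: {3}, 4: {2, 4}, 5: {3, 5}, 6: {1, 5}, 7: {3, 6}, 8: {3, 4}}

# Right-hand sides of (6.27) as Boolean functions of z1, z2, z3.
FORMULAS = {
    1: lambda z1, z2, z3: (z1 and not z2 and not z3) or (not z1 and z2 and not z3),
    2: lambda z1, z2, z3: z1 and not z3,
    3: lambda z1, z2, z3: z3,
    4: lambda z1, z2, z3: z1 and z2,
    5: lambda z1, z2, z3: not z1 and z2,
    6: lambda z1, z2, z3: z1 and not z2 and z3,
}


def system_matches_codes() -> bool:
    """True if every y_n of (6.27) is 1 exactly on the codes of CMOs containing y_n."""
    for q, code in CODES.items():
        z = [int(bit) for bit in code]
        for n, formula in FORMULAS.items():
            if bool(formula(*z)) != (n in CMOS[q]):
                return False
    return True


print(system_matches_codes())
```

Because all eight codes are used, the check covers the complete truth table of every function $y_n$.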
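Table 6.5 Table of LUTer1 of Mealy FSM MPYT(Γ13)

| $a_m$ | $C(a_m)$ | $a_s$ | $K(a_s)$ | $P_h^1$ | $Z_h^1$ | $\Phi_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|
| a2 | 01 | a3 | 100 | $p_1 p_2$ | $z_3^1$ | $D_1^1$ | 1 |
| | | a3 | 100 | $p_1 \bar{p}_2$ | $z_1^1 z_2^1$ | $D_1^1$ | 2 |
| | | a3 | 100 | $\bar{p}_1$ | $z_2^1 z_3^1$ | $D_1^1$ | 3 |
| a4 | 10 | a5 | 101 | $p_1 p_2$ | $z_1^1 z_3^1$ | $D_1^1 D_3^1$ | 4 |
| | | a5 | 101 | $p_1 \bar{p}_2$ | $z_1^1 z_2^1$ | $D_1^1 D_3^1$ | 5 |
| | | a3 | 100 | $\bar{p}_1$ | $z_2^1 z_3^1$ | $D_1^1$ | 6 |
| a6 | 11 | a6 | 110 | $p_1$ | $z_2^1 z_3^1$ | $D_1^1 D_2^1$ | 7 |
| | | a7 | 111 | $\bar{p}_1 p_2$ | $z_1^1$ | $D_1^1 D_2^1 D_3^1$ | 8 |
| | | a7 | 111 | $\bar{p}_1 \bar{p}_2$ | $z_1^1 z_2^1 z_3^1$ | $D_1^1 D_2^1 D_3^1$ | 9 |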
It is necessary to construct tables ST1–STK to derive the systems (6.3) and (6.26). Let us construct the table of LUTer1 (Table 6.5). To fill the column $Z_h^1$ of Table 6.5, we use the codes K(Yq) from Fig. 6.12. Obviously, Table 6.5 differs from Table 6.2 only in having the column $Z_h^1$ instead of $Y_h^1$. The following equations could be found:
\[
\begin{aligned}
z_1^1 &= \bar{\tau}_1 \tau_2 p_1 \bar{p}_2 \lor \tau_1 \bar{\tau}_2 p_1 \lor \tau_1 \tau_2 \bar{p}_1; \\
z_2^1 &= \bar{\tau}_1 \tau_2 \bar{p}_2 \lor \bar{\tau}_1 \tau_2 \bar{p}_1 \lor \tau_1 \bar{\tau}_2 \bar{p}_2 \lor \tau_1 \bar{\tau}_2 \bar{p}_1 \lor \tau_1 \tau_2 p_1 \lor \tau_1 \tau_2 \bar{p}_2; \\
z_3^1 &= \bar{\tau}_1 \tau_2 p_2 \lor \bar{\tau}_1 \tau_2 \bar{p}_1 \lor \tau_1 \bar{\tau}_2 p_2 \lor \tau_1 \bar{\tau}_2 \bar{p}_1 \lor \tau_1 \tau_2 p_1 \lor \tau_1 \tau_2 \bar{p}_2.
\end{aligned}
\tag{6.28}
\]
All other steps of the design method are the same as discussed before. Now, let us discuss how to design the circuits for MPT Moore FSMs.
6.3 Synthesis of Basic Model of Moore FSM

A structural diagram of MPT Moore FSM is shown in Fig. 6.2. Let us discuss the proposed method of synthesis for $MPT_C$ Moore FSM. It includes the following steps:
1. Finding the set $A$ for initial GSA Γ.
2. Constructing the partition $\Pi_A = \{B_1, ..., B_I\}$.
3. Finding the set of additional variables $p_g \in P$.
4. Executing the RLC.
5. Transformation of initial GSA Γ.
6. Finding the partition $\Pi_C = \{B^1, ..., B^K\}$ based on RLC.
7. Encoding of states by codes $K(a_m)$.
8. Encoding of classes $B_i \in B^k$ by codes $C(B_i)$.
9. Constructing tables ST1–STK based on RLC.
10. Constructing the table of EMB.
11. Constructing the system (3.14) for LUTerT.
12. Constructing the table of LUTerYT.
13. Implementing FSM logic circuit.

Fig. 6.13 Initial GSA Γ14

Let us discuss an example of synthesis for Moore FSM $MPT_C$(Γ14). The GSA Γ14 is shown in Fig. 6.13. We can find the following sets from GSA Γ14: $A = \{a_1, ..., a_{15}\}$, $X = \{x_1, ..., x_8\}$, $Y = \{y_1, ..., y_7\}$. Using the definition of PES, we can find the partition $\Pi_A = \{B_1, ..., B_9\}$ with $B_1 = \{a_1\}$, $B_2 = \{a_2, a_3, a_4\}$, $B_3 = \{a_5, a_6\}$, $B_4 = \{a_7, a_8, a_{10}\}$, $B_5 = \{a_9\}$, $B_6 = \{a_{11}, a_{12}\}$, $B_7 = \{a_{13}\}$, $B_8 = \{a_{14}\}$, $B_9 = \{a_{15}\}$. Analysis of GSA Γ14 shows that $G = 2$. So, there is the set $P = \{p_1, p_2\}$. Let us replace the logical conditions (Table 6.6).

Table 6.6 Replacement of logical conditions for Moore FSM $MPT_C$(Γ14)

| $p_g \backslash B_i$ | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 |
|---|---|---|---|---|---|---|---|---|---|
| p1 | x1 | x3 | x4 | x6 | − | x7 | − | x8 | − |
| p2 | x2 | − | x5 | − | − | − | − | − | − |

Table 6.6 shows the RLC for the classes $B_i \in \Pi_A$. If $a_m \in B_i$, then the RLC for $a_m$ is the same as for $B_i$. Now, we can transform the initial GSA Γ14. To do it, we should replace the logical conditions $x_e \in X$ in each conditional vertex on the basis of Table 6.6. The transformed GSA Γ14 is shown in Fig. 6.14.

Let us use LUTs with $S = 5$ and find the partition $\Pi_C$. In the discussed case, it is the partition $\Pi_C = \{B^1, B^2\}$, where $B^1 = \{B_1, ..., B_4, B_6, B_7, B_8\}$ and $B^2 = \{B_5, B_9\}$. It gives the following sets: $P^1 = \{p_1, p_2\}$, $A(B^1) = \{a_2, ..., a_9, a_{11}, ..., a_{14}\}$, $P^2 = \emptyset$ and $A(B^2) = \{a_1, a_{10}\}$. Let us execute the diminishing encoding of states $a_m \in A(B^k)$. The outcome is shown in Fig. 6.15. We used $R = 4$ internal variables $T_r \in T$ for the state encoding. Using (3.6) gives $R_1 = 3$ and $R_2 = 2$. So, there are the following sets: $\mathcal{T}^1 = \{\tau_1, \tau_2, \tau_3\}$, $\mathcal{T}^2 = \{\tau_4, \tau_5\}$ and $\mathcal{T} = \{\tau_1, ..., \tau_5\}$. Let us encode the classes $B_i \in B^k$ in the following manner: $C(B_1) = 001$, $C(B_2) = 010$, $C(B_3) = 011$, $C(B_4) = 100$, $C(B_6) = 101$, $C(B_7) = 110$, $C(B_8) = 111$, $C(B_5) = 01$ and $C(B_9) = 10$.

Now it is possible to construct the tables of LUTer1 (Table 6.7) and LUTer2 (Table 6.8). There are 4 LUTs in LUTer1. Treating the unused code 11 of $\tau_4 \tau_5$ as an insignificant assignment, we can find that $D_4^2 = \tau_5$. So, there are no LUTs in LUTer2. The table of EMB is constructed in the same way as for MPT Mealy FSM. The system (3.14) is the following one:
\[
D_1 = D_1^1; \qquad D_2 = D_2^1; \qquad D_3 = D_3^1; \qquad D_4 = D_4^1 \lor D_4^2.
\tag{6.29}
\]
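The column $\Phi_h^k$ of Tables 6.7 and 6.8 follows directly from the codes $K(a_s)$: with D flip-flops, a function $D_r$ is listed in a row if and only if bit $r$ of the next-state code is 1. A minimal sketch of this rule is given below; the function name `excitation_inputs` is hypothetical.

```python
def excitation_inputs(next_code: str) -> list:
    """List the functions D_r that must equal 1 to load the code K(a_s) into D flip-flops."""
    return [f"D{r}" for r, bit in enumerate(next_code, start=1) if bit == "1"]


# Codes K(a_s) of three transition targets used in the example with GSA Gamma_14.
for code in ("0101", "1011", "0000"):
    print(code, excitation_inputs(code))
```

The last call illustrates why the row of Table 6.8 leading to the state with code 0000 has no excitation functions.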
Analysis of (6.29) shows that there is only a single LUT in the LUTerT of Moore FSM $MPT_C$(Γ14). So, there are 5 LUTs in blocks LUTer1, LUTer2 and LUTerT in the discussed case. The table of LUTerYT is constructed in the trivial way. We do not discuss this step.

Now let us discuss the model of MT PY YTC Moore FSM shown in Fig. 6.16. The subscript "T" means that variables $\tau_r \in \mathcal{T}$ are used for RLC. It is possible if the following condition holds:
\[
2^{L+R_A} \cdot G \le V_0.
\tag{6.30}
\]
Fig. 6.14 Transformed GSA Γ14

Fig. 6.15 State codes of Moore FSM $MPT_C$(Γ14)

| T3T4 \ T1T2 | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 00 | a1 | a3 | a12 | a7 |
| 01 | a10 | a4 | a13 | a8 |
| 11 | * | a6 | a15 | a11 |
| 10 | a2 | a5 | a14 | a9 |

Table 6.7 Table of LUTer1 of Moore FSM $MPT_C$(Γ14)

| $B_i$ | $C(B_i)$ | $a_s$ | $K(a_s)$ | $P_h^1$ | $\Phi_h^1$ | $h$ |
|---|---|---|---|---|---|---|
| B1 | 001 | a2 | 0010 | $p_1$ | $D_3^1$ | 1 |
| | | a3 | 0100 | $\bar{p}_1 p_2$ | $D_2^1$ | 2 |
| | | a4 | 0101 | $\bar{p}_1 \bar{p}_2$ | $D_2^1 D_4^1$ | 3 |
| B2 | 010 | a5 | 0110 | $p_1$ | $D_2^1 D_3^1$ | 4 |
| | | a6 | 0111 | $\bar{p}_1$ | $D_2^1 D_3^1 D_4^1$ | 5 |
| B3 | 011 | a7 | 1000 | $p_1 p_2$ | $D_1^1$ | 6 |
| | | a8 | 1001 | $p_1 \bar{p}_2$ | $D_1^1 D_4^1$ | 7 |
| | | a9 | 1010 | $\bar{p}_1$ | $D_1^1 D_3^1$ | 8 |
| B4 | 100 | a11 | 1011 | $p_1$ | $D_1^1 D_3^1 D_4^1$ | 9 |
| | | a12 | 1100 | $\bar{p}_1$ | $D_1^1 D_2^1$ | 10 |
| B6 | 101 | a11 | 1011 | $p_1$ | $D_1^1 D_3^1 D_4^1$ | 11 |
| | | a13 | 1101 | $\bar{p}_1$ | $D_1^1 D_2^1 D_4^1$ | 12 |
| B7 | 110 | a14 | 1110 | 1 | $D_1^1 D_2^1 D_3^1$ | 13 |
| B8 | 111 | a14 | 1110 | $p_1$ | $D_1^1 D_2^1 D_3^1$ | 14 |
| | | a15 | 1111 | $\bar{p}_1$ | $D_1^1 D_2^1 D_3^1 D_4^1$ | 15 |

Table 6.8 Table of LUTer2 of Moore FSM $MPT_C$(Γ14)

| $B_i$ | $C(B_i)$ | $a_s$ | $K(a_s)$ | $P_h^2$ | $\Phi_h^2$ | $h$ |
|---|---|---|---|---|---|---|
| B5 | 01 | a10 | 0001 | 1 | $D_4^2$ | 1 |
| B9 | 10 | a1 | 0000 | 1 | − | 2 |

Fig. 6.16 Structural diagram of MT PY YTC Moore FSM
The following methods of structural decomposition are used in MT PY YTC Moore FSM:
1. The replacement of logical conditions. It is executed by the EMB generating the functions
\[
P = P(\mathcal{T}, X).
\tag{6.31}
\]
2. Encoding of collections of microoperations $Y_q \subseteq Y$.
3. Transformation of CMOs and identifiers into classes of PES $B_i \in B^k$.
4. Representation of functions $Z = Z(\mathcal{T}, P)$ and $V = V(\mathcal{T}, P)$ as functions $Z^k = Z^k(\mathcal{T}^k, P^k)$ and $V^k = V^k(\mathcal{T}^k, P^k)$.

The LUTerk generates functions (6.26) and
\[
V^k = V^k(\mathcal{T}^k, P^k).
\tag{6.32}
\]
The first 6 steps in the synthesis of Moore FSM MT PY YTC are the same as in the synthesis of $MPT_C$ Moore FSM. The following steps are:
7. Constructing the CMOs $Y_q \subseteq Y$.
8. Constructing the set of identifiers $I$.
9. Encoding of CMOs and identifiers.
10. Representing classes $B_i \in \Pi_A$ by pairs $\langle Y_q, I_m \rangle$.
11. Encoding of classes $B_i \in B^k$ by codes $C(B_i)$.
12. Constructing the tables ST1–STK.
13. Constructing the table of EMB.
14. Constructing the systems (3.37) and (4.12).
15. Constructing the table of LUTerY.
16. Constructing the table of LUTerT.
17. Implementing FSM logic circuit.

Let us discuss an example of synthesis for Moore FSM MT PY YTC(Γ14). Steps 1–6 have already been executed, so we start from step 7. There are the following CMOs in the discussed case: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_4\}$, $Y_4 = \{y_2, y_3\}$, $Y_5 = \{y_1, y_3\}$, $Y_6 = \{y_5\}$, $Y_7 = \{y_1, y_6\}$, $Y_8 = \{y_2, y_7\}$, $Y_9 = \{y_4, y_6\}$, $Y_{10} = \{y_4, y_5\}$. So, there is $Q = 10$. Using (2.24) gives $R_Q = 4$ and the set $Z = \{z_1, ..., z_4\}$. The CMO $Y_2$ is generated in states $a_2 \in B_2$, $a_{11} \in B_6$ and $a_{15} \in B_9$. It gives the set $I = \{I_1, I_2, I_3\}$. So, there is $M_I = 3$. Using (4.8) gives $R_I = 2$ and $V = \{v_1, v_2\}$.

Encoding of CMOs should optimize the circuit of LUTerY. So, it is necessary to minimize the number of terms and literals in the system (2.4). In the discussed case, the following system could be obtained:
\[
\begin{aligned}
y_1 &= Y_2 \lor Y_5 \lor Y_7; & y_2 &= Y_2 \lor Y_4 \lor Y_8; & y_3 &= Y_4 \lor Y_5; \\
y_4 &= Y_3 \lor Y_9 \lor Y_{10}; & y_5 &= Y_6 \lor Y_{10}; & y_6 &= Y_7 \lor Y_9; \\
y_7 &= Y_8.
\end{aligned}
\tag{6.33}
\]
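The system (6.33) is obtained mechanically from the CMOs: a microoperation $y_n$ is the disjunction of all CMOs $Y_q$ that contain it. The sketch below reproduces this derivation. It assumes $Y_{10} = \{y_4, y_5\}$, the reading that is consistent with (6.33), and the names `CMOS` and `microoperation_system` are hypothetical.

```python
# CMOs Y_q of the example with GSA Gamma_14 as sets of microoperation indices n.
# Y10 = {y4, y5} is assumed, in agreement with y4 and y5 in (6.33).
CMOS = {
    1: set(), 2: {1, 2}, 3: {4}, 4: {2, 3}, 5: {1, 3},
    6: {5}, 7: {1, 6}, 8: {2, 7}, 9: {4, 6}, 10: {4, 5},
}


def microoperation_system(cmos: dict, num_microoperations: int) -> dict:
    """For each y_n, list the indices q of the CMOs Y_q whose disjunction forms y_n."""
    return {
        n: sorted(q for q, members in cmos.items() if n in members)
        for n in range(1, num_microoperations + 1)
    }


system = microoperation_system(CMOS, 7)
for n, terms in system.items():
    print(f"y{n} =", " v ".join(f"Y{q}" for q in terms))
```

The number of code bits is $R_Q = \lceil \log_2 10 \rceil = 4$, which agrees with the set $Z = \{z_1, ..., z_4\}$.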
Fig. 6.17 Encoding of CMOs for Moore FSM MT PY YTC(Γ14)

Using the approach [10] gives the codes K(Yq) shown in Fig. 6.17. Using the codes from Fig. 6.17 transforms the system (6.33) into the following system:
\[
\begin{aligned}
y_1 &= \bar{z}_1 z_2; & y_2 &= \bar{z}_3 z_4; & y_3 &= \bar{z}_1 \bar{z}_2 z_4 \lor z_2 \bar{z}_3 \bar{z}_4; \\
y_4 &= z_3 \bar{z}_4; & y_5 &= z_1 \bar{z}_2 \bar{z}_4; & y_6 &= z_2 z_3; \\
y_7 &= z_1 z_4.
\end{aligned}
\tag{6.34}
\]
Let us encode the identifiers in the trivial way: $K(I_1) = 00$, ..., $K(I_3) = 10$. There is the following system where classes $B_i \in \Pi_A$ are represented by pairs $\langle Y_q, I_m \rangle$:
\[
\begin{aligned}
B_1 &\Rightarrow \langle Y_1, \emptyset \rangle; & B_2 &\Rightarrow \langle Y_2, I_1 \rangle \lor \langle Y_3, I_1 \rangle \lor \langle Y_4, I_1 \rangle; \\
B_3 &\Rightarrow \langle Y_5, I_1 \rangle \lor \langle Y_6, \emptyset \rangle; & B_4 &\Rightarrow \langle Y_7, \emptyset \rangle \lor \langle Y_3, I_2 \rangle \lor \langle Y_9, \emptyset \rangle; \\
B_5 &\Rightarrow \langle Y_5, I_2 \rangle; & B_6 &\Rightarrow \langle Y_2, I_2 \rangle \lor \langle Y_9, \emptyset \rangle; \\
B_7 &\Rightarrow \langle Y_4, I_2 \rangle; & B_8 &\Rightarrow \langle Y_{10}, \emptyset \rangle; \\
B_9 &\Rightarrow \langle Y_2, I_3 \rangle.
\end{aligned}
\tag{6.35}
\]
Let us use the same codes $C(B_i)$ as in the previous example. Now, it is possible to construct tables ST1 and ST2. Each pair from (6.35) determines a single state $a_s \in A$. So, it is necessary to replace the column $K(a_s)$ by columns $Y_q$, $K(Y_q)$, $I_m$, $K(I_m)$. The column $\Phi_h^k$ is replaced by columns $Z_h^k$ and $V_h^k$. For example, Table 6.9 represents LUTer1 of MT PY YTC(Γ14). It has 15 rows, as does its counterpart from the previous example. We use codes K(Yq) from Fig. 6.17. The block LUTer2 is represented by Table 6.10. Analysis of Table 6.10 shows that $z_1^2 = z_4^2 = \tau_5$. It means that there are no LUTs in LUTer2. Also, there are only two LUTs in LUTerVZ. They implement functions $z_1$ and $z_4$. Functions $z_2$, $z_3$, $v_1$, $v_2$ are generated directly by LUTer1.

There are the following columns in the table of EMB: $C(B_i)$, $X$, $P$, $h$, $i$. It is constructed in the same way as Table 6.4. Let us point out that functions (6.7) do not depend on $\tau_4$, $\tau_5$. So, there are only variables $\tau_1$, $\tau_2$, $\tau_3$ in the table of EMB. The system (6.34) is used to create truth tables for LUTs of LUTerY. The table of LUTerT is constructed in a trivial way. Analysis of Table 6.10 shows that no functions $z_r \in Z$ and $v_r \in V$ depend on $\tau_4$. So, it is necessary to implement truth tables only for functions $\tau_1$, $\tau_2$, $\tau_3$, $\tau_5$. We do not discuss this step for the given example.
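Table 6.9 Table of LUTer1 of Moore FSM MT PY YTC(Γ14)

| $B_i$ | $C(B_i)$ | $a_s$ | $Y_q$ | $K(Y_q)$ | $I_m$ | $K(I_m)$ | $P_h^1$ | $Z_h^1$ | $V_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|---|---|---|
| B1 | 001 | a2 | Y2 | 0101 | I1 | 00 | $p_1$ | $z_2^1 z_4^1$ | − | 1 |
| | | a3 | Y3 | 0010 | I1 | 00 | $\bar{p}_1 p_2$ | $z_3^1$ | − | 2 |
| | | a4 | Y4 | 0001 | I1 | 00 | $\bar{p}_1 \bar{p}_2$ | $z_4^1$ | − | 3 |
| B2 | 010 | a5 | Y5 | 0100 | I1 | 00 | $p_1$ | $z_2^1$ | − | 4 |
| | | a6 | Y6 | 1000 | − | − | $\bar{p}_1$ | $z_1^1$ | − | 5 |
| B3 | 011 | a7 | Y7 | 0111 | − | − | $p_1 p_2$ | $z_2^1 z_3^1 z_4^1$ | − | 6 |
| | | a8 | Y3 | 0010 | I2 | 01 | $p_1 \bar{p}_2$ | $z_3^1$ | $v_2^1$ | 7 |
| | | a9 | Y5 | 0100 | I2 | 01 | $\bar{p}_1$ | $z_2^1$ | $v_2^1$ | 8 |
| B4 | 100 | a11 | Y2 | 0101 | I2 | 01 | $p_1$ | $z_2^1 z_4^1$ | $v_2^1$ | 9 |
| | | a12 | Y9 | 1110 | − | − | $\bar{p}_1$ | $z_1^1 z_2^1 z_3^1$ | − | 10 |
| B6 | 101 | a11 | Y2 | 0101 | I2 | 01 | $p_1$ | $z_2^1 z_4^1$ | $v_2^1$ | 11 |
| | | a13 | Y4 | 0001 | I2 | 01 | $\bar{p}_1$ | $z_4^1$ | $v_2^1$ | 12 |
| B7 | 110 | a14 | Y10 | 1010 | − | − | 1 | $z_1^1 z_3^1$ | − | 13 |
| B8 | 111 | a14 | Y10 | 1010 | − | − | $p_1$ | $z_1^1 z_3^1$ | − | 14 |
| | | a15 | Y2 | 0101 | I3 | 10 | $\bar{p}_1$ | $z_2^1 z_4^1$ | $v_1^1$ | 15 |

Table 6.10 Table of LUTer2 of Moore FSM MT PY YTC(Γ14)

| $B_i$ | $C(B_i)$ | $a_s$ | $Y_q$ | $K(Y_q)$ | $I_m$ | $K(I_m)$ | $P_h^2$ | $Z_h^2$ | $V_h^2$ | $h$ |
|---|---|---|---|---|---|---|---|---|---|---|
| B5 | 01 | a10 | Y8 | 1001 | I1 | 00 | 1 | $z_1^2 z_4^2$ | − | 1 |
| B9 | 10 | a1 | Y1 | 0000 | − | − | 1 | − | − | 2 |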
6.4 Synthesis Based on Transformation of GSA

In this section, we discuss the case when G = 1 is obtained by transforming the initial GSA Γ. The transformation increases the number of states [9]. If $L_m$ variables determine the transitions from a state $a_m \in A$, then $L_m - 1$ states are added to the set A. Let ΔM be the number of new states. The value of ΔM is determined as:

$$\Delta M = \sum_{m=1}^{M_0} C_m (L_m - 1). \tag{6.36}$$

The Boolean variable $C_m = 1$ iff $L_m > 0$. The formula (6.36) is valid for Mealy FSMs. For Moore FSMs, it is necessary to replace $M_0$ by $M$. Obviously, increasing the number of states could increase the number of state variables. Let $R_T$ variables be necessary to encode the states after the transformation:

$$R_T = \lceil \log_2 (M_0 + \Delta M) \rceil. \tag{6.37}$$
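Formulas (6.36) and (6.37) can be checked with a few lines of Python. This is only a sketch: the function names and the list of values $L_m$ are hypothetical and are not taken from the book, and the ceiling of the logarithm is computed with integer arithmetic.

```python
def added_states(transition_vars):
    """Delta M from (6.36): a state with L_m > 0 variables adds L_m - 1 states."""
    return sum(l_m - 1 for l_m in transition_vars if l_m > 0)


def state_code_bits(states_before, delta_m):
    """R_T from (6.37): ceil(log2(M0 + Delta M)), computed with integers."""
    return (states_before + delta_m - 1).bit_length()


# Hypothetical values of L_m for seven states (an assumption for illustration).
l_values = [0, 2, 1, 2, 1, 2, 1]
delta_m = added_states(l_values)
r_t = state_code_bits(len(l_values), delta_m)
print(delta_m, r_t)
```

With these assumed values, three states are added to seven initial ones, so four state variables are needed after the transformation.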
Fig. 6.18 Transformed GSA Γ13
The replacement of $M_0$ by $M$ gives the formula for the case of Moore FSMs. Let us discuss the method of synthesis for $M_1PT$ Mealy FSM. Its structural diagram is shown in Fig. 6.6. This method is practically the same as for MPT Mealy FSM. The only difference is the additional step of transforming the initial GSA Γ. It could be treated as step 0 of the method. Next, all other steps discussed in Sect. 6.2 should be executed, but now step 1 is “Finding the set A for the transformed GSA Γ”. Let us discuss an example of synthesis for $M_1PT$ Mealy FSM based on GSA Γ13. The transformed GSA Γ13 is shown in Fig. 6.18.
Table 6.11 Distribution of logical conditions for Mealy FSM M1PT(Γ13)
am   a1   a2   a3   a4   a5   a6   a7   a8   a9   a10
p1   −    x1   x3   x4   x6   x7   x9   x2   x5   x8
Fig. 6.19 State codes for Mealy FSM M1 P T (Γ13 )
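There are three states added to the set A. Now there is $A = \{a_1, \ldots, a_{10}\}$ and $M_0 = 10$. Using (6.37) gives $R_T = 4$. So, there are the sets $T = \{T_1, \ldots, T_4\}$ and $\Phi = \{D_1, \ldots, D_4\}$. Obviously, there is $P = \{p_1\}$. The result of RLC is shown in Table 6.11. The only difference between the GSA (Fig. 6.17) and the transformed GSA is the replacement of the logical conditions $x_e \in X$ by the variable $p_1 \in P$. We do not show this result. Let us use LUTs having S = 4. The following partition $\Pi_B$ could be found: $\Pi_B = \{A^1, A^2\}$ where $A^1 = \{a_2, \ldots, a_8\}$ and $A^2 = \{a_1, a_9, a_{10}\}$. It gives $R_1 = 3$, $R_2 = 2$ and $R_A = 5$. Let it be $\mathcal{T}^1 = \{\tau_1, \tau_2, \tau_3\}$ and $\mathcal{T}^2 = \{\tau_4, \tau_5\}$. Also, there are $A(A^1) = \{a_1, a_3, a_4, a_6, a_7, a_8, a_9, a_{10}\}$ and $A(A^2) = \{a_2, a_5, a_7\}$. Let us encode the states $a_m \in A$ as shown in Fig. 6.19. Now, let us encode the states $a_m \in A^k$ by the codes $C(a_m)$. Let us use the following codes: $C(a_1) = 01$, $C(a_9) = 10$, $C(a_{10}) = 11$, $C(a_2) = 001$, ..., $C(a_8) = 111$. Let us construct the tables of LUTerk. Tables 6.12 and 6.13 represent LUTer1 and LUTer2, respectively. Using Tables 6.12 and 6.13 allows deriving the following SBFs:

$$\begin{aligned}
D_1^1 &= 0; \\
D_2^1 &= \bar{\tau}_1 \tau_3 p_1 \lor \tau_1 \bar{\tau}_2 \bar{\tau}_3 p_1 \lor \tau_1 \bar{\tau}_2 \tau_3; \\
D_3^1 &= \bar{\tau}_2 \tau_3 \bar{p}_1 \lor \bar{\tau}_1 \tau_2 \lor \tau_1 \bar{\tau}_2 \bar{p}_1 \lor \tau_2 \tau_3; \\
D_4^1 &= \bar{\tau}_1 \bar{\tau}_2 \tau_3 p_1 \lor \bar{\tau}_1 \tau_2 \bar{\tau}_3 \bar{p}_1 \lor \tau_1 \bar{\tau}_2 \tau_3 \bar{p}_1 \lor \tau_1 \tau_2 \bar{\tau}_3 p_1.
\end{aligned} \tag{6.38}$$

$$\begin{aligned}
y_1^1 &= \bar{\tau}_1 \tau_2 \bar{\tau}_3 \bar{p}_1 \lor \tau_1 \tau_2 \bar{\tau}_3 p_1; &
y_2^1 &= \tau_1 \tau_2 \bar{\tau}_3 p_1 \lor \tau_1 \tau_2 \tau_3 \bar{p}_1; \\
y_3^1 &= \bar{\tau}_1 \tau_3 \bar{p}_1 \lor \tau_1 \bar{\tau}_2 \bar{\tau}_3 \lor \tau_1 \tau_2 \tau_3 p_1; &
y_4^1 &= \tau_1 \tau_2 \tau_3 \bar{p}_1; \\
y_5^1 &= \bar{\tau}_1 \tau_3 \bar{p}_1 \lor \tau_1 \bar{\tau}_2 \bar{\tau}_3; &
y_6^1 &= 0.
\end{aligned} \tag{6.39}$$

$$\begin{aligned}
D_1^2 &= \bar{\tau}_4 \tau_5 \lor \tau_4 \bar{\tau}_5; &
D_2^2 &= 0; \\
D_3^2 &= \tau_4 \tau_5; &
D_4^2 &= \tau_4 \bar{\tau}_5.
\end{aligned} \tag{6.40}$$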
Table 6.12 Table of LUTer1 of Mealy FSM M1 PTC (Γ13 ) am C(am ) aS K (a S ) Ph a2
001
a3
010
a4
011
a5
100
a6
101
a7
110
a8
111
a8 a3 a3 a4 a9 a3 a6 a3 a6 a10 a7 a1 a3 a3
0101 0010 0010 0011 0110 0011 0100 0010 0100 0111 0001 0000 0010 0010
p1 p¯1 p1 p¯1 p1 p¯1 p1 p¯1 p1 p¯1 p1 p¯1 p1 p¯1
Table 6.13 Table of LUTer2 of Mealy FSM M1 PTC (Γ13 ) am C(am ) aS K (a S ) Ph a1 a9
01 10
a10
11
a2 a5 a5 a7 a7
1000 1001 1001 0010 0010
1 p1 p¯1 p1 p¯1
Yh1
Φh1
h
− y31 y51 y31 y11 y51 − y31 y51 y31 y51 y31 y51 y31 y51 − y11 y21 − y31 y21 y41
D21 D41 D31 D31 D31 D41 D21 D31 D31 D21 D31 D21 D21 D31 D41 D41 − D31 D31
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Yh2
Φh2
h
y12 y22 y32 y62 y22 y42 y12 y22 y32 y42
D12 D12 D42 D12 D42 D12 D42 D32
1 2 3 4 5

$$\begin{aligned}
y_1^2 &= \bar{\tau}_4 \tau_5 \lor \tau_4 \tau_5 p_1; \\
y_2^2 &= \bar{\tau}_4 \tau_5 \lor \tau_4 \bar{\tau}_5 \bar{p}_1 \lor \tau_4 \tau_5 p_1; \\
y_3^2 &= \tau_4 \bar{p}_1; \\
y_4^2 &= y_5^2 = 0; \\
y_6^2 &= \tau_4 \bar{\tau}_5 p_1.
\end{aligned} \tag{6.41}$$
Analysis of systems (6.38)–(6.41) shows that: (1) there are 8 LUTs in LUTer1; (2) there are 7 LUTs in LUTer2; (3) there are 5 LUTs in LUTerYT. Obviously, there are 5 LUTs in LUTerT. The table of LUTerT is constructed in the trivial way. So, there are $N_{\Phi Y} + N_{YT} + N_T = 15 + 5 + 5 = 25$ LUTs in the circuit of Mealy FSM $M_1PT(\Gamma_{13})$. Let the following condition take place:

$$2^{L+R_0} \cdot (R_A + 1) \le V_0. \tag{6.42}$$
In this case, it is possible to implement the systems $P = P(T, X)$ and $\mathcal{T} = \mathcal{T}(T)$ by a single EMB. It leads to the structural diagram shown in Fig. 6.20. We name this FSM $M_{E1}PT$, where the subscript $E1$ means that a single EMB is used to implement two parts of the FSM circuit.

Fig. 6.20 Structural diagram of $M_{E1}PT$ Mealy FSM

Now, let us discuss how to design Moore FSMs with G = 1. Let us use the GSA Γ14 (Fig. 6.13) for the synthesis. There is G = 2 for GSA Γ14. Let us introduce additional operational vertices to make G = 1. It leads to the transformed GSA Γ14 shown in Fig. 6.21. Two vertices are introduced; they are marked by the states $a_{16}$ and $a_{17}$. Each new state corresponds to a new class of PES. Now, there is the set $\Pi_A = \{B_1, \ldots, B_{11}\}$ where $B_{10} = \{a_{16}\}$ and $B_{11} = \{a_{17}\}$. So, the formula (6.36) also determines the number of added classes. To encode the classes $B_i \in \Pi_A$, $R_{IT}$ variables are necessary:

$$R_{IT} = \lceil \log_2 (I + \Delta M) \rceil. \tag{6.43}$$
In (6.43), the symbol $I$ denotes the number of classes $B_i \in \Pi_A$ before the transformation. Let us discuss an example of synthesis for $M_1PTC(\Gamma_{14})$. Let us use LUTs with S = 4. The distribution of the variables $x_e \in X$ among the classes $B_i \in \Pi_A$ is shown in Table 6.14. It is executed in the trivial way because G = 1. Let us find the partition $\Pi_C$ for the classes $B_i \in \Pi_A$. Because G = 1, each class $B^k \in \Pi_C$ could have up to $2^{S-1} - 1$ elements. In the previous chapters, we proposed “greedy” algorithms for creating the classes $B^k$: each class includes as many elements as possible. There is the following partition $\Pi_C$ in the discussed case: $\Pi_C = \{B^1, B^2\}$ with $B^1 = \{B_1, \ldots, B_4, B_6, B_8, B_{10}\}$ and $B^2 = \{B_5, B_7, B_9, B_{11}\}$. The following sets could be found for this partition: $P(B^1) = \{p_1\}$, $P(B^2) = \{p_1\}$, $A(B^1) = \{a_2, \ldots, a_6, a_{11}, \ldots, a_{17}\}$ and $A(B^2) = \{a_1, a_7, a_8, a_{10}, a_{14}\}$. Let us encode the states $a_m \in A$. Using (1.4) gives R = 5. So, there are the sets $T = \{T_1, \ldots, T_5\}$ and $\Phi = \{D_1, \ldots, D_5\}$. As usual, we try to diminish the numbers of LUTs in LUTerk and LUTerT. To do so, we propose to “split” the state $a_{14}$ into the states $a_{14}$ and $a_{18}$. It leads to some changes in the transformed GSA Γ14 (Fig. 6.22).
Fig. 6.21 Transformed GSA Γ14

Table 6.14 RLC for Moore FSM M1PTC(Γ14)
Bi   B1   B2   B3   B4   B5   B6   B7   B8   B9   B10  B11
p1   x1   x3   x4   x6   −    x7   −    x8   −    x2   x5
Fig. 6.22 Transformed part of GSA Γ14
Fig. 6.23 Outcome of diminishing state assignment for Moore FSM M1 PTC (Γ14 )
Due to this transformation, we have the following: (1) the state $a_{18}$ is added into the class $B_8$; (2) the number of classes $B_i \in \Pi_A$ does not change; and (3) the state $a_{14} \in A(B^1)$ is replaced by $a_{18}$. Now, we can encode the states as shown in Fig. 6.23. Now $T_1 = 0$ for $K(a_{14})$ and $T_1 = 1$ for $K(a_{18})$. Also, there is $T_1 = 1$ for all states $a_m \in A(B^1)$, whereas $T_1 = 0$ and $T_2 = 0$ for all codes $K(a_m)$ where $a_m \in A(B^2)$ (compare the columns $K(a_s)$ of Tables 6.15 and 6.16). It diminishes the number of LUTs in LUTer1 and LUTer2. Let us encode the classes $B_i \in B^k$. There is $R_1 = 3$ and $R_2 = 3$. Let us use the variables $\tau_1, \tau_2, \tau_3 \in \mathcal{T}^1$ for encoding the classes $B_i \in B^1$ and the variables $\tau_4, \tau_5, \tau_6 \in \mathcal{T}^2$ for $B_i \in B^2$. Let us encode the classes in the following way: $C(B_1) = 001$, $C(B_2) = 010$, $C(B_3) = 011$, $C(B_4) = 100$, $C(B_6) = 101$, $C(B_8) = 110$, $C(B_{10}) = 111$, $C(B_5) = 001$, $C(B_7) = 010$, $C(B_9) = 101$ and $C(B_{11}) = 100$. Now, it is possible to construct the tables of LUTer1 (Table 6.15) and LUTer2 (Table 6.16) for FSM $M_1PTC(\Gamma_{14})$. We use the codes $K(a_m)$ from Fig. 6.23. There are 5 LUTs in LUTer1, 3 LUTs in LUTer2 and 3 LUTs in LUTerT. Because S = 4 and R = 5, it is necessary to use the functional decomposition for the functions $Y(T)$ and $\mathcal{T}(T)$. It means that there is no sense in using this model for GSA Γ14. Of course, it is possible to encode the states $a_m \in A$ in a way minimizing the number of literals in the functions $y_n \in Y$ and $\tau_r \in \mathcal{T}$. Such an encoding diminishes the number of functions requiring the functional decomposition. For example, let us encode the states as shown in Fig. 6.24. We can find that now the functional decomposition should be applied only to the functions $y_2$, $y_3$, $\tau_1$, $\tau_2$ and $\tau_3$. All other functions have fewer than 5 literals and do not require the functional decomposition.
Table 6.15 Table of LUTer1 of Moore FSM M1PTC(Γ14)
Bi    C(Bi)  as    K(as)   Ph    Φh1                        h
B1    001    a2    10000   p1    D_1^1                      1
             a16   11011   p̄1    D_1^1 D_2^1 D_4^1 D_5^1    2
B2    010    a5    10100   p1    D_1^1 D_4^1 D_5^1          3
             a6    11011   p̄1    D_1^1 D_3^1                4
B3    011    a17   11100   p1    D_1^1 D_2^1 D_3^1          5
             a9    10101   p̄1    D_1^1 D_3^1 D_5^1          6
B4    100    a11   10110   p1    D_1^1 D_3^1 D_4^1          7
             a12   10111   p̄1    D_1^1 D_3^1 D_4^1 D_5^1    8
B6    101    a11   10110   p1    D_1^1 D_3^1 D_4^1          9
             a13   11000   p̄1    D_1^1 D_2^1                10
B8    110    a18   11001   p1    D_1^1 D_2^1 D_5^1          11
             a15   11010   p̄1    D_1^1 D_2^1 D_4^1          12
B10   111    a3    10001   p1    D_1^1 D_5^1                13
             a4    10010   p̄1    D_1^1 D_4^1                14

Table 6.16 Table of LUTer2 of Moore FSM M1PTC(Γ14)
Bi    C(Bi)  as    K(as)   Ph    Φh2                        h
B5    001    a10   00100   1     D_3^2                      1
B7    010    a14   00101   1     D_3^2 D_5^2                2
B9    101    a1    00000   1     −                          3
B11   100    a7    00001   p1    D_5^2                      4
             a8    00010   p̄1    D_4^2                      5
Fig. 6.24 State codes for M1 PTC (Γ14 )
References
1. Adamski M, Barkalov A (2006) Architectural and sequential synthesis of digital devices. University of Zielona Góra Press, Zielona Góra
2. Ashar P, Devadas S, Newton A (1992) Sequential logic synthesis. Kluwer Academic Publishers, Boston
3. Baranov S (2008) Logic and system design of digital systems. TUT Press, Tallinn
4. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on CPLD. In: Proceedings of the sixth international conference CAD DD’07, vol 2. Minsk, pp 39–45
5. Barkalov A, Titarenko L, Wiśniewski R (2006) Optimization of address circuit of compositional microprogram unit. In: Proceedings of the IEEE east-west design and test workshop (EWDTW’06), Sochi, Kharkov, 2006. Kharkov National University of Radioelectronics, pp 167–170
6. Barkalov A, Titarenko L, Wiśniewski R (2006) Synthesis of compositional microprogram control units with sharing codes and address decoder. In: Proceedings of the international conference mixed design of integrated circuits and systems – MIXDES 2006. Łódź, pp 397–400
7. Chmielewski S (2014) Using structural pecularities of Moore FSM for reduction of number of PALS. PhD thesis, University of Zielona Góra
8. Łuba T, Rawski M, Jachna Z (2002) Functional Decomposition as a universal method for logic synthesis of digital circuits. In: Proceedings of IX international conference MIXDES’02, pp 285–290
9. Sentovich E, Singh K, Lavagno L, Moon C, Murgai R, Saldanha A, Savoj H, Stephan P, Brayton R, Sangiovanni-Vincentelli A (1992) SIS: a system for sequential circuit synthesis. In: Proceedings of the international conference of computer design (ICCD’92), pp 328–333
10. Tatalov E (2011) Synthesis of compositional microprogram control units for programmable devices. Master’s thesis, Donetsk National Technical University, Donetsk
11. Tucker S (1967) Microprogram control for system/360. IBM Syst J 6(4):222–241
12. Wilkes M (1951) The best way to design an automatic calculating machine. In: Proceedings of Manchester University computer inaugural conference
13. Wilkes M, Stringer J (1953) Microprogramming and the design of the control circuits in an electronic digital computer. Proc Camb Philos Soc 49:230–238
Chapter 7
Mixed Encoding of Microoperations
Abstract The chapter is devoted to the method of mixed encoding of microoperations. The main idea is first discussed for Mealy FSMs. A formal method is proposed that partitions the set of microoperations into two sets. The elements of the first set are encoded by one-hot codes; the elements of the second set are combined into collections of microoperations. Next, the same approaches are discussed for FPGA-based Moore FSMs. Different structural diagrams of FSMs and the corresponding methods of synthesis are proposed. The classes of pseudoequivalent states are used to optimize the hardware of Moore FSMs. Further, the use of the proposed methods for combined FSMs is discussed. It is shown how to combine different methods of structural decomposition for the synthesis of FPGA-based combined FSMs. Finally, the mixed encoding of microoperations is discussed for LUT-based Mealy FSMs. It is proposed to form the collections of microoperations for the elements of both parts of the partition of the set of microoperations.
© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_7

7.1 Mixed Encoding for Mealy FSMs

Let us encode the CMOs $Y_q \subseteq Y$ by binary codes $K(Y_q)$ having $R_Q$ bits. The value of $R_Q$ is determined by (2.24). We use the elements of the set $Z = \{z_1, \ldots, z_{R_Q}\}$ for the encoding. Now, the microoperations $y_n \in Y$ are represented by the system (2.4). Let it be possible to use only a single EMB for the design of a Mealy FSM. Let the following condition take place:

$$2^{L+R_0} (R_0 + R_Q) \le V_0. \tag{7.1}$$

In this case, the mixed model of Mealy FSM (Fig. 7.1) could be used. Let us denote this model as $P_HY$, where the subscript “H” means “heterogeneous”. In $P_HY$ FSM, the EMB implements the systems $\Phi = \Phi(T, X)$ and $Z = Z(T, X)$. The LUTerY implements the system $Y = Y(Z)$. Let the following condition take place:

$$R_Q > S. \tag{7.2}$$

As in the previous chapters, a LUT has $S$ address inputs.

Fig. 7.1 Mixed model of PY Mealy FSM (blocks: EMB, LUTerY; signals: X, T, Z, Y, Start, Clock)
If (7.2) is true, then it is necessary to apply the functional decomposition to the functions (2.4). As a result, there are many levels of logic in the circuit of LUTerY. It leads to the following relation:

$$N_Y \gg N. \tag{7.3}$$

Recall that the symbol $N_Y$ stands for the number of LUTs in the circuit of LUTerY. Let the condition (7.1) be true for some configuration of EMB determined by the pair $\langle S_A, t_1 \rangle$. Let the following condition take place:

$$\Delta t = t_1 - (R_0 + R_Q) \ge 0. \tag{7.4}$$

Let $Y^*$ be a set of CMOs $Y_q \subseteq Y$. Let it include CMOs $Y_i, Y_j \in Y^*$ such that $Y_i = Y_j \cup \{y_n\}$. Therefore, if we eliminate $y_n$ from $Y_i$, then we get the relation $Y_i = Y_j$. In this case, the value of $Q = |Y^*|$ decreases by 1. Let $I(y_n)$ be the number of pairs $\langle Y_j, Y_i \rangle$ such that the elimination of $y_n$ from $Y_i \subseteq Y$ leads to the equality $Y_i = Y_j$. So, the elimination of $y_n$ decreases $Q$ by $I(y_n)$. The elimination of $y_n$ transforms the set $Y^*$ into a set $Y_1^*$ having $Q_1$ elements:

$$Q_1 = Q - I(y_n). \tag{7.5}$$

To encode the CMOs $Y_q \in Y_1^*$, $R_{Q1}$ bits are necessary, where

$$R_{Q1} = \lceil \log_2 Q_1 \rceil. \tag{7.6}$$

It is possible that the following condition takes place:

$$R_{Q1} < R_Q. \tag{7.7}$$

If (7.7) is true, then there is a free output of EMB which could be used for generating the eliminated MO $y_n \in Y$. Next, it is necessary to find a MO $y_m \in Y \setminus \{y_n\}$ such that its elimination from the CMOs $Y_q \in Y_1^*$ leads to a set $Y_2^*$ having $Q_2 < Q_1$ elements. $R_{Q2}$ bits are enough to encode $Q_2$ CMOs:

$$R_{Q2} = \lceil \log_2 Q_2 \rceil. \tag{7.8}$$

If $R_{Q2} < R_{Q1}$, then there is the next free output of EMB. It could be used to generate the microoperation $y_m$.
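The effect of one elimination, as expressed by (7.5)–(7.7), can be illustrated by a small Python sketch. The five collections below are hypothetical (they are not taken from GSA Γ15), and the helper names `code_bits` and `eliminate` are introduced only for this illustration.

```python
def code_bits(count):
    """ceil(log2(count)) for count >= 1, computed with integers."""
    return max(count - 1, 0).bit_length()


def eliminate(collections, mo):
    """Remove the microoperation mo from every collection; equal results merge."""
    return {c - {mo} for c in collections}


# Hypothetical set Y* of five collections (an assumption for illustration).
y_star = {
    frozenset(),
    frozenset({"y1", "y2"}),
    frozenset({"y2"}),
    frozenset({"y1", "y3"}),
    frozenset({"y3"}),
}

y1_star = eliminate(y_star, "y1")
i_y1 = len(y_star) - len(y1_star)          # I(y1) in (7.5)
r_q, r_q1 = code_bits(len(y_star)), code_bits(len(y1_star))
print(i_y1, r_q, r_q1)
```

In this toy case, two pairs of collections merge after the elimination of y1, so the code width drops from three bits to two and the condition (7.7) holds.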
Fig. 7.2 Structural diagram of $PY_M$ Mealy FSM (blocks: EMB, LUTerY; signals: X, T, Z, $Y_E$, $Y_L$, Start, Clock)
The process of elimination could be continued until the elimination of any of the remaining microoperations no longer decreases the number of bits in the codes $K(Y_q)$. Let $Y_E$ be the set of eliminated microoperations and let $Y_L = Y \setminus Y_E$. The set $Y_L$ includes the MOs generated by LUTerY, whereas the MOs $y_n \in Y_E$ are generated by EMB. The proposed approach leads to two kinds of codes. The MOs $y_n \in Y_E$ are represented by unitary (one-hot) codes generated by EMB. The CMOs $Y_q \in Y_L^*$ are represented by maximal codes, which are transformed by LUTerY into the MOs $y_n \in Y_L$. The set $Y_L^*$ includes the CMOs consisting of the MOs $y_n \in Y_L$. We name this style of encoding the mixed encoding of microoperations (MEMO). Using MEMO leads to $PY_M$ Mealy FSM (Fig. 7.2). There are $Q_L$ collections of MOs in the set $Y_L^*$. $R_L$ variables are necessary to encode the CMOs $Y_q \in Y_L^*$:

$$R_L = \lceil \log_2 Q_L \rceil. \tag{7.9}$$

The proposed approach produces the circuit of LUTerY with $N_Y = |Y_L| < N$ LUTs if the following condition takes place:

$$R_L \le S. \tag{7.10}$$
There are the following steps in the proposed method of synthesis of $PY_M$ Mealy FSMs:
1. Constructing the set A for a given GSA Γ.
2. Executing the state assignment.
3. Constructing the set of CMOs $Y^*$.
4. Dividing the set Y into the sets $Y_E$ and $Y_L$.
5. Executing the encoding of CMOs $Y_q \subseteq Y_L$.
6. Constructing the table of EMB.
7. Constructing the table of LUTerY.
8. Implementing the FSM circuit with particular EMB and LUTs.
Let us discuss the proposed method for executing step 4. It is the most important part of the synthesis: its outcome determines the number of LUTs in the circuit of LUTerY. The block diagram of the proposed algorithm is shown in Fig. 7.3.

Fig. 7.3 Block diagram of algorithm of dividing the set of microoperations

In the beginning, all microoperations are placed in the set $Y_L$. It gives $Y_E = \emptyset$. Also, all CMOs $Y_q \subseteq Y$ are placed in the set $Y_L^*$. It is shown in Block 1 of Fig. 7.3. The basic idea of the method is to find the microoperations $y_n \in Y_L$ whose inclusion in the set $Y_E$ minimizes the number of CMOs remaining in the set $Y_L^*$. To do it, the queue γ is formed during each step of the algorithm. It includes the MOs $y_n \in Y_L$ (Block 3). The elements of the queue γ are ranked in descending order of $Q(y_n)$. The value of $Q(y_n)$ is equal to the number of CMOs $Y_q \in Y_L^*$ containing the MO $y_n$. The number $I$ determines the maximum number of cycles in step $k$ ($k \in \{1, \ldots, N\}$). During each cycle, the next element of the queue γ is chosen (Block 5). The value of $\Delta Q_i$ is calculated for the chosen element. The parameter $\Delta Q_i$ is equal to the number of CMOs excluded from $Y_L^*$ due to the transfer of $y_n \in Y_L$ into the set $Y_E$. Next, the number $R_k$ of variables sufficient for encoding the CMOs $Y_q \in Y_L^*$ is calculated (Block 6):

$$R_k = \lceil \log_2 (Q_L - \Delta Q_i) \rceil. \tag{7.11}$$

If excluding the i-th element of γ does not diminish the number of bits in $K(Y_q)$ (the output “no” from Block 7), then the transition to the next element of γ is executed (Block 9). If γ is exhausted, then the algorithm is terminated (the output “yes” from Block 11). Otherwise, the next element of γ is chosen (the output “no” from Block 11). If the bit width of $K(Y_q)$ is reduced (the output “yes” from Block 7), then the i-th element of γ is included into $Y_E$ and excluded from $Y_L$ (Block 8). The value of $k$ is incremented (Block 10). If all microoperations are checked (the output “yes” from Block 12), then the algorithm is terminated. Otherwise, the set $Y_L^*$ is corrected (Block 13) and a new queue γ is created (the transition to Block 3). Let us illustrate this algorithm with the example of GSA Γ15 (Fig. 7.4). It is marked by the states of Mealy FSM using the rules [1].

Fig. 7.4 Initial GSA Γ15
Table 7.1 Collections of microoperations for GSA Γ15
q    Yq            q    Yq            q    Yq
1    −             7    y1 y4 y6      13   y1 y4 y8
2    y1 y2 y3      8    y3 y5 y6      14   y1 y4
3    y3 y4 y6      9    y2 y3 y6      15   y1 y6
4    y2 y7         10   y1 y2 y7      16   y1
5    y4 y8         11   y2 y7 y6      17   y6
6    y3 y5 y9      12   y2 y3         18   y3 y5 y6 y9

Table 7.2 The process of partition of the set Y
yn   Q(yn)  ΔQi  RQ1    Q(yn)  ΔQi  RQ2    Q(yn)  ΔQi  RQ3
y1   7                  5+     4    3      −      −    −
y2   7                  4                  2+     0    3
y3   6                  5                  4+     0    3
y4   5                  4                  3+     1    3
y5   3                  2                  2+     0    3
y6   8+     6    4      −      −    −      −      −    −
y7   3                  2                  1+     0    3
y8   2                  2                  1+     1    3
y9   2                  1                  1+     1    3
Y1   y6                 y1                 −
There are Q = 18 collections of microoperations written in the operational vertices of GSA Γ15. They are shown in Table 7.1. So, there is the set $Y^* = \{Y_1, \ldots, Y_{18}\}$. Using (2.24) gives $R_Q = 5$. It determines the set $Z = \{z_1, \ldots, z_5\}$. The process of partition of the set Y is represented by Table 7.2. Table 7.2 has the columns $y_n$, $Q(y_n)$, $\Delta Q_i$, $R_{Q1}$, $R_{Q2}$, $R_{Q3}$. Their meaning is clear from the discussion of the algorithm (Fig. 7.3). The sign “+” means that a particular MO $y_n$ is chosen for analysis. The sign “⊕” means that a particular MO is included into the set $Y_E$. The sign “–” means that a particular MO is excluded from further analysis. The row Y1 shows the microoperations $y_n \in Y_E$ in the order of their inclusion into the set $Y_E$. We do not show the queues because they are clear from the columns $Q(y_n)$. Using the first column $Q(y_n)$ produces the queue $\gamma = \langle y_6, y_1, y_2, y_3, y_4, y_5, y_7, y_8, y_9 \rangle$. Including the MO $y_6$ into $Y_E$ gives $\Delta Q_i = 6$. Using (7.6) gives $R_{Q1} = 4$. There is the relation $R_{Q1} < R_Q$. So, the condition (7.7) takes place and the MO $y_6$ should be included into $Y_E$. So, only a single cycle is necessary to finish the first step of the partition. Excluding $y_6$ from $Y_L$ leads to the set $Y_L^* = \{Y_1, \ldots, Y_{12}\}$. Table 7.3 shows these new CMOs.
Table 7.3 Collections of MOs after excluding y6
q    Yq            q    Yq            q    Yq
1    −             5    y4 y8         9    y2 y3
2    y1 y2 y3      6    y3 y5 y9      10   y1 y2 y7
3    y3 y4         7    y1 y4         11   y1 y4 y8
4    y2 y7         8    y3 y5         12   y1

Table 7.4 Collections of MOs after excluding y1
q    Yq            q    Yq
1    −             5    y4 y8
2    y2 y3         6    y3 y5 y9
3    y3 y4         7    y4
4    y2 y7         8    y3 y5
Analysis of the second column $Q(y_n)$ produces the queue $\gamma = \langle y_1, y_2, y_3, y_4, y_5, y_7, y_8, y_9 \rangle$. Four CMOs are excluded from $Y_L^*$ due to including the MO $y_1$ into $Y_E$. It gives $R_{Q2} = 3$. Therefore, it is necessary to include $y_1$ into $Y_E$. Now, there is the set $Y_L^* = \{Y_1, \ldots, Y_8\}$. Table 7.4 shows these new CMOs. Analysis of the third column $Q(y_n)$ gives the queue $\gamma = \langle y_3, y_4, y_2, y_5, y_7, y_8, y_9 \rangle$. As follows from the column $R_{Q3}$, the value of $R_Q$ is not decremented any more. So, the algorithm is terminated after the analysis of the last element of γ. The algorithm produces the sets $Y_E = \{y_1, y_6\}$ and $Y_L = \{y_2, y_3, y_4, y_5, y_7, y_8, y_9\}$. So, there is $Q_L = 8$. Using (7.9) gives $R_L = 3$ and $Z = \{z_1, z_2, z_3\}$. Because $R_Q - R_L = 2$, the bit width of $K(Y_q)$ is decreased by 2. Let us design the circuit of Mealy FSM $PY_M(\Gamma_{15})$. Let it be necessary to use LUTs with S = 3. Let an EMB with the configuration $\langle 8, 8 \rangle$ be used for the design, and let only a single EMB be available. Analysis of Fig. 7.4 gives the following sets and their parameters: $A = \{a_1, \ldots, a_7\}$, $M_0 = 7$, $X = \{x_1, \ldots, x_5\}$, $L = 5$, $Y = \{y_1, \ldots, y_9\}$, $N = 9$, $Y^* = \{Y_1, \ldots, Y_{18}\}$, $Q = 18$, $R_Q = 5$. So, there is $R_0 = 3$. It gives the sets $T = \{T_1, T_2, T_3\}$ and $\Phi = \{D_1, D_2, D_3\}$. There is $R_0 + L = 8 = S_A$. It means that the EMB could be used for implementing the circuit of FSM $PY_M(\Gamma_{15})$. There is $R_Q + R_0 = 8 = t_F$ and $N + R_0 = 12 > t_F$. So, it makes sense to encode the CMOs $Y_q \subseteq Y$. The state assignment does not influence the hardware amount for EMB-based FSMs [3]. Because of it, let us encode the states $a_m \in A$ in the trivial way: $K(a_1) = 000, \ldots, K(a_7) = 110$. There are S = 3 and $R_Q = 5$. So, the condition (7.2) takes place. It means that it is worth trying to divide the set Y into $Y_L$ and $Y_E$. This step is already executed. Let us encode the CMOs $Y_q \subseteq Y_L$. Using [14], we can get the codes shown in the Karnaugh map (Fig. 7.5).
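The partition obtained for GSA Γ15 can be reproduced with a short Python sketch. It is a simplified greedy reading of the algorithm from Fig. 7.3, not the authors' implementation: the counters k and I are omitted, ties in the queue are broken by the index of a microoperation (an assumption), and the collections are typed in from Table 7.1 with microoperations written as integers.

```python
def code_bits(count):
    """ceil(log2(count)) for count >= 1, computed with integers."""
    return max(count - 1, 0).bit_length()


def split_microoperations(collections):
    """Greedy split of Y into Y_E (returned as a list) and the reduced set Y_L*."""
    y_l_star = {frozenset(c) for c in collections}
    y_e = []
    while True:
        remaining = sorted({mo for c in y_l_star for mo in c})
        # Queue gamma: descending Q(y_n); the stable sort keeps index order on ties.
        queue = sorted(remaining, key=lambda mo: -sum(mo in c for c in y_l_star))
        for mo in queue:
            reduced = {c - {mo} for c in y_l_star}
            if code_bits(len(reduced)) < code_bits(len(y_l_star)):
                y_e.append(mo)          # the code width shrinks: move mo to Y_E
                y_l_star = reduced
                break
        else:
            return y_e, y_l_star        # no candidate shortens the codes


# Collections Y1..Y18 from Table 7.1 (yn is written as the integer n).
table_7_1 = [
    [], [1, 2, 3], [3, 4, 6], [2, 7], [4, 8], [3, 5, 9],
    [1, 4, 6], [3, 5, 6], [2, 3, 6], [1, 2, 7], [2, 7, 6], [2, 3],
    [1, 4, 8], [1, 4], [1, 6], [1], [6], [3, 5, 6, 9],
]

y_e, y_l_star = split_microoperations(table_7_1)
print(y_e, len(y_l_star), code_bits(len(y_l_star)))
```

Under these assumptions the sketch moves y6 and then y1 into the set of EMB-generated microoperations and leaves eight collections, which need three code bits.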
Fig. 7.5 Codes of CMOs for Mealy FSM $PY_M(\Gamma_{15})$
z3 \ z1z2   00   01   11   10
0           Y1   Y4   Y7   Y5
1           Y3   Y2   Y6   Y8
Using Table 7.4 and the codes from Fig. 7.5, we can get the following system of equations:

$$\begin{aligned}
y_2 &= Y_2 \lor Y_4 = \bar{z}_1 z_2; &
y_3 &= Y_2 \lor Y_3 \lor Y_6 \lor Y_8 = z_3; \\
y_4 &= Y_3 \lor Y_5 \lor Y_7; &
y_5 &= Y_6 \lor Y_8 = z_1 z_3; \\
y_7 &= Y_4; \quad y_8 = Y_5; &
y_9 &= Y_6.
\end{aligned} \tag{7.12}$$

As follows from (7.12), the MO $y_3$ is generated by EMB as the output $z_3$. So, there is $N_Y = 6$. It could be shown that $N_Y = 72$ for $R_Q = 5$. The system (7.12) is the base for creating the truth tables of the LUTs of LUTerY. To construct the table of EMB, it is necessary to construct the structure table (ST) of $PY_M$ Mealy FSM. It includes the following columns: $a_m$, $K(a_m)$, $a_s$, $K(a_s)$, $X_h$, $Y_{Eh}$, $Y_{Lh}$, $Y_{qh}$, $K(Y_{qh})$, $h$. The meaning of each column is clear. There is no need for columns with the functions $D_r \in \Phi$ and $z_r \in Z$: creating the table of EMB, we use the codes $K(Y_q)$ to fill the column Z. The ST of Mealy FSM $PY_M(\Gamma_{15})$ is represented by Table 7.5. There are the following columns in the table of EMB: $K(a_m)$, $X$, $Y_E$, $Z$, $\Phi$, $h_m$. All necessary information could be taken from the ST of $PY_M$ Mealy FSM. There are $H$ rows in the table of EMB:

$$H = 2^{L+R_0}. \tag{7.13}$$

There are $H(a_m)$ rows representing the transitions from any state $a_m \in A$:

$$H(a_m) = 2^L. \tag{7.14}$$
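The minimized forms in (7.12) can be cross-checked against the codes of Fig. 7.5 and the collections of Table 7.4. The following Python sketch is only such a check: the dictionaries are typed in from the figure and the table, and the helper `on_set` is a name introduced here for illustration.

```python
# Codes K(Yq) = z1 z2 z3 read from the Karnaugh map of Fig. 7.5.
codes = {1: "000", 2: "011", 3: "001", 4: "010",
         5: "100", 6: "111", 7: "110", 8: "101"}

# Collections from Table 7.4 (yn is written as the integer n).
collections = {1: set(), 2: {2, 3}, 3: {3, 4}, 4: {2, 7},
               5: {4, 8}, 6: {3, 5, 9}, 7: {4}, 8: {3, 5}}


def on_set(mo):
    """Codes of all collections that contain the microoperation mo."""
    return {codes[q] for q, members in collections.items() if mo in members}


all_codes = set(codes.values())
# y2 = not z1 and z2;  y3 = z3;  y5 = z1 and z3, as stated in (7.12).
y2_ok = on_set(2) == {c for c in all_codes if c[0] == "0" and c[1] == "1"}
y3_ok = on_set(3) == {c for c in all_codes if c[2] == "1"}
y5_ok = on_set(5) == {c for c in all_codes if c[0] == "1" and c[2] == "1"}
print(y2_ok, y3_ok, y5_ok)
```

Because all eight codes are used, each on-set can be compared with a product term directly, without don't-care rows.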
Using (7.13)–(7.14) gives the values $H = 256$ and $H(a_m) = 32$. Because $M_0 = 7$, there are only zeros in the 32 rows of the table of EMB for the code 111. A fragment of this table is shown in Table 7.6. Table 7.6 represents the last 8 rows for the transitions from the state $a_5$ and the first 2 rows for the state $a_6$. Because $H(a_m) = 32$, there is $5H(a_m) = 160$. So, the transitions from $a_6 \in A$ start from row 161 (Table 7.6). We add the column $h$ to show the correspondence between the lines of Tables 7.5 and 7.6. There is no need for a table of LUTerY. The truth tables for each LUT are constructed on the basis of the system (7.12).

Table 7.5 Structure table of Mealy FSM $PY_M(\Gamma_{15})$
am   K(am)  as   K(as)  Xh            YEh     YLh        Yqh   K(Yqh)  h
a1   000    a2   001    x1 x2         y1      y2 y3      Y2    011     1
            a2   001    x1 x̄2         y6      y3 y4      Y3    001     2
            a2   001    x̄1 x3         −       y2 y7      Y4    010     3
            a2   001    x̄1 x̄3         −       y4 y8      Y5    100     4
a2   001    a3   010    x2            −       y3 y5 y9   Y6    111     5
            a3   010    x̄2 x4         y1 y6   y4         Y7    110     6
            a3   010    x̄2 x̄4         y6      y3 y5      Y8    101     7
a3   010    a4   011    x1            y6      y2 y3      Y2    011     8
            a5   100    x̄1 x4         y1      y2 y7      Y4    010     9
            a5   100    x̄1 x̄4 x5      y6      y2 y7      Y4    010     10
            a5   100    x̄1 x̄4 x̄5      −       y2 y3      Y2    011     11
a4   011    a5   100    1             y1      y4 y8      Y5    100     12
a5   100    a7   110    x3 x4         y1      y4         Y7    110     13
            a6   101    x3 x̄4         y1      −          Y1    000     14
            a6   101    x̄3 x5         y6      −          Y1    000     15
            a6   101    x̄3 x̄5 x1      y1 y6   −          Y1    000     16
            a6   101    x̄3 x̄5 x̄1      −       −          Y1    000     17
a6   101    a1   000    1             y1      y3 y4      Y3    001     18
a7   110    a1   000    1             y6      y3 y5 y9   Y6    111     19

Table 7.6 Part of the table of EMB of Mealy FSM $PY_M(\Gamma_{15})$
K(am)      X                YE      Z          Φ          hm    h
T1 T2 T3   x1 x2 x3 x4 x5   y1 y6   z1 z2 z3   D1 D2 D3
100        11000            11      000        101        153   16
100        11001            00      000        101        154   15
100        11010            11      000        101        155   16
100        11011            00      000        101        156   15
100        11100            10      000        101        157   14
100        11101            10      000        101        158   14
100        11110            10      110        110        159   13
100        11111            10      110        110        160   13
101        00000            10      001        000        161   18
101        00001            10      000        000        162   18

Let us analyse the outcome of the mixed encoding of MOs in the discussed case. In the case of the standard approach [2, 8, 9], there are 72 LUTs with S = 3 in the circuit of LUTerY. This circuit has 3 levels of logic. Also, there are 216 interconnections inside this circuit. Using our approach produces the circuit with 6 LUTs, a single level of logic and 16 interconnections. So, we have designed a circuit of LUTerY which has one level of logic instead of three (it is up to three times faster) and requires 12 times fewer LUTs. Also, it provides a significant reduction in the number of interconnections (216 : 16 = 13.5). As a significant portion of energy is consumed by interconnections [5], our approach allows a significant reduction of the energy consumption.
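The row numbers $h_m$ of Table 7.6 follow from (7.13) and (7.14) if the state code is taken as the high-order part of the EMB address and the rows are numbered from 1. Both points are assumptions read off Table 7.6, and the function name `emb_row` is hypothetical; a minimal Python sketch:

```python
def emb_row(state_code, inputs):
    """1-based row number for a state code K(a_m) and a vector of inputs x1..xL."""
    rows_per_state = 2 ** len(inputs)              # H(a_m) from (7.14)
    return int(state_code, 2) * rows_per_state + int(inputs, 2) + 1


total_rows = 2 ** (5 + 3)                          # H from (7.13) with L = 5, R0 = 3
first_row_a6 = emb_row("101", "00000")
shown_row_a5 = emb_row("100", "11000")
print(total_rows, first_row_a6, shown_row_a5)
```

The two computed rows correspond to the first line for the state a6 and the first line shown for the state a5 in Table 7.6.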
7.2 Mixed Encoding for Moore FSMs

Let us start from P Moore FSM. Let it be possible to implement the system (1.1) using a single EMB. It is possible if the following condition takes place:

$$2^{L+R} \cdot R \le V_0. \tag{7.15}$$

It leads to the $P_H$ Moore FSM (Fig. 7.6).

Fig. 7.6 Structural diagram of $P_H$ Moore FSM (blocks: EMB, LUTerY; signals: X, T, Y, Start, Clock)

Let the condition (7.15) be true for some combination $\langle S_A, t_1 \rangle$. Let the following condition be true:

$$\Delta t = t_1 - R > 0. \tag{7.16}$$

It means that $\Delta t$ microoperations $y_n \in Y$ could be implemented by EMB. They form a set $Y_E$. The remaining MOs form a set $Y_L$. Now, the model $P_{HM}$ could be used (Fig. 7.7).

Fig. 7.7 Structural diagram of $P_{HM}$ Moore FSM (blocks: EMB, LUTerY; signals: X, T, $Y_E$, $Y_L$, Start, Clock)

If the condition (7.15) is true, then only a single EMB is used for implementing the systems $\Phi = \Phi(T, X)$ and $Y_E = Y_E(T, X)$. So, it is necessary to diminish the numbers of LUTs and interconnections for LUTerY. It could be done by a proper state assignment. To optimize the circuit of LUTerY, we propose:
1. To execute the refined state assignment.
2. To find the number of LUTs necessary for implementing each MO $y_n \in Y$. Let us denote it as $N(y_n)$.
3. To place the MOs in the descending order of $N(y_n)$.
4. To choose the first $\Delta t$ microoperations and place them into the set $Y_E$.
5. To form the set $Y_L$ from the remaining MOs.

Let us discuss an example of synthesis for Moore FSM $P_{HM}(\Gamma_{15})$. The GSA Γ15 marked by the states of Moore FSM is shown in Fig. 7.8. There are M = 20 states for the corresponding Moore FSM. All other parameters are the same as for Mealy FSM $PY_M(\Gamma_{15})$.

Fig. 7.8 Initial GSA Γ15 marked by states of Moore FSM

Let us find the system showing the dependence of the MOs $y_n \in Y$ on the states $a_m \in A$:
$$\begin{aligned}
y_1 &= A_2 \lor A_7 \lor A_{10} \lor A_{10} \lor A_{13} \lor A_{14} \lor A_{15} \lor A_{17} \lor A_{20}; \\
y_2 &= A_2 \lor A_4 \lor A_9 \lor A_{10} \lor A_{11} \lor A_{12}; \\
y_3 &= A_2 \lor A_3 \lor A_6 \lor A_8 \lor A_9 \lor A_{12} \lor A_{19} \lor A_{20}; \\
y_4 &= A_3 \lor A_5 \lor A_7 \lor A_{13} \lor A_{14} \lor A_{20}; \\
y_5 &= A_6 \lor A_8 \lor A_{19}; \\
y_6 &= A_3 \lor A_7 \lor A_8 \lor A_9 \lor A_{11} \lor A_{16} \lor A_{17} \lor A_{19}; \\
y_7 &= A_4 \lor A_{10} \lor A_{11}; \\
y_8 &= A_5 \lor A_{13}; \\
y_9 &= A_6 \lor A_{19}.
\end{aligned} \tag{7.17}$$

Using the approach [14] gives the state codes shown in Fig. 7.9.

Fig. 7.9 State codes for Moore FSM $P_{HM}(\Gamma_{15})$

As in the previous case, we use LUTs with S = 3. Using the system (7.17) and the codes (Fig. 7.9) gives the following system of Boolean equations:

$$\begin{aligned}
y_2 &= T_2 \bar{T}_3; & y_4 &= \bar{T}_2 T_3; & y_5 &= T_1 T_3; \\
y_7 &= \bar{T}_1 T_2 \bar{T}_3; & y_8 &= T_2 T_3 \bar{T}_4; & y_9 &= T_1 \bar{T}_2 \bar{T}_5.
\end{aligned} \tag{7.18}$$

There are five literals in the functions $y_1$, $y_3$, $y_6$. So, there are $N(y_n) = 7$ for $y_1$, $y_3$, $y_6$ and $N(y_n) = 1$ for the other microoperations $y_n \in Y$. Let us find the value of $\Delta t$. Let an EMB with $S_A = R + L = 10$ and $t_F = 8$ be available. Using (7.16) with R = 5 gives $\Delta t = 3$. So, three microoperations could be placed into the set $Y_E$. It follows from the previous analysis that $Y_E = \{y_1, y_3, y_6\}$ and $Y_L = \{y_2, y_4, y_5, y_7, y_8, y_9\}$. If LUTerY implements all MOs $y_n \in Y$, then 6 LUTs are necessary to implement the system (7.18) and 21 LUTs to implement the MOs $y_1$, $y_3$ and $y_6$. This circuit has 15 interconnections for the part corresponding to (7.18). There are 63 interconnections for the part corresponding to $Y_E$. So, there is $N_Y = 27$ and there are 78 interconnections in LUTerY of $P_H(\Gamma_{15})$. In the case of $P_{HM}(\Gamma_{15})$, there is $N_Y = 6$ and there are 15 interconnections. Also, the LUTerY of $P_{HM}(\Gamma_{15})$ has only a single level of logic, whereas there are three levels of logic in the LUTerY of $P_H(\Gamma_{15})$. Let us discuss the $P_{HC}$ Moore FSM (Fig. 7.10). The variables $\tau_r \in \mathcal{T}$ encode the classes of PES [4]. $R_B$ variables $\tau_r \in \mathcal{T}$ are necessary, as determined by (2.42).

Fig. 7.10 Structural diagram of $P_{HC}$ Moore FSM
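Steps 3–5 of the procedure for Moore FSMs amount to ranking the microoperations by their LUT cost and moving the most expensive ones to the free EMB outputs. The sketch below assumes the costs $N(y_n)$ stated in the example (7 LUTs for y1, y3, y6 and one LUT otherwise) and $\Delta t = 3$; the function name `choose_emb_outputs` is hypothetical.

```python
def choose_emb_outputs(lut_cost, free_outputs):
    """Return (Y_E, Y_L): the free_outputs most expensive MOs go to the EMB."""
    ranked = sorted(lut_cost, key=lambda mo: -lut_cost[mo])   # descending N(y_n)
    return ranked[:free_outputs], ranked[free_outputs:]


# N(y_n) for S = 3 as stated for the example with GSA Gamma_15.
lut_cost = {"y1": 7, "y2": 1, "y3": 7, "y4": 1, "y5": 1,
            "y6": 7, "y7": 1, "y8": 1, "y9": 1}
delta_t = 8 - 5                                   # (7.16) with t_F = 8 and R = 5

y_e, y_l = choose_emb_outputs(lut_cost, delta_t)
luts_without_split = sum(lut_cost.values())       # LUTerY implements every MO
luts_with_split = sum(lut_cost[mo] for mo in y_l) # LUTerY implements only Y_L
print(y_e, luts_without_split, luts_with_split)
```

The same ranking could be driven by another cost, for example the number of interconnections, if that is the resource to be minimized.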
In the P_HC Moore FSM, the EMB implements the system Φ(𝒯, X). The LUTerTΦ implements the systems (1.3) and (2.43). To design the circuit of the P_HC Moore FSM, it is necessary to find the partition Π_A = {B1, ..., B8} with the classes B1 = {a1}, B2 = {a2, ..., a5}, B3 = {a6, a7, a8}, B4 = {a9}, B5 = {a10, ..., a13}, B6 = {a14}, B7 = {a15, ..., a18} and B8 = {a19, a20}. So I = 8. Using (2.42) gives R_B = 3 and 𝒯 = {τ1, τ2, τ3}. Let the following condition hold:
\[
2^{R_B + L} \cdot R \le V_0.
\tag{7.19}
\]
In this case, it is possible to use the model of the P_HC Moore FSM. If (7.16) is true, it is possible to form the sets Y_E and 𝒯_E. This determines the sets Y_L = Y \ Y_E and 𝒯_L = 𝒯 \ 𝒯_E. The LUTer𝒯Y implements the functions from the sets Y_L and 𝒯_L, and the EMB implements the functions from Y_E and 𝒯_E. It leads to the P_HCM Moore FSM (Fig. 7.11). Recall that the subscript "H" stands for "heterogeneous", the subscript "C" for the transformation of K(am) into K(Bi), and the subscript "M" for the mixed encoding of MOs.

To design the circuit of the P_HCM Moore FSM, it is necessary: (1) to encode the classes Bi ∈ Π_A; (2) to construct the systems (1.3) and (2.43); (3) to encode the states am ∈ A so as to optimize the systems (1.3) and (2.43); (4) to choose the elements included into the sets 𝒯_E and Π_E.

Let us encode the classes Bi ∈ Π_A in the trivial way (Fig. 7.12). Using the Karnaugh map from Fig. 7.12, it is possible to construct the system 𝒯(A). It is the following:
\[
\begin{aligned}
\tau_1 &= B_5 \vee B_6 \vee B_7 \vee B_8 = A_{10} \vee A_{11} \vee \cdots \vee A_{20}; \\
\tau_2 &= B_3 \vee B_4 \vee B_7 \vee B_8 = A_6 \vee \cdots \vee A_9 \vee A_{15} \vee \cdots \vee A_{20}; \\
\tau_3 &= B_2 \vee B_4 \vee B_6 \vee B_8 = A_2 \vee \cdots \vee A_5 \vee A_9 \vee A_{14} \vee A_{19} \vee A_{20}.
\end{aligned}
\tag{7.20}
\]

Fig. 7.11 Structural diagram of P_HCM Moore FSM

Fig. 7.12 Class codes for Moore FSM P_HM(Γ15)
τ3 \ τ1τ2 | 00 | 01 | 11 | 10
0 | B1 | B3 | B7 | B5
1 | B2 | B4 | B8 | B6
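The step from the class codes of Fig. 7.12 to the system (7.20) is mechanical, so it can be sketched in code. The block below is a minimal illustration, not the authors' tool: the dictionary and function names are hypothetical, and the code width is computed as ⌈log2 I⌉, which is assumed here to be what (2.42) prescribes.

```python
# Classes of the partition Pi_A (state indices) and their codes tau1 tau2 tau3 from Fig. 7.12.
CLASSES = {
    "B1": [1], "B2": [2, 3, 4, 5], "B3": [6, 7, 8], "B4": [9],
    "B5": [10, 11, 12, 13], "B6": [14], "B7": [15, 16, 17, 18], "B8": [19, 20],
}
CLASS_CODE = {
    "B1": "000", "B2": "001", "B3": "010", "B4": "011",
    "B5": "100", "B6": "101", "B7": "110", "B8": "111",
}

# Assumption: (2.42) is R_B = ceil(log2 I); bit_length avoids floating point.
R_B = max(1, (len(CLASSES) - 1).bit_length())


def tau_system():
    """For each tau_r, list the classes and the states where tau_r = 1."""
    system = {}
    for r in range(R_B):
        classes = [b for b, code in CLASS_CODE.items() if code[r] == "1"]
        states = sorted(a for b in classes for a in CLASSES[b])
        system[f"tau{r + 1}"] = (classes, states)
    return system


for name, (classes, states) in tau_system().items():
    print(name, "=", " v ".join(classes), "=", " v ".join(f"A{a}" for a in states))
```

The printed disjunctions coincide with the three equations of (7.20).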
The system Y(A) for the given example is represented by (7.18). We hope that the further execution of the synthesis is clear. As a rule, the following relation holds: R > R_B [4]. It means that the circuit of the P_HCM Moore FSM needs fewer address inputs than the equivalent P_HM Moore FSM.

Obviously, this approach could be used for optimizing P_HY Moore FSMs. The structural diagram of the P_HY Moore FSM is practically the same as for the P_YM Mealy FSM (Fig. 7.2). In the P_HY Moore FSM, the EMB implements the systems (1.1) and (2.47), and the LUTerY implements the system (2.4). This model could be used if the following condition holds:
\[
2^{R+L} (R + R_Q) \le V_0.
\tag{7.21}
\]
Let the condition (7.21) correspond to a pair ⟨S_A, t_1⟩ such that
\[
\Delta t = t_1 - (R + R_Q) \ge 0.
\tag{7.22}
\]
In this case, it is possible to implement Δt microoperations as outputs of the EMB. This divides the set Y into the sets Y_L and Y_E. In turn, it leads to the P_HYM Moore FSM. Its structural diagram is practically the same as for the P_YM Mealy FSM (Fig. 7.2). In the P_HYM Moore FSM, the EMB implements the systems (1.1), (2.25) and
\[
Y_E = Y_E(T).
\tag{7.23}
\]
The LUTerY implements the system
\[
Y_L = Y_L(Z).
\tag{7.24}
\]
The synthesis methods for the P_YM Mealy FSM and the P_HYM Moore FSM have the same steps. Let us discuss an example of synthesis for the Moore FSM P_HYM(Γ15). Let us use an EMB with S_A = L + R = 10 and t_F = 10. There are the sets A = {a1, ..., a20}, T = {T1, ..., T5} and Φ = {D1, ..., D5}. State codes do not affect the hardware amount if the functions Φ and Z are implemented using an EMB. So we encode the states in the trivial way: K(a1) = 00000, ..., K(a20) = 10011. The set Y* is the same as for the Mealy FSM P_Y(Γ15). So R_Q = 5. The following relation is true in the discussed case: R + R_Q = 10 = t_F. So the condition (7.22) holds, which means that the model P_HYM(Γ15) could be used.

The same procedure is used for dividing the set Y for P_YM Mealy and P_HYM Moore FSMs. So we have the sets Y_E = {y1, y6} and Y_L = {y2, y3, y4, y5, y7, y8, y9}. Let us encode the CMOs Yq ⊆ Y_L using the same codes as shown in Fig. 7.5. So the system (7.12) represents the LUTerY for the Moore FSM P_HYM(Γ15).

To construct the table of the EMB, it is necessary to construct the structure table (ST) of the Moore FSM P_HYM(Γ15). It includes the following columns: am, K(am), as, K(as), Xh, h. The column am also shows the microoperations yn ∈ Y_E and the codes K(Yq) of the CMOs. Table 7.7 represents a part of the ST.
Table 7.7 Part of ST of Moore FSM P_HYM(Γ15)
am | K(am) | as | K(as) | Xh | h
a1 (−) | 00000 | a2 | 00001 | x1 x2 | 1
 | | a3 | 00010 | x1 x̄2 | 2
 | | a4 | 00011 | x̄1 x3 | 3
 | | a5 | 00100 | x̄1 x̄3 | 4
a2 (y1, 011) | 00001 | a6 | 00101 | x2 | 5
 | | a7 | 00110 | x̄2 x4 | 6
 | | a8 | 00111 | x̄2 x̄4 | 7
a3 (y6, 001) | 00010 | a6 | 00101 | x2 | 8
 | | a7 | 00110 | x̄2 x4 | 9
 | | a8 | 00111 | x̄2 x̄4 | 10
a4 (−, 010) | 00011 | a6 | 00101 | x1 x2 | 11
 | | a7 | 00110 | x̄2 x4 | 12
 | | a8 | 00111 | x̄2 x̄4 | 13
Table 7.7 shows the transitions from the states a1, a2, a3, a4 ∈ A. Let us explain the column am. The CMO Y1 is generated in the state a1 ∈ A. So neither MOs yn ∈ Y_E nor a code K(Yq) is shown for a1. The microoperations y1, y2, y3 are written in the vertex marked by a2 (Fig. 7.8). They are divided in such a way that y1 ∈ Y_E and y2, y3 ∈ Y_L. As follows from Fig. 7.5, K(Y2) = 011. So the following information is shown for the state a2: (y1, 011). All other cells for am ∈ A are filled in the same way.

Let us point out that there are H_EMB = 1024 rows in the table of the EMB in the discussed case. The table of the EMB has the following columns: K(am), X, Y_E, Z, Φ, hm. So the tables of the EMB for P_YM and P_HYM FSMs have the same columns. A part of the table of the EMB for the Moore FSM P_HYM(Γ15) is shown in Table 7.8. It represents 10 transitions from the state a2.

Table 7.8 Part of the table of EMB for FSM P_HYM(Γ15)
K(am) T1T2T3T4T5 | X x1x2x3x4x5 | Y_E y1y6 | Z z1z2z3 | Φ D1D2D3D4D5 | hm | h
00001 | 00000 | 10 | 011 | 00100 | 33 | 4
00001 | 00001 | 10 | 011 | 00100 | 34 | 4
00001 | 00010 | 10 | 011 | 00100 | 35 | 4
00001 | 00011 | 10 | 011 | 00100 | 36 | 4
00001 | 00100 | 10 | 011 | 00011 | 37 | 3
00001 | 00101 | 10 | 011 | 00011 | 38 | 3
00001 | 00110 | 10 | 011 | 00011 | 39 | 3
00001 | 00111 | 10 | 011 | 00011 | 40 | 3
00001 | 01000 | 10 | 011 | 00100 | 41 | 4
00001 | 01001 | 10 | 011 | 00100 | 42 | 4
Fig. 7.13 Structural diagram of P_HCYM Moore FSM
Using (7.14) gives H(am) = 32. So the transitions for the state a2 ∈ A start from the row 33. The columns Y_E and Z contain the same information in all rows for a given state am ∈ A. We take it from the column am of Table 7.7. The column Φ contains the code K(as). For example, the row 33 of Table 7.8 corresponds to the row 4 of Table 7.7. So the column Φ contains the code K(a5) for the row 33. The same approach is used for filling each row of the table of the EMB.

Obviously, this approach could be combined with the transformation of state codes into class codes. It leads to the P_HCYM Moore FSM (Fig. 7.13). To design the circuit of the P_HCYM FSM, it is necessary to combine the previous methods. We leave this problem to the reader.
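The row numbering of the EMB table follows directly from the address formed by the state code and the inputs. The sketch below is a small illustration under the assumption that the EMB address is the concatenation K(am)·X and that rows are numbered from 1, which is consistent with the values 32, 33 and 1024 quoted in the text; the function names are hypothetical.

```python
def first_row(state_code: str, n_inputs: int) -> int:
    """Row number (1-based) of the first EMB row that belongs to a state."""
    return int(state_code, 2) * 2 ** n_inputs + 1


def row_number(state_code: str, x_bits: str) -> int:
    """Row number (1-based) addressed by the concatenation K(am) * X."""
    return int(state_code + x_bits, 2) + 1


R, L = 5, 5                   # state-code bits and number of inputs for Γ15
total_rows = 2 ** (R + L)     # H_EMB
rows_per_state = 2 ** L       # H(am)

print(total_rows, rows_per_state)
print(first_row("00001", L))          # first row for state a2
print(row_number("00001", "01001"))   # last row shown in Table 7.8
```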
7.3 Synthesis of FPGA-based Combined FSMs

Three systems of Boolean functions represent the circuit of a CFSM. They are the systems (1.1), (1.28) and (1.29). The MOs yn ∈ Y^1 depend on Tr ∈ T and xe ∈ X. The system (1.28) is implemented by the block BF, and the system (1.29) by the block BMO (Fig. 1.13). The existence of the system (1.29) requires marking a GSA Γ by the states of a Moore FSM [6, 7]. So it is possible to use PES for optimizing the logic circuit of a CFSM.

A CFSM has M internal states which are encoded using R bits. The value of R is determined by the expression (1.4). It is possible to use LUTs for implementing the circuits of the blocks BF and BMO. It leads to the LUT-based P_L CFSM (Fig. 7.14). It is also possible to use EMBs for implementing the circuits of the blocks BF and BMO. The circuit could be implemented using a single EMB. To do it, the following condition should hold:
\[
2^{L+R} (R + N_1 + N_2) \le V_0.
\tag{7.25}
\]

Fig. 7.14 Structural diagram of P_L CFSM
Let the condition (7.25) be violated but the following condition hold:
\[
2^{L+R} (R + N_1) \le V_0.
\tag{7.26}
\]
In this case, two EMBs are enough to implement the circuit of the CFSM (Fig. 7.15). We denote this model as P_E CFSM. It is possible to use LUTs and EMBs together to implement the circuit of a CFSM. Two approaches are possible. In the case of the P_H1 CFSM, the BF is implemented as EMBerTY1 and the BMO as LUTerY2 (Fig. 7.16). In the case of the P_H2 CFSM, the BF is implemented as LUTerTY1 and the BMO as EMBerY2 (Fig. 7.17). Let us discuss the case when a single EMB is enough to implement the circuit of the BF for the P_H1 CFSM. The synthesis method for the P_H1 CFSM has the following steps:
1. Finding the set of states A for a given GSA Γ.
2. Executing the state assignment.
3. Constructing the table of EMB.
4. Constructing the table of LUTerY2.
5. Implementing the circuit of CFSM with given EMB and LUTs.

This method could be used if the condition (7.26) holds. If (7.26) is violated, it is possible to use different methods of structural decomposition such as:
1. Replacement of logical conditions [8, 9].
2. Encoding of the classes of PES [3, 12].
3. Encoding of the collections of microoperations of Mealy FSM [3, 9].
Fig. 7.15 Structural diagram of P_E CFSM
Fig. 7.16 Structural diagram of P_H1 CFSM
Fig. 7.17 Structural diagram of P_H2 CFSM
Fig. 7.18 Structural diagram of MP_H CFSM
Using the replacement of logical conditions (RLC) makes sense if the following condition holds:
\[
2^{G+R} (R + N_1) \le V_0.
\tag{7.27}
\]
It leads to the MP_H CFSM (Fig. 7.18). In the MP_H CFSM, the LUTerP generates the functions (2.1), and the LUTerY2 generates the functions (1.29). The EMB implements the functions (2.3) and
\[
Y^1 = Y^1(T, P).
\tag{7.28}
\]
Let us find the partition Π_A = {B1, ..., B_I} for the set A. Let us encode each class Bi ∈ Π_A by the binary code K(Bi) having R_B bits, where R_B is determined by (2.42). Let the following condition hold:
\[
2^{L+R_B} (R + N_1) \le V_0.
\tag{7.29}
\]
In this case, it makes sense to replace the codes K(am) by K(Bi). We use the variables τr ∈ 𝒯 for encoding the classes. It leads to the P_HC CFSM (Fig. 7.19). In the P_HC CFSM, the functions Φ are represented as Φ(𝒯, X). The LUTer𝒯Y2 implements the systems (1.3) and (2.43).

Let us encode the CMOs Yq ⊆ Y^1 by binary codes K(Yq) having R_Q bits. The value of R_Q is determined by (2.24), where Q is the number of CMOs Yq ⊆ Y^1. Let us use the variables zr ∈ Z to encode the CMOs, where |Z| = R_Q. Let the following condition hold:
\[
2^{L+R} (R + R_Q) \le V_0.
\tag{7.30}
\]
In this case, it is possible to use the model of the P_HY CFSM (Fig. 7.20). Now the EMB generates the functions Φ = Φ(T, X) and Z = Z(T, X). The LUTerY1 generates the functions Y^1 = Y^1(Z), and the LUTerY2 generates the functions Y^2 = Y^2(T).
Fig. 7.19 Structural diagram of P_HC CFSM
Fig. 7.20 Structural diagram of P_HY CFSM
The synthesis method for the P_HY CFSM has some additional steps as compared to the P_H1 CFSM. They are the following:
1. Constructing the CMOs Yq ⊆ Y^1.
2. Encoding of CMOs Yq ⊆ Y^1.
3. Creating the table of LUTerY1.

Let us discuss an example of synthesis for the CFSM P_H1(Γ16). The GSA Γ16 is shown in Fig. 7.21. The following sets could be found from Fig. 7.21: A = {a1, ..., a8}, X = {x1, ..., x4}, Y^1 = {y1, ..., y9}, Y^2 = {y10, ..., y15}. It gives M = 8, L = 4, N1 = 9, N2 = 6, N = 15. Using (1.4) gives R = 3, T = {T1, T2, T3}, Φ = {D1, D2, D3}. Suppose we have an EMB with S_A = 7 and t_F = 6. Then L + R = 7 = S_A and R + N1 = 12 > t_F. So it is necessary to encode the CMOs Yq ⊆ Y^1.

Fig. 7.21 Marked GSA Γ16
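The feasibility reasoning for Γ16 can be written down as a short check. This is a sketch under stated assumptions: (1.4) is taken to be R = ⌈log2 M⌉, and the names `code_width`, `direct_fit` and `max_cmos` are illustrative only. It also shows how many CMOs can be encoded at most with the output bits left after the state code.

```python
def code_width(n: int) -> int:
    """ceil(log2 n) for n >= 1, computed without floating point."""
    return max(1, (n - 1).bit_length())


M, L, N1, N2 = 8, 4, 9, 6   # parameters of Γ16
S_A, t_F = 7, 6             # address bits and output bits of the EMB

R = code_width(M)                   # assumed form of (1.4)
address_fits = L + R <= S_A         # inputs X and state code T fit the EMB address
direct_fit = R + N1 <= t_F          # one EMB output per MO of Y^1 plus R outputs for Φ
spare_bits = t_F - R                # outputs left for the code K(Yq)
max_cmos = 2 ** spare_bits          # largest Q that can still be encoded

print(R, address_fits, direct_fit, spare_bits, max_cmos)
```

The check confirms that the address fits, that the MOs of Y^1 cannot be generated directly, and that up to eight CMOs can be encoded with the three remaining outputs.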
The MOs yn ∈ Y^1 are written on the arcs of Γ16. It is possible to find the following CMOs: Y1 = ∅, Y2 = {y1, y2}, Y3 = {y3, y8, y9}, Y4 = {y2, y4}, Y5 = {y3, y5}, Y6 = {y4, y6}, Y7 = {y7, y8}, Y8 = {y1, y6}. So Q = 8. Using (2.24) gives R_Q = 3 and Z = {z1, z2, z3}. We have R + R_Q = 6 = t_F. So it is possible to use the model of the CFSM P_HY(Γ16). Let S = 3. Then the following conditions hold: R ≤ S and R_Q ≤ S. It means that there is no need for functional decomposition when implementing the circuits of LUTerY1 and LUTerY2. This is the best case for using the model of the P_HY CFSM.

Let us encode the CMOs Yq ⊆ Y^1 in such a manner that it gives the minimum number of literals in the SBF Y^1. Let us create the system (7.31) showing the dependence of yn ∈ Y^1 on the CMOs Yq ⊆ Y^1:
\[
\begin{aligned}
y_1 &= Y_2 \vee Y_8; & y_2 &= Y_2 \vee Y_4; & y_3 &= Y_3 \vee Y_5; \\
y_4 &= Y_4 \vee Y_6; & y_5 &= Y_5; & y_6 &= Y_6 \vee Y_8; \\
y_7 &= Y_7; & y_8 &= Y_3 \vee Y_7; & y_9 &= Y_3.
\end{aligned}
\tag{7.31}
\]
The codes of the CMOs are shown in Fig. 7.22. They are obtained using the approach [14]. Using the system (7.31) and the codes from Fig. 7.22 gives the following SBF:
\[
\begin{aligned}
y_1 &= \bar{z}_1 z_2; & y_2 &= z_2 \bar{z}_3; & y_3 &= z_1 \bar{z}_2; \\
y_4 &= z_1 z_2; & y_5 &= z_1 \bar{z}_2 \bar{z}_3; & y_6 &= z_2 z_3; \\
y_7 &= \bar{z}_1 \bar{z}_2 z_3; & y_8 &= \bar{z}_2 z_3; & y_9 &= z_1 \bar{z}_2 z_3.
\end{aligned}
\tag{7.32}
\]
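The system (7.32) can be checked against the system (7.31) and the codes of Fig. 7.22. The sketch below is an illustrative check only: each product term is written as a cube over z1 z2 z3 ("-" marks an absent variable), and the names `CMO_CODE`, `DEPENDS`, `SBF` and `covers` are hypothetical.

```python
# Codes K(Yq) = z1 z2 z3 read from the Karnaugh map of Fig. 7.22.
CMO_CODE = {"Y1": "000", "Y2": "010", "Y3": "101", "Y4": "110",
            "Y5": "100", "Y6": "111", "Y7": "001", "Y8": "011"}

# System (7.31): the CMOs that contain each microoperation.
DEPENDS = {"y1": ["Y2", "Y8"], "y2": ["Y2", "Y4"], "y3": ["Y3", "Y5"],
           "y4": ["Y4", "Y6"], "y5": ["Y5"], "y6": ["Y6", "Y8"],
           "y7": ["Y7"], "y8": ["Y3", "Y7"], "y9": ["Y3"]}

# System (7.32): one cube per function, positions are z1, z2, z3.
SBF = {"y1": "01-", "y2": "-10", "y3": "10-", "y4": "11-", "y5": "100",
       "y6": "-11", "y7": "001", "y8": "-01", "y9": "101"}


def covers(cube: str, code: str) -> bool:
    """True if the code satisfies the product term described by the cube."""
    return all(c == "-" or c == b for c, b in zip(cube, code))


def check() -> bool:
    """Each cube must cover exactly the codes of the CMOs listed in (7.31)."""
    for y, cube in SBF.items():
        on_set = {q for q, code in CMO_CODE.items() if covers(cube, code)}
        if on_set != set(DEPENDS[y]):
            return False
    return True


interconnections = sum(3 - cube.count("-") for cube in SBF.values())
print(check(), len(SBF), interconnections)
```

Since all eight codes are used, there are no don't-care assignments, and every cube has to match its on-set exactly. The literal count gives the number of interconnections of LUTerY1.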
It gives the circuit of LUTerY1 having 9 LUTs and 21 interconnections. The system (7.32) is used to form the truth tables for the LUTs of LUTerY1. The table of the EMB has the following columns: K(am), X, K(Yq), K(as), h. It is constructed on the base of the ST of the P_HY Mealy FSM having the following columns: am, K(am), as, K(as), Xh, Yh, K(Yh), Zh, Φh, h. The MOs yn ∈ Y^2 are written in the column am. A part of the ST for P_HY(Γ16) is shown in Table 7.9. A part of the table of the EMB is shown in Table 7.10. It represents the transitions from the state a3. The states are encoded in the trivial way: K(a1) = 000, ..., K(a8) = 111. We add the column h0 to show the correspondence between Tables 7.9 and 7.10. Table 7.10 is filled in the same way as in the previous examples. The table of LUTerY2 is constructed as in the previous examples of synthesis of Moore FSMs. It is possible to encode states am ∈ A to minimize the numbers of
Fig. 7.22 Codes of CMOs for P_HY(Γ16)
z3 \ z1z2 | 00 | 01 | 11 | 10
0 | Y1 | Y2 | Y4 | Y5
1 | Y7 | Y8 | Y6 | Y3
Table 7.9 Part of ST for CFSM P_HY(Γ16)
am | K(am) | as | K(as) | Xh | Yh | K(Yh) | Zh | Φh | h
a1 | 000 | a2 | 001 | 1 | Y1 | 000 | − | D3 | 1
a2 (y10) | 001 | a3 | 010 | x1 | Y2 | 010 | z2 | D2 | 2
 | | a4 | 011 | x̄1 x2 | Y3 | 101 | z1 z3 | D2 D3 | 3
 | | a5 | 100 | x̄1 x̄2 | Y4 | 110 | z1 z2 | D1 | 4
a3 (y11, y12) | 010 | a6 | 101 | x3 x4 | Y5 | 100 | z1 | D1 D3 | 5
 | | a7 | 110 | x3 x̄4 | Y6 | 111 | z1 z2 z3 | D1 D2 | 6
 | | a8 | 111 | x̄3 x2 | Y7 | 001 | z3 | D1 D2 D3 | 7
 | | a5 | 100 | x̄3 x̄2 | Y4 | 110 | z1 z2 | D1 | 8
Table 7.10 Part of table of EMB for CFSM P_HY(Γ16)
K(am) T1T2T3 | X x1x2x3x4 | K(Yq) z1z2z3 | K(as) D1D2D3 | h | h0
010 | 0000 | 110 | 100 | 33 | 8
010 | 0001 | 110 | 100 | 34 | 8
010 | 0010 | 111 | 110 | 35 | 6
010 | 0011 | 100 | 101 | 36 | 5
010 | 0100 | 001 | 111 | 37 | 7
010 | 0101 | 001 | 111 | 38 | 7
010 | 0110 | 111 | 110 | 39 | 6
010 | 0111 | 100 | 101 | 40 | 5
010 | 1000 | 110 | 100 | 41 | 8
010 | 1001 | 110 | 100 | 42 | 8

Fig. 7.23 Structural diagram of MP_HCY CFSM
LUTs and interconnections in LUTerY2. It is practically impossible in the discussed case due to the lack of insignificant input assignments [10, 11]. It is possible to combine different methods of structural decomposition for Mealy or Moore FSMs. Obviously, it could be done for CFSMs, too. For example, the structural diagram of the MP_HCY CFSM is shown in Fig. 7.23. In the MP_HCY CFSM, three blocks of LUTs are used. The LUTerP implements the functions pg ∈ P depending on T and X. The LUTerY1 implements the functions Y^1(Z); the functions Y^2(T) and 𝒯(T) are implemented by LUTer𝒯Y2. The EMB implements the functions Z(𝒯, P) and Φ(𝒯, P).
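The way Table 7.10 is filled from Table 7.9 can be sketched as a small generator. The block below is an illustration under assumptions: the EMB address is taken as K(am)·X with rows numbered from 1, each transition condition is written as a cube over x1..x4, and all names (`FROM_A3`, `emb_rows`, and so on) are hypothetical.

```python
L = 4  # number of inputs x1..x4
STATE_CODE = {f"a{m}": format(m - 1, "03b") for m in range(1, 9)}  # trivial state codes
CMO_CODE = {"Y1": "000", "Y2": "010", "Y3": "101", "Y4": "110",
            "Y5": "100", "Y6": "111", "Y7": "001", "Y8": "011"}    # Fig. 7.22

# Rows 5-8 of Table 7.9: (input cube, next state, CMO, row number h0 in the ST).
# A cube maps the 1-based index of an input to the value it must have.
FROM_A3 = [
    ({3: 1, 4: 1}, "a6", "Y5", 5),
    ({3: 1, 4: 0}, "a7", "Y6", 6),
    ({3: 0, 2: 1}, "a8", "Y7", 7),
    ({3: 0, 2: 0}, "a5", "Y4", 8),
]


def emb_rows(state, transitions):
    """Expand the transitions of one state into the 2**L rows of the EMB table."""
    rows = []
    base = int(STATE_CODE[state], 2) * 2 ** L
    for x in range(2 ** L):
        bits = format(x, f"0{L}b")
        for cube, nxt, cmo, h0 in transitions:
            if all(bits[i - 1] == str(v) for i, v in cube.items()):
                rows.append((STATE_CODE[state], bits, CMO_CODE[cmo],
                             STATE_CODE[nxt], base + x + 1, h0))
                break
    return rows


for row in emb_rows("a3", FROM_A3)[:10]:
    print(*row)
```

The first ten generated rows reproduce Table 7.10, including the row numbers 33 to 42 and the column h0.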
It is possible to use the mixed encoding of microoperations in combined FSMs. It can be applied to the CMOs Yq ⊆ Y^1, to the CMOs Yq ⊆ Y^2, or to both. Let us discuss some of the possible solutions.
7.4 Mixed Encoding for Combined FSMs

Let it be necessary to implement the circuit of a CFSM with LUTs and a single EMB. Let the following conditions hold for the EMB and the GSA Γ:
\[
t_F \ge R + R_Q;
\tag{7.33}
\]
\[
t_F < R + N_1;
\tag{7.34}
\]
\[
S_A = L + R.
\tag{7.35}
\]
It means that some MOs yn ∈ Y^1 could be generated by the EMB. It leads to the P_HYH CFSM (Fig. 7.24). Obviously, Y^1 = Y_L^1 ∪ Y_E^1. It is necessary to divide the set Y^1 in such a way that it minimizes: (1) the number of LUTs in LUTerY_L^1; (2) the number of interconnections. To do it, we can use the same approach as for the P_YM Mealy FSM.

Let us discuss an example of synthesis for the CFSM P_HYH(Γ17). The GSA Γ17 is shown in Fig. 7.25. We could find the following sets and their parameters for the CFSM P_H(Γ17): A = {a1, ..., a12}, M = 12, X = {x1, ..., x4}, L = 4, Y^1 = {y1, ..., y9}, N1 = 9, Y^2 = {y10, ..., y14}, N2 = 5, N = N1 + N2 = 14. Also, it is possible to find Q = 17 CMOs Yq ⊆ Y^1 (Table 7.11). Also, there is the CMO Y1 = ∅.

Let us discuss the most difficult case, when the conditions (3.46) and (4.1) hold. It means that R > S and R_Q > S. So it is necessary to apply the functional decomposition to both functions Y^1 = Y^1(Z) and Y^2 = Y^2(T). In this case, we propose the following method of synthesis:
1. Constructing the set A.
2. Executing the diminishing state assignment.
3. Constructing the CMOs Yq ⊆ Y^1.
4. Executing the mixed encoding of CMOs Yq ⊆ Y^1.
Fig. 7.24 Structural diagram of P_HYH CFSM
Fig. 7.25 Initial GSA Γ17
Table 7.11 Collections of MOs for GSA Γ17 q Yq q Yq 2 3 4 5
y1 y2 y3 y3 y4 y6 y2 y7 y4 y8
6 7 8 9
y3 y5 y9 y3 y5 y6 y1 y4 y6 y2 y3 y6
q
Yq
q
Yq
10 11 12 13
y1 y2 y7 y2 y3 y2 y6 y7 y1 y4
14 15 16 17
y1 y6 y1 y6 y1 y3 y4
5. Constructing the table of EMB.
6. Constructing the system Y_L^1 = Y_L^1(Z).
7. Constructing the system Y^2 = Y^2(T).
8. Constructing truth tables for functions yn ∈ Y_L^1 and yn ∈ Y^2.
9. Implementing CFSM logic circuit.
Let S = 3. Using (1.4) gives R = 4, and using (2.24) gives R_Q = 5. So R > S and R_Q > S. So it is necessary to execute the diminishing state assignment if the condition (7.33) holds. Let the EMB have the configuration 256 × 9. So S_A = 8 and t_F = 9. Because R + L = 8 = S_A, the condition (7.35) holds and it is possible to use the model of the P_HYH CFSM in the discussed case.

Let us construct the system Y^2 = Y^2(A). It could be done using the information from the operational vertices of the GSA Γ17. The following system could be found:
\[
\begin{aligned}
y_{10} &= A_2 \vee A_3 \vee A_7 \vee A_{11}; & y_{11} &= A_3 \vee A_5 \vee A_8 \vee A_{11}; \\
y_{12} &= A_4 \vee A_6 \vee A_9 \vee A_{12}; & y_{13} &= A_5 \vee A_8 \vee A_9; \\
y_{14} &= A_{10}.
\end{aligned}
\tag{7.36}
\]
Using the approach [14] gives the state codes shown in Fig. 7.26. Using the codes from Fig. 7.26 turns the system (7.36) into the following system:
\[
y_{10} = \bar{T}_1 T_2; \quad y_{11} = \bar{T}_1 T_4; \quad y_{12} = T_1 \bar{T}_3; \quad y_{13} = \bar{T}_2 T_4; \quad y_{14} = T_1 T_3.
\tag{7.37}
\]
It follows from (7.37) that there are: (1) 5 LUTs in the circuit of LUTerY2 and (2) 10 interconnections. In the worst case there are: (1) 15 LUTs; (2) 45 interconnections and (3) two levels of logic. There is only a single level of logic in the circuit determined by (7.37).

The CMOs are the same for P_YH(Γ15) and P_HYH(Γ17). It could be found from the comparison of Tables 7.1 and 7.11. So the process of dividing the set Y^1 is the same as shown in Table 7.2. As a result, the MOs y1 and y6 are placed in the set Y_E^1. It gives the set Y_L^1 = {y2, y3, y4, y5, y7, y8, y9}. The resulting CMOs Yq ⊆ Y_L^1 are shown in Table 7.4. Let us encode these CMOs as shown in Fig. 7.5. Now the system (7.12) represents the LUTs of LUTerY1.

Fig. 7.26 State codes for CFSM P_HYH(Γ17) (Karnaugh map over T1T2 and T3T4 for the states a1, ..., a12)
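The parameter checks for Γ17 can be collected in one place. The sketch below is an illustration under assumptions: (1.4) and (2.24) are taken to be ⌈log2 M⌉ and ⌈log2 Q⌉, the CMO codes after the mixed encoding are taken to have three bits (as the three-bit codes of the example suggest), and all names are hypothetical.

```python
def code_width(n: int) -> int:
    """ceil(log2 n) for n >= 1, computed without floating point."""
    return max(1, (n - 1).bit_length())


M, L, N1, Q = 12, 4, 9, 17   # parameters of Γ17
S_A, t_F = 8, 9              # EMB configuration 256 x 9

R = code_width(M)            # assumed form of (1.4)
R_Q = code_width(Q)          # assumed form of (2.24)

cond_733 = t_F >= R + R_Q    # (7.33)
cond_734 = t_F < R + N1      # (7.34)
cond_735 = S_A == L + R      # (7.35)

# Output budget after the mixed encoding: Φ, the code of the CMOs Yq ⊆ Y_L^1, and Y_E^1.
Y_E1 = ("y1", "y6")
Z_BITS = 3                   # assumption: three-bit codes K(Yq) for Yq ⊆ Y_L^1
emb_outputs = R + Z_BITS + len(Y_E1)

# Cost of LUTerY2 according to (7.37): one LUT per function, one wire per literal.
LITERALS_737 = {"y10": 2, "y11": 2, "y12": 2, "y13": 2, "y14": 2}

print(R, R_Q, cond_733, cond_734, cond_735)
print(emb_outputs, emb_outputs <= t_F)
print(len(LITERALS_737), sum(LITERALS_737.values()))
```

All three conditions hold, the nine EMB outputs are used completely, and the literal count of (7.37) gives the 5 LUTs and 10 interconnections of LUTerY2.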
Let us construct the table of the EMB. To do it, it is necessary to construct the ST of the CFSM P_HYM(Γ17). It includes the following columns: am, K(am), as, K(as), Xh, Y_E^1, Y_L^1, K(Y_L^1), Φh, h. Let the column Y_L^1 consist of the CMOs Yq ⊆ Y_L^1. Table 7.12 represents a part of the ST. In Table 7.12, we use the state codes from Fig. 7.26, the collections of MOs from Table 7.4, and the codes of CMOs from Fig. 7.5. Using Table 7.12, we could build the table of the EMB. It includes the following columns: K(am), X, Y_E^1, K(Y_L^1), Φ, h. In the discussed case, H = 256 and H(am) = 16. For example, Table 7.13 shows the transitions from the state a3 ∈ A.

Table 7.12 Part of ST of CFSM P_HY(Γ17)
am | K(am) | as | K(as) | Xh | Y_E^1 | Y_L^1 | K(Y_L^1) | Φh | h
a1 (−) | 0000 | a2 | 0100 | x1 x2 | y1 | Y2 | 011 | D2 | 1
 | | a3 | 0101 | x1 x̄2 | y6 | Y8 | 101 | D2 D4 | 2
 | | a4 | 0001 | x̄1 x3 | − | Y7 | 110 | D4 | 3
 | | a5 | 1001 | x̄1 x̄3 | − | Y4 | 010 | D1 D4 | 4
a2 (y10) | 0100 | a2 | 0100 | x2 | − | Y5 | 100 | D2 | 5
 | | a6 | 0010 | x̄2 x4 | y1 y6 | Y3 | 001 | D3 | 6
 | | a5 | 1001 | x̄2 x̄4 | y6 | Y6 | 111 | D1 D4 | 7
a3 (y10 y11) | 0101 | a2 | 0100 | x2 | − | Y5 | 100 | D2 | 8
 | | a6 | 0010 | x̄2 x4 | y1 y6 | Y3 | 001 | D3 | 9
 | | a5 | 1001 | x̄2 x̄4 | y6 | Y6 | 111 | D1 D4 | 10

Table 7.13 Part of table of EMB for CFSM P_HYM(Γ17)
K(am) T1T2T3T4 | X x1x2x3x4 | Y_E^1 y1y6 | K(Y_L^1) z1z2z3 | Φ D1D2D3D4 | h0 | h
0101 | 0000 | 01 | 111 | 1001 | 81 | 10
0101 | 0001 | 11 | 001 | 0010 | 82 | 9
0101 | 0010 | 01 | 111 | 1001 | 83 | 10
0101 | 0011 | 11 | 001 | 0010 | 84 | 9
0101 | 0100 | 00 | 100 | 0100 | 85 | 8
0101 | 0101 | 00 | 100 | 0100 | 86 | 8
0101 | 0110 | 00 | 100 | 0100 | 87 | 8
0101 | 0111 | 00 | 100 | 0100 | 88 | 8
0101 | 1000 | 01 | 111 | 1001 | 89 | 10
0101 | 1001 | 11 | 001 | 0010 | 90 | 9
0101 | 1010 | 01 | 111 | 1001 | 91 | 10
0101 | 1011 | 11 | 001 | 0010 | 92 | 9
0101 | 1100 | 00 | 100 | 0100 | 93 | 8
0101 | 1101 | 00 | 100 | 0100 | 94 | 8
0101 | 1110 | 00 | 100 | 0100 | 95 | 8
0101 | 1111 | 00 | 100 | 0100 | 96 | 8
We have K(a3) = 0101. It means that there are 5·H(am) = 80 rows of the table of the EMB before the first row of Table 7.13. So h0 = 81 for the first row of Table 7.13. We add the column h to show the correspondence between Table 7.12 and Table 7.13. The one-hot codes for yn ∈ Y_E are constructed on the base of the column Y_E^1 of Table 7.12. We do not discuss the last step of the proposed method. It requires using special CAD tools such as [13]. Let us point out that it is difficult to investigate this method because there is no library of standard benchmarks for CFSMs.

It is possible to use the mixed encoding of CMOs together with other methods of structural decomposition of CFSMs. For example, there are the structural diagrams of MP_HYM (Fig. 7.27), P_HCYM (Fig. 7.28) and MP_HCYM (Fig. 7.29) CFSMs. We hope that the methods of synthesis of their circuits are clear to our readers. For example, let us discuss the synthesis method for the MP_HCYM CFSM. It combines the synthesis methods for MP_HC and P_HYM CFSMs. The proposed method of synthesis has the following steps:
1. Constructing the set of internal states A.
2. Constructing the partition Π_A for the set A.
3. Executing the diminishing state assignment.
4. Executing the replacement of logical conditions.
5. Constructing the CMOs Yq ⊆ Y^1.
Fig. 7.27 Structural diagram of MP_HYM CFSM
Fig. 7.28 Structural diagram of P_HCYM CFSM
Fig. 7.29 Structural diagram of MP_HCYM CFSM
6. Executing the mixed encoding of MOs yn ∈ Y^1.
7. Constructing the system P = P(T, X).
8. Constructing the ST of MP_HCYM CFSM.
9. Constructing the table of EMB on the base of ST.
10. Constructing the system Y_L^1 = Y_L^1(Z).
11. Constructing the systems Y^2 = Y^2(T) and 𝒯 = 𝒯(T).
12. Constructing truth tables for functions pg ∈ P, yn ∈ Y_L^1 ∪ Y^2 and τr ∈ 𝒯.
13. Implementing CFSM logic circuit.
7.5 Mixed Encoding for LUT-based FSMs

Let us discuss LUT-based Mealy FSMs with encoding of CMOs. There are two blocks in the P_Y Mealy FSM (Fig. 7.30). The LUTerΦZ implements the functions Φ = Φ(T, X) and Z = Z(T, X), and the LUTerY implements the functions Y = Y(Z). Let us discuss the case when R_Q > S. It corresponds to the relation (3.46). In this case, it is necessary to use the functional decomposition for the functions yn ∈ Y.

Let the CMOs Yq ⊆ Y form a set Y_Q. Let us use the procedure of excluding MOs from the CMOs Yq ⊆ Y shown in Fig. 7.3. Let Y_E be the set of excluded microoperations and Y_R = Y \ Y_E. The excluding turns the set Y_Q into the set Y_QR having Q_R elements. We terminate this process when the following condition holds:
\[
R_{QR} = \lceil \log_2 Q_R \rceil = S.
\tag{7.38}
\]
Let us encode the CMOs Yq ⊆ Y_R by binary codes K_R(Yq) having R_QR bits. Let us use the variables zr ∈ Z for this encoding. The excluded MOs yn ∈ Y_E form some CMOs Yq ⊆ Y_E. Let them form a set Y_QE with Q_E elements. Let the following condition hold:
\[
R_{QE} = \lceil \log_2 Q_E \rceil = S.
\tag{7.39}
\]
In this case, we encode each CMO Yq ∈ Y_QE by a binary code K_E(Yq). Let us use the variables vr ∈ V for this encoding. Now each CMO Yq ⊆ Y could be represented by two codes:
\[
K(Y_q) = K_R(Y_i) * K_E(Y_j),
\tag{7.40}
\]
where Yi ∈ Y_QR and Yj ∈ Y_QE. It leads to the P_Y1 Mealy FSM (Fig. 7.31).

Fig. 7.30 Structural diagram of P_Y Mealy FSM
Fig. 7.31 Structural diagram of P_Y1 Mealy FSM

In the P_Y1 Mealy FSM, the LUTer1 generates the functions (1.1), (2.25) and (5.10). The LUTerY_R implements the functions yn ∈ Y_R and the LUTerY_E implements the functions yn ∈ Y_E, represented respectively as:
\[
Y_R = Y_R(Z);
\tag{7.41}
\]
\[
Y_E = Y_E(V).
\tag{7.42}
\]
Let us discuss this approach for some Mealy FSM S1 having Q = 18 CMOs (Table 7.14). Let S = 3. Using (2.24) gives R_Q = 5 > S. In the worst case, there are 7N = 98 LUTs and 21N = 294 interconnections in the circuit of LUTerY of the P_Y Mealy FSM S1. Let us divide the set Y = {y1, ..., y14} into the sets Y_R and Y_E. We do not show the process of division. But using the procedure discussed above, we could get the following sets: Y_R = {y1, y3, y4, y5, y6, y8, y9, y11} and Y_E = {y2, y7, y10, y12, y13, y14}. The CMOs Yq ⊆ Y_E are represented by Table 7.15. The CMOs Yq ⊆ Y_R are shown in Table 7.16. We have Q_R = 8, Q_E = 6, R_QR = R_QE = 3. So both conditions (7.38) and (7.39) are true. The circuit of LUTerY_E needs N_E = |Y_E| = 6 LUTs and 3N_E = 18 interconnections. The circuit of LUTerY_R needs N_R = |Y_R| = 8 LUTs and 3N_R = 24 interconnections. So 14 LUTs and 42 interconnections are needed for the circuit implementing the system Y.

Table 7.14 Table of CMOs for Mealy FSM S1
q | Yq
1 | −
2 | y1 y2 y3 y12
3 | y2 y3 y4 y5 y12
4 | y3 y4 y5 y7
5 | y4 y6
6 | y4 y6 y10
7 | y2 y4 y6 y7 y13
8 | y2 y6 y8 y10 y14
9 | y2 y6 y7 y8 y13
10 | y2 y6 y8 y12
11 | y1 y9
12 | y1 y2 y9 y10 y14
13 | y1 y2 y7 y9 y13
14 | y9 y11
15 | y2 y9 y11 y12
16 | y2 y7 y13
17 | y10
18 | y1 y4 y9 y12

Table 7.15 Table of CMOs Yq ⊆ Y_E
q | Yq
19 | −
20 | y2 y12
21 | y7
22 | y10
23 | y2 y10 y14
24 | y2 y7 y13

Table 7.16 Table of CMOs Yq ⊆ Y_R
q | Yq
25 | −
26 | y1 y3
27 | y3 y4 y5
28 | y4 y6
29 | y6 y8
30 | y1 y9
31 | y9 y11
32 | y4 y9

To get the initial CMOs Yq ⊆ Y (Table 7.14), it is necessary to use some pairs of CMOs Yq ⊆ Y_R and Yq ⊆ Y_E. The corresponding representation is shown in Table 7.17. For example, Y2 = Y26 ∪ Y20 and Y3 = Y27 ∪ Y20. It means that K(Y2) = K_R(Y26) ∗ K_E(Y20) and K(Y3) = K_R(Y27) ∗ K_E(Y20).

Table 7.17 Representation of initial CMOs Yq ⊆ Y
Yq | Y_R | Y_E
Y1 | Y25 | Y19
Y2 | Y26 | Y20
Y3 | Y27 | Y20
Y4 | Y27 | Y21
Y5 | Y28 | Y19
Y6 | Y28 | Y22
Y7 | Y28 | Y24
Y8 | Y29 | Y23
Y9 | Y29 | Y24
Y10 | Y29 | Y20
Y11 | Y30 | Y19
Y12 | Y30 | Y23
Y13 | Y30 | Y24
Y14 | Y31 | Y19
Y15 | Y31 | Y20
Y16 | Y25 | Y24
Y17 | Y25 | Y22
Y18 | Y30 | Y20

The proposed synthesis method for the P_Y1 Mealy FSM has the following steps:
1. Constructing the set A for a given GSA Γ.
2. Executing the state assignment.
3. Constructing the set of CMOs Yq ⊆ Y.
4. Dividing the set Y by Y_E and Y_R.
5. Constructing the CMOs Yq ⊆ Y_E and Yq ⊆ Y_R.
6. Encoding the CMOs.
7. Constructing the ST of P Mealy FSM.
8. Constructing the ST of P_Y1 Mealy FSM.
9. Constructing the systems Φ(T, X), Z(T, X), V(T, X), Y_R(Z) and Y_E(V).
10. Implementing the FSM logic circuit with particular LUTs.
Let us discuss an example of synthesis for the P_Y1 Mealy FSM S1. We have already executed the steps 3, 4 and 5 of this method. Let the LUTs have S = 3 inputs. It means that the model P_Y1 could be used. The encoding of CMOs should lead to hardware reduction in the circuits of LUTerY_R and LUTerY_E. Let us execute the diminishing encoding.
We have R_QR = R_QE = 3. It gives the sets Z = {z1, z2, z3} and V = {v1, v2, v3}. Let us form the system showing the dependence of the MOs yn ∈ Y_R on the CMOs. It is the following one:
\[
\begin{aligned}
y_1 &= Y_{26} \vee Y_{30}; & y_3 &= Y_{26} \vee Y_{27}; & y_4 &= Y_{27} \vee Y_{28} \vee Y_{32}; \\
y_5 &= Y_{27}; & y_6 &= Y_{28} \vee Y_{29}; & y_8 &= Y_{29}; \\
y_9 &= Y_{30} \vee Y_{31}; & y_{11} &= Y_{31}.
\end{aligned}
\tag{7.43}
\]
Let us use the approach [14]. It gives the codes K_R(Yq) shown in Fig. 7.32. Using the system (7.43) and the codes K_R(Yq) from Fig. 7.32, we could obtain the following SBF:
\[
\begin{aligned}
y_1 &= \bar{z}_1 z_2; & y_3 &= z_2 \bar{z}_3; & y_4 &= z_1 \bar{z}_3 \vee \bar{z}_1 \bar{z}_2 z_3; \\
y_5 &= z_1 z_2 \bar{z}_3; & y_6 &= z_1 \bar{z}_2; & y_8 &= z_1 \bar{z}_2 z_3; \\
y_9 &= z_2 z_3; & y_{11} &= z_1 z_2 z_3.
\end{aligned}
\tag{7.44}
\]
Analysis of (7.44) shows that there are 8 LUTs and 20 interconnections in the circuit of LUTerY_R. Let us form the system showing the dependence of the MOs yn ∈ Y_E on the CMOs. It is the following one:
\[
\begin{aligned}
y_2 &= Y_{20} \vee Y_{23} \vee Y_{24}; & y_7 &= Y_{21} \vee Y_{24}; & y_{10} &= Y_{22} \vee Y_{23}; \\
y_{12} &= Y_{20}; & y_{13} &= Y_{24}; & y_{14} &= Y_{23}.
\end{aligned}
\tag{7.45}
\]
Using [14], we could get the codes shown in Fig. 7.33. Using the system (7.45) and the codes from Fig. 7.33 gives the following system:
\[
\begin{aligned}
y_2 &= v_2; & y_7 &= v_1; & y_{10} &= v_3; \\
y_{12} &= \bar{v}_1 v_2 \bar{v}_3; & y_{13} &= v_1 v_2; & y_{14} &= v_2 v_3.
\end{aligned}
\tag{7.46}
\]

Fig. 7.32 Codes of CMOs Yq ⊆ Y_R
z3 \ z1z2 | 00 | 01 | 11 | 10
0 | Y25 | Y26 | Y27 | Y28
1 | Y32 | Y30 | Y31 | Y29

Fig. 7.33 Codes of CMOs Yq ⊆ Y_E
v3 \ v1v2 | 00 | 01 | 11 | 10
0 | Y19 | Y20 | Y24 | Y21
1 | Y22 | Y23 | |
Table 7.18 Part of ST of P Mealy FSM S1
am | K(am) | as | K(as) | Xh | Yh | Yq | Φ | h
a3 | 0010 | a4 | 0011 | x1 x4 | y4 y6 y10 | Y6 | D3 D4 | 10
 | | a5 | 0100 | x1 x̄4 | y2 y6 y8 y12 | Y10 | D2 | 11
 | | a7 | 0110 | x̄1 x3 | y1 y9 | Y11 | D2 D3 | 12
 | | a3 | 0010 | x̄1 x̄3 | y3 y4 y5 y7 | Y4 | D3 | 13

Table 7.19 Part of ST of P_Y1 Mealy FSM S1
am | K(am) | as | K(as) | Xh | Y_R | Y_E | K_R(Yq) | K_E(Yq) | Zh | Vh | Φ | h
a3 | 0010 | a4 | 0011 | x1 x4 | Y28 | Y22 | 100 | 001 | z1 | v3 | D3 D4 | 10
 | | a5 | 0100 | x1 x̄4 | Y29 | Y20 | 101 | 010 | z1 z3 | v2 | D2 | 11
 | | a7 | 0110 | x̄1 x3 | Y30 | Y19 | 011 | 000 | z2 z3 | − | D2 D3 | 12
 | | a3 | 0010 | x̄1 x̄3 | Y27 | Y21 | 110 | 100 | z1 z2 | v1 | D3 | 13
Analysis of (7.46) shows that there are 3 LUTs and 8 interconnections in the circuit of LUTerY_E. Let us have a part of the ST of the P Mealy FSM S1 (Table 7.18). It represents the transitions from the state a3 ∈ A. We have R_0 = 4; the states are encoded in the trivial way: K(a1) = 0000, K(a2) = 0001, and so on. We add the column Yq to show which CMO Yq ⊆ Y is written in the row h of the ST. To construct the ST of the P_Y1 Mealy FSM, we should replace the columns Yh, Yq by the columns Y_R, Y_E, K_R(Yq), K_E(Yq), Zh, Vh. It leads to Table 7.19.

Using the ST of the P_Y1 Mealy FSM, we can find the systems Φ(T, X), Z(T, X), V(T, X). Next, the obtained functions are used to form the truth tables for the LUTs of LUTer1. Using the systems (7.44) and (7.46) gives the truth tables for the LUTs of LUTerY_R and LUTerY_E. We do not discuss these steps in this Chapter. In the example, LUTerY_R and LUTerY_E together need 8 + 3 = 11 LUTs and 20 + 8 = 28 interconnections, against 98 LUTs and 294 interconnections in the worst case for the LUTerY of the P_Y Mealy FSM. So our approach could give a significant saving in both LUTs and interconnections. Also, it could decrease the propagation time for Mealy FSMs.
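The two-code representation (7.40) can be illustrated by evaluating the systems (7.44) and (7.46) on the code pairs of Table 7.19. The following sketch is only an illustration: the function names `y_r`, `y_e` and `decode` are hypothetical, and the codes are read from Fig. 7.32 and Fig. 7.33.

```python
K_R = {"Y25": "000", "Y26": "010", "Y27": "110", "Y28": "100",
       "Y29": "101", "Y30": "011", "Y31": "111", "Y32": "001"}   # Fig. 7.32, z1 z2 z3
K_E = {"Y19": "000", "Y20": "010", "Y21": "100", "Y22": "001",
       "Y23": "011", "Y24": "110"}                                # Fig. 7.33, v1 v2 v3


def y_r(code: str) -> set:
    """Microoperations of Y_R produced by LUTerY_R, system (7.44)."""
    z1, z2, z3 = (c == "1" for c in code)
    values = {
        "y1": (not z1) and z2,
        "y3": z2 and (not z3),
        "y4": (z1 and not z3) or ((not z1) and (not z2) and z3),
        "y5": z1 and z2 and (not z3),
        "y6": z1 and (not z2),
        "y8": z1 and (not z2) and z3,
        "y9": z2 and z3,
        "y11": z1 and z2 and z3,
    }
    return {y for y, v in values.items() if v}


def y_e(code: str) -> set:
    """Microoperations of Y_E produced by LUTerY_E, system (7.46)."""
    v1, v2, v3 = (c == "1" for c in code)
    values = {
        "y2": v2, "y7": v1, "y10": v3,
        "y12": (not v1) and v2 and (not v3),
        "y13": v1 and v2,
        "y14": v2 and v3,
    }
    return {y for y, v in values.items() if v}


def decode(r_name: str, e_name: str) -> set:
    """Microoperations generated for the pair of codes K_R(Yi) * K_E(Yj)."""
    return y_r(K_R[r_name]) | y_e(K_E[e_name])


# The four rows of Table 7.19 (transitions from a3).
for pair in [("Y28", "Y22"), ("Y29", "Y20"), ("Y30", "Y19"), ("Y27", "Y21")]:
    print(pair, sorted(decode(*pair), key=lambda y: int(y[1:])))
```

The four decoded sets coincide with the column Yh of Table 7.18, that is, with the CMOs Y6, Y10, Y11 and Y4.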
References
1. Ashar P, Devadas S, Newton A (1992) Sequential logic synthesis. Kluwer Academic Publishers, Boston
2. Baranov S (1994) Logic synthesis of control automata. Kluwer Academic Publishers
3. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on CPLD. In: Proceedings of the sixth international conference CAD DD'07, vol 2, Minsk, pp 39–45
4. Barkalov A, Titarenko L, Chmielewski S (2014) Hardware reduction in CPLD-based Moore FSM. J Circuits Syst Comput 23(6):1450086-1–1450086-21
5. Barkalov A, Titarenko L, Kołopieńczyk M (2006) Optimization of control unit with code sharing. In: Proceedings of the 3rd international workshop of IFAC discrete-event system design (DESDES'06). University of Zielona Góra Press, Rydzyna, pp 195–200
6. Barkalov A, Titarenko L, Kołopieńczyk M (2006) Optimization of control unit with code sharing. In: Proceedings of the IEEE East-West design & test workshop (EWDTW'06). Kharkov National University of Radioelectronics, Sochi, Kharkov, pp 171–174
7. Barkalov A, Titarenko L, Kołopieńczyk M (2007) Optimization of control memory size of control unit with codes sharing. In: Proceedings of the IXth international conference CADSM 2007 "The experience of designing and application of CAD systems in microelectronics". Lviv–Polana, Ukraine, pp 242–245
8. Barkalov A, Titarenko L, Wiśniewski R (2006) Optimization of address circuit of compositional microprogram unit. In: Proceedings of the IEEE East-West design & test workshop (EWDTW'06). Kharkov National University of Radioelectronics, Sochi, Kharkov, pp 167–170
9. Barkalov A, Titarenko L, Wiśniewski R (2006) Synthesis of compositional microprogram control units with sharing codes and address decoder. In: Proceedings of the international conference mixed design of integrated circuits and systems, MIXDES 2006. Łódź, pp 397–400
10. Barkalov A, Węgrzyn M, Wiśniewski R (2006) Partial reconfiguration of compositional microprogram control units implemented on FPGAs. In: Proceedings of IFAC workshop on programmable devices and embedded systems (Brno), pp 116–119
11. Kam T, Villa T, Brayton R, Sangiovanni-Vincentelli A (1997) Synthesis of finite state machines: functional optimization. Kluwer Academic Publishers, Boston
12. QuickLogic (2019). http://www.quicklogic.com
13. Rudell R, Sangiovanni-Vincentelli A (1987) Multiple-valued minimization for PLA optimization. IEEE Trans Comput-Aided Des 6(5):727–750
14. Tatalov E (2011) Synthesis of compositional microprogram control units for programmable devices. Master's thesis, Donetsk National Technical University, Donetsk
Chapter 8
Synthesis of Mealy FSMs with Counters
Abstract This Chapter is devoted to using linear chains of states in Mealy FSMs. The known counter-based models of Moore FSMs are discussed together with the corresponding design methods. Then, models and design methods are proposed for counter-based Mealy FSMs. Synthesis methods based on natural and extended linear chains of states are proposed. Next, the case of synthesis for a regular GSA having a single chain of states is discussed. Finally, different models of counter-based Mealy FSMs are proposed; they combine known and proposed methods of structural decomposition.
© Springer Nature Switzerland AG 2020. A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7_8
8.1 Using Counters in Control Units
Since the 1950s, binary counters have been used for reducing the amount of hardware in the circuits of control units [14, 15]. Initially, counters were used in MCUs with natural or combined addressing of microinstructions [3]. This reduced the size of the control memory, which was very expensive [1]. The synthesis was based on creating linear chains of microinstructions. The transitions inside a chain were executed by incrementing the content of the counter. In compositional microprogram control units (CMCU), the synthesis is based on creating operational linear chains (OLC) [7]. Each OLC includes operational vertices such that there is a direct connection between each pair of adjacent vertices. There are no conditional vertices inside OLCs. The principle is clear from Fig. 8.1. Let the symbol αg stand for an OLC. The following OLCs could be created from the vertices b2–b4 (Fig. 8.1): α1 = ⟨b2⟩, α2 = ⟨b2, b3⟩, α3 = ⟨b2, b3, b4⟩, α4 = ⟨b3⟩, α5 = ⟨b3, b4⟩, α6 = ⟨b4⟩. As you can see, the output of the operational vertex bi is connected with the input of the operational vertex bj for each pair of adjacent vertices ⟨bi, bj⟩. To optimize hardware, the longest OLCs are created [12, 13]. The theory of CMCUs and the methods of their synthesis are presented in [7]. Let us point out that both CMCUs and MCUs are Moore FSMs [12]. Until now, counters have been used only in Moore FSMs. The theory and methods of synthesis for Moore FSMs with counters are presented in [11]. In this Chapter, we propose
using counters in FPGA-based Mealy FSMs. Before doing so, let us discuss the corresponding methods targeting Moore FSMs. Let us introduce some definitions.
Definition 8.1 A linear chain of states (LCS) is a finite vector αg = ⟨ag1, ag2, ..., agFg⟩ such that there is an unconditional transition ⟨agi, agi+1⟩ for any pair of adjacent components of αg. Each LCS αg has at least one input Ig and exactly one output Og.
Definition 8.2 A state am ∈ A(αg), where A(αg) ⊆ A is the set of states from LCS αg, is an input of LCS αg if the input of the operational vertex marked by am is connected with the output of any vertex which is not marked by a state as ∈ A(αg).
Definition 8.3 A state am ∈ A(αg) is an output of LCS αg if the output of the vertex marked by am is connected with the input of any vertex which is not marked by a state as ∈ A(αg).
Definition 8.4 A state am ∈ A(αg) is a main input of LCS αg if the input of the operational vertex marked by am is not connected with the output of any operational vertex of GSA Γ.
We use the symbol MIg to denote the main input of an LCS αg. Let us consider Fig. 8.1. It is possible to form the longest LCS α3 = ⟨a4, a5, a6⟩. Using Definitions 8.2–8.4 gives the inputs I_3^1 = a4, I_3^2 = a5, the output O3 = a6 and the main input MI3 = a4. Let us find a partition C = {α1, ..., αG} of the set A by LCSs such that:
$$\bigcup_{g=1}^{G} A(\alpha_g) = A \setminus \{a_1\}; \tag{8.1}$$
$$A(\alpha_g) \neq \emptyset \quad (g \in \{1, \ldots, G\}); \tag{8.2}$$
$$A(\alpha_i) \cap A(\alpha_j) = \emptyset \quad (i \neq j;\ i, j \in \{1, \ldots, G\}); \tag{8.3}$$
$$G \to \min. \tag{8.4}$$
These conditions mean that:
1. Any LCS αg ∈ C includes at least a single state am ∈ A (condition (8.2)).
2. Every state am ∈ A, except for the initial state a1, is included in some LCS αg ∈ C (condition (8.1)).
3. Any state am ∈ A is included only in a single LCS (condition (8.3)).
4. There is the minimum possible number of elements in the set C (condition (8.4)). It means that any chain should be as long as possible.
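The first three conditions are easy to check mechanically. The following is a minimal Python sketch, assuming that each chain is given as a list of state names; the function name `check_partition` and the six-state example are hypothetical and are not taken from the book. The minimality condition (8.4) is not checked here.

```python
def check_partition(chains, states, initial="a1"):
    """Check conditions (8.1)-(8.3) for a candidate set C of chains."""
    nonempty = all(len(chain) > 0 for chain in chains)      # condition (8.2)
    flat = [s for chain in chains for s in chain]
    disjoint = len(flat) == len(set(flat))                  # condition (8.3)
    covers = set(flat) == set(states) - {initial}           # condition (8.1)
    return nonempty and disjoint and covers


# Hypothetical six-state example: a1 is the initial state.
states = ["a1", "a2", "a3", "a4", "a5", "a6"]
good = [["a2", "a3"], ["a4", "a5", "a6"]]
overlapping = [["a2", "a3"], ["a3", "a4", "a5", "a6"]]
incomplete = [["a2", "a3"], ["a4", "a5"]]
```

Here `good` passes all three checks, `overlapping` violates (8.3), and `incomplete` violates (8.1).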
Fig. 8.1 A part of GSA Γ
Fig. 8.2 Structural diagram of Pα Moore FSM (blocks BIMF, CT and BMO; signals X, Φ, T, Y, y0, Start and Clock)
Let us execute the natural state assignment [2] inside each LCS αg ∈ C. It means that there is
$$K(a_s) = K(a_m) + 1 \tag{8.5}$$
for each pair of adjacent states ⟨am, as⟩ of each LCS αg ∈ C. Let the symbol Pα stand for a Moore FSM with a counter of states. It has the structural diagram shown in Fig. 8.2. The variable y0 controls the counter (CT). If y0 = 1, then CT := CT + 1. If y0 = 0, then CT := Φ. The content of CT could be changed when Clock changes from 1 to 0. The Pα Moore FSM operates in the following manner. If Start = 1, then CT := 0. So, Start initializes the operation by loading the zero code of the state a1 ∈ A into CT. Let some code K(am) be in CT at the instant t (t = 0, 1, ...). If am ≠ Og (g ∈ {1, ..., G}), then the variable y0 = 1 is generated by BMO. It leads to incrementing the counter. If am = Og, then y0 = 0. So, the CT is loaded using the functions Dr ∈ Φ generated by BIMF. The operation is terminated if the state a1 ∈ A is the state of transition. There are the following steps [11] in the method of synthesis of Pα Moore FSM:
1. Creating the set of internal states A.
2. Constructing the set of LCSs C = {α1, ..., αG}.
3. Executing the natural state assignment.
4. Constructing the structure table of Pα FSM.
5. Constructing the system of input memory functions.
6. Constructing the table of MOs yn ∈ Y.
7. Implementing the FSM logic circuit.
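The behaviour of the counter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ct_next` and the `width` parameter are hypothetical, the clock edge is modelled as one function call, and the Start pulse is not modelled.

```python
def ct_next(ct, y0, phi, width=4):
    """One clock step of CT: CT := CT + 1 if y0 = 1, otherwise CT := Phi."""
    mask = (1 << width) - 1           # keep the result inside `width` bits
    if y0:
        return (ct + 1) & mask        # transition inside a chain, see (8.5)
    return phi & mask                 # transition from a chain output via BIMF


# Inside a chain the code is incremented; at a chain output it is loaded.
inside_chain = ct_next(0b0001, y0=1, phi=0b0000)
from_output = ct_next(0b0011, y0=0, phi=0b0100)
```

With these arguments `inside_chain` is the code 0010 and `from_output` is the code 0100 supplied on the Φ inputs.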
Let us discuss an example of synthesis for Moore FSM Pα(Γ18). The GSA Γ18 is shown in Fig. 8.3. It is marked by the states of Moore FSM using the rules [4]. The following sets could be found for Pα(Γ18): A = {a1, ..., a13}, X = {x1, x2, x3}, Y = {y1, ..., y7}. There is M = 13. Using (1.4) gives R = 4. Therefore, there are the sets T = {T1, ..., T4} and Φ = {D1, ..., D4}. Using the approach [11], we can find the partition C = {α1, ..., α5} where α1 = ⟨a2, a3, a4⟩, α2 = ⟨a5, ..., a8⟩, α3 = ⟨a9⟩, α4 = ⟨a10, a12⟩, α5 = ⟨a11, a13⟩. So, there is G = 5. Also, we could find that I_1^1 = a2, I_1^2 = a4, I_2^1 = a5, I_2^2 = a7, I_3^1 = O3 = a9, I4 = a10, I5 = a11, O1 = a4, O2 = a8, O4 = a12 and O5 = a13. The set C satisfies the conditions (8.1)–(8.4).
Fig. 8.3 Initial GSA Γ18
Fig. 8.4 Outcome of natural state assignment for Moore FSM Pα(Γ18)
A very simple algorithm of natural state assignment is proposed in [11]. Using this algorithm produces the state codes shown in Fig. 8.4. To construct the ST, it is necessary to create the system of formulae of transitions (SFT) [5, 11]. This system is constructed for the outputs of LCSs αg ∈ C. In the discussed case, there is the following SFT:
$$\begin{aligned}
a_4 &\to x_1 a_5 \lor \bar{x}_1 x_2 a_7 \lor \bar{x}_1 \bar{x}_2 a_9; \\
a_8 &\to x_3 a_{10} \lor \bar{x}_3 a_{11}; \\
a_9 &\to x_3 a_{10} \lor \bar{x}_3 a_{11}; \\
a_{12} &\to a_4; \\
a_{13} &\to a_1.
\end{aligned} \tag{8.6}$$
Table 8.1 Structure table of Moore FSM Pα(Γ18)
am    K(am)   as    K(as)   Xh       Φh          h
a4    0011    a5    0100    x1       D2          1
a4    0011    a7    0110    x̄1x2     D2 D3       2
a4    0011    a9    1000    x̄1x̄2     D1          3
a8    0111    a10   1001    x3       D1 D4       4
a8    0111    a11   1011    x̄3       D1 D3 D4    5
a9    1000    a10   1001    x3       D1 D4       6
a9    1000    a11   1011    x̄3       D1 D3 D4    7
a12   1010    a4    0011    1        D3 D4       8
a13   1100    a1    0000    1        −           9
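The state codes used in Table 8.1 can be reproduced by numbering the states chain by chain. The algorithm of [11] is not given in the text, so the following Python sketch is only an assumption about how such an assignment can be done: the initial state gets the zero code and the chains are numbered consecutively in the order α1, ..., α5. The function name `natural_assignment` is hypothetical.

```python
def natural_assignment(chains, initial="a1", width=4):
    """Give code 0 to the initial state and consecutive codes along each chain."""
    codes = {initial: 0}
    next_code = 1
    for chain in chains:
        for state in chain:
            codes[state] = next_code   # K(a_s) = K(a_m) + 1 inside a chain
            next_code += 1
    return {s: format(c, f"0{width}b") for s, c in codes.items()}


# Partition C of GSA Gamma_18 as listed in the text.
chains_18 = [
    ["a2", "a3", "a4"],
    ["a5", "a6", "a7", "a8"],
    ["a9"],
    ["a10", "a12"],
    ["a11", "a13"],
]
codes_18 = natural_assignment(chains_18)
```

For this order of chains the sketch gives, for example, K(a4) = 0011, K(a12) = 1010, K(a11) = 1011 and K(a13) = 1100, which agrees with the codes in Table 8.1.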
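Table 8.1 represents the system (8.6). It has H rows where H is equal to the number of terms in the SFT. The ST is used to derive the system (1.1). The functions Dr ∈ Φ depend on the terms (1.5). After minimizing, the following system is derived from the ST (Table 8.1):
$$\begin{aligned}
D_1 &= \bar{T}_1 \bar{T}_2 T_3 T_4 \bar{x}_1 \bar{x}_2 \lor \bar{T}_1 T_2 T_3 T_4 \lor T_1 \bar{T}_2 \bar{T}_3 \bar{T}_4; \\
D_2 &= \bar{T}_1 \bar{T}_2 T_3 T_4 x_1 \lor \bar{T}_1 \bar{T}_2 T_3 T_4 x_2; \\
D_3 &= \bar{T}_1 \bar{T}_2 T_3 T_4 \bar{x}_1 x_2 \lor \bar{T}_1 T_2 T_3 T_4 \bar{x}_3 \lor T_1 \bar{T}_2 \bar{T}_3 \bar{T}_4 \bar{x}_3 \lor T_1 \bar{T}_2 T_3 \bar{T}_4; \\
D_4 &= \bar{T}_1 T_2 T_3 T_4 \lor T_1 \bar{T}_2 \bar{T}_4.
\end{aligned} \tag{8.7}$$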
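As a cross-check, the system (8.7) can be evaluated for each transition of the SFT (8.6). The Python sketch below is only an illustration: the function name `phi_18` and the row list `rows_18` are hypothetical, and the rows are transcribed from the codes K(am), K(as) of Table 8.1.

```python
def phi_18(code, x1=0, x2=0, x3=0):
    """Evaluate D1..D4 of (8.7); returns the 4-bit code D1 D2 D3 D4."""
    T1, T2, T3, T4 = ((code >> k) & 1 for k in (3, 2, 1, 0))
    n = lambda v: 1 - v                                  # logical negation
    a4 = n(T1) & n(T2) & T3 & T4                         # code 0011
    a8 = n(T1) & T2 & T3 & T4                            # code 0111
    a9 = T1 & n(T2) & n(T3) & n(T4)                      # code 1000
    a12 = T1 & n(T2) & T3 & n(T4)                        # code 1010
    D1 = (a4 & n(x1) & n(x2)) | a8 | a9
    D2 = (a4 & x1) | (a4 & x2)
    D3 = (a4 & n(x1) & x2) | (a8 & n(x3)) | (a9 & n(x3)) | a12
    D4 = a8 | (T1 & n(T2) & n(T4))
    return (D1 << 3) | (D2 << 2) | (D3 << 1) | D4


# (K(am), inputs, expected K(as)) for the nine transitions of (8.6).
rows_18 = [
    (0b0011, dict(x1=1), 0b0100),
    (0b0011, dict(x1=0, x2=1), 0b0110),
    (0b0011, dict(x1=0, x2=0), 0b1000),
    (0b0111, dict(x3=1), 0b1001),
    (0b0111, dict(x3=0), 0b1011),
    (0b1000, dict(x3=1), 0b1001),
    (0b1000, dict(x3=0), 0b1011),
    (0b1010, dict(), 0b0011),
    (0b1100, dict(), 0b0000),
]
all_match = all(phi_18(k, **x) == k_next for k, x, k_next in rows_18)
```

Each row loads exactly the code K(as) of the state of transition, so the minimized system agrees with the transitions it was derived from.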
This system is a base to design the circuit of BIMF. The table of MOs is represented by Table 8.2. It has M rows. If am ≠ Og, then there is y0 = 1 in the row m of the table (m ∈ {1, ..., M}, g ∈ {1, ..., G}). The meaning of each column of Table 8.2 is clear. In both tables, we use the state codes from Fig. 8.4. To fill the column Y(am) of Table 8.2, we use the CMOs from the operational vertices of GSA Γ18. As follows from Table 8.2, y0 = 1 for a1. It is necessary to execute the transition ⟨a1, a2⟩. If there are conditional transitions from the state a1, then it is necessary to add the LCS α0 = ⟨a1⟩ to the set C [11]. It does not change the condition (8.1). As a rule, some memory blocks are used to implement the circuit of BMO [5, 10]. So, there is no need to create the system Y = Y(T) and the equation for the function y0. The structural diagram of FPGA-based Moore FSM with LCSs is shown in Fig. 8.5. The LUTerΦ implements the functions (1.1). So, it corresponds to the BIMF from Fig. 8.2. The EMBerY corresponds to the BMO. Some LUTs are used to implement the circuit of CT.
Table 8.2 Table of MOs of Moore FSM Pα(Γ18)
am    K(am)   y0   m
a1    0000    1    1
a2    0001    1    2
a3    0010    1    3
a4    0011    0    4
a5    0100    1    5
a6    0101    1    6
a7    0110    1    7
a8    0111    0    8
a9    1000    0    9
a10   1001    1    10
a11   1011    1    11
a12   1010    0    12
a13   1100    0    13
Column Y(am), values listed in row order: − y1 y2 y3 y2 y4 y2 y5 y3 y2 y6 y1 y3 y7 y2 y4 y3 y6 y2 y6
Fig. 8.5 Structural diagram of FPGA-based Pα Moore FSM (blocks LUTerΦ, CT and EMBerY; signals X, Φ, T, Y, y0, Start and Clock)
Fig. 8.6 Organization of CT for Pα Moore FSM (inputs C1, C2 and R; outputs T; signals y0, Φ, Start and Clock)
The organization of CT is shown in Fig. 8.6. The input C1 is used for incrementing the CT. The input C2 is used for loading the CT from the outputs of LUTerΦ. There is a special logic distributing the pulse Clock among the inputs C1 and C2. The pulse Start enters the clearing input R of the CT. There is one common feature that unites all control units discussed in this Section. Namely, their output functions do not depend on logical conditions. So, these control units are Moore FSMs. We could not find any methods of synthesis of Mealy FSMs with counters. Because of this, we propose such methods in the following Sections of this Chapter.
It is known [2, 12] that the use of counters gives the greatest effect when linear GSAs are interpreted. Let there be Nop operational vertices and Ncon conditional vertices in some GSA Γ. The GSA Γ is called linear if the following condition takes place:
$$N_{op} \ge 0.75\,(N_{op} + N_{con}). \tag{8.8}$$
Further, we consider the synthesis methods targeting Mealy FSMs, linear GSAs and FPGA chips.
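The condition (8.8) is a one-line test. A minimal Python sketch follows; the function name `is_linear_gsa` is hypothetical. The first call uses the values Nop = 20 and Ncon = 6 quoted for the GSA Γ19 in Sect. 8.2; the second pair of values is made up for contrast.

```python
def is_linear_gsa(n_op, n_con):
    """Condition (8.8): at least 75% of all vertices are operational."""
    return n_op >= 0.75 * (n_op + n_con)


gamma19_is_linear = is_linear_gsa(20, 6)     # 20 >= 0.75 * 26 = 19.5
small_gsa_is_linear = is_linear_gsa(3, 2)    # 3 >= 0.75 * 5 = 3.75 fails
```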
8.2 Using Counters in Mealy FSMs
Let us consider the GSA Γ19 (Fig. 8.7). There is Nop = 20 and Ncon = 6 for Γ19. So, the condition (8.8) takes place (20 ≥ 0.75 · 26 = 19.5) and the GSA Γ19 is a linear GSA. It is marked by the internal states of Mealy FSM. The following sets could be derived from GSA Γ19: A = {a1, ..., a15}, X = {x1, ..., x6}, Y = {y1, ..., y9}. Using (1.17) gives R0 = 4. Therefore, there are the sets T = {T1, ..., T4} and Φ = {D1, ..., D4}. Let us mark a GSA Γ by the states of Mealy FSM. Let us partition the set A into LCSs αg ∈ C. Let the partition C satisfy the conditions (8.2)–(8.4). Let the chain α1 have the main input MI1 = a1. So, the condition (8.1) is transformed into the following one:
$$\bigcup_{g=1}^{G} A(\alpha_g) = A. \tag{8.9}$$
Fig. 8.7 Initial GSA Γ19
Fig. 8.8 Structural diagram of Pα Mealy FSM (blocks BIMF, CT and BMO; signals X, Φ, T, Y, y0, Start and Clock)
Fig. 8.9 Outcome of natural state assignment for Mealy FSM Pα(Γ19)
Let us execute the natural state assignment for the states am ∈ A. Now, we can propose a structural diagram of Pα Mealy FSM (Fig. 8.8). The BIMF generates the functions (1.1) used for executing the transitions different from (8.5). The BMO generates MOs yn ∈ Y as the functions (1.2). Also, it generates the function y0 = y0(T, X). The design methods of Pα Moore and Mealy FSMs have the same steps. But there is no need for the table of BMO in the case of Pα Mealy FSM. The functions (1.2) and y0 = y0(T, X) are derived from the ST. Let us discuss an example of synthesis for Mealy FSM Pα(Γ19). The first step of synthesis has already been executed. We have the set A with M0 = 15. There are two stages in executing the step 2. The first stage is devoted to finding an LCS for each main input MIg. The second stage is reduced to connecting together some initial LCSs. The following rule is used for the connection of chains: if there are transitions from Og only into MIq, then the LCSs αg and αq are connected into a single LCS αg ∗ αq. Let us discuss this step for the GSA Γ19. Let a set MI(Γi) include the main inputs of LCSs for some GSA Γi. Using Definition 8.4 gives the set MI(Γ19) = {a1, a2, a5, a6, a8, a10}. Using the procedure from [11], we can construct the set
C = {α1, ..., α6} where α1 = ⟨a1⟩, α2 = ⟨a2, a3, a4⟩, α3 = ⟨a5, a7⟩, α4 = ⟨a6, a15⟩, α5 = ⟨a8, a9⟩ and α6 = ⟨a10, ..., a14⟩. So, there is O1 = a1 and MI2 = a2. It is possible to create the following formula of transitions for a1: a1 → x1 a2 ∨ x̄1 x2 a2 ∨ x̄1 x̄2 a2. So, it is possible to connect together the LCSs α1 and α2. It gives the LCS α1 = ⟨a1, a2, a3, a4⟩. Other chains cannot be connected. Now, we have the partition C = {α1, α3, ..., α6}. Let us execute the natural state assignment for the states am ∈ A. Using the approaches [16] gives the outcome shown in Fig. 8.9. As follows from Fig. 8.9, the condition (8.5) takes place for any pair of adjacent states ⟨am, as⟩. Let us construct the structure table of Pα Mealy FSM. It is practically the same as an ST for P Mealy FSM. But there is a column y0 in the ST of Pα FSM. If the condition (8.5) takes place for a pair ⟨am, as⟩, then there is y0 = 1 in the row corresponding to the transition from am into as. Using this table, it is possible to derive the systems Φ, Y and the function y0. The system Φ is used to implement the circuit of BIMF. The system Y and the function y0 are used for implementing the circuit of BMO. The ST of Mealy FSM Pα(Γ19) is represented by Table 8.3. The following condition takes place: K(a5) = K(a4) + 1. Because of it, we write y0 = 1 in the row 6 of Table 8.3. The same rule is used for the row 19. There is H0 = 21 for Table 8.3. But only three terms are used in the functions Φ. The terms correspond to the rows 7, 8 and 21. After minimizing, the following system could be derived from Table 8.3:
$$\begin{aligned}
D_1 &= T_2 T_3 T_4 \bar{x}_5 \bar{x}_6; &
D_2 &= \bar{T}_1 \bar{T}_2 T_3 T_4 x_3 \bar{x}_4; \\
D_3 &= \bar{T}_1 \bar{T}_2 T_3 T_4 x_3 \bar{x}_4 \lor T_2 T_3 T_4 \bar{x}_5 \bar{x}_6; &
D_4 &= \bar{T}_1 \bar{T}_2 T_3 T_4 \bar{x}_3.
\end{aligned} \tag{8.10}$$
The ST is used to derive the equations for y0 and yn ∈ Y. For example, the following equations could be derived from Table 8.3:
$$\begin{aligned}
y_0 &= \bar{T}_3 \bar{T}_4 \lor \bar{T}_1 \bar{T}_2 \bar{T}_3 \lor T_1 T_2 \bar{T}_3 \lor \bar{T}_2 \bar{T}_4 \lor T_1 \bar{T}_2 T_3 \lor \bar{T}_1 \bar{T}_2 T_3 T_4 x_3 x_4 \lor \bar{T}_1 T_2 T_3 T_4 x_5; \\
y_1 &= F_1 \lor F_4 \lor F_{11} \lor F_{18} \lor F_{19} \lor F_{21}.
\end{aligned} \tag{8.11}$$
The two last terms in the function y0 correspond to the rows 6 and 19 of Table 8.3. We added 1's in these rows to simplify the functions Dr ∈ Φ. But it leads to a more complex equation for y0. Without these terms, we have y0 = y0(T). Adding these terms leads to y0 = y0(T, X).
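The interplay of y0 and the functions (8.10) can be checked by replaying the 21 transitions of Mealy FSM Pα(Γ19). The Python sketch below is only an illustration: the names `phi_19`, `step_19` and `rows_19` are hypothetical, the values of y0 are taken from the y0 column of Table 8.3 rather than from an equation, and each row is transcribed from the codes K(am), K(as) and the conditions Xh of that table (a logical condition not mentioned in a row is set to 0).

```python
def phi_19(code, x):
    """Evaluate D1..D4 of (8.10); x maps the index e to the value of x_e."""
    T1, T2, T3, T4 = ((code >> k) & 1 for k in (3, 2, 1, 0))
    n = lambda v: 1 - v
    in_a4 = n(T1) & n(T2) & T3 & T4
    D1 = T2 & T3 & T4 & n(x[5]) & n(x[6])
    D2 = in_a4 & x[3] & n(x[4])
    D3 = D2 | D1
    D4 = in_a4 & n(x[3])
    return (D1 << 3) | (D2 << 2) | (D3 << 1) | D4


def step_19(code, y0, x):
    """CT := CT + 1 if y0 = 1, otherwise CT := Phi."""
    return (code + 1) & 0xF if y0 else phi_19(code, x)


# (K(am), conditions equal to 1, y0, K(as)) for rows h = 1..21 of Table 8.3.
rows_19 = [
    (0, {1: 1}, 1, 1), (0, {2: 1}, 1, 1), (0, {}, 1, 1),
    (1, {}, 1, 2), (2, {}, 1, 3),
    (3, {3: 1, 4: 1}, 1, 4), (3, {3: 1}, 0, 6), (3, {}, 0, 1),
    (4, {}, 1, 5), (6, {}, 1, 7), (5, {}, 0, 0),
    (8, {}, 1, 9), (9, {}, 0, 0),
    (10, {}, 1, 11), (11, {}, 1, 12), (12, {}, 1, 13), (13, {}, 1, 14),
    (14, {}, 0, 0),
    (7, {5: 1}, 1, 8), (7, {6: 1}, 0, 0), (7, {}, 0, 10),
]
zeros = dict.fromkeys(range(1, 7), 0)
replay_ok = all(
    step_19(k, y0, {**zeros, **ones}) == k_next
    for k, ones, y0, k_next in rows_19
)
```

Every row ends in the code K(as) of its state of transition. Only the rows 7, 8 and 21 produce a nonzero value on the Φ inputs; the rows with y0 = 0 and an empty Φh load the zero code of a1.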
Table 8.3 Structure table of Mealy FSM Pα(Γ19)
am    K(am)   as    K(as)   Xh       y0   Φh      h
a1    0000    a2    0001    x1       1    −       1
a1    0000    a2    0001    x̄1x2     1    −       2
a1    0000    a2    0001    x̄1x̄2     1    −       3
a2    0001    a3    0010    1        1    −       4
a3    0010    a4    0011    1        1    −       5
a4    0011    a5    0100    x3x4     1    −       6
a4    0011    a6    0110    x3x̄4     0    D2 D3   7
a4    0011    a2    0001    x̄3       0    D4      8
a5    0100    a7    0101    1        1    −       9
a6    0110    a15   0111    1        1    −       10
a7    0101    a1    0000    1        0    −       11
a8    1000    a9    1001    1        1    −       12
a9    1001    a1    0000    1        0    −       13
a10   1010    a11   1011    1        1    −       14
a11   1011    a12   1100    1        1    −       15
a12   1100    a13   1101    1        1    −       16
a13   1101    a14   1110    1        1    −       17
a14   1110    a1    0000    1        0    −       18
a15   0111    a8    1000    x5       1    −       19
a15   0111    a1    0000    x̄5x6     0    −       20
a15   0111    a10   1010    x̄5x̄6     0    D1 D3   21
Column Yh, values listed in row order: y1 y2 y3 y8 y4 y5 y1 y2 y9 y2 y3 y8 y2 y3 y9 y4 y6 y4 y5 y2 y7 y2 y7 y8 y2 y3 y8 y4 y5 y2 y7 y3 y2 y3 y2 y7 y9 y3 y9 y1 y2 y1 y3 y9 y8 y9 y1 y2 y8
Fig. 8.10 Structural diagram of LUT-based Pα Mealy FSM (blocks LUTerΦ, CT and LUTerY; signals X, Φ, T, Y, y0, Start and Clock)
To execute the last step, we should know the characteristics of the logic elements used for implementing the FSM circuit. Using LUTs leads to the structural diagram shown in Fig. 8.10. In the case of LUT-based Pα Mealy FSM, each equation from (8.10)–(8.11) should be transformed if its number of literals exceeds S. Then each new equation should be turned into a truth table. We do not discuss this step for our example.
Fig. 8.11 Marked GSA Γ20
Fig. 8.12 Structural diagram of PR Mealy FSM (blocks CT and BMO; signals X, T, Y, y0, the constant input "0", Start and Clock)
Let us consider a GSA Γ20 (Fig. 8.11). It is marked by the states of Mealy FSM. There are the following peculiarities in Mealy FSM Pα(Γ20):
1. There is the same state of transition as ∈ A for all transitions from a state am ∈ A. For example, there are three transitions from the state a1 ∈ A. All these transitions are executed into the state a2 ∈ A. Next, there are two transitions from the state a2 ∈ A. Both of them are executed into the state a3 ∈ A.
2. There are no "jumps" either back or forward. Each state determines a level of GSA Γ. The transitions are possible only from the level i to the level i + 1.
Let us call such a GSA regular. Obviously, there is only a single LCS α1 for regular GSAs. It is the LCS α1 = ⟨a1, ..., a4⟩ in the discussed example. Let us execute the natural state assignment for Mealy FSM Pα(Γ20). It gives the following codes: K(a1) = 00, ..., K(a4) = 11. There are only unconditional transitions between the states of Pα(Γ20). Let us use the symbol PR(Γi) for similar automata. The structural diagram of PR Mealy FSM is shown in Fig. 8.12.
Table 8.4 Structure table of Mealy FSM PR(Γ20)
am    K(am)   Xh       y0   h
a1    00      x1       0    1
a1    00      x̄1x2     0    2
a1    00      x̄1x̄2     0    3
a2    01      x3       0    4
a2    01      x̄3       0    5
a3    10      1        0    6
a4    11      x1       1    7
a4    11      x̄1       1    8
Column Yh, values listed in row order: y1 y2 y4 y3 y5 y3 y2 y4 y1 y2 y3 y4
As follows from Fig. 8.12, there are no input memory functions Dr ∈ Φ in the case of PR FSMs. The pulse Start loads the zero code K(a1) into CT. During each cycle, the content of CT is incremented by the pulse Clock. The BMO generates the functions Y(T, X) and y0(T, X). To execute the transition ⟨am, a1⟩, it is necessary to generate y0 = 1. It loads the zero code from the input "0" into CT. Table 8.4 represents the ST for Mealy FSM PR(Γ20). There is no column Φh in this table. There is y0 = 1 only in the two last rows of Table 8.4. If y0 = 0, then the content of CT is incremented. In the case of Moore FSM Pα(Γ20), it is necessary to create 8 LCSs. Also, there is the block BIMF in Moore FSM Pα(Γ20). Table 8.4 is a base to derive the system (1.2) and y0 = y0(T). For example, the following equations could be derived from Table 8.4: y1 = T̄1T̄2x1 ∨ T1T̄2; y0 = T1T2.
In [11], it is proposed to use LCSs having more than a single output. They are called extended LCSs (ELCS). The same approach could be used for Mealy FSMs. Let us denote them by the symbol PE. There are the same structural diagrams for Pα and PE Mealy FSMs. There is only a single difference in their synthesis methods. Namely, it is necessary to find the set CE of ELCSs in the case of PE Mealy FSM. In this Chapter, we propose a "greedy" algorithm for the solution of this problem. It is the following:
1. To form the set MI(Γj) for a particular GSA Γj.
2. To construct an LCS α1 starting from the state a1. To exclude a1 from MI(Γj).
3. To find the state of transition from the output of α1 such that all logical conditions are equal to 1. Let it be a state as.
4. If as ∈ MI(Γj), then the LCS α1 continues starting from the state as. The state as is excluded from MI(Γj).
5. The chain α1 is finished if as = a1.
6. If MI(Γj) = ∅, then all chains are constructed. Otherwise, the process starts for the LCS α2. It starts from the state am ∈ MI(Γj) with the smallest value of m.
Let us apply this algorithm to GSA Γ19.
There is the set MI(Γ19) = {a1, a2, a5, a6, a8, a10}. Let us start from the state a1. There is the LCS α1 = ⟨a1⟩. If x1 = 1, then as = a2. Because a2 ∈ MI(Γ19), we can continue the chain α1. Now, there is α1 = ⟨a1, a2, a3, a4⟩. The states a1 and a2 are excluded from MI(Γ19). If x3 = x4 = 1, then as = a5. Because a5 ∈ MI(Γ19), we can form the chain α1 = ⟨a1, a2, a3, a4, a5, a7⟩. It gives the set MI(Γ19) = {a6, a8, a10}. Because as = a1 for the output a7, the LCS α1 is finished. The LCS α2 starts from a6. So, there is α2 = ⟨a6, a15⟩ and MI(Γ19) = {a8, a10}. Now, we have as = a8 for x5 = 1. Because a8 ∈ MI(Γ19), we can get α2 = ⟨a6, a15, a8, a9⟩ and MI(Γ19) = {a10}. There is the pair ⟨a9, a1⟩, so the LCS α2 is finished. The chain α3 starts from a10. We have the LCS α3 = ⟨a10, ..., a14⟩ and MI(Γ19) = ∅. Now, all states am ∈ A are distributed among the LCSs. So, the process is terminated. As a result, we have the set CE = {α1, α2, α3}. Table 8.5 shows the process of creating the partition CE for Mealy FSM PE(Γ19).
Table 8.5 Constructing the partition CE for Mealy FSM PE(Γ19)
am      a1   a2   a3   a4   a5   a6   a7   a8   a9   a10  a11  a12  a13  a14  a15
α1      1    2    3    4    5    –    6    –    –    –    –    –    –    –    –
α2      –    –    –    –    –    7    –    9    10   –    –    –    –    –    8
α3      –    –    –    –    –    –    –    –    –    11   12   13   14   15   –
K(am)   0    1    2    3    4    6    5    8    9    10   11   12   13   14   7
Fig. 8.13 Outcome of natural state assignment for Mealy FSM PE(Γ19)
The last row of Table 8.5 contains the decimal equivalents of the state codes K(am). Let us consider the intersection of the row αg and the column am. It contains the number NS equal to the number of the step when the state am was included into the ELCS αg. The decimal code K(am) is calculated as NS − 1. Obviously, it gives the natural state assignment for PE FSM. The state codes K(am) are obtained as the binary equivalents of the decimal codes. There is R0 = 4 for the given example. The outcome of the natural state assignment for PE(Γ19) is shown in Fig. 8.13. As follows from the comparison of Figs. 8.9 and 8.13, the state codes are the same for both Pα(Γ19) and PE(Γ19). But it is just a coincidence. Due to this coincidence, there are the same structure tables for Pα(Γ19) and PE(Γ19). They are represented by Table 8.3. So, the input memory functions of PE(Γ19) are represented as (8.10) and its microoperations as (8.11).
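The greedy construction of ELCSs can be sketched in Python. The sketch rests on two assumptions about the input data, both hypothetical: the initial LCS of each main input is already known (`lcs_by_main_input`), and for each chain output the state of transition with all logical conditions equal to 1 is given (`all_ones_successor`). States are represented by their indices m.

```python
def greedy_elcs(lcs_by_main_input, all_ones_successor):
    """Build extended chains: start from the smallest remaining main input
    and keep appending chains while the all-ones successor of the current
    output is a main input that has not been used yet."""
    remaining = set(lcs_by_main_input)          # the set MI
    result = []
    while remaining:
        nxt = min(remaining)                    # smallest index m
        chain = []
        while nxt in remaining:
            remaining.remove(nxt)
            chain.extend(lcs_by_main_input[nxt])
            nxt = all_ones_successor.get(chain[-1])
        result.append(chain)
    return result


# Data for GSA Gamma_19, transcribed from the text and Table 8.3.
lcs_19 = {1: [1], 2: [2, 3, 4], 5: [5, 7], 6: [6, 15],
          8: [8, 9], 10: [10, 11, 12, 13, 14]}
succ_19 = {1: 2, 4: 5, 7: 1, 15: 8, 9: 1, 14: 1}

elcs_19 = greedy_elcs(lcs_19, succ_19)
# Natural state assignment: K(am) = NS - 1, where NS is the inclusion step.
codes_19 = {m: ns for ns, m in enumerate(s for c in elcs_19 for s in c)}
```

For these data the sketch returns the three chains ⟨a1, a2, a3, a4, a5, a7⟩, ⟨a6, a15, a8, a9⟩ and ⟨a10, ..., a14⟩, and `codes_19` reproduces the last row of Table 8.5.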
8.3 Structural Decomposition for Counter-Based Mealy FSMs
It is possible to use all methods of structural decomposition discussed before for Mealy FSMs with counters. Let us discuss the following methods:
1. Replacement of logical conditions [5].
2. Mixed encoding of CMOs.
3. Transformation of object codes.
4. Twofold state assignment.
Let us discuss these approaches for Mealy FSMs based on ELCSs. Let it be possible to use only a single EMB for implementing some parts of the FSM circuit. Let us replace the logical conditions xe ∈ X by additional variables pg ∈ P = {p1, ..., pG}. Let us form the system P(T, X). Let the condition (6.1) take place for the given combination of the parameters R0, L, SA, tF and V0. It means that the BRLC could be implemented as an EMBer (having a single EMB). It leads to MPE1 Mealy FSM (Fig. 8.14). The EMB generates the functions (2.1), the LUTerΦ the functions (2.3), and the LUTerY the functions (2.12) and y0. We hope that the principle of operation of MPE1 is clear for a reader. Let the following conditions take place:
$$\Delta t = t_F - G > 0; \tag{8.12}$$
$$\Delta t = t_F - G < R_0. \tag{8.13}$$
So, there are free outputs of EMB, but not all functions Dr ∈ Φ could be implemented by EMB. In this case, we propose to represent the set Φ as Φ = ΦL ∪ ΦE where ΦL ∩ ΦE = ∅ and |ΦE| = Δt. The EMB implements the functions Dr ∈ ΦE, the LUTerΦ the functions Dr ∈ ΦL. The principle of dividing is very simple. A designer should:
1. Find the Boolean equations for the functions Dr ∈ Φ.
2. If G + R0 ≤ S, place any Δt functions into ΦE.
3. If G + R0 > S, calculate the number L(Dr) equal to the number of literals in the SOP of Dr (r ∈ {1, ..., R0}).
4. Organize the queue of the functions Dr ∈ Φ in the descending order of L(Dr).
5. Select the first Δt elements of the queue and place them into ΦE.
This approach leads to MPE2 Mealy FSM (Fig. 8.15).
Fig. 8.14 Structural diagram of MPE1 Mealy FSM
Fig. 8.15 Structural diagram of MPE2 Mealy FSM (blocks EMB, LUTerΦ, CT and LUTerY; signals X, P, ΦE, ΦL, T, Y, y0, Start and Clock)
Fig. 8.16 Structural diagram of MPE3 Mealy FSM (blocks EMB, CT and LUTerY; signals X, P, Φ, T, Y, y0, Start and Clock)
Let us point out that all functions Dr ∈ Φ could be implemented by EMB. To do it, the following condition should take place:
$$\Delta t = R_0. \tag{8.14}$$
It leads to MPE3 Mealy FSM (Fig. 8.16). Of course, there is no selection of the functions Dr ∈ Φ in this case. There is no LUTerΦ in MPE3 Mealy FSM. Let the following condition take place:
$$R_0 + N > \Delta t > R_0. \tag{8.15}$$
In this case, we can divide the set Y ∪ {y0} into two sets YL and YE. The EMB implements the MOs yn ∈ YE, the LUTerY the MOs yn ∈ YL. If R0 + G ≤ S, then the selection of MOs does not influence the number of LUTs in LUTerY. Otherwise, it is necessary to find the numbers L(yn) (n ∈ {0, 1, ..., N}). After that, the selection of MOs is executed in the same way as it is proposed for the selection of input memory functions for MPE2 Mealy FSM. If (8.15) is true, we propose the model of MPE4 Mealy FSM (Fig. 8.17). Its circuit has only EMB and LUTerY. So, there are four different structural diagrams of MPEi Mealy FSMs. The number of LUTs in LUTerΦ and LUTerY diminishes with the growth of i. To select the best possible model, it is necessary to analyse the conditions (8.12)–(8.15) for a given GSA Γ and FPGA chip.
Fig. 8.17 Structural diagram of MPE4 Mealy FSM (blocks EMB, CT and LUTerY; signals X, P, Φ, YE, YL, T, y0, Start and Clock)
Let us discuss an example of synthesis of MPEi Mealy FSM for GSA Γ19. Let us use LUTs with S = 5 and EMB with the configurations 2048 × 1, 1024 × 2, 512 × 4 and 256 × 8. There are obvious steps in the method of synthesis. For GSA Γ19, we have executed the following: (1) we found the set A with M0 = 15; (2) we constructed the set CE; (3) we executed the natural state assignment (Fig. 8.13); (4) we constructed the ST of Mealy FSM PE(Γ19) (Table 8.3). Let us execute the RLC. We can find that G = 2. Let us replace the logical conditions as it is shown in Table 8.6.
Table 8.6 Table of RLC for Mealy FSM PE(Γ19)
am   a1   a2   a3   a4   a5   a6   a7   a8   a9   a10  a11  a12  a13  a14  a15
p1   x1   –    –    x3   –    –    –    –    –    –    –    –    –    –    x5
p2   x2   –    –    x4   –    –    –    –    –    –    –    –    –    –    x6
Fig. 8.18 Karnaugh map for function p1
If there is "–" in some cell of the table of RLC, we could use the corresponding code K(am) as an insignificant input assignment [6, 9]. Using this approach and the codes K(am) from Fig. 8.13, we can get the following system:
$$\begin{aligned}
p_1 &= \bar{T}_3 x_1 \lor \bar{T}_2 T_3 x_3 \lor T_2 x_5; \\
p_2 &= \bar{T}_3 x_2 \lor \bar{T}_2 T_3 x_4 \lor T_2 x_6.
\end{aligned} \tag{8.16}$$
To understand (8.16), one can consider the Karnaugh map for the function p1 (Fig. 8.18). It includes 13 insignificant input assignments. The Karnaugh map for p2 is the same, but each logical condition xl should be replaced by xl+1 (l ∈ {1, 3, 5}).
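The system (8.16) can also be checked directly: in each of the three states with conditional transitions, p1 and p2 must repeat the logical conditions assigned to that state in Table 8.6. A minimal Python sketch follows; the function name `rlc_19` and the dictionary `expected` are hypothetical.

```python
from itertools import product


def rlc_19(code, x):
    """Evaluate p1 and p2 of (8.16); x maps the index e to the value of x_e."""
    T2 = (code >> 2) & 1
    T3 = (code >> 1) & 1
    p1 = ((1 - T3) & x[1]) | ((1 - T2) & T3 & x[3]) | (T2 & x[5])
    p2 = ((1 - T3) & x[2]) | ((1 - T2) & T3 & x[4]) | (T2 & x[6])
    return p1, p2


# State code -> indices of the conditions replaced by (p1, p2), Table 8.6.
expected = {0b0000: (1, 2), 0b0011: (3, 4), 0b0111: (5, 6)}

rlc_ok = all(
    rlc_19(code, dict(zip(range(1, 7), bits))) == (bits[i - 1], bits[j - 1])
    for code, (i, j) in expected.items()
    for bits in product((0, 1), repeat=6)
)
```

The check runs over all 64 combinations of x1, ..., x6 for the codes of a1, a4 and a15. The other 13 codes are insignificant input assignments, so the values of p1 and p2 for them do not matter.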
Table 8.7 Structure table of Mealy FSM MPE(Γ19)
am    K(am)   as    K(as)   Ph       y0   Φh      h
a1    0000    a2    0001    p1       1    −       1
a1    0000    a2    0001    p̄1p2     1    −       2
a1    0000    a2    0001    p̄1p̄2     1    −       3
a2    0001    a3    0010    1        1    −       4
a3    0010    a4    0011    1        1    −       5
a4    0011    a5    0100    p1p2     1    −       6
a4    0011    a6    0110    p1p̄2     0    D2 D3   7
a4    0011    a2    0001    p̄1       0    D4      8
a5    0100    a7    0101    1        1    −       9
a6    0110    a15   0111    1        1    −       10
a7    0101    a1    0000    1        0    −       11
a8    1000    a9    1001    1        1    −       12
a9    1001    a1    0000    1        0    −       13
a10   1010    a11   1011    1        1    −       14
a11   1011    a12   1100    1        1    −       15
a12   1100    a13   1101    1        1    −       16
a13   1101    a14   1110    1        1    −       17
a14   1110    a1    0000    1        0    −       18
a15   0111    a8    1000    p1       1    −       19
a15   0111    a1    0000    p̄1p2     0    −       20
a15   0111    a10   1010    p̄1p̄2     0    D1 D3   21
Column Yh, values listed in row order: y1 y2 y3 y8 y4 y5 y1 y2 y9 y2 y3 y8 y2 y3 y9 y4 y6 y4 y5 y2 y7 y2 y7 y8 y2 y3 y8 y4 y5 y2 y7 y3 y2 y3 y2 y7 y9 y3 y9 y1 y2 y1 y3 y9 y8 y9 y1 y2 y8
Let us construct the structure table of Mealy FSM MPE(Γ19). To do it, we should replace the column Xh of Table 8.3 by the column Ph. It leads to Table 8.7. Now, let us analyse which model of MPE FSM could be used. Analysis of (8.16) shows that this system depends on the state variables T2 and T3. So, only two variables Tr ∈ T and 6 logical conditions should be connected with EMB. So, the EMB should have SA ≥ 8 address inputs. We could use the configuration 256 × 8 with SA = 8 and tF = 8. Maybe, it is possible to implement G = 2 functions pg ∈ P and R0 = 4 functions Dr ∈ Φ using EMB. Let us analyse the system (8.10). It does not include an equation depending only on T2 and T3. So, we cannot implement any equation of the system Φ = Φ(T, X) using EMB. If we take SA = 9, then tF = 4. The function D1 ∈ Φ could be implemented by EMB if T2, T3, T4 enter its inputs. So, there is ΦE = {D1}, ΦL = {D2, D3, D4}. So, we can use the model MPE2(Γ19). Let us construct the table of EMB. It is constructed using the table of RLC and the ST of MPE FSM. There are the following columns in the table of EMB: T, X, P, ΦE, h. In the discussed case, there are 512 rows in this table. A part of this table is represented by Table 8.8.
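The choice among the models MPE1–MPE4 by the conditions (8.12)–(8.15), and the splitting of Φ by literal counts, can be sketched in Python. The sketch is deliberately simplified and rests on assumptions: the function names are hypothetical, Δt = 0 is taken to mean MPE1 (the text only says that the EMB then implements the BRLC), and the limit on the number of EMB address inputs SA, which decides the example above, is not modelled. The literal counts in `example_literals` are made up for illustration.

```python
def choose_mpe_model(t_f, g, r0, n):
    """Pick a model from Delta t = tF - G using conditions (8.12)-(8.15)."""
    dt = t_f - g
    if dt < 0:
        raise ValueError("EMB has too few outputs for the functions p_g")
    if dt == 0:
        return "MPE1"            # assumption: no free outputs, EMB = BRLC only
    if dt < r0:
        return "MPE2"            # conditions (8.12) and (8.13)
    if dt == r0:
        return "MPE3"            # condition (8.14)
    if dt < r0 + n:
        return "MPE4"            # condition (8.15)
    return None                  # case not covered by (8.12)-(8.15)


def split_by_literals(literals, dt):
    """Steps 3-5 of the dividing rule: the dt functions with the largest
    number of literals go to the EMB part, the others to the LUT part."""
    queue = sorted(literals, key=literals.get, reverse=True)
    return set(queue[:dt]), set(queue[dt:])


# Gamma_19 with the configuration 512 x 4: tF = 4, G = 2, R0 = 4, N = 9.
model_19 = choose_mpe_model(t_f=4, g=2, r0=4, n=9)

# Hypothetical literal counts, for illustration only.
example_literals = {"Da": 5, "Db": 6, "Dc": 11, "Dd": 4}
phi_e, phi_l = split_by_literals(example_literals, dt=2)
```

For tF = 4 and G = 2 the sketch returns MPE2, the model used in the text. In that example only D1 is placed into ΦE because of the limited number of address inputs of the EMB, which this sketch does not take into account.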
Table 8.8 Part of the table of EMB of Mealy FSM MPE2(Γ19)
T (T2T3T4)   X (x1x2x3x4x5x6)   P (p1p2)   ΦE (D1)   h
111          000000             00         1         448
111          000001             01         0         449
111          000010             10         0         450
111          000011             11         0         451
111          111100             00         1         509
111          111101             01         0         510
111          111110             10         0         511
111          111111             11         0         512
As follows from (8.16), if $T_2 = 1$, then $p_1 = x_5$ and $p_2 = x_6$. So the columns $x_5$ and $x_6$ coincide with the columns $p_1$ and $p_2$ of Table 8.8, respectively. As follows from (8.10), $D_1 = 1$ if $T_2 = T_3 = T_4 = 1$ and $x_5 = x_6 = 0$. It corresponds to rows 448 and 509 of Table 8.8. We hope that the principle of filling the table of EMB is clear to the reader.

To construct the truth tables of LUTs from the blocks LUTerΦ and LUTerY, it is necessary to find the functions $D_r = D_r(T, P)$ and $Y = Y(T, P)$. These functions are derived from the ST of the MPE Mealy FSM. For example, the following equations could be derived from Table 8.7:

$$
\begin{aligned}
D_2 &= \bar{T}_1 \bar{T}_2 T_3 T_4 p_1 \bar{p}_2; \\
D_4 &= \bar{T}_1 \bar{T}_2 T_3 T_4 \bar{p}_1; \\
y_0 &= \bar{T}_3 \bar{T}_4 \vee \bar{T}_1 \bar{T}_2 \bar{T}_3 \vee T_1 T_2 \bar{T}_3 \vee \bar{T}_2 T_4 \vee T_1 \bar{T}_2 T_3 \\
&\quad \vee \bar{T}_1 \bar{T}_2 T_3 T_4 p_1 p_2 \vee \bar{T}_1 T_2 T_3 T_4 \bar{p}_1 \bar{p}_2.
\end{aligned}
\tag{8.17}
$$
Acting in the same way, it is possible to get the equations for $D_r \in \Phi_L$ and $y_n \in Y \cup \{y_0\}$. Next, these equations should be analysed. There is $L(D_2) = 6 > S = 5$, so the equation for $D_2$ should be transformed. For example, the following two equations could be used: $f_1 = \bar{T}_1 \bar{T}_2 T_3 T_4 p_1$ and $D_2 = f_1 \bar{p}_2$. Next, there is $L(D_4) = 5 = S$, so there is a single LUT in the circuit of $D_4$. There is $L(y_0) = 6 > S$, so this equation has to be transformed as well, and so on. Note that functional decomposition has to be applied to some functions implemented by LUTerΦ and LUTerY of the Mealy FSM MPE2(Γ19).

Let us discuss the FSMs based on counters and the encoding of CMOs. Let us encode the CMOs $Y_q \subseteq Y$ by binary codes $K(Y_q)$ having $R_Q$ bits. The value of $R_Q$ is determined by (2.24). Let us use the variables $z_r \in Z$ for the encoding of CMOs, where $|Z| = R_Q$. It leads to the PE Y Mealy FSM (Fig. 8.19). We do not discuss this model, because its synthesis method is clear.

Suppose that a single EMB can be used and that the condition (7.1) holds. In this case, it is possible to implement the functions $\Phi = \Phi(T, X)$ and $Z = Z(T, X)$ by the EMB. If, in addition, $\Delta_t = 1$, we propose to implement $y_0$ as an output of the EMB. It results in the PE1 Y Mealy FSM (Fig. 8.20).
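Returning to the transformation of $D_2$ described above, the following Python sketch checks exhaustively that the pair $f_1$, $D_2 = f_1 \bar{p}_2$ computes the same function as the original 6-literal term. The function names are hypothetical and serve only this illustration; each helper takes no more than $S = 5$ arguments except the reference function `d2_direct`.

```python
from itertools import product

S = 5  # number of LUT inputs assumed in the example of the text


def d2_direct(t1, t2, t3, t4, p1, p2):
    """D2 as the single 6-literal product term from (8.17)."""
    return bool((not t1) and (not t2) and t3 and t4 and p1 and (not p2))


def f1(t1, t2, t3, t4, p1):
    """Intermediate function with 5 arguments, so it fits one LUT."""
    return bool((not t1) and (not t2) and t3 and t4 and p1)


def d2_decomposed(t1, t2, t3, t4, p1, p2):
    """D2 rebuilt from f1 and the remaining literal (a 2-input LUT)."""
    return f1(t1, t2, t3, t4, p1) and (not p2)


def equivalent():
    """Compare both forms on all 64 input assignments."""
    return all(
        d2_direct(*v) == d2_decomposed(*v)
        for v in product((0, 1), repeat=6)
    )
```

The price of the transformation is a second LUT and one more logic level on the path to $D_2$.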
Fig. 8.19 Structural diagram of PE Y Mealy FSM (blocks: LUTerΦ, CT, LUTerZ, LUTerY; signals: X, Φ, T, Z, Y, y0, +1, Start, Clock)

Fig. 8.20 Structural diagram of PE1 Y Mealy FSM (blocks: EMB, CT, LUTerY; signals: X, Φ, Z, T, Y, y0, +1, Start, Clock)

Fig. 8.21 Structural diagram of PE2 YM Mealy FSM (blocks: EMB, CT, LUTerY; signals: X, YE, Φ, Z, T, Y, y0, +1, Start, Clock)
We hope there is no problem in understanding how to design the circuit of the PE1 Y FSM. The table of EMB contains the columns $T$, $X$, $\Phi$, $y_0$, $Z$, $h$.

Using the mixed encoding of MOs, we could diminish the number of LUTs and interconnections in the circuit of LUTerY. If the condition

$$
\Delta_t = t_F - (R_0 + R_Q) \ge 1
\tag{8.18}
$$

holds, it is possible to implement the functions $y_0$, $\Phi$, $Z$ using the EMB. Next, the set $Y$ could be represented as $Y_L \cup Y_E$ (as we do it in Chap. 7). It leads to the PE2 YM Mealy FSM (Fig. 8.21). We propose a method of synthesis for PE2 YM Mealy FSMs. It includes the following steps:

1. Marking the GSA Γ by states of the Mealy FSM.
2. Constructing the partition $C_E = \{\alpha_1, \ldots, \alpha_G\}$ for the set $A$.
3. Executing the natural state assignment.
4. Constructing the CMOs $Y_q \subseteq Y$.
5. Executing the mixed encoding of the CMOs $Y_q \subseteq Y$.
6. Constructing the structure table of the PE2 YM Mealy FSM.
7. Constructing the table of EMB.
8. Constructing the system $Y_L = Y_L(Z)$.
9. Implementing the FSM logic circuit.

Table 8.9 Collections of MO yn ∈ YL for Mealy FSM PE2 YM(Γ19)

q   Yq
1   ∅
2   y1 y2
3   y3
4   y4 y5
5   y2 y3
6   y4 y6
7   y1 y3
8   y2 y7

Fig. 8.22 Codes of CMOs for PE2 YM(Γ19)
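Condition (8.18) is easy to evaluate before committing to the PE2 YM model. The sketch below assumes that (2.24), which is not reproduced in this section, gives $R_Q = \lceil \log_2 Q \rceil$; the function names are hypothetical. The sample values $t_F = 10$, $R_0 = 4$ and $Q = 18$ are those of the Γ19 example.

```python
def code_bits(count: int) -> int:
    """Bits needed to give `count` objects distinct binary codes.

    Assumption: this is what formula (2.24) computes for R_Q.
    """
    return max(1, (count - 1).bit_length())


def delta_t(t_f: int, r0: int, r_q: int) -> int:
    """Left-hand side of condition (8.18): spare EMB outputs."""
    return t_f - (r0 + r_q)


def pe2ym_applicable(t_f: int, r0: int, q: int) -> bool:
    """True if at least one output is left for y0, as (8.18) requires."""
    return delta_t(t_f, r0, code_bits(q)) >= 1


if __name__ == "__main__":
    print(code_bits(18), delta_t(10, 4, code_bits(18)))
    print(pe2ym_applicable(10, 4, 18))
```

For the values above, the sketch gives $R_Q = 5$ and $\Delta_t = 1$, so the condition is satisfied with exactly one spare output.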
Let us discuss an example of synthesis for the Mealy FSM PE2 YM(Γ19). The first three steps of the proposed method have already been executed, so we start our example from step 4. The analysis of GSA Γ19 allows finding the following CMOs: $Y_1 = \{y_1, y_2\}$, $Y_2 = \{y_3, y_8\}$, $Y_3 = \{y_4, y_5\}$, $Y_4 = \{y_1, y_2, y_9\}$, $Y_5 = \{y_2, y_3, y_8\}$, $Y_6 = \{y_2, y_3, y_9\}$, $Y_7 = \{y_4, y_6\}$, $Y_8 = \{y_4, y_5\}$, $Y_9 = \{y_2, y_7\}$, $Y_{10} = \{y_2, y_7, y_8\}$, $Y_{11} = \{y_1, y_3, y_8\}$, $Y_{12} = \{y_3\}$, $Y_{13} = \{y_2, y_3\}$, $Y_{14} = \{y_2, y_7, y_9\}$, $Y_{15} = \{y_3, y_9\}$, $Y_{16} = \{y_1, y_3, y_9\}$, $Y_{17} = \{y_8, y_9\}$, $Y_{18} = \{y_1, y_2, y_8\}$. So there is $Q = 18$. It gives $R_Q = 5$ and $Z = \{z_1, \ldots, z_5\}$.

Let an FPGA chip have LUTs with $S = 3$ and an EMB with the configuration 1024 × 10. So there is $R_Q > S$. Because $t_F = 10$, it is possible to generate the functions $D_r \in \Phi$, $y_0$ and $z_r \in Z$ by the EMB. So the condition (8.18) holds, and we can use the model of PE2 YM Mealy FSM for GSA Γ19. Using the method of dividing the set $Y$ from Chap. 7, we can find the sets $Y_E = \{y_8, y_9\}$ and $Y_L = \{y_1, \ldots, y_7\}$. Excluding $Y_E$ from $Y$ leaves $Q = 8$ collections of microoperations $Y_q \subseteq Y_L$ (Table 8.9). Now there is $R_Q = 3 = S$. So there is no need for functional decomposition when implementing the circuit of LUTerY.

Let us execute the diminishing encoding of CMOs. The codes $K(Y_q)$ are shown in Fig. 8.22. Let us find the system $Y = Y(Z)$ for $y_n \in Y_L$. It is the following:

$$
\begin{aligned}
y_1 &= Y_2 \vee Y_7; & y_2 &= Y_2 \vee Y_5 \vee Y_8; & y_3 &= Y_3 \vee Y_5 \vee Y_7; \\
y_4 &= Y_4 \vee Y_6; & y_5 &= Y_4; & y_6 &= Y_6; \\
y_7 &= Y_8.
\end{aligned}
\tag{8.19}
$$
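The reduction from 18 to 8 collections and the structure of the system (8.19) can be reproduced mechanically. The sketch below uses the 18 CMOs and the set $Y_E$ exactly as listed in the text, and the numbering of the reduced collections from Table 8.9; microoperations are represented by their indices, and the helper name `output_system` is hypothetical.

```python
# The 18 CMOs of GSA Gamma_19 from the text (y_n is stored as n).
CMOS = {
    1: {1, 2}, 2: {3, 8}, 3: {4, 5}, 4: {1, 2, 9}, 5: {2, 3, 8},
    6: {2, 3, 9}, 7: {4, 6}, 8: {4, 5}, 9: {2, 7}, 10: {2, 7, 8},
    11: {1, 3, 8}, 12: {3}, 13: {2, 3}, 14: {2, 7, 9}, 15: {3, 9},
    16: {1, 3, 9}, 17: {8, 9}, 18: {1, 2, 8},
}
Y_E = {8, 9}  # microoperations generated directly by the EMB

# Reduced collections with the numbering of Table 8.9.
TABLE_8_9 = {
    1: set(), 2: {1, 2}, 3: {3}, 4: {4, 5},
    5: {2, 3}, 6: {4, 6}, 7: {1, 3}, 8: {2, 7},
}

# Removing Y_E from every CMO and dropping duplicates.
reduced = {frozenset(cmo - Y_E) for cmo in CMOS.values()}


def output_system(collections):
    """For each y_n, list the collections Y_q that contain it."""
    system = {}
    for q in sorted(collections):
        for n in sorted(collections[q]):
            system.setdefault(n, []).append(q)
    return dict(sorted(system.items()))


if __name__ == "__main__":
    print(len(reduced))
    for n, terms in output_system(TABLE_8_9).items():
        print(f"y{n} = " + " v ".join(f"Y{q}" for q in terms))
```

Each printed line corresponds to one equation of (8.19) before minimization; the minimized forms depend on the codes of Fig. 8.22.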
There are no insignificant input assignments in the Karnaugh map (Fig. 8.22). Because of this, only two functions could be minimized: $y_1 = \bar{z}_1 z_2$ and $y_4 = z_1 \bar{z}_3$. All other functions $y_n \in Y_L$ have three literals. So there are 7 LUTs and 19 interconnections in the circuit of LUTerY. Without the mixed encoding of CMOs, it is necessary to implement 9 Boolean functions. Each function has $R_Q = 5$ literals, which requires 7 LUTs with $S = 3$ per function. So there are 63 LUTs and 315 interconnections in the circuit of LUTerY for the Mealy FSM PE1 Y(Γ19).

The ST of the PE2 YM Mealy FSM has the following columns: $a_m$, $K(a_m)$, $a_s$, $K(a_s)$, $X_h$, $y_0$, $Y_E$, $Y_q$, $K(Y_q)$, $\Phi_h$, $h$. It is Table 8.10 in the discussed case. Table 8.10 is constructed on the base of Table 8.3. The column $Y_E$ includes the MOs $y_8, y_9 \in Y_E$. The CMOs $Y_q$ are taken from Table 8.9 and their codes from Fig. 8.22.

The table of EMB is constructed using Table 8.10. It includes the columns $K(a_m)$, $X$, $y_0$, $Y_E$, $Z$, $\Phi$, $h$. For PE2 YM(Γ19) there are 1024 rows in this table. $H(a_m) = 64$ rows are necessary to represent the transitions from a state $a_m \in A$. A part of the table of EMB is shown in Table 8.11. The column $h_0$ contains the decimal equivalents of the cell addresses. The column $h$ of Table 8.11 contains the number of the corresponding row of Table 8.10. Only 8 of the 64 possible rows are shown in Table 8.11 (for the state $a_{15} \in A$). We hope that there is a clear correspondence between Tables 8.3 and 8.11.

Table 8.10 Structure table of Mealy FSM PE2 YM(Γ19)

am    K(am)   as    K(as)   Xh       y0   YE      Yq   Φh      h
a1    0000    a2    0001    x1       1    -       Y2   -       1
a1    0000    a2    0001    x̄1 x2    1    y8      Y3   -       2
a1    0000    a2    0001    x̄1 x̄2    1    -       Y4   -       3
a2    0001    a3    0010    1        1    y9      Y2   -       4
a3    0010    a4    0011    1        1    y9      Y5   -       5
a4    0011    a5    0100    x3 x4    1    y9      Y5   -       6
a4    0011    a6    0110    x3 x̄4    0    -       Y6   D2 D3   7
a4    0011    a2    0001    x̄3       0    -       Y4   D4      8
a5    0100    a7    0101    1        1    -       Y8   -       9
a6    0110    a15   0111    1        1    y8      Y8   -       10
a7    0101    a1    0000    1        0    y8      Y7   -       11
a8    1000    a9    1001    1        1    -       Y4   -       12
a9    1001    a1    0000    1        0    -       Y8   -       13
a10   1010    a11   1011    1        1    -       Y3   -       14
a11   1011    a12   1100    1        1    -       Y5   -       15
a12   1100    a13   1101    1        1    y9      Y8   -       16
a13   1101    a14   1110    1        1    y9      Y3   -       17
a14   1110    a1    0000    1        0    -       Y2   -       18
a15   0111    a8    1000    x5       1    y9      Y7   -       19
a15   0111    a1    0000    x̄5 x6    0    y8 y9   Y1   -       20
a15   0111    a10   1010    x̄5 x̄6    0    y8      Y2   D1 D3   21
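The columns $y_0$ and $\Phi_h$ of Table 8.10 follow directly from the state codes once the natural state assignment is fixed. The Python sketch below encodes one reading of the table, which is an assumption rather than a rule stated in this section: $y_0 = 1$ whenever $K(a_s) = K(a_m) + 1$ (the counter is incremented), and otherwise $\Phi_h$ lists the functions $D_r$ that load the 1-bits of $K(a_s)$. The function name `counter_controls` is hypothetical.

```python
# State codes K(a_m) and the 21 transitions (a_m, a_s) of Table 8.10.
CODES = {
    "a1": "0000", "a2": "0001", "a3": "0010", "a4": "0011",
    "a5": "0100", "a6": "0110", "a7": "0101", "a8": "1000",
    "a9": "1001", "a10": "1010", "a11": "1011", "a12": "1100",
    "a13": "1101", "a14": "1110", "a15": "0111",
}

TRANSITIONS = [
    ("a1", "a2"), ("a1", "a2"), ("a1", "a2"), ("a2", "a3"), ("a3", "a4"),
    ("a4", "a5"), ("a4", "a6"), ("a4", "a2"), ("a5", "a7"), ("a6", "a15"),
    ("a7", "a1"), ("a8", "a9"), ("a9", "a1"), ("a10", "a11"),
    ("a11", "a12"), ("a12", "a13"), ("a13", "a14"), ("a14", "a1"),
    ("a15", "a8"), ("a15", "a1"), ("a15", "a10"),
]


def counter_controls(src, dst, codes=CODES):
    """Return (y0, list of D_r equal to 1) for the transition src -> dst.

    Assumed reading of Table 8.10: increment when the next code is the
    current code plus one, otherwise load the next code through Phi.
    """
    current = int(codes[src], 2)
    target = int(codes[dst], 2)
    if target == current + 1:
        return 1, []
    loaded = [f"D{i + 1}" for i, bit in enumerate(codes[dst]) if bit == "1"]
    return 0, loaded


if __name__ == "__main__":
    for h, (src, dst) in enumerate(TRANSITIONS, start=1):
        y0, phi = counter_controls(src, dst)
        print(h, src, dst, y0, " ".join(phi) or "-")
```

Under this reading only rows 7, 8 and 21 need input memory functions, which is why the column $\Phi_h$ of Table 8.10 is almost empty.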
Table 8.11 Part of the table of EMB of Mealy FSM PE2 YM(Γ19)

K(am) (T1 T2 T3 T4)   X (x1 ... x6)   y0   YE (y8 y9)   Z (z1 z2 z3)   Φ (D1 D2 D3 D4)   h0    h
0111                  000000          0    10           010            1010              448   21
0111                  000001          0    11           000            0000              449   20
0111                  000010          1    01           011            0000              450   19
0111                  000011          1    01           011            0000              451   19
0111                  111100          0    10           010            1010              508   21
0111                  111101          0    11           000            0000              509   20
0111                  111110          1    01           011            0000              510   19
0111                  111111          1    01           011            0000              511   19
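The 64 EMB rows of the state $a_{15}$ can be generated from rows 19 to 21 of Table 8.10. The sketch below is only an illustration: the codes of the CMOs are copied from the column $Z$ of Table 8.11, the address is assumed to be the concatenation of $K(a_m)$ and $x_1 \ldots x_6$ (which reproduces the column $h_0$), and the function name `emb_rows` is hypothetical.

```python
# Rows 19-21 of Table 8.10 (state a15, code 0111), written as
# (h, x5, x6, y0, YE bits y8 y9, Z code, Phi bits D1..D4).
# None marks an input that the row does not depend on.
ROWS_A15 = [
    (19, 1, None, 1, "01", "011", "0000"),
    (20, 0, 1, 0, "11", "000", "0000"),
    (21, 0, 0, 0, "10", "010", "1010"),
]


def emb_rows(state_code="0111", n_inputs=6):
    """Map every EMB address of the state to (y0, YE, Z, Phi, h)."""
    table = {}
    for x in range(2 ** n_inputs):
        x_bits = format(x, f"0{n_inputs}b")  # x1 is the leftmost bit
        x5, x6 = int(x_bits[4]), int(x_bits[5])
        for h, c5, c6, y0, ye, z, phi in ROWS_A15:
            if c5 == x5 and c6 in (None, x6):
                address = int(state_code + x_bits, 2)
                table[address] = (y0, ye, z, phi, h)
                break
    return table


if __name__ == "__main__":
    rows = emb_rows()
    for address in (448, 449, 450, 451, 508, 509, 510, 511):
        print(address, *rows[address])
```

Since only $x_5$ and $x_6$ are tested in the state $a_{15}$, the 64 cells contain just three distinct words; this redundancy is what the replacement of logical conditions removes.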
Fig. 8.23 Structural diagram of MPE YM Mealy FSM
Step 8 has already been executed. Its outcome is the system (8.19). As always, we do not discuss the last step of the proposed design method.

It is possible to use the RLC together with the mixed encoding of CMOs. For example, let the following condition be true:

$$
2^{L+R_0} \cdot (G + R_Q + 1) \le V_0.
\tag{8.20}
$$

In this case, the EMB could implement the systems $P = P(T, X)$, $Z = Z(T, X)$ and $y_0 = y_0(T, X)$. It is necessary to use the LUTerΦ to implement the functions $\Phi = \Phi(T, X)$ and the LUTerY to implement the functions $Y = Y(Z)$. Let the following condition hold:

$$
\Delta_t = t_F - (G + R_Q + 1) \ge 0.
\tag{8.21}
$$

It allows dividing the set $Y$ into the sets $Y_L$ and $Y_E$. In this case, we propose the model of MPE YM Mealy FSM shown in Fig. 8.23. The method of synthesis for the MPE YM Mealy FSM combines the steps from the methods for MPE and PE YM FSMs. Let us only discuss how to create the table of EMB for the FSM MPE YM(Γ19). The table of EMB has the following columns: $K(a_m)$, $P$, $y_0$, $Y_E$, $Z$, $h_0$.
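Conditions (8.20) and (8.21) can be evaluated together, as in the following sketch. The numeric values are illustrative assumptions rather than data given for this model in the text: $L = 6$, $R_0 = 4$ and $G = 2$ are taken from the Γ19 example, $R_Q = 5$ is the value before the set $Y$ is divided, and $V_0 = 10240$ bits corresponds to an EMB with the configuration 1024 × 10. The function names are hypothetical.

```python
def fits_single_emb(l: int, r0: int, word_bits: int, v0_bits: int) -> bool:
    """Capacity check of the form 2^(L+R0) * word_bits <= V0, as in (8.20)."""
    return (2 ** (l + r0)) * word_bits <= v0_bits


def spare_outputs(t_f: int, word_bits: int) -> int:
    """Delta_t of (8.21): EMB outputs not used by P, Z and y0."""
    return t_f - word_bits


if __name__ == "__main__":
    L, R0, G, RQ = 6, 4, 2, 5   # illustrative values, see the lead-in
    T_F, V0 = 10, 1024 * 10     # assumed EMB configuration 1024 x 10
    word = G + RQ + 1           # outputs for P, Z and y0
    print(fits_single_emb(L, R0, word, V0), spare_outputs(T_F, word))
```

Under these assumptions both conditions hold and two outputs remain unused, which would be enough to move two microoperations into $Y_E$.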
Fig. 8.24 Structural diagram of MPE1 Mealy FSM (blocks: EMB, LUTerΦ, CT; signals: X, P, Φ, T, Y, y0, +1, Start, Clock)

Table 8.12 Part of the table of EMB of Mealy FSM MPE YM(Γ19)

K(am) (T1 T2 T3 T4)   X (x1 ... x6)   y0   P (p1 p2)   YE (y8 y9)   Z (z1 z2 z3)   Φ (D1 D2 D3 D4)   h0    h
0111                  000000          0    00          10           010            1010              448   21
0111                  000001          0    01          11           000            0000              449   20
0111                  000010          1    10          01           011            0000              450   19
0111                  000011          1    11          01           011            0000              451   19
0111                  111100          0    00          10           010            1010              508   21
0111                  111101          0    01          11           000            0000              509   20
0111                  111110          1    10          01           011            0000              510   19
0111                  111111          1    11          01           011            0000              511   19
Let us use the RLC shown in Table 8.6, the state codes from Fig. 8.13, the partition of the set $Y$ shown in Table 8.10 and the codes of CMOs from Fig. 8.22. Using this information, let us show a part of the table of EMB (Table 8.12). To form Table 8.12, we take all columns from Table 8.11 (without the column $\Phi$). The column $P$ could be taken from Table 8.8 or filled using the system (8.16).

Let the following condition hold:

$$
2^{L+R_0} \cdot (G + N + 1) \le V_0.
\tag{8.22}
$$

In this case, the EMB implements the functions $P$, $y_0$ and $Y$. It leads to the MPE1 Mealy FSM (Fig. 8.24). The table of EMB is then represented by Table 8.13, which is obtained on the base of Table 8.12. To fill the column $Y$ of Table 8.13, we use the column $Y_h$ of Table 8.7. All other columns could be taken from Table 8.12.

It is possible to use the EMB for generating some of the functions $y_n \in Y$ and $D_r \in \Phi$. The other functions are implemented using LUTerYΦ. It leads to the MPE2 Mealy FSM (Fig. 8.25). The EMB implements the functions $y_0(T, X)$, $\Phi(T, X)$, $P(T, X)$ and $Y_E(T, P)$. The main problem is to divide the sets $\Phi$ and $Y$ in such a manner that the value of $N_{Y\Phi}$ is minimized.
Fig. 8.25 Structural diagram of MPE2 Mealy FSM (blocks: EMB, LUTerYΦ, CT; signals: X, P, ΦE, YE, YL, Φ, T, y0, +1, Start, Clock)

Table 8.13 Part of the table of EMB of Mealy FSM MPE1(Γ19)

K(am) (T1 T2 T3 T4)   X (x1 ... x6)   y0   P (P1 P2)   Y (y1 ... y9)   h0    h
0111                  000000          0    00          110000010       448   21
0111                  000001          0    01          000000011       449   20
0111                  000010          1    10          101000001       450   19
0111                  000011          1    11          101000001       451   19
0111                  111100          0    00          110000010       508   21
0111                  111101          0    01          000000011       509   20
0111                  111110          1    10          101000001       510   19
0111                  111111          1    11          101000001       511   19
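The bit vectors in the column $Y$ of Table 8.13 can be cross-checked against the mixed encoding used earlier. The text fills this column from the column $Y_h$ of Table 8.7; the sketch below takes a different route, chosen here only for illustration, and rebuilds each vector as the union of the collection $Y_q$ from Table 8.9 and the microoperations $Y_E$ of the same row of Table 8.10. The function name `y_vector` is hypothetical.

```python
# Collections Y_q of Table 8.9 (y_n is stored as n).
TABLE_8_9 = {
    1: set(), 2: {1, 2}, 3: {3}, 4: {4, 5},
    5: {2, 3}, 6: {4, 6}, 7: {1, 3}, 8: {2, 7},
}

# Rows 19-21 of Table 8.10: h -> (q, microoperations from Y_E).
ROWS = {19: (7, {9}), 20: (1, {8, 9}), 21: (2, {8})}


def y_vector(q, y_e, n_outputs=9):
    """One-hot style vector y1..y9 for collection Y_q plus Y_E outputs."""
    active = TABLE_8_9[q] | set(y_e)
    return "".join("1" if n in active else "0" for n in range(1, n_outputs + 1))


if __name__ == "__main__":
    for h, (q, y_e) in sorted(ROWS.items()):
        print(h, y_vector(q, y_e))
```

For the three rows of the state $a_{15}$ the vectors agree with those shown in Table 8.13.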
Let the following condition hold:

$$
2^{L+R_0} \cdot (N + R_0 + 1) \le V_0.
\tag{8.23}
$$
In this case, all functions $y_n \in Y$, $D_r \in \Phi$ and $y_0$ could be implemented using a single EMB. But then no function of the FSM needs optimization, so there is no sense in using the counter. It could be replaced by the state register RG, and there is no need for the function $y_0$. We do not discuss this trivial circuit in our book.

It is possible to use the transformation of object codes in the case of PE Mealy FSMs. Four approaches could be used:

1. The transformation of CMOs $Y_q \subseteq Y$ and identifiers $I_m \in I$ into state codes $K(a_m)$. It leads to PEY Y Mealy FSMs (Fig. 8.26). In PEY Y Mealy FSMs, the LUTerZV generates the functions $v_r \in V$ and $z_r \in Z$ used for encoding of the identifiers and CMOs, respectively. The LUTerY implements the system $Y = Y(Z)$, and the function $y_0$ is used to control the mode of operation of CT. The LUTerΦ implements the input memory functions $\Phi = \Phi(Z, V)$.
2. The transformation of state codes and identifiers into microoperations. It leads to PEA Mealy FSMs (Fig. 8.27). In PEA Mealy FSMs, the LUTerΦV implements the functions $v_r \in V$ and $D_r \in \Phi$. The LUTerY implements the functions $y_0 = y_0(T, V)$.
3. The transformation of state codes and identifiers into codes of CMOs. It leads to PEA Y Mealy FSMs (Fig. 8.28). In PEA Y Mealy FSMs, the LUTerZ implements the functions $z_r \in Z$, where $Z = Z(T, V)$. The LUTerY generates the functions $Y = Y(Z)$.
4. The transformation of microoperations $y_n \in Y$ and identifiers $I_m \in I$ into state codes. It leads to PEY Mealy FSMs (Fig. 8.29). In PEY Mealy FSMs, the LUTerYV implements the microoperations $y_n \in Y$ and the functions $v_r \in V$. These functions depend on the logical conditions $x_e \in X$ and the state variables $T_r \in T$. The function $y_0$ is also generated by the LUTerYV. The LUTerΦ generates the functions $\Phi = \Phi(Y, V)$. Of course, this approach makes sense only for small values of $N = |Y|$.

Fig. 8.26 Structural diagram of PEY Y Mealy FSM (blocks: LUTerZV, LUTerY, LUTerΦ, CT)

Fig. 8.27 Structural diagram of PEA Mealy FSM (blocks: LUTerΦV, CT, LUTerY)

Fig. 8.28 Structural diagram of PEA Y Mealy FSM (blocks: LUTerΦV, CT, LUTerZ, LUTerY)

Fig. 8.29 Structural diagram of PEY Mealy FSM (blocks: LUTerYV, LUTerΦ, CT)

To develop a synthesis method for any of these FSMs, it is necessary to combine the synthesis methods for FSMs with the transformation of objects [5, 8] and for FSMs with counters. For example, the synthesis of PEA Y Mealy FSMs includes the following steps:

1. Constructing the set of states $A$ for a given GSA Γ.
2. Creating the CMOs $Y_q \subseteq Y$.
3. Finding the set of identifiers $I$.
4. Representing CMOs by the pairs ⟨state, identifier⟩.
5. Executing the natural state assignment.
6. Executing the diminishing encoding of CMOs.
7. Executing the encoding of identifiers $I_m \in I$.
8. Creating the structure table of PE Y Mealy FSM.
9. Constructing the table of LUTerΦV.
10. Constructing the table of LUTerZ.
11. Constructing the table of LUTerY.
12. Implementing the FSM logic circuit with particular LUTs.

Fig. 8.30 Structural diagram of PHEA Y Mealy FSM (blocks: EMBerΦV, CT, LUTerZ, LUTerY)
Obviously, some LUTers could be replaced by EMBs. For example, it is possible to use EMBerΦV instead of LUTerΦV. It turns the PEA Y Mealy FSM into the PHEA Y Mealy FSM (Fig. 8.30). Because there is the block EMBer, it is possible to use the mixed encoding of microoperations. In this case, the function $y_0$ and some MOs $y_n \in Y$ are generated by EMBs. So the set $Y$ is divided into the sets $Y_L$ and $Y_E$. It results in the PHEA YM Mealy FSM (Fig. 8.31).

Fig. 8.31 Structural diagram of PHEA YM Mealy FSM (blocks: EMBerΦV, CT, LUTerZ, LUTerY)

Fig. 8.32 Structural diagram of PET YM Mealy FSM (blocks: LUTer1, …, LUTerK, LUTerΦ, CT, LUTerZ, LUTerY)

Let the following conditions hold:

$$
2^{L+R_0} \cdot (R_0 + R_I + 1) \le V_0;
\tag{8.24}
$$

$$
\Delta_t = t_F - (R_0 + R_I + 1) \ge 0.
\tag{8.25}
$$
In this case, there is only a single EMB in the circuit of EMBerΦV. Recall that $R_I = |V|$. We hope that the reader understands how to synthesize the FSMs shown in Figs. 8.25, 8.26, 8.27, 8.28, 8.29, 8.30 and 8.31.

It is possible to use counters in Mealy FSMs with the twofold state assignment. Let us consider the PE Y Mealy FSM (Fig. 8.19). The LUTerΦ could be replaced by the blocks LUTer1, …, LUTerK. The LUTerT should be added to transform the state codes $K(a_m)$ into their codes $C(a_m)$. It results in the PET Y Mealy FSM (Fig. 8.32). Different models are possible in this case. For example, it is possible:

1. To replace LUTerZ by LUTer1, …, LUTerK.
2. To replace both LUTerΦ and LUTerZ by LUTer1, …, LUTerK.
3. To replace LUTerΦ by LUTer1, …, LUTerK and to use EMBer instead of LUTerT and LUTerZ.
The replacement of logical conditions can also be used in all the discussed models. It is possible to combine: (1) the RLC; (2) the twofold encoding of states; (3) the encoding of CMOs (or the mixed encoding of MOs); (4) the transformation of states into CMOs (or microoperations); (5) the transformation of CMOs (or microoperations) into states. Each new model requires a new method of synthesis, but parts of known synthesis methods can be reused to obtain it. Examples of this approach appear throughout this book.

So there are many models of FSMs. To select the best model, the designer should take into account:

1. The specifics of the GSA Γ used for synthesis.
2. The parameters of LUTs and EMBs (if they are available).
3. The requirements for the final circuit of the FSM (minimal hardware, maximal operating frequency, minimal energy consumption and so on).

We hope that our readers now have many new models of FSMs and methods for their synthesis at their disposal. We hope this will be helpful in developing new projects that use finite state machines to implement the logic circuits of control units.
References

1. Amann R, Baitinger U (1989) Optimal state chains and states codes in finite state machines. IEEE Trans Comput-Aided Des 8(2):153–170
2. Atmel (2019). http://www.atmel.com
3. Bacchetta P, Daldos L, Sciuto D, Silvano C (2000) Low-power state assignment techniques for finite state machines. In: Proceedings of the 2000 IEEE international symposium on circuits and systems (ISCAS'2000), vol 2. IEEE, Geneva, pp 641–644
4. Baranov S (1994) Logic synthesis of control automata. Kluwer Academic Publishers
5. Baranov S (2008) Logic and system design of digital systems. TUT Press, Tallinn
6. Barkalov A, Bukowiec A (2005) Synthesis of mealy finite-states machines for interpretation of verticalized flow-charts. Theor Appl Inform 5(5):39–51
7. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of logic circuit of Moore FSM on CPLD. Pomiary Autom Kontrola 53(5):18–20
8. Barkalov A, Titarenko L, Chmielewski S (2007) Optimization of Moore FSM on CPLD. In: Proceedings of the sixth international conference CAD DD'07, vol 2. Minsk, pp 39–45
9. Barkalov A, Węgrzyn M, Wiśniewski R (2006) Partial reconfiguration of compositional microprogram control units implemented on FPGAs. In: Proceedings of IFAC workshop on programmable devices and embedded systems (Brno), pp 116–119
10. Garcia-Vargas I, Senhadji-Navarro R, Jiménez-Moreno G, Civit-Balcells A, Guerra-Gutierrez P (2007) ROM-based finite state machine implementation in low cost FPGAs. In: IEEE international symposium on industrial electronics ISIE 2007. IEEE, pp 2342–2347
11. Habib S (1988) Microprogramming and firmware engineering methods. Wiley, New York
12. Hu H, Xue H, Bian J (2003) A heuristic state assignment algorithm targeting area. In: Proceedings of 5th international conference on ASIC, vol 1, pp 93–96
13. Iranli A, Rezvani P, Pedram M (2003) Low power synthesis of finite state machines with mixed D and T flip-flops. In: Proceedings of the Asia and South Pacific – DAC, pp 803–808
14. Nowicka M, Łuba T, Rawski M (1999) FPGA-based decomposition of boolean functions: algorithms and implementation. Adv Comput Syst 502–509
15. Solovjev V, Czyzy M (2001) Synthesis of sequential circuits on programmable logic devices based on new models of finite state machines. In: Proceedings of the EUROMICRO conference, Milan, pp 170–173
16. Tatalov E (2011) Synthesis of compositional microprogram control units for programmable devices. Master's thesis, Donetsk National Technical University, Donetsk
Conclusion
We are now witnessing the intensive development of design methods targeting FPGA-based circuits and systems. The complexity of the digital systems to be designed increases drastically, as does the complexity of the FPGA chips used for the design. Up-to-date FPGAs include up to seven billion transistors, and this is not a limit. The development of digital systems with such complex logic elements is impossible without hardware description languages, computer-aided design tools and design libraries. But even the application of all these tools does not guarantee that a competitive product will be designed within an appropriate time-to-market. To solve this problem, a designer should know not only CAD tools but also the design and optimization methods. It is especially important in the case of such irregular devices as control units. Because of their irregularity, their logic circuits are implemented without using standard library cells; only the LUTs and EMBs of a particular FPGA chip can be used in FSM logic circuit design. In this case, the knowledge and experience of a designer become a crucial factor of success.

Many experiments conducted with standard industrial packages show that their outcomes are far from optimal, especially in the design of complex control units. Thus, it is necessary to develop one's own program tools oriented on FSM optimization and to use them together with industrial packages. This problem cannot be solved without fundamental knowledge in the area of logic synthesis. Besides, to be able to develop new design and optimization methods, a designer should know the existing methods. We think that the new FSM models and design methods proposed in our book will help in solving this very important problem. We hope that our book will be useful both for designers of digital systems and for scholars developing synthesis and optimization methods that target the implementation of FPGA-based logic circuits of finite state machines.
© Springer Nature Switzerland AG 2020 A. Barkalov et al., Logic Synthesis for FPGA-Based Control Units, Lecture Notes in Electrical Engineering 636, https://doi.org/10.1007/978-3-030-38295-7