Tool Independent High-Level Synthesis - with a comparison between Mentor Catapult and Cadence Stratus



English Pages 46 Year 2019



Tool Independent High-Level Synthesis with a comparison between Mentor Catapult and Cadence Stratus

Oscar Dahlblom Department of Electrical and Information Technology Lund University

Supervisor: Liang Liu Examiner: Erik Larsson April 17, 2019

© 2019 Printed in Sweden Tryckeriet i E-huset, Lund

Abstract

High-level synthesis (HLS) is seen by many as the next step in the ever-increasing abstraction levels of digital hardware design. During development at this level there is a high risk of getting locked into tools by a single supplier, as many aspects of the design description are not standardised. In this thesis two of the most popular HLS tools on the market are evaluated: Mentor Graphics Catapult and Cadence Stratus. The tools are compared in terms of features, development workflow efficiency and quality of generated RTL. The generated RTL is evaluated through logic synthesis using Cadence Genus. For the example block, Catapult is shown to generate higher quality RTL. Catapult is also shown to offer twice as fast simulation of synthesisable SystemC hardware descriptions, at the cost of being locked into the tool because of data type incompatibility with other HLS tools. From this, recommendations and development guidelines for avoiding dependence on a single tool are made.


Contents

1 Introduction
  1.1 Background
  1.2 Objectives and Questions
  1.3 Methodology
  1.4 Contribution

2 State of the Art
  2.1 Increasing abstraction levels
  2.2 Benefits and drawbacks of HLS
  2.3 An Altered Workflow
  2.4 SystemC
  2.5 HLS tools

3 Methodology
  3.1 FIR filter example block
  3.2 Catapult Workflow
  3.3 Conversion of Codebase
  3.4 Simulation
  3.5 Synthesis
  3.6 Environments

4 Results
  4.1 AC types in Stratus
  4.2 Verification of functionality using ARM SoC Designer
  4.3 Data type simulation speed
  4.4 High-Level Synthesis using Cadence Stratus and Mentor Graphics Catapult
  4.5 Logic Synthesis Using Cadence Genus

5 Discussion
  5.1 Simulation speed
  5.2 Bit accurate data types in Catapult and Stratus
  5.3 Components placed by the HLS tools
  5.4 Genus Synthesis Results
  5.5 Area approximations

6 Conclusion
  6.1 Return to the Research Questions
  6.2 Guidelines and Recommendations
  6.3 Future work

List of Figures

2.1 Abstraction levels of digital design.
2.2 Abstraction levels of digital design, including HLS.
2.3 Traditional flow
2.4 Flow using HLS

3.1 FIR filter.
3.2 An overview of the signals of the FIR filter hardware block used for the tests.
3.3 Structure of the SystemC codebase. The grey boxes represent the part of the code used for synthesis. The outer boxes are used for testing purposes.
3.4 Arm SoC Designer test bench layout

4.1 Data type simulation speed for isolated 32 bit additions. Numbers are in seconds for 100 000 000 runs.
4.2 Data type simulation speed for isolated 32 bit multiplications. Numbers are in seconds for 100 000 000 runs.
4.3 Data type simulation speed for isolated 32 bit divisions. Numbers are in seconds for 100 000 000 runs.
4.4 Data type simulation speed for isolated 32 bit divisions with data for sc_fixed omitted to increase readability. Numbers are in seconds for 100 000 000 runs.
4.5 Data type simulation speed for isolated 256 bit additions. Numbers are in seconds for 100 000 000 runs.
4.6 Data type simulation speed for isolated 256 bit multiplication. Numbers are in seconds for 100 000 000 runs.
4.7 Data type simulation speed for isolated 256 bit divisions. Numbers are in seconds for 100 000 000 runs.
4.8 Genus synthesis area results.
4.9 Genus synthesis power results.

List of Tables

4.1 SystemC simulation time using AC and different SC data types. AC is the original and taped-out implementation. All the other implementations are converted from the original.
4.2 Data type simulation speed for isolated 32 bit operations. Numbers are in seconds for 100 000 000 runs.
4.3 Data type simulation speed for isolated 32 bit operations. Difference between comparable data types.
4.4 Data type simulation speed for isolated 256 bit operations. Numbers are in seconds for 100 000 000 runs.
4.5 Data type simulation speed for isolated 256 bit operations. Difference between comparable data types.
4.6 HLS synthesis time in seconds. Note: As Stratus doesn't output decimals for these results, the Catapult measurements are also rounded to the closest integer. However, the calculation of std. deviation and average were done using higher precision for the Catapult run times.
4.7 Approximations of area from Catapult using AC types and SC types and Stratus using SC types.
4.8 Genus synthesis results using default HLS settings, with % of results from Catapult using AC types.
4.9 Genus synthesis results with HLS power saving features inactivated, with % of results from Catapult with power saving activated (default).
4.11 Components placed during HLS synthesis.
4.10 Genus synthesis HVT, SVT and LVT cell choices using default HLS settings. Values are percentage of cells.

List of Abbreviations

CPU   Central processing unit
DUT   Device under test
FIR   Finite impulse response
HLS   High-level synthesis
IDE   Integrated development environment
LSB   Least significant bit
MSB   Most significant bit
RTL   Register-transfer level
SoC   System on a chip
UART  Universal asynchronous receiver-transmitter

Preface

This bachelor's thesis is a collaboration between the Faculty of Engineering LTH at Lund University and ARM Sweden AB. The thesis was supervised at LTH by Liang Liu, and at ARM Sweden AB by Michal Stala and Thomas Olsson. The examiner was Erik Larsson.

Special thanks to the members of the ARM LPWAN hardware design team in Lund, including Thomas Olsson, Henrik Ljunger, Ola Nordling, Marcel Tovar and Sebastien Fuhrmann.

Lund, February 2019

Chapter

1

Introduction

1.1

Background

High-level synthesis (HLS) is constantly gaining traction and seeing increased use in the industry. However, there are several players on the market providing tools that may produce substantially different outputs as well as require different inputs. Developing with only one tool in mind may make it difficult to adapt the developed system to another tool. Two of the most popular tools for HLS on the market are Mentor Graphics Catapult and the relatively recently released Cadence Stratus. These two tools are the subject of comparison in this thesis.

1.2

Objectives and Questions

The research questions for the thesis are:
• What impact does changing HLS tools have on an already implemented system, and what amount of effort is needed to adapt an existing system written for Mentor Graphics Catapult to be synthesised using Cadence Stratus?
• How can SystemC hardware development be done without getting locked in to a single HLS tool?
• How does Cadence Stratus compare to Mentor Catapult in terms of performance during development and the generated end result?

1.3

Methodology

Insights regarding tool agnostic HLS will be gained by adapting a real-world system implemented in SystemC, with the Catapult environment in mind, to be synthesised using Cadence Stratus. The generated RTL descriptions will be tested using ARM SoCDesigner to ensure they provide identical data outputs. They will also be synthesised using Cadence Genus in order to compare the quality of results. The scope of the project will be limited by focusing on a small number of hardware blocks to be adapted for synthesis using Stratus.

1.4

Contribution

The thesis will provide guidelines and valuable insights for making SystemC HLS development tool-independent, as well as provide an evaluation of the relatively recently released (2015) Cadence Stratus in comparison to Mentor Graphics Catapult.

Chapter

2

State of the Art

In this chapter, the history and current state of digital hardware development will be briefly summarised. Special attention will be given to the field of high-level synthesis.

2.1

Increasing abstraction levels

The very first digital hardware designs were created at the most basic level, by manually connecting transistors into circuits. We call this the gate level. During the 1970s and 1980s most digital hardware design was done at the logic level, using logic gates as building blocks. In other words, the abstraction level increased. This allowed for more efficient digital hardware development and thus bigger and more complex designs. The next increase in abstraction level was the move to register-transfer level design, again allowing more efficient development and bigger and more complex designs. To this day this is still the most commonly used design methodology, with hardware description languages such as Verilog. An illustrative overview of these abstraction levels can be seen in Figure 2.1. (Vahid 2010)

Figure 2.1: Abstraction levels of digital design (from lowest to highest: gate level, logic level, register-transfer level).

This development is driven by constantly improving tools allowing the designer to use more and more complex building blocks. This frees up time and allows the designer to focus more on increasing complexity. This evolution of tools and increase in abstraction level has played a big part in the very rapid progress in digital hardware that we have seen for several decades. (Vahid 2010) (Fingeroff 2010) (Joentakanen 2017)

This is the context in which we find high-level synthesis (HLS). It is the most likely candidate for the next big industry-wide increase of abstraction level in digital design. It follows the pattern of ever-increasing abstraction levels, and fits into the overview as shown in Figure 2.2. (Fingeroff 2010)

Figure 2.2: Abstraction levels of digital design, including HLS (from lowest to highest: gate level, logic level, register-transfer level, high level (HLS)).

Simplified, the idea of HLS can be described as writing hardware using high-level programming languages such as C/C++ in order to allow faster development. This can be seen as an analogy to the ever-increasing abstraction levels in software development: most people agree that an implementation in assembly would be faster and more efficient than one done in C, but the general consensus is that it would be impossibly time consuming to create a system of today's scale using assembly. Therefore C is used, or even C++, Java, Python, and so on. In this thesis, the C++ based hardware description language SystemC will be used.

2.2

Benefits and drawbacks of HLS

One major benefit of HLS is the simpler syntax, which allows for shorter development times. Combined with other benefits of HLS, such as faster simulation and rapid architectural exploration, this development methodology has great potential to reduce the time-to-market for hardware designs. The single biggest potential drawback of HLS is the possibility of larger and more power consuming designs when giving up some of the designer's control, just as in the software development analogy above.

While considering some of the potential benefits of HLS, it is fitting to briefly comment on Michael Fingeroff's book, which is frequently referred to in this thesis. High-level Synthesis Blue Book by Michael Fingeroff was published by Mentor Graphics, and while it is a good learning source for getting up to speed with HLS and Catapult, it may be biased towards depicting HLS and Catapult in the best possible light. The entire first chapter of the book, while only four pages long, is dedicated to describing how HLS is not only very efficient and produces good results, but is even necessary going forward. The author arguably has some good arguments, but it is good to keep in mind that there is currently far from consensus within the industry regarding this question.

Figure 2.3: Traditional flow (elements shown: RTL, gate-level synthesis, netlist).

Figure 2.4: Flow using HLS (elements shown: source/SystemC, constraints, tech library info, HLS tool, RTL, gate-level synthesis, netlist).

2.3

An Altered Workflow

The common workflow for development at the register-transfer level (RTL), using for example Verilog, is illustrated in Figure 2.3. HLS fits into the synthesis flow before RTL, as seen in Figure 2.4. The SystemC code is synthesised into RTL using an HLS tool, and then the RTL is synthesised as usual using a tool for logic synthesis. During development, simulations for functionality testing can be run fast using SystemC and debugging can be done rapidly. (Michael Keating 2011)(Fingeroff 2010)(Vahid 2010)

2.4

SystemC

SystemC was first and foremost developed as a hardware description language for use in simulation of hardware. Because of this intended use, some aspects of the language are not suitable for creating hardware and not all code is synthesisable. (Accellera 2016)


The SystemC standard is developed by the standards organisation Accellera and transferred to the Institute of Electrical and Electronics Engineers (IEEE), where it is published as IEEE 1666. (Mentor Graphics n.d.[b])

2.4.1

Bit accurate data types

When using C++ as the basis for hardware description, one problem that occurs is the matter of representing data. The built-in data types in C and C++ typically have a bit width defined by the system that the code is compiled for. This is naturally not ideal when describing hardware. Even if types with an explicitly set bit width are used, they would still not accurately represent hardware, nor be very practical. In order to solve this, bit accurate data types need to be used. SystemC includes some basic bit accurate types. Those types are, however, limiting in some cases, for instance in handling big integers and representing complex numbers. A closely related topic is the hardware implementation of operations on these data types. The HLS tool needs to make a highly optimised interpretation of the operations, for instance making clever use of shift and add logic for operations such as division and modulo. (Fingeroff 2010)
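To make the notion of a bit accurate type concrete, the following is a minimal plain C++ sketch of an unsigned integer that holds exactly W bits. The name UIntSketch is hypothetical and the class is not part of SystemC or the AC types; it only models the value obtained after a result has been stored back into a W-bit variable, and says nothing about how the real libraries size intermediate results.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only: an unsigned value that keeps exactly W bits.
template <unsigned W>
class UIntSketch {
    static_assert(W >= 1 && W <= 63, "sketch supports 1 to 63 bits");

public:
    // Every stored value is masked down to W bits.
    explicit UIntSketch(std::uint64_t v = 0) : v_(v & mask()) {}

    std::uint64_t value() const { return v_; }

    // The sum is wrapped to W bits, as it would be when written to a W-bit register.
    friend UIntSketch operator+(UIntSketch a, UIntSketch b) {
        return UIntSketch(a.v_ + b.v_);
    }

private:
    static constexpr std::uint64_t mask() {
        return (std::uint64_t{1} << W) - 1;
    }

    std::uint64_t v_;
};
```

With this sketch, adding 1 to a 4-bit value of 15 wraps around to 0, which a built-in int would not do.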

SystemC Data types (SC types)
When using fixed point numbers, HLS has a big advantage over more traditional hardware design methodologies. (ibid.) The SystemC set of data types (SC types) includes several data types and, in some cases, alternative implementations for representing the same type of data. This project will consider the following types:
• sc_int This is the basic implementation of a signed integer. The maximum bit width is 64 bits.
• sc_uint This is the basic implementation of an unsigned integer. The maximum bit width is 64 bits.
• sc_biguint This is an alternative implementation of an unsigned integer that supports bit widths over 64 bits.
• sc_fixed The basic fixed point implementation.
• sc_fixed_fast This is an alternative fixed point implementation with improved simulation speed. Supports bit widths up to 64 bits.

Mentor Algorithmic C (AC) types
In order to meet demand for a set of data types that can be synthesised into good RTL while being fast to simulate, Mentor Graphics developed their "Algorithmic C" (AC) types. AC types are made to work well in conjunction with Mentor Catapult, and are probably the most widely used bit accurate data types. While the AC types are freely available under the Apache License, it is unclear how well they are supported by HLS tools other than Mentor's own Catapult. Cadence, for instance, has their own data types that they recommend for use with Stratus, and their documentation does not mention any other types, except the built-in SystemC types. (Mentor Graphics n.d.[a])

The AC types that will be considered in this thesis are:
• ac_int This is an implementation of both signed and unsigned integers.
• ac_fixed This is a fixed point implementation.

As this thesis will investigate different approaches for quickly migrating code between different tools, the ability to write code with minimal consideration of how the individual tools might react is essential. It is possible that the AC types work just fine with Stratus. It might also turn out that while AC types work, Stratus' own types result in better RTL. Another case is that there is a clear compatibility issue resulting in errors or very bad RTL. In such a case, it might be solved by simple type defines in a .h-file, depending on tool. There is, however, a slight possibility that this approach of simply changing all types in a system might result in unexpected behaviour, and this needs to be investigated. Another approach is to implement new data types with tool independence in mind, which may prove difficult when considering the high requirements on optimisation and quality of the RTL. Making data types based on the AC types, or trying to use third party data types, are other options. (W. Ecker 2007) found that it is easy to convert to AC types from SC types, as there are mostly simple changes in the syntax. This thesis will, however, test doing this the other way around, from AC types to SC types. (Cadence n.d.[a]) (Cadence n.d.[b]) (Mentor Graphics n.d.[b])

Arguably the most important purpose of AC types is to increase SystemC simulation speed. (W. Ecker 2007) presented a comparison of simulation speed. A number of tips for increasing simulation speed were also put forward in (Tumbush 2007), and these will be used for the implementation done as part of this thesis. This thesis will include a renewed evaluation of simulation speed using modern computing power. One question to ask is: are AC types even relevant as a means to increase the speed of simulations today?
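The idea of tool-dependent type defines in a single header could be sketched roughly as follows. The macro names USE_AC_TYPES and USE_SC_TYPES and the type names coeff_t and word_t are assumptions made for illustration; the fallback branch uses built-in types only so that the sketch compiles without either library, and is not bit accurate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types header: the only place where concrete data types are chosen.
#if defined(USE_AC_TYPES)            // assumed build flag for a Catapult-oriented build
  #include <ac_int.h>
  typedef ac_int<16, true>  coeff_t; // 16-bit signed
  typedef ac_int<32, false> word_t;  // 32-bit unsigned
#elif defined(USE_SC_TYPES)          // assumed build flag for a Stratus-oriented build
  #include <systemc.h>
  typedef sc_int<16>  coeff_t;
  typedef sc_uint<32> word_t;
#else                                // stand-ins so the sketch builds without either library
  typedef std::int16_t  coeff_t;
  typedef std::uint32_t word_t;
#endif

// Code written against the aliases does not name any tool-specific type.
inline coeff_t add_coeffs(coeff_t a, coeff_t b) {
    return static_cast<coeff_t>(a + b);
}
```

As noted above, swapping types this way may still change behaviour, for example in how overflow is handled, so the converted code has to be re-verified.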

2.4.2

Constraints

The SystemC code cannot define the complete design of the system by itself. It can describe the behaviour, but the HLS tool will also need additional instructions for several other aspects of the design, for example how to interpret loops and variables. (Fingeroff 2010) Thus, additional files with instructions for the tool are an important part of the HLS flow. This is also an area where the different tools use different instructions and syntax. (Cadence n.d.[a]) (Cadence n.d.[b]) (Mentor Graphics n.d.[b])

2.5

HLS tools

In the following, the HLS tools Catapult by Mentor Graphics and Stratus by Cadence are described.

Catapult HLS was released by Mentor Graphics in 2002 (EE Times 2004). It originally used ANSI C/C++, and SystemC support was introduced later. A big selling point is the ability to synthesise AC types, which allows for improved simulation speed.

Stratus was released by Cadence in 2015. The tool has its roots in Forte Cynthesizer, released in the early 2000s and acquired by Cadence in 2014, and Cadence's own C-to-Silicon, released in 2008 (Moretti 2008) (Electronic Engineering Journal 2015). Stratus features a highly integrated IDE, including integration of the Cadence synthesis tool Genus, which can be scripted to run directly from the Stratus GUI. The integration with Genus should allow for better optimisations in the generated RTL, and is a big selling point for the tool.

Chapter

3

Methodology

This chapter describes the methodology and setup for the tests conducted. The chapter also brings up some key parts of the work conducted as part of this project, in order to provide a sense of the effort needed for adaption to a new tool. This also provides insights into what is important to keep in mind for achieving maximum tool independence during development.

3.1

FIR filter example block

The example block used for the results and comparisons is a simple finite impulse response (FIR) filter. The reader is assumed to be familiar with the basic concept, as shown in Figure 3.1. The details of the signal processing itself lie outside the scope of this thesis but can be found in e.g. (J. Proakis 2007). The example block is a FIR filter with 20 variable 16-bit coefficients. The block operates on complex numbers with a width of 16 bits for the imaginary part and 16 bits for the real part, for a total data width of 32 bits. Apart from the 320 bits (16*20) of coefficients and 32 bits of data with an associated valid signal, there are also inputs for bypassing the block, either completely using muxes, or by setting all coefficients to zero, effectively making the block operate as a shift register. The block has inputs for both hardware reset and software reset, as well as an input for the system clock. The data output is 16 + 16 bits complex with a valid signal. An overview of the signals can be seen in Figure 3.2.
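As a rough behavioural illustration of such a block, the following plain C++ sketch models a 20-tap FIR filter on complex samples with real 16-bit coefficients and a mux bypass. All names are hypothetical, and several details are assumptions rather than properties of the actual block: the accumulator is kept at 64 bits without any scaling back to 16 + 16 bits, and the valid, reset and coefficient-bypass signals are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTaps = 20;  // 320 coefficient bits / 16 bits per coefficient

struct ComplexSample { std::int16_t re; std::int16_t im; };  // 16 + 16 bit data
struct ComplexAcc    { std::int64_t re; std::int64_t im; };  // unscaled result (assumption)

class FirSketch {
public:
    // Processes one input sample and returns one output value.
    ComplexAcc step(ComplexSample x,
                    const std::array<std::int16_t, kTaps>& h,
                    bool bypass_mux) {
        // Shift the delay line by one position and insert the new sample.
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            delay_[i] = delay_[i - 1];
        }
        delay_[0] = x;

        // Mux bypass: forward the input unchanged.
        if (bypass_mux) {
            return ComplexAcc{x.re, x.im};
        }

        // Multiply-accumulate over all taps, separately for real and imaginary parts.
        ComplexAcc acc{0, 0};
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc.re += static_cast<std::int64_t>(h[i]) * delay_[i].re;
            acc.im += static_cast<std::int64_t>(h[i]) * delay_[i].im;
        }
        return acc;
    }

private:
    std::array<ComplexSample, kTaps> delay_{};  // zero-initialised delay line
};
```

Feeding a single non-zero sample followed by zeros returns the coefficients one by one, scaled by that sample, which is a convenient sanity check for any converted version of the filter.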

Figure 3.1: FIR filter (input x[n], delay elements z^-1, coefficients h[0] to h[N], output y[n]).

Figure 3.2: An overview of the signals of the FIR filter hardware block used for the tests. Inputs: Data (16 + 16 bits), Valid (1 bit), Coefficients (320 bits), Bypass using mux (1 bit), Bypass using coeffs (1 bit). Outputs: Data (16 + 16 bits), Valid (1 bit).

3.2

Catapult Workflow

The basic workflow used for development of the existing Catapult codebase uses a project organised with a core containing the actual algorithms for data processing. The core is instantiated in a wrapper, with a separate file defining the data types. During development, a C++/SystemC test bench is used. The test bench instantiates the wrapped device under test (DUT), generates test and reference data and runs the tests. The test bench may be split into several files, one for each of these tasks. The test bench is itself instantiated and run by a main program, as is common in C++ (S. Lippman and Moo 2015). For HLS the same wrapped DUT is used, and the test bench is omitted. This is scripted to generate RTL using Catapult, using additional files with constraints and libraries. The instructions for Catapult, such as unrolling of loops and how to interpret arrays, are written directly in the SystemC code of the core and wrapper. This means that these tool specific instructions are scattered all over the code. In order to use another HLS tool, all of these instructions need to be replaced with instructions for the other tool that correspond as closely as possible.

3.3

Conversion of Codebase

Behaviour-Describing Code
Figure 3.3 shows a basic example structure of the code for a SystemC hardware block. Apart from the layers shown, it is considered good practice to define all data types in a separate file. The main purpose of the wrapper is to instantiate the block, using the desired data types. The grey parts in Figure 3.3 represent the part of the code used for HLS and the rest of the synthesis flow. The white boxes outside are the designer test bench used during development. This is what is used to run the SystemC simulation. This part of the code has a common C++ structure with a small main program instantiating the simulation. For the block used in this thesis, the DUT core consists of yet another SystemC wrapper, handling things like bypass, around the actual algorithm implemented in C/C++.

Figure 3.3: Structure of the SystemC codebase (nested boxes, from outermost to innermost: main, test bench, wrapper, DUT core). The grey boxes represent the part of the code used for synthesis. The outer boxes are used for testing purposes.
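The layering in Figure 3.3 can be illustrated with a small plain C++ sketch, in which the core contains only the algorithm and the wrapper decides which data type it is instantiated with. The names sample_t, GainCore and DutWrapper are hypothetical, the doubling "algorithm" is a placeholder, and placing the bypass in the wrapper is a simplification of the structure described above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the separate data type file: the concrete type is chosen in one place.
typedef std::int16_t sample_t;

// DUT core: algorithm only, written against a type parameter.
template <typename T>
struct GainCore {
    T process(T x) const { return static_cast<T>(x * 2); }  // placeholder algorithm
};

// Wrapper: instantiates the core with the desired type and handles the bypass.
class DutWrapper {
public:
    sample_t run(sample_t x, bool bypass) const {
        return bypass ? x : core_.process(x);
    }

private:
    GainCore<sample_t> core_;
};
```

A test bench and main program would sit outside this wrapper and would not be part of what is handed to the HLS tool.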

All of this was originally implemented using AC data types. In order for the block to be synthesisable using Stratus it had to be converted to SC types. This is a quite tedious task, as all reads and writes to any bit accurate variable have to be rewritten and in many cases rethought. One major difference is the way bits are set when using the two kinds of types. To write the bits in y to x starting at the least significant bit lsb, using AC types, one writes:

    x.set_slc(lsb, y);

For SC types one would instead write:

    x.range(msb, lsb) = y;

which writes y to the bits in x between lsb and msb. This also requires the most significant bit msb, and thereby the width of y, to be known. This kind of inconsistency might result in quite substantial rethinking and rewriting, not least when the code is split into several functions in several files, and the required constants or variables are not known where they are now needed.

When using AC types, integers of large bit widths, in this case 320 bits, can be used without any major impact on simulation speed. When using SC types this is not the case. The basic SC integer type, sc_int, cannot handle bit widths of this size, and as mentioned in W. Ecker (2007), the SC type for large bit widths, sc_bigint, has a very big impact on simulation speed. To solve this, arrays of sc_int have to be used instead.

In order for the block to be synthesised using Stratus, another wrapper around the DUT core needs to be written. Stratus quite strongly encourages the user to follow a certain workflow. In practice this requires the wrapper and test bench to be written in a certain way, in order for the test bench to be usable at several stages in the development and synthesis process.
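The difference between the two ways of writing a slice can be modelled in plain C++ on a 64-bit word. The helper names write_slice and write_range are hypothetical and do not belong to either library; the sketch only shows that the slice-style call needs the start bit and the width of y, while the range-style call needs both end points, so that msb = lsb + width - 1 has to be available at the call site.

```cpp
#include <cassert>
#include <cstdint>

// Models the slice style: write `width` bits of y into x, starting at bit `lsb`.
// With AC types the width comes from the type of y; here it is passed explicitly.
inline std::uint64_t write_slice(std::uint64_t x, unsigned lsb, unsigned width,
                                 std::uint64_t y) {
    assert(width >= 1 && lsb + width <= 64);
    const std::uint64_t mask =
        (width == 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << width) - 1);
    return (x & ~(mask << lsb)) | ((y & mask) << lsb);
}

// Models the range style: both msb and lsb must be known where the write happens.
inline std::uint64_t write_range(std::uint64_t x, unsigned msb, unsigned lsb,
                                 std::uint64_t y) {
    assert(msb >= lsb && msb < 64);
    return write_slice(x, lsb, msb - lsb + 1, y);
}
```

For example, write_slice(0, 4, 8, 0xAB) and write_range(0, 11, 4, 0xAB) describe the same write, but only the second needs the upper bound to be spelled out.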

Instructions for the HLS Tool
The other substantial conversion needed is the replacement of all instructions for the HLS tool. As these instructions may vary quite substantially between tools in both names and usage, this is a big issue for achieving tool independence. To give instructions to Catapult,

    #pragma /* Instruction */

is used. Stratus instead relies on the use of macros inside an encasement of { }. In context this might look like:

Catapult:

    #pragma hls_pipeline_init_interval 1
    while (true) {
        /* Functionality */
        ...
    }

Stratus:

    while (true) {
        HLS_PIPELINE_LOOP(HARD_STALL, 1, "pipeline");
        /* Functionality */
        ...
    }
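One conceivable way to keep such instructions out of the behavioural code is to hide them behind project-level macros, sketched below. The macro names TOOL_CATAPULT, TOOL_STRATUS, LOOP_PIPELINE_BEFORE and LOOP_PIPELINE_INSIDE are hypothetical. Whether Catapult accepts the standard _Pragma operator in place of a #pragma line is an assumption that has not been verified here, and the Stratus branch assumes the tool's own header defining HLS_PIPELINE_LOOP is included elsewhere. With neither tool macro defined, both directives expand to nothing and the code is ordinary C++.

```cpp
#include <cassert>

#if defined(TOOL_CATAPULT)   // assumed build flag
  // Assumption: the tool honours _Pragma like a #pragma line placed before the loop.
  #define LOOP_PIPELINE_BEFORE _Pragma("hls_pipeline_init_interval 1")
  #define LOOP_PIPELINE_INSIDE
#elif defined(TOOL_STRATUS)  // assumed build flag
  #define LOOP_PIPELINE_BEFORE
  // Macro call as shown above; placed inside the loop body.
  #define LOOP_PIPELINE_INSIDE HLS_PIPELINE_LOOP(HARD_STALL, 1, "pipeline");
#else                        // plain C++ build, e.g. for simulation
  #define LOOP_PIPELINE_BEFORE
  #define LOOP_PIPELINE_INSIDE
#endif

// The loop carries both hooks because the two tools expect the directive in different places.
inline int accumulate_sketch(const int* data, int n) {
    assert(n >= 0);
    int sum = 0;
    LOOP_PIPELINE_BEFORE
    for (int i = 0; i < n; ++i) {
        LOOP_PIPELINE_INSIDE
        sum += data[i];
    }
    return sum;
}
```

The cost of this approach is that every loop needs two hooks, and that directives without a close counterpart in the other tool still have to be handled case by case.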

3.3.1

Data type simulation speed

Early tests hinted that some of the SC types might have a very big negative impact on simulation speed. In order to narrow down what might cause simulation speed issues, a series of tests were constructed. The tests measure the simulation speed of all relevant SC types in isolated tests of simple arithmetic operations. Tests were conducted on addition, multiplication and division of three constants.

For addition the operation was:

    r = a + b + c

For multiplication:

    r = a * b * c

For division:

    r = (a / b) / c

The constants were defined as follows.

For tests of fixed types:

    a = 0.321, b = 0.213, c = 0.123

For tests of integer types:

    a = 321, b = 213, c = 123

For tests of large integer types:

    a = 321321, b = 213213, c = 123123

The data type test benches were written in SystemC, with a shell script used for running and timing multiple simulations. For the converted code, sc_biguint was not used. Arrays of uint were used instead for a drastic increase in simulation speed.
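A test of this kind could be structured roughly as in the following plain C++ sketch, which times repeated evaluations of one of the three operations for an arbitrary type. It is not the test bench used in the thesis: the function names are hypothetical, the timing is done in-process with std::chrono instead of a shell script, and for built-in types an optimising compiler may remove the loop entirely, so a real measurement would need to guard against that.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// The three operations under test, written once for any arithmetic-like type T.
template <typename T> T add3(T a, T b, T c) { return a + b + c; }
template <typename T> T mul3(T a, T b, T c) { return a * b * c; }
template <typename T> T div3(T a, T b, T c) { return (a / b) / c; }

// Runs op(a, b, c) `runs` times and returns the elapsed wall-clock time in seconds.
// The last result is written to `result` so that it can be checked by the caller.
template <typename T, typename Op>
double time_runs(std::uint64_t runs, T a, T b, T c, Op op, T& result) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < runs; ++i) {
        result = op(a, b, c);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With the integer constants above, a call such as time_runs(runs, 321, 213, 123, add3<int>, r) leaves 657 in r; substituting a bit accurate type for int would give the corresponding measurement for that type.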

3.4

Simulation

3.4.1

Functionality test using ARM SoC Designer

SoCDesigner is an ARM tool for designing and simulating a SoC. The tool allows hardware blocks to be easily connected into systems using a graphical interface. For instance, an entire CPU can be simulated and run software with UART I/O. (M. Tovar 2017) SoCDesigner was used in conjunction with ARM Cycle Model Studio. Cycle Model Studio is another ARM tool that creates Cycle Models from RTL to be used for cycle accurate simulations in SoCDesigner. (ibid.)

Test setup
The input data used is a simple ramp, and the data generation is done using SoCDesigner's own scripting language. As data reference, the formally verified and taped-out original block is used. The data from the reference model and the devices under test (DUT), i.e. the converted versions, are compared using the SoCDesigner scripting language and the result is written to a file. The layout can be seen in Figure 3.4.

Figure 3.4: Arm SoC Designer test bench layout (blocks shown: data generator, coefficients, reference implementation, DUT, compare results).
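The principle of this test setup, a ramp fed to both a reference and a DUT with the outputs compared sample by sample, can be sketched in plain C++ as below. This is only an illustration of the comparison idea: the actual test used the SoCDesigner scripting language and cycle models generated from RTL, and the names make_ramp and count_mismatches are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generates a ramp of n samples: start, start + 1, start + 2, ...
inline std::vector<std::int32_t> make_ramp(std::size_t n, std::int32_t start = 0) {
    std::vector<std::int32_t> ramp(n);
    for (std::size_t i = 0; i < n; ++i) {
        ramp[i] = start + static_cast<std::int32_t>(i);
    }
    return ramp;
}

// Feeds every input sample to both models and counts the samples where they differ.
template <typename Ref, typename Dut>
std::size_t count_mismatches(const std::vector<std::int32_t>& input, Ref ref, Dut dut) {
    assert(!input.empty());  // an empty stimulus would pass trivially
    std::size_t mismatches = 0;
    for (std::int32_t x : input) {
        if (ref(x) != dut(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A mismatch count of zero over the whole ramp corresponds to the pass criterion used here, namely that the converted versions produce data identical to the reference.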

3.5

Synthesis

3.5.1

Synthesis using Cadence Genus

Even though both Catapult and Stratus calculate area and timing estimates, it is necessary to do logic synthesis of the generated RTL in order to get accurate values, especially when comparing two different tools that are likely to use different approaches for calculating said estimates. The RTL descriptions generated by Catapult and Stratus were synthesised using modified versions of the flow used for synthesis and delivery of the whole system. Even for a small hardware block, such as the one tested, this process is somewhat time consuming at around 45-90 minutes.

3.6 Environments

The simulation speed tests using SystemC and the SoCDesigner functional tests were conducted using the same desktop machine, with an Intel Core i7 Gen 7 and 16 GB RAM, running CentOS 7. All HLS and logic syntheses were conducted using the same cluster environment. For all syntheses, 40 nm technology libraries and a clock period of 20 ns were used.

Chapter 4 Results

4.1 AC types in Stratus

Initial tests of using AC data types in Stratus, done by running a very basic design, showed that this is not supported by Stratus at this time.

4.2 Verification of functionality using ARM SoC Designer

Execution of the SoCDesigner test bench described in Figure 3.4 showed that the output data from all converted versions of the FIR block is equal to that of the original implementation.

4.3 Data type simulation speed

This section presents the simulation speed results for the real-world example block, as well as for isolated arithmetic operations, using both 32 bits and 256 bits.

4.3.1 Real-world example simulation speed results

Table 4.1 shows SystemC simulation times for 100 runs of the real-world FIR filter test. As we can see, there is one order of magnitude difference in execution time between AC and SC types.

Table 4.1: SystemC simulation time using AC and different SC data types. AC is the original and taped-out implementation. All the other implementations are converted from the original.

Data types             Time for 100 runs (s)   Average simulation time (s)   Percentage of AC simulation time
AC                     42.7                    0.427                         100%
SC                     420                     4.20                          983%
SC (fixed_fast)        319                     3.19                          746%
SC (w/o sc_biguint)    179                     1.79                          418%
SC fast, w/o bigint    84.8                    0.848                         198%

4.3.2 32 bit simulation speed results

Table 4.2 shows the simulation time for 100 000 000 runs of different basic arithmetic operations using 32 bits, with the difference between comparable AC and SC types in Table 4.3. The results can also be viewed in Figures 4.1 to 4.4.

Table 4.2: Data type simulation speed for isolated 32 bit operations. Numbers are in seconds for 100 000 000 runs.

Data type            Addition   Multiplication   Division
sc_int               21.50      21.47            22.88
sc_uint              21.71      21.73            22.77
sc_fixed             50.05      51.91            261.26
sc_bigint            39.80      41.59            37.37
sc_fixed_fast        31.16      31.33            31.71
ac_int (unsigned)    26.57      34.85            25.43
ac_int (signed)      25.45      26.26            25.58
ac_fixed             24.31      27.23            42.15

Table 4.3: Data type simulation speed for isolated 32 bit operations. Difference between comparable data types.

Comparison                    Addition   Multiplication   Division
ac_int (signed)/sc_int        118%       122%             112%
ac_int (unsigned)/sc_uint     122%       160%             112%
ac_int (signed)/sc_bigint     63.9%      63.2%            68.4%
ac_fixed/sc_fixed_fast        78.0%      86.9%            133%
ac_fixed/sc_fixed             48.6%      52.4%            16.1%
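A measurement of the kind summarised in Tables 4.2 and 4.3 can be sketched as a small timing harness. This is not the setup used in the thesis, where a shell script timed complete simulations; the sketch instead times the loop inside the program using std::chrono, and the function name time_operation is hypothetical. The operand values 321 and 213 are the ones listed for the integer tests.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Applies op(acc, b) repeatedly, starting from acc = a, and returns the
// elapsed wall-clock time in seconds. The final accumulator value is
// written to sink so that the result of the loop stays observable.
template <typename T, typename Op>
double time_operation(T a, T b, std::uint64_t runs, Op op, T& sink) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    T acc = a;
    for (std::uint64_t i = 0; i < runs; ++i)
        acc = op(acc, b);
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double>(stop - start).count();
}
```

With T set to a SystemC or AC type and runs set to 100 000 000, the returned times for two types can be divided to obtain ratios of the kind shown in Table 4.3. For native types an optimising compiler may simplify such a loop, so the numbers from a harness like this should be treated with care.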

Figure 4.1: Data type simulation speed for isolated 32 bit additions. Numbers are in seconds for 100 000 000 runs.

Figure 4.2: Data type simulation speed for isolated 32 bit multiplications. Numbers are in seconds for 100 000 000 runs.

Figure 4.3: Data type simulation speed for isolated 32 bit divisions. Numbers are in seconds for 100 000 000 runs.

Figure 4.4: Data type simulation speed for isolated 32 bit divisions with data for sc_fixed omitted to increase readability. Numbers are in seconds for 100 000 000 runs.

4.3.3 256 bit simulation speed results

Table 4.4 shows the simulation time for 100 000 000 runs of different basic arithmetic operations using 256 bits, with the difference between comparable AC and SC types in Table 4.5. The results can also be viewed in Figures 4.5 to 4.7.

Table 4.4: Data type simulation speed for isolated 256 bit operations. Numbers are in seconds for 100 000 000 runs.

Data type          Addition   Multiplication   Division
ac_int (signed)    33.54      156.81           53.14
sc_bigint          45.88      48.87            40.73
sc_fixed           48.28      59.15            247.44
ac_fixed           33.28      162.53           327.99

Table 4.5: Data type simulation speed for isolated 256 bit operations. Difference between comparable data types.

Comparison           Addition   Multiplication   Division   Average
ac_fixed/sc_fixed    68.9%      275%             133%       159%
ac_int/sc_bigint     73.1%      321%             130%       175%

Figure 4.5: Data type simulation speed for isolated 256 bit additions. Numbers are in seconds for 100 000 000 runs.

Figure 4.6: Data type simulation speed for isolated 256 bit multiplication. Numbers are in seconds for 100 000 000 runs.

Figure 4.7: Data type simulation speed for isolated 256 bit divisions. Numbers are in seconds for 100 000 000 runs.

4.4 High-Level Synthesis using Cadence Stratus and Mentor Graphics Catapult

Results on run time and area approximation from HLS done using Stratus and Catapult with corresponding setups can be seen in Tables 4.6 and 4.7. The area approximations can also be viewed alongside the area results from synthesis in Figure 4.8. The area approximations differ by around 31 % between Stratus and Catapult. The big difference is likely due to Stratus not including the net area in the estimate, but only the cell area. Table 4.6 shows the run time for high-level synthesis of the example block. Both tools take roughly the same amount of time, with Stratus being slightly faster on average, but with a high standard deviation.

Table 4.6: HLS synthesis time in seconds. Note: as Stratus does not output decimals for these results, the Catapult measurements are also rounded to the closest integer. However, the calculation of standard deviation and average was done using higher precision for the Catapult run times.

Run no            AC, Catapult   SC, Stratus (no bigint, fixed_fast)   Stratus/Catapult
1                 26             24                                    -
2                 27             20                                    -
3                 27             25                                    -
4                 26             24                                    -
5                 27             26                                    -
Std. deviation    0.43           2.28                                  529%
Average           26.4           23.8                                  90.2%
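The average and standard deviation in Table 4.6 can be reproduced for the Stratus column with a short calculation. The sketch below assumes that the sample standard deviation (division by n - 1) was used; this is an assumption, although it does reproduce the reported 2.28. The Catapult column cannot be reproduced in the same way, since the higher-precision run times are not listed. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of a set of run times.
double mean_of(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / static_cast<double>(values.size());
}

// Sample standard deviation (divides by n - 1); assumed, not confirmed,
// to be the definition used for Table 4.6.
double sample_stddev(const std::vector<double>& values) {
    assert(values.size() > 1);
    const double m = mean_of(values);
    double squares = 0.0;
    for (double v : values) squares += (v - m) * (v - m);
    return std::sqrt(squares / static_cast<double>(values.size() - 1));
}
```

For the Stratus run times 24, 20, 25, 24 and 26 s, this gives a mean of 23.8 s and a standard deviation of about 2.28 s.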

Table 4.7: Approximations of area from Catapult using AC types and SC types and Stratus using SC types.

Tool        Data types                  Total area
Catapult    AC                          18756.60
Catapult    SC                          18725.18
Catapult    SC (using sc_fixed_fast)    18725.18
Stratus     SC (using sc_fixed_fast)    12883

4.5 Logic Synthesis Using Cadence Genus

The results from logic synthesis using Cadence Genus can be seen in Table 4.8 and Figures 4.8 and 4.9. Table 4.9 shows the result of synthesis of the original AC Catapult implementation with power saving features disabled. Multiple synthesis runs with Genus for the different implementations provided the same results on area, data path time, cells and power usage, thus showing no signs of randomness in the synthesis. Additional synthesis runs using Stratus with all power saving features explicitly turned on, and explicitly turned off, were also conducted. These tests gave the same results as the Stratus results in Table 4.8.

Table 4.8: Genus synthesis results using default HLS settings, with % of results from Catapult using AC types.

Implementation                      Catapult using AC types   Catapult using SC types   Stratus using SC types
Total area (µm²)                    17085                     16925                     17041
Total area (% of AC)                100 %                     99.1 %                    99.7 %
Cell area (µm²)                     12094                     11979                     12082
Net area (µm²)                      4990                      4947                      4958
Data path time (ps)                 19059                     19053                     18907
Data path time (% of AC)            100 %                     100 %                     99.2 %
Cells                               7368                      7307                      7303
Cells (% of AC)                     100 %                     99.2 %                    99.1 %
Total Power (µW)                    1245.876                  1159.776                  1451.367
Total Power (% of AC)               100 %                     93.1 %                    116.5 %
Leakage Power (µW)                  1.032                     0.595                     1.838
Leakage Power (% of Total Power)    0.08 %                    0.05 %                    0.13 %
Leakage Power (% of AC)             100 %                     57.7 %                    178.1 %

Figure 4.8: Genus synthesis area results. (Bar chart of total area in µm² for Catapult using AC types, Catapult using SC types and Stratus using SC types; series: Area from Genus, HLS tool approximation.)

Table 4.9: Genus synthesis results with HLS power saving features inactivated, with % of results from Catapult with power saving activated (default).

Implementation                       Catapult using AC types
Total area (µm²)                     16897
Total area (% of original AC)        98.9 %
Cell area (µm²)                      12057
Net area (µm²)                       4840
Data path time (ps)                  19057
Data path time (% of original AC)    100 %
Cells                                7220
Cells (% of AC)                      98 %
Total Power (µW)                     1265.055
Total Power (% of original AC)       101.5 %
Leakage Power (µW)                   0.835
Leakage Power (% of Total Power)     0.07 %
Leakage Power (% of original AC)     81 %

Figure 4.9: Genus synthesis power results. (Bar chart of total power in µW for Catapult using AC types, Catapult using SC types and Stratus using SC types.)

Table 4.10: Genus synthesis HVT, SVT and LVT cell choices using default HLS settings. Values are percentage of cells.

Implementation                Catapult using AC types   Catapult using SC types   Stratus using SC types
High Voltage Threshold        95.6 %                    96.6 %                    80.3 %
Standard Voltage Threshold    3.7 %                     2.8 %                     15.3 %
Low Voltage Threshold         0.7 %                     0.4 %                     4.5 %

Table 4.11: Components placed during HLS synthesis.

Implementation      Catapult using AC types   Catapult using SC types   Stratus using SC types
Multiplicators      4                         4                         4
Adders              2                         4                         4
CSA Tree Adders     2                         -                         -
Increment Adders    -                         4                         4

Chapter 5 Discussion

5.1 Simulation speed

The real-world implementation using AC types showed very high simulation speed compared to the basic SC type implementation, as seen in Table 4.1. This is a much larger difference than expected from (W. Ecker 2007). The improved SC type implementation, without the use of sc_biguint and with use of sc_fixed_fast, showed expected performance at about half the speed of the AC type implementation. Interestingly, contrary to this, the SC types showed overall higher speed and more predictable behaviour in the synthetic isolated test cases. One exception to this is sc_biguint, which is very slow. W. Ecker (ibid.) chose to not even include sc_biguint in their comparison because of "very bad performance" (ibid.). This reasoning is confirmed by the results presented here. As the tests conducted only cover basic arithmetic operations and, for instance, no bitwise operations, shifts, or operations requiring type conversions, the explanation for this result may be any of a number of possibilities. However, the purpose of the simulation speed tests was to track down the reason for the very bad simulation speed, which was accomplished: it was shown to be caused by the use of sc_biguint. Moreover, based on these tests, sc_fixed_fast was also used in the real application and shown to significantly increase simulation speed. This can be seen in Table 4.1, where the implementation using sc_fixed_fast runs in less than half the time (0.848 s) compared to the implementation using sc_fixed (1.79 s).

5.2 Bit accurate data types in Catapult and Stratus

One of the big original questions, whether it is possible to use AC types in Stratus, turned out to have a clear answer during initial testing. As it turns out, AC types are themselves not actually synthesisable using the SystemC standard alone, but are required to be recognised and swapped out by the HLS tool during synthesis. This is of course done by Catapult, as both the tool and the data types are supplied by the same vendor and used as a major selling point. Stratus, however, does not do this. This in effect limits the user to SC data types, as SC types work just fine in Catapult. This leads to the conclusion that for achieving tool independence, SC types should be used.

Solutions using user-defined data types that instantiate either AC or SC types depending on flags set during compilation were also considered. That would allow for high speed simulations using AC types at the same time as allowing synthesis using any of the tools. The problem with a solution like that is the high risk of introducing unexpected bugs or behaviour when changing data types. E.g. if the system is developed using AC types for increased simulation speed, and the types are then changed into SC types for synthesis, there is a large risk of introducing bugs and unexpected behaviour, which at that stage of development might prove difficult and time consuming to track down and solve. Of course, this can be avoided to a great extent by regularly running simulations using SC types as well. This would however largely defeat the purpose of using the AC types, as the increased simulation speed cannot be fully utilised. It would potentially add quite a lot of effort to the implementation and to the overall complexity of the work flow, and only give a moderate increase in simulation speed. However, if the system to be implemented and simulated is very big and takes many minutes or hours to simulate on a reasonably high-performing machine, the approach might prove viable, especially if it is used in a context requiring a large number of automated tests running on servers with limited capacity.
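The considered approach of a user-defined type that maps to either AC or SC types depending on compilation flags could look roughly like the sketch below. The flag names USE_AC_TYPES and USE_SC_TYPES and the alias hls_uint are hypothetical. The AC and SC branches assume the ac_int.h and systemc headers are available; when neither flag is set, a plain C++ fallback class is used so that the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

#if defined(USE_AC_TYPES)
  #include <ac_int.h>   // Algorithmic C integer types
  template <int W> using hls_uint = ac_int<W, false>;
#elif defined(USE_SC_TYPES)
  #include <systemc>    // SystemC data types
  template <int W> using hls_uint = sc_dt::sc_uint<W>;
#else
  // Plain C++ fallback (illustration only): an unsigned value that is
  // truncated to W bits on construction.
  template <int W>
  class hls_uint {
      static_assert(W >= 1 && W <= 32, "fallback supports 1 to 32 bits");
      std::uint64_t value_;
      static std::uint64_t mask() { return (std::uint64_t{1} << W) - 1; }
  public:
      hls_uint(std::uint64_t v = 0) : value_(v & mask()) {}
      operator std::uint64_t() const { return value_; }
  };
#endif

// Behavioural code refers only to hls_uint, never to a vendor type.
// The 8 bit result wraps around on overflow.
hls_uint<8> add8(hls_uint<8> a, hls_uint<8> b) {
    return hls_uint<8>(a + b);
}
```

The risk discussed above remains with such a construction: the three type families are not guaranteed to behave identically in every operation, so simulations would still need to be run regularly with the types that are used for synthesis.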

5.3 Components placed by the HLS tools

As seen in Table 4.11, the most prominent difference between the implementations is the use of carry save adders and no increment adders in the Catapult AC type implementation. Interestingly, the Catapult and Stratus implementations using SC are seemingly of very similar design. This indicates a quite large difference in the design depending on which data types are used. For larger designs, it might be possible that these differences make a significant impact on design properties, maybe even more so than the HLS tool used.

5.4 Genus Synthesis Results

The gate-level synthesis using Genus shows that all of the implementations meet the timing requirements, with data path times less than the required one clock cycle, or 20000 ps. The Stratus implementation is slightly faster than the two Catapult implementations, but since the difference is small (0.8 %) and all implementations will complete the required operations within the same clock cycle, this result is of limited importance. This can be seen in Table 4.8. It is also seen in Figure 4.8 that the differences in area between the different implementations are very small.

The arguably most interesting result is the quite large difference in total power used by the different implementations. The power consumption of the Stratus implementation is 16.5 % higher than that of the original Catapult/AC type implementation. The power consumption of the Catapult implementation using SC types is 6.9 % lower. The leakage power also varies a lot between the implementations. The Stratus leakage power is almost twice that of the Catapult AC implementation, and three times that of the Catapult SC implementation. Note that all the power usage data are approximations done by Genus, and might differ substantially from the actual power usage. A power simulation using toggle information would provide more accurate results, but lies outside the scope of this thesis. It could, however, be a subject for further research.

As the area and cell count are roughly the same for all implementations, the large difference in power might be down to Genus being forced to make a different selection of cells from the library in order to meet the given constraints, likely the timing constraint. Table 4.10 contains a summary of the types of cells chosen during synthesis for the different implementations. There is a major difference in voltage threshold between the cells chosen for the Catapult implementations and those chosen for the Stratus implementation. The lower average voltage threshold in the Stratus implementation likely results in a higher power usage. As all implementations used the same constraints for the logic synthesis, and even the same complete flow, this must be down to the HLS done by Stratus.

One possible reason is that Catapult has bigger margins in the design by default, allowing Genus to meet the timing without problems and thus generating good hardware. If Stratus has smaller margins in the design, this could lead to Genus being forced to place faster but more power-consuming cells in order to meet the timing requirements. To test this, the Stratus HLS was rerun using a clock period of 15 ns, thereby over-constraining the design and forcing Stratus to make a faster implementation. This should give Genus more room to meet the timing requirements without having to resort to more power-hungry cells. However, this test generated the same result, thereby reinforcing the previous findings on power usage differences.

Another possibility is that Catapult uses more power saving features, such as clock gating, and that those features make a major difference for this design. Which features are implemented in the design can be set in both tools, but the default settings vary between the tools, and between different license levels for the tools. The top license level of Catapult uses power features by default that might not be used by Stratus by default. This could explain, or partly explain, the observed differences in power usage. In order to test this, a series of extra synthesis runs were conducted. The results of HLS using Catapult without any advanced power optimisation features activated can be seen in Table 4.9. The results indicate that the extra power optimisation features used by Catapult do not explain the large difference in power usage between the implementations.

It might very well be possible to arrive at a Stratus implementation with power usage on a level comparable to Catapult by continuing to tweak the synthesis flow. For instance, Genus could be constrained to only use cells with high voltage threshold, and the pipelining done by Stratus could perhaps be tweaked. In any case, all of this does not make a very good case for Stratus in terms of user friendliness and speed of development.

To summarise the result of the logic synthesis, the combination of SC data types and use of Mentor Graphics Catapult generates the best quality of result. The power usage in particular sets it apart from Stratus. Using the exact same behavioural code base and synthesis flow, Stratus's implementation requires as much as 25.1 % more power than Catapult's implementation.

5.5 Area approximations

From the results of the logic synthesis, conclusions regarding the area estimations of the HLS tools can be drawn. The most apparent conclusion is that Stratus quite severely underestimated the area. Stratus estimated 24 % less (12883 µm² compared to 17041 µm²) than the actual area used after synthesis, as seen in Figure 4.8. Catapult, on the other hand, made better estimations: 9.8 % more (18757 µm² compared to 17085 µm²) for the AC type implementation and 11 % more (18725 µm² compared to 16925 µm²) for the SC type implementation. This result is somewhat unexpected, as both Stratus and Genus are supplied by the same vendor and one would expect interchange of technology between the tools. However, as mentioned earlier, this result is likely due to Stratus only estimating the cell area and not including the net area. By an analogous argument, it is interesting that Catapult provides better results across the board using SC types than using Mentor Graphics' own AC types.

As the test is conducted on only one block, no definite decisions on which tool is superior should be made based on these results, as this might be an unusual case. However, the results unveil a potentially interesting subject for further research and investigation.

The main shortcoming of this project is the use of only one, small block. This was deemed to be a necessary trade-off in order to make time to get real, useful results from logic synthesis. As it turned out to be time consuming to make the data type conversion of existing blocks, the inclusion of more blocks would not fit within the time scope of a thesis at this level.
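The percentages quoted for the area estimates follow from a simple relative error between the HLS estimate and the area reported by Genus. The sketch below shows the calculation with the values from Tables 4.7 and 4.8; the function name estimate_error_percent is hypothetical.

```cpp
#include <cassert>

// Signed relative error of an HLS area estimate against the area
// reported after logic synthesis, in percent. A negative value means
// that the HLS tool underestimated the area.
double estimate_error_percent(double estimated_area, double synthesised_area) {
    assert(synthesised_area > 0.0);
    return (estimated_area - synthesised_area) / synthesised_area * 100.0;
}
```

With the tabulated values, estimate_error_percent(12883, 17041) is about -24.4 for Stratus, while estimate_error_percent(18756.60, 17085) and estimate_error_percent(18725.18, 16925) are about 9.8 and 10.6 for the two Catapult implementations.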

Chapter 6 Conclusion

6.1 Return to the Research Questions

The effort needed to adapt an existing system written for Mentor Graphics Catapult for synthesis in Cadence Stratus turned out to be quite substantial. Even a small block such as the one covered here has a large amount of code around the actual core, leading to many required changes. For instance, almost every read or write from or to any variable has to be rewritten, tested and potentially debugged. Guidelines for avoiding getting locked into a tool and becoming dependent on a specific tool vendor are put forward in the next section. Based on the results presented, Mentor Graphics Catapult is the superior tool, both in regard to the quality of the generated RTL and the higher simulation speed during development allowed by the AC data types. Granted, the Stratus work flow might be preferred by some developers and have advantages regarding development time in some contexts. Stratus has, for instance, built-in functionality for simplified verification of the hardware design. However, if the organisation's work flow includes a dedicated verification team, as is common in large projects, this feature may be of limited use.

6.2 Guidelines and Recommendations

In order to achieve any level of tool independence during HLS development, the main area of concern is the data types used. For these guidelines, two cases will be considered: creating a code base that can be converted for use with another tool with minimum effort, and using a common script with the possibility of running either tool without any modifications.

Data types; Use SC types and keep in mind the limitations of certain types: avoid using sc_bigint, as it has very poor simulation speed. If large bit widths are required, use arrays of sc_int instead. sc_fixed_fast has a limit at 64 bits. If simulation speed is of great importance, for instance for very large systems, dummy types that instantiate either SC or AC types depending on application can be considered. In this case the implementation will be very similar to an SC type implementation, as SC types generally should be easier to change into AC types than the other way around.

Wrappers; Wrappers, in the sense of an interface towards the tools, are relatively easy to rewrite when needed, but if a script with a choice of either HLS tool is required, a wrapper written with Stratus in mind will again be more compatible with Catapult than the other way around. However, this argument is only valid for the specific case of Catapult and Stratus. Based on the research conducted in this thesis, no conclusions regarding other HLS tools on the market can be made.

Tool instructions; Tool instructions should be written in separate files, and not in the behavioural code. This is possible for both Catapult and Stratus, and most commands have equivalents in both tools. There might be possibilities for further optimisations of the design when certain tool specific commands are used, a possibility which will unfortunately be lost when aiming for maximum tool independence. For the case with a common script for multiple tools, definitions of new common instructions that will be interpreted as different instructions for respective tools will be needed. This can e.g. be done with a construction of ifdef and compilation flags.
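A construction of ifdef and compilation flags for common instructions could be structured as in the sketch below. The flag names HLS_TOOL_CATAPULT and HLS_TOOL_STRATUS and the macro COMMON_UNROLL_LOOP are hypothetical, and the tool-specific expansions are deliberately left as placeholders, since the exact directive syntax has to be taken from the reference manual of each tool. Without any flag the macro expands to nothing, so the code also compiles for plain simulation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One common instruction, mapped per tool by a compilation flag.
#if defined(HLS_TOOL_CATAPULT)
  #define COMMON_UNROLL_LOOP(label) /* placeholder: Catapult unroll directive */
  #define HLS_TOOL_NAME "catapult"
#elif defined(HLS_TOOL_STRATUS)
  #define COMMON_UNROLL_LOOP(label) /* placeholder: Stratus unroll directive */
  #define HLS_TOOL_NAME "stratus"
#else
  #define COMMON_UNROLL_LOOP(label) /* no tool selected: expands to nothing */
  #define HLS_TOOL_NAME "none"
#endif

// Behavioural code uses only the common instruction name.
int sum_taps(const int* taps, std::size_t count) {
    assert(taps != nullptr);
    int acc = 0;
    for (std::size_t i = 0; i < count; ++i) {
        COMMON_UNROLL_LOOP("sum_taps_loop");
        acc += taps[i];
    }
    return acc;
}

// Reports which mapping was selected at compile time.
const char* active_tool() { return HLS_TOOL_NAME; }
```

One design issue that such a mapping has to handle is that tools may expect a directive at different positions relative to the loop, in which case more than one common macro per instruction may be needed.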

HLS synthesis scripts; Project setup and HLS synthesis scripts will need to be set up individually for each tool. However, with knowledge of the tools this is a relatively small task.

6.3 Future work

The shortcomings of this thesis mentioned in the previous chapter lead neatly into suggestions for possible further work related to this subject. Future studies could include more blocks, and especially bigger blocks. This could give more room for the tools, both the HLS tools and the logic synthesis tool, to make optimisations. This could in turn reveal more significant differences in measurements such as area than were shown in this thesis.

Bibliography

Accellera (2016). SystemC Synthesizable Subset Version 1.4.7. URL: https://accellera.org/images/downloads/standards/systemc/SystemC_Synthesis_Subset_1_4_7.pdf.

Cadence (n.d.[a]). Stratus Reference Manual.

Cadence (n.d.[b]). Stratus User's Guide.

EE Times (2004). High-level synthesis rollouts enable ESL. Ed. by eetimes.com. [Online; posted 31-May-2004]. URL: https://www.eetimes.com/document.asp?doc_id=1217741.

Electronic Engineering Journal (2015). Cadence Announces Stratus High-Level Synthesis Platform. Ed. by eejournal.com. [Online; posted 24-February-2015]. URL: https://www.eejournal.com/industry_news/20150224-03/.

Fingeroff, M. (2010). High-level Synthesis Blue Book. Bloomington: Xlibris Corporation.

J. Proakis, D. Manolakis (2007). Digital signal processing. Upper Saddle River, NJ: Prentice-Hall.

Joentakanen, T. (2017). Evaluation of HLS modules for ASIC backend. URL: http://URN.fi/URN:NBN:fi:tty-201705101358.

M. Tovar, P. Elfborg (2017). "Virtual Cycle-accurate Hardware and Software Co-simulation Platform for Cellular IoT". MA thesis. Department of Electrical and Information Technology, Lund University.

Mentor Graphics (n.d.[a]). Algorithmic C (AC) Datatypes.

Mentor Graphics (n.d.[b]). Catapult HLS Reference Manual. URL: https://accellera.org/downloads/standards/systemc.

Michael Keating, Synopsys Fellow (2011). The Simple Art of SoC Design. New York, NY: Springer.

Moretti, Gabe (2008). Cadence introduces C-to-Silicon Compiler. Ed. by eetimes.com. [Online; posted 14-July-2008]. URL: https://www.eetimes.com/document.asp?doc_id=1309567.

S. Lippman, J. Lajoie and B. Moo (2015). C++ Primer. Upper Saddle River, NJ: Addison-Wesley.

Tumbush, G. (2007). Dramatically Increase the Performance of SystemC Simulations. URL: http://www.tumbush.com/published_papers/AMI%5C%20Tumbush%5C%20DVCon%5C%2007.pdf.

Vahid, F. (2010). Digital Design with RTL Design, Verilog and VHDL. John Wiley and Sons.

W. Ecker, L. Schönberg (2007). Impact of SystemC data types on execution speed. 15th European SystemC Users Group Meeting April 2007. URL: https://www.embedded.uni-tuebingen.de/uploads/media/Presentation-15-UP2_ecker.pdf, also available at https://www.researchgate.net/publication/265941432_Impact_of_SystemC_data_types_on_execution_speed.