Introduction to Embedded Systems Using the MSPM0+ [1 ed.] 9798852536594



English Pages 391 Year 2023


INTRODUCTION TO EMBEDDED SYSTEMS USING THE MSPM0+

First Edition, Third Printing December 2023

Jonathan W. Valvano

The true engineering experience occurs not with your eyes and ears, but rather with your fingers and elbows. In other words, engineering education does not happen by listening in class or reading a book; rather it happens by designing under the watchful eyes of a patient mentor. So, go build something today, then show it to someone you respect!

ARM and µVision are registered trademarks of ARM Limited. Cortex and Keil are trademarks of ARM Limited. Code Composer Studio is a trademark of Texas Instruments. All other product or service names mentioned herein are the trademarks of their respective owners.

In order to reduce costs, this college textbook has been self-published. This book is configured specifically for ECE319K, taught at the University of Texas at Austin. For more information about my classes, my research, and my books, see http://users.ece.utexas.edu/~valvano/. For corrections and comments, please contact me at: [email protected] .edu. Please cite this book as:

J. W. Valvano, Introduction to Embedded Systems Using the MSPM0+, http://users.ece.utexas.edu/~valvano/, ISBN: 9798852536594.

Copyright © 2024 Jonathan W. Valvano. All rights reserved. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, web distribution, information networks, or information storage and retrieval, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

Table of Contents

Preface
Acknowledgements
1. Introduction to Embedded Systems
  1.1. Embedded Systems
  1.2. Binary Information Implemented with MOS transistors
  1.3. Numbers
  1.4. ARM Cortex-M0+
    1.4.1. Registers
    1.4.2. Reset
    1.4.3. Memory
  1.5. Assembly Language
    1.5.1. Syntax
    1.5.2. Addressing Modes
    1.5.3. Data Access Instructions
    1.5.4. Logical Operations
    1.5.5. Shift Operations
    1.5.6. Addition and Subtraction
    1.5.7. Stack
    1.5.8. CCS Assembler Directives
    1.5.9. Functions and Control Flow
    1.5.10. Multiplication and Division
    1.5.11. Arm Architecture Procedure Call Standard (AAPCS)
  1.6. Introduction to Data Structures
  1.7. The Software Development Process
    1.7.1. Integrated Development Environment
    1.7.2. Flowcharts and Structured Programming
    1.7.3. Requirements Document
  1.8. Time Delay
2. Introduction to Interfacing
  2.1. Review of Electronics
  2.2. General Purpose Input and Output
    2.2.1. Input/Output Ports
    2.2.2. MSPM0G3507 Input/Output Pins
    2.2.3. MSPM0G3507 GPIO Programming
  2.3. Interfacing
    2.3.1. Switch Inputs
    2.3.2. LED Outputs
    2.3.3. Stepper Motor Controller
    2.3.4. Solid State Relay
    2.3.5. Introduction to Design
  2.4. Pulse width modulation
  2.5. Macro
  2.6. Hardware Timers
    2.6.1. SysTick Timer
    2.6.2. TimerG8
  2.7. Debugging Techniques
    2.7.1. Heartbeat or Activity Indicator using an LED
    2.7.2. Debugging Hardware
    2.7.3. Measuring Current with a DVM
3. Software Design and Testing
  3.1. Software Design Process
  3.2. Successive Refinement
  3.3. Programming in C
    3.3.1. Overview
    3.3.2. Organization of C software
    3.3.3. Tokens, Punctuation, and Keywords
    3.3.4. Variables and Constants
    3.3.5. Operations and Precedence
    3.3.6. Conditional Branch Instructions
    3.3.7. Loops
    3.3.8. Functions
    3.3.9. Pointers
    3.3.10. Call by value vs Call by reference
    3.3.11. Strings
    3.3.12. Standard I/O Driver and the printf Function
  3.4. Modular Design using Abstraction
    3.4.1. Definition and Goals
    3.4.2. Functions, Procedures, Methods, and Subroutines
    3.4.3. Dividing a Software Task into Modules
    3.4.4. How to Draw a Call Graph
    3.4.5. How to Draw a Data Flow Graph
    3.4.6. Top-down versus Bottom-up Design
  3.5. Quality Software
    3.5.1. Attitude
    3.5.2. Style Guidelines
    3.5.3. Comments
    3.5.4. Quantitative Performance Measurements
    3.5.5. Qualitative Performance Measurements
  3.6. Functional Debugging
    3.6.1. Debugging Concepts
    3.6.2. Instrumentation: Dump into Array without Filtering
    3.6.3. Instrumentation: Dump into Array with Filtering
  3.7. Performance Debugging
    3.7.1. Cycle Counting
    3.7.2. Using a Hardware Timer to Measure Elapsed Time
    3.7.3. Using a Logic Analyzer to Measure Elapsed Time
4. Finite State Machines
  4.1. Structures
  4.2. Finite State Machines with Linked Structures
    4.2.1. Abstraction
    4.2.2. Moore Finite State Machines
  4.3. Debugging
5. Real-time Systems
  5.1. Hardware/Software Synchronization
  5.2. Interrupt-Triggered Multithreading
  5.3. NVIC on the ARM Cortex-M Processor
  5.4. SysTick Periodic Interrupts
  5.5. Approximating continuous signals in digital domain
  5.6. Digital to Analog Conversion
  5.7. Music Generation
  5.8. Internal 12-bit DAC
  5.9. Real-time Debugging Tools
6. Variables, Conversions, and LCD Output
  6.1. Local, Static, and Global Variables
  6.2. Stack rules
  6.3. Local variables allocated on the stack
  6.4. Managing Overflow
  6.5. Fixed-point Numbers
  6.6. Conversions
  6.7. Recursion
  6.8. Serial Peripheral Interface, SPI
  6.9. ST7735R Graphics LCD Interface
7. Data Acquisition Systems
  7.1. Interthread Communication and Synchronization
  7.2. Analog to Digital Conversion
  7.3. Signal to Noise Ratio
  7.4. Timer Periodic Interrupts
  7.5. Performance measurements
8. Communication Systems
  8.1. Introduction to Networks
  8.2. Reentrant Programming and Critical Sections
  8.3. Producer-Consumer using a FIFO Queue
    8.3.1. Basic Principles of the FIFO Queue
    8.3.2. FIFO Queue Analysis
    8.3.3. FIFO Queue Implementation
  8.4. Universal Asynchronous Receiver Transmitter (UART)
    8.4.1. Asynchronous Communication
    8.4.2. MSPM0 UART Details
    8.4.3. Busy-wait UART Device Driver
    8.4.4. Interrupt-driven UART Device Driver
  8.5. Profiling
    8.5.1. Profiling using a software dump to study execution pattern
    8.5.2. Profiling using an Output Port
    8.5.3. Thread Profile
9. Embedded System Design
  9.1. Graphics
  9.2. Sprites
  9.3. Edge-triggered Interrupts
  9.4. Playing Sound Files
  9.5. Modular Design Example
  9.6. Best Practices
Appendix 1. Glossary
Appendix 2. Solutions to Checkpoints
Appendix 3. Assembly Reference
Appendix 4. LC3 to ARM Cortex-M Conversion
Appendix 5. Digital Logic
Index

Preface

An embedded system is a system that performs a specific task and has a computer embedded inside. A system is comprised of components and interfaces connected for a common purpose. This book is an introduction to embedded systems. Specific topics include the MSPM0+ microcontroller, finite-state machines, debugging, fixed-point numbers, the design of software in assembly language and C, elementary data structures, programming input/output, interrupts, measurements with analog to digital conversion, graphics, sound production with digital to analog conversion, introduction to networks using serial communication, and real-time systems. There is a web site accompanying this book: http://users.ece.utexas.edu/~valvano/mspm0. Posted here are projects for each of the example programs in the book.

Acknowledgements

I owe a wonderful debt of gratitude to Daniel Valvano. He wrote and tested most of the software examples found in these books. A special thanks to Texas Instruments (Doug Phillips, Cathy Wicks, Ayesha Mayhugh and Mark Easley) for all the support, emotional and monetary. The educational content presented in this book is the result of the combined efforts of the entire teaching staff of ECE319K: Drs. Ramesh Yerraballi, Mattan Erez, Andreas Gerstlauer, Mohit Tiwari, Vijay Janapa Reddi, Nina Telang, William Bard, Al Cuevas, Lucas Holt, Vivek Telang, Derek Chiou, and me. This team has created an educationally rich lab course that is both engaging and achievable for the freshman engineer. Each time we teach ECE319K, we create a capstone design experience centered on a class competition described in Chapter 9. You can see descriptions and photos of our class design competitions at http://users.ece.utexas.edu/~valvano/. Ramesh Yerraballi and I have created three MOOCs, which have had over 150,000 students and have been delivered to 110 countries. Much of the material in this book was developed under the watchful eye of Professor Yerraballi. It has been an honor and privilege to work with such a skilled and dedicated educator.

Sincerely, I appreciate the valuable lessons of character and commitment taught to me by my parents and grandparents. I recall how hard my parents and grandparents worked to make the world a better place for the next generation. Most significantly, I acknowledge the love, patience and support of my wife, Barbara, and my children, Ben, Dan, and Liz. By the grace of God, I am truly the happiest man on the planet, because I am surrounded by these fine people.

Good luck.
Jonathan W. Valvano

This book is dedicated to my mom and dad, who taught me to give first and ask questions later.

1. Introduction to Embedded Systems

1.1. Embedded Systems

To better understand the expression embedded microcomputer system, consider each word separately. In this context, the word "embedded" means hidden inside so one can't see it. The term "micro" means small, and a "computer" contains a processor, memory, and a means to exchange data with the external world. The word "system" means multiple components interfaced together for a common purpose. Systems have structure, behavior, and interconnectivity operating in a framework bound by rules and regulations. Another name for embedded systems is Cyber-Physical Systems, introduced in 2006 by Helen Gill of the National Science Foundation.

Random access memory (RAM) is storage allowing fast read and write access, e.g., about 10 nanoseconds. In an embedded system, we place variable data in RAM. Data stored in RAM changes dynamically as software is executed. Information in RAM is lost when power is removed. Memory that loses data when power is turned off and back on is called volatile. Flash read only memory (ROM) is memory that can be erased and reprogrammed. However, erasing and storing data into flash is slow, e.g., about 1 millisecond. On the other hand, reading data from flash is fast, typically the same speed as RAM. Flash is categorized as nonvolatile memory because information is not lost when power is turned off and back on. In an embedded system, we use flash for storing the software and fixed constant data. The functionality of a toaster is defined by the software programmed into its flash. If you unplug a toaster, move it, then plug it back in, it still behaves like a toaster because the flash is nonvolatile storage.

Microcontrollers are small computers incorporating the processor, RAM, flash, and I/O ports into a single package, and they are often employed in an embedded system because of their low cost, small size, and low power requirements. A bus is a collection of wires used to pass information between modules. The common bus in Figure 1.1.1 defines the von Neumann architecture, because a single bus is used for both data and instructions. The term embedded system refers to a device that contains one or more computers inside. I/O ports allow access to the physical world. Data flows into and out of the computer through ports. The software together with the I/O ports and associated interface circuits give an embedded computer system its distinctive characteristics. The microcontrollers often must communicate with each other. How the system interacts with humans is often called the human-computer interface (HCI) or man-machine interface (MMI).

Checkpoint 1.1.1: What is an embedded system?
Checkpoint 1.1.2: What is a microcontroller?

Software is an ordered sequence of very specific instructions that are stored in ROM, defining exactly what and when certain tasks are to be performed. Software is executed in a complicated but well-defined manner. The processor executes the software by retrieving and interpreting these instructions one at a time.
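The split between RAM and flash described above can be sketched in C. This is a minimal illustration, assuming a typical embedded toolchain in which const globals are linked into flash and ordinary globals into RAM (the linker decides the actual placement, so treat this as an assumption). The names ToastSeconds, ToastCount, Toaster_Start, and Toaster_Count are hypothetical and do not come from the book.

```c
#include <stdint.h>

// Fixed constant data: on many microcontroller toolchains, const globals
// are linked into nonvolatile flash (an assumption; the linker decides).
static const uint8_t ToastSeconds[4] = {60, 90, 120, 150};

// Variable data: lives in volatile RAM and is lost when power is removed.
static uint32_t ToastCount = 0;

// Hypothetical helper: returns the toasting time in seconds for a
// darkness setting (0 to 3) and counts slices started since power up.
uint32_t Toaster_Start(uint32_t setting){
  if(setting > 3){
    setting = 3;            // clamp out-of-range settings
  }
  ToastCount++;             // RAM contents change as software executes
  return ToastSeconds[setting];
}

// Returns the number of slices started since power up.
uint32_t Toaster_Count(void){
  return ToastCount;
}
```

After a power cycle the table would still hold the same times, while the count would start again from zero.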

Figure 1.1.1. An embedded system includes a microcomputer interfaced to external physical devices.

An input/output (I/O) interface is the hardware and software that allow the system to communicate with the external environment. We must also learn how to interface a wide range of inputs and outputs that can exist in either digital or analog form. In general, we can classify I/O interfaces into parallel, serial, analog, or time. Parallel I/O or general purpose input/output (GPIO) means two or more bits of information are passed at the same time using different wires. Serial I/O means two or more bits of information are passed at different times on the same wire. Analog I/O will provide variable voltages (e.g., between 0 and 3.3 V) on a single wire. Time I/O encodes information in the time domain as period, frequency, phase, or pulse width. Because of low cost, low power, and high performance, there has been and will continue to be an advantage of using time-encoded inputs and outputs. If the embedded system is connected to the internet, it is classified as an Internet of Things (IoT). We will learn the basics of communication systems in Chapter 8, but detailed interfacing of the embedded system to the internet will be presented in Volumes 2 and 3.

The other general concept involved in most embedded systems is that they run in real time. Managing time, both as an input and an output, is a critical task. It is not only important to get the correct output, but to get the correct output at the proper time. In a real-time computer system, we can put an upper bound on the time required to perform the input-calculation-output sequence. A real-time system can guarantee a worst-case upper bound on the response time between when the new input information becomes available and when that information is processed. This response time is called interface latency. Another real-time requirement that exists in many embedded systems is the execution of periodic tasks. A periodic task is one that must be performed at equal-time intervals. A real-time system can put a small and bounded limit on the time error between when a task should be run and when it is actually run. Because of the real-time nature of these systems, microcontrollers have a rich set of features to handle many aspects of time.

Checkpoint 1.1.3: Why do we encode information as period, frequency, phase, or pulse width?
Checkpoint 1.1.4: What is real time?
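The time error of a periodic task can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the start time of each run has already been recorded in timer ticks, that the first run defines the schedule, and that the timer does not wrap around. The function name MaxJitter and its parameters are hypothetical.

```c
#include <stdint.h>

// Returns the largest time error for a periodic task, in timer ticks.
// times[] holds the measured start time of each run, n is the number
// of runs, and period is the desired interval between runs.
// Run i should ideally start at times[0] + i*period.
// Assumes the timer values do not wrap around.
uint32_t MaxJitter(const uint32_t times[], uint32_t n, uint32_t period){
  uint32_t worst = 0;
  for(uint32_t i = 1; i < n; i++){
    uint32_t ideal = times[0] + i*period;   // when run i should occur
    uint32_t error = (times[i] > ideal) ? (times[i] - ideal)
                                        : (ideal - times[i]);
    if(error > worst){
      worst = error;                        // remember the worst case
    }
  }
  return worst;
}
```

A system is real time with respect to this task if the value returned stays below a small, known bound no matter how long the measurement runs.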


1.2. Binary Information Implemented with MOS transistors

Information is stored on the computer in binary form. A binary bit can exist in one of two possible states. In positive logic, the higher voltage is called the '1', true, asserted, or high state. The lower voltage is called the '0', false, not asserted, or low state. Figure 1.2.1 shows the output of a typical complementary metal oxide semiconductor (CMOS) circuit. The left side shows the condition with a true bit at the output, and the right side shows a false at the output. The output of each digital circuit consists of a p-type transistor "on top of" an n-type transistor. In digital circuits, each transistor is either on or off, determined by its Gate (G) voltage. If the transistor is on, it is equivalent to a short circuit between its Source (S) and Drain (D) pins. Conversely, if the transistor is off, it is equivalent to an open circuit between S and D.

Figure 1.2.1. A binary bit at the output is true if a voltage is present and false if the voltage is 0.

Every family of digital logic is different, but on the MSPM0+ microcontrollers from TI powered with a 3.3 V supply, an input voltage between 2.31 and 3.6 V is considered high, and an input voltage between 0 and 0.99 V is considered low, as drawn in Figure 1.2.2. The difference between the minimum high voltage and maximum low voltage is called noise margin, which is 1.32 V. It allows digital logic to operate reliably at very high speeds in the presence of noise. The design of transistor-level digital circuits is beyond the scope of this book. However, it is important to know that digital data exist as binary bits and are encoded as high and low voltages.
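The voltage thresholds quoted above can be written as a small classifier. This is a sketch under stated assumptions: voltages are expressed in millivolts, the thresholds are the 3.3 V supply values given in the text, and the names LogicLevel, Classify, and NoiseMargin are invented for illustration.

```c
#include <stdint.h>

enum LogicLevel { LOGIC_LOW, LOGIC_HIGH, LOGIC_ILLEGAL };

// Thresholds from the text, in millivolts, for a 3.3 V supply.
#define V_IL_MAX   990   // maximum input voltage considered low
#define V_IH_MIN  2310   // minimum input voltage considered high
#define V_IN_MAX  3600   // maximum input voltage considered high

// Maps an analog input voltage (in mV) to its digital meaning.
enum LogicLevel Classify(int32_t mV){
  if((mV >= 0) && (mV <= V_IL_MAX)){
    return LOGIC_LOW;
  }
  if((mV >= V_IH_MIN) && (mV <= V_IN_MAX)){
    return LOGIC_HIGH;
  }
  return LOGIC_ILLEGAL;   // between the thresholds or out of range
}

// Noise margin in mV: minimum high voltage minus maximum low voltage.
int32_t NoiseMargin(void){
  return V_IH_MIN - V_IL_MAX;
}
```

With these values, NoiseMargin returns 1320, matching the 1.32 V stated in the text.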

Figure 1.2.2. Mapping between analog voltage and the corresponding digital meaning on the MSPM0G350x.

If the information we wish to store exists in more than two states, we use multiple bits. A collection of 2 bits has 4 possible states (00, 01, 10, and 11). A collection of 3 bits has 8 possible states (000, 001, 010, 011, 100, 101, 110, and 111). In general, a collection of n bits has $2^n$ states. For example, a byte contains eight bits and is built by grouping eight binary bits into one object, as shown in Figure 1.2.3. Another name for a collection of eight bits is octet (octo is Latin and Greek meaning 8.) Information can take many forms, e.g., numbers, logical states, text, instructions, sounds, or images. What the bits mean depends on how the information is organized and more importantly how it is used. This figure shows one byte in the state representing the binary number 01100111, which could mean 103. As a character, this same collection of bits could represent the letter 'g'. Again, the output voltage 3.3 V means true or 1, and the output voltage of 0 V means false or 0. For more information, see Appendix 5.

Figure 1.2.3. A byte is comprised of 8 bits, in this case representing the binary number 01100111.
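To illustrate how eight individual bits form one byte whose meaning depends on use, here is a minimal C sketch. It assumes the ASCII character set, in which 'g' has the code 103. The function names States and MakeByte are hypothetical.

```c
#include <stdint.h>

// Number of distinct states that n bits can represent (n from 0 to 31).
uint32_t States(uint32_t n){
  return 1u << n;            // 2 raised to the power n
}

// Builds one byte from eight separate bits.
// bits[0] is bit 7 (most significant), bits[7] is bit 0.
uint8_t MakeByte(const uint8_t bits[8]){
  uint8_t result = 0;
  for(int i = 0; i < 8; i++){
    result = (uint8_t)((result << 1) | (bits[i] & 1));
  }
  return result;
}
```

Passing the bits 0,1,1,0,0,1,1,1 to MakeByte gives a value that compares equal to both the number 103 and the character 'g'; the bits are the same, only the interpretation differs.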

1.3. Numbers

The term kilobyte used to be ambiguous, meaning either 1000 bytes or 1024 bytes. In 1998 the International Electrotechnical Commission (IEC) defined a new set of abbreviations for the powers of 2, as shown in Table 1.3.1. These new terms are endorsed by the Institute of Electrical and Electronics Engineers (IEEE) and International Committee for Weights and Measures (CIPM) in situations where the use of a binary prefix is appropriate. The correct terminology is to use the SI-decimal abbreviations to represent powers of 10, and the IEC-binary abbreviations to represent powers of 2. The scientific meaning of 2 kilovolts is 2000 volts, but 2 kibibytes means 2048 bytes. The term kibibyte is a contraction of kilo binary byte and is a unit of information or computer storage, abbreviated KiB.

1 KiB = $2^{10}$ bytes = 1024 bytes
1 MiB = $2^{20}$ bytes = 1,048,576 bytes
1 GiB = $2^{30}$ bytes = 1,073,741,824 bytes

These abbreviations can also be used to specify the number of binary bits. The term kibibit is a contraction of kilo binary bit, and is a unit of information, abbreviated Kibit. A mebibyte (1 MiB is 1,048,576 bytes) is approximately equal to a megabyte (1 MB is 1,000,000 bytes), but mistaking the two has nonetheless led to confusion and even legal disputes. As engineers we must be precise, using gigahertz (GHz) for 1,000,000,000 Hz and gibibyte (GiB) for 1,073,741,824 bytes.
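The difference between decimal and binary prefixes is easy to check in code. The following sketch defines both sets of constants; the macro and function names (KIB, MB, MiB_to_bytes, BinaryExcess) are illustrative choices, not names from the book.

```c
#include <stdint.h>

// IEC binary prefixes (powers of 1024)
#define KIB (1024ULL)
#define MIB (1024ULL*KIB)
#define GIB (1024ULL*MIB)

// SI decimal prefixes (powers of 1000)
#define KB  (1000ULL)
#define MB  (1000ULL*KB)
#define GB  (1000ULL*MB)

// Converts a size in mebibytes to bytes.
uint64_t MiB_to_bytes(uint64_t mib){
  return mib*MIB;
}

// Number of bytes by which count MiB exceeds count MB.
uint64_t BinaryExcess(uint64_t count){
  return count*MIB - count*MB;
}
```

For a single unit the gap is 48,576 bytes, about 4.9 percent of a megabyte, and it grows with each larger prefix.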

Value       SI Decimal   SI Decimal     Value       IEC Binary   IEC Binary
$1000^1$    k            kilo-          $1024^1$    Ki           kibi-
$1000^2$    M            mega-          $1024^2$    Mi           mebi-
$1000^3$    G            giga-          $1024^3$    Gi           gibi-
$1000^4$    T            tera-          $1024^4$    Ti           tebi-
$1000^5$    P            peta-          $1024^5$    Pi           pebi-
$1000^6$    E            exa-           $1024^6$    Ei           exbi-
$1000^7$    Z            zetta-         $1024^7$    Zi           zebi-
$1000^8$    Y            yotta-         $1024^8$    Yi           yobi-

Table 1.3.1. Common abbreviations for large numbers.

To solve problems using a computer we need to understand numbers and what they mean. Each digit in a decimal number has a place and a value. The place is a power of 10 and the value is selected from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. A decimal number is simply a combination of its digits multiplied by powers of 10, called place-value. For example,

$1984 = 1\cdot10^3 + 9\cdot10^2 + 8\cdot10^1 + 4\cdot10^0$

Fractional values can be represented by using the negative powers of 10. For example,

$273.15 = 2\cdot10^2 + 7\cdot10^1 + 3\cdot10^0 + 1\cdot10^{-1} + 5\cdot10^{-2}$

In a similar manner, each digit in a binary number has a place and a value. In binary numbers, the place is a power of 2, and the value is selected from the set {0, 1}. A binary number is simply a combination of its digits multiplied by powers of 2. To eliminate confusion between decimal numbers and binary numbers, we will put a subscript 2 after the number to mean binary. Because of the way the microcontroller operates, most of the binary numbers in this book will have 8, 16, or 32 bits. An 8-bit number is called a byte, a 16-bit number is called a halfword, and a 32-bit number is called a word. On the Cortex-M, we will utilize all three formats. For example, the 8-bit binary number for 106 is

$01101010_2 = 0 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 0 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 64 + 32 + 8 + 2$
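The place-value idea can be sketched in C. This is a minimal illustration, not code from the book: the function name `place_value` is hypothetical, and it accumulates the total by repeatedly multiplying by the base and adding the next digit, which gives the same result as summing each digit times its power of the base.

```c
#include <stdint.h>

/* Evaluate a string of digits '0' to '9' in the given base (2 to 10).
   Each step multiplies the running total by the base and adds the next
   digit, which is equivalent to summing digit * base^place.
   Characters that are not valid digits in the base are skipped
   (an illustrative choice, not a standard behavior). */
uint32_t place_value(const char *digits, uint32_t base) {
  uint32_t total = 0;
  for (const char *p = digits; *p != '\0'; p++) {
    if (*p < '0' || *p > '9') {
      continue;                       /* not a digit */
    }
    uint32_t d = (uint32_t)(*p - '0');
    if (d >= base) {
      continue;                       /* digit too large for this base */
    }
    total = total * base + d;
  }
  return total;
}
```

For instance, `place_value("1984", 10)` evaluates the decimal example above and `place_value("01101010", 2)` evaluates the binary one.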

Checkpoint 1.3.1: What is the numerical value of the 8-bit binary number 0b11111111?

Binary is the natural language of computers but a big nuisance for us humans. To simplify working with binary numbers, humans use a related number system called hexadecimal, which uses base 16. Just like decimal and binary, each hexadecimal digit has a place and a value. In this case, the place is a power of 16 and the value is selected from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}. As you can see, hexadecimal numbers have more possibilities for their digits than are available in the decimal format; so, we add the letters A through F, as shown in Table 1.3.2. A hexadecimal number is a combination of its digits multiplied by powers of 16. To eliminate confusion between various formats, we will put a 0x, an x, or a $ before the number to mean hexadecimal. Hexadecimal representation is a convenient mechanism for us humans to define binary information, because it is extremely simple for humans to convert back and forth between binary and hexadecimal. The hexadecimal number system is often abbreviated as "hex". A nibble is defined as 4 binary bits, or one hexadecimal digit. Each value of the 4-bit nibble is mapped into a unique hex digit, as shown in Table 1.3.2.

Hex Digit   Decimal Value   Binary Value
0           0               0000
1           1               0001
2           2               0010
3           3               0011
4           4               0100
5           5               0101
6           6               0110
7           7               0111
8           8               1000
9           9               1001
A or a      10              1010
B or b      11              1011
C or c      12              1100
D or d      13              1101
E or e      14              1110
F or f      15              1111

Table 1.3.2. Definition of hexadecimal representation.

For example, the hexadecimal number for the 16-bit binary 0001 0010 1010 1101 is 0x12AD

$= 1 \cdot 16^3 + 2 \cdot 16^2 + 10 \cdot 16^1 + 13 \cdot 16^0 = 4096 + 512 + 160 + 13 = 4781$

Observation: In order to maintain consistency between assembly and C programs, we will use the 0x format when writing hexadecimal numbers in this book.

Checkpoint 1.3.2: What is the numerical value of the 8-bit hexadecimal number 0xFE?

As illustrated in Figure 1.3.1, to convert from binary to hexadecimal we can: 1) Divide the binary number into right-justified nibbles, 2) Convert each nibble into its corresponding hexadecimal digit.

Binary        0011011011001101
Nibbles       0011 0110 1100 1101
Hexadecimal   0x36CD

Figure 1.3.1. Example conversion between binary and hexadecimal.

Checkpoint 1.3.3: Convert the 8-bit binary number 0b01000111 to hexadecimal.
Checkpoint 1.3.4: Convert the 12-bit binary number 0b110110101011 to hexadecimal.

To convert from hexadecimal to binary we can reverse the process: 1) Convert each hexadecimal digit into its corresponding 4-bit binary nibble, 2) Combine the nibbles into a single binary number.
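Both directions of the nibble-based conversion can be sketched in C. This is an illustrative sketch under stated assumptions: the function names `to_hex16` and `to_binary16` are hypothetical, and the value is assumed to be 16 bits so that it splits into exactly four nibbles.

```c
#include <stdint.h>

/* Write the 4 hex digits of a 16-bit value into out[0..3], followed by a
   terminating null in out[4]. Each digit comes from one right-justified
   4-bit nibble, most significant nibble first. */
void to_hex16(uint16_t value, char out[5]) {
  static const char digits[] = "0123456789ABCDEF";
  for (int i = 0; i < 4; i++) {
    uint16_t nibble = (uint16_t)((value >> (12 - 4 * i)) & 0xF);
    out[i] = digits[nibble];
  }
  out[4] = '\0';
}

/* Write the 16 binary digits of value into out[0..15], most significant
   bit first, followed by a terminating null in out[16]. */
void to_binary16(uint16_t value, char out[17]) {
  for (int i = 0; i < 16; i++) {
    out[i] = ((value >> (15 - i)) & 1) ? '1' : '0';
  }
  out[16] = '\0';
}
```

Calling both functions with 0x36CD reproduces the two rows of Figure 1.3.1.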

Checkpoint 1.3.5: Convert the hex number 0x49 to binary.
Checkpoint 1.3.6: How many binary bits does it take to represent 0x12345?

Observation: Some compilers, like Clang and GNU, support binary constants of the form 0b01100100, which is equal to 0x64 or 100.

Computer programming environments use a wide variety of symbolic notations to specify the hexadecimal numbers. E.g., assume we wish to represent the binary number 01111010₂. Some assembly languages use $7A, while other assembly languages use 7AH. The C language uses 0x7A. Patt's LC-3 simulator uses x7A. We will use the 0x7A format for both C and assembly.

Precision is the number of distinct or different values. We express precision in alternatives, decimal digits, bytes, or binary bits. Alternatives are defined as the total number of distinct possibilities. For example, an 8-bit number format can represent 256 different numbers. An 8-bit digital to analog converter (DAC) can generate 256 different analog outputs. A 12-bit analog to digital converter (ADC) can measure 4096 different analog inputs. Table 1.3.3 illustrates the relationship between precision in binary bits and precision in alternatives. The operation [[x]] is defined as the smallest integer greater than or equal to x (rounding up). E.g., [[2.1]], [[2.9]], and [[3.0]] are all equal to 3. The Bytes column in Table 1.3.3 specifies how many bytes of memory it would take to store a number with that precision assuming the data were not packed or compressed.

Binary bits   Bytes      Alternatives
8             1          256
10            2          1024
12            2          4096
16            2          65536
20            3          1,048,576
24            3          16,777,216
30            4          1,073,741,824
32            4          4,294,967,296
n             [[n/8]]    2^n

Table 1.3.3. Relationship between bits, bytes and alternatives as units of precision.

Checkpoint 1.3.7: How many bytes of memory would it take to store a 60-bit number?

Sometimes we are interested in only the approximate size of a digital number system, see Table 1.3.4. Since $2^{10}$ is about $10^3$, we can approximate $2^{10n}$ as $10^{3n}$. For example, $2^{23}$ is $2^3 \cdot 2^{20}$, which is about 8 million.
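The Bytes and Alternatives columns of Table 1.3.3 can be computed with integer arithmetic. This is a minimal sketch: the names `bytes_for_bits` and `alternatives` are hypothetical, and the rounding up of n/8 is done with the common idiom of adding 7 before the integer divide.

```c
#include <stdint.h>

/* Bytes needed to store an n-bit number, unpacked: n/8 rounded up. */
uint32_t bytes_for_bits(uint32_t n) {
  return (n + 7) / 8;
}

/* Number of alternatives for an n-bit number, 2^n.
   Valid for n from 0 to 63 so the result fits in 64 bits. */
uint64_t alternatives(uint32_t n) {
  return (uint64_t)1 << n;
}
```

For example, `bytes_for_bits(20)` and `alternatives(20)` give the 20-bit row of the table.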

Binary bits   Alternatives                 Approximation
10            1024                         1000
20            1,048,576                    $1000^2$ = 1,000,000
30            1,073,741,824                $1000^3$ = 1,000,000,000
40            1,099,511,627,776            $1000^4$ = 1,000,000,000,000
50            1,125,899,906,842,624        $1000^5$ = 1,000,000,000,000,000

Table 1.3.4. Relationship between bits and alternatives as units of precision.

Checkpoint 1.3.8: Use Table 1.3.4 to approximate the value of 2 without using a calculator. 3

5

Checkpoint 1.3.9: Use Table 1.3.4 to approximate the value of 2·12 without using a calculator.

A byte contains 8 bits as shown in Figure 1.3.2, where each bit b7, ..., b0 is binary and has the value 1 or 0. We specify b7 as the most significant bit or MSB, and b0 as the least significant bit or LSB. In C, the specifier char means 8 bits or 1 byte. In C99, the specifier uint8_t means 8-bit unsigned and int8_t means 8-bit signed. In this book, we will use char to store ASCII characters and we will use uint8_t and int8_t to store 8-bit numbers.

Figure 1.3.2. 8-bit binary format.

If a byte is used to represent an unsigned number, then the value of the number is

$N = 128 \cdot b_7 + 64 \cdot b_6 + 32 \cdot b_5 + 16 \cdot b_4 + 8 \cdot b_3 + 4 \cdot b_2 + 2 \cdot b_1 + b_0$

Notice that the significance of bit n is $2^n$. There are 256 different unsigned 8-bit numbers. The smallest unsigned 8-bit number is 0 and the largest is 255. For example, 0b00001010 is 8+2 or 10. Other examples are shown in Table 1.3.5. The least significant bit can tell us if the number is even or odd. Furthermore, if the bottom n bits are 0, the number is divisible by $2^n$.

binary       hex     Calculation                decimal
00000000₂    0x00                               0
01000001₂    0x41    64+1                       65
00010110₂    0x16    16+4+2                     22
10000111₂    0x87    128+4+2+1                  135
11111111₂    0xFF    128+64+32+16+8+4+2+1       255

Table 1.3.5. Example conversions from unsigned 8-bit binary to hexadecimal and to decimal.

Checkpoint 1.3.10: Convert the binary number 0b01101011 to unsigned decimal.
Checkpoint 1.3.11: Convert the hex number 0x46 to unsigned decimal.

The basis of a number system is a subset from which linear combinations of the basis elements can be used to construct the entire set. The basis represents the "places" in a "place-value" system. For positive integers, the basis is the infinite set {1, 10, 100, ...}, and the "values" can range from 0 to 9. Each positive integer has a unique set of values such that the dot-product of the value vector times the basis vector yields that number. For example, 2345 is the dot-product (..., 2,3,4,5) • (..., 1000,100,10,1), which is 2*1000+3*100+4*10+5. For the unsigned 8-bit number system, the basis elements are {128, 64, 32, 16, 8, 4, 2, 1}. The values of a binary number system can only be 0 or 1. Even so, each 8-bit unsigned integer has a unique set of values such that the dot-product of the values times the basis yields that number. For example, 69=0x45 is (0,1,0,0,0,1,0,1)•(128,64,32,16,8,4,2,1), which equals 0*128+1*64+0*32+0*16+0*8+1*4+0*2+1*1. Conveniently, there is no other set of 0's and 1's such that the set of values multiplied by the basis is 69. In other words, each 8-bit unsigned binary representation of the values 0 to 255 is unique.
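The dot-product of values and basis elements can be written directly as a loop. This is a sketch for illustration only: `basis_value` and `UNSIGNED8_BASIS` are hypothetical names, and the basis is passed as a parameter so the same function could be used with a different basis.

```c
#include <stdint.h>

/* Dot-product of eight bit values (each 0 or 1, most significant first)
   with an eight-element basis. */
int32_t basis_value(const uint8_t values[8], const int32_t basis[8]) {
  int32_t sum = 0;
  for (int i = 0; i < 8; i++) {
    sum += values[i] * basis[i];
  }
  return sum;
}

/* Basis of the unsigned 8-bit number system, most significant first. */
const int32_t UNSIGNED8_BASIS[8] = {128, 64, 32, 16, 8, 4, 2, 1};
```

With the values (0,1,0,0,0,1,0,1) from the example above, the function returns 69.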


One way for us to convert a decimal number into binary is to use the basis elements. The overall approach is to start with the largest basis element and work towards the smallest. More precisely, we start with the most significant bit and work towards the least significant bit. One by one, we ask ourselves whether or not we need that basis element to create our number. If we do, then we set the corresponding bit in our binary result and subtract the basis element from our number. If we do not need it, then we clear the corresponding bit in our binary result. We will work through the algorithm with the example of converting 100 to 8-bit binary, see Table 1.3.6. We start with the largest basis element (in this case 128) and ask whether or not we need to include it to make 100. Since our number is less than 128, we do not need it, so bit 7 is zero. We go to the next largest basis element, 64, and ask, "do we need it?" We do need 64 to generate our 100, so bit 6 is one and we subtract 100 minus 64 to get 36. Next, we go to the next basis element, 32, and ask, "do we need it?" Again, we do need 32 to generate our 36, so bit 5 is one and we subtract 36 minus 32 to get 4. Continuing along, we do not need basis elements 16 or 8, but we do need basis element 4. Once we subtract the 4, our working result is zero, so basis elements 2 and 1 are not needed. Putting it together, we get 0b01100100, 100 = 64+32+4.

Observation: Bit 7 of an 8-bit number determines whether it is greater than or equal to 128.

Number   Basis   Need it?   bit        Operation
100      128     no         bit 7=0    none
100      64      yes        bit 6=1    subtract 100-64
36       32      yes        bit 5=1    subtract 36-32
4        16      no         bit 4=0    none
4        8       no         bit 3=0    none
4        4       yes        bit 2=1    subtract 4-4
0        2       no         bit 1=0    none
0        1       no         bit 0=0    none

Table 1.3.6. Example: convert 100 decimal to unsigned 8-bit binary, 0b01100100.
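The steps of Table 1.3.6 can be sketched as a C loop. This is one possible rendering of the algorithm, with a hypothetical function name `to_binary8`; the result is produced as a string of '0' and '1' characters so it can be compared with the table.

```c
#include <stdint.h>

/* Convert n (0 to 255) to 8 binary digits in out[0..7], most significant
   bit first, followed by a terminating null in out[8]. */
void to_binary8(uint8_t n, char out[9]) {
  uint8_t basis = 128;            /* start with the largest basis element */
  for (int i = 0; i < 8; i++) {
    if (n >= basis) {             /* needed: set the bit and subtract */
      out[i] = '1';
      n = (uint8_t)(n - basis);
    } else {                      /* not needed: clear the bit */
      out[i] = '0';
    }
    basis = (uint8_t)(basis / 2); /* move to the next smaller element */
  }
  out[8] = '\0';
}
```

Calling `to_binary8(100, buffer)` walks through the same eight rows as the table.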

Checkpoint 1.3.12: In this conversion algorithm, how can we tell if a basis element is needed?
Checkpoint 1.3.13: Convert decimal 45 to 8-bit binary and hexadecimal.
Checkpoint 1.3.14: Convert decimal 200 to 8-bit binary and hexadecimal.

One of the first schemes to represent signed numbers was called one's complement. It was called one's complement because to negate a number, we complement (logical not) each bit. For example, since 25 equals 0b00011001 in binary, to get -25 we flip all the bits yielding 0b11100110. An 8-bit one's complement number can vary from -127 to +127. The most significant bit is a sign bit, which is 1 if and only if the number is negative. The difficulty with the one's complement format is that there are two zeros: +0 is 0b00000000, and -0 is 0b11111111. Another problem is that one's complement numbers do not have basis elements. These limitations led to the use of two's complement.

The two's complement number system is the most common approach used to define signed integers. It is called two's complement because to negate a number, we complement each bit (like one's complement), and then add 1. For example, if 25 equals 0b00011001 in binary, then -25 is 0b11100111. The value of an 8-bit signed number in two's complement format is

$N = -128 \cdot b_7 + 64 \cdot b_6 + 32 \cdot b_5 + 16 \cdot b_4 + 8 \cdot b_3 + 4 \cdot b_2 + 2 \cdot b_1 + b_0$

For the signed 8-bit number system the eight basis elements are {-128, 64, 32, 16, 8, 4, 2, 1}.

Observation: One usually means two's complement when one refers to signed integers.

There are 256 different signed 8-bit numbers. The smallest signed 8-bit number is -128 and the largest is +127. For example, 0b10000010 equals -128+2 or -126. Other examples are shown in Table 1.3.7.

binary        hex     Calculation                 decimal
0b00000000    0x00                                0
0b01000001    0x41    64+1                        65
0b00010110    0x16    16+4+2                      22
0b10000111    0x87    -128+4+2+1                  -121
0b11111111    0xFF    -128+64+32+16+8+4+2+1       -1

Table 1.3.7. Example conversions from signed 8-bit binary to hexadecimal and to decimal.
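Comparing Table 1.3.5 with Table 1.3.7, the same bit patterns yield different values depending on the format. The following sketch makes the two interpretations explicit; the function names `as_unsigned8` and `as_signed8` are hypothetical. It relies on the fact that the signed and unsigned basis differ only in bit 7 (-128 versus +128), a difference of 256.

```c
#include <stdint.h>

/* Value of an 8-bit pattern read as unsigned: all eight basis elements
   are positive. */
int32_t as_unsigned8(uint8_t bits) {
  return (int32_t)bits;
}

/* Value of the same pattern read as two's complement: bit 7 contributes
   -128 instead of +128, so subtract 256 when bit 7 is set. */
int32_t as_signed8(uint8_t bits) {
  return (bits & 0x80) ? (int32_t)bits - 256 : (int32_t)bits;
}
```

For the pattern 0x87, the two functions return the decimal entries of the two tables.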

Checkpoint 1.3.15: Convert the signed binary number 0b11101010 to signed decimal.
Checkpoint 1.3.16: Are the signed and unsigned decimal representations of the 8-bit hex number 0x45 the same or different?

Observation: The most significant bit in a two's complement number will specify the sign.

Notice that the same 8-bit binary pattern of 0b11111111 could represent either 255 or -1. It is very important for the software developer to keep track of the number format. The computer cannot determine whether a number is signed or unsigned. You, as the programmer, will determine whether the number is signed or unsigned by the specific assembly instructions you select to operate on the number. Some operations like addition, subtraction, multiplication, and shift left (multiply by 2) use the same hardware (instructions) for both unsigned and signed operations. On the other hand, division and shift right (divide by 2) require separate hardware (instructions) for unsigned and signed operations. When implementing a conditional branch, we will use different instructions when comparing two signed numbers versus when comparing two unsigned numbers.

Similar to the unsigned algorithm, we can use the basis to convert a decimal number into signed binary. We will work through the algorithm with the example of converting -100 to 8-bit binary, as shown in Table 1.3.8. We start with the most significant bit (in this case -128) and decide whether we need to include it to make -100. Yes (without -128, we would be unable to add the other basis elements together to get any negative result), so we set bit 7 and subtract the basis element from our value. Our new value equals -100 minus -128, which is 28. We go to the next largest basis element, 64, and ask, "do we need it?" We do not need 64 to generate our 28, so bit 6 is zero. Next, we go to the next basis element, 32, and ask, "do we need it?" We do not need 32 to generate our 28, so bit 5 is zero. Now we need the basis element 16, so we set bit 4, and subtract 16 from our number 28 (28-16=12). Continuing along, we need basis elements 8 and 4 but not 2 or 1. Putting it together we get 0b10011100 (which means -128+16+8+4).

Number   Basis   Need it?   bit        Operation
-100     -128    yes        bit 7=1    subtract -100 - -128
28       64      no         bit 6=0    none
28       32      no         bit 5=0    none
28       16      yes        bit 4=1    subtract 28-16
12       8       yes        bit 3=1    subtract 12-8
4        4       yes        bit 2=1    subtract 4-4
0        2       no         bit 1=0    none
0        1       no         bit 0=0    none

Table 1.3.8. Example: convert decimal -100 to signed 8-bit binary, 0b10011100.

Observation: To take the negative of a two's complement signed number we first complement (flip) all the bits, then add 1.
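The signed conversion in Table 1.3.8 differs from the unsigned one only in how bit 7 is handled. A minimal sketch follows, with the hypothetical name `to_signed_binary8`; it assumes the input is within the signed 8-bit range of -128 to 127.

```c
#include <stdint.h>

/* Convert value (-128 to 127) to 8 binary digits in out[0..7], most
   significant first, followed by a terminating null in out[8].
   Bit 7 has basis -128 and is needed exactly when the value is negative. */
void to_signed_binary8(int32_t value, char out[9]) {
  if (value < 0) {
    out[0] = '1';
    value -= -128;              /* e.g., -100 minus -128 leaves 28 */
  } else {
    out[0] = '0';
  }
  int32_t basis = 64;           /* remaining basis elements are positive */
  for (int i = 1; i < 8; i++) {
    if (value >= basis) {       /* needed: set the bit and subtract */
      out[i] = '1';
      value -= basis;
    } else {                    /* not needed: clear the bit */
      out[i] = '0';
    }
    basis /= 2;
  }
  out[8] = '\0';
}
```

Calling `to_signed_binary8(-100, buffer)` follows the rows of the table and leaves the digits 10011100 in the buffer.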

A second way to convert negative numbers into binary is to first convert them into unsigned binary, then do a two's complement negate. For example, we earlier found that +100 is 0b01100100. The two's complement negate is a two-step process. First, we do a logical complement (flip all bits) to get 0b10011011. Then add one to the result to get 0b10011100.

A third way to convert negative numbers into binary is to first add 256 to the number, then convert the unsigned result to binary using the unsigned method. For example, to find -100, we add 256 plus -100 to get 156. Then we convert 156 to binary resulting in 0b10011100. This method works because in 8-bit binary math adding 256 to a number does not change the value. E.g., -100+256 has the same 8-bit binary value as -100.

Checkpoint 1.3.17: Convert decimal -45 to 8-bit binary and hexadecimal.
Checkpoint 1.3.18: Why can't you represent the number 200 using 8-bit signed binary?
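The second and third methods can be checked against each other in C. This is a sketch with hypothetical function names; the casts to `uint8_t` keep only the low 8 bits, which models 8-bit binary math.

```c
#include <stdint.h>

/* Two's complement negate: complement every bit, then add 1.
   The cast keeps only the low 8 bits of the result. */
uint8_t negate8(uint8_t x) {
  return (uint8_t)(~x + 1);
}

/* Alternative: add 256 to a negative number (-128 to -1) and store the
   unsigned result in 8 bits. */
uint8_t encode_by_adding_256(int32_t negative) {
  return (uint8_t)(negative + 256);
}
```

Both `negate8(0x64)` and `encode_by_adding_256(-100)` produce 0x9C, which is 0b10011100.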

Overflow is defined as the error that occurs during an operation where the result of the operation is too big to be correctly stored in the destination. Although the same instructions for addition and subtraction work for both signed and unsigned numbers, the overflow (error) conditions are distinct. The C bit in the condition code register signifies unsigned overflow, and the V bit means a signed overflow has occurred.

Common Error: An error will occur if you use signed operations on unsigned numbers, or use unsigned operations on signed numbers.

Maintenance Tip: To improve the clarity of our software, always specify the format of your data (signed versus unsigned) when defining or accessing the data.
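To see why the two overflow conditions are distinct, the following sketch models an 8-bit addition in C and reports both conditions. It is an illustration only: the type `add8_t` and function `add8` are hypothetical, the model covers addition only, and it assumes an 8-bit destination, whereas the processor sets its C and V bits based on the width of the instruction actually executed.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
  uint8_t result;  /* low 8 bits of the sum */
  bool c;          /* unsigned overflow: true sum does not fit in 0 to 255 */
  bool v;          /* signed overflow: true sum does not fit in -128 to 127 */
} add8_t;

/* Read an 8-bit pattern as a two's complement value. */
static int32_t signed_value8(uint8_t bits) {
  return (bits & 0x80) ? (int32_t)bits - 256 : (int32_t)bits;
}

/* Add two 8-bit patterns and report both overflow conditions. */
add8_t add8(uint8_t a, uint8_t b) {
  add8_t r;
  uint32_t usum = (uint32_t)a + (uint32_t)b;           /* unsigned view */
  int32_t ssum = signed_value8(a) + signed_value8(b);  /* signed view */
  r.result = (uint8_t)usum;
  r.c = usum > 255;
  r.v = (ssum > 127) || (ssum < -128);
  return r;
}
```

For example, `add8(200, 100)` reports an unsigned overflow but no signed overflow, while `add8(100, 100)` reports the opposite.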

A halfword or double byte contains 16 bits, where each bit b15, ..., b0 is binary and has the value 1 or 0, as shown in Figure 1.3.3. In C, the specifier short means 16 bits or 2 bytes. In C99, the specifier uint16_t means 16-bit unsigned and int16_t means 16-bit signed.

b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0

Figure 1.3.3. 16-bit binary format.

If a halfword is used to represent an unsigned number, then the value of the number is

$N = 32768 \cdot b_{15} + 16384 \cdot b_{14} + 8192 \cdot b_{13} + 4096 \cdot b_{12} + 2048 \cdot b_{11} + 1024 \cdot b_{10} + 512 \cdot b_9 + 256 \cdot b_8 + 128 \cdot b_7 + 64 \cdot b_6 + 32 \cdot b_5 + 16 \cdot b_4 + 8 \cdot b_3 + 4 \cdot b_2 + 2 \cdot b_1 + b_0$

There are 65536 different unsigned 16-bit numbers. The smallest unsigned 16-bit number is 0 and the largest is 65535. For example, 0b0010000110000100 or 0x2184 is 8192+256+128+4 or 8580. Other examples are shown in Table 1.3.9.

binary                hex      Calculation                                      decimal
0b0000000000000000    0x0000                                                    0
0b0000010000000001    0x0401   1024+1                                           1025
0b0000110010100000    0x0CA0   2048+1024+128+32                                 3232
0b1000111000000010    0x8E02   32768+2048+1024+512+2                            36354
0b1111111111111111    0xFFFF   32768+16384+8192+4096+2048+1024+512+256
                               +128+64+32+16+8+4+2+1                            65535

Table 1.3.9. Example conversions from unsigned 16-bit binary to hexadecimal and to decimal.

Checkpoint 1.3.19: Convert the 16-bit binary number 0b0010000001101010 to unsigned decimal.
Checkpoint 1.3.20: Convert the 16-bit hex number 0x1234 to unsigned decimal.

For the unsigned 16-bit number system the sixteen basis elements are {32768, 16384, 8192, 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1}.

Checkpoint 1.3.21: Convert the unsigned decimal number 1234 to 16-bit hexadecimal.
Checkpoint 1.3.22: Convert the unsigned decimal number 10000 to 16-bit binary.

There are also 65536 different signed 16-bit numbers. The smallest two's complement signed 16-bit number is -32768 and the largest is 32767. For example, 0b1101000000000100 or 0xD004 is -32768+16384+4096+4 or -12284. Other examples are shown in Table 1.3.10.

binary                hex      Calculation                                      decimal
0b0000000000000000    0x0000                                                    0
0b0000010000000001    0x0401   1024+1                                           1025
0b0000110010100000    0x0CA0   2048+1024+128+32                                 3232
0b1000010000000010    0x8402   -32768+1024+2                                    -31742
0b1111111111111111    0xFFFF   -32768+16384+8192+4096+2048+1024+512+256
                               +128+64+32+16+8+4+2+1                            -1

Table 1.3.10. Example conversions from signed 16-bit binary to hexadecimal and to decimal.

If a halfword is used to represent a signed two's complement number, then the value of the number is

$N = -32768 \cdot b_{15} + 16384 \cdot b_{14} + 8192 \cdot b_{13} + 4096 \cdot b_{12} + 2048 \cdot b_{11} + 1024 \cdot b_{10} + 512 \cdot b_9 + 256 \cdot b_8 + 128 \cdot b_7 + 64 \cdot b_6 + 32 \cdot b_5 + 16 \cdot b_4 + 8 \cdot b_3 + 4 \cdot b_2 + 2 \cdot b_1 + b_0$

Checkpoint 1.3.23: Convert the 16-bit hex number 0x1234 to signed decimal.
Checkpoint 1.3.24: Convert the 16-bit hex number 0xABCD to signed decimal.

For the signed 16-bit number system the sixteen basis elements are {-32768, 16384, 8192, 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1}.

Common Error: An error will occur if you use 16-bit operations on 8-bit numbers, or use 8-bit operations on 16-bit numbers.

Maintenance Tip: To improve the clarity of your software, always specify the precision of your data when defining or accessing the data.

Checkpoint 1.3.25: Convert the signed decimal number 1234 to 16-bit hexadecimal.
Checkpoint 1.3.26: Convert the signed decimal number -10000 to 16-bit binary.

On the ARM, a word is 32 bits wide. In C, the specifier long means 32 bits. In C99, the specifiers uint32_t and int32_t mean 32 bits. Consider an unsigned number with 32 bits, where each bit b31, ..., b0 is binary and has the value 1 or 0. If a 32-bit number is used to represent an unsigned integer, then the value of the number is

$N = 2^{31} \cdot b_{31} + 2^{30} \cdot b_{30} + \cdots + 2 \cdot b_1 + b_0 = \sum_{i=0}^{31} 2^i \cdot b_i$

There are $2^{32}$ different unsigned 32-bit numbers. The smallest unsigned 32-bit number is 0 and the largest is $2^{32}-1$. This range is 0 to about 4 billion. For the unsigned 32-bit number system, the 32 basis elements are {$2^{31}$, $2^{30}$, $2^{29}$, ..., 4, 2, 1}.

If a 32-bit binary number is used to represent a signed two's complement number, then the value of the number is

$N = -2^{31} \cdot b_{31} + 2^{30} \cdot b_{30} + \cdots + 2 \cdot b_1 + b_0 = -2^{31} \cdot b_{31} + \sum_{i=0}^{30} 2^i \cdot b_i$

There are also $2^{32}$ different signed 32-bit numbers. The smallest signed 32-bit number is $-2^{31}$ and the largest is $2^{31}-1$. This range is about -2 billion to +2 billion. For the signed 32-bit number system, the 32 basis elements are {$-2^{31}$, $2^{30}$, $2^{29}$, ..., 4, 2, 1}.

Maintenance Tip: We will use the int data type in C only when we don't care about precision, and we wish the compiler to choose the most efficient way to perform the operation. For example, on the 8-bit 8051 int will be 8 bits, on the 16-bit 9S12 int will be 16 bits, on the Cortex-M int is 32 bits, and on some x86 processors int is 64 bits.

The C99 programming standard eliminates the confusion. We will use these types:

int8_t     signed 8-bit                uint8_t    unsigned 8-bit
int16_t    signed 16-bit               uint16_t   unsigned 16-bit
int32_t    signed 32-bit               uint32_t   unsigned 32-bit
int64_t    signed 64-bit               uint64_t   unsigned 64-bit
char       8-bit ASCII characters
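The sizes of the fixed-width types listed above can be checked at compile time. This is a small sketch assuming a C11 compiler (for `_Static_assert`) and a platform with 8-bit bytes; the constant names are hypothetical, while `UINT32_MAX`, `INT32_MIN`, and `INT32_MAX` come from the standard header `<stdint.h>`.

```c
#include <stdint.h>
#include <limits.h>

/* Compile-time checks that the fixed-width types have the stated sizes.
   The build fails if any check is false. */
_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
_Static_assert(sizeof(int8_t) == 1 && sizeof(uint8_t) == 1, "8-bit types");
_Static_assert(sizeof(int16_t) == 2 && sizeof(uint16_t) == 2, "16-bit types");
_Static_assert(sizeof(int32_t) == 4 && sizeof(uint32_t) == 4, "32-bit types");
_Static_assert(sizeof(int64_t) == 8 && sizeof(uint64_t) == 8, "64-bit types");

/* Range limits of the 32-bit types, taken from <stdint.h>. */
const uint32_t LARGEST_UNSIGNED32 = UINT32_MAX;  /* 2^32 - 1 */
const int32_t SMALLEST_SIGNED32 = INT32_MIN;     /* -2^31 */
const int32_t LARGEST_SIGNED32 = INT32_MAX;      /* 2^31 - 1 */
```

Unlike int, these sizes do not change from one processor to another, which is the reason for preferring them when precision matters.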

Observation: When programming in assembly, we will always explicitly specify the precision of our numbers and calculations.

We can use bytes to represent characters with the American Standard Code for Information Interchange (ASCII) code. Standard ASCII is actually only 7 bits, but is stored using 8-bit bytes with the most significant bit equal to 0. For example, the capital 'V' is defined by the 8-bit binary pattern 01010110₂. Table 1.3.11 shows the ASCII code for some of the commonly-used nonprinting characters. In C, we will use the char data type to represent characters.

Abbr.   C     ASCII character        Binary        Hexadecimal   Decimal
NUL     \0    Null                   0b00000000    0x00          0
STX     \2    Start of text          0b00000010    0x02          2
ETX     \3    End of text            0b00000011    0x03          3
BS      \b    Delete or Backspace    0b00001000    0x08          8
HT      \t    Tab                    0b00001001    0x09          9
CR      \r    Enter or Return        0b00001101    0x0D          13
LF      \n    Line feed              0b00001010    0x0A          10
SP      ' '   Space                  0b00100000    0x20          32

Table 1.3.11. Common special characters and their ASCII representations.

The 7-bit ASCII code definitions are given in Table 1.3.12. For example, the letter 'V' is in column 5 (0x50) and row 6. Putting the two together yields hexadecimal 0x56.

Checkpoint 1.3.27: How is the character '0' represented in ASCII?
Checkpoint 1.3.28: Assume variable n contains an ASCII code 0 to 9. Write a formula that converts the ASCII code in n into the corresponding decimal number.

Observation: In the LCD interface, we will use char codes 128 to 255 to specify special characters needed to display non-English characters like Allô? Olá ¡Hola! Lab 9 will use nonstandard encodings developed for the LCD. See the README.html file in the ST7735 project.


                       BITS 4 to 6
               0     1          2     3     4     5     6     7
          0    NUL   DLE        SP    0     @     P     `     p
          1    SOH   DC1/XON    !     1     A     Q     a     q
          2    STX   DC2        "     2     B     R     b     r
          3    ETX   DC3/XOFF   #     3     C     S     c     s
          4    EOT   DC4        $     4     D     T     d     t
  BITS    5    ENQ   NAK        %     5     E     U     e     u
  0 to 3  6    ACK   SYN        &     6     F     V     f     v
          7    BEL   ETB        '     7     G     W     g     w
          8    BS    CAN        (     8     H     X     h     x
          9    HT    EM         )     9     I     Y     i     y
          A    LF    SUB        *     :     J     Z     j     z
          B    VT    ESC        +     ;     K     [     k     {
          C    FF    FS         ,     <     L     \     l     |
          D    CR    GS         -     =     M     ]     m     }
          E    SO    RS         .     >     N     ^     n     ~
          F    SI    US         /     ?     O     _     o     DEL

Table 1.3.12. Standard 7-bit ASCII.

One way to encode a character string is to use null-termination. In this way, the characters of the string are stored one right after the other, and the end of the string is signified by the NUL character (0x00). For example, the string "Valvano" is encoded as these 8 bytes: 0x56, 0x61, 0x6C, 0x76, 0x61, 0x6E, 0x6F, 0x00. Typically we use a pointer to the first byte to identify the string, as shown in Figure 1.3.4.

Pointer -> 0x56  V
           0x61  a
           0x6C  l
           0x76  v
           0x61  a
           0x6E  n
           0x6F  o
           0x00

Figure 1.3.4. Strings are stored as a sequence of ASCII characters, followed by a null.

Checkpoint 1.3.29: How is "Hello World" encoded as a null-terminated ASCII string?

Observation: When outputting to some devices we send just a 13 (CR, 0x0D) to go to the next line, and to some devices we send just a 10 (LF, 0x0A), while for other devices we need to send both a 13 (CR, 0x0D) and a 10 (LF, 0x0A).
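A null-terminated string can be traversed with a pointer until the NUL byte is found. The sketch below is illustrative: `string_length` and `Name` are hypothetical names, and the standard library function `strlen` performs the same count.

```c
#include <stdint.h>

/* Count the characters before the NUL (0x00) that ends the string.
   pt points to the first byte of the string. */
uint32_t string_length(const char *pt) {
  uint32_t count = 0;
  while (pt[count] != 0) {
    count++;
  }
  return count;
}

/* "Valvano" stored explicitly as its 8 ASCII bytes, including the null. */
const char Name[8] = {0x56, 0x61, 0x6C, 0x76, 0x61, 0x6E, 0x6F, 0x00};
```

Here `string_length(Name)` counts 7 characters, one fewer than the 8 bytes stored, because the terminating null is not part of the text.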

1.4. ARM Cortex-M0+

Table 1.4.1 lists general observations when deciding whether to classify a computer as a complex instruction set computer (CISC) or a reduced instruction set computer (RISC). In reality, there is a wide range of architectures, and these architectures exist in the spectrum ranging from completely CISC to completely RISC. Examples of CISC include Intel x86 and NXP 9S12. Examples of RISC include LC3, MIPS, AVR (Atmel), PowerPC (IBM), SPARC (Sun), MSP430 (TI), and Cortex-M (ARM). The ARM company name originally began as Acorn RISC Machine, changed to Advanced RISC Machines, and now the ARM company name is not an acronym any more; the company name is simply ARM.

CISC                                            RISC
Many instructions                               Few instructions
Instructions have varying lengths               Instructions have fixed lengths
Instructions execute in varying times           Instructions execute in 1 or 2 bus cycles
Many instructions can access memory             Few instructions (load, store) can access memory
Can read and write memory in one instruction    Cannot read and write memory in one instruction
Fewer and more specialized registers            Many identical general purpose registers
Many different types of addressing modes        Limited number of addressing modes

Table 1.4.1. General characteristics of CISC and RISC architectures.

In a CISC computer, the complexity is embedded in the processor. In a RISC computer, the complexity exists in the assembly code generated by the programmer or the compiler. RISC computers can be designed for low power because of the simplicity of the architecture (e.g., MSPM0+). It is very difficult to compare the execution speed of two computers, especially between a CISC and a RISC. One way to compare is to run the same benchmark program on both, and measure the time it takes to execute.

Time to execute benchmark = (Instructions/program) * (Average cycles/instruction) * (Seconds/cycle)

The 80 MHz ARM Cortex-M has one bus cycle every 12.5 ns. Most instructions take 1 to 2 bus cycles to execute. At 80 MHz, the Cortex-M can execute between 40,000,000 and 80,000,000 assembly instructions per second.
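The benchmark equation can be evaluated numerically. This is a sketch only: the function name `execution_time` and the example numbers (one million instructions averaging 1.5 cycles each) are assumptions chosen for illustration, not measurements.

```c
/* Time to execute = instructions * average cycles per instruction
                     * seconds per cycle,
   where seconds per cycle is the reciprocal of the bus frequency in Hz. */
double execution_time(double instructions, double cycles_per_instruction,
                      double bus_hz) {
  double seconds_per_cycle = 1.0 / bus_hz;
  return instructions * cycles_per_instruction * seconds_per_cycle;
}
```

For example, `execution_time(1000000.0, 1.5, 80000000.0)` models a hypothetical program of one million instructions on an 80 MHz bus, which takes 1,500,000 cycles at 12.5 ns each, or about 18.75 ms.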

1.4.1. Registers

Registers are high-speed storage inside the processor. The registers are depicted in Figure 1.4.1. R0 to R12 are general purpose registers and contain either data or addresses.

Function input parameters: The ARM Architecture Procedure Call Standard, AAPCS, requires us to use registers R0, R1, R2, and R3 to pass input parameters into a function. R0 will be the first parameter, R1 the second, etc.

Preserved registers: According to AAPCS, functions must preserve the values of registers R4-R11. In other words, functions can use R4-R11, but they must push the previous values on the stack at the beginning of the function and pop the values off the stack at the end of the function. This way the values in R4-R11 are preserved when one function calls another. Conversely, registers R0, R1, R2, R3, and R12 need not be preserved.

Function output parameter: Also, according to AAPCS we place the return parameter in Register R0, if needed.

Stack: Register R13 (also called the stack pointer, SP) points to the top element of the stack. The push operation stores data onto the stack, and the pop operation removes data from the stack. The stack operates in a last in first out (LIFO) manner. Since we push and pop only 32-bit data, the SP will always be word-aligned. A word-aligned address is divisible by 4, meaning the bottom two bits will be 0.

Subroutine linkage: Register R14 (also called the link register, LR) is used to store the return location for functions. If a function A calls a function B, the function A must save (push on the stack) the LR. Before returning, the function A must restore the LR (pop from the stack). The LR is also used in a special way during exceptions, such as interrupts. Interrupts are introduced in Chapter 5.

Program counter: Register R15 (also called the program counter, PC) points to the next instruction to be fetched from ROM. The processor fetches an instruction using the PC, and then increases the PC by 2 or 4, since instructions are 16 or 32 bits wide. The PC is always halfword-aligned. A halfword-aligned address is divisible by 2, meaning the bottom bit will be 0.

Program Status Register: The PSR contains the condition code bits, the exception number, and the T bit. The condition code bits specify whether the previous result is zero (Z bit), negative (N bit), unsigned overflow (C bit) or signed overflow (V bit). The exception number specifies which interrupt is being processed. The T bit will always equal 1 for the Cortex-M, signifying Thumb mode. The LR bit 0 will contain the T bit when a function is called and when the function returns, so you will notice LR will always be odd (bit 0 will always be set).
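The alignment rules for the SP and PC reduce to tests on the bottom bits of an address. A minimal sketch follows, with hypothetical function names; addresses are modeled as plain 32-bit integers.

```c
#include <stdint.h>
#include <stdbool.h>

/* A word-aligned address is divisible by 4: the bottom two bits are 0. */
bool is_word_aligned(uint32_t address) {
  return (address & 0x3) == 0;
}

/* A halfword-aligned address is divisible by 2: the bottom bit is 0. */
bool is_halfword_aligned(uint32_t address) {
  return (address & 0x1) == 0;
}
```

For example, 0x20200850 passes both tests, while 0x20200852 is halfword-aligned but not word-aligned.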

[Figure 1.4.1. General purpose registers (R0, R1, R2, ...) and special registers: PSR (program status register), PRIMASK (exception mask register), and CONTROL (CONTROL register).]

Control Register: There are two stack pointers: the main stack pointer (MSP) is used when running in a secure setting; the Process Stack Pointer (PSP) is used when running in an unsecure setting. The control register specifies which stack pointer will be used. We always use the MSP.

There are three status registers named Application Program Status Register (APSR), the Interrupt Program Status Register (IPSR), and the Execution Program Status Register (EPSR) as shown in Figure 1.4.2. These registers can be accessed individually or in combination as the Program Status Register (PSR). The N, Z, V, and C bits give information about the result of a previous ALU operation. In general, the N bit is set after an arithmetical or logical operation signifying whether or not the result is negative. Similarly, the Z bit is set if the result is zero. The C bit means carry and is set on an unsigned overflow, and the V bit signifies signed overflow.

Register   Bits 31-28    Bit 24   Bits 5-0       Other bits
APSR       N, Z, C, V                            Reserved
IPSR                              ISR_NUMBER     Reserved
EPSR                     T                       Reserved
PSR        N, Z, C, V    T        ISR_NUMBER     Reserved

Figure 1.4.2. The program status register of the ARM Cortex-M0+ processor.

The ISR_NUMBER indicates which interrupt, if any, the processor is handling. Bit 0 of the special register PRIMASK is the interrupt mask bit. If this bit is 1, most interrupts and exceptions are postponed. If the bit is 0, then interrupts are allowed.
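The fields of the PSR can be extracted from a 32-bit value with shifts and masks, using the bit positions in Figure 1.4.2 (N, Z, C, V in bits 31 to 28, T in bit 24, ISR_NUMBER in bits 5 to 0). This sketch operates on an ordinary integer; the type `psr_fields_t` and function `decode_psr` are hypothetical, and reading the actual register requires a special instruction that is not shown here.

```c
#include <stdint.h>

typedef struct {
  uint8_t n;           /* bit 31: result was negative */
  uint8_t z;           /* bit 30: result was zero */
  uint8_t c;           /* bit 29: carry, unsigned overflow */
  uint8_t v;           /* bit 28: signed overflow */
  uint8_t t;           /* bit 24: Thumb bit */
  uint8_t isr_number;  /* bits 5-0: interrupt being handled, if any */
} psr_fields_t;

/* Split a 32-bit PSR value into its fields. */
psr_fields_t decode_psr(uint32_t psr) {
  psr_fields_t f;
  f.n = (uint8_t)((psr >> 31) & 1);
  f.z = (uint8_t)((psr >> 30) & 1);
  f.c = (uint8_t)((psr >> 29) & 1);
  f.v = (uint8_t)((psr >> 28) & 1);
  f.t = (uint8_t)((psr >> 24) & 1);
  f.isr_number = (uint8_t)(psr & 0x3F);
  return f;
}
```

For example, the value 0x61000000 decodes to Z=1, C=1, T=1 with N, V, and ISR_NUMBER all zero.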

1.4.2. Reset

A reset occurs immediately after power is applied. There is also a reset button on most systems, connected to the reset signal on the chip, that can also be used to trigger a reset. After a reset, the processor is in thread mode, running at a privileged level, and using the MSP stack pointer. The 32-bit value at flash ROM location 0 is loaded into the SP. All stack accesses are word-aligned. Thus, the least significant two bits of SP must be 0. A reset also loads the 32-bit value at location 4 into the PC. This value is called the reset vector. All instructions are halfword-aligned. Thus, the least significant bit of PC must be 0. On the ARM Cortex-M processor, the T bit should always be set to 1. On reset, the processor initializes:

  The SP to the 32-bit initial value in ROM locations 0-3
  The PC to the 32-bit reset vector in ROM locations 4-7
  The LR to 0xFFFFFFFF
  The T bit to 1

The reset event does not clear or initialize RAM. Information in RAM will be garbage. However, the C compiler will add software that runs on reset that will initialize all RAM variables.


1.4.3. Memory

Microcontrollers differ by the amount of memory and by the types of I/O modules. The memory map of the MSPM0G3507 is illustrated in Figure 1.4.3. All Cortex-M microcontrollers have similar memory maps. Flash ROM begins at address 0x0000.0000, RAM begins at 0x2020.0000, the peripheral I/O space begins at 0x4000.0000, and internal I/O begins at 0xE000.0000. There are thousands of Cortex-M microcontrollers and the main differences are in their I/O ports.

Region           Addresses                    Contents
128k Flash ROM   0x0000.0000 to 0x0001.FFFF   Initial SP, reset vector, interrupt vectors, code, constants
32k RAM          0x2020.0000 to 0x2020.7FFF   Global variables; stack, local variables; heap, temporary data
I/O ports        starting at 0x4000.0000
Internal I/O

Figure 1.4.3. Memory map of the MSPM0G3507.
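The memory map can be expressed as a lookup that classifies an address by region. This is only a sketch under stated assumptions: the names region_t and classify_address are hypothetical, the Flash and RAM limits come from Figure 1.4.3, and the upper limits of the two I/O regions are assumptions because the text gives only their starting addresses.

```c
#include <stdint.h>

// Hypothetical region labels for the MSPM0G3507 memory map.
typedef enum {
    REGION_FLASH,        // 0x0000.0000 to 0x0001.FFFF
    REGION_RAM,          // 0x2020.0000 to 0x2020.7FFF
    REGION_PERIPHERAL,   // begins at 0x4000.0000 (upper limit assumed)
    REGION_INTERNAL_IO,  // begins at 0xE000.0000 (upper limit assumed)
    REGION_OTHER         // anything not covered above
} region_t;

region_t classify_address(uint32_t addr) {
    if (addr <= 0x0001FFFFu) return REGION_FLASH;
    if (addr >= 0x20200000u && addr <= 0x20207FFFu) return REGION_RAM;
    if (addr >= 0xE0000000u) return REGION_INTERNAL_IO;
    if (addr >= 0x40000000u) return REGION_PERIPHERAL;
    return REGION_OTHER;
}
```

Note that the dotted notation used in the text (0x2020.0000) is written without the dot in actual software (0x20200000).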

We will put variables in RAM and constants in ROM. Global variables exist forever in RAM, and local variables exist temporarily on the stack or in registers. The heap is a section of RAM that software can use to allocate (malloc), use, and then release (free).

When we store 16-bit data into memory it requires two bytes. Since the memory systems on most computers are byte addressable (a unique address for each byte), there are two possible ways to store in memory the two bytes that constitute the 16-bit data. Some NXP microcontrollers implement the big endian approach that stores the most significant byte at the lower address. Intel microcomputers implement the little endian approach that stores the least significant byte at the lower address. Cortex-M microcontrollers use the little endian format. Many ARM processors are biendian, because they can be configured to efficiently handle both big and little endian data. Instruction fetches on the ARM are always little endian. Figure 1.4.4 shows two ways to store the 16-bit number 1000 (0x03E8) at locations 0x2020.0850 and 0x2020.0851.

Big Endian                  Little Endian
Address       Data          Address       Data
0x2020.0850   0x03          0x2020.0850   0xE8
0x2020.0851   0xE8          0x2020.0851   0x03

Figure 1.4.4. Example of big and little endian formats of a 16-bit number.

Figure 1.4.5 shows the big and little endian formats that could be used to store the 32-bit number 0x12345678 at locations 0x2020.0850 through 0x2020.0853. Again the Cortex-M uses little endian for 32-bit numbers.


Big Endian                  Little Endian
Address       Data          Address       Data
0x2020.0850   0x12          0x2020.0850   0x78
0x2020.0851   0x34          0x2020.0851   0x56
0x2020.0852   0x56          0x2020.0852   0x34
0x2020.0853   0x78          0x2020.0853   0x12

Figure 1.4.5. Example of big and little endian formats of a 32-bit number.

In the previous two examples, we normally would not pick out individual bytes (e.g., the 0x12), but rather capture the entire multiple-byte data as one nondivisible piece of information. On the other hand, if each byte in a multiple-byte data structure is individually addressable, then both the big and little endian schemes store the data in first to last sequence. For example, if we wish to store the ASCII string "ABC" at locations 0x2020.0850 through 0x2020.0853, then the ASCII 'A'=0x41 comes first in both big and little endian schemes, see Figure 1.4.6.

Address       Data
0x2020.0850   0x41
0x2020.0851   0x42
0x2020.0852   0x43
0x2020.0853   0x00
Big Endian and Little Endian

Figure 1.4.6. Character strings are stored in the same manner for both big and little endian formats.

The terms "big and little endian" come from Jonathan Swift's satire Gulliver's Travels. In Swift's book, a Big Endian refers to a person who cracks their egg on the big end. The Lilliputians were Little Endians because they insisted that the only proper way is to break an egg on the little end. The Lilliputians considered the Big Endians as inferiors. The Big and Little Endians fought a long and senseless war over the best way to crack an egg.

Common Error: An error will occur when data is stored in Big Endian by one computer and read in Little Endian format on another.
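The two byte orders, and the Common Error above, can be illustrated with a short C sketch. The function names here are hypothetical helpers, not part of any library; they build and read the byte sequences explicitly, so they behave the same regardless of the endianness of the machine running them.

```c
#include <stdint.h>

// Store a 32-bit value with the most significant byte at the lowest address.
void store32_big(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

// Store a 32-bit value with the least significant byte at the lowest address.
void store32_little(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

// Read four bytes assuming big endian order.
uint32_t load32_big(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

// Read four bytes assuming little endian order.
uint32_t load32_little(const uint8_t *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Storing 0x12345678 with store32_little and reading it back with load32_big returns 0x78563412, which is the mismatch the Common Error describes.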

1.5. Assembly Language

1.5.1. Syntax

We will use the TI Clang compiler in this book. Therefore, our assembly syntax will follow the armclang rules. For detailed information, search "Arm Compiler armclang Reference Guide" or see the armclang_reference_guide_100067_0612_00_en.pdf on the book website.


Assembly language instructions have four fields separated by spaces or tabs. The label field is optional and is used to identify the position in memory of the current instruction. A colon is required after every label, and you must choose a unique name for each label. The opcode field specifies the processor command to execute. The operand field specifies where to find the data to execute the instruction. Thumb instructions have 0, 1, 2, or 3 operands, separated by commas. The comment field is also optional and is ignored by the assembler, but it allows you to describe the software, making it easier to understand. You can add optional spaces between operands in the operand field. However, a // must separate the operand and comment fields.

    Label  Opcode  Operands    Comment
    Func:  MOVS    R0, #100    // this sets R0 to 100
           BX      LR          // this is a function return

Observation: A good comment explains why an operation is being performed, how it is used, how it can be changed, or how it was debugged. A bad comment explains what the operation does. The comments in the above two assembly lines are examples of bad comments.

It is much better to add comments to explain how or even better why we do the action. Good comments also describe how the code was tested and identify limitations. But for now, we are learning what the instruction is doing, so in this chapter comments will describe what the instruction does. The assembly source code is a text file (with file extension .s) containing a list of instructions. If register R0 is an input parameter, the following is a function that will return in register R0 the value (100*input+10).

    Func: MOVS R1,#100     // R1=100
          MULS R0,R0,R1    // R0=100*input
          ADDS R0,#10      // R0=100*input+10
          BX   LR          // return 100*input+10
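For comparison, the same calculation could be written in C roughly as follows. This is a sketch rather than the book's code; it assumes the input is treated as an unsigned 32-bit value, so that an overflowing product wraps modulo 2^32 in the same way the 32-bit register result does.

```c
#include <stdint.h>

// C counterpart of the assembly function Func: returns 100*input+10.
// Unsigned arithmetic wraps modulo 2^32, matching a 32-bit register.
uint32_t Func(uint32_t input) {
    return 100u * input + 10u;
}
```

Calling Func(5) returns 510.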

The assembler translates assembly source code into object code, which are the machine instructions executed by the processor. All object code is halfword-aligned. This means instructions can be 16 or 32 bits wide, and the program counter bit 0 will always be 0. When we build a project, all files are assembled or compiled and then linked together, and the linker decides exactly where in memory everything will be. After building the project, it can be downloaded, which stores the object code into flash ROM. For an embedded system, we place executable instructions into nonvolatile flash ROM. The linker creates a map file, showing you exactly where in memory your variables and labels exist. When we run code in the debugger we can also observe the Disassembly window, which contains the address, the object code, and assembly code.

    Address     Object code   Assembly code
    000000c0:   2164          movs r1, #0x64
    000000c2:   4348          muls r0, r1, r0
    000000c4:   300A          adds r0, #0xa
    000000c6:   4770          bx   r14


1.5.2. Addressing Modes

A fundamental issue in program development is the differentiation between data and address. When we put the number 1000 into Register R0, whether 1000 is data or an address depends on how the 1000 is used. To run efficiently, we try to keep frequently accessed information in registers. However, we need to access memory to fetch information or save results. The addressing mode is the format the instruction uses to specify the memory location to read or write data. The addressing mode is defined in the syntax of the operands. A single instruction could exercise multiple addressing modes, one for each of the operands. When the association is obvious, we will use the expression "the addressing mode of the instruction", rather than "the addressing mode of an operand in an instruction".

All instructions begin by fetching the machine instruction (op code and operand) pointed to by the PC. When extended with Thumb-2 technology, some machine instructions are 16 bits wide, while others are 32 bits. Some instructions operate completely within the processor and require no memory data fetches. For example, the ADDS R1, R2 instruction performs R1+R2 and stores the sum back into R1. If the data is found in the instruction itself, like MOVS R0, #1, the instruction uses immediate addressing mode. A register that contains the address or the location of the data is called a pointer or index register. Indexed addressing mode uses a register pointer to access memory. The addressing mode that uses the PC as the pointer is called PC-relative addressing mode. It is used for branching, for calling functions, and accessing constant data stored in ROM. The addressing mode is called PC-relative because the machine code contains the address difference between where the program is now and the address to which the program will access.

The MOVS instruction will move data within the processor without accessing memory. The LDR instruction will read a 32-bit word from memory and place the data in a register. With PC-relative addressing, the assembler automatically calculates the correct PC offset.

Register. Most instructions operate on the registers. In general, data flows towards the op code (right to left). In other words, the register closest to the op code gets the result of the operation. In each of these instructions, the result goes into R2.

    MOVS R2,R1       // put a copy of R1 into R2
    LDR  R2,[R1]     // R2= 32-bit value pointed to by R1
    ADDS R2,R0       // R2= R2+R0
    ADDS R2,R0,R1    // R2= R0+R1

Register list. The stack push and stack pop instructions can operate on one register or on a list of registers. SP is the same as R13, LR is the same as R14, and PC is the same as R15. The order of the register list does not matter. These register lists are the same: {R1,R3,R6,R7}, {R7,R6,R3,R1}, and {R1,R7,R6,R3}. Data are stored on the stack with the smaller register number associated with the smaller memory address. The dash provides a short-cut: {R1-R4} is the same as {R1,R2,R3,R4}.

    PUSH {LR}          // save LR on stack
    POP  {PC}          // remove from stack and place in PC
    PUSH {R1-R4,LR}    // save R1,R2,R3,R4, and link register
    POP  {R0-R3,PC}    // restore R0,R1,R2,R3 and PC


Immediate addressing. With immediate addressing mode, the data itself is contained in the instruction. Once the instruction is fetched, no additional memory access cycles are required to get the data. Notice the number 100 (0x64) is embedded in the machine code of the instruction shown in Figure 1.5.1. Immediate addressing is only used to get data. It will never be used with an instruction that stores to memory.

Figure 1.5.1. An example of immediate addressing mode, data is in the instruction.

Indexed addressing. With indexed addressing mode, the data is in memory and a register will contain a pointer to the data. Once the instruction is fetched, one additional memory access cycle is required to read or write the data. In these examples, R1 is an address and R2 is an offset. In these examples, R0 gets the data, and R1 and R2 are unchanged.

    LDR R0,[R1]      // R0= 32-bit value pointed to by R1
    LDR R0,[R1,#4]   // R0= 32-bit value pointed to by R1+4
    LDR R0,[R1,R2]   // R0= 32-bit value pointed to by R1+R2

In Figure 1.5.2, the instruction LDR R0, [R1] will read the 32-bit value pointed to by R1 and place it in R0. R1 could be pointing to any valid object in the memory map (i.e., RAM, ROM, or I/O), and R1 is not modified by this instruction.

Figure 1.5.2. An example of indexed addressing mode, data is in memory.

In Figure 1.5.3, the instruction LDR R0, [R1, #4] will read the 32-bit value pointed to by R1+4 and place it in R0. Even though the memory address is calculated as R1+4, the Register R1 itself is not modified by this instruction.


Figure 1.5.3. An example of indexed addressing mode with offset, data is in memory.
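The three indexed forms of LDR correspond closely to pointer operations in C. The sketch below is illustrative only, with hypothetical function names; each function takes the place of one instruction, with the parameter r1 playing the role of the pointer in R1 and r2 the byte offset in R2.

```c
#include <stdint.h>
#include <string.h>

// LDR R0,[R1]: read the 32-bit word the pointer refers to.
uint32_t load_word(const uint32_t *r1) {
    return *r1;
}

// LDR R0,[R1,#4]: a byte offset of 4 is one 32-bit word past the pointer.
uint32_t load_word_offset4(const uint32_t *r1) {
    return r1[1];
}

// LDR R0,[R1,R2]: the byte offset comes from a second variable.
// memcpy is used so the read is well defined for any byte offset in C.
uint32_t load_word_indexed(const uint32_t *r1, uint32_t r2) {
    uint32_t value;
    memcpy(&value, (const uint8_t *)r1 + r2, sizeof value);
    return value;
}
```

In all three functions the pointer argument is passed by value and is not modified, just as R1 is unchanged by the instructions.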

PC-relative addressing. PC-relative addressing is indexed addressing mode using the PC as the pointer. The PC always points to the instruction that will be fetched next, so changing the PC will cause the program to branch. A simple example of PC-relative addressing is the unconditional branch. In assembly language, we simply specify the label to which we wish to jump, and the assembler encodes the instruction with the appropriate PC-relative offset.

    B    Location    // jump to Location, PC-relative addressing

PC-relative addressing mode is also used for a function call. Upon executing the BL instruction, the return address is saved in the link register (LR). In assembly language, we simply specify the label defining the start of the function, and the assembler creates the appropriate PC-relative offset.

    BL   Subroutine  // call Subroutine, PC-relative addressing

Typically, it takes two instructions to access data in RAM or I/O. The first instruction uses PC-relative addressing to create a pointer to the object, and the second instruction accesses the memory using the pointer. We can use the =Something operand for any symbol defined by our program. In this case, Count is the label defining a 32-bit variable in RAM.

    MOVS R0,#100
    LDR  R1,=Count   // R1 points to variable Count, PC-relative
    STR  R0,[R1]     // store 100 into Count

The operation caused by the above three instructions is illustrated in Figure 1.5.4. Assume a 32-bit variable Count is located in the data space at RAM address 0x2020.0000. First, the MOVS instruction places 100 in R0. Second, the LDR R1, =Count makes R1 equal to 0x2020.0000, i.e., R1 points to Count. When the LDR instruction is being executed, the PC is pointing to the next instruction, the STR, i.e., PC = 0x000000CC. The assembler places a constant 0x2020.0000 in code space and translates the =Count into the correct PC-relative access to the constant (e.g., LDR R1, [PC, #0x025C]). In this case, the constant 0x2020.0000, the address of Count, will be located at 0x0328, which is 0x00CC+0x025C. Third, the STR R0, [R1] instruction will dereference this pointer, storing the 32-bit value 100 into location 0x2020.0000.
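The address arithmetic in this example can be checked with a few lines of C. This is a sketch with a hypothetical helper name, and it relies on one assumption stated here rather than in the text: for a PC-relative load in Thumb code, the PC value used is the address of the LDR instruction plus 4, rounded down to a multiple of 4.

```c
#include <stdint.h>

// Compute where a PC-relative LDR finds its constant.
// instr_addr: address of the LDR instruction itself
// offset:     the immediate offset encoded in the instruction
uint32_t literal_address(uint32_t instr_addr, uint32_t offset) {
    uint32_t pc = (instr_addr + 4u) & ~UINT32_C(3);  // assumed PC rule
    return pc + offset;
}
```

With an LDR located at either 0x000000C8 or 0x000000CA and an offset of 0x025C, the PC value is 0x000000CC and the constant is found at 0x00000328, in agreement with 0x00CC+0x025C above.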


[Figure: ROM holds the machine codes 2064, 4997, and 6008 for MOVS R0,#100; LDR R1,[PC,#0x025C]; and STR R0,[R1]. The constant 0x20200000 is stored in ROM at 0x00000328 (0x0328=0x00CC+0x025C). PC is 0x000000CC, R0 is 0x00000064, and R1 is 0x20200000, pointing to Count in RAM at 0x20200000.]

Figure 1.5.4. Indexed addressing using R1 as a register pointer to access memory. 100 is moved into the variable Count. Code space is where we place programs and data space is where we place variables.

Observation: This book sometimes adds a dot in the middle of 32-bit hexadecimal numbers (e.g., 0x2020.0000). This dot helps the reader visualize the number. However, this dot should not be used when writing actual software.

Checkpoint 1.5.1: What is the addressing mode used for?

Checkpoint 1.5.2: Assume R3 equals 0x2020.0000 at the time LDR R2, [R3, #8] is executed. What address will be accessed? If R3 is changed, to what value will R3 become?

Checkpoint 1.5.3: Assume R3 equals 0x2020.0000 and R1 is 8 at the time LDR R2, [R1, R3] is executed. What address will be accessed? If R1 and R3 are changed, to what values will they become?

1.5.3. Data Access Instructions

This section presents mechanisms to read from and write to memory. Because I/O ports exist at specific addresses in the same memory map as ROM and RAM, we access I/O ports using the same mechanisms as we access memory. As illustrated in Figure 1.5.4, to access memory we first establish a pointer to the object, then use indexed addressing. RAM is where we place global variables. There are four types of memory objects, and typically we use a specific register to access them.

    Memory object type             Register used to access   Example
    Constants in code space        PC                        LDR R0,=Constant
    Local variables on the stack   SP                        LDR R0,[SP,#0x04]
    Global variables in RAM        R0-R7                     LDR R0,[R1]
    I/O ports                      R0-R7                     LDR R0,[R1]




An aligned access is an operation where a word-aligned address is used for a 32-bit word, and where a halfword-aligned address is used for a 16-bit halfword access. Byte accesses are always aligned. The address of an aligned word access will have its bottom two bits equal to zero. The address of an aligned halfword access will have its bottom bit equal to zero. An unaligned word access means we are accessing a 32-bit object (4 bytes) but the address is not evenly divisible by 4. An unaligned halfword access means we are accessing a 16-bit object (2 bytes) but the address is not evenly divisible by 2. Unaligned accesses will cause a hard-fault on the Cortex-M0+.

When data exist in registers, it always occupies the entire 32-bit register. However, data in memory could exist as 8-bit, 16-bit, or 32-bit values. Loading 8-bit or 16-bit data from memory into a 32-bit register is called promotion. Storing 32-bit data from a register into an 8-bit or 16-bit memory cell is called demotion. Demotion simply discards the extra bits. When reading 8-bit and 16-bit data, we must also know if the values are signed or unsigned. The available LDR and STR instructions are listed in Table 1.5.1. The type determines how the data is promoted or demoted. When we load an 8-bit or 16-bit unsigned value into a register, the extra most significant bits are filled with 0, called zero pad. When we load an 8-bit or 16-bit signed value into a register, the sign bit of the value is filled into the extra most significant bits, called sign extension. This way, if we load an 8-bit -10 (0xF6) into a 32-bit register, we get the 32-bit -10 (0xFFFF.FFF6). Errors can occur during demotion, when we store a 32-bit register into an 8-bit or 16-bit memory variable, because only the least significant bits are stored.

    Instruction   Data type                  Meaning
    LDR           32-bit word                0 to 4,294,967,295 or -2,147,483,648 to +2,147,483,647
    LDRB          Unsigned 8-bit byte        0 to 255, zero pad to 32 bits on load
    LDRSB         Signed 8-bit byte          -128 to +127, sign extend to 32 bits on load
    LDRH          Unsigned 16-bit halfword   0 to 65535, zero pad to 32 bits on load
    LDRSH         Signed 16-bit halfword     -32768 to +32767, sign extend to 32 bits on load
    STR           32-bit word                0 to 4,294,967,295 or -2,147,483,648 to +2,147,483,647
    STRB          Unsigned 8-bit byte        0 to 255, discard top 24 bits
    STRH          Unsigned 16-bit halfword   0 to 65535, discard top 16 bits

Table 1.5.1. Memory access instructions.
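Promotion and demotion have direct counterparts in C's fixed-width integer conversions. The following sketch uses hypothetical function names to mirror the loads and stores in Table 1.5.1; it is meant only to show the effect on the values, not how a compiler implements them.

```c
#include <stdint.h>

// Promotion of unsigned data (like LDRB, LDRH): upper bits become 0.
uint32_t promote_u8(uint8_t v)   { return v; }
uint32_t promote_u16(uint16_t v) { return v; }

// Promotion of signed data (like LDRSB, LDRSH): the sign bit is copied
// into the upper bits, so the numeric value is preserved.
int32_t promote_s8(int8_t v)     { return v; }
int32_t promote_s16(int16_t v)   { return v; }

// Demotion (like STRB, STRH): only the least significant bits survive.
uint8_t  demote_to_8(uint32_t r)  { return (uint8_t)r; }
uint16_t demote_to_16(uint32_t r) { return (uint16_t)r; }
```

For example, demoting 300 (0x12C) to 8 bits leaves 44 (0x2C), which is the kind of error the text warns about when the register holds more than 8 bits of data.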

Note that there is no such thing as a signed or unsigned store. For example, there is no STRSH; there is only STRH. This is because 8, 16, or all 32 bits of the register are stored to an 8-, 16-, or 32-bit location, respectively. Demotion is simply storing the least significant bits. This means that the value stored to memory can be different from the value located in the register. When using STRB to store an 8-bit number, be sure that the number in the register is 8 bits or less. The move instructions get their data from the machine instruction or from within the processor and do not require additional memory access instructions. See Appendix 3 for more details.

    MOVS Rd, Rs       // set Rd equal to the value in Rs, set NZCV
    MOVS Rd, #im8     // set Rd equal to im8, 0 to 255, set NZCV
    MOV  Rd2, Rs2     // set Rd2 equal to the value in Rs2
    LDR  Rd, =const   // set Rd equal to any constant value


Performance Tip: See Appendix 3 for which registers and which addressing modes are available for each instruction, and how condition code bits are set.

We use the LDR instruction to load data from memory to a register and the STR instruction to store data from a register to memory. In real life, when we move a box to the basement, push a broom across the floor, load clothes into a washing machine, store spoons in a drawer, pop a candy into our mouth, or transfer employees to a new location, there is a physical object and the action changes the location of that object. Assembly languages use these same verbs, but the action will be different. In most cases, it creates a copy of the data and places the copy at the new location. In other words, since the original data still exists in the previous location, there are now two copies of the information. The exception to this memory-access-creates-two-copies rule is a stack pop. When we pop data from the stack, it no longer exists on the stack, leaving us just one copy. For example, in Figure 1.5.4, the instruction STR R0, [R1] stores the contents of R0 into the variable Count. At this point, there are two copies of the data, the original in R0 and the copy in RAM. If we next add 1 to R0, the two copies have different values. When we learn about interrupts in Chapter 5, we will take special care to handle shared information stored in global RAM, making sure we access the proper copy.

1.5.4. Logical Operations

Software uses logical and shift operations to combine information, to extract information, and to test information. A unary operation produces its result given a single input parameter. Examples of unary operations include negate, complement, increment, and decrement. In discrete digital logic, the complement operation is called a NOT gate, shown in Appendix 5. The complement function is defined in Table 1.5.2.

    A    ~A
    0    1
    1    0

Table 1.5.2. Logical complement.

When designing digital logic we use gates, such as NOT, AND, and OR, to convert individual input signals into individual output signals. However, when writing software using logic functions, we take two 32-bit numbers and perform 32 logic operations at the same time in a bit-wise fashion, yielding one 32-bit result. Boolean logic has two states: true and false. False is 0, and the true state is any nonzero value. In C, we use the Boolean operators && for AND, || for OR, and ! for NOT.

A binary operation produces a single result given two inputs. The logical and (&) operation yields a true result if both input bits are true. The logical or (|) operation yields a true result if either input bit is true. The exclusive or (^) operation yields a true result if exactly one input bit is true. The logical operators are summarized in Table 1.5.3 and shown as digital gates in Appendix 5. The logical instructions on the ARM Cortex-M processor take two 32-bit inputs, one from register Rdn and the other from register Rm. These operations are performed in a bitwise fashion on two 32-bit input parameters yielding one 32-bit output result. The result is stored into the register Rdn. For example, the calculation r=m&n means each bit is calculated separately, r31=m31&n31, r30=m30&n30, ..., r0=m0&n0.

In C, when we write r=m&n; r=m|n; r=m^n; the logical operation occurs in a bit-wise fashion as described by Table 1.5.3. However, in C, we define the Boolean functions as r=m&&n; r=m||n; For Booleans, the operation occurs in a word-wise fashion. For example, r=m&&n; means r will become zero if either m is zero or n is zero. Conversely, with r=m&&n, r will become nonzero (1) if both m is nonzero and n is nonzero.

    A     B     A&B   A|B   A^B   A&(~B)   A|(~B)
    Rdn   Rm    AND   ORR   EOR   BIC      ORN
    0     0     0     0     0     0        1
    0     1     0     1     1     0        0
    1     0     0     1     1     1        1
    1     1     1     1     0     0        1

Table 1.5.3. Logical operations performed by the Cortex-M processor.
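The difference between bit-wise and Boolean operators is easy to see with two values that have no bits in common. The sketch below wraps each operator in a small function; the function names are hypothetical and exist only to make the comparison explicit.

```c
#include <stdint.h>

// Bit-wise operators: each of the 32 bit positions is combined separately.
uint32_t bit_and(uint32_t m, uint32_t n) { return m & n; }
uint32_t bit_or(uint32_t m, uint32_t n)  { return m | n; }
uint32_t bit_xor(uint32_t m, uint32_t n) { return m ^ n; }

// Boolean operators: each whole word is first treated as true (nonzero)
// or false (zero), and the result is always 1 or 0.
uint32_t bool_and(uint32_t m, uint32_t n) { return m && n; }
uint32_t bool_or(uint32_t m, uint32_t n)  { return m || n; }
```

With m=0x0F and n=0xF0, the bit-wise AND is 0 because no bit position is 1 in both inputs, while the Boolean AND is 1 because both inputs are nonzero.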

The register Rm contains data for the logical operation. The first four instructions use Rdn as both source data and a destination for the result. If the middle register is omitted, it is assumed to be Rdn. Logical instructions set the N and Z condition code bits based on the result of the operation, and they will leave the C and V bits unchanged.

    ANDS Rdn, Rdn, Rm   // Rdn = Rdn&Rm
    ORRS Rdn, Rdn, Rm   // Rdn = Rdn|Rm
    EORS Rdn, Rdn, Rm   // Rdn = Rdn^Rm
    BICS Rdn, Rdn, Rm   // Rdn = Rdn&(~Rm)
    MVNS Rd, Rm         // Rd = ~Rm, logical NOT

For example, assume R1 is 0x12345678 and R2 is 0x87654321. The ORRS R1, R1, R2 will perform this operation, placing the 0x97755779 result in R1.

    R1    0001 0010 0011 0100 0101 0110 0111 1000
    R2    1000 0111 0110 0101 0100 0011 0010 0001
    ORR   1001 0111 0111 0101 0101 0111 0111 1001

Example 1.5.1: Write code to set bit 3 in a 32-bit variable called N.

Solution: First, we perform a 32-bit read, bringing N into Register R0. Second, we perform a logical OR setting bit 3 (mask 0x08), and lastly, we store the result back into N.

    LDR  R1, =N       // R1 = &N (R1 points to N)
    LDR  R0, [R1]     // R0 = N
    MOVS R2, #8
    ORRS R0, R0, R2   // R0 = N|8
    STR  R0, [R1]     // N = N|8

    // C implementation
    N = N | 0x00000008;

Program 1.5.1. Code to set bit 3 using logical OR.

Checkpoint 1.5.4: If R1 is 0x12345678 and R2 is 0x87654321, what would R1 be after the instruction ANDS R1, R1, R2 is executed? What would R1 be after EORS R1, R1, R2?


Checkpoint 1.5.5: Change Program 1.5.1 so it clears bit 3 of the 32-bit variable N.

Observation: We use the logical OR to make bits become one, and we use the logical AND to make bits become zero.
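The observation about OR and AND generalizes to any bit position. The helpers below are a hypothetical sketch, written as pure functions on a value rather than on a specific variable, that build a one-bit mask from a bit number and then apply it.

```c
#include <assert.h>
#include <stdint.h>

// Build a mask with a single 1 in the given bit position (0 to 31).
uint32_t bit_mask(unsigned bit) {
    assert(bit < 32);
    return UINT32_C(1) << bit;
}

// OR with the mask forces the selected bit to one.
uint32_t set_bit(uint32_t value, unsigned bit) {
    return value | bit_mask(bit);
}

// AND with the complement of the mask forces the selected bit to zero.
uint32_t clear_bit(uint32_t value, unsigned bit) {
    return value & ~bit_mask(bit);
}

// AND with the mask isolates the selected bit; the result is 0 or 1.
uint32_t test_bit(uint32_t value, unsigned bit) {
    return (value & bit_mask(bit)) != 0;
}
```

All other bits of the value pass through unchanged, which is why these masking operations are used for variables and registers that hold several independent fields.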

1.5.5. Shift Operations

In both assembly and C, the shift operation takes two input parameters and yields one output result. In C, the left shift operator is << and the right shift operator is >>. E.g., to left shift the value in M by N bits and store the result in R we execute: R = M<<N.

The logical shift right (LSR) is similar to an unsigned divide by 2^n, where n is the number of bits shifted as shown in Figure 1.5.5. A zero is shifted into the most significant position, and the carry flag will hold the last bit shifted out. The right shift operations do not round. For example, a right shift by 3 bits is similar to divide by 8. However, 15 right-shifted three times (15>>3) is 1, while 15/8 is much closer to 2. Other than the last bit that goes into the carry, LSR discards bits that were shifted out.

The arithmetic shift right (ASR) is similar to a signed divide by 2^n. Notice that the sign bit is preserved, and the carry flag will hold the last bit shifted out. This right shift operation also does not round. Again, a right shift by 3 bits is similar to divide by 8. For example, -9 right-shifted three times (-9>>3) is -2. Other than the last bit that goes into the carry, ASR discards bits that were shifted out.

The logical shift left (LSL) operation works for both unsigned and signed multiply by 2^n. A zero is shifted into the least significant position, and the carry bit will contain the last bit that was shifted out. Other than the last bit that goes into carry, LSL discards bits that were shifted out.

Figure 1.5.5. Shift operations. The three panels show Logical Shift Right (LSR), Arithmetic Shift Right (ASR), and Logical Shift Left (LSL) acting on bits 31 to 0, with the last bit shifted out going into the carry bit C.
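The three shifts can be modeled in C as follows. This is a sketch with hypothetical function names, and it ignores the carry bit. Because the C standard leaves the result of >> on a negative signed value up to the implementation, the arithmetic shift is written here in a form that does not depend on the compiler.

```c
#include <assert.h>
#include <stdint.h>

// Logical shift right: zeros enter at bit 31 (unsigned divide by 2^n).
uint32_t lsr32(uint32_t value, unsigned n) {
    assert(n < 32);
    return value >> n;
}

// Logical shift left: zeros enter at bit 0 (multiply by 2^n).
uint32_t lsl32(uint32_t value, unsigned n) {
    assert(n < 32);
    return value << n;
}

// Arithmetic shift right: copies of the sign bit enter at bit 31.
int32_t asr32(int32_t value, unsigned n) {
    assert(n < 32);
    uint32_t bits = (uint32_t)value;
    if (value >= 0) {
        return (int32_t)(bits >> n);
    }
    // For a negative value, ~bits equals -value-1, which is nonnegative.
    // Shifting it and mapping back fills the vacated bits with ones.
    return -(int32_t)(~bits >> n) - 1;
}
```

These reproduce the two examples in the text: lsr32(15, 3) is 1 and asr32(-9, 3) is -2.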

All shift instructions place the result into the destination register Rd. Rm is the register holding the value to be shifted. The number of bits to shift is either in register Rs, or specified as a constant n. All shift operations update the N and Z condition code bits based on the result of the operation. The C bit is the carry out after the shift as shown in Figure 1.5.5. These shift instructions will leave the V bit unchanged.

Observation: Use logical shift right for unsigned numbers and arithmetic shift right for signed numbers. Logical shift left is appropriate for both signed and unsigned numbers.

    LSRS Rd, Rd, Rs   // logical shift right Rd=Rd>>Rs (unsigned)
    LSRS Rd, Rm, #n   // logical shift right Rd=Rm>>n (unsigned)
    ASRS Rd, Rd, Rs   // arithmetic shift right Rd=Rd>>Rs (signed)
    ASRS Rd, Rm, #n   // arithmetic shift right Rd=Rm>>n (signed)
    LSLS Rd, Rd, Rs   // shift left Rd=Rd<<Rs
    LSLS Rd, Rm, #n   // shift left Rd=Rm<<n

    ...
    LDR  R2, =M       // R2 = &M (R2 points to M)
    STR  R0, [R2]     // M = N>>2

    // C implementation
    M = N>>2;

Program 1.5.2. Example code showing a right shift.

Example 1.5.3: Assume we have three 8-bit variables named High, Low, and Result. High and Low have 4 bits of data; each is a number from 0 to 15. Take these two 4-bit nibbles and combine them into one 8-bit value, storing the combination in Result.

Solution: The solution uses the shift operation to move the bits into position, then it uses the logical OR operation to combine the two parts into one number. This works only if both High and Low are bounded within the range of 0 to 15. The expression High ... SECCFG.PINCM to Mode=1.

Observation: The expression mixed-signal refers to a system with both analog and digital components. Notice how many I/O ports perform this analog-digital bridge: ADC, DAC, analog comparator, PWM, input capture, and analog amplifier.

Index 0 1 6 13 18 19 20 21 33 34 35 36 37 38 39 45 46 52 53 54 58 59 2 3 4 5 11 12 14 15 16 17 22 23 24 25 26 27 28 29 30 31 32 42 43 44 47 48 49 50 51 55 56 57

Mode1 PA0 PA1 PA2 PA7 PA8 PA9 PA10 PA11 PA12 PA13 PA14 PA15 PA16 PA17 PA18 PA21 PA22 PA23 PA24 PA25 PA26 PA27 PA28 PA29 PA30 PA31 PB0 PB1 PB2 PB3 PB4 PB5 PB6 PB7 PB8 PB9 PB10 PB11 PB12 PB13 PB14 PB15 PB16 PB17 PB18 PB19 PB20 PB21 PB22 PB23 PB24 PB25 PB26 PB27

Mode2 U0 TX U0 RX TG8 C I CO OUT U I TX U I RX U0 TX U0 RX U3 CT S U3 RT S U0 CTS U0 RTS C2 OUT U l TX U l RX U2 TX U2 RX U2 TX U2 RX U3 RX U3 TX RTC OUT U0 TX II SCL II SDA U0 RX U0 TX U0 RX U3 TX U3 RX U I TX Ul RX U l TX U I RX U I CT S U l RT S TG0 CO TG0 C I U3 TX U3 RX SPI CS3 U2 TX U2 RX U2 TX U2 RX C2 OUT SP0 CS2 SPI POCI SPI PI CO SPI SCK SP0 CS3 U0 CTS U0 RTS C2 OUT

Modc3 IO SD A IO SCL SP0 CSP0 CK OUT SP0 CSP0 SP0 PI CO SP0 POC I SP0 SCK SP0 SC K SP0 POCI SP0 PI CO SPI CS2 SPI POC I SPI SCK SPI PI CO TG8 CO T G8 C I SP0 CS3 SP0 CS2 SPI CS3 SPI CSP0 SPI CSP! IO SDA U2 RTS U2 CTS IO SCL SPI CS2 SPI CS3 U2 CT S U2 RTS U3 CT S U3 RTS SPI CS P0 SPI POCI SPI PICO SPI SCK TG8 CO TG8 C l TAO C2 TAO C3 SPI POC I SPI PICO SPI SCK SP0 PICO SP0 SCK SP0 POCI SPI CSP0 TG8 CO TGS C I CO OUT SP0 CS P! SP0 CS P0 SP0 CSPI SPI CS P!

Mode4 TAO CO TAO C I TG7 C I TG8 CO U0 RTS U0 CTS IO SDA ro SC L TG0 CO U3 RX U3 T X II SCL II SDA II SCL II SDA UI CT S U l RTS T A O C3 TAO C3N TG l 2 C l TG8 CO TG8 C I TAO C3 TG8 CO TG8 C l TAO C3N TA I CO TA I C l II SC L II SDA T A I CO TAI C I SP0 CSPI SP0 CS2 TAO CO TAO C l C l OUT CK OUT TA FALi TG l 2 CO SP0 CS3 U3 CTS U3 RTS SPI CS PI SPI CS2 TG8 C l TAO C2

TA FAL0 TAO C3 TA FAL2 T A O C3 TAO C3N

Mode5 TA FAL i TA FA L2 SPI CSP0 TAO C2 TAO CO TAO C l TA I CO TA I C l CAN T X TG0 C I TG l 2 CO TA I CO TA I C l TAO C3 TAO C3N TAO CO TAO C I TG0 CO TG 0 C I T AO C3 TA FAL0 TA FA L2 TA FAL0 TG6 CO TG6 C I TG l 2 C I TAO C2 TA O C2N T A O C3 TAO C3N TAO C2 TAO C2N TG8 CO TGS C I C I OUT TAO CON TG6 CO TG6 C l TAO C l TA O C l N TG l 2 C l TG8 CO TGS C l TA I CO TA I C I U0 CTS TG l 2 CO

M ode6 TG8 C I TG8 !D X

Mode7 FCC IN TG8 CO

Mode8

TG8 IDX T A I CON R OUT TG l 2 CO CO OUT T A O C3 CAN RX CK OUT TG8 IDX T A I C IN TG7 CO TG7 C l TG6 CO CK OUT U3 CTS U3 RTS TA O C I N CAN TX CAN RX TG7 CO

TG7 C l

TAO C I

TAO T AO T AO FCC TAO

CON C2 C2N IN C3N

TA I C I N II SDA II SCL

TA I TAO TA I TA I

CON C2N CO Cl

TAO C2 FCC IN

TAO CON TG7 CO T G7 C l

TG6 C l TG8 CO TA I C l

TG7 CO TG7 C l TA I CO

CK OUT

TG7 C I

TA I C l

UI CT S U I RTS TAI CON TA I C I N U2 CT S U2 RTS

TG6 CO TG6 C l

TA I co TA I C l

TG6 CO TG6 C l

TA I CON TAI C I N

TAO CO

TG8 IDX T G7 CO T G7 C I TA O C2 T AO C2N TG7 C I T A FALi

TA O C l

TG l 2 C l

TAO C I N

TA I CON

TG6 CO TG6 C I

TA I co TA I C l

Table 2.2.1. Bits 5:0 of IOMUX->SECCFG.PINCM specify the digital Mode for that pin.

TA I C I N
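Selecting a Mode amounts to writing a 6-bit field, bits 5:0, without disturbing the other bits. The sketch below shows that field update on an ordinary 32-bit value. The names PINCM_MODE_MASK, pincm_set_mode, and pincm_get_mode are hypothetical and are not TI definitions, and the code deliberately does not touch any hardware register because the meaning of the remaining bits is not covered in this section.

```c
#include <stdint.h>

// Bits 5:0 hold the digital Mode (hypothetical mask name).
#define PINCM_MODE_MASK UINT32_C(0x3F)

// Return a copy of pincm with bits 5:0 replaced by mode; other bits kept.
uint32_t pincm_set_mode(uint32_t pincm, uint32_t mode) {
    return (pincm & ~PINCM_MODE_MASK) | (mode & PINCM_MODE_MASK);
}

// Extract the Mode field from a value.
uint32_t pincm_get_mode(uint32_t pincm) {
    return pincm & PINCM_MODE_MASK;
}
```

The clear-then-OR pattern is the same combination of logical AND and logical OR used in Section 1.5.4.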


Pins on the MSPM0 family can be assigned to many different functions. The SPI is used to interface medium-speed I/O devices. Timers can create periodic interrupts, and we will use timers in Chapter 5 to run software tasks at a regular rate. Timer input can be used to measure period, pulse width, phase, and frequency. Timer output can create pulse width modulation (PWM) outputs, and will be used to apply variable power to lights or motors. In a typical motor controller, timer input measures rotational speed, and PWM output sets power. The DAC will be used to generate signals, such as sound (Chapter 5). In Chapter 6, we will use SPI to interface graphics displays (e.g., ST7735R and Crystalfontz CFAF128128B-0145T LCDs). The ADC will be used to measure the amplitude of analog signals and will be important in data acquisition systems (Chapter 7). The UART can be used for serial communication between computers. The UART allows for simultaneous communication in both directions, and we will present the UART in Chapter 8. I2C is a simple I/O bus that we will use to interface low-speed peripheral devices (e.g., OPT3001 optical sensor). The MSPM0G3507 has many analog features that will be explored in Volume 2. The CAN creates a high-speed communication channel between microcontrollers and is commonly found in automotive and other distributed control applications.

The MSPM0G3507 LaunchPad evaluation board is a low-cost development board; search for buying options at https://octopart.com/search?q=LP-MSPM0G3507. The kit provides an integrated XDS110 USB debug probe, which allows programming and debugging of the MSPM0G3507 microcontroller. Six of the MSPM0G3507 microcontroller pins were left off of Table 2.2.1 because they are used by the LaunchPad to perform necessary functions and thus are not available for us to use as I/O.

• PA3 LFXIN 32kHz crystal
• PA4 LFXOUT 32kHz crystal
• PA5 HFXIN 40MHz crystal
• PA6 HFXOUT 40MHz crystal
• PA19 SWDIO debugger
• PA20 SWCLK debugger

Figure 2.2.6 shows the LaunchPad connected to one switch and one LED located on an external solderless breadboard. When interfacing external circuits, we could connect male-to-male wires to the bottom of the LaunchPad, or we could connect male-to-female wires to the top of the LaunchPad. Figure 2.2.7 shows some of the circuits on the LaunchPad itself. The S3/Reset button will reset the chip. There are two switches, S1 and S2, on the sides of the board. There is one red LED and one RGB LED that the software can control. The LaunchPad has a lot of jumpers allowing you to configure the system for different applications. Each of the starter projects for the book specifies the jumper settings needed for that project. We will insert jumpers J4, J5, J6, J7, and J8 to activate the switches and LEDs on the LaunchPad. We place jumpers J26 and J27 in the XDS position to connect PA10 and PA11 to the serial port, allowing data to flow in the USB cable between the LaunchPad and the PC.

The LaunchPad has four 10-pin connectors, labeled as J1, J2, J3, and J4 in Figure 2.2.8, to which you can attach your external signals. The top side of these connectors has male pins, and the bottom side has female sockets. The intent is to stack boards together to make a layered system. Texas Instruments also produces BoosterPacks, which are pre-made external devices that will plug into this 40-pin connector. Figure 2.2.9 shows the Educational BoosterPack MK-II.

Checkpoint 2.2.3: How is the LaunchPad powered?

Figure 2.2.6. LaunchPad based on the MSPM0G3507, part #LP-MSPM0G3507.

[Circuit diagram: on the MSPM0G3507, PA11 and PA10 connect to the USB serial port, PA0 drives LED1, PA18 reads S1, PB21 reads S2, and PB22, PB26, and PB27 drive the RGB LED2 (19-337/R6GHBHC-A01/2T).]

Figure 2.2.7. Switch and LED interfaces on the LaunchPad Evaluation Board.

[Connector diagram: pin assignments for the four 10-pin connectors J1, J2, J3, and J4, including 3.3V, GND, RESET, and the Port A and Port B pins.]

Figure 2.2.8. Interface connectors on the MSPM0G3507 LaunchPad Evaluation Board.

Figure 2.2.9. The MSPM0G3507 LaunchPad connected to the Educational BoosterPack MKII, part number BOOSTXL-EDUMKII. It is running the starter code CrystalfontzLCD.

2.2.3. MSPM0G3507 GPIO Programming

Since we will be using both Ports A and B in most examples, we will activate them both once at the start of our project. The first step is a reset, and a reset should run exactly once. If you have multiple initializations, only the first one should reset. Program 2.2.3 will reset and activate the clocks for both Port A and Port B. For the reset operation, bits 31-24 of the RSTCTL are the unlock key and must be 0xB1. The sticky bit allows the software to know if the port has been reset. Setting bit 1 of the RSTCTL will clear the sticky bit, and setting bit 0 will perform the reset operation. We will not use the sticky bit, but we clear it anyway just in case we wish to add that feature later. For the activate operation, bits 31-24 of the PWREN are the unlock key and must be 0x26. Setting bit 0 will apply power to the port, and then we wait about 24 bus cycles.
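As a quick arithmetic check of the two constants used in this step, the following host-side sketch builds each 32-bit value from its unlock key and control bits. The names KEY_SHIFT, RSTCTL_KEY, PWREN_KEY, MakeRstctl, and MakePwren are made up for illustration and are not taken from any TI header; the code does not touch hardware.

```c
#include <stdint.h>

#define KEY_SHIFT   24      // unlock key occupies bits 31-24
#define RSTCTL_KEY  0xB1u   // key for the reset register
#define PWREN_KEY   0x26u   // key for the power enable register

// Combine the reset key with bit 1 (clear sticky bit) and bit 0 (reset).
uint32_t MakeRstctl(uint32_t clearSticky, uint32_t reset){
  return (RSTCTL_KEY << KEY_SHIFT) | ((clearSticky & 1u) << 1) | (reset & 1u);
}

// Combine the power key with bit 0 (apply power).
uint32_t MakePwren(uint32_t enable){
  return (PWREN_KEY << KEY_SHIFT) | (enable & 1u);
}
```

With both reset bits set, MakeRstctl(1,1) gives 0xB1000003, and MakePwren(1) gives 0x26000001, the two values written in this step.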

Write 0xB1000003 to GPIOA->GPRCM.RSTCTL to reset Port A
Write 0xB1000003 to GPIOB->GPRCM.RSTCTL to reset Port B
Write 0x26000001 to GPIOA->GPRCM.PWREN to power Port A
Write 0x26000001 to GPIOB->GPRCM.PWREN to power Port B
Wait at least 24 bus cycles for the power to stabilize

Init: PUSH {LR}
      LDR  R1,=0xB1000003
      LDR  R0,=GPIOA_RSTCTL
      STR  R1,[R0]          // reset PortA
      LDR  R0,=GPIOB_RSTCTL
      STR  R1,[R0]          // reset PortB
      LDR  R1,=0x26000001
      LDR  R0,=GPIOA_PWREN
      STR  R1,[R0]          // power PortA
      LDR  R0,=GPIOB_PWREN
      STR  R1,[R0]          // power PortB
      MOVS R0,#24
      BL   Delay            // 24 bus cycles
      POP  {PC}

void ActivatePortA_B(void){
  GPIOA->GPRCM.RSTCTL = 0xB1000003;
  GPIOB->GPRCM.RSTCTL = 0xB1000003;
  GPIOA->GPRCM.PWREN = 0x26000001;
  GPIOB->GPRCM.PWREN = 0x26000001;
  Clock_Delay(24); // 24 bus cycles
}

Program 2.2.3. Reset and activate both Port A and Port B. Delay functions are shown in Program 1.8.1.

The second step of initialization is to set the mode for each pin. The first column of Table 2.2.1 shows the index value for each I/O pin. The column header of Table 2.2.1 shows the Mode value to write to bits 5:0 of the PINCM register. Tables 2.2.2 and 2.2.3 list some of the bits in the PINCM register. For example, to select PA11 as UART0_Rx, we set PINCM index 21 bit 18 to allow input, bit 7 to connect the pin, and bits 5:0=0x02 to select UART0_Rx mode (0x00040082).

Bit   Field  Meaning if 1                         Meaning if 0
25    HiZ    Output states are HiZ and low        Output states are high and low
20    DRV    Output pins are high drive           Output pins are regular drive
18    INENA  Input operation enabled              Input operation disabled
17    PIPU   Input has passive pullup resistor    Input has no passive pullup
16    PIPD   Input has passive pulldown resistor  Input has no passive pulldown
7     PC     Software is connected to pin         Software is disconnected from pin
5:0   PF     See Table 2.2.1 for mode             No digital function

Table 2.2.2. Functionality of bits in the IOMUX->SECCFG.PINCM register.
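To make the bit arithmetic concrete, the sketch below assembles a PINCM value from the fields in Table 2.2.2. The bit positions come from the table; the macro names and the helper MakePincm are illustrative assumptions, not definitions from a TI header, and the code only computes a number on the host.

```c
#include <stdint.h>

// Bit positions taken from Table 2.2.2; the macro names are illustrative.
#define PINCM_HIZ     (1u << 25)  // output states are HiZ and low
#define PINCM_DRV     (1u << 20)  // high drive output
#define PINCM_INENA   (1u << 18)  // input operation enabled
#define PINCM_PIPU    (1u << 17)  // passive pullup resistor
#define PINCM_PIPD    (1u << 16)  // passive pulldown resistor
#define PINCM_PC      (1u << 7)   // software connected to pin
#define PINCM_PF_MASK 0x3Fu       // bits 5:0 select the mode

// Build a PINCM value from option flags and a mode number (bits 5:0).
// The PC bit is always set so the pin is connected.
uint32_t MakePincm(uint32_t flags, uint32_t mode){
  return flags | PINCM_PC | (mode & PINCM_PF_MASK);
}
```

For the UART0_Rx example, MakePincm(PINCM_INENA, 0x02) produces 0x00040082.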

Address     25   20   18     17    16    7   5:0  Pin   Name
0x40428004  HiZ  DRV  INENA  PIPU  PIPD  PC  PF   PA0   PINCM[0]
0x40428008  HiZ  DRV  INENA  PIPU  PIPD  PC  PF   PA1   PINCM[1]
0x4042800C  HiZ  DRV  INENA  PIPU  PIPD  PC  PF   PA28  PINCM[2]
0x40428010  HiZ  DRV  INENA  PIPU  PIPD  PC  PF   PA29  PINCM[3]
...
0x404280F0  HiZ  DRV  INENA  PIPU  PIPD  PC  PF   PA27  PINCM[59]

Table 2.2.3. Each pin has a separate PINCM register. For PF bits, see Table 2.2.1.
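The addresses in Table 2.2.3 follow a simple pattern: each PINCM register is 32 bits, so consecutive entries are 4 bytes apart. A minimal sketch of that calculation, using a hypothetical helper PincmAddress and a base address read from the first row of the table:

```c
#include <stdint.h>

#define PINCM_BASE 0x40428004u  // address of PINCM[0] from Table 2.2.3

// Each PINCM register is 32 bits wide, so entries are 4 bytes apart.
uint32_t PincmAddress(uint32_t index){
  return PINCM_BASE + 4u*index;
}
```

For instance, index 59 (PA27) gives 0x40428004 + 4*59 = 0x404280F0, which matches the last row of the table.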


Next, we set up PA0, PB22, PB26 and PB27 as outputs, which are connected to LEDs. The 3-color LED on PB22, PB26 and PB27 requires more current than the red LED on PA0. The constant 0x00000081 means output pin. Pins PA31, PA28, PA11, and PA10 can be high drive outputs, which the software selects with the constant 0x00100081.

Write 0x00000081 to IOMUX->SECCFG.PINCM[PA0INDEX]
Write 0x00000081 to IOMUX->SECCFG.PINCM[PB22INDEX]
Write 0x00000081 to IOMUX->SECCFG.PINCM[PB26INDEX]
Write 0x00000081 to IOMUX->SECCFG.PINCM[PB27INDEX]

For output pins PA0, PB22, PB26 and PB27, we also need to set the corresponding bits in the data output enable register, DOE31_0.

Set bit 0 in GPIOA->DOE31_0
Set bits 22, 26, and 27 in GPIOB->DOE31_0

Next, we set up PA18 and PB21 as inputs, which are connected to the S1 and S2 switches, respectively. The constant 0x00050081 means input with internal pull down resistor. The constant 0x00060081 means input with internal pull up resistor.

Write 0x00050081 to IOMUX->SECCFG.PINCM[PA18INDEX]
Write 0x00060081 to IOMUX->SECCFG.PINCM[PB21INDEX]

Program 2.2.4 shows the code to initialize the LaunchPad.
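Setting the output enable bits is a read-modify-write operation: we OR in the new bits so that bits enabled by earlier initializations are preserved. The sketch below models this on the host with an ordinary variable standing in for DOE31_0; the mask names and the helper EnableOutputs are assumptions made for illustration.

```c
#include <stdint.h>

#define PB22 (1u << 22)  // blue LED pin
#define PB26 (1u << 26)  // red LED pin
#define PB27 (1u << 27)  // green LED pin

// Return the new DOE31_0 value after enabling the requested output bits.
// Other bits are preserved, so earlier initializations are not undone.
uint32_t EnableOutputs(uint32_t doe, uint32_t mask){
  return doe | mask;
}
```

The three LED bits combine to the mask 0x0C400000. On the microcontroller, the same idea would be written as GPIOB->DOE31_0 = EnableOutputs(GPIOB->DOE31_0, PB22|PB26|PB27);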

#define RED1 1
#define BLUE (1
SECCFG.PINCM[PA18INDEX] = 0x00050081;
      LDR  R1,=0x00060081   // input with pull up
      LDR  R0,=IOMUXPB21
      STR  R1,[R0]          // IOMUX->SECCFG.PINCM[PB21INDEX] = 0x00060081;
      LDR  R1,=0x00000081   // regular output
      LDR  R0,=IOMUXPB22
      STR  R1,[R0]          // IOMUX->SECCFG.PINCM[PB22INDEX] = 0x00000081;
      LDR  R0,=IOMUXPB26
      STR  R1,[R0]          // IOMUX->SECCFG.PINCM[PB26INDEX] = 0x00000081;
      LDR  R0,=IOMUXPB27
      STR  R1,[R0]          // IOMUX->SECCFG.PINCM[PB27INDEX] = 0x00000081;
      LDR  R0,=GPIOA_DOE31_0
      LDR  R2,[R0]
      MOVS R3,#1
      ORRS R2,R2,R3         // PA0 output enable
      STR  R2,[R0]
      LDR  R0,=GPIOB_DOE31_0
      LDR  R2,[R0]          // read all of DOE31_0
      LDR  R3,=((1

DIN31_0
GPIOB->DOUT31_0
GPIOB->DOUTSET31_0
GPIOB->DOUTCLR31_0
GPIOB->DOUTTGL31_0
GPIOB->DOE31_0
GPIOB->DIN31_0

Table 2.2.4. These I/O registers allow access to PA31-PA0 and PB27-PB0.

The DOUTSET31_0, DOUTCLR31_0, and DOUTTGL31_0 registers are write only. Reading from these registers has no effect. Writing 0's to these registers has no effect. Writing 1's to bits in these registers will modify the corresponding output pins. We use these three registers when different unrelated software modules need to access the same GPIO port.

Write 1s to DOUTSET31_0 to make the output pins go high
Write 1s to DOUTCLR31_0 to make the output pins go low
Write 1s to DOUTTGL31_0 to toggle the output pins (invert from 0 to 1 or 1 to 0)

Program 2.2.4 shows the low-level software to input from the two switches on the LaunchPad, shown in Figure 2.2.7. The voltage on PA18 will be 3.3V when S1 is pressed and 0V when not pressed. To input S1, we read from GPIOA->DIN31_0 and observe bit 18. Since S1 is positive logic, bit 18 will be set if S1 is pushed and bit 18 will be clear if S1 is released. The logical AND is included so the result only depends on bit 18: 0x00040000 if pressed and 0 if not pressed. The interface for S2 is negative logic, meaning the voltage on PB21 will be 0V when S2 is pressed and 3.3V when not pressed. To input S2, we read from GPIOB->DIN31_0 and observe bit 21. Since S2 is negative logic, bit 21 will be clear if S2 is pushed and bit 21 will be set if S2 is released. Bit 21 is inverted so the function returns 0x00200000 if pressed and 0 if not pressed.

LaunchPad_InS1:
      LDR  R1,=GPIOA_DIN31_0
      LDR  R0,[R1]
      LDR  R3,=(1
DOUTTGL31_0 = RED;
Time = Time+1;
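The switch logic and the set, clear, and toggle behavior described for Port A and Port B can be tried on a host computer. In the sketch below, the din and dout arguments are ordinary variables standing in for the hardware registers, and all function and macro names are invented for this illustration; it is a model of the described behavior, not the LaunchPad driver itself.

```c
#include <stdint.h>

#define S1_MASK (1u << 18)  // PA18, positive logic
#define S2_MASK (1u << 21)  // PB21, negative logic

// din is a value read from GPIOA->DIN31_0.
// Returns 0x00040000 if S1 is pressed, 0 if released.
uint32_t S1FromDin(uint32_t din){
  return din & S1_MASK;
}

// din is a value read from GPIOB->DIN31_0.
// The bit is inverted because a pressed S2 reads as 0.
// Returns 0x00200000 if S2 is pressed, 0 if released.
uint32_t S2FromDin(uint32_t din){
  return (~din) & S2_MASK;
}

// Models of the write-only registers: ones in 'bits' change the outputs,
// zeros leave the outputs alone.
uint32_t ModelSet(uint32_t dout, uint32_t bits){ return dout | bits; }
uint32_t ModelClr(uint32_t dout, uint32_t bits){ return dout & ~bits; }
uint32_t ModelTgl(uint32_t dout, uint32_t bits){ return dout ^ bits; }
```

Because writing 0's has no effect, one module can change its own pins through these registers without reading or disturbing the pins owned by another module.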

Figure 3.3.1. The CCS project combines multiple components into one software system (ToggleLED).

Before we write software, we need to develop a plan. Software development is an iterative process. Even though we list the steps of the development process in a 1,2,3,4 order, in reality we cycle through these steps over and over. I like to begin with step 4), deciding how I will test it even before I decide what it does.

1) We begin with a list of the inputs and outputs. This usually defines what the overall system will do. We specify the range of values and their significance.

2) Next, we make a list of the required data. We must decide how the data is structured, what it means, how it is collected, and how it can be changed. The organization of data will have a profound effect on the performance of our system.

3) Next, we develop the software algorithm, which is a sequence of operations we wish to execute. There are many approaches to describing the plan. Experienced programmers can develop the algorithm directly in C language. On the other hand, most of us need an abstractive method to document the desired sequence of actions. Flowcharts and pseudo code are two common descriptive formats. There are no formal rules regarding pseudo code; rather it is a shorthand for describing what to do and when to do it. We can place our pseudo code as documentation into the comment fields of our program. Next, we write software to implement the algorithm as defined in the flowchart and pseudo code.

4) The last stage is debugging. Learning debugging skills will greatly improve the quality of your software and the efficiency at which you can develop code.


3.3.2. Organization of C software

Documentation is important, so we begin with comments. The token // specifies the remainder of the line is a comment. Comments can also be placed between the token /* and the token */. There are two types of comments. The first type explains how to use the software or what the software does. These comments are usually placed at the top of the file, within the header file, or at the start of a function. The reader of these comments will be writing software that uses or calls these routines. The second type of comments explains how the software works, assisting a future programmer in changing, debugging or extending these routines. We usually place these comments within the body of the functions.

Every C program has a main, and execution will begin at this main program. There are four sections of a C program as shown in Program 3.3.1. The first section is the documentation section, which includes the purpose of the software, the authors, the date, and any copyright information. When the software involves external hardware, we will add information about how external hardware is connected. The second section is the preprocessor directives. We will use the preprocessor directive #include to connect this software with other modules. We use angle brackets to include system libraries, like the standard I/O, and we use quotes to link up with other user code within the project. In this case, the LaunchPad and Clock modules handle the LaunchPad and Clock respectively. Preprocessor lines begin with # in the first column, like the #include lines in Program 3.3.1. All preprocessor lines are invoked first (first pass through the software), and then all the other lines will be compiled as regular C (second pass).

// 0. Documentation Section
// This program toggles Port B bit 26, red LED
// Author: Ramesh Yerraballi & Jon Valvano
// Date: 06/20/2023
// Copyright: Simplified BSD License (FreeBSD)
// 1. Pre-processor Directives Section
#include  // MSPM0 portF
#include "../inc/LaunchPad.h" // switches and LED on LaunchPad
#include "../inc/Clock.h" // clock functions
#define ONESEC 32000000 // 32000000*31.25ns = 1sec
#define RED (1

Operation  Meaning
==         Equal to comparison
<=         Less than or equal to
>=         Greater than or equal to
!=         Not equal to
<<         Shift left
>>         Shift right
++         Increment
--         Decrement
&&         Boolean and
||         Boolean or
+=         Add value to
-=         Subtract value to
*=         Multiply value to
/=         Divide value to
|=         Or value to
&=         And value to
^=         Exclusive or value to
<<=        Shift value left to
>>=        Shift value right to
%=         Modulus divide value to
->         Pointer to a structure

Table 3.3.2. Special characters can be operators; operators can be made from 1, 2, or 3 characters.

void Pulse(void){
  GPIOB->DOUTSET31_0 = 0x04000000;
  GPIOB->DOUTCLR31_0 = 0x04000000;
}

Program 3.3.3. Semicolons are used to separate one statement from the next.

Colons are used to terminate case and default prefixes that appear in switch statements. In Program 3.3.4 one output to the stepper motor is produced each time the function OneStep is called. The proper stepper motor sequence is 10-9-5-6. The default case is used to restart the pattern. The colon marks a potential target for a transfer of control.

uint8_t Last=0x0A;
void OneStep(void){ uint8_t theNext;
  switch(Last){
    case 0x0A: theNext = 0x09; break; // 10 to 9
    case 0x09: theNext = 0x05; break; // 9 to 5
    case 0x05: theNext = 0x06; break; // 5 to 6
    case 0x06: theNext = 0x0A; break; // 6 to 10
    default:   theNext = 0x0A;
  }
  GPIOB->DOUT31_0 = theNext;
  Last = theNext; // set up for next call
}

Program 3.3.4. Colons are used with the switch statement, defining places to which we can jump.

Commas separate items that appear in lists. We can create multiple variables of the same type using commas.

uint32_t beginTime,endTime,elapsedTime;

Lists are also used with functions having multiple parameters, both when the function is defined and when it is called. Program 3.3.5 adds two 32-bit signed numbers, implementing ceiling and floor. Notice the use of commas in Program 3.3.5.

int32_t add(int32_t x, int32_t y){ int32_t z;
  z = x+y;
  if((x>0)&&(y>0)&&(z
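For readers who want a complete, testable version of an addition with ceiling and floor, here is one possible sketch. It is not the book's Program 3.3.5: instead of testing the signs of x, y, and z, it swaps in a different technique, performing the addition in 64 bits and then clamping, which avoids signed overflow in the intermediate result. The name AddSaturate is an assumption for this example.

```c
#include <stdint.h>

// Add two signed 32-bit numbers; on overflow return the ceiling
// (largest value) or the floor (smallest value) instead of wrapping.
int32_t AddSaturate(int32_t x, int32_t y){
  int64_t z = (int64_t)x + (int64_t)y;  // cannot overflow in 64 bits
  if(z > INT32_MAX){
    return INT32_MAX;  // ceiling
  }
  if(z < INT32_MIN){
    return INT32_MIN;  // floor
  }
  return (int32_t)z;
}
```

As in the text, commas separate the two parameters both in the definition and in a call such as AddSaturate(beginTime, 5).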
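Precedence  Operators                                   Associativity
Highest     () [] . -> ++(postfix) --(postfix)          Left to right
            ++(prefix) --(prefix) ! ~ sizeof(type) +(unary) -(unary) &(address) *(dereference)   Right to left
            * / %                                       Left to right
            + -                                         Left to right
            << >>                                       Left to right
            < <= > >=                                   Left to right
            == !=                                       Left to right
            &                                           Left to right
            ^                                           Left to right
            |                                           Left to right
            &&                                          Left to right
            ||                                          Left to right
            ? :                                         Right to left
            = += -= *= /= %= <<= >>= |= &= ^=           Right to left
Lowest      ,                                           Left to right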

Table 3.3.4. Precedence and associativity determine the order of operation.

Observation: When confused about precedence and associativity (and aren't we all) add parentheses to clarify the expression. Good software is easy to understand.
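Two short cases show why the observation about parentheses matters. Each "AsParsed" function below is written with explicit parentheses that mirror how C groups the unparenthesized expression named in its comment; each "Intended" function shows the grouping a programmer probably wanted. The function names are invented for this illustration.

```c
#include <stdint.h>

// Written without parentheses, 1 << 2 + 3 groups as 1 << (2 + 3),
// because + has higher precedence than <<.
uint32_t ShiftThenAddAsParsed(void){
  return 1u << (2 + 3);
}

// The grouping the programmer may have intended.
uint32_t ShiftThenAddIntended(void){
  return (1u << 2) + 3;
}

// Written without parentheses, x & 1 == 0 groups as x & (1 == 0),
// because == has higher precedence than &. The result is always 0.
uint32_t LowBitTestAsParsed(uint32_t x){
  return x & (1 == 0);
}

// The intended test: 1 when the low bit of x is clear, otherwise 0.
uint32_t LowBitTestIntended(uint32_t x){
  return (x & 1u) == 0;
}
```

The first pair returns 32 and 7 respectively, so leaving out the parentheses silently changes the answer.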

3.3.6. Conditional Branch Instructions

Normally the computer executes one instruction after another in a sequential or linear fashion. In particular, the next instruction to execute is found immediately following the current instruction. We use branch instructions to deviate from this straight-line path. The branch instructions were presented earlier in Chapter 1. The following unsigned conditional branch instructions must follow a subtract or compare, such as SUBS and CMP.

BLO target // Branch if unsigned less than                 if C=0, same as BCC
BLS target // Branch if unsigned less than or equal to     if C=0 or Z=1
BHS target // Branch if unsigned greater than or equal to  if C=1, same as BCS
BHI target // Branch if unsigned greater than              if C=1 and Z=0

After an unsigned subtraction, the carry bit is 0 if there is an error (unsigned overflow) and 1 if there is no error. To understand exactly how unsigned conditional branches work, let's start with the BLO instruction. We bring the first unsigned number into a register, and then subtract a second unsigned number from the first. Let's call the first number First and the second number Second. The BLO instruction is supposed to branch if the first unsigned number is strictly less than the second. The two possibilities, branch or no branch, are illustrated in the number wheels drawn in Figure 3.3.2.

[Number wheels: left panel, First < Second; right panel, First >= Second.]

Figure 3.3.2. Number wheel on left shows the result of subtracting a big unsigned number from a little number, and the one on the right occurs when subtracting a small unsigned number from a large one.

Assume for a moment that the condition is true, meaning First < Second. Since First < Second, First-Second should be a negative number. That is, if we subtract a big unsigned number from a small unsigned number, an unsigned overflow must occur, because the correct result of the subtraction is negative, but there are no negative numbers in the unsigned format. Thus, the C bit must be clear (C=0 means overflow error). The left side of Figure 3.3.2 shows the subtraction will always cross the boundary between 0 and 2^32-1 because First < Second. Conversely, assume the condition is false, meaning the first unsigned number is greater than or equal to the second. The right side of Figure 3.3.2 shows that when we subtract the smaller second number from the bigger first number we get the correct result. In this case, the C bit will be set (C=1 means no overflow error). Thus, the BLO instruction can be defined as branch if C=0. The BHS instruction is the logical complement of BLO, so the BHS instruction will branch if the C bit is set. The BLS instruction will branch if the first number is less than the second (C=0) or if the two numbers are equal (Z=1). Hence, the operation of the BLS instruction can be defined as branch if (C=0 or Z=1). Lastly, the BHI instruction is the logical complement of BLS, so the BHI instruction will branch if (C=1 and Z=0).

The following signed branch instructions must follow a subtract or compare instruction, such as SUBS and CMP.

BLT target // if signed less than                 if (~N&V | N&~V)=1      if N≠V
BGE target // if signed greater than or equal to  if (~N&V | N&~V)=0      if N=V
BGT target // if signed greater than              if (Z | ~N&V | N&~V)=0  if Z=0 and N=V
BLE target // if signed less than or equal to     if (Z | ~N&V | N&~V)=1  if Z=1 or N≠V

To understand exactly how signed conditional branches work, we will begin with the BLT instruction. We bring the first signed number into a register, and then subtract a second signed number from the first. The BLT instruction is supposed to branch if the first signed number is strictly less than the second. Assume for a moment that First < Second, thus the branch should occur. Since First < Second, First-Second should be a negative number. Let's further dissect this case into two subcases. If the V bit is clear, the subtraction is correct and the N bit will be 1. This subcase defines the N&~V term. If the V bit is set, the subtraction is incorrect and the result will be incorrectly positive, making the N bit 0. This subcase defines the ~N&V term. Conversely, assume the condition is false, meaning First ≥ Second, and the branch should not occur. Since First ≥ Second, First-Second should be zero or a positive number. If the V bit is clear, the subtraction is correct and the N bit will be 0. If the V bit is set, the subtraction is incorrect and the result will be incorrectly negative, making the N bit 1. Thus, the BLT instruction can be defined as branch if (~N&V | N&~V)=1. The BGE instruction is the logical complement of BLT, so the BGE instruction will branch if (~N&V | N&~V)=0. The BLE instruction will branch if the first number is less than the second ((~N&V | N&~V)=1) or if the two numbers are equal (Z=1). Combining the less than with the equal conditions, the operation of the BLE instruction can be defined as branch if (Z | ((~N&V) | (N&~V)))=1. Lastly, the BGT instruction is the logical complement of BLE, so the BGT instruction will branch if (Z | ((~N&V) | (N&~V)))=0.

Decision making is an important aspect of software programming. Two values are compared and certain blocks of program are executed or skipped depending on the results of the comparison. In assembly language it is important to know the precision (e.g., 8-bit, 16-bit, 32-bit) and the format of the two values (e.g., unsigned, signed). It takes three steps to perform a comparison. You begin by reading the first value into a register. If the second value is not a constant, it must be read into a register, too. The second step is to compare the first value with the second value using a CMP instruction. The CMP instruction sets the condition code bits. The last step is a conditional branch.
Observation: Think of the three steps 1) bring the two values into registers, 2) compare the first value to the second value, 3) conditional branch, Bxx (where xx is eq ne lo ls hi hs gt ge lt or le). The branch will occur if (the first is xx the second).
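The flag rules for the unsigned and signed branches can be checked on a host computer. The sketch below models the N, Z, C, and V bits after a 32-bit CMP and expresses each branch as the condition given in the text. The type Flags_t and the function names are assumptions for this illustration; the V calculation uses the standard rule for subtraction (overflow occurs when the operands have different signs and the result's sign differs from the first operand).

```c
#include <stdint.h>

typedef struct {
  uint32_t N, Z, C, V;  // condition code bits after first - second
} Flags_t;

// Model the flags produced by CMP first,second (a 32-bit subtraction).
Flags_t Compare(uint32_t first, uint32_t second){
  Flags_t f;
  uint32_t result = first - second;  // wraps modulo 2^32
  f.N = result >> 31;                // sign bit of the result
  f.Z = (result == 0);
  f.C = (first >= second);           // 1 means no unsigned overflow
  f.V = ((first ^ second) & (first ^ result)) >> 31;  // signed overflow
  return f;
}

// Unsigned branch conditions
int TakenBLO(Flags_t f){ return f.C == 0; }
int TakenBLS(Flags_t f){ return (f.C == 0) || (f.Z == 1); }
int TakenBHS(Flags_t f){ return f.C == 1; }
int TakenBHI(Flags_t f){ return (f.C == 1) && (f.Z == 0); }

// Signed branch conditions
int TakenBLT(Flags_t f){ return f.N != f.V; }
int TakenBGE(Flags_t f){ return f.N == f.V; }
int TakenBGT(Flags_t f){ return (f.Z == 0) && (f.N == f.V); }
int TakenBLE(Flags_t f){ return (f.Z == 1) || (f.N != f.V); }
```

Comparing 0x80000000 with 1 is a useful test case: as unsigned numbers the first is larger (BHI is taken), but as signed numbers the first is the most negative value (BLT is taken), even though the subtraction overflows.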

In Programs 3.3.10 and 3.3.11, we assume G and H are 32-bit unsigned variables in registers R4 and R5 respectively. The first one in Program 3.3.10 will call GEqualH if G equals H, and the second one will call GNotEqualH if G does not equal H. When testing for equal or not equal it doesn't matter whether the numbers are signed or unsigned.

Assembly code
      CMP R4, R5       // is G == H ?
      BNE next1        // if not, skip
      BL  GEqualH      // G == H
next1:
      CMP R4, R5       // is G != H ?
      BEQ next2        // if not, skip
      BL  GNotEqualH   // G != H
next2:

C code
uint32_t G,H;
if(G == H){
  GEqualH();
}
if(G != H){
  GNotEqualH();
}

Program 3.3.10. Conditional structures that test for equality (this works with signed and unsigned numbers).

When testing for greater than or less than, it does matter whether the numbers are signed or unsigned. Program 3.3.11 contains four separate unsigned if-then structures. In each case, the first step is to bring the two values into registers; the second step is to compare the first value with a second value; and the third step is to execute an unsigned branch Bxx. The branch will occur if the first unsigned value is xx the second unsigned value.

Assembly code
      CMP R4, R5       // is G > H?
      BLS next1        // if not, skip
      BL  GGreaterH    // G > H
next1:
      CMP R4, R5       // is G >= H?
      BLO next2        // if not, skip
      BL  GGreaterEqH  // G >= H
next2:
      CMP R4, R5       // is G < H?
      BHS next3        // if not, skip
      BL  GLessH       // G < H
next3:
      CMP R4, R5       // is G <= H?
      BHI next4        // if not, skip
      BL  GLessEqH     // G <= H
next4:

C code
uint32_t G,H;
if(G > H){
  GGreaterH();
}
if(G >= H){
  GGreaterEqH();
}
if(G < H){
  GLessH();
}
if(G <= H){
  GLessEqH();
}

Figure 3.3.3. Flowchart of an if-then structure.

      LDR  R2,=G1    // R2 = &G1
      LDRB R0,[R2]   // R0 = G1
      MOVS R1,#100
      CMP  R0,R1     // is G1 > 100?
      BLS  next      // if not, skip to end
      MOVS R1,#1     // R1 = 1
      LDR  R2,=G2    // R2 = &G2
      STRB R1,[R2]   // G2 = 1
next:

uint8_t G1, G2;
if(G1 > 100){
  G2 = 1;
}

Program 3.3.12. An unsigned if-then structure. LDRB used because 8-bit, BLS used because it is unsigned.

// is G > H?   if not, skip   G > H
// is G >= H?  if not, skip   G >= H
// is G < H?   if not, skip   G < H
// is G

C code
int32_t G,H;
if(G > H){
  GGreaterH();
}
if(G >= H){
  GGreaterEqH();
}
if(G < H){
  GLessH();
}

25) isGreater(); why is it important to know if N is signed or unsigned?

Notice that the C code for Program 3.3.11 looks similar to Program 3.3.13, and the C code for Program 3.3.12 looks similar to Program 3.3.14. This is because the compiler knows the type of variables G1 and G2; therefore, it knows whether to utilize unsigned or signed branches. Unfortunately, this similarity can be deceiving. When writing code, whether it be assembly or C, you still need to keep track of whether your variables are signed or unsigned. Furthermore, when comparing two objects, they must have comparable types. E.g., "Which is bigger, 4,294,967,295 or -1?" (they have the same 32-bit binary.) The compiler does not seem to reject comparisons between signed and unsigned variables as an error. However, I recommend that you do not compare a signed variable to an unsigned variable. When comparing objects of different types, it is best to first convert both objects to the same format, and then perform the comparison. Conversely, we see that on the ARM Cortex, all numbers are converted to 32 bits before they are compared. This means there is no difficulty comparing variables of differing precisions (e.g., 8-bit, 16-bit, and 32-bit) as long as both are signed or both are unsigned.

We can use the unconditional branch to add an else clause to any of the previous if-then structures. A simple example of an unsigned conditional is illustrated in Figure 3.3.4 and presented in Program 3.3.15. Assume G1 is in R4, and G2 is in R5. The first two lines test the condition G1>G2. If G1>G2, the software branches to high. Once at high, the software calls the isGreater subroutine then continues. Conversely, if G1≤G2, the software does not branch and the isLessEq subroutine is executed. After executing the isLessEq subroutine, there is an unconditional branch, so that only one and not both subroutines are called.

Figure 3.3.4. Flowchart of an if-then-else structure.

      CMP R4, R5     // is G1 > G2 ?
      BHI high       // if so, go to high
low:  BL  isLessEq   // G1 <= G2
      B   next
high: BL  isGreater  // G1 > G2
next:

uint32_t G1,G2;
if(G1 > G2){
  isGreater();
}
else{
  isLessEq();
}

Program 3.3.15. An unsigned if-then-else structure (unsigned 32-bit).

Common error: It is an error to use an unsigned conditional branch when comparing two signed values. Similarly, it is a mistake to use a signed conditional branch when comparing two unsigned values.


Observation: One cannot directly compare a signed number to an unsigned number. The proper method is to first convert both numbers to signed numbers of a higher precision and then compare.

Checkpoint 3.3.5: Assume you have a 16-bit signed global variable M. Write assembly code that implements if(M > 1000) isGreater(); else isLess();
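The observation about mixed signed and unsigned comparisons can be demonstrated in C. The first function below follows the recommended method, converting both 32-bit values to a signed 64-bit format before comparing. The second shows, for contrast, what happens when the signed value is converted to unsigned, which is what C's usual arithmetic conversions do in a mixed 32-bit comparison. Both function names are invented for this sketch.

```c
#include <stdint.h>

// Compare a signed 32-bit value with an unsigned 32-bit value by
// promoting both to a signed 64-bit format, which can hold either.
// Returns 1 if s > u, otherwise 0.
int SignedGreaterThanUnsigned(int32_t s, uint32_t u){
  int64_t a = (int64_t)s;
  int64_t b = (int64_t)u;
  return a > b;
}

// For contrast: convert the signed value to unsigned before comparing.
// Here -1 becomes 4294967295, so it appears larger than small values.
int NaiveGreaterThan(int32_t s, uint32_t u){
  return (uint32_t)s > u;
}
```

With s = -1 and u = 0, the first function reports that s is not greater, while the naive version reports that it is.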

The selection operator takes three input parameters and yields one output result. The format is

Expr1 ? Expr2 : Expr3

The first input parameter is an expression, Expr1, which yields a Boolean (0 for false, not zero for true). Expr2 and Expr3 return values that are regular numbers. The selection operator will return the result of Expr2 if the value of Expr1 is true, and will return the result of Expr3 if the value of Expr1 is false. The type of the expression is determined by the types of Expr2 and Expr3. If Expr2 and Expr3 have different types, then promotion is applied. For example, the left and right sides of the following example perform identical functions. If b is 1 set a equal to 10, otherwise set a to 1.

a = (b==1) ? 10 : 1;        if(b == 1){
                              a = 10;
                            }
                            else{
                              a = 1;
                            }

Switch statements provide a non-iterative choice between any number of paths based on specified conditions. They compare an expression to a set of constant values. Selected statements are then executed depending on which value, if any, matches the expression. The expression between the parentheses following switch is evaluated to a number and compared one by one in top to bottom order to the explicit cases. Figure 3.3.5 draws a flowchart and shows the software that performs one output each time the function OneStep is called. The break causes execution to exit the switch statement. The default case is run if none of the explicit case statements match. The operation of the switch statement in Figure 3.3.5 performs this list of actions:

If Last is equal to 10, then theNext is set to 9.
If Last is equal to 9, then theNext is set to 5.
If Last is equal to 5, then theNext is set to 6.
If Last is equal to 6, then theNext is set to 10.
If Last is not equal to any of the above, then theNext is set to 10.

When using break, only the first matching case will be invoked. In other words, once a match is found, no other tests are performed. The body of the switch is not a normal compound statement since local declarations are not allowed in it or in subordinate blocks. Assume the output port is connected to a stepper motor, and the motor has 24 steps per rotation. Calling OneStep will cause the motor to rotate by exactly 15 degrees, because 15 degrees is 360 degrees divided by 24.
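The step sequence and the angle arithmetic can be checked with a small host-side sketch. Here the switch returns the next pattern instead of writing to a port, so it can run without hardware; the names NextPattern, DegreesForSteps, and STEPS_PER_ROTATION are assumptions made for this example.

```c
#include <stdint.h>

#define STEPS_PER_ROTATION 24u  // motor described in the text

// Next output pattern in the 10-9-5-6 sequence.
// Any other value restarts the pattern at 10.
uint32_t NextPattern(uint32_t last){
  switch(last){
    case 10: return 9;
    case 9:  return 5;
    case 5:  return 6;
    case 6:  return 10;
    default: return 10;
  }
}

// Rotation in degrees produced by a number of steps.
uint32_t DegreesForSteps(uint32_t steps){
  return (360u*steps)/STEPS_PER_ROTATION;
}
```

Four calls starting from 10 return to 10, and 24 steps give one full rotation of 360 degrees.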



uint32_t Last=10;
void OneStep(void){ uint32_t next;
  switch(Last){
    case 10: next = 9; break;
    case 9:  next = 5; break;
    case 5:  next = 6; break;
    case 6:  next = 10; break;
    default: next = 10;
  }
  GPIOB->DOUT31_0 = next;
  Last = next;
}

Figure 3.3.5. The switch statement is used to make multiple comparisons.

Program 3.3.16 converts an ASCII character to the equivalent decimal value. This example of a switch statement shows that multiple tests can be performed for the same condition.

uint8_t Convert(char letter){ uint8_t digit;
  switch(letter){
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
      digit = letter+10-'A';
      break;
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
      digit = letter+10-'a';
      break;
    default:
      digit = letter-'0';
  }
  return digit;
}

Program 3.3.16. A switch statement is used to convert an ASCII character to numeric value.


3.3.7. Loops

Quite often the microcomputer is asked to wait for events or to search for objects. Both of these operations are solved using a looping structure. A while loop performs the test condition first. Inside the parentheses of the while an expression is evaluated (e.g., bit 18 set), and the body of the while loop is executed over and over while the condition is true. Once the condition is false, the while loop terminates. A simple example of a while loop is illustrated in Figure 3.3.6 and presented in Program 3.3.17. It is possible the body of the loop is never executed, if the condition is initially false. On the other hand, it is possible the loop executes forever, if the condition remains true forever.
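The behavior of a while loop that tests bit 18 can be explored on a host computer by replacing the port with a list of sample readings. This is only a model under stated assumptions: the array samples[] stands in for successive reads of the input port, and the function name CountWhileBit18Set is invented for this sketch.

```c
#include <stdint.h>

#define BIT18 (1u << 18)

// samples[] stands in for successive reads of the input port.
// Returns how many times the loop body ran before bit 18 was clear,
// stopping early if the samples run out.
uint32_t CountWhileBit18Set(const uint32_t samples[], uint32_t n){
  uint32_t i = 0;
  uint32_t bodyCount = 0;
  while((i < n) && ((samples[i] & BIT18) != 0)){
    bodyCount++;  // the body of the loop
    i++;          // next read of the port
  }
  return bodyCount;
}
```

If the first sample already has bit 18 clear, the body never runs, which matches the remark that the test is performed first.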

Figure 3.3.6. Flowchart of a while structure. Execute Body() over and over until PortA bit 18 is clear.

Program 3.3.17 begins with reading all of Port A. It will test bit 18 and skip to end if bit 18 is clear. If bit 18 is set it will execute the body. The unconditional branch (B loop) after the body causes Port A bit 18 to be tested again. In this way, the body is executed repeatedly until bit 18 is clear. The use of (1

DIN31_0&0xFF;            // 7
out = Convert(in);       // 8
GPIOA->DOUT31_0 = out;   // 9

Program 3.3.20. A function with one input and one output.

In C, execution begins with the main program. The execution sequence is shown below (numbers on right are line numbers from Program 3.3.20). Every time the function is invoked, the value of the input parameter is passed into the function. On the first call, the parameter n will be 0. On other calls, the value in variable in is copied to the parameter n. After the operations within the function are performed, control returns to the place right after where the function was called. In this example the return parameter is stored in the variable out. The execution sequence repeats lines 6,7,8,1,2,9 indefinitely.

void main(void){ int32_t in,out;   // 4
  out = Convert(0);                // 5
      d = ((1825*n)>>12) - 155;    // 1
      return (d);                  // 2
  while(1){                        // 6
    in = GPIOB->DIN31_0&0xFF;      // 7
    out = Convert(in);             // 8
      d = ((1825*n)>>12) - 155;    // 1
      return (d);                  // 2
    GPIOA->DOUT31_0 = out;         // 9
  while(1){                        // 6
    in = GPIOB->DIN31_0&0xFF;      // 7
    out = Convert(in);             // 8
      d = ((1825*n)>>12) - 155;    // 1
      return (d);                  // 2
    GPIOA->DOUT31_0 = out;         // 9
  while(1){                        // 6
    in = GPIOB->DIN31_0&0xFF;      // 7
    out = Convert(in);             // 8
      d = ((1825*n)>>12) - 155;    // 1
      return (d);                  // 2
    GPIOA->DOUT31_0 = out;         // 9

Some functions have neither an input nor an output parameter. To specify the absence of a parameter we use the expression void. The body of a function consists of a statement that performs the work. Normally the body is a compound statement between a {} pair. If the function has a return parameter, then all exit points must specify a value to return. Notice the return parameter has a type.

It is important for the C programmer to distinguish the three terms declaration, definition and invocation. A function declaration specifies its name, its input parameters and its output parameter. Another name for a function declaration is prototype. A data structure declaration specifies its type and format. On the other hand, a function definition specifies the exact sequence of operations to execute when it is called. A function definition will generate object code, which consists of machine instructions to be loaded into memory that perform the intended operations. A data structure definition will reserve space in memory for it. The confusing part is that the definition will repeat the declaration specifications.

The C compiler performs just one pass through the code, and we must declare data/functions before we can access/invoke them. To run, of course, all data and functions must be defined. We invoke a function by placing its name within our software and passing parameters as needed. If the function has a return parameter, that value is substituted in place of the function call within the code that invoked the function.


For example, the declaration or prototype for the function Random in Program 3.3.21 specifies the function has no inputs, is named Random, and has one output of type 32-bit unsigned integer. The declaration, along with the comments associated with the declaration, shows us how to use the function or what the function does, but not how the function works. The preprocessor performs the first pass through the program that handles the preprocessor directives. However, the second or compilation pass goes through just once. This means an object must be declared or defined before it can be used in a statement. A top-down approach is to first declare low-level functions as prototypes, use the functions in high-level code, and lastly define the low-level functions at the end as illustrated in Program 3.3.21. In this way you place the high-level software at the top of the file and low-level software at the bottom.

uint32_t Random(void);           // Declaration
void main(void){ uint32_t n;
  while(1){
    n = Random()/4;              // Invocation, n varies from 0 to 63
  }
}
uint32_t M=1;
uint32_t Random(void){           // Definition, returns a number from 0 to 255
  M = 1664525*M+1013904223;
  return(M>>24);
}

Program 3.3.21. A main program that calls a function. In this case the declaration occurs first.

A bottom-up approach is to define the low-level functions at the top of the file and place the high-level functions at the bottom, as illustrated in Program 3.3.22. In the bottom-up approach, the definition both declares its structure and defines what it does. In bottom-up, you see the low-level functions first, and therefore declarations are not needed.

uint32_t M=1;
uint32_t Random(void){           // Definition, returns a number from 0 to 255
  M = 1664525*M+1013904223;
  return(M>>24);
}
void main(void){ uint32_t n;
  while(1){
    n = Random()/4;              // Invocation, n varies from 0 to 63
  }
}

Program 3.3.22. A main program that calls a function. In this case the definition occurs before its use.


3.3.9. Pointers

A pointer is simply an address. There are three steps to using pointers: 1) defining or allocating the pointer; 2) initializing the pointer to point to an object; 3) dereferencing the pointer to read data from the object or write data into the object. In Figure 3.3.8, Pt is shown pointing to Object1 and then later, Pt is shown pointing to Object2. For this figure we assume Object1 and Object2 are the same type.

[Figure 3.3.8 panels: Pt not pointing to anything; Pt pointing to Object1; Pt pointing to Object2.]

Figure 3.3.8. Pointers are addresses pointing to objects. The objects may be data, functions, or other pointers.

At the assembly level, we implement pointers using indexed addressing mode. For example, a register contains an address, and the instruction reads or writes memory specified by that address. Basically, we place the address into a register, then use indexed addressing mode to access the data. In this case, the register holds the pointer. An array or string is a simple structure containing multiple equal-sized elements. We set a pointer to the address of the first element, then use indexed addressing mode to access the elements inside.

We will use the pointer Pt to access data in the 16-bit array Prime. When we define a pointer in C we give it a type. This pointer type means Pt points to unsigned 16-bit data in ROM. The pointer is an address, but the type refers to the object to which it points. The first step to using a pointer is to define it. Notice that Pt is in RAM and Prime is in ROM. Manipulating addresses in assembly always involves the physical byte-address regardless of the type of the data.

Assembly:
        .data
Pt:     .space 4               // Pt is a pointer
        .text
Prime:  .short 1,2,3,5,7,11
        .short 13,17,19,23

C:
  uint16_t const *Pt;
  uint16_t const Prime[10]={1,2,3,5,7,11,13,17,19,23};

In this case, the const does not indicate the pointer is fixed. Rather, the pointer refers to constant 16-bit data in ROM. Furthermore, since pointers are addresses, the size of every pointer is 4 bytes. In this example, Pt is a 4-byte variable in RAM that points to unsigned 16-bit data in ROM. The second step to using pointers is to initialize the pointer at run time.

Assembly:
  LDR R1,=Prime   // => Prime
  LDR R0,=Pt      // => Pt
  STR R1,[R0]     // Pt is a pointer to Prime[0]

C:
  Pt = Prime;
or
  Pt = &Prime[0];


You should not use .long to define/allocate RAM-based variables in microcontrollers, because RAM has no initial value when power is applied to the microcontroller. C initializes all globals to 0 after reset and before main. Assembly does not automatically initialize globals. I.e., the assembly Pt will be garbage on a power up. So, in both assembly and C, we must initialize pointers before they are used.

The third step to using a pointer is to dereference it. To read the data pointed to by Pt, we use *Pt. The LDR R1,[R0] fetches the pointer; LDR is used because the pointer is 32 bits. The LDRH is used to fetch data, because this object is 16-bit unsigned. For the assembly code, we assume variable data is in R2.

Assembly:
  LDR  R0,=Pt     // R0 is pointer to Pt
  LDR  R1,[R0]    // R1 is value of Pt
  LDRH R2,[R1]    // R2 is the value

C:
  uint16_t data;
  data = *Pt;     // data contains the value

Now, to increment the pointer to the next element in C, use the expression Pt++. In C, Pt++, which is the same thing as Pt=Pt+1;, actually adds two to the pointer because it points to halfwords. However, in assembly, we have to explicitly add 2 to the pointer.

Assembly:
  LDR  R0,=Pt     // pointer to Pt
  LDR  R1,[R0]    // R1 is value of Pt
  ADDS R1,#2      // each data is 16 bits
  STR  R1,[R0]    // update Pt

C:
  Pt = Pt+1;

Observation: In assembly, we add/sub one to the pointer when accessing an 8-bit array, add/sub two when accessing a 16-bit array, and add/sub four when accessing a 32-bit array.

Observation: In C, we add/sub one to the pointer regardless of the type of the array.
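The two observations can be checked with a small sketch that measures how far the address moves when a C pointer is advanced by one element. The function names BytesPerStep16, BytesPerStep32, and PrimeAt are hypothetical helpers written for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

static const uint16_t Prime[10] = {1,2,3,5,7,11,13,17,19,23};

// Number of bytes the address advances when a pointer to 16-bit data
// is incremented by one element.
size_t BytesPerStep16(void){
  const uint16_t *pt = Prime;
  const uint16_t *next = pt + 1;     // C adds "one"
  return (size_t)((const uint8_t *)next - (const uint8_t *)pt);
}

// Same measurement for a pointer to 32-bit data.
size_t BytesPerStep32(void){
  static const uint32_t Table[2] = {0, 0};
  const uint32_t *pt = Table;
  return (size_t)((const uint8_t *)(pt + 1) - (const uint8_t *)pt);
}

// Value read after advancing the pointer n elements from the start of Prime.
// The caller must keep n in the range 0 to 9.
uint16_t PrimeAt(uint32_t n){
  const uint16_t *pt = Prime;
  pt += n;                           // address grows by 2*n bytes
  return *pt;
}
```

In both cases the C source adds one, and the compiler scales the step by the element size, which is the work the assembly programmer does by hand.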

3.3.10. Call by value vs Call by reference

According to AAPCS, the first four input parameters are passed in Registers R0-R3, and the output parameter is returned in R0. With an input parameter using call by value, a copy of the data is passed into the subroutine. For an output parameter using return by value, the result of the subroutine is a value, and the value itself is returned. Alternatively, if you pass a pointer to the data, rather than the data itself, we will be able to pass large amounts of data. Passing a pointer to data is classified as call by reference. For large amounts of data, call by reference is also very fast, because the data need not be copied from calling program to subroutine. In call by reference, the one copy of the data exists in the calling program, and a pointer to it is passed to the subroutine. In this way, the subroutine actually performs read/write access to the original data. Call by reference is also a convenient mechanism to return data as well. Passing a pointer to an object allows one parameter to be both an input parameter and an output parameter.

Programs 3.3.23 and 3.3.24 illustrate the difference between call by value and call by reference. The goal is to swap the values of two variables. The call by value function (BadSwap), when invoked, has no effect on the main program global variables a, b because copies of the data are passed. Hence, the copies are swapped, not the originals.

Assembly:
// Inputs: R0=aa value
//         R1=bb value
BadSwap: MOV R2,R0     // R2=aa
         MOV R0,R1     // R0=bb
         MOV R1,R2     // R1=aa
         BX  LR
main:    LDR R0,=a
         LDR R0,[R0]   // copy of a
         LDR R1,=b
         LDR R1,[R1]   // copy of b
         BL  BadSwap
loop:    B   loop

C:
void BadSwap(int32_t aa, int32_t bb){
  int32_t tmp;
  tmp = aa;
  aa = bb;
  bb = tmp;
}
int32_t a=33, b=44;
int main(void){
  BadSwap(a,b);
  while(1){};
}

Program 3.3.23. Call by value function to swap values of two variables (this version does not work).

The call by reference function (GoodSwap), when invoked, does swap the values of the global variables a, b because addresses are passed, so the original data are modified. Notice in this example, aa and bb both enter data into the function and return data from the function.

Assembly:
// Inputs: R0=address aa
//         R1=address bb
GoodSwap: LDR R2,[R0]   // read *aa
          LDR R3,[R1]   // read *bb
          STR R2,[R1]   // write *bb
          STR R3,[R0]   // write *aa
          BX  LR
main:     LDR R0,=a     // address of a
          LDR R1,=b     // address of b
          BL  GoodSwap
loop:     B   loop

C:
void GoodSwap(int32_t *aa, int32_t *bb){
  int32_t tmp;
  tmp = *aa;
  *aa = *bb;
  *bb = tmp;
}
int32_t a=33, b=44;
int main(void){
  GoodSwap(&a,&b);
  while(1){};
}

Program 3.3.24. Call by reference function to swap values of two variables (this version works).

Observation: Call by reference can be used when we wish to pass a lot of data. To pass a lot of data, we simply pass a pointer to the data.

Observation: Call by reference can be used when we wish to change the original data.

Observation: Call by reference can be used when we wish to return more than one value.
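As an illustration of the last observation, the following sketch returns two results through pointers. MinMax is a hypothetical helper, not one of the book's programs; the int return value is used only to report an empty array.

```c
#include <stdint.h>

// Hypothetical helper: find the smallest and largest values in an array.
// Inputs:  data is passed by reference (pointer to the first element)
//          size is passed by value
// Outputs: *min and *max are written through pointers (call by reference)
// Returns 0 on success, -1 if the array is empty (outputs left unchanged).
int MinMax(const int32_t data[], uint32_t size, int32_t *min, int32_t *max){
  if(size == 0){
    return -1;
  }
  *min = data[0];
  *max = data[0];
  for(uint32_t i = 1; i < size; i++){
    if(data[i] < *min){
      *min = data[i];   // new minimum
    }
    if(data[i] > *max){
      *max = data[i];   // new maximum
    }
  }
  return 0;
}
```

A caller would define two variables and pass their addresses, e.g., MinMax(buf, 4, &lo, &hi), in the same way GoodSwap is called with &a and &b.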

Example 3.3.3. Write a function to find the maximum value in an array of unsigned 32-bit data.

Solution: The algorithm (Figure 3.3.9) to find the maximum is to set an initial ans to the smallest possible value, in this case 0. Next, we look at all the values in the array, and if the value in the array is larger than the current value in ans, ans is replaced with the new maximum value. A loop that tests before executing its body (a for loop in C) is used to handle the case where the size is 0, see Program 3.3.25.


Figure 3.3.9. Flowchart to find the maximum.

Assembly:
// R0 is a pointer to data array
// R1 is the size of the array
max:  PUSH {R4,R5,LR}
      MOVS R4,#0        // index i
      MOVS R5,#0        // ans
loop: CMP  R4,R1        // done?
      BHS  done
      LSLS R2,R4,#2     // R2=4*i
      LDR  R3,[R0,R2]   // data[i]
      CMP  R3,R5
      BLS  skip
      MOV  R5,R3        // new max
skip: ADDS R4,R4,#1     // i++
      B    loop
done: MOV  R0,R5        // return in R0
      POP  {R4,R5,PC}

C:
uint32_t max(uint32_t data[], uint32_t size){
  uint32_t ans,i;
  ans = 0;
  for(i=0; i < size; i++){
    if(data[i] > ans){
      ans = data[i];
    }
  }
  return ans;
}

Program 3.3.25. Function to find the maximum value of the array.

In C, we can access an element of the array using its name and an index. In assembly, we can perform a similar function using indexed addressing, see Program 3.3.25. The index i is a 32-bit local variable, defined in R4, and initialized to zero. i takes on the values 0, 1, 2, ... The result ans is another 32-bit local variable, defined in R5, and initialized to zero. The address of the ith element is base+4*i. R0 will contain the base address of the data array and R2 will contain 4*i. The instruction LDR R3,[R0,R2] will read the contents of data[i] into R3.

Sorting is an operation where data in an array is shuffled around so the values go from minimum to maximum. We have two buffers, defined in RAM, which initially contain data in an unsorted fashion. The buffers shown here are uninitialized, but assume previously executed software has filled these buffers with corresponding voltage and pressure data. In C, we could have

  int8_t VBuffer[100];   // voltage data
  int8_t PBuffer[200];   // pressure data

Since the arrays have more data than will fit in the registers, we will use call by reference. In C, to declare a parameter call by reference we use int8_t *a or int8_t a[]. The Bubble Sort algorithm uses two loops. It is easy to understand but much slower to execute than other algorithms. If the size of the buffer is n, then the execution time of this sort routine is related to n^2. We define the complexity of this algorithm as O(n^2).

void BubbleSort(int8_t a[], int n){
  int i,j; int8_t tmp;
  for(i=0; i < n; i++){       // loop n times
    // previous elements sorted already
    for(j=0; j < n-i-1; j++)
      // swap if needed
      if(a[j] > a[j+1]){
        tmp = a[j];
        a[j] = a[j+1];
        a[j+1] = tmp;
      }
  }
}
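One way to see why the running time grows with n^2 is to count how many times the inner comparison of the two nested loops executes. The sketch below uses the same loop bounds as BubbleSort but only counts; BubbleCompares is a hypothetical name introduced for this illustration.

```c
#include <stdint.h>

// Count how many element comparisons the two nested Bubble Sort loops
// perform for a buffer of n elements. No data is sorted here.
uint32_t BubbleCompares(uint32_t n){
  uint32_t count = 0;
  for(uint32_t i = 0; i < n; i++){
    // j+i+1 < n is the same bound as j < n-i-1, written with additions
    // so the unsigned arithmetic never subtracts below zero
    for(uint32_t j = 0; j + i + 1 < n; j++){
      count++;
    }
  }
  return count;
}
```

The count equals n(n-1)/2, so the 100-element voltage buffer needs 4950 comparisons and the 200-element pressure buffer needs 19900. Doubling the size roughly quadruples the work.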

In C, to invoke a function using call by reference we pass a pointer to the object. These two calling sequences are identical, because in C the array name is equivalent to a pointer to its first element. The & operator is used to get the address of a variable.

void main(void){
  BubbleSort(VBuffer,100);
  BubbleSort(PBuffer,200);
}

void main(void){
  BubbleSort(&VBuffer[0],100);
  BubbleSort(&PBuffer[0],200);
}

Example 3.3.5. Design a function to average all values of a fixed-length array.

Solution: The for loop in Program 3.3.26 is used to calculate the average of 100 data points. The loop variable R2 goes 0, 4, 8, 12, ... 396 as it accesses all 100 entries.

// R0 points to array
// return R0 as average of data
Ave:  PUSH {R4,LR}
      MOVS R1,#0        // sum=0
      MOVS R2,#0        // i=0
      LDR  R3,=400
loop: LDR  R4,[R0,R2]   // Data[i]
      ADDS R1,R1,R4
      ADDS R2,R2,#4     // 4 bytes
      CMP  R2,R3
      BNE  loop
      MOVS R0,R1        // sum
      MOVS R1,#100
      BL   udiv32_16    // R0=sum/100
      POP  {R4,PC}

Program 3.3.26. For-loops can be used to traverse arrays.

// assume array has 100 elements
uint32_t Ave(uint32_t data[]){ uint32_t i,sum;
  sum = 0;
  for(i=0; i

      LDR  R3,=65535
loop: LSLS R4,R2,#1     // 2*i
      LDRH R4,[R0,R4]   // buf[i]
      CMP  R4,R3        // termination?
      BEQ  done
      ADDS R1,R1,R4     // add to result
      ADDS R2,#1        // next index
      B    loop
done: MOVS R0,R1        // return result
      POP  {R4,PC}

uint16_t Sum(uint16_t buf[]){
  uint16_t result=0;
  int i=0;
  while(buf[i] != 65535){
    result = result + buf[i];
    i++;
  }
  return result;
}

Program 3.3.27b. Function to sum all elements of a variable length array. Data are 16 bits unsigned.

Example 3.3.8. Design a function to calculate the dot product.

Solution: The first implementation uses pointer-access and the second implementation uses index-access, Program 3.3.28. The array is passed by reference as a pointer to the array. The function prototypes look different, but are actually identical. This implementation does not check for overflow, which could occur with large numbers or large arrays.

int32_t dot(int32_t *p, int32_t *q, int32_t length){
  int32_t sum=0;
  for(int i=0; i < length; i++){
    sum += (*p)*(*q);
    p++; q++;
  }
  return sum;
}

int32_t dot(int32_t a[], int32_t b[], int32_t length){
  int32_t sum=0;
  for(int i=0; i < length; i++){
    sum += a[i]*b[i];
  }
  return sum;
}

Program 3.3.28a. Function to calculate dot-product.

The following main program illustrates how to call the dot-product function.

#define LEN 5
int32_t aa[LEN];
int32_t bb[LEN];
int main(void){
  int32_t result;
  for(int i=0; i < LEN; i++){
    aa[i] = i;
    bb[i] = 5;
  }
  result = dot(aa,bb,LEN);
  while(1){
  }
}

Program 3.3.28b. Invoking the dot-product function.

From a security perspective, call by reference is more vulnerable than call by value. If we have important information, then a level of trust is required to pass a pointer to the original data to a subroutine. Since call by value creates a copy of the data at the time of the call, it is slower but more secure. With call by value, the original data is protected from subroutines that are called.
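One partial, compile-time safeguard when passing by reference is the const qualifier. It is not a true security mechanism, because a cast can remove it, but it lets the compiler reject accidental writes to the caller's data. The sketch below is a variation on the dot-product function; the name DotConst is hypothetical.

```c
#include <stdint.h>

// Dot product with read-only access to the caller's arrays.
// The const qualifiers promise that this function does not modify
// the data that p and q point to.
int32_t DotConst(const int32_t *p, const int32_t *q, int32_t length){
  int32_t sum = 0;
  for(int32_t i = 0; i < length; i++){
    sum += p[i]*q[i];
    // p[i] = 0;   // would be rejected by the compiler: p points to const data
  }
  return sum;
}
```

With aa = {0,1,2,3,4} and bb = {5,5,5,5,5}, as set up in Program 3.3.28b, the result is 50. Like the original, this sketch does not check for overflow.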

3.3.11. Strings

Observation: Most C compilers have standard libraries. If you include "string.h" you will have access to many convenient string operations.

When dealing with strings we must remember that they are arrays of characters with null termination. In C, we can pass a string as a parameter, but doing so creates a constant string and implements call by reference. Assuming Hello is as defined above, these three invocations are identical:

OutString(Hello);
OutString(&Hello[0]);
OutString("Hello world\n\r");

Previously we dealt with constant strings. With string variables, we do not know the length at compile time, so we must allocate space for the largest possible size the string could be. E.g., if we know the string size could vary from 0 to 19 characters, we would allocate 20 bytes.

char String1[20];
char String2[20];

Example 3.3.9. Write software to output an ASCII string to an output device.

Solution: Because the string may be too long to place all the ASCII characters into the registers at the same time, call by reference parameter passing will be used. With call by reference, a pointer to the string will be passed. The function OutString, shown in Program 3.3.29, will output the string data to the display. Versions of the function OutChar will be developed later in Chapters 6 and 8 that send data out the LCD and UART respectively. When running in the debugger, UART data is observable on CCS in a terminal window. For now all we need to know is that it outputs a single ASCII character. In the assembly version R4 is used because we know by AAPCS the function OutChar will preserve R4 to R11. Within the body of the while loop, R4 is a pointer to the string; one is added to the pointer each time through the loop because each element in the string is one byte. Since this function calls a subfunction it must save LR. The POP PC operation will perform the function return.

Assembly:
//Input: R0 points to string
OutString: PUSH {R4, LR}
           MOV  R4, R0
loop:      LDRB R0, [R4]
           ADDS R4, #1    // next
           CMP  R0, #0    // done?
           BEQ  done      // 0 termination
           BL   OutChar   // print character
           B    loop
done:      POP  {R4, PC}

C:
// displays a string
void OutString(char *pt){
  while(*pt){
    OutChar(*pt);   // output
    pt++;           // next
  }
}

Program 3.3.29. A variable length string contains ASCII data.

In C, we cannot assign one string to another. I.e., these are illegal:

String1 = "Hello";   //********illegal************
String2 = String1;   //********illegal************

We can copy strings by calling a function called strcpy. This function takes two pointers. It copies the string at the second pointer into the space defined by the first pointer. We must, however, make sure the destination string has enough space to hold the string being copied.

strcpy(String1,"Hello");   // copies "Hello" into String1
strcpy(String2,String1);   // copies String1 into String2
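Because strcpy itself does not know the size of the destination, one defensive option is a copy routine that is told the size and refuses to write past it. The function below is a hypothetical helper written for this illustration, not a standard library function.

```c
#include <stdint.h>

// Hypothetical bounded string copy.
// Copies at most destSize-1 characters from source into dest and always
// null-terminates dest (when destSize is at least 1).
// Returns 1 if the whole source string fit, 0 if it was truncated.
int CopyBounded(char *dest, uint32_t destSize, const char *source){
  uint32_t i = 0;
  if(destSize == 0){
    return 0;                 // no room even for the terminator
  }
  while((i < destSize - 1) && (source[i] != 0)){
    dest[i] = source[i];      // copy one character
    i++;
  }
  dest[i] = 0;                // null termination
  return (source[i] == 0);    // true if we stopped at the end of source
}
```

For example, CopyBounded(String1, 20, "Hello") copies the entire string, while a 4-byte destination would receive only "Hel" plus the terminator.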

Program 3.3.30 shows two implementations of this string copy function. R0 and R1 are pointers, and R2 contains the data as it is being copied. In this case, dest++; is implemented as an "add 1" because the data is one byte each. For halfword data, the increment pointer would be "add 2". For word data, it would be "add 4". A do-while structure is used because even if the source string is empty, we will transfer at least one byte. The operation *source++ is the same as *(source++) because postfix ++ has precedence over *. The postfix ++ yields the old pointer value, so the code reads the contents at *source, and then the pointer is incremented. However, adding the parentheses makes the code easier to understand.

Assembly:
// Input: R0=&dest R1=&source
strcpy: LDRB R2,[R1]   // source data
        STRB R2,[R0]   // copy
        ADDS R1,#1     // next
        ADDS R0,#1
        CMP  R2,#0     // termination?
        BNE  strcpy
        BX   LR

C:
// copy string from source to dest
void strcpy(char *dest, char *source){
  char data;
  do{
    data = *(dest++) = *(source++);
  } while(data);
}

Program 3.3.30. Simple string copy functions.

Example 3.3.10. Write software to calculate the length of a string.

Solution: The first implementation uses index addressing, see Program 3.3.31. The end of the string is signified by the sentinel, 0. The address of the ith element is base+i, because each element is 1 byte. A while loop is used because the string may be empty.

Assembly:
//Input: R0 points to string
//Output: R0 is length of string
slen:  MOVS R1,#0        // count
loop:  LDRB R2,[R0,R1]   // value
       CMP  R2,#0        // done?
       BEQ  done         // 0 termination
       ADDS R1,#1        // count++
       B    loop
done:  MOV  R0,R1        // return count
       BX   LR

C:
// calculate string length
uint32_t slen(char *a){
  uint32_t count=0;
  while(a[count]){
    count++;   // next
  }
  return count;
}

Program 3.3.31. Calculate string length using indexed addressing.

The second implementation uses pointer addressing, see Program 3.3.32. Each time through the loop the assembly pointer is increased by 1 because each element is 1 byte.

Assembly:
//Input: R0 points to string
//Output: R0 is length of string
slen2: MOVS R1,#0        // count
loop2: LDRB R2,[R0]      // value at *a
       CMP  R2,#0        // done?
       BEQ  done2        // 0 termination
       ADDS R1,#1        // count++
       ADDS R0,#1        // a++
       B    loop2
done2: MOV  R0,R1        // return count
       BX   LR

C:
// calculate string length
uint32_t slen2(char *a){
  uint32_t count=0;
  while(*a){
    count++;   // next
    a++;
  }
  return count;
}

Program 3.3.32. Calculate string length using pointer addressing.


Observation: Notice the prototypes of slen and slen2 are the same, meaning the choice of index versus pointer addressing is up to the programmer.

3.3.12. Standard I/O Driver and the printf Function

A very powerful approach to I/O is to provide a high-level abstraction in such a way that the I/O device itself is hidden from the user. There are two printf projects on the book's web site. The overall purpose of each of these examples is to provide an output stream using the standard printf function. Using the project UART_busywait, we send the output data stream through UART to the PC. The project ST7735 implements a similar approach sending data through SPI to a color LCD ST7735 display. In each implementation, there is an initialization function that must be called once, and a general function printf() we use to output data in a standard way. At the low level, we implement how the output actually happens by writing a uart_write function. The uart_write function is private and implemented inside UART.c. At the high level, the user performs output by calling printf. This abstraction clearly separates what it does (printf outputs information) from how it works (sends data to the display over UART or SPI). By rewriting the low level, we could redirect the output to other devices.

The call to printf has a string parameter followed by a list of values to display. Assume cc is an 8-bit variable containing 0x56 ('V'), xx is a 32-bit variable containing 100, yy is a 16-bit variable containing -100, and zz is a 32-bit floating-point variable containing 3.14159265. The following illustrate the use of printf. After the format parameter, printf requires at least as many additional arguments as specified in the format.

Example code                              Output
printf("Hello world\n");                  Hello world
printf("cc = %c %d %#x\n",cc,cc,cc);      cc = V 86 0x56
printf("xx = %c %d %#x\n",xx,xx,xx);      xx = d 100 0x64
printf("yy = %d %#x\n",yy,yy);            yy = -100 0xffffff9c
printf("zz = %f %3.2f\n",zz,zz);          zz = 3.141593 3.14
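The separation between what is output and where it goes can be sketched on a small scale with a function pointer. This is a different technique from the book's projects, which redirect printf by rewriting the low-level uart_write function; every name below (OutCharFn, Buffer_OutChar, OutStringTo, Capture) is hypothetical, and a RAM buffer stands in for the output device.

```c
#include <stdint.h>

// Hypothetical type for a low-level routine that sends one character.
typedef void (*OutCharFn)(char);

static char Capture[32];        // stand-in "device": a RAM buffer
static uint32_t CaptureCount;   // number of characters captured so far

// One possible low-level routine. A real system might instead write
// the character to a UART or an LCD.
void Buffer_OutChar(char c){
  if(CaptureCount < sizeof(Capture) - 1){
    Capture[CaptureCount++] = c;
    Capture[CaptureCount] = 0;  // keep the buffer null-terminated
  }
}

// High-level routine: knows what to output, not where it goes.
void OutStringTo(OutCharFn out, const char *pt){
  while(*pt){
    out(*pt);   // the device-specific work happens inside out()
    pt++;
  }
}
```

Calling OutStringTo(Buffer_OutChar, "Hi") leaves "Hi" in Capture. Passing a different low-level function would send the same text to a different device without changing OutStringTo.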

Escape sequences are used to display non-printing and hard-to-print characters. In general, these characters control how text is positioned on the screen, see Table 3.3.5.

Character         Value   Escape Sequence
backslash         0x5C    \\
carriage return   0x0D    \r
double quote      0x22    \"
horizontal tab    0x09    \t
newline           0x0A    \n
null character    0x00    \0
single quote      0x27    \'
STX               0x02    \x02 (this syntax works for any 2-digit hex value)
question mark     0x3F    \?

Table 3.3.5. Escape sequences.

When the program is executed, the control string will be displayed exactly as it appears in the program with two exceptions. First, the computer will replace each conversion specification with a value given in the other arguments part of the printf statement. Second, escape sequences will be replaced with special non-printing and hard-to-print characters. To display the contents of a variable we add a % tag into the format string. The specifier defines the type as listed in Table 3.3.6. The floating-point specifiers have been omitted.

%[flags][width][.precision]specifier

Specifier   Output                                           Example
c           Character                                        a
d or i      Signed decimal integer                           392
ld          Signed 32-bit long decimal integer               1234567890
e           Scientific notation                              6.022141e23
E           Scientific notation, capital letter              6.022141E23
f           Floating point                                   3.14159
o           Unsigned octal                                   610
s           String of characters                             sample
u           Unsigned decimal integer                         7235
x           Unsigned hexadecimal integer                     7fa
X           Unsigned hexadecimal integer (capital letters)   7FA
%           %% will write % to stdout                        %

Table 3.3.6. Format specifiers.

The tag can also contain flags, width, .precision, and length sub-specifiers. The flags are listed in Table 3.3.7. If the width is present, it specifies the minimum number of characters to be printed. If the value to be printed is shorter than this number, the result is padded with blank spaces. The value is not truncated even if the result is larger. For the integer specifiers (d, i, o, u, x, X), the .precision sub-specifier specifies the minimum number of digits to be written. If the value to be written is shorter than this number, the result is padded with leading zeros. The value is not truncated if the result requires more digits. A precision of 0 means that no character is written for the value 0. For s, the .precision is the maximum number of characters to be printed. For the c type, .precision has no effect. For floating point, .precision is the number of digits after the decimal.

Flags     Description
-         Left-justify within the given field width
+         Forces the result to have a plus or minus sign
(space)   If no sign is going to be written, a blank space is inserted before the value.
#         Used with o, x or X specifiers the value is preceded with 0, 0x or 0X respectively for values different than zero.
0         Left-pads the number with zeroes (0) instead of spaces, where padding is specified (see width sub-specifier).

Table 3.3.7. Flag sub-specifiers.

If successful, printf will return the total number of characters written. On failure, a negative number is returned. The start of a format specifier is signified by a percent sign and the end is signified by one of the letter codes in Table 3.3.6. Each format specifier will be replaced by a value from the argument list converted to the specified format. These optional fields typically occur in this order. The pound sign ('#') specifies that the value should be converted to an alternate form. The alternate form for hexadecimal adds the 0x or 0X. The alternate form for octal is a leading zero.

printf("%x", 11);    // prints 'b'
printf("%#x", 11);   // prints '0xb'
printf("%X", 11);    // prints 'B'
printf("%#X", 11);   // prints '0XB'
printf("%o", 11);    // prints '13'
printf("%#o", 11);   // prints '013'

The zero (' 0 ') specifies zero-padding. The converted value is padded on the left with the specified number of zeros minus the number of digits to be printed. This is described in more detail below.

printf("%d", 9);       // prints '9'
printf("%4d", 9);      // prints '   9'
printf("%04d", 9);     // prints '0009'
printf("%04d", 123);   // prints '0123'

A minus sign ('-') specifies left justification. Without the minus, the format is right justified.

printf("%5d", 12);    // prints '   12' (right justified)
printf("%-5d", 12);   // prints '12   ' (left justified)

A space (' ') specifies that a blank should be left before a positive number.

printf("% d", 9);     // prints ' 9'
printf("% d", -9);    // prints '-9'

The plus sign ('+') specifies that a sign always be placed before the value. The plus sign overrides a space if both are used.

printf("%+d", 9);    // prints '+9'
printf("%+d", -9);   // prints '-9'

A decimal digit specifies the minimum field width. Using the minus sign makes the format left justified; otherwise it is right justified. Used with the zero-modifier for numeric conversions, the value is left-padded with zeros to fill the field width.

printf("%3d", 12);     // prints ' 12' (right justified)
printf("%-3d", 12);    // prints '12 ' (left justified)
printf("%3d", 123);    // prints '123' (filled up)
printf("%3d", 1234);   // prints '1234' (bigger than 3 width)

A precision value takes the form of a period ('.') followed by an optional digit string. If the digit string is omitted, a precision of zero is used. When used with decimal, hexadecimal or octal integers, it specifies the minimum number of digits to print. For floating point output, it specifies the number of digits after the decimal place. For the 's' (string) conversion, it specifies the maximum number of characters of the string to print, which is quite useful to make sure long strings don't exceed their field width.

printf("%.3d", 7);            // prints '007'
printf("%.3d", 12345);        // prints '12345'
printf("%3s", "Jonathan");    // prints 'Jonathan'
printf("%.3s", "Jonathan");   // prints 'Jon'
printf("%3s", "JV");          // prints ' JV'
printf("%.3s", "JV");         // prints 'JV'
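The effect of the flags, width, and precision can be checked on any C system without a display by formatting into a buffer with the standard snprintf function and comparing the text. FormatsAs is a hypothetical helper written for this illustration.

```c
#include <stdio.h>
#include <string.h>

// Format one integer with the given format string into a local buffer and
// report whether the result equals the expected text.
// Returns 1 on a match, 0 otherwise.
int FormatsAs(const char *format, int value, const char *expected){
  char buf[32];
  snprintf(buf, sizeof(buf), format, value);   // same rules as printf
  return strcmp(buf, expected) == 0;
}
```

For example, FormatsAs("%04d", 9, "0009") and FormatsAs("%-5d", 12, "12   ") both return 1, matching the examples above.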

Consider a decimal fixed-point number with units 0.001 cm. For example, if the value of distance is equal to 1234, this means the distance is 1.234 cm. Assume the distance varies from 0 to 99.999 cm. This C code could be used to print the value of the number in such a way that exactly 20 characters are printed for all values of distance from 0 to 99999. The first format specifier (%2u) prints the integer part in exactly two characters, and the second format specifier (%.3u) prints the fractional part in exactly three characters.

printf("Distance = %2u.%.3u cm", distance/1000, distance%1000);

Value    Output
0        Distance =  0.000 cm
1        Distance =  0.001 cm
99       Distance =  0.099 cm
123      Distance =  0.123 cm
1234     Distance =  1.234 cm
12345    Distance = 12.345 cm
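The claim that exactly 20 characters are produced can be verified with snprintf, which returns the number of characters it formats. FormatDistance is a hypothetical wrapper around the same format string; the casts to unsigned are added so that the arguments match %u on any compiler.

```c
#include <stdio.h>
#include <stdint.h>

// Hypothetical helper: format a fixed-point distance (units of 0.001 cm)
// into buf using the format string from the text.
// Returns the number of characters produced, not counting the terminator.
int FormatDistance(char *buf, size_t size, uint32_t distance){
  return snprintf(buf, size, "Distance = %2u.%.3u cm",
                  (unsigned)(distance/1000),    // integer part, 2 characters
                  (unsigned)(distance%1000));   // fractional part, 3 digits
}
```

For any distance from 0 to 99999 the return value is 20: 11 characters for "Distance = ", 2 for the integer part, 1 for the decimal point, 3 for the fraction, and 3 for " cm".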

Program 3.3.33 demonstrates the use of output using the standard library. The UART_InUDec function will wait for user input and return with a value as specified by the type. The body of the main program:
1) Asks for input,
2) Waits for input,
3) Performs a calculation, and
4) Displays the results.

uint32_t side;   // room wall meters
uint32_t area;   // size squared meters
// note: sorry, but scanf does not work
int main(void){
  Clock_Init40MHz();
  LaunchPad_Init();
  UART_InitPrintf();
  printf("\n\rThis program calculates areas of square-shaped rooms\n\r");
  while(1){
    printf("Give room side: ");   // 1) ask for input
    side = UART_InUDec();         // 2) wait for input
    area = side*side;             // 3) calculation
    printf("\n\rside = %um, area = %u sqr m\n\r", side, area);   // 4) out
  }
}

Program 3.3.33. Software to calculate the area of a square room (UART_busywait).

3.4. Modular Design using Abstraction

In Section 3.2, we presented successive refinement as a method to convert a problem statement into a software algorithm. Successive refinement is the transformation from the general to the specific. In this section, we introduce the concept of modular programming and demonstrate that it is an effective way to organize our software projects. Modular design applies to both hardware and software.

There are four reasons for forming modules. First, functional abstraction allows us to reuse a software module from multiple locations. Second, complexity abstraction allows us to divide a highly complex system into smaller less complicated components. The third reason is portability. If we create modules for the I/O devices, then we can isolate the rest of the system from the hardware details. This approach is sometimes called a hardware abstraction layer. Since all the software components that access an I/O port are grouped together, it will be easier to redesign the embedded system on a machine with different I/O ports. Finally, another reason for forming modules is security. Modular systems by design hide the inner workings from other modules and provide a strict set of mechanisms to access data and I/O ports. Hiding details and restricting access generates a more secure system.

Systems must deal with complexity. Most real systems have many components, which interact in a complex manner. The size and interactions will make it difficult to conceptualize, abstract, visualize, and document. In this chapter we will present data flow graphs and call graphs as tools to describe interactions between components. Software must deal with conformity. All design, including software design, must interface with existing systems and with systems yet to be designed. Interfacing with existing systems creates an additional complexity. Software must deal with changeability. Most of the design effort involves change. Creating systems that are easy to change will help manage the rapid growth occurring in the computer industry.

3.4.1. Definition and Goals

The key to completing any complex task is to break it down into manageable subtasks. Modular programming is a style of software development that divides the software problem into distinct well-defined modules. The parts are as small as possible, yet relatively independent. Complex systems designed in a modular fashion are easier to debug because each module can be tested separately. Industry experts estimate that 50 to 90% of software development cost is spent in maintenance. All five aspects of software maintenance
• Correcting mistakes,
• Adding new features,
• Optimizing for execution speed or program size,
• Porting to new computers or operating systems, and
• Reconfiguring the software to solve a similar related problem
are simplified by organizing the software system into modules. The approach is particularly useful when a task is large enough to require several programmers.

A module is a collection of functions and data that, working together, perform one well-defined task. Typically, there are three files that constitute a module: a header file, a code file, and an example main program that demonstrates how the module is used or how it was tested.

Header file contains
• Comments that describe what it does and how to use it
• Prototypes for public functions
• struct, enum, typedef for new data types used by the module
• Preprocessor directives to limit the header file to be loaded just once
• Doxygen formatting to create software documentation
Things that should not be in a header file include variables, executable code, and anything private.

Code file contains
• Comments that describe how it works, how it was tested, how it can be changed
• Definitions for all functions including private functions
• Variables if needed
• Preprocessor directives to include other modules that this module needs
Things that should not be in a code file include extern references to private data/functions in other files, access to other modules that are not absolutely essential, and access to input/output ports that are not necessary. The goal is to separate what it does (header file) from how it works (code file). Another goal is to allow software to evolve with minimal impact on how it is used.
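The header/code split described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical module named Stats; the module name, its functions, and the Stats_Range_t type are not from the text. Both files are shown in one listing so it compiles as a single file; in a real project the first part would be Stats.h and the second part Stats.c.

```c
/* ---------- Stats.h (hypothetical header: what the module does) ---------- */
#ifndef STATS_H            // include guard: load the header just once
#define STATS_H
#include <stdint.h>

/** Smallest and largest value found in a buffer. */
typedef struct {
  int32_t min;
  int32_t max;
} Stats_Range_t;

/**
 * @brief  Find the range of the values in a buffer.
 * @param  buf  pointer to the data (call by reference)
 * @param  n    number of elements, must be at least 1
 * @return smallest and largest value in the buffer
 */
Stats_Range_t Stats_Range(const int32_t *buf, uint32_t n);

/**
 * @brief  Difference between the largest and smallest value.
 * @param  buf  pointer to the data
 * @param  n    number of elements, must be at least 1
 */
int32_t Stats_Span(const int32_t *buf, uint32_t n);
#endif // STATS_H

/* ---------- Stats.c (hypothetical code file: how it works) ---------- */
// Private helpers: static limits their scope to this file
static int32_t Smaller(int32_t a, int32_t b){ return (a < b) ? a : b; }
static int32_t Larger(int32_t a, int32_t b){ return (a > b) ? a : b; }

// Public: scan the buffer once, tracking both extremes
Stats_Range_t Stats_Range(const int32_t *buf, uint32_t n){
  Stats_Range_t r = { buf[0], buf[0] };
  for(uint32_t i = 1; i < n; i++){
    r.min = Smaller(r.min, buf[i]);
    r.max = Larger(r.max, buf[i]);
  }
  return r;
}

// Public: built on the other public function
int32_t Stats_Span(const int32_t *buf, uint32_t n){
  Stats_Range_t r = Stats_Range(buf, n);
  return r.max - r.min;
}
```

A user of this module reads only the header part. The helpers Smaller and Larger could be replaced or removed without any change to software that calls Stats_Range or Stats_Span.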

main file contains
• Example code that illustrates how it works
• Example code that was used to test the module
• Clear explanation of the limitations of the module

In this section, an object refers to either a subroutine or a data element. A public object is one that is shared by multiple modules. This means a public object can be accessed by other modules. Typically, we make the most general functions of a module public, so the functions can be called from other modules. For a module performing I/O, typical public functions include initialization, input, and output. A private object is one that is not shared. I.e., a private object can be accessed by only one module. Typically, we make the internal workings of a module private, so we hide how a private function works from users of the module. In an object-oriented language like C++ or Java, the programmer clearly defines a function or data object as public or private. Later in this chapter, we will present a naming convention for assembly language or C that can be used in an equivalent manner to define a function or data object as public or private.

At a first glance, I/O devices seem to be public. For example, Port B registers reside permanently at the fixed addresses, and the programmer of every module knows that. In other words, from a syntactic viewpoint, any module has access to any I/O device. However, in order to reduce the complexity of the system, we will restrict the number of modules that actually do access the I/O device. From a "what do we actually do" perspective, however, we will write software that considers I/O devices as private, meaning an I/O device should be accessed by only one module. In general, it will be important to clarify which modules have access to I/O devices and when they are allowed to access them. When more than one module accesses an I/O device, then it is important to develop ways to arbitrate or synchronize. If two or more modules want to access the device simultaneously, arbitration determines which module goes first. Sometimes the order of access matters, so we use synchronization to force a second module to wait until the first module is finished.

Most microcontrollers do not have architectural features that restrict access to I/O ports, because it is assumed that all software burned into its ROM was designed for a common goal, meaning from a security standpoint one can assume there are no malicious components. However, as embedded systems become connected to the Internet, providing the power and flexibility, security will become an important issue.

Checkpoint 3.3.7: What conflict could arise if multiple modules use the same port, but module initialization functions are not friendly? How do you resolve the conflict?

Information hiding is similar to minimizing coupling. It is better to separate the mechanisms of software from its policies. We should separate "what the function does" from "how the function works". What a function does is defined by the relationship between its inputs and outputs. It is good to hide certain inner workings of a module and simply interface with the other modules through the well-defined input/output parameters. For example, we could implement a variable size buffer by maintaining the current byte count in a global variable, Count. A good module will hide how Count is implemented from its users. If the user wants to know how many bytes are in the buffer, it calls a function that returns the count. A badly written module will not hide Count from its users. The user simply accesses the global variable Count. If we update the buffer routines, making them faster or better, we might have to update all the programs that access Count too. Allowing all software to access Count creates a security risk, making the system vulnerable to malicious or incompetent software.
The object-oriented programming environments provide well-defined mechanisms to support information hiding. This separation of policies from mechanisms is discussed further in the section on layered software.

Maintenance Tip: It is good practice to make all permanently-allocated data and all I/O devices private. Information is transferred from one module to another through well-defined public function calls.
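The Count example can be sketched in C as follows. This is a minimal sketch, assuming a hypothetical Buffer module; the names Buffer_Put, Buffer_Count, and BUFFER_SIZE are illustrative only. The static keyword keeps Count and the data array private to the file, so a user can learn the count only by calling the public function.

```c
#include <stdint.h>

#define BUFFER_SIZE 16             // hypothetical capacity in bytes

static uint8_t  Data[BUFFER_SIZE]; // private storage
static uint32_t Count = 0;         // private: number of bytes stored

// Public: add one byte; returns 1 on success, 0 if the buffer is full
int Buffer_Put(uint8_t value){
  if(Count >= BUFFER_SIZE){
    return 0;
  }
  Data[Count] = value;
  Count++;
  return 1;
}

// Public: the only way for a user to learn how many bytes are stored
uint32_t Buffer_Count(void){
  return Count;
}
```

If the buffer were later reimplemented, for example with put and get indices instead of a counter, only the body of Buffer_Count would change; software that calls it would be unaffected.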

The Keep It Simple Stupid approach tries to generalize the problem so that the solution uses an abstract model. Unfortunately, the person who defines the software specifications may not understand the implications and alternatives. Sometimes we can restate the problem to allow for a simpler and possibly more powerful solution. As a software developer, we always ask ourselves these questions: "How important is this feature?" "What if it worked this different way?"

3.4.2. Functions, Procedures, Methods, and Subroutines

Formally, there are three components to a function: declaration, definition and invocation. The declaration defines the function name, its input parameters and its output parameter. Another name for declaration is prototype. If another module can call a function within this module, we classify the function as public. We will place prototypes for public functions in the header file for the module. A function which can only be called by software within the module is classified as private. The function definition specifies the task to be performed. In other words, the definition defines what will happen when executed. The definition of a function includes a formal specification of its input parameters and output parameters. In well-written software, the task performed by a function will be well-defined and logically complete. The function invocation is inserted into the software system at places when and where the task should be performed. We define software that invokes the function as "the calling program".

There are three parts to a function invocation: pass input parameters, call the function, and accept output parameters. If there are input parameters, the calling program must establish the values for input parameters before it calls the subroutine. A BL instruction is used to call the subroutine. After the subroutine finishes, and if there are output parameters, the calling program accepts the return value(s). According to AAPCS, one to four input parameters are passed in registers R0 to R3. If the register contains a value, the parameter is classified as call by value. If the register contains an address, which points to the value, then the parameter is classified as call by reference. If there is a return parameter, it is returned in R0.

For example, consider a subroutine that samples the 12-bit ADC, as drawn in Figure 3.4.1. An analog input signal is connected to ADC0. The details of how the ADC works will be presented later in Chapter 7, but for now we focus on the defining and invoking of subroutines. The execution sequence begins with the calling program setting up the input parameters. In this case, the calling program sets Register R0 equal to the channel number, MOVS R0,#0. The instruction BL ADC_In will save the return address in the LR register and jump to the ADC_In subroutine. The subroutine performs a well-defined task. In this case, it takes the channel number in Register R0 and performs an analog to digital conversion, placing the digital representation of the analog input into Register R0. The BX LR instruction will move the return address into the PC, returning the execution thread to the instruction after the BL in the calling program. In this case, the output parameter in Register R0 contains the result of the ADC conversion. It is the responsibility of the calling program to accept the return parameter. In this case, it simply stores the result into variable n. In this example, both the input and output parameters are call by value.

Calling program
      MOVS R0,#0      // input parameter
      BL   ADC_In
      LDR  R3,=n
      STR  R0,[R3]

//Subroutine
//Samples 12-bit ADC
//In: Reg R0 has channel number
//Out: Reg R0 has 12-bit ADC result
ADC_In:
      ... For details see Chapter 7 ...
      BX   LR

Figure 3.4.1. The calling program invokes the ADC_In subroutine passing parameters in R0. See Chapter 1.
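The same three components (declaration, definition, invocation) can be seen in C. The sketch below is illustrative only: the body of ADC_In is a stand-in that returns a made-up 12-bit value instead of accessing real ADC hardware, and ADC_InRef and Sample are hypothetical names added to contrast call by value with call by reference.

```c
#include <stdint.h>

// Declaration (prototype): name, input parameters, output parameter
uint32_t ADC_In(uint32_t channel);
void ADC_InRef(uint32_t channel, uint32_t *result);

// Definition: the task to perform.
// Assumption: simulated conversion, not a real ADC access.
uint32_t ADC_In(uint32_t channel){
  return (channel*1000u + 123u) & 0x0FFFu;  // keep 12 bits, 0 to 4095
}

// Same task, but the result is passed back by reference:
// the caller supplies the address where the result is stored.
void ADC_InRef(uint32_t channel, uint32_t *result){
  *result = ADC_In(channel);
}

// Invocation: pass input parameter, call, accept the return value.
// Under AAPCS the channel travels in R0 and the result comes back in R0.
uint32_t n;           // variable that receives the result
void Sample(void){
  n = ADC_In(0);      // call by value in, return by value out
}
```

The statement n = ADC_In(0) corresponds to the four assembly instructions of the calling program in Figure 3.4.1: load the parameter, call, then store the returned value into n.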

3.4.3. Dividing a Software Task into Modules

The overall goal of modular programming is to enhance clarity. The smaller the task, the easier it will be to understand. Coupling is defined as the influence one module's behavior has on another module. In order to make modules more independent we strive to minimize coupling. Obvious and appropriate examples of coupling are the input/output parameters explicitly passed from one module to another. A quantitative measure of coupling is the number of bytes per second (bandwidth) that are transferred from one module to another. On the other hand, information stored in public global variables can be quite difficult to track. In a similar way, shared accesses to I/O ports can also introduce unnecessary complexity. Public global variables cause coupling between modules that complicates the debugging process because now the modules may not be able to be separately tested. On the other hand, we must use global variables to pass information into and out of an interrupt service routine and from one call to an interrupt service routine to the next call. When passing data into or out of an interrupt service routine, we group the functions that access the global into the same module, thereby making the variable static. Another problem specific to embedded systems is the need for fast execution, coupled with the limited support for local variables. On many microcontrollers it is inefficient to implement local variables on the stack. Consequently, many programmers opt for the less elegant yet faster approach of global variables. Again, if we restrict access to these globals to functions in the same module, the global becomes private. It is poor design to pass data between modules through public global variables; it is better to use a well-defined abstract technique like a FIFO.

We should assign a logically complete task to each module. The module is logically complete when it can be separated from the rest of the system and placed into another application. The interface design is extremely important. The interface to a module is the set of public functions that can be called and the formats for the input/output parameters of these functions. The interfaces determine the policies of our modules: "What does the module do?" In other words, the interfaces define the set of actions that can be initiated. The interfaces also define the coupling between modules. In general we wish to minimize the bandwidth of data passing between the modules yet maximize the number of modules. Of the following three objectives when dividing a software project into subtasks, it is really only the first one that matters
• Make the software project easier to understand
• Increase the number of modules
• Decrease the interdependency (minimize bandwidth between modules).

Observation: We improve modularity by increasing the number of modules and decreasing the interdependency or coupling between modules.

Checkpoint 3.3.8: List some examples of coupling.
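As one way to avoid a public global, data can pass between modules through a FIFO with a small public interface. The following is a minimal sketch with hypothetical names (Fifo_Init, Fifo_Put, Fifo_Get, FIFO_SIZE). It ignores interrupt and critical-section issues and is not presented as the FIFO implementation used elsewhere in the book.

```c
#include <stdint.h>

#define FIFO_SIZE 8                 // hypothetical; holds FIFO_SIZE-1 items

static uint16_t Fifo[FIFO_SIZE];    // private storage
static uint32_t PutI;               // private index of next free slot
static uint32_t GetI;               // private index of oldest item

// Public: make the FIFO empty (empty when the two indices are equal)
void Fifo_Init(void){
  PutI = GetI = 0;
}

// Public: returns 1 if the data was stored, 0 if the FIFO was full
int Fifo_Put(uint16_t data){
  uint32_t next = (PutI + 1) % FIFO_SIZE;
  if(next == GetI){
    return 0;                       // full: one slot is left unused
  }
  Fifo[PutI] = data;
  PutI = next;
  return 1;
}

// Public: returns 1 and writes *data if an item was available, 0 if empty
int Fifo_Get(uint16_t *data){
  if(GetI == PutI){
    return 0;
  }
  *data = Fifo[GetI];
  GetI = (GetI + 1) % FIFO_SIZE;
  return 1;
}
```

The coupling between producer and consumer is now explicit: it consists only of the calls to Fifo_Put and Fifo_Get, and the indices cannot be touched from outside the file.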

We will illustrate the process of dividing a software task into modules with an abstract but realistic example. The overall goal of the example shown in Figure 3.4.2 is to sample data using an ADC, perform calculations on the data, and output results. The liquid crystal display (LCD) could be used to display data to the external world. Notice the typical format of an embedded system in that it has some tasks performed once at the beginning, and it has a long sequence of tasks performed over and over. The structure of this example applies to many embedded systems such as a diagnostic medical instrument, an intruder alarm system, a heating/AC controller, a voice recognition module, automotive emissions controller, or military surveillance system. The left side of Figure 3.4.2 shows the complex software system defined as a linear sequence of ten steps, where each step represents many lines of code. The linear approach to this program follows closely the linear sequence of the processor as it executes instructions. This linear code, however close to the actual processor, is difficult to understand, hard to debug, and impossible to reuse for other projects. Therefore, we will attempt a modular approach considering the issues of functional abstraction, complexity abstraction, and portability in this example. The modular approach to this problem divides the software into three modules containing seven subroutines. In this example, assume the sequence Step4-Step5-Step6 causes data to be sorted. Notice that this sorting task is executed twice.

Linear approach
main  Step1
      Step2
loop  Step3
      Step4
      Step5
      Step6
      Step7
      Step8
      Step9
      Step4
      Step5
      Step6
      Step10
      B    loop

Modular approach
Math_Calc
      PUSH {LR}
      BL   Sort
      BL   Average
      Step9
      BL   Sort
      POP  {PC}
Sort  Step4
      Step5
      Step6
      BX   LR

Figure 3.4.2. A complex software system is broken into three modules containing seven subroutines.

Functional abstraction encourages us to create a Sort subroutine allowing us to write the software once, but execute it from different locations. Complexity abstraction encourages us to organize the ten-step software into a main program with multiple modules, where each module has multiple subroutines. For example, assume the assembly instructions in Step1 cause the ADC to be initialized. Even though this code is executed only once, complexity abstraction encourages us to create an ADC_Init subroutine so the system is easier to understand and easier to debug. In a similar way assume Step2 initializes the LCD port, Step3 samples the ADC, the sequence Step7-Step8 performs an average, and Step10 outputs to the LCD. Therefore, each well-defined task is defined as a separate subroutine. The subroutines are then grouped into modules. For example, the ADC module is a collection of subroutines that operate the ADC. The complex behavior of the ADC is now abstracted into two easy to understand tasks: turn it on, and use it. In a similar way, the LCD module includes all functions that access the LCD. Again, at the abstract level of the main program, understanding how to use the LCD is a matter of knowing we first turn it on then we transmit data. The math module is a collection of subroutines to perform necessary calculations on the data. In this example, we assume Sort and Average will be private subroutines, meaning they can be called only by software within the math module and not by software outside the module. Making private subroutines is an example of "information hiding", separating what the module does from how the module works.

When we port a system, it means we take a working system and redesign it with some minor but critical change. The LCD device is used in this system to output results. We might be asked to port this system onto a device that uses an OLED in place of the LCD for its output. In this case, all we need to do is design, implement and test an OLED module with two subroutines LCD_Init and LCD_Out that function in a similar manner as the existing LCD routines.

The modular approach performs the exact same ten steps in the exact same order. However, the modular approach is easier to debug, because first we debug each subroutine, then we debug each module, and finally we debug the entire system. The modular approach clearly supports code reuse. For example, if another system needs an ADC, we can simply use the ADC module software without having to debug it again.

Observation: When writing modular code, notice its two-dimensional aspect. Down the y-axis still represents time as the program is executed, but along the x-axis we now visualize a functional block diagram of the system showing its data flow: input, calculate, output.

Observation: When writing modular code, we hide details that are likely to change. Furthermore, we take details that are unlikely to change and use them to define the interfaces between modules.
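The modular organization of Figure 3.4.2 might look like this in C. This is a sketch under stated assumptions: the hardware is simulated (ADC_In returns a made-up sequence and LCD_Out only records the value), Step9 is left as a comment because the text does not specify it, the buffer length of 100 is chosen for illustration, and System_Init, System_RunOnce, and LCD_LastValue are hypothetical names. In a real project each module would live in its own code file, and main would call System_Init once and then System_RunOnce over and over.

```c
#include <stdint.h>

#define N 100                        // samples per buffer (assumed)

/* ----- ADC module (simulated hardware, channel selection omitted) ----- */
static uint32_t AdcSeed;             // private state of the fake ADC
void ADC_Init(void){ AdcSeed = 0; }                  // Step1
uint16_t ADC_In(void){                               // Step3
  uint16_t sample = (uint16_t)((AdcSeed*37u) % 100u); // made-up data
  AdcSeed++;
  return sample;
}

/* ----- LCD module (simulated hardware) ----- */
static uint16_t LastValue;           // private: last value "displayed"
void LCD_Init(void){ LastValue = 0; }                // Step2
void LCD_Out(uint16_t value){ LastValue = value; }   // Step10
uint16_t LCD_LastValue(void){ return LastValue; }    // simulation hook only

/* ----- Math module ----- */
// Private: insertion sort, smallest first (Step4-Step5-Step6)
static void Sort(uint16_t *buf, uint32_t n){
  for(uint32_t i = 1; i < n; i++){
    uint16_t key = buf[i];
    uint32_t j = i;
    while((j > 0) && (buf[j-1] > key)){
      buf[j] = buf[j-1];
      j--;
    }
    buf[j] = key;
  }
}
// Private: integer average (Step7-Step8)
static uint16_t Average(const uint16_t *buf, uint32_t n){
  uint32_t sum = 0;
  for(uint32_t i = 0; i < n; i++){
    sum += buf[i];
  }
  return (uint16_t)(sum/n);
}
// Public: same call order as Math_Calc in Figure 3.4.2
uint16_t Math_Calc(uint16_t *buf, uint32_t n){
  Sort(buf, n);
  uint16_t result = Average(buf, n);
  // Step9: not specified in the text, so nothing is done here
  Sort(buf, n);
  return result;
}

/* ----- main module ----- */
static uint16_t Buffer[N];
void System_Init(void){              // tasks performed once
  ADC_Init();
  LCD_Init();
}
void System_RunOnce(void){           // tasks performed over and over
  for(uint32_t i = 0; i < N; i++){
    Buffer[i] = ADC_In();
  }
  LCD_Out(Math_Calc(Buffer, N));
}
```

Porting to a different display in this sketch would mean replacing only the two LCD functions; the main and Math modules would not change.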

3.4.4. How to Draw a Call Graph

Defined previously, we recall that a call graph is a graphical representation of the organizational structure of the modules pieced together to construct a system. In this section, we will work through the process of drawing a call graph. A software module is a collection of public functions, private functions, and private global variables that together perform a complete task. Modular programming places multiple related subroutines into a single module. I/O devices are essential in all computers, but they are particularly relevant when developing software for an embedded system. Just like our software, it is appropriate to group I/O ports into hardware modules, which together perform a complete I/O task. The main program is at the top, and the I/O ports are at the bottom. In a hierarchical system, the modules are organized both in a horizontal and vertical fashion. Modules at the same horizontal level perform similar but distinct functions (e.g., we could place all I/O modules at the same horizontal level in the call graph hierarchy). From a vertical perspective, we place modules responsible for overall policy decisions at the top and modules performing implementations at the bottom of the call graph hierarchy.

Since one of the advantages of breaking a large software project into subtasks is concurrent development, it makes sense to consider concurrency when dividing the tasks. In other words, the modules should be partitioned in such a way that multiple programmers can develop and test the subtasks as independently as possible. On the other hand, careful and constant supervision is required as modules are connected together and tested.

An arrow represents a software linkage, i.e., one software module calling another. We draw the tail of the arrow in the software module that initiates the call, and we point the head of the arrow at the software module it calls. Again, we place prototypes for public functions in the header file.
When programming in C, including a header file in the implementation file of a module defines an arrow in the call graph. The exception to this rule is including a header file that contains constants and has no corresponding implementation file. The file msp.h contains the I/O port definitions for the MSPM0G3507, and has no code file. Therefore, msp is not a module. On the other hand we can create an ADC module by placing all the ADC functions in the ADC.c file and defining the prototypes for the public functions in the ADC.h file. If we place an #include "ADC.h" statement in our main.c code, we create a call graph arrow from the main module to the ADC module, because software in the main module can call the public functions of the ADC module. In a large complex system, we will add call graph arrows for situations where it can call rather than where it does call. It is easier in a larger system to draw the can-call arrows than the does-call arrows, because we just have to look at the header files each code file includes. In contrast, we usually draw only the does-call arrows for accesses to I/O devices. In other words, a device driver is a collection of I/O software for a particular I/O device. This approach will also simplify maintaining a call graph during phases while the software is being designed, written, debugged, or upgraded. Changes to the list of header files included by a module are much less frequent than changes to the list of functions actually called. On the other hand, most embedded systems are simple enough that it is more appropriate to show just the does-call arrows.

A global variable is one which is allocated in permanent RAM. These variables are a necessary and important component of an embedded system, because some information is permanent in nature. Good programming style however suggests we restrict access to these variables to a single module using static. On the other hand, a public global variable is accessed by more than one module. Public globals (externals) represent poor programming style, because they add complexity to the system. Reading and writing public globals add arrows to the call graph. If module A reads a global variable in module B, then we add an arrow from B to A, because activities in B cause changes in A. If module A writes to a global variable in module B, then we add an arrow from A to B, because activities in A cause changes in B. If there is an arrow from A to B, and a second arrow from B to A, then modules A and B must be tested together. Coupling through shared public global variables is a very bad style because debugging will be difficult.
Typically, hardware modules are at the lowest level, because hardware responds to software. An arrow from an oval to a rectangle represents a hardware access, i.e., the software reads from or writes to an I/O port. An arrow from an oval to a rectangle signifies the usual read/write access to the hardware module or public global. We will study interrupts in detail in Chapter 5. With interrupts, a hardware triggering event causes the software interrupt service routine to execute. Therefore with interrupts, we add an arrow from the hardware module to the software module. It can be drawn with two single-headed arrows or one double-headed arrow. Defining arrows between hardware and software modules allows us to identify problems such as conflict (two modules writing to the same I/O configuration registers), or race conditions (e.g., one module reading a port before another module initializes it).

Figure 3.4.3 shows a call graph of the example presented in Figure 3.4.2. To draw a call graph, we first represent all the software modules as ovals. Inside the oval live the functions and variables of that module. Normally, there is not space to list all the subroutines of each module inside the oval, but they are drawn here in this figure so you can see the details of how the graph is drawn. In this example, there is a main program and three software modules. Since this main program calls the Math module, the main program is at a higher level than the Math module, therefore the oval for main will be drawn above the oval for Math. The ADC, Math, and LCD modules do not call each other and each is called by main, so they exist at the same level. In this example, there are two hardware modules, and they are drawn as rectangles. To draw the arrows, we search for subroutine call instructions. The tail of an arrow is placed in the module containing the calling program, and the head of an arrow is placed in the module with the subroutine.
If there are multiple calls from one module to another, only one arrow is needed. For example, there are two calls from main to ADC, but only one arrow is drawn. No arrows will be drawn to describe subroutine calls within a module. For example, we do not need to draw arrows representing the Math routine Math_Calc calling Sort and Average, because these are all within the same module. Two arrows from software to hardware are drawn, because the ADC module accesses the ADC hardware, and the LCD module accesses the LCD hardware.

Figure 3.4.3. A call graph of the system of Figure 3.4.2.

We can develop and connect modules in a hierarchical manner. Construct new modules by combining existing modules. In general, to reduce complexity of the system we want to maximize the number of modules and minimize the number of arrows between them. More specifically, we want to minimize the bandwidth of data flowing from one module to the other.

Observation: If module A calls module B, and module B calls module A, then these two modules must be tested together.

Maintenance Tip: It is good practice to have one hardware module (e.g., the ADC or LCD) accessed by exactly one software module.

Checkpoint 3.3.9: In what way are I/O devices considered as public?

Checkpoint 3.3.10: How can you implement a system that considers I/O devices as private?
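One way to check whether any modules call each other in a loop is to store the call graph arrows in an adjacency matrix and run a depth-first search. The sketch below is an illustration rather than something from the book: the module numbering, the matrix encoding, and the function CallGraph_HasCycle are all assumptions. The example matrix holds the software arrows described for Figure 3.4.3 (main calls ADC, Math, and LCD).

```c
#include <stdint.h>

#define NUM_MODULES 4
enum { MOD_MAIN, MOD_ADC, MOD_MATH, MOD_LCD };  // hypothetical numbering

// Software arrows of Figure 3.4.3: Calls[a][b] is 1 if module a calls b
uint8_t Figure343[NUM_MODULES][NUM_MODULES] = {
  [MOD_MAIN] = { [MOD_ADC] = 1, [MOD_MATH] = 1, [MOD_LCD] = 1 },
};

// Private: depth-first search from module m.
// state: 0 = unvisited, 1 = on the current path, 2 = finished
static int Visit(uint8_t g[NUM_MODULES][NUM_MODULES], int m, uint8_t *state){
  state[m] = 1;
  for(int next = 0; next < NUM_MODULES; next++){
    if(g[m][next]){
      if(state[next] == 1){
        return 1;                    // arrow back into the current path
      }
      if((state[next] == 0) && Visit(g, next, state)){
        return 1;
      }
    }
  }
  state[m] = 2;
  return 0;
}

// Public: returns 1 if the call graph contains a cycle, 0 otherwise
int CallGraph_HasCycle(uint8_t g[NUM_MODULES][NUM_MODULES]){
  uint8_t state[NUM_MODULES] = {0};
  for(int m = 0; m < NUM_MODULES; m++){
    if((state[m] == 0) && Visit(g, m, state)){
      return 1;
    }
  }
  return 0;
}
```

With the arrows of Figure 3.4.3 the function reports no cycle. Adding an arrow from Math back to main creates the A-calls-B, B-calls-A situation of the Observation above, and the function then reports a cycle.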

3.4.5. How to Draw a Data Flow Graph

As shown previously, a data flow graph is a graphical representation of the data as it traverses the system. Figure 3.4.4 shows the data flow graph for the example presented in Figure 3.4.2. In general, the data flow graph contains the same software and hardware modules as the call graph. There are two fundamental differences, however. The arrows in a data flow graph specify the direction, data type, and rate of data transfer. Conversely, arrows in a call graph specify which module invoked which other module. The second difference is in general we draw modules in a data flow graph from left to right as data enters as inputs on the left and exits as outputs on the right. In a call graph, we draw modules top to bottom from high-level to low-level functions.

[Figure 3.4.4 labels: 0 to 50 Hz analog signal into ADC hardware; 16-bit data, 100 halfwords/sec; 100-element 16-bit buffer, 1 buffer/sec; 16-bit result, 1 halfword/sec; 10-character string, 1 string/sec; LCD hardware.]

Figure 3.4.4. A data flow graph of the system of Figure 3.4.2.

Assume in this example, the analog input contains frequency components from 0 to 50 Hz. We classify the signal as analog and specify the bandwidth of the analog signal to be 50 Hz. The output of the ADC hardware is 12-bit digital samples. If the 12-bit ADC is sampled 100 times a second, we define the bandwidth of the digital data out of the ADC software module as 100 samples/sec, 100 halfwords/sec or 200 bytes/sec. Assume once a second, the main program fills a 100-element buffer and passes it to the math module. The math module takes in 100 samples and generates one 16-bit result. In this case, we define the output of the math module to be 1 halfword/sec. If once a second each result is printed as ten ASCII characters using the LCD, then the bandwidth into and out of the LCD software module will be 10 characters/sec.
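The bandwidth figures in this example are products of an item rate and an item size. A small sketch, using hypothetical helper names (BytesPerSecond, AdcBandwidth, MathBandwidth, LcdBandwidth), reproduces the arithmetic:

```c
#include <stdint.h>

// Bandwidth in bytes/sec = items per second * bytes per item
uint32_t BytesPerSecond(uint32_t itemsPerSec, uint32_t bytesPerItem){
  return itemsPerSec*bytesPerItem;
}

// ADC module output: 100 samples/sec, each stored in a 16-bit halfword
uint32_t AdcBandwidth(void){
  return BytesPerSecond(100, (uint32_t)sizeof(uint16_t));   // 200 bytes/sec
}

// Math module output: one 16-bit result per second
uint32_t MathBandwidth(void){
  return BytesPerSecond(1, (uint32_t)sizeof(uint16_t));     // 2 bytes/sec
}

// LCD module: one string of ten ASCII characters per second
uint32_t LcdBandwidth(void){
  return BytesPerSecond(1, 10u*(uint32_t)sizeof(char));     // 10 bytes/sec
}
```

Although the ADC produces 12-bit samples, each one occupies a 16-bit halfword, which is why the ADC module output is counted as 2 bytes per sample.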

3.4.6. Top-down versus Bottom-up Design

Hierarchical systems have tree-structured call graphs, like the system in Figure 3.4.3. Layered systems have call graphs that group the modules into layers, such that the linkage arrows only go from a high level to a lower level or within the same level. A lower level module is not allowed to call a higher level. If at all possible, we should avoid cyclic graphs. A cycle in the call graph will make testing difficult. Recall that we design top down and test bottom up. When there is a cycle in the call graph, there is no good place to start debugging.

There are two approaches to hierarchical programming. The top-down approach starts with a general overview, like an outline of a paper, and builds refinement into subsequent layers. Most engineers believe top down is the proper approach to design. A top-down programmer was once quoted as saying, "Write no software until every detail is specified."


Top down provides an excellent global approach to the problem. Managers like top down because it gives them tighter control over their workers. The top-down approach works well when an existing operational system is being upgraded or rewritten.

On the other hand, the bottom-up approach starts with the smallest detail and builds up the system "one brick at a time." The bottom-up approach provides a realistic appreciation of the problem because we often cannot appreciate the difficulty or the simplicity of a problem until we have tried it. It allows programmers to start immediately coding and gives programmers more input into the design. For example, a low level programmer may be able to point out features that are not possible and suggest other features that are even better. Some software projects are flawed from their conception. With bottom-up design, the obvious flaws surface early in the development cycle.

Bottom-up is a better approach when designing a complex system and specifications are open-ended. For example, when researching new technologies or exploring new markets, you can't perform a top-down design because there are no specifications or constraints with which to work. However, a bottom-up approach allows you to brainstorm putting pieces together in new and creative ways. In a bottom-up design, questions begin with "I wonder what would happen if..." On the other hand, top down is better when you have a very clear understanding of the problem specifications and the constraints of your system.

3.5. Quality Software

3.5.1. Attitude

Good engineers employ well-defined design processes when developing complex systems. When we work within a structured framework, it is easier to prove our system works (verification) and to modify our system in the future (maintenance). As our software systems become more complex, it becomes increasingly important to employ well-defined software design processes. Throughout this book, a very detailed set of software development rules will be presented. This book focuses on real-time embedded systems written in assembly language and C, but most of the design processes should apply to other languages as well. At first, it may seem radical to force such a rigid structure on software. We might wonder if creativity will be sacrificed in the process. True creativity is more about good solutions to important problems and not about being sloppy and inconsistent. Because software maintenance is a critical task, the time spent organizing, documenting, and testing during the initial development stages will reap huge dividends throughout the life of the software project.

Observation: The easiest way to debug is to write software without any bugs.

We define clients as programmers who will use our software. A client develops software that will call our functions. We define coworkers as programmers who will debug and upgrade our software. A coworker, possibly ourselves, develops, tests, and modifies our software.

Writing quality software has a lot to do with attitude. We should be embarrassed to ask our coworkers to make changes to our poorly written software. Since so much software development effort involves maintenance, we should create software modules that are easy to change. In other words, we should expect each piece of our code will be read by another engineer in the future, whose job it will be to make changes to our code. We might be tempted to quit a software project once the system is running, but the short time we might save by not organizing, documenting, and testing will be lost many times over in the future when it is time to update the code. As project managers, we must reward good behavior and punish bad behavior. A company, in an effort to improve the quality of their software products, implemented the following policies.

The employees in the customer relations department receive a bonus for every software bug that they can identify. These bugs are reported to the software developers, who in turn receive a bonus for every bug they fix.

Checkpoint 3.5.1: Why did the above policy fail horribly?

We should demand of ourselves that we deliver bug-free software to our clients. Again, we should be embarrassed when our clients report bugs in our code. We should be mortified when other programmers find bugs in our code. There are a few steps we can take to facilitate this important aspect of software design.

Test it now. When we find a bug, fix it immediately. The longer we put off fixing a mistake the more complicated the system becomes, making it harder to find. Remember that bugs do not go away on their own, but we can make the system so complex that the bugs will manifest themselves in mysterious and obscure ways. For the same reason, we should completely test each module individually, before combining them into a larger system. We should not add new features before we are convinced the existing system is bug-free. In this way, we start with a working system, add features, and then debug this system until it is working again. This incremental approach makes it easier to track progress. It allows us to undo bad decisions, because we can always revert back to a previously working system. Adding new features before the old ones are debugged is very risky. With this sloppy approach, we could easily reach the project deadline with 100% of the features implemented, but have a system that doesn't run. In addition, once a bug is introduced, the longer we wait to remove it, the harder it will be to correct. This is particularly true when the bugs interact with each other. Conversely, with the incremental approach, when the project schedule slips, we can deliver a working system at the deadline that supports some of the features.

Maintenance Tip: Go from working system to working system.

Plan for testing. How to test each module should be considered at the start of a project. In particular, testing should be included as part of the design of both hardware and software components. Our testing and the client's usage go hand in hand. In particular, how we test the module will help the client understand the context and limitations of how our component is to be used. On the other hand, a clear understanding of how the client wishes to use our hardware/software component is critical for both its design and its testing.

Maintenance Tip: It is better to have some parts of the system that run with 100% reliability than to have the entire system with bugs.


Get help. Use whatever features are available for organization and debugging. Pay attention to warnings, because they often point to misunderstandings about data or functions. Misunderstanding of assumptions can cause bugs when the software is upgraded, or reused in a different context than originally conceived. Remember that computer time is a lot cheaper than programmer time.

Maintenance Tip: It is better to have a system that runs slowly than to have one that doesn't run at all.

Deal with the complexity. In the early days of microcomputer systems, software size could be measured in 100's of lines of source code using 1000's of bytes of memory. These early systems, due to their small size, were inherently simple. The explosion of hardware technology (both in speed and size) has led to a similar increase in the size of software systems. Current automobiles have over 10 million lines of code in their embedded systems. The only hope for success in a large software system will be to break it into simple modules. In most cases, the complexity of the problem itself cannot be avoided. E.g., there is just no simple way to get to the moon. Nevertheless, a complex system can be created out of simple components. A real creative effort is required to orchestrate simple building blocks into larger modules, which themselves are grouped to create even larger systems. Use your creativity to break a complex problem into simple components, rather than developing complex solutions to simple problems.

Observation: There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. C.A.R. Hoare, "The Emperor's Old Clothes," CACM Feb. 1981.

Embedded system development is similar to other engineering tasks. We can choose to follow well-defined procedures during the development and evaluation phases, or we can meander in a haphazard way and produce code that is hard to test and harder to change. The ultimate goal of the system is to satisfy the stated objectives such as accuracy, stability, and input/output relationships. Nevertheless, it is appropriate to separately evaluate the individual components of the system. Therefore, in this section, we will evaluate the quality of our software. There are two categories of performance criteria with which we evaluate the "goodness" of our software. Quantitative criteria include dynamic efficiency (speed of execution), static efficiency (memory requirements), and accuracy of the results. Qualitative criteria center on ease of software maintenance. Another qualitative way to evaluate software is ease of understanding. If your software is easy to understand then it will be:
  Easy to debug (fix mistakes)
  Easy to verify (prove correctness)
  Easy to maintain (add features)

Common Error: Programmers who sacrifice clarity in favor of execution speed often develop software that runs fast, but is error-prone and difficult to change.

Golden Rule of Software Development: Write software for others as you wish they would write for you.


3.5.2. Style Guidelines

The objective of this section is to present style rules when developing software. This set of rules is meant to guide, not control. In other words, they serve as general guidelines rather than fundamental law. Choosing names for variables and functions involves creative thought, and it is intimately connected to how we feel about ourselves as programmers. Of the policies presented in this section, naming conventions may be the hardest habit for us to break. The difficulty is that there are many conventions that satisfy the "easy to understand" objective. Good names reduce the need for documentation. Poor names promote confusion, ambiguity, and mistakes. Poor names can occur because code has been copied from a different situation and inserted into our system without proper integration (i.e., changing the names to be consistent with the new situation). They can also occur in the cluttered mind of a second-rate programmer, who hurries to deliver software before it is finished.

Names should have meaning. If we observe a name away from the place where it is defined, the meaning of the object should be obvious. The object TxFifo is clearly the transmit first in first out circular queue. The function LCD_OutString will output a string to the LCD.

Avoid ambiguities. Don't use variable names in our system that are vague or have more than one meaning. For example, it is vague to use temp, because there are many possibilities for temporary data; in fact, it might even mean temperature. Don't use two names that look similar, but have different meanings.

Give hints about the type. We can further clarify the meaning of a variable by including phrases in the variable name that specify its type. For example, dataPt, timePt, and putPt are pointers. Similarly, voltageBuf, timeBuf, and pressureBuf are data buffers. Other good phrases include Flag, Mode, U, L, Index, and Cnt, which refer to Boolean flag, system state, unsigned 16-bit, signed 32-bit, index into an array, and a counter, respectively.

Use the same name to refer to the same type of object. For example, everywhere we need a local variable to store an ASCII character we could use the name letter. Another common example is to use the names i, j, k for indices into arrays. The names V1 and R1 might refer to a voltage and a resistance. The exact correspondence is not part of the policies presented in this section, just the fact that a correspondence should exist. Once another programmer learns which names we use for which types of object, understanding our code becomes easier.

Use a prefix to identify public objects. A public variable is shared between two modules. A public function is a function in one module that can be called from another module. An underline character will separate the module name from the function name. Public objects have the underline and private objects do not. As an exception to this rule, we can use the underline to delimit words in an all upper-case name (e.g., MIN_PRESSURE equ 10). Functions that can be accessed outside the scope of a module (i.e., public) will begin with a prefix specifying the module to which they belong. It is poor style to create public variables, but if they need to exist, they too would begin with the module prefix. The prefix matches the module name containing the object. For example, if we see a function call, BL LCD_OutString, we know the public function belongs to the LCD module. Notice the similarity between this syntax (e.g., LCD_Init) and the corresponding syntax we would use if programming the module in C++ (e.g., LCD.Init()). Using this convention, we can distinguish public and private objects.


Use upper and lower case to specify the allocation of an object. We will define I/O port addresses and other constants using no lower-case letters, like typing with caps-lock on. In other words, names without lower-case letters refer to objects with fixed values. TRUE, FALSE, and NULL are good examples of fixed-valued objects. As mentioned earlier, constant names formed from multiple words will use an underline character to delimit the individual words, e.g., MAX_VOLTAGE, UPPER_BOUND, FIFO_SIZE. Permanently allocated variables are global, with a name beginning with a capital letter, but including some lower-case letters. A static variable has permanent allocation, but with scope restricted to the file or to the function. Temporarily allocated variables are called local, and the name will begin with a lower-case letter, and may or may not include upper-case letters. Since all functions are permanently allocated, we can start function names with either an upper-case or lower-case letter. Using this convention, we can distinguish constants, globals and locals.

Observation: An object's properties (public/private, local/global, constant/variable) are always perfectly clear at the place where the object is defined. The importance of the naming policy is to extend that clarity also to the places where the object is used.

Use capitalization to delimit words. Using capitalization is called camel case, because it looks like it has bumps. Names that contain multiple words should be defined using a capital letter to signify the first letter of the word. Recall that the case of the first letter specifies whether it is a local or global variable. Some programmers use the underline as a word-delimiter, but except for constants, we will reserve underline to separate the module name from the name of a public object. Table 3.5.1 overviews the naming convention presented in this section.

Object type              Examples of names that identify type
Constants                CR  SAFE_TO_RUN  PORTA  STACK_SIZE  START_OF_RAM
Local variables          maxTemperature  lastCharTyped  errorCnt
Static variable          MaxTemperature  LastCharTyped  ErrorCnt  RxFifoPt
Public global variable   DAC_MaxTemperature  Key_LastCharTyped  Network_ErrCnt
Private function         ClearTime  wrapPointer  InChar
Public function          Timer_ClearTime  RxFifo_Put  Key_InChar

Table 3.5.1. Examples of names.

Checkpoint 3.5.2: By looking at its name, can you tell if a function is private or public?

Checkpoint 3.5.3: By looking at its name, can you tell if a variable is local or global?
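To see the naming rules working together, here is a minimal sketch of a small FIFO module written in C. The module name Fifo and all of its function and variable names are hypothetical, chosen only to illustrate the conventions of Table 3.5.1; this is not code from an actual project.

```c
#include <stdint.h>

#define FIFO_SIZE 8             // constant: no lower-case letters

static uint8_t Buf[FIFO_SIZE];  // static: permanent allocation, private to this file
static uint32_t PutIndex;       // Index hints at an index into an array
static uint32_t GetIndex;
static uint32_t Cnt;            // Cnt hints at a counter

// private function: no module prefix, no underline
static uint32_t wrapIndex(uint32_t index){
  return (index + 1) % FIFO_SIZE;
}

// public functions: module prefix, underline, then the name
void Fifo_Init(void){
  PutIndex = 0;
  GetIndex = 0;
  Cnt = 0;
}

// returns 1 if the data were stored, 0 if the FIFO was full
int Fifo_Put(uint8_t letter){   // parameter begins with a lower-case letter
  int success = 0;              // local begins with a lower-case letter
  if(Cnt < FIFO_SIZE){
    Buf[PutIndex] = letter;
    PutIndex = wrapIndex(PutIndex);
    Cnt++;
    success = 1;
  }
  return success;               // single exit point at the bottom
}

// returns 1 if data were removed, 0 if the FIFO was empty
int Fifo_Get(uint8_t *letterPt){ // Pt hints at a pointer
  int success = 0;
  if(Cnt > 0){
    *letterPt = Buf[GetIndex];
    GetIndex = wrapIndex(GetIndex);
    Cnt--;
    success = 1;
  }
  return success;
}
```

A reader who sees Fifo_Put at a call site can tell it is a public function of the Fifo module, while wrapIndex can only be a private helper.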

The Single Entry Point is at the Top. In assembly language, we place a single entry point of a subroutine at the first line of the code. By default, C functions have a single entry point. Placing the entry point at the top provides a visual marker for the beginning of the subroutine.

The Single Exit Point is at the Bottom. Most programmers prefer to use a single exit point as the last line of the subroutine. Some programmers employ multiple exit points for efficiency reasons. In general, we must guarantee the registers, stack, and return parameters are at a similar and consistent state for each exit point. In particular, we must deallocate local variables properly. If you do employ multiple exit points, then you should develop a means to visually delineate where one subroutine ends and the next one starts. You could use one line of comments to signify the start of a subroutine and a different line of comments to show the end of it. Program 3.5.1 employs distinct visual markers to see the beginning and end of the subroutine.

//------------Abs------------
// Take the absolute value of a number.
// Input: R0 is 32-bit signed number
// Output: R0 is 31-bit absolute value
Abs:   CMP  R0, #0     // is number (R0) >= 0?
       BPL  AbsOK      // if so, already positive
       RSBS R0,R0,#0   // negate
AbsOK: BX   LR         // return
//------------end of Abs------------

//************Abs************
// Input: signed 32-bit
// Output: absolute value
uint32_t Abs(int32_t n){
  if(n ...

  ... COUNTERREGS.CTR;
  z = udiv(0xFFFF,1);
  end = TIMG8->COUNTERREGS.CTR;
  elapsed = start-end-13;
  while(1){
  }

Program 3.7.2. Using Timer G8 to measure elapsed time.

main: MOVS R0,#0
      BL   Clock_Init80MHz
      BL   SysTick_Init
      LDR  R4,=SysTick_VAL
      LDR  R5,[R4]       // start
      LDR  R0,=0xFFFF
      MOVS R1,#1
      BL   udiv32_16
      LDR  R6,[R4]       // R6=end
      SUBS R7,R5,R6      // start-end
      LDR  R2,=0xFFFFFF
      ANDS R7,R7,R2      // R7=elapsed
loop: B    loop

uint32_t udiv(uint32_t x, uint32_t y){
  return x/y;
}
int main(void){ uint32_t z;
  uint32_t elapsed,start,end;
  Clock_Init80MHz(0);
  SysTick->LOAD = 0xFFFFFF;
  SysTick->VAL = 0;
  SysTick->CTRL = 5;
  start = SysTick->VAL; // 12.5ns
  z = udiv(0xFFFF,1);
  end = SysTick->VAL;
  elapsed = (start - end)&0xFFFFFF;
  while(1){
  }
}

Program 3.7.3. Using SysTick to measure elapsed time.
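The masked subtraction in Program 3.7.3 gives the right answer even if the counter rolls over once between the two readings. The following is a minimal sketch of that arithmetic as two host-testable helper functions. The names Elapsed24 and TicksToNs are hypothetical, and the conversion assumes the 80 MHz clock (12.5 ns per tick) used in the program above.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu   // SysTick is a 24-bit down counter

// Elapsed ticks between two readings of a 24-bit down counter.
// The mask discards the borrow, so one rollover (0 -> 0xFFFFFF)
// between start and end is handled correctly.
uint32_t Elapsed24(uint32_t start, uint32_t end){
  return (start - end) & SYSTICK_MASK;
}

// Convert ticks to nanoseconds, assuming 12.5 ns per tick (80 MHz).
// The largest 24-bit count times 25 still fits in 32 bits.
uint32_t TicksToNs(uint32_t ticks){
  return (ticks * 25u) / 2u;
}
```

The result is only meaningful if the measured event is shorter than one full period of the counter, which is 2^24 ticks, or about 209 ms at 80 MHz.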

3.7.3. Using a Logic Analyzer to Measure Elapsed Time

Another method to observe time-dependent behavior of our software involves an output port and a logic analyzer or oscilloscope, see Program 3.7.4. Assume a logic analyzer or oscilloscope is attached to Port B bit 1. The two subroutines in Program 2.3.2 or the macros in Program 2.5.3 could be used to set and clear PB1. Next, you add calls to LED_On and LED_Off at strategic places within the system. Port B must be initialized so that bit 1 is an output (LED_Init) before the debugging begins. We will set the pin high before the call to the function and set the pin low after the function call. In this way a pulse is created on the digital output with duration equal to the execution time of the function. By placing the function call in a loop, the scope can be triggered. Figure 3.7.1 shows the execution time is about 48 µs.

uint32_t sqrt2(uint32_t s){
int n;       // loop counter
uint32_t t;  // t*t will become s
  t = s/16+1;           // initial guess
  for(n = 16; n; --n){  // will finish
    t = ((t*t+s)/t)/2;
  }
  return t;
}

int main(void){ uint32_t tt;
  Clock_Init80MHz(0);
  LED_Init();
  while(1){
    LED_On();
    tt = sqrt2(230400);
    LED_Off();
  }
}

Program 3.7.4. Using a logic analyzer to measure elapsed time.

Figure 3.7.1. Measurement of execution time using a port pin and a logic analyzer.

Performance debugging includes both data and time. In Program 3.7.5, Timer G8 is initialized with a resolution of 10us and a maximum of 655ms. Notice Save4 is called after each output to Port B. Both the value of Port B and the time are recorded.

Save4: PUSH {R0-R4,LR}
       LDR  R0,=Cnt      // R0 = &Cnt
       LDR  R1,[R0]      // R1 = Cnt
       CMP  R1,#SIZE
       BHS  done4        // full?
       LDR  R3,=GPIOB_DOUT31_0
       LDR  R3,[R3]      // Port out
       LDR  R2,=DBuf
       STRB R3,[R2,R1]   // dump
       LDR  R3,=TIMG8_CTR
       LDR  R3,[R3]      // Time
       LDR  R2,=TBuf
       LSLS R4,R1,#1     // 16bit
       STRH R3,[R2,R4]   // dump
       ADDS R1,#1
       STR  R1,[R0]      // save Cnt
done4: POP  {R0-R4,PC}
StepperToggle2:
       PUSH {R4,LR}      // BL below overwrites LR
       LDR  R1,=GPIOB_DOUTTGL31_0
       STR  R0,[R1]
       BL   Save4
       POP  {R4,PC}
main:  MOVS R0,#0
       BL   Clock_Init80MHz
       MOVS R0,#5
       MOVS R1,#160
       BL   TimerG8_Init // 10us
       BL   SysTick_Init
       BL   Stepper_Init
loop:  MOVS R0,#0x0C
       BL   StepperToggle2
       LDR  R0,=400000
       BL   SysTick_Wait
       MOVS R0,#0x03
       BL   StepperToggle2
       LDR  R0,=400000
       BL   SysTick_Wait
       B    loop

#define SIZE 32
uint8_t DBuf[SIZE];
uint16_t TBuf[SIZE];
uint32_t Cnt;
void Save4(void){
  if(Cnt < SIZE){
    DBuf[Cnt] = GPIOB->DOUT31_0;
    TBuf[Cnt] = TIMG8->COUNTERREGS.CTR;
    Cnt++;
  }
}
void Stepper_Toggle2(uint32_t n){
  GPIOB->DOUTTGL31_0 = n;
  Save4();
}
int main(void){
  Clock_Init80MHz(0);
  TimerG8_Init(5,160); // 10us
  SysTick_Init();
  Stepper_Init();
  while(1){
    Stepper_Toggle2(0x0C);
    SysTick_Wait(400000);
    Stepper_Toggle2(0x03);
    SysTick_Wait(400000);
  }
}

Program 3.7.5. Performance debugging records both data and time.

The values in DBuf follow the expected values: 5,6,10,9,5,6,10,9... (see Figure 3.7.2). Notice the TBuf values decrease by about 500 at each output (e.g., 65036-64536 = 500). The differences between adjacent TBuf entries are 500, meaning 500*10us = 5ms, as expected. Save4 is minimally intrusive because it only takes 40 bus cycles to execute, while it is called every 400000 cycles.



Figure 3.7.2. Performance debugging using a hardware timer and a debugging dump.

To simplify the analysis we could have recorded the time difference, see Program 3.7.6. The values in DBuf again follow the expected values: 5,6,10,9,5,6,10,9... However, it is easy to see from the TBuf values that the expected time differences are 5ms (500 counts at 10us per count). Since we are measuring time differences, the first entry has no information.

#define SIZE 32
uint8_t DBuf[SIZE];
uint16_t TBuf[SIZE];
uint32_t Cnt=0;
uint16_t Last=0;
void Save5(void){
  if(Cnt < SIZE){ uint16_t now;
    DBuf[Cnt] = GPIOB->DOUT31_0;
    now = TIMG8->COUNTERREGS.CTR;
    TBuf[Cnt] = Last-now;
    Last = now;
    Cnt++;
  }
}
void Stepper_Toggle3(uint32_t n){
  GPIOB->DOUTTGL31_0 = n;
  Save5();
}
int main(void){
  Clock_Init80MHz(0);
  TimerG8_Init(5,160); // 10us
  SysTick_Init();
  Stepper_Init();
  while(1){
    Stepper_Toggle3(0x0C);
    SysTick_Wait(400000);
    Stepper_Toggle3(0x03);
    SysTick_Wait(400000);
  }
}

Program 3.7.6. The performance debugging records data and time difference.
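If the raw timestamps have already been captured, the same differences can be computed afterward. The sketch below post-processes a dump of raw 16-bit timestamps into time differences in microseconds. The function name DumpToDiffUs is hypothetical, and the code assumes a 16-bit down counter that wraps from 0 to 65535 with 10 us per count.

```c
#include <stdint.h>

#define TICK_US 10u   // assumed timer resolution, 10 us per count

// Convert n raw timestamps from a 16-bit down counter into time
// differences in microseconds. diffUs[0] is set to 0 because the
// first entry has no previous sample.
void DumpToDiffUs(const uint16_t tBuf[], uint32_t diffUs[], uint32_t n){
  uint32_t i;
  if(n > 0){
    diffUs[0] = 0;
  }
  for(i = 1; i < n; i++){
    // down counter: earlier minus later, reduced modulo 65536 by the
    // cast, so a single wrap between samples is handled correctly
    uint16_t ticks = (uint16_t)(tBuf[i-1] - tBuf[i]);
    diffUs[i] = (uint32_t)ticks * TICK_US;
  }
}
```

The cast to uint16_t is the same modulo arithmetic that makes Last-now work in Program 3.7.6, as long as consecutive samples are less than one full timer period apart.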

4. Finite State Machines

4.1. Structures

A structure is a mechanism to organize data with different types and/or precisions. In C, we use struct to define a structure. The const modifier causes the structure to be allocated in ROM. Without the const, the C compiler will place the structure in RAM, allowing it to be dynamically changed. If the structure were to contain an ASCII string of variable length, then we must allocate space to handle its maximum size. In this first example, the structure will be allocated in RAM so no const is included. The following code defines a structure with three elements. We give separate names and types to each element. The typedef command creates a new data type based on the structure, but no memory is allocated.

struct player{
  uint8_t Xpos;        // first element
  uint8_t Ypos;        // second element
  uint16_t LifePoints; // third element
};

typedef struct player player_t;

We can allocate a variable called Sprite of this type, which will occupy four bytes in RAM:

player_t Sprite;

We can access the individual elements of this variable using the syntax name.element. After these three lines are executed we have the data structure as shown in Figure 4.1.1, assuming the variable occupies the four bytes starting at 0x2020.0250.

Sprite.Xpos = 10;
Sprite.Ypos = 20;
Sprite.LifePoints = 12000;

0x2020.0250   10
0x2020.0251   20
0x2020.0252   12000

Figure 4.1.1. A structure collects elements of different sizes and/or types into one object. Addresses illustrate the Sprite structure is stored in RAM as (1-byte, 1-byte, plus 2-bytes).

We can also define pointers to structures. We define the pointer in a similar way as other pointers:

player_t *Ptr;

Before we can use a pointer, we must make sure it points to something:

Ptr = &Sprite;

We access the individual fields using the syntax pointer->element:

Ptr->Xpos = 10;
Ptr->Ypos = 20;
Ptr->LifePoints = 12000;

We can create something similar to structures in assembly by using .equ definitions, see Program 4.1.1. Since structures have multiple elements, we will employ call by reference when passing parameters. This function takes a player, moves it to location 50,50 and adds one life point. We call the function by passing an address to a player_t variable. For example, we execute MoveCenter(&Sprite);

// Input: R0 points to a player_t
.equ Xpos, 0
.equ Ypos, 1
.equ LifePoints, 2
MoveCenter:
      MOVS R1,#50
      STRB R1,[R0,#Xpos]
      STRB R1,[R0,#Ypos]
      LDRH R1,[R0,#LifePoints]
      LDR  R2,=65535
      CMP  R1,R2        // at max?
      BHS  skip
      ADDS R1,#1        // more life
      STRH R1,[R0,#LifePoints]
skip: BX   LR

// move to center and add life
void MoveCenter(player_t *pt){
  pt->Xpos = 50;
  pt->Ypos = 50;
  if(pt->LifePoints < 65535){
    pt->LifePoints++;
  }
}

Program 4.1.1. A function that accesses a structure.

Observation: Most C compilers will align 16-bit elements within structures to an even address and will align 32-bit elements to a word-aligned address.

Observation: Call by reference allows the single parameter to be both an input parameter and an output parameter.
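The .equ offsets in the assembly version must agree with the layout the C compiler chooses for the structure. As a minimal sketch, we can ask the compiler to confirm this at compile time using offsetof and _Static_assert (C11). The structure and MoveCenter are repeated so the example stands alone; the checks are an illustration added here and assume a typical compiler that inserts no padding before a 16-bit element at an even offset.

```c
#include <stddef.h>
#include <stdint.h>

struct player{
  uint8_t Xpos;        // first element
  uint8_t Ypos;        // second element
  uint16_t LifePoints; // third element
};
typedef struct player player_t;

// Compile-time checks that the layout matches the .equ offsets 0, 1, 2
// and that the whole structure occupies four bytes.
_Static_assert(offsetof(player_t, Xpos) == 0, "Xpos offset");
_Static_assert(offsetof(player_t, Ypos) == 1, "Ypos offset");
_Static_assert(offsetof(player_t, LifePoints) == 2, "LifePoints offset");
_Static_assert(sizeof(player_t) == 4, "player_t size");

// move to center and add life, saturating at 65535
void MoveCenter(player_t *pt){
  pt->Xpos = 50;
  pt->Ypos = 50;
  if(pt->LifePoints < 65535){
    pt->LifePoints++;
  }
}
```

If a later change to the structure moved an element, the build would fail at the assertion instead of the assembly code silently accessing the wrong bytes.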

Without the const, the C compiler will place the structure in RAM, allowing it to be dynamically changed. If the structure resides in RAM, then the system will have to initialize the data structure explicitly by executing software. If the structure is in ROM, we must initialize it at compile time. The next section shows examples of ROM-based structures.
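The two kinds of initialization can be sketched side by side. In this minimal example the names StartPosition and Sprite_Init are hypothetical. Whether a const object is actually placed in ROM depends on the compiler and linker settings, so treat the placement comments as the usual embedded behavior rather than a guarantee.

```c
#include <stdint.h>

struct player{
  uint8_t Xpos;
  uint8_t Ypos;
  uint16_t LifePoints;
};
typedef struct player player_t;

// const: typically placed in ROM, so it must be initialized at compile time
const player_t StartPosition = {50, 50, 12000};

// no const: placed in RAM, so software initializes it at run time
player_t Sprite;

void Sprite_Init(void){
  Sprite = StartPosition;   // structure assignment copies all four bytes
}
```

After Sprite_Init runs, Sprite can be changed freely while StartPosition remains fixed.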

4.2. Finite State Machines with Linked Structures

4.2.1. Abstraction

Software abstraction allows us to define a complex problem with a set of basic abstract principles. If we can construct our software system using these abstract building blocks, then we have a better understanding of both the problem and its solution. This is because we can separate what we are doing (policies) from the details of how we are getting it done (mechanisms). This separation also makes it easier to optimize. Abstraction provides for a proof of correct function and simplifies both extensions and customization. The abstraction presented in this section is the Finite State Machine (FSM). The abstract principles of FSM development are the inputs, outputs, states, and state transitions. The FSM state graph defines the time-dependent relationship between its inputs and outputs. If we can take a complex problem and map it into a FSM model, then we can solve it with simple FSM software tools. Our FSM software implementation will be easy to understand, debug, and modify. Other examples of software abstraction include Proportional Integral Derivative digital controllers, fuzzy logic digital controllers, neural networks, and linear systems of differential equations. In each case, the problem is mapped into a well-defined model with a set of abstract yet powerful rules. Then, the software solution is a matter of implementing the rules of the model. In our case, once we prove our software correctly solves one FSM, then we can make changes to the state graph and be confident that our software solution correctly implements the new FSM.

The FSM controller employs a well-defined model or framework with which we solve our problem. The state graph will be specified using either a linked or table data structure. An important aspect of this method is to create a 1-1 mapping from the state graph into the data structure. The three advantages of this abstraction are 1) it can be faster to develop because many of the building blocks preexist; 2) it is easier to debug (prove correct) because it separates conceptual issues from implementation; and 3) it is easier to change. In a Moore FSM the output value depends only on the current state, and the inputs affect the state transitions. On the other hand, the outputs of a Mealy FSM depend both on the current state and the inputs. See Figure 4.2.1.
When designing a FSM, we begin by defining what constitutes a state. In general, a state embodies what we believe to be true. Being in a state, therefore, specifies what we know. The state represents storage of information, because which state we are in is a function of the current input and all previous inputs. Mathematically, the FSM represents an inefficient mechanism for data storage. If you wished to store an 8-bit number, it would require 256 states. Conversely, if an FSM has 50 states, there are 50 distinct possibilities, only one of which we believe to be true at any given time. Even with this seemingly inefficient property, FSMs are a convenient and natural mechanism to solve problems with inputs and outputs, because we will be able to separate what it does (high level) from how it works (low level). In a simple system like a single intersection traffic light, a state might be defined as the pattern of lights (i.e., which lights are on and which are off) and what we believe the traffic is at this time. For example, we think there are cars on the north road, so we activate green on the north road. In a more sophisticated traffic controller, what it means to be in a state might also include predictions about future traffic volume at this and other adjacent intersections. The next step is to make a list of the various states in which the system might exist. As in all designs, we add outputs so the system can affect the external environment, and inputs so the system can collect information about its environment or receive commands as needed. The execution of a Moore FSM repeats this sequence over and over
  1. Perform output, which depends on the current state
  2. Wait a prescribed amount of time (optional)
  3. Input
  4. Go to next state, which depends on the input and the current state

The execution of a Mealy FSM repeats this sequence over and over
  1. Wait a prescribed amount of time (optional)
  2. Input
  3. Perform output, which depends on the input and the current state
  4. Go to next state, which depends on the input and the current state

There are other possible execution sequences. Therefore, it is important to document the sequence before the state graph is drawn. The high-level behavior of the system is defined by the state graph. The states are drawn as circles. Descriptive state names help explain what the machine is doing. Arrows are drawn from one state to another, and labeled with the input value causing that state transition.

Figure 4.2.1. The output in a Moore depends just on the state. In a Mealy the output depends on state and input.

Observation: If the machine is such that a specific output value is necessary "to be a state", then a Moore implementation will be more appropriate.

Observation: If the machine is such that no specific output value is necessary "to be a state", but rather the output is required "to transition the machine from one state to the next", then a Mealy implementation will be more appropriate.
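The difference between the two execution sequences can be sketched in C with a pair of small table-driven engines. Everything here is hypothetical and for illustration only: the type names moore_t and mealy_t, the step functions, and the rising-edge detector used as the example machine. The wait step and the real input/output are left out so the logic can be run on any computer.

```c
#include <stdint.h>

// Moore node: the output belongs to the state
typedef struct {
  uint8_t Out;      // output while in this state
  uint8_t Next[2];  // next state index for input 0 or 1
} moore_t;

// Mealy node: the output belongs to the transition
typedef struct {
  uint8_t Out[2];   // output for input 0 or 1
  uint8_t Next[2];  // next state index for input 0 or 1
} mealy_t;

// One pass of the Moore sequence: output from the state, then transition
uint8_t Moore_Step(const moore_t fsm[], uint8_t *statePt, uint8_t input){
  uint8_t out = fsm[*statePt].Out;
  *statePt = fsm[*statePt].Next[input & 1];
  return out;
}

// One pass of the Mealy sequence: output from state and input, then transition
uint8_t Mealy_Step(const mealy_t fsm[], uint8_t *statePt, uint8_t input){
  uint8_t out = fsm[*statePt].Out[input & 1];
  *statePt = fsm[*statePt].Next[input & 1];
  return out;
}

// Rising-edge detector as a Mealy machine: two states (0=Low, 1=High)
const mealy_t EdgeMealy[2] = {
  {{0,1},{0,1}},   // Low:  output 1 only on the 0-to-1 transition
  {{0,0},{0,1}}    // High: output 0
};

// The same detector as a Moore machine needs a third state
// (0=Low, 1=Edge, 2=High) because the output 1 must "be a state"
const moore_t EdgeMoore[3] = {
  {0,{0,1}},       // Low
  {1,{0,2}},       // Edge
  {0,{0,2}}        // High
};
```

Notice that in the Moore version the pulse appears one step after the input rises, when the machine is in the Edge state, while the Mealy version produces it during the transition itself.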

A linked structure consists of multiple identically-structured nodes. Each node of the linked structure defines one state. One or more of the entries in the node is a pointer (or link) to other nodes. In an embedded system, we usually use statically-allocated fixed-size linked structures, which are defined at compile time and exist throughout the life of the software. In a simple embedded system the state graph is fixed, so we can store the linked data structure in nonvolatile memory. For complex systems where the control functions change dynamically (e.g., the state graph itself varies over time), we could implement dynamically-allocated linked structures, which are constructed at run time and whose number of nodes can grow and shrink in time. We can implement next arrows as pointers or indices. An important factor when implementing FSMs is that there should be a clear and one-to-one mapping between the FSM state graph and the data structure. I.e., there should be one element of the structure for each state. If each state has four arrows, then each node of the linked structure should have four links.

4.2.2. Moore Finite State Machines

The outputs of a Moore FSM are only a function of the current state. In contrast, the outputs are a function of both the input and the current state in a Mealy FSM. Often, in a Moore FSM, the specific output pattern defines what it means to be in the current state.

Example 4.2.1. Design a traffic light controller for the intersection of two equally busy one-way streets. The goal is to maximize traffic flow, minimize waiting at a red light, and avoid accidents.

Solution: The intersection has two one-way roads with the same amount of traffic: North and East, as shown in Figure 4.2.2. Controlling traffic is a good example because we all know what is supposed to happen at the intersection of two busy one-way streets. We begin the design by defining what constitutes a state. In this system, a state describes which road has authority to cross the intersection. The basic idea, of course, is to prevent southbound cars from entering the intersection at the same time as westbound cars. In this system, the light pattern defines which road has right of way over the other. Since an output pattern to the lights is necessary to remain in a state, we will solve this system with a Moore FSM. It will have two inputs (car sensors on North and East roads) and six outputs (one for each light in the traffic signal). The six traffic lights are interfaced to Port B bits 5-0, and the two sensors are connected to Port B bits 7-6,
  PB7=0, PB6=0 means no cars exist on either road
  PB7=0, PB6=1 means there are cars on the East road
  PB7=1, PB6=0 means there are cars on the North road
  PB7=1, PB6=1 means there are cars on both roads

The next step in designing the FSM is to create some states. Again, the Moore implementation was chosen because which lights are on defines which state we are in. Each state has a name:
  goN,   PB5-0 = 100001 makes it green on North and red on East
  waitN, PB5-0 = 100010 makes it yellow on North and red on East
  goE,   PB5-0 = 001100 makes it red on North and green on East
  waitE, PB5-0 = 010100 makes it red on North and yellow on East

Figure 4.2.2. Traffic light interface with two sensors and 6 lights.

The output pattern for each state is drawn inside the state circle. The time to wait for each state is also included. How the machine operates will be dictated by the input-dependent state transitions. We create decision rules defining what to do for each possible input and for each state. For this design we can list heuristics describing how the traffic light is to operate:

• If no cars are coming, stay in a green state, but which one doesn't matter.
• To change from green to red, implement a yellow light of exactly 5 seconds.
• Green lights will last at least 30 seconds.
• If cars are only coming in one direction, move to and stay green in that direction.
• If cars are coming in both directions, cycle through all four states.

Before we draw the state graph, we need to decide on the sequence of operations.

1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
   a) Output to traffic lights, which depends on the state
   b) Delay, which depends on the state
   c) Input from sensors
   d) Change states, which depends on the state and the input

We implement the heuristics by defining the state transitions, as illustrated in Figure 4.2.3. Instead of using a graph to define the finite state machine, we could have used a state transition table (STT), as shown in Table 4.2.1.

Figure 4.2.3. STG, a graphical form of a Moore FSM that implements a traffic light.

  State \ Input        00      01      10      11
  goN   (100001,30)    goN     waitN   goN     waitN
  waitN (100010,5)     goE     goE     goE     goE
  goE   (001100,30)    goE     goE     waitE   waitE
  waitE (010100,5)     goN     goN     goN     goN

Table 4.2.1. STT, a tabular form of a Moore FSM that implements a traffic light.
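Before committing the table to the microcontroller, we can check it against the heuristics on a host computer. The following is a minimal sketch, not part of the lab software: the names Row_t, STT, RunFSM, and UnsafeOutputExists are hypothetical, and the assignment of PB1-0 to the North green/yellow lights and PB4-3 to the East yellow/green lights is inferred from the four output patterns listed above.

```c
#include <stdint.h>

// State indices in the same order as the rows of Table 4.2.1.
enum { goN, waitN, goE, waitE };

// One row of the state transition table (hypothetical host-side type).
typedef struct {
  uint32_t out;      // PB5-0 light pattern
  uint32_t time;     // wait time in 10 ms units
  uint8_t  next[4];  // next state for input 00, 01, 10, 11
} Row_t;

static const Row_t STT[4] = {
  {0x21, 3000, {goN, waitN, goN,   waitN}},  // goN
  {0x22,  500, {goE, goE,   goE,   goE  }},  // waitN
  {0x0C, 3000, {goE, goE,   waitE, waitE}},  // goE
  {0x14,  500, {goN, goN,   goN,   goN  }}   // waitE
};

// Apply a constant sensor input for 'steps' transitions and
// return the state that is reached.
uint32_t RunFSM(uint32_t start, uint32_t input, uint32_t steps) {
  uint32_t s = start;
  for (uint32_t i = 0; i < steps; i++) {
    s = STT[s].next[input & 3];
  }
  return s;
}

// Returns 1 if any state lets both roads proceed (green or yellow) at once.
// Bit positions are inferred from the patterns in the text (assumption).
int UnsafeOutputExists(void) {
  for (uint32_t s = 0; s < 4; s++) {
    uint32_t northGo = STT[s].out & 0x03;  // PB1 yellow, PB0 green (North)
    uint32_t eastGo  = STT[s].out & 0x18;  // PB4 yellow, PB3 green (East)
    if (northGo && eastGo) return 1;
  }
  return 0;
}
```

For example, RunFSM(goN, 1, 2) follows goN, waitN, goE when cars are waiting only on the East road, and RunFSM(goN, 3, 4) walks through all four states and returns to goN.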

The next step is to map the FSM graph onto a data structure that can be stored in ROM. Program 4.2.1 uses a linked data structure, where each state is a node, and state transitions are defined as pointers to other nodes. The four Next parameters define the input-dependent state transitions. The wait times are defined in the software as fixed-point decimal numbers with units of 0.01 seconds, giving a range of 10 ms to about 10 minutes. Using good labels makes the program easier to understand; in other words, goN is more descriptive than &FSM[0]. The main program begins by initializing the PLL, SysTick, and Port B. The initial state is defined as goN. The main loop of our controller first outputs the desired light pattern to the six LEDs, waits for the specified amount of time, reads the sensor inputs from Port B, and then switches to the next state depending on the input data. The timer functions were presented earlier as Program 4.1.1. The function SysTick_Wait10ms will wait 10 ms times the parameter in Register R0. This implementation is friendly, because it performs a read-modify-write access to Port B. The initialization code that sets PB7-PB6 as inputs and PB5-0 as outputs can be found in the starter projects. Program 4.2.2 shows a version that uses indexed access instead of pointers. Notice for both versions there is a 1-1 relation between the STT (Table 4.2.1) and the software.

  // Linked data structure
         .equ OUT,0    //offset for output
         .equ WAIT,4   //offset for time
         .equ NEXT,8   //offset for next
  goN:   .word 0x21
         .word 3000    //30 sec
         .word goN,waitN,goN,waitN
  waitN: .word 0x22
         .word 500     //5 sec
         .word goE,goE,goE,goE
  goE:   .word 0x0C
         .word 3000    //30 sec
         .word goE,goE,waitE,waitE
  waitE: .word 0x14
         .word 500     //5 sec
         .word goN,goN,goN,goN
  main:  MOVS R0,#0
         BL   Clock_Init80MHz
         BL   SysTick_Init
         BL   Traffic_Init
         LDR  R4,=goN
         LDR  R5,=GPIOB_DOUT31_0
         LDR  R6,=GPIOB_DIN31_0
  loop:  LDR  R0,[R4,#OUT]
         LDR  R1,[R5]      // all of Port B
         MOVS R2,#0x3F
         BICS R1,R1,R2
         ORRS R0,R0,R1
         STR  R0,[R5]      // output
         LDR  R0,[R4,#WAIT]
         BL   SysTick_Wait10ms
         LDR  R0,[R6]      // all inputs
         MOVS R1,#0xC0
         ANDS R0,R0,R1     // just PB7,6
         LSRS R0,R0,#4     // Input*4
         ADDS R0,R0,#NEXT  // 8,12,16,20
         LDR  R4,[R4,R0]   // next state
         B    loop

  // Linked data structure
  struct State {
    uint32_t Out;
    uint32_t Time;
    const struct State *Next[4];};
  typedef const struct State State_t;
  #define goN   &FSM[0]
  #define waitN &FSM[1]
  #define goE   &FSM[2]
  #define waitE &FSM[3]
  State_t FSM[4]={
    {0x21,3000,{goN,waitN,goN,waitN}},
    {0x22, 500,{goE,goE,goE,goE}},
    {0x0C,3000,{goE,goE,waitE,waitE}},
    {0x14, 500,{goN,goN,goN,goN}}};
  State_t *Pt; // state pointer
  int main(void){ uint32_t Input;
    Clock_Init80MHz(0);
    SysTick_Init();
    Traffic_Init();
    Pt = goN;
    while(1){
      GPIOB->DOUT31_0 = (GPIOB->DOUT31_0&(~0x3F))|Pt->Out;
      SysTick_Wait10ms(Pt->Time);
      Input = (GPIOB->DIN31_0&0xC0)>>6;
      Pt = Pt->Next[Input];
    }
  }

Program 4.2.1. Linked data structure implementation of the traffic light controller (TrafficLightFSM).

  // each state is 24 bytes
          .equ OUT,0    //offset for output
          .equ WAIT,4   //offset for time
          .equ NEXT,8   //offset for next
          .equ goN,0
          .equ waitN,1
          .equ goE,2
          .equ waitE,3
  FSM:
  SgoN:   .word 0x21
          .word 3000    //30 sec
          .word goN,waitN,goN,waitN
  SwaitN: .word 0x22
          .word 500     //5 sec
          .word goE,goE,goE,goE
  SgoE:   .word 0x0C
          .word 3000    //30 sec
          .word goE,goE,waitE,waitE
  SwaitE: .word 0x14
          .word 500     //5 sec
          .word goN,goN,goN,goN
  main:   MOVS R0,#0
          BL   Clock_Init80MHz
          BL   SysTick_Init
          BL   Traffic_Init
          MOVS R4,#goN      // S
          LDR  R5,=GPIOB_DOUT31_0
          LDR  R6,=GPIOB_DIN31_0
  loop:   LDR  R7,=FSM      // base address
          MOVS R3,#24
          MULS R3,R3,R4     // S*24
          ADDS R7,R7,R3     // FSM+S*24
          LDR  R0,[R7,#OUT] // FSM+S*24+OUT
          LDR  R1,[R5]      // all of Port B
          MOVS R2,#0x3F
          BICS R1,R1,R2
          ORRS R0,R0,R1
          STR  R0,[R5]      // output
          LDR  R0,[R7,#WAIT] // FSM+S*24+WAIT
          BL   SysTick_Wait10ms
          LDR  R0,[R6]      // all inputs
          MOVS R1,#0xC0
          ANDS R0,R0,R1     // just PB7,6
          LSRS R0,R0,#4     // 0,4,8,12
          ADDS R0,R0,#NEXT  // 8,12,16,20
          LDR  R4,[R7,R0]   // next state
          B    loop

  // Linked data structure
  struct State {
    uint32_t Out;
    uint32_t Time;
    uint32_t Next[4];};
  typedef const struct State State_t;
  #define goN   0
  #define waitN 1
  #define goE   2
  #define waitE 3
  State_t FSM[4]={
    {0x21,3000,{goN,waitN,goN,waitN}},
    {0x22, 500,{goE,goE,goE,goE}},
    {0x0C,3000,{goE,goE,waitE,waitE}},
    {0x14, 500,{goN,goN,goN,goN}}};
  uint32_t S; // index to current state
  int main(void){ uint32_t Input;
    Clock_Init80MHz(0);
    SysTick_Init();
    Traffic_Init();
    S = goN;
    while(1){
      GPIOB->DOUT31_0 = (GPIOB->DOUT31_0&(~0x3F))|FSM[S].Out;
      SysTick_Wait10ms(FSM[S].Time);
      Input = (GPIOB->DIN31_0&0xC0)>>6;
      S = FSM[S].Next[Input];
    }
  }

Program 4.2.2. Linked data structure using indexing (TrafficLightFSM).

In order to separate what it does from how it works, we have made a 1-to-1 correspondence between the state transition graph (STG) in Figure 4.2.3, the state transition table (STT) in Table 4.2.1, and the FSM[4] data structure in software. Notice also how this implementation separates the civil engineering policies (the data structure specifies what the machine does) from the computer engineering mechanisms (the executing software specifies how it is done). Once we have proven the executing software to be operational, we can modify the policies and be confident that the mechanisms will still work. When an accident occurs, we can blame the civil engineer that designed the state graph.


The FSM approach makes the system easy to change. To change the wait time for a state, we simply change the value in the data structure. To add more states (e.g., put a red/red state after each yellow state, which will reduce accidents caused by bad drivers running the yellow light), we simply increase the size of the FSM[] structure and define the Out, Time, and Next fields for these new states. To add more output signals (e.g., walk and left turn lights), we simply increase the precision of the Out field. To add two more input lines (e.g., wait button, left turn car sensor), we increase the size of the next field to Next[16]. Because now there are four input lines, there are 16 possible combinations, where each input possibility requires a Next value specifying where to go if this combination occurs. In this simple scheme, the size of the Next[] field will be 2 raised to the power of the number of input signals.

Checkpoint 4.2.1: Why is it good to use labels for the states? E.g., goN is better than &FSM[0].
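As an illustration of how little has to change, here is one possible sketch of the red/red extension using the indexed form of the data structure. It is not taken from the lab projects: the state names redN and redE, the all-red pattern 0x24 (both red bits set, inferred from the four patterns given earlier), and the 1-second all-red time are assumptions chosen for the example. The helper NextSize simply evaluates 2 raised to the number of inputs.

```c
#include <stdint.h>

// Six states: the original four plus an all-red state after each yellow.
enum { goN, waitN, redN, goE, waitE, redE, NUM_STATES };

typedef struct {
  uint32_t out;      // PB5-0 light pattern
  uint32_t time;     // wait time in 10 ms units
  uint8_t  next[4];  // next state for input 00, 01, 10, 11
} State_t;

static const State_t FSM[NUM_STATES] = {
  {0x21, 3000, {goN,  waitN, goN,   waitN}},  // goN
  {0x22,  500, {redN, redN,  redN,  redN }},  // waitN
  {0x24,  100, {goE,  goE,   goE,   goE  }},  // redN: all red, 1 s (assumed)
  {0x0C, 3000, {goE,  goE,   waitE, waitE}},  // goE
  {0x14,  500, {redE, redE,  redE,  redE }},  // waitE
  {0x24,  100, {goN,  goN,   goN,   goN  }}   // redE: all red, 1 s (assumed)
};

// Number of Next entries needed for a given number of binary inputs.
uint32_t NextSize(uint32_t numInputs) {
  return 1u << numInputs;
}

// One state transition; the controller loop itself is unchanged.
uint32_t Step(uint32_t s, uint32_t input) {
  return FSM[s].next[input & 3];
}
```

Only the table grew. The executing loop that outputs, waits, inputs, and changes state would be the same as in Program 4.2.2.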

Observation: In order to make the FSM respond quicker, we could implement a time delay function that returns immediately if an alarm condition occurs. If no alarm exists, it waits the specified delay.
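One way to picture such a delay function is to break the long wait into 10 ms pieces and test the alarm before each piece. The sketch below runs on a host computer with a simulated clock; SimTime, AlarmAt, AlarmActive, Wait10msOnce, and WaitUnlessAlarm are hypothetical names, and on the real board the single-step wait would presumably be a call like SysTick_Wait10ms(1).

```c
#include <stdint.h>
#include <stdbool.h>

// Simulated environment (assumption): time advances in 10 ms ticks and the
// alarm becomes active once the simulated clock reaches AlarmAt.
static uint32_t SimTime = 0;           // simulated time, 10 ms units
static uint32_t AlarmAt = 0xFFFFFFFF;  // tick at which the alarm asserts

static bool AlarmActive(void) {
  return SimTime >= AlarmAt;
}

// Stand-in for one 10 ms hardware delay.
static void Wait10msOnce(void) {
  SimTime++;
}

// Wait up to 'delay' units of 10 ms, returning early if the alarm is active.
// Returns true if the full delay elapsed, false if it was cut short.
bool WaitUnlessAlarm(uint32_t delay) {
  while (delay > 0) {
    if (AlarmActive()) {
      return false;  // alarm: return immediately
    }
    Wait10msOnce();
    delay--;
  }
  return true;
}
```

With this structure the response to an alarm is at most one 10 ms step, rather than the full 5 or 30 seconds of the state.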

Example 4.2.2. Design a system with one input and one output. The output drives an LED with a fixed 500-ms period and a variable duty cycle. The duty cycle steps through 10, 30, 50, 70, and 90% each time the input switch is touched and released.

Solution: We will connect a positive logic switch to PB0 and a positive logic LED to PB1. Figure 4.2.5 shows the expected behavior. The duty cycle changes when the switch is released. The state transition graph is shown in Figure 4.2.6. The software is Program 4.2.3.

Because there is one binary input, there are two possible next states depending on that input. Each duty cycle has three states with names starting with the letters H, L, and C. The H state has the output high and the L state has the output low. Notice if the input is low, the FSM oscillates between the H and L states, with a fixed period of 500 ms. There are 5 sets of H, L, and C states. Each set has a different duty cycle: 10%, 30%, 50%, 70%, and 90% respectively. If the input goes high, the FSM moves to the corresponding C state. The FSM will remain in the C state while the input is high. The 50 ms wait time in the C state is long enough to debounce the switch but short enough that it seems responsive to the user. When the switch is released the FSM moves to the next set with a changed duty cycle. We will use the 10 ms version of the SysTick wait to make the system more portable, and easier to change if one wishes to make the delays longer.
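The wait times for the H and L states follow from simple arithmetic: with a 500 ms period measured in 10 ms units, the two waits must add to 50, and the H wait is the duty cycle times 50. The short sketch below checks the values used in Program 4.2.3; the array and function names are hypothetical and exist only for this check.

```c
#include <stdint.h>

// Wait times in 10 ms units for the H and L state of each set,
// in the order 10%, 30%, 50%, 70%, 90%.
static const uint32_t HighWait[5] = { 5, 15, 25, 35, 45};
static const uint32_t LowWait[5]  = {45, 35, 25, 15,  5};

// Period of one H/L oscillation in milliseconds.
uint32_t PeriodMs(uint32_t set) {
  return 10 * (HighWait[set] + LowWait[set]);
}

// Duty cycle in percent, high time divided by the period.
uint32_t DutyPercent(uint32_t set) {
  return (100 * HighWait[set]) / (HighWait[set] + LowWait[set]);
}

// Going the other way: H wait (10 ms units) for a desired duty and period.
uint32_t HighWaitFor(uint32_t dutyPercent, uint32_t periodMs) {
  return (periodMs * dutyPercent) / 1000;
}
```

For instance, HighWaitFor(30, 500) gives 15, so the 30% set waits 150 ms high and 350 ms low.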

Figure 4.2.5. PB1 output is a 2 Hz variable duty cycle PWM signal. PB0 input changes the duty cycle.

Figure 4.2.6. State transition graph adjusts the duty cycle each time the switch is released.

  struct State{
    uint32_t Out;
    uint32_t Wait; // 10ms
    const struct State *Next[2];
  };
  typedef const struct State St_t;
  #define H10 &FSM[0]
  #define L10 &FSM[1]
  #define C10 &FSM[2]
  #define H30 &FSM[3]
  #define L30 &FSM[4]
  #define C30 &FSM[5]
  #define H50 &FSM[6]
  #define L50 &FSM[7]
  #define C50 &FSM[8]
  #define H70 &FSM[9]
  #define L70 &FSM[10]
  #define C70 &FSM[11]
  #define H90 &FSM[12]
  #define L90 &FSM[13]
  #define C90 &FSM[14]
  St_t FSM[15]={
    { 2, 5,{L10,C10}}, // H10
    { 0,45,{H10,C10}}, // L10
    { 0, 5,{L30,C10}}, // C10
    { 2,15,{L30,C30}}, // H30
    { 0,35,{H30,C30}}, // L30
    { 0, 5,{L50,C30}}, // C30
    { 2,25,{L50,C50}}, // H50
    { 0,25,{H50,C50}}, // L50
    { 0, 5,{L70,C50}}, // C50
    { 2,35,{L70,C70}}, // H70
    { 0,15,{H70,C70}}, // L70
    { 0, 5,{L90,C70}}, // C70
    { 2,45,{L90,C90}}, // H90
    { 0, 5,{H90,C90}}, // L90
    { 0, 5,{L10,C90}}  // C90
  };
  St_t *Pt;
  int main(void){uint32_t in;
    Clock_Init80MHz(0);
    SysTick_Init();
    Init(); // Prog 1.10.2
    Pt = H30; // initial state
    while(1){
      GPIOB->DOUT31_0 = (GPIOB->DOUT31_0&(~0x02))|Pt->Out;
      SysTick_Wait10ms(Pt->Wait);
      in = GPIOB->DIN31_0&0x01;
      Pt = Pt->Next[in]; // next
    }
  }

Program 4.2.3. Linked data structure implementation of a variable duty cycle LED (TrafficLightF…

  …
      GPIOB->DOUT31_0 = (GPIOB->DOUT31_0&(~0x0F))|Pt->Out;
      SysTick_Wait(Pt->Wait);
      in = (GPIOB->DIN31_0&0x10)>>4;
      Pt = Pt->Next[in]; // next

Program 4.2.4. Linked data structure implementation of a stepper motor controller (TrafficLightFSM).


4.3. Debugging

Debugging is an important aspect of any embedded system. One simple mechanism to debug finite state machines is to add the state name to the data structure. The controller is the same, so there is no change to the behavior of the system. However, if we add Pt->Name to the debugger, we can observe the state sequence in real time as it executes. Populating the debugger observation window on most debuggers requires time, so this technique will be minimally intrusive.

  struct State {
    char Name[8];
    uint32_t Out;
    uint32_t Time;
    const struct State *Next[4];};
  typedef const struct State State_t;
  State_t FSM[4]={
    {"goN",  0x21,3000,{goN,waitN,goN,waitN}},
    {"waitN",0x22, 500,{goE,goE,goE,goE}},
    {"goE",  0x0C,3000,{goE,goE,waitE,waitE}},
    {"waitE",0x14, 500,{goN,goN,goN,goN}}};

Another mechanism is the dump presented in Chapter 3. We could save input, output, and time.

  #define SIZE 256
  uint8_t IBuf[SIZE];
  uint8_t OBuf[SIZE];
  uint16_t TBuf[SIZE];
  uint32_t Cnt;
  void Save(void){
    if(Cnt < SIZE){
      IBuf[Cnt] = Input;
      OBuf[Cnt] = Pt->Out;
      TBuf[Cnt] = TIMG8->COUNTERREGS.CTR;
      Cnt++;
    }
  }

A logic analyzer is a nonintrusive tool for debugging FSMs, see Figure 4.3.1.

Figure 4.3.1. Logic analyzer trace of Program 4.2.4 stepper motor controller.

5. Real-time Systems

5.1. Hardware/Software Synchronization

One can think of the hardware being in one of three states. The off state is when the device is disabled or inactive. No I/O occurs in the off state. When active (not off) the hardware toggles between the busy and ready states. The interface includes a trigger flag specifying either busy (0) or ready (1) status. Hardware-software synchronization revolves around this flag:

• The hardware will set the flag when the hardware is complete,
• The software can read the flag to determine if the device is busy or ready,
• The software can clear the flag, signifying the software is complete,
• This flag serves as the hardware trigger for an interrupt.

For an input device, the trigger flag is set by hardware when new input data are available. Once the software recognizes the input device has gone from busy to ready, it will read the data and ask the input device to create more data. It is the busy to ready state transition that signals to the software that the hardware task is complete, and now software service is required.

The problem with I/O devices is that they may be slower or faster than software execution. Therefore, we need synchronization, which is the process of the hardware and software waiting for each other in a manner such that data are properly transmitted. A way to visualize this synchronization is to draw a state versus time plot of the activities of the hardware and software. For an input device, the software begins by waiting for new input. When the input device is busy it is in the process of creating new input. When the input device is ready, new data are available. When the input device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software accepts the input, it releases the hardware to create more input. The up and down arrows in Figure 5.1.1 represent the synchronizing events. In this figure, the time for the software to read and process the data is less than the time for the input device to create new input. This situation is called I/O bound, meaning the data transfer rate is limited by the speed of the I/O hardware.

Figure 5.1.1. The software must wait for the input device to be ready (I/O bound input interface).

If the input device were faster than the software, then the software waiting time would be small. This situation would then be called CPU bound, because the data transfer rate is limited by the speed of the executing software. In real systems the data transfer rate depends on both the hardware and the software. Furthermore, the data rates can vary over time, like traffic arriving and leaving an intersection. In other words, the same I/O channel can sometimes be I/O bound, but at other times the channel could be CPU bound.

For an output device, a trigger flag is set when the output is ready to accept more data. Figure 5.1.2 contains a state versus time plot of the activities of the output device hardware and software. For an output device, the software begins by generating data then writing it to the output device. When the output device is busy it is processing the data, i.e., performing the output. Normally when the software writes data to an output port, that only starts the output process. Again, the time it takes an output device to process data may be slower or faster than the software execution time. When the output device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software writes data to the output, it releases the output device hardware. Figure 5.1.2 illustrates an I/O bound interface because the time for the output device to perform the output is longer than the time for the software to generate and write it. Again, I/O bound means the data transfer rate is limited by the speed of the I/O hardware.

Figure 5.1.2. The software must wait for the output device to finish the previous operation (I/O bound).

Observation: For an I/O bound system, the software waits for the hardware longer than the hardware waits for the software.

For an input device, the interface latency is the time between when new data are available, and the time when the software reads the input data. A similar parameter is response time, which is the time between when new data are available, and the time when the software completes processing of the data. We can also define device latency as the response time of the I/O device. For example, if we request the analog to digital converter (ADC) to sample an analog input, then the device latency is the time it takes from the ADC start command to the completion of the analog to digital conversion. For an output device, the interface latency is the time from when the output device is ready to the time when the software writes new data to the output device. A real-time system is one with short and bounded interface latency.

In this book, we will also have periodic tasks. In Lab 5, the software will periodically output to the digital to analog converter (DAC) in order to produce sound. In Lab 7, we create a data acquisition system, in which the software starts the ADC at a fixed frequency, called the sampling rate. In Lab 7, we collect a sequence of digital values that approximate the continuous-time analog signal. A control system also employs periodic software processing. At a periodic frequency, a control system will sample data from its sensors, make decisions, and output commands to its actuators. For periodic tasks, we define time jitter as the maximum variation in the time between successive runs of the periodic task.

Throughput or bandwidth is the maximum data flow in bytes/second that can be processed by the system. Sometimes the bandwidth is limited by the I/O device (called I/O bound), while other times it is limited by software (called CPU bound). Bandwidth can be reported as an overall average or a short-term maximum.

Priority determines the order of service when two or more requests are made simultaneously. Priority also determines if a high-priority request should be allowed to suspend a low-priority request that is currently being processed. We may also wish to implement equal priority, so that no one device can monopolize the computer.

The tolerance of a real-time system towards failure to meet the timing requirements determines whether we classify it as hard real time, firm real time, or soft real time. If missing a timing constraint is completely unacceptable, we call it a hard real-time system. In a firm real-time system, the value of an operation completed past its timing constraint is considered zero but not considered as a complete failure. In a soft real-time system, the value of an operation diminishes the further it completes after the timing constraint.

The hardware/software interface allows the microcontroller to interact with its I/O device. There are five mechanisms to synchronize the microcontroller with the I/O device. Each mechanism synchronizes the I/O data transfer to the busy to ready transition. See Figures 5.1.3 and 5.1.4.
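Returning to the definitions of interface latency, response time, and time jitter, the following sketch shows how these numbers could be computed from recorded timestamps, such as those captured with a dump. It is only an illustration: the function names are hypothetical, and jitter is taken here as the longest interval minus the shortest interval between successive runs, which is one reasonable reading of "maximum variation".

```c
#include <stdint.h>

// Interface latency: from new data available to the software reading it.
uint32_t InterfaceLatency(uint32_t tReady, uint32_t tRead) {
  return tRead - tReady;
}

// Response time: from new data available to the end of processing.
uint32_t ResponseTime(uint32_t tReady, uint32_t tDone) {
  return tDone - tReady;
}

// Time jitter of a periodic task, given the times t[0..n-1] at which it ran.
// Computed as (largest interval) - (smallest interval); returns 0 if there
// are fewer than two intervals to compare.
uint32_t Jitter(const uint32_t t[], uint32_t n) {
  if (n < 3) {
    return 0;
  }
  uint32_t minDt = t[1] - t[0];
  uint32_t maxDt = minDt;
  for (uint32_t i = 2; i < n; i++) {
    uint32_t dt = t[i] - t[i - 1];
    if (dt < minDt) minDt = dt;
    if (dt > maxDt) maxDt = dt;
  }
  return maxDt - minDt;
}
```

For a task intended to run every 1000 time units that actually ran at 0, 1000, 2003, 2999, and 4000, the intervals are 1000, 1003, 996, and 1001, so this measure of jitter is 7.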

Figure 5.1.3. The input device sets a flag when it has new data.

Blind cycle is a method where the software simply waits a fixed amount of time and assumes the I/O will complete within that fixed delay. For an input device, the software triggers (starts) the external input hardware, waits a specified time, then reads data from the device, see the left part of Figure 5.1.3. For an output device, the software writes data to the output device, starts the device, then waits a specified time, see the left part of Figure 5.1.4. We call this method blind, because there is no status about the I/O device reported to the software. There is no trigger flag. It is appropriate to use this method in situations where the I/O delay is short and predictable. One application of blind-cycle synchronization is the GPIO initialization. We reset and apply power to a GPIO port, and wait 24 bus cycles for the port to become active. This method works because the activation time is short and predictable. The stepper motor interface in Program 4.2.4 used blind-cycle synchronization, because it waited a fixed time between outputs.
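To see why blind cycle depends on a predictable device time, consider the following host-side sketch with a simulated input device. Everything here is hypothetical: the device model, the 24-tick conversion time, and the names Device_Start, Device_Read, Delay, and BlindCycle_In are inventions for the example, not a real peripheral.

```c
#include <stdint.h>

// Simulated input device (assumption): a conversion completes CONVERT_TICKS
// after it is started. Reading earlier returns the previous (stale) result.
#define CONVERT_TICKS 24
static uint32_t Now = 0;        // simulated time in ticks
static uint32_t StartedAt = 0;  // time the current conversion started
static uint32_t Pending = 0;    // value the device is producing
static uint32_t DataReg = 0;    // last completed value

static void Device_Start(uint32_t value) {
  Pending = value;
  StartedAt = Now;
}

static void Delay(uint32_t ticks) {
  Now += ticks;
}

static uint32_t Device_Read(void) {
  if ((Now - StartedAt) >= CONVERT_TICKS) {
    DataReg = Pending;  // conversion finished, new data latched
  }
  return DataReg;
}

// Blind-cycle input: start the device, wait a fixed time, read the data.
// No status flag is consulted, so the delay must cover the device time.
// 'value' is what the simulated device will produce.
uint32_t BlindCycle_In(uint32_t value, uint32_t fixedDelay) {
  Device_Start(value);
  Delay(fixedDelay);
  return Device_Read();
}
```

If the fixed delay is at least the device time, the read returns the new value. If the delay is too short, the software has no way to know, and it silently reads stale data.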

Figure 5.1.4. The output device sets a flag when it has finished outputting the last data.

Busy wait or polling is a software loop that checks the I/O status waiting for the ready state. For an input device, the software waits until the input device has new data, and then reads the data from the input device, see the middle part of Figure 5.1.3. For an output device, the software writes data, triggers the output device, then waits until the device is finished, see the middle part of Figure 5.1.4. Another approach to output device busy wait is for the software to first wait until the output device has finished the previous output, write data, and then trigger the device. Busy-wait synchronization will be used in situations where the software system is relatively simple and real-time response is not important. The LCD interface in Lab 6, the ADC interface in Lab 7, and the UART output in Lab 8 will use busy-wait synchronization.

An interrupt uses hardware to cause special software execution. An interrupt service routine (ISR) is a software function that is triggered by the hardware. With an input device, the hardware will request an interrupt when the input device has new data. The ISR will read the data from the input device and save it in a first in first out (FIFO) queue, see the right part of Figure 5.1.3. With an output device, the hardware will request an interrupt when the output device is ready to accept more data, see the right part of Figure 5.1.4. The ISR will get data from the FIFO, and then write the data to the device. When executing periodic tasks we configure the hardware timer to request interrupts on a periodic basis. Interrupt synchronization will be used in systems with a lot of I/O devices or when a real-time response is important.

Periodic polling uses a hardware timer to periodically interrupt. At the time of the interrupt the software will check the I/O status, performing actions as needed. When the input device has new data, it sets a flag but doesn't trigger an interrupt. At the next periodic interrupt, the software will notice the flag is set, will read the data, and save the data in global RAM. Similarly, when the output device is ready to accept more data, it sets a flag but doesn't trigger an interrupt. At the next periodic interrupt, the software will get data from RAM, and write it. Periodic polling will be used in situations that require interrupts, but the I/O device does not support interrupt requests directly.

Figure 5.1.5 shows busy wait side by side with periodic polling. In busy-wait synchronization, the main program polls the I/O devices continuously. With periodic polling, the I/O devices are polled on a regular basis (established by the periodic interrupt). If no device needs service, then the interrupt simply returns. If the polling period is Δt, then on average the interface latency will be ½Δt, and the worst case latency will be Δt. Periodic polling is appropriate for low bandwidth devices where real-time response is not necessary. This method frees the main program from the I/O tasks. We use periodic polling if the following two conditions apply:

1. The I/O hardware cannot generate interrupts directly
2. We wish to perform the I/O functions in the background
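The ½Δt average and Δt worst-case latency for periodic polling can be checked numerically. The sketch below assumes the polling interrupts occur at exact multiples of the period, time is measured in integer ticks, and a flag set exactly at a polling instant is serviced with zero latency. The function names are hypothetical.

```c
#include <stdint.h>

// Latency of periodic polling: a flag set at time tFlag is noticed at the
// next polling interrupt, which occurs at multiples of 'period'.
uint32_t PollLatency(uint32_t tFlag, uint32_t period) {
  uint32_t phase = tFlag % period;  // time since the previous poll
  return (phase == 0) ? 0 : (period - phase);
}

// Average latency over every possible arrival phase within one period.
double AverageLatency(uint32_t period) {
  uint64_t sum = 0;
  for (uint32_t t = 0; t < period; t++) {
    sum += PollLatency(t, period);
  }
  return (double)sum / (double)period;
}

// Worst-case latency over every possible arrival phase.
uint32_t WorstLatency(uint32_t period) {
  uint32_t worst = 0;
  for (uint32_t t = 0; t < period; t++) {
    uint32_t lat = PollLatency(t, period);
    if (lat > worst) worst = lat;
  }
  return worst;
}
```

With a period of 1000 ticks, the average comes out just under 500 ticks and the worst case is 999 ticks, which approaches the full period as the time resolution gets finer.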

Figure 5.1.5. On the left is busy-wait, and on the right is periodic polling.

Direct memory access (DMA) is an interfacing approach that transfers data directly to/from memory. With an input device, the hardware will request a DMA transfer when the input device has new data. Without the software's knowledge or permission, the DMA controller will read data from the input device and save it in memory. With an output device, the hardware will request a DMA transfer when the output device is ready. The DMA controller will get data from memory, and then write it to the device. We can configure a hardware timer to request DMA transfers on a periodic basis. Using a timer-triggered DMA, Figure 5.1.6 shows data going directly from RAM to the DAC. DMA synchronization will be used in situations where high bandwidth and low latency are important. The hardware events that trigger DMA transfers are similar to those used to trigger interrupts. DMA is beyond the scope of this book, but it is supported by the MSPM0G3507.

Figure 5.1.6. DMA transfers data directly from RAM to the digital to analog converter.

The busy-wait method is classified as unbuffered because the hardware and software must wait for each other during the transmission of each piece of data. The interrupt solutions, shown in the right parts of Figures 5.1.3 and 5.1.4, are classified as buffered. The input device runs continuously, filling a FIFO with data as fast as it can. The FIFO decouples the action of reading input data from the action of data processing. We will implement a buffered interface for the serial port input in Lab 8 using interrupts. The buffering used in an interrupt interface may be a hardware FIFO, a software FIFO, or both hardware and software FIFOs. We will see the FIFO queues will allow the I/O interface to operate during both situations: I/O bound and CPU bound.
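A software FIFO can be as simple as a circular array with a put index and a get index. The sketch below is one generic way to write it and is not necessarily the FIFO used in Lab 8. The size of 8, the names, and the choice to leave one slot empty to distinguish full from empty are assumptions made for the example. On a real system where an ISR calls Fifo_Put and the main program calls Fifo_Get, the shared indices would need additional care.

```c
#include <stdint.h>

#define FIFO_SIZE 8  // holds up to FIFO_SIZE-1 elements in this sketch
static uint8_t  Fifo[FIFO_SIZE];
static uint32_t PutI = 0;  // index where the next element will be stored
static uint32_t GetI = 0;  // index of the oldest element

// Store one byte. Returns 1 on success, 0 if the FIFO is full.
int Fifo_Put(uint8_t data) {
  uint32_t next = (PutI + 1) % FIFO_SIZE;
  if (next == GetI) {
    return 0;  // full, data is not stored
  }
  Fifo[PutI] = data;
  PutI = next;
  return 1;
}

// Remove the oldest byte. Returns 1 on success, 0 if the FIFO is empty.
int Fifo_Get(uint8_t *data) {
  if (GetI == PutI) {
    return 0;  // empty
  }
  *data = Fifo[GetI];
  GetI = (GetI + 1) % FIFO_SIZE;
  return 1;
}

// Number of elements currently stored.
uint32_t Fifo_Count(void) {
  return (PutI + FIFO_SIZE - GetI) % FIFO_SIZE;
}
```

The producer (for example, an input ISR) only calls Fifo_Put, and the consumer (for example, the main program) only calls Fifo_Get, so data come out in the same order they went in.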

5.2. Interrupt-Triggered Multithreading

A thread is defined as the path of action as software executes. Parallel programming allows the computer to execute multiple threads at the exact same time. A computer with a multi-core processor supports parallel programming, because it can execute separate programs in its cores. Fork and join are the fundamental building blocks of parallel programming. After a fork, two or more software threads will be run in parallel, i.e., the threads will run simultaneously on separate cores. Two or more simultaneous software threads can be combined into one using a join (Figure 5.2.1). Software execution after the join will wait until all threads above the join are complete.

Figure 5.2.1. Flowchart symbols to describe parallel, distributed, and concurrent programming.

As an analogy, if I want to dig a big hole in my back yard, I will invite three friends over and give everyone a shovel. The fork operation changes the situation from me working alone to four of us digging at the same time. The four diggers do not have to be performing the exact same task, but they operate simultaneously and they cooperate towards a common goal. When the hole is done, the join operation causes the friends to go away, and I am working alone again.

We classify a system with multiple computers, each running its own software, connected via I/O or a network as a distributed system. Lab 8, which involves solving a problem with two LaunchPads, is an example of a distributed system.

Concurrent or multi-threaded programming allows the computer to execute multiple threads, but only one thread at a time. Interrupts are the mechanism to implement multi-threading. Interrupts have a hardware trigger and a software action. The ISR is a parameter-less function, triggered by a hardware event. The trigger is a hardware event signaling it is time to do something. The hardware event can either be a busy to ready transition in an external I/O device (like UART input) or an internal event (like a periodic timer). When the hardware needs service, signified by a busy to ready state transition, it will request an interrupt by setting its trigger flag. The execution of the interrupt service routine (ISR) is called a background thread. This thread is created by the hardware interrupt request and is killed when the ISR returns from interrupt. A new thread is created for each interrupt request. It is important to consider each individual request as a separate thread because local variables and registers used in the ISR are unique and separate from one interrupt event to the next interrupt.

In a multi-threaded system, we consider the threads as cooperating to perform an overall task. Consequently we will develop ways for the threads to communicate and synchronize with each other. A FIFO allows communication: one thread puts data into the FIFO, and another thread gets data from the FIFO. A semaphore is a shared software flag that is set by one thread (signal). Another thread will test the flag, waiting for it to be set, and then clear the flag (wait). Most embedded systems have a single common overall goal. On the other hand, general-purpose computers can have multiple unrelated functions to perform. A process is also defined as the action of software as it executes. Processes do not necessarily cooperate towards a common shared goal. Threads share I/O devices, resources, and global variables, while processes have separate I/O devices, resources, and global variables.

The foreground thread is defined as the execution of the main program, and the background threads are executions of the ISRs. Consider the analogy of sitting in a comfy chair reading a book. Reading a book is like executing the main program in the foreground. You start reading at the beginning of the book and basically read one page at a time in a sequential fashion.
You might jump to the back and look something up in the glossary, then jump back to where you were, which is analogous to a function call. Similarly, you might read the same page a few times, which is analogous to a program loop. Even though you skip around a little, the order of pages you read follows a logical and well-defined sequence. Conversely, if the telephone rings, you place a bookmark in the book, and answer the phone. When you are finished talking on the phone, you hang up the phone and continue reading in the book where you left off. The ringing phone is analogous to a hardware trigger and the phone conversation is like executing the ISR.

There are no standard definitions for the terms mask, enable, and arm in the professional, Computer Science, or Computer Engineering communities. Nevertheless, in this book we will adhere to the following specific meanings. To arm (disarm) a device means to enable (shut off) the source of interrupts. Each potential interrupting trigger has a separate arm bit. One arms (disarms) a trigger if one is (is not) interested in interrupts from this source. To enable (disable) means to allow interrupts at this time (postponing interrupts until a later time). On the Cortex-M there is one interrupt enable bit for the entire interrupt system. We disable interrupts if it is currently not convenient to accept interrupts. In particular, to disable interrupts we set the I bit in PRIMASK. We enable interrupts by clearing the I bit.

Except when the trigger flag is set, the software has control over the other aspects of the interrupt process. First, each potential interrupt trigger has a separate arm bit that the software can activate or deactivate. The software will set the arm bits for those devices from which it wishes to accept interrupts, and will disarm those devices from which interrupts are not to be allowed. The second aspect is an enable bit in the nested vectored interrupt controller (NVIC).
Each source has a separate enable in NVIC. The third aspect that the software controls is the interrupt enable bit. The interrupt mask bit I is bit 0 of the special register PRIMASK. If this bit is 1, most interrupts are not allowed, which we will define as disabled. If the I bit is 0, then interrupts are allowed, which we will define as enabled. The fourth aspect is priority. The software will configure the priority of each interrupt. The MSPM0+ supports four levels of priority: 0 is highest and 3 is lowest. For example, if a device has priority 1, then it can interrupt ISRs running at 2 or 3, but it will pend (postpone) if an ISR is running at 0 or 1. The fifth aspect is the external event that sets the trigger flag. Five conditions must be true to generate an interrupt:

• Device arm, a bit in the device (IMASK) specifying it should interrupt,
• NVIC enable, a bit in NVIC->ISER allowing it to interrupt,
• Global interrupt enable (I=0),
• Level, having a priority NVIC->IP higher than the current level, and
• Trigger, a flag in the device set on the hardware event.
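The five conditions can be summarized as one Boolean expression. The following is a minimal sketch that runs on a desktop computer, not on the microcontroller. The structure Source_t, its field names, the function WillInterrupt, and the convention that a running level of 4 means the foreground is executing are all assumptions made for illustration; they are not part of the MSPM0 hardware or software.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical snapshot of the five conditions for one interrupt source.
typedef struct {
  bool armed;        // device arm bit (IMASK) is set
  bool nvicEnabled;  // bit for this source is set in NVIC->ISER
  bool iBit;         // I bit in PRIMASK: true means interrupts are disabled
  uint8_t priority;  // 0 (highest) to 3 (lowest), from NVIC->IP
  uint8_t running;   // priority of the ISR now running, 4 if in the foreground
  bool trigger;      // trigger flag set by the hardware event
} Source_t;

// Returns true if the request is serviced now; otherwise it stays pending.
bool WillInterrupt(const Source_t *s){
  return s->armed && s->nvicEnabled && !s->iBit
      && (s->priority < s->running) && s->trigger;
}
```

For example, a priority 1 source with all other conditions true is serviced while the foreground or a priority 2 ISR is running, but it pends while another priority 1 ISR is running.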

These five conditions must be simultaneously true, but they can occur in any order. If a trigger flag is set, but one or more of the other four conditions is false, the request is not dismissed. Rather the request is held pending, postponed until a later time, when all five conditions become true. An interrupt causes the following sequence of five events.

1. Finish instruction
2. Push R0, R1, R2, R3, R12, LR, PC, and PSR on stack with the R0 on top
3. Sets LR = 0xFFFFFFF9 if interrupting main or 0xFFFFFFF1 if interrupting another ISR; bit 0 in the LR is the T bit, which is always 1
4. Sets the IPSR to the interrupt number being processed
5. Sets the PC with the address of the ISR (vector)
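One way to picture step 2 is as a C structure laid over the stack, with R0 at the lowest address. This is only a sketch under stated assumptions: the type ExceptionFrame_t, the two constant names, and the helper ReturnAddress are hypothetical names chosen for this illustration, and the helper is only meaningful if the ISR has pushed nothing else on the stack.

```c
#include <stddef.h>
#include <stdint.h>

// The eight registers pushed by hardware, lowest address (top of stack) first.
typedef struct {
  uint32_t R0, R1, R2, R3, R12, LR, PC, PSR;
} ExceptionFrame_t;

#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u  // interrupted the main program
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u  // interrupted another ISR

// Hypothetical helper: given a pointer to the stacked frame, return the
// address at which the interrupted thread will resume.
uint32_t ReturnAddress(const ExceptionFrame_t *frame){
  return frame->PC;
}
```

The frame occupies 32 bytes, so the stack pointer decreases by 32 during the context switch.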

These five steps, called a context switch, occur automatically in hardware as the context is switched from a foreground thread to a background thread. Note that the context switch does not set the I bit, so a higher priority trigger can interrupt an ISR running at lower priority. After the context switch, the ISR executes. An interrupt is a hardware-triggered software action.

• Touch a switch → edge-triggered interrupt → ISR processes event
• Network input → UART receive interrupt → ISR reads data, put FIFO
• Network output → UART transmit interrupt → ISR get FIFO, writes data
• Sound generation → SysTick interrupt → ISR get data, writes data to DAC
• Sampling → Timer interrupt → ISR inputs from ADC, writes to buffer

Observation: An interrupt will be triggered when an action needs to be performed. The software in the ISR will perform the action and then return to the previous operation.
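Several of the examples above pass data between an ISR and the main program through a FIFO. The following is a minimal sketch of one possible FIFO, assuming a single producer and a single consumer. The names Fifo_Put and Fifo_Get, the size of 16, and the index scheme are assumptions for illustration and are not taken from the book's software.

```c
#include <stdint.h>

#define FIFO_SIZE 16  // hypothetical size, must be a power of 2
static uint8_t Fifo[FIFO_SIZE];
static volatile uint32_t PutI = 0;  // written only by the producer (e.g., ISR)
static volatile uint32_t GetI = 0;  // written only by the consumer (e.g., main)

// Store one byte; returns 1 on success, 0 if the FIFO is full.
int Fifo_Put(uint8_t data){
  if((PutI - GetI) >= FIFO_SIZE) return 0;  // full, data is discarded
  Fifo[PutI & (FIFO_SIZE-1)] = data;
  PutI = PutI + 1;
  return 1;
}

// Remove the oldest byte; returns 1 on success, 0 if the FIFO is empty.
int Fifo_Get(uint8_t *pt){
  if(PutI == GetI) return 0;  // empty
  *pt = Fifo[GetI & (FIFO_SIZE-1)];
  GetI = GetI + 1;
  return 1;
}
```

Because each index is written by only one thread, this particular arrangement does not need a read-modify-write of a shared variable.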

We will pay special attention to these enable/disable software actions. In other words, once the trigger flag is set, under most cases it remains set until the software clears it. The five necessary events (device arm, NVIC enable, global enable, level, and trigger) can occur in any order. For example, the software can set the I bit to prevent interrupts, run some code that needs to run to completion, and then clear the I bit. A trigger occurring while running with I=1 is postponed until the time the I bit is cleared again. We will discuss critical sections in Section 8.2, which can occur when two threads perform read-modify-write access to a shared global or I/O register. We will set the I bit during the read-modify-write access to remove the critical section.


Clearing a trigger flag is called acknowledgement. Each trigger flag has a specific action software must perform to clear that flag. The SysTick periodic interrupt will be the only example of an automatic acknowledgement. For SysTick, the periodic timer requests an interrupt, and the SysTick trigger flag will be automatically cleared by running its ISR. For all the other trigger flags, the ISR must explicitly execute code that clears the flag.
The interrupt service routine (ISR) is the software module that is executed when the hardware requests an interrupt. There may be one ISR that handles multiple requests (polled interrupts), or separate ISRs, one specific ISR for each potential source of interrupt (vectored interrupts). The design of the ISR requires careful consideration of many factors. Except for the SysTick interrupt, the ISR must clear the trigger flag that caused the interrupt (acknowledge). After the ISR provides the necessary service, it will execute BX LR. Because LR contains a special value (e.g., 0xFFFFFFF9 or 0xFFFFFFF1), BX LR pops the 8 registers from the stack, which returns control to the place the software was prior to the interrupt trigger. There are two stack pointers: PSP and MSP. The software in this book will exclusively use the MSP. It is imperative that the ISR software balance the stack before exiting. Execution of the previous thread will then continue with the exact stack and register values that existed before the interrupt. Although interrupt handlers can create and use local variables, parameter passing between threads must be implemented using shared global memory variables.
An axiom with interrupt synchronization is that the ISR should execute as fast as possible. The interrupt should occur when it is time to perform a needed function, and the ISR should perform that function, and return right away. Placing backward branches (busy-wait loops, iterations) in the interrupt software should be avoided if possible. The percentage of time spent executing any one ISR should be minimized.
Maintenance tip: The time to execute an ISR (Δt) should be much less than the time between invocations of the interrupt (T). This way, the percentage of time in the ISR (Δt/T) is small.
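The ratio in the maintenance tip is simple arithmetic. As a hedged illustration, the helper below (a hypothetical function, IsrLoadPpm, not from the book) returns the ratio of ISR execution time to interrupt period in parts per million, using integer math only.

```c
#include <stdint.h>

// Returns the share of processor time spent in one ISR, in parts per million.
// dt_ns is the ISR execution time and T_ns is the time between interrupts,
// both in nanoseconds. T_ns must not be zero.
uint32_t IsrLoadPpm(uint32_t dt_ns, uint32_t T_ns){
  return (uint32_t)(((uint64_t)dt_ns * 1000000u) / T_ns);
}
```

For instance, an ISR that takes 2 us and runs every 1 ms gives IsrLoadPpm(2000,1000000) equal to 2000 ppm, which is 0.2% of the processor time.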

Many factors should be considered when deciding the most appropriate mechanism to synchronize hardware and software. One should not always use busy wait because one is too lazy to implement the complexities of interrupts. On the other hand, one should not always use interrupts because they are fun and exciting. Busy-wait synchronization is appropriate when the I/O timing is predictable and when the I/O structure is simple and fixed. Busy wait should be used for dedicated single thread systems where there is nothing else to do while the I/O is busy. Interrupt synchronization is appropriate when the I/O timing is variable, and when the I/O structure is complex. In particular, interrupts are efficient when there are multiple I/O devices with different speeds. Interrupts allow for quick response times to important events. In particular, using interrupts is one mechanism to design real-time systems, where the interface latency must be short and bounded. Bounded means it is always less than a specified value. Short means the specified value is acceptable to our consumers. Interrupts can also be used for infrequent but critical events like power failure, memory faults, and machine errors. Periodic interrupts will be useful for real-time clocks, data acquisition systems, and control systems. For extremely high bandwidth and low latency interfaces, direct memory access (DMA) should be used.
An atomic operation is a sequence that once started will always finish, and cannot be interrupted. All instructions on the Cortex-M processor are atomic except STM, LDM, PUSH, and POP. If we wish to make a section of code atomic, we can run that code with I=1. In this way, interrupts will not be able to break apart the sequence. Again, requested interrupts that are triggered while I=1 are not dismissed, but simply postponed until I=0. In particular, to implement an atomic operation we will 1) disable interrupts, 2) execute the operation, and 3) reenable interrupts.
Checkpoint 5.2.1: What five conditions must be true for an interrupt to occur?
Checkpoint 5.2.2: How do you enable interrupts?
Checkpoint 5.2.3: What are the steps that occur when an interrupt is processed?
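The three steps for an atomic operation can be sketched in C. The code below is a model that runs on a desktop computer: the variable IBit stands in for the I bit of PRIMASK, and the names StartCritical, EndCritical, Flags, and Flags_Set are assumptions for this illustration. On the target, the two helpers would read PRIMASK and execute CPSID I, or write the saved value back. This variant restores the previous I bit rather than always reenabling, which is a design choice that lets the function be called safely when interrupts are already disabled.

```c
#include <stdint.h>

// Host-side stand-in for the I bit of PRIMASK (1 means interrupts disabled).
static uint32_t IBit = 0;

// Hypothetical helpers modeling "save the I bit, then disable" and "restore".
uint32_t StartCritical(void){
  uint32_t old = IBit;  // remember the previous I bit
  IBit = 1;             // disable interrupts
  return old;
}
void EndCritical(uint32_t old){
  IBit = old;           // reenables only if interrupts were enabled before
}

volatile uint32_t Flags;  // shared global accessed by main and by ISRs

// Atomic read-modify-write of the shared global.
void Flags_Set(uint32_t mask){
  uint32_t sr = StartCritical();  // 1) disable interrupts
  Flags = Flags | mask;           // 2) execute the operation
  EndCritical(sr);                // 3) restore the previous I bit
}
```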

As you develop experience using interrupts, you will come to notice a few common aspects that all computers share. The following paragraphs outline three essential mechanisms that are needed to utilize interrupts. Although every computer that uses interrupts includes all three mechanisms, there is a wide spectrum of implementation methods.
All interrupting systems must have the ability for the hardware to request action from the computer. In general, the interrupt requests can be generated using a separate connection to the processor for each device. On the Cortex-M, interrupts are connected through the NVIC.
All interrupting systems must have the ability for the computer to determine the source. A vectored interrupt system employs separate connections for each device so that the computer can give automatic resolution. You can recognize a vectored system because each device has a separate interrupt vector address. With a polled interrupt system, the interrupt software must poll each device, looking for the device that requested the interrupt. The MSPM0 interrupts use both vectoring and polling. For a polled interrupt, the ISR must poll to see which trigger caused the interrupt. For example, all 32 input pins on a GPIO port can trigger an interrupt, but the 32 trigger flags share the same vector. So if multiple pins on a GPIO port are armed, the shared ISR must poll to determine which one(s) requested service.
The third necessary component of the interface is the ability for the computer to acknowledge the interrupt. Normally there is a trigger flag in the interface that is set on the busy to ready state transition, i.e., when the device needs service. In essence, this trigger flag is the cause of the interrupt. Acknowledging the interrupt involves clearing this flag. It is important to shut off the request, so that the computer will not mistakenly request a second (and inappropriate) interrupt service for the same condition. Except for periodic SysTick, MSPM0 microcontrollers use software acknowledge. So when designing an interrupting interface, it will be important to know exactly what hardware conditions will set the trigger flag (and request an interrupt) and how the software will clear it (acknowledge) in the ISR.
Common Error: The system will crash if the interrupt service routine doesn't either acknowledge or disarm the device requesting the interrupt.
Common Error: The ISR software should not disable interrupts at the beginning nor should it reenable interrupts at the end. Which interrupts are allowed to run are automatically controlled by the priority set in the NVIC.
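The polling and acknowledge steps for a shared vector can be sketched as follows. This is a model that runs on a desktop computer and not the book's code: the variable RIS, the array Serviced, and the functions WriteICLR and SharedPort_Handler are hypothetical stand-ins for the trigger flags, the device-specific service, and the write-one-to-clear acknowledge.

```c
#include <stdint.h>

// Host-side stand-ins for the trigger flags and the pin-service counts.
static uint32_t RIS;           // trigger flags, one bit per pin
static uint32_t Serviced[32];  // number of times each pin was serviced

// Models writing a mask to ICLR: ones clear flags, zeros have no effect.
static void WriteICLR(uint32_t mask){ RIS = RIS & ~mask; }

// Shared ISR: poll each armed pin, service it, and acknowledge only that flag.
void SharedPort_Handler(uint32_t armed){
  uint32_t pending = RIS & armed;
  for(uint32_t pin = 0; pin < 32; pin++){
    uint32_t mask = (uint32_t)1 << pin;
    if(pending & mask){
      Serviced[pin]++;   // device-specific service would go here
      WriteICLR(mask);   // friendly acknowledge of this one flag
    }
  }
}
```

Flags belonging to pins that are not armed are left untouched, so a trigger on an unarmed pin is neither serviced nor lost.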


5.3. NVIC on the ARM Cortex-M Processor

On the ARM Cortex-M processor, exceptions include resets, software interrupts and hardware interrupts. Each exception has an associated 32-bit vector that points to the memory location where the ISR that handles the exception is located. Vectors are stored in ROM at the beginning of memory. Program 5.3.1 shows some vectors defined in startup_mspm0g3507_ticlang.c.

void (* const interruptVectors[])(void) = {
  (void (*)(void))((uint32_t)&__STACK_END), /* initial SP */
  Reset_Handler,       /* The reset handler */
  NMI_Handler,         /* The NMI handler */
  HardFault_Handler,   /* The hard fault handler */
  0,                   /* Reserved */
  0,                   /* Reserved */
  0,                   /* Reserved */
  0,                   /* Reserved */
  0,                   /* Reserved */
  0,                   /* Reserved */
  0,                   /* Reserved */
  SVC_Handler,         /* SVCall handler */
  0,                   /* Reserved */
  0,                   /* Reserved */
  PendSV_Handler,      /* The PendSV handler */
  SysTick_Handler,     /* SysTick handler */
  GROUP0_IRQHandler,   /* GROUP0 interrupt handler */
  GROUP1_IRQHandler,   /* GROUP1 interrupt handler */
  TIMG8_IRQHandler,    /* TIMG8 interrupt handler */
  UART3_IRQHandler,    /* UART3 interrupt handler */
  ADC0_IRQHandler,     /* ADC0 interrupt handler */
  ADC1_IRQHandler,     /* ADC1 interrupt handler */
  CANFD0_IRQHandler,   /* CANFD0 interrupt handler */
  DAC0_IRQHandler,     /* DAC0 interrupt handler */
  0,                   /* Reserved */
  SPI0_IRQHandler,     /* SPI0 interrupt handler */
  SPI1_IRQHandler,     /* SPI1 interrupt handler */
  0,                   /* Reserved */
  0,                   /* Reserved */
  UART1_IRQHandler,    /* UART1 interrupt handler */
  UART2_IRQHandler,    /* UART2 interrupt handler */
  UART0_IRQHandler,    /* UART0 interrupt handler */
  TIMG0_IRQHandler,    /* TIMG0 interrupt handler */
  TIMG6_IRQHandler,    /* TIMG6 interrupt handler */
  TIMA0_IRQHandler,    /* TIMA0 interrupt handler */
  TIMA1_IRQHandler,    /* TIMA1 interrupt handler */
  TIMG7_IRQHandler,    /* TIMG7 interrupt handler */
  TIMG12_IRQHandler,   /* TIMG12 interrupt handler */
};

Program 5.3.1. Software syntax to set the interrupt vectors for the MSPM0G3507.

The C code in Program 5.3.1 defines the array of 32-bit constants in ROM. Location 0x00000000 has the initial stack pointer, and location 0x00000004 contains the initial program counter, which is called the reset vector. It points to a function called the reset handler, which is the first thing executed following reset. The interrupt sources and their 32-bit vectors are listed in order starting with location 0x00000008. From a programming perspective, we can attach ISRs to interrupts by writing the ISRs as regular assembly subroutines or C functions with no input or output parameters. We could edit the startup file to match the name of our function. However, in this book we will write our ISRs using standard function names so that the startup file need not be edited. We name our ISR for Port B edge-triggered interrupts as GROUP1_IRQHandler, and the code in Program 5.3.1 will create a 32-bit pointer located at ROM address 0x00000044 to point to our ISR. Because the vectors are in ROM, this linkage is defined at compile time and not at run time. For more details see the startup files within the interrupt examples posted on the book web site. Each processor is a little different so check your data sheet.
Program 5.3.2 shows that an ISR is simply a function with no parameters. This function is not called by software; rather, hardware invokes it. Each interrupt has a mechanism to acknowledge. The SysTick has an automatic acknowledge and does not require software action to clear its trigger. In Program 5.3.2 the interrupt trigger flag is bit 21 of the RIS register, which is set on an edge of the PB21 input. We can clear bit 21 of the RIS register (acknowledge) by writing a one to bit 21 of the ICLR register. The acknowledge in Program 5.3.2 is friendly and will not affect the other 31 bits in RIS. We will cover edge-triggered interrupts later in Section 10.3.

GROUP1_IRQHandler
   //stuff
   LDR R0,=GPIOB_CPU_INT_ICLR
   LDR R1,#1…

… NVIC->IP[2] NVIC->IP[3] NVIC->IP[4] NVIC->IP[5] NVIC->IP[6] NVIC->IP[7] SCB->SHP[1]

Table 5.3.1. The MSPM0G3507 NVIC registers. Each register is 32 bits wide. Bits not shown are zero.

Vector address  Number  IRQ  ISR name in startup  Usage
0x0000002C      11      -5   SVC_Handler          Software interrupt
0x00000038      14      -2   PendSV_Handler       Software interrupt to OS
0x0000003C      15      -1   SysTick_Handler      Periodic timer
0x00000040      16       0   GROUP0_IRQHandler    Port A edge triggered
0x00000044      17       1   GROUP1_IRQHandler    Port B edge triggered
0x00000048      18       2   TIMG8_IRQHandler     Timer
0x0000004C      19       3   UART3_IRQHandler     Asynchronous serial I/O
0x00000050      20       4   ADC0_IRQHandler      Analog to digital
0x00000054      21       5   ADC1_IRQHandler      Analog to digital
0x00000058      22       6   CANFD0_IRQHandler    Controller area network
0x0000005C      23       7   DAC0_IRQHandler      Digital to analog
0x00000064      25       9   SPI0_IRQHandler      Synchronous serial I/O
0x00000068      26      10   SPI1_IRQHandler      Synchronous serial I/O
0x00000074      29      13   UART1_IRQHandler     Asynchronous serial I/O
0x00000078      30      14   UART2_IRQHandler     Asynchronous serial I/O
0x0000007C      31      15   UART0_IRQHandler     Asynchronous serial I/O
0x00000080      32      16   TIMG0_IRQHandler     Timer
0x00000084      33      17   TIMG6_IRQHandler     Timer
0x00000088      34      18   TIMA0_IRQHandler     Timer
0x0000008C      35      19   TIMA1_IRQHandler     Timer
0x00000090      36      20   TIMG7_IRQHandler     Timer
0x00000094      37      21   TIMG12_IRQHandler    Timer

Table 5.3.2. Some of the interrupt vectors for the MSPM0G3507.

ISR name in startup  NVIC priority  Priority bits  NVIC enable    Enable bit
PendSV_Handler       SCB->SHP[1]    23 - 22
SysTick_Handler      SCB->SHP[1]    31 - 30
GROUP0_IRQHandler    NVIC->IP[0]     7 - 6         NVIC->ISER[0]   0
GROUP1_IRQHandler    NVIC->IP[0]    15 - 14        NVIC->ISER[0]   1
TIMG8_IRQHandler     NVIC->IP[0]    23 - 22        NVIC->ISER[0]   2
UART3_IRQHandler     NVIC->IP[0]    31 - 30        NVIC->ISER[0]   3
ADC0_IRQHandler      NVIC->IP[1]     7 - 6         NVIC->ISER[0]   4
ADC1_IRQHandler      NVIC->IP[1]    15 - 14        NVIC->ISER[0]   5
CANFD0_IRQHandler    NVIC->IP[1]    23 - 22        NVIC->ISER[0]   6
DAC0_IRQHandler      NVIC->IP[1]    31 - 30        NVIC->ISER[0]   7
SPI0_IRQHandler      NVIC->IP[2]    15 - 14        NVIC->ISER[0]   9
SPI1_IRQHandler      NVIC->IP[2]    23 - 22        NVIC->ISER[0]  10
UART1_IRQHandler     NVIC->IP[3]    15 - 14        NVIC->ISER[0]  13
UART2_IRQHandler     NVIC->IP[3]    23 - 22        NVIC->ISER[0]  14
UART0_IRQHandler     NVIC->IP[3]    31 - 30        NVIC->ISER[0]  15
TIMG0_IRQHandler     NVIC->IP[4]     7 - 6         NVIC->ISER[0]  16
TIMG6_IRQHandler     NVIC->IP[4]    15 - 14        NVIC->ISER[0]  17
TIMA0_IRQHandler     NVIC->IP[4]    23 - 22        NVIC->ISER[0]  18
TIMA1_IRQHandler     NVIC->IP[4]    31 - 30        NVIC->ISER[0]  19
TIMG7_IRQHandler     NVIC->IP[5]     7 - 6         NVIC->ISER[0]  20
TIMG12_IRQHandler    NVIC->IP[5]    15 - 14        NVIC->ISER[0]  21

Table 5.3.3. Some of the interrupt priority and enable registers for the MSPM0G3507.
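The entries in Tables 5.3.2 and 5.3.3 follow a regular pattern, which the sketch below captures as four small functions. The function names are hypothetical and chosen only for this illustration; the arithmetic is inferred from the table entries themselves.

```c
#include <stdint.h>

// Each vector is 32 bits, so exception number n has its vector at address 4*n.
uint32_t VectorAddress(uint32_t number){ return 4*number; }

// Device interrupts start at exception number 16, which is IRQ 0.
int32_t IrqNumber(uint32_t number){ return (int32_t)number - 16; }

// Index of the NVIC->IP register holding the priority of this IRQ.
uint32_t PriorityRegister(uint32_t irq){ return irq/4; }

// Position of the low bit of the 2-bit priority field within that register.
uint32_t PriorityShift(uint32_t irq){ return 8*(irq%4) + 6; }
```

For example, SPI0 is exception number 25, so its vector is at 0x00000064, its IRQ number is 9, and its priority is in bits 15-14 of NVIC->IP[2].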


The Nested Vectored Interrupt Controller (NVIC) controls interrupts on the Cortex-M. Table 5.3.2 lists some interrupt sources available on the MSPM0. Interrupt numbers 0 to 15 contain the faults, software interrupt, PendSV, and SysTick. Interrupt numbers 16 to 47 are device specific. During the context switch, the 6-bit interrupt number (Number column in Table 5.3.2) is loaded into the IPSR register. Table 5.3.3 shows the priority and enable bits. To activate an interrupt source we need to set its priority and enable that source in the NVIC. This activation is in addition to the arm and enable steps.
Checkpoint 5.3.1: Where is the vector for SysTick? What is the standard name for this ISR?

We write a 1 to bit n in NVIC->ISER[0] to enable IRQ number n. It is a write one to enable. For example, NVIC->ISER[0]=2; enables IRQ 1, which is the Port B edge-triggered interrupt. We write a one to bit n in NVIC->ICER[0] to disable IRQ number n. It is a write one to disable. NVIC->ICER[0]=2; disables IRQ 1. These are friendly, because writing 0s to ISER and ICER has no effect.
Figure 5.3.1 shows the context switch from executing in the foreground to running a SysTick ISR. The I bit in the PRIMASK is 0, signifying interrupts are enabled. The interrupt number in the IPSR register is 0, meaning we are running in Thread mode (i.e., the main program, and not an ISR). Handler mode (i.e., an ISR) is signified by a nonzero value in IPSR. When a SysTick interrupt is triggered, the current instruction is finished. (a) Eight registers are pushed on the stack with R0 on top. These registers are pushed onto the stack using whichever stack pointer is currently active: either the MSP or PSP. Secure operating systems will run user code with PSP and system code with MSP. We will always use the MSP. (b) The vector address from 0x0000003C is loaded into the PC ("Vector address" column in Table 5.3.2). (c) The IPSR register is set to 15 ("Number" column in Table 5.3.2). (d) The top 24 bits of LR are set to 0xFFFFFF, signifying the processor is executing an ISR. The bottom eight bits specify how to return from interrupt. Other Cortex-M processors have other possible values. For us, we have:

0xF1 Return to Handler mode MSP
0xF9 Return to Thread mode MSP (we will mostly be using this one)

After pushing the registers, the processor always uses the main stack pointer (MSP) during the execution of the ISR. Events b, c, and d can occur simultaneously.
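The write-one-to-enable and write-one-to-disable behavior of ISER and ICER described above can be modeled with a single variable. This sketch runs on a desktop computer; EnableBits, WriteISER, and WriteICER are hypothetical names standing in for the NVIC hardware, not actual register accesses.

```c
#include <stdint.h>

// Host-side model of the NVIC enable state. On the target, ISER and ICER
// are two ways to modify one set of enable bits.
static uint32_t EnableBits;

// Write to ISER: ones enable, zeros have no effect.
void WriteISER(uint32_t value){ EnableBits = EnableBits | value; }

// Write to ICER: ones disable, zeros have no effect.
void WriteICER(uint32_t value){ EnableBits = EnableBits & ~value; }
```

Because zeros have no effect, software can enable or disable one source with a single write and without first reading the register.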

Figure 5.3.1. Stack before and after a SysTick interrupt. Context switch: finish instruction, a) push registers, b) PC = {0x0000003C}, c) set IPSR = 15, d) set LR = 0xFFFFFFF9, use MSP as stack pointer. After the interrupt the stack holds, from the top, the old R0, R1, R2, R3, R12, LR, PC, and PSR.


To return from an interrupt, the ISR executes the typical function return BX LR. However, since the top 24 bits of LR are 0xFFFFFF, it will return from interrupt by popping the eight registers off the stack. Since the bottom eight bits of LR in this case are 0b11111001, it returns to thread mode using the MSP as its stack pointer. Since the IPSR is part of the PSR that is popped, it is automatically restored to its previous state.
A nested interrupt occurs when a higher priority interrupt suspends an ISR. The lower priority interrupt will finish after the higher priority ISR completes. When one interrupt preempts another, the LR is set to 0xFFFFFFF1, so it knows to return to handler mode. Tail chaining occurs when one ISR executes immediately after another. Optimization occurs because the eight registers need not be popped only to be pushed once again. If an interrupt is triggered and is in the process of stacking registers when a higher priority interrupt is requested, this late arrival interrupt will be executed first.
Priority determines the order of service when two or more requests are made simultaneously. Priority also allows a higher priority request to suspend a lower priority request currently being processed. Usually, if two requests have the same priority, we do not allow them to interrupt each other. The software assigns a priority level to each interrupt trigger in the NVIC. This mechanism allows a higher priority trigger to interrupt the ISR of a lower priority request. Conversely, if a lower priority request occurs while running an ISR of a higher priority trigger, it will be postponed until the higher priority service is complete.
Observation: There are many interrupt sources, but an effective system will use only a few.
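The way BX LR distinguishes a return from interrupt from an ordinary function return can be sketched as a decode of the LR value. The three function names below are hypothetical, and the sketch covers only the two codes used in this book.

```c
#include <stdbool.h>
#include <stdint.h>

// True if the LR value is a return-from-interrupt code (top 24 bits all ones).
bool IsExcReturn(uint32_t lr){
  return (lr & 0xFFFFFF00u) == 0xFFFFFF00u;
}

// True if BX LR will return to Thread mode (the main program) using the MSP.
bool ReturnsToThread(uint32_t lr){
  return IsExcReturn(lr) && ((lr & 0xFFu) == 0xF9u);
}

// True if BX LR will return to Handler mode (a suspended ISR) using the MSP.
bool ReturnsToHandler(uint32_t lr){
  return IsExcReturn(lr) && ((lr & 0xFFu) == 0xF1u);
}
```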

Program 5.3.3 gives the Clang compiler definitions that allow the software to enable and disable interrupts. The CPSIE I instruction enables interrupts, and the CPSID I instruction disables interrupts. When configuring a system with interrupts, we execute __disable_irq(); at the start of main, perform all initializations, and then execute __enable_irq();

__STATIC_FORCEINLINE void __enable_irq(void){
  __ASM volatile ("cpsie i" : : : "memory");
}
__STATIC_FORCEINLINE void __disable_irq(void){
  __ASM volatile ("cpsid i" : : : "memory");
}

Program 5.3.3. C definitions needed for interrupt enabling and disabling using Clang compiler.

5.4. SysTick Periodic Interrupts

To generate sound we need to output new data to the digital to analog converter at a regular frequency, called the sampling rate. The time in between DAC outputs must be equal and known to create quality sound. Similarly, for a data acquisition system the time in between ADC inputs must be equal and known in order for the digital signal processing to function properly. A microcontroller-based control system also requires periodic execution.
The SysTick timer is a simple way to create periodic interrupts. A periodic interrupt is one that is requested on a fixed time basis. Table 5.4.1 shows the SysTick registers used to create a periodic interrupt. SysTick has a 24-bit counter called VAL, which decrements at the bus clock frequency. First, we clear the ENABLE bit to turn off SysTick during initialization. Second, we set the LOAD register. Third, we write to the VAL to clear the counter. Lastly, we write the desired mode to the CTRL register. We must set CLK_SRC=1, because CLK_SRC=0 external clock mode is not implemented on the MSPM0 family. We set INTEN to enable interrupts. We establish the priority of the SysTick interrupts using the TICK field in the SCB->SHP[1] register. We need to set the ENABLE bit so the counter will run. When the VAL counts down from 1 to 0, the COUNT flag is set. On the next clock, the VAL is loaded with the LOAD value. In this way, the SysTick counter (VAL) is continuously decrementing. If the LOAD value is n, then the SysTick counter operates at modulo n+1 (... n, n-1, n-2 ... 1, 0, n, n-1, ...). In other words, it rolls over every n+1 counts. Thus, the COUNT flag will be configured to trigger an interrupt every n+1 counts. Let $f_{BUS}$ be the frequency of the bus clock, and let n be the value of the LOAD register. The frequency of the periodic interrupt will be $f_I = f_{BUS}/(n+1)$.

Address     31-24  23-17  16     15-3  2        1      0       Name
0xE000E010  0      0      COUNT  0     CLK_SRC  INTEN  ENABLE  SysTick->CTRL
0xE000E014  0      24-bit RELOAD value                         SysTick->LOAD
0xE000E018  0      24-bit current VALUE of SysTick counter     SysTick->VAL

Address     31-30  29-24  23-22   21-0  Name
0xE000ED20  TICK   0      PENDSV  0     SCB->SHP[1]

Table 5.4.1. SysTick registers.
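The relation between the bus clock, the LOAD value, and the interrupt frequency can be turned into a small calculation. The sketch below is illustrative only: the function SysTick_Reload, the constant SYSTICK_INVALID, and the 32 MHz bus clock used in the example are assumptions, not values taken from the book's software.

```c
#include <stdint.h>

#define SYSTICK_INVALID 0xFFFFFFFFu  // hypothetical error code

// Returns the LOAD value n such that the interrupt rate is fbus/(n+1),
// or SYSTICK_INVALID if the request cannot be met with a 24-bit counter.
// fbus is the bus clock in Hz and fint is the desired interrupt rate in Hz.
uint32_t SysTick_Reload(uint32_t fbus, uint32_t fint){
  if(fint == 0) return SYSTICK_INVALID;
  uint32_t counts = fbus/fint;  // n+1 bus cycles per interrupt (truncated)
  if((counts < 2) || (counts > 0x01000000u)) return SYSTICK_INVALID;
  return counts - 1;
}
```

With an assumed 32 MHz bus clock, a 1 kHz interrupt needs n = 31999, and the slowest rate that fits in 24 bits is a little under 2 Hz.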

Program 5.4.1 shows a simple example of SysTick. SysTick and PendSV are the only interrupts on the Cortex-M that have an automatic acknowledge. Notice there is no explicit software step in the ISR to clear the trigger flag. The triple toggle debugging technique allows you to measure the time between interrupts (500 ms) and the time within an interrupt (300 ns). See Figure 5.4.1.

volatile uint32_t Counts;
void SysTick_IntArm(uint32_t period, uint32_t priority){
  SysTick->CTRL = 0x00;      // disable during initialization
  SysTick->LOAD = period-1;  // set reload register
  SCB->SHP[1] = (SCB->SHP[1]&(~0xC0000000))|(priority<<30); // TICK field, bits 31-30
  SysTick->VAL = 0;          // clear the counter
  SysTick->CTRL = 0x07;      // Enable SysTick IRQ and SysTick Timer
}

void SysTick_Handler(void){
  GPIOB->DOUTTGL31_0 = (1…

…
  if((…->CPU_INT.IIDX) == 1){ // this will acknowledge
    data = wave[I0]+wave[I1]; // 5-bit signal
    GPIOB->DOUT31_0 = (GPIOB->DOUT31_0&(~0x1F))|data; // output one
    I0 = (I0+1)&0x1F;
  }
}
void TIMG8_IRQHandler(void){uint32_t data;
  if((TIMG8->CPU_INT.IIDX) == 1){ // this will acknowledge
    data = wave[I0]+wave[I1]; // 5-bit signal
    GPIOB->DOUT31_0 = (GPIOB->DOUT31_0&(~0x1F))|data; // output one
    I1 = (I1+1)&0x1F;
  }
}
int main(void){
  __disable_irq();
  DAC_Init();               // 5-bit DAC
  Clock_Init80MHz(0);
  TimerG0_IntArm(4771,1,0); // 40M/4771/32 = 261.999Hz
  TimerG8_IntArm(3189,1,0); // 40M/3189/32 = 391.9724Hz
  __enable_irq();
  while(1){
  }
}

Program 5.7.2. Two 4-bit sine waves are added before outputting to the 5-bit DAC.
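The frequencies in the comments of Program 5.7.2 follow from the timer clock, the timer period, and the 32-entry wave table. As a check on that arithmetic, the hypothetical helper below (WaveFreq_mHz is a name chosen for this sketch) computes the output frequency in millihertz using integer math.

```c
#include <stdint.h>

// Output frequency in millihertz for a table-driven wave.
// fclk is the timer clock in Hz, period is the timer period in clock cycles,
// and tableSize is the number of samples in one cycle of the wave.
uint32_t WaveFreq_mHz(uint32_t fclk, uint32_t period, uint32_t tableSize){
  return (uint32_t)(((uint64_t)fclk * 1000u) / ((uint64_t)period * tableSize));
}
```

With a 40 MHz timer clock, periods of 4771 and 3189 give 261999 mHz and 391972 mHz, which agree with the 261.999 Hz and 391.9724 Hz in the comments.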


5.8. Internal 12-bit DAC

The MSPM0G3507 has an internal 12-bit DAC. Program 5.8.1 provides the initialization and conversion functions to create an analog output on PA15. To reduce noise, we will activate the internal 2.5V voltage reference (VREF) module. Using the reference makes the range 0 to 2.5V. Thus, the resolution will be 2.5V/4095, which is about 0.6 mV. The initialization begins by resetting and activating the VREF and DAC. The VREF is configured with the bus clock (CLKSEL), divide by 1 (CLKDIV), and enabled (CTL0). The while loop waits for the VREF to stabilize. The DAC is configured with 12-bit straight binary (CTL0), VREF (CTL1), no DMA (CTL2), 200 ksps speed (CTL3). To initiate a digital to analog conversion, the software simply writes to the DATA0 register. For more information, see the data sheet for the microcontroller.

void DAC_Init(void){
  VREF->GPRCM.RSTCTL = (uint32_t)0xB1000003;
  DAC0->GPRCM.RSTCTL = (uint32_t)0xB1000003;
  VREF->GPRCM.PWREN = (uint32_t)0x26000001;
  DAC0->GPRCM.PWREN = (uint32_t)0x26000001;
  Clock_Delay(24);           // time for DAC and VREF to power up
  VREF->CLKSEL = 0x00000008; // bus clock
  VREF->CLKDIV = 0;          // divide by 1
  VREF->CTL0 = 0x0001;
  VREF->CTL2 = 0;
  while((VREF->CTL1&0x01)==0){}; // wait for VREF to be ready
  DAC0->CTL0 = 0x0100;       // 12-bit, straight, disable
  DAC0->CTL1 = (1…

…DOUTTGL31_0 = RED1;
  Index = Index+1;
  if(Index > 63) Index = 0;
  DAC_Out(Wave[Index]);
…

Program 5.8.2. System to create a 3.125 kHz analog sine wave (DAC).

Figure 5.8.1. Oscilloscope output of the 3.125 kHz sine wave.

Figure 5.8.2. Spectrum analyzer output of the 3.125 kHz sine wave.
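The resolution of the 12-bit DAC with the 2.5V reference, 2.5V/4095, can be checked with a short calculation. The function below is a hypothetical helper for this illustration (DAC_Microvolts is not part of the book's software) and gives the ideal output, ignoring any offset or gain error of the real converter.

```c
#include <stdint.h>

// Ideal DAC output in microvolts for a 12-bit code with a 2.5 V reference.
uint32_t DAC_Microvolts(uint32_t code){
  if(code > 4095) code = 4095;  // limit to 12 bits
  return (uint32_t)(((uint64_t)code * 2500000u) / 4095u);
}
```

A code of 1 gives 610 uV, which matches the resolution of about 0.6 mV, and a code of 4095 gives the full 2.5V.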


5.9. Real-time Debugging Tools

These are our favorite minimally intrusive or nonintrusive debugging techniques:

• Dump of data and time
• Triple toggle during an ISR to measure time within and between interrupts
• Logic analyzer measuring digital signals versus time
• Oscilloscope measuring analog signals versus time
• Spectrum analyzer measuring analog signals versus frequency
• Digital voltmeter in AC mode to measure voltage noise.
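As one possible reading of the first technique, a dump records pairs of time and data into arrays for later inspection. The sketch below is an assumption about how such a dump could be written; the names Dump, Dump_Elapsed, DumpTime, DumpData, and the size of 100 are hypothetical. Dump_Elapsed further assumes the time stamps come from a 24-bit down counter such as SysTick->VAL with LOAD set to 0x00FFFFFF, and that less than one rollover occurs between entries.

```c
#include <stdint.h>

#define DUMP_SIZE 100  // hypothetical number of entries to record
static uint32_t DumpTime[DUMP_SIZE];  // when each event occurred
static uint32_t DumpData[DUMP_SIZE];  // what was observed
static uint32_t DumpN = 0;            // number of entries recorded so far

// Record one time/data pair; recording stops once the buffers are full.
void Dump(uint32_t time, uint32_t data){
  if(DumpN < DUMP_SIZE){
    DumpTime[DumpN] = time;
    DumpData[DumpN] = data;
    DumpN++;
  }
}

// Elapsed counts between entries i-1 and i (i must be at least 1),
// assuming a 24-bit down counter that reloads with 0x00FFFFFF.
uint32_t Dump_Elapsed(uint32_t i){
  return (DumpTime[i-1] - DumpTime[i]) & 0x00FFFFFFu;
}
```

The function is short and has no loops, so calling it from an ISR adds little delay; the arrays are examined after the experiment is over.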

We can't just say our system is real time; we must prove it. The dump, logic analyzer, and oscilloscope allow us to collect time measurements to demonstrate the real-time behavior of our system. There are many low-cost instruments that we have used and enjoyed in our labs:

• Digilent: Analog Discovery 2
• Liquid Instruments: Moku Go, Lab, Pro
• Pico Technologies: Picoscope, PicoLog
• Saleae: Logic

Remember the hallmark of effective debugging is control and observability. For an oscilloscope and logic analyzer, control is achieved by proper triggering. For a scope we set a voltage threshold and a rising or falling edge to trigger on. For a logic analyzer, there can be a complex combination of rising, falling, high, or low conditions from the signals on which we can trigger. Running at 80 MHz, the system generates more information than we can see with our eyes or comprehend with our brains. Triggering controls the time axis so we can focus on observing important events of our system. Once we have established an appropriate trigger, we can zoom in or out on the time axis to see what we need. Triggering is an important but complicated process. So, read the manual for your instruments and experiment with their different trigger modes.
Oscilloscopes and logic analyzers also have a rich set of modes to assist observability. In the time domain, these instruments can measure period, frequency, pulse width, and duty cycle. An oscilloscope can measure maximum voltage, minimum voltage, average voltage, and root-mean-squared (RMS). If the input is supposed to be constant, RMS is a measure of noise. Most oscilloscopes have a spectrum analyzer mode, allowing you to observe amplitude versus frequency. Logic analyzers can decode digital signals for protocols like UART, SPI, and I2C.

Figure 5.9.1. A logic analyzer triggered on the fall of Data, and decoded using the I2C protocol (4096 samples at 4 MHz; signals Clock and Data).

6. Variables, Conversions, and LCD Output

6.1. Local, Static, and Global Variables

There are two characteristics that define a variable. The first characteristic is allocation. A variable may have permanent or temporary allocation. Permanent means it exists forever. Temporary means it is dynamically created, used, and then destroyed. The second characteristic is scope. A variable has public scope if it can be accessed anywhere in the software system. A variable has private scope if the access is restricted to only some of the software. In C, we can restrict scope to one file, one function, or even one part of a function. A local variable is created by defining it inside a function. Local variables have these characteristics:

• Temporary allocation: dynamically created, used and then released
• Private scope, only software within the {} can access it
• Implemented in registers or on the stack
• Each time the function is called a new instance is created
• Your software must explicitly initialize each instance.

A global variable is allocated at a permanent and fixed location in RAM and has public scope. A global variable is created by defining it outside a function. Since it is public, a global variable contains information that can be shared by more than one program module. Global variables have these characteristics:

• Permanent allocation: created at compile time and never released
• Public s…

…DOUTSET31_0 = 0x00000001;
}

In C, a const is an object that is read-only. Global and static constants are allocated in the ROM portion of memory. Since they are read-only, constants must be initialized at compile time. E.g.,

const uint8_t SinTable[8]={0,50,98,142,180,212,236,250}; // public
const static int16_t Slope=21;                            // private to file
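As a rough illustration of the allocation and scope ideas in this section, the following C sketch contrasts a local variable, a global variable, and a const table. The names Total, Squares, AddLocal, and AddGlobal are hypothetical and exist only for this example.

```c
#include <stdint.h>

uint32_t Total;                            // global: permanent allocation, public scope
const uint8_t Squares[4] = {0, 1, 4, 9};   // const: read-only, initialized at compile time

// sum is a local: a new instance is created on every call,
// so the software must initialize it explicitly.
uint32_t AddLocal(uint32_t x){
  uint32_t sum = 0;
  sum = sum + x;
  return sum;
}

// Total is a global: it keeps its value between calls.
uint32_t AddGlobal(uint32_t x){
  Total = Total + x;
  return Total;
}
```

Calling AddLocal(5) twice returns 5 both times, because each call starts with a fresh sum. Calling AddGlobal(5) twice returns 5 and then 10, because Total is permanent.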

An advantage of local variables, allocated either in registers or on the stack, versus global variables is that memory and register resources are reused: dynamically allocated before usage and deallocated after usage. The second advantage of local variables versus global variables is reentrant code. To illustrate reentrant code, consider the sqrt2 function with its two local variables n and t of Program 6.1.1. Consider the situation where the main program calls sqrt2 and gets halfway through when the SysTick interrupt occurs. The ISR also calls sqrt2. Notice main has entered sqrt2, and then the ISR reenters sqrt2. This sqrt2 operates successfully, and is called reentrant, because n and t are local, stored in registers or on the stack. When the ISR returns control back to main, the correct values of n and t for the main thread are restored. If n and t were allocated as globals, then the values from the main invocation would be lost when the function is reentered by the ISR. Defining n and t as globals would make sqrt2 nonreentrant.

uint32_t sqrt2(uint32_t s){
  uint32_t n,t;
  t = s/16+1;
  for(n = 16; n; --n){
    t = ((t*t+s)/t)/2;
  }
  return t;
}

int main(void){ uint32_t x,y;
  x = 1000;
  while(1){
    y = sqrt2(x);
    x++;
  }
}

uint32_t x2=1000,y2;
void SysTick_Handler(void){
  y2 = sqrt2(x2);
  x2++;
}

Program 6.1.1. A function that is called from two different threads.

Because of the public scope, it is a poor design to employ global variables. However, global or static variables are necessary to store data that are permanent in nature. We must use global or static variables to pass data between the main program (i.e., foreground thread) and an ISR (i.e., background thread). Similarly, we use global or static variables to pass data from one ISR to another, or from one instance of an ISR to another instance of the same ISR.
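To make the reentrancy argument concrete, here is a hedged sketch that can run on a desktop compiler. It keeps the local-variable sqrt2 and adds a hypothetical variant, sqrt2_bad, whose n and t are globals. Real interrupts are not available in such a test, so FakeInterrupt is an assumed stand-in that reenters the function once, halfway through the loop. All names other than sqrt2 are inventions for this example.

```c
#include <stdint.h>

// Reentrant version: n and t are locals (registers or stack).
uint32_t sqrt2(uint32_t s){
  uint32_t n, t;
  t = s/16+1;
  for(n = 16; n; --n){
    t = ((t*t+s)/t)/2;
  }
  return t;
}

// Hypothetical nonreentrant variant: same algorithm, but n and t are globals.
static uint32_t gn, gt;
static int Pending;          // 1 means a simulated interrupt is waiting
static uint32_t IsrResult;   // value computed by the simulated ISR
uint32_t sqrt2_bad(uint32_t s);

// Stand-in for the SysTick ISR: runs once, in the middle of the caller's loop.
static void FakeInterrupt(void){
  if(Pending){
    Pending = 0;
    IsrResult = sqrt2_bad(90000);   // overwrites gn and gt
  }
}

uint32_t sqrt2_bad(uint32_t s){
  gt = s/16+1;
  gn = 16;
  while(gn){
    gn--;
    if(gn == 8){ FakeInterrupt(); }  // the "interrupt" arrives halfway through
    gt = ((gt*gt+s)/gt)/2;
  }
  return gt;
}

// Calls sqrt2_bad while one simulated interrupt is pending.
uint32_t InterruptedCall(uint32_t s){
  Pending = 1;
  return sqrt2_bad(s);
}
```

In this sketch sqrt2(1000) returns 31, while InterruptedCall(1000) returns a different, incorrect value because the reentered call overwrote gn and gt before the first call finished.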


Observation: It is good practice to reduce the scope of variables and functions as much as possible, because it reduces complexity.

To better understand local variables on the stack, we can observe register usage in a simple assembly traffic controller FSM, shown here as Program 6.1.2. A local variable contains temporary and private information. Local variables are allocated, used, and then deallocated, in this specific order. When we use registers to implement local variables, we do not employ any formal syntax describing which information is in which registers, but we do follow a process:

1. We put something in a register,
2. We use the something in that register, and then
3. We stop caring that the something is in that register.

For example, focus on R1 in this example. Starting in line 8, we put the contents of all of Port B in R1. In lines 9-11 we modify and use this information. However, after line 11, we no longer care about the value in R1. More specifically, line 8 is the allocation and initialization of the local variable in R1. Lines 9-11 use the local variable. After line 11, we no longer care about what is in R1, meaning the local variable is deallocated. There will be a certain line in the assembly software at which the register begins to contain the variable (allocation), followed by lines where the register contains the information (access or usage), and a certain line in the software after which we no longer care about the contents in the register (deallocation). The variables in R4, R5, and R6 are a little different in the sense that they are allocated and used, but never deallocated. These variables are never deallocated because on an embedded system, the main never finishes.

Line  Program                          R0     R1     R2    R4  R5     R6
      main: MOVS R0,#0                 #0
 1          BL   Clock_Init80MHz
 2          BL   SysTick_Init
 3          BL   Traffic_Init
 4          LDR  R4,=goN                                   Pt
 5          LDR  R5,=GPIOB_DOUT31_0                        Pt  OUTpt
 6          LDR  R6,=GPIOB_DIN31_0                         Pt  OUTpt  INpt
 7    loop: LDR  R0,[R4,#OUT]          OUT                 Pt  OUTpt  INpt
 8          LDR  R1,[R5]               OUT    PortB        Pt  OUTpt  INpt
 9          MOVS R2,#0x3F              OUT    PortB  0x3F  Pt  OUTpt  INpt
10          BICS R1,R1,R2              OUT    PortB  0x3F  Pt  OUTpt  INpt
11          ORRS R0,R0,R1              OUT    PortB        Pt  OUTpt  INpt
12          STR  R0,[R5]               OUT                 Pt  OUTpt  INpt
13          LDR  R0,[R4,#WAIT]         WAIT                Pt  OUTpt  INpt
14          BL   SysTick_Wait10ms                          Pt  OUTpt  INpt
15          LDR  R0,[R6]               IN                  Pt  OUTpt  INpt
16          MOVS R1,#0xC0              IN     0xC0         Pt  OUTpt  INpt
17          ANDS R0,R0,R1              IN     0xC0         Pt  OUTpt  INpt
18          LSRS R0,R0,#4              Index               Pt  OUTpt  INpt
19          ADDS R0,R0,#NEXT           Index               Pt  OUTpt  INpt
20          LDR  R4,[R4,R0]            Index               Pt  OUTpt  INpt
21          B    loop                                      Pt  OUTpt  INpt

Program 6.1.2. Register usage in a finite state machine controller (repeated from Program 5.2.4).

Observation: All variables have a type such as integer or pointer. In C, that type is explicitly defined when they are created. Whereas in assembly, the type is implied by how we use it.


6.2. Stack rules

A deep understanding of the stack is critical when implementing local variables on the stack. It is so important that we repeat the rules for proper use of the stack here. The stack pointer (SP) on the Cortex-M processor points to the top entry of the stack, as shown in Figure 6.2.1. In other words, SP points to data. Entries on the stack are 32 bits wide. If it exists, we define the data immediately below the top (larger memory address) as next to top. To push a 32-bit word on the stack, we first decrement the SP by 4, and then we store that word at the location pointed to by the SP. To pop a 32-bit word from the stack, first we read the word from memory pointed to by SP, and then we increment the SP by 4. The compiler will assign a fixed-size area in RAM for the stack, drawn as five 32-bit entries in Figure 6.2.1. Look in your .map files to see that your stack area is separate from the area in RAM where the compiler places global variables. At reset, the SP is initialized like the left image of Figure 6.2.1, pointing below the stack area, which defines an empty stack.

[Figure: two panels, "Empty Stack" and "Stack with 3 elements", showing SP, top, and next to top.]

Figure 6.2.1. Each entry on the stack is 32 bits. The white boxes are free, and the shaded boxes contain data.

Checkpoint 6.1.3: Why is the SP initialized to point outside of the stack area?

You might have thought push and pop were the only ways to access stack data. However, we can also read and write previously allocated locations on the stack using SP-indexed addressing mode without modifying the SP. For example, to read the 32-bit value from the next to top word,

LDR R0,[SP,#4]   // R0 = next to the top

Interrupts, the PUSH instruction, and the POP instruction are three operations that modify the stack pointer. However, we can also subtract multiples of 4 from the SP to allocate stack space, and add multiples of 4 to the SP to deallocate stack space. The LIFO stack has these rules (repeated from Chapter 1):

1. All functions should have an equal number of pushes and pops, called balanced
2. Stack accesses (push or pop) should not be performed outside the allocated area
3. Stack reads and writes should not be performed within the free area
4. Stack push should first decrement SP by 4, then store the data
5. Stack pop should first read the data, and then increment SP by 4
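The push and pop ordering in rules 4 and 5 can be sketched as a small software model in C. This is only an illustration under stated assumptions: StackArea, ModelSP, Push, Pop, and Peek are hypothetical names, the area is five words as drawn in Figure 6.2.1, and the bounds checks are added for the model (the processor itself does not refuse a push or pop that leaves the allocated area).

```c
#include <stdint.h>

#define STACK_WORDS 5                       // five 32-bit entries, as in Figure 6.2.1
static uint32_t StackArea[STACK_WORDS];
static uint32_t *ModelSP = &StackArea[STACK_WORDS]; // empty stack: just past the area

// Rule 4: decrement SP first, then store. Returns 0 if the push would overflow.
int Push(uint32_t data){
  if(ModelSP == &StackArea[0]){ return 0; }
  ModelSP--;            // one 32-bit word, i.e., 4 bytes
  *ModelSP = data;
  return 1;
}

// Rule 5: read first, then increment SP. Returns 0 if the pop would underflow.
int Pop(uint32_t *data){
  if(ModelSP == &StackArea[STACK_WORDS]){ return 0; }
  *data = *ModelSP;
  ModelSP++;
  return 1;
}

// Read the entry k words above the top without moving SP, like LDR R0,[SP,#4*k].
// The caller must keep k within the allocated entries.
uint32_t Peek(uint32_t k){
  return ModelSP[k];
}
```

After pushing 1, 2, and 3, Peek(1) reads the next to top entry (2) without changing ModelSP, and three pops return 3, 2, and 1, leaving the model stack balanced and empty.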

Rule 1) Functions that do not balance the stack will probably crash in weird and confusing ways.


Rule 2) Accessing memory below the allocated area is called stack underflow. Stack underflow is caused when there are more pops than pushes, and it is always the result of a software bug. Accessing memory above the allocated area is called stack overflow. A stack overflow can be caused by two reasons. If the software mistakenly pushes more than it pops, then the stack pointer will eventually overflow its bounds. Even when there is exactly one pop for each push, a stack overflow can occur if the stack is not allocated large enough. Stack overflow is a very difficult bug to recognize, because the first consequence occurs when the computer pushes data onto the stack and overwrites data stored in a global variable. At this point the local variables and global variables exist at overlapping addresses.

Checkpoint 6.2.1: Where in a CCS project do you specify the size of your stack?

Rule 3) Figure 6.2.1 shows the free area as white boxes. The following assembly code violates rule 3 and will not work if interrupts are active. The objective is to save and then restore Register R0 using the stack. If an interrupt were to occur between the STR and LDR instructions, the context switch would push registers onto the stack, destroying the saved data.

      MOV  R1,SP      // R1 points to stack
      SUBS R1,R1,#8   // R1 points to free area
      STR  R0,[R1]    // Save R0 on stack (***illegal***)
// later
      LDR  R0,[R1]    // Restore R0 from stack (***illegal***)

Rules 4 and 5) The PUSH and POP instructions automatically follow Rules 4 and 5. If a subroutine modifies R4-R11, it is required by AAPCS to save and restore the register. Conversely, it can freely change R0-R3 and R12. Furthermore, if one subroutine calls another subroutine, then it must save and restore the LR. In the following example, assume the function modifies Registers R0, R4, R5 and calls another function. AAPCS dictates registers R4, R5, and LR be saved. Notice the return address is pushed on the stack as LR but popped off into PC. When multiple registers are pushed or popped, the data exist in memory with the lowest numbered register occurring in the lowest memory address. In other words, the registers in the {} can be specified in any order, but the order in which they appear on the stack is fixed. Of course, remember to balance the stack by having the same number of pops as pushes.

Func: PUSH {R4,R5,LR}   // save registers as needed
      // 1) allocate local variables
      // 2) body of the function, access local variables
      // 3) deallocate local variables
      POP  {R4,R5,PC}   // restore registers and return

6.3. Local variables allocated on the stack

There are two advantages of allocating local variables on the stack versus in registers. First, we must use the stack for local variables if we have more variables than registers. Second, we must use the stack when creating local arrays. Stack implementation of local variables has four stages: binding, allocation, access, and deallocation.




Phase 1. Binding is the assignment of the address (not value) to a symbolic name. The symbolic name will be used by the programmer when referring to the local variable. The assembler binds the symbolic name to a stack index, and the computer calculates the physical location during execution. In the following example, the local variable will be at address SP+0, and the programmer will access the variable using [SP,#sum] addressing:

      .equ sum,0

Phase 2. Allocation is the creation of memory storage for the local variable. The software allocates space during execution by decrementing the SP. In this first example, the software allocates a local variable simply by pushing a register on the stack. The contents of the register become the initial value of the variable. This method is appropriate when we wish to allocate local variables with initial values.

      MOVS R0,#0
      PUSH {R0}     //allocate and initialize one 32-bit local variable

In this next example, the software allocates two local variables by decrementing the stack pointer by 8. These two local variables are uninitialized. This method is most general, allowing the allocation of an arbitrary amount of data.

      SUB  SP,#8    //allocate two 32-bit local variables

Phase 3. The access to a local variable is a read or write operation that occurs during execution. Because we use SP addressing with offset, we will only use LDR and STR to access local variables on the stack. In the following code, we will add 100 to the local variable sum.

      LDR  R0,[SP,#sum]   // R0=sum
      ADDS R0,#100        // R0=sum+100
      STR  R0,[SP,#sum]   // sum=sum+100

In the next code fragment, the local variable sum is divided by 16.

      LDR  R1,[SP,#sum]   // R1=sum
      LSRS R2,R1,#4       // R2=sum/16
      STR  R2,[SP,#sum]   // sum=sum/16

In the above two examples, we consider the local variable as the value on the stack. In other words, R0-R2 contain temporary calculations but not the local variable itself.

Phase 4. Deallocation is the release of memory storage for the local variable. The software deallocates space during execution by incrementing SP. In this example, the software deallocates two local variables by incrementing the stack pointer by 8. When deallocating, we must balance the stack. I.e., we add to the SP the correct number needed to balance the stack.

      ADD  SP,#8    //deallocate two variables

Checkpoint 6.3.1: What stack rule does this instruction violate? STR R0,[SP,#-4]

Checkpoint 6.3.2: Write a subroutine that allocates then deallocates three 32-bit locals.

Program 6.3.1 implements a 10-element local array called data. It is impossible to store arrays in registers. Rather, we place local arrays on the stack. For Phase 1 binding, we draw a picture showing where the array is on the stack. Figure 6.3.1 shows the stack before and after the allocation. For Phase 2, the SUB instruction allocates 10 words on the stack. During Phase 3, we access the data. The SP points to the first location of data. The local variable i is held in R0. R1 will contain 4*i as an offset into the array, because each entry is 4 bytes. R2 is the address of element data[i]. The addressing mode [R2] accesses data on the stack without modifying the stack pointer. For Phase 4, the ADD instruction deallocates the local array, balancing the stack.

Set:  SUB  SP,SP,#40   //2)allocate 10 words
      MOVS R0,#0       //3)i=0
loop: LSLS R1,R0,#2    //3)4*i
      ADD  R2,R1,SP    //3)SP+4*i => data[i]
      STR  R0,[R2]     //3)data[i]=i;
      ADDS R0,R0,#1    //3)i++
      CMP  R0,#10      //3)
      BLT  loop        //3)
      ADD  SP,SP,#40   //4)deallocate
      BX   LR

void Set(void){ uint32_t data[10]; int i=0;
  do{
    data[i] = i;
    i++;
  }while(i < 10);
}

Program 6.3.1. Assembly and C functions that allocate and initialize a local array of ten elements.

[Figure: the stack before and after SUB SP,SP,#40. After the allocation SP points to data[0], with data[9] at the highest address of the array; ADD SP,SP,#40 restores SP.]

Figure 6.3.1. A stack picture showing a local array of ten elements.

There are five types of data that may be saved on the stack. The first four are created at the beginning of the function and constitute the stack frame. Each time a function is called a new stack frame is created. The return address, saved registers, and local variables can be stored in any order. Temporary calculations can occur anywhere during the execution of the function.

• Parameters
• Return address
• Saved registers
• Local variables
• Temporary calculations

First, by AAPCS, if there are more than 4 input parameters, additional parameters above 4 will be pushed on the stack by the calling program. Every function must balance the stack. So if program A pushes some parameters on the stack and calls function B, then program A will remove the parameters. I.e., both program A and function B separately balance their stacks. Second, if a function calls another function, the LR (return address) must be pushed on the stack.




Third, by AAPCS, if the function uses registers R4-R11, it will push them on the stack so their values are preserved. Fourth, the function may allocate local variables on the stack. Lastly, some algorithms are so complex they require temporary calculations, which could be stored on the stack. One limitation of SP indexed addressing mode to access local variables is the difficulty of pushing temporary calculations onto the stack during the execution of the function. In particular, if the body of the function pushes additional items on the stack, the symbolic binding becomes incorrect. There are two approaches to this problem. First, we could recompute the binding after each stack push/pop. Second, we could assign a second register to point into the stack. To employ a stack frame pointer we execute the initial steps of the function: saving LR, saving registers, and allocating local variables on the stack. Once these initial steps are complete, we set another register to point into the stack. Because R4-R11 will be saved and restored, any of these would be appropriate for the stack frame pointer. E.g.,

      MOV  R7,SP

This stack frame pointer (R7) points to the local variables and parameters of the function. It is important in this implementation that once the stack frame pointer is established (e.g., using the MOV R7,SP instruction), the stack frame register (R7) not be modified. The term frame refers to the fact that the pointer value is fixed. If R7 is a fixed pointer to the set of local variables, then a fixed binding (using the .equ pseudo op) can be established between Register R7 and the local variables and parameters, even if additional information is pushed on the stack. Because the stack frame pointer should not be modified, every subroutine will save the old stack frame pointer of the function that called the subroutine and restore it before returning. Local variable access uses indexed addressing mode using Register R7.

Observation: One advantage of using a stack frame is that you can push and pop within the body of the function, and still be able to access local variables using their symbolic name.

Observation: With a processor like the ARM with lots of registers, it is not a disadvantage to dedicate a register as a stack frame pointer, thus making it unavailable for general use.

In C, we can define a local variable after any open brace {. The compiler will usually allocate local variables in registers, but in this section, we will place all local variables on the stack. Programs 6.3.2 and 6.3.3 calculate the 32-bit sum of the first 1000 numbers. The purpose of this simple program is to demonstrate various implementations of local variables. According to AAPCS, the result will be returned by value in Register R0. Program 6.3.2 shows an implementation using regular stack pointer addressing, and Figure 6.3.2 shows its local variables with SP indexed addressing. The binding is not necessary, but its usage greatly improves understanding. For example, the access to the variable n could be performed using [SP,#4], but the [SP,#n] addressing mode is easier to understand. The binding creates exactly the same machine code as without binding, but the software is easier to read because the variables are referred to by symbolic names.

SP -> sum          [SP,#0]
      n            [SP,#4]
      other data
      other data

Figure 6.3.2. The stack frame includes return address, registers, and local variables. The local variables are accessed with SP-indexed addressing mode.

// *****binding phase**************
      .equ sum,0
      .equ n,4
// 1)*****allocation phase********
Calc: MOVS R0,#0          //initial sum
      LDR  R1,=1000       //initial n
      PUSH {R0,R1}        //allocate n,sum
// 2)******access phase***********
loop: LDR  R1,[SP,#n]     //R1=n
      LDR  R0,[SP,#sum]   //R0=sum
      ADD  R0,R1          //R0=sum+n
      STR  R0,[SP,#sum]   //sum=sum+n
      LDR  R1,[SP,#n]     //R1=n
      SUBS R1,#1          //n-1
      STR  R1,[SP,#n]     //n=n-1
      BNE  loop
// 3)******deallocation phase****
      LDR  R0,[SP,#sum]
      ADD  SP,#8          //deallocation
      BX   LR             //R0=sum

uint32_t Calc(void){ uint32_t sum,n;
  sum = 0; n = 1000;
  do{
    sum = sum+n;
    n--;
  }while(n != 0);
  return sum;
}

Program 6.3.2. Stack pointer implementation of a function with two local 32-bit variables.

Program 6.3.3 shows an implementation using stack frame pointer addressing, drawn in Figure 6.3.3. The program establishes the frame pointer in R7, and then it allocates the variables.

SP, R7 -> sum              [R7,#0]
          n                [R7,#4]
          R7
          return address
          other data

Figure 6.3.3. The stack frame includes return address, registers, and local variables. The local variables are accessed with R7-indexed addressing modes.




In Program 6.3.3, the variable n is accessed using the [R7,#n] addressing mode. The pushing of R0 and R1 allocates and initializes the local variables. The pushing of R7 is required by AAPCS. The pushing of LR allows the pop instruction to both restore R7 and return from the function. Notice the similarity between Program 6.3.2 and Program 6.3.3. The stack frame pointer implementation is only one instruction longer than the stack pointer version. However, the body of Program 6.3.3 is free to push additional data on the stack.

// *****binding phase**************
      .equ sum,0
      .equ n,4
// 1)*****allocation phase********
Calc: MOVS R0,#0            //initial sum
      LDR  R1,=1000         //initial n
      PUSH {R0,R1,R7,LR}    //allocate
      MOV  R7,SP            //frame pointer
// 2)******access phase************
loop: LDR  R1,[R7,#n]       //R1=n
      LDR  R0,[R7,#sum]     //R0=sum
      ADD  R0,R1            //R0=sum+n
      STR  R0,[R7,#sum]     //sum=sum+n
      LDR  R1,[R7,#n]       //R1=n
      SUBS R1,#1            //n-1
      STR  R1,[R7,#n]       //n=n-1
      BNE  loop
// 3)******deallocation phase****
      LDR  R0,[R7,#sum]
      ADD  SP,#8            //deallocation
      POP  {R7,PC}          //R0=sum

uint32_t Calc(void){ uint32_t sum,n;
  sum = 0; n = 1000;
  do{
    sum = sum+n;
    n--;
  }while(n != 0);
  return sum;
}

Program 6.3.3. Stack frame pointer (R7) implementation of a function with two local 32-bit variables.

Program 6.3.4 illustrates an example function with 5 parameters and one local variable. Since it calls another function, it will save the LR. The main program will push the fifth parameter. The function will push the other four parameters on the stack. This way, all five parameters are on the stack during the execution of the function. The stack frame is shown in Figure 6.3.4. Notice both the main program and the function balance their own stacks.

SP -> f                [SP,#0]
      a                [SP,#4]
      b                [SP,#8]
      c                [SP,#12]
      d                [SP,#16]
      return address
      e                [SP,#24]
      other data

Figure 6.3.4. The stack frame includes five parameters, return address, and a local variable.

main: MOVS R0,#5
      PUSH {R0}          // e
      MOVS R0,#1         // a
      MOVS R1,#2         // b
      MOVS R2,#3         // c
      MOVS R3,#4         // d
      BL   fun
      ADD  SP,SP,#4      // discard e
loop: B    loop
// *****binding phase**************
      .EQU f,0
      .EQU a,4
      .EQU b,8
      .EQU c,12
      .EQU d,16
      .EQU e,24
// 1)*****allocation phase********
fun:  PUSH {R0-R3,LR}    // a,b,c,d
      SUB  SP,#4         // f
// 2)******access phase***********
      LDR  R0,[SP,#a]    //R0=a
      LDR  R1,[SP,#b]    //R1=b
      MULS R0,R0,R1      //a*b
      LDR  R1,[SP,#c]    //R1=c
      MULS R0,R0,R1      //a*b*c
      LDR  R1,[SP,#d]    //R1=d
      ADDS R0,R0,R1      //a*b*c+d
      LDR  R1,[SP,#e]    //R1=e
      ADDS R0,R0,R1      //a*b*c+d+e
      STR  R0,[SP,#f]    //save in f
      BL   Out
// 3)******deallocation phase****
      LDR  R0,[SP,#f]
      ADD  SP,#20        //deallocate
      POP  {PC}          //R0=f

int main(void){
  fun(1,2,3,4,5);
  while(1){};
}

uint32_t fun(uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t e){
  uint32_t f;
  f = a*b*c+d+e;
  Out(f);
  return f;
}

Program 6.3.4. Stack pointer implementation of a function with five parameters and one local.

Common Error: One does not allocate/deallocate stack space by changing stack frame pointer R7. We must modify SP to allocate/deallocate space.

Common Error: The stack must always be word-aligned. The instructions LDR R0,[SP,#2], LDRH R0,[SP], SUB SP,#2, and ADD SP,#2 will not compile.


6.4. Managing Overflow

Overflow and underflow are errors that occur when the result of a calculation exceeds the range of the number system. The consequences of overflow and underflow can be catastrophic, so we must consider their possibility whenever adding, subtracting, multiplying, or left shifting. Dividing by 0 is an example of overflow. Dropout is the loss of information when dividing or right-shifting. Multiplying by 0 is an example of dropout.

Assume for this discussion, n, m, and p are integers. Let A be any n-bit number, and B be any m-bit number. Adding, subtracting, multiplying, and left-shift make the value larger, increasing the number of bits. Division and right-shifting make the value smaller, reducing the number of bits. However, when dividing A/B, B might equal 1, so A/B still has n bits of data. In general,

A+B             has max(n,m)+1 bits of data,
A-B             has max(n,m)+1 bits of data,
A*B             has n+m bits of data,
A/B             has n bits of data (assuming B is not zero),
A<<p            has n+p bits of data,
A>>p = A/2^p    has n-p bits of data if n≥p, 0 bits of data if n<p.

For example, assume A is a 12-bit number:

A+1000     has 12+1=13 bits
A*1000     has 12+10=22 bits
A/1000     has 12-9=3 bits (example of dropout)
A>>10      has 12-10=2 bits (example of dropout)
(A-5)*7    has (12+1)+3=16 bits

Another approach to determine the maximum number of bits in any calculation is to replace the inputs with minimum and maximum values and determine the range of possible outputs. Again, assume A is a 12-bit number (so the minimum is -2048 and maximum is 2047)

A+1000     min=-1,048       max=+3,047       range fits in 13 bits
A*1000     min=-2,048,000   max=+2,047,000   range fits in 22 bits
A/1000     min=-2           max=+2           range fits in 3 bits
A>>10      min=-2           max=+1           range fits in 2 bits
(A-5)*7    min=-14371       max=+14294       range fits in 16 bits
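The min/max approach can be automated. The following is a minimal sketch, not taken from the book: SignedBits is a hypothetical helper that returns the smallest two's complement width whose range contains a given minimum and maximum. It assumes the inputs are far from the limits of int64_t.

```c
#include <stdint.h>

// Smallest number of bits of a signed (two's complement) integer
// whose range -2^(bits-1) to 2^(bits-1)-1 contains both min and max.
// Assumes min and max are well inside the int64_t range.
uint32_t SignedBits(int64_t min, int64_t max){
  uint32_t bits = 1;
  int64_t lo = -1, hi = 0;      // range of a 1-bit signed number
  while((min < lo) || (max > hi)){
    lo = lo*2;                  // each added bit doubles the range
    hi = hi*2+1;
    bits++;
  }
  return bits;
}
```

For the range of A+1000 with a 12-bit signed A, SignedBits(-1048, 3047) returns 13. For (A-5)*7 over the same A it returns 15, one less than the 16-bit estimate, which illustrates that the bit-counting rules give a safe upper bound rather than an exact width.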

Checkpoint 6.4.1: Evaluate the potential errors in these two statements: y=(A*B)/C; versus y=A*(B/C). Assume all four variables are signed 32-bit integers.


There are two appropriate mechanisms to deal with the potential for overflow and underflow. The first mechanism is called promotion. Promotion involves increasing the precision of the input numbers, and performing the operation at that higher precision. An error can still occur if the result is stored back into the smaller precision. Fortunately, the program has the ability to test the intermediate result to see if it will fit into the smaller precision. To promote an unsigned number we add zeros to the left side. To promote a signed number we sign extend into the higher bits. Review the assembly instructions LDRB, LDRSB, LDRH, and LDRSH. In C, if we load a lower precision number into a higher precision variable, it will automatically promote. Unsigned promotion occurs when moving a uint8_t or uint16_t value to a uint32_t variable. Signed promotion occurs when moving from an int8_t or int16_t to an int32_t variable. No error occurs on promotion. However, if we load a higher precision number into a lower precision variable, it will automatically demote. For example, writing a 32-bit uint32_t value into an 8-bit uint8_t variable will discard the top 24 bits. Overflow or underflow errors can occur on demotion if the result does not fit. The add8 function adds two unsigned 8-bit values. The function sub8 subtracts two 8-bit signed values. Both use promotion to detect errors. Setting the value to maximum possible is called ceiling, and setting the value to minimum is called floor.

uint8_t add8(uint8_t a, uint8_t b){ uint32_t result;
  result = a+b;        // promote and perform 32-bit addition
  if(result > 255){    // check for overflow
    result = 255;      // yes, overflow occurred, set to ceiling
  }
  return result;       // demote back to 8 bits
}

int8_t sub8(int8_t x, int8_t y){ int32_t result;
  result = x-y;        // promote and perform 32-bit subtraction
  if(result < -128){   // check to see if underflow occurred
    result = -128;     // yes, underflow occurred, set to floor
  }
  if(result > 127){    // check to see if overflow occurred
    result = 127;      // yes, overflow occurred, set to ceiling
  }
  return result;
}

Program 6.4.1. Using promotion to detect and compensate for overflow and underflow errors.

Maintenance Tip: When evaluating overflow and underflow in C programs it is best to observe the assembly code produced by the compiler so you can identify the precision of intermediate results.

Common Error: Even though most C compilers automatically promote to a higher precision during the intermediate calculations, they do not check for overflow when demoting the result back to the original format.

Checkpoint 6.4.2: Assume data is an unsigned 12-bit integer and volt is an unsigned 16-bit variable. How do you prove this code does not overflow: volt=(3300*data)>>12?
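The silent demotion described in the Common Error can be observed with a short experiment. This is a sketch with hypothetical names (Demote and add8_naive are not from the book); it shows what happens when the promoted result is returned without a ceiling check.

```c
#include <stdint.h>

// Demotion keeps only the bottom 8 bits; no error is reported.
uint8_t Demote(uint32_t x){
  return (uint8_t)x;
}

// Hypothetical naive adder: a+b is computed at higher precision,
// then silently demoted to 8 bits on return.
uint8_t add8_naive(uint8_t a, uint8_t b){
  return (uint8_t)(a+b);
}
```

Here add8_naive(200,100) returns 44, which is 300 with the upper bits discarded, whereas a version with a ceiling check would return 255.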


6.5. Fixed-point Numbers

We will use fixed-point numbers when we wish to express values in our software that have noninteger values. In order to design a fixed-point system the range of values must be known. A fixed-point number contains two parts. The first part is a variable integer, called I. This variable integer may be signed or unsigned. An unsigned fixed-point number is one that has an unsigned variable integer. A signed fixed-point number is one that has a signed variable integer. The precision of a number system is the total number of distinguishable values that can be represented. The precision of a fixed-point number is determined by the number of bits used to store the variable integer. On the Cortex-M processor, we typically use 32 bits, but 8 or 16 bits could be used. The variable integer is saved in memory and is manipulated by software. These manipulations include but are not limited to load, store, shift, add, subtract, multiply, and divide. The second part of a fixed-point number is a fixed constant, called Δ. The fixed constant is defined at design time and cannot be changed at run time. The fixed constant defines the resolution of the number system. The fixed constant is not stored in memory. Usually we specify the value of this fixed constant using software comments to explain our fixed-point algorithm. The value of the fixed-point number is defined as the product of the variable integer times the fixed constant:

Fixed-point number = I•Δ

The resolution of a number is the smallest difference that can be represented. In the case of fixed-point numbers, the resolution is equal to the fixed constant, Δ. Sometimes, we express the resolution of the number as its units. For example, a decimal fixed-point number with a resolution of 0.001 volts is really the same thing as an integer with units of mV. When inputting numbers from a keyboard or outputting numbers to a display, it is usually convenient to use decimal fixed point. With decimal fixed point the fixed constant is a power of 10,

Decimal fixed-point number = I•10^m for some constant integer m.

Again, the integer m is fixed and is not stored in memory. Decimal fixed point will be easy to input from or output to humans, while binary fixed point will be easier to use when performing mathematical calculations. With binary fixed point the fixed constant is a power of 2,

Binary fixed-point number = I•2^n for some constant integer n.

Observation: If the range of numbers is known and small, then the numbers can be represented in a fixed-point format.

Checkpoint 6.5.1: Give an approximation of π using the decimal fixed-point (Δ = 0.001) format.

Checkpoint 6.5.2: Give an approximation of π using the binary fixed-point (Δ = 2^-8) format.

In the first example, we will develop the equations that a microcontroller would need to implement a digital voltmeter. The MSPM0 family of microcontrollers has a built-in analog to digital converter (ADC) that can be used to transform an analog signal into digital form. The 12-bit ADC analog input range is 0 to +3.3 V, and the ADC digital output varies 0 to 4095 respectively. Let Vin be the analog voltage in volts and n be the digital ADC output, then the equation that relates the analog to digital conversion is

Vin = 3.3*n/4095 = 0.00080586*n

Resolution is defined as the smallest change in voltage that the ADC can detect. This ADC has a resolution of about 0.8 mV. In other words, the analog voltage must increase or decrease by 0.8 mV for the digital output of the ADC to change by at least one bit. It would be inappropriate to save the voltage as an integer, because the only integers in this range are 0, 1, 2, and 3. Even though the compiler supports floating point, the voltage data will be saved in fixed-point format, because it will take less memory and execute much faster. Decimal fixed point is chosen because the voltage data for this voltmeter will be displayed. A fixed-point resolution of Δ=0.001 V is chosen because it is approximately equal to the ADC resolution. Table 6.5.1 shows the performance of the system. The table shows us that we need to store the variable part of the fixed-point number in at least 16 bits.

Vin (V)         n                     I (0.001 V)
Analog input    ADC digital output    variable part of the fixed-point data
0.000           0                     0
0.001           1                     1
1.000           1241                  1000
1.650           2048                  1650
3.300           4095                  3300

Table 6.5.1. Performance data of a microcomputer-based voltmeter.

One possible software formula to convert n into I is as follows.

I = (3300*n + 2048)/4095, where I is defined as Vin = I*0.001 V

It is very important to carefully consider the order of operations when performing multiple integer calculations. There are two mistakes that can happen. The first error is overflow, and it is easy to detect. Overflow occurs when the result of a calculation exceeds the range of the number system. The solution to the overflow was presented in the last section. The other error is called dropout. Dropout occurs after a right shift or a divide, and the consequence is that an intermediate result loses its ability to represent all of the values. To avoid dropout, it is very important to divide last when performing multiple integer calculations. If you divided first, e.g., I=3300*(n/4095), then the values of I would be only 0 or 3300. The addition of "2048" has the effect of rounding to the closest integer. The value 2048 is selected because it is about one half of the denominator. For example, the calculation (3300*n)/4095=0 for n=1, whereas the "(3300*n+2048)/4095" calculation yields the better answer of 1. A display algorithm for this decimal fixed-point format is shown in the next section.

When adding or subtracting two fixed-point numbers with the same Δ, we simply add or subtract their integer parts. First, let x, y, and z be three fixed-point numbers with the same Δ. Let x=I•Δ, y=J•Δ, and z=K•Δ. To perform z = x+y, we simply calculate K = I+J. Similarly, to subtract z = x-y, we simply calculate K = I-J. When adding or subtracting fixed-point numbers with different fixed parts, we must first convert the two inputs to the format of the result before adding or subtracting. This is where binary fixed point is more convenient, because the conversion process involves shifting rather than multiplication/division.
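The voltmeter formula I = (3300*n + 2048)/4095 can be sketched in C as shown here. This is only a minimal sketch: the function name ADC_ToMillivolts is hypothetical (it is not from the starter code), and it assumes n is a 12-bit sample in the range 0 to 4095, so the product 3300*n fits easily in 32 bits.

```c
#include <stdint.h>

// Hypothetical helper (the name is an assumption for illustration).
// Input:  n, 12-bit ADC sample, 0 to 4095
// Output: I, voltage in units of 0.001 V, 0 to 3300
uint32_t ADC_ToMillivolts(uint32_t n){
  // multiply first, add about half the divisor to round, divide last
  return (3300*n + 2048)/4095;
}
```

Calling ADC_ToMillivolts with the n values of Table 6.5.1 gives the I values in the same rows. Because the divide is the last operation, no intermediate result suffers dropout.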

In this next example, let x, y, and z be three binary fixed-point numbers with different resolutions. In particular, we define x to be I•2^-5, y to be J•2^-2, and z to be K•2^-3. To convert x to the format of z, we divide I by 4 (right shift twice). To convert y to the format of z, we multiply J by 2 (left shift once). To perform z = x+y, we calculate

K = (I>>2)+(J<<1)

For the general case, we define x to be I•2^n, y to be J•2^m, and z to be K•2^p. To perform any general operation, we derive the fixed-point calculation by starting with the desired result. For addition, we have z = x+y. Next, we substitute the definitions of each fixed-point parameter

K•2^p = I•2^n + J•2^m

Lastly, we solve for the integer part of the result

K = I•2^(n-p) + J•2^(m-p)

For multiplication, we have z = x•y. Again, we substitute the definitions of each fixed-point parameter

K•2^p = I•2^n • J•2^m

Lastly, we solve for the integer part of the result

K = I•J•2^(n+m-p)

For division, we have z = x/y. Again, we substitute the definitions of each fixed-point parameter

K•2^p = (I•2^n)/(J•2^m)

Lastly, we solve for the integer part of the result

K = (I/J)•2^(n-m-p)

Again, it is very important to carefully consider the order of operations when performing multiple integer calculations. We must worry about overflow and dropout. In particular, in the division example, if (n-m-p) is positive then the left shift (I•2^(n-m-p)) should be performed before the divide (/J). We can use these fixed-point algorithms to perform complex operations using the integer functions on our microcontroller.

Maintenance Tip: When evaluating potential errors in fixed-point calculations, it is best to observe the assembly code produced by the compiler and identify possible errors in all intermediate results.

Checkpoint 6.5.3: When do we use decimal fixed point rather than binary fixed point?

Checkpoint 6.5.4: Write C code to implement y = 0.75*x, assuming x and y are integers.

Checkpoint 6.5.5: Rewrite the equation F = 1.8•C+32 using shift rather than divide.
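As an illustration of the addition and multiplication equations above, here is a minimal sketch using the formats of the earlier example (x = I•2^-5, y = J•2^-2, z = K•2^-3). The function names FixedAdd and FixedMul are hypothetical, and the sketch assumes unsigned values small enough that the product I*J does not overflow 32 bits.

```c
#include <stdint.h>

// Hypothetical sketch: z = x + y, with x = I*2^-5, y = J*2^-2, z = K*2^-3.
// I and J are the unsigned integer parts; the return value is K.
uint32_t FixedAdd(uint32_t I, uint32_t J){
  return (I>>2) + (J<<1);  // convert both inputs to resolution 2^-3, then add
}

// Hypothetical sketch: z = x*y in the same formats.
// K = I*J*2^(n+m-p) with n=-5, m=-2, p=-3, so the exponent is -4.
uint32_t FixedMul(uint32_t I, uint32_t J){
  return (I*J)>>4;         // multiply first, shift right last to limit dropout
}
```

For example, x = 1.0 is stored as I = 32 and y = 0.5 is stored as J = 2. FixedAdd returns 12, which represents 12•2^-3 = 1.5, and FixedMul returns 4, which represents 4•2^-3 = 0.5.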


Example 6.5.1. Rewrite the following digital filter using fixed-point calculations.

y = x - 0.0532672•x1 + x2 + 0.0506038•y1 - 0.9025•y2

Solution: In this case, the variables y, y1, y2, x, x1, and x2 are all signed integers, but the constants will be expressed in binary fixed-point format. The value -0.0532672 can be approximated by -14•2^-8. The value 0.0506038 can be approximated by 13•2^-8. Lastly, the value -0.9025 can be approximated by -231•2^-8. The fixed-point implementation of this digital filter is

y = x + x2 + (-14•x1 + 13•y1 - 231•y2)>>8

Common Error: Lazy or incompetent programmers use floating point in many situations where fixed-point would be just as accurate and much faster.

Checkpoint 6.5.6: Assume resistors R1, R2, R3 are the integer parts of 16-bit unsigned binary fixed-point numbers with a fixed constant of 2^-4. Write an equation to calculate R3 = R1||R2 (parallel combination).
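The fixed-point filter of Example 6.5.1 might be coded as follows. This is a sketch under stated assumptions, not the book's implementation: the function name Filter and its parameter order are hypothetical, and it assumes the compiler implements >> on a negative signed integer as an arithmetic shift (this is implementation-defined in C, although common compilers for the Cortex-M behave this way).

```c
#include <stdint.h>

// Hypothetical sketch of the fixed-point filter of Example 6.5.1.
// x, x1, x2 are the current and two previous inputs.
// y1, y2 are the two previous outputs.
// Assumes >> on a negative int32_t is an arithmetic (sign-extending) shift.
int32_t Filter(int32_t x, int32_t x1, int32_t x2, int32_t y1, int32_t y2){
  // sum the three scaled terms first, then shift once at the end
  return x + x2 + ((-14*x1 + 13*y1 - 231*y2)>>8);
}
```

Summing the three products before the single right shift loses less precision than shifting each term separately.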

The purpose of this example is to study overflow and dropout errors during integer calculations. The objective of the software is to calculate the circumference of a circle given its radius.

c = 2πr

Assume r is an unsigned 32-bit fixed-point number with a resolution of 0.001 cm. c is also fixed-point with the same resolution. I.e., c = C*0.001 cm and r = R*0.001 cm, where C and R are unsigned 32-bit variable integers. Given 32-bit variables, the values of c can range from 0.000 to 4,294,967.295 cm. If we divide this by 2π, this calculation should work for values of r ranging from 0 to 683,565.275 cm. We substitute the definitions of c and r into the equation to get an exact relationship between input R and output C,

C = 2*π*R

We need to convert this equation to a function with integer operations. One simple possibility is

C = 6283*R/1000

The difficulty with this equation is that the multiply by 6283 can overflow. The largest value r can be without overflow is 2^32/6283*0.001 cm = 683 cm, which is 1000 times smaller than the range predicted by the c = 2πr equation. There are two approaches to reducing the effect of overflow. The first approach would be to promote to 64 bits, perform the operation with 64-bit math, and then demote back to 32 bits. The second approach is to find a better approximation for 2π. If we search the space of all integers (I1, I2) less than 255, such that I1/I2 is as close to 2π as possible, we find this possibility

C = 245*R/39

Notice that 2π - 245/39 ≈ 0.0011, an error of about 0.02%, which means this calculation is nearly as accurate as the 6283/1000 approximation. However, the multiply by 245 is less likely to cause an overflow error as compared to the multiply by 6283. When dividing by an unsigned number we can implement rounding by adding half of the divisor to the dividend.
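The two approaches to avoiding overflow can be sketched as follows. The function names Circumference64 and Circumference32 are hypothetical, and both sketches add about half of the divisor before dividing so that the result is rounded. R and C are in units of 0.001 cm.

```c
#include <stdint.h>

// Approach 1 (hypothetical sketch): promote to 64 bits, calculate,
// then demote back to 32 bits. Works for R up to 683,565,275.
uint32_t Circumference64(uint32_t R){
  return (uint32_t)((6283ULL*R + 500)/1000);  // +500 rounds to nearest
}

// Approach 2 (hypothetical sketch): stay in 32 bits using 2*pi ~ 245/39.
// No overflow as long as 245*R + 19 is less than 2^32,
// which holds for R up to about 17.5 million (17,500 cm).
uint32_t Circumference32(uint32_t R){
  return (245*R + 19)/39;                     // +19 is about half of 39
}
```

With R = 1000 (r = 1.000 cm), the first function returns 6283 and the second returns 6282, which shows the small accuracy cost of the 245/39 approximation in exchange for a much larger range without 64-bit math.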

6.6. Conversions

In this section, we will develop methods to convert between ASCII strings and numbers. Let's begin with a simple example. Let Buf be a fixed length string of three ASCII characters. Each entry of Buf is an ASCII character '0' to '9'. Let Buf[0] be the ASCII code for the hundred's digit, Buf[1] be the ten's digit, and Buf[2] be the one's digit. n will be calculated as the numerical value of the three ASCII digits. The decimal digits '0' to '9' are encoded in ASCII as 0x30 to 0x39. So, to convert a single ASCII digit to numerical value, we simply subtract 0x30. To convert this string of three ASCII characters into a numerical value we calculate

n = 100*(Buf[0]-0x30) + 10*(Buf[1]-0x30) + (Buf[2]-0x30);

The numerical value of this 3-element ASCII string could also be calculated as

n = (Buf[2]-0x30) + 10*((Buf[1]-0x30) + 10*(Buf[0]-0x30));

Assume M is a constant containing the number of ASCII characters in the array Buf. We could put the conversion algorithm into a loop

n = 0;
for(int i=0; i<M; i++){
  n = 10*n + (Buf[i]-0x30);
}

If the string is null-terminated rather than fixed length, the same loop runs until it reaches the terminating zero, as shown in Program 6.6.1.

// Input: pointer to null-terminated string
// Output: number value
uint32_t Str2UDec(char *pt){
  uint32_t n = 0; // number
  while(*pt != 0){
    n = 10*n + ((*pt)-0x30);
    pt++;
  }
  return n;
}

Program 6.6.1. Unsigned ASCII string to decimal conversion.

Checkpoint 6.6.1: Look at all the 10's on this page! What is the significance of these 10's?

The example, shown in Program 6.6.2, uses an I/O device capable of sending and receiving ASCII characters. When using the LaunchPad, we can send serial data to/from the PC using the UART. Details of the UART are presented in Chapter 9. The function UART_InChar() receives an ASCII character from the PC. The function UART_OutChar() sends an ASCII character to the PC. Using the algorithm in Program 6.6.1, the function UART_InUDec() will accept characters from the device until a carriage return (the Enter key) is typed and return the equivalent numerical value. Sending the received characters back through UART_OutChar() is called echoing, and is included so you can see characters as you are typing.

#define CR 0x0D
// Accept ASCII input in unsigned decimal format, up to 4294967295
// If n>4294967295, it will truncate without reporting the error
uint32_t UART_InUDec(void){
  uint32_t n=0; char ch;
  while((ch = UART_InChar()) != CR){ // accepts until <enter> is typed
    if((ch >= '0') && (ch <= '9')){
      n = 10*n + (ch-'0');
      UART_OutChar(ch);              // echo
    }
  }
  return n;
}

// iterative version
void OutUDec(uint32_t n){
  uint32_t cnt = 0;
  char buffer[11];
  do{
    buffer[cnt] = n%10; // digit, 0-9
    n = n/10;
    cnt++;
  }
  while(n); // repeat until n==0
  for(; cnt; cnt--){
    OutChar(buffer[cnt-1]+'0');
  }
}

// recursive version
void OutUDec(uint32_t n){
  if(n >= 10){
    OutUDec(n/10); // ms digits
    n = n%10;      // n is 0-9
  }
  OutChar(n+'0');
}

Program 6.7.2. Iterative and recursive implementations of output decimal.

Observation: In general, recursive algorithms are shorter to write, but require additional stack space.

The program OutUHex is a recursive function that outputs a variable number of characters to display the number in hexadecimal.

// Output a hexadecimal number
void OutUHex(uint32_t number){
  if(number >= 0x10){
    OutUHex(number/0x10);       // all but last digit
    OutUHex(number%0x10);       // last hex digit
  }
  else{                         // base case 0 to 15
    if(number < 0xA){
      OutChar(number + '0');    // 0 to 9
    }
    else{
      OutChar(number - 10 + 'A'); // A to F
    }
  }
}

Program 6.7.3. Print 32-bit hexadecimal number to an output device.

6.8. Serial Peripheral Interface, SPI

Microcontrollers employ multiple approaches to communicate synchronously with peripheral devices and other microcontrollers. The serial peripheral interface (SPI) system can operate as a controller (master) or as a peripheral (slave). The channel can have one controller and one peripheral, shown on the left side of Figure 6.8.1. Alternatively, the one centralized controller can be connected to multiple peripherals, shown on the right side of Figure 6.8.1. The controller initiates all data communication.

[Figure: two panels, "SPI Single Peripheral" and "SPI Dual Peripherals". In each, the MSPM0G3507 controller pins PICO, POCI, and SCK connect to the peripheral's data in, data out, and clock; CS0 (and CS1 for the second peripheral) connects to each peripheral's chip select.]

Figure 6.8.1. Serial peripheral interface port pins on the MSPM0G3507 microcontroller.

The MSPM0G3507 has two SPI modules. The fundamental difference between a UART, which implements an asynchronous protocol, and a SPI, which implements a synchronous protocol, is the manner in which the clock is implemented. Two devices communicating with asynchronous serial interfaces (UART) operate at the same frequency (baud rate) but have separate hardware to create their clocks. With a UART protocol, the clock signal is not included in the interface cable between devices. Two UART devices can communicate with each other as long as the two clocks have frequencies within ±5% of each other. Two devices communicating with synchronous serial interfaces (SPI) operate from the same hardware clock (synchronized). With a SPI protocol, the clock signal is included in the interface cable. The controller creates the clock, and the peripheral uses the clock to latch the data (in and out). Another name for controller is master, and another name for peripheral is slave.

The SPI protocol includes up to six I/O lines. The PICO (peripheral in controller out) is a data line driven by the controller and received by the peripheral. Another name for PICO is MOSI (master out slave in). The POCI (peripheral out controller in) is a data line driven by the peripheral and received by the controller. Another name for POCI is MISO (master in slave out). The SCK is a 50% duty cycle clock generated by the controller. In order to work properly, the transmitting device uses one edge of the clock to change its output, and the receiving device uses the other edge to accept the data. The chip select signals CS0, CS1, CS2 and CS3 are negative logic control signals from controller to peripherals signifying the channel is active. Since there are four select lines, we can have one to four peripherals on each interface. Table 6.8.1 lists the available pins for each SPI function.

Function     Pin        Pin        Pin        Pin
SPI0 PICO    PA9 (3)    PA14 (3)   PB17 (3)
SPI0 POCI    PA10 (3)   PA13 (3)   PB19 (3)
SPI0 SCK     PA11 (3)   PA12 (3)   PB18 (3)
SPI0 CS0     PA2 (3)    PA8 (3)    PB25 (3)
SPI0 CS1     PB6 (4)    PB24 (3)   PB26 (3)
SPI0 CS2     PA24 (3)   PB7 (4)    PB20 (2)
SPI0 CS3     PA23 (3)   PA14 (4)   PB24 (2)
SPI1 PICO    PA18 (3)   PB8 (3)    PB15 (3)   PB22 (2)
SPI1 POCI    PA16 (3)   PB7 (3)    PB14 (3)   PB21 (2)
SPI1 SCK     PA17 (3)   PB9 (3)    PB18 (3)   PB23 (2)
SPI1 CS0     PA26 (3)   PB6 (3)    PB20 (3)
SPI1 CS1     PA27 (3)   PB17 (4)   PB27 (3)
SPI1 CS2     PA15 (3)   PB0 (3)    PB18 (4)
SPI1 CS3     PA25 (3)   PB1 (3)    PB14 (2)

Table 6.8.1. SPI pin options (digital Mode), see Table 3.2.1.

The shift register in Figure 6.8.2 is shown with 8 bits, but it can be configured from 4 to 16 bits. The shift register in the controller and the shift register in the peripheral are linked to form a distributed register. Typically, the microcontroller and the I/O device (peripheral) are so physically close we do not use interface logic. Figure 6.8.2 illustrates SPI1 connected to a single peripheral using pins PB6 - PB9. The SPI on the MSPM0 employs two hardware FIFOs, one for transmission and another for receiving. Both FIFOs are 4 elements deep and 16 bits wide. When performing only transmission, the software first waits until the transmit FIFO not full (TNF) flag is 1, meaning there is room to save more data. Next, the software puts data into the transmit FIFO by writing to the SPI1->TXDATA register. If there are data in the transmit FIFO, the SPI module will transmit them.

[Figure: the MSPM0 SPI controller writes data through a 4-element 16-bit transmit FIFO (flag TNF, transmit FIFO not full) and reads data through a 4-element 16-bit receive FIFO (flag RFE, receive FIFO empty). Pins PB6 (SPI1 CS0), PB9 (SPI1 SCK), PB8 (SPI1 PICO), and PB7 (SPI1 POCI) connect to the shift register of the I/O device peripheral; the grounds are connected.]

Figure 6.8.2. A synchronous serial interface between a microcontroller and an I/O device.

Table 6.8.2 lists some of the SPI1 registers on the MSPM0G3507. The details of the initialization can be found in the starter code. In this section, we will focus on the data transfer using busy-wait synchronization.

Address       31-16    15-0     Name
0x4046B140             Data     SPI1->TXDATA
0x4046B130             Data     SPI1->RXDATA

Address       7     6     5     4     3     2     1     0     Name
0x4046B110                      BSY   RNF   RFE   TNF   TFE   SPI1->STAT

Table 6.8.2. Some MSPM0G3507 SPI1 registers. Each register is 32 bits wide.

We must do output and input even if we only want to input. The SPI transmits and receives bits at the same time. Basically, the two shift registers, shown in Figure 6.8.2, are exchanged between the controller and the peripheral. Data in the controller shift register are transmitted to the peripheral, and at the same time, data in the peripheral shift register are transmitted to the controller. To perform output and input, the software first waits until the transmit not full (TNF) flag is 1. Next, the software writes output data to the SPI1->TXDATA register. Data in the transmit FIFO will be transmitted. While it is transmitting it is also receiving. The software then waits until the receive FIFO empty (RFE) flag is 0, meaning there is data in the receive FIFO. The software gets data from the receive FIFO by reading from the SPI1->RXDATA register.

The timing of a transmit-only interface between SPI1 and an ST7735R LCD is shown in Figure 6.8.3. The key to proper transmission is to select one edge of the clock (shown as falling edges and marked with "T" in Figure 6.8.3) for the controller to change the output, and then use the other edge (shown as rising edges and marked with "R") to latch the data into the peripheral. This way, data is latched during the time when it is stable. In order for the communication to occur without error, the data available interval from the device must overlap (start before and end after) the data required interval by the other device that is receiving the data. It is this overlap that will determine the maximum frequency at which synchronous serial communication can occur. Data is available on the PICO pin a short time after the falling edge of the clock until a short time after the next falling edge. The data is required to be valid from the setup time before the rising edge until the hold time after that same edge. From Figure 6.8.3, we can see the interval the data will be available (DA) overlaps the interval the data is required (DR).

Checkpoint 6.8.1: What are the definitions of setup time and hold time?
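The output-and-input sequence described above (wait for TNF, write TXDATA, wait for RFE to clear, read RXDATA) can be sketched as follows. This is an illustration only: the structure SPI_Regs_t is a hypothetical stand-in for the real SPI1 register block so that the sketch can be compiled on a PC, and the function name SPI_Transfer is also an assumption. The flag positions (TNF in bit 1, RFE in bit 2) are taken from Table 6.8.2.

```c
#include <stdint.h>

// Hypothetical stand-in for the SPI1 register block (an assumption,
// not the layout of the real MSPM0 header file).
typedef struct {
  volatile uint32_t STAT;    // bit 1 = TNF, bit 2 = RFE
  volatile uint32_t TXDATA;  // write here to put data in the transmit FIFO
  volatile uint32_t RXDATA;  // read here to get data from the receive FIFO
} SPI_Regs_t;

#define SPI_TNF 0x02  // transmit FIFO not full
#define SPI_RFE 0x04  // receive FIFO empty

// Send one frame and return the frame that was received at the same time.
uint32_t SPI_Transfer(SPI_Regs_t *spi, uint32_t out){
  while((spi->STAT & SPI_TNF) == 0){}  // wait for room in the transmit FIFO
  spi->TXDATA = out;                   // data will be shifted out
  while((spi->STAT & SPI_RFE) != 0){}  // wait for data in the receive FIFO
  return spi->RXDATA;                  // data shifted in from the peripheral
}
```

Both waits are busy-wait loops, so this approach is reasonable only when a transfer completes quickly compared to the other work the software must do.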

[Figure: timing diagram for SPI1 with the clock edges marked alternately "T" (controller changes the output) and "R" (peripheral latches the data).]

Figure 6.8.3. Synchronous serial timing showing the data available interval overlaps the data required interval.

Observation: Because the clocks are shared, if you change the SPI clock frequency, the transfer rate will change in both controller and peripheral.

The SPI timing (with 8-bit data, SPO=0, SPH=0) between SPI1 and an ST7735R LCD is shown in Figure 6.8.4. The CS0 signal goes low during the transmission, 8 bits are sent, and then the CS0 signal goes high. Notice the 8 data bits are sent most significant bit first.

[Figure: timing diagram of SPI1 CS0 and SPI1 PICO (ST7735 MOSI).]

Figure 6.8.4. Synchronous serial Freescale single transfer mode (8-bit data, SPO=0, SPH=0).

6.9. ST7735R Graphics LCD Interface

In this section, we will interface a ST7735R LCD to SPI1 using busy-wait synchronization. The interface to the Adafruit ST7735R is shown in Figure 6.9.1. See the ST7735.c starter file for interface connections for other versions of the LCD. Driving RESET low for 500ms will initialize the LCD. We will use a GPIO output on PA13 to specify whether the SPI transmission is data or command. Many of the ST7735R displays come with a secure digital card (SDC). In this book, we just use the display, so the SDC pins are not connected. D/C stands for data/command; you will make D/C high to send data and make D/C low to send a command.

[Figure: connections between the MSPM0 (PB15 and PA13 as GPIO; SPI1 CS0 on PB6, POCI on PB7, PICO on PB8, SCK on PB9; 3.3V and ground) and the ST7735R pins (Gnd, VCC, RESET, D/C, CARD_CS, TFT_CS, MOSI, SCK, MISO, LITE).]

Figure 6.9.1. Adafruit ST7735R display with 160 by 128 16-bit color pixels.

With a 4 MHz SPI clock, it takes about 2 µs to send one byte. At this speed it is appropriate to use "busy-wait" synchronization. The interface will send one command and wait for it to finish. However, when sending data it will stream it through the SPI FIFO. The wait at the beginning of OutCommand allows previous data outputs to complete. See Figure 6.9.2 and Program 6.9.1.

Figure 6.9.2. Busy-wait synchronization is used to send commands and data to the display.

void SPI_OutData(char data){
  while((SPI1->STAT&0x02) == 0x00){}; // spin if transmit FIFO full
  GPIOA->DOUTSET31_0 = 1<<13;         // D/C = PA13 = 1 for data
  SPI1->TXDATA = data;
}
void SPI_OutCommand(char command){
  while((SPI1->STAT&0x10) == 0x10){}; // spin if SPI busy
  GPIOA->DOUTCLR31_0 = 1<<13;         // D/C = PA13 = 0 for command
  SPI1->TXDATA = command;
  while((SPI1->STAT&0x10) == 0x10){}; // spin if SPI busy
}

Program 6.9.1. Low-level functions to output data/commands to the LCD display (SPI).

7. Data Acquisition Systems

7.1. Interthread Communication and Synchronization

For regular function calls we use the registers and stack to pass parameters, but interrupt threads have logically separate registers and stack. In particular, registers are automatically saved by the processor as it switches from main program (foreground thread) to interrupt service routine (background thread). Exiting an ISR will restore the registers back to their previous values. Thus, all parameter passing must occur through global memory. One cannot pass data from the main program to the interrupt service routine using registers or the stack. In this book, multi-threading means one main program (foreground thread) and multiple ISRs (background threads). An operating system allows multiple foreground threads (see Volume 3).

Synchronizing threads is a critical task affecting efficiency and effectiveness of systems using interrupts. In this section, we will present in general form three constructs to synchronize threads: binary semaphore, mailbox, and FIFO queue. A binary semaphore is simply a shared flag, as described in Figure 7.1.1. There are two operations one can perform on a semaphore. Signal is the action that sets the flag. Wait is the action that checks the flag, and if the flag is set, the flag is cleared and important stuff is performed. This flag must exist as a private global variable with restricted access to only the Wait and Signal functions. In C, we add the qualifier static to an otherwise global variable to restrict access to software within the same file. In order to reduce complexity of the system, it will be important to limit the access to this flag to as few modules as possible. Flag1 is used to synchronize from an ISR to main, and Flag2 synchronizes from main to an ISR.

[Figure: two flowcharts. Left: the ISR sets Flag1; the main program performs other calculations, and when Flag1 is set it executes Flag1 = 0 and does important stuff. Right: the main program sets Flag2; the ISR checks Flag2, and if it is set the ISR executes Flag2 = 0 and does important stuff.]

Figure 7.1.1. A semaphore can be used to synchronize threads.

A binary semaphore has two states: 0 and 1. However, it is good design to assign a meaning to this flag. For example, 0 might mean the switch has not been pressed, and 1 might mean the switch has been pressed. Figure 7.1.1 shows two examples of the binary semaphore. The big arrows in this figure signify the synchronization link between the threads. In the example on the left, the ISR signals the semaphore and the main program waits on the semaphore. Notice the "important stuff" is run in the foreground once per execution of the ISR. In the example on the right, the main program signals the semaphore and the ISR waits. It is good design to have NO backwards jumps in an ISR. In this particular application, if the ISR is running and the semaphore is 0, the action is just skipped and the computer returns from the interrupt.

The second interthread synchronization scheme is the mailbox. The mailbox is a binary semaphore with an associated data variable. Figure 7.1.2 illustrates an input device interfaced using interrupt synchronization. The big arrow in this figure signifies the communication and synchronization link between the background and foreground. The mailbox structure is implemented with two shared global variables. Mail contains data, and Status is a semaphore flag specifying whether the mailbox is full or empty. The interrupt is requested when its trigger flag is set, signifying new data are ready from the input device. The ISR will read the data from the input device and store it in the shared global variable Mail, then update its status to full. The main program will perform other calculations, while occasionally checking the status of the mailbox. When the mailbox has data, the main program will process it. This approach is adequate for situations where the input bandwidth is slow compared to the software processing speed.

[Figure: flowcharts of the main program (other calculations; when Status is Full, process Mail and set Status = Empty) and the ISR (read data from input, Mail = data, Status = Full), with the steps labeled a, b, c, and d.]

Figure 7.1.2. A mailbox can be used to pass data between threads.

One way to visualize the interrupt synchronization is to draw a state versus time plot of the activities of the hardware, the mailbox, and the two software threads (Figure 7.1.3).

[Figure: state versus time plot with rows for the input device, the interrupt service routine, the main program, and Status. Each time the trigger is set the ISR runs (b) and returns from interrupt; Status goes from empty (a) to full (c), and back to empty after the main program processes the data (d).]

Figure 7.1.3. Hardware/software timing of an input interface using a mailbox.

Figure 7.1.3 shows that at time (a) the mailbox is empty, the input device is busy, and the main program is performing other tasks. When new input data are ready, the trigger flag will be set, and an interrupt will be requested. At time (b) the ISR reads data from the input device and saves it in Mail, and then it sets Status to full. At time (c) the main program recognizes Status is full. At time (d) the main program processes data from Mail and sets Status to empty. Notice that even though there are two threads, only one is active at a time. The interrupt hardware switches the processor from the main program to the ISR, and the return from interrupt switches the processor back.

The third synchronization technique is the FIFO queue. Details of the FIFO will be presented in Chapter 8.

There are other types of interrupt that are not an input or output. For example, we will configure the computer to request an interrupt on a periodic basis. This means an interrupt handler will be executed at fixed time intervals. This periodic interrupt will be essential for the implementation of real-time data acquisition and real-time control systems. For example, if we are implementing a digital controller that executes a control algorithm 100 times a second, then we will set up the internal timer hardware to request an interrupt every 10 ms. The interrupt service routine will execute the digital control algorithm and then return to the main thread. In a similar fashion, we will use periodic interrupts in this chapter to perform analog input and/or analog output. For example, if we wish to sample the ADC 100 times a second, then we will set up the internal timer hardware to request an interrupt every 10 ms. The interrupt service routine will sample the ADC, process (or save) the data, and then return to the main thread.

Performance Tip: It is poor design to employ backward jumps in an ISR, because they may affect the latency of other interrupt requests. Whenever you are thinking about using a backward jump, consider redesigning the system with more or different triggers to reduce the number of backward jumps.
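The mailbox described earlier in this section can be sketched in C as follows. This is a minimal sketch under stated assumptions: the names Mailbox_Send and Mailbox_Recv are hypothetical, the shared variables are declared volatile because both the ISR and the main program access them, and the code that actually reads the input device is hardware specific and is omitted.

```c
#include <stdint.h>

#define EMPTY 0
#define FULL  1
// static restricts access to this file
static volatile uint32_t Mail;            // shared data
static volatile uint32_t Status = EMPTY;  // semaphore flag, EMPTY or FULL

// Hypothetical function called by the ISR with the value it read
// from the input device, time (b) in Figure 7.1.3.
void Mailbox_Send(uint32_t data){
  Mail = data;     // store the data first
  Status = FULL;   // then signal that the mailbox is full
}

// Hypothetical function called by the main program.
// Returns 0 immediately if the mailbox is empty, time (a).
// Returns 1 and copies the data if the mailbox is full, times (c) and (d).
int Mailbox_Recv(uint32_t *pt){
  if(Status == EMPTY){
    return 0;      // nothing yet, go do other calculations
  }
  *pt = Mail;      // get the data
  Status = EMPTY;  // mailbox can be used again
  return 1;
}
```

The ISR side has no backward jumps: it stores the data, sets the flag, and returns. If the ISR runs again before the main program has emptied the mailbox, the older data is overwritten, which is why this scheme suits inputs that are slow compared to the software processing speed.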

7.2. Analog to Digital Conversion

An analog to digital converter (ADC) converts an analog signal into digital form, see Figure 7.2.1. An embedded system uses the ADC to collect information about the external world (data acquisition system). The input signal is an analog voltage, and the output is a binary number. The ADC precision is the number of distinguishable ADC inputs (e.g., 4096 alternatives, 12 bits). The ADC range is the maximum and minimum ADC input (e.g., 0 to +3.3V). The ADC resolution is the smallest distinguishable change in input (e.g., 0.8 mV). The resolution is the change in input that causes the digital output to change by 1.

Range(volts) = Precision(alternatives) • Resolution(volts)

Normally we don't specify accuracy for just the ADC, but rather we give the accuracy of the entire system (including transducer, analog circuit, ADC and software). An ADC is monotonic if it has no missing codes. This means if the analog signal is a slow rising voltage, then the digital output will hit all values one at a time. The merit of an ADC involves three factors: precision (number of bits), speed (how fast can we sample), and power (how much energy does it take to operate). How fast we can sample involves both the ADC conversion time (how long it takes to convert), and the bandwidth (what frequency components can be recognized by the ADC). The ADC cost is a function of the number and quality of internal components.

[Figure: plot titled "Analog to Digital Converter" showing digital output (0 to 4096) versus analog input (0.000 to 3.300 volts).]

Figure 7.2.1. A 12-bit ADC with a range of 0 to 3.3V.

Checkpoint 7.2.1. In Figure 7.2.1 , what will be the digital output if the analog input is 0.825V?

The MSPM0G3507 has two converters: ADC0 and ADC1. Table 7.2.1 shows the register bits required to perform sampling on a single channel using ADC0. For more complex configurations refer to the specific data sheet. Table 7.2.2 shows the available analog channels for the ADCs.

Address      Name        Bit fields
0x40000808   CLKCFG      31-24 Key=0xA9; 5 CCONSTOP; 4 CCONRUN; 1-0 SAMCLK
0x40001100   CTL0        26-24 SCLKDIV; 16 PWRDN; 0 ENC
0x40001104   CTL1        30-28 AVGD; 26-24 AVGN; 20 SMPMODE; 17-16 CONSEQ; 8 SC; 0 TRIGSRC
0x40001108   CTL2        28-24 ENDADD; 20-16 STRTADD; 15-11 SMPCNT; 10 FIFOEN; 8 DMAEN; 2-1 RES; 0 DF
0x40001110   CLKFREQ     2-0 FRANGE
0x40001114   SCOMP0      9-0 VAL
0x40001180   MEMCTL[0]   28 WNDCMP; 24 TRIG; 20 BCSEN; 16 AVGEN; 12 STIME; 9-8 VRSEL; 4-0 CHAN
0x40001280   MEMRES[0]   15-0 DATA
0x40001340   STATUS      2 ASCACT; 1 REFBUFRDY; 0 BUSY

Table 7.2.1. The MSPM0G3507 ADC0 registers. For addresses of the ADC1, add 0x2000 to the above.

Channel   ADC0 pin   ADC1 pin
0         PA27       PA15
1         PA26       PA16
2         PA25       PA17
3         PA24       PA18
4         PB25       PB17
5         PB24       PB18
6         PB20       PB19
7         PA22       PA21

Table 7.2.2. Available pins for the MSPM0G3507 ADC.

We perform the following steps to software start the ADC and sample one channel. Program 7.2.1 shows a simple initialization of the ADC. It will sample one channel using software start and busy-wait synchronization.

Steps 1-3. We reset and activate the ADC, and then wait 24 bus cycles.

Step 4. We select the clock to operate the ADC. SAMCLK is set to 00 to select the ULPCLK. The ULPCLK is the fastest clock available. CCONRUN is set to 0 so the ADC clock stops after each conversion, saving power. CCONSTOP is set to 0 so the ADC does not run continuously, rather we will explicitly start the ADC when we want a new sample.

Step 5. We tell the ADC how fast the ADC clock is. In particular, the FRANGE field is set to match the frequency of the SAMCLK selection. We set FRANGE to 7 because the bus clock (ULPCLK) is greater than 32 MHz.

Step 6. We determine the tradeoff between speed and accuracy. In general, the slower we clock the ADC the less noise we will have. Conversely, the faster we clock the ADC, the faster we can sample. We set the SCLKDIV field to 3 to select a divide by 8, making the ADC clock 40 MHz/8 = 5 MHz, taking about 2 µs to perform one sample. We set PWRDN to 1 so the ADC is continuously powered. We could clear PWRDN to save power, but each sample would take longer. During initialization we clear the ENC bit to disable the ADC. However, we will set ENC to trigger an ADC sample.

Step 7. The three fields AVGD, AVGN, and AVGEN activate a process where the ADC is sampled multiple times and the results are averaged. In this example, these three fields are cleared so an ADC trigger causes one sample to be taken. We set SMPMODE to 0 so the sample timer samples the ADC. We clear CONSEQ to take one sample on each trigger. During initialization we clear SC. However, we will set SC to start the ADC. We clear TRIGSRC to select software as the method to trigger the ADC.

Step 8. We select the memory registers in which the ADC will store the result. Setting STRTADD and ENDADD to 0 specifies the ADC will use MEMCTL[0] for control and MEMRES[0] for data. SAMPCNT, FIFOEN, and DMAEN are cleared because we are not using DMA. RES is cleared to select 12-bit mode. Running in 8-bit or 10-bit mode would improve speed at the expense of resolution. DF is cleared so the digital output is unsigned binary, i.e., 0 to 4095.

Step 9. We set CHANSEL to the channel number, see Table 7.2.2, to select from which pin to sample. We clear WNDCMP to disable the window comparator. The TRIG bit does not matter when we sample one channel. When we configure to sample multiple channels, we will set TRIG. We clear BCSEN to disable the burnout current source. We clear AVGEN to disable hardware averaging. Clearing STIME selects SCOMP0, see Step 10. Clearing VRSEL selects the 3.3V internal power line as the reference for the ADC. This means the ADC input range will be 0 to 3.3 V. The starter project includes initialization that activates and selects the internal 2.5V analog reference. When using the internal reference, the ADC input range will be 0 to 2.5 V.

Step 10. When converting, the ADC first collects the analog voltage into its circuits. This collection phase is called sampling. The second phase is converting the analog signal to digital form. The eventual digital output is a function of the analog signal occurring during the sampling phase. We set VAL=0 so the sampling phase is 8 clocks. Longer sampling times result in less noise at the expense of slower conversion time.

Step 11. We clear IMASK so the ADC itself does not request interrupts.

void ADC0_Init(uint32_t channel){
  ADC0->ULLMEM.GPRCM.RSTCTL = 0xB1000003; // 1) reset
  ADC0->ULLMEM.GPRCM.PWREN = 0x26000001;  // 2) activate
  Clock_Delay(24);                        // 3) wait
  ADC0->ULLMEM.GPRCM.CLKCFG = 0xA9000000; // 4) ULPCLK
  ADC0->ULLMEM.CLKFREQ = 7;               // 5) 40-48 MHz
  ADC0->ULLMEM.CTL0 = 0x03010000;         // 6) divide by 8
  ADC0->ULLMEM.CTL1 = 0x00000000;         // 7) mode
  ADC0->ULLMEM.CTL2 = 0x00000000;         // 8) MEMRES
  ADC0->ULLMEM.MEMCTL[0] = channel;       // 9) channel
  ADC0->ULLMEM.SCOMP0 = 0;                // 10) 8 sample clocks
  ADC0->ULLMEM.CPU_INT.IMASK = 0;         // 11) no interrupt
}

Program 7.2.1. Initialization of the ADC using software start and busy-wait (ADCSWTrigger).

Program 7.2.2 gives a function that performs an ADC conversion. Which channel to sample was set previously in step 9 of the initialization. There are five steps required to perform a conversion. The range is 0 to 3.3V. If the analog input is 0, the digital output will be 0, and if the analog input is 3.3V, the digital output will be 4095. The time to convert is about 2.4 µs.

Step 1. We set ENC to enable conversions.
Step 2. We set SC to start the ADC conversion.
Step 3. Delay at least 6 bus cycles for the ADC to start.
Step 4. The function waits for the ADC to complete by polling the BUSY bit.
Step 5. The 12-bit digital sample is read out of MEMRES[0].

uint32_t ADC0_In(void){
  ADC0->ULLMEM.CTL0 |= 0x00000001; // enable conversions
  ADC0->ULLMEM.CTL1 |= 0x00000100; // start ADC
  uint32_t volatile delay = ADC0->ULLMEM.STATUS; // time to start
  while((ADC0->ULLMEM.STATUS&0x01) == 0x01){}; // wait for completion
  return ADC0->ULLMEM.MEMRES[0];
}
Program 7.2.2. ADC sampling using software start and busy-wait (ADCSWTrigger).
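The text states that an input of 0 produces 0 and a full-scale input produces 4095. As a rough illustration of that mapping, the sketch below converts a 12-bit sample into millivolts. The function name ADC_ToMillivolts and the rounding choice are assumptions made for this example; they are not part of the starter code.

```c
#include <stdint.h>

// Hypothetical helper (not from the starter code):
// convert a 12-bit ADC sample (0 to 4095) to millivolts.
// vref_mV is the full-scale reference in millivolts,
// e.g., 3300 for the 3.3V supply or 2500 for the internal reference.
uint32_t ADC_ToMillivolts(uint32_t sample, uint32_t vref_mV){
  if(sample > 4095){
    sample = 4095;  // clamp, a 12-bit result should not exceed 4095
  }
  // add half the divisor so the integer division rounds to nearest
  return (sample*vref_mV + 2047)/4095;
}
```

With a 3.3V reference, a mid-scale sample of 2048 maps to about 1650 mV. Integer math is used here because it is cheap on a small processor; the product fits easily in 32 bits for any realistic reference voltage.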


Observation: To sample using ADC1, simply change all the instances of ADC0 to ADC1 in Programs 7.2.1 and 7.2.2.

There are many examples in the ADCSWTrigger starter code, including configuring the internal reference, sampling ADC1, and sampling two channels on the same trigger. In this chapter we will use a periodic timer interrupt to generate periodic samples. The timer ISR will call the function ADC0_In, and use a mailbox to pass data to the main program. A more accurate sampling method is timer-triggered sampling.

Checkpoint 7.2.2: Assume you have an n-bit ADC with a range of 0 to Vmax. What will be its resolution in volts?

Checkpoint 7.2.3: Assume you wish to create a measurement system with a range of 0 to 2.5V and a resolution of at least 5 mV. What is the fewest number of bits required for the ADC?
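The mailbox mentioned above can be pictured as one shared data word plus a flag. The following is a minimal sketch under that assumption; the names Mailbox_Send and Mailbox_Receive are hypothetical, and the starter code may organize its mailbox differently.

```c
#include <stdint.h>

static volatile uint32_t MailData;  // most recent sample
static volatile uint32_t MailFlag;  // 1 means MailData holds unread data

// Producer side, e.g., called by the periodic timer ISR
// with the value returned by the ADC sampling function.
void Mailbox_Send(uint32_t data){
  MailData = data;
  MailFlag = 1;     // signal that new data are available
}

// Consumer side, called by the main program.
// Returns 1 and stores the sample in *data if new data were available,
// otherwise returns 0 and leaves *data unchanged.
int Mailbox_Receive(uint32_t *data){
  if(MailFlag == 0){
    return 0;       // nothing new
  }
  *data = MailData;
  MailFlag = 0;     // mark as read
  return 1;
}
```

In this sketch the main program would poll Mailbox_Receive in its loop. A mailbox holds only one value, so if the producer sends again before the consumer reads, the older sample is overwritten; a FIFO would be needed if no samples may be lost.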

7.3. Signal to Noise Ratio

We define signal to noise ratio (SNR) as the magnitude of the signal divided by the magnitude of the noise. Averaging is a powerful tool to reduce the effects of noise. Let x_i be N experimental values as measured by the ADC with the input fixed. The population mean µ of the signal is defined as the theoretical average, whereas we define the sample mean x̄ as the calculated average of sampled data. Similarly, we define sigma σ of the signal as the theoretical standard deviation, whereas we calculate the experimental standard deviation S from actual collected data. If the input to the ADC is a constant, and you perform multiple conversions, the standard deviation of the conversions is an indication of the noise.

$$\bar{x} = \frac{1}{N}\sum_{i=0}^{N-1} x_i \qquad\qquad S = \sqrt{\frac{\sum_{i=0}^{N-1}\left(x_i - \bar{x}\right)^2}{N}}$$
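The sample mean and standard deviation defined above can be computed from a buffer of ADC readings. The sketch below is one way to do it, with hypothetical function names Stats_Mean and Stats_Variance. It returns the variance S² (the divide-by-N form) rather than S itself so that it does not depend on a math library; S is the square root of the returned value.

```c
#include <stdint.h>

// Sample mean of n ADC readings (n must be greater than 0).
double Stats_Mean(const uint32_t x[], uint32_t n){
  double sum = 0.0;
  for(uint32_t i = 0; i < n; i++){
    sum += x[i];
  }
  return sum/n;
}

// Variance S*S using the divide-by-N form: sum((x[i]-mean)^2)/N.
// The standard deviation S is the square root of this value.
double Stats_Variance(const uint32_t x[], uint32_t n){
  double mean = Stats_Mean(x, n);
  double sumsq = 0.0;
  for(uint32_t i = 0; i < n; i++){
    double d = x[i] - mean;  // deviation from the sample mean
    sumsq += d*d;
  }
  return sumsq/n;
}
```

For example, the four readings 2046, 2048, 2048, 2050 have a mean of 2048 and a variance of 2, so S is about 1.4 ADC counts. Floating point is used for clarity; on a processor without a floating-point unit this would typically be run offline on dumped data or rewritten in fixed point.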

The ADC resolution is the smallest change in input that can be reliably detected by the system. Figure 7.3.1 illustrates how ADC resolution should be measured. Because of noise, if we set the ADC input to Vin and sample it many times, we will get a distribution of digital outputs. We plot the number of times we got an output as a function of the output sample. The shape of this response is called a probability mass function (pmf) characterizing the noise processes. A pmf plots the number of occurrences versus the ADC sample value. When two pmfs overlap, the two inputs are not distinguishable. If the pmfs do not overlap, we claim the system can resolve the two inputs. For example, white noise has a Gaussian pmf. The standard deviation of repeated measurements (with units of volts) is a simple measure of ADC resolution (in volts). One way to estimate resolution is to collect multiple sets of 100 measurements with each input slightly larger than the last, Vin+ΔV. If we can demonstrate that the second data set is statistically different from the first (regardless of Vin), we claim the resolution is less than or equal to ΔV. For the 12-bit ADC on the MSPM0, Figure 7.3.1 shows us that we have to increase the input by 1 mV to always be able to recognize the change. For example, the 1.6500V data is statistically different from the 1.6510V data. Therefore, we claim the ADC has a resolution of 1 mV. The data in Figure 7.3.1 was taken with 64-point hardware averaging.

[Figure 7.3.1 plot: probability mass function (pmf), number of occurrences (0 to 50) versus ADC output (2040 to 2054), with one curve for each input: 1.6500 V, 1.6505 V, 1.6510 V, and 1.6515 V.]

Figure 7.3.1. A probability mass function showing experimental determination of ADC resolution.

Checkpoint 7.3.1: The standard deviation of the data in Figure 7.3.1 is about 1 ADC sample. Is this the expected result or extremely noisy?

Oversampling: We will sample the ADC faster than we need, and average multiple samples to get one reading. This averaging, called oversampling, will improve the signal to noise ratio.

The Central Limit Theorem (CLT) states that as independent random variables are added, their sum tends toward a Normal or Gaussian distribution.

The use of the CLT assumes the noise is random, and the noise in each sample is independent from the noise in the other samples. For an ADC-based measurement we will apply an additional assumption that the noise has zero mean. In order to improve the signal to noise ratio in our measurement, we can take multiple ADC samples and calculate the average. As we increase the number of measurements in the average, the calculated average approaches the mean (truth) and the standard deviation of the average approaches 0, see Figure 7.3.2.
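Software oversampling as described above amounts to summing several readings and dividing. The sketch below assumes a hypothetical ADC_Average function that takes a pointer to any sampling function (on the microcontroller this could be the ADC0_In function of Program 7.2.2). FakeADC_In is a stand-in invented for this example so the code can run on a PC; it returns 2048 with alternating noise of plus or minus 3 counts.

```c
#include <stdint.h>

// Stand-in for the hardware (hypothetical, for illustration only):
// returns 2051, 2045, 2051, 2045, ... to mimic zero-mean noise.
static uint32_t FakeCount;
uint32_t FakeADC_In(void){
  FakeCount++;
  return (FakeCount&1) ? 2051 : 2045;
}

// Take n samples (n > 0) using the given sampling function
// and return the rounded average.
// The sum fits in 32 bits as long as n*4095 is less than 2^32.
uint32_t ADC_Average(uint32_t (*sample)(void), uint32_t n){
  uint32_t sum = 0;
  for(uint32_t i = 0; i < n; i++){
    sum += sample();
  }
  return (sum + n/2)/n;  // add n/2 so the division rounds to nearest
}
```

Calling ADC_Average(FakeADC_In, 64) averages out the alternating noise and returns 2048, while a single sample would be off by 3 counts. For independent zero-mean noise, averaging N samples reduces the standard deviation by a factor of the square root of N, at the cost of N times the sampling time.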

[…]

void UART_Init(void){
  UART0->GPRCM.RSTCTL = 0xB1000003; // reset UART0
  UART0->GPRCM.PWREN = 0x26000001;  // activate UART0
  Clock_Delay(24);                  // time for uart to activate
  // configure PA11 PA10 as alternate UART0 function
  IOMUX->SECCFG.PINCM[PA10INDEX] = 0x00000082;
  // bit 7 PC connected
  // bits 5-0=2 for UART0_Tx
  IOMUX->SECCFG.PINCM[PA11INDEX] = 0x00040082;
  // bit 18 INENA input enable
  // bit 7 PC connected
  // bits 5-0=2 for UART0_Rx
  UART0->CLKSEL = 0x08;     // bus clock
  UART0->CLKDIV = 0x00;     // no divide
  UART0->CTL0 &= ~0x01;     // disable UART0
  UART0->CTL0 = 0x00020018; // enable fifos, tx and rx
  // 40000000/16 = 2,500,000, 2,500,000/115200 = 21.70139
  UART0->IBRD = 21;         // divider = 21+45/64 = 21.703125
  UART0->FBRD = 45;
  UART0->LCRH = 0x00000030; // 8bit, 1 stop, no parity
  UART0->CTL0 |= 0x01;      // enable UART0
}

char UART_InChar(void){
  while((UART0->STAT&0x04) == 0x04){}; // wait while no input
  return((char)(UART0->RXDATA));
}

void UART_OutChar(char data){
  while((UART0->STAT&0x80) == 0x80){}; // wait while TxFifo full
  UART0->TXDATA = data;
}

Program 8.4.1. Device driver functions that implement serial I/O (UART_BusyWait).

Checkpoint 8.4.5: When is RXFE set? When is RXFE clear?

Checkpoint 8.4.6: When is TXFF set? When is TXFF clear?

Checkpoint 8.4.7: Describe what happens if the receiving computer is operating on a baud rate that is twice as fast as the transmitting computer.

Checkpoint 8.4.8: Describe what happens if the transmitting computer is operating on a baud rate that is twice as fast as the receiving computer.

Checkpoint 8.4.9: How do you change Program 8.4.1 to run at the same baud rate, but the CPU clock is now 40 MHz, making the bus clock 20 MHz?
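The comments in Program 8.4.1 compute the baud-rate divider as the UART clock divided by 16 times the baud rate, with the fraction expressed in 64ths. The sketch below automates that arithmetic. The function name UART_Divisors is hypothetical, and the round-to-nearest choice for the fraction is an assumption that happens to reproduce the values 21 and 45 used in the program.

```c
#include <stdint.h>

// Hypothetical helper: compute the integer (IBRD) and fractional (FBRD)
// baud-rate divisors. divider = clockHz/(16*baud), where FBRD holds
// the fractional part in units of 1/64, rounded to nearest.
// Valid while 8*clockHz fits in 32 bits (clockHz below about 536 MHz).
void UART_Divisors(uint32_t clockHz, uint32_t baud,
                   uint32_t *ibrd, uint32_t *fbrd){
  // divider in units of 1/128: 128*clockHz/(16*baud) = 8*clockHz/baud
  uint32_t x128 = (8*clockHz)/baud;
  // convert to units of 1/64 with rounding
  uint32_t x64 = (x128 + 1)/2;
  *ibrd = x64/64;   // integer part
  *fbrd = x64%64;   // fraction in 64ths
}
```

For a 40 MHz UART clock and 115200 baud this yields IBRD=21 and FBRD=45, matching the comments in the program above. Working in 1/128 units first keeps the calculation in integers while still rounding the final fraction correctly.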


8.4.4. Interrupt-driven UART Device Driver

Typically a communication system has two separate channels, one for input and one for output, and each channel employs a separate FIFO queue. Program 8.4.2 shows the interrupt-driven UART device driver. The flowchart for this interface was shown previously as Figure 8.3.3. During initialization, Port A pins 11 and 10 are enabled as UART signals. The two software FIFOs, similar to Program 8.3.1, are initialized. The baud rate is set at 115200 bits/sec, and the hardware FIFOs are enabled. To use interrupts we will enable the FIFOs by setting the FEN bit in the CTL0 register. The RTOUT flag will interrupt if the receiver becomes idle and there are data in the receiver FIFO. This trigger will allow the interface to receive input data when data comes one frame at a time. RXTOSEL specifies how long the receiver must be idle before a receive timeout interrupt is generated. More specifically, if RXTOSEL equals n, then an interrupt will occur n bit times of idle after the last received character if there are some data in the receive FIFO. RXIFLSEL specifies the receive FIFO level that causes an interrupt. We set RXIFLSEL to 2 so a receive interrupt will occur if the receive FIFO is greater than or equal to half full. In other words, the RXINT flag will interrupt if there are 2, 3, or 4 elements in the receive hardware FIFO. Notice if there are two elements in the receive FIFO, there are still 2 free spaces so the latency requirement for this real-time input will be 20 bit times. TXIFLSEL specifies the transmit FIFO level that causes an interrupt. We set TXIFLSEL to 2 so a transmit interrupt will occur if the transmit FIFO is less than or equal to half full. In other words, the TXINT flag will interrupt if there are 0, 1, or 2 elements in the transmit hardware FIFO. Not waiting until the hardware FIFO is completely empty allows the software to refill the hardware FIFO and maintain a continuous output stream, achieving maximum bandwidth.

In the NVIC, the priority is set at 2 and UART0 (IRQ=15) is activated. Normally, one does not enable interrupts in the individual initialization functions. Rather, interrupts should be enabled in the main program, after all initialization functions have completed. We will employ three of the many possible interrupt trigger flags, located in the RIS register: TXINT, RXINT, and RTOUT. Each of the trigger flags has a corresponding arm bit in the IMASK register. A bit in the MIS register is set if the trigger flag is both set and armed. Since all three trigger flags invoke the same ISR, we need a way to determine which flag caused the interrupt, cognizant of the possibility that multiple flags may be set. The mechanism Texas Instruments uses involves the IIDX register. We read the IIDX register and it will return a value telling us which trigger caused the interrupt. The possible values for us are

STAT = 0x01 for receiver timeout, RTOUT
STAT = 0x0B for receiver fifo half full, RXINT
STAT = 0x0C for transmit fifo half empty, TXINT

Reading the IIDX will automatically clear the trigger flag specified by the STAT value. If more than one flag is set, additional interrupts will be triggered to process each flag separately. For other interrupt possibilities, see the technical reference manual.

When the main thread wishes to output it calls UART_OutChar, which will put the data into a software FIFO. Next, it copies as much data as possible from this software FIFO into the hardware FIFO and arms the transmitter. The transmitter interrupt service will also get as much data as possible from this software FIFO and put it into the hardware FIFO. The copySoftwareToHardware function has a critical section, because it is called by both UART_OutChar and the ISR. To remove the critical section, the transmitter interrupt is temporarily disarmed in the UART_OutChar function when copySoftwareToHardware is called. This helper function guarantees data is transmitted in the same order it was produced.

When input frames are received they are placed into the receive hardware FIFO. If this FIFO goes from 1 to 2 elements, or if the receiver becomes idle with data in the FIFO, a receive interrupt occurs. The helper function copyHardwareToSoftware will get from the receive hardware FIFO and put into the receive software FIFO. When the main thread wishes to input data it calls UART_InChar. This function simply gets from the software FIFO. If the receive software FIFO is empty, it will spin. The helper function copyHardwareToSoftware is not critical because it runs atomically. Both invocations occur in the ISR and this ISR will not interrupt itself.

// assume 40MHz bus clock, initialize UART for 115200 baud rate
void UART_Init(void){
  UART0->GPRCM.RSTCTL = 0xB1000003; // reset UART0
  UART0->GPRCM.PWREN = 0x26000001;  // activate UART0
  Clock_Delay(24);                  // time for uart to power up
  IOMUX->SECCFG.PINCM[PA10INDEX] = 0x00000082; // PC=1, Mode=2
  IOMUX->SECCFG.PINCM[PA11INDEX] = 0x00040082; // INENA, PC, Mode=2
  TxFifo_Init();
  RxFifo_Init();
  UART0->CLKSEL = 0x08;     // bus clock
  UART0->CLKDIV = 0x00;     // no divide
  UART0->CTL0 &= ~0x01;     // disable UART0
  UART0->CTL0 = 0x00020018; // enable FEN, Tx, Rx
  // 20000000/16 = 1250000, 1250000/115200 = 10.850694
  UART0->IBRD = 10;         // 10+54/64 = 10.84375
  UART0->FBRD = 54;         // baud = 1,250,000/10.84375 = 115,274 bps
  UART0->LCRH = 0x00000030; // 8 bit, 1 stop, no parity
  UART0->CPU_INT.IMASK = 0x0C01;
  // bit 11 TXINT
  // bit 10 RXINT
  // bit 0 RTOUT Receive timeout
  UART0->IFLS = 0x0422;
  // bits 11-8 RXTOSEL receiver timeout select 4 (0xF highest)
  // bits 6-4 RXIFLSEL 2 is greater than or equal to half
  // bits 2-0 TXIFLSEL 2 is less than or equal to half
  NVIC->ICPR[0] = 1[…]
[…]RXDATA;
  RxFifo_Put(letter);
}

char UART_InChar(void){ char letter;
  do{
    letter = RxFifo_Get();
  }while(letter == 0);
  return(letter);
}

// copy from software TX FIFO to hardware TX FIFO
// stop when software TX FIFO is empty or hardware TX FIFO is full
void static copySoftwareToHardware(void){ char letter;
  while(((UART0->STAT&0x80) == 0) && (TxFifo_Size() > 0)){
    letter = TxFifo_Get();
    UART0->TXDATA = letter;
  }
}

void UART_OutChar(char data){
  while(TxFifo_Put(data) == 0){};
  UART0->CPU_INT.IMASK &= ~0x0800; // disarm TX FIFO interrupt
  copySoftwareToHardware();
  UART0->CPU_INT.IMASK |= 0x0800;  // rearm TX FIFO interrupt
}

void UART0_IRQHandler(void){ uint32_t status;
  status = UART0->CPU_INT.IIDX; // reading clears bit in RIS
  if(status == 0x01){           // 0x01 receive timeout
    copyHardwareToSoftware();
  }else if(status == 0x0B){     // 0x0B receive
    copyHardwareToSoftware();
  }else if(status == 0x0C){     // 0x0C transmit
    copySoftwareToHardware();
    if(TxFifo_Size() == 0){     // software TX FIFO is empty
      UART0->CPU_INT.IMASK &= ~0x0800; // disable TX FIFO interrupt
    }
  }
}

Program 8.4.2. Interrupt-driven device driver for the UART uses two hardware FIFOs and two software FIFOs to buffer data (UARTints).

Observation: The busy-wait solution in Program 8.4.1 and the interrupt solution in Program 8.4.2 have the same prototypes: Init, InChar, and OutChar. This separation of what it does (prototypes in the header file) from how it works (implementation in the code file) is the hallmark of modular design.
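Program 8.4.2 relies on software FIFOs whose Put returns 0 when full and whose Get returns 0 when empty. The book's own FIFO is Program 8.3.1, which is not reproduced here; the following is only a sketch of one way such a queue could be written, with hypothetical names (Fifo_Init, Fifo_Put, Fifo_Get, Fifo_Size) and an assumed size of 16.

```c
#include <stdint.h>

#define FIFOSIZE 16  // assumed size, must be a power of 2 for the index mask

static char Fifo[FIFOSIZE];
static volatile uint32_t PutI;  // number of elements ever put
static volatile uint32_t GetI;  // number of elements ever removed

void Fifo_Init(void){
  PutI = GetI = 0;   // empty when the two counters are equal
}

// number of elements currently stored
uint32_t Fifo_Size(void){
  return PutI - GetI;
}

// store one element; returns 1 on success, 0 if the FIFO is full
int Fifo_Put(char data){
  if(Fifo_Size() == FIFOSIZE){
    return 0;
  }
  Fifo[PutI&(FIFOSIZE-1)] = data;
  PutI++;            // only the producer changes PutI
  return 1;
}

// remove the oldest element; returns 0 if the FIFO is empty
char Fifo_Get(void){
  if(Fifo_Size() == 0){
    return 0;
  }
  char data = Fifo[GetI&(FIFOSIZE-1)];
  GetI++;            // only the consumer changes GetI
  return data;
}
```

Because each counter is written by only one side (producer or consumer), this style of two-index queue can be shared between one ISR and the main program. Returning 0 for empty means the data stream itself cannot contain the value 0, which is acceptable for ASCII text.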


8.5. Profiling

Profiling is similar to performance debugging because both involve dynamic behavior. Profiling is a debugging process that collects the time history of strategic variables. For example, if we could collect the time-dependent behavior of the program counter, then we could see the execution patterns of our software. We can profile the execution of a multiple thread software system to detect reentrant activity. We can profile a software system to see which of two software modules is run first. For a real-time system, we need to guarantee the time between when software should be run and when it actually runs is short and bounded. Profiling allows us to measure when software is actually run, experimentally verifying the system is real time.

Checkpoint 8.5.1: Write two friendly debugging instruments, one that sets Port B bit 3 high, and the other makes it low.

8.5.1. Profiling using a software dump to study execution pattern

In this section, we will use a software instrument to study the execution pattern of our software. In order to collect information concerning execution we will define a debugging instrument that saves the time and location in an array (like a dump), as shown in Program 8.5.1. The debugging session will initialize the private global N to zero. In this profile, the place p will be an integer, uniquely specifying from which place in the software Profile is called.

uint32_t Time[100], Place[100], N;
void Profile(uint32_t p){
  if(N[…]VAL;   // record time
  Place[N] = p; // record place
  N++;
}

Program 8.5.1. Debugging instrument for profiling.

The compiled version of Profile with TI Clang requires about 59 cycles to execute. If the microcontroller is running at 80 MHz, this debugging instrument consumes about 0.74 µs per call. This amount of time would usually be classified as minimally intrusive. Next, we add calls to the debugging instrument at strategic locations in the software, giving a different number for each place, as shown in Program 8.5.2. By observing these data, we can determine both a time profile (when=SysTick timer) and an execution profile (where=p) of the software execution. Without profiling, the sqrt2(10000) function runs in 3182 cycles. The 18 calls to the debugging instrument, each at 59 cycles, will slow down execution by over 1000 cycles. Therefore, profiling this program with a dump would be highly intrusive. This profiling method is appropriate for situations where the time between dumps is much longer than 59 cycles.
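To make the dump idea concrete, here is a self-contained sketch of a profiling instrument that can be run on a PC. It assumes the arrays hold 100 entries and that recording stops when they are full. ReadTimer is a hypothetical stand-in for reading a free-running hardware timer (such as SysTick on the microcontroller); here it simply counts up by one on each read.

```c
#include <stdint.h>

#define PROFILE_SIZE 100  // assumed number of entries in the dump

uint32_t Time[PROFILE_SIZE];   // when each call occurred
uint32_t Place[PROFILE_SIZE];  // where each call came from
uint32_t N;                    // number of entries recorded so far

// Hypothetical time source for this sketch; on hardware this
// would be a read of a free-running timer register.
static uint32_t FakeTime;
static uint32_t ReadTimer(void){
  FakeTime++;
  return FakeTime;
}

// Debugging instrument: record time and place until the arrays fill.
void Profile(uint32_t p){
  if(N < PROFILE_SIZE){
    Time[N] = ReadTimer();  // record time
    Place[N] = p;           // record place
    N++;
  }
}
```

After the run, the Time and Place arrays can be inspected in the debugger. The bounds check matters: without it, a long run would write past the end of the arrays and corrupt other variables.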


uint32_t sqrt2(uint32_t s){
  int n;       // loop counter
  uint32_t t;  // t*t will become s
  Profile(0);
  t = s/16+1;  // initial guess
  for(n = 16; n; --n){ // will finish
    Profile(1);
    t = ((t*t+s)/t)/2;
  }
  Profile(2);
  return t;
}

Program 8.5.2. A time/position profile dumping into a data array.

8.5.2. Profiling using an Output Port

In this section, we will discuss a hardware/software combination to visualize program activity. We will profile the same three places in the sqrt function. Our debugging instrument will set and clear output port bits. We will place these instruments at strategic places in the software. If we are using a regular oscilloscope, then we must stabilize the system so that the function is called over and over. We connect the output pins to an oscilloscope or logic analyzer and observe the program activity. Program 8.5.3 uses an output port to profile. Assume Port B pins 22 and 26 are initialized as outputs. The debugging profile requires only 194 cycles. Therefore this is less intrusive than the dump in Program 8.5.1.

uint32_t sqrt3(uint32_t s){
  int n;
  uint32_t t;  // t*t will become s
  GPIOB->DOUTSET31_0 = 1[…]
[…]POLARITY31_16 = 0x00000800;     // falling
  GPIOB->CPU_INT.ICLR = 0x00200000;  // clear RIS bit 21
  GPIOB->CPU_INT.IMASK = 0x00200000; // arm PB21
  NVIC->IP[0] = (NVIC->IP[0]&(~0x0000FF00))|2[…]
[…]CPU_INT.ICLR = 0x00200000;      // clear bit 21
}

int main(void){
  __disable_irq();
  EdgeTriggered_Init();
  Count = 0;
  __enable_irq();
  while(1){
    GPIOB->DOUTTGL31_0 = GREEN; // toggle PB27
  }
}
Program 9.3.1. Interrupt-driven edge-triggered input that counts falling edges of PB21 (EdgeInterrupt).

Program 9.3.2 shows the other way to acknowledge edge-triggered interrupts.

void GROUP1_IRQHandler(void){ uint32_t stat;
  while((stat = GPIOB->CPU_INT.IIDX) != 0){
    if(stat == 22){ // PB21
      Count++;      // number of touches
      GPIOB->DOUTTGL31_0 = RED; // toggle PB26
    }
  }
}

Program 9.3.2. ISR using IIDX to determine which pin triggered the interrupt (EdgeInterrupt).

One of the problems with switches is called switch bounce. One solution to bounce is to use periodic interrupts to sample the switch position at a rate slower than the bounce. For example, if the bounce lasts 1 ms, use a periodic interrupt with a 10 ms period. Another solution is to place a 0.1 µF capacitor in parallel with the switch.

Checkpoint 9.3.1: The switch on PB21 is negative logic. What does negative logic mean?
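The periodic-sampling approach to bounce can be sketched in a few lines. The version below is a common variation rather than the book's code: in addition to sampling slowly, it accepts a new switch state only after two consecutive samples agree. The function name Debounce_Sample is hypothetical, and the caller is assumed to pass in the raw pin reading (0 or 1) once per periodic interrupt, for example every 10 ms.

```c
#include <stdint.h>

static uint32_t Last;    // raw reading from the previous sample
static uint32_t Stable;  // debounced switch state

// Call once per periodic interrupt with the raw switch reading (0 or 1).
// The debounced state changes only after two consecutive
// samples have the same value. Returns the debounced state.
uint32_t Debounce_Sample(uint32_t raw){
  if(raw == Last){
    Stable = raw;  // two samples in a row agree, accept the value
  }
  Last = raw;      // remember this reading for the next call
  return Stable;
}
```

With a 10 ms sampling period and bounce lasting about 1 ms, at most one sample can land inside the bounce, so a single glitch never changes the debounced state. The cost is one extra sample period of delay before a real press or release is recognized.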


9.4. Playing Sound Files

Back in Chapter 5, we produced sound from a waveform table and adjusted the interrupt frequency to establish the pitch. In Chapter 5, we stored one period of the wave in a table. The voice was defined by the shape of the wave in the table (sine, trumpet, bassoon, etc.). Each interrupt we read one value from the table and sent it to the DAC. When we got to the end of the table, we went back to the beginning, repeating the pattern over and over. If m is the size of the table and fs is the frequency of the interrupt, the sound frequency is fs/m.

Alternatively, in this section, we will play any .wav sound file using a DAC and a periodic interrupt. The length of the sound will be limited by the available size of ROM on the microcontroller. For example, if we allocate half of the available 128 kibibytes of flash to the sound, we can store 65536 bytes of data. If the DAC is less than or equal to 8 bits, each flash byte can store one sound sample. If sound is sampled at 11.025 kHz, then we can store about 5.9 seconds of sound in the 65536 bytes of flash. You could double the length of time by halving the sampling rate.

The first step is to find example .wav files you wish to import. You only have 5.9 seconds total, so find short sounds that will work well with your project. Next, you install Octave (free) or MATLAB (many schools have an educational license for students). Put the .wav files in the same folder as WC.m (Program 9.4.1). You can edit the sampling rate, changing the 11025 to any value you wish. You need to load the signal package first, by typing

pkg load signal

into the command prompt. Assume you wish to convert shoot.wav into 4-bit C code, execute

WC("shoot",4)

This will generate the file shoot.txt. You simply open shoot.txt and copy-paste the declaration into your software, see Program 9.4.2. Make sure the size of all your sounds will fit into the flash of the microcontroller. Some .wav files may produce very quiet sounds on a 5-bit DAC. You could adjust the gain in the line beginning with Spls = round to scale the data however you wish, being careful not to overflow the range of your DAC.

To start playing a sound, we initialize the index I and enable the periodic timer. When the index I reaches the end of the sound, we turn off the interrupt. One way to turn on/off timer interrupts is to access the ISER and ICER registers in the NVIC. In this case, the interrupt frequency is fixed at 11.025 kHz, and the ISR outputs one data value to the DAC.

## Author: Ramesh Yerraballi
## Use this script to convert a wav file into a C declaration of
## the samples with a sampling rate of 11.025kHz
function WC(filename,precision)
  # load the file
  [SplsOrig, fs] = audioread(strcat(filename,'.wav'));
  # downsample to 11.025kHz
  Spls = decimate(SplsOrig,round(fs/11025));
  # trim the precision of each sample
  Spls = round((Spls+1)*(2^precision-1)/2);
  # write C declaration to file with txt extension
  file = fopen(strcat(filename,'.txt'),'w');
  fprintf(file, strcat('const uint8_t \t',filename,'[',num2str(length(Spls)),'] = {'));
  # The sample dump is done here
  fprintf(file, '%d,', Spls(1:length(Spls)-1));
  fprintf(file, '%d', Spls(length(Spls)));
  fprintf(file, '};\n');
  fclose(file);
end

Program 9.4.1. Octave/Matlab script to convert .wav file to C code (in WavPlay folder).

const uint8_t shoot[4080] = {8,6,6,10,12,8,2,6,12,7,3,5,9,13,9, ... };
static uint32_t I=0;
void Sound_Off(void){
  NVIC->ICER[0] = 1[…]
[…]ISER[0] = 1[…]
[…]CPU_INT.IIDX) == 1){ // this will acknowledge
    DAC_Out(shoot[I]);    // output one
    I = I+1;
    if(I >= 4080) Sound_Off();
  }

Program 9.4.2. Each invocation of the periodic interrupt outputs one 4-bit value to the DAC.

Figure 9.4.1. Analog output on 4-bit DAC of the shoot.wav sound (4080/11.025kHz = 370 ms).


9.5. Modular Design Example

In this section we outline the steps to build a hand-held game using the components presented in this book. The game will have inputs from switches and a slide potentiometer, and we will have outputs to a DAC and an LCD display. Sometimes we can restate the problem to allow for a simpler and possibly more powerful solution. We begin the design of the game by listing possible modules for our system.

ADC           The interface to the joystick
Switch        User interaction with LEDs and switches
Sound         Sound output using the DAC
ST7735R       Images displayed on the LCD
Game engine   The central controller that implements the game

Figure 9.5.1 shows a possible call graph for the game. An arrow in a call graph means software in one module can call functions in another module. This is a very simple organization with a layered hierarchy. This configuration is an example of good modularization because there are six software modules but only five arrows between software modules.

[Figure 9.5.1 diagram labels: Game Engine, Sound Routines, ST7735 Routines, Switches, LEDs, Speaker.]

Figure 9.5.1. Possible call graph for the game.

Figure 9.5.2 shows one possible data flow graph for the game. Recall that arrows in a data flow graph represent data passing from one module to another. Notice the high bandwidth communication occurs between the sound module and its hardware, and between the ST7735R module and its hardware. We will design the system such that software modules do not need to pass a lot of data to other software modules. As you combine modules from previous systems, make sure all the code is friendly.

[Figure 9.5.2 diagram labels: Slide Pot, Voltage, ADC0 (PA22), Position, 30 Hz; Switches, Boolean Inputs, GPIO, Outputs, LEDs, 30 Hz; Sound, 5-bit DAC (PB4 to PB0), Speaker.]

The TimerA0 ISR will output a sequence of numbers to the DAC to create sound. Let shoot be an array of 4080 5-bit numbers, representing a sound sampled at 11 kHz. If the game engine wishes to make the explosion sound, it calls

Sound_Start(shoot,4080);

This function call simply passes a pointer to the shoot sound array into the sound module. The function Sound_Start does not output to the DAC. Rather, it sets a pointer and counter, and then enables the timer interrupt. The TimerA0 ISR will output one 5-bit number to the DAC for the next 4080 interrupts, and then disarm. The data flow from the game engine to the sound module is only two parameters (pointer and count), causing 4080 5-bit numbers to flow from the sound module to the DAC.

To update the entire screen, the ST7735 module sends 128*160*2 = 40960 bytes to the LCD. Since the screen is updated 30 times per second, 1,228,800 bytes/sec flow from the LCD hardware to the LCD screen. However, the software needs to send data from ROM memory to the LCD hardware only when it wishes to change an image on the screen. Let the image of a small enemy be stored in Enemy2, which is an array of numbers, representing a 14 by 10 pixel image, see Figure 9.1.3 and Program 9.1.4. This image is defined in 14*10*2 = 280 bytes. If the game engine wishes to place this enemy in the center of the screen, it calls

ST7735_DrawBitmap(64, 80, Enemy2, 14, 10);

This function call simply passes five parameters, one of which is a pointer to the image array, into the ST7735 module. If the enemy is moving, then 1,228,800 bytes/sec are flowing from the LCD hardware to the LCD screen, but the data flow from the game engine to the ST7735 module is 280 bytes*30/sec = 8400 bytes/sec. The data flow from the game engine to the ST7735 module will increase linearly with the number of objects moving on the screen, but remain much smaller than the data into the LCD hardware.
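The pointer-and-count scheme described for Sound_Start can be sketched as follows. This is an illustration under stated assumptions, not the book's implementation: Sound_Handler stands for the body of the periodic timer ISR, Sound_Remaining is a hypothetical helper added for testing, and DAC_Out here only stores the value in the variable DacValue so the code can run on a PC. On hardware, DAC_Out would write the DAC pins and the comments mark where the timer interrupt would be armed and disarmed.

```c
#include <stdint.h>

static const uint8_t *SoundPt;  // next sample to output
static uint32_t SoundCount;     // number of samples remaining
uint8_t DacValue;               // stand-in for the DAC output (hypothetical)

// Stand-in for the real DAC driver: on hardware this writes the DAC pins.
static void DAC_Out(uint8_t data){
  DacValue = data;
}

// Called by the game engine. Stores the pointer and count,
// performs no DAC output itself.
void Sound_Start(const uint8_t *pt, uint32_t count){
  SoundPt = pt;
  SoundCount = count;
  // on hardware: arm the periodic timer interrupt here
}

// Body of the periodic timer ISR: outputs one sample per interrupt.
void Sound_Handler(void){
  if(SoundCount){
    DAC_Out(*SoundPt);  // output one sample
    SoundPt++;
    SoundCount--;
  }
  // on hardware: disarm the timer interrupt when SoundCount reaches 0
}

// Hypothetical helper: number of samples still to be played.
uint32_t Sound_Remaining(void){
  return SoundCount;
}
```

The game engine passes only two parameters, yet the ISR streams the entire array to the DAC. Keeping the bulk data flow inside the sound module is what keeps the interface between the software modules narrow.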


Figure 9.5.3 shows one possible flowchart for the game engine. It is important to perform the actual LCD output in the foreground. In this design there are three threads: the main program and two interrupts. Multithreading allows the processor to execute multiple tasks. The main loop performs the game engine and updates the image on the screen. At 30 Hz, which is fast enough to look continuous, the SysTick ISR will sample the ADC and switch inputs. Based on user input and the game function, the ISR will decide what actions to take and signal the main program. To play a sound, we send the Sound module an array of data and arm TimerG0. Each TimerG0 interrupt outputs one value to the DAC. When the sound is over we could disarm TimerG0.

[Figure 9.5.3 flowchart labels: main program (Initialization, Enable interrupts, Draw sprites into screen buffer, Send screen buffer to LCD, Semaphore=0); TimerG0 ISR at 30 Hz ("fire": Create missile, Play "fire" sound; moved: Move player ship; Move sprites, Play sounds, Semaphore=1, Acknowledge); TimerA0 ISR at 11 kHz (Acknowledge).]

Figure 9.5.3. Possible flowchart for the game.

For example, if the ADC notices a motion to the left, the TimerG0 ISR can tell the main program to move the player ship to the left. Similarly, if the TimerG0 ISR notices the fire button has been pushed, it can create a missile object, and for the next 100 or so interrupts the TimerG0 ISR will move the missile until it goes off screen or hits something. In this way the missile moves a pixel or two every 33.3 ms, causing its motion to look continuous. In summary, the ISR responds to input and time, but the main loop performs the actual output to the LCD.

Checkpoint 9.5.1: Notice the algorithm in Figure 9.5.3 samples the ADC and the fire button at 30 Hz. How many times/sec can we fire a missile or wiggle the slide pot? Hint: think Nyquist Theorem.

Checkpoint 9.5.2: Similarly, in Figure 9.5.3, what frequency components are in the sound output?

The time to execute the TimerG0 handler will be very short because it performs NO LCD output.


void TIMG0_IRQHandler(void){ // interrupts at 30Hz
  if((TIMG0->CPU_INT.IIDX) == 1){ // this will acknowledge
    // game engine runs at 30 Hz
    Move(); // sets NeedToDraw if the screen needs to be redrawn
  }
}

int main(void){
  __disable_irq();
  Clock_Init80MHz(0);
  LaunchPad_Init();
  ST7735_InitR(INITR_REDTAB);
  ST7735_FillScreen(ST7735_BLACK); // set screen to black
  Init(); // dynamic initialization of sprites, sounds, switches, LEDs
  TimerG0_IntArm(33333,40,2); // 40MHz/40/33333 = 30Hz
  __enable_irq();
  while(1){
    if(NeedToDraw){
      Draw();         // slow LCD output operates in foreground
      NeedToDraw = 0; // mark as drawn
    }
  }
}

Program 9.5.1. Multithreaded approach to the hand-held game.

9.6. Best Practices

• Consider debugging when defining, designing, implementing, building and deploying.
• Careful thought during design can save lots of time during implementation and debugging.
• Choose good variable names so the software is easier to understand.
• Divide large projects into modules and test each module separately.
• Separate hardware from software bugs by first testing the software on a simulator.
• When designing modules start with the interfaces, e.g., the header files.
• The second step when designing modules is pseudo code typed in as comments.
• Make the time to service an interrupt short compared to the time between interrupts.
• When developing a modular system, try not to change the header files.
• Use a consistent coding style so all your software is easy to read, change, and debug.
• Most of your time is spent changing or fixing existing code, called maintenance.
• So, when designing code, plan for testing and make it easy to change.
• Writing friendly code makes it easier to combine components into systems.
• Use quality connectors, because faulty connectors can be a difficult flaw to detect.
• It is your responsibility to debug your hardware and software.
• It is also your responsibility to debug other hardware/software you put into your system.
• A simple solution is often more powerful than a complex solution.
• Listen carefully to your customers so you can understand their needs.
• Draw wiring diagrams of electrical circuits before building.
• Double-check all the wiring before turning on the power.
• Double-check all signals in cables; don't assume red is power and black is ground.
• Be courageous enough to show your work to others.
• Be humble enough to allow others to show you how your system could be better.

I would like to acknowledge the many excellent teaching assistants I have had the pleasure of working with. Some of these hard-working, underpaid warriors include Pankaj Bishnoi, Rajeev Sethia, Adson da Rocha, Bao Hua, Raj Randeri, Santosh Jodh, Naresh Bhavaraju, Ashutosh Kulkarni, Bryan Stiles, V. Krishnamurthy, Paul Johnson, Craig Kochis, Sean Askew, George Panayi, Jeehyun Kim, Vikram Godbole, Andres Zambrano, Ann Meyer, Hyunjin Shin, Anand Rajan, Anil Kottam, Chia-ling Wei, Jignesh Shah, Icaro Santos, David Altman, Nachiket Kharalkar, Robin Tsang, Byung Geun Jun, John Porterfield, Daniel Fernandez, Deepak Panwar, Jacob Egner, Sandy Hermawan, Usman Tariq, Sterling Wei, Seil Oh, Antonius Keddis, Lev Shuhatovich, Glen Rhodes, Geoffrey Luke, Karthik Sankar, Tim Van Ruitenbeek, Raffaele Cetrulo, Harshad Desai, Justin Capogna, Arindam Goswami, Jungho Jo, Mehmet Basoglu, Kathryn Loeffler, Evgeni Krimer, Nachiappan Valliappan, Razik Ahmed, Sundeep Korrapati, Peter Garatoni, Manan Kathuria, Jae Hong Min, Pratyusha Nidamaluri, Dayo Lawal, Aditya Srikanth, Kurt Fellows, James Beecham, Austin Blackstone, Brandon Carson, Kin Hong Mok, Omar Baca, Sam Oyetunji, Zack Lalanne, Nathan Quang Minh Thai, Paul Fagen, Zhuoran Zhao, Sparsh Singhai, Saugata Bhattacharyya, Chinmaya Dattathri, Emily Ledbetter, Kevin Gilbert, Siavash Kamali, Yen-Kai Huang, Michael Xing, Katherine Olin, Mitchell Crooks, Prachi Gupta, Mark Meserve, Sourabh Shirhatti, Dylan Zika, Kelsey Ball, Greg Cerna, Sabine Francis, Ahmad El Youssef, Wooseok Lee, Vimal Singh, Youngchun Kim, Jenny Chen, Peter Hu, Jimmy Brisson, Haley Alexander, Wendy Davidson, Chris Friesen, Daniel Pulliam, Irene Kuang, Adeesh Jain, Anoop Naravaram, Matthew Kornegay, Benjamin Cho, Nagaraja Revanna, Kei Kudose, Ce Wei, Shija Wei, Majid Jalili, Kamyar Barijou, Schuyler Christensen, Danny Vo, Thomas McRoberts, Vickie Fridge, Matthew Cosentino, Domino Weir, Rutvik Choudhary, Sean Duffy, Kylar Osborne, Prakash Luu, Akhila Bhatt, Austin Harris, Matthew Barondeau, Sneha Pendharkar, Jerry Yang, Phyllis Ang, Brian Tsai, Amogh Agnihotri, Woosoek Lee, Caleb Kovatch, Celine Lillie, Manish Ravula, John MacKay, Willy R. Vasquez, Wei Shi, Hyejun Im, Akhila Bhat, Arjun Ramesh, Dylan McCoy, Suhas Raja, Junlin Zhu, Muhammed Mohaimin Sadiq, Rebecca Phung, Sihyung Woo, Meiling Tang, Ashen Ekanayake, Rave Rajan, Ashkan Vafaee, Steven Zhu, Benjamin Thorell, Tosin E Jemilehin, Clark Poon, Sophia Jiang, Rahul Butani, Ethan Golla, Jason Fry, Aditi Katragadda, Charles Block, Ashen Ekanayake, Rave Rajan, Kashyap Mattoo, Weilin Cao, Jeffrey Marshall, Jason Wang, Adeel Rehman, Rishi Ponnekanti, Rafid Hasan, William Bundrant, Stephen Do, Rishabh Parekh, Claire Romero, Chase Block, Elise Johnson, Dinesh Reddy, Malav Shah, Cuauhtemoc Macias, Emma Nie, Anna Guo, Prithvi Senthilkumar, Chet Pena, Anthony Hermez, Raymond Jiang, Jason Kacines, Anusha Razdan, Paul Han, Frank Collebrusco, and Ayush RoyChowdhury. These teaching assistants have contributed greatly to the contents of this book and particularly to its laboratory assignments.


Appendix 1. Glossary

2's complement (see two's complement).
accumulator High-speed storage located in the processor used to perform arithmetic or logical functions. The accumulators on the ARM Cortex-M are Registers R0 through R12.
accuracy A measure of how close our instrument measures the desired parameter referred to the NIST.
acknowledge Clearing the interrupt flag bit that requested the interrupt.
actuator Electro-mechanical or electro-chemical device that allows computer commands to affect the external world. Examples include motors, relays, solenoids, and speakers.
ADC Analog to digital converter, an electronic device that converts analog signals (e.g., voltage) into digital form (i.e., integers). The ADC on the MSPM0 is 12 bits and can sample up to 1 M samples/sec.
address bus A set of digital signals that connect the CPU, memory and I/O devices, specifying the location to read or write for each bus cycle. See also control bus and data bus.
aliasing When digital values sampled at fs contain frequency components above 0.5 fs, then the apparent frequency of the data is shifted into the 0 to 0.5 fs range. See Nyquist theory.
alternatives The total number of possibilities. E.g., an 8-bit number scheme can represent 256 different numbers. An 8-bit digital to analog converter (DAC) can generate 256 different analog outputs.
arithmetic logic unit (ALU) Component of the processor that performs arithmetic and logic operations.
arm Activate an individual trigger so that interrupts are requested when that trigger flag is set.
ASCII American Standard Code for Information Interchange, a code for representing characters, symbols, and synchronization messages as 7-bit, 8-bit or 16-bit binary values.
assembler System software that converts an assembly language program (human readable format) into object code (machine readable format).
assembly directive Operations included in the program that are not executed by the computer at run time, but rather are interpreted by the assembler during the assembly process. Same as pseudo-op.
assembly listing Information generated by the assembler in human readable format, typically showing the object code, the original source code, assembly errors, and the symbol table.
asynchronous protocol A protocol where the two devices have separate and distinct clocks.
atomic Software execution that cannot be divided or interrupted. Once started, an atomic operation will run to its completion without interruption. On most computers the assembly language instructions are atomic. All instructions on the ARM Cortex-M processor are atomic except PUSH and POP.
availability The portion of the total time that the system is working. MTBF is the mean time between failures, MTTR is the mean time to repair, and availability is MTBF/(MTBF+MTTR).
bandwidth The information transfer rate, the amount of data transferred per second. Same as throughput.
bandwidth coupling Module A is connected to Module B, because data flows from A to B.
basis Subset from which linear combinations can be used to reconstruct the entire set. The basis of the 8-bit unsigned number system is 1, 2, 4, 8, 16, 32, 64, and 128.
baud rate In general the baud rate is the maximum number of bits (start, data, and stop) per time that can be transmitted; in a modem application it is the total number of sounds per time that are transmitted.
bi-directional Digital signals that can be either input or output.
biendian The ability to process numbers in both big and little endian formats.
big endian Mechanism for storing multiple byte numbers such that the most significant byte exists first (in the smallest memory address). See also little endian.
binary A system that has two states, on and off.
binary operation A function that produces its result given two input parameters. For example, addition, subtraction, and multiplication are binary operations.


binary recursion A recursive technique that makes two calls to itself during the execution of the function. See also recursion, linear recursion, and tail recursion.
bipolar stepper motor A stepper motor where the current flows in both directions (in/out) along the interface wires; a stepper with 4 interface wires.
bit Basic unit of digital information taking on the value of either 0 or 1.
bit time The basic unit of time used in serial communication.
blind cycle A software/hardware synchronization method where the software waits a specified amount of time for the hardware operation to complete. The software has no direct information (blind) about the status of the hardware.
Board Support Package (BSP) A set of software routines that abstract the I/O hardware such that the same high-level code can run on multiple computers. Same as hardware abstraction layer (HAL).
borrow During subtraction, if the difference is too small, then we use a borrow to pass the excess information into the next higher place. For example, in decimal subtraction 36-27 requires a borrow from the ones to tens place because 6-7 is too small to fit into the 0 to 9 range of decimal numbers.
break or trap A break or a trap is an instrument that halts the processor. When encountered it will stop your program and jump into the debugger. Therefore, a break halts the software. The condition of being in this state is also referred to as a break.
breakpoint The place where a break is inserted, the time when a break is encountered, or the time period when a break is active.
buffered I/O A FIFO queue is placed in between the hardware and software in an attempt to increase bandwidth by allowing both hardware and software to run in parallel.
burn The process of programming a ROM, PROM or EEPROM.
bus A set of digital signals that connect the CPU, memory and I/O devices, consisting of address signals, data signals and control signals. See also address bus, control bus and data bus.
busy wait A software/hardware synchronization method where the software continuously reads the hardware status waiting for the hardware operation to complete. The software usually performs no work while waiting for the hardware. Same as gadfly. Same as polling.
byte Digital information containing 8 bits. In C, we use char or unsigned char to create a byte. In C99, we use int8_t or uint8_t to create a byte. In both C and C99, we use char to create an 8-bit ASCII character.
call graph A graphical way to define how the software/hardware modules interconnect. If a function in module A invokes a function in module B, then there is an arrow from A to B.
carry During addition, if the sum is too large, then we use a carry to pass the excess information into the next higher place. For example, in decimal addition 36+27 requires a carry from the ones to tens place because 6+7 is too big to fit into the 0 to 9 range of decimal numbers.
ceiling Establishing an upper bound on the result of an operation.
Central Limit Theorem The CLT states that as independent random variables are added, their sum tends toward a Normal or Gaussian distribution.
channel The hardware that allows communication to occur.
checksum The simple sum of the data, usually in finite precision (e.g., 8, 16, 24 bits).
client A programmer/engineer who will use our software and/or hardware. This is typically not the end-user of the final system; rather it is another engineer who will integrate our software and/or hardware into a larger system.
closed loop control system A control system that includes sensors to measure the current state variables. These inputs are used to drive the system to the desired state.
CMOS A digital logic system called complementary metal oxide semiconductor. It has properties of low power and small size. Its power is a function of the number of transitions per second.


cohesion A cohesive module is one such that all parts of the module are related to each other to satisfy a common objective.
compiler System software that converts a high-level language program (human readable format) into object code (machine readable format).
complex instruction set computer (CISC) A computer with many instructions, instructions that have varying lengths, instructions that execute in varying times, many instructions can access memory, one instruction may both read and write memory, fewer and more specialized registers, and many different types of addressing modes. Contrast to RISC.
concurrent programming A computer system that supports two or more software tasks that are simultaneously active. Typically one task executes at a time, and there are mechanisms to suspend one task and execute another task. Compare to parallel programming.
control bus A set of digital signals that connect the processor, memory and I/O devices, specifying when to read or write for each bus cycle. See also address bus and data bus.
control coupling Module A is connected to Module B, because actions in A affect the control path in B.
control unit (CU) Component of the processor that determines the sequence of operations.
CPU bound A situation where the input or output device is faster than the software. In other words, it takes less time for the I/O device to process data than for the software to process data. To improve bandwidth we can use a faster computer or more efficient software. Compare to I/O bound.
critical section Locations within a software module, which if an interrupt were to occur at one of these locations, then an error could occur (e.g., data lost, corrupted data, program crash, etc.). Same as vulnerable window.
cyber-physical system A system that performs a specific dedicated operation where the computer is hidden or embedded inside the machine. The system has intelligence in the software and physical connections to the real world. Same as embedded system.
DAC Digital to analog converter, an electronic device that converts digital signals (i.e., integers) to analog form (e.g., voltage).
data acquisition system (DAS) A system that collects information, same as instrument.
data bus A set of digital signals that connect the CPU, memory and I/O devices, specifying the value that is being read or written for each bus cycle. See also address bus and control bus.
data flow graph A block diagram of the system, showing the flow of information. Arrows represent the flow of data from one module to another.
decibel A measure of the relative amplitude of two voltages: dB = 20 log10(V1/V2). It also refers to the relative amplitude of two powers: dB = 10 log10(P1/P2).
desk-checking or dry run We perform a desk check (or dry run) by determining in advance, either by analytical algorithm or explicit calculations, the expected outputs of strategic intermediate stages and final results for typical inputs. We then run our program and compare the actual outputs with this template of expected results.
device driver A collection of software routines that perform I/O functions.
digital signal processing Processing of data with digital hardware or software after the signal has been sampled by the ADC, e.g., filters, detection and compression/decompression.
disarm Deactivate a trigger flag so that interrupts are not requested when that trigger flag is set.
DMA Direct Memory Access is a software/hardware synchronization method where the hardware itself causes a data transfer between the I/O device and memory at the appropriate time when data needs to be transferred. The software usually can perform other work while waiting for the hardware. No software action is required for each individual byte.
double byte Two bytes containing 16 bits. Same as halfword.
double-pole switch Two separate and complete switches that are activated together, same as two-pole. Contrast with single-pole.


double-throw switch A switch with three contact connections. The center contact will be connected to exactly one of the other two contacts. Contrast with single-throw.
download The process of transferring object code from the host (e.g., the PC) to the target microcontroller.
dropout An error that occurs after a right shift or a divide, and the consequence is that an intermediate result loses its ability to represent all of the values. E.g., I=100*(N/51) can only result in the values 0, 100, or 200, whereas I=(100*N)/51 properly calculates the desired result.
duty cycle For a periodic digital wave, it is the percentage of time the signal is high.
dynamic efficiency A measure of how fast the program executes.
dynamic RAM Volatile read/write storage built from a capacitor and a single transistor having a low cost, but requiring refresh. Contrast with static RAM. Most laptops use dynamic RAM.
EEPROM Electrically erasable programmable read only memory that is nonvolatile and easy to reprogram.
embedded computer system A system that performs a specific dedicated operation where the computer is hidden or embedded inside the machine. The system has intelligence in the software and physical connections to the real world. Same as cyber-physical system.
EPROM Same as PROM. Electrically programmable read only memory that is nonvolatile and requires external devices to erase and reprogram. It is usually erased using UV light.
erase The process of clearing the information in a PROM or EEPROM. The information bits are usually all set to logic 1.
EVB Evaluation Board, a product used to develop microcomputer software. Same as LaunchPad.
even parity A communication protocol where the number of ones in the data plus a parity bit is an even number. Contrast with odd parity.
fan out The number of inputs that a single output can drive if the devices are all in the same logic family.
filter In the debugging context, a filter is a Boolean function or conditional test used to make run-time decisions. For example, if we print information only if two variables x, y are equal, then the conditional (x==y) is a filter. Filters can involve hardware status as well.
Finite State Machine (FSM) An abstract design method to build a machine with inputs and outputs. The machine can be in one of a finite number of states. Which state the system is in represents memory of previous inputs. The output and next state are a function of the input. There may be time delays as well.
firm real-time A system that expects all critical tasks to complete on time. Once a deadline has passed, there is no value to completing the task. However, the consequence of missed deadlines is real but the overall system operates with reduced quality. Streaming audio and video are typical examples. Compare to hard real-time and soft real-time.
fixed point A technique where calculations involving nonintegers are performed using a sequence of integer operations. E.g., 0.123*x is performed in decimal fixed point as (123*x)/1000 or in binary fixed point as (126*x)>>10.
flash EEPROM Electrically erasable programmable read only memory that is nonvolatile and easy to reprogram. We can erase all bits to 1, and we can program individual bits to 0. It is fast (1 or 2 bus cycles) to read, but slow (ms) to erase and program. We do not write to it like RAM, but use special software sequences to erase and program, which is why it is classified as read only.
floating A logic state where the output device does not drive high or pull low. The outputs of open collector and tristate devices can be in the floating state. Same as HiZ.
floor Establishing a lower bound on the result of an operation.
fork Used in parallel programming to create additional software tasks that will run in parallel. See join.
frame A complete and distinct packet of bits occurring in a serial communication channel.
framing error An error when the receiver expects a stop bit (1) and the input is 0.
friendly Friendly software modifies just the bits that need to be modified, leaving the other bits unchanged.


full-duplex channel Hardware that allows bits (information, error checking, synchronization or overhead) to transfer simultaneously in both directions. Contrast with simplex and half-duplex channels.
full-duplex communication A system that allows information (data, characters) to transfer simultaneously in both directions.
functional debugging The process of detecting, locating, or correcting functional and logical errors in a program and the process of instrumenting a program for such purposes is called functional debugging or often simply debugging. Contrast with performance debugging.
gadfly A software/hardware synchronization method where the software continuously reads the hardware status waiting for the hardware operation to complete. The software usually performs no work while waiting for the hardware. Same as busy wait. Same as polling.
general purpose computer system A system like the PC or Macintosh with a keyboard, disk and display that can be programmed for a wide variety of purposes.
half-duplex channel Hardware that allows bits (information, error checking, synchronization or overhead) to transfer in both directions, but in only one direction at a time. Contrast with simplex and full-duplex channels.
half-duplex communication A system that allows information to transfer in both directions, but in only one direction at a time.
halfword Two bytes containing 16 bits. Same as double byte. In C, we use short or unsigned short to create a halfword. In C99, we use int16_t or uint16_t to create a halfword.
handshake A software/hardware synchronization method where control and status signals go both directions between the transmitter and receiver. The communication is interlocked, meaning each device will wait for the other.
hard real-time A system that can guarantee that a process will complete a critical task within a certain specified range. In data acquisition systems, hard real-time means there is an upper bound on the latency between when a sample is supposed to be taken (every 1/fs) and when the ADC is actually started. Hard real-time also implies that no ADC samples are missed. Compare to soft real-time and firm real-time.
Harvard architecture A computer architecture where instructions are fetched from a different bus from where data are fetched.
heartbeat A debugging monitor, such as a flashing LED, we add for the purpose of seeing if our program is running.
hexadecimal A number system that uses base 16.
HiZ A logic state where the output device does not drive high or pull low. The outputs of open collector and tristate devices can be in the HiZ state. Same as floating.
hold time When latching data into a device with a rising or falling edge of a clock, the hold time is the time after the active edge of the clock that the data must continue to be valid. Contrast with setup time.
hysteresis A condition when the output of a system depends not only on the input, but also on the previous outputs, e.g., a transducer that follows a different response curve when the input is increasing than when the input is decreasing.
I/O bound A situation where the input or output device is slower than the software. In other words, it takes longer for the I/O device to process data than for the software to process data. To improve bandwidth we can use a faster I/O device. Compare to CPU bound.
I/O device A computer component capable of bringing information from the external environment into the computer (input device), or sending data out from the computer to the external environment (output device).
I/O port A hardware device that connects the computer with external components.
IIH Input current when the signal is high.
IIL Input current when the signal is low.


immediate An addressing mode where the operand is a fixed data or address value.
impedance The ratio of the effort (voltage, force, pressure) divided by the flow (current, velocity, flow).
indexed An addressing mode where the data or address value for the instruction is located in memory pointed to by an index register.
input capture A mechanism to set a flag and capture the current time (timer counter value) on the rising, falling or rising&falling edge of an external signal. The input capture event can also request an interrupt.
input impedance The input voltage divided by the input current. When an input, Vin, is applied to the MSPM0 ADC, some current, Iin, will flow into the pin. Zin = Vin/Iin = 500Ω.
instrument An instrument is the code injected into a program for debugging or profiling. This code is usually extraneous to the normal function of a program and may be temporary or permanent. Instruments injected during interactive sessions are considered to be temporary because these instruments can be removed simply by terminating a session. Instruments injected in source code are considered to be permanent because removal requires editing and recompiling the source. An example of a temporary instrument occurs when the debugger replaces a regular op code with a breakpoint instruction. This temporary instrument can be removed dynamically by restoring the original op code. A print statement added to your source code is an example of a permanent instrument, because removal requires editing and recompiling.
instrument A system that collects information, same as data acquisition system.
instrumentation The process of injecting or inserting an instrument.
interrupt A software/hardware synchronization method where the hardware causes a special software program (interrupt handler) to execute when its operation is complete. The software usually can perform other work while waiting for the hardware.
interrupt flag A status bit that is set by the hardware to signify an external event has occurred. Same as trigger flag. On the MSPM0 most interrupt flags are in the RIS registers.
interrupt mask A control bit that, if programmed to 1, will cause an interrupt request when the associated flag is set. Same as arm.
interrupt polling A software function to look and see which of the potential sources requested the interrupt.
interrupt service routine (ISR) Program that runs as a result of an interrupt.
interrupt vector 32-bit values at the beginning of memory specifying where the software should execute after an interrupt request. There is a unique interrupt vector for each type of interrupt.
intrusive A characteristic of a debugging instrument when the presence of the collection of information itself does significantly affect the parameters being measured.
invocation coupling Module A is connected to Module B, because A calls B.
IOH Output current when the signal is high. This is the maximum current that has a voltage above VOH.
IOL Output current when the signal is low. This is the maximum current that has a voltage below VOL.
join Used in parallel programming to combine two or more software tasks into one. Execution after a join will continue when all software tasks above the join are complete. See fork.
kibibit Stands for kilo-binary-bits, which is 1024 bits or 128 bytes, abbreviated Kibit.
kibibyte Stands for kilo-binary-bytes, which is 1024 bytes or 8192 bits, abbreviated KiB.
latch As a noun, it means a register. As a verb, it means to store data into the register.
latency In this book latency usually refers to the response time of the computer to external events. For an input device, latency is the time between new input becoming available and the time the input is read by the software. For an output device, latency is the time from ready to accept more data to the time the software writes new data to it. There can also be a latency for an I/O device, which is the response time of the external I/O device hardware to a software command.
LCD Liquid Crystal Display, where the computer controls the reflectance or transmittance of the liquid crystal, characterized by its flexible display patterns, low power, low cost, and slow speed.


LED Light Emitting Diode, where the computer controls the electrical power to the diode, characterized by its simple display patterns, medium power, and high speed.
linear recursion A recursive technique that makes only one call to itself during the execution of the function. Linear recursive functions are easier to implement iteratively. We draw the execution pattern as a straight or linear path. See also recursion, binary recursion, and tail recursion.
little endian Mechanism for storing multiple byte numbers such that the least significant byte exists first (in the smallest memory address). Contrast with big endian.
Little's Theorem Let N be the average number of packets in the system, one being processed and N-1 stored in the queue. Let λ be the average arrival rate in packets per second (pps). Let R be the average response time of a packet, which includes the time waiting in the queue plus the time for the consumer to process the packet. Little's Theorem states that N = λR.
loader System software that places the object code into the microcomputer's memory. If the object code is stored in EEPROM, the loader is also called an EEPROM programmer.
logic analyzer A hardware debugging tool that allows you to visualize many digital logic signals versus time. Real logic analyzers have at least 32 channels and can have up to 200 channels, with sophisticated techniques for triggering, saving and analyzing the real-time data.
LSB The least significant bit in a number system is the bit with the smallest significance, usually the rightmost bit. With signed or unsigned integers the significance of the LSB is 1.
maintenance Process of verifying, changing, correcting, enhancing, and extending a system.
mark A digital value of true or logic 1 used in serial communication. Contrast with space.
mask As a verb, mask is the operation that selects certain bits out of many bits, using the logical and operation. The bits that are not being selected will be cleared to zero. When used as a noun, mask refers to the specific bits that are being selected.
Mealy FSM A FSM where both the output and next state are a function of the input and state.
measurand A signal measured by a data acquisition system.
mebibit Stands for mega-binary-bits, which is 1,048,576 bits, abbreviated Mibit.
mebibyte Stands for mega-binary-bytes, which is 1,048,576 bytes, abbreviated MiB.
memory A computer component capable of storing and recalling information.
memory-mapped I/O A configuration where the I/O devices are interfaced to the computer in a manner identical to the way memories are connected. From an interfacing perspective, I/O devices and memory modules share the same bus signals. From a programmer's point of view, the I/O devices exist as locations in the memory map, and I/O device access can be performed using any of the memory access instructions.
Mibit Stands for mega-binary-bits, which is 1,048,576 bits, same as mebibit.
MiB Stands for mega-binary-bytes, which is 1,048,576 bytes, same as mebibyte.
microcomputer An electronic device capable of performing input/output functions containing a microprocessor, memory, and I/O devices.
microcontroller A single chip microcomputer like the Texas Instruments MSPM0, NXP 9S12, Intel 8051, Atmel ATmega328, Atmel SAM3X8E, PIC16, or the Texas Instruments MSP430.
minimally intrusive A characteristic of a debugging instrument when the presence of the collection of information itself has a small but insignificant effect on the parameters being measured.
mnemonic The symbolic name of an operation code, like mov, str, push.
modularity A measure of organization of a system. A modular system maximizes the number of modules, but minimizes coupling. Coupling is the interaction between modules. Examples of coupling include: bandwidth coupling (data flow), invocation coupling (number of times one module calls another module), and control coupling (actions in one module affect behavior of another module). Sharing global variables is poor design because it creates hard to understand control coupling.


monitor or debugger window A monitor is a debugger feature that allows us to passively view strategic software parameters during the real-time execution of our program. An effective monitor is one that has minimal effect on the performance of the system. When debugging software on a windows-based machine, we can often set up a debugger window that displays the current value of certain software variables.

MSB The most significant bit in a number system is the bit with the greatest significance, usually the left-most bit. If the number system is signed, then the MSB signifies positive (0) or negative (1).

multi-threaded A system with multiple threads (e.g., main program and interrupt service routines) that cooperate towards a common overall goal.

negative logic A signal where the true value has a lower voltage than the false value. In digital logic, true is 0 and false is 1; in terms of voltage, true is less than 0.7 volts and false is greater than 2 volts. In RS232 protocol, true is -5.5 volts and false is +5.5 volts. Contrast with positive logic.

nibble 4 binary bits or 1 hexadecimal digit.

nonatomic Software execution that can be divided or interrupted. Most lines of C code require multiple assembly language instructions to execute, therefore an interrupt may occur in the middle of a line of C code. The store multiple and load multiple instructions, PUSH and POP, are nonatomic.

nonintrusive A characteristic of a debugging instrument when the presence of the collection of information itself does not affect the parameters being measured. Nonintrusiveness is the characteristic or quality of a debugger that allows the software/hardware system to operate normally as if the debugger did not exist. Intrusiveness is used as a measure of the degree of perturbation caused in program performance by an instrument. For example, a print statement added to your source code and single-stepping are very intrusive because they significantly affect the real-time interaction of the hardware and software. When a program interacts with real-time events, the performance is significantly altered. On the other hand, an instrument that outputs strategic information on LEDs (that requires just 1 µs to execute) is much less intrusive. A logic analyzer that passively monitors the address and data bus is completely nonintrusive.

noninvasive/invasive Noninvasiveness is the characteristic or quality of a debugger that makes the order of invocation immaterial. The debugger and the user program co-exist in the same global environment. On the other hand, an invasive debugger requires the user program to execute within an environment defined by the debugger. The debugger is invoked first and the program is then loaded either by the debugger or by the user from within the debugger. Invasiveness is also a measure of the degree of source code modification to debug or monitor a program. A resident debugger like the serial monitor is invasive because it exists first and then your program is loaded on top of it. This program development environment is invasive because the UART interrupts with the serial monitor are different from those of the eventual single-chip embedded application.

nonreentrant A software module that, once started by one thread, cannot be interrupted and executed by a second thread. Nonreentrant modules usually involve nonatomic accesses to global variables or I/O ports: read modify write, write followed by read, or a multistep write.

nonvolatile A condition where information is not lost when power is removed. When power is restored, then the information is in the state that occurred when the power was removed.

nonvolatile RAM Read/write storage that achieves its long term storage ability because it includes a battery.
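The MSB and nibble entries above can be made concrete with a short C sketch for 8-bit values. The function names high_nibble, low_nibble, and msb8 are hypothetical helpers written for this illustration.

```c
#include <stdint.h>

// Upper 4 bits of a byte, returned as a value 0 to 15 (one hex digit)
uint8_t high_nibble(uint8_t b) {
  return (uint8_t)(b >> 4);
}

// Lower 4 bits of a byte, returned as a value 0 to 15 (one hex digit)
uint8_t low_nibble(uint8_t b) {
  return (uint8_t)(b & 0x0F);
}

// Most significant bit of a byte: for a signed value, 1 means negative
uint8_t msb8(uint8_t b) {
  return (uint8_t)(b >> 7);
}
```

With the byte 0x47, the two nibbles are 4 and 7, and the MSB is 0.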
Nyquist Theorem If an input signal is captured by an ADC at the regular rate of fs samples/sec, then the digital sequence can accurately represent the 0 to ½ fs frequency components of the original signal.

object code Programs in machine readable format created by the compiler or assembler.

odd parity A communication protocol where the number of ones in the data plus a parity bit is an odd number. Contrast with even parity.
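The Nyquist Theorem entry above can be turned into a simple check. The sketch below is one possible reading, in which a component exactly at ½ fs is treated as not representable (a conservative assumption of this sketch). The names nyquist_limit_hz and is_representable are hypothetical.

```c
#include <stdint.h>

// Upper edge (Hz) of the band that can be represented when
// sampling at fs_hz samples/sec, rounded down to an integer
uint32_t nyquist_limit_hz(uint32_t fs_hz) {
  return fs_hz / 2;
}

// Returns 1 if a component at f_hz lies strictly below fs/2, else 0.
// The comparison 2*f < fs avoids rounding when fs_hz is odd.
int is_representable(uint32_t f_hz, uint32_t fs_hz) {
  return ((uint64_t)2 * f_hz) < (uint64_t)fs_hz;
}
```

For instance, when sampling at 1000 Hz a 400 Hz component passes the check, but a 600 Hz component does not.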


op code, opcode, or operation code A specific instruction executed by the computer. The op code along with the operand completely specify the function to be performed. In assembly language programming, the op code is represented by its mnemonic, like LDR. During execution, the op code is stored as a machine code loaded in memory.

open collector A digital logic output that has two states, low and HiZ. On CMOS circuits, it is sometimes called open drain.

open drain A CMOS digital logic output that has two states, low and HiZ. Often used interchangeably with the term open collector.

operand The second part of an instruction that specifies either the data or the address for that instruction. An assembly instruction typically has an op code and an operand (e.g., #55). Instructions that use inherent addressing mode have no operand field.

operating system System software for managing computer resources and facilitating common functions like input/output, memory management, and file system.

oscilloscope A hardware debugging tool that allows you to visualize one or two analog signals versus time.

output impedance A specification of how strong an output signal is. Zout is the open circuit output voltage divided by the short circuit output current. In the 2-bit DAC made with a 10kΩ and a 20kΩ resistor, if both digital signals are high, the open circuit voltage is 3.3V. If the output of the DAC is shorted, 3.3V/10kΩ = 0.33mA will flow through the 10kΩ, and 3.3V/20kΩ = 0.165mA will flow through the 20kΩ, making a total short circuit current of about 0.5mA. Zout = 3.3V/0.5mA = 6.6kΩ.

overflow An error that occurs when the result of a calculation exceeds the range of the number system. For example, with 8-bit unsigned integers, 200+57 will yield the incorrect result of 1.

overrun error An error that occurs when the receiver gets a new frame but the data register and shift register already have information.

parallel port A port where all signals are available simultaneously. In this book the parallel ports are 8 bits wide. Some ports have fewer than 8 bits. Also called GPIO.

parallel programming A computer system that supports simultaneous execution of two or more software tasks. Compare to concurrent programming.

parity A communication protocol to detect errors during transmission. The data plus a parity bit is transmitted, and the receiver checks the data plus parity. See even parity and odd parity.

PC-relative An addressing mode where the effective address is calculated by its position relative to the current value of the program counter.

performance debugging or profiling The process of acquiring or modifying timing characteristics and execution patterns of a program and the process of instrumenting a program for such purposes is called performance debugging or profiling. Contrast with functional debugging.

periodic polling A software/hardware synchronization method that is a combination of interrupts and busy wait. An interrupt occurs at a regular rate (periodic) independent of the hardware status. The interrupt handler checks the hardware device (polls) to determine if its operation is complete. The software usually can perform other work while waiting for the hardware.

personal computer system A small general purpose computer system having a price low enough for individual people to afford and used for personal tasks. A laptop is an example of a personal computer.

port External pins through which the microcomputer can perform input/output. Same as I/O port.

positive logic A signal where the true value has a higher voltage than the false value. In digital logic, true is 1 and false is 0; in terms of voltage, true is greater than 2 volts and false is less than 0.7 volts. I2C protocol is positive logic: true is greater than 0V and false is 0 volts. Contrast with negative logic.
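The arithmetic in the output impedance entry above can be checked with a few lines of C. This is a sketch under the entry's own assumptions (both DAC inputs high at 3.3V, one resistor per input); the function names are hypothetical.

```c
// Short-circuit current (amps) of a two-resistor DAC when both digital
// inputs are at v volts: each resistor carries v/r into the shorted output
double dac_short_circuit_current(double v, double r_a, double r_b) {
  return v / r_a + v / r_b;
}

// Zout (ohms) = open-circuit voltage divided by short-circuit current
double output_impedance(double v_open, double i_short) {
  return v_open / i_short;
}
```

Called with 3.3V, 10kΩ, and 20kΩ, the current is 0.495mA, which the entry rounds to about 0.5mA. Without that rounding, Zout comes out near 6.67kΩ, the same as 10kΩ in parallel with 20kΩ.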


precision For an input signal, it is the number of distinguishable input signals that can be reliably detected by the measurement. For an output signal, it is the number of different output parameters that can be produced by the system. For a number system, precision is the number of distinct or different values of a number system in units of "alternatives". The precision of a number system is also the number of binary digits required to represent all its numbers in units of "bits".

priority When two requests for service are made simultaneously, priority determines which order to process them. If we are processing a low priority task and a higher priority request is received, we will suspend the low priority task, execute the high priority task to completion, and then return to the lower priority task.

private Can be accessed only by software functions in that module. Contrast with public.

private variable A variable that is used by a single module, and not shared with other modules.

probability mass function (PMF) A plot showing the shape of the noise process of a signal. The input is fixed, and the signal is measured multiple times. The y-axis is the number of times the value on the x-axis was observed.

process The execution of software that does not necessarily cooperate with other processes. Contrast with thread. Processes generally do not share global memory or I/O devices.

producer-consumer A multi-threaded system where the producers generate new data, and the consumers process or output the data.

profiling The process of acquiring or modifying timing characteristics and execution patterns of a program and the process of instrumenting a program for such purposes is called performance debugging or profiling. Same as performance debugging.

program counter (PC) A register in the processor that points to the memory containing the instruction to execute next.
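The precision entry above gives two units for the same quantity, alternatives and bits. The C sketch below converts between them; the function names are hypothetical and the code assumes at least one alternative.

```c
#include <stdint.h>

// Smallest number of bits whose 2^bits alternatives cover the request.
// Assumes alternatives >= 1.
uint32_t bits_for_alternatives(uint32_t alternatives) {
  uint32_t bits = 0;
  uint64_t capacity = 1;            // capacity = 2^bits
  while (capacity < alternatives) {
    capacity <<= 1;
    bits++;
  }
  return bits;
}

// Number of alternatives for a given bit count (valid for bits 0 to 31)
uint32_t alternatives_for_bits(uint32_t bits) {
  return (uint32_t)1 << bits;
}
```

For example, 256 alternatives need 8 bits, and 1000 alternatives need 10 bits because 512 is too few.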
program status register (PSR) Register in the processor that contains the status of the previous ALU operation, as well as some operating mode flags such as the interrupt enable bit.

promotion Increasing the precision of a number for convenience or to avoid overflow errors during calculations.

pseudo-code A shorthand for describing a software algorithm. The exact format is not defined, but many programmers use their favorite high-level language syntax (like C) without paying rigorous attention to the punctuation.

pseudo op Operations included in the program that are not executed by the computer at run time, but rather are interpreted by the assembler during the assembly process. Same as assembly directive.

public Can be accessed by any software module. Contrast with private.

public variable A variable that is shared by multiple programs or threads.

pulse width modulation A technique to deliver a variable signal (voltage, power, and energy) using an on/off signal with a variable percentage of time the signal is on (duty cycle). Same as variable duty cycle.

RAM Random Access Memory, a type of memory where the information can be stored and retrieved easily and quickly (1 bus cycle). Since it is volatile, the information is lost when power is removed.

range Includes both the smallest possible and the largest possible signal (input or output). The difference between the largest and smallest input that can be measured by the instrument. The units are in the units of the measurand. When precision is in alternatives, range = precision • resolution.

real-time A system that can guarantee an upper bound (worst case) on latency.

real-time computer system A system where time-critical operations occur when needed.

recursion A programming technique where a function calls itself. See also linear recursion, tail recursion, and binary recursion.
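The relation range = precision • resolution from the range entry above can be sketched in C. The function names are hypothetical, and the 3.3V, 4096-alternative converter used in the note below is an assumed example rather than a specification taken from this glossary.

```c
#include <stdint.h>

// resolution = range / precision, with precision given in alternatives
double resolution_from_range(double range, uint32_t precision_alternatives) {
  return range / (double)precision_alternatives;
}

// range = precision * resolution, with precision given in alternatives
double range_from_resolution(double resolution, uint32_t precision_alternatives) {
  return resolution * (double)precision_alternatives;
}
```

Under the assumed example of a 0 to 3.3V input with 4096 alternatives, the resolution is 3.3/4096, a little under 1mV.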


reduced instruction set computer (RISC) A computer with a few instructions, instructions with fixed lengths, instructions that execute in 1 or 2 bus cycles, only load and store can access memory, no one instruction can both read and write memory, many identical general purpose registers, and a limited number of addressing modes. Contrast to CISC.

reentrant A software module that can be started by one thread, interrupted and executed by a second thread. A reentrant module allows multiple threads to properly execute the desired function.

registers High-speed storage located in the processor. The registers in the ARM® Cortex™-M processor include R0 through R15.

reproducibility (or repeatability) A parameter specifying how consistent over time the measurement is when the input remains fixed.

requirements document A formal description of what the system will do in a very complete way, but not including how it will be done. It should be unambiguous, complete, verifiable, and modifiable.

reset vector The 32-bit value at memory locations 4-7 specifying where the software should start after power is turned on or after a hardware reset.

resolution For an input signal, it is the smallest change in the input parameter that can be reliably detected by the measurement. For an output signal, it is the smallest change in the output parameter that can be produced by the system. Range equals precision times resolution, where precision is given in alternatives.

ritual Software, usually executed once at the beginning of the program, that defines the operational modes of the I/O ports.

ROM Read Only Memory, a type of memory where the information is programmed into the device once, but can be accessed quickly. It is low cost, must be purchased in high volume, and can be programmed only once. The MSPM0 does not have ROM; it has flash EEPROM.
roundoff The error that occurs in a fixed-point or floating-point calculation when the least significant bits of an intermediate calculation are discarded so the result can fit into the finite precision.

sampling rate The rate at which data is collected in a data acquisition system. Sampling rate applies to both the ADC while collecting data, and the DAC while outputting data.

scan or scanpoint Any instrument used to produce a side effect without causing a break (halt) is a scan. Therefore, a scan may be used to gather data passively or to modify functions of a program. Examples include software added to your source code that simply outputs or modifies a global variable without halting. A scanpoint is triggered in a manner similar to a breakpoint but a scanpoint simply records data at that time without halting execution.

scope A logic analyzer or an oscilloscope, hardware debugging tools that allow you to visualize multiple digital or analog signals versus time.

semaphore A system function with two operations (wait and signal) that provide for thread synchronization and resource sharing.

sensitivity The sensitivity of a transducer is the slope of the output versus input response. The sensitivity of a data acquisition system that detects events is the percentage of actual events that are properly recognized by the system.

serial communication A process where information is transmitted one bit at a time.

serial peripheral interface (SPI) A device to transmit data with synchronous serial communication protocol. The clock is shared on both sides. Same as synchronous serial interface (SSI).

serial port An I/O port where the bits are input or output one at a time.

setup time When latching data into a device with a rising or falling edge of a clock, the setup time is the time before the active edge of the clock that the data must be valid. Contrast with hold time.
signed two's complement binary A mechanism to represent signed integers where 1 followed by all 0's is the most negative number, all 1's represents the value -1, all 0's represents the value 0, and 0 followed by all 1's is the largest positive number.
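The four special patterns named in the entry above can be verified with a short C sketch for the 8-bit case. The function name twos_complement_value8 is hypothetical; it treats bit 7 as a basis element with weight -128 and the remaining bits as ordinary positive weights.

```c
#include <stdint.h>

// Value of an 8-bit two's complement pattern.
// Bit 7 has weight -128; bits 6-0 have weights +64 down to +1.
int32_t twos_complement_value8(uint8_t pattern) {
  int32_t value = 0;
  if (pattern & 0x80) {
    value = -128;                      // sign bit contributes -128
  }
  value += (int32_t)(pattern & 0x7F);  // add the positive weights
  return value;
}
```

In this sketch 0x80 gives -128 (most negative), 0xFF gives -1, 0x00 gives 0, and 0x7F gives +127 (largest positive).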


sign-magnitude binary A mechanism to represent signed integers where the most significant bit is set if the number is negative, and the remaining bits represent the magnitude as an unsigned binary.

simplex channel Hardware that allows bits (information, error checking, synchronization or overhead) to transfer only in one direction. Contrast with half-duplex and full-duplex channels.

simplex communication A system that allows information to transfer only in one direction.

simulator A simulator is a software application, which simulates or mimics the operation of a processor or computer system. Most simulators recreate only simple I/O ports and often do not effectively duplicate the real-time interactions of the software/hardware interface. On the other hand, they do provide a simple and interactive mechanism to test software.

single-pole switch One switch that acts independent from other switches in the system. Contrast with double pole.

single-throw switch A switch with two contact connections. The two contacts may be connected or disconnected. Contrast with double-throw.

soft real-time A system that implements best effort to execute critical tasks on time, typically using a priority scheduler. Once a deadline has passed, the value of completing the task diminishes over time. Compare to hard real-time and firm real-time.

software interrupt vector The 32-bit value at memory locations in low memory specifying where the software should go after executing a software interrupt instruction, svc.

software maintenance Process of verifying, changing, correcting, enhancing, and extending software.

source code Programs in human readable format created with an editor.

space A digital value of false or logic 0 used in serial communication. Contrast with mark.

specificity The specificity of a transducer is the relative sensitivity of the device to the signal of interest versus the sensitivity of the device to other unwanted signals. The specificity of a data acquisition system that detects events is the percentage of events detected by the system that are actually true.

stabilize The process of stabilizing a software system involves specifying all its inputs. When a system is stabilized, the output results are consistently repeatable. Stabilizing a system with multiple real-time events, like input devices and time-dependent conditions, can be difficult to accomplish. It often involves replacing input hardware with sequential reads from an array or disk file.

stack Last in first out data structure located in RAM and used to temporarily save information.

stack pointer (SP) A register in the processor that points to the RAM location of the stack.

start bit An overhead bit(s) specifying the beginning of the frame, used in serial communication to synchronize the receiver shift register with the transmitter clock. See also stop bit, even parity and odd parity.

static efficiency A measure of program size, which is number of memory bytes required. In an embedded system we need to specify both RAM size for variables/stack and ROM size for programs/constants.

static RAM Volatile read/write storage built from three transistors having fast speed, and not requiring refresh. RAM on the MSPM0 is static RAM. It will keep its value even if the processor goes to sleep. Contrast with dynamic RAM.

stepper motor A motor that moves in discrete steps.

stop bit An overhead bit(s) specifying the end of the frame, used in serial communication to separate one frame from the next. See also start bit, even parity and odd parity.

string A sequence of ASCII characters, usually terminated with a zero.

symbol table A mapping from a symbolic name to its corresponding 32-bit address, generated by the assembler in pass one and displayed in the listing file.

synchronous protocol A system where the two devices share the same clock.

tachometer A sensor that measures the revolutions per second of a rotating shaft.
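For the sign-magnitude binary entry above, a minimal C sketch of the 8-bit case follows. The function name sign_magnitude_value8 is hypothetical.

```c
#include <stdint.h>

// Value of an 8-bit sign-magnitude pattern.
// Bit 7 is the sign (1 means negative); bits 6-0 are the unsigned magnitude.
int32_t sign_magnitude_value8(uint8_t pattern) {
  int32_t magnitude = (int32_t)(pattern & 0x7F);
  if (pattern & 0x80) {
    return -magnitude;
  }
  return magnitude;
}
```

One consequence visible in this sketch is that two patterns, 0x00 and 0x80, both decode to zero.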
tail recursion A technique where the recursive call occurs as the last action taken by the function. See also recursion, binary recursion, and linear recursion.
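As an illustration of the tail recursion entry above, the following C sketch sums the integers 1 through n with the recursive call as the last action of the function. The names sum_to and sum_to_helper are hypothetical, and the accumulator parameter is one common way to arrange for the call to be in tail position.

```c
#include <stdint.h>

// Tail-recursive helper: nothing remains to be done after the recursive
// call returns, because the running sum is carried in acc
static uint32_t sum_to_helper(uint32_t n, uint32_t acc) {
  if (n == 0) {
    return acc;                          // base case
  }
  return sum_to_helper(n - 1, acc + n);  // recursive call is the last action
}

// Sum of 1 + 2 + ... + n
uint32_t sum_to(uint32_t n) {
  return sum_to_helper(n, 0);
}
```

A C compiler is not required to turn a tail call into a loop, so on a small embedded stack the recursion depth should still be kept in mind.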


thread The execution of software that cooperates with other threads. A thread embodies the action of the software. One concept describes a thread as the sequence of operations including the input and output data. Contrast with process.

throughput The information transfer rate, the amount of data transferred per second. Same as bandwidth.

time constant The time to reach 63.2% of the final output after the input is instantaneously increased.

time profile and execution profile Time profile refers to the timing characteristic of a program and execution profile refers to the execution pattern of a program.

toggle Change 0 to 1 or 1 to 0. A toggle switch is one that if it is off when you push it, it will turn on. If it is on when you push it, it will turn off. Triple toggle is a debugging technique where one toggles three times in an ISR, so you can measure the time within an ISR and the time between ISR invocations.

transducer A device that converts one type of signal into another type.

trigger flag Status bit that is set by hardware to signify an external event has occurred. Same as interrupt flag.

tristate The state of a tristate logic output when HiZ or not driven.

tristate logic A digital logic device that has three output states low, high, and HiZ.

truncation The act of discarding bits as a number is converted from one format to another.

two-pole switch Two separate and complete switches, which are activated together, same as double-pole.

two's complement A number system used to define signed integers. The MSB defines whether the number is negative (1) or positive (0). To negate a two's complement number, one first complements (flip from 0 to 1 or from 1 to 0) each bit, then adds 1 to the number.

unary operation A function that produces its result given a single input parameter. For example, negate, increment, and decrement are unary operations.
unbuffered I/O The hardware and software are tightly coupled so that both wait for each other during the transmission of data.

unipolar stepper motor A stepper motor where the current flows in only one direction (on/off) along the interface wires; a stepper with 5 or 6 interface wires.

universal asynchronous receiver/transmitter (UART) A device to transmit data with asynchronous serial communication protocol.

unsigned binary A mechanism to represent unsigned integers where all 0's represents the value 0, and all 1's represents the largest positive number.

vector An address at the end of memory containing the location of the interrupt service routines. See also reset vector and interrupt vector.

VIH If the input voltage is above this value, the input is considered high.

VIL If the input voltage is below this value, the input is considered low.

VOH The smallest possible output voltage when the signal is high, and the current is less than IOH.

VOL The largest possible output voltage when the signal is low, and the current is less than IOL.

volatile A condition where information is lost when power is removed. In C, volatile tells the compiler the value may change beyond the control of the software itself.

von Neumann architecture A computer architecture where instructions are fetched from the same bus as data are fetched.

vulnerable window Locations within a software module, which if an interrupt were to occur at one of these locations, then an error could occur (e.g., data lost, corrupted data, program crash, etc.) Same as critical section.

word Four bytes containing 32 bits. In C, we use long or unsigned long to create a word. In C99, we use int32_t or uint32_t to create a word.
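The negation rule in the two's complement entry above (complement each bit, then add 1) can be written directly in C for 8-bit patterns. The function name negate8 is hypothetical.

```c
#include <stdint.h>

// Negate an 8-bit two's complement pattern:
// step 1 flips every bit, step 2 adds 1 (any carry out of bit 7 is discarded)
uint8_t negate8(uint8_t x) {
  uint8_t flipped = (uint8_t)~x;    // complement each bit
  return (uint8_t)(flipped + 1U);   // then add 1
}
```

Negating 0x2D (+45) produces 0xD3 (-45), and negating 0xD3 returns 0x2D. The pattern 0x80 (-128) negates to itself, because +128 is outside the 8-bit signed range.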


Appendix 2. Solutions to Checkpoints

Checkpoint 1.1.1: An embedded system is a microcomputer with mechanical, chemical, and electrical devices attached to it, programmed for a specific dedicated purpose, and packaged up as a complete system.

Checkpoint 1.1.2: A microcontroller is a single-chip computer that includes a processor, memory and I/O devices.

Checkpoint 1.1.3: Encoding information as period, frequency, phase, or pulse width leads to lower cost, lower power and higher performance.

Checkpoint 1.1.4: Real time means there is a small and bounded delay between when a task is requested and when it is actually invoked.

Checkpoint 1.3.1: Add the powers of 2 for each digit that is 1. 1•2⁷ + 1•2⁶ + 1•2⁵ + 1•2⁴ + 1•2³ + 1•2² + 1•2¹ + 1•2⁰ = 255

Checkpoint 1.3.2: 15•16¹ + 14•16⁰ = 254

Checkpoint 1.3.3: First, divide the binary into 4-bit nibbles, then convert the two 4-bit nibbles: 0b0100=0x4 and 0b0111=0x7. Third, combine the two hex digits into one number 0x47.

Checkpoint 1.3.4: First, divide the binary into 4-bit nibbles, then convert the three 4-bit nibbles: 0b1101=0xD, 0b1010=0xA and 0b1011=0xB. Third, combine the three hex digits into one number 0xDAB.

Checkpoint 1.3.5: First, convert the two 4-bit nibbles: 0x4=0b0100 and 0x9=0b1001. Second, combine the 8 binary bits into one binary number 0b01001001.

Checkpoint 1.3.6: Four binary bits are required for each hex digit. 4*5 is 20 bits.

Checkpoint 1.3.7: There are 8 bits/byte, so 60 bits will take 60/8 = 7.5, or 8 bytes of memory.

Checkpoint 1.3.8: The rule of thumb says 2³⁰ is about 1000³, which is a billion. 2⁵ is 32, so 2³⁵ ≈ 32 billion.

Checkpoint 1.3.9: The rule of thumb says 2⁴⁰ is about 1000⁴, which is a trillion. 2² is 4, so 2⁴² ≈ 4 trillion.

Checkpoint 1.3.10: 0•2⁷ + 1•2⁶ + 1•2⁵ + 0•2⁴ + 1•2³ + 0•2² + 1•2¹ + 1•2⁰ = 64+32+8+2+1 = 107

Checkpoint 1.3.11: 4*16+6 = 64+6 = 70

Checkpoint 1.3.12: We start by setting the running total to the number we wish to convert. We start with the basis element associated with the MSB and work towards the basis element for the LSB. We must also subtract basis elements from the running total as we determine they are needed. If the basis element in question is less than or equal to the running total, then we need that basis element.

Checkpoint 1.3.13: Combine binary basis elements to create the desired value. 45=32+8+4+1, so 45 = 0b00101101 = 0x2D.

Checkpoint 1.3.14: Combine binary basis elements to create the desired value. 200=128+64+8, so 200 = 0b11001000 = 0xC8.

Checkpoint 1.3.15: Combine signed binary basis elements to create the desired value. -128+64+32+8+2 = -22.

Checkpoint 1.3.16: They are the same, because bit 7 is zero.

Checkpoint 1.3.17: Combine signed binary basis elements to create the desired value. -45 = -128+64+16+2+1 = 0b11010011 = 0xD3.

Checkpoint 1.3.18: Because the range of 8-bit signed numbers is -128 to +127.

Checkpoint 1.3.19: 8192+64+32+8+2 = 8298.

Checkpoint 1.3.20: 1*4096+2*256+3*16+4 = 4660.

Checkpoint 1.3.21: 1234 = 4*256+13*16+2 = 0x04D2.

Checkpoint 1.3.22: 10000 = 8192+1024+512+256+16 = 0010011100010000₂.

Checkpoint 1.3.23: 1*4096+2*256+3*16+4 = 4660.

Checkpoint 1.3.24: -32768 + 2*4096+11*256+12*16+13 = -21555.

Checkpoint 1.3.25: 1234 = 4*256+13*16+2 = 0x04D2.

Checkpoint 1.3.26: -10000 = -32768+16384+4096+2048+128+64+32+16 = 1101100011110000₂.

Checkpoint 1.3.27: Looking in the ASCII table we see '0' is 0x30 (or 48).

Checkpoint 1.3.28: Let c be the character '0' to '9', n = c - 0x30.

Checkpoint 1.3.29: Look up each letter, concatenate, add 0 at end, 0x48656C6C6F20576F726C6400.

Checkpoint 1.5.1: The addressing mode defines the format for the effective address for that instruction. In other words, it defines how the instruction will access the data it needs.

Checkpoint 1.5.2: Add 8 to R3 to get 0x2020.0008, R3 is not changed.

Checkpoint 1.5.3: Add R1+R3 to get 0x2020.0008, R1 and R3 are not changed.

Checkpoint 1.5.4: Bit-wise AND. 0x12345678 & 0x87654321 = 0x02244220. Bit-wise EOR. 0001^1000=1001. 0010^0111=0101. 0011^0110=0101. 0100^0101=0001. So 0x12345678 ^ 0x87654321 = 0x95511559.

Checkpoint 1.5.5: Change ORRS to BICS.

Checkpoint 1.5.6: Read N, shift, store into M
    LDR   R2,=N        // R2 = &N
    MOVS  R3,#0
    LDRSH R1,[R2,R3]   // R1 = N (16-bit signed)
    LSLS  R0,R1,#2     // R0 = N...
262 operand field ........ .... ........ ........ ....... ... .... .. .............. ..... 2 1 operating system .. ...... ........ ...... ..... .... ........... ..... ........ 3 18 OR gate ...... .............. ......... ... ...... ............. ..... ........... .. 365 origin .. ..... .. ...... ........ ................. ... ...... ............ .. .. ......... 49 ORRS ......... ............................. ....... ............. .. ..... . 28, 352 oscilloscope ....................................... ................ ... ... . 2 15 output impedance ........ ...... ... ... .. ......... ..... ....... ........... 318 output port ....... .... ...... ...... ............................ ...... .... ..... 67 overtlow .................... .... .. .... .. . I I , 31, 35, 232, 235, 237 overrun .. .... ................ .... ... .. ..... ...... .... ..... .. .............. .. 280 parallel port ............ ...... .... ... ........ ... ........ .......... ......... 3 18 parallel programmi ng ........... ...... ...... ... ...... ............... 198 paral lel resistors ......... .... ... ..... .... .. ........ ..... ..... .... ...... ... 63 parentheses ........ ... ... .. .... ... .... ..... .......... ... .. ... ........... .. I 19 parity .................... ......... ........... ... ........ ........ ... ... 317, 3 18 parity error ................... ............ ... ... .............. .. ..... ..... . 280 passive sign convention ................... ........... .... ............ 59 PC .......... ........... ... .................... .... ... ........... ... .............. 17 PC-relative addressing ..................... .. ..... ..... . 22, 24, 318 performance debugging ................ ...... ..... . 175, 318, 319 periodic intermpt ................................... .. .. .... ... ........ 207 periodic polling ....................................... .......... 196, 318 personal computer ........ .... ........ ... ....... .... ............ ...... 318 physics ..... ... .. 
......... .... ....... .......... ... ..... .......... .......... .. 299 pin .. ........ ....... ..... ........ ....... .... .... .. ... .... ... ... ...... ..... .. .. .... 65 PfNCM .... ...... ............... ... ........ ... .............. ...... ... .. ....... 69 pitch ................. ..... .. ... .... ..... ..... ... .. .. .... ...................... 213 place-va lue ......... .......... .... ................ ....... ..... ......... .... .... 5 pmf ........ ... ... ... .... ....... ... .... .. ..... ........... ...... ................ 255 pointer ...... ........ ............ ....... .............. ..... ... .. 22, 136, 140 polling .... ..... ...... ......... ......... ...... ................ ................ 196 POP ..... .. .. ............... ... ..... ....... ...... ... ... .. 39, 224, 225, 353 port ............................................. ...................... .... ...... 65 port a system .................................... ........ ........... ...... 154 portabi lity .. ... .. ... .. .... ...... ..... .. ........... .... .............. 93, 149 positive logic ............. ............................... ........ ...... 3, 80 power ..... .... ...... .... ............. .... .......... ............. .... .. ... ...... 61 precedence .. ...... ........ ........ ..... .. ....... .... ..... ....... .. 122, 368

373

precision ..... ?, 48, 209, 251, 260
precision of a DAC ..... 210
PRIMASK ..... 18, 199, 204
printf ..... 145
priority ..... 207, 290
private ..... 150, 221
probability mass function ..... 255, 319
procedure ..... 54, 133
process ..... 199
processor ..... 1
producer ..... 267
producer/consumer ..... 267, 269
profile ..... 102
profiling ..... 288, 289, 318, 319
program counter ..... 17
program status register ..... 18
promotion ..... 26, 233
prototype ..... 134
pseudo-code ..... 319
pseudo-ops ..... 42
PSR ..... 18
p-type transistor ..... 364
public ..... 150, 221
pull-down ..... 81
pull-up ..... 81
pulse width modulation ..... 96, 319
punctuation ..... 116
PUSH ..... 39, 224, 225, 354
PWM ..... 71
quotation marks ..... 118
race condition ..... 267
RAM ..... 319
random access ..... 48
random access memory ..... 1
range ..... 209, 251, 260, 319
range of a DAC ..... 210
readable output port ..... 67
real time ..... 319
real-time system ..... 2, 194
recursion ..... 240
reduced instruction set computer ..... 16
reentrant ..... 264, 320
reentrant code ..... 222
register ..... 16
remainder ..... 46
repeatability ..... 320
reproducibility ..... 260, 320
requirements document ..... 57, 291, 320
resistor ..... 59
resistors in parallel ..... 63
resistors in series ..... 63
resolution ..... 209, 234, 251, 255, 260, 320
resolution of a DAC ..... 211



response time ..... 194
RISC ..... 16, 320
ritual ..... 320
ROM ..... 320
rotor ..... 90
row major ..... 293
RSBS ..... 36, 355, 380
RXFE ..... 283
sampling rate ..... 207, 320
scan ..... 320
ScanPoint ..... 320
SCB->SHP ..... 208

security ..... 149
selection operator ..... 129
semaphore ..... 249, 320
semicolons ..... 116
sensitivity ..... 320
sentinel ..... 50
separation of policy from mechanism ..... 151
sequential access ..... 48
serial communication ..... 320
serial port ..... 320
serial transmission ..... 278
series resistors ..... 63
setup time ..... 320
shift ..... 29
sign extension ..... 26
signal to noise ratio ..... 215, 255
signed 2's complement number ..... 13
simplex communication system ..... 321
simulator ..... 321
single pole ..... 80
single throw ..... 80
SNR ..... 255
soft real time ..... 195, 321
software ..... 1
software maintenance ..... 321
software quality ..... 166
solid state relay ..... 94
sorting ..... 139
sound wav file ..... 303
source code ..... 21, 52
SP, stack pointer ..... 17, 39
space ..... 321
speaker ..... 213
specificity ..... 321
spectrum analyzer ..... 215
SPI ..... 71, 245, 247, 320
sprite ..... 297
SPST ..... 80
SSR ..... 94
stabilization ..... 169
stabilize ..... 170, 321

stack ..... 39, 321
    empty ..... 41
    next ..... 40
    overflow ..... 40
    pop data ..... 39
    push data ..... 39
    top ..... 40
    underflow ..... 40
stack frame ..... 227
stack frame pointer ..... 228
stack overflow ..... 225
stack pointer ..... 39
stack rules ..... 224
stack size ..... 224, 225
stack underflow ..... 225
standard deviation ..... 255
standard error ..... 259
start bit ..... 278
Startup.s ..... 204
state graph ..... 183
static ..... 214
static efficiency ..... 167
static RAM ..... 321
static variable ..... 221
stator ..... 90
stepper motor ..... 90
stepwise refinement ..... 109
stop bit ..... 278
STR ..... 27, 356, 379
STRB ..... 357, 379
strcpy ..... 143
STRH ..... 358, 379
string ..... 50, 136
string copy ..... 144
struct ..... 181
structured programming ..... 54, 110
style rules ..... 162
subroutine ..... 54, 133
subroutine call ..... 338
SUBS ..... 36, 359, 380
subtraction ..... 31, 36
successive refinement ..... 109
SVC ..... 360, 380
switch ..... 82, 117
switch bounce ..... 82, 302
switch interface ..... 80
synchronous protocol ..... 321
system capacity ..... 273
systematic decomposition ..... 109
SysTick ..... 100, 207
SysTick->CTRL ..... 100, 208
SysTick->LOAD ..... 100, 208
SysTick->VAL ..... 100, 208

T bit ..... 18
tail chaining ..... 207
TCP/IP ..... 263
termination code ..... 50
testing ..... 109
thread ..... 198, 322
thread mode ..... 206
thread profile ..... 289
throughput ..... 195, 322
time constant ..... 322
time jitter ..... 259
time profile ..... 322
time quantizing ..... 209
timer ..... 102
toaster ..... 55
toggle ..... 87, 131, 322
token ..... 116
top ..... 40
topdown ..... 109, 158, 159
topology ..... 261
transducer ..... 322
trap ..... 311
trigger ..... 198
trigger flag ..... 322
triple toggle ..... 208, 322
tristate ..... 322
tristate logic ..... 322
two's complement ..... 9
TXFF ..... 284
typedef ..... 181


UART ..... 71, 278, 322
UART0->RXDATA ..... 279
UART0->TXDATA ..... 279
UDP ..... 263
ULN2003B ..... 86, 368
unaligned ..... 26
unary operation ..... 27
unbuffered I/O ..... 322
underflow ..... 232
unfriendly ..... 79
Universal Asynchronous Receiver/Transmitter ..... 278
unsigned number ..... 8, 11, 13
V bit ..... 18, 37
variable ..... 120
vectors ..... 203
volatile ..... 1, 121, 322
voltage ..... 59
voltage divider rule ..... 63
voltmeter ..... 105
von Neumann architecture ..... 1
vulnerable window ..... 264, 322
WFI ..... 380
WFI instruction ..... 361
while loop ..... 131
white-box testing ..... 169
word ..... 5, 13, 322
Z bit ..... 18
zero pad ..... 26



Reference Material

ISR name in startup    NVIC priority   Priority bits   NVIC enable      Enable bit
PendSV_Handler         SCB->SHP[1]     23-22
SysTick_Handler        SCB->SHP[1]     31-30
GROUP0_IRQHandler      NVIC->IP[0]     7-6             NVIC->ISER[0]    0
GROUP1_IRQHandler      NVIC->IP[0]     15-14           NVIC->ISER[0]    1
TIMG8_IRQHandler       NVIC->IP[0]     23-22           NVIC->ISER[0]    2
UART3_IRQHandler       NVIC->IP[0]     31-30           NVIC->ISER[0]    3
ADC0_IRQHandler        NVIC->IP[1]     7-6             NVIC->ISER[0]    4
ADC1_IRQHandler        NVIC->IP[1]     15-14           NVIC->ISER[0]    5
CANFD0_IRQHandler      NVIC->IP[1]     23-22           NVIC->ISER[0]    6
DAC0_IRQHandler        NVIC->IP[1]     31-30           NVIC->ISER[0]    7
SPI0_IRQHandler        NVIC->IP[2]     15-14           NVIC->ISER[0]    9
SPI1_IRQHandler        NVIC->IP[2]     23-22           NVIC->ISER[0]    10
UART1_IRQHandler       NVIC->IP[3]     15-14           NVIC->ISER[0]    13
UART2_IRQHandler       NVIC->IP[3]     23-22           NVIC->ISER[0]    14
UART0_IRQHandler       NVIC->IP[3]     31-30           NVIC->ISER[0]    15
TIMG0_IRQHandler       NVIC->IP[4]     7-6             NVIC->ISER[0]    16
TIMG6_IRQHandler       NVIC->IP[4]     15-14           NVIC->ISER[0]    17
TIMA0_IRQHandler       NVIC->IP[4]     23-22           NVIC->ISER[0]    18
TIMA1_IRQHandler       NVIC->IP[4]     31-30           NVIC->ISER[0]    19
TIMG7_IRQHandler       NVIC->IP[5]     7-6             NVIC->ISER[0]    20
TIMG12_IRQHandler      NVIC->IP[5]     15-14           NVIC->ISER[0]    21

Table 5.3.3. Some of the interrupt priority and enable registers for the MSPM0G3507.
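
The rows of Table 5.3.3 follow a regular pattern: interrupt number n (the enable bit in NVIC->ISER[0]) has its 2-bit priority field in NVIC->IP[n/4], with the field's low bit at position 8*(n%4)+6. The following is a minimal sketch of that arithmetic, assuming this pattern holds for every entry in the table. The function names are hypothetical and not from the book, and the code operates on an ordinary 32-bit value rather than on the hardware register so that it can be run on any computer.

```c
#include <stdint.h>

// Hypothetical helpers (illustration only, not from the book).
// Index of the NVIC->IP register holding the priority for interrupt irq.
static inline uint32_t PriorityRegIndex(uint32_t irq){
  return irq/4;               // four interrupts share one 32-bit register
}
// Bit position of the low bit of the 2-bit priority field: 6, 14, 22, or 30.
static inline uint32_t PriorityShift(uint32_t irq){
  return 8*(irq%4) + 6;
}
// Return a copy of reg with the priority field for irq set to priority (0 to 3).
// The other three priority fields in the register are left unchanged.
static inline uint32_t SetPriority(uint32_t reg, uint32_t irq, uint32_t priority){
  uint32_t shift = PriorityShift(irq);
  return (reg & ~(3u << shift)) | ((priority & 3u) << shift);
}
```

For example, UART0 is interrupt 15, so PriorityRegIndex(15) is 3 and PriorityShift(15) is 30, which agrees with the UART0_IRQHandler row (NVIC->IP[3], bits 31-30).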

C Data type       C99 Data type    Precision             Range
unsigned char     uint8_t          8-bit unsigned        0 to +255
signed char       int8_t           8-bit signed          -128 to +127
char              char             8-bit                 ASCII characters
unsigned int      unsigned int     compiler-dependent
int               int              compiler-dependent
unsigned short    uint16_t         16-bit unsigned       0 to +65535
short             int16_t          16-bit signed         -32768 to +32767
unsigned long     uint32_t         unsigned 32-bit       0 to 4294967295L
long              int32_t          signed 32-bit         -2147483648L to 2147483647L
float             float            32-bit float          ±10^-38 to ±10^+38
double            double           64-bit float          ±10^-308 to ±10^+308
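
Because the size of int is compiler-dependent, the C99 fixed-width types are the safer choice when the precision matters. The sketch below checks the sizes at compile time and shows what happens at the edge of the 8-bit unsigned range. The function name AddU8 is hypothetical and used only for illustration.

```c
#include <stdint.h>
#include <limits.h>

// Compile-time checks that the fixed-width types have the expected sizes.
_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
_Static_assert(sizeof(uint8_t)  == 1, "uint8_t is 8 bits");
_Static_assert(sizeof(int16_t)  == 2, "int16_t is 16 bits");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t is 32 bits");

// Hypothetical example: 8-bit unsigned addition wraps modulo 256,
// so the result always stays in the range 0 to 255.
uint8_t AddU8(uint8_t a, uint8_t b){
  return (uint8_t)(a + b);   // a+b is promoted to int, then truncated to 8 bits
}
```

Here AddU8(255, 1) returns 0, because 256 does not fit in the 0 to +255 range of an 8-bit unsigned number.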


Index  Mode1
0      PA0
1      PA1
6      PA2
13     PA7
18     PA8
19     PA9
20     PA10
21     PA11
33     PA12
34     PA13
35     PA14
36     PA15
37     PA16
38     PA17
39     PA18
45     PA21
46     PA22
52     PA23
53     PA24
54     PA25
58     PA26
59     PA27
2      PA28
3      PA29
4      PA30
5      PA31
11     PB0
12     PB1
14     PB2
15     PB3
16     PB4
17     PB5
22     PB6
23     PB7
24     PB8
25     PB9
26     PB10
27     PB11
28     PB12
29     PB13
30     PB14
31     PB15
32     PB16
42     PB17
43     PB18
44     PB19
47     PB20
48     PB21
49     PB22
50     PB23
51     PB24
55     PB25
56     PB26
57     PB27

Mode2 U0 T X U0 RX TG 8 C l CO OUT UI T X UI RX U0 TX UO RX U3 CTS U3 RTS U0 CTS U0 RTS C2 OUT U I TX U I RX U2 TX U2 RX U2 TX U2 RX U3 RX U3 TX RTC OUT U0 TX II SCL 11 SDA U0 RX U0 TX U0 RX U3 TX U3 RX UI TX UI RX UI T X Ul RX Ul CTS UI RTS TG0 CO TG0 C l U3 TX U3 RX SPI CS3 U2 TX U2 RX U2 TX U2 RX C2 OUT SP0 CS2 SPI POCI SPI PI CO SPI SCK SP0 CS3 U0 CTS U0 RTS C2 OUT

Mode3 10 SDA 10 SCL SP0 CS P0 CK OUT SP0 CS P0 SP0 PI CO SP0 POC I SP0 SC K sPo sc K SP0 POC I SP0 PI CO SP! CS2 SP ! POCI SP ! SCK SP! PI CO TG8 CO TG8 C l SP0 CS3 SP0 CS2 SPJ CS3 SP! CS P0 SP! CSP! 10 SDA U2 RTS U2 CTS 10 SCL SPJ CS2 SPI CS3 U2 CTS U2 RTS U3 CTS U3 RTS SPt CS P0 SP t POCI SPJ PICO SPJ SCK TC;g CO TC,8 C l TAO C2 TAO C3 SPJ POCI SPI PICO SPI SCK SP0 PI CO SP0 SC K SP0 POC I SP! CS P0 TGS CO TC;S C l CO OUT SP0 CSPI SP0 CSP0 SP0 CSPI SP! CS PI

Mode4 TAO CO TAO C l TG7 C l TG8 CO U0 RTS U0 CT S 10 SDA 10 SCL TG0 CO U3 RX U3 TX II SC L II SD A II SCL II SDA U I CTS U l RTS TAO C3 TAO C3 N TGl 2 C l TG S CO TGS C l TAO C3 TG8 CO TG8 C l TAO C3N TA I CO TA I C l II SCL II SDA TA I CO TA I C l SP0 CSP! SP0 CS2 TAO CO TAO C l C l OUT CK OUT TA FALi TG l 2 CO SP0 CS3 U3 CTS U3 RTS SPI CS PI SPI CS2 TGS C l T AO C2

T A FA L0 TA O C3 TA FAL2 TAO C3 TAO C3N

Mode5 TA FAL i TA FAL2 SP I CSP0 TAO C2 TAO CO TA O C l TAI CO TA I C l CAN TX TG0 C l TG l 2 CO TA I CO TA I C l TAO C3 TAO C3N TAO CO TAO C l TG0 CO TG0 C l TAO C3 TA FAL0 TA FAL2 TA FAL0 TG6 CO TG6 C l TG l 2 C l TAO C2 TAO C2N TAO CJ TAO C3N TAO C2 TAO C2N TG8 CO TG8 C l C l OU T TAO CON TG6 CO TG6 C l TAO C l TAO C I N TG l 2 C l TGS CO TGS C l TA I CO TA I C l U0 CT S TGl 2 CO

Mode6 TG8 C l TG8 IDX

Mode7 FCC IN TG8 CO

Mode8

TG8 !DX T A I CON R OUT TG l2 CO CO OUT TA O C3 CAN RX CK OUT TG8 IDX TA I C I N TG7 CO TG7 C l TG6 CO CK OUT U3 CTS U3 RTS TAO C IN CAN TX CAN RX TG7 CO

TG7 C l

TAO C l

TA O TA O TA O FCC TAO

CON C2 C2N IN C3N

T AI C I N II SDA II SC L

TA I TAO TA I TA I

CON C2N co Cl

TAO C2 FCC IN

TAO CON TG 7 CO TG7 C l

TG6 C l TG8 CO TAI C l

CK OUT

TG7 C l

TA I C l

UI CT S U I RTS TA I CON TA I C I N U2 CTS U2 RTS

TG6 CO TG6 C l

TA I co TA I C l

TG6 CO TG6 C l

TA I CON TAI C IN

TG7 CO TG7 C l TA I CO

TG8 IDX TG7 CO TG7 C l TA O C2 TAO C2N TG7 C l TA FA Li

TAO CO

TG l 2 C l

TAO C I N

TA I CON

TG6 CO TG6 C l

TA I c o TA I C l

TAO C l

Table 2.2.1. Bits 5:0 of IOMUX->SECCFG.PINCM specify the digital Mode for that pin.
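
Since the Mode occupies only bits 5:0, changing it should leave the other bits of the PINCM register alone. The following is a minimal sketch of that read-modify-write, written against an ordinary 32-bit value so that it runs on any computer. The names PINCM_MODE_MASK, SetPinMode, and GetPinMode are hypothetical; on the microcontroller the same arithmetic would be applied to the IOMUX->SECCFG.PINCM entry selected by the Index column.

```c
#include <stdint.h>

#define PINCM_MODE_MASK 0x3Fu   // bits 5:0 hold the digital Mode (hypothetical name)

// Return a copy of pincm with bits 5:0 replaced by mode; bits 31:6 are unchanged.
uint32_t SetPinMode(uint32_t pincm, uint32_t mode){
  return (pincm & ~PINCM_MODE_MASK) | (mode & PINCM_MODE_MASK);
}

// Extract the digital Mode from a PINCM value.
uint32_t GetPinMode(uint32_t pincm){
  return pincm & PINCM_MODE_MASK;
}
```

For example, SetPinMode(0x00000081, 2) returns 0x00000082: the Mode changes from 1 to 2 while bit 7 stays set.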

T A I C IN


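                              BITS 4 to 6
                 0     1     2     3     4     5     6     7
BITS 0 to 3  0   NUL   DLE   SP    0     @     P     `     p
             1   SOH   XON   !     1     A     Q     a     q
             2   STX   DC2   "     2     B     R     b     r
             3   ETX   XOFF  #     3     C     S     c     s
             4   EOT   DC4   $     4     D     T     d     t
             5   ENQ   NAK   %     5     E     U     e     u
             6   ACK   SYN   &     6     F     V     f     v
             7   BEL   ETB   '     7     G     W     g     w
             8   BS    CAN   (     8     H     X     h     x
             9   HT    EM    )     9     I     Y     i     y
             A   LF    SUB   *     :     J     Z     j     z
             B   VT    ESC   +     ;     K     [     k     {
             C   FF    FS    ,     <     L     \     l     |
             D   CR    GS    -     =     M     ]     m     }
             E   SO    RS    .     >     N     ^     n     ~
             F   SI    US    /     ?     O     _     o     DEL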
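
In the ASCII code, bits 4 to 6 select the column of the table and bits 0 to 3 select the row, so the 7-bit code is (column<<4)+row. The sketch below illustrates this layout and two consequences of it: the digits are consecutive starting at 0x30, and a lowercase letter is 0x20 above its uppercase partner. All three function names are hypothetical.

```c
#include <stdint.h>

// Build the 7-bit ASCII code from its table position.
// high is the column (bits 4 to 6), low is the row (bits 0 to 3).
uint8_t AsciiCode(uint8_t high, uint8_t low){
  return (uint8_t)(((high & 0x07) << 4) | (low & 0x0F));
}

// Convert an ASCII digit '0' to '9' into its value 0 to 9; return -1 otherwise.
int DigitValue(char c){
  if((c >= '0') && (c <= '9')){
    return c - '0';     // digits are consecutive in column 3
  }
  return -1;
}

// Convert a lowercase letter to uppercase; other characters are unchanged.
char ToUpper(char c){
  if((c >= 'a') && (c <= 'z')){
    return (char)(c - 0x20);   // columns 6,7 map to columns 4,5
  }
  return c;
}
```

For example, AsciiCode(4, 1) is 0x41, which is the letter A.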


BLS label // branch if C==0 or Z==1     Lower or same, unsigned ≤
BGE label // branch if N == V           Greater than or equal, signed ≥
BLT label // branch if N != V           Less than, signed <
BGT label // branch if Z==0 and N==V    Greater than, signed >
BLE label // branch if Z==1 or N!=V     Less than or equal, signed ≤
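
The conditions above can be checked on a computer by modeling the flags that a compare of a with b (the subtraction a-b) would produce. The following is a sketch under that assumption, not code from the book. The type Flags_t and all function names are hypothetical. It treats the operands as raw 32-bit patterns, so the same bits can be interpreted as either signed or unsigned.

```c
#include <stdint.h>

typedef struct { uint32_t N, Z, C, V; } Flags_t;  // hypothetical model of the condition bits

// Model the flags set by comparing a with b, which performs a-b.
Flags_t Compare(uint32_t a, uint32_t b){
  uint32_t r = a - b;                 // 32-bit result, wraps modulo 2^32
  Flags_t f;
  f.N = r >> 31;                      // N is the sign bit of the result
  f.Z = (r == 0);                     // Z is set if the result is zero
  f.C = (a >= b);                     // C is set if there was no unsigned borrow
  f.V = ((a ^ b) & (a ^ r)) >> 31;    // V is set on signed overflow
  return f;
}

// Each function returns 1 if the corresponding branch would be taken.
int TakeBLS(Flags_t f){ return (f.C == 0) || (f.Z == 1); }    // unsigned a <= b
int TakeBGE(Flags_t f){ return f.N == f.V; }                  // signed a >= b
int TakeBLT(Flags_t f){ return f.N != f.V; }                  // signed a < b
int TakeBGT(Flags_t f){ return (f.Z == 0) && (f.N == f.V); }  // signed a > b
int TakeBLE(Flags_t f){ return (f.Z == 1) || (f.N != f.V); }  // signed a <= b
```

Comparing 1 with 0xFFFFFFFF shows why the signed and unsigned branches differ: as unsigned numbers 1 is lower, so BLS is taken, but as signed numbers 1 is greater than -1, so BGT is also taken.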





Function call, function return, stack, and interrupt instructions
PUSH {reglist}      // push 32-bit registers onto stack, R0-R7,LR
POP {reglist}       // pop 32-bit from stack into registers, R0-R7,PC
ADD Rd, SP, #n8     // Rd = SP+n8
ADD SP, SP, #n7     // SP = SP+n7
SUB SP, SP, #imm7w  // SP = SP-imm7w
BL label1           // branch to subroutine at label1, anywhere
BLX Rm4             // branch to subroutine specified by Rm4, R0-R12
BX Rm3              // branch to location specified by Rm3, R0-R12,LR
CPSIE I             // enable interrupts (I=0)
CPSID I             // disable interrupts (I=1)
WFI                 // sleep and wait for interrupt
SVC #imm8           // software interrupt
Logical and shift instructions
ANDS Rdn, Rm        // Rdn = Rdn&Rm
ORRS Rdn, Rm        // Rdn = Rdn|Rm
EORS Rdn, Rm        // Rdn = Rdn^Rm
BICS Rdn, Rm        // Rdn = Rdn&(~Rm)
LSRS Rd, Rd, Rs     // logical shift right Rd=Rd>>Rs (unsigned)
LSRS Rd, Rm, #n     // logical shift right Rd=Rm>>n (unsigned), 0 to 31
ASRS Rd, Rm, Rs     // arithmetic shift right Rd=Rd>>Rs (signed)
ASRS Rd, Rm, #n     // arithmetic shift right Rd=Rm>>n (signed), 1 to 32
LSLS Rd, Rd, Rs     // shift left Rd=Rd