Rearchitecting Software: Source Code Comprehension and Refactoring Applied to Flight Software and Simulation 9789529475841

Only the source code tells the full story. No document, no wiki, no README, no UML model; nothing will tell the story a

English Pages 200 [202] Year 2023

Table of contents :
1 Introduction
1.1 Only the source code tells the full story
1.2 The Myth of Code Readability
1.3 Comprehending Code in Embedded, Distributed Systems
1.4 Not programming advice
2 The Science Behind Code Comprehension
2.1 Code comprehension models
2.2 Code Comprehension Typical Activities
2.3 The Necessary Mental Flow
2.4 Build Systems & Tools (A Blessing and a Curse)
2.5 Autotools and the GNU Build System
2.5.1 Files Appearing out of Nowhere
2.6 CMake
2.7 Meson
2.7.1 Using Meson
2.7.2 Adding dependencies
2.8 Need For Speed: Ninja
2.8.1 Design goals of Ninja
2.8.2 Using Ninja
2.8.3 Variables
2.8.4 Rules
2.8.5 Build statements
2.9 Make vs. Meson vs. Ninja
3 Architectural Styles
3.1 Monolithic
3.2 Microservices
3.3 Layered
3.4 Component-Based (“The AppStore”)
3.5 Pipes and Filters
3.6 Front and Back End
3.7 Client-Server
3.8 Publisher-Subscriber
3.9 Event-Driven
3.10 Middleware
3.11 Service-Oriented
3.12 Conclusion
3.12.1 Good architecture is simple architecture
4 Tearing Software Down
4.1 JSBSim Teardown
4.1.1 Starting from scratch
4.1.2 JSBSim In Slow-Motion: Step-by-step analysis
4.1.2.1 The FGModel Base Class
4.1.2.2 Properties
4.1.2.3 Models
4.1.2.4 JSBSim Initialization and Execution
4.2 Core Flight System Teardown
4.2.1 Starting from scratch (again)
4.2.2 Compilation Success
4.2.3 cFS In Slow motion
4.2.3.1 Application Startup
4.2.3.2 API Initialization
4.2.3.3 Share Memory Initialization and data structure
4.2.3.4 Creating A Virtual File System
4.2.3.5 Calling the Executive Entry Point
4.2.3.6 Dynamic and Static Application Loading and the challenge of debugging shared libraries
4.2.3.7 Writing our first application
4.2.3.8 Yes, it does; The Application Runs
4.2.3.9 Software Bus, Pipes, Events, Tables, Commands and Telemetry
4.2.3.10 Receiving Commands
4.2.3.11 Sending Telemetry
4.3 Recapping
5 Hooking JSBSim and cFS Together
5.1 Writing an application in cFS that interacts with models in JSBSim
5.2 Querying JSBSim properties from cFS
5.3 Refactoring cFS to work with streaming sockets
5.4 Refactoring JSBSim’s telnet server, slightly
5.5 Decoding the telemetry on the ground
5.6 Showing and plotting the telemetry
5.7 Setting Properties
5.8 Reading from “real” on-board sensors
5.8.1 Creating Our Own Sensor
5.9 Adding more variables and some 3D visualization in real time
5.10 Sending Commands to CFS
5.10.1 Space Packet Protocol
5.10.2 Sending Space Packets to cFS
5.10.2.1 Function/command code
5.10.3 Creating our first command (Throttling the Engines)
5.11 Commanding control surfaces
5.12 Recapping
6 Writing a Roll/Pitch Controller for a Boeing 737 in cFS
6.1 Seeking stabilized pitch
6.2 Flight Control Channels
6.3 A Poor Man’s Fly-By-Wire
6.3.1 Wing Leveling
6.3.2 Commanding Some Roll Angle
6.4 Summary
7 Conclusion

Author / Uploaded
Ignacio Chechile

Rearchitecting Software: Source Code Comprehension and Refactoring Applied to Flight Software and Simulation Ignacio Chechile

ISBN 978-952-94-7584-1. 2023. This work is licensed under Creative Commons (CC BY-SA 4.0). You are free to: share, copy and redistribute the material in any medium or format. Adapt, remix, transform, and build upon the material for any purpose, even commercially. You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. Cover photo: Efe Kurnaz (Unsplash).

1 Introduction

It was a sunny Monday morning, and I had just started a new job. It was a dream job in some ways, and it required me to relocate to a beautiful city far away from where I was living at the time. Upon my arrival, there was a short introduction with my new team. My colleagues seemed (and turned out to be) really nice, talkative, down to earth, and incredibly smart. I remember thinking: this could actually work. A rush of adrenaline made my heart beat faster as I started to look forward to kicking off work, not without internally patting myself on the back for having had the courage to quit my previous job (stepping out of my “comfort zone”, as they say) for what was clearly a better one.

Then, after the required politeness of the first day, and once formalities like getting my new email account and the credentials for the enterprise systems were more or less in the bag, my new supervisor called me for a first meeting. After I entered the room, he said right away:

“Welcome to the team. You have been assigned the task of documenting, rearchitecting, and ‘productizing’ an internal project we’ve had for years, which we are now planning to sell to a customer.”

It was the first time I had heard the word ‘productizing’ in my life, if that was even a word. The word ‘rearchitecting’ was also somewhat new to me. It was quite interesting to see his expression when he said the ‘rearchitecting’ bit. It sounded as if it needed a pretty heavy one. Then, he opened his laptop and started to show me around in some IDE I hadn’t seen before (it was Borland C), randomly opening and closing source and header files. Then, he handed me a CD-ROM (feel old yet?) and said: ‘good luck’.

So, off I went to my new desk, disk in hand, to start a journey that in a way marked my career, mostly for the good. The project I had just inherited had hundreds of thousands of lines of code, entirely in C. And it had practically no documentation besides a loose, outdated collection of documents and block diagrams. It also included chunks of FORTRAN code automatically exported to C. Beautiful.

Said software exported its data in CSV format, and the original developer kept a printed sheet of paper with him listing the letters of the spreadsheet columns that resulted from importing that CSV file into Excel. That is, in order to know which output variable belonged to which column, instead of putting the variable names in the first row of each column (what we mortals would do), he preferred to keep that printed A4 sheet with column letters (AB, AC, AD) and variable names. Chef’s kiss.

Said project was a simulation environment for verifying spacecraft control flight software, all designed and coded from scratch. And it was beautifully self-contained: it had no dependencies on any libraries whatsoever, except the standard system libraries. The software included many functionalities: replicating the outputs of sensors and the behavior of actuators, numerically integrating the equations of motion, simulating orbits and their perturbations, and many other things. All done with rather simple math. With this software, you could fool a flight computer by connecting the simulator to it through the right interfaces (the software was capable of interfacing with real hardware), and said computer would have no way of telling whether it was in space or sitting in a clean room.

In a way, this software created a Universe of its own. A small, private Universe made out of trigonometric calculations, functions, and variables. And my task was, first, to comprehend how such lines of code could bring that Universe into existence and, second but no less important, to rearchitect the thing. An external customer now wanted it, and it had never been conceived as a final product, so I needed to package it to protect its intellectual property while at the same time giving the future user the possibility of extending it by writing their own models. More than a library, I had to come up with a ‘framework’ of sorts. A product-grade framework from legacy code.

Strangely enough, I felt privileged. I know I am probably among the few people in the world who would feel privileged after having thousands of lines of poorly documented, aging legacy C code thrown in their face. But that’s how I felt. And I regret nothing.

I wish I could write this text on source code comprehension using the same software that inspired it. I still remember every detail of it. Readers would get to see the brilliance in its simplicity, and it would bring some closure after all these years dealing with software of all kinds. But I do not have it anymore, and it is proprietary software, so I simply cannot. Instead, this text will work with a set of open-source projects that I will use to explain a practical method to comprehend code. Mind you, it is not a groundbreaking method, but a method that has worked well for me throughout the years. Every time my professional life has put me in front of unknown code, I have applied this same method, and it has served its purpose: giving me insight into what is what, what goes where, and the freedom to refactor things for my own purposes.

I might or might not be smart, but one thing’s for sure: I am a slow thinker. I might be capable of eventually figuring things out, as long as you give me the time. And the method reflects that a bit: step by step, slowly but steadily.

For practicing code comprehension, we will use a flight simulation engine combined with a flight software framework, both running on Linux. The projects that we will use are:

• JSBSim Flight Simulation Engine [1]
• Core Flight System (cFS) [2], running on Linux

The main idea is to delve into their codebases by breaking them down and seeing what they are made of. We will get compiler errors galore in the process. Granted, compiler errors can be annoying, but you must embrace them, for they provide a lot of information that any source code analyst must ingest and use to their own advantage. In a way, we will put the code in an interrogation room, and we will make it sing like a bird.

The ambitious objective of our work is to learn about both programs’ architectures and design styles well enough to be able to alter them, refactor them, recompile them a hundred times if necessary, and eventually hook them together. What for, exactly? We will try to write a basic autopilot for a Boeing 737. From ignorance to flying an airliner. Ambitious, huh? How are we going to achieve that? Well, let’s find out.

The steps ahead are rather straightforward. First and foremost, we will get the code, set up a proper code analysis environment, and start tampering with it. We will fiddle with the source code to make two projects that grew up totally unaware of each other interact; we will craft our tiny Frankenstein. These projects combined have around 160K lines of code between C and C++ which, although not the biggest code base ever, is still complex enough to make us think. Here’s what sloccount [3] says about JSBSim:

Figure 1-1 JSBSim has roughly 60K lines of code (C++)

And about cFS:

Figure 1-2 Around 100K lines of code for cFS

[1] https://jsbsim.sourceforge.net/
[2] https://cfs.gsfc.nasa.gov/
[3] Mind that counting lines of code has never been an incredibly accurate method and it may only represent a rough approximation of code complexity.
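For readers who do not have sloccount at hand, the kind of per-language tally quoted above can be approximated with a few lines of Python. This is only a rough sketch and not how sloccount itself works: the `count_lines` helper and the extension table are assumptions made for illustration, comments are counted as code, and every `.h` file is attributed to C even though it may be a C++ header.

```python
import os
import tempfile
from collections import Counter

# Hypothetical extension-to-language table (an assumption, not sloccount's rules).
EXTENSIONS = {
    ".c": "C", ".h": "C",
    ".cpp": "C++", ".cxx": "C++", ".hpp": "C++", ".hxx": "C++",
}


def count_lines(root):
    """Count non-blank lines per language under `root` (comments included)."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            language = EXTENSIONS.get(os.path.splitext(name)[1].lower())
            if language is None:
                continue  # not a source file we know about
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                totals[language] += sum(1 for line in handle if line.strip())
    return totals


# Small self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "main.c"), "w") as handle:
        handle.write("int main(void)\n{\n\n    return 0;\n}\n")
    with open(os.path.join(root, "model.cpp"), "w") as handle:
        handle.write("// a comment\nint x = 1;\n")
    with open(os.path.join(root, "README.md"), "w") as handle:
        handle.write("not code\n")
    demo = count_lines(root)

print(dict(demo))
```

Pointing `count_lines` at a checked-out source tree gives a quick order-of-magnitude figure. As with any line count, it is only a rough proxy for complexity.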

1.1 Only the source code tells the full story

Some people find pleasure in being beaten with a whip, or in being burned with candle wax. I find pleasure in diving into source code to figure it out. Just like an archeologist finds their purpose in life brushing dust from stones hoping to find a lost civilization, or a detective in chasing a slippery criminal, I find my purpose browsing source files and header files while trying to connect the dots. The reward of understanding why things are the way they are, or how they work, is perhaps one of the most addictive cycles out there; the pleasure of the chase, the boredom of the conquest.

When I was a kid, I used to disassemble everything I could find to understand how it worked. I tore down my Commodore 128 when I was maybe 12 and was mind-blown by the fact that all those chips put together made it possible for me to play ‘Operation Wolf’ for hours straight.

Not long ago, while browsing the wiki of an open-source project (a 2M lines of code kind of thing), I came across this gem:

Figure 1-3 True that

Amen brothers and sisters. Truer words have never been spoken or written: only the source code tells the full story. Full stop. Just in case it was not clear enough:

Only the source code tells the full story.

No document, no wiki, no README file, no UML model; nothing will tell the story as accurately as the recipe itself. If you need to bake a cake, go and draw as many diagrams as you want, but nothing will beat the actual recipe telling you exactly how to make the damn cake. Software is like cooking; if you want to produce something that is remotely enjoyable, you must pay attention to the recipe.

Model-based techniques have proliferated in the last two decades or so. But here's an unspoken truth: a piece of source code is a model, and its syntax provides the artifacts and constructs we have at hand to model the semantics and the behavior we want to implement. Therefore, writing code is also model-based, the syntax being the lines and arrows that connect entities together.

Someone could argue UML [4] is easier to understand and follow than source code. Only that it is not. If you really, really want to know how a piece of software works, you will sooner or later end up browsing source code, compiling it, and running it. UML diagrams can, and will, fall out of sync with the source code in no time, because software developers always go to the source code when fixing bugs or adding new functionality, while they only occasionally go back to update the diagrams. Keeping the graphical depictions of a software’s design up to date is a highly manual process, unless you still believe in that ideal world, a world of fairy tales, where source code completely autogenerated from models keeps source and diagrams consistent. I have seen engineers preaching about the marvels of auto coding, and while I do recognize auto coding has improved dramatically in the last decade or so, I have also seen those same engineers having a hard time debugging or optimizing auto-coded embedded software.

Experience shows that most of the UML diagrams around have been created following the inverse logic: from existing source code. Often, it feels simpler to implement something directly in an object-oriented language than to draw a sequence diagram of what needs to be done. And if it is easier to describe the problem in source code and the diagram proves to be “optional”, what is the value of spending time drawing the diagram anyway? That’s the dilemma of many software architects out there struggling to communicate software structure to the ones who have to code it.

There is no way around it: only source code tells the full story. If you want to hear partial stories, apocryphal stories, or fairy tales, go for it, but just be aware that you are only seeing an incomplete side of the plot when you look at a block diagram with colored boxes and arrows.

[4] UML stands for Unified Modeling Language, and it is a graphical way of describing software structure and behavior.

1.2 The Myth of Code Readability

Sure, no one wants to read cryptic code, just as no one wants to read a badly written book. One of my favorite technical books about space was written by a German man whose rudimentary English makes it somewhat tricky to follow from time to time. Still, the book checks out in the sense that it conveys useful ideas, regardless of the grammar. We all want to communicate perfectly, and we try our best, but some people succeed more than others when putting their thoughts into words, be it poetry or a multi-threaded TCP/IP client. But comprehending complex source code requires going beyond that and working at a higher level of abstraction than the actual syntax. If you need to focus on how well or how badly someone has named their variables, then you are probably setting your magnifier to the wrong zoom level.

My argument is simple: there is no bad code, but there are bad source code analysts. Perhaps not bad, but lazy. Analysts who do not want to spend much mental energy and want everything distilled up front, who expect the comments to meticulously explain how things work, as if they were primary schoolers. Regardless of code readability (which largely focuses on syntax), comprehending thousands of lines of code requires focusing on the bigger picture: understanding what the salient building blocks of the thing you are analyzing are, and how those blocks relate to each other. Systems become actual systems thanks to the interactions between their constituent parts. Architecture readability is the real challenge we are dealing with here.

As a source code analyst, refactorer or (re)architect, you must walk past the minutiae of tooling, indentation levels and camel cases. We all have our choices when it comes to coding style, build systems and IDEs, and don’t get me wrong: being tidy is a good thing. Naming functions in a consistent manner definitely helps searching (source code with a mix of camelCase, CamelCase and snake_case can make your life harder when you’re looking around for clues). But code is code, and a proper source code reader must be able to deal with it, come what may.
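The point about mixed naming styles getting in the way of searching can be worked around with a style-insensitive search. The following is a minimal sketch under stated assumptions: the helper names `split_words` and `style_insensitive_pattern` and the identifier `getPitchAngle` are hypothetical and not taken from JSBSim or cFS.

```python
import re


def split_words(identifier):
    """Split a camelCase, CamelCase or snake_case identifier into lowercase words."""
    parts = re.findall(
        r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", identifier
    )
    return [part.lower() for part in parts]


def style_insensitive_pattern(identifier):
    """Build a regex that matches the identifier in any of the three styles."""
    words = split_words(identifier)
    body = "_?".join(re.escape(word) for word in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


# Hypothetical identifier and text, for illustration only.
pattern = style_insensitive_pattern("getPitchAngle")
text = "getPitchAngle(); GetPitchAngle(); get_pitch_angle(); getPitchAngles();"
hits = pattern.findall(text)
print(hits)
```

The same pattern string can be handed to any regex-capable search tool with case-insensitive matching, so one query covers all three spellings of the name while whole-word matching keeps near misses such as `getPitchAngles` out of the results.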

1.3 Comprehending Code in Embedded, Distributed Systems

Complex systems such as airliners have tens of millions of lines of code [5] on board. What is more, those lines of code do not all run on the same computing device; they are distributed across a collection of specialized computers spread throughout the structure, connected by kilometers of cables, hubs, and routers. More recently, they also run in several partitions inside isolated, virtualized containers.

Comprehending source code is greatly shaped by the contextual boundaries of the software. In simpler English: comprehending 2 million lines of code meant to execute together in a single process is not the same as comprehending those 2 million lines spread across 10 different embedded computers running 200K lines each. The complexity of the interaction across the air gaps, with its requests, responses, protocols and handshakes, makes the comprehension process exponentially more challenging.

So, next time you inherit legacy code, make sure you understand how cohesively (or not) those lines of code are supposed to run and in which computing contexts they are supposed to execute. And in case you inherit a distributed monster which executes here and there, deeply understand the interactions between the parts; sniff the protocols, listen to the packets and handshakes coming and going, and grow the overall picture.

With all that being said, the method that I will use in the upcoming sections is aimed at software running inside the same computing context. Debugging and running distributed software step by step is a complex matter because there is a lot happening: cross-compilation, networks, delays, etc. The method ahead is still applicable in each individual context of a potential collection of distributed units.

[5] According to a 2011 report by Boeing, the 787 Dreamliner contains over 6.5 million lines of code. This code is spread across various systems, including the flight control system, avionics system, and passenger entertainment system. This number has likely increased.

1.4 Not programming advice

Note that this text is not intended to be a programming course. I am assuming you are a seasoned programmer in Python, C and C++, and that you know about data structures, object-oriented design, control flow and the mechanics behind coding, building, and debugging software of reasonable complexity. This text aims to show a way of comprehending source code, and it does so (or at least tries to) by means of practical examples, trying to ‘walk the talk’.

This text is not intended to be used as a reference manual for the software programs dissected in the following sections. Only certain parts of such programs are described, at a coarse level and with the goal of identifying their relevant parts in order to integrate them together. Please refer to the available documentation and, ultimately, go to the source code.

This text was written in roughly 4 weeks, which gives some quantitative idea of how long it took to learn and refactor the code and document the whole process. Last but not least, I hope you enjoy reading this at least a fraction as much as I enjoyed writing it. Not the formatting part though, I absolutely loathed that part; I’m glad it’s over.


2 The Science Behind Code Comprehension

Turns out, there is science behind code comprehension. Analyzing source code is a psychologically intensive activity. It is one brain against hundreds of thousands of lines of mysterious text ideated and written by one or more other brains. And our brains, marvelous biological machines as they surely are, have limits when it comes to processing complexity. There is a certain number of lines of code beyond which code comprehension becomes extremely difficult for one single brain. Why? Not because of any innate lack of initiative or laziness in our minds, but mostly because the whole process of analyzing that amount of code becomes extremely cumbersome. It is not only the act of keeping up with thousands of objects and data structures (and their relationships); the sole act of compiling 1 million lines of code or more may take several minutes on a decent development laptop. IDE indexers can be overloaded, take a long time to converge, eat up your RAM in the process, and crash without notice. If you are debugging or refactoring a monstrously big code base, having to deal with such dynamics can become unbearable. Comprehending large amounts of source code can therefore be frustrating, and frustration does not help when it comes to mental clarity.

Code comprehension (or, more specifically, architecture comprehension) should be a design factor when architecting software, driving the splitting of the code base in a way that keeps it comprehensible; at least until artificial intelligence finally makes a glorious appearance and is able to distill billions of lines of code for us in a matter of seconds. While we wait for AI, it would be great to see more “comprehension-driven design” of software.

Source code comprehension is an essential part of the software maintenance and/or rearchitecting process and, according to research 6, it can be approached in two ways:

• Functional approach: is interested in what the source code does.
• Control-flow approach: is interested in how the source code works.

The interesting part is that you cannot survive as a source code analyst with only one or the other. If you really need to dig into how a program works and do something tangible with it, you must embrace both: see what it does and how it does it.
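To make the two approaches concrete, here is a minimal sketch. The functional view treats a function as a box and only looks at inputs and outputs; the control-flow view records which lines actually ran. The function classify and the helper trace_lines are invented for this illustration, and the tracing relies on CPython's sys.settrace hook, which a debugger normally drives for you.

```python
import sys

def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

def trace_lines(func, *args):
    """Run func(*args); return (result, line offsets executed inside func)."""
    code = func.__code__
    executed = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                      # do not trace other frames
        if event == "line":
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)               # restore any earlier tracer
    return result, executed

# Functional view: what does it do?
print(classify(-5), classify(7))

# Control-flow view: how does it get there?
print(trace_lines(classify, -5))
print(trace_lines(classify, 7))
```

The two calls return different answers (the functional observation) and also take different paths through the body (the control-flow observation); neither view alone tells the whole story.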

6 Nedhal A. Al-Saiyd, “Source Code Comprehension Analysis in Software Maintenance”, Computer Science Department, Faculty of Information Technology, Applied Science Private University, Amman, Jordan.


2.1 Code comprehension models

The process of source code comprehension can be classified into four cognitive models 7:

• Top-down code comprehension model: the knowledge of the program domain is restructured and mapped to the source code in a top-down manner. The process starts with a general hypothesis about the nature of the program, which represents a high-level abstraction of the program. The general hypothesis is then examined, refined, and verified to form subsidiary hypotheses in a hierarchical layout. Each hypothesis represents a segment or chunk of program code. Lower-level hypotheses are generated continuously until a comprehension model is achieved. This model is used when the analysts are familiar with the code.

• Bottom-up code comprehension model: the analysts read the complete source lines of code and then group these lines into chunks of higher-level abstractions. The elementary program chunks are specified based on the control-flow model, and the procedural relations among chunks are defined based on functional relations. The high-level abstractions are incrementally grouped until the highest level of program understanding is achieved. Comprehension is enhanced by refactoring code functionality. Bottom-up comprehension chunks the micro-structure of the program into a macro-structure and cross-references these structures. It is less risky than a top-down strategy, because the lower-level hypotheses can be identified directly from concrete source code.

• Hybrid or knowledge-based code comprehension model: the knowledge about the code is grasped from the integration of the bottom-up and top-down models, because the maintainer navigates through the source lines of code and jumps between different chunks when searching for the links to the intended block of code. The understanding of the program evolves using the analysts’ expertise and background knowledge together with the source lines of code and documentation.

• Systematic and as-needed strategies: the analysts focus only on the code that is related to a particular evolution task. The analysts or (re)architects use a systematic method to extract the static knowledge about the structure of the program and the causal knowledge about the interfaces between different parts of the program at execution time. In the systematic macro-strategy, the programmer traces the flow of the whole program. This strategy is less feasible for large programs, and more mistakes can occur when the maintainer misses some important interactions.

The comprehension models described above come across as highly intertwined and insufficient when taken in isolation. In this text, we follow a combination of the knowledge-based and systematic approaches.

7 M. A. Storey, “Theories, Methods and Tools in Program Comprehension: Past, Present and Future”, Software Quality Journal 14, pp. 187–208, DOI 10.1007/s11219-006-9216-4.
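The bottom-up idea of grouping raw lines into named chunks can be sketched mechanically. The example below is an illustration only: it works on Python source because the standard library ships a parser (the ast module), whereas C or C++ code would need a compiler front end or an indexer. The sample source and the name chunk_inventory are invented.

```python
import ast

SOURCE = '''
import math

GRAVITY = 9.81

def weight(mass):
    return mass * GRAVITY

class Body:
    def __init__(self, mass):
        self.mass = mass

    def weight(self):
        return weight(self.mass)
'''

def chunk_inventory(source):
    """List top-level chunks as (kind, name, first_line, last_line)."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append((kind, node.name, node.lineno, node.end_lineno))
    return chunks

for chunk in chunk_inventory(SOURCE):
    print(chunk)
```

An inventory like this is only the first rung of the ladder: the analyst still has to attach meaning to each chunk, which is where the top-down hypotheses come in.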

2.2 Code Comprehension Typical Activities

At any time during a program comprehension process, the analyst can apply any code comprehension model, since the size and complexity of the program structure vary from one program to another and from one block to another. To achieve comprehension, the following activities are typically followed:

• Read the code and documentation (if any): read the source code line by line, or in any arbitrary order, to understand the workflow and the behavior of the program; this helps to locate the code where the change should be performed. Also read and reexamine the application documentation (requirements, design models, and specifications), which is not always consistent or up to date.

• Execute white-box and/or black-box experiments on the program to inspect the input, the sequence of implemented functions, the output, and the consequences. Which one to use depends on the application type and the analyst’s skills and knowledge. Dynamic runtime information is acquired from the executed program.

• Extract blocks: the source code is partitioned conceptually into coherent blocks of code (i.e., segments of code) that share relevant functionality. Then analyze what each block does (functional approach) and how each block works (control-flow approach). Partitioning into blocks depends on contiguous statements that share common functionality and code attributes. A block may have more than one related function, and the structure of the functions within the block is defined. This facilitates the allocation of particular functions and their statements within a block. Partitioning helps in improving the readability of the source code, filtering irrelevant code, locating data structures, assisting maintainers in locating the intended code in one area, and saving maintenance effort and time.

• Analyze the internal structure dependencies: the inheritance model and the functional dependencies between areas are analyzed to preserve the intended behavior of the source code.

• Generate program graphs by transforming the source code of the program into functional graphs. The data-flow graph and the control-flow graph (similar to the one produced in white-box testing) help to identify the data and control dependencies in the source code. They make it easier for the maintainer to read, understand, and find which parts need to be maintained and which parts are affected by the maintenance.

• Refactoring: this technique is used in code comprehension by analyzing the implicit structural dependencies and finding out the data and control interrelationships. Refactoring interactively improves the software structure, renames methods and attributes, eliminates conflicts, and preserves the functionality of the program. The refactoring process also eliminates code clones. The software architecture is standardized to have low coupling and high cohesion to decrease the complexity of the source code. Using meaningful and readable names for functions, methods, and data helps maintainers understand the code and locate the intended parts that need maintenance.

These steps read as a bit sequential and isolated. Rearchitecting code involves mixing all these steps together, with a great dose of experimentation, which includes creating small, separate projects to try things out.
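The "generate program graphs" activity listed above can be tried out on a toy scale. The sketch below builds a very rough call graph from Python source using the standard ast module. It is an assumption-laden illustration: it only sees calls made through plain names (not methods or function pointers), it attributes calls inside nested functions to the enclosing function as well, and the sample functions are invented. Real C or C++ code bases need dedicated tooling.

```python
import ast

def call_graph(source):
    """Map each function name to the sorted plain names it calls."""
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = sorted(callees)
    return graph

SOURCE = '''
def read_sensor():
    return 42

def filter_value(raw):
    return min(raw, 100)

def step():
    raw = read_sensor()
    return filter_value(raw)
'''

graph = call_graph(SOURCE)
for caller, callees in graph.items():
    print(caller, "->", callees)
```

Even a crude graph like this answers a useful maintenance question: if filter_value changes, which callers are affected?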

2.3 The Necessary Mental Flow 8

Analyzing source code is an investigative endeavor, and as such it requires concentration; it can only ‘tick’ after reaching reasonably deep mental states. The detective-like activity of reading source code is usually rewarded by dopamine bombs delivered upon finding knowledge nuggets as the activity unfolds, and any disruption of this mental state can hurt overall performance, because the state takes a very long time to regain. If the average incoming email takes five minutes to read and reply to, and your reimmersion period is fifteen minutes, the total cost of that email in terms of flow time (actual brain work time) lost is twenty minutes. A dozen emails per day will use up half a day. A dozen other interruptions and the rest of the workday is gone, let alone getting to figure out what the source code does.

Just as important as the loss of effective time is the accompanying frustration. The source code analyst who tries and tries to get into flow and is interrupted each time is not a happy person. She gets tantalizingly close to involvement only to be bounced back into awareness of her surroundings. Instead of the deep mindfulness that she craves, she is continually channeled into the promiscuous changing of direction that the modern office tries to force upon her. A few days like that and anybody is ready to look for a new job.

If you’re a manager, you may be relatively unsympathetic to the frustrations of being in no-flow. After all, you do most of your own work in interrupt mode (that’s management), but the people who work for you comprehending code for rearchitecting and refactoring need to get into flow. Anything that keeps them from it will reduce their effectiveness and the satisfaction they take in their work. It will also increase the cost of getting the work done.

8 Adapted from “Peopleware, Third Edition” by Tom DeMarco and Timothy Lister, Addison-Wesley.
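The email arithmetic above is easy to replay with your own numbers. This is a trivial sketch; the 8-hour workday is an assumption added here so that "half a day" can be checked, and the function name is invented.

```python
def flow_time_lost(interruptions, handling_min, reimmersion_min):
    """Minutes of flow lost: each interruption costs handling plus reimmersion."""
    return interruptions * (handling_min + reimmersion_min)

WORKDAY_MIN = 8 * 60  # assumption: an 8-hour workday

emails = flow_time_lost(12, handling_min=5, reimmersion_min=15)
print(emails, "minutes lost,", emails / WORKDAY_MIN, "of the workday")
```

A dozen emails at twenty minutes each come to 240 minutes, that is, four hours or half of the assumed workday, matching the figure in the text.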


2.4 Build Systems & Tools (A Blessing and a Curse)

Building software is essentially different from comprehending software in its motivation, objectives, and dynamics. In fact, my argument is that the two are at odds with each other.

On one hand, building software is about defining the dependencies and code structure once, only to then (after this one-time configuration stage) invoke, in as few steps as possible, the sequence necessary for building the binaries, and just repeat the process any number of times. Eventually, building software becomes a matter of executing a selected set of commands or scripts. Building software has a clear goal: giving you binaries, and decreasing the time between coding or refactoring and runtime. Building software in highly automated manners makes a lot of sense. This way, software is more replicable and more portable, and it can also be integrated into other systems, such as build servers, for a full-blown software engineering suite including testing and the like.

Comprehending software, on the other hand, is in a way about reconstructing what the build system is doing rapidly and automatically. Imagine that you enter a highly automated factory floor full of robots, and someone tells you that you must redefine the factory layout for better efficiency, or because the company now wants to assemble a new product. If you are new to the current layout, you will need to observe what the robots are doing, which will most likely require slowing them down to get the process right. Automation brings speed and abstraction, which are great in production but detrimental when you need comprehension, that is, when you need access to detail.

The thesis is clear and comes across as somewhat obvious: build systems do not necessarily help in code comprehension. But this is not a terrible problem; we can live with this. We only need to be aware.

Let’s dive a bit into build systems and how they work, see how they may make the rearchitecting process somewhat challenging, and check the most popular build systems out there. This will be only an introduction; for more details, I recommend you check their websites and repositories and play with them to gain the necessary familiarity.


Figure 2-1 Build Systems are like robots working at your disposal (Photo by Simon Kadula on Unsplash)

2.5 Autotools and the GNU Build System 9

I am sure you have delved into software packages full of files named configure, configure.ac, Makefile.in, Makefile.am, aclocal.m4, and so forth, some of them generated by Autoconf or Automake. But the exact purpose of these files and their relations is probably tricky to understand at first. The goal of this section is to briefly introduce you to this machinery.

In the Unix world, a build system is traditionally achieved using the command make. You express the recipe to build your package in a Makefile. This file is a set of rules to build the files in the package. For instance, the program code may be built by running the linker on the files main.o, foo.o, and bar.o; the file main.o may be built by running the compiler on main.c; etc. Each time make is run, it reads the Makefile, checks the existence and modification time of the files mentioned, decides which files need to be built (or rebuilt), and runs the associated commands.

When a package needs to be built on a different platform than the one it was developed on, its Makefile usually needs to be adjusted. For instance, the compiler may have another name or require more options. In 1991, David J. MacKenzie got a bit sick of customizing Makefiles for the 20 platforms he had to deal with. Instead, he handcrafted a little shell script called configure to automatically adjust the Makefile. Compiling his package was now as simple as running said configure script and then invoking make.

Nowadays this process has been standardized in the GNU project. The GNU Coding Standards recommend that each package of the GNU project should have a configure script, and describe the minimal interface it should have. The Makefile too should follow some established conventions. The result? A unified build system that makes all packages almost indistinguishable to the installer. In its simplest scenario, all the installer has to do is unpack the package, run configure, make, and finally make install, and repeat with the next package to install.

This is called the GNU Build System, since it grew out of the GNU project. However, it is used by a vast number of other packages: following any existing convention has its advantages. The Autotools are tools that will create a GNU Build System for your package. Autoconf mostly focuses on configure and Automake on Makefiles. It is entirely possible to create a GNU Build System without the help of these tools. However, it is rather burdensome and error prone.

9 Summarized from https://www.gnu.org/software/automake/manual/html_node/GNU-Build-System.html

Let’s try a “hello world” using Autotools. Create the following files in an empty directory. src/main.c is the source file for the hello program. We store it in the src/ subdirectory:

    $ cat src/main.c
    #include <config.h>
    #include <stdio.h>

    int
    main (void)
    {
      puts ("Hello World!");
      puts ("This is " PACKAGE_STRING ".");
      return 0;
    }

README contains some very limited documentation for our little package.

    $ cat README
    This is a demonstration package for GNU Automake.
    Type 'info Automake' to read the Automake manual.

Makefile.am and src/Makefile.am contain Automake instructions for these two directories.

    $ cat src/Makefile.am
    bin_PROGRAMS = hello
    hello_SOURCES = main.c
    $ cat Makefile.am
    SUBDIRS = src
    dist_doc_DATA = README

Finally, configure.ac contains Autoconf instructions to create the configure script.


    $ cat configure.ac
    AC_INIT([amhello], [1.0], [[email protected]])
    AM_INIT_AUTOMAKE([-Wall -Werror foreign])
    AC_PROG_CC
    AC_CONFIG_HEADERS([config.h])
    AC_CONFIG_FILES([
     Makefile
     src/Makefile
    ])
    AC_OUTPUT

Once you have these five files, it is time to run the Autotools to instantiate the build system. Do this using the autoreconf command as follows:

    $ autoreconf --install
    configure.ac: installing './install-sh'
    configure.ac: installing './missing'
    configure.ac: installing './compile'
    src/Makefile.am: installing './depcomp'

At this point the build system is complete. In addition to the helper scripts listed in its output, you can see that autoreconf created four other files: configure, config.h.in, Makefile.in, and src/Makefile.in. The latter three files are templates that will be adapted to the system by configure under the names config.h, Makefile, and src/Makefile. Let’s try it:

    $ ./configure
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for gawk... no
    checking for mawk... mawk
    checking whether make sets $(MAKE)... yes
    checking for gcc... gcc
    checking for C compiler default output file name... a.out
    checking whether the C compiler works... yes
    checking whether we are cross compiling... no
    checking for suffix of executables...
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking for style of include used by make... GNU
    checking dependency style of gcc... gcc3
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating src/Makefile
    config.status: creating config.h
    config.status: executing depfiles commands

You can see Makefile, src/Makefile, and config.h being created at the end, after configure has probed the system. It is now possible to run all the targets we wish. For instance:

    $ make
    …
    $ src/hello
    Hello World!
    This is amhello 1.0.
    $ make distcheck
    …
    =============================================
    amhello-1.0 archives ready for distribution:
    amhello-1.0.tar.gz
    =============================================
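It can help to slow the robot down and spell out the decision make takes on each of those invocations. As described earlier in this section, make checks the existence and modification time of the files and rebuilds what is out of date. The sketch below imitates only that core rule; it is not how make is implemented, and the helper name needs_rebuild and the fake timestamps are invented for the illustration.

```python
import os
import tempfile

def needs_rebuild(target, prerequisites):
    """True if target is missing or older than any of its prerequisites."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "main.c")
    obj = os.path.join(tmp, "main.o")

    open(src, "w").close()
    missing = needs_rebuild(obj, [src])   # main.o does not exist yet

    open(obj, "w").close()
    os.utime(src, (1000, 1000))           # source older than object
    os.utime(obj, (2000, 2000))
    fresh = needs_rebuild(obj, [src])

    os.utime(src, (3000, 3000))           # simulate editing the source
    stale = needs_rebuild(obj, [src])

print(missing, fresh, stale)
```

Explicit timestamps are set with os.utime so the outcome does not depend on how fast the files are created.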

Note that running autoreconf is only needed initially, when the GNU Build System does not exist. When you later change some instructions in a Makefile.am or configure.ac, the relevant part of the build system will be regenerated automatically when you execute make.

autoreconf is a script that calls autoconf, automake, and a bunch of other commands in the right order. If you are just beginning with these tools, it is not important to figure out in which order all of these tools should be invoked and why. However, because Autoconf and Automake have separate documentation, the important point to understand is that autoconf is in charge of creating configure from configure.ac, while automake is in charge of creating Makefile.in from Makefile.am and configure.ac. This should at least direct you to the right manual when seeking answers.

2.5.1 Files Appearing out of Nowhere

An issue quickly appears while describing how Autotools work. You saw in a previous step that a file called config.h was generated. This is a header file, and therefore an essential part of the compilation process and of any code comprehension endeavor. And it did not exist before the build system was invoked. What is the conclusion there? Correct. Build systems generate source code which might be essential for a software teardown process and for code analysis. So, let’s put it again, big enough:

Build system tools frequently autogenerate code that is essential for the dissection and comprehension process.

A corollary is: when tearing code down to see what it’s made of, you shall never skip taking a look at the build system used by the software you are dissecting, and you shall build it and run it in order to obtain all the relevant autogenerated files that are needed to understand the structure of the code under study.
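To demystify where config.h comes from, here is a rough imitation of the templating step. In the real GNU Build System, config.h.in contains #undef lines and the generated config.status script rewrites them; the Python function below is only a stand-in for that step, and the two-entry template is a shortened, assumed example rather than the file autoheader would actually produce for amhello.

```python
import re

# Shortened, assumed stand-in for a config.h.in template.
CONFIG_H_IN = """\
/* Define to the full name and version of this package. */
#undef PACKAGE_STRING

/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H
"""

def render_config_h(template, defines):
    """Rewrite '#undef NAME' as '#define NAME value' when a value is known."""
    def substitute(match):
        name = match.group(1)
        if name in defines:
            return f"#define {name} {defines[name]}"
        return f"/* #undef {name} */"        # stays undefined, commented out
    return re.sub(r"^#undef[ \t]+(\w+)[ \t]*$", substitute, template,
                  flags=re.MULTILINE)

config_h = render_config_h(CONFIG_H_IN, {"PACKAGE_STRING": '"amhello 1.0"'})
print(config_h)
```

This is why main.c can print PACKAGE_STRING even though no hand-written file defines it: the definition only exists after configure has run.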

2.6 CMake

CMake is another tool to manage the building of source code. Originally, CMake was designed as a generator for various dialects of Makefile, but today CMake also generates files for modern build systems such as Ninja (which we will discuss soon), as well as project files for IDEs. It is important to note that CMake does not build software; it only produces files for a variety of build tools.

The most basic CMake project is an executable built from a single source code file. For simple projects like this, a CMakeLists.txt file with a few commands is all that is required. CMake can generate a native build environment that will compile source code, create libraries, generate wrappers, and build executable binaries in arbitrary combinations. CMake supports static and dynamic library builds.

Another nice feature of CMake is that it generates a cache file that is designed to be used with a graphical editor. For example, while CMake is running, it locates include files, libraries, and executables, and may encounter optional build directives. This information is gathered into the cache, which may be changed by the user prior to the generation of the native build files. CMake scripts also make source management easier because they consolidate the build logic into one file in a more organized, readable format.
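The cache mentioned above is a plain text file (CMakeCache.txt in the build directory), so it is also a handy thing to read when you want to know what CMake found on a machine. The sketch below parses the common NAME:TYPE=VALUE entry form; the sample content is invented, and rarer cases such as quoted entry names are deliberately not handled.

```python
def parse_cmake_cache(text):
    """Parse CMakeCache.txt content into {name: (type, value)}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, '#' comments, and '//' help strings.
        if not line or line.startswith(("#", "//")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        name, _, kind = key.partition(":")
        entries[name] = (kind, value)
    return entries

# Invented sample; a real cache holds many more entries.
SAMPLE = """\
# This is the CMakeCache file.
//Choose the type of build.
CMAKE_BUILD_TYPE:STRING=Debug
//Path to a program.
CMAKE_MAKE_PROGRAM:FILEPATH=/usr/bin/ninja
"""

cache = parse_cmake_cache(SAMPLE)
for name, (kind, value) in cache.items():
    print(name, kind, value)
```

Reading the cache this way is a cheap first look at how a project was configured before opening any CMakeLists.txt.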


CMake is intended to be a cross-platform build process manager, so it defines its own scripting language with its own syntax and built-in features. CMake itself is a software program, so it must be invoked with the script file, which it interprets to generate the actual build files. A developer can write either simple or complex build scripts for a project using the CMake language.

Build logic and definitions in the CMake language are written either in CMakeLists.txt or in files ending with .cmake. As a best practice, the main script is named CMakeLists.txt rather than given a .cmake name. CMakeLists.txt is placed at the root of the source tree of the application or library it works for. If there are multiple modules, and each module can be compiled and built separately, a CMakeLists.txt can also be placed in each subfolder.

.cmake files can be used as scripts, run by the cmake command, to prepare the environment, perform pre-processing, or split out tasks that can be written outside of CMakeLists.txt. .cmake files can also define modules for projects. These can be separate build processes for libraries, or extra functions for complex, multi-module projects.

Writing Makefiles can be harder than writing CMake scripts. The syntax and logic of CMake scripts resemble those of high-level languages, which makes it easier for developers to create their CMake scripts with less effort and without getting lost in Makefiles.

Let’s start with a basic “Hello World!” example with CMake. We write the following “Hello CMake!” main.cpp file: #include int main() { std::cout