C++ In Action: Industrial Strength Programming Techniques 9780201699487, 0201699486

C++ in Action introduces state-of-the-art C++ programming and problem-solving techniques for developing efficient, power…


English, 348 pages, 2001

Table of contents:
Contents
Preface
Language
Techniques
Software Project
Language
Global scope
Local scope
Embedded objects
Inheritance
Member functions and Interfaces
Member function scope
Types
Exercises
Abstract Data Types
Exercises
References
Stack-based calculator
Functional Specification
Stubbed Implementation
Calculator: Implementation
Input: Implementation
The Makefile
Exercises
Pointers
Pointers vs. References
Pointers and Arrays
Pointers and Dynamic Memory Allocation
Dynamic Data Structures
Dynamic Stack
Linked List
String Table
String Buffer
Table Lookup
Hash Table
Test
Exercises
The Meaning of is-a
Parse Tree
Exercises
Specification
Stubbed Implementation
Expanding Stubs
Final Implementation. Not!
Scanner
Symbol Table
Store
Function Table
Nodes
Parser
Main
Initialization of Aggregates
Exercises
Operator overloading
Passing by Value
Value Semantics
Techniques
Techniques
Improving Code Grouping
Decoupling the Output
Fighting Defensive Programming
A Case of Paranoid Programming
Fringes
Improving Communication Between Classes
Correcting Design Flaws
Using Embedded Classes
Combining Classes
Combining Things using Namespaces
Hiding Constants in Enumerations
Hiding Constants in Local Variables
Testing Boundary Conditions
Templates
Dynamic Array
Separating Functionality into New Classes
Standard Vector
Code Review 5: Resource Management
Exceptions
Stack Unwinding
Resources
Ownership of Resources
Smart Pointers
Ownership Transfer: First Attempt
Ownership Transfer: Second Attempt
Safe Containers
Iterators
Error Propagation
Conversion
Conclusion
Making Use of the Standard Template Library
Reference Counting and Copy-On-Write
End of Restrictions
Exploring Streams
The Calculator Object
Command Parser
Serialization and Deserialization
In-Memory (De-)Serialization
Multiple Inheritance
Transactions
Transient Transactions
Persistent Transactions
The Three-File Scheme
The Mapping-File Scheme
Caching
Bulk Allocation
Array new
Macros
Tracing Memory Leaks
Debug Output
Placement new
Windows Techniques
Of Macros and Wizards
Programming Paradigm
Hello Windows!
Encapsulation
Model-View-Controller
Exception Specification
Cleanup
Application Icon
Window Painting and the View Object
The Canvas
The WM_PAINT Message
The Model
Capturing the Mouse
Adding Colors and Frills
Stock Objects
User Interface
Child Windows
Windows Controls
Dialogs
Commands and Menus
Dynamic Menus
Exercises
Implementation Strategies
Complexity
The Fractal Nature of Software
The Living Project
The Living Programmer
Top-Down Object Oriented Design
Model-View-Controller
Requirement Specification
Architecture Specification
Productivity
Team Strategies
Global Decisions
Top-Down Object Oriented Implementation
Multi-platform development
Program Modifications
Stress Testing



Author: Bartosz Milewski



Contents

1. Preface
2. Introduction
3. Language
4. Techniques
5. Windows Techniques
6. Software Project
7. Appendix

Preface

Why This Book?

During the first four months of 1994 I was presented with a wonderful opportunity. My old University in Wroclaw, Poland, invited me to give two courses for the students of Computer Physics. The choice of topics was left entirely to my discretion. I knew exactly what I wanted to teach... My work at Microsoft gave me the unique experience of working on large software projects and of applying and developing state-of-the-art design and programming methodologies. Of course, there are plenty of books on the market that talk about design, programming paradigms, languages, etc. Unfortunately, most of them are either written in a dry academic style and are quite obsolete, or they are hastily put together to catch the latest vogue. There is a glut of books teaching programming in C, C++ and, more recently, in Java. They teach the language, all right, but rarely do they teach programming.

We have to realize that we are witnessing an unprecedented explosion of new hardware and software technologies. For the last twenty years the power of computers has grown exponentially, almost doubling every year. Our software experience should follow this exponential curve as well. Where does this leave books that were written ten or twenty years ago? And who has time to write new books? The academics? The home programmers? The conference crowd? What about people who are active full time, designing and implementing state-of-the-art software? They have no time! In fact I could only dream about writing this book while working full time at Microsoft. I had problems finding time to share experiences with other teams working on the same project. We were all too busy writing software. And then I managed to get a four-month leave of absence. This is how this book started.
Teaching courses to a live, demanding audience is the best way of systematizing and testing ideas and making fast progress writing a book. The goal I put forward for the courses was to prepare the students for jobs in the industry. In particular, I asked myself the question: If I wanted to hire a new programmer, what would I like him to know to become a productive member of my team as quickly as possible? For sure, I would like such a person to know:

• C++ and object oriented programming.
• Top-down design and top-down implementation techniques.
• Effective programming with templates and C++ exceptions.
• Teamwork.

He (and whenever I use the pronoun he, I mean it as an abbreviation for he or she) should be able to write reliable and maintainable code that other members of the team can easily understand. The person should know advanced programming techniques such as synchronization in a multithreaded environment, effective use of virtual memory, debugging techniques, etc. Unfortunately, most college graduates are never taught this kind of "industrial strength" programming. Some universities are known to produce first-class computer hackers (and seem to be proud of it!). What's worse, a lot of experienced programmers have large holes in that area of their education. They don't know C++, they use C-style programming in C++, they skip the design stage, they implement bottom-up, they hate C++ exceptions, and they don't work with the team. The bottom line is this: they waste a lot of their own time and they waste a lot of others' time. They produce buggy code that's difficult to maintain.

So who are you, the reader of this book? You might be a beginner who wants to learn C++. You might be a student who wants to supplement his or her college education. You might be a new programmer who is trying to make the transition from the academic to the industrial environment. Or you might be a seasoned programmer in search of new ideas. This book should satisfy you no matter what category you find yourself in.


Introduction

I have divided this book into three parts: the Language, the Techniques, and the Software Project.

Language

The first part teaches C++, the language of choice for general-purpose programming. But it is not your usual C++ tutorial. For the beginner who doesn't know much about C or C++, it simply introduces a new object oriented language. It doesn't concentrate on syntax or grammar; it shows how to express certain ideas in C++. It is like teaching a foreign language by conversation rather than by memorizing words and grammatical rules (when I was teaching it to students, I called this part of the course "Conversational C++"). After all, this is what the programmer needs: to be able to express ideas in the form of a program written in a particular language. When I learn a foreign language, the first thing I want to know is how to say, "How much does it cost?" I don't need to learn the whole conjugation of the verb 'to cost' in the past, present and future tenses. I just want to be able to walk into a store in a foreign country and buy something. For a C programmer who doesn't know much about C++ (other than that it's slow and cryptic, the popular myths of the C subculture) this is an exercise in unlearning C in order to program effectively in C++. Why should a C programmer unlearn C? Isn't C++ a superset of C? Unfortunately, yes! The decision to make C++ compatible with C was a purely practical, marketing decision. And it worked! Instead of being a completely new product that would take decades to gain the market, it became "version 3.1" of C. This is both good and bad. It's good because backward C compatibility allowed C++, and some elements of object oriented programming, to quickly gain a foothold in the programming community. It's bad because it doesn't require anybody to change his programming methodology.
Instead of having to rewrite the existing code all at once, many companies were, and still are, able to phase C++ in gradually. The usual path for such a phase-in is to introduce C++ as a 'stricter' C. In principle all C code could be recompiled as C++. In practice, C++ has somewhat stricter type checking, and the compiler is able to detect more bugs and issue more warnings. So recompiling C code with a C++ compiler is a way of cleaning up the existing code. The changes that have to be introduced into the source code at that stage are mostly bug fixes and stricter type enforcement. If the code was written in pre-ANSI C, the prototypes of all functions have to be generated. It is surprising how many bugs are detected during this ANSI-zation procedure. All this work is definitely worth the effort. A C compiler should only be used when a good C++ compiler is not available (really, a rare occurrence nowadays).

Once the C++ compiler becomes part of the programming environment, programmers sooner or later start learning new tricks, and eventually they develop some kind of C++ programming methodology, either on their own or by reading various self-help books. This is where the bad news starts. There is a subset of C++ (I call it the C ghetto) where many ex-C programmers live. A lot of C programmers start hating C++ after a glimpse of the C ghetto. They don't realize that C++ has as many good uses as misuses. For a C-ghetto programmer this book should be a shock (I hope!). It essentially says, "whatever you did up to now was wrong" and "Kernighan and Ritchie are not gods". (Ritchie is the creator of C, and Kernighan and Ritchie are the authors of the influential book The C Programming Language.) I want to make this clear right here and now, in the introduction. I understand that the first, quite natural, reaction of such a programmer is to close the book immediately (or, actually, jump to another Internet site) and ask for a refund. Please don't do this! The shocking, iconoclastic value of this book is not there to hurt anybody's feelings. Seeing that there exists a drastically different philosophy is supposed to prompt one to rethink one's beliefs. Besides, the Emperor is naked.

For a C++ programmer, the tutorial offers a new look at the language. It shows how to avoid the pitfalls of C++ and use the language the way it should have been designed in the first place. I would be lying if I said that C++ is a beautiful programming language. However, it is going to be, at least for some time, the most popular language for writing serious software. We may as well try to take advantage of its expressive power to write better software, rather than use it to find so many more ways to hurt ourselves. For a C++ programmer, this part of the book should be mostly easy reading. And, although the constructs and the techniques introduced there are widely known, I tried to show them from a different perspective. My overriding philosophy was to create a system that promotes a maintainable, human-readable coding style. That's why I took every opportunity not only to show various programming options but also to explain why I considered some of them superior to others.

Finally, for a Java programmer, this book should be an eye-opener. It shows that, with some discipline, it is possible to write safe and robust code in C++. Everything Java can do, C++ can do, too. Plus, it can deliver unmatched performance.
But performance is not the only reason to stick with C++. The kind of elegant resource management that can be implemented in C++ is quite impossible in Java, because of Java's reliance on garbage collection. In C++ you can have objects whose lifetime is precisely defined by the scope they live in. You are guaranteed that these objects will be destroyed upon exit from that scope. That's why you can entrust such objects with vital resources, like semaphores, file handles, database transactions, etc. Java objects, on the other hand, have undefined life spans: they are deallocated only when the runtime decides to collect them. So the way you deal with resources in Java harks back to the old C exception paradigm, where the finally clause had to do all the painfully explicit garbage collection.

There are no "native" speakers of C++. When "speaking" C++, we all have some accent that reveals our programming background. Some of us have a strong C accent, some use Smalltalk-like expressions, others Lisp. The goal of the tutorial is to come as close as possible to being a native speaker of C++. Language is a tool for expressing ideas. Therefore the emphasis is not on syntax and grammar but on the ways to express yourself. It is not "Here's a cute C++ construct and this is how you might use it." Instead it is more of "Here's an idea. How do I express it in C++?" Initially the 'ideas' take the form of simple sentences like "A star is a celestial body," or "A stack allows you to push and pop." Later the sentences are combined to form 'paragraphs' describing the functionality of a software component. The various constructs of C++ are introduced as the need arises, always in the context of a problem that needs to be solved.
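The scope-bound lifetime described above is what lets a C++ object own a resource. The following is a minimal sketch, not code from the book: the class ScopedFile and the function WriteGreeting are hypothetical names, the resource is a C stdio FILE*, and the sketch uses C++11 features (nullptr, = delete) that postdate the book.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical wrapper (not from the book): it owns a C stdio FILE*
// and closes it when the wrapper goes out of scope.
class ScopedFile
{
public:
    ScopedFile(const char* path, const char* mode)
        : _fp(std::fopen(path, mode))
    {
        // If the file cannot be opened, no object is created and there is
        // nothing to close, so the destructor will not run.
        if (_fp == nullptr)
            throw std::runtime_error(std::string("cannot open ") + path);
    }

    ~ScopedFile() { std::fclose(_fp); }   // release is tied to scope exit

    // Exactly one owner: a copy would lead to a double fclose.
    ScopedFile(const ScopedFile&) = delete;
    ScopedFile& operator=(const ScopedFile&) = delete;

    std::FILE* Get() const { return _fp; }

private:
    std::FILE* _fp;
};

// No explicit cleanup code: the file is flushed and closed when 'file'
// goes out of scope, whether the function returns normally or throws.
void WriteGreeting(const char* path)
{
    ScopedFile file(path, "w");
    if (std::fputs("Hello!\n", file.Get()) == EOF)
        throw std::runtime_error("write failed");
}   // ~ScopedFile runs here
```

Nothing in WriteGreeting closes the file explicitly; in the Java style the text describes, that would be the job of a finally clause.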

Techniques

Writing good software requires much more than just learning the language. Firstly, the program doesn't execute in a vacuum. It has to interact with the computer. And interacting with the computer means going through the operating system. Without some knowledge of the operating system, it is impossible to write serious programs. Secondly, we not only want to write programs that run; we want our programs to be small, fast, reliable, robust and scalable. Thirdly, we want to finish the development of a program in a sensible amount of time, and we want to maintain and enhance it afterwards. The goal of the second part of the book, The Techniques, is to make possible the transition from 'weekend programming' to 'industrial strength programming.' I will describe the technique that makes programming in C++ an order of magnitude more robust and maintainable. I call it "managing resources" since it is centered on the idea of a program creating, acquiring, owning and releasing various kinds of resources. For every resource, at any point in time during the execution of the program, there has to be a well-defined owner responsible for its release. This simple idea turns out to be extremely powerful in designing and maintaining complex software systems. Many a bug has been avoided, or found and fixed, using resource ownership analysis. Resource management meshes very naturally with C++ exception handling. In fact, writing sensible C++ programs that use exceptions seems virtually impossible without the encapsulation of resources. So, when should you use exceptions? What do they buy you? It depends on your response to the following simple question: Do you always check the result of new (or, for C programmers, the result of malloc)? This is a rhetorical question. Unless you are an exceptionally careful programmer, you don't. That means you are already using exceptions, whether you want to or not.
That's because accessing a null pointer results in an exception called the General Protection Fault (GP-fault or Access Violation, as programmers call it). If your program is not exception-aware, it will die a horrible death upon such an exception. What's more, the operating system will shame you by putting up a message box, leaving no doubt that it was your application that was written using substandard programming practices (maybe not in so many words). My point is, in order to write robust and reliable applications (and that's what this book is about) you will sooner or later have to use exceptions. Of course, there are other programming techniques that were, and still are, being successfully applied to the development of reasonably robust and reliable applications. None of them, however, comes close in terms of simplicity and maintainability to the application of C++ exceptions in combination with the resource management techniques. I will introduce the interaction with the operating system through a series of Windows programming exercises. They will lead the reader into new programming paradigms: message-based programming, the Model-View-Controller approach to user interfaces, etc. The advances in computer hardware paved the way to a new generation of PC operating systems. Preemptive multitasking and virtual memory are finally mainstream features on personal computers. So how does one write an application that takes advantage of multitasking? How does one synchronize multiple threads accessing the same data structure? And most importantly, how does multitasking mesh with the object-oriented paradigm and C++? I will try to answer these questions. Virtual memory gives your application the illusion of practically infinite memory.
On a 32-bit system you can address 4 gigabytes of virtual memory; in practice the amount of available memory is limited by the size of your hard disk(s). For the application you write, it means that it can easily deal with multi-megabyte memory-based data structures. Or can it? Welcome to the world of thrashing! I will explain which algorithms and data structures are compatible with virtual memory and how to use memory-mapped files to save disk space.
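The ownership rule stated in this part (every resource has one well-defined owner responsible for its release) combines with exceptions as sketched below. The Resource class, the trace log and the Transfer/TryTransfer functions are hypothetical illustrations, not the book's code; the point is that when an exception leaves a scope, the destructors of the objects in that scope still run, in reverse order of construction. As an aside on the question about new: in standard C++ a plain new expression reports failure by throwing std::bad_alloc rather than returning a null pointer.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> trace;   // records acquisitions and releases

// Hypothetical resource. Whoever holds the object owns the resource,
// and releasing it is the destructor's job.
class Resource
{
public:
    explicit Resource(std::string name) : _name(std::move(name))
    {
        trace.push_back("acquire " + _name);
    }
    ~Resource() { trace.push_back("release " + _name); }

    // A single, well-defined owner: no copies.
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;

private:
    std::string _name;
};

// Acquires two resources, then may fail halfway through its work.
void Transfer(bool fail)
{
    Resource source("source");
    Resource target("target");
    if (fail)
        throw std::runtime_error("transfer failed");
    trace.push_back("transfer done");
}   // on both paths, target and then source are released

// The caller reacts to the error but contains no cleanup code:
// stack unwinding has already released everything Transfer acquired.
bool TryTransfer(bool fail)
{
    try
    {
        Transfer(fail);
        return true;
    }
    catch (const std::runtime_error&)
    {
        return false;
    }
}
```

TryTransfer needs no cleanup code of its own; adding more resources to Transfer would not change that.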

Software Project

There is more to the creation of a successful application (or system) than just learning the language and mastering the techniques. Today's commercial software projects are among the most complex engineering undertakings of humankind. Programming is essentially the art of dealing with complexity. There have been many attempts to apply traditional engineering methods to control software's complexity: modularization, software reuse, software ICs, etc. Let's face it: in general they don't work. They may be very helpful in providing low-level building blocks and libraries, but they can hardly be used as guiding principles in the design and implementation of complex software projects. The simple reason is that there is very little repetition in a piece of software. Try to visually compare a printout of a program with, say, a picture of a microprocessor wafer. You'll see a lot of repetitive patterns in the layout of the microprocessor. Piece-wise it resembles some kind of high-tech crystal. A condensed view of a program, on the other hand, would look more like a high-tech fractal. You'd see a lot of self-similarities: large-scale patterns will resemble small-scale patterns. But you'd find very few exact matches or repetitions. Each little piece appears to be individually handcrafted. Repetitions in a program are not only unnecessary but they contribute to a maintenance nightmare. If you modify, or bug-fix, one piece of code, you are supposed to find all the copies of this piece and apply identical modifications to them as well. This abhorrence of repetition is reflected in the production process of software. The proportion of research, design and manufacturing in the software industry is different from that in other industries. Manufacturing, for instance, plays only a marginal role. Strictly speaking, electronic channels of distribution could make the manufacturing phase totally irrelevant. R & D plays a vital role, more so than in many other industries. But what really sets software development apart from others is the amount of design that goes into the product. Programming is designing. Designing, building prototypes, testing, over and over again. The software industry is the ultimate "design industry."

In the third part of the book I will attempt to describe the large-scale aspects of software development. I will concentrate on the dynamics of a software project, both from the point of view of management and planning and from that of development strategies and tactics. I will describe the dynamics of a project from its conception to shipment. I will talk about documentation, the design process and the development process. I will not, however, try to come up with ready-made recipes, because they won't work, for exactly the reasons described above.

There is a popular, unflattering stereotype of a programmer as a socially challenged nerd: somebody who works alone at night, subsists on Twinkies, avoids direct eye contact and cares very little about personal hygiene. I've known programmers like that, and I'm sure there are still some around. However, most of the specimens of this old culture are becoming extinct, and for a good reason. Progress in hardware and software makes it impossible to produce any reasonably useful and reliable program while working in isolation. Teamwork is the essential part of software development. Dividing the work and coordinating the development effort of a team is always a big challenge. In traditional industries the members of the team know (at least in theory) what they are doing. They have learned the routine. They are performing a synchronized dance, and they know the steps and hear the music. In the software industry every team member improvises the steps as he or she goes and, at the same time, composes the music for the rest of the team.

I will advocate a change of emphasis in software development. Instead of the old axiom

Programs are written for computers.

I will turn the logic upside down and claim that

Programs are written for programmers.

This statement is in fact the premise of the whole book. You can't develop industrial strength software if you don't treat your code as a publication for other programmers to read, understand and modify. You don't want your 'code' to be an exercise in cryptography. The computer is the ultimate proofing tool for your software. The compiler is your spell-checker. By running your program you attempt to test the correctness of your publication. But it's only another human being, a fellow programmer, who can understand the meaning of your program. And it is crucial that he do it with minimum effort, because without understanding, it is impossible to maintain your software.


Language

• Objects and Scopes

What's the most important thing in the Universe? Is it matter? It seems like everything is built from matter: galaxies, stars, planets, houses, cars and even us, programmers. But what's matter without energy? The Universe would be dead without it. Energy is the source of change, movement, life. But what are matter and energy without space and time? We need space into which to put matter, and we need time to see matter change. Programming is like creating universes. We need matter: data structures, objects, variables. We need energy: the executable code, the lifeforce of the program. Objects would be dead without code that operates on them. Objects need space to be put into and to relate to each other. Lines of code need time to be executed. The space-time of the program is described by scopes. An object lives and dies by its scope. Lines of executable code operate within scopes. Scopes provide the structure of the program's space and time. And ultimately programming is about structure.
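A minimal sketch of "an object lives and dies by its scope", in the spirit of the book's first program (a World object at global scope). The Tracer class, the events log and ScopeDemo are hypothetical additions, used so that the order of births and deaths can be inspected rather than printed.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Event log, defined before any Tracer so it exists for the whole program.
std::vector<std::string> events;

// Hypothetical tracing class: records when an object is born and when it dies.
class Tracer
{
public:
    explicit Tracer(std::string name) : _name(std::move(name))
    {
        events.push_back("enter " + _name);
    }
    ~Tracer() { events.push_back("leave " + _name); }

private:
    std::string _name;
};

// Global scope: constructed before main() starts, destroyed after it returns.
Tracer world("world");

void ScopeDemo()
{
    Tracer outer("outer");        // lives until the end of ScopeDemo
    {
        Tracer inner("inner");    // lives only inside this block
    }                             // inner dies here
    Tracer late("late");          // born after inner has already died
}                                 // late dies, then outer (reverse order of birth)
```

Calling ScopeDemo from main appends, in order: enter outer, enter inner, leave inner, enter late, leave late, leave outer. The global world is the first to enter and the last to leave.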

• Arrays and References

In a program, an object is identified by its name. But if we had to call the object by its name everywhere, we would end up with one global name space. Our program would execute in a structureless "object soup." The power to give an object different names in different scopes provides an additional level of indirection, so important in programming. There is an old saying in Computer Science: every problem can be solved by adding a level of indirection. This indirection can be accomplished by using a reference (an alias, an alternative name) that can be attached to a different object every time it enters a scope. Computers are great at menial tasks. They have a lot more patience than we humans do. It is a punishment for a human to have to write "I will not challenge my teacher's authority" a hundred times. Tell the computer to do it a hundred times, and it won't even blink. That's the power of iteration (and conformity).
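A small sketch of both ideas in this paragraph: a reference parameter as an alias that is attached to a different object on each call, and a loop doing the hundred-fold chore. The names Punish and kLine are hypothetical.

```cpp
#include <cassert>
#include <string>

const std::string kLine = "I will not challenge my teacher's authority\n";

// Inside Punish, 'page' is not a new string but another name (an alias)
// for whatever string the caller passes in. Each call binds it anew.
void Punish(std::string& page, int times)
{
    for (int i = 0; i < times; ++i)   // iteration: the computer won't blink
        page += kLine;
}
```

Calling Punish(alice, 100) and then Punish(bob, 2) modifies alice and bob themselves, because inside the function page is just another name for whichever string was passed.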

• Pointers

Using references, we can give multiple names to the same object. Using pointers, we can have the same name refer to different objects: a pointer is a mutable reference. Pointers give us the power to create complex data structures. They also increase our ability to shoot ourselves in the foot. A pointer is like a plug that can be plugged into a jack. If you have too many plugs and too many jacks, you may end up with a mess of tangled cables. A programmer has to strike a balance between a program that looks like a breadboard and one that looks like a printed circuit.
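A sketch of the contrast drawn here, assuming a hypothetical Jack type: a pointer (the "plug") can be moved from one object to the next, while a reference stays bound to the object it was initialized with.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example type: a jack into which a plug can be plugged.
struct Jack
{
    int signal;
};

// One name, 'plug', refers to a different Jack on every iteration:
// a pointer behaves like a reference that can be re-seated.
int SumSignals(const Jack* jacks, std::size_t count)
{
    int total = 0;
    for (const Jack* plug = jacks; plug != jacks + count; ++plug)
        total += plug->signal;
    return total;
}

// A reference, by contrast, stays bound to the object it was initialized
// with: assigning to it copies a value into that object instead of re-binding.
void CopyThroughReference(Jack& target, const Jack& source)
{
    Jack& alias = target;   // alias is now permanently another name for target
    alias = source;         // copies source's value into target
}
```

After CopyThroughReference(a, b), later changes to b do not affect a: the reference only ever named a, and the assignment copied a value.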

• Polymorphism

Polymorphic means multi-shaped. A tuner, a tape deck, a CD player: they come in different shapes, but they all have the same audio-out jack. You can plug your earphones into it and listen to music no matter whether it came as a modulation of a carrier wave, a set of magnetic domains on a tape or a series of pits in the aluminum substrate of a plastic disk.
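The audio-jack analogy maps onto an abstract base class with a virtual function. The class names below are hypothetical and the sketch uses C++11 override; the function Listen depends only on the common interface, never on a particular device.

```cpp
#include <cassert>
#include <string>

// The common "audio-out jack": anything that can produce sound.
class AudioSource
{
public:
    virtual ~AudioSource() {}
    virtual std::string Play() const = 0;   // each device implements it its own way
};

class Tuner : public AudioSource
{
public:
    std::string Play() const override { return "carrier wave"; }
};

class TapeDeck : public AudioSource
{
public:
    std::string Play() const override { return "magnetic domains"; }
};

class CDPlayer : public AudioSource
{
public:
    std::string Play() const override { return "pits in aluminum"; }
};

// The earphones see only the jack, never the shape of the device behind it:
// the call to Play() is dispatched to the right class at run time.
std::string Listen(const AudioSource& source)
{
    return "music from " + source.Play();
}
```

Adding a new kind of device means adding a new class derived from AudioSource; Listen does not change.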


• Small Software Project

When you write a program, you don't ask yourself the question, "How can I use a particular language feature?" You ask, "What language feature will help me solve my problem?"


Objects and Scopes

1. Global Scope
2. Local Scope
3. Embedded Objects
4. Inheritance
5. Member Functions and Interfaces
6. Member Function Scope
7. Types
8. Summary
9. Word of Caution
10. Exercises
11. Abstract Data Types
12. Exercises

Global scope

Class definition, object definition, constructor, destructor, output stream, include, main.

There is an old tradition in teaching C, dating back to Kernighan and Ritchie (The C Programming Language), to have the first program print the greeting "Hello World!". It is only appropriate that our first C++ program should respond to this greeting. The way to do it, of course, is to create the World and let it speak for itself. The following program does just that, but it also serves as a metaphor for C++ programming. Every C++ program is a world in itself. The world is a play, and we define the characters in that play and let them interact. This program is, in a sense, "the Mother of all C++ programs": it contains just one player, the World, and lets us witness its creation and destruction. The World interacts with us by printing messages on the computer screen. It prints "Hello!" when it is created, and "Good bye!" when it vanishes. So here we go:

#include <iostream>

class World
{
public:
    World () { std::cout "; // prompt
        cin.getline (buf, maxBuf);
        Scanner scanner (buf);
        Parser parser (scanner, store, funTab, symTab);
        status = parser.Eval ();
    } while (status != stQuit);
}

Notice that for every line of input we create a new scanner and a new parser. We keep, however, the same symbol table, function table and store. This is important because we want the values assigned to variables to be remembered as long as the program is active. The parser's destructor is called after the evaluation of every line. This call plays the important role of freeing the parse tree.

There is a comment at the top of main which hints at ways of improving the structure of the program. There are five local variables/objects defined at the top of main. Moreover, they depend on each other: the symbol table has to be initialized before the function table and before the store. The parser's constructor takes references to three of them. As a rule of thumb, whenever you see too many local variables, you should think hard about how to combine at least some of them into a separate object. In our case, it's pretty obvious that this object should be called Calculator. It should combine SymbolTable, FunctionTable and Store as its embeddings. We'll come back to this program in the next part of the book to see how it can be made into a professional, "industrial strength" program.

Initialization of Aggregates

Explicit initialization of classes and arrays.

Just as you can explicitly initialize an array of characters using a literal string

char string [] = "Literal String";

you can initialize other aggregate data structures: classes and arrays. An object of a given class can be explicitly initialized if and only if all its non-static data members are public and there is no base class, no virtual functions and no user-defined constructor. All public data members must be explicitly initializable as well (a little recursion here). For instance, if you have

class Initializable
{
public:
    // no constructor
    int          _val;
    char const * _string;
    Foo        * _pFoo;
};

you can define an instance of such a class and initialize all its members at once

Foo foo;
Initializable init = { 1, "Literal String", &foo };

Since Initializable is initializable, you can use it as a data member of another initializable class.

class BigInitializable
{
public:
    Initializable _init;
    double        _pi;
};

BigInitializable big = { { 1, "Literal String", &foo }, 3.14 };

As you see, you can nest initializations. You can also explicitly initialize an array of objects. They may be of a simple or aggregate type. They may even be arrays of arrays of objects. Here are a few examples.

char string [] = { 'A', 'B', 'C', '\0' };

is equivalent to its shorthand

char string [] = "ABC";

Here's another example

Initializable init [2] = {
    { 1, "Literal String", &foo1 },
    { 2, "Another String", &foo2 }
};

We used this method in the initialization of our array of FunctionEntry objects. If objects in the array have single-argument constructors, you can specify these arguments in the initializer list. For instance,

CelestialBody solarSystem [] = { 0.33, 4.87, 5.98, 0.64, 1900, 569, 87, 103, 0.66 };

where masses of planets are given in units of 10^24 kg.

Exercises

1. Create a top-level Calculator object and make appropriate changes to lower-level components.
2. Add two new built-in functions to the calculator, sqr and cube. Sqr squares its argument and cube cubes it (raises it to the third power).
3. Add the recognition of unary plus to the calculator. Make the necessary modifications to the scanner and the parser. Add a new node, UPlusNode. The calculator should be able to deal correctly with such expressions as
   x = +2
   2 * + 7
   1 / (+1 - 2)
4. Add powers to the calculator according to the following productions
   Factor is SimpleFactor ^ Factor     // a ^ b (a to the power of b)
       or SimpleFactor
   SimpleFactor is ( Expression )      // parenthesized expression
       or Number                       // literal floating point number
       or Identifier ( Expression )    // function call
       or Identifier                   // symbolic variable
       or - Factor                     // unary minus
5. To all nodes in the parse tree add a virtual method Print(). When Print() is called on the root of the tree, the whole tree should be displayed in some readable form. Use varying indentation (by printing a number of spaces at the beginning of every line) to distinguish between different levels of the tree. For instance
   void AddNode::Print (int indent) const
   {
       _pLeft->Print (indent + 2);
       Indent (indent);
       cout -sin(x)
   exp(x) -> exp(x)
   log(x) -> 1/x
   sqrt(x) -> 1/(2 * sqrt(x))
   The derivative of a sum is the sum of derivatives; the derivative of a product is given by the formula
   (f(x) * g(x))' = f'(x) * g(x) + f(x) * g'(x)
   where the prime denotes a derivative. The derivative of a quotient is given by the formula
   (f(x) / g(x))' = (f'(x) * g(x) - f(x) * g'(x)) / (g(x) * g(x))
   and the derivative of the superposition of functions is given by
   (f(g(x)))' = g'(x) * f'(g(x)).
   Rewrite the calculator to derive the symbolic derivative of the input by transforming the parse tree according to the formulas above. Make sure no memory is leaked in the process (that is, you must delete everything you allocate).

Operator overloading

You can do pretty much any kind of arithmetic in C++ using the built-in integral and floating-point types. However, that's not always enough. Old-time engineers swear by Fortran, the language which has a built-in type complex. In a lot of engineering applications, especially in electronics, you can't really do effective calculations without the use of complex numbers. The C++ language has no built-in complex arithmetic. Neither does it support matrix or vector calculus. Does that mean that engineers and scientists should stick to Fortran? Not at all! Obviously in C++ you can define new classes of objects, so defining a complex number is a piece of cake. What about adding, subtracting, multiplying, etc.? You can define appropriate methods of class complex. What about notational convenience? In Fortran you can add two complex numbers simply by putting the plus sign between them. No problem! Enter operator overloading.

In an expression like

double delta = 5 * 5 - 4 * 3.2 * 0.1;

you see several arithmetic operators: the equal sign, the multiplication symbol and the minus sign. Their meaning is well understood by the compiler. It knows how to multiply or subtract integers or floating-point numbers. But if you want to teach the compiler to multiply or subtract objects of some user-defined class, you have to overload the appropriate operators. The syntax for operator overloading requires some getting used to, but the use of overloaded operators doesn't. You simply put a multiplication sign between two complex variables, and the compiler finds your definition of complex multiplication and applies it. By the way, a complex type is conveniently defined for you in the standard library (std::complex, in the header <complex>).

An equal sign is an operator too. Like most operators in C++, it can be overloaded. Its meaning, however, goes well beyond arithmetic. In fact, if you don't do anything special about it, you can assign an arbitrary object to another object of the same class by simply putting an equal sign between them. Yes, that's right; you can, for instance, do this:

SymbolTable symTab1 (100);
SymbolTable symTab2 (200);
symTab1 = symTab2;

Will the assignment in this case do the sensible thing? No, the assignment will most definitely be wrong, and it will result in a very nasty problem with memory management. So, even if you're not planning on overloading the standard arithmetic operators, you should still learn something about the assignment operator, including when and why you would want to overload it. And that brings us to a very important topic: value semantics.

Passing by Value

Copy constructor, overloading the assignment operator, default copy constructor and operator =, return by value, passing by value, implicit type conversions.

So far we've been careful to pass objects from and to methods using references or pointers. For instance, in the following line of code

Parser parser (scanner, store, funTab, symTab);

all the arguments to the Parser's constructor (scanner, store, funTab and symTab) are passed by reference. We know that because we've seen the following declaration (and so did the compiler):

Parser (Scanner & scanner, Store & store, FunctionTable & funTab, SymbolTable & symTab);

When we construct the parser, we don't give it a copy of a symbol table. We give it access to an existing symbol table. If we gave it a private copy, we wouldn't be able to see the changes the parser made to it. The parser may, for instance, add a new variable to the symbol table. We want our symbol table to remember this variable even after the current parser is destroyed. The same goes for store: it must remember the values assigned to symbolic variables across the invocations of the parser. But what about the scanner? We don't really care whether the parser makes a scratch copy of it for its private use. Neither do we care what the parser does to the function table. What we do care about in this case is performance. Creating a scratch copy of a large object is quite time-consuming.

But suppose we didn't care about performance. Would the following work?

Parser (Scanner scanner, Store & store, FunctionTable funTab, SymbolTable & symTab);

Notice the absence of ampersands after Scanner and FunctionTable. What we are telling the compiler is this: when the caller creates a Parser, passing it a scanner, make a temporary copy of this scanner and let the Parser's constructor operate on that copy. This is, after all, the way built-in types are passed around. When you call a method that expects an integer, it's a copy of that integer that's used inside the method. You can modify that copy to your heart's content and you'll never change the original.
Only if you explicitly request that the method accept a reference to an integer can you change the original.

There are many reasons why such an approach will not work as expected unless we make several further modifications to our code. First of all, the temporary copy of the scanner (and of the function table) will disappear as soon as the execution of the Parser's constructor is finished. The parser will store a reference to it in its member variable, but that's useless. After the end of construction the reference will point to a nonexistent scratch copy of a scanner. That's not good. If we decide to pass a copy of the scanner to the parser, we should also store a copy of the scanner inside the parser. Here's how you do it: just omit the ampersand.

class Parser
{
    ...
private:
    Scanner         _scanner;
    Node          * _pTree;
    Status          _status;
    Store         & _store;
    FunctionTable   _funTab;
    SymbolTable   & _symTab;
};

But what is really happening inside the constructor? Now that neither the argument, scanner, nor the member variable, _scanner, is a reference, how is _scanner initialized with scanner? The syntax is misleadingly simple.

Parser::Parser (Scanner scanner, Store & store, FunctionTable funTab, SymbolTable & symTab)
    : _scanner (scanner),
      _pTree (0),
      _status (stOk),
      _store (store),
      _funTab (funTab),
      _symTab (symTab)
{}

What happens behind the scenes is that Scanner's copy constructor is called. A copy constructor is the one that takes a (possibly const) reference to an object of the same class and clones it. In our case, the appropriate constructor would be declared as follows:

Scanner::Scanner (Scanner const & scanner);

But wait a minute! Scanner does not have a constructor of this signature. Why doesn't the compiler protest, like it always does when we try to call an undefined member function? The unexpected answer is that, if you don't explicitly declare a copy constructor for a given class, the compiler will create one for you. If this doesn't sound scary, I don't know what does.

Beware of default copy constructors!

The copy constructor generated by the compiler is probably wrong! After all, what can a dumb compiler know about copying user-defined classes? Sure, it tries to do its best. It
• does a bitwise copy of all the data members that are of built-in types and
• calls the respective copy constructors for user-defined embedded objects.
But that's it. Any time it encounters a pointer, it simply duplicates it. It does not create a copy of the object pointed to by the pointer. That might be okay, or not; only the creator of the class knows for sure. This kind of operation is called a shallow copy, as opposed to a deep copy, which follows all the pointers. A shallow copy is fine when the pointed-to data structures can be easily shared between multiple instances of the object. But consider, as an example, what happens when we make a shallow copy of the top node of a parse tree. If the top node has children, a shallow copy will not clone the child nodes. We will end up with two top nodes, both pointing to the same child nodes. That's not a problem until the destructor of one of the top nodes is called. It promptly deletes its children. And what is the second top node pointing to now? A piece of garbage! The moment it tries to access the children, it will stomp over reclaimed memory with disastrous results. But even if it does nothing, eventually its own destructor is called. And that destructor will attempt to delete the same children that have already been deleted by the first top node. The result? Memory corruption.

But wait, there's more! C++ not only sneaks a default copy constructor on you. It also provides you with a convenient default assignment operator.

Beware of default assignments!

The following code is perfectly legal.

SymbolTable symTab1 (100);
SymbolTable symTab2 (200);
// ...
symTab1 = symTab2;

Not only does it perform a shallow copy of symTab2 into symTab1, but it also clobbers whatever was already there in symTab1. All memory that was allocated in symTab1 is lost, never to be reclaimed. Instead, the memory allocated in symTab2 will be deleted twice. Now that's a bargain!

Why does C++ quietly let a skunk into our house and wait for us to run into it in the dark? If you've followed this book closely, you know the answer: compatibility with C! You see, in C you can't define a copy constructor, because there aren't any constructors. So every time you wanted to copy something more complex than an int, you'd have to write a special function or a macro. So, in the traditional spirit of letting programmers shoot themselves in the foot, C provided this additional facility of quietly copying all the user-defined structs. Now, in C this wasn't such a big deal: a C struct is just raw data. There is no data hiding, no methods, no inheritance. Besides, there are no references in C, so you are much less likely to inadvertently copy a struct just because you forgot one ampersand. But in C++ it's a completely different story. Beware: many a bug in C++ is the result of a missing ampersand.

But wait, there's even more! C++ not only offers a free copy constructor and a free assignment, it will also quietly use these two to return objects from functions. Here's a fragment of code from our old implementation of a stack-based calculator, except for one small modification. Can you spot it?

class Calculator
{
public:
    int Execute (Input & input);
    IStack const GetStack () const { return _stack; }
private:
    int Calculate (int n1, int n2, int token) const;

    IStack  _stack;
};

What happened here was that I omitted the ampersand in the return type of Calculator::GetStack, and IStack is now returned by value. Let's have a very close look at what happens during such a transfer. Also, let's assume for a moment that the compiler doesn't do any clever optimizations here. In particular, let's define GetStack out of line, so a regular function call has to be executed.

IStack const Calculator::GetStack () const
{
    return _stack;
}
//...
IStack const stk;
stk = calc.GetStack (); // = newSize)
        newSize = idxMax + 1;
    // allocate new array
    T * arrNew = new T [newSize];
    // copy all entries
    int i;
    for (i = 0; i < _capacity; ++i)
        arrNew [i] = _arr [i];
    for (; i < newSize; ++i)
        arrNew [i] = _valDefault;
    _capacity = newSize;
    // free old memory
    delete []_arr;
    // substitute new array for old array
    _arr = arrNew;
}

Now it's time to make use of our dynamic array template to see how easy it really is. Let's start with the class MultiNode. In the old, limited implementation it had two arrays: an array of pointers to Node and an array of Boolean flags. Our first step is to change the types of these arrays to, respectively, DynArray<Node *> and DynArray<bool>. We have to pass default values to the constructors of these arrays in the preamble to the MultiNode constructor. The methods that just access the arrays will work with no changes (due to our overloading of operator []), except for the places where we used to check for array bounds. Those are the places where we might have to extend the arrays, so we should use the new Add method. It so happens that the only place we do it is inside the AddChild method, and the conversion is straightforward.

class MultiNode: public Node
{
public:
    MultiNode (Node * pNode)
        : _iCur (0), _aChild (0), _aIsPositive (false)
    {
        AddChild (pNode, true);
    }
    ~MultiNode ();
    void AddChild (Node * pNode, bool isPositive)
    {
        _aChild.Add (_iCur, pNode);
        _aIsPositive.Add (_iCur, isPositive);
        ++_iCur;
    }
protected:
    int              _iCur;
    DynArray<Node *> _aChild;
    DynArray<bool>   _aIsPositive;
};

MultiNode::~MultiNode ()
{
    for (int i = 0; i < _iCur; ++i)
        delete _aChild [i];
}

Let's have one more look at the Calc method of SumNode. Other than the removal of error checking (we have gotten rid of the unnecessary flag, _isError), it works as if nothing had changed.

double SumNode::Calc () const
{
    double sum = 0.0;
    for (int i = 0; i < _iCur; ++i)
    {
        double val = _aChild [i]->Calc ();
        if (_aIsPositive[i])
            sum += val;
        else
            sum -= val;
    }
    return sum;
}

The only difference is that when we access our arrays, _aChild [i] and _aIsPositive [i], we are really calling the overloaded operator [] of the respective dynamic arrays. And, by the way, since the method Calc is const, it is the const version of the overload we're calling. Isn't that beautiful?

Separating Functionality into New Classes

I'm not happy with the structuring of the symbol table. Just one look at the seven data members tells me that a new class is budding. (Seven is the magic number.) Here they are again:

HTable  _htab;
int   * _offStr;
int     _capacity;
int     _curId;
char  * _strBuf;
int     _bufSize;
int     _curStrOff;

The rule of thumb is that if you have too many data members you should group some of them into new classes. This is actually one of the three rules of class formation. You need a new class when there are
• too many local variables in a function,
• too many data members in a class, or
• too many arguments to a function.
It will become clearer why these rules make perfect sense (and why seven is the magic number) when we talk about ways of dealing with complexity in the third part of this book. The last three data members of the symbol table are perfect candidates for a new string-buffer object. The string buffer is able to store strings and assign them numbers, called offsets, that uniquely identify them. As a bonus, we'll make the string buffer dynamic, so we won't have to worry about overflowing it with too many strings.

class StringBuffer
{
public:
    StringBuffer () : _strBuf (0), _bufSize (0), _curStrOff (0) {}
    ~StringBuffer () { delete [] _strBuf; }
    int AddString (char const * str);
    char const * GetString (int off) const
    {
        assert (off < _curStrOff);
        return &_strBuf [off];
    }
private:
    void Reallocate (int addLen);

    char  * _strBuf;
    int     _bufSize;
    int     _curStrOff;
};

When the buffer runs out of space, the AddString method reallocates the whole buffer.

int StringBuffer::AddString (char const * str)
{
    int len = strlen (str);
    int offset = _curStrOff;
    // is there enough space?
    if (_curStrOff + len + 1 >= _bufSize)
    {
        Reallocate (len + 1);
    }
    // copy the string there
    strncpy (&_strBuf [_curStrOff], str, len);
    // calculate new offset
    _curStrOff += len;
    _strBuf [_curStrOff] = 0; // null terminate
    ++_curStrOff;
    return offset;
}

The reallocation follows the standard doubling pattern, but makes sure that the new string will fit no matter what.

void StringBuffer::Reallocate (int addLen)
{
    int newSize = _bufSize * 2;
    if (newSize AddChild (pRight, (token == tPlus));
    token = _scanner.Token();
} while (token == tPlus || token == tMinus);

The call to Term returns a node pointer that is temporarily stored in pRight. Then the MultiNode's method AddChild is called, and we know very well that it might try to resize its array of children. If the reallocation fails and an exception is thrown, the tree pointed to by pRight will never be deallocated. We have a memory leak.

Before I show you the systematic solution to this problem, let's try the obvious thing. Since our problem stems from the presence of a naked pointer, let's create a special-purpose class to encapsulate it. This class should acquire the node in its constructor and release it in the destructor. In addition to that, we would like objects of this class to behave like regular pointers. Here's how we can do it.

class NodePtr
{
public:
    NodePtr (Node * pNode) : _p (pNode) {}
    ~NodePtr () { delete _p; }
    Node * operator-> () const { return _p; }
    Node & operator * () const { return *_p; }
private:
    Node * _p;
};

Such objects are called safe or smart pointers. The pointer-like behavior is implemented by overloading the pointer-access and pointer-dereference operators. This clever device makes an object behave like a pointer. In particular, one can call all the public methods (and access all public data members, if there were any) of Node by "dereferencing" an object of the type NodePtr.

{
    Node * pNode = Expr ();
    NodePtr pSmartNode (pNode);
    double x = pSmartNode->Calc (); // pointer-like behavior
    ...
    // Automatic destruction of pSmartNode.
    // pNode is deleted by its destructor.
}

Of course, a smart pointer by itself will not solve our problems in the parser. After all, we don't want the nodes created by calling Term or Factor to be automatically destroyed upon normal exit from the scope. We want to be able to build them into the parse tree, whose lifetime extends well beyond the local scope of these methods. To do that we will have to relax our First Rule of Acquisition.

Ownership Transfer: First Attempt

When the lifetime of a given resource can be mapped into the lifetime of some scope, we encapsulate this resource in a smart pointer and we're done. When this can't be done, we have to pass the resource between scopes. There are two possible directions for such a transfer: up and down. A resource may be passed up from a procedure to the caller (returned), or it can be passed down from a caller to the procedure (as an argument). We assume that before being passed, the resource is owned by some type of owner object (e.g., a smart pointer).

Passing a resource down to a procedure is relatively easy. We can simply pass a reference to the owner object (a smart pointer, in our case) and let the procedure acquire the ownership from it. We'll add a special method, Release, to our smart pointer to release the ownership of the resource.

Node * NodePtr::Release ()
{
    Node * tmp = _p;
    _p = 0;
    return tmp;
}

The important thing about Release is that it zeroes the internal pointer, so that the destructor of NodePtr will not delete the object (delete always checks for a null pointer and returns immediately). After the call to Release the smart pointer no longer owns the resource. So who owns it? Whoever called it had better provide a new owner! This is how we can apply this method in our program. Here, the node resource is passed from the Parser's method Expr down to the MultiNode's method AddChild.

do
{
    _scanner.Accept();
    NodePtr pRight (Term ());
    pMultiNode->AddChild (pRight, (token == tPlus));
    token = _scanner.Token();
} while (token == tPlus || token == tMinus);

AddChild acquires the ownership of the node by calling the Release method and passes it immediately to the vector _aChild (if you see a problem here, read on; we'll tackle it later).

void MultiNode::AddChild (NodePtr & pNode, bool isPositive)
{
    _aChild.push_back (pNode.Release ());
    _aIsPositive.push_back (isPositive);
}

Passing a resource up is a little trickier. Technically, there's no problem. We just have to call Release to acquire the resource from the owner and then return it back. For instance, here's how we can return a node from Parser::Expr.

Node * Parser::Expr ()
{
    // Parse a term
    NodePtr pNode (Term ());
    ...
    return pNode.Release ();
}

What makes it tricky is that now the caller of Expr has a naked pointer. Of course, if the caller is smart, he or she will immediately find a new owner for this pointer (presumably a smart pointer), just like we did a moment ago with the result of Term. But it's one thing to expect everybody to take special care of the naked pointers returned by new and Release, and quite a different story to expect the same level of vigilance with every procedure that happens to return a pointer. This is especially true since it's not immediately obvious which ones are returning strong pointers that are supposed to be deleted, and which return weak pointers that must not be deleted. Of course, you may choose to study the code of every procedure you call and find out what's expected from you. You might hope that a procedure that transfers ownership will be appropriately commented in its header file. Or you might rely on some special naming convention, for instance starting the names of all resource-returning procedures with the prefix "Query" (been there!). Fortunately, you don't have to do any of these horrible things. There is a better way. Read on!

To summarize, even though there are some big holes in our methodology, we have accomplished no mean feat. We have encapsulated all the resources following the First Rule of Acquisition. This will guarantee automatic cleanup in the face of exceptions. We have a crude method of transferring resources up and down between owners.

Ownership Transfer: Second Attempt

So far our attempts at resource transfer through procedure boundaries have been to release the resource from its owner, pass it in its "naked" form and then immediately encapsulate it again. The obvious danger is that, although the passing happens within a few nanoseconds in a running program, the code that accepts the resource may be written months or even years after the code that releases it. The two sides of the procedure barrier don't necessarily talk to each other.

But who says that we have to "undress" the resource for the duration of the transfer? Can't we pass it together with its encapsulator? The short answer is a resounding yes! The longer answer is necessary in order to explain why it wouldn't work without some unorthodox thinking. First of all, if we were to pass a smart pointer "as is" from a procedure to the caller, we'd end up with a dangling pointer.

NodePtr Parser::Expr ()
{
    // Parse a term
    NodePtr pNode = Term (); // Calc ();
        if (*isPosIt)
            sum += val;
        else
            sum -= val;
    }
    assert (isPosIt == _aIsPositive.end ());
    return sum;
}

I said "might," because it's not immediately obvious that this style of coding, using iterators, is more advantageous than the traditional array-index iteration, especially when two parallel arrays are involved. I had to use the comma sequencing operator to squeeze two increment operations into one slot in the for-loop header. (Expressions separated by commas are evaluated in sequence. The value of the sequence is equal to the value of its last expression.) On the other hand, this code would be easier to convert if we were to reimplement MultiNode to use linked lists instead of vectors. That, however, seems rather unlikely.

Erro r Pro p agatio n Now t hat our code is except ion- safe, we should reconsider our errorhandling policy. Look what we've been doing so far when we det ect ed a synt ax error. We set t he parser's st at us t o stError and ret urned a null point er from what ever parsing m et hod we were in. I t so happened t hat all synt ax errors were det ect ed at t he lowest level, inside Parser::Factor. However, bot h Parser::Term and Parser::Expr had t o deal wit h t he possibilit y of a null node com ing from a lower- level parsing m et hod. I n fact , Parser::Factor it self had t o deal wit h t he possibilit y t hat t he recursive call t o Expr m ight ret urn a null. Our code was sprinkled wit h error- propagat ion art ifact s like t his one: if (pNode.get () == 0) return pNode; Whenever t here is a sit uat ion where an error has t o be propagat ed st raight t hrough a set of nest ed calls, one should consider using except ions. I f we let Parser::Factor t hrow an except ion whenever it det ect s a synt ax error, we won't have t o worry about det ect ing and propagat ing null point ers t hrough ot her parsing m et hods. All we'll need is t o cat ch t his except ion at t he highest level- say in Parser::Parse. class Syntax {}; Status Parser::Parse () {


    try
    {
        // Everything is an expression
        _pTree = Expr ();
        if (!_scanner.IsDone ())
            _status = stError;
    }
    catch (Syntax)
    {
        _status = stError;
    }
    return _status;
}

Notice that I defined a separate class of exceptions, Syntax, for propagating syntax errors. For now this class is empty, but its type lets me distinguish it from other types of exceptions. In particular, I don't want to catch bad_alloc exceptions in Parser::Parse, since I don't know what to do with them. They will be caught and dealt with in main.

Here's an example of code from Parser::Factor converted to use exceptions for syntax error reporting. Notice that we no longer test for a null return from Expr (in fact, we can assert that it's not null!).

if (_scanner.Token () == tLParen) // function call
{
    _scanner.Accept (); // accept '('
    pNode = Expr ();
    assert (pNode.get () != 0);
    if (_scanner.Token () == tRParen)
        _scanner.Accept (); // accept ')'
    else
        throw Syntax ();
    if (id != SymbolTable::idNotFound && id < _funTab.Size ())
    {
        pNode = auto_ptr<Node> (
            new FunNode (_funTab.GetFun (id), pNode));
    }
    else
    {
        cerr IncRefCount ();
    }
    ~RefPtr () { Release (); }
    RefPtr const & operator= (RefPtr const & p)
    {
        if (this != &p)
        {
            Release ();
            _p = p._p;
            _p->IncRefCount ();
        }
        return *this;
    }
protected:
    RefPtr (T * p) : _p (p) {}
    void Release ()
    {
        if (_p->DecRefCount () == 0)
            delete _p;
    }


    T * _p;
};

Notice that the reference-counted type T must provide at least two methods, IncRefCount and DecRefCount. We also tacitly assume that it is created with a reference count of one, before being passed to the protected constructor of RefPtr.

Although it's not absolutely necessary, we might want the type T to be a descendant of a base class that implements the reference-counting interface.

class RefCounted
{
public:
    RefCounted () : _count (1) {}
    int GetRefCount () const { return _count; }
    void IncRefCount () const { _count++; }
    int DecRefCount () const { return --_count; }
private:
    mutable int _count;
};

Notice one interesting detail: the methods IncRefCount and DecRefCount are declared const, even though they modify the object's data. You can do that, without the compiler raising an eyebrow, if you declare the relevant data member mutable. We do want these methods to be const (or at least one of them, IncRefCount) because they are called on const objects in RefPtr. Both the copy constructor and the assignment operator take const references to their arguments, but they modify their reference counts. We decided not to consider the updating of the reference count a "modification" of the object. It will make even more sense when we get to the copy-on-write implementation.

Just for demonstration purposes, let's create a reference-counted string representation using our original StringVal. Normally, one would do it more efficiently, by combining the reference count with the string buffer.

class StringRep: public RefCounted
{
public:
    StringRep (char const * cstr) :_string (cstr) {}
    char const * c_str () const { return _string.c_str (); }
    void Upcase () { _string.Upcase (); }
private:
    StringVal _string;
};

Our actual string class is built on the base of RefPtr, which internally represents string data with StringRep.
class StringRef: public RefPtr<StringRep>
{


public:
    StringRef (char const * cstr)
        : RefPtr<StringRep> (new StringRep (cstr)) {}
    StringRef (StringRef const & str)
        : RefPtr<StringRep> (str) {}
    char const * c_str () const { return _p->c_str (); }
    void Upcase () { _p->Upcase (); }
};

Other than in the special C-string-taking constructor, there is no copying of data. The copy constructor just increments the reference count of the string-representation object. So does the (inherited) assignment operator. Consequently, "copying" and passing a StringRef by value is relatively cheap.

There is only one tiny problem with this implementation. After you call Upcase on one of the copies of a StringRef, all other copies change to upper case.

StringRef strOriginal ("text");
StringRef strCopy (strOriginal);
strCopy.Upcase ();
// The original will be upper-cased!
cout > _number; // read the whole number
    break;
}

Reading a floating-point number from the standard input is easy. The only complication arises from the fact that we've already read the first character of the number, our lookahead. So before we read the whole number, we have to put our lookahead back into the stream. Don't worry, this is a simple operation. After all, the input stream is buffered. When you call get, the character is simply read from a buffer (unless the buffer is empty; in that case the system replenishes it by actually reading the input). Ungetting a character just means putting it back into that buffer. Input streams are implemented in such a way that it's always possible to put back one character.

When reading an identifier, we do a slight variation of the same trick.
default:
    if (isalpha (_look) || _look == '_')
    {
        _token = tIdent;
        _symbol.erase (); // erase string contents
        do
        {
            _symbol += _look;
            _look = _in.get ();
        } while (isalnum (_look));
        _in.putback (_look);
    }
    else
        _token = tError;
    break;

We don't have to put back a lookahead at the beginning of reading an identifier. Instead, we have to put back the last character, the one that is not part of the identifier, so that the next call to ReadChar () can see it.

Haven't we lost some generality by switching from a string to a stream in our implementation of the scanner? After all, you can always convert a stream to a string (e.g., using getline ()). Is the opposite possible? Not to worry! Converting a string into a stream is just as easy. The appropriate class is called istringstream and is defined in the header <sstream>. Since istringstream inherits from istream, our scanner won't notice the difference. For instance, we can do this:

std::istringstream in ("sin (2 * pi / 3)");
Scanner scanner (in);


We have just skimmed the surface of the standard library and we've already found a lot of useful stuff. It really pays to study it, rather than implement your own solutions from scratch.
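Here is the lookahead-and-putback technique in isolation, as a minimal sketch. MiniScanner, Token and its enumerators are hypothetical names, not the book's Scanner class, and only numbers, identifiers and single-character "other" tokens are recognized. Because it reads from a std::istream reference, it accepts a std::istringstream just as described above.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical token kinds for this sketch.
enum Token { tEnd, tNumber, tIdent, tOther };

// Tokenizer reading from any std::istream (std::cin, std::istringstream, ...).
class MiniScanner
{
public:
    explicit MiniScanner (std::istream & in) : _in (in), _number (0.0) {}

    Token Next ()
    {
        int look = _in.get ();
        while (look != EOF && std::isspace (look))
            look = _in.get ();
        if (look == EOF)
            return tEnd;
        if (std::isdigit (look))
        {
            // We already consumed the first digit (the lookahead):
            // put it back so operator>> sees the whole number.
            _in.putback (static_cast<char> (look));
            _in >> _number;
            return tNumber;
        }
        if (std::isalpha (look) || look == '_')
        {
            _symbol.erase ();
            do
            {
                _symbol += static_cast<char> (look);
                look = _in.get ();
            } while (look != EOF && (std::isalnum (look) || look == '_'));
            // Put back the first character that is not part of the identifier.
            if (look != EOF)
                _in.putback (static_cast<char> (look));
            return tIdent;
        }
        return tOther;   // operators, parentheses, etc.
    }
    double Number () const { return _number; }
    std::string const & Symbol () const { return _symbol; }
private:
    std::istream & _in;
    double _number;
    std::string _symbol;
};
```

This sketch checks for end of input before calling putback, so it never tries to push EOF back into the stream.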


Code Review 7: Serialization and Deserialization

The Calculator Object

Look at main: There are too many objects there: the symbol table, the function table and the store. All three objects have the same lifespan, the duration of the program execution. They have to be initialized in a particular order and all three of them are passed to the constructor of the parser. They just scream to be combined into a single object called (you guessed it) the Calculator. Embedding them in the right order inside this class will take care of the correct order of initialization.

class Calculator
{
    friend class Parser;
public:
    Calculator ()
        : _funTab (_symTab),
          _store (_symTab)
    {}
private:
    Store & GetStore () { return _store; }
    PFun GetFun (int id) const
    {
        return _funTab.GetFun (id);
    }
    bool IsFunction (int id) const
    {
        return id < _funTab.Size ();
    }
    int AddSymbol (std::string const & str)
    {
        return _symTab.ForceAdd (str);
    }
    int FindSymbol (std::string const & str) const
    {
        return _symTab.Find (str);
    }

    SymbolTable     _symTab;
    Function::Table _funTab;
    Store           _store;
};

Of course, now we have to make appropriate changes (read: simplifications) in main and in the parser. Here are just a few examples, in the declaration of the parser:

class Parser
{
public:
    Parser (Scanner & scanner, Calculator & calc);
    ...
private:
    ...
    Scanner & _scanner;


    auto_ptr<Node> _pTree;
    Status         _status;
    Calculator &   _calc;

};

and in its implementation:

// Factor := Ident
if (id == SymbolTable::idNotFound)
{
    id = _calc.AddSymbol (strSymbol);
}
pNode = auto_ptr<Node> (new VarNode (id, _calc.GetStore ()));

Have you noticed something? We just went ahead and made another major top-level change in our project, just like this! In fact it was almost trivial to do, with just a little help from the compiler. Here's the prescription.

Start at the spot in main where the symbol table, function table and store are defined (constructed). Replace them with the new object, calculator. Declare the class Calculator and write a constructor for it. Now, if you are really lazy and tired of thinking, fire off the compiler. It will immediately tell you what to do next: You have to modify the constructor of the parser. You have to pass it the calculator rather than its three separate parts. At this point you might notice that it will be necessary to change the class declaration of the Parser to let it store a reference to the Calculator. Or, you could run the compiler again and let it remind you of it. Next, you will notice all the compilation errors in the implementation of Parser. You can fix them one by one, adding new methods to the Calculator as the need arises. The whole procedure is so simple that you might ask an intern who has just started working on the project to do it with minimal supervision.

The moral of this story is that it's never too late to work on the improvement of the high-level structure of the project. The truth is that you rarely get it right the first time. And, by the way, you have just seen the method of top-down program modification. You start from the top and let the compiler lead you all the way down to the nitty-gritty details of the implementation.
That's the third part of the top-down methodology, which consists of:

• Top-down design
• Top-down implementation and
• Top-down modification.

I can't stress enough the importance of the top-down methodology. I have yet to see a clean, well-written piece of code that was created bottom-up. You'll hear people saying that some things are better done top-down, others bottom-up. Some people will say that starting from the middle and expanding in both directions is the best way to go. Take all such statements with a very big grain of salt.

It is a fact that bottom-up development is more natural when you have no idea what you're doing, when your goal is not to write a specific program, but rather to play around with some "neat stuff." It's an easy way, for instance, to learn the interface to some obscure subsystem that you might want to use. Bottom-up development is also preferable if you're not very good at design or if you dislike just sitting there and thinking instead of coding. It is a plus if you


enjoy long hours of debugging or have somebody else (hopefully not the end user!) to debug your code. Finally, if you embrace the bottom-up philosophy, you'll have to resign yourself to never being able to write a professional-looking piece of code. Your programs will always look to the trained eye like those electronics projects created with Radio Shack parts, on breadboards, with bent wires sticking out in all directions and batteries held together with rubber bands.

The real reason I decided to finally get rid of the top-level mess and introduce the Calculator object was to simplify the job of adding a new piece of functionality. Every time the management asks you to add new features, take the opportunity to sneak in a little rewrite of the existing code. The code isn't good enough if it hasn't been rewritten at least three times. I'm serious! By rewriting I don't mean throwing it away and starting from scratch. Just take your time every now and then to improve the structure of each part of the project. It will pay off tremendously. It will actually shorten the development cycle. Of course, if you have stress-puppy managers, you'll have a hard time convincing them of it. They will keep running around shouting nonsense like "if it ain't broken, don't fix it" or "if we don't ship it tomorrow, we are all dead." The moment you buy into that, you're doomed! You'll never be able to do anything right and you'll be spending more and more time fixing the scaffolding and chasing bugs in some low-quality temporary code pronounced to be of the "ain't broken" quality. Welcome to the maintenance nightmare!

So here we are, almost at the end of our project, when we are told that if we don't provide a command to save and restore the state of the calculator from a file, we're dead.
Fortunately, we can add this feature to the program without much trouble and, as a bonus, do some more cleanup.
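The Calculator class shown earlier relies on a language rule: data members are constructed in the order in which they are declared in the class, not in the order the constructor's initializer list happens to mention them. The sketch below uses hypothetical stand-ins (SymTab, FunTab, Store, Calc and a Log that records construction order) to make that rule visible; it is not the calculator's real code.

```cpp
#include <cassert>
#include <string>

// Records the order in which the stand-in members are constructed.
struct Log
{
    std::string order;
};

// Hypothetical stand-ins for SymbolTable, Function::Table and Store.
struct SymTab
{
    explicit SymTab (Log & log) { log.order += 'S'; }
};

struct FunTab
{
    FunTab (SymTab & symTab, Log & log) : _symTab (symTab) { log.order += 'F'; }
    SymTab & _symTab;   // refers to an already-constructed SymTab
};

struct Store
{
    Store (SymTab & symTab, Log & log) : _symTab (symTab) { log.order += 'T'; }
    SymTab & _symTab;
};

class Calc
{
public:
    // The list is written in declaration order; the declarations below,
    // not this list, determine the actual construction order.
    explicit Calc (Log & log)
        : _symTab (log), _funTab (_symTab, log), _store (_symTab, log)
    {}
private:
    SymTab _symTab;   // declared first, so constructed first
    FunTab _funTab;   // if _symTab were declared after these two, they would
    Store  _store;    // be handed a reference to a not-yet-constructed object
};
```

Moving _symTab below _store in the declarations would change the recorded order to "FTS" and leave FunTab and Store holding references to an object whose constructor had not yet run.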

Command Parser

We'll go about adding new functionality in an orderly fashion. We have to provide the user with a way to input commands. So far we've had a hack for inputting the quit command: an empty line was interpreted as "quit." Now that we want to add two more commands, save and restore, we may as well find a more general solution. I probably don't have to tell you that, but...

Whenever there are more than two special cases, you should generalize them.

The calculator expects expressions from the user. Let's distinguish commands from expressions by prefixing them with an exclamation sign. Exclamation has the natural connotation of commanding somebody to do something. We'll use a prefix rather than a suffix to simplify our parsing. We'll also make quit a regular command, to be input as "!q". We'll even remind the user of this command when the calculator starts.

cerr Size () - _curOff);
    xact.LogSecond (para2);
    Paragraph * oldPara = _curPara;
    // May throw an exception!
    SubstCurPara (para1, para2);
    xact.Commit ();
    delete oldPara;
    // destructor of xact executed
}

This is how the transaction object is implemented.

class Transaction
{
public:
    Transaction () : _commit (false), _para1 (0), _para2 (0) {}
    ~Transaction ()
    {
        if (!_commit)
        {
            // unroll all the actions
            delete _para2;
            delete _para1;
        }
    }
    void LogFirst (Paragraph * para) { _para1 = para; }
    void LogSecond (Paragraph * para) { _para2 = para; }
    void Commit () { _commit = true; }
private:
    bool        _commit;
    Paragraph * _para1;
    Paragraph * _para2;
};

Notice how carefully we prepare all the ingredients for the transaction. We first allocate all the resources and log them in our transaction object. The new paragraphs are now owned by the transaction.
If at any point an exception is thrown, the destructor of the Transaction, still in its non-committed state, will perform a rollback and free all these resources. Once we have all the resources ready, we make the switch: new resources go into the place of the old ones. The switch operation usually involves the manipulation of some pointers or array indexes. Once the switch has been done, we can commit the transaction. From that point on, the transaction no longer owns the new paragraphs. The destructor of a committed transaction usually does nothing at all. The switch made the document the owner of the new paragraphs and, at the same time, released its ownership of the old paragraph, which we then promptly delete.

All simple transactions follow this pattern:

• Allocate and log all the resources necessary for the transaction.
• Switch new resources in the place of old resources and commit.
• Clean up old resources.
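The allocate-and-log, switch-and-commit, clean-up pattern can be exercised with a small, self-contained sketch. Paragraph and Document here are hypothetical stand-ins (Document owns one paragraph and can split it in two), the fail flag simulates an exception thrown by the switch step before it changes anything, and Transaction copies the shape of the class above. A live-object counter shows that the rollback frees exactly the new resources.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical paragraph that counts live instances, so leaks are visible.
struct Paragraph
{
    explicit Paragraph (int size) : _size (size) { ++live; }
    ~Paragraph () { --live; }
    int _size;
    static int live;
};
int Paragraph::live = 0;

// Same shape as the Transaction class above.
class Transaction
{
public:
    Transaction () : _commit (false), _para1 (0), _para2 (0) {}
    ~Transaction ()
    {
        if (!_commit)
        {
            // rollback: free everything that was logged
            delete _para2;
            delete _para1;
        }
    }
    void LogFirst (Paragraph * para) { _para1 = para; }
    void LogSecond (Paragraph * para) { _para2 = para; }
    void Commit () { _commit = true; }
private:
    Transaction (Transaction const &);              // not copyable
    Transaction & operator= (Transaction const &);
    bool _commit;
    Paragraph * _para1;
    Paragraph * _para2;
};

// Hypothetical document that owns one paragraph and can split it in two.
class Document
{
public:
    Document () : _first (new Paragraph (10)), _second (0) {}
    ~Document () { delete _first; delete _second; }
    // 'fail' simulates the switch step throwing before it changes anything.
    void SplitFirst (int off, bool fail)
    {
        assert (_second == 0);
        Transaction xact;
        Paragraph * para1 = new Paragraph (off);
        xact.LogFirst (para1);
        Paragraph * para2 = new Paragraph (_first->_size - off);
        xact.LogSecond (para2);
        Paragraph * oldPara = _first;
        if (fail)
            throw std::runtime_error ("switch failed");
        _first = para1;     // the switch: the document takes ownership
        _second = para2;
        xact.Commit ();
        delete oldPara;     // cleanup of the old resource
    }
    int FirstSize () const { return _first->_size; }
    bool IsSplit () const { return _second != 0; }
private:
    Document (Document const &);
    Document & operator= (Document const &);
    Paragraph * _first;
    Paragraph * _second;
};
```

A failed split leaves exactly one live Paragraph (the original); a successful one leaves two (the halves), because the old paragraph is deleted only after the commit.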


Figure: Prepared transaction. The transaction owns all the new resources. The master data structure owns the old resources.

Figure: Aborting a transaction. The transaction's destructor frees the resources.

Figure 3-11: The switch. The master data structure releases the old resources and takes ownership of the new resources.

Figure 3-12: The cleanup. Old resources are freed and the transaction is deleted.

Persistent Transactions

When designing a persistent transaction, one that manipulates persistent data structures, we have to think of recovering from such disasters as system crashes or power failures. In cases like those, we are not so much worried about in-memory data structures (these will be lost anyway), but about the persistent, on-disk data structures. A persistent transaction goes through stages similar to those of the transient one:

• Preparation: New information is written to disk.
• Commitment: The new information becomes current, the old is disregarded.
• Cleanup: The old information is removed from disk.


A system crash can happen before or after commitment (I'll explain in a moment why it can't happen during the commit). When the system comes up again, we have to find all the interrupted transactions (they have to leave some trace on disk) and do one of two things: if the transaction was interrupted before it had a chance to commit, we must unroll it; otherwise we have to complete it. Both cases involve cleanup of some on-disk data. Unrolling means deleting the information written in preparation for the transaction. Completing means deleting the old information that is no longer needed.

Figure: The Switch. In one atomic write the on-disk data structure changes its contents.

The crucial part of the transaction is, of course, commitment. It's the "flipping of the switch." In one atomic operation the new information becomes current and the old becomes invalid. An atomic operation either succeeds and leaves a permanent trace on disk, or fails without leaving a trace. That shouldn't be difficult, you'd say. How about simply writing something to a file? It either succeeds or fails, doesn't it? Well, there's the rub! It doesn't! In order to understand that, we have to delve a little into the internals of a file system.

First of all, writing into a file doesn't mean writing to disk. Or, at least, not immediately. In general, file writes are buffered and then cached in memory before they are physically written to disk. All this is quietly done by the runtime (the buffering) and by the operating system (the caching) in order to get reasonable performance out of your machine. Disk writes are so incredibly slow in comparison with memory writes that caching is a must. What's even more important: the order of physical disk writes is not guaranteed to follow the order of logical file writes. In fact the file system goes out of its way to combine writes based on their physical proximity on disk, so that the magnetic head doesn't have to move too much. And the physical layout of a file might have nothing to do with its contiguous logical shape. Not to mention writes to different files, which can be quite arbitrarily reordered by the system, no matter what your program thinks. Thirdly, contiguous writes to a single file may be split into several physical writes depending on the disk layout set up by your file system.
You might be writing a single 32-bit number but, if it happens to straddle sector boundaries, one part of it might be written to disk in one write and the other might wait for another sweep of the cache. Of course, if the system goes down between these two writes, your data will end up partially written. So much for atomic writes.

Now that I have convinced you that transactions are impossible, let me explain a few tricks of the trade that make them possible after all. First of all, there is a file system call, Flush, that makes 100% sure that the file data is written to the disk. Not atomically, mind you: Flush may fail in the middle of writing a 32-bit number. But once Flush succeeds, we are guaranteed that the data is safely stored on disk. Obviously, we have to flush the new data to disk before we go


about committing a transaction. Otherwise we might wake up after a system crash with a committed transaction but an incomplete data structure. And, of course, another flush must finish the committing of a transaction.

How about atomicity? How can we atomically flip the switch? Some databases go so far as to install their own file systems that support atomic writes. We won't go that far. We will assume that if a file is small enough, the writes are indeed atomic. "Small enough" means not larger than a sector. To be on the safe side, make it less than 256 bytes. Will this work on every file system? Of course not! There are some file systems that are not even recoverable. All I can say is that this method will work on NTFS, the Windows NT™ file system. You can quote me on this.

We are now ready to talk about the simplest implementation of the persistent transaction: the three-file scheme.
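The text's Flush is described in Windows terms (the corresponding Win32 call is FlushFileBuffers). On a POSIX system the comparable call is fsync, and a single-byte switch write might be sketched as below. WriteSwitchByte and ReadSwitchByte are hypothetical helpers. POSIX itself does not promise that a tiny write is atomic with respect to crashes, so this relies on the same assumption the text makes; also, fsync on a newly created file does not by itself make the directory entry durable, which would need a separate fsync of the directory.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Writes a single byte at offset 0 of 'path' and waits until the
// operating system reports that it has reached stable storage.
// Returns true only if the open, the write, the fsync and the close succeed.
bool WriteSwitchByte (char const * path, unsigned char value)
{
    int fd = ::open (path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return false;
    bool ok = ::pwrite (fd, &value, 1, 0) == 1   // tiny write, assumed atomic
              && ::fsync (fd) == 0;              // force it out of the OS cache
    if (::close (fd) != 0)
        ok = false;
    return ok;
}

// Reads the switch byte back; returns -1 if it cannot be read.
int ReadSwitchByte (char const * path)
{
    int fd = ::open (path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char value = 0;
    ssize_t n = ::pread (fd, &value, 1, 0);
    ::close (fd);
    return n == 1 ? value : -1;
}
```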

The Three-File Scheme

An idealized word processor reads an input file, lets the user edit it and then saves the result. It's the save operation that we are interested in. If we start overwriting the source file, we're asking for trouble. Any kind of failure and we end up with a partially updated (read: corrupted!) file. So here's another scheme: Write the complete updated version of the document into a separate file. When you are done writing, flush it to make sure the data gets to disk. Then commit the transaction and clean up the original file. To keep a permanent record of the state of the transaction we'll need one more small file. The transaction is committed by making one atomic write into that file.

So here is the three-file scheme: We start with file A containing the original data, file B with no data and a small 1-byte file S (for Switch) initialized to contain a zero. The transaction begins.

• Write the new version of the document into file B.
• Flush file B to make sure that the data gets to disk.
• Commit: Write 1 into file S and flush it.
• Empty file A.

The meaning of the number stored in file S is the following: If its value is zero, file A contains valid data. If it's one, file B contains valid data. When the program starts up, it checks the value stored in S, loads the data from the appropriate file and empties the other file. That's it!

Let's now analyze what happens if there is a system crash at any point in our scheme. If it happens before the new value in file S gets to the disk, the program will come up and read zero from S. It will assume that the correct version of the data is still in file A and it will empty file B. We are back to the pre-transaction state. The emptying of B is our rollback.

Once the value 1 in S gets to the disk, the transaction is committed.
A system crash after that will result in the program coming back, reading the value 1 from S and assuming that the correct data is in file B. It will empty file A, thus completing the transaction. Notice that data in file B is guaranteed to be complete at that point: since the value in S is one, file B must have been flushed successfully.

If we want to start another save transaction after that, we can simply interchange the roles of files A and B and commit by changing the value in S from one to zero. To make the scheme even more robust, we can choose some random (but fixed) byte values for our switch, instead of zero and one. In this way we'll be more likely to discover on-disk data corruption, something that might always happen as long as disks are not 100% reliable and other


applications can access our files and corrupt them. Redundancy provides the first line of defense against data corruption.

This is how one might implement a save transaction.

class SaveTrans
{
    enum State
    {
        // some arbitrary bit patterns
        stDataInA = 0xC6,
        stDataInB = 0x3A
    };
public:
    SaveTrans ()
        : _commit (false), _switch ("Switch")
    {
        int byte = _switch.ReadByte ();
        if (byte != stDataInA && byte != stDataInB)
            throw "Switch file corrupted";
        _state = static_cast<State> (byte);
        if (_state == stDataInA)
        {
            _data.Open ("A");
            _backup.Open ("B");
        }
        else
        {
            _data.Open ("B");
            _backup.Open ("A");
        }
    }
    File & GetDataFile () { return _data; }
    File & GetBackupFile () { return _backup; }
    ~SaveTrans ()
    {
        if (_commit)
            _data.Empty ();
        else
            _backup.Empty ();
    }
    void Commit ()
    {
        State otherState;
        if (_state == stDataInA)
            otherState = stDataInB;
        else
            otherState = stDataInA;

        _backup.Flush ();
        _switch.Rewind ();
        _switch.WriteByte (otherState);
        _switch.Flush ();
        _commit = true;
    }
private:
    bool  _commit;
    File  _switch;
    File  _data;
    File  _backup;
    State _state;

};

This is how this transaction might be used in the process of saving a document.

void Document::Save ()
{
    SaveTrans xact;
    File &file = xact.GetBackupFile ();
    WriteData (file);
    xact.Commit ();
}

And this is how it can be used in the program initialization.

Document::Document ()
{
    SaveTrans xact;
    File &file = xact.GetDataFile ();
    ReadData (file);
    // Don't commit!
    // the destructor will do the cleanup
}

The same transaction is used here for cleanup. Since we are not calling Commit, the destructor empties the backup file, which is exactly what we need.
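The crash analysis above can be checked with an in-process simulation. This sketch stores A, B and S as ordinary files named by a prefix, uses hypothetical helpers (WriteAll, ReadAll, Init, Save, Recover) instead of the book's File class, and models a crash by simply returning early from Save. Note that std::ofstream::flush only passes data to the operating system, so a durable implementation would add an OS-level flush such as fsync after each write.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical whole-file helpers (flush reaches the OS, not the disk).
void WriteAll (std::string const & path, std::string const & data)
{
    std::ofstream out (path.c_str (), std::ios::binary | std::ios::trunc);
    out << data;
    out.flush ();
}

std::string ReadAll (std::string const & path)
{
    std::ifstream in (path.c_str (), std::ios::binary);
    std::string data;
    char c;
    while (in.get (c))
        data += c;
    return data;
}

unsigned char const stDataInA = 0xC6;   // same patterns as SaveTrans
unsigned char const stDataInB = 0x3A;

// Returns the value stored in file S, rejecting anything unexpected.
unsigned char ReadSwitch (std::string const & prefix)
{
    std::string s = ReadAll (prefix + "S");
    unsigned char state = s.size () == 1 ? static_cast<unsigned char> (s [0]) : 0;
    if (state != stDataInA && state != stDataInB)
        throw std::runtime_error ("Switch file corrupted");
    return state;
}

void WriteSwitch (std::string const & prefix, unsigned char state)
{
    WriteAll (prefix + "S", std::string (1, static_cast<char> (state)));
}

// Starting state: A holds the data, B is empty, S points at A.
void Init (std::string const & prefix, std::string const & data)
{
    WriteAll (prefix + "A", data);
    WriteAll (prefix + "B", "");
    WriteSwitch (prefix, stDataInA);
}

// crashAt: 0 = no crash, 1 = crash before commit, 2 = crash after commit.
void Save (std::string const & prefix, std::string const & data, int crashAt)
{
    bool inA = ReadSwitch (prefix) == stDataInA;
    WriteAll (prefix + (inA ? "B" : "A"), data);         // prepare
    if (crashAt == 1) return;
    WriteSwitch (prefix, inA ? stDataInB : stDataInA);   // commit
    if (crashAt == 2) return;
    WriteAll (prefix + (inA ? "A" : "B"), "");           // cleanup
}

// Startup: trust the file the switch points to and empty the other one.
std::string Recover (std::string const & prefix)
{
    bool inA = ReadSwitch (prefix) == stDataInA;
    WriteAll (prefix + (inA ? "B" : "A"), "");   // rollback or completion
    return ReadAll (prefix + (inA ? "A" : "B"));
}

void RemoveFiles (std::string const & prefix)
{
    std::remove ((prefix + "A").c_str ());
    std::remove ((prefix + "B").c_str ());
    std::remove ((prefix + "S").c_str ());
}
```

A crash before the switch is written (crashAt 1) makes Recover return the old contents, which is the rollback; a crash after it (crashAt 2) makes Recover return the new contents, which completes the transaction.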

The Mapping-File Scheme

You might be a little concerned about the performance characteristics of the three-file scheme. After all, the document might be a few megabytes long and writing it (and flushing!) to disk every time you do a save creates a serious overhead. So, if you want to be a notch better than most word processors, consider a more efficient scheme.

The fact is that most of the time the changes you make to a document between saves are localized in just a few places. Wouldn't it be more efficient to update only those places in the file instead of rewriting the whole document? Suppose we divide the document into "chunks" that each fit into a single "page." By "page" I mean a power-of-two, fixed-size subdivision. When updating a given chunk we could simply swap a page or two. It's just like swapping a few tiles in a bathroom floor: you don't need to re-tile the whole floor when you just want to make a small change around the sink.

Strictly speaking we don't even need fixed-size power-of-two pages; they just make the flushes more efficient and the bookkeeping easier. All pages may be kept in a single file, but we need a separate "map" that establishes the order in which they appear in the document. Now, if only the "map" could fit into a small switch file, we would perform transactions by updating the map.

Suppose, for example, that we want to update page two out of a ten-page file. First we try to find a free page in the file (we'll see in a moment how transactions produce free pages). If a free page cannot be found, we just extend the file by adding an eleventh page. Then we write the new updated data into this free page. We now have the current version of a part of the


document in page two and the new version of the same part in page eleven (or whatever free page we used). Now we atomically overwrite the map, making page two free and page eleven take its place.

What if the map doesn't fit into a small file? No problem! We can always do the three-file trick with the map file. We can prepare a new version of the map file, flush it and commit by updating the switch file.

Figure: The Mapping-File Scheme: Before committing.

Figure: The Mapping-File Scheme: After committing.

This scheme can be extended to a multi-level tree. In fact several databases and even file systems use something similar, based on a data structure called a B-tree.
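The page-map bookkeeping can be modeled in memory. In this sketch (PagedDoc is a hypothetical class) a vector of strings stands in for the pages of the single data file, and a second vector is the map from chunk index to page number. Update never overwrites a page that the current map refers to: it writes to a free page, or appends one, and then replaces the map in one step, which is the part that would have to be atomic on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// In-memory model of the mapping-file scheme.
class PagedDoc
{
public:
    explicit PagedDoc (std::vector<std::string> const & chunks)
        : _pages (chunks)
    {
        for (std::size_t i = 0; i < chunks.size (); ++i)
            _map.push_back (i);
    }
    // Replaces chunk 'idx' without touching the page that currently holds it.
    void Update (std::size_t idx, std::string const & data)
    {
        std::size_t page = FindFreePage ();
        _pages [page] = data;               // 1. write new data (then flush)
        std::vector<std::size_t> newMap (_map);
        newMap [idx] = page;                // 2. prepare the new map
        _map.swap (newMap);                 // 3. commit: switch to the new map
        // The page the old map pointed to is now free for reuse.
    }
    std::string Chunk (std::size_t idx) const { return _pages [_map [idx]]; }
    std::size_t PageOf (std::size_t idx) const { return _map [idx]; }
    std::size_t PageCount () const { return _pages.size (); }
private:
    // A page is free if no map entry refers to it; otherwise extend the file.
    std::size_t FindFreePage ()
    {
        std::vector<bool> used (_pages.size (), false);
        for (std::size_t i = 0; i < _map.size (); ++i)
            used [_map [i]] = true;
        for (std::size_t p = 0; p < used.size (); ++p)
            if (!used [p])
                return p;
        _pages.push_back (std::string ());
        return _pages.size () - 1;
    }
    std::vector<std::string> _pages;   // stands in for the single data file
    std::vector<std::size_t> _map;     // chunk index -> page number
};
```

Updating chunk 1 (the text's "page two") in a ten-page document appends an eleventh page; a second update then reuses the page the first one freed.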


Overloading operator new

Both new and delete are considered operators in C++. What it means, in particular, is that they can be overloaded like any other operator. And just like you can define a class-specific operator=, you can also define class-specific operators new and delete. They will be automatically called by the compiler to allocate and deallocate objects of that particular class. Moreover, you can overload and replace the global versions of new and delete.

Class-specific new

Dynamic memory allocation and deallocation are not cheap. A lot of programs spend the bulk of their time inside the heap, searching for free blocks, recycling deleted blocks and merging them to prevent heap fragmentation. If memory management is a performance bottleneck in your program, there are several optimization techniques that you might use. Overloading new and delete on a per-class basis is usually used to speed up allocation and deallocation of objects of that particular class. There are two main techniques: caching and bulk allocation.

Caching

The idea behind caching is that recycling is cheaper than manufacturing. Suppose that we wanted to speed up additions to a hash table. Every time an addition is performed, a new link is allocated. In our program, these links are only deallocated when the whole hash table is destroyed, which happens at the end of the program. Imagine, however, that we are using our hash table in another program, where it's either possible to selectively remove items from the hash table, or where there are many hash tables created and destroyed during the lifetime of the program. In both cases, we might speed up average link allocation time by keeping around the links that are currently not in use.

A FreeList object will be used as storage for unused links. To get a new link we call its NewLink method. To return a link back to the pool, we call its Recycle method. The pool of links is implemented as a linked list. There is also a Purge method that frees the whole pool.

class Link;

class FreeList
{
public:
    FreeList () : _p (0) {}
    ~FreeList ();
    void Purge ();
    void * NewLink ();
    void Recycle (void * link);
private:
    Link * _p;
};

Class Link has a static member _freeList which is used by the overloaded class-specific operators new and delete. Notice the assertion in operator new. It


protects us from somebody calling this particular operator for a different class. How could that happen? Operators new and delete are inherited. If a class derived from Link didn't override these operators, new called for the derived class would hand out a block sized for the base class, which is too small for the derived object.

class Link
{
    friend class FreeList;
public:
    Link (Link * pNext, int id)
        : _pNext (pNext), _id (id) {}
    Link * Next () const { return _pNext; }
    int Id () const { return _id; }
    // allocator
    void * operator new (size_t size)
    {
        assert (size == sizeof (Link));
        return _freeList.NewLink ();
    }
    void operator delete (void * mem)
    {
        if (mem)
            _freeList.Recycle (mem);
    }
    static void Purge () { _freeList.Purge (); }
private:
    static FreeList _freeList;

    Link * _pNext;
    int    _id;

};

Inside List::Add the creation of a new Link will be translated by the compiler into the call to the class-specific operator new followed by the call to its constructor (if any). The beauty of this method is that no changes to the implementation of List are needed.

class List
{
public:
    List ();
    ~List ()
    {
        while (_pHead != 0)
        {
            Link * pLink = _pHead;
            _pHead = _pHead->Next();
            delete pLink;
        }
    }
    void Add (int id)
    {
        Link * pLink = new Link (_pHead, id);
        _pHead = pLink;
    }

255

Link const * GetHead () const { return _pHead; } private: Link * _pHead; }; A hash t able cont ains an array of Lists which will all int ernally use t he special- purpose allocat or for it s links. Aft er we are done wit h t he hash t able, we m ight want t o purge t he m em ory st ored in t he privat e allocat or. That would m ake sense if, for inst ance, t here was only one hash t able in our program , but it allowed delet ion as well as addit ion of ent ries. On t he ot her hand, if we want ed our pool of links t o be shared bet ween m ult iple hash t ables, we wouldn't want t o purge it every t im e a hash t able is dest royed. class HTable { public: explicit HTable (int size): _size(size) { _aList = new List [size]; } ~HTable () { delete [] _aList; // release memory in free list Link::Purge (); // optional } // ... private: List * _aList; int _size; }; Not ice: Purge is a st at ic m et hod of Link, so we don't need an inst ance of a Link in order t o call it . I n t he im plem ent at ion file, we first have t o define t he st at ic m em ber _freeList of t he class Link. St at ic dat a is aut om at ically init ialized t o zero. FreeList Link::_freeList; The im plem ent at ion of FreeList is pret t y st raight forward. We t ry t o reuse Links, if possible; ot herwise we call t he global operat or new. Since we are allocat ing raw m em ory, we ask for sizeof (Link) byt es ( chars) . When we delet e t his st orage, we cast Links back t o t heir raw form . Delet ing a Link as a Link would result in a ( second! ) call t o it s dest ruct or. We don't want t o do it here, since dest ruct ors for t hese Links have already been called when t he class- specific delete was called. void * FreeList::NewLink () { if (_p != 0) {

256

void * mem = _p; _p = _p->_pNext; return mem; } else { // use global operator new return ::new char [sizeof (Link)]; } } void FreeList::Recycle (void * mem) { Link * link = static_cast (mem); link->_pNext = _p; _p = link; } FreeList::~FreeList () { Purge (); } void FreeList::Purge () { while (_p != 0) { // it was allocated as an array of char char * mem = reinterpret_cast (_p); _p = _p->Next(); ::delete [] mem; } } Not ice all t he cast ing we have t o do. When our overloaded new is called, it is expect ed t o ret urn a void point er. I nt ernally, however, we eit her recycle a Link from a linked- list pool, or allocat e a raw chunk of m em ory of t he appropriat e size. We don't want t o call ::new Link, because t hat would have an unwant ed side effect of calling Link's const ruct or ( it will be called anyway aft er we ret urn from operat or new) . Our delete, on t he ot her hand, is called wit h a void point er, so we have t o cast it t o a Link in order t o st ore it in t he list . Purge delet es all as if t hey were arrays of chars- - since t hat is how t hey were allocat ed. Again, we don't want t o delet e t hem as Links, because Link dest ruct ors have already been called. As usually, calls t o global operat ors new and delete can be disam biguat ed by prepending double colons. Here, t hey ar not st rict ly necessary, but t hey enhance t he readabilit y.
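The recycling scheme above can be tried out as a small, self-contained program. This is a minimal sketch assuming C++11, written in the spirit of the FreeList/Link pair rather than copied from it: the chunk size is passed to the FreeList constructor, NewChunk is a hypothetical name, raw memory comes from ::operator new instead of new char [], and each recycled chunk holds a tiny Node instead of reusing Link's own _pNext field (so nothing touches a member of an already destroyed object).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Pool of recycled, equally sized chunks of raw memory, kept as an
// intrusive singly linked list threaded through the chunks themselves.
class FreeList
{
public:
    explicit FreeList (std::size_t chunkSize)
        : _chunkSize (chunkSize), _p (nullptr)
    {
        assert (chunkSize >= sizeof (Node));
    }
    ~FreeList () { Purge (); }
    FreeList (FreeList const &) = delete;
    FreeList & operator= (FreeList const &) = delete;

    void * NewChunk ()
    {
        if (_p != nullptr)
        {
            Node * node = _p;                 // recycle: pop the head of the pool
            _p = node->next;
            return node;
        }
        return ::operator new (_chunkSize);   // manufacture: raw memory, no constructor
    }
    void Recycle (void * mem)
    {
        // The object's destructor has already run; reuse its storage as a Node.
        _p = new (mem) Node { _p };
    }
    void Purge ()
    {
        while (_p != nullptr)
        {
            Node * next = _p->next;
            ::operator delete (_p);
            _p = next;
        }
    }
private:
    struct Node { Node * next; };
    std::size_t _chunkSize;
    Node * _p;
};

class Link
{
public:
    Link (Link * pNext, int id) : _pNext (pNext), _id (id) {}
    Link * Next () const { return _pNext; }
    int Id () const { return _id; }

    // Class-specific allocation functions (implicitly static members).
    static void * operator new (std::size_t size)
    {
        assert (size == sizeof (Link));  // catches derived classes that don't override
        return _freeList.NewChunk ();
    }
    static void operator delete (void * mem)
    {
        if (mem != nullptr)
            _freeList.Recycle (mem);
    }
    static void Purge () { _freeList.Purge (); }
private:
    static FreeList _freeList;
    Link * _pNext;
    int _id;
};

FreeList Link::_freeList (sizeof (Link));
```

Deleting a Link and then creating another one hands back the same address without a trip to the global allocator; Link::Purge returns the pooled chunks to the system.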

Bulk Allocation

Another approach to speeding up allocation is to allocate in bulk and thus amortize the cost of memory allocation across many calls to operator new. The implementation of Links, Lists and HashTables is as before, except that a new class, LinkAllocator, is used in place of FreeList. It has the same interface as FreeList, but its implementation is more involved. Besides keeping a list of recycled Links, it also has a separate list of blocks of links. Each block consists of a header of class Block and a block of 16 consecutive raw pieces of memory, each the size of a Link.

class Link;

class LinkAllocator
{
    enum { BlockLinks = 16 };
    class Block
    {
    public:
        Block * Next () { return _next; }
        void SetNext (Block * next) { _next = next; }
    private:
        Block * _next;
    };
public:
    LinkAllocator () : _p (0), _blocks (0) {}
    ~LinkAllocator ();
    void Purge ();
    void * NewLink ();
    void Recycle (void * link);
private:
    Link  * _p;
    Block * _blocks;
};

This is how a new Link is created:

void * LinkAllocator::NewLink ()
{
    if (_p == 0)
    {
        // use global operator new to allocate a block of links
        char * p = ::new char [sizeof (Block) + BlockLinks * sizeof (Link)];
        // add it to the list of blocks
        Block * block = reinterpret_cast<Block *> (p);
        block->SetNext (_blocks);
        _blocks = block;
        // add it to the list of links
        p += sizeof (Block);
        for (int i = 0; i < BlockLinks; ++i)
        {
            Link * link = reinterpret_cast<Link *> (p);
            link->_pNext = _p;
            _p = link;
            p += sizeof (Link);
        }
    }
    void * mem = _p;
    _p = _p->_pNext;
    return mem;
}

The first block of code deals with the situation when there are no unused links in the Link list. A whole block of 16 (BlockLinks) Link-sized chunks is allocated all at once, together with some room for the Block header. The Block is immediately linked into the list of blocks and then chopped up into separate Links, which are added to the Link list. Once the Link list is replenished, we can pick a Link from it and pass it out.

The implementation of Recycle is the same as before: the links are returned to the Link list. Purge, on the other hand, does bulk deallocations of whole blocks. Since every recycled Link lives inside one of these blocks, Purge must also empty the Link list; otherwise _p would be left pointing into freed memory.

void LinkAllocator::Purge ()
{
    while (_blocks != 0)
    {
        // it was allocated as an array of char
        char * mem = reinterpret_cast<char *> (_blocks);
        _blocks = _blocks->Next();
        ::delete [] mem;
    }
    // all recycled links lived inside the freed blocks
    _p = 0;
}

Only one call in 16 to new Link results in actual memory allocation. All others are dealt with very quickly by picking a ready-made Link from a list.
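The bulk strategy can be sketched as a self-contained allocator of fixed-size chunks. This is a minimal sketch assuming C++11; BlockAllocator, NewChunk and BlockCount are hypothetical names, and unlike LinkAllocator it takes the chunk size as a parameter and rounds sizes up to alignof (std::max_align_t) so that every chunk is suitably aligned. As in the text, one block holds 16 chunks, and Purge also clears the free list, since every free chunk lives inside a block it just released.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hands out fixed-size chunks carved out of blocks of BlockChunks chunks each.
class BlockAllocator
{
public:
    explicit BlockAllocator (std::size_t chunkSize)
        : _chunkSize (ChunkSizeFor (chunkSize)),
          _free (nullptr), _blocks (nullptr), _blockCount (0) {}
    ~BlockAllocator () { Purge (); }
    BlockAllocator (BlockAllocator const &) = delete;
    BlockAllocator & operator= (BlockAllocator const &) = delete;

    void * NewChunk ()
    {
        if (_free == nullptr)
            Refill ();                 // only one call in BlockChunks gets here
        Node * node = _free;
        _free = node->next;
        return node;
    }
    void Recycle (void * mem) { _free = new (mem) Node { _free }; }
    void Purge ()
    {
        while (_blocks != nullptr)
        {
            Block * next = _blocks->next;
            ::operator delete (_blocks);  // header and all its chunks at once
            _blocks = next;
        }
        _free = nullptr;                  // free chunks lived inside the freed blocks
        _blockCount = 0;
    }
    int BlockCount () const { return _blockCount; }
private:
    enum { BlockChunks = 16 };
    struct Node { Node * next; };
    struct Block { Block * next; };

    // Round sizes up so that every chunk is aligned for any type.
    static std::size_t RoundUp (std::size_t n)
    {
        std::size_t const a = alignof (std::max_align_t);
        return (n + a - 1) / a * a;
    }
    static std::size_t ChunkSizeFor (std::size_t n)
    {
        return RoundUp (n < sizeof (Node) ? sizeof (Node) : n);
    }
    void Refill ()
    {
        std::size_t const header = RoundUp (sizeof (Block));
        char * p = static_cast<char *> (::operator new (header + BlockChunks * _chunkSize));
        _blocks = new (p) Block { _blocks };  // link the block into the block list
        ++_blockCount;
        p += header;
        for (int i = 0; i < BlockChunks; ++i, p += _chunkSize)
            _free = new (p) Node { _free };   // chop the rest into free chunks
    }

    std::size_t _chunkSize;
    Node * _free;
    Block * _blocks;
    int _blockCount;
};
```

Sixteen allocations fit in the first block; the seventeenth triggers a second call to the global allocator, and a recycled chunk is handed out again before any fresh one.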

Array new

Even though class Link has overloaded operators new and delete, if you were to allocate a whole array of Links, as in new Link [10], the compiler would call the global operator new [] to allocate enough memory for the whole array. It would not call the class-specific overload. Conversely, deleting such an array would result in a call to the global operator delete [], not to the class-specific overload. Since in our program we never allocate arrays of Links, we have nothing to worry about. And even if we did, global new and delete would do the right thing anyway. However, in the unlikely case when you actually want to have control over array allocations, C++ provides a way. It lets you overload operators new[] and delete[]. The syntax and the signatures are analogous to the overloads of straight new and delete.

void * operator new [] (size_t size);
void operator delete [] (void * p);

The only difference is that the size passed to new[] takes into account the total size of the array plus some additional data used by the compiler to distinguish between pointers to objects and arrays of objects. For instance, the compiler has to know the number of elements in the array in order to be able to call destructors on all of them when delete [] is called. All four operators new, delete, new[] and delete[] are treated as static members of the class that overloads them (i.e., they don't have access to this).
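The difference between the scalar and array forms can be made visible with counters. In this minimal sketch, Widget and the g_ counters are hypothetical; Widget is given a user-written destructor so that delete [] has to know the element count. The only portable claim about the array size is that it is at least n * sizeof (Widget); whether extra bytes are added is up to the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical counters that show which allocation function was used.
int g_scalarNews = 0;
int g_arrayNews = 0;
std::size_t g_lastArrayBytes = 0;

struct Widget
{
    int value = 0;
    ~Widget () {}  // non-trivial destructor: delete [] must know the element count

    static void * operator new (std::size_t size)
    {
        ++g_scalarNews;
        return ::operator new (size);
    }
    static void operator delete (void * p) { ::operator delete (p); }

    // Without these two overloads, new Widget [n] and delete [] would go
    // straight to the global array forms.
    static void * operator new [] (std::size_t size)
    {
        ++g_arrayNews;
        g_lastArrayBytes = size;  // may include extra room for the element count
        return ::operator new [] (size);
    }
    static void operator delete [] (void * p) { ::operator delete [] (p); }
};
```

Allocating new Widget [10] goes through the class-specific operator new [] only; the scalar overload is untouched until a single Widget is created.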


Global new

Unlike class-specific new, global new is usually overloaded for debugging purposes. In some cases, however, you might want to overload global new and delete permanently, because you have a better allocation strategy or because you want more control over it. In any case, you have a choice of overriding global new and delete or adding your own special versions that follow a slightly different syntax.

Standard operator new takes one argument of type size_t. Standard delete takes one argument of type void *. You can define your own versions of new and delete that take additional arguments of arbitrary types. For instance, you can define

void * operator new (size_t size, char const * name);
void operator delete (void * p, char const * name);

and call the special new using this syntax:

Foo * p = new ("special") Foo;

Unfortunately, there is no delete expression that would call the special delete, so you have to be sure that standard delete will correctly handle memory allocated using your special new (or that delete is never called for such objects). So what's the use of the overloaded delete with special arguments? There is actually one case in which it will be called: when an exception is thrown during object construction. As you might recall, there is a contract implicit in the language that if an exception happens during the construction of an object, the memory for this object will be automatically deallocated. It so happens that during an object's construction the compiler is still aware of which version of operator new was called to allocate memory. It is therefore able to generate a call to the corresponding version of delete, in case an exception is thrown.
After the successful completion of construction, this information is no longer available and the compiler has no means to guess which version of global delete is appropriate for a given object. Once you have defined an overloaded version of new, you can call it explicitly, by specifying additional argument(s). Or you can substitute all calls to new in your code with the overloaded version using macro substitution.
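The rule about the matching delete can be observed directly. This is a minimal sketch assuming C++11; Foo's throwing constructor and the counters g_specialNews and g_specialDeletes are hypothetical, added only to make the calls visible. The special new delegates to the standard ::operator new, which is what allows an ordinary delete expression to release the memory after a successful construction.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

// Hypothetical counters that record which allocation functions ran.
int g_specialNews = 0;
int g_specialDeletes = 0;

// "Special" allocation function with an extra argument.
void * operator new (std::size_t size, char const * /* name */)
{
    ++g_specialNews;
    return ::operator new (size);  // standard memory, so plain delete can free it
}

// Matching deallocation function. No delete expression can select it; the
// compiler calls it only when a constructor throws inside new ("...") T.
void operator delete (void * p, char const * /* name */) noexcept
{
    ++g_specialDeletes;
    ::operator delete (p);
}

struct Foo
{
    explicit Foo (bool fail)
    {
        if (fail)
            throw std::runtime_error ("construction failed");
    }
};
```

A successful new ("special") Foo (false) is released with a plain delete and never touches the special delete; when the constructor throws, the special delete runs exactly once.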

Macros

We haven't really talked about macros in this book. They are a part of standard C++, but their use is strongly discouraged. In the old times, they were used in place of the more sophisticated C++ features, such as inline functions and templates. Now that there are better ways of getting the same functionality, macros are fast becoming obsolete. But just for completeness, let me explain how they work.

"Macros are obnoxious, smelly, sheet-hogging bedfellows for several reasons, most of which are related to the fact that they are a glorified text substitution facility whose effects are applied during preprocessing, before any C++ syntax and semantic rules can even begin to apply." (Herb Sutter)

A macro works through literal substitution. You may think of macro expansion as a separate process performed by the compiler before even getting to the main task of parsing C++ syntax. In fact, in older compilers, macro expansion was done by a separate program, the preprocessor.

There are two major types of macros. The first type simply substitutes one string with another, in the code that logically follows it (by logically I mean that, if the macro is defined in an include file, it will also work in the file that includes it, and so on). Let me give you an example that might actually be useful. Let's define the following macro in the file dbnew.h

#define new new(__FILE__, __LINE__)

This macro will substitute all occurrences of new that logically follow it with the string new (__FILE__, __LINE__). Moreover, the macro preprocessor will then substitute all occurrences of the special predefined symbol __FILE__ with the full name of the source file in which it finds it, and all occurrences of __LINE__ with the appropriate line number. So if you have a file c:\test\main.cpp with the contents:

#include "dbnew.h"
int main ()
{
    int * p = new int;
    return 0;
}

it will be preprocessed to produce the following code:

int main ()
{
    int * p = new ("c:\\test\\main.cpp", 4) int;
    return 0;
}

Now you can use your own overloaded operator new, for instance to trace all memory allocation. Here's a simple example of such an implementation.

void * operator new (size_t size, char const * file, int line)
{
    std::cout (b++))? (a++): (b++))

One of the variables will be incremented twice, the other once. This is probably not what the programmer expected. By the way, there is one more gotcha: notice that I didn't put a space between.
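The file-and-line tracing described for dbnew.h can be sketched as a small program. This is a minimal sketch, not the book's implementation: it uses a hypothetical macro name, TRACED_NEW, instead of redefining new itself (the standard's rules on reserved names formally forbid #define-ing a keyword), and g_lastFile and g_lastLine are hypothetical globals that remember the most recent call site. The special new delegates to the standard ::operator new, so a plain delete releases the memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <new>

// Where the most recent traced allocation came from (hypothetical globals).
char const * g_lastFile = nullptr;
int g_lastLine = 0;

// Extra-argument form of global operator new: log the call site, then
// delegate to the standard operator new so that a plain delete still works.
void * operator new (std::size_t size, char const * file, int line)
{
    g_lastFile = file;
    g_lastLine = line;
    std::printf ("new: %zu bytes at %s(%d)\n", size, file, line);
    return ::operator new (size);
}

// Matching form, used only if a constructor throws inside TRACED_NEW T (...).
void operator delete (void * p, char const *, int) noexcept
{
    ::operator delete (p);
}

// Hypothetical macro: expands at the point of use, so __FILE__ and
// __LINE__ name the line that performs the allocation.
#define TRACED_NEW new (__FILE__, __LINE__)
```

For example, int * p = TRACED_NEW int (42); prints the size of an int together with the file name and the line of that statement.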

Tracing Memory Leaks

A more interesting application of this technique lets you trace unreleased allocations, a.k.a. memory leaks. The idea is to store information about each allocation in a global data structure and dump its contents at the end of the program. Overloaded operator delete would remove entries from this data structure. Since operator delete has access only to a pointer to previously allocated memory, we have to be able to find the entry reasonably quickly based on this pointer. A map keyed by a pointer comes to mind immediately. We'll call this global data structure a Tracer.

class Tracer
{
private:
    class Entry
    {
    public:
        Entry (char const * file, int line)
            : _file (file), _line (line) {}
        Entry ()
            : _file (0), _line (0) {}
        char const * File () const { return _file; }
        int Line () const { return _line; }
    private:
        char const * _file;
        int _line;
    };
    class Lock
    {
    public:
        Lock (Tracer & tracer)
            : _tracer (tracer)
        {
            _tracer.lock ();
        }
        ~Lock ()
        {
            _tracer.unlock ();
        }
    private:
        Tracer & _tracer;
    };
    typedef std::map<void *, Entry>::iterator iterator;
    friend class Lock;
public:
    Tracer ();
    ~Tracer ();
    void Add (void * p, char const * file, int line);
    void Remove (void * p);
    void Dump ();

    static bool Ready;
private:
    void lock () { _lockCount++; }
    void unlock () { _lockCount--; }
private:
    std::map<void *, Entry> _map;
    int _lockCount;
};

We have defined two auxiliary classes: Tracer::Entry, which is used as the value for the map, and Tracer::Lock, which is used to temporarily disable tracing. They are used in the implementation of Tracer::Add and Tracer::Remove. The method Add adds a new entry to the map, but only when tracing is active. Notice that it disables tracing when accessing the map: we don't want to trace the allocations inside the map code.

void Tracer::Add (void * p, char const * file, int line)
{
    if (_lockCount > 0)
        return;
    Tracer::Lock lock (*this);
    _map [p] = Entry (file, line);
}

The method Remove makes the same preparations as Add and then searches the map for the pointer to be removed. If it's found, the whole entry is erased.

void Tracer::Remove (void * p)
{
    if (_lockCount > 0)
        return;
    Tracer::Lock lock (*this);
    iterator it = _map.find (p);
    if (it != _map.end ())
    {
        _map.erase (it);
    }
}

Finally, at the end of the program, the method Dump is called from the destructor of Tracer to display all the leaks.

Tracer::~Tracer ()
{
    Ready = false;
    Dump ();
}

void Tracer::Dump ()
{
    if (_map.size () != 0)
    {
        std::cout second.Line (); std::cout first); std::stringstream out; out OnDestroy (); return 0;
    case WM_MOUSEMOVE:
        {
            POINTS p = MAKEPOINTS (lParam);
            KeyState kState (wParam);
            if (pCtrl->OnMouseMove (p.x, p.y, kState))
                return 0;
        }
    }
    return ::DefWindowProc (hwnd, message, wParam, lParam);
}

We initialize the GWL_USERDATA slot corresponding to hwnd in one of the first messages sent to our window. The message is WM_NCCREATE (Non-Client Create), sent before the creation of the non-client part of the window (the border, the title bar, the system menu, etc.). (There is another message before that one, WM_GETMINMAXINFO, which might require special handling.) We pass the pointer to the controller as window creation data. We use the class Win::CreateData, a thin encapsulation of the Windows structure CREATESTRUCT. Since we want to be able to cast a pointer to CREATESTRUCT passed to us by Windows to a pointer to Win::CreateData, we use inheritance rather than embedding (you can inherit from a struct, not only from a class).

namespace Win
{
    class CreateData: public CREATESTRUCT
    {
    public:
        void * GetCreationData () const { return lpCreateParams; }
        int GetHeight () const { return cy; }
        int GetWidth () const { return cx; }
        int GetX () const { return x; }
        int GetY () const { return y; }
        char const * GetWndName () const { return lpszName; }
    };
}

The message WM_DESTROY is important for the top-level window. That's where the "quit" message is usually posted. There are other messages that might be sent to a window after WM_DESTROY, most notably WM_NCDESTROY, but we'll ignore them for now.
I also added the processing of WM_MOUSEMOVE, just to illustrate the idea of message handlers. This message is sent to a window whenever a mouse moves over it. In the generic window procedure we will always unpack message parameters and pass them to the appropriate handler. There are three parameters associated with WM_MOUSEMOVE: the x coordinate, the y coordinate and the state of control keys and buttons. Two of these parameters, x and y, are packed into one LPARAM, and Windows conveniently provides a macro to unpack them, MAKEPOINTS, which turns lParam into a structure called POINTS. We retrieve the values of x and y from POINTS and pass them to the handler.
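What the unpacking amounts to can be modeled portably, without windows.h. In this minimal sketch, Points, UnpackPoints and PackPoints are hypothetical stand-ins for POINTS and MAKEPOINTS, and lParam is modeled as a 32-bit unsigned value: the low-order word holds x, the high-order word holds y, and each is read as a signed 16-bit number because mouse coordinates can be negative. The narrowing to std::int16_t relies on two's-complement wraparound, which C++20 guarantees and mainstream compilers have always done.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for the Windows POINTS structure.
struct Points
{
    std::int16_t x;
    std::int16_t y;
};

// Conceptual equivalent of MAKEPOINTS: split a packed value into two
// signed 16-bit coordinates (x in the low word, y in the high word).
inline Points UnpackPoints (std::uint32_t lParam)
{
    Points p;
    p.x = static_cast<std::int16_t> (lParam & 0xFFFFu);
    p.y = static_cast<std::int16_t> ((lParam >> 16) & 0xFFFFu);
    return p;
}

// The reverse operation, handy for producing test input.
inline std::uint32_t PackPoints (std::int16_t x, std::int16_t y)
{
    return static_cast<std::uint32_t> (static_cast<std::uint16_t> (x))
         | (static_cast<std::uint32_t> (static_cast<std::uint16_t> (y)) << 16);
}
```

A packed value of 0x00C80064 unpacks to x = 100 and y = 200, and a negative coordinate survives the round trip through PackPoints and UnpackPoints.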


The state of control keys and buttons is passed inside WPARAM as a set of bits. Access to these bits is given through special bit masks, like MK_CONTROL, MK_SHIFT, etc., provided by Windows. We will encapsulate these bitwise operations inside a class, Win::KeyState.

class KeyState
{
public:
    KeyState (WPARAM wParam): _data (wParam)
    {}
    bool IsCtrl () const { return (_data & MK_CONTROL) != 0; }
    bool IsShift () const { return (_data & MK_SHIFT) != 0; }
    bool IsLButton () const { return (_data & MK_LBUTTON) != 0; }
    bool IsMButton () const { return (_data & MK_MBUTTON) != 0; }
    bool IsRButton () const { return (_data & MK_RBUTTON) != 0; }
private:
    WPARAM _data;
};

The methods of Win::KeyState return the state of the control and shift keys and the state of the left, middle and right mouse buttons. For instance, if you move the mouse while you press the left button and the shift key, both IsShift and IsLButton will return true.

In WinMain, where the window is created, we initialize our controller and pass it to Win::Maker::Create along with the window's title.

TopController ctrl;
win.Create (ctrl, "Simpleton");

This is the modified Create. It passes the pointer to Controller as the user-defined part of window creation data, which is the last argument to CreateWindowEx.

HWND Maker::Create (Controller & controller, char const * title)
{
    HWND hwnd = ::CreateWindowEx (
        _exStyle,
        _className,
        title,
        _style,
        _x,
        _y,
        _width,
        _height,
        _hWndParent,
        _hMenu,
        _hInst,
        &controller);

    if (hwnd == 0)
        throw "Internal error: Window Creation Failed.";
    return hwnd;
}

To summarize, the controller is created by the client and passed to the Create method of Win::Maker. There, it is added to the creation data, and Windows passes it as a parameter with the WM_NCCREATE message. The window procedure unpacks it and stores it under GWL_USERDATA in the window's internal data structure. During the processing of each subsequent message, the window procedure retrieves the controller from this data structure and calls its appropriate method to handle the message. Finally, in response to WM_DESTROY, the window procedure calls the controller one last time and unplugs it from the window.

Now that the mechanics of passing the controller around are figured out, let's talk about the implementation of Controller. Our goal is to concentrate the logic of a window in this one class. We want to have a generic window procedure that takes care of the ugly stuff: the big switch statement, the unpacking and re-packing of message parameters, and the forwarding of the messages to the default window procedure. Once the message is routed through the switch statement, the appropriate Controller method is called with the correct (strongly-typed) arguments.

For now, we'll just create a stub of a controller. Eventually we'll be adding a lot of methods to it, as many as there are different Windows messages.

The controller stores the handle to the window it services. This handle is initialized inside the window procedure during the processing of WM_NCCREATE. That's why we made Win::Procedure a friend of Win::Controller. The handle itself is protected, not private, because derived classes will need access to it. There are only two message-handler methods at this point, OnDestroy and OnMouseMove.
namespace Win
{
    class Controller
    {
        friend LRESULT CALLBACK Procedure (HWND hwnd,
                        UINT message, WPARAM wParam, LPARAM lParam);

        void SetWindowHandle (HWND hwnd) { _h = hwnd; }
    public:
        virtual ~Controller () {}
        virtual bool OnDestroy ()
            { return false; }
        virtual bool OnMouseMove (int x, int y, KeyState kState)
            { return false; }
    protected:
        HWND _h;
    };
}

You should keep in mind that Win::Controller will be a part of the library, to be used as a base class for all user-defined controllers. That's why all message handlers are declared virtual and, by default, they return false. The meaning of this Boolean is, "I handled the message, so there is no need to call DefWindowProc." Since our default implementation doesn't handle any messages, it always returns false. The user is supposed to define his or her own controller that inherits from Win::Controller and overrides some of the message handlers. In this case, the only message handler that has to be overridden is OnDestroy: it must close the application by sending the "quit" message. It returns true, so that the default window procedure is not called afterwards.


class TopController: public Win::Controller
{
public:
    bool OnDestroy ()
    {
        ::PostQuitMessage (0);
        return true;
    }
};

To summarize, our library is designed in such a way that its client has to do minimal work and is protected from making trivial mistakes. For each class of windows, the client has to create a customized controller class that inherits from our library class, Win::Controller. He implements (overrides) only those methods that require non-default implementation. Since he has the prototypes of all these methods, there is no danger of misinterpreting message parameters. This part, the interpretation and unpacking, is done in our Win::Procedure. It is written once and for all, and is thoroughly tested.

This is the part of the program that is written by the client of our library. In fact, we will simplify it even more later.

#include "Class.h"
#include "Maker.h"
#include "Procedure.h"
#include "Controller.h"

class TopController: public Win::Controller
{
public:
    bool OnDestroy ()
    {
        ::PostQuitMessage (0);
        return true;
    }
};

int WINAPI WinMain
    (HINSTANCE hInst, HINSTANCE hPrevInst, LPSTR cmdParam, int cmdShow)
{
    char className [] = "Simpleton";
    Win::ClassMaker winClass (className, hInst);
    winClass.Register ();
    Win::Maker maker (className, hInst);
    TopController ctrl;
    Win::Dow win = maker.Create (ctrl, "Simpleton");
    win.Display (cmdShow);

    MSG msg;
    int status;
    while ((status = ::GetMessage (& msg, 0, 0, 0)) != 0)
    {
        if (status == -1)
            return -1;
        ::DispatchMessage (& msg);
    }
    return msg.wParam;
}

Notice that we no longer have to pass a window procedure to the class maker. The class maker can use our generic Win::Procedure, implemented in terms of the interface provided by our generic Win::Controller. What will really distinguish the behavior of one window from that of another is the implementation of the controller passed to Win::Maker::Create.

The cost of this simplicity is mostly in code size and in some minimal speed deterioration.

Let's start with speed. Each message now has to go through parameter unpacking and a virtual method call, even if it's not processed by the application. Is this a big deal? I don't think so. An average window doesn't get many messages per second. In fact, some messages are queued in such a way that if the window doesn't process them, they are overwritten by new messages. This is, for instance, the case with mouse-move messages. No matter how fast you move the mouse over the window, your window procedure will not choke on these messages. And if a few of them are dropped, it shouldn't matter, as long as the last one ends up in the queue. Anyway, the frequency with which a mouse sends messages when it slides across the pad is quite arbitrary. With current processor speeds, the processing of window messages takes a marginally small amount of time.

Program size could be a consideration, except that modern computers have so much memory that a megabyte here and there doesn't really matter. A full-blown Win::Controller will have as many virtual methods as there are window messages. How many is it? About 200. With 4-byte pointers, the full vtable will be 800 bytes. That's less than a kilobyte! For comparison, a single icon is 2kB. You can have a dozen controllers in your program and the total size of their vtables won't even reach 10kB.
There is also the code for the default implementation of each method of Win::Controller. Its size depends on how aggressively your compiler optimizes it, but it adds up to at most a few kB. Now, the worst case, a program with a dozen types of windows, is usually already pretty complex (read: large!), and it will probably include many icons and bitmaps. Seen from this perspective, the price we have to pay for simplicity and convenience is minimal.
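The dispatch scheme whose cost is weighed above can be modeled without Windows. In this minimal sketch, Msg, DefaultProc, Dispatch and the quitPosted flag are hypothetical stand-ins for the message codes, DefWindowProc, Win::Procedure and PostQuitMessage. It shows the two costs the text mentions (a switch plus one virtual call per message) and the convention that a handler returning false lets the default processing run.

```cpp
#include <cassert>

// Hypothetical message codes standing in for WM_DESTROY and WM_MOUSEMOVE.
enum class Msg { Destroy, MouseMove };

class Controller
{
public:
    virtual ~Controller () {}
    // Returning false means "not handled, run the default processing".
    virtual bool OnDestroy () { return false; }
    virtual bool OnMouseMove (int, int) { return false; }
};

// Hypothetical stand-in for DefWindowProc; counts how often it runs.
int g_defaultCalls = 0;
long DefaultProc (Msg) { ++g_defaultCalls; return 0; }

// The generic "window procedure": route the message through the switch,
// call a strongly typed handler, fall back to the default if it declines.
long Dispatch (Controller & ctrl, Msg msg, int x = 0, int y = 0)
{
    switch (msg)
    {
    case Msg::Destroy:
        if (ctrl.OnDestroy ())
            return 0;
        break;
    case Msg::MouseMove:
        if (ctrl.OnMouseMove (x, y))
            return 0;
        break;
    }
    return DefaultProc (msg);
}

// Client code overrides only what it needs.
class TopController : public Controller
{
public:
    bool quitPosted = false;
    bool OnDestroy () override
    {
        quitPosted = true;  // stands in for ::PostQuitMessage (0)
        return true;
    }
};
```

With this TopController, a destroy message is fully handled by the override, while a mouse-move message falls through to the default procedure.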

Exception Specification

What would happen if a Controller method threw an exception? It would pass right through our Win::Procedure, then through several layers of Windows code, to finally emerge through the message loop. We could, in principle, catch it in WinMain. At that point, however, the best we could do is to display a polite error message and quit. Not only that, it's not entirely clear how Windows would react to an exception rushing through its code. It might, for instance, fail to deallocate some resources or even get into some unstable state. The bottom line is that Windows doesn't expect an exception to be thrown from a window procedure.

We have two choices: either we put a try/catch block around the switch statement in Win::Procedure or we promise not to throw any exceptions from Controller's methods. A try/catch block would add time to the processing of every single message, whether it's overridden by the client or not. Besides, we would again face the problem of what to do with such an exception. Terminate the program? That seems pretty harsh! On the other hand, the contract not to throw exceptions is impossible to enforce. Or is it?!

Enter exception specifications. It is possible to declare what kinds of exceptions can be thrown by a function or method. In particular, we can specify that no exceptions can be thrown by a certain method. The declaration:

virtual bool OnDestroy () throw ();

promises that OnDestroy (and all its overrides in derived classes) will not throw any exceptions. The general syntax is to list the types of exceptions that can be thrown by a procedure, like this:

void Foo () throw (bad_alloc, char *);

How strong is this contract? Unfortunately, the standard doesn't promise much. The compiler is only obliged to detect exception specification mismatches between base class methods and derived class overrides. In particular, the specification can only be made stronger (fewer exceptions allowed). There is no stipulation that the compiler should detect even the most blatant violations of this promise, for instance an explicit throw inside a method defined as throw() (throw nothing). The hope, however, is that compiler writers will give in to the demands of programmers and at least make the compiler issue a warning when an exception specification is violated. Just as it is possible for the compiler to report violations of const-ness, so it should be possible to track down violations of exception specifications.

For the time being, all that an exception specification accomplishes in a standard-compliant compiler is to guarantee that all unspecified exceptions will get converted to a call to the library function unexpected (), which by default terminates the program. That's good enough, for now.
Declaring all methods of Win::Controller as "throw nothing" will at least force the client who overrides them to think twice before allowing any exception to be thrown.
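Since this text was written, the language has moved on: dynamic exception specifications were deprecated in C++11 and later removed (throw (types) in C++17, throw () itself in C++20), noexcept took their place, and std::unexpected went away; an exception escaping a noexcept function calls std::terminate. A minimal sketch of the same "throw nothing" contract in C++11 and later, with a hypothetical helper Close, looks like this.

```cpp
#include <cassert>

// Modern spelling of "throw nothing": noexcept instead of throw ().
class Controller
{
public:
    virtual ~Controller () {}
    virtual bool OnDestroy () noexcept { return false; }
};

class TopController : public Controller
{
public:
    // An override must keep the promise: dropping noexcept here
    // would be a compile-time error.
    bool OnDestroy () noexcept override { return true; }
};

// Hypothetical helper. If an exception did escape a noexcept function
// at run time, std::terminate would be called.
inline bool Close (Controller & ctrl) noexcept
{
    return ctrl.OnDestroy ();
}
```

The noexcept operator lets code check the promise at compile time, for example static_assert (noexcept (Close (top)), "...").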

Cleanup

It's time to separate library files from application files. For the time being, we'll create a subdirectory "lib" and copy all the library files into it. However, when the compiler compiles files in the main directory, it doesn't know where to find library includes, unless we tell it. All compilers accept additional include paths. We'll just have to add "lib" to the list of additional include paths. As part of the cleanup, we'll also move the definition of TopController to a separate file, control.h.


Painting Application Icon

Every Windows program should have an icon. When you browse into the directory where the executable is stored, Windows Explorer will display this program's icon. When the program is running, this icon shows up in the taskbar and in the upper-left corner of the program's window. If you don't provide your program with an icon, Windows will provide a default.

The obvious place to specify an icon for your application is in the window class of the top-level window. Actually, it's best to provide two icons at once, the large one and the small one; otherwise Windows will try to stretch or shrink the one icon you give it, often with unaesthetic results. Let's add a SetIcons method to Win::ClassMaker and embed two icon objects in it.

class ClassMaker
{
public:
    ...
    void SetIcons (int id);
protected:
    WNDCLASSEX _class;
    StdIcon    _stdIcon;
    SmallIcon  _smallIcon;
};

We'll get to the implementation of StdIcon and SmallIcon soon. First, let's look at the implementation of SetIcons. The images of icons are loaded from program resources.

void ClassMaker::SetIcons (int id)
{
    _stdIcon.Load (_class.hInstance, id);
    _smallIcon.Load (_class.hInstance, id);
    _class.hIcon = _stdIcon;
    _class.hIconSm = _smallIcon;
}

Program resources are icons, bitmaps, strings, mouse cursors, dialog templates, etc., that you can tack on to your executable. Your program, instead of having to search the disk for files containing such resources, simply loads them from its own executable. How do you identify resources when you want to load them? You can either give them names or integer ids. For simplicity (and efficiency), we will use ids. The set of your program's resources is identified by the instance handle that is passed to WinMain.

Let's start with the base class, Win::Icon. When you load an icon, you have to specify the resources where it can be found, the unique id of the particular icon, its dimensions in pixels (if the actual icon has different dimensions, Windows will stretch or shrink it) and some flags.

class Icon
{
public:
    Icon (HINSTANCE res,
          int id,
          int dx = 0,
          int dy = 0,
          unsigned flag = LR_DEFAULTCOLOR)
    {
        Load (res, id, dx, dy, flag);
    }
    ~Icon ();
    operator HICON () const { return _h; }
protected:
    Icon () : _h (0) {}
    void Load (HINSTANCE res,
               int id,
               int dx = 0,
               int dy = 0,
               unsigned flag = LR_DEFAULTCOLOR);
protected:
    HICON _h;
};

The API to load an icon is called LoadImage, and it can also be used to load other types of images. Its return type is a generic HANDLE, so it has to be cast to HICON. Once the icon is no longer used, DestroyIcon is called.

void Icon::Load (HINSTANCE res, int id, int dx, int dy, unsigned flag)
{
    _h = reinterpret_cast<HICON> (
        ::LoadImage (res,
                     MAKEINTRESOURCE (id),
                     IMAGE_ICON,
                     dx, dy,
                     flag));
    if (_h == 0)
        throw "Icon load image failed";
}

Icon::~Icon ()
{
    ::DestroyIcon (_h);
}

Notice that we can't pass the icon id directly to the API. We have to use a macro, MAKEINTRESOURCE, which does some cheating behind the scenes. You see, LoadImage and several other APIs have to guess whether you are passing them a string or an id. Since these are C functions, they can't be overloaded. Instead, you have to trick them into accepting both types and then let them guess their real identity. MAKEINTRESOURCE mucks with the bits of the integer to make it look different from a pointer to char. (This is the kind of programming that was popular when the Windows API was first designed.)

We can immediately subclass Icon to SmallIcon and StdIcon. Their constructors and Load methods are simpler: they don't require dimensions or flags.


    class SmallIcon: public Icon
    {
    public:
        SmallIcon () {}
        SmallIcon (HINSTANCE res, int id);
        void Load (HINSTANCE res, int id);
    };

    class StdIcon: public Icon
    {
    public:
        StdIcon () {}
        StdIcon (HINSTANCE res, int id);
        void Load (HINSTANCE res, int id);
    };

The Load methods are implemented using the parent class' Icon::Load method. You have to use the parent's class name followed by a double colon to disambiguate: without it, the name Load would refer to the derived class's own method (which hides the base-class version), so the call would either fail to compile or, if the arguments matched, become a recursive call that sends the program into an infinite loop. To find out what the correct sizes for small and standard icons are, we use the universal API GetSystemMetrics, which knows a lot about the current system's defaults.

    void SmallIcon::Load (HINSTANCE res, int id)
    {
        Icon::Load (res, id,
            ::GetSystemMetrics (SM_CXSMICON),
            ::GetSystemMetrics (SM_CYSMICON));
    }

    void StdIcon::Load (HINSTANCE res, int id)
    {
        Icon::Load (res, id,
            ::GetSystemMetrics (SM_CXICON),
            ::GetSystemMetrics (SM_CYICON));
    }

There's one more thing: how does one create icons? There is a hard way and an easy way. The hard way is to have some kind of separate icon editor, write your own resource script that names the icon files and, using a special tool, compile it and link it with the executable. Just to give you an idea of what's involved, here are some details. Your resource script file, let's call it script.rc, should contain these two lines:

    #include "resource.h"
    IDI_MAIN ICON "main.ico"

IDI_MAIN is a constant defined in resource.h. The keyword ICON means that it corresponds to an icon. What follows is the name of the icon file, main.ico. The header file, resource.h, contains the definitions of constants, for instance:

    #define IDI_MAIN 101

Unfortunately, you can't use the safer, C++ version of it (the resource compiler understands only preprocessor definitions):

    const int IDI_MAIN = 101;

A macro substitution results in exactly the same code as a const int definition. The only difference is that, as is usual with macros, you forgo type checking. The script file has to be compiled using a program called rc.exe (the resource compiler) to produce a file script.res. The linker will then link that file with the rest of the object files into one executable.

Or, if you have an integrated development environment with a resource editor, you can create an icon in it, add it to your resources under an appropriate symbolic id, and let the environment do the work for you. (A graphical resource editor becomes really indispensable when it comes to designing dialogs.)

Notice that I'm using the same id for both icons. It's possible because you can have two (or more) images of different sizes in the same icon. When you call LoadImage, the one with the closest dimensions is picked. Normally, you'd create at least a 32x32 and a 16x16 icon. I have created a set of two icons and given them an integer id, IDI_MAIN (defined in resource.h). All I need now is to make one additional call in WinMain.

    Win::ClassMaker winClass (className, hInst);
    winClass.SetIcons (IDI_MAIN);
    winClass.Register ();

Finally, you might be wondering: if you add many icons to your program resources, which one is used by the system as the icon for the whole executable? The answer is the one with the lowest numerical id.

Window Painting and the View Object

Just like any other action in Windows, window painting is done in response to some external actions. For instance, your program may paint something whenever a user moves the mouse or clicks a mouse button, it may draw characters in response to key presses, and so on. The part of the window that you normally paint is called the client area--it doesn't include the window borders, the title bar, the menu, etc.

But there is one more situation when Windows may ask your program to redraw part or all of the client area of your window. That happens because Windows is lazy (or short of resources). Whenever another application (or sometimes your own menu) overlaps your program's window, the system simply throws away the part of the image that's occluded. When your window is finally uncovered, somebody has to redraw the discarded part. Guess who! Your program! The same thing happens when a window is minimized and then restored, or when the user resizes the window. Since, from the point of view of your application, these actions happen more or less randomly, you have to be prepared, at any time, to paint the whole client area from scratch. There is a special message, WM_PAINT, that Windows sends to you when it needs your assistance in repainting the window. This message is also sent the first time the window is displayed.

To illustrate painting, we'll extend our Windows program to trace mouse movements. Whenever the mouse moves, we'll draw a line connecting the new cursor position with the previous one. But before we do that, we'll want to add the second object from the Model-View-Controller triad to our program. The View will take care of all painting operations. It will also store the last recorded position of the mouse.

    class TopController: public Win::Controller
    {
        ...
    private:
        View _view;
    };

The Canvas

All display operations are done in the context of a particular device, be it the screen, a printer, a plotter or something else. In the case of drawing to a window, we have to obtain a device context (DC) for this window's client area. Windows can internally create a DC for us and give us a handle to it. We use this handle for all window output operations. When done with the output, we must release the handle.

A DC is a resource, and the best way to deal with it is to apply Resource Management methods to it. We'll call the generic owner of a DC Canvas. We will have many different types of Canvas, depending on how the device context is created and disposed of. They will all, however, share the same functionality. For instance, we can call any Canvas object to draw a line or print some text. Let's make these two operations the starting point of our implementation.

    namespace Win
    {
        class Canvas
        {
        public:
            operator HDC () { return _hdc; }
            void Line (int x1, int y1, int x2, int y2)
            {
                ::MoveToEx (_hdc, x1, y1, 0);
                ::LineTo (_hdc, x2, y2);
            }
            void Text (int x, int y, char const * buf, int count)
            {
                ::TextOut (_hdc, x, y, buf, count);
            }
        protected:
            Canvas (HDC hdc) :_hdc (hdc) {}
            HDC _hdc;
        };
    }

HDC is a Windows data structure, a handle to a device context. Our generic class, Canvas, doesn't provide any public way to initialize this handle--this responsibility is left to derived classes. The member operator HDC () provides implicit conversion from Canvas to HDC. It comes in handy when passing a Canvas object to an API that requires an HDC.


In order to draw a line from one point to another, we have to make two API calls. The first one, MoveToEx, sets the "current position." The second, LineTo, draws a line from the current position to the point specified as its argument (it also moves the current position to that point). Point positions are specified by two coordinates, x and y. In the default coordinate system, both are in units of screen pixels. The origin, corresponding to x = 0 and y = 0, is in the upper left corner of the client area of the window. The x coordinate increases from left to right, the y coordinate grows from top to bottom.

To print text, you have to specify where in the window you want it to appear. The x, y coordinates passed to TextOut tell Windows where to position the upper left corner of the string. This is different from printing to standard output, where the only control over placement was by means of newline characters. For a Windows device context, newlines have no meaning (they are blocked out like all other non-printable characters). In fact, the string-terminating null character is also meaningless to Windows. The string to be printed using TextOut doesn't have to be null-terminated. Instead, you are supposed to specify the count of characters you want printed.

So how and where should we obtain the device context? Since we want to do the drawing in response to every mouse move, we have to do it in the handler of the WM_MOUSEMOVE message. That means our Controller has to override the OnMouseMove virtual method of Win::Controller. The type of Canvas that gets the DC from Windows outside of the processing of WM_PAINT will be called UpdateCanvas. The pair of APIs to get and release a DC is GetDC and ReleaseDC, respectively.
    class UpdateCanvas: public Canvas
    {
    public:
        UpdateCanvas (HWND hwnd)
            : Canvas (::GetDC (hwnd)), _hwnd (hwnd)
        {}
        ~UpdateCanvas ()
        {
            ::ReleaseDC (_hwnd, _hdc);
        }
    protected:
        HWND _hwnd;
    };

We create the Canvas in the appropriate Controller method--in this case OnMouseMove. This way the methods of View will work independently of the type of Canvas passed to them.

    bool TopController::OnMouseMove (int x, int y, Win::KeyState kState) throw ()
    {
        Win::UpdateCanvas canvas (_h);
        _view.MoveTo (canvas, x, y);
        return true;
    }

We are now ready to implement the View object.


    class View
    {
    public:
        View () : _x (0), _y (0) {}
        void MoveTo (Win::Canvas & canvas, int x, int y)
        {
            canvas.Line (_x, _y, x, y);
            _x = x;
            _y = y;
            PrintPos (canvas);
        }
    private:
        void PrintPos (Win::Canvas & canvas)
        {
            std::string str ("Mouse at: ");
            str += ToString (_x);
            str += ", ";
            str += ToString (_y);
            canvas.Text (0, 0, &str [0], str.length ());
        }
    private:
        int _x, _y;
    };

The PrintPos method is interesting. The purpose of this method is to print "Mouse at:" followed by the x and y coordinates of the mouse position. We want the string to appear in the upper left corner of the client area, at coordinates (0, 0). First, we have to format the string. In particular, we have to convert two numbers to their string representations. The formatting of numbers for printing is built into standard streams, so we'll just use the capabilities of a string-based stream. In fact, any type that is accepted by a stream can be converted to a string using this simple template function:

    #include <sstream>

    template <class T>
    inline std::string ToString (T & val)
    {
        std::stringstream out;
        out …

    … OnPaint ()) return 0; break;

Strictly speaking, WM_PAINT comes with a WPARAM that, in some special cases having to do with common controls, might be set to a device context. For now, let's ignore this parameter and concentrate on the common case. The standard way to obtain a device context in response to WM_PAINT is to call the API BeginPaint. This device context has to be released by a matching call to EndPaint.
The ownership functionality is nicely encapsulated in the PaintCanvas object:

    class PaintCanvas: public Canvas
    {
    public:
        PaintCanvas (HWND hwnd)
            : Canvas (::BeginPaint (hwnd, &_paint)), _hwnd (hwnd)
        {}
        ~PaintCanvas ()
        {
            ::EndPaint (_hwnd, &_paint);
        }
        int Top () const    { return _paint.rcPaint.top; }
        int Bottom () const { return _paint.rcPaint.bottom; }
        int Left () const   { return _paint.rcPaint.left; }
        int Right () const  { return _paint.rcPaint.right; }
    protected:
        PAINTSTRUCT _paint;
        HWND        _hwnd;
    };

Notice that BeginPaint gives the caller access to some additional useful information by filling the PAINTSTRUCT structure. In particular, it is possible to retrieve the coordinates of the rectangular area that has to be repainted. In many cases this area is only a small subset of the client area (for instance, after uncovering a small portion of the window or resizing the window by a small increment). In our unsophisticated application we won't make use of this additional info--we'll just repaint the whole window from scratch.

Here's our own override of the OnPaint method of the controller. It creates a PaintCanvas and calls the appropriate View method.

    bool TopController::OnPaint () throw ()
    {
        Win::PaintCanvas canvas (_h);
        _view.Paint (canvas);
        return true;
    }

View simply calls its private method PrintPos. Notice that View doesn't distinguish between UpdateCanvas and PaintCanvas. For all it knows, it is being given a generic Win::Canvas.

    void View::Paint (Win::Canvas & canvas)
    {
        PrintPos (canvas);
    }

What can we do about the varying size of the string being printed? We need more control over formatting. The following code will make sure that each of the two numbers is printed using a fixed field of width 4, by passing the std::setw (4) manipulator to the stream. If the number following it in the stream contains fewer than 4 digits, it will be padded with spaces.

    void PrintPos (Win::Canvas & canvas)
    {
        std::stringstream out;
        out …

    … *CmdTable [idx]) ();

The translation from command id to an index is the weakest point of this scheme. In fact, the whole idea of defining your menu in the resource file is not as convenient as you might think. A reasonably complex application will require dynamic changes to the menu depending on the current state of the program. The simplest example is the Memory>Save item in the calculator. It would make sense for it to be inactive (grayed out) as long as no user-defined variable has been added to memory.
We could try to somehow reactivate this menu item when a variable is added to memory. But that would require the model to know something about the user interface--the menu. We could still save the day by making use of the notification sink. However, there is a better and more general approach--dynamic menus.

Dynamic Menus

First, let's generalize and extend the idea of a command table. We already know that we need a pointer to member there, through which we can execute commands. We can also add another pointer to member through which we can quickly test the availability of a given command--this will enable us to dynamically gray out some of the items. A short help string for each command would be nice, too. Finally, I decided that it would be more general to give commands string names rather than integer identifiers. Granted, searching through strings is slower than finding an item by id, but usually there aren't enough menu items to make a perceptible difference. Moreover, when the program grows to include not only menus but also accelerators and toolbars, being able to specify commands by name rather than by offset is a great maintainability win. So here's the definition of a command item, the building block of a command table.

    namespace Cmd
    {
        template <class T>
        class Item
        {
        public:
            char const * _name;           // official name
            void (T::*_exec)();           // execute command
            Status (T::*_test)() const;   // test command status
            char const * _help;           // help string
        };
    }

If we want to reuse Cmd::Item, we have to make it a template. The parameter of the template is the class of the particular commander whose methods we want to access. This is how the client creates a command table and initializes it with appropriate strings and pointers to members.

    namespace Cmd
    {
        const Cmd::Item<Commander> Table [] =
        {
            { "Program_About", &Commander::Program_About,
              &Commander::can_Program_About, "About this program"},
            { "Program_Exit",  &Commander::Program_Exit,
              &Commander::can_Program_Exit,  "Exit program"},
            { "Memory_Clear",  &Commander::Memory_Clear,
              &Commander::can_Memory_Clear,  "Clear memory"},
            { "Memory_Save",   &Commander::Memory_Save,
              &Commander::can_Memory_Save,   "Save memory to file"},
            { "Memory_Load",   &Commander::Memory_Load,
              &Commander::can_Memory_Load,   "Load memory from file"},
            { 0, 0, 0}
        };
    }

Here, Commander is the name of the commander class defined in the calculator. The command table is used to initialize the actual command vector, Cmd::VectorExec, which adds functionality to this data structure. The relationship between Cmd::Table and Cmd::VectorExec is analogous to the relationship between Function::Array and Function::Table inside the calculator. As before, this scheme makes it very easy to add new items to the table--new commands to our program.

Cmd::VectorExec has to be a template, for the same reason Cmd::Item has to be. However, in order not to templatize everything else that makes use of this vector (in particular, the menu system), I derived it from a non-template class, Cmd::Vector, that defines a few pure virtual functions and some generic functionality, like searching commands by name using a map.

The menu provides access to the command vector. In a dynamic menu system, we initialize the menu from a table. The table is organized hierarchically: menu bar items point to popup menus, which contain commands. For instance, this is what the initialization table for our calculator menu looks like (notice that commands that require further user input--a dialog--are followed by three dots):

    namespace Menu
    {
        const Item programItems [] =
        {
            {CMD, "&About...", "Program_About"},
            {SEPARATOR, 0, 0},
            {CMD, "E&xit", "Program_Exit"},
            {END, 0, 0}
        };
        const Item memoryItems [] =
        {
            {CMD, "&Clear", "Memory_Clear"},
            {SEPARATOR, 0, 0},
            {CMD, "&Save...", "Memory_Save"},
            {CMD, "&Load...", "Memory_Load"},
            {END, 0, 0}
        };
        //---- Menu bar ----
        const BarItem barItems [] =
        {
            {POP, "P&rogram", "Program", programItems},
            {POP, "&Memory", "Memory", memoryItems},
            {END, 0, 0, 0}
        };
    }

Note that each item contains the display name with an embedded ampersand.
This ampersand is translated by Windows into a keyboard shortcut (not to be confused with a keyboard accelerator). The ampersand itself is not displayed, but the letter following it will be underlined. The user will then be able to select a given menu item by pressing the key corresponding to that letter while holding the Alt key. All items also specify command names--for popup items, these are the same strings that were used in the naming of commands. Menu bar items are also named, but they don't have commands associated with them. Finally, menu bar items have pointers to the corresponding popup tables.

Similar tables can be used for the initialization of accelerators and toolbars.

The actual menu object, of the class Menu::DropDown, is created in the constructor of the View. It is initialized with the table of menu bar items, Menu::barItems, shown above, and a Cmd::Vector object (initialized using Cmd::Table). The rest is conveniently encapsulated in the library. You might be interested to know that, since a menu is a resource (released using the DestroyMenu API), the class Menu::Maker has transfer semantics. For instance, when we create a menu bar, all the popup menus are transferred to Menu::BarMaker, one by one.

But that's not the end of the story. We want to be able to dynamically activate or deactivate particular menu items. We already have Commander methods for testing the availability of particular commands--they are in fact accessible through the command vector. The question remains: what is the best time to call these methods? It turns out that Windows sends a message, WM_INITMENUPOPUP, right before opening a popup menu. The handler for this message is called OnInitPopup. We can use that opportunity to manipulate the menu while testing for the availability of particular commands. In fact, since the library class Menu::DropDown has access to the command vector, it can implement the RefreshPopup method once and for all. There is no need for the client to write any additional code. Displaying short help for each selected menu item is also very simple.
When the user moves the mouse cursor to a popup menu item, Windows sends us the message WM_MENUSELECT, which we can process in the controller's method, OnMenuSelect. We just call the GetHelp method of the command vector and send the help string to the status bar.

Let's now review the whole task from the point of view of the client. What code must the client write to make use of our dynamic menu system? To begin with, he has to implement the commander, which is just a repository of all commands available in the particular program. Two methods must be implemented for each command: one to execute it and one to test for its availability. The role of the commander is to:

• if required, get data from the user, usually by means of a dialog box;
• dispatch the request to the model for execution.

Once the commander is in place, the client has to create and statically initialize a table of commands. In this table all commands are given names and assigned short help strings. This table is then used in the initialization of the command vector. The menu system is likewise initialized by a table. This table contains command names, display names for menu items and markers differentiating between commands, separators and bar items. Once the menu bar is ready, it has to be attached to the top-level window. However, don't try to attach the menu inside the constructor of View. Both View and Controller must be fully constructed before adding the menu. Menu attachment results in a series of messages sent to the top-level window (most notably, to resize its client area), so the whole controller has to be ready to process them in an orderly manner.


Finally, the client must provide simple implementations of OnInitPopup and, if needed, OnMenuSelect, to refresh a popup menu and to display short help, respectively.

Because major data structures in the menu system are initialized by tables, it is very easy to change them. For instance, reorganizing the menu or renaming menu items requires changes only to a single file--the one that contains the menu table. Modifying the behavior of commands requires only changes to the commander object. Finally, adding a new command can be done in three independent stages: adding the appropriate methods to the commander, adding an entry to the command table, and adding an item to the menu table. It can hardly be made simpler and less error-prone. Fig 1. shows the relationships and dependencies between various elements of the controller.

Fig 1. The relationships between various elements of the controller.

Because Commander doesn't have access to View, it has no direct way to force the refreshing of the display after such commands as Memory_Clear or Memory_Load. Again, we can only solve this problem by brute force (refreshing the memory display after every command) or with some kind of notification. I decided to use the most generic notification mechanism--sending a Windows message. In order to force the clearing of the calculator's memory display, the Commander sends a special user-defined message, MSG_MEMCLEAR, to the top-level window. Remember, a message is just a number. You are free to define your own messages, as long as you assign them numbers that won't conflict with any messages used by Windows. There is a special identifier, WM_USER, which defines a number that is guaranteed to be larger than that of any Windows-specific message. To process user-defined messages, I added a new handler, OnUserMessage, to the generic Win::Controller. This handler is called whenever the message is greater than or equal to WM_USER.

One more change is necessary in order to make the menus work correctly. We have to expand the message loop to call TranslateMessage before DispatchMessage. TranslateMessage translates raw keystroke messages into character messages (such as WM_SYSCHAR for Alt+letter combinations), which the menu system needs in order to process keyboard shortcuts. If you are also planning on adding keyboard accelerators (not to be confused with keyboard shortcuts, which are processed directly by the menu system)--for instance, Ctrl-L to load memory--you'll have to further expand the message loop to call TranslateAccelerator. Although we won't discuss modeless dialogs here, you might be interested to know that they also require a pre-processing step, the call to IsDialogMessage, in the message loop. It makes sense to stick all these accelerators and modeless dialog handles in a separate preprocessor object, of the class Win::MessagePrepro. Its method Pump enters the message loop and returns only when the top-level window is destroyed. One usually passes the preprocessor object to the top-level controller, to make it possible to dynamically switch accelerator tables or create and destroy modeless dialogs.

Exercises

1. In response to the user's double-clicking on an item in the history pane, copy the selected string into the edit control, so that the user can edit and re-execute it.
2. Add the item "Function" to the menu bar. The corresponding popup menu should display the list of available built-in functions. When the user selects one, its name and the opening parenthesis should be appended to the string in the edit control. Hint: This popup menu should not be initialized statically. It should use the function table from the calculator for its initialization.
3. Add keyboard accelerators Ctrl-L and Ctrl-S for invoking the Load and Save commands, respectively. Use a statically initialized accelerator table. Pass this table, together with the command vector (for command name to command id translation), to the accelerator maker. The API to create an accelerator table is called CreateAcceleratorTable. Since an accelerator table is a resource (released via DestroyAcceleratorTable), you'll have to apply resource management in the design of your classes. To attach the accelerator, pass a reference to the message preprocessor from WinMain to TopController. After creating the accelerator, use the MsgPrepro::SetKbdAccelerator method to activate it. Hint: Remember to change the display string in menu items to include the accelerator key. For instance, the Load item should read "&Load...\tCtrl+L" (the tab marker \t right-aligns the accelerator string).
4. Convert the Load command to use GetOpenFileName for browsing directories.


Software Project

About Software
• Complexity
• The Fractal Nature of Software
• The Living Project
• The Living Programmer

Design Strategies
• Top-Down Object-Oriented Design
• Model-View-Controller
• Documentation

Team Work
• Productivity
• Team Strategies

Implementation Strategies
• Global Decisions
• Top-Down Object-Oriented Implementation
• Inheriting Somebody Else's Code
• Multi-Platform Development
• Program Modifications
• Testing


About Software

Complexity

Dealing with complexity, the finite capacity of the human mind, divide and conquer, abstraction.

Dealing with complexity is the essence of software engineering. It is also the most demanding part of it, requiring both discipline and creativity. Why do we need special methodologies to deal with complexity? The answer is in our brains. In our immediate memory we can deal only with a finite and rather small number of objects--whatever type they are: ideas, images, words. The ballpark figure is seven plus or minus two, depending on the complexity of the objects themselves. Apparently in many ancient cultures the number seven was considered synonymous with many. There are many folk stories that start with "Long, long ago behind seven mountains, behind seven forests, behind seven rivers there lived..."

There are essentially two ways in which we human beings can deal with complexity: the divide-and-conquer method and the abstraction method. The divide-and-conquer method is based on imposing a tree-like structure on top of a complex problem. The idea is that at every node of the tree we have to deal with only a small number of branches, within the limits of our immediate memory. The traversal of the tree leaf-to-root or root-to-leaf requires only a logarithmic number of steps--again, presumably within the limits of our immediate memory. For instance, the body of academic knowledge is divided into humanities and sciences (a branching factor of 2). Sciences are subdivided into various areas, one of them being Computer Science, and so on. To understand Kernighan and Ritchie's book on C, the CS student needs only very limited education in the humanities. On the other hand, to write a poem one is not required to program in C.
The tree-like subdivision of human knowledge not only facilitates in-depth traversal and search, it also enables division of work between various teams. We can think of the whole of humanity as one large team taking part in the enormous project of trying to understand the World.

Another very powerful tool developed by all living organisms and perfected by humans is abstraction. The word "abstraction" has the same root as subtraction. Abstracting means subtracting non-essential features. Think of how many features you can safely subtract from the description of your car before it stops being recognizable as a car. Definitely the color of the paint, the license plates, the windshield wipers, the capacity of the trunk, etc. Unconsciously, the same process is applied by a bird when it creates its definition of a "predator." Abstraction is not 100% accurate: a crow may get scared by a scarecrow, which somehow falls within its abstract notion of a "predator."

Division and abstraction go hand in hand in what one can call the divide-and-abstract paradigm. A complex system can be visualized as a very large network of interconnected nodes. We divide this network into a few "objects"--subsets of nodes. A good division has the property that there are as few inter-object connections as possible. To describe the objects resulting from such a division we use abstraction. In particular, we can describe the objects by the way they connect to other objects (the interface). We can simplify their inner structure by subtracting as many inessential features as possible. At every stage of division, it should be possible to understand the whole system in terms of interactions between a few well-abstracted objects. If there is no such way, we give up. The real miracle of our World is that large portions of it (maybe even everything) can be approached using this paradigm.

A complex system.

Abstracting objects out of a complex system.

The high-level view of the complex system after abstracting objects.

This process is then repeated recursively by dividing objects into sub-objects, and so on. For every object, we first undo the abstraction by adding back all the features we have subtracted, divide it into sub-objects and use new abstractions to define them. An object should become understandable in terms of a few well-abstracted sub-objects. In some way this recursive process creates a self-similar, fractal-like structure.


Figure: The fractal structure of a complex system.

In software engineering we divide a large project into manageable pieces. In order to define, name and describe these pieces we use abstraction. We can talk about symbol tables, parsers, indexes, storage layers, etc. They are all abstractions, and they let us divide a bigger problem into smaller pieces.

The Fractal Nature of Software

Let me illustrate these ideas with the familiar example of the software project that we've been developing in the second part of the book: the calculator. The top level of the project is structured into a set of interrelated objects (see Figure).

Figure: Top-level view of the calculator project.

This system is closed in the sense that one can explain how the program works (what the function main does) using only these objects: their public interfaces and their functionality. It is not necessary to know how these objects perform their functions; it is enough to know what they do.

So how does the program work? First, the Calculator is created inside main. The Calculator is Serializable, which means that its state can be saved and restored. Notice that, at this level, we don't need to know anything about the streams: they are black boxes with no visible interface (that's why I didn't include them in this picture). Once the Calculator is created, we enter the loop in which we get a stream of text from the user and create a Scanner from it. The Scanner can tell us whether the user input is a command or not. If it is a command, we create a CommandParser; otherwise we create a Parser. Either of them requires access to both the Calculator and the Scanner. CommandParser can Execute a command, whereas Parser can Parse the input and Calculate the result. We then display the result and go back to the beginning of the loop. The loop terminates when CommandParser returns status stQuit from the Execute method.

That's it! It could hardly be simpler than that. It's not easy, though, to come up with such a nice set of abstractions on the first try. In fact, we didn't! We had to go through a series of rewrites in order to arrive at this simple structure. All the techniques and little rules of thumb described in the second part of the book had this goal in mind.

But let's continue the journey. Let's zoom in on one of the top-level components: the Calculator. Again, it can be described in terms of a set of interrelated objects (see Figure).

Figure: The result of zooming in on the Calculator.

And again, I could explain the implementation of all Calculator methods using only these objects (and a few from the level above). Next, I could zoom in on the Store object and see a very similar picture.


Figure: The result of zooming in on Store.

I could go on like this, just like in one of those Mandelbrot set programs, where you can zoom in on any part of the picture and see something that is different and yet similar. With a mathematical fractal, you can keep zooming in indefinitely and keep seeing the same infinite level of detail. With a software project, you will eventually get to the level of plain built-in types and commands. (Of course, you may continue zooming in to assembly language, microcode, gates, transistors, atoms, quarks, superstrings and further, but that's beyond the scope of this book.)
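The top-level loop described above can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the book's implementation: the classes are minimal hypothetical stand-ins for Calculator, Scanner, CommandParser and Parser, and the rule that commands start with '!', the "!q" and "!c" commands, and the toy grammar (numbers separated by '+') are all assumptions made to keep the sketch self-contained. The loop lives in a function Run that takes its streams as parameters, rather than directly in main, so that it can be exercised with string streams.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Status codes returned by command execution (stQuit as in the text).
enum Status { stOk, stQuit };

// Hypothetical stand-in: remembers the last computed result.
class Calculator {
public:
    void Remember(double value) { _last = value; }
    double Last() const { return _last; }
private:
    double _last = 0.0;
};

// Hypothetical stand-in: wraps one line of user input.
class Scanner {
public:
    explicit Scanner(const std::string& text) : _text(text) {}
    // Assumption: commands are lines that start with '!'.
    bool IsCommand() const { return !_text.empty() && _text[0] == '!'; }
    const std::string& Text() const { return _text; }
private:
    std::string _text;
};

class CommandParser {
public:
    CommandParser(Scanner& scanner, Calculator& calc)
        : _scanner(scanner), _calc(calc) {}
    // Assumptions: "!q" quits and "!c" clears the remembered result;
    // any other command is accepted and ignored in this sketch.
    Status Execute() {
        if (_scanner.Text() == "!q")
            return stQuit;
        if (_scanner.Text() == "!c")
            _calc.Remember(0.0);
        return stOk;
    }
private:
    Scanner& _scanner;
    Calculator& _calc;
};

class Parser {
public:
    Parser(Scanner& scanner, Calculator& calc)
        : _scanner(scanner), _calc(calc) {}
    // Toy grammar (assumption): numbers separated by '+', e.g. "1 + 2.5".
    bool Parse() {
        std::istringstream in(_scanner.Text());
        double x = 0.0;
        if (!(in >> x))
            return false;
        _sum = x;
        char op = 0;
        while (in >> op) {
            if (op != '+' || !(in >> x))
                return false;
            _sum += x;
        }
        return true;
    }
    double Calculate() {
        _calc.Remember(_sum);
        return _sum;
    }
private:
    Scanner& _scanner;
    Calculator& _calc;
    double _sum = 0.0;
};

// The top-level loop: it only creates objects and calls their public methods.
int Run(std::istream& in, std::ostream& out) {
    Calculator calc;
    std::string line;
    while (std::getline(in, line)) {
        Scanner scanner(line);
        if (scanner.IsCommand()) {
            CommandParser command(scanner, calc);
            if (command.Execute() == stQuit)
                break;
        } else {
            Parser parser(scanner, calc);
            if (parser.Parse())
                out << parser.Calculate() << '\n';
            else
                out << "Syntax error\n";
        }
    }
    return 0;
}
```

In a complete program, main would simply return Run(std::cin, std::cout). Note that Run reads much like the prose description: it never looks inside the objects it uses.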

The Living Project

The lifetime of the project, cyclic nature of programming, the phases, open-ended design, the program as a living organism.

Every software project has a beginning. Very few have an end (unless they are cancelled by the management). You should get used to this kind of open-ended development; you will save yourself and your coworkers a lot of grief. Assume from the very beginning that:

• New features will be added,
• Parts of the program will be rewritten,
• Other people will have to read, understand, and modify your code,
• There will be version 2.0 (and further).

Design for version 2, implement for version 1. Some of the functionality expected in v. 2 should be stubbed out in v. 1 using dummy components.

The development of a software project consists of cycles of different magnitude. The longest cycle is the major version cycle. Within it we usually have one or more minor version cycles. The creation of a major version goes through the following stages:

• Requirement (or external) specification,
• Architectural design (or re-design),
• Implementation,
• Testing and bug fixing.

Time-wise, these phases are interlaced. Architectural design feeds back into the requirements spec: some features turn out to be too expensive, and the need for others arises during the design. Implementation feeds back into the design in a major way. Some even suggest that development should go through cycles of implementing throw-away prototypes followed by phases of re-design. Throwing away a prototype, however, is usually too big a waste of development time. It means that too little time was spent designing and studying the problem, and that the implementation methodology was inadequate.

One is not supposed to use a different methodology when designing and implementing prototypes, scaffolding or stubs than when designing and implementing the final product. Not following this rule is a sign of hypocrisy. Not only is it demoralizing, but it doesn't save any development time. Quite the opposite! My fellow programmers and I were bitten by bugs or omissions in the scaffolding code so many times, and wasted so much time chasing such bugs, that we finally learned to write scaffolding the same way we write production code. As a side effect, whenever the scaffolding survives the implementation cycle and gets into the final product (you'd be surprised how often that happens!), it doesn't lead to any major disasters.

Going back to the implementation cycle: implementing or rewriting any major component has to be preceded by careful and detailed design or redesign. The documentation is usually updated in the process, and little essays are added to the architectural spec. In general, the design should be treated as an open-ended process. It is almost always strongly influenced by implementation decisions.
This is why it is so important to have the discipline to constantly update the documentation. Documentation that is out of sync with the project is useless (or worse than useless: it creates misinformation).

The implementation proper is also done in little cycles. These are the fundamental edit-compile-run cycles, well known to every programmer. Notice how testing is again interlaced with development: the run part of the cycle serves as a simple sanity test.

At this level, the work of a programmer resembles that of a physician. The first principle, never harm the patient, applies very well to programming. It is called "don't break the code." The program should be treated like a living organism. You have to keep it alive at all times. Killing the program and then resuscitating it is not the right approach. So make all changes in little steps that are self-contained and as testable as possible. Some functionality may be temporarily disabled when doing a big "organ transplant," but in general the program should be functional at all times.

Finally, a word of caution about how not to develop a project (and how it is still done in many places). Don't jump into implementation too quickly. Be patient. Resist the pressure from the managers to have something for a demo as soon as possible. Think before you code. Don't sit in front of the computer with only a vague idea of what you want to do, hoping that you'll figure it out by trial and error. Don't write sloppy code "to be cleaned up later." There is a big difference between stubbing out some functionality and writing sloppy code.
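The difference between a stub and sloppy code can be illustrated with a small sketch. The spell-checker example and all its names are hypothetical, not taken from the text. The interface is designed with version 2 in mind; version 1 ships a stub that fulfills the same contract with the simplest possible policy, written with the same care as production code. When the real implementation arrives, client code such as FindMisspelled does not have to change.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Interface designed with version 2 in mind (hypothetical example).
class SpellChecker {
public:
    virtual ~SpellChecker() = default;
    // Contract: returns true if `word` is considered correctly spelled.
    virtual bool IsCorrect(const std::string& word) const = 0;
};

// Version 1 stub: a deliberate, fully specified implementation of the
// contract with the simplest policy, accepting every word.
class StubSpellChecker : public SpellChecker {
public:
    bool IsCorrect(const std::string&) const override { return true; }
};

// Version 2: the real thing, here a plain dictionary lookup.
class DictionarySpellChecker : public SpellChecker {
public:
    explicit DictionarySpellChecker(std::set<std::string> words)
        : _words(std::move(words)) {}
    bool IsCorrect(const std::string& word) const override {
        return _words.count(word) != 0;
    }
private:
    std::set<std::string> _words;
};

// Client code depends only on the interface, so swapping the stub for
// the real checker requires no changes here.
std::vector<std::string> FindMisspelled(const SpellChecker& checker,
                                        const std::vector<std::string>& words) {
    std::vector<std::string> misspelled;
    for (const std::string& w : words)
        if (!checker.IsCorrect(w))
            misspelled.push_back(w);
    return misspelled;
}
```

The stub is not "sloppy code to be cleaned up later": its behavior is stated in the contract and it can stay in the product indefinitely without causing surprises.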

The Living Programmer

Humility, simplicity, team spirit, dedication.


A programmer is a human being. Failing to recognize it is a source of many misunderstandings. The fact that the programmer interacts a lot with a computer doesn't mean that he or she is any less human. Since it is the computer that is supposed to serve the humans and not the other way around, programming as an activity should be organized around the humans. It sounds like a truism, but you'd be surprised how often this simple rule is violated in real life. Forcing people to program in assembly (or C, for that matter) is just one example. Structuring the design around low-level data structures like hash tables, linked lists, etc., is another example.

The fact that the jobs of programmers haven't been eliminated by computers (quite the opposite!) means that being human has its advantages. The fact that some human jobs have been eliminated by computers means that being a computer has its advantages. The fundamental equation of software engineering is thus

Human Creativity + Computer Speed and Reliability = Program

Trying to write programs combining human speed and reliability with computer creativity is a big mistake! So let's face it, we humans are slow and unreliable. When a programmer has to wait for the computer to finish compilation, something is wrong. When the programmer is supposed to write error-free code without any help from the compiler, linker or debugger, something is wrong. If the programmer, instead of solving a problem with paper and pencil, tries to find by trial and error the combination of parameters that doesn't lead to a general protection fault, something is badly wrong.

The character traits that make a good programmer are (maybe not so surprisingly) similar to those of a martial arts disciple.
Humility, patience, simplicity on the one hand; dedication and team spirit on the other. And most of all, mistrust towards everybody, including oneself.

• Humility: Recognize your shortcomings. It is virtually impossible for a human to write error-free code. We all make mistakes. You should write code in anticipation of mistakes. Use any means available to men and women to guard your code against your own mistakes. Don't be stingy with assertions. Use heap checking. Take time to add debugging output to your program.
• Patience: Don't rush towards the goal. Have the patience to build solid foundations. Design before you code. Write solid code for future generations.
• Simplicity: Get rid of unnecessary code. If you find a simpler solution, rewrite the relevant parts of the program to make use of it. Every program can be simplified. Try to avoid special cases.
• Dedication: Programming is not a nine-to-five job. I am not saying that you should work nights and weekends; if you are, it is usually a sign of bad management. But you should expect a lifetime of learning. You have to grow in order to keep up with the tremendous pace of progress. If you don't grow, you'll be left behind by the progress of technology.
• Team spirit: Long gone are the times of the Lone Programmer. You'll have to work in a team, and that means a lot. You'll have to work on your communication skills. You'll have to accept certain standards, coding conventions, commenting conventions, etc. Be ready to discuss and change some of the conventions if they stop making sense. Some people preach the idea that "a stupid convention is better than no convention." Avoid such people.
• Programmer's paranoia: Don't trust anybody's code, not even your own.


Design Strategies

Top-Down Object Oriented Design

Top-level objects, abstractions and metaphors, components.

It is all too easy to start the design by coming up with such objects as hash tables, linked lists, queues, trees, and trying to put them together. Such an approach, bottom-up and implementation-driven, should be avoided. A program that is built bottom-up ends up with the structure of a soup of objects. There are pieces of vegetables, chunks of meat, noodles of various kinds, all floating in some kind of broth. It sort of looks object oriented: there are "classes" of noodles, vegetables, meat, etc. However, since you rarely change the implementation of linked lists, queues, trees, etc., you don't gain much from their object-orientedness. Most of the time you have to maintain and modify the shapeless soup.

Using the top-down approach, on the other hand, you divide your program into a small number of interacting high-level objects. The idea is to deal with only a few types of objects, or classes (on the order of seven plus or minus two, the capacity of our short-term memory!). The top-level objects are divided into the main actors of the program and the communication objects that are exchanged between the actors. If the program is interactive, you should start with the user interface and the objects that deal with user input and screen (or teletype) output.

Once the top-level objects are specified, you should go through the exercise of rehearsing the interactions between the objects (this is sometimes called going through use-case scenarios). Go through the initialization process, decide which objects have to be constructed first, and in what state they should start. You should avoid using global objects at any level other than possibly the top level.
After everything has been initialized, pretend that you are the user, and see how the objects react to user input, how they change their state, and what kind of communication objects they exchange. You should be able to describe the interaction at the top without having to resort to the details of the implementation of lower levels. After this exercise you should have a pretty good idea about the interfaces of your top-level objects and the contracts they have to fulfill (that is, what the results of a given call with particular arguments should be). Every object should be clearly defined in as few words as possible; its functionality should form a coherent and well-rounded abstraction.

Try to use common language, rather than code, in your documentation, in order to describe objects and their interactions. Remember, center the project around humans, not computers. If something can be easily described in common language, it usually is a good abstraction. For things that are not easily abstracted, use a metaphor. An editor might use the metaphor of a sheet of paper; a scheduler, the metaphor of a calendar; a drawing program, the metaphors of pencils, brushes, erasers, palettes, etc. The design of the user interface revolves around metaphors, but they also come in handy at other levels of design. Files, streams, semaphores, ports, pages of virtual memory, trees, stacks: these are all examples of very useful low-level metaphors.

The right choice of abstractions is always important, but it becomes absolutely crucial in the case of a large software project, where top-level objects are implemented by separate teams. Such objects are called components. Any change to a component's interface or its contract, once development has started going full steam ahead, is a disaster. Depending on the number of components that use this particular interface, it can be a minor or a major disaster. The magnitude of such a disaster can only be measured on the Richter scale. Every project goes through a few such "earthquakes"; that's just life!

Now, repeat the same design procedure with each of the top-level objects. Split it into sub-objects with well-defined purpose and interface. If necessary, repeat the procedure for the sub-objects, and so on, until you have a pretty detailed design. Use this procedure again and again during the implementation of various pieces. The goal is to superimpose some sort of a self-similar, fractal structure on the project. The top-level description of the whole program should be similar to the description of each of the components, its sub-components, objects, sub-objects, etc. Every time you zoom in or zoom out, you should see more or less the same type of picture, with a few self-contained objects collaborating towards implementing some well-defined functionality.

Model-View-Controller

Designing user interface, input-driven programs, Model-View-Controller paradigm.

Even the simplest modern-day programs offer some kind of interactivity. Of course, one can still see a few remnants of the grand UNIX paradigm, where every program was written to accept a one-dimensional stream of characters from its standard input and spit out another stream at its standard output. But with the advent of the Graphical User Interface (GUI), the so-called "command-line interface" is quickly becoming extinct. For the user, it means friendlier, more natural interfaces; for the programmer it means more work and a change of philosophy. With all the available help from the operating system and with appropriate tools at hand, it isn't difficult to design and implement user interfaces, at least for graphically non-demanding programs. What is needed is a change of perspective. An interactive program is, for the most part, input-driven. Actions in the program happen in response to user input. At the highest level, an interactive program can be seen as a series of event handlers for externally generated events. Every key press, every mouse click has to be handled appropriately.

The object-oriented response to the interactive challenge is the Model-View-Controller paradigm, first developed and used in Smalltalk. The Controller object is the focus of all external (and sometimes internal as well) events. Its role is to interpret these events as much as is necessary to decide which of the program objects will have to handle them. Appropriate messages are then sent to such objects (in Smalltalk parlance; in C++ we just call appropriate methods). The View takes care of the program's visual output. It translates requests from other objects into graphical representations and displays them.
In other words, it abstracts the output. Drawing lines, filling areas, writing text, and showing the cursor are some of the many responsibilities of the View. Centralizing input in the Controller and output in the View leaves the rest of the program independent of the intricacies of the input/output system (it also makes the program easy to port to other environments with slightly different graphical capabilities and interfaces). The part of the program that is independent of the details of input and output is called the Model. It is the hard worker and the brains of the program. In simple programs, the Model corresponds to a single object, but quite often it is a collection of top-level objects. Various parts of the Model are activated by the Controller in response to external events. As a result of changes of state, the Model updates the View whenever it finds it appropriate.

As a rule, you should start the top-down design of an interactive program by establishing the functionality of the Controller and the View. Whatever happens prior to any user action is considered initialization of these components and of the Model itself. The M-V-C triad may also reappear at lower levels of the program to handle a particular type of control: a dialog box, an edit control, etc.

Documentation

Requirement Specification

Statement of purpose, functionality, user interface, input, output, size limitations and performance goals, features, compatibility.

The first document to be written, before any other work on a project may begin, is the Requirement Specification (also called an External Specification). In large projects the Requirement Spec might be prepared by a dedicated group of people with access to market research, user feedback, user tests, etc. However, no matter who does it, there has to be a feedback loop going back from the architects and implementers to the group responsible for the Requirement Spec.

The crucial part of the spec is the statement of purpose: what the purpose of the particular software system is. Sometimes restating the purpose of the program might bring some new insights or change the focus of the design. For instance, describing a compiler as a program which checks the source file for errors, and which occasionally creates an object file (when there are no errors), might result in a competitively superior product. The statement of purpose might also contain a discussion of the key metaphor(s) used in the program. An editor, for instance, may be described as a tool to manipulate lines of text. Experience, however, has shown that editors that use the metaphor of a sheet of paper are superior. Spreadsheet programs owe their popularity to another well-chosen metaphor.

Then a detailed description of the functionality of the program follows. In a word-processor requirement spec one would describe text input, ways of formatting paragraphs, creating styles, etc.
In a symbolic manipulation program one would specify the kinds of symbolic objects and expressions that are to be handled, the various transformations that could be applied to them, etc. This part of the spec is supposed to tell the designers and the implementers what functionality to implement. Some of it is described as mandatory; some of it goes into the wish list.

The user interface and visual metaphors go next. This part usually undergoes the most extensive changes. When the first prototype of the interface is created, it goes through more or less (mostly less) rigorous testing, first by developers, then by potential users. Sometimes a manager doesn't like the feel of it and sends programmers back to the drawing board. It is definitely more art than science, yet a user interface may make or break a product. What compounds the problem is the fact that anybody may criticize a user interface. No special training is required, and everybody has different tastes. The programmers that implement it are probably the least qualified people to judge it. They are used to the terse and cryptic interfaces of their programming tools: grep, make, link, Emacs or vi. In any case, designing the user interface is the most frustrating and thankless job.

Behind the user interface is the input/output specification. It describes what kind of input is accepted by the program, and what output is generated by the program in response to this input. For instance, what is supposed to happen when the user clicks on the format-brush button and then clicks on a word or a paragraph in a document (the formatting should be pasted over)? Or what happens when the program reads a file that contains a comma-separated list of numbers? Or what happens when a picture is pasted from the clipboard?

Speed and size requirements may also be specified. The kind of processor, minimum memory configuration, and disk size are often given. Of course there is always a conflict between the ever-growing list of desired features and the always conservative hardware requirements and breathtaking performance requirements. (Features always win! Only when the project enters its critical phase do features get decimated.)

Finally, there may be some compatibility requirements. The product has to understand (or convert) files that were produced either by its earlier versions, or by competitors' products, or both. It is wise to include some compatibility features that will make future versions of the product easier to implement (version stamps are a must).

Architecture Specification

Top-level view, crucial data structures and algorithms. Major implementation choices.

The architectural document describes how things work and why they work the way they work. It's a good idea to either describe the theoretical foundations of the system, or at least give pointers to some literature. This is the document that gives the top-level view of the product as a program, as seen by the developers. All top-level components and their interactions are described in some detail. The document should show clearly that, if the top-level components implement their functionality according to their specifications, the system will work correctly.
That will take the burden off the shoulders of developers: they won't have to think about too many dependencies. The architectural spec defines the major data structures, especially the persistent ones. The document then proceeds with describing major event scenarios and various states of the system. The program may start with an empty slate, or it may load some history (documents, logs, persistent data structures). It may have to finish some transactions that were interrupted during the last session. It has to go through the initialization process and presumably get into some quiescent state. External or internal events may cause some activity that transforms data structures and leads to state transitions. New states have to be described. In some cases the algorithms to be used during such activity are described as well. Too detailed a description of the implementation should be avoided. It becomes obsolete so quickly that it makes little sense to try to maintain it.

Once more we should remind ourselves that the documentation is a living thing. It should be written in such a way that it is easy to update. It has to have a sensible structure of its own, because we know that it will be changed many times during the implementation cycle. In fact it should be changed, and it is very important to keep it up to date and to encourage fellow developers to look into it on a regular basis. In the implementation cycle, there are times when it is necessary to put some flesh on the design of some important object that has only been sketched in the architectural spec. It is time to either expand the spec or write short essays on selected topics in the form of separate documents. In such essays one can describe non-trivial implementations, algorithms, data structures, programming notes, conventions, etc.

Team Work

Productivity

Communication explosion, vicious circle.

The life of a big-team programmer is spent:

• Communicating with other programmers, attending meetings, reading documents, reading email and responding to it,
• Waiting for others to finish their job, to fix a bug, to implement some vital functionality, to finish building their component,
• Fighting fires, fixing build breaks, chasing somebody else's bugs,
• Staring at the computer screen while the machine is compiling, loading a huge program, running test suites, rebooting, and finally, when time permits,
• Developing new code.

There are n(n-1)/2 possible connections between n dots. That's of the order of O(n^2). By the same token, the number of possible interactions within a group of n programmers is of the order of O(n^2). The number of hours they can put out is of the order of O(n). It is thus inevitable that at some point, as the size of the group increases, its members will start spending all their time communicating. In real life, people come up with various communication-overload defense mechanisms. One defense is to ignore incoming messages; another is to work odd hours (nights, weekends), when there aren't that many people around to disturb you (wishful thinking!). As a programmer you are constantly bombarded with information from every possible direction. They will broadcast messages by email, they will drop printed documents in your mailbox, they will invite you to meetings, they will call you on the phone, or in really urgent cases they will drop by your office or cubicle and talk to you directly.
If programmers were only to write and test code (and it used to be like this not so long ago), the market would be flooded with new operating systems, applications, tools, games, educational programs, and so on, all at ridiculously low prices. As a matter of fact, almost all public domain and shareware programs are written by people with virtually no communication overhead.

The following chart shows the results of a very simple simulation. I assumed that every programmer spends 10 minutes a day communicating with every other programmer in the group. The rest of the time he or she does some real programming. The time spent programming, multiplied by the number of programmers in the group, measures team productivity: the effective work done by the group every day. Notice that, under these assumptions, the effective work peaks at about 25 people and then starts decreasing.
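The simulation above is simple enough to reproduce. The sketch below assumes an 8-hour (480-minute) working day, which the text does not state; the 10 minutes of communication per pair of programmers comes from the text. Each of the n programmers then codes for 480 - 10(n - 1) minutes, so the team produces W(n) = n(480 - 10(n - 1)) = 490n - 10n² programmer-minutes a day. This quadratic peaks at n = 24.5, so teams of 24 and 25 tie at 6000 minutes, which agrees with the peak at about 25 people.

```cpp
// From the text: every programmer spends 10 minutes a day with every other one.
constexpr int minutesPerContact = 10;
// Assumption (not in the text): an 8-hour, 480-minute working day.
constexpr int minutesPerDay = 480;

// Possible pairwise connections between n people: n(n-1)/2.
long long Connections(int n) {
    return static_cast<long long>(n) * (n - 1) / 2;
}

// Programmer-minutes of real programming done by the whole team per day.
long long EffectiveMinutes(int n) {
    long long perPerson = minutesPerDay - minutesPerContact * (n - 1);
    return perPerson > 0 ? n * perPerson : 0;
}

// Team size with the most effective work; on a tie the smaller team wins.
int BestTeamSize(int maxTeam) {
    int best = 1;
    for (int n = 2; n <= maxTeam; ++n)
        if (EffectiveMinutes(n) > EffectiveMinutes(best))
            best = n;
    return best;
}
```

Under these assumptions, 6000 programmer-minutes is 12.5 eight-hour days: a team of 25 does the work of twelve and a half programmers, and every person added beyond that point reduces the team's output.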


But wait, there's more! The more people you have in the group, the more complicated the dependency graph. Component A cannot be tested until component B works correctly. Component B needs some new functionality from C. C is blocked waiting for a bug fix in D. People are waiting, they get frustrated, they send more email messages, they drop by each other's offices. Not enough yet? Consider the reliability of a system with n components. The more components, the more likely it is that one of them will break. When one component is broken, the testing of other components is either impossible or at least impaired. That in turn leads to more buggy code being added to the project, causing even more breakages. It seems like all these mechanisms feed on each other in one vicious circle. In view of all this, the team's productivity curve is much too optimistic.

The other side of the coin is that raising the productivity of a programmer, either by providing better tools, a better programming language, a better computer, or more help in non-essential tasks, creates a positive feedback loop that amplifies the productivity of the team. If we could raise the productivity of every programmer in a fixed-size project, we could reduce the size of the team; that in turn would lead to decreased communication overhead, further increasing the effective productivity of every programmer. Every dollar invested in programmers' productivity saves several dollars that would otherwise be spent hiring other programmers.

Continuing with our simple simulation: suppose that the goal is to produce 100,000 lines of code in 500 days. I assumed a starting productivity of 16 lines of code per day per programmer, if there were no communication overhead.
The following graph shows how the required size of the team shrinks as productivity increases.
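A rough reconstruction of this curve searches for the smallest team that can meet the target. It assumes a 480-minute working day (not stated in the text) and code written at a constant rate during the time not spent communicating; the 10 minutes per pair and the target of 100,000 lines in 500 days come from the text. The names TeamOutput and RequiredTeamSize are hypothetical, and the results need not match the book's graph point for point.

```cpp
#include <cassert>

// Assumption (not stated in the text): a 480-minute working day, with code
// written at a constant rate during the time not spent communicating.
const double kDayMinutes = 480.0;
const double kPairMinutes = 10.0;      // from the text
const double kTargetLines = 100000.0;  // from the text
const double kDays = 500.0;            // from the text

// Lines per day produced by a team of n, where linesPerDay is what one
// programmer would write with no communication overhead.
double TeamOutput(int n, double linesPerDay)
{
    assert(n >= 1);
    double programming = kDayMinutes - kPairMinutes * (n - 1);
    if (programming < 0.0)
        programming = 0.0;
    return n * linesPerDay * programming / kDayMinutes;
}

// Smallest team that meets the target, or -1 if no team up to maxTeam can.
int RequiredTeamSize(double linesPerDay, int maxTeam = 100)
{
    const double needed = kTargetLines / kDays;  // 200 lines a day
    for (int n = 1; n <= maxTeam; ++n)
        if (TeamOutput(n, linesPerDay) >= needed)
            return n;
    return -1;
}
```

With 16 lines per day, the required 200 lines a day is first reached by a team of 24, right at the peak of the effective-work curve; at 15 lines per day no team size is large enough, and doubling productivity to 32 lines per day shrinks the required team to 8.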

Notice that, when the curve turns more or less linear (let's say at about 15 programmers), every 3% increase in productivity saves us one programmer, who can then be used in another project.

Several things influence productivity:

• The choice of a programming language and methodology. So far it is hard to beat C++ and object-oriented methodology. If size or speed is not an issue, other specialized higher-level languages may be appropriate (Smalltalk, Prolog, Basic, etc.). The tradeoff is also in the initial investment in the education of the team.
• The choice of project-wide conventions. Such decisions as whether to use exceptions, how to propagate errors, what the code-sharing strategy is, how to deal with project-wide header files, etc., are all very difficult to correct during development. It is much better to think about them up front.
• The choice of a programming environment and tools.
• The choice of hardware for development. RAM and disk space are of crucial importance. A local area network with shared printers and email is a necessity. The need for large monitors is often overlooked.

Team Strategies

In the ideal world we would divide work between small teams and let each team provide a clear and immutable interface to which other teams would write their code. We would couple each interface with a well-defined, comprehensive and detailed contract. The interaction between teams would be reduced to the exchange of interface specifications and periodic updates as to which part of the contract has already been implemented. This ideal situation is, to some extent, realized when the team uses externally created components, such as libraries, operating system APIs (application programming interfaces), etc. Everything whose interface and contract can be easily described is a good candidate for a library. For instance, the string manipulation library, the library of container objects, iostreams, etc., are all well described either in online help, in compiler manuals, or in other books. Some APIs are not that easily described, so using them is often a matter of trial and error (or customer support calls).

In the real world, things are more complicated than that. Yes, we do divide work between small teams and they do try to come up with some interfaces and contracts. However, the interfaces are far from immutable, the contracts are far from comprehensive, and they are constantly being renegotiated between teams. All we can do is try to smooth out this process as much as possible.

You have to start with a good design. Spend as much time as necessary on designing a good hierarchical structure of the product. This structure will be translated into the hierarchical structure of teams. The better the design, the clearer the interfaces and contracts between all parts of the product. That means fewer changes and less negotiating at the later stages of the implementation. Divide the work in clear correspondence to the structure of the design, taking into account communication needs. Already during the design, as soon as the structure crystallizes, assign team leaders to all the top-level components. Let them negotiate the contracts between themselves. Each team leader in turn should repeat this procedure with his or her team, designing the structure of the component and, if necessary, assigning subcomponents to leaders of smaller teams.

The whole design should go through several passes. The results of lower-level design should serve as feedback to the design of higher-level components, and eventually contribute to the design of the whole product. Each team writes its own part of the specification. These specifications are reviewed by other teams responsible for other parts of the same higher-level component. The more negotiating is done up front, during the design, the better the chances of smooth implementation. The negotiations should be structured in such a way that only a few people are involved at a time. A "plenary" meeting is useful to describe the top-level design of the product to all members of all teams, so that everybody knows what the big picture is.
Such meetings are also useful during the implementation phase to monitor the progress of the product. They are not a good forum for design discussions.

Contract negotiations during implementation usually look like this: some member of team A is using one of team B's interfaces according to his or her understanding of the contract. Unexpectedly the code behind the interface returns an error, asserts, raises an exception, or goes haywire. The member of team A either goes to team B's leader to ask who is responsible for the implementation of the interface in question, or directly to the implementor of this interface to ask what caused the strange behavior. The implementor either clarifies the contract, changes it according to the needs of team A, fixes the implementation to fulfill the contract, or takes over the tracing of the bug. If a change is made to component B, it has to be thoroughly tested to make sure it doesn't cause any unexpected problems for other users of B.

During the implementation of some major new functionality it may be necessary to ask other teams to change or extend their interfaces and/or contracts. This is considered a re-design. A re-design, like any other disturbance in the system, produces concentric waves. The feedback process described previously, in the course of the original design, should be repeated. The interface and contract disturbances are propagated first within the teams that are immediately involved (so that they can make sure that the changes are indeed necessary, and try to describe these changes in some detail), then up towards the top. Somewhere on the way to the top the disturbances of the design may get dropped completely. Or they may reach the very top and change the way top-level objects interact.
At that point the changes will be propagated downwards to all the involved teams. Their feedback is then bounced back towards the top, and the process is repeated as many times as is necessary for the changes to stabilize. This "annealing" process ends when the project reaches a new state of equilibrium.
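To make the idea of an interface coupled with a contract concrete, here is a minimal sketch. The classes SymbolStore and VectorSymbolStore and the function Intern are hypothetical: SymbolStore plays the part of team B's published interface, its comments and assertions spell out the contract, and Intern stands for team A's code, which depends only on the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Interface published by "team B" (hypothetical). The comments are the
// contract: callers guarantee the preconditions, the implementation
// guarantees the results.
class SymbolStore
{
public:
    virtual ~SymbolStore() {}
    // Precondition: name is not empty.
    // Result: an id for name; adding the same name again returns the same id.
    virtual std::size_t Add(const std::string& name) = 0;
    // Precondition: id was returned by Add on this store.
    virtual const std::string& Name(std::size_t id) const = 0;
};

// One implementation; team B may replace it without affecting team A.
class VectorSymbolStore : public SymbolStore
{
public:
    std::size_t Add(const std::string& name) override
    {
        assert(!name.empty());  // contract violation = bug in the caller
        for (std::size_t i = 0; i < _names.size(); ++i)
            if (_names[i] == name)
                return i;
        _names.push_back(name);
        return _names.size() - 1;
    }
    const std::string& Name(std::size_t id) const override
    {
        assert(id < _names.size());
        return _names[id];
    }
private:
    std::vector<std::string> _names;
};

// "Team A" code, written only against the interface.
std::size_t Intern(SymbolStore& store, const std::string& word)
{
    return store.Add(word);
}
```

When team A's code trips one of these assertions, the two teams are in exactly the situation described above: either the caller misread the contract, or the contract needs to be renegotiated and both the comments and the assertions updated.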

Implementation Strategies

Global Decisions

Error handling, exceptions, common headers, code reuse, debug output.

The biggest decision to be made, before the implementation can even begin, is how to handle errors and exceptions. There are a few major sources of errors:

• Bugs within the component,
• Incorrect parameters passed from other (trusted) components,
• Incorrect user input,
• Corruption of persistent data structures,
• The system running out of resources.

Bugs are not supposed to get through to the final retail version of the program, so we have to deal with them only during development. (Of course, in practice most retail programs still have some residual bugs.) Since during development we mostly deal with debug builds, we can protect ourselves from bugs by sprinkling our code with assertions. Assertions can also be used to enforce contracts between components.

User input, and in general input from other less trusted parts of the system, must be thoroughly tested for correctness before proceeding any further. "Typing monkeys" tests have to be done to ensure that no input can break our program. If our program provides some service to other programs, it should test the validity of externally passed arguments. For instance, operating system API functions always check the parameters passed from applications. This type of error should be dealt with on the spot. If it's direct user input, we should provide immediate feedback; if it's input from an untrusted component, we should return the appropriate error code.

Any kind of persistent data structures that are not totally under our control (and that is always true, unless we are a file system; and even then we should be careful) can always get corrupted by other applications or tools, not to mention hardware failures and user errors.
We should therefore always test for their consistency. If the corruption is fatal, this kind of error is a good candidate for turning into an exception. A common programming error is to use assertions to enforce the consistency of data structures read from disk. Data on disk should never be trusted; therefore all the checks must also be present in the retail version of the program.

Running out of resources (memory, disk space, handles, etc.) is the prime candidate for exceptions. Consider the case of memory. Suppose that all programmers are trained to always check the return value of operator new (that's already unrealistic). What are we supposed to do when the returned pointer is null? It depends on the situation. For every case of calling new, the programmer is supposed to come up with some sensible recovery. Consider that the recovery path is rarely tested (unless the test team has a way of simulating such failures). We take up a lot of programmers' time to do something that is as likely to fail as the original thing whose failure we were handling. The simplest way to deal with out-of-memory situations is to print the message "Out of memory" and exit. This can be easily accomplished by setting our own out-of-memory handler (the standard set_new_handler function, or _set_new_handler in Microsoft's runtime library). This is, however, very rarely the desired solution. In most cases we at least want to do some cleanup, save some user data to disk, maybe even get back to some higher level of our program and try to continue. The use of exceptions and resource management techniques (described earlier) seems most appropriate. If C++ exception handling is not available or is prohibited by managers, one is left with conventional techniques of testing the results of new, cleaning up and propagating the error higher up. Of course, the program must be thoroughly tested using simulated failures.

It is this kind of philosophy that leads to project-wide conventions such as "every function should return a status code." Normal return values then have to be passed by reference or through a pointer. Very soon the system of status codes develops into a Byzantine structure. Essentially every error code should not only point at the culprit, but also contain the whole history of the error, since the interpretation of the error is enriched at each stage through which it passes. The use of constructors is then highly restricted, since constructors cannot return a value. Very quickly C++ degenerates to "better C." Fortunately most modern C++ compilers provide exception support and hopefully soon enough this discussion will only be of historical interest.

Another important decision to be made up front is the choice of project-wide debugging conventions. It is extremely handy to have progress and status messages printed to some kind of a debug output or log.

The choice of directory structure and build procedures comes next. The structure of the project and its components should be reflected in the directory structure of the source code. There is also a need for a place where project-wide header files and code can be deposited.
This is where one puts the debugging harness, definitions of common types, project-wide parameters, shared utility code, useful templates, etc. Some degree of code reuse within the project is necessary and should be well organized. What is usually quite neglected is the need for information about the availability of reusable code and its documentation. The information about what is available in the reusability department should be broadcast on a regular basis, and up-to-date documentation should be readily available.

One more observation: in C++ there is a very tight coupling between header files and implementation files; we rarely modify one without inspecting or modifying the other. This is why in most cases it makes sense to keep them together in the same directory, rather than in some special include directory. We make an exception for headers that are shared between directories. It is also a good idea to separate platform-dependent layers into separate directories. We'll talk about it soon.
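The error-handling conventions discussed in this section can be sketched side by side. This is a minimal illustration under assumptions, not a prescribed project framework: the function Average, the FileHeader layout and its signature value, the CorruptFile exception, and the handler names are all hypothetical. It shows assertions for bugs and contract violations, checks that stay in retail builds for data read from disk, and a project-wide out-of-memory handler installed with the standard std::set_new_handler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <new>
#include <stdexcept>
#include <vector>

// 1. Bugs and contract violations: assertions, compiled out of retail
//    (NDEBUG) builds.
int Average(const std::vector<int>& values)
{
    assert(!values.empty());  // the caller's responsibility
    long long sum = 0;
    for (int v : values)
        sum += v;
    return static_cast<int>(sum / static_cast<long long>(values.size()));
}

// 2. Persistent data is never trusted: this check stays in retail builds,
//    and a fatal inconsistency becomes an exception.
struct FileHeader  // hypothetical on-disk header
{
    std::uint32_t signature;
    std::uint32_t recordCount;
};

const std::uint32_t kSignature = 0x12345678;  // arbitrary value for the sketch

class CorruptFile : public std::runtime_error
{
public:
    explicit CorruptFile(const char* msg) : std::runtime_error(msg) {}
};

void CheckHeader(const FileHeader& header, std::uint32_t maxRecords)
{
    if (header.signature != kSignature)
        throw CorruptFile("bad file signature");
    if (header.recordCount > maxRecords)
        throw CorruptFile("record count out of range");
}

// 3. Running out of memory: the simplest project-wide policy is a handler
//    that reports the problem and exits.
void OutOfMemory()
{
    std::cerr << "Out of memory\n";
    std::exit(EXIT_FAILURE);
}

void InstallMemoryHandler()
{
    std::set_new_handler(OutOfMemory);
}
```

The print-and-exit handler is only the simplest policy; a program that needs to clean up and save user data would instead leave the handler alone and let std::bad_alloc propagate to a level that can recover.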

Top-Down Object-Oriented Implementation

The implementation process should model the design process as closely as possible. This is why implementation should start with the top-level components. The earlier we find that the top-level interfaces need modification, the better. Besides, we need a working program for testing as soon as possible. The goal of the initial implementation effort is to test the flow of control, the lifetimes and accessibility of top-level objects, as well as the initialization and shutdown processes. At this stage the program is not supposed to do anything useful; it cannot be demoed; it is not a prototype. If management needs a prototype, it should be implemented by a separate group, possibly using a different language (Basic, Smalltalk, etc.). Trying to reuse code written for the prototype in the main project is usually a big mistake.

Only basic functionality, the one that's necessary for the program to make progress, is implemented at that point. Everything else is stubbed out. Stubs of class methods should only print debugging messages and display their arguments, if they make sense. The debugging and error-handling harness should be put in place and tested. If the program is interactive, we implement as much of the View and the Controller as is necessary to get the information flowing towards the model and showing in some minimal view. The model can be stubbed out completely.

Once the working skeleton of the program is in place, we can start implementing lower-level objects. At every stage we repeat the same basic procedure. We first create stubs of all objects at that level and test their interfaces and interactions. We continue the descent until we hit the bottom of the project, at which point we start implementing some "real" functionality. The goal is for the lowest-level components to fit right into the whole structure. They should snap into place, get control when appropriate, get called with the right arguments, and return the right stuff.

This strategy produces professional programs of uniform quality, with components that fit together very tightly and efficiently, like the parts of a well-designed sports car. Conversely, bottom-up implementation creates programs whose parts are of widely varying quality, put together using Scotch tape and string. A lot of programmers' time is spent trying to fit square pegs into round holes. The result resembles anything but a well-designed sports car.
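A stub in this style might look like the following minimal sketch. Document, DebugOut and CountLines are hypothetical names: the top-level flow (CountLines) is already real, while the lower-level object only reports its calls and arguments and returns a harmless value so that the program can make progress.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for capturing stub output
#include <string>

// Hypothetical project-wide debug output: a thin wrapper around a stream.
class DebugOut
{
public:
    explicit DebugOut(std::ostream& os) : _os(os) {}
    std::ostream& Stream() { return _os; }
private:
    std::ostream& _os;
};

// Lower-level object, still a stub: every method only reports the call and
// its arguments, then returns a harmless value so the caller can go on.
class Document
{
public:
    explicit Document(DebugOut& dbg) : _dbg(dbg) {}
    bool Load(const std::string& path)
    {
        _dbg.Stream() << "STUB Document::Load(\"" << path << "\")\n";
        return true;
    }
    int LineCount() const
    {
        _dbg.Stream() << "STUB Document::LineCount()\n";
        return 0;
    }
private:
    DebugOut& _dbg;
};

// Top-level flow, already real, exercising the stub's interface.
int CountLines(Document& doc, const std::string& path)
{
    assert(!path.empty());
    if (!doc.Load(path))
        return -1;
    return doc.LineCount();
}
```

Wrapping a std::ostringstream in DebugOut lets a test check the stub's messages; in the program itself it would wrap the project's debug output or log. When Document is implemented for real, its interface stays the same and CountLines is not touched.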

Inheriting Somebody Else's Code

In the ideal world (from the programmer's point of view) every project would start from scratch and have no external dependencies. Once in a while such a situation happens, and this is when real progress is made. New languages, new programming methodologies, new team structures can be applied and tested.

In the real world most projects inherit some source code, usually written using obsolete programming techniques, with its own model for error handling, debugging, use or misuse of global objects, gotos, spaghetti code, functions that go on for pages and pages, etc. Most projects have external dependencies: some code, tools, or libraries are being developed by external groups. Worst of all, those groups have different goals; they have to ship their own product, compete in the marketplace, etc. Sure, they are always enthusiastic about having their code or tool used by another group, and they promise continuing support. Unfortunately they have different priorities. Make sure your manager has some leverage over their manager.

If you have full control over inherited code, plan on rewriting it step by step. Go through a series of code reviews to find out which parts will cause the most problems and rewrite them first. Then do parallel development, interlacing rewrites with the development of new code. The effort will pay off in terms of debugging time and overall code quality.

Multi-platform Development

A lot of programs are developed for multiple platforms at once. It could be different hardware or a different set of APIs. Operating systems and computers evolve: at any point in time there is the obsolete platform, the most popular platform, and the platform of the future. Sometimes the target platform is different from the development platform. In any case, the platform-dependent things should be abstracted and separated into layers.
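For example, a locking service can be one such layer. The minimal sketch below assumes a hypothetical build flag, PROJECT_MULTITASKING, that selects between a real mutex (here std::mutex, from C++11) and a version for single-tasking platforms that does no locking at all and, in debug builds, merely asserts that Acquire and Release calls are paired. Client code such as the hypothetical Counter is written once, against Mutex and Lock.

```cpp
#include <cassert>

// Hypothetical build flag: define PROJECT_MULTITASKING (e.g. -DPROJECT_MULTITASKING)
// when building for a platform that runs several threads.
#if defined(PROJECT_MULTITASKING)
#include <mutex>

class Mutex
{
public:
    void Acquire() { _mutex.lock(); }
    void Release() { _mutex.unlock(); }
private:
    std::mutex _mutex;
};
#else
// Single-tasking platform: same interface, no real locking. Debug builds
// only check that Acquire and Release calls are properly paired.
class Mutex
{
public:
    Mutex() : _locked(false) {}
    void Acquire() { assert(!_locked); _locked = true; }
    void Release() { assert(_locked); _locked = false; }
private:
    bool _locked;
};
#endif

// Scoped lock: acquires in the constructor, releases in the destructor.
class Lock
{
public:
    explicit Lock(Mutex& mutex) : _mutex(mutex) { _mutex.Acquire(); }
    ~Lock() { _mutex.Release(); }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
private:
    Mutex& _mutex;
};

// Client code, written once for all platforms.
class Counter
{
public:
    Counter() : _value(0) {}
    int Increment()
    {
        Lock lock(_mutex);
        return ++_value;
    }
private:
    Mutex _mutex;
    int _value;
};
```

Switching platforms means changing one compile-time definition; none of the client code changes.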

The operating system is supposed to provide an abstraction layer that separates applications from the hardware. Except for very specialized applications, access to disk is very well abstracted into the file system. In windowing systems, graphics and user input are abstracted into APIs. Our program should do the same with the platform-dependent services: abstract them into layers. A layer is a set of services through which our application can access some lower-level functionality. The advantage of layering is that we can tweak the implementation without having to modify the code that uses it. Moreover, we can add new implementations or switch from one to another using a compile-time variable.

Sometimes a platform doesn't provide, or even need, the functionality provided by other platforms. For instance, in a non-multitasking system one doesn't need semaphores. Still, one can provide a locking system whose implementation can be switched on and off, depending on the platform. We can construct a layer by creating a set of classes that abstract some functionality. For instance, memory-mapped files can be combined with buffered files under one abstraction. It is advisable that the implementation choices be made in such a way that the platform-of-the-future implementation is the most efficient one.

It is worth noticing that if the platforms differ in the sizes of basic data types, such as 16-bit vs. 32-bit integers, one should be extremely careful with the design of persistent data structures and data types that can be transmitted over the wire. The foolproof method would be to convert all fundamental data types into strings of bytes of well-defined length. In this way we could even resolve the Big Endian vs. Little Endian differences.
This solution is not always acceptable, though, because of the runtime overhead. A tradeoff is made to either support only those platforms where the sizes of shorts and longs are compatible (and the byte order is the same), or provide conversion programs that can translate persistent data from one format to another. In any case it is a good idea to avoid using ints inside data types that are stored on disk or passed over the wire.
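The foolproof byte-string method can be sketched as follows. PutUint32 and GetUint32 are hypothetical helpers that store a 32-bit value as exactly four bytes in little-endian order (an arbitrary choice for this sketch; what matters is that the order is fixed), so the result depends neither on sizeof(int) nor on the byte order of the machine. Because data read from disk must never be trusted, the reader throws on truncated input rather than asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Write a 32-bit value as exactly four bytes, least significant first,
// independently of the host's int size and byte order.
void PutUint32(std::vector<unsigned char>& out, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
}

// Read the value back; pos is advanced past the four bytes.
// Truncated input is a runtime error, checked in retail builds too.
std::uint32_t GetUint32(const std::vector<unsigned char>& in, std::size_t& pos)
{
    if (in.size() < 4 || pos > in.size() - 4)
        throw std::out_of_range("truncated data");
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return value;
}
```

The shifts and masks performed for every value are the runtime overhead that makes this approach unacceptable in some programs.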

Program Modifications

Modifications of existing code range from cosmetic changes, such as renaming a variable, to sweeping global changes and major rewrites. Small changes are often suggested during code reviews. The rule of thumb is that when you see too many local variables or objects within a single function, or too many parameters passed back and forth, the code is ripe for a new abstraction.

It is interesting to notice how the object-oriented paradigm gets distorted at the highest and at the lowest levels. It is often difficult to come up with a good set of top-level objects, and all too often the main function ends up being a large procedure. Conversely, at the bottom of the hierarchy there is no good tradition of using a lot of short-lived lightweight local objects. The top-level situation is a matter of good or poor design; the bottom-level situation depends a lot on the quality of code reviews. The above rule of thumb is of great help there. You should also be on the lookout for too much cut-and-paste code. If the same set of actions, with only small modifications, happens in many places, it may be time to look for a new abstraction.

Rewrites of small parts of the program happen, and they are a good sign of healthy development. Sometimes the rewrites are more serious. It could be abstracting a layer, in which case all the users of a given service have to be modified; or changing the higher-level structure, in which case a lot of lower-level structures are affected. Fortunately, top-down object-oriented design makes such sweeping changes much easier to make. It is quite possible to split a top-level object into more independent parts, or change the containment or access structure at the highest level (for example, move a sub-object from one object to another).

How is it done? The key is to make the changes incrementally, top-down. During the first pass, you change the interfaces and pass different sets of arguments; for instance, pass reference variables to those places that used to have direct access to some objects but are about to lose it. Make as few changes to the implementation as possible. Compile and test. In the second pass, move objects around and see if they have access to all the data they need. Compile and test. In the third pass, once you have all the objects in place and all the arguments at your disposal, start making the necessary implementation changes, step by step.
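The first pass can be illustrated with a small hypothetical example. Suppose a Printer used to live inside the top-level App and read App's document directly, and the plan is to move it to another object. In the first pass Printer receives the document as a reference argument, so it no longer depends on where the document lives; App changes only at the call site. The names Document, Printer and App are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical data owned by the top-level object.
struct Document
{
    std::vector<std::string> lines;
};

// First pass: instead of reaching into App for the document, Printer now
// receives it as a reference argument.
class Printer
{
public:
    std::size_t CountPrintable(const Document& doc) const
    {
        std::size_t count = 0;
        for (const std::string& line : doc.lines)
            if (!line.empty())
                ++count;
        return count;
    }
};

// The top-level object: only the call site changed in the first pass.
class App
{
public:
    void AddLine(const std::string& s) { _doc.lines.push_back(s); }
    std::size_t Print() const { return _printer.CountPrintable(_doc); }
private:
    Document _doc;
    Printer _printer;  // the second pass may move this to another object
};
```

After compiling and testing this version, the second pass can move the Printer elsewhere; its new owner only has to supply a Document reference.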

Testing

Testing starts at the same time as the implementation. At all times you must have a working program. You need it for your testing; your teammates need it for their testing. The functionality will not be there, but the program will run and will at least print some debugging output. As soon as there is some functionality, start regression testing.

Regression Testing

Develop a test suite to test the basic functionality of your system. After every change, run it to check that you haven't broken any functionality. Expand the test suite to include basic tests of all new functionality. Running the regression suite should not take a long time.

Stress Testing

As soon as some functionality starts approaching its final form, stress testing should start. Unlike regression testing, stress testing is there to test the limits of the system. For instance, a comprehensive test of all possible failure scenarios, such as out-of-memory errors in various places, disk failures, unexpected power-downs, etc., should be made. The scalability under heavy loads should be tested. Depending on the type of program, it could be processing lots of small files, or one extremely large file, or lots of requests, etc.
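A regression suite does not need a heavyweight framework to get started. The sketch below, with the hypothetical names TestCase and RunSuite, runs a list of test functions, treats an escaping exception as a failure, and reports the failures; Add and its two tests merely stand in for real functionality.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for capturing the report
#include <string>
#include <vector>

// Each test is a named function returning true on success.
typedef bool (*TestFn)();

struct TestCase
{
    std::string name;
    TestFn fn;
};

// Runs every test, reports failures, and returns the number of failures.
int RunSuite(const std::vector<TestCase>& suite, std::ostream& report)
{
    int failures = 0;
    for (const TestCase& t : suite)
    {
        bool ok = false;
        try
        {
            ok = t.fn();
        }
        catch (...)
        {
            ok = false;  // an escaping exception counts as a failure
        }
        if (!ok)
        {
            ++failures;
            report << "FAILED: " << t.name << '\n';
        }
    }
    report << failures << " failure(s) in " << suite.size() << " test(s)\n";
    return failures;
}

// Stand-in for the functionality under test, with its basic tests.
int Add(int a, int b) { return a + b; }
bool TestAddPositive() { return Add(2, 3) == 5; }
bool TestAddNegative() { return Add(-2, -3) == -5; }
```

Because RunSuite returns the number of failures, it can drive a program's exit code, which makes it easy to run the suite automatically after every change.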
