English Pages 348 Year 2001
Contents
1. Preface
2. Introduction
3. Language
4. Techniques
5. Windows Techniques
6. Software Project
7. Appendix
Preface

Why This Book?

During the first four months of 1994 I was presented with a wonderful opportunity. My old university in Wroclaw, Poland, invited me to give two courses for the students of Computer Physics. The choice of topics was left entirely to my discretion. I knew exactly what I wanted to teach... My work at Microsoft gave me the unique experience of working on large software projects and applying and developing state-of-the-art design and programming methodologies.

Of course, there are plenty of books on the market that talk about design, programming paradigms, languages, etc. Unfortunately most of them are either written in a dry academic style and are quite obsolete, or they are hastily put together to catch the latest vogue. There is a glut of books teaching programming in C, C++ and, more recently, in Java. They teach the language, all right, but rarely do they teach programming.

We have to realize that we are witnessing an unprecedented explosion of new hardware and software technologies. For the last twenty years the power of computers grew exponentially, almost doubling every year. Our software experience should follow this exponential curve as well. Where does this leave books that were written ten or twenty years ago? And who has time to write new books? The academics? The home programmers? The conference crowd? What about people who are active full time, designing and implementing state-of-the-art software? They have no time!

In fact I could only dream about writing this book while working full time at Microsoft. I had problems finding time to share experiences with other teams working on the same project. We were all too busy writing software. And then I managed to get a four-month leave of absence. This is how this book started.
Teaching courses to a live, demanding audience is the best way of systematizing and testing ideas and making fast progress writing a book. The goal I put forward for the courses was to prepare the students for jobs in the industry. In particular, I asked myself the question: If I wanted to hire a new programmer, what would I like him to know to become a productive member of my team as quickly as possible? For sure, I would like such a person to know
• C++ and object oriented programming.
• Top-down design and top-down implementation techniques.
• Effective programming with templates and C++ exceptions.
• Team work.
He (and whenever I use the pronoun he, I mean it as an abbreviation for he or she) should be able to write reliable and maintainable code, easy to understand by other members of the team. The person should know advanced
programming techniques such as synchronization in a multithreaded environment, effective use of virtual memory, debugging techniques, etc.

Unfortunately, most college graduates are never taught this kind of "industrial strength" programming. Some universities are known to produce first-class computer hackers (and seem to be proud of it!). What's worse, a lot of experienced programmers have large holes in that area of their education. They don't know C++, they use C-style programming in C++, they skip the design stage, they implement bottom-up, they hate C++ exceptions, and they don't work with the team. The bottom line is this: they waste a lot of their own time and they waste a lot of others' time. They produce buggy code that's difficult to maintain.

So who are you, the reader of this book? You might be a beginner who wants to learn C++. You might be a student who wants to supplement his or her college education. You might be a new programmer who is trying to make a transition from the academic to the industrial environment. Or you might be a seasoned programmer in search of new ideas. This book should satisfy you no matter what category you find yourself in.
Introduction

I have divided this book into three parts, the Language, the Techniques, and the Software Project.
Language

The first part teaches C++, the language of choice for general-purpose programming. But it is not your usual C++ tutorial. For the beginner who doesn't know much about C or C++, it just introduces a new object oriented language. It doesn't concentrate on syntax or grammar; it shows how to express certain ideas in C++. It is like teaching a foreign language by conversation rather than by memorizing words and grammatical rules (when I was teaching it to students, I called this part of the course "Conversational C++"). After all, this is what the programmer needs: to be able to express ideas in the form of a program written in a particular language. When I learn a foreign language, the first thing I want to know is how to say, "How much does it cost?" I don't need to learn the whole conjugation of the verb 'to cost' in the past, present and future tenses. I just want to be able to walk into a store in a foreign country and buy something.

For a C programmer who doesn't know much about C++ (other than that it's slow and cryptic--the popular myths in the C subculture) this is an exercise in unlearning C in order to effectively program in C++. Why should a C programmer unlearn C? Isn't C++ a superset of C? Unfortunately yes! The decision to make C++ compatible with C was a purely practical, marketing decision. And it worked! Instead of being a completely new product that would take decades to gain the market, it became "version 3.1" of C. This is both good and bad. It's good because backward C compatibility allowed C++, and some elements of object oriented programming, to quickly gain a foothold in the programming community. It's bad because it doesn't require anybody to change his programming methodology.
Instead of having to rewrite the existing code all at once, many companies were, and still are, able to gradually phase C++ in. The usual path for such a phase-in is to introduce C++ as a 'stricter' C. In principle all C code could be recompiled as C++. In practice, C++ has somewhat stricter type checking and the compiler is able to detect more bugs and issue more warnings. So recompiling C code using a C++ compiler is a way of cleaning up the existing code. The changes that have to be introduced into the source code at that stage are mostly bug fixes and stricter type enforcement. If the code was written in pre-ANSI C, the prototypes of all functions have to be generated. It is surprising how many bugs are detected during this ANSI-zation procedure. All this work is definitely worth the effort. A C compiler should only be used when a good C++ compiler is not available (really, a rare occurrence nowadays).

Once the C++ compiler becomes part of the programming environment, programmers sooner or later start learning new tricks and eventually they develop some kind of C++ programming methodology, either on their own or by reading various self-help books. This is where the bad news starts. There is a subset of C++ (I call it the C ghetto) where many ex-C-programmers live. A lot of C programmers start hating C++ after a glimpse of the C ghetto. They don't realize that C++ has as many good uses as misuses.

For a C-ghetto programmer this book should be a shock (I hope!). It essentially says, "whatever you did up to now was wrong" and "Kernighan and Ritchie are not gods". (Kernighan and Ritchie are the creators of C and the
authors of the influential book The C Programming Language). I want to make this clear right here and now, in the introduction. I understand that the first, quite natural, reaction of such a programmer is to close the book immediately (or, actually, jump to another Internet site) and ask for a refund. Please don't do this! The shocking, iconoclastic value of this book is not there to hurt anybody's feelings. Seeing that there exists a drastically different philosophy is supposed to prompt one to rethink one's beliefs. Besides, the Emperor is naked.

For a C++ programmer, the tutorial offers a new look at the language. It shows how to avoid the pitfalls of C++ and use the language according to the way it should have been designed in the first place. I would lie if I said that C++ is a beautiful programming language. However, it is going to be, at least for some time, the most popular language for writing serious software. We may as well try to take advantage of its expressive power to write better software, rather than use it to find so many more ways to hurt ourselves.

For a C++ programmer, this part of the book should be mostly easy reading. And, although the constructs and the techniques introduced there are widely known, I tried to show them from a different perspective. My overriding philosophy was to create a system that promotes maintainable, human-readable coding style. That's why I took every opportunity not only to show various programming options but also to explain why I considered some of them superior to others.

Finally, for a Java programmer, this book should be an eye-opener. It shows that, with some discipline, it is possible to write safe and robust code in C++. Everything Java can do, C++ can do, too. Plus, it can deliver unmatched performance.
But performance is not the only reason to stick with C++. The kind of elegant resource management that can be implemented in C++ is quite impossible in Java, because of Java's reliance on garbage collection. In C++ you can have objects whose lifetime is precisely defined by the scope they live in. You are guaranteed that these objects will be destroyed upon the exit from that scope. That's why you can entrust such objects with vital resources, like semaphores, file handles, database transactions, etc. Java objects, on the other hand, have undefined life spans--they are deallocated only when the runtime decides to collect them. So the way you deal with resources in Java harks back to the old C exception paradigm, where the finally clause had to do all the painfully explicit garbage collection.

There are no "native" speakers of C++. When "speaking" C++, we all have some accent that reveals our programming background. Some of us have a strong C accent, some use Smalltalk-like expressions, others Lisp. The goal of the tutorial is to come as close as possible to being a native speaker of C++.

Language is a tool for expressing ideas. Therefore the emphasis is not on syntax and grammar but on the ways to express yourself. It is not "Here's a cute C++ construct and this is how you might use it." Instead it is more of "Here's an idea. How do I express it in C++?" Initially the 'ideas' take the form of simple sentences like "A star is a celestial body," or "A stack allows you to push and pop." Later the sentences are combined to form 'paragraphs' describing the functionality of a software component. The various constructs of C++ are introduced as the need arises, always in the context of a problem that needs to be solved.
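To make the point about scope-bound lifetimes concrete, here is a minimal sketch. It is not code from the book: `TraceLog`, `ScopedResource` and `useResource` are hypothetical names, and the "resource" is just a log entry standing in for a semaphore or a file handle. The sketch assumes a C++11 compiler.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical log that records the order of events.
typedef std::vector<std::string> TraceLog;

// Hypothetical owner: acquires in the constructor, releases in the destructor.
class ScopedResource
{
public:
    ScopedResource (TraceLog & log, std::string const & name)
        : _log (log), _name (name)
    {
        _log.push_back ("acquire " + _name);
    }
    ~ScopedResource ()
    {
        _log.push_back ("release " + _name);
    }
    // An owner should not be copied by accident.
    ScopedResource (ScopedResource const &) = delete;
    ScopedResource & operator= (ScopedResource const &) = delete;
private:
    TraceLog &  _log;
    std::string _name;
};

// Both resources are released when the try block is left,
// whether by falling off its end or by an exception.
TraceLog useResource (bool fail)
{
    TraceLog log;
    try
    {
        ScopedResource file (log, "file");
        ScopedResource lock (log, "lock");
        if (fail)
            throw std::runtime_error ("work failed");
        log.push_back ("work done");
    }
    catch (std::runtime_error const &)
    {
        log.push_back ("error handled");
    }
    return log;
}
```

In both the normal and the failing path the log shows the lock released before the file, that is, in reverse order of acquisition, and no explicit cleanup code appears in `useResource`.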
Techniques

Writing good software requires much more than just learning the language. Firstly, the program doesn't execute in a vacuum. It has to interact with the
computer. And interacting with the computer means going through the operating system. Without having some knowledge of the operating system, it is impossible to write serious programs. Secondly, we not only want to write programs that run--we want our programs to be small, fast, reliable, robust and scalable. Thirdly, we want to finish the development of a program in a sensible amount of time, and we want to maintain and enhance it afterwards.

The goal of the second part of the book, The Techniques, is to make possible the transition from 'weekend programming' to 'industrial strength programming.' I will describe the technique that makes programming in C++ an order of magnitude more robust and maintainable. I call it "managing resources" since it is centered on the idea of a program creating, acquiring, owning and releasing various kinds of resources. For every resource, at any point in time during the execution of the program, there has to be a well-defined owner responsible for its release. This simple idea turns out to be extremely powerful in designing and maintaining complex software systems. Many a bug has been avoided or found and fixed using resource ownership analysis.

Resource management meshes very naturally with C++ exception handling. In fact, writing sensible C++ programs that use exceptions seems virtually impossible without the encapsulation of resources. So, when should you use exceptions? What do they buy you? It depends on what your response is to the following simple question: Do you always check the result of new (or, for C programmers, the result of malloc)? This is a rhetorical question. Unless you are an exceptionally careful programmer--you don't. That means you are already using exceptions, whether you want it or not.
Because accessing a null pointer results in an exception called the General Protection Fault (GP-fault or Access Violation, as the programmers call it). If your program is not exception-aware, it will die a horrible death upon such an exception. What's more, the operating system will shame you by putting up a message box, leaving no doubt that it was your application that was written using sub-standard programming practices (maybe not in so many words).

My point is, in order to write robust and reliable applications--and that's what this book is about--you will sooner or later have to use exceptions. Of course, there are other programming techniques that were and still are being successfully applied to the development of reasonably robust and reliable applications. None of them, however, comes close in terms of simplicity and maintainability to the application of C++ exceptions in combination with the resource management techniques.

I will introduce the interaction with the operating system through a series of Windows programming exercises. They will lead the reader into new programming paradigms: message-based programming, Model-View-Controller approach to user interface, etc.

The advances in computer hardware paved the way to a new generation of PC operating systems. Preemptive multitasking and virtual memory are finally mainstream features on personal computers. So how does one write an application that takes advantage of multitasking? How does one synchronize multiple threads accessing the same data structure? And most importantly, how does multitasking mesh with the object-oriented paradigm and C++? I will try to answer these questions.

Virtual memory gives your application the illusion of practically infinite memory.
On a 32-bit system you can address 4 gigabytes of virtual memory--in practice the amount of available memory is limited by the size of your hard disk(s). For the application you write it means that it can easily deal with multi-megabyte memory based data structures. Or can it? Welcome to the world of thrashing! I will explain which algorithms and data structures are compatible with virtual memory and how to use memory-mapped files to save disk space.
Software Project

There is more to the creation of a successful application (or system) than just learning the language and mastering the techniques. Today's commercial software projects are among the most complex engineering undertakings of humankind. Programming is essentially the art of dealing with complexity. There were many attempts to apply traditional engineering methods to control software's complexity: modularization, software reuse, software IC's, etc. Let's face it--in general they don't work. They may be very helpful in providing low level building blocks and libraries, but they can hardly be used as guiding principles in the design and implementation of complex software projects.

The simple reason is that there is very little repetition in a piece of software. Try to visually compare a printout of a program with, say, a picture of a microprocessor wafer. You'll see a lot of repetitive patterns in the layout of the microprocessor. Piece-wise it resembles some kind of a high-tech crystal. A condensed view of a program, on the other hand, would look more like a high-tech fractal. You'd see a lot of self-similarities--large-scale patterns will resemble small-scale patterns. But you'd find very few exact matches or repetitions. Each little piece appears to be individually handcrafted. Repetitions in a program are not only unnecessary but they contribute to a maintenance nightmare. If you modify, or bug-fix, one piece of code, you are supposed to find all the copies of this piece and apply identical modifications to them as well.

This abhorrence of repetition is reflected in the production process of software. The proportion of research, design and manufacturing in the software industry is different than in other industries. Manufacturing, for instance, plays only a marginal role.
Strictly speaking, electronic channels of distribution could make the manufacturing phase totally irrelevant. R & D plays a vital role, more so than in many other industries. But what really sets software development apart from others is the amount of design that goes into the product. Programming is designing. Designing, building prototypes, testing--over and over again. Software industry is the ultimate "design industry."

In the third part of the book I will attempt to describe the large-scale aspects of software development. I will concentrate on the dynamics of a software project, both from the point of view of management and planning as well as development strategies and tactics. I will describe the dynamics of a project from its conception to shipment. I will talk about documentation, the design process and the development process. I will not, however, try to come up with ready-made recipes because they won't work--for exactly the reasons described above.

There is a popular unflattering stereotype of a programmer as a socially challenged nerd. Somebody who would work alone at night, subsist on Twinkies, avoid direct eye contact and care very little about personal hygiene. I've known programmers like that, and I'm sure there are still some around. However most of the specimens of this old culture are becoming extinct, and for a good reason. Progress in hardware and software makes it impossible to produce any reasonably useful and reliable program while working in isolation. Teamwork is the essential part of software development.

Dividing the work and coordinating the development effort of a team is always a big challenge. In traditional industries members of the team know (at least in theory) what they are doing. They learned the routine. They are
performing a synchronized dance and they know the steps and hear the music. In the software industry every team member improvises the steps as he or she goes and, at the same time, composes the music for the rest of the team.

I will advocate a change of emphasis in software development. Instead of the old axiom

    Programs are written for computers.

I will turn the logic upside down and claim that

    Programs are written for programmers.

This statement is in fact the premise of the whole book. You can't develop industrial strength software if you don't treat your code as a publication for other programmers to read, understand and modify. You don't want your 'code' to be an exercise in cryptography.

The computer is the ultimate proofing tool for your software. The compiler is your spell-checker. By running your program you attempt to test the correctness of your publication. But it's only another human being--a fellow programmer--that can understand the meaning of your program. And it is crucial that he do it with minimum effort, because without understanding, it is impossible to maintain your software.
Language

• Objects and Scopes

What's the most important thing in the Universe? Is it matter? It seems like everything is built from matter--galaxies, stars, planets, houses, cars and even us, programmers. But what's matter without energy? The Universe would be dead without it. Energy is the source of change, movement, life. But what is matter and energy without space and time? We need space into which to put matter, and we need time to see matter change.

Programming is like creating universes. We need matter: data structures, objects, variables. We need energy--the executable code--the lifeforce of the program. Objects would be dead without code that operates on them. Objects need space to be put into and to relate to each other. Lines of code need time to be executed. The space-time of the program is described by scopes. An object lives and dies by its scope. Lines of executable code operate within scopes. Scopes provide the structure to the program's space and time. And ultimately programming is about structure.
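The statement that an object lives and dies by its scope can be illustrated with a small sketch. The class `Star` and the function `census` are hypothetical names chosen for this illustration; the counter passed to each `Star` simply tracks how many of them exist at a given moment.

```cpp
#include <vector>

// Hypothetical object that reports its birth and death to a counter.
class Star
{
public:
    explicit Star (int & population) : _population (population)
    {
        ++_population;
    }
    ~Star ()
    {
        --_population;
    }
private:
    int & _population;
};

// Takes a head count at several points of nested scopes.
std::vector<int> census ()
{
    int population = 0;
    std::vector<int> counts;
    counts.push_back (population);          // nothing created yet
    {
        Star sun (population);
        counts.push_back (population);      // sun is alive
        {
            Star sirius (population);
            counts.push_back (population);  // sun and sirius are alive
        }                                   // sirius dies here
        counts.push_back (population);
    }                                       // sun dies here
    counts.push_back (population);
    return counts;
}
```

The sequence returned by `census` rises and falls with the nesting of the braces: 0, 1, 2, 1, 0.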
• Arrays and References

In a program, an object is identified by its name. But if we had to call the object by its name everywhere, we would end up with one global name space. Our program would execute in a structureless "object soup." The power to give an object different names in different scopes provides an additional level of indirection, so important in programming. There is an old saying in Computer Science--every problem can be solved by adding a level of indirection. This indirection can be accomplished by using a reference, an alias, an alternative name, that can be attached to a different object every time it enters a scope.

Computers are great at menial tasks. They have a lot more patience than we humans do. It is a punishment for a human to have to write "I will not challenge my teacher's authority" a hundred times. Tell the computer to do it a hundred times, and it won't even blink. That's the power of iteration (and conformity).
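Both ideas, the reference as an alias and iteration, fit in a few lines. This is only a sketch; `deposit` and `writeLines` are hypothetical functions invented for the illustration.

```cpp
#include <string>

// 'account' is an alias: on every call it is attached to a
// different object chosen by the caller.
void deposit (double & account, double amount)
{
    account += amount;
}

// Iteration: repeat the same line 'count' times without complaint.
std::string writeLines (std::string const & line, int count)
{
    std::string page;
    for (int i = 0; i < count; ++i)
    {
        page += line;
        page += '\n';
    }
    return page;
}
```

Calling `deposit (savings, 5)` and then `deposit (checking, 5)` modifies two different objects through the same local name `account`, and `writeLines (sentence, 100)` produces the hundred lines of the punishment.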
• Pointers

Using references, we can give multiple names to the same object. Using pointers, we can have the same name refer to different objects--a pointer is a mutable reference. Pointers give us power to create complex data structures. They also increase our ability to shoot ourselves in the foot. A pointer is like a plug that can be plugged into a jack. If you have too many plugs and too many jacks, you may end up with a mess of tangled cables. A programmer has to strike a balance between creating a program that looks like a breadboard or like a printed circuit.
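As a minimal sketch of a pointer acting as a "mutable reference", consider a singly linked list. The names `Link` and `sum` are hypothetical, and the sketch assumes C++11 for `nullptr`.

```cpp
// A node of a singly linked list: the 'next' pointer is the plug
// that connects this node to another one.
struct Link
{
    int    value;
    Link * next;
};

// 'p' is a single name that refers to a different Link on every step.
int sum (Link const * head)
{
    int total = 0;
    for (Link const * p = head; p != nullptr; p = p->next)
        total += p->value;
    return total;
}
```

Three nodes chained as `a -> b -> c` can be summed with `sum (&a)`; an empty list is represented by a null pointer and sums to zero.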
• Polymorphism

Polymorphic means multi-shaped. A tuner, a tape deck, a CD player--they come in different shapes but they all have the same audio-out jack. You can plug your earphones into it and listen to music no matter whether it came as a modulation of a carrier wave, a set of magnetic domains on a tape or a series of pits in the aluminum substrate on a plastic disk.
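The audio-out jack translates naturally into a base class with a virtual function. The following sketch uses hypothetical class names (`AudioSource`, `Tuner`, `TapeDeck`, `CdPlayer`) and assumes C++11 for `override`; it is an illustration of the metaphor, not the book's own example.

```cpp
#include <string>

// The common "audio-out jack": every source can be played.
class AudioSource
{
public:
    virtual ~AudioSource () {}
    virtual std::string Play () const = 0;
};

class Tuner : public AudioSource
{
public:
    std::string Play () const override { return "carrier wave"; }
};

class TapeDeck : public AudioSource
{
public:
    std::string Play () const override { return "magnetic tape"; }
};

class CdPlayer : public AudioSource
{
public:
    std::string Play () const override { return "pits on a disk"; }
};

// The earphones don't care which device they are plugged into.
std::string Listen (AudioSource const & source)
{
    return "music from " + source.Play ();
}
```

`Listen` is written once against `AudioSource`; passing it a `Tuner`, a `TapeDeck` or a `CdPlayer` selects the right `Play` at run time.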
• Small Software Project

When you write a program, you don't ask yourself the question, "How can I use a particular language feature?" You ask, "What language feature will help me solve my problem?"
Objects and Scopes
1. Global Scope
2. Local Scope
3. Embedded Objects
4. Inheritance
5. Member Functions and Interfaces
6. Member Function Scope
7. Types
8. Summary
9. Word of Caution
10. Exercises
11. Abstract Data Types
12. Exercises
Global scope

Class definition, object definition, constructor, destructor, output stream, include, main.

There is an old tradition in teaching C, dating back to Kernighan and Ritchie (The C Programming Language), to have the first program print the greeting "Hello World!". It is only appropriate that our first C++ program should respond to this greeting. The way to do it, of course, is to create the World and let it speak for itself. The following program does just that, but it also serves as a metaphor for C++ programming. Every C++ program is a world in itself. The world is a play and we define the characters in that play and let them interact. This program in a sense is "the Mother of all C++ programs," it contains just one player, the World, and lets us witness its creation and destruction. The World interacts with us by printing messages on the computer screen. It prints "Hello!" when it is created, and "Good bye!" when it vanishes. So here we go:

    #include class World { public: World () { std::cout

    "; // prompt
            cin.getline (buf, maxBuf);
            Scanner scanner (buf);
            Parser parser (scanner, store, funTab, symTab);
            status = parser.Eval ();
        } while (status != stQuit);
    }

Notice that for every line of input we create a new scanner and a new parser. We keep however the same symbol table, function table and the store. This is important because we want the values assigned to variables to be remembered as long as the program is active. The parser's destructor is called after the evaluation of every line. This call plays an important role of freeing the parse tree.

There is a comment at the top of main, which hints at ways of improving the structure of the program. There are five local variables/objects defined at the top of main.
Moreover, they depend on each other: the symbol table has to be initialized before the function table and before the store. The parser's constructor takes references to three of them. As a rule of thumb, whenever you see too many local variables, you should think hard how to combine at least some of them into a separate object. In our case, it's pretty obvious that this object should be called Calculator. It should combine SymbolTable, FunctionTable and Store as its embeddings. We'll come back to this program in the next part of the book to see how it can be made into a professional, "industrial strength" program.
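One possible shape of such a Calculator is sketched below. The three embedded classes here are hypothetical stand-ins with made-up constructor signatures, not the book's real classes; each one only records its name in a trace so that the order of construction becomes visible. The point of the sketch is that embedded objects are constructed in the order in which they are declared in the class, which is how the required initialization order can be guaranteed.

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> Trace;

// Hypothetical stand-in: the real SymbolTable has a different interface.
class SymbolTable
{
public:
    explicit SymbolTable (Trace & trace) { trace.push_back ("SymbolTable"); }
};

// Hypothetical stand-in: needs an already constructed symbol table.
class FunctionTable
{
public:
    FunctionTable (SymbolTable &, Trace & trace) { trace.push_back ("FunctionTable"); }
};

// Hypothetical stand-in: needs an already constructed symbol table.
class Store
{
public:
    Store (SymbolTable &, Trace & trace) { trace.push_back ("Store"); }
};

class Calculator
{
public:
    explicit Calculator (Trace & trace)
        : _symTab (trace),
          _funTab (_symTab, trace),
          _store (_symTab, trace)
    {}
private:
    // Embeddings are constructed in declaration order,
    // so _symTab is ready before the other two need it.
    SymbolTable   _symTab;
    FunctionTable _funTab;
    Store         _store;
};
```

With this arrangement main would need a single `Calculator` object instead of three interdependent locals, and the dependency between them would be expressed in one place.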
Initialization of Aggregates

Explicit initialization of classes and arrays.

Just as you can explicitly initialize an array of characters using a literal string

    char string [] = "Literal String";

you can initialize other aggregate data structures--classes and arrays. An object of a given class can be explicitly initialized if and only if all its non-static data members are public and there is no base class, no virtual functions and no user-defined constructor. All public data members must be explicitly initializable as well (a little recursion here). For instance, if you have
    class Initializable
    {
    public:
        // no constructor
        int _val;
        char const * _string;
        Foo * _pFoo;
    };

you can define an instance of such class and initialize all its members at once

    Foo foo;
    Initializable init = { 1, "Literal String", &foo };

Since Initializable is initializable, you can use it as a data member of another initializable class.

    class BigInitializable
    {
    public:
        Initializable _init;
        double _pi;
    };

    BigInitializable big = { { 1, "Literal String", &foo }, 3.14 };

As you see, you can nest initializations. You can also explicitly initialize an array of objects. They may be of a simple or aggregate type. They may even be arrays of arrays of objects. Here are a few examples.

    char string [] = { 'A', 'B', 'C', '\0' };

is equivalent to its shorthand

    char string [] = "ABC";

Here's another example

    Initializable init [2] =
    {
        { 1, "Literal String", &foo1 },
        { 2, "Another String", &foo2 }
    };

We used this method in the initialization of our array of FunctionEntry objects. If objects in the array have single-argument constructors, you can specify these arguments in the initializer list. For instance,

    CelestialBody solarSystem [] = { 0.33, 4.87, 5.98, 0.64, 1900, 569, 87, 103, 0.66 };

where masses of planets are given in units of 10^24 kg.
Exercises
1. Create a top level Calculator object and make appropriate changes to lower level components.

2. Add two new built-in functions to the calculator, sqr and cube. Sqr squares its argument and cube cubes it (raises to the third power).

3. Add the recognition of unary plus to the calculator. Make necessary modifications to the scanner and the parser. Add a new node, UPlusNode. The calculator should be able to deal correctly with such expressions as

    x = +2
    2 * + 7
    1 / (+1 - 2)

4. Add powers to the calculator according to the following productions

    Factor is
        SimpleFactor ^ Factor      // a ^ b (a to the power of b)
    or
        SimpleFactor
    SimpleFactor is
        ( Expression )             // parenthesized expression
    or
        Number                     // literal floating point number
    or
        Identifier ( Expression )  // function call
    or
        Identifier                 // symbolic variable
    or
        - Factor                   // unary minus

5. To all nodes in the parse tree add virtual method Print(). When Print() is called on the root of the tree, the whole tree should be displayed in some readable form. Use varying indentation (by printing a number of spaces at the beginning of every line) to distinguish between different levels of the tree. For instance

    void AddNode::Print (int indent) const
    {
        _pLeft->Print (indent + 2);
        Indent (indent);
        cout -sin(x)

    exp(x) -> exp(x)
    log(x) -> 1/x
    sqrt(x) -> 1/(2 * sqrt(x))

The derivative of a sum is a sum of derivatives, the derivative of a product is given by the formula

    (f(x) * g(x))' = f'(x) * g(x) + f(x) * g'(x)

where prime denotes a derivative. The derivative of a quotient is given by the formula
    (f(x) / g(x))' = (f'(x) * g(x) - f(x) * g'(x)) / (g(x) * g(x))

and the derivative of the composition of functions is given by

    (f(g(x)))' = g'(x) * f'(g(x)).

Rewrite the calculator to derive the symbolic derivative of the input by transforming the parse tree according to the formulas above. Make sure no memory is leaked in the process (that is, you must delete everything you allocate).
Operator overloading

You can pretty much do any kind of arithmetic in C++ using the built-in integral and floating-point types. However, that's not always enough. Old-time engineers swear by Fortran, the language which has a built-in type complex. In a lot of engineering applications, especially in electronics, you can't really do effective calculations without the use of complex numbers. The C++ language itself does not support complex arithmetic. Neither does it support matrix or vector calculus. Does that mean that engineers and scientists should stick to Fortran? Not at all! Obviously in C++ you can define new classes of objects, so defining a complex number is a piece of cake. What about adding, subtracting, multiplying, etc.? You can define appropriate methods of class complex. What about notational convenience? In Fortran you can add two complex numbers simply by putting the plus sign between them. No problem! Enter operator overloading.

In an expression like

    double delta = 5 * 5 - 4 * 3.2 * 0.1;

you see several arithmetic operators: the equal sign, the multiplication symbol and the minus sign. Their meaning is well understood by the compiler. It knows how to multiply or subtract integers or floating-point numbers. But if you want to teach the compiler to multiply or subtract objects of some user-defined class, you have to overload the appropriate operators. The syntax for operator overloading requires some getting used to, but the use of overloaded operators doesn't. You simply put a multiplication sign between two complex variables and the compiler finds your definition of complex multiplication and applies it. By the way, a complex type is conveniently defined for you in the standard library.

An equal sign is an operator too. Like most operators in C++, it can be overloaded. Its meaning, however, goes well beyond arithmetic. In fact, if you don't do anything special about it, you can assign an arbitrary object to another object of the same class by simply putting an equal sign between them. Yes, that's right, you can, for instance, do this:

    SymbolTable symTab1 (100);
    SymbolTable symTab2 (200);
    symTab1 = symTab2;

Will the assignment in this case do the sensible thing? No, the assignment will most definitely be wrong, and it will result in a very nasty problem with memory management. So, even if you're not planning on overloading the standard arithmetic operators, you should still learn something about the assignment operator, including when and why you would want to overload it. And that brings us to a very important topic--value semantics.
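
To make the notational point concrete, here is a minimal sketch of what overloading looks like. Vec2 and SquarePlusOne are names invented for this illustration; std::complex is the standard library type mentioned above.

```cpp
#include <complex>

// A user-defined type with overloaded arithmetic operators.
struct Vec2
{
    Vec2 (double x_, double y_) : x (x_), y (y_) {}
    double x;
    double y;
};

// Free functions named operator+ and operator* tell the compiler what
// the corresponding signs mean when the operands are Vec2 objects.
Vec2 operator+ (Vec2 const & a, Vec2 const & b)
{
    return Vec2 (a.x + b.x, a.y + b.y);
}

Vec2 operator* (double k, Vec2 const & v)
{
    return Vec2 (k * v.x, k * v.y);
}

// The standard complex type already comes with overloaded operators,
// so complex expressions read like ordinary arithmetic.
std::complex<double> SquarePlusOne (std::complex<double> const & z)
{
    return z * z + 1.0;
}
```

With these definitions in place, an expression such as 2.0 * a + b compiles for Vec2 operands exactly as it would for doubles. The definitions take some getting used to, but the call sites don't.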
Passing by Value

Copy constructor, overloading the assignment operator, default copy constructor and operator =, return by value, passing by value, implicit type conversions.

So far we've been careful to pass objects from and to methods using references or pointers. For instance, in the following line of code

    Parser parser (scanner, store, funTab, symTab);

all the arguments to the Parser's constructor--scanner, store, funTab and symTab--are passed by reference. We know that, because we've seen the following declaration (and so did the compiler):

    Parser (Scanner & scanner,
            Store & store,
            FunctionTable & funTab,
            SymbolTable & symTab);

When we construct the parser, we don't give it a copy of a symbol table. We give it access to an existing symbol table. If we gave it a private copy, we wouldn't be able to see the changes the parser made to it. The parser may, for instance, add a new variable to the symbol table. We want our symbol table to remember this variable even after the current parser is destroyed. The same goes for store--it must remember the values assigned to symbolic variables across the invocations of the parser.

But what about the scanner? We don't really care whether the parser makes a scratch copy of it for its private use. Neither do we care what the parser does to the function table. What we do care about in this case is performance. Creating a scratch copy of a large object is quite time consuming. But suppose we didn't care about performance. Would the following work?

    Parser (Scanner scanner,
            Store & store,
            FunctionTable funTab,
            SymbolTable & symTab);

Notice the absence of ampersands after Scanner and FunctionTable. What we are telling the compiler is this: When the caller creates a Parser, passing it a scanner, make a temporary copy of this scanner and let the Parser's constructor operate on that copy.

This is, after all, the way built-in types are passed around. When you call a method that expects an integer, it's the copy of that integer that's used inside the method. You can modify that copy to your heart's content and you'll never change the original.
Only if you explicitly request that the method accept a reference to an integer, can you change the original.

There are many reasons why such an approach will not work as expected, unless we make several further modifications to our code. First of all, the temporary copy of the scanner (and the function table) will disappear as soon as the execution of the Parser's constructor is finished. The parser will store a reference to it in its member variable, but that's useless. After the end of construction the reference will point to a nonexistent scratch copy of a scanner. That's not good.

If we decide to pass a copy of the scanner to the parser, we should also store a copy of the scanner inside the parser. Here's how you do it--just omit the ampersand.

    class Parser
    {
        ...
    private:
        Scanner         _scanner;
        Node          * _pTree;
        Status          _status;
        Store         & _store;
        FunctionTable   _funTab;
        SymbolTable   & _symTab;
    };

But what is really happening inside the constructor? Now that neither the argument, scanner, nor the member variable, _scanner, is a reference, how is _scanner initialized with scanner? The syntax is misleadingly simple.

    Parser::Parser (Scanner scanner,
                    Store & store,
                    FunctionTable funTab,
                    SymbolTable & symTab)
        : _scanner (scanner),
          _pTree (0),
          _status (stOk),
          _funTab (funTab),
          _store (store),
          _symTab (symTab)
    {
    }

What happens behind the scenes is that Scanner's copy constructor is called. A copy constructor is the one that takes a (possibly const) reference to the object of the same class and clones it. In our case, the appropriate constructor would be declared as follows,

    Scanner::Scanner (Scanner const & scanner);

But wait a minute! Scanner does not have a constructor of this signature. Why doesn't the compiler protest, like it always does when we try to call an undefined member function? The unexpected answer is that, if you don't explicitly declare a copy constructor for a given class, the compiler will create one for you. If this doesn't sound scary, I don't know what does.
Beware of default copy constructors!

The copy constructor generated by the compiler is probably wrong! After all, what can a dumb compiler know about copying user-defined classes? Sure, it tries to do its best--it

• does a bitwise copy of all the data members that are of built-in types and
• calls respective copy constructors for user-defined embedded objects.

But that's it. Any time it encounters a pointer it simply duplicates it. It does not create a copy of the object pointed to by the pointer. That might be okay, or not--only the creator of the class knows for sure.

This kind of operation is called a shallow copy, as opposed to a deep copy which follows all the pointers. Shallow copy is fine when the pointed-to data structures can be easily shared between multiple instances of the object.

But consider, as an example, what happens when we make a shallow copy of the top node of a parse tree. If the top node has children, a shallow copy will not clone the child nodes. We will end up with two top nodes, both pointing to the same child nodes. That's not a problem until the destructor of one of the top nodes is called. It promptly deletes its children. And what is the second top node pointing to now? A piece of garbage! The moment it tries to access the children, it will stomp over reclaimed memory with disastrous results. But even if it does nothing, eventually its own destructor is called. And that destructor will attempt to delete the same children that have already been deleted by the first top node. The result? Memory corruption.

But wait, there's more! C++ not only sneaks a default copy constructor on you. It also provides you with a convenient default assignment operator.
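
For contrast with the shallow copy described above, here is a minimal sketch of a class that makes deep copies. OwnedString is a hypothetical class made up for this illustration and not part of the calculator. It owns a heap buffer, so it supplies its own copy constructor and assignment operator instead of accepting the compiler-generated ones.

```cpp
#include <cstring>

// Owns a heap-allocated C string. Every copy gets its own buffer, so
// no two objects ever try to delete the same memory.
class OwnedString
{
public:
    explicit OwnedString (char const * str)
        : _buf (Clone (str))
    {}
    // Deep copy: clone the pointed-to data, not just the pointer.
    OwnedString (OwnedString const & other)
        : _buf (Clone (other._buf))
    {}
    OwnedString & operator= (OwnedString const & other)
    {
        if (this != &other)
        {
            char * tmp = Clone (other._buf); // allocate before releasing
            delete [] _buf;                  // free what we used to own
            _buf = tmp;
        }
        return *this;
    }
    ~OwnedString () { delete [] _buf; }
    char const * c_str () const { return _buf; }
    void SetFirst (char c)
    {
        if (_buf [0] != '\0')
            _buf [0] = c;
    }
private:
    static char * Clone (char const * str)
    {
        char * p = new char [std::strlen (str) + 1];
        std::strcpy (p, str);
        return p;
    }
    char * _buf;
};
```

Modifying a copy leaves the original untouched, and each destructor deletes only its own buffer. Had the two special members been left to the compiler, both objects would share one buffer and the second destructor would delete it again.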
Beware of default assignments!

The following code is perfectly legal.

    SymbolTable symTab1 (100);
    SymbolTable symTab2 (200);
    // ...
    symTab1 = symTab2;

Not only does it perform a shallow copy of symTab2 into symTab1, but it also clobbers whatever already was there in symTab1. All memory that was allocated in symTab1 is lost, never to be reclaimed. Instead, the memory allocated in symTab2 will be double deleted. Now that's a bargain!

Why does C++ quietly let a skunk into our house and wait for us to run into it in the dark? If you've followed this book closely, you know the answer--compatibility with C! You see, in C you can't define a copy constructor, because there aren't any constructors. So every time you wanted to copy something more complex than an int, you'd have to write a special function or a macro. So, in the traditional spirit of letting programmers shoot themselves in the foot, C provided this additional facility of quietly copying all the user-defined structs. Now, in C this wasn't such a big deal--a C struct is just raw data. There is no data hiding, no methods, no inheritance. Besides, there are no references in C, so you are much less likely to inadvertently copy a struct, just because you forgot one ampersand. But in C++ it's a completely different story. Beware--many a bug in C++ is a result of a missing ampersand.

But wait, there's even more! C++ not only offers a free copy constructor and a free assignment, it will also quietly use these two to return objects from functions. Here's a fragment of code from our old implementation of a stack-based calculator, except for one small modification. Can you spot it?

    class Calculator
    {
    public:
        int Execute (Input & input);
        IStack const GetStack () const { return _stack; }
    private:
        int Calculate (int n1, int n2, int token) const;

        IStack _stack;
    };

What happened here was that I omitted the ampersand in the return type of Calculator::GetStack and IStack is now returned by value. Let's have a very close look at what happens during such a transfer. Also, let's assume for a moment that the compiler doesn't do any clever optimizations here. In particular, let's define GetStack out of line, so a regular function call has to be executed.

    IStack const Calculator::GetStack () const
    {
        return _stack;
    }
    //...
    IStack const stk;
    stk = calc.GetStack (); // [...]

[...]

        if (idxMax >= newSize)
            newSize = idxMax + 1;
        // allocate new array
        T * arrNew = new T [newSize];
        // copy all entries
        int i;
        for (i = 0; i < _capacity; ++i)
            arrNew [i] = _arr [i];
        for (; i < newSize; ++i)
            arrNew [i] = _valDefault;
        _capacity = newSize;
        // free old memory
        delete []_arr;
        // substitute new array for old array
        _arr = arrNew;
    }

Now it's time to make use of our dynamic array template to see how easy it really is. Let's start with the class MultiNode. In the old, limited, implementation it had two arrays: an array of pointers to Node and an array of Boolean flags. Our first step is to change the types of these arrays to, respectively, DynArray<Node*> and DynArray<bool>. We have to pass default values to the constructors of these arrays in the preamble to the MultiNode constructor.

The methods that just access the arrays will work with no changes (due to our overloading of operator []), except for the places where we used to check for array bounds. Those are the places where we might have to extend the arrays, so we should use the new Add method. It so happens that the only place we do it is inside the AddChild method and the conversion is straightforward.

    class MultiNode: public Node
    {
    public:
        MultiNode (Node * pNode)
            : _aChild (0),
              _aIsPositive (false),
              _iCur (0)
        {
            AddChild (pNode, true);
        }
        ~MultiNode ();
        void AddChild (Node * pNode, bool isPositive)
        {
            _aChild.Add (_iCur, pNode);
            _aIsPositive.Add (_iCur, isPositive);
            ++_iCur;
        }
    protected:
        int              _iCur;
        DynArray<Node*>  _aChild;
        DynArray<bool>   _aIsPositive;
    };

    MultiNode::~MultiNode ()
    {
        for (int i = 0; i < _iCur; ++i)
            delete _aChild [i];
    }

Let's have one more look at the Calc method of SumNode. Other than for the removal of error checking (we have gotten rid of the unnecessary flag, _isError), it works as if nothing had changed.

    double SumNode::Calc () const
    {
        double sum = 0.0;
        for (int i = 0; i < _iCur; ++i)
        {
            double val = _aChild [i]->Calc ();
            if (_aIsPositive[i])
                sum += val;
            else
                sum -= val;
        }
        return sum;
    }

The only difference is that when we access our arrays _aChild [i] and _aIsPositive [i], we are really calling the overloaded operator [] of the respective dynamic arrays. And, by the way, since the method Calc is const, it is the const version of the overload we're calling. Isn't that beautiful?
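
For readers who want to experiment with the idea in isolation, here is a compact sketch of a dynamic array with the interface used above: a default value passed to the constructor, an Add method that grows the storage on demand, and a const / non-const pair of operator []. The initial capacity, the Capacity accessor and the disabled copying are assumptions of this sketch, not details taken from the text.

```cpp
#include <cassert>

template <class T>
class DynArray
{
    enum { initCapacity = 8 };   // assumed starting size
public:
    explicit DynArray (T const valDefault)
        : _arr (new T [initCapacity]),
          _capacity (initCapacity),
          _valDefault (valDefault)
    {
        for (int i = 0; i < _capacity; ++i)
            _arr [i] = _valDefault;
    }
    ~DynArray () { delete [] _arr; }
    // Store val at index i, growing the array if necessary.
    void Add (int i, T const val)
    {
        if (i >= _capacity)
            Grow (i);
        _arr [i] = val;
    }
    T operator [] (int i) const    // selected for const arrays
    {
        assert (i < _capacity);
        return _arr [i];
    }
    T & operator [] (int i)        // selected for non-const arrays
    {
        assert (i < _capacity);
        return _arr [i];
    }
    int Capacity () const { return _capacity; }
private:
    DynArray (DynArray const &);              // copying disabled, because
    DynArray & operator= (DynArray const &);  // the array owns _arr
    void Grow (int idxMax)
    {
        int newSize = 2 * _capacity;          // doubling pattern
        if (idxMax >= newSize)
            newSize = idxMax + 1;
        T * arrNew = new T [newSize];
        int i;
        for (i = 0; i < _capacity; ++i)
            arrNew [i] = _arr [i];
        for (; i < newSize; ++i)
            arrNew [i] = _valDefault;
        _capacity = newSize;
        delete [] _arr;
        _arr = arrNew;
    }
    T *     _arr;
    int     _capacity;
    T const _valDefault;
};
```

Entries that were never set explicitly read back as the default value, which is why MultiNode can pass 0 and false to the two array constructors.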
Separating Functionality into New Classes

I'm not happy with the structuring of the symbol table. Just one look at the seven data members tells me that a new class is budding. (Seven is the magic number.) Here they are again:

    HTable  _htab;
    int   * _offStr;
    int     _capacity;
    int     _curId;
    char  * _strBuf;
    int     _bufSize;
    int     _curStrOff;

The rule of thumb is that if you have too many data members you should group some of them into new classes. This is actually one of the three rules of class formation. You need a new class when there are

• too many local variables in a function,
• too many data members in a class or
• too many arguments to a function.

It will become clearer why these rules make perfect sense (and why seven is the magic number) when we talk about ways of dealing with complexity in the third part of this book.

The last three data members of the symbol table are perfect candidates for a new string-buffer object. The string buffer is able to store strings and assign them numbers, called offsets, that uniquely identify them. As a bonus, we'll make the string buffer dynamic, so we won't have to worry about overflowing it with too many strings.

    class StringBuffer
    {
    public:
        StringBuffer ()
            : _strBuf (0), _bufSize (0), _curStrOff (0)
        {}
        ~StringBuffer ()
        {
            delete [] _strBuf; // the buffer is an array of char
        }
        int AddString (char const * str);
        char const * GetString (int off) const
        {
            assert (off < _curStrOff);
            return &_strBuf [off];
        }
    private:
        void Reallocate (int addLen);

        char  * _strBuf;
        int     _bufSize;
        int     _curStrOff;
    };

When the buffer runs out of space, the AddString method reallocates the whole buffer.

    int StringBuffer::AddString (char const * str)
    {
        int len = strlen (str);
        int offset = _curStrOff;
        // is there enough space?
        if (_curStrOff + len + 1 >= _bufSize)
        {
            Reallocate (len + 1);
        }
        // copy the string there
        strncpy (&_strBuf [_curStrOff], str, len);
        // calculate new offset
        _curStrOff += len;
        _strBuf [_curStrOff] = 0; // null terminate
        ++_curStrOff;
        return offset;
    }

The reallocation follows the standard doubling pattern--but making sure that the new string will fit no matter what.

    void StringBuffer::Reallocate (int addLen)
    {
        int newSize = _bufSize * 2;
        if (newSize [...]

[...]

        [...] AddChild (pRight, (token == tPlus));
        token = _scanner.Token();
    } while (token == tPlus || token == tMinus);

The call to Term returns a node pointer that is temporarily stored in pRight. Then the MultiNode's method AddChild is called, and we know very well that it might try to resize its array of children. If the reallocation fails and an exception is thrown, the tree pointed to by pRight will never be deallocated. We have a memory leak.

Before I show you the systematic solution to this problem, let's try the obvious thing. Since our problem stems from the presence of a naked pointer, let's create a special purpose class to encapsulate it. This class should acquire the node in its constructor and release it in the destructor. In addition to that, we would like objects of this class to behave like regular pointers. Here's how we can do it.

    class NodePtr
    {
    public:
        NodePtr (Node * pNode) : _p (pNode) {}
        ~NodePtr () { delete _p; }
        Node * operator->() const { return _p; }
        Node & operator * () const { return *_p; }
    private:
        Node * _p;
    };

Such objects are called safe or smart pointers. The pointer-like behavior is implemented by overloading the pointer-access and pointer-dereference operators. This clever device makes an object behave like a pointer. In particular, one can call all the public methods (and access all public data members, if there were any) of Node by "dereferencing" an object of the type NodePtr.

    {
        Node * pNode = Expr ();
        NodePtr pSmartNode (pNode);
        double x = pSmartNode->Calc (); // pointer-like behavior
        ...
        // Automatic destruction of pSmartNode.
        // pNode is deleted by its destructor.
    }

Of course, a smart pointer by itself will not solve our problems in the parser. After all we don't want the nodes created by calling Term or Factor to be automatically destroyed upon normal exit from the scope. We want to be able to build them into the parse tree whose lifetime extends well beyond the local scope of these methods. To do that we will have to relax our First Rule of Acquisition.
Ownership Transfer: First Attempt

When the lifetime of a given resource can be mapped into the lifetime of some scope, we encapsulate this resource in a smart pointer and we're done. When this can't be done, we have to pass the resource between scopes. There are two possible directions for such transfer: up and down. A resource may be passed up from a procedure to the caller (returned), or it can be passed down from a caller to the procedure (as an argument). We assume that before being passed, the resource is owned by some type of owner object (e.g., a smart pointer).

Passing a resource down to a procedure is relatively easy. We can simply pass a reference to the owner object (a smart pointer, in our case) and let the procedure acquire the ownership from it. We'll add a special method, Release, to our smart pointer to release the ownership of the resource.

    Node * NodePtr::Release ()
    {
        Node * tmp = _p;
        _p = 0;
        return tmp;
    }

The important thing about Release is that it zeroes the internal pointer, so that the destructor of NodePtr will not delete the object (delete always checks for a null pointer and returns immediately). After the call to Release the smart pointer no longer owns the resource. So who owns it? Whoever called it better provide a new owner!

This is how we can apply this method in our program. Here, the node resource is passed from the Parser's method Expr down to the MultiNode's method AddChild.

    do
    {
        _scanner.Accept();
        NodePtr pRight (Term ());
        pMultiNode->AddChild (pRight, (token == tPlus));
        token = _scanner.Token();
    } while (token == tPlus || token == tMinus);

AddChild acquires the ownership of the node by calling the Release method and passes it immediately to the vector _aChild (if you see a problem here, read on, we'll tackle it later).

    void MultiNode::AddChild (NodePtr & pNode, bool isPositive)
    {
        _aChild.push_back (pNode.Release ());
        _aIsPositive.push_back (isPositive);
    }

Passing a resource up is a little trickier. Technically, there's no problem. We just have to call Release to acquire the resource from the owner and then return it back. For instance, here's how we can return a node from Parser::Expr.

    Node * Parser::Expr ()
    {
        // Parse a term
        NodePtr pNode (Term ());
        ...
        return pNode.Release ();
    }

What makes it tricky is that now the caller of Expr has a naked pointer.
Of course, if the caller is smart, he or she will immediately find a new owner for this pointer--presumably a smart pointer--just like we did a moment ago with the result of Term. But it's one thing to expect everybody to take special care of the naked pointers returned by new and Release, and quite a different story to expect the same level of vigilance with every procedure that happens to return a pointer, especially since it's not immediately obvious which ones are returning strong pointers that are supposed to be deleted, and which return weak pointers that must not be deleted.

Of course, you may choose to study the code of every procedure you call and find out what's expected from you. You might hope that a procedure that transfers ownership will be appropriately commented in its header file. Or you might rely on some special naming convention--for instance start the names of all resource-returning procedures with the prefix "Query" (been there!).

Fortunately, you don't have to do any of these horrible things. There is a better way. Read on!

To summarize, even though there are some big holes in our methodology, we have accomplished no mean feat. We have encapsulated all the resources following the First Rule of Acquisition. This will guarantee automatic cleanup in the face of exceptions. We have a crude method of transferring resources up and down between owners.
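
The Release-based transfer can be tried out in isolation. In the sketch below everything is a hypothetical stand-in: Tracked plays the role of Node and counts its live instances, and TrackedPtr plays the role of NodePtr. The counter makes it easy to see that each object is destroyed exactly once, whichever way the ownership travels.

```cpp
// Hypothetical resource that counts live instances, so that leaks and
// double deletions become observable.
struct Tracked
{
    Tracked () { ++liveCount; }
    ~Tracked () { --liveCount; }
    static int liveCount;
};
int Tracked::liveCount = 0;

// Owning pointer in the style of NodePtr, with a Release method.
class TrackedPtr
{
public:
    explicit TrackedPtr (Tracked * p) : _p (p) {}
    ~TrackedPtr () { delete _p; }  // deleting a null pointer does nothing
    Tracked * Release ()
    {
        Tracked * tmp = _p;
        _p = 0;                    // give up ownership
        return tmp;
    }
    bool IsEmpty () const { return _p == 0; }
private:
    TrackedPtr (TrackedPtr const &);             // not copyable
    TrackedPtr & operator= (TrackedPtr const &);
    Tracked * _p;
};

// Passing up: the procedure releases the resource on return and the
// caller has to find a new owner for the naked pointer.
Tracked * MakeTracked ()
{
    TrackedPtr p (new Tracked);
    // work that might throw goes here; p cleans up if it does
    return p.Release ();
}

// Passing down: the procedure takes the resource from the caller's owner.
void Consume (TrackedPtr & source)
{
    TrackedPtr local (source.Release ());
    // local deletes the object when this scope ends
}
```

A caller that writes TrackedPtr owner (MakeTracked ()); has closed the gap immediately. The weakness discussed above is visible in the signature of MakeTracked: nothing in the return type says that the pointer must be deleted.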
Ownership Transfer: Second Attempt

So far our attempts at resource transfer through procedure boundaries have been to release the resource from its owner, pass it in its "naked" form and then immediately encapsulate it again. The obvious danger is that, although the passing happens within a few nanoseconds in a running program, the code that accepts the resource may be written months or even years after the code that releases it. The two sides of the procedure barrier don't necessarily talk to each other.

But who says that we have to "undress" the resource for the duration of the transfer? Can't we pass it together with its encapsulator? The short answer is a resounding yes! The longer answer is necessary in order to explain why it wouldn't work without some unorthodox thinking.

First of all, if we were to pass a smart pointer "as is" from a procedure to the caller, we'd end up with a dangling pointer.

    NodePtr Parser::Expr ()
    {
        // Parse a term
        NodePtr pNode = Term (); // [...]

[...]

            [...] Calc ();
            if (*isPosIt)
                sum += val;
            else
                sum -= val;
        }
        assert (isPosIt == _aIsPositive.end ());
        return sum;
    }

I said "might," because it's not immediately obvious that this style of coding, using iterators, is more advantageous than the traditional array-index iteration, especially when two parallel arrays are involved. I had to use the comma sequencing operator to squeeze two increment operations into one slot in the for-loop header. (Expressions separated by commas are evaluated in sequence. The value of the sequence is equal to the value of its last expression.) On the other hand, this code would be easier to convert if we were to reimplement MultiNode to use linked lists instead of vectors. That, however, seems rather unlikely.
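
The iterator style with the comma operator can be shown on its own. The function below is a hypothetical stand-alone sketch, not the SumNode method from the text: it walks two parallel vectors in lockstep, adding or subtracting each value according to the matching flag.

```cpp
#include <cassert>
#include <vector>

// Adds or subtracts each value depending on the matching flag,
// advancing two iterators side by side.
double SignedSum (std::vector<double> const & values,
                  std::vector<bool> const & isPositive)
{
    assert (values.size () == isPositive.size ());
    double sum = 0.0;
    std::vector<double>::const_iterator valIt = values.begin ();
    std::vector<bool>::const_iterator isPosIt = isPositive.begin ();
    // The comma operator squeezes both increments into the one
    // slot available in the for-loop header.
    for (; valIt != values.end (); ++valIt, ++isPosIt)
    {
        if (*isPosIt)
            sum += *valIt;
        else
            sum -= *valIt;
    }
    assert (isPosIt == isPositive.end ());
    return sum;
}
```

Nothing in the loop body depends on random access, which is why the same code would keep working if both containers were replaced by lists.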
Error Propagation

Now that our code is exception-safe, we should reconsider our error-handling policy. Look what we've been doing so far when we detected a syntax error. We set the parser's status to stError and returned a null pointer from whatever parsing method we were in. It so happened that all syntax errors were detected at the lowest level, inside Parser::Factor. However, both Parser::Term and Parser::Expr had to deal with the possibility of a null node coming from a lower-level parsing method. In fact, Parser::Factor itself had to deal with the possibility that the recursive call to Expr might return a null. Our code was sprinkled with error-propagation artifacts like this one:

    if (pNode.get () == 0)
        return pNode;

Whenever there is a situation where an error has to be propagated straight through a set of nested calls, one should consider using exceptions. If we let Parser::Factor throw an exception whenever it detects a syntax error, we won't have to worry about detecting and propagating null pointers through other parsing methods. All we'll need is to catch this exception at the highest level--say in Parser::Parse.

    class Syntax {};

    Status Parser::Parse ()
    {
        try
        {
            // Everything is an expression
            _pTree = Expr ();
            if (!_scanner.IsDone ())
                _status = stError;
        }
        catch (Syntax)
        {
            _status = stError;
        }
        return _status;
    }

Notice that I defined a separate class of exceptions, Syntax, for propagating syntax errors. For now this class is empty, but its type lets me distinguish it from other types of exceptions. In particular, I don't want to catch bad_alloc exceptions in Parser::Parse, since I don't know what to do with them. They will be caught and dealt with in main.

Here's an example of code from Parser::Factor converted to use exceptions for syntax error reporting. Notice that we no longer test for null return from Expr (in fact we can assert that it's not null!).

    if (_scanner.Token () == tLParen) // function call
    {
        _scanner.Accept (); // accept '('
        pNode = Expr ();
        assert (pNode.get () != 0);
        if (_scanner.Token () == tRParen)
            _scanner.Accept (); // accept ')'
        else
            throw Syntax ();
        if (id != SymbolTable::idNotFound && id < _funTab.Size ())
        {
            pNode = auto_ptr<Node> (
                new FunNode (_funTab.GetFun (id), pNode));
        }
        else
        {
            cerr [...]

[...]

            [...] IncRefCount ();
        }
        ~RefPtr ()
        {
            Release ();
        }
        RefPtr const & operator= (RefPtr const & p)
        {
            if (this != &p)
            {
                Release ();
                _p = p._p;
                _p->IncRefCount ();
            }
            return *this;
        }
    protected:
        RefPtr (T * p) : _p (p) {}
        void Release ()
        {
            if (_p->DecRefCount () == 0)
                delete _p;
        }
        T * _p;
    };

Notice that the reference-counted type T must provide at least two methods, IncRefCount and DecRefCount. We also tacitly assume that it is created with a reference count of one, before being passed to the protected constructor of RefPtr.

Although it's not absolutely necessary, we might want the type T to be a descendant of a base class that implements the reference-counting interface.

    class RefCounted
    {
    public:
        RefCounted () : _count (1) {}
        int GetRefCount () const { return _count; }
        void IncRefCount () const { _count++; }
        int DecRefCount () const { return --_count; }
    private:
        mutable int _count;
    };

Notice one interesting detail: the methods IncRefCount and DecRefCount are declared const, even though they modify the object's data. You can do that, without the compiler raising an eyebrow, if you declare the relevant data member mutable. We do want these methods to be const (or at least one of them, IncRefCount) because they are called on const objects in RefPtr. Both the copy constructor and the assignment operator take const references to their arguments, but they modify their reference counts. We decided not to consider the updating of the reference count a "modification" of the object. It will make even more sense when we get to the copy-on-write implementation.

Just for demonstration purposes, let's create a reference-counted string representation using our original StringVal. Normally, one would do it more efficiently, by combining the reference count with the string buffer.

    class StringRep: public RefCounted
    {
    public:
        StringRep (char const * cstr)
            : _string (cstr)
        {}
        char const * c_str () const { return _string.c_str (); }
        void Upcase ()
        {
            _string.Upcase ();
        }
    private:
        StringVal _string;
    };

Our actual string class is built on the base of RefPtr, which internally represents string data with StringRep.

    class StringRef: public RefPtr<StringRep>
    {
    public:
        StringRef (char const * cstr)
            : RefPtr<StringRep> (new StringRep (cstr))
        {}
        StringRef (StringRef const & str)
            : RefPtr<StringRep> (str)
        {}
        char const * c_str () const { return _p->c_str (); }
        void Upcase ()
        {
            _p->Upcase ();
        }
    };

Other than in the special C-string-taking constructor, there is no copying of data. The copy constructor just increments the reference count of the string-representation object. So does the (inherited) assignment operator. Consequently, "copying" and passing a StringRef by value is relatively cheap. There is only one tiny problem with this implementation. After you call Upcase on one of the copies of a StringRef, all other copies change to upper case.

    StringRef strOriginal ("text");
    StringRef strCopy (strOriginal);
    strCopy.Upcase ();
    // The original will be upper-cased!
    cout [...]

[...]

        [...] >> _number; // read the whole number
        break;
    }

Reading a floating-point number from the standard input is easy. The only complication arises from the fact that we've already read the first character of the number--our lookahead. So before we read the whole number, we have to put our lookahead back into the stream. Don't worry, this is a simple operation. After all, the input stream is buffered. When you call get, the character is simply read from a buffer (unless the buffer is empty--in that case the system replenishes it by actually reading the input). Ungetting a character just means putting it back into that buffer. Input streams are implemented in such a way that it's always possible to put back one character.

When reading an identifier, we do a slight variation of the same trick.

    default:
        if (isalpha (_look) || _look == '_')
        {
            _token = tIdent;
            _symbol.erase (); // erase string contents
            do {
                _symbol += _look;
                _look = _in.get ();
            } while (isalnum (_look));
            _in.putback (_look);
        }
        else
            _token = tError;
        break;

We don't have to put back a lookahead at the beginning of reading an identifier. Instead, we have to put back the last character, the one that is not part of the identifier, so that the next call to ReadChar () can see it.

Haven't we lost some generality by switching from a string to a stream in our implementation of the scanner? After all, you can always convert a stream to a string (e.g., using getline ()). Is the opposite possible? Not to worry! Converting a string into a stream is just as easy. The appropriate class is called istringstream and is defined in the header <sstream>. Since istringstream inherits from istream, our scanner won't notice the difference. For instance, we can do this:

    std::istringstream in ("sin (2 * pi / 3)");
    Scanner scanner (in);

We have just skimmed the surface of the standard library and we've already found a lot of useful stuff. It really pays to study it, rather than implement your own solutions from scratch.
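
The lookahead-and-putback trick is easy to try outside the scanner, using a string stream as input. ReadNumber below is a hypothetical helper written for this illustration and is not a method of the Scanner class. It peeks at one character with get, puts it back, and only then lets the stream extract the whole number.

```cpp
#include <cctype>
#include <istream>
#include <sstream>

// Reads a floating-point number from any input stream. The first
// character is consumed as a lookahead and then put back, so that
// the extraction operator sees the number from its beginning.
bool ReadNumber (std::istream & in, double & number)
{
    int look = in.get ();                   // lookahead
    if (!in)
        return false;                       // nothing left to read
    in.putback (static_cast<char> (look));  // return it to the buffer
    if (!std::isdigit (look) && look != '.')
        return false;                       // not the start of a number
    in >> number;                           // read the whole number
    return !in.fail ();
}
```

Because the function takes a std::istream reference, the same code works with std::cin, a file stream or an istringstream. Whether it succeeds or not, the character that follows the lookahead position is still available to the next reader.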
Code Review 7: Serialization and Deserialization

The Calculator Object

Look at main: There are too many objects there. The symbol table, the function table and the store. All three objects have the same lifespan--the duration of the program execution. They have to be initialized in a particular order and all three of them are passed to the constructor of the parser. They just scream to be combined into a single object called--you guessed it--the Calculator. Embedding them in the right order inside this class will take care of the correct order of initialization.

    class Calculator
    {
        friend class Parser;
    public:
        Calculator ()
            : _funTab (_symTab), _store (_symTab)
        {}
    private:
        Store & GetStore () { return _store; }
        PFun GetFun (int id) const { return _funTab.GetFun (id); }
        bool IsFunction (int id) const { return id < _funTab.Size (); }
        int AddSymbol (std::string const & str) { return _symTab.ForceAdd (str); }
        int FindSymbol (std::string const & str) const { return _symTab.Find (str); }

        SymbolTable     _symTab;
        Function::Table _funTab;
        Store           _store;
    };

Of course, now we have to make appropriate changes (read: simplifications) in main and in the parser. Here are just a few examples--in the declaration of the parser:

    class Parser
    {
    public:
        Parser (Scanner & scanner, Calculator & calc);
        ...
    private:
        ...
        Scanner &      _scanner;
        auto_ptr<Node> _pTree;
        Status         _status;
        Calculator &   _calc;
    };

and in its implementation.

    // Factor := Ident
    if (id == SymbolTable::idNotFound)
    {
        id = _calc.AddSymbol (strSymbol);
    }
    pNode = auto_ptr<Node> (new VarNode (id, _calc.GetStore ()));

Have you noticed something? We just went ahead and made another major top-level change in our project, just like this! In fact it was almost trivial to do, with just a little help from the compiler. Here's the prescription. Start in the spot in main where the symbol table, function table and store are defined (constructed). Replace them with the new object, calculator. Declare the class for Calculator and write a constructor for it. Now, if you are really lazy and tired of thinking, fire off the compiler. It will immediately tell you what to do next: You have to modify the constructor of the parser. You have to pass it the calculator rather than its three separate parts. At this point you might notice that it will be necessary to change the class declaration of the Parser to let it store a reference to the Calculator. Or, you could run the compiler again and let it remind you of it. Next, you will notice all the compilation errors in the implementation of Parser. You can fix them one-by-one, adding new methods to the Calculator as the need arises. The whole procedure is so simple that you might ask an intern who has just started working on the project to do it with minimal supervision.

The moral of this story is that it's never too late to work on the improvement of the high level structure of the project. The truth is that you rarely get it right the first time. And, by the way, you have just seen the method of top-down program modification. You start from the top and let the compiler lead you all the way down to the nitty-gritty details of the implementation.
That's the third part of the top-down methodology which consists of:
• Top-down design
• Top-down implementation and
• Top-down modification.

I can't stress enough the importance of the top-down methodology. I have yet to see a clean, well written piece of code that was created bottom-up. You'll hear people saying that some things are better done top-down, others bottom-up. Some people will say that starting from the middle and expanding in both directions is the best way to go. Take all such statements with a very big grain of salt.

It is a fact that bottom-up development is more natural when you have no idea what you're doing--when your goal is not to write a specific program, but rather to play around with some "neat stuff." It's an easy way, for instance, to learn the interface to some obscure subsystem that you might want to use. Bottom-up development is also preferable if you're not very good at design or if you dislike just sitting there and thinking instead of coding. It is a plus if you enjoy long hours of debugging or have somebody else (hopefully not the end user!) to debug your code.

Finally, if you embrace the bottom-up philosophy, you'll have to resign yourself to never being able to write a professional-looking piece of code. Your programs will always look to the trained eye like those electronics projects created with Radio Shack parts, on breadboards, with bent wires sticking out in all directions and batteries held together with rubber bands.

The real reason I decided to finally get rid of the top level mess and introduce the Calculator object was to simplify the job of adding a new piece of functionality. Every time the management asks you to add new features, take the opportunity to sneak in a little rewrite of the existing code. The code isn't good enough if it hasn't been rewritten at least three times. I'm serious! By rewriting I don't mean throwing it away and starting from scratch. Just take your time every now and then to improve the structure of each part of the project. It will pay off tremendously. It will actually shorten the development cycle.

Of course, if you have stress-puppy managers, you'll have a hard time convincing them about it. They will keep running around shouting nonsense like "if it ain't broken, don't fix it" or "if we don't ship it tomorrow, we are all dead." The moment you buy into that, you're doomed! You'll never be able to do anything right and you'll be spending more and more time fixing the scaffolding and chasing bugs in some low quality temporary code pronounced to be of the "ain't broken" quality. Welcome to the maintenance nightmare!

So here we are, almost at the end of our project, when we are told that if we don't provide a command to save and restore the state of the calculator from a file, we're dead.
Fortunately, we can add this feature to the program without much trouble and, as a bonus, do some more cleanup.
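Returning for a moment to the claim that embedding the objects in the right order takes care of initialization: C++ constructs data members in the order of their declaration in the class. The following is a minimal sketch of that rule, using hypothetical stand-in classes (Table, Index, Owner and the Log typedef are invented for this illustration and are not part of the calculator).

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> Log;

// Stand-in for an object others depend on (like the symbol table).
class Table
{
public:
    explicit Table (Log & log) { log.push_back ("Table"); }
};

// Stand-in for an object that needs a Table at construction time
// (like the function table or the store).
class Index
{
public:
    Index (Table const &, Log & log) { log.push_back ("Index"); }
};

class Owner
{
public:
    explicit Owner (Log & log)
        : _table (log), _index (_table, log)
    {}
private:
    Table _table;   // declared first, so constructed first
    Index _index;   // may safely refer to _table
};
```

Constructing an Owner appends "Table" and then "Index" to the log. Swapping the two member declarations would hand Index a reference to a Table that has not been constructed yet, which is exactly the mistake that the order of members inside Calculator avoids.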
Command Parser

We'll go about adding new functionality in an orderly fashion. We have to provide the user with a way to input commands. So far we've had a hack for inputting the quit command--an empty line was interpreted as "quit." Now that we want to add two more commands, save and restore, we can as well find a more general solution. I probably don't have to tell you that, but...

Whenever there are more than two special cases, you should generalize them.

The calculator expects expressions from the user. Let's distinguish commands from expressions by prefixing them with an exclamation sign. Exclamation has the natural connotation of commanding somebody to do something. We'll use a prefix rather than a suffix to simplify our parsing. We'll also make quit a regular command; to be input as "!q". We'll even remind the user of this command when the calculator starts.

    cerr Size () - _curOff);
        xact.LogSecond (para2);
        Paragraph * oldPara = _curPara;
        // May throw an exception!
        SubstCurPara (para1, para2);
        xact.Commit ();
        delete oldPara;
        // destructor of xact executed
    }

This is how the transaction object is implemented.

    class Transaction
    {
    public:
        Transaction () : _commit (false), _para1 (0), _para2 (0) {}
        ~Transaction ()
        {
            if (!_commit)
            {
                // unroll all the actions
                delete _para2;
                delete _para1;
            }
        }
        void LogFirst (Paragraph * para) { _para1 = para; }
        void LogSecond (Paragraph * para) { _para2 = para; }
        void Commit () { _commit = true; }
    private:
        bool        _commit;
        Paragraph * _para1;
        Paragraph * _para2;
    };

Notice how carefully we prepare all the ingredients for the transaction. We first allocate all the resources and log them in our transaction object. The new paragraphs are now owned by the transaction.
If at any point an exception is thrown, the destructor of the Transaction, still in its non-committed state, will perform a rollback and free all these resources.

Once we have all the resources ready, we make the switch--new resources go into the place of the old ones. The switch operation usually involves the manipulation of some pointers or array indexes. Once the switch has been done, we can commit the transaction. From that point on, the transaction no longer owns the new paragraphs. The destructor of a committed transaction usually does nothing at all. The switch made the document the owner of the new paragraphs and, at the same time, freed the ownership of the old paragraph which we then promptly delete.

All simple transactions follow this pattern:
• Allocate and log all the resources necessary for the transaction.
• Switch new resources in the place of old resources and commit.
• Clean up old resources.
Figure: Prepared transaction. The transaction owns all the new resources. The master data structure owns the old resources.
Figure: Aborting a transaction. The transaction's destructor frees the resources.
Figure 3-11 The switch. The master data structure releases the old resources and takes the ownership of the new resources.
Figure 3-12 The cleanup. Old resources are freed and the transaction is deleted.
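The three-step pattern can be exercised end to end in a small self-contained sketch. Para and Doc below are hypothetical stand-ins invented for this illustration (Para counts its live instances so that a leak would be visible); the Transaction class follows the shape of the one shown earlier, and the fail flag merely simulates an exception thrown after the resources have been logged.

```cpp
#include <stdexcept>

// Hypothetical resource that counts live instances, so leaks show up.
struct Para
{
    static int live;
    Para () { ++live; }
    ~Para () { --live; }
};
int Para::live = 0;

// Owns the logged resources until Commit is called.
class Transaction
{
public:
    Transaction () : _commit (false), _para1 (0), _para2 (0) {}
    ~Transaction ()
    {
        if (!_commit)
        {
            // rollback: free everything that was logged
            delete _para2;
            delete _para1;
        }
    }
    void LogFirst (Para * para) { _para1 = para; }
    void LogSecond (Para * para) { _para2 = para; }
    void Commit () { _commit = true; }
private:
    bool   _commit;
    Para * _para1;
    Para * _para2;
};

// Hypothetical master data structure holding up to two paragraphs.
// (Copying a Doc is not supported in this sketch.)
class Doc
{
public:
    Doc () : _first (new Para), _second (0) {}
    ~Doc () { delete _first; delete _second; }
    // Replaces the contents with two fresh paragraphs. When fail is
    // true, an exception is thrown before the switch.
    void Replace (bool fail)
    {
        Transaction xact;
        Para * para1 = new Para;        // 1. allocate and log
        xact.LogFirst (para1);
        Para * para2 = new Para;
        xact.LogSecond (para2);
        if (fail)
            throw std::runtime_error ("simulated failure");
        Para * old1 = _first;           // 2. switch and commit
        Para * old2 = _second;
        _first = para1;
        _second = para2;
        xact.Commit ();
        delete old1;                    // 3. clean up old resources
        delete old2;
    }
private:
    Para * _first;
    Para * _second;
};
```

After a failed Replace the number of live Para objects is the same as before the call, because the uncommitted transaction freed both new paragraphs; after a successful one the document owns the two new paragraphs and the old one is gone.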
Persistent Transactions

When designing a persistent transaction--one that manipulates persistent data structures--we have to think of recovering from such disasters as system crashes or power failures. In cases like those, we are not so much worried about in-memory data structures (these will be lost anyway), but about the persistent, on-disk, data structures. A persistent transaction goes through similar stages as the transient one.
• Preparation: New information is written to disk.
• Commitment: The new information becomes current, the old is disregarded.
• Cleanup: The old information is removed from disk.
A system crash can happen before or after commitment (I'll explain in a moment why it can't happen during the commit). When the system comes up again, we have to find all the interrupted transactions (they have to leave some trace on disk) and do one of two things: if the transaction was interrupted before it had a chance to commit, we must unroll it; otherwise we have to complete it. Both cases involve cleanup of some on-disk data. The unrolling means deleting the information written in preparation for the transaction. The completing means deleting the old information that is no longer needed.
Figure: The Switch. In one atomic write the on-disk data structure changes its contents.

The crucial part of the transaction is, of course, commitment. It's the "flipping of the switch." In one atomic operation the new information becomes current and the old becomes invalid. An atomic operation either succeeds and leaves a permanent trace on disk, or fails without leaving a trace. That shouldn't be difficult, you'd say. How about simply writing something to a file? It either succeeds or fails, doesn't it? Well, there's the rub! It doesn't! In order to understand that, we have to delve a little into the internals of a file system.

First of all, writing into a file doesn't mean writing to disk. Or, at least, not immediately. In general, file writes are buffered and then cached in memory before they are physically written to disk. All this is quietly done by the runtime (the buffering) and by the operating system (the caching) in order to get reasonable performance out of your machine. Disk writes are so incredibly slow in comparison with memory writes that caching is a must.

What's even more important: the order of physical disk writes is not guaranteed to follow the order of logical file writes. In fact the file system goes out of its way to combine writes based on their physical proximity on disk, so that the magnetic head doesn't have to move too much. And the physical layout of a file might have nothing to do with its contiguous logical shape. Not to mention writes to different files that can be quite arbitrarily reordered by the system, no matter what your program thinks.

Thirdly, contiguous writes to a single file may be split into several physical writes depending on the disk layout set up by your file system.
You might be writing a single 32-bit number but, if it happens to straddle sector boundaries, one part of it might be written to disk in one write and the other might wait for another sweep of the cache. Of course, if the system goes down between these two writes, your data will end up partially written. So much for atomic writes.

Now that I have convinced you that transactions are impossible, let me explain a few tricks of the trade that make them possible after all. First of all, there is a file system call, Flush, that makes 100% sure that the file data is written to the disk. Not atomically, mind you--Flush may fail in the middle of writing a 32-bit number. But once Flush succeeds, we are guaranteed that the data is safely stored on disk. Obviously, we have to flush the new data to disk before we go about committing a transaction. Otherwise we might wake up after a system crash with a committed transaction but an incomplete data structure. And, of course, another flush must finish the committing of a transaction.

How about atomicity? How can we atomically flip the switch? Some databases go so far as to install their own file systems that support atomic writes. We won't go that far. We will assume that if a file is small enough, the writes are indeed atomic. "Small enough" means not larger than a sector. To be on the safe side, make it less than 256 bytes. Will this work on every file system? Of course not! There are some file systems that are not even recoverable. All I can say is that this method will work on NTFS--the Windows NT(tm) file system. You can quote me on this.

We are now ready to talk about the simplest implementation of the persistent transaction--the three-file scheme.
The Three-File Scheme

An idealized word processor reads an input file, lets the user edit it and then saves the result. It's the save operation that we are interested in. If we start overwriting the source file, we're asking for trouble. Any kind of failure and we end up with a partially updated (read: corrupted!) file. So here's another scheme: Write the complete updated version of the document into a separate file. When you are done writing, flush it to make sure the data gets to disk. Then commit the transaction and clean up the original file. To keep a permanent record of the state of the transaction we'll need one more small file. The transaction is committed by making one atomic write into that file.

So here is the three-file scheme: We start with file A containing the original data, file B with no data and a small 1-byte file S (for Switch) initialized to contain a zero. The transaction begins.
• Write the new version of the document into file B.
• Flush file B to make sure that the data gets to disk.
• Commit: Write 1 into file S and flush it.
• Empty file A.

The meaning of the number stored in file S is the following: If its value is zero, file A contains valid data. If it's one, file B contains valid data. When the program starts up, it checks the value stored in S, loads the data from the appropriate file and empties the other file. That's it!

Let's now analyze what happens if there is a system crash at any point in our scheme. If it happens before the new value in file S gets to the disk, the program will come up and read zero from S. It will assume that the correct version of the data is still in file A and it will empty file B. We are back to the pre-transaction state. The emptying of B is our rollback. Once the value 1 in S gets to the disk, the transaction is committed.
A system crash after that will result in the program coming back, reading the value 1 from S and assuming that the correct data is in file B. It will empty file A, thus completing the transaction. Notice that data in file B is guaranteed to be complete at that point: Since the value in S is one, file B must have been flushed successfully.

If we want to start another save transaction after that, we can simply interchange the roles of files A and B and commit by changing the value in S from one to zero. To make the scheme even more robust, we can choose some random (but fixed) byte values for our switch, instead of zero and one. In this way we'll be more likely to discover on-disk data corruption--something that might always happen as long as disks are not 100% reliable and other applications can access our files and corrupt them. Redundancy provides the first line of defense against data corruption.

This is how one might implement a save transaction.

    class SaveTrans
    {
        enum State
        {
            // some arbitrary bit patterns
            stDataInA = 0xC6,
            stDataInB = 0x3A
        };
    public:
        SaveTrans ()
            : _switch ("Switch"), _commit (false)
        {
            _state = _switch.ReadByte ();
            if (_state != stDataInA && _state != stDataInB)
                throw "Switch file corrupted";
            if (_state == stDataInA)
            {
                _data.Open ("A");
                _backup.Open ("B");
            }
            else
            {
                _data.Open ("B");
                _backup.Open ("A");
            }
        }
        File & GetDataFile () { return _data; }
        File & GetBackupFile () { return _backup; }
        ~SaveTrans ()
        {
            if (_commit)
                _data.Empty ();
            else
                _backup.Empty ();
        }
        void Commit ()
        {
            State otherState;
            if (_state == stDataInA)
                otherState = stDataInB;
            else
                otherState = stDataInA;
            _backup.Flush ();
            _switch.Rewind ();
            _switch.WriteByte (otherState);
            _switch.Flush ();
            _commit = true;
        }
    private:
        bool  _commit;
        File  _switch;
        File  _data;
        File  _backup;
        State _state;
    };

This is how this transaction might be used in the process of saving a document.

    void Document::Save ()
    {
        SaveTrans xact;
        File & file = xact.GetBackupFile ();
        WriteData (file);
        xact.Commit ();
    }

And this is how it can be used in the program initialization.

    Document::Document ()
    {
        SaveTrans xact;
        File & file = xact.GetDataFile ();
        ReadData (file);
        // Don't commit!
        // the destructor will do the cleanup
    }

The same transaction is used here for cleanup. Since we are not calling Commit, the transaction cleans up, which is exactly what we need.
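The crash analysis of the three-file scheme can be checked with a small in-memory model. This is only a sketch under loud assumptions: Disk, SaveSteps and Recover are hypothetical names, the "files" are plain strings, a crash is simulated by stopping the save after a given number of completed steps, and the single assignment to s stands in for the atomic, flushed write to the switch file.

```cpp
#include <string>

// In-memory stand-in for the three files.
struct Disk
{
    std::string   a;   // file A
    std::string   b;   // file B
    unsigned char s;   // switch file S: 0 means A is valid, 1 means B
};

// Runs a save that starts with valid data in A and "crashes" after
// `steps` completed steps (0 = crash while still writing B).
void SaveSteps (Disk & disk, std::string const & newData, int steps)
{
    disk.b = newData.substr (0, newData.size () / 2); // partially written B
    if (steps < 1) return;
    disk.b = newData;       // step 1: B completely written and flushed
    if (steps < 2) return;
    disk.s = 1;             // step 2: commit, one atomic write
    if (steps < 3) return;
    disk.a.clear ();        // step 3: cleanup of the old data
}

// Start-up recovery: trust the switch and empty the other file.
std::string const & Recover (Disk & disk)
{
    if (disk.s == 0)
    {
        disk.b.clear ();    // rollback
        return disk.a;
    }
    disk.a.clear ();        // completion
    return disk.b;
}
```

Whatever the crash point, Recover returns either the complete old document or the complete new one, never the half-written contents of B.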
The Mapping-File Scheme

You might be a little concerned about the performance characteristics of the three-file scheme. After all, the document might be a few megabytes long and writing it (and flushing!) to disk every time you do a save creates a serious overhead. So, if you want to be a notch better than most word processors, consider a more efficient scheme.

The fact is that most of the time the changes you make to a document between saves are localized in just a few places. Wouldn't it be more efficient to update only those places in the file instead of rewriting the whole document?

Suppose we divide the document into "chunks" that fit each into a single "page." By "page" I mean a power-of-two fixed size subdivision. When updating a given chunk we could simply swap a page or two. It's just like swapping a few tiles in a bathroom floor--you don't need to re-tile the whole floor when you just want to make a small change around the sink.

Strictly speaking we don't even need fixed size power-of-two pages, it just makes the flushes more efficient and the bookkeeping easier. All pages may be kept in a single file, but we need a separate "map" that establishes the order in which they appear in the document. Now, if only the "map" could fit into a small switch file, we would perform transactions by updating the map.

Suppose, for example, that we want to update page two out of a ten-page file. First we try to find a free page in the file (we'll see in a moment how transactions produce free pages). If a free page cannot be found, we just extend the file by adding the eleventh page. Then we write the new updated data into this free page. We now have the current version of a part of the document in page two and the new version of the same part in page eleven (or whatever free page we used). Now we atomically overwrite the map, making page two free and page eleven take its place.

What if the map doesn't fit into a small file? No problem! We can always do the three-file trick with the map file. We can prepare a new version of the map file, flush it and commit by updating the switch file.
Figure: The Mapping File Scheme: Before committing.
Figure: The Mapping File Scheme: After committing.

This scheme can be extended to a multi-level tree. In fact several databases and even file systems use something similar, based on a data structure called a B-tree.
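The bookkeeping behind the mapping scheme can be sketched in memory. Everything here is an assumption made for illustration: PagedFile is a hypothetical class, pages are strings rather than fixed-size disk blocks, and swapping in the new map stands in for the atomic write of the map (or switch) file. Callers are expected to pass a valid logical page number.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class PagedFile
{
public:
    // Starts with the given pages, mapped in their original order.
    explicit PagedFile (std::vector<std::string> const & pages)
        : _pages (pages), _used (pages.size (), true)
    {
        for (std::size_t i = 0; i < pages.size (); ++i)
            _map.push_back (i);
    }
    // Replaces a logical page without overwriting its current
    // physical page.
    void Update (std::size_t logical, std::string const & data)
    {
        std::size_t fresh = FindFree ();      // may extend the file
        _pages [fresh] = data;                // prepare: write new data
        std::vector<std::size_t> newMap (_map);
        newMap [logical] = fresh;
        std::size_t old = _map [logical];
        _map.swap (newMap);                   // commit: install new map
        _used [fresh] = true;
        _used [old] = false;                  // cleanup: old page is free
    }
    // Reassembles the document by following the map.
    std::string Read () const
    {
        std::string doc;
        for (std::size_t i = 0; i < _map.size (); ++i)
            doc += _pages [_map [i]];
        return doc;
    }
    std::size_t PhysicalSize () const { return _pages.size (); }
private:
    std::size_t FindFree ()
    {
        for (std::size_t i = 0; i < _used.size (); ++i)
            if (!_used [i])
                return i;
        _pages.push_back (std::string ());
        _used.push_back (false);
        return _pages.size () - 1;
    }

    std::vector<std::string> _pages;  // physical pages
    std::vector<bool>        _used;   // which physical pages are mapped
    std::vector<std::size_t> _map;    // logical page -> physical page
};
```

The first update of a full file extends it by one page; the page freed by that transaction is then picked up by the next update, so the file does not keep growing.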
Overloading operator new

Both new and delete are considered operators in C++. What it means, in particular, is that they can be overloaded like any other operator. And just like you can define a class-specific operator=, you can also define class-specific operators new and delete. They will be automatically called by the compiler to allocate and deallocate objects of that particular class. Moreover, you can overload and override global versions of new and delete.
Class-specific new

Dynamic memory allocation and deallocation are not cheap. A lot of programs spend the bulk of their time inside the heap, searching for free blocks, recycling deleted blocks and merging them to prevent heap fragmentation. If memory management is a performance bottleneck in your program, there are several optimization techniques that you might use. Overloading new and delete on a per-class basis is usually used to speed up allocation/deallocation of objects of that particular class. There are two main techniques--caching and bulk allocation.
Caching

The idea behind caching is that recycling is cheaper than manufacturing. Suppose that we wanted to speed up additions to a hash table. Every time an addition is performed, a new link is allocated. In our program, these links are only deallocated when the whole hash table is destroyed, which happens at the end of the program. Imagine, however, that we are using our hash table in another program, where it's either possible to selectively remove items from the hash table, or where there are many hash tables created and destroyed during the lifetime of the program. In both cases, we might speed up average link allocation time by keeping around the links that are currently not in use.

A FreeList object will be used as storage for unused links. To get a new link we call its NewLink method. To return a link back to the pool, we call its Recycle method. The pool of links is implemented as a linked list. There is also a Purge method that frees the whole pool.

    class Link;

    class FreeList
    {
    public:
        FreeList () : _p (0) {}
        ~FreeList ();
        void Purge ();
        void * NewLink ();
        void Recycle (void * link);
    private:
        Link * _p;
    };

Class Link has a static member _freeList which is used by the overloaded class-specific operators new and delete. Notice the assertion in operator new. It protects us from somebody calling this particular operator for a different class. How could that happen? Operators new and delete are inherited. If a class derived from Link didn't override these operators, new called for the derived class would return an object of the wrong size (base-class size).

    class Link
    {
        friend class FreeList;
    public:
        Link (Link * pNext, int id)
            : _pNext (pNext), _id (id) {}
        Link * Next () const { return _pNext; }
        int Id () const { return _id; }
        // allocator
        void * operator new (size_t size)
        {
            assert (size == sizeof (Link));
            return _freeList.NewLink ();
        }
        void operator delete (void * mem)
        {
            if (mem)
                _freeList.Recycle (mem);
        }
        static void Purge () { _freeList.Purge (); }
    private:
        static FreeList _freeList;

        Link * _pNext;
        int    _id;
    };

Inside List::Add the creation of a new Link will be translated by the compiler into the call to the class-specific operator new followed by the call to its constructor (if any). The beauty of this method is that no changes to the implementation of List are needed.

    class List
    {
    public:
        List ();
        ~List ()
        {
            while (_pHead != 0)
            {
                Link * pLink = _pHead;
                _pHead = _pHead->Next ();
                delete pLink;
            }
        }
        void Add (int id)
        {
            Link * pLink = new Link (_pHead, id);
            _pHead = pLink;
        }
        Link const * GetHead () const { return _pHead; }
    private:
        Link * _pHead;
    };

A hash table contains an array of Lists which will all internally use the special-purpose allocator for their links.

After we are done with the hash table, we might want to purge the memory stored in the private allocator. That would make sense if, for instance, there was only one hash table in our program, but it allowed deletion as well as addition of entries. On the other hand, if we wanted our pool of links to be shared between multiple hash tables, we wouldn't want to purge it every time a hash table is destroyed.

    class HTable
    {
    public:
        explicit HTable (int size) : _size (size)
        {
            _aList = new List [size];
        }
        ~HTable ()
        {
            delete [] _aList;
            // release memory in free list
            Link::Purge (); // optional
        }
        // ...
    private:
        List * _aList;
        int    _size;
    };

Notice: Purge is a static method of Link, so we don't need an instance of a Link in order to call it.

In the implementation file, we first have to define the static member _freeList of the class Link. Static data is automatically initialized to zero.

    FreeList Link::_freeList;

The implementation of FreeList is pretty straightforward. We try to reuse Links, if possible; otherwise we call the global operator new. Since we are allocating raw memory, we ask for sizeof (Link) bytes (chars). When we delete this storage, we cast Links back to their raw form. Deleting a Link as a Link would result in a (second!) call to its destructor. We don't want to do it here, since destructors for these Links have already been called when the class-specific delete was called.

    void * FreeList::NewLink ()
    {
        if (_p != 0)
        {
            void * mem = _p;
            _p = _p->_pNext;
            return mem;
        }
        else
        {
            // use global operator new
            return ::new char [sizeof (Link)];
        }
    }

    void FreeList::Recycle (void * mem)
    {
        Link * link = static_cast<Link *> (mem);
        link->_pNext = _p;
        _p = link;
    }

    FreeList::~FreeList ()
    {
        Purge ();
    }

    void FreeList::Purge ()
    {
        while (_p != 0)
        {
            // it was allocated as an array of char
            char * mem = reinterpret_cast<char *> (_p);
            _p = _p->Next ();
            ::delete [] mem;
        }
    }

Notice all the casting we have to do. When our overloaded new is called, it is expected to return a void pointer. Internally, however, we either recycle a Link from a linked-list pool, or allocate a raw chunk of memory of the appropriate size. We don't want to call ::new Link, because that would have an unwanted side effect of calling Link's constructor (it will be called anyway after we return from operator new).

Our delete, on the other hand, is called with a void pointer, so we have to cast it to a Link in order to store it in the list. Purge deletes all links as if they were arrays of chars--since that is how they were allocated. Again, we don't want to delete them as Links, because Link destructors have already been called.

As usual, calls to global operators new and delete can be disambiguated by prepending double colons. Here, they are not strictly necessary, but they enhance the readability.
Bulk Allocation

Another approach to speeding up allocation is to allocate in bulk and thus amortize the cost of memory allocation across many calls to operator new. The implementation of Links, Lists and HashTables is as before, except that a new class, LinkAllocator, is used in place of FreeList. It has the same interface as FreeList, but its implementation is more involved. Besides keeping a list of recycled Links, it also has a separate list of blocks of links. Each block consists of a header of class Block and a block of 16 consecutive raw pieces of memory each the size of a Link.

    class Link;

    class LinkAllocator
    {
        enum { BlockLinks = 16 };
        class Block
        {
        public:
            Block * Next () { return _next; }
            void SetNext (Block * next) { _next = next; }
        private:
            Block * _next;
        };
    public:
        LinkAllocator () : _p (0), _blocks (0) {}
        ~LinkAllocator ();
        void Purge ();
        void * NewLink ();
        void Recycle (void * link);
    private:
        Link *  _p;
        Block * _blocks;
    };

This is how a new Link is created:

    void * LinkAllocator::NewLink ()
    {
        if (_p == 0)
        {
            // use global operator new to allocate a block of links
            char * p = ::new char [sizeof (Block) + BlockLinks * sizeof (Link)];
            // add it to the list of blocks
            Block * block = reinterpret_cast<Block *> (p);
            block->SetNext (_blocks);
            _blocks = block;
            // add it to the list of links
            p += sizeof (Block);
            for (int i = 0; i < BlockLinks; ++i)
            {
                Link * link = reinterpret_cast<Link *> (p);
                link->_pNext = _p;
                _p = link;
                p += sizeof (Link);
            }
        }
        void * mem = _p;
        _p = _p->_pNext;
        return mem;
    }

The first block of code deals with the situation when there are no unused links in the Link list. A whole block of 16 (BlockLinks) Link-sized chunks is allocated all at once, together with some room for the Block header. The Block is immediately linked into the list of blocks and then chopped up into separate Links which are added to the Link list. Once the Link list is replenished, we can pick a Link from it and pass it out.

The implementation of Recycle is the same as before--the links are returned to the Link list. Purge, on the other hand, does bulk deallocations of whole blocks.

    void LinkAllocator::Purge ()
    {
        while (_blocks != 0)
        {
            // it was allocated as an array of char
            char * mem = reinterpret_cast<char *> (_blocks);
            _blocks = _blocks->Next ();
            ::delete [] mem;
        }
    }

Only one call in 16 to new Link results in actual memory allocation. All others are dealt with very quickly by picking a ready-made Link from a list.
Array new

Even though class Link has overloaded operators new and delete, if you were to allocate a whole array of Links, as in new Link [10], the compiler would call global new to allocate enough memory for the whole array. It would not call the class-specific overload. Conversely, deleting such an array would result in the call to global operator delete--not its class-specific overload. Since in our program we never allocate arrays of Links, we have nothing to worry about. And even if we did, global new and delete would do the right thing anyway.

However, in the unlikely case when you actually want to have control over array allocations, C++ provides a way. It lets you overload operators new[] and delete[]. The syntax and the signatures are analogous to the overloads of straight new and delete.

    void * operator new [] (size_t size);
    void operator delete [] (void * p);

The only difference is that the size passed to new[] takes into account the total size of the array plus some additional data used by the compiler to distinguish between pointers to objects and arrays of objects. For instance, the compiler has to know the number of elements in the array in order to be able to call destructors on all of them when delete [] is called.

All four operators new, delete, new[] and delete[] are treated as static members of the class that overloads them (i.e., they don't have access to this).
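As a small illustration of the array forms, here is a sketch of a class that overloads operators new[] and delete[] only to record what the compiler passes to them. Tracked and its two static counters are hypothetical names used for this example; the actual allocation is simply delegated to the global operators.

```cpp
#include <cstddef>
#include <new>

class Tracked
{
public:
    Tracked () : _x (0) {}
    ~Tracked () {}   // user-provided destructor
    static void * operator new [] (std::size_t size)
    {
        lastArraySize = size;            // total size requested
        return ::operator new [] (size);
    }
    static void operator delete [] (void * p)
    {
        ++arrayDeletes;
        ::operator delete [] (p);
    }
    static std::size_t lastArraySize;
    static int         arrayDeletes;
private:
    int _x;
};

std::size_t Tracked::lastArraySize = 0;
int Tracked::arrayDeletes = 0;
```

After new Tracked [10], lastArraySize is at least 10 * sizeof (Tracked). How much larger it is, if at all, depends on how a particular compiler stores the element count, so the sketch makes no claim about the exact overhead.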
Global new

Unlike class-specific new, global new is usually overloaded for debugging purposes. In some cases, however, you might want to overload global new and delete permanently, because you have a better allocation strategy or because you want more control over it.

In any case, you have a choice of overriding global new and delete or adding your own special versions that follow a slightly different syntax. Standard operator new takes one argument of type size_t. Standard delete takes one argument of type void *. You can define your own versions of new and delete that take additional arguments of arbitrary types. For instance, you can define

    void * operator new (size_t size, char const * name);
    void operator delete (void * p, char const * name);

and call the special new using this syntax:

    Foo * p = new ("special") Foo;

Unfortunately, there is no way to call the special delete explicitly, so you have to be sure that standard delete will correctly handle memory allocated using your special new (or that delete is never called for such objects).

So what's the use of the overloaded delete with special arguments? There is actually one case in which it will be called--when an exception is thrown during object construction. As you might recall, there is a contract implicit in the language that if an exception happens during the construction of an object, the memory for this object will be automatically deallocated. It so happens that during an object's construction the compiler is still aware of which version of operator new was called to allocate memory. It is therefore able to generate a call to the corresponding version of delete, in case an exception is thrown.
After the successful completion of construction, this information is no longer available and the compiler has no means to guess which version of global delete is appropriate for a given object. Once you have defined an overloaded version of new, you can call it explicitly, by specifying additional argument(s). Or you can substitute all calls to new in your code with the overloaded version using macro substitution.
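The one situation in which the special delete runs can be observed directly. The following is a sketch under stated assumptions: the class Fragile, the two counters and the helper ConstructAndFail are hypothetical names made up for this example, and the allocator simply forwards to malloc and free.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Counters that let us observe which operators ran (illustrative names).
int taggedNewCalls = 0;
int taggedDeleteCalls = 0;

// Special version of new taking one additional argument.
void * operator new (std::size_t size, char const * tag)
{
    (void) tag;
    ++taggedNewCalls;
    void * p = std::malloc (size);
    if (p == 0)
        throw std::bad_alloc ();
    return p;
}

// Matching special delete: the compiler calls it only when the
// constructor of an object allocated with the special new throws.
void operator delete (void * p, char const * tag)
{
    (void) tag;
    ++taggedDeleteCalls;
    std::free (p);
}

// Hypothetical class whose constructor can be told to fail.
class Fragile
{
public:
    explicit Fragile (bool fail)
    {
        if (fail)
            throw std::runtime_error ("construction failed");
    }
};

// Returns true if the exception thrown by the constructor reached the caller.
bool ConstructAndFail ()
{
    try
    {
        Fragile * p = new ("special") Fragile (true);
        (void) p;   // never reached: the constructor throws
    }
    catch (std::runtime_error const &)
    {
        return true;
    }
    return false;
}
```

After ConstructAndFail returns, both counters should read one: the memory obtained by the special new was given back through the special delete before the exception reached the catch clause.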
Macros

We haven't really talked about macros in this book--they are a part of standard C++, but their use is strongly discouraged. In the old times, they were used in place of the more sophisticated C++ features, such as inline functions and templates. Now that there are better ways of getting the same functionality, macros are fast becoming obsolete. But just for completeness, let me explain how they work.

    Macros are obnoxious, smelly, sheet-hogging bedfellows for several reasons, most of which are related to the fact that they are a glorified text substitution facility whose effects are applied during preprocessing, before any C++ syntax and semantic rules can even begin to apply.
    -- Herb Sutter

A macro works through literal substitution. You may think of macro expansion as a separate process performed by the compiler before even getting
to the main task of parsing C++ syntax. In fact, in older compilers, macro expansion was done by a separate program, the preprocessor. There are two major types of macros. The first type simply substitutes one string with another, in the code that logically follows it (by logically I mean that, if the macro is defined in an include file, it will also work in the file that includes it, and so on). Let me give you an example that might actually be useful. Let's define the following macro in the file dbnew.h

    #define new new(__FILE__, __LINE__)

This macro will substitute all occurrences of new that logically follow it with the string new (__FILE__, __LINE__). Moreover, the macro preprocessor will then substitute all occurrences of the special pre-defined symbol __FILE__ with the full name of the source file in which it finds it; and all occurrences of __LINE__ with the appropriate line number. So if you have a file c:\test\main.cpp with the contents:

    #include "dbnew.h"
    int main ()
    {
        int * p = new int;
        return 0;
    }

it will be pre-processed to produce the following code:

    int main ()
    {
        int * p = new ("c:\test\main.cpp", 4) int;
        return 0;
    }

Now you can use your own overloaded operator new, for instance to trace all memory allocation. Here's a simple example of such implementation.

    void * operator new (size_t size, char const * file, int line)
    {
        std::cout [...]

[...]

    [...] (b++))? (a++): (b++))

One of the variables will be incremented twice, the other once. This is probably not what the programmer expected. By the way, there is one more gotcha--notice that I didn't put a space between.
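The double increment is easy to reproduce. In this sketch the macro MAX_MACRO, the function MaxInline and the struct Result are hypothetical names chosen for illustration. It contrasts a function-like macro, whose arguments are pasted in as text, with an inline template function, whose arguments are evaluated exactly once.

```cpp
// Function-like macro: each use of a or b in the body repeats the argument text.
#define MAX_MACRO(a, b) (((a) > (b)) ? (a) : (b))

// Inline function template: each argument is evaluated once, before the call.
template <class T>
inline T MaxInline (T a, T b)
{
    return (a > b) ? a : b;
}

// Final values of the two variables and the computed maximum.
struct Result
{
    int a;
    int b;
    int max;
};

Result WithMacro ()
{
    int a = 1, b = 2;
    // Expands to (((a++) > (b++)) ? (a++) : (b++)): b is incremented twice.
    int m = MAX_MACRO (a++, b++);
    Result r = { a, b, m };
    return r;
}

Result WithInline ()
{
    int a = 1, b = 2;
    int m = MaxInline (a++, b++);
    Result r = { a, b, m };
    return r;
}
```

Starting from a = 1 and b = 2, the macro version ends with a = 2, b = 4 and a "maximum" of 3, while the inline version ends with a = 2, b = 3 and a maximum of 2.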
Tracing Memory Leaks

A more interesting application of this technique lets you trace unreleased allocations, a.k.a. memory leaks. The idea is to store information about each allocation in a global data structure and dump its contents at the end of the program. Overloaded operator delete would remove entries from this data structure. Since operator delete has only access to a pointer to previously allocated memory, we have to be able to reasonably quickly find the entry based on this pointer. A map keyed by a pointer comes to mind immediately. We'll call this global data structure a Tracer.

    class Tracer
    {
    private:
        class Entry
        {
        public:
            Entry (char const * file, int line)
                : _file (file), _line (line)
            {}
            Entry ()
                : _file (0), _line (0)
            {}
            char const * File () const { return _file; }
            int Line () const { return _line; }
        private:
            char const * _file;
            int _line;
        };
        class Lock
        {
        public:
            Lock (Tracer & tracer)
                : _tracer (tracer)
            {
                _tracer.lock ();
            }
            ~Lock ()
            {
                _tracer.unlock ();
            }
        private:
            Tracer & _tracer;
        };
        typedef std::map<void *, Entry>::iterator iterator;
        friend class Lock;
    public:
        Tracer ();
        ~Tracer ();
        void Add (void * p, char const * file, int line);
        void Remove (void * p);
        void Dump ();

        static bool Ready;
    private:
        void lock () { _lockCount++; }
        void unlock () { _lockCount--; }
    private:
        std::map<void *, Entry> _map;
        int _lockCount;
    };

We have defined two auxiliary classes, Tracer::Entry, which is used as the value for the map, and Tracer::Lock, which is used to temporarily disable tracing. They are used in the implementation of Tracer::Add and Tracer::Remove. The method Add adds a new entry to the map, but only when tracing is active. Notice that it disables tracing when accessing the map--we don't want to trace the allocations inside the map code.

    void Tracer::Add (void * p, char const * file, int line)
    {
        if (_lockCount > 0)
            return;
        Tracer::Lock lock (*this);
        _map [p] = Entry (file, line);
    }

The method Remove makes the same preparations as Add and then searches the map for the pointer to be removed. If it's found, the whole entry is erased.

    void Tracer::Remove (void * p)
    {
        if (_lockCount > 0)
            return;
        Tracer::Lock lock (*this);
        iterator it = _map.find (p);
        if (it != _map.end ())
        {
            _map.erase (it);
        }
    }

Finally, at the end of the program, the method Dump is called from the destructor of Tracer to display all the leaks.

    Tracer::~Tracer ()
    {
        Ready = false;
        Dump ();
    }

    void Tracer::Dump ()
    {
        if (_map.size () != 0)
        {
            std::cout [...]
            [...] second.Line ();
            std::cout [...] first);
            std::stringstream out;
            out [...]

[...]

            [...] OnDestroy ();
            return 0;
        case WM_MOUSEMOVE:
            {
                POINTS p = MAKEPOINTS (lParam);
                KeyState kState (wParam);
                if (pCtrl->OnMouseMove (p.x, p.y, kState))
                    return 0;
            }
        }
        return ::DefWindowProc (hwnd, message, wParam, lParam);
    }

We initialize the GWL_USERDATA slot corresponding to hwnd in one of the first messages sent to our window. The message is WM_NCCREATE (Non-Client Create), sent before the creation of the non-client part of the window (the border, the title bar, the system menu, etc.). (There is another message before that one, WM_GETMINMAXINFO, which might require special handling.) We pass the pointer to the controller as window creation data. We use the class Win::CreateData, a thin encapsulation of Windows structure CREATESTRUCT. Since we want to be able to cast a pointer to CREATESTRUCT passed to us by Windows to a pointer to Win::CreateData, we use inheritance rather than embedding (you can inherit from a struct, not only from a class).

    namespace Win
    {
        class CreateData: public CREATESTRUCT
        {
        public:
            void * GetCreationData () const { return lpCreateParams; }
            int GetHeight () const { return cy; }
            int GetWidth () const { return cx; }
            int GetX () const { return x; }
            int GetY () const { return y; }
            char const * GetWndName () const { return lpszName; }
        };
    }

The message WM_DESTROY is important for the top-level window. That's where the "quit" message is usually posted. There are other messages that might be sent to a window after WM_DESTROY, most notably WM_NCDESTROY, but we'll ignore them for now.
I also added the processing of WM_MOUSEMOVE, just to illustrate the idea of message handlers. This message is sent to a window whenever the mouse moves over it. In the generic window procedure we will always unpack message parameters and pass them to the appropriate handler. There are three parameters associated with WM_MOUSEMOVE: the x coordinate, the y coordinate and the state of control keys and buttons. Two of these parameters, x and y, are packed into one LPARAM and Windows conveniently provides a macro to unpack them, MAKEPOINTS, which turns lParam into a structure called POINTS. We retrieve the values of x and y from POINTS and pass them to the handler.
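The packing itself is simple bit arithmetic. The sketch below is a portable imitation, not the Windows macro: Point16, PackPoint and UnpackPoint are hypothetical names, and the layout assumed is the one used for mouse messages, with x in the low 16 bits and y in the high 16 bits, both signed.

```cpp
#include <cstdint>

// Stand-in for the Windows POINTS structure: two signed 16-bit coordinates.
struct Point16
{
    std::int16_t x;
    std::int16_t y;
};

// Packs x into the low word and y into the high word of a 32-bit value.
inline std::uint32_t PackPoint (std::int16_t x, std::int16_t y)
{
    std::uint32_t low = static_cast<std::uint16_t> (x);
    std::uint32_t high = static_cast<std::uint16_t> (y);
    return low | (high << 16);
}

// Splits the 32-bit value back into the two signed coordinates.
inline Point16 UnpackPoint (std::uint32_t packed)
{
    Point16 pt;
    pt.x = static_cast<std::int16_t> (static_cast<std::uint16_t> (packed & 0xFFFFu));
    pt.y = static_cast<std::int16_t> (static_cast<std::uint16_t> (packed >> 16));
    return pt;
}
```

The coordinates are treated as signed because a window that has captured the mouse can receive positions to the left of or above its client area, where x or y is negative.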
The state of control keys and buttons is passed inside WPARAM as a set of bits. Access to these bits is given through special bitmasks, like MK_CONTROL, MK_SHIFT, etc., provided by Windows. We will encapsulate these bitwise operations inside a class, Win::KeyState.

    class KeyState
    {
    public:
        KeyState (WPARAM wParam): _data (wParam)
        {}
        bool IsCtrl () const { return (_data & MK_CONTROL) != 0; }
        bool IsShift () const { return (_data & MK_SHIFT) != 0; }
        bool IsLButton () const { return (_data & MK_LBUTTON) != 0; }
        bool IsMButton () const { return (_data & MK_MBUTTON) != 0; }
        bool IsRButton () const { return (_data & MK_RBUTTON) != 0; }
    private:
        WPARAM _data;
    };

The methods of Win::KeyState return the state of the control and shift keys and the state of the left, middle and right mouse buttons. For instance, if you move the mouse while you press the left button and the shift key, both IsShift and IsLButton will return true. In WinMain, where the window is created, we initialize our controller and pass it to Win::Maker::Create along with the window's title.

    TopController ctrl;
    win.Create (ctrl, "Simpleton");

This is the modified Create. It passes the pointer to Controller as the user-defined part of window creation data--the last argument to CreateWindowEx.

    HWND Maker::Create (Controller & controller, char const * title)
    {
        HWND hwnd = ::CreateWindowEx (
            _exStyle,
            _className,
            title,
            _style,
            _x,
            _y,
            _width,
            _height,
            _hWndParent,
            _hMenu,
            _hInst,
            &controller);

        if (hwnd == 0)
            throw "Internal error: Window Creation Failed.";
        return hwnd;
    }

To summarize, the controller is created by the client and passed to the Create method of Win::Maker. There, it is added to the creation data, and
Windows passes it as a parameter to the WM_NCCREATE message. The window procedure unpacks it and stores it under GWL_USERDATA in the window's internal data structure. During the processing of each subsequent message, the window procedure retrieves the controller from this data structure and calls its appropriate method to handle the message. Finally, in response to WM_DESTROY, the window procedure calls the controller one last time and unplugs it from the window. Now that the mechanics of passing the controller around are figured out, let's talk about the implementation of Controller. Our goal is to concentrate the logic of a window in this one class. We want to have a generic window procedure that takes care of the ugly stuff--the big switch statement, the unpacking and re-packing of message parameters and the forwarding of the messages to the default window procedure. Once the message is routed through the switch statement, the appropriate Controller method is called with the correct (strongly-typed) arguments. For now, we'll just create a stub of a controller. Eventually we'll be adding a lot of methods to it--as many as there are different Windows messages. The controller stores the handle to the window it services. This handle is initialized inside the window procedure during the processing of WM_NCCREATE. That's why we made Win::Procedure a friend of Win::Controller. The handle itself is protected, not private--derived classes will need access to it. There are only two message-handler methods at this point, OnDestroy and OnMouseMove.
    namespace Win
    {
        class Controller
        {
            friend LRESULT CALLBACK Procedure (HWND hwnd, UINT message,
                                               WPARAM wParam, LPARAM lParam);

            void SetWindowHandle (HWND hwnd) { _h = hwnd; }
        public:
            virtual ~Controller () {}
            virtual bool OnDestroy ()
                { return false; }
            virtual bool OnMouseMove (int x, int y, KeyState kState)
                { return false; }
        protected:
            HWND _h;
        };
    }

You should keep in mind that Win::Controller will be a part of the library to be used as a base class for all user-defined controllers. That's why all message handlers are declared virtual and, by default, they return false. The meaning of this Boolean is, "I handled the message, so there is no need to call DefWindowProc." Since our default implementation doesn't handle any messages, it always returns false. The user is supposed to define his or her own controller that inherits from Win::Controller and overrides some of the message handlers. In this case, the only message handler that has to be overridden is OnDestroy--it must close the application by sending the "quit" message. It returns true, so that the default window procedure is not called afterwards.
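The way the Boolean result steers a message can be tried out without Windows. The following is a simplified, portable mock of the pattern, not the library code: the Message enumeration, DefaultProcedure, the counter defaultCalls and the quitPosted flag are all invented for this sketch, and every handler here is treated uniformly as "return true to skip the default".

```cpp
// Made-up message identifiers standing in for WM_DESTROY, WM_MOUSEMOVE, etc.
enum Message { MSG_DESTROY, MSG_MOUSEMOVE, MSG_OTHER };

// Base class: every handler is virtual and reports "not handled" by default.
class Controller
{
public:
    virtual ~Controller () {}
    virtual bool OnDestroy () { return false; }
    virtual bool OnMouseMove (int x, int y) { (void) x; (void) y; return false; }
};

// Client-side controller: overrides only what it needs.
class TopController: public Controller
{
public:
    TopController () : quitPosted (false) {}
    virtual bool OnDestroy ()
    {
        quitPosted = true;   // stands in for ::PostQuitMessage (0)
        return true;         // handled: skip the default procedure
    }
    bool quitPosted;
};

// Stand-in for DefWindowProc; counts how often it is reached.
int defaultCalls = 0;
long DefaultProcedure (Message)
{
    ++defaultCalls;
    return 0;
}

// Generic procedure: one switch, written once, forwarding to the controller.
long Procedure (Controller & ctrl, Message msg, int x, int y)
{
    switch (msg)
    {
    case MSG_DESTROY:
        if (ctrl.OnDestroy ())
            return 0;
        break;
    case MSG_MOUSEMOVE:
        if (ctrl.OnMouseMove (x, y))
            return 0;
        break;
    default:
        break;
    }
    return DefaultProcedure (msg);
}
```

Sending MSG_DESTROY to a TopController sets quitPosted and never reaches DefaultProcedure; sending MSG_MOUSEMOVE falls through to it, because the inherited OnMouseMove returns false.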
    class TopController: public Win::Controller
    {
    public:
        bool OnDestroy ()
        {
            ::PostQuitMessage (0);
            return true;
        }
    };

To summarize, our library is designed in such a way that its client has to do minimal work and is protected from making trivial mistakes. For each class of windows, the client has to create a customized controller class that inherits from our library class, Win::Controller. He implements (overrides) only those methods that require non-default implementation. Since he has the prototypes of all these methods, there is no danger of misinterpreting message parameters. This part--the interpretation and unpacking--is done in our Win::Procedure. It is written once and for all, and is thoroughly tested. This is the part of the program that is written by the client of our library. In fact, we will simplify it even more later. Is it explained that the result of assignment can be used in an expression?

    #include "Class.h"
    #include "Maker.h"
    #include "Procedure.h"
    #include "Controller.h"

    class TopController: public Win::Controller
    {
    public:
        bool OnDestroy ()
        {
            ::PostQuitMessage (0);
            return true;
        }
    };

    int WINAPI WinMain
        (HINSTANCE hInst, HINSTANCE hPrevInst, LPSTR cmdParam, int cmdShow)
    {
        char className [] = "Simpleton";
        Win::ClassMaker winClass (className, hInst);
        winClass.Register ();
        Win::Maker maker (className, hInst);
        TopController ctrl;
        Win::Dow win = maker.Create (ctrl, "Simpleton");
        win.Display (cmdShow);

        MSG msg;
        int status;
        while ((status = ::GetMessage (& msg, 0, 0, 0)) != 0)
        {
            if (status == -1)
                return -1;
            ::DispatchMessage (& msg);
        }
        return msg.wParam;
    }

Notice that we no longer have to pass the window procedure to the class maker. The class maker can use our generic Win::Procedure implemented in terms of the interface provided by our generic Win::Controller. What will really distinguish the behavior of one window from that of another is the implementation of the controller passed to Win::Maker::Create. The cost of this simplicity is mostly in code size and in some minimal speed deterioration. Let's start with speed. Each message now has to go through parameter unpacking and a virtual method call--even if it's not processed by the application. Is this a big deal? I don't think so. An average window doesn't get many messages per second. In fact, some messages are queued in such a way that if the window doesn't process them, they are overwritten by new messages. This is for instance the case with mouse-move messages. No matter how fast you move the mouse over the window, your window procedure will not choke on these messages. And if a few of them are dropped, it shouldn't matter, as long as the last one ends up in the queue. Anyway, the frequency with which a mouse sends messages when it slides across the pad is quite arbitrary. With the current processor speeds, the processing of window messages takes a marginally small amount of time. Program size could be a consideration, except that modern computers have so much memory that a megabyte here and there doesn't really matter. A full-blown Win::Controller will have as many virtual methods as there are window messages. How many is it? About 200. At four bytes per entry, the full vtable will be 800 bytes. That's less than a kilobyte! For comparison, a single icon is 2kB. You can have a dozen controllers in your program and the total size of their vtables won't even reach 10kB.
There is also the code for the default implementation of each method of Win::Controller. Its size depends on how aggressively your compiler optimizes it, but it adds up to at most a few kB. Now, the worst case, a program with a dozen types of windows, is usually already pretty complex--read, large!--plus it will probably include many icons and bitmaps. Seen from this perspective, the price we have to pay for simplicity and convenience is minimal.
Exception Specification

What would happen if a Controller method threw an exception? It would pass right through our Win::Procedure, then through several layers of Windows code to finally emerge through the message loop. We could, in principle, catch it in WinMain. At that point, however, the best we could do is to display a polite error message and quit. Not only that, it's not entirely clear how Windows would react to an exception rushing through its code. It might, for instance, fail to deallocate some resources or even get into some unstable state. The bottom line is that Windows doesn't expect an exception to be thrown from a window procedure. We have two choices: either we put a try/catch block around the switch statement in Win::Procedure or we promise not to throw any exceptions from Controller's methods. A try/catch block would add time to the processing of every single message, whether it's overridden by the client or not. Besides, we
would again face the problem of what to do with such an exception. Terminate the program? That seems pretty harsh! On the other hand, the contract not to throw exceptions is impossible to enforce. Or is it?! Enter exception specifications. It is possible to declare what kind of exceptions can be thrown by a function or method. In particular, we can specify that no exceptions can be thrown by a certain method. The declaration:

    virtual bool OnDestroy () throw ();

promises that OnDestroy (and all its overrides in derived classes) will not throw any exceptions. The general syntax is to list the types of exceptions that can be thrown by a procedure, like this:

    void Foo () throw (bad_alloc, char *);

How strong is this contract? Unfortunately, the standard doesn't promise much. The compiler is only obliged to detect exception specification mismatches between base class methods and derived class overrides. In particular, the specification can be only made stronger (fewer exceptions allowed). There is no stipulation that the compiler should detect even the most blatant violations of this promise, for instance an explicit throw inside a method defined as throw() (throw nothing). The hope, however, is that compiler writers will give in to the demands of programmers and at least make the compiler issue a warning when an exception specification is violated. Just as it is possible for the compiler to report violations of const-ness, so it should be possible to track down violations of exception specifications. For the time being, all that an exception specification accomplishes in a standard-compliant compiler is to guarantee that all unspecified exceptions will get converted to a call to the library function unexpected (), which by default terminates the program. That's good enough, for now.
Declaring all methods of Win::Controller as "throw nothing" will at least force the client who overrides them to think twice before allowing any exception to be thrown.
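Readers using a newer compiler should note that the throw (...) syntax described here was deprecated in C++11 and removed in later standards; the "throw nothing" promise is now spelled noexcept, and breaking it calls std::terminate. The sketch below restates the same contract in that spelling. It is a portable mock, not the library code: the stripped-down Controller and the helper SafeHandler are hypothetical.

```cpp
#include <stdexcept>
#include <utility>

// Mock base class: the handler promises not to throw.
class Controller
{
public:
    virtual ~Controller () {}
    virtual bool OnDestroy () noexcept { return false; }
};

class TopController: public Controller
{
public:
    // The override has to keep the promise: leaving out noexcept here
    // is rejected by the compiler as a looser exception specification.
    bool OnDestroy () noexcept override { return true; }
};

// One way for a handler to honor the contract: catch everything inside
// and translate the failure into an ordinary return value.
bool SafeHandler (bool fail) noexcept
{
    try
    {
        if (fail)
            throw std::runtime_error ("handler failed");
        return true;
    }
    catch (...)
    {
        return false;
    }
}

// The promise is part of the function's type and can be checked at compile time.
static_assert (noexcept (std::declval<TopController &> ().OnDestroy ()),
               "OnDestroy must not throw");
```

As with the older syntax, the compiler checks the specification of overrides against the base class, but it does not have to reject a throw statement inside a noexcept function; the violation is only detected at run time.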
Cleanup

It's time to separate library files from application files. For the time being, we'll create a subdirectory "lib" and copy all the library files into it. However, when the compiler compiles files in the main directory, it doesn't know where to find library includes, unless we tell it. All compilers accept additional include paths. We'll just have to add "lib" to the list of additional include paths. As part of the cleanup, we'll also move the definition of TopController to a separate file, control.h.
Painting

Application Icon

Every Windows program must have an icon. When you browse into the directory where the executable is stored, Windows browser will display this program's icon. When the program is running, this icon shows up in the taskbar and in the upper-left corner of the program's window. If you don't provide your program with an icon, Windows will provide a default. The obvious place to specify an icon for your application is in the window class of the top-level window. Actually, it's best to provide two icons at once, the large one and the small one, otherwise Windows will try to stretch or shrink the one icon you give it, often with unesthetic results. Let's add a SetIcons method to Win::ClassMaker and embed two icon objects in it.

    class ClassMaker
    {
    public:
        ...
        void SetIcons (int id);
    protected:
        WNDCLASSEX _class;
        StdIcon    _stdIcon;
        SmallIcon  _smallIcon;
    };
We'll get to the implementation of StdIcon and SmallIcon soon. First, let's look at the implementation of SetIcons. The images of icons are loaded from program resources.

    void ClassMaker::SetIcons (int id)
    {
        _stdIcon.Load (_class.hInstance, id);
        _smallIcon.Load (_class.hInstance, id);
        _class.hIcon = _stdIcon;
        _class.hIconSm = _smallIcon;
    }

Program resources are icons, bitmaps, strings, mouse cursors, dialog templates, etc., that you can tack on to your executable. Your program, instead of having to search the disk for files containing such resources, simply loads them from its own executable. How do you identify resources when you want to load them? You can either give them names or integer ids. For simplicity (and efficiency), we will use ids. The set of your program's resources is identified by the instance handle that is passed to WinMain. Let's start with the base class, Win::Icon. When you load an icon, you have to specify the resources where it can be found, the unique id of the particular icon, its dimensions in pixels (if the actual icon has different dimensions, Windows will stretch or shrink it) and some flags.

    class Icon
    {
    public:
        Icon (HINSTANCE res,
              int id,
              int dx = 0,
              int dy = 0,
              unsigned flag = LR_DEFAULTCOLOR)
        {
            Load (res, id, dx, dy, flag);
        }
        ~Icon ();
        operator HICON () const { return _h; }
    protected:
        Icon () : _h (0) {}
        void Load (HINSTANCE res,
                   int id,
                   int dx = 0,
                   int dy = 0,
                   unsigned flag = LR_DEFAULTCOLOR);
    protected:
        HICON _h;
    };

The API to load an icon is called LoadImage and can also be used to load other types of images. Its return type is ambiguous, so it has to be cast to HICON. Once the icon is no longer used, DestroyIcon is called.

    void Icon::Load (HINSTANCE res, int id, int dx, int dy, unsigned flag)
    {
        _h = reinterpret_cast<HICON> (
            ::LoadImage (res,
                         MAKEINTRESOURCE (id),
                         IMAGE_ICON,
                         dx, dy,
                         flag));
        if (_h == 0)
            throw "Icon load image failed";
    }

    Icon::~Icon ()
    {
        ::DestroyIcon (_h);
    }

Notice that we can't pass the icon id directly to the API. We have to use a macro MAKEINTRESOURCE which does some cheating behind the scenes. You see, LoadImage and several other APIs have to guess whether you are passing them a string or an id. Since these are C functions, they can't be overloaded. Instead, you have to trick them into accepting both types and then let them guess their real identity. MAKEINTRESOURCE mucks with the bits of the integer to make it look different from a pointer to char. (This is the kind of programming that was popular when Windows API was first designed.) We can immediately subclass Icon to SmallIcon and StdIcon. Their constructors and Load methods are simpler--they don't require dimensions or flags.
    class SmallIcon: public Icon
    {
    public:
        SmallIcon () {}
        SmallIcon (HINSTANCE res, int id);
        void Load (HINSTANCE res, int id);
    };

    class StdIcon: public Icon
    {
    public:
        StdIcon () {}
        StdIcon (HINSTANCE res, int id);
        void Load (HINSTANCE res, int id);
    };

The Load methods are implemented using the parent class' Icon::Load method (you have to use the parent's class name followed by double colon to disambiguate--without it the compiler would understand it as a recursive call and the program would go into an infinite loop). To find out what the correct sizes for small and standard icons are, we use the universal API, GetSystemMetrics, that knows a lot about the current system's defaults.

    void SmallIcon::Load (HINSTANCE res, int id)
    {
        Icon::Load (res, id,
                    ::GetSystemMetrics (SM_CXSMICON),
                    ::GetSystemMetrics (SM_CYSMICON));
    }

    void StdIcon::Load (HINSTANCE res, int id)
    {
        Icon::Load (res, id,
                    ::GetSystemMetrics (SM_CXICON),
                    ::GetSystemMetrics (SM_CYICON));
    }

There's one more thing: how does one create icons? There is a hard way and an easy way. The hard way is to have some kind of separate icon editor, write your own resource script that names the icon files and, using a special tool, compile it and link with the executable. Just to give you an idea of what's involved, here are some details. Your resource script file, let's call it script.rc, should contain these two lines:

    #include "resource.h"
    IDI_MAIN ICON "main.ico"

IDI_MAIN is a constant defined in resource.h. The keyword ICON means that it corresponds to an icon. What follows is the name of the icon file, main.ico. The header file, resource.h, contains the definitions of constants, for instance:

    #define IDI_MAIN 101

Unfortunately, you can't use the safer, C++ version of it,
    const int IDI_MAIN = 101;

A macro substitution results in exactly the same code as a const int definition. The only difference is that, as is usual with macros, you forgo type checking. The script file has to be compiled using a program called rc.exe (resource compiler) to produce a file script.res. The linker will then link such a file with the rest of the object files into one executable. Or, if you have an integrated development environment with a resource editor, you can create an icon in it, add it to your resources under an appropriate symbolic id, and let the environment do the work for you. (A graphical resource editor becomes really indispensable when it comes to designing dialogs.) Notice that I'm using the same id for both icons. It's possible, because you can have two (or more) images of different size in the same icon. When you call LoadImage, the one with the closest dimensions is picked. Normally, you'd create at least a 32x32 and a 16x16 icon. I have created a set of two icons and gave them an integer id IDI_MAIN (defined in resource.h). All I need now is to make one additional call in WinMain.

    Win::ClassMaker winClass (className, hInst);
    winClass.SetIcons (IDI_MAIN);
    winClass.Register ();

Finally, you might be wondering: if you add many icons to your program resources, which one is used by the system as the icon for the whole executable? The answer is, the one with the lowest numerical id.
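Returning to the MAKEINTRESOURCE trick used when loading the icon: the idea behind it can be imitated in portable C++. This is only a sketch of the technique, with hypothetical function names (MakeIntResource, IsIntResource, ResourceId). It assumes, as Windows does, that no real string ever lives at an address below 65536, so a "pointer" whose upper bits are all zero must be a 16-bit id in disguise.

```cpp
#include <cstdint>

// Disguises a 16-bit integer id as a pointer to char.
inline char const * MakeIntResource (int id)
{
    return reinterpret_cast<char const *> (
        static_cast<std::uintptr_t> (static_cast<std::uint16_t> (id)));
}

// Guesses the real identity: ids have nothing above the low 16 bits.
inline bool IsIntResource (char const * name)
{
    return (reinterpret_cast<std::uintptr_t> (name) >> 16) == 0;
}

// Recovers the id from a disguised pointer.
inline int ResourceId (char const * name)
{
    return static_cast<int> (reinterpret_cast<std::uintptr_t> (name) & 0xFFFFu);
}
```

A function that accepts char const * can then branch on IsIntResource to decide whether it was handed a name or an id. In C++ two overloads would do the same job with full type checking.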
Window Painting and the View Object

Just like with any other action in Windows, window painting is done in response to some external actions. For instance, your program may paint something whenever a user moves a mouse or clicks a mouse button, it may draw characters in response to key presses, and so on. The part of the window that you normally paint is called the client area--it doesn't include the window borders, the title bar, the menu, etc. But there is one more situation when Windows may ask your program to redraw a part or the whole client area of your window. That happens because Windows is lazy (or short of resources). Whenever another application (or sometimes your own menu) overlaps your program's window, the system simply throws away the part of the image that's occluded. When your window is finally uncovered, somebody has to redraw the discarded part. Guess who! Your program! The same thing happens when a window is minimized and then maximized again. Or when the user resizes the window. Since, from the point of view of your application, these actions happen more or less randomly, you have to be prepared, at any time, to paint the whole client area from scratch. There is a special message, WM_PAINT, that Windows sends to you when it needs your assistance in repainting the window. This message is also sent the first time the window is displayed. To illustrate painting, we'll extend our Windows program to trace mouse movements. Whenever the mouse moves, we'll draw a line connecting the new cursor position with the previous one. But before we do that, we'll want to add the second object from the triad Model-View-Controller to our program. The
View will take care of all painting operations. It will also store the last recorded position of the mouse.

    class TopController: public Win::Controller
    {
        ...
    private:
        View _view;
    };
The Canvas

All display operations are done in the context of a particular device, be it the screen, a printer, a plotter or something else. In the case of drawing to a window, we have to obtain a device context (DC) for this window's client area. Windows can internally create a DC for us and give us a handle to it. We use this handle for all window output operations. When done with the output, we must release the handle. A DC is a resource and the best way to deal with it is to apply Resource Management methods to it. We'll call the generic owner of a DC, Canvas. We will have many different types of Canvas, depending on how the device context is created and disposed of. They will all, however, share the same functionality. For instance, we can call any Canvas object to draw a line or print some text. Let's make these two operations the starting point of our implementation.

    namespace Win
    {
        class Canvas
        {
        public:
            operator HDC ()
            {
                return _hdc;
            }
            void Line (int x1, int y1, int x2, int y2)
            {
                ::MoveToEx (_hdc, x1, y1, 0);
                ::LineTo (_hdc, x2, y2);
            }
            void Text (int x, int y, char const * buf, int count)
            {
                ::TextOut (_hdc, x, y, buf, count);
            }
        protected:
            Canvas (HDC hdc) :_hdc (hdc) {}

            HDC _hdc;
        };
    }

HDC is a Windows data structure, a handle to a device context. Our generic class, Canvas, doesn't provide any public way to initialize this handle--this responsibility is left to derived classes. The member operator HDC () provides implicit conversion from Canvas to HDC. It comes in handy when passing a Canvas object to an API that requires HDC.
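The combination of a protected constructor, a conversion operator and a derived class that owns the handle can be tried out in portable code. In this sketch everything outside the pattern itself is made up: Handle stands in for HDC, and AcquireHandle, ReleaseHandle and ApiDraw stand in for the C-style API. The mock class is named Canvas only to mirror the text; it is not the library class.

```cpp
// Stand-ins for a C-style API that works with raw handles (all hypothetical).
typedef int Handle;
int openHandles = 0;

Handle AcquireHandle () { ++openHandles; return 42; }
void ReleaseHandle (Handle) { --openHandles; }
int ApiDraw (Handle h, int x) { return h + x; }

// Generic owner: usable by anyone, constructible only by derived classes.
class Canvas
{
public:
    operator Handle () const { return _h; }   // implicit conversion for API calls
protected:
    explicit Canvas (Handle h) : _h (h) {}
    Handle _h;
};

// Derived class decides how the handle is obtained and given back.
class ScopedCanvas: public Canvas
{
public:
    ScopedCanvas () : Canvas (AcquireHandle ()) {}
    ~ScopedCanvas () { ReleaseHandle (_h); }

    // Copying would release the same handle twice, so it is disabled.
    ScopedCanvas (ScopedCanvas const &) = delete;
    ScopedCanvas & operator= (ScopedCanvas const &) = delete;
};

// The canvas is passed straight to the API; the handle is released on scope exit.
int DrawOnce ()
{
    ScopedCanvas canvas;
    return ApiDraw (canvas, 1);
}
```

After DrawOnce returns, openHandles is back at zero: the destructor released the handle even though the caller never mentioned it.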
In order to draw a line from one point to another, we have to make two API calls. The first one, MoveToEx, sets the "current position." The second, LineTo, draws a line from the current position to the point specified as its argument (it also moves the current position to that point). Point positions are specified by two coordinates, x and y. In the default coordinate system, both are in units of screen pixels. The origin, corresponding to x = 0 and y = 0, is in the upper left corner of the client area of the window. The x coordinate increases from left to right, the y coordinate grows from top to bottom. To print text, you have to specify where in the window you want it to appear. The x, y coordinates passed to TextOut tell Windows where to position the upper left corner of the string. This is different from printing to standard output, where the only control over placement was by means of newline characters. For a Windows device context, newlines have no meaning (they are blocked out like all other non-printable characters). In fact, the string-terminating null character is also meaningless to Windows. The string to be printed using TextOut doesn't have to be null-terminated. Instead, you are supposed to specify the count of characters you want printed. So how and where should we obtain the device context? Since we want to do the drawing in response to every mouse move, we have to do it in the handler of the WM_MOUSEMOVE message. That means our Controller has to override the OnMouseMove virtual method of Win::Controller. The type of Canvas that gets the DC from Windows outside of the processing of WM_PAINT will be called UpdateCanvas. The pair of APIs to get and release a DC is GetDC and ReleaseDC, respectively.
    class UpdateCanvas: public Canvas
    {
    public:
        UpdateCanvas (HWND hwnd)
            : Canvas (::GetDC(hwnd)), _hwnd(hwnd)
        {}
        ~UpdateCanvas ()
        {
            ::ReleaseDC (_hwnd, _hdc);
        }
    protected:
        HWND _hwnd;
    };

We create the Canvas in the appropriate Controller method--in this case OnMouseMove. This way the methods of View will work independently of the type of Canvas passed to them.

    bool TopController::OnMouseMove (int x, int y, Win::KeyState kState) throw ()
    {
        Win::UpdateCanvas canvas (_h);
        _view.MoveTo (canvas, x, y);
        return true;
    }

We are now ready to implement the View object.
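The acquire-in-constructor, release-in-destructor pairing that UpdateCanvas relies on can be tried out without Windows. The following is only a minimal sketch under stated assumptions: AcquireContext, ReleaseContext, HandleMove and the integer "handles" are hypothetical stand-ins for GetDC, ReleaseDC, OnMouseMove, HWND and HDC, and the log exists solely to make the order of events visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the GetDC/ReleaseDC pair. They only record
// what happened, so the pairing can be observed.
std::vector<std::string> g_log;

int AcquireContext (int window)
{
    g_log.push_back ("acquire");
    return window + 100; // fake device-context handle
}

void ReleaseContext (int window, int context)
{
    assert (context == window + 100); // must release what was acquired
    g_log.push_back ("release");
}

// Generic canvas: holds the handle but offers no public way to obtain one.
class Canvas
{
public:
    operator int () const { return _ctx; }
protected:
    explicit Canvas (int ctx) : _ctx (ctx) {}
    int _ctx;
};

// Derived canvas: owns the handle for exactly its own lifetime.
class UpdateCanvas: public Canvas
{
public:
    explicit UpdateCanvas (int window)
        : Canvas (AcquireContext (window)), _window (window)
    {}
    ~UpdateCanvas () { ReleaseContext (_window, _ctx); }
    // Copying would release the same handle twice, so it is disabled.
    UpdateCanvas (UpdateCanvas const &) = delete;
    UpdateCanvas & operator= (UpdateCanvas const &) = delete;
private:
    int _window;
};

// A handler in the style of OnMouseMove: the canvas lives only for the call.
void HandleMove (int window)
{
    UpdateCanvas canvas (window);
    g_log.push_back ("draw");
}
```

Calling HandleMove once leaves the three entries "acquire", "draw", "release" in the log, in that order: the handle is released when the canvas goes out of scope, however the handler exits.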
    class View
    {
    public:
        View () : _x (0), _y (0) {}
        void MoveTo (Win::Canvas & canvas, int x, int y)
        {
            canvas.Line (_x, _y, x, y);
            _x = x;
            _y = y;
            PrintPos (canvas);
        }
    private:
        void PrintPos (Win::Canvas & canvas)
        {
            std::string str ("Mouse at: ");
            str += ToString (_x);
            str += ", ";
            str += ToString (_y);
            canvas.Text (0, 0, &str [0], str.length ());
        }
    private:
        int _x, _y;
    };

The PrintPos method is interesting. The purpose of this method is to print "Mouse at: " followed by the x and y coordinates of the mouse position. We want the string to appear in the upper left corner of the client area, at coordinates (0, 0). First, we have to format the string. In particular, we have to convert two numbers to their string representations. The formatting of numbers for printing is built into standard streams, so we'll just use the capabilities of a string-based stream. In fact, any type that is accepted by a stream can be converted to a string using this simple template function:

    #include <sstream>

    template <class T>
    inline std::string ToString (T & val)
    {
        std::stringstream out;
        out << val;
        return out.str ();
    }

[...]

    [...] OnPaint ()) return 0; break;

Strictly speaking, WM_PAINT comes with a WPARAM that, in some special cases having to do with common controls, might be set to a device context. For now, let's ignore this parameter and concentrate on the common case. The standard way to obtain a device context in response to WM_PAINT is to call the API BeginPaint. This device context has to be released by a matching call to EndPaint.
The ownership functionality is nicely encapsulated into the PaintCanvas object:

    class PaintCanvas: public Canvas
    {
    public:
        PaintCanvas (HWND hwnd)
            : Canvas (::BeginPaint (hwnd, &_paint)), _hwnd (hwnd)
        {}
        ~PaintCanvas ()
        {
            ::EndPaint(_hwnd, &_paint);
        }
        int Top () const { return _paint.rcPaint.top; }
        int Bottom () const { return _paint.rcPaint.bottom; }
        int Left () const { return _paint.rcPaint.left; }
        int Right () const { return _paint.rcPaint.right; }
    protected:
        PAINTSTRUCT _paint;
        HWND _hwnd;
    };

Notice that BeginPaint gives the caller access to some additional useful information by filling the PAINTSTRUCT structure. In particular, it is possible to
retrieve the coordinates of the rectangular area that has to be repainted. In many cases this area is only a small subset of the client area (for instance, after uncovering a small portion of the window or resizing the window by a small increment). In our unsophisticated application we won't make use of this additional info--we'll just repaint the whole window from scratch.

Here's our own override of the OnPaint method of the controller. It creates a PaintCanvas and calls the appropriate View method.

    bool TopController::OnPaint () throw ()
    {
        Win::PaintCanvas canvas (_h);
        _view.Paint (canvas);
        return true;
    }

View simply calls its private method PrintPos. Notice that View doesn't distinguish between UpdateCanvas and PaintCanvas. For all it knows, it is being given a generic Win::Canvas.

    void View::Paint (Win::Canvas & canvas)
    {
        PrintPos (canvas);
    }

What can we do about the varying size of the string being printed? We need more control over formatting. The following code will make sure that each of the two numbers is printed using a fixed field of width 4, by passing the std::setw (4) manipulator to the stream. If the number following it in the stream contains fewer than 4 digits, it will be padded with spaces.

    void PrintPos (Win::Canvas & canvas)
    {
        std::stringstream out;
        out [...]

[...]

    [...] *CmdTable [idx]) ();

The translation from command id to an index is the weakest point of this scheme. In fact, the whole idea of defining your menu in the resource file is not as convenient as you might think. A reasonably complex application will require dynamic changes to the menu depending on the current state of the program. The simplest example is the Memory>Save item in the calculator. It would make sense for it to be inactive (grayed out) as long as no user-defined variable has been added to memory.
We could try to somehow re-activate this menu item when a variable is added to memory. But that would require the model to know something about the user interface--the menu. We could still save the day by making use of the notification sink. However, there is a better and more general approach--dynamic menus.
Dynamic Menus
First, let's generalize and extend the idea of a command table. We already know that it needs a pointer to member through which we can execute commands. We can also add another pointer to member through which we can quickly test the availability of a given command--this will enable us to dynamically gray out some of the items. A short help string for each command would be nice, too. Finally, I decided that it would be more general to give commands string names, rather than integer identifiers. Granted, searching through strings is slower than finding an item by id, but usually there aren't enough menu items to make a perceptible difference. Moreover, when the program grows to include not only menus but also accelerators and toolbars, being able to specify commands by name rather than by offset is a great maintainability win.

So here's the definition of a command item, the building block of a command table.

    namespace Cmd
    {
        template <class T>
        class Item
        {
        public:
            char const * _name;             // official name
            void (T::*_exec)();             // execute command
            Status (T::*_test)() const;     // test command status
            char const * _help;             // help string
        };
    }
If we want to reuse Cmd::Item we have to make it a template. The parameter of the template is the class of the particular commander whose methods we want to access. This is how the client creates a command table and initializes it with appropriate strings and pointers to members.

    namespace Cmd
    {
        const Cmd::Item<Commander> Table [] =
        {
            { "Program_About", &Commander::Program_About, &Commander::can_Program_About, "About this program"},
            { "Program_Exit", &Commander::Program_Exit, &Commander::can_Program_Exit, "Exit program"},
            { "Memory_Clear", &Commander::Memory_Clear, &Commander::can_Memory_Clear, "Clear memory"},
            { "Memory_Save", &Commander::Memory_Save, &Commander::can_Memory_Save, "Save memory to file"},
            { "Memory_Load", &Commander::Memory_Load, &Commander::can_Memory_Load, "Load memory from file"},
            { 0, 0, 0}
        };
    }

Here, Commander is the name of the commander class defined in the calculator. The command table is used to initialize the actual command vector, Cmd::VectorExec, which adds functionality to this data structure. The relationship between Cmd::Table and Cmd::VectorExec is analogous to the relationship between Function::Array and Function::Table inside the calculator. As before, this scheme makes it very easy to add new items to the table--new commands to our program.

Cmd::VectorExec has to be a template, for the same reason Cmd::Item has to be. However, in order not to templatize everything else that makes use of this vector (in particular, the menu system) I derived it from a non-template class, Cmd::Vector, that defines a few pure virtual functions and some generic functionality, like searching commands by name using a map.

The menu provides access to the command vector. In a dynamic menu system, we initialize the menu from a table. The table is organized hierarchically: menu bar items point to popup menus which contain commands. For instance, this is what the initialization table for our calculator menu looks like (notice that commands that require further user input--a dialog--are followed by three dots):

    namespace Menu
    {
        const Item programItems [] =
        {
            {CMD, "&About...", "Program_About"},
            {SEPARATOR, 0, 0},
            {CMD, "E&xit", "Program_Exit"},
            {END, 0, 0}
        };
        const Item memoryItems [] =
        {
            {CMD, "&Clear", "Memory_Clear"},
            {SEPARATOR, 0, 0},
            {CMD, "&Save...", "Memory_Save"},
            {CMD, "&Load...", "Memory_Load"},
            {END, 0, 0}
        };
        //---- Menu bar ----
        const BarItem barItems [] =
        {
            {POP, "P&rogram", "Program", programItems},
            {POP, "&Memory", "Memory", memoryItems},
            {END, 0, 0, 0}
        };
    }

Note that each item contains the display name with an embedded ampersand.
This ampersand is translated by Windows into a keyboard shortcut (not to be confused with a keyboard accelerator). The ampersand itself is not displayed, but the letter following it will be underlined. The user will then be able to select a given menu item by pressing the key corresponding to that letter while holding the Alt key. All items also specify command names--for
popup items, these are the same strings that were used in the naming of commands. Menu bar items are also named, but they don't have commands associated with them. Finally, menu bar items have pointers to the corresponding popup tables. Similar tables can be used for the initialization of accelerators and toolbars.

The actual menu object, of the class Menu::DropDown, is created in the constructor of the View. It is initialized with the table of menu bar items, Menu::barItems, shown above, and a Cmd::Vector object (initialized using Cmd::Table). The rest is conveniently encapsulated in the library. You might be interested to know that, since a menu is a resource (released using the DestroyMenu API), the class Menu::Maker has transfer semantics. For instance, when we create a menu bar, all the popup menus are transferred to Menu::BarMaker, one by one.

But that's not the end of the story. We want to be able to dynamically activate or deactivate particular menu items. We already have Commander methods for testing the availability of particular commands--they are in fact accessible through the command vector. The question remains, what is the best time to call these methods? It turns out that Windows sends a message, WM_INITMENUPOPUP, right before opening up a popup menu. The handler for this message is called OnInitPopup. We can use that opportunity to manipulate the menu while testing for the availability of particular commands. In fact, since the library class Menu::DropDown has access to the command vector, it can implement the RefreshPopup method once and for all. No need for the client to write any additional code.

Displaying short help for each selected menu item is also very simple.
When the user moves the mouse cursor to a popup menu item, Windows sends us the message WM_MENUSELECT, which we can process in the controller's method OnMenuSelect. We just call the GetHelp method of the command vector and send the help string to the status bar.

Let's now review the whole task from the point of view of the client. What code must the client write to make use of our dynamic menu system? To begin with, he has to implement the commander, which is just a repository of all commands available in the particular program. Two methods must be implemented for each command: one to execute it and one to test for its availability. The role of the commander is:
• if required, get data from the user, usually by means of a dialog box
• dispatch the request to the model for execution.

Once the commander is in place, the client has to create and statically initialize a table of commands. In this table all commands are given names and assigned short help strings. This table is then used in the initialization of the command vector.

The menu system is likewise initialized by a table. This table contains command names, display names for menu items and markers differentiating between commands, separators and bar items. Once the menu bar is ready, it has to be attached to the top-level window. However, don't try to attach the menu inside the constructor of View. Both View and Controller must be fully constructed before adding the menu. Menu attachment results in a series of messages sent to the top-level window (most notably, to resize its client area), so the whole controller has to be ready to process them in an orderly manner.
Finally, the user must provide simple implementations of OnInitPopup and, if needed, OnMenuSelect, to refresh a popup menu and to display short help, respectively.

Because major data structures in the menu system are initialized by tables, it is very easy to change them. For instance, reorganizing the menu or renaming menu items requires changes only to a single file--the one that contains the menu table. Modifying the behavior of commands requires only changes to the commander object. Finally, adding a new command can be done in three independent stages: adding the appropriate methods to the commander, adding an entry to the command table, and adding an item to the menu table. It can hardly be made simpler and less error-prone. Fig 1. shows the relationships and dependencies between various elements of the controller.
Fig 1. The relationships between various elements of the controller.

Because Commander doesn't have access to View, it has no direct way to force the refreshing of the display after such commands as Memory_Clear or Memory_Load. Again, we can only solve this problem by brute force (refresh memory display after every command) or some kind of notifications. I decided to use the most generic notification mechanism--sending a Windows message. In order to force the clearing of the calculator's memory display, the Commander sends a special user-defined message MSG_MEMCLEAR to the top-level window. Remember, a message is just a number. You are free to define your own messages, as long as you assign them numbers that won't conflict with any messages used by Windows. There is a special identifier WM_USER which defines a number that is guaranteed to be larger than that of any Windows-specific message.

To process user-defined messages, I added a new handler, OnUserMessage, to the generic Win::Controller. This handler is called whenever the message is greater than or equal to WM_USER.

One more change is necessary in order to make the menus work correctly. We have to expand the message loop to call TranslateMessage before DispatchMessage. TranslateMessage turns keystroke messages into character messages; the menu system needs these in order to recognize menu shortcuts and generate the corresponding WM_COMMAND messages. If you are also planning on adding keyboard accelerators (not to be confused with keyboard shortcuts that are processed directly by the menu system)--for instance, Ctrl-L to load memory--you'll have to further expand the message loop to call TranslateAccelerator. Although we won't discuss modeless dialogs here, you might be interested to know that they also require a pre-processing step, the call to IsDialogMessage, in the message loop.
It makes sense to stick all these accelerators and modeless dialog handles in a separate preprocessor object, of the class Win::MessagePrepro. Its method Pump enters the message loop and returns
only when the top-level window is destroyed. One usually passes the preprocessor object to the top-level controller, to make it possible to dynamically switch accelerator tables or create and destroy modeless dialogs.
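The core of the dynamic menu scheme, a table of named commands with an "execute" and a "test" pointer to member, can be tried out in portable C++. What follows is only a sketch under stated assumptions, not the library's Cmd::VectorExec: the Commander here is a hypothetical toy whose "memory" is a counter, AddVariable stands in for the model acquiring a user-defined variable, the Status values Enabled and Disabled are invented names, and the free functions Find, Test, Execute and GetHelp replace the command vector with a plain linear search.

```cpp
#include <cassert>
#include <cstring>

enum Status { Enabled, Disabled }; // assumed names for the command status

// Toy commander: one "execute" and one "test" method per command.
class Commander
{
public:
    Commander () : _varCount (0), _saved (0) {}
    void AddVariable () { ++_varCount; } // stands in for the model changing

    void Memory_Clear () { _varCount = 0; }
    Status can_Memory_Clear () const { return Enabled; }

    void Memory_Save () { _saved = _varCount; }
    Status can_Memory_Save () const
    {
        // Nothing to save until a variable has been added
        return _varCount > 0 ? Enabled : Disabled;
    }
    int SavedCount () const { return _saved; }
private:
    int _varCount;
    int _saved;
};

template <class T>
struct Item
{
    char const * _name;             // official name
    void (T::*_exec)();             // execute command
    Status (T::*_test)() const;     // test command status
    char const * _help;             // help string
};

const Item<Commander> Table [] =
{
    { "Memory_Clear", &Commander::Memory_Clear, &Commander::can_Memory_Clear, "Clear memory" },
    { "Memory_Save", &Commander::Memory_Save, &Commander::can_Memory_Save, "Save memory to file" },
    { nullptr, nullptr, nullptr, nullptr } // sentinel
};

// Linear search by name; returns nullptr when the name is unknown.
Item<Commander> const * Find (char const * name)
{
    for (Item<Commander> const * p = Table; p->_name != nullptr; ++p)
        if (std::strcmp (p->_name, name) == 0)
            return p;
    return nullptr;
}

// What a popup refresh would ask before graying out an item.
Status Test (Commander const & cmdr, char const * name)
{
    Item<Commander> const * item = Find (name);
    assert (item != nullptr);
    return (cmdr.*(item->_test)) ();
}

// Executes the command only if it is currently available.
void Execute (Commander & cmdr, char const * name)
{
    Item<Commander> const * item = Find (name);
    assert (item != nullptr);
    if ((cmdr.*(item->_test)) () == Enabled)
        (cmdr.*(item->_exec)) ();
}

char const * GetHelp (char const * name)
{
    Item<Commander> const * item = Find (name);
    return item != nullptr ? item->_help : "";
}
```

With a fresh Commander, Test (cmdr, "Memory_Save") yields Disabled; after AddVariable it yields Enabled, which is exactly the information a handler for WM_INITMENUPOPUP needs in order to gray out or activate the item. Adding a command touches only the commander and one table row.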
Exercises

1. In response to the user's double-clicking on an item in the history pane, copy the selected string into the edit control, so that the user can edit and re-execute it.
2. Add the item "Function" to the menu bar. The corresponding popup menu should display the list of available built-in functions. When the user selects one, its name and the opening parenthesis should be appended to the string in the edit control. Hint: This popup menu should not be initialized statically. It should use the function table from the calculator for its initialization.
3. Add keyboard accelerators for Ctrl-L and Ctrl-S for invoking the Load and Save commands, respectively. Use a statically initialized accelerator table. Pass this table, together with the command vector (for command name to command id translation) to the accelerator maker. The API to create an accelerator table is called CreateAcceleratorTable. Since an accelerator table is a resource (released via DestroyAcceleratorTable), you'll have to apply resource management in the design of your classes. To attach the accelerator, pass a reference to the message preprocessor from WinMain to TopController. After creating the accelerator, use the MsgPrepro::SetKbdAccelerator method to activate it. Hint: Remember to change the display string in menu items to include the accelerator key. For instance, the Load item should read "&Load...\tCtrl+L" (the tab marker \t right-aligns the accelerator string).
4. Convert the Load command to use GetOpenFileName for browsing directories.
Software Project

About Software
• Complexity
• The Fractal Nature of Software
• The Living Project
• The Living Programmer

Design Strategies
• Top-Down Object-Oriented Design
• Model-View-Controller
• Documentation

Team Work
• Productivity
• Team Strategies

Implementation Strategies
• Global Decisions
• Top-Down Object-Oriented Implementation
• Inheriting Somebody Else's Code
• Multi-Platform Development
• Program Modifications
• Testing
About Software

Complexity

Dealing with complexity, the finite capacity of the human mind, divide and conquer, abstraction.

Dealing with complexity is the essence of software engineering. It is also the most demanding part of it, requiring both discipline and creativity. Why do we need special methodologies to deal with complexity? The answer is in our brains. In our immediate memory we can deal only with a finite and rather small number of objects--whatever type they are: ideas, images, words. The ballpark figure is seven plus or minus two, depending on the complexity of the objects themselves. Apparently in many ancient cultures the number seven was considered synonymous with many. There are many folk stories that start with "Long, long ago behind seven mountains, behind seven forests, behind seven rivers there lived..."

There are essentially two ways in which we human beings can deal with complexity: the divide-and-conquer method and the abstraction method. The divide-and-conquer method is based on imposing a tree-like structure on top of a complex problem. The idea is that at every node of the tree we have to deal with only a small number of branches, within the limits of our immediate memory. The traversal of the tree leaf-to-root or root-to-leaf requires only a logarithmic number of steps--again, presumably within the limits of our immediate memory. For instance, the body of academic knowledge is divided into humanities and sciences (branching factor of 2). Sciences are subdivided into various areas, one of them being Computer Science, and so on. To understand Kernighan and Ritchie's book on C, the CS student needs only very limited education in humanities. On the other hand, to write a poem one is not required to program in C.
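To make the "logarithmic number of steps" concrete, here is a rough worked estimate. The symbols b (branching factor), N (number of leaves) and d (number of levels to traverse) are introduced only for this illustration, and the figure of one million items is an arbitrary assumption.

```latex
d = \lceil \log_b N \rceil, \qquad b = 7,\; N = 10^6
\;\Longrightarrow\; 7^7 = 823\,543 \;<\; 10^6 \;\le\; 7^8 = 5\,764\,801
\;\Longrightarrow\; d = 8 .
```

So even a million items, organized with no more than seven branches at each node, can be reached in about eight steps.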
The tree-like subdivision of human knowledge not only facilitates in-depth traversal and search, it also enables division of work between various teams. We can think of the whole of humanity as one large team taking part in the enormous project of trying to understand the World.

Another very powerful tool developed by all living organisms and perfected by humans is abstraction. The word "abstraction" has the same root as subtraction. Abstracting means subtracting non-essential features. Think of how many features you can safely subtract from the description of your car before it stops being recognizable as a car. Definitely the color of the paint, the license plates, the windshield wipers, the capacity of the trunk, etc. Unconsciously the same process is applied by a bird when it creates its definition of a "predator." Abstraction is not 100% accurate: a crow may get scared by a scarecrow, which somehow falls within its abstract notion of a "predator."

Division and abstraction go hand in hand in what one can call the divide-and-abstract paradigm. A complex system can be visualized as a very large network of interconnected nodes. We divide this network into a few "objects"--subsets of nodes. A good division has the property that there are as few inter-object connections as possible. To describe the objects resulting from such a division we use abstraction. In particular, we can describe the objects by the way they connect to other objects (the interface). We can simplify their inner structure by subtracting as many inessential features as possible. At every stage of division, it should be possible to understand the whole system in terms of interactions between a few well abstracted objects. If there is no such way, we give up. The
real miracle of our World is that large portions of it (maybe even everything) can be approached using this paradigm.
A complex system.

Abstracting objects out of a complex system.

The high level view of the complex system after abstracting objects.

This process is then repeated recursively by dividing objects into sub-objects, and so on. For every object, we first undo the abstraction by adding back all the features we have subtracted, divide it into sub-objects and use new abstractions to define them. An object should become understandable in terms of a few well abstracted sub-objects. In some way this recursive process creates a self-similar, fractal-like structure.
The fractal structure of a complex system.

In software engineering we divide a large project into manageable pieces. In order to define, name and describe these pieces we use abstraction. We can talk about symbol tables, parsers, indexes, storage layers, etc. They are all abstractions. And they let us divide a bigger problem into smaller pieces.
The Fractal Nature of Software

Let me illustrate these ideas with the familiar example of the software project that we've been developing in the second part of the book--the calculator. The top level of the project is structured into a set of interrelated objects, as shown in the figure.
Top level view of the calculator project.

This system is closed in the sense that one can explain how the program works (what the function main does) using only these objects--their public
interfaces and their functionality. It is not necessary to know how these objects perform their functions; it is enough to know what they do.

So how does the program work? First, the Calculator is created inside main. The Calculator is Serializable, which means that its state can be saved and restored. Notice that, at this level, we don't need to know anything about the streams--they are black boxes with no visible interface (that's why I didn't include them in this picture). Once the Calculator is created, we enter the loop in which we get a stream of text from the user and create a Scanner from it. The Scanner can tell us whether the user input is a command or not. If it is a command, we create a CommandParser, otherwise we create a Parser. Either of them requires access to both the Calculator and the Scanner. CommandParser can Execute a command, whereas Parser can Parse the input and Calculate the result. We then display the result and go back to the beginning of the loop. The loop terminates when CommandParser returns status stQuit from the Execute method.

That's it! It could hardly be simpler than that. It's not easy, though, to come up with such a nice set of abstractions on the first try. In fact we didn't! We had to go through a series of rewrites in order to arrive at this simple structure. All the techniques and little rules of thumb described in the second part of the book had this goal in mind.

But let's continue the journey. Let's zoom in on one of the top level components--the Calculator. Again, it can be described in terms of a set of interrelated objects, as shown in the figure.
The result of zooming-in on the Calculator.

And again, I could explain the implementation of all Calculator methods using only these objects (and a few from the level above). Next, I could zoom in on the Store object and see a very similar picture.
The result of zooming-in on Store.

I could go on like this, just like in one of those Mandelbrot set programs, where you can zoom in on any part of the picture and see something that is different and yet similar. With a mathematical fractal, you can keep zooming in indefinitely and keep seeing the same infinite level of detail. With a software project, you will eventually get to the level of plain built-in types and commands. (Of course, you may continue zooming in into assembly language, microcode, gates, transistors, atoms, quarks, superstrings and further, but that's beyond the scope of this book.)
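The top-level loop described earlier in this section can be written down almost word for word using nothing but the public interfaces of the top-level objects. The sketch below is an illustration under heavy assumptions, not the calculator's actual code: the class bodies are toy stubs, the convention that commands start with '!' and that "!q" quits is assumed, stError is an invented status, and Parser merely reads a single number instead of parsing an expression.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

enum Status { stOk, stError, stQuit }; // stError is an assumed extra value

class Calculator {}; // state holder; its insides are invisible at this level

class Scanner
{
public:
    explicit Scanner (std::string const & line) : _line (line) {}
    // Assumed convention: commands start with an exclamation mark
    bool IsCommand () const { return !_line.empty () && _line [0] == '!'; }
    std::string const & Text () const { return _line; }
private:
    std::string _line;
};

class CommandParser
{
public:
    CommandParser (Scanner & scanner, Calculator & calc)
        : _scanner (scanner), _calc (calc) {}
    // Toy stub: only the (assumed) quit command is recognized
    Status Execute () { return _scanner.Text () == "!q" ? stQuit : stOk; }
private:
    Scanner & _scanner;
    Calculator & _calc;
};

class Parser
{
public:
    Parser (Scanner & scanner, Calculator & calc)
        : _scanner (scanner), _calc (calc), _value (0) {}
    // Toy stub: "parsing" means reading one number
    Status Parse ()
    {
        std::istringstream in (_scanner.Text ());
        return (in >> _value) ? stOk : stError;
    }
    double Calculate () const { return _value; }
private:
    Scanner & _scanner;
    Calculator & _calc;
    double _value;
};

// The whole program, explained in terms of the top-level objects only.
void Run (std::istream & in, std::ostream & out)
{
    Calculator calc;
    Status status = stOk;
    std::string line;
    while (status != stQuit && std::getline (in, line))
    {
        Scanner scanner (line);
        if (scanner.IsCommand ())
        {
            CommandParser parser (scanner, calc);
            status = parser.Execute ();
        }
        else
        {
            Parser parser (scanner, calc);
            if (parser.Parse () == stOk)
                out << parser.Calculate () << '\n';
        }
    }
}
```

Run reads lines until the quit command arrives; anything after it is never looked at. Replacing any stub with a real implementation leaves Run untouched, which is the sense in which the top level is closed.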
The Living Project

The lifetime of the project, cyclic nature of programming, the phases, open-ended design, the program as a living organism.

Every software project has a beginning. Very few have an end (unless they are cancelled by the management). You should get used to this kind of open-ended development. You will save yourself and your coworkers a lot of grief. Assume from the very beginning that:
• New features will be added,
• Parts of the program will be rewritten,
• Other people will have to read, understand, and modify your code,
• There will be version 2.0 (and further).

Design for version 2, implement for version 1. Some of the functionality expected in v. 2 should be stubbed out in v. 1 using dummy components.

The development of a software project consists of cycles of different magnitude. The longest scale cycle is the major version cycle. Within it we usually have one or more minor version cycles. The creation of a major version goes through the following stages:
• Requirement (or external) specification,
• Architectural design (or re-design),
• Implementation,
• Testing and bug fixing.

Time-wise, these phases are interlaced. Architectural design feeds back into the requirements spec. Some features turn out to be too expensive, the need for others arises during the design. Implementation feeds back into the design in a major way. Some even suggest that the development should go through the cycles of implementation of throw-away prototypes and phases of re-design. Throwing away a prototype is usually too big a waste of development time. It means that too little time was spent designing and studying the problem, and that the implementation methodology was inadequate.

One is not supposed to use a different methodology when designing and implementing prototypes, scaffolding or stubs--as opposed to designing and implementing the final product. Not following this rule is a sign of hypocrisy. Not only is it demoralizing, but it doesn't save any development time. Quite the opposite! My fellow programmers and I were bitten by bugs or omissions in the scaffolding code so many times, and wasted so much time chasing such bugs, that we have finally learned to write scaffolding the same way we write production code. As a side effect, whenever the scaffolding survives the implementation cycle and gets into the final product (you'd be surprised how often that happens!), it doesn't lead to any major disasters.

Going back to the implementation cycle. Implementing or rewriting any major component has to be preceded by careful and detailed design or redesign. The documentation is usually updated in the process, little essays are added to the architectural spec. In general, the design should be treated as an open-ended process. It is almost always strongly influenced by implementation decisions.
This is why it is so important to have the discipline to constantly update the documentation. Documentation that is out of sync with the project is useless (or worse than useless--it creates misinformation).

The implementation proper is also done in little cycles. These are the fundamental edit-compile-run cycles, well known to every programmer. Notice how testing is again interlaced with the development. The run part of the cycle serves as a simple sanity test. At this level, the work of a programmer resembles that of a physician. The first principle--never harm the patient--applies very well to programming. It is called "don't break the code." The program should be treated like a living organism. You have to keep it alive at all times. Killing the program and then resuscitating it is not the right approach. So make all changes in little steps that are self-contained and as testable as possible. Some functionality may be temporarily disabled when doing a big "organ transplant," but in general the program should be functional at all times.

Finally, a word of caution: How not to develop a project (and how it is still done in many places). Don't jump into implementation too quickly. Be patient. Resist the pressure from the managers to have something for a demo as soon as possible. Think before you code. Don't sit in front of the computer with only a vague idea of what you want to do with the hope that you'll figure it out by trial and error. Don't write sloppy code "to be cleaned up later." There is a big difference between stubbing out some functionality and writing sloppy code.
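The difference between a stub and sloppy code is easier to see in a small example. The sketch below is hypothetical throughout: Storage, NullStorage, MemoryStorage and SaveReport are invented for illustration and do not come from the calculator. The point is only that a stub honors the same interface, and is written with the same care, as the component that will eventually replace it.

```cpp
#include <cassert>
#include <string>

// Interface designed with version 2 in mind.
class Storage
{
public:
    virtual ~Storage () {}
    virtual bool IsAvailable () const = 0;
    virtual bool Save (std::string const & data) = 0;
};

// Version 1 stub: a dummy component that does nothing and says so.
class NullStorage: public Storage
{
public:
    bool IsAvailable () const override { return false; }
    bool Save (std::string const &) override { return false; }
};

// A possible version 2 component (kept in memory so the sketch stays portable).
class MemoryStorage: public Storage
{
public:
    bool IsAvailable () const override { return true; }
    bool Save (std::string const & data) override
    {
        _data = data;
        return true;
    }
    std::string const & Data () const { return _data; }
private:
    std::string _data;
};

// Client code written against the interface; it doesn't change between versions.
std::string SaveReport (Storage & storage, std::string const & data)
{
    if (!storage.IsAvailable ())
        return "Saving is not available in this version";
    return storage.Save (data) ? "Saved" : "Save failed";
}
```

With NullStorage the client reports that saving is unavailable; swapping in MemoryStorage makes the same call succeed, and no client code has to be cleaned up later.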
The Living Programmer

Humility, simplicity, team spirit, dedication.
A programmer is a human being. Failing to recognize it is a source of many misunderstandings. The fact that the programmer interacts a lot with a computer doesn't mean that he or she is any less human. Since it is the computer that is supposed to serve the humans and not the other way around, programming as an activity should be organized around the humans. It sounds like a truism, but you'd be surprised how often this simple rule is violated in real life. Forcing people to program in assembly (or C for that matter) is just one example. Structuring the design around low level data structures like hash tables, linked lists, etc., is another example.
The fact that jobs of programmers haven't been eliminated by computers (quite the opposite!) means that being human has its advantages. The fact that some human jobs have been eliminated by computers means that being a computer has its advantages. The fundamental equation of software engineering is thus
Human Creativity + Computer Speed and Reliability = Program
Trying to write programs combining human speed and reliability with computer creativity is a big mistake!
So let's face it, we humans are slow and unreliable. When a programmer has to wait for the computer to finish compilation, something is wrong. When the programmer is supposed to write error-free code without any help from the compiler, linker or debugger, something is wrong. If the programmer, instead of solving a problem with paper and pencil, tries to find the combination of parameters that doesn't lead to a general protection fault by trial and error, something is badly wrong.
The character traits that make a good programmer are (maybe not so surprisingly) similar to those of a martial art disciple.
Humility, patience, simplicity on the one hand; dedication and team spirit on the other hand. And most of all, mistrust towards everybody including oneself.
• Humility: Recognize your shortcomings. It is virtually impossible for a human to write error-free code. We all make mistakes. You should write code in anticipation of mistakes. Use any means available to men and women to guard your code against your own mistakes. Don't be stingy with assertions. Use heap checking. Take time to add debugging output to your program.
• Patience: Don't rush towards the goal. Have patience to build solid foundations. Design before you code. Write solid code for future generations.
• Simplicity: Get rid of unnecessary code. If you find a simpler solution, rewrite the relevant parts of the program to make use of it. Every program can be simplified. Try to avoid special cases.
• Dedication: Programming is not a nine-to-five job. I am not saying that you should work nights and weekends. If you are, it is usually a sign of bad management. But you should expect a lifetime of learning. You have to grow in order to keep up with the tremendous pace of progress. If you don't grow, you'll be left behind by the progress of technology.
• Team spirit: Long gone are the times of the Lone Programmer. You'll have to work in a team. And that means a lot. You'll have to work on your communication skills. You'll have to accept certain standards, coding conventions, commenting conventions, etc. Be ready to discuss and change some of the conventions if they stop making sense. Some people preach the idea that "A stupid convention is better than no convention." Avoid such people.
• Programmer's paranoia: Don't trust anybody's code, not even your own.
Design Strategies
Top-Down Object Oriented Design
Top level objects, abstractions and metaphors, components.
It is all too easy to start the design by coming up with such objects as hash tables, linked lists, queues, trees, and trying to put them together. Such an approach, bottom-up, implementation driven, should be avoided. A program that is built bottom-up ends up with a structure of a soup of objects. There are pieces of vegetables, chunks of meat, noodles of various kinds, all floating in some kind of broth. It sort of looks object oriented--there are "classes" of noodles, vegetables, meat, etc. However, since you rarely change the implementation of linked lists, queues, trees, etc., you don't gain much from their object-orientedness. Most of the time you have to maintain and modify the shapeless soup.
Using the top-down approach, on the other hand, you divide your program into a small number of interacting high-level objects. The idea is to deal with only a few types of objects--classes (on the order of seven plus/minus two--the capacity of our short-term memory!). The top-level objects are divided into the main actors of the program, and the communication objects that are exchanged between the actors. If the program is interactive, you should start with the user interface and the objects that deal with user input and screen- (or teletype-) output.
Once the top level objects are specified, you should go through the exercise of rehearsing the interactions between the objects (this is sometimes called going through use-case scenarios). Go through the initialization process, decide which objects have to be constructed first, and in what state they should start. You should avoid using global objects at any level other than possibly the top level.
After everything has been initialized, pretend that you are the user, see how the objects react to user input, how they change their state, what kind of communication objects they exchange. You should be able to describe the interaction at the top without having to resort to the details of the implementation of lower levels. After this exercise you should have a pretty good idea about the interfaces of your top-level objects and the contracts they have to fulfill (that is, what the results of a given call with particular arguments should be). Every object should be clearly defined in as few words as possible, its functionality should form a coherent and well rounded abstraction.
Try to use common language, rather than code, in your documentation, in order to describe objects and their interactions. Remember, center the project around humans, not computers. If something can be easily described in common language, it usually is a good abstraction.
For things that are not easily abstracted use a metaphor. An editor might use a metaphor of a sheet of paper; a scheduler a metaphor of a calendar; a drawing program, metaphors of pencils, brushes, erasers, palettes, etc. The design of the user interface revolves around metaphors, but they also come in handy at other levels of design. Files, streams, semaphores, ports, pages of virtual memory, trees, stacks--these are all examples of very useful low-level metaphors.
The right choice of abstractions is always important, but it becomes absolutely crucial in the case of a large software project, where top-level objects are implemented by separate teams. Such objects are called components. Any
change to the component's interface or its contract, once development has started going full steam ahead, is a disaster. Depending on the number of components that use this particular interface, it can be a minor or a major disaster. The magnitude of such a disaster can only be measured on the Richter scale. Every project goes through a few such "earthquakes"--that's just life!
Now, repeat the same design procedure with each of the top-level objects. Split it into sub-objects with well defined purpose and interface. If necessary, repeat the procedure for the sub-objects, and so on, until you have a pretty detailed design. Use this procedure again and again during the implementation of various pieces. The goal is to superimpose some sort of a self-similar, fractal structure on the project. The top level description of the whole program should be similar to the description of each of the components, its sub-components, objects, sub-objects, etc. Every time you zoom in or zoom out, you should see more or less the same type of picture, with a few self-contained objects collaborating towards implementing some well-defined functionality.
Model-View-Controller
Designing user interface, input driven programs, Model-View-Controller paradigm
Even the simplest modern-day programs offer some kind of interactivity. Of course, one can still see a few remnants of the grand UNIX paradigm, where every program was written to accept a one dimensional stream of characters from its standard input and spit out another stream at its standard output. But with the advent of the Graphical User Interface (GUI), the so-called "command-line interface" is quickly becoming extinct. For the user, it means friendlier, more natural interfaces; for the programmer it means more work and a change of philosophy.
With all the available help from the operating system and with appropriate tools at hand it isn't difficult to design and implement user interfaces, at least for graphically non-demanding programs. What is needed is a change of perspective.
An interactive program is, for the most part, input-driven. Actions in the program happen in response to user input. At the highest level, an interactive program can be seen as a series of event handlers for externally generated events. Every key press, every mouse click has to be handled appropriately.
The object-oriented response to the interactive challenge is the Model-View-Controller paradigm first developed and used in Smalltalk.
The Controller object is the focus of all external (and sometimes internal as well) events. Its role is to interpret these events as much as is necessary to decide which of the program objects will have to handle them. Appropriate messages are then sent to such objects (in Smalltalk parlance; in C++ we just call appropriate methods).
The View takes care of the program's visual output. It translates requests from other objects into graphical representations and displays them.
In other words it abstracts the output. Drawing lines, filling areas, writing text, showing the cursor, are some of the many responsibilities of the View.
Centralizing input in the Controller and output in the View leaves the rest of the program independent from the intricacies of the input/output system (it also makes the program easy to port to other environments with slightly different graphical capabilities and interfaces). The part of the program that is independent of the details of input and output is called the Model. It is the hard worker and the brains of the program. In simple programs, the Model corresponds to a single object, but quite often it is a collection of top level
objects. Various parts of the Model are activated by the Controller in response to external events. As a result of changes of state, the Model updates the View whenever it finds it appropriate.
As a rule, you should start the top-down design of an interactive program by establishing the functionality of the Controller and the View. Whatever happens prior to any user action is considered initialization of these components and the Model itself. The M-V-C triad may also reappear at lower levels of the program to handle a particular type of control, a dialog box, an edit control, etc.
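The division of labor between the three parts can be sketched in a few lines of C++. This is only a minimal illustration, not code from any particular framework: the class names, the key bindings ('+' and '0') and the counter that plays the role of the Model are all made up for the example.

```cpp
#include <iostream>
#include <string>

// Hypothetical View: abstracts the output. It remembers the last text
// it was asked to display and echoes it to standard output.
class View
{
public:
    void ShowCount (int count)
    {
        _lastShown = "Count: " + std::to_string (count);
        std::cout << _lastShown << '\n';
    }
    const std::string & LastShown () const { return _lastShown; }
private:
    std::string _lastShown;
};

// Hypothetical Model: knows nothing about keys or screens.
// It updates the View whenever its state changes.
class Model
{
public:
    explicit Model (View & view) : _view (view), _count (0) {}
    void Increment () { ++_count; _view.ShowCount (_count); }
    void Reset () { _count = 0; _view.ShowCount (_count); }
    int Count () const { return _count; }
private:
    View & _view;
    int    _count;
};

// Hypothetical Controller: the focus of external events. It interprets
// each event just enough to decide which Model method to call.
class Controller
{
public:
    explicit Controller (Model & model) : _model (model) {}
    // Returns false for events it does not recognize.
    bool OnKey (char key)
    {
        switch (key)
        {
        case '+': _model.Increment (); return true;
        case '0': _model.Reset (); return true;
        default:  return false;
        }
    }
private:
    Model & _model;
};
```

A program would construct the View first, then the Model, then the Controller, and feed every key press to Controller::OnKey. Porting to a different output device would then, in this sketch, touch only the View.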
Documentation
Requirement Specification
Statement of purpose, functionality, user interface, input, output, size limitations and performance goals, features, compatibility.
The first document to be written before any other work on a project may begin is the Requirement Specification (also called an External Specification). In large projects the Requirement Spec might be prepared by a dedicated group of people with access to market research, user feedback, user tests, etc. However, no matter who does it, there has to be a feedback loop going back from the architects and implementers to the group responsible for the Requirement Spec.
The crucial part of the spec is the statement of purpose--what the purpose of the particular software system is. Sometimes restating the purpose of the program might bring some new insights or change the focus of the design. For instance, describing a compiler as a program which checks the source file for errors, and which occasionally creates an object file (when there are no errors), might result in a competitively superior product.
The statement of purpose might also contain a discussion of the key metaphor(s) used in the program. An editor, for instance, may be described as a tool to manipulate lines of text. Experience however has shown that editors that use the metaphor of a sheet of paper are superior. Spreadsheet programs owe their popularity to another well chosen metaphor.
Then, a detailed description of the functionality of the program follows. In a word-processor requirement spec one would describe text input, ways of formatting paragraphs, creating styles, etc.
In a symbolic manipulation program one would specify the kinds of symbolic objects and expressions that are to be handled, the various transformations that could be applied to them, etc. This part of the spec is supposed to tell the designers and the implementers what functionality to implement. Some of it is described as mandatory, some of it goes into the wish list.
The user interface and visual metaphors go next. This part usually undergoes the most extensive changes. When the first prototype of the interface is created, it goes through more or less (mostly less) rigorous testing, first by developers, then by potential users. Sometimes a manager doesn't like the feel of it and sends programmers back to the drawing board. It is definitely more art than science, yet a user interface may make or break a product. What compounds the problem is the fact that anybody may criticize a user interface. No special training is required. And everybody has different tastes. The programmers that implement it are probably the least qualified people to judge it. They are used to the terse and cryptic interfaces of their programming tools, grep, make, link, Emacs or vi. In any case, designing a user interface is the most frustrating and ungrateful job.
Behind the user interface is the input/output specification. It describes what kind of input is accepted by the program, and what output is generated by
the program in response to this input. For instance, what is supposed to happen when the user clicks on the format-brush button and then clicks on a word or a paragraph in a document (the formatting should be pasted over). Or what happens when the program reads a file that contains a comma-separated list of numbers. Or what happens when a picture is pasted from the clipboard.
Speed and size requirements may also be specified. The kind of processor, minimum memory configuration, and disk size are often given. Of course there is always conflict between the ever growing list of desired features and the always conservative hardware requirements and breathtaking performance requirements. (Features always win! Only when the project enters its critical phase do features get decimated.)
Finally, there may be some compatibility requirements. The product has to understand (or convert) files that were produced either by its earlier versions, or by competitors' products, or both. It is wise to include some compatibility features that will make future versions of the product easier to implement (version stamps are a must).
Architecture Specification
Top level view, crucial data structures and algorithms. Major implementation choices.
The architectural document describes how things work and why they work the way they work. It's a good idea to either describe theoretical foundations of the system, or at least give pointers to some literature. This is the document that gives the top level view of the product as a program, as seen by the developers. All top level components and their interactions are described in some detail. The document should show clearly that, if the top level components implement their functionality according to their specifications, the system will work correctly.
That will take the burden off the shoulders of developers--they won't have to think about too many dependencies.
The architectural spec defines the major data structures, especially the persistent ones. The document then proceeds with describing major event scenarios and various states of the system. The program may start with an empty slate, or it may load some history (documents, logs, persistent data structures). It may have to finish some transactions that were interrupted during the last session. It has to go through the initialization process and presumably get into some quiescent state. External or internal events may cause some activity that transforms data structures and leads to state transitions. New states have to be described. In some cases the algorithms to be used during such activity are described as well.
Too detailed a description of the implementation should be avoided. It becomes obsolete so quickly that it makes little sense to try to maintain it.
Once more we should remind ourselves that the documentation is a living thing. It should be written in such a way that it is easy to update. It has to have a sensible structure of its own, because we know that it will be changed many times during the implementation cycle. In fact it should be changed, and it is very important to keep it up-to-date and encourage fellow developers to look into it on a regular basis.
In the implementation cycle, there are times when it is necessary to put some flesh into the design of some important object that has only been sketched in the architectural spec. It is time to either expand the spec or write short essays on selected topics in the form of separate documents. In such
essays one can describe non-trivial implementations, algorithms, data structures, programming notes, conventions, etc.
Team Work
Productivity
Communication explosion, vicious circle.
The life of a big-team programmer is spent
• Communicating with other programmers, attending meetings, reading documents, reading email, responding to it,
• Waiting for others to finish their job, to fix a bug, to implement some vital functionality, to finish building their component,
• Fighting fires, fixing build breaks, chasing somebody else's bugs,
• Staring at the computer screen while the machine is compiling, loading a huge program, running test suites, rebooting--and finally, when time permits--
• Developing new code.
There are n(n-1)/2 possible connections between n dots. That's of the order of O(n²). By the same token, the number of possible interactions within a group of n programmers is of the order of O(n²). The number of hours they can put out is of the order of O(n). It is thus inevitable that at some point, as the size of the group increases, its members will start spending all their time communicating.
In real life, people come up with various communication-overload defense mechanisms. One defense is to ignore incoming messages, another is to work odd hours (nights, weekends), when there aren't that many people around to disturb you (wishful thinking!).
As a programmer you are constantly bombarded with information from every possible direction. They will broadcast messages by email, they will drop printed documents in your mailbox, they will invite you to meetings, they will call you on the phone, or in really urgent cases they will drop by your office or cubicle and talk to you directly.
If programmers were only to write and test code (and it used to be like this not so long ago) the market would be flooded with new operating systems, applications, tools, games, educational programs, and so on, all at ridiculously low prices. As a matter of fact, almost all public domain and shareware programs are written by people with virtually no communication overhead.
The following chart shows the results of a very simple simulation. I assumed that every programmer spends 10 minutes a day communicating with every other programmer in the group. The rest of the time he or she does some real programming. The time spent programming, multiplied by the number of programmers in the group, measures team productivity--the effective work done by the group every day. Notice that, under these assumptions, the effective work peaks at about 25 people and then starts decreasing.
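The simulation is simple enough to reproduce. The sketch below is a reconstruction, not the original program: the text does not say how long the working day is, so an 8-hour (480-minute) day is assumed here, and all function and constant names are hypothetical. With that assumption each of n programmers has 480 - 10(n-1) minutes left for programming, so the effective work of the team is n(490 - 10n) programmer-minutes per day.

```cpp
#include <algorithm>

// Assumed length of a working day in minutes (not stated in the text).
const int kWorkdayMinutes = 480;
// Minutes per day spent communicating with each other team member
// (the figure given in the text).
const int kMinutesPerPeer = 10;

// Minutes of real programming one programmer gets done per day
// in a team of n people.
int ProgrammingMinutes (int n)
{
    int overhead = kMinutesPerPeer * (n - 1);
    return std::max (0, kWorkdayMinutes - overhead);
}

// Effective work of the whole team, in programmer-minutes per day.
int TeamProductivity (int n)
{
    return n * ProgrammingMinutes (n);
}

// Smallest team size at which the effective work is highest.
int PeakTeamSize (int maxTeamSize)
{
    int best = 1;
    for (int n = 2; n <= maxTeamSize; ++n)
    {
        if (TeamProductivity (n) > TeamProductivity (best))
            best = n;
    }
    return best;
}
```

Under the assumed 8-hour day the curve is flat at the top: teams of 24 and 25 both deliver 6000 programmer-minutes a day, which agrees with a peak at about 25 people. At 49 people the communication overhead consumes the whole day.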
But wait, there's more! The more people you have in the group, the more complicated the dependency graph. Component A cannot be tested until component B works correctly. Component B needs some new functionality from C. C is blocked waiting for a bug fix in D. People are waiting, they get frustrated, they send more email messages, they drop by each other's offices. Not enough yet? Consider the reliability of a system with n components. The more components, the more likely it is that one of them will break. When one component is broken, the testing of other components is either impossible or at least impaired. That in turn leads to more buggy code being added to the project, causing even more breakages. It seems like all these mechanisms feed on each other in one vicious circle.
In view of all this, the team's productivity curve is much too optimistic.
The other side of the coin is that raising the productivity of a programmer, either by providing better tools, a better programming language, a better computer, or more help in non-essential tasks, creates a positive feedback loop that amplifies the productivity of the team. If we could raise the productivity of every programmer in a fixed size project, we could reduce the size of the team--that in turn would lead to decreased communication overhead, further increasing the effective productivity of every programmer. Every dollar invested in a programmer's productivity saves several dollars that would otherwise be spent hiring other programmers.
Continuing with our simple simulation--suppose that the goal is to produce 100,000 lines of code in 500 days. I assumed the starting productivity of 16 lines of code per day per programmer, if there were no communication overhead.
The following graph shows how the required size of the team shrinks with the increase in productivity.
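The relationship between productivity and team size can be recomputed with a few lines of code. As before, this is a hedged reconstruction rather than the original simulation: the 8-hour working day is an assumption, the function name and its default arguments are hypothetical, and only the 10 minutes of communication per colleague, the 100,000 lines and the 500 days come from the text.

```cpp
// Returns the smallest team that can deliver totalLines in the given
// number of days, or -1 if communication overhead makes the goal
// unreachable for any team size.
// linesPerProgrammerDay is the productivity of one programmer with
// no communication overhead.
int RequiredTeamSize (double linesPerProgrammerDay,
                      double totalLines = 100000.0,
                      double days = 500.0)
{
    const double workday = 480.0;  // assumed 8-hour day, in minutes
    const double perPeer = 10.0;   // minutes per day per other team member
    const double target = totalLines / days;  // lines needed per day

    for (int n = 1; perPeer * (n - 1) < workday; ++n)
    {
        // Minutes each programmer has left for real programming.
        double programming = workday - perPeer * (n - 1);
        // Team output is n * linesPerProgrammerDay * programming / workday;
        // both sides are multiplied by workday to avoid the division.
        if (n * programming * linesPerProgrammerDay >= target * workday)
            return n;
    }
    return -1;
}
```

With these assumptions a productivity of 16 lines a day requires 24 programmers, which is exactly the peak of the team-productivity curve; anything below 16 lines a day cannot meet the goal at all. Doubling the productivity to 32 lines a day shrinks the team to 8.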
Notice that, when the curve turns more or less linear (let's say at about 15 programmers), every 3% increase in productivity saves us one programmer, who can then be used in another project.
Several things influence productivity:
• The choice of a programming language and methodology. So far it is hard to beat C++ and object oriented methodology. If the size or speed is not an issue, other specialized higher-level languages may be appropriate (Smalltalk, Prolog, Basic, etc.). The tradeoff is also in the initial investment in the education of the team.
• The choice of project-wide conventions. Such decisions as whether to use exceptions, how to propagate errors, what is the code sharing strategy, how to deal with project-wide header files, etc., are all very difficult to correct during the development. It is much better to think about them up front.
• The choice of a programming environment and tools.
• The choice of hardware for the development. RAM and disk space are of crucial importance. Local Area Network with shared printers and email is a necessity. The need for large monitors is often overlooked.
Team Strategies
In the ideal world we would divide work between small teams and let each team provide a clear and immutable interface to which other teams would write their code. We would couple each interface with a well defined, comprehensive and detailed contract. The interaction between teams would be reduced to the exchange of interface specifications and periodic updates as to which part of the contract has already been implemented.
This ideal situation is, to some extent, realized when the team uses externally created components, such as libraries, operating system APIs (application programmer interfaces), etc. Everything whose interface and contract can be easily described is a good candidate for a library. For instance, the string manipulation library, the library of container objects, iostreams, etc., are all well described either in on-line help, or in compiler manuals, or in other books. Some APIs are not that easily described, so using them is often a matter of trial and error (or customer support calls).
In the real world, things are more complicated than that. Yes, we do divide work between small teams and they do try to come up with some interfaces and contracts. However, the interfaces are far from immutable, the contracts are far
from comprehensive, and they are being constantly re-negotiated between teams.
All we can do is to try to smooth out this process as much as possible. You have to start with a good design. Spend as much time as necessary on designing a good hierarchical structure of the product. This structure will be translated into the hierarchical structure of teams. The better the design, the clearer the interfaces and contracts between all parts of the product. That means fewer changes and less negotiating at the later stages of the implementation.
Divide the work in clear correspondence to the structure of the design, taking into account communication needs. Already during the design, as soon as the structure crystallizes, assign team leaders to all the top level components. Let them negotiate the contracts between themselves. Each team leader in turn should repeat this procedure with his or her team, designing the structure of the component and, if necessary, assigning sub-components to leaders of smaller teams.
The whole design should go through several passes. The results of lower level design should serve as feedback to the design of higher level components, and eventually contribute to the design of the whole product.
Each team writes its own part of the specification. These specifications are reviewed by other teams responsible for other parts of the same higher level component. The more negotiating is done up front, during the design, the better the chances of smooth implementation. The negotiations should be structured in such a way that there are only a few people involved at a time. A "plenary" meeting is useful to describe the top level design of the product to all members of all teams, so that everybody knows what the big picture is.
Such meetings are also useful during the implementation phase to monitor the progress of the product. They are not a good forum for design discussions.
Contract negotiations during implementation usually look like this: Some member of team A is using one of team B's interfaces according to his or her understanding of the contract. Unexpectedly the code behind the interface returns an error, asserts, raises an exception, or goes haywire. The member of team A either goes to team B's leader to ask who is responsible for the implementation of the interface in question, or directly to the implementor of this interface to ask what caused the strange behavior. The implementor either clarifies the contract, changes it according to the needs of team A, fixes the implementation to fulfill the contract, or takes over the tracing of the bug. If a change is made to component B, it has to be thoroughly tested to make sure that it doesn't cause any unexpected problems for other users of B.
During the implementation of some major new functionality it may be necessary to ask other teams to change or extend their interfaces and/or contracts. This is considered a re-design. A re-design, like any other disturbance in the system, produces concentric waves. The feedback process, described previously in the course of the original design, should be repeated again. The interface and contract disturbances are propagated first within the teams that are immediately involved (so that they can make sure that the changes are indeed necessary, and try to describe these changes in some detail), then up towards the top. Somewhere on the way to the top the disturbances of the design may get damped completely. Or they may reach the very top and change the way top level objects interact.
At that point the changes will be propagated downwards to all the involved teams. Their feedback is then bounced back towards the top, and the process is repeated as many times as is
necessary for the changes to stabilize themselves. This "annealing" process ends when the project reaches a new state of equilibrium.
Implementation Strategies
Global Decisions
Error handling, exceptions, common headers, code reuse, debug output.
The biggest decision to be made, before the implementation can even begin, is how to handle errors and exceptions. There are a few major sources of errors:
• Bug within the component,
• Incorrect parameters passed from another (trusted) component,
• Incorrect user input,
• Corruption of persistent data structures,
• System running out of resources.
Bugs are not supposed to get through to the final retail version of the program, so we have to deal with them only during development. (Of course, in practice most retail programs still have some residual bugs.) Since during development we mostly deal with debug builds, we can protect ourselves from bugs by sprinkling our code with assertions. Assertions can also be used to enforce contracts between components.
User input, and in general input from other less trusted parts of the system, must be thoroughly tested for correctness before proceeding any further. "Typing monkeys" tests have to be done to ensure that no input can break our program. If our program provides some service to other programs, it should test the validity of externally passed arguments. For instance, operating system API functions always check the parameters passed from applications. This type of error should be dealt with on the spot. If it's direct user input, we should provide immediate feedback; if it's the input from an untrusted component, we should return the appropriate error code.
Any kind of persistent data structures that are not totally under our control (and that is always true, unless we are a file system; and even then we should be careful) can always get corrupted by other applications or tools, not to mention hardware failures and user errors.
We should therefore always test such structures for consistency. If the corruption is fatal, this kind of error is appropriate for turning into an exception. A common programming error is to use assertions to enforce the consistency of data structures read from disk. Data on disk should never be trusted, therefore all the checks must also be present in the retail version of the program.

Running out of resources--memory, disk space, handles, etc.--is the prime candidate for exceptions. Consider the case of memory. Suppose that all programmers are trained to always check the return value of operator new (that's already unrealistic). What are we supposed to do when the returned pointer is null? It depends on the situation. For every call to new, the programmer is supposed to come up with some sensible recovery. Consider that the recovery path is rarely tested (unless the test team has a way of simulating such failures). We take up a lot of programmers' time to do something that is as likely to fail as the original thing whose failure we were handling.

The simplest way to deal with out-of-memory situations is to print a message "Out of memory" and exit. This can be easily accomplished by setting our own out-of-memory handler (the _set_new_handler function in Microsoft's C++ runtime, or set_new_handler in standard C++). This is, however, very rarely the desired solution. In most cases we at least want to do some cleanup, save some user data to disk, maybe even get back to some higher level of our program and try to continue. The use of exceptions and resource management techniques (described earlier) seems most appropriate.

If C++ exception handling is not available or is prohibited by managers, one is left with the conventional techniques of testing the results of new, cleaning up, and propagating the error higher up. Of course, the program must be thoroughly tested using simulated failures. It is this kind of philosophy that leads to project-wide conventions such as "every function should return a status code." Normal return values then have to be passed by reference or through a pointer. Very soon the system of status codes develops into a Byzantine structure. Essentially every error code should not only point at the culprit, but also contain the whole history of the error, since the interpretation of the error is enriched at each stage through which it passes. The use of constructors is then highly restricted, since these are the only functions that cannot return a value. Very quickly C++ degenerates to "better C." Fortunately most modern C++ compilers provide exception support and hopefully soon enough this discussion will only be of historical interest.

Another important decision to be made up front is the choice of project-wide debugging conventions. It is extremely handy to have progress and status messages printed to some kind of a debug output or log.

The choice of directory structure and build procedures comes next. The structure of the project and its components should be reflected in the directory structure of the source code. There is also a need for a place where project-wide header files and code could be deposited. This is where one puts the debugging harness, definitions of common types, project-wide parameters, shared utility code, useful templates, etc.

Some degree of code reuse within the project is necessary and should be well organized. What is usually quite neglected is the need for information about the availability of reusable code and its documentation. The information about what is available in the reusability department should be broadcast on a regular basis and the up-to-date documentation should be readily available.

One more observation--in C++ there is a very tight coupling between header files and implementation files--we rarely make modifications to one without inspecting or modifying the other. This is why in most cases it makes sense to keep them together in the same directory, rather than in some special include directory. We make an exception for headers that are shared between directories. It is also a good idea to separate platform-dependent layers into separate directories. We'll talk about it soon.
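Returning to the error-handling decisions discussed at the start of this section, the difference between an assertion and a check that must survive into the retail build can be sketched as follows. This is only an illustration under stated assumptions: the Header layout, the constants kMagic, kHeaderSize and kRecordSize, and the CorruptData exception are all hypothetical names invented for the example, not part of any real file format.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical file header, as decoded from disk (layout invented for this sketch).
struct Header
{
    std::uint32_t magic;
    std::uint32_t recordCount;
};

constexpr std::uint32_t kMagic = 0x31424454;   // arbitrary tag value
constexpr std::uint64_t kHeaderSize = 8;       // bytes occupied by the header
constexpr std::uint64_t kRecordSize = 16;      // bytes per record

// Thrown when persistent data fails a consistency check.
class CorruptData : public std::runtime_error
{
public:
    explicit CorruptData(char const * msg) : std::runtime_error(msg) {}
};

// Data on disk is never trusted, so these checks are ordinary code
// and remain active in the retail build.
void CheckHeader(Header const & header, std::uint64_t fileSize)
{
    if (fileSize < kHeaderSize)
        throw CorruptData("file too short for header");
    if (header.magic != kMagic)
        throw CorruptData("bad magic number");
    if (header.recordCount > (fileSize - kHeaderSize) / kRecordSize)
        throw CorruptData("record count exceeds file size");
}

// Internal helper called only with an already validated header.
// A bad index here is a bug in our own component, so an assertion
// (compiled out when NDEBUG is defined) is the appropriate tool.
std::uint64_t RecordOffset(Header const & header, std::uint32_t index)
{
    assert(index < header.recordCount);
    return kHeaderSize + index * kRecordSize;
}
```

A caller would run CheckHeader once, right after reading the header, and catch CorruptData at whatever level is able to report the problem or recover from it. Everything below that point can rely on assertions alone.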
Top-Down Object Oriented Implementation

The implementation process should model the design process as closely as possible. This is why implementation should start with the top level components. The earlier we find that the top level interfaces need modification, the better. Besides, we need a working program for testing as soon as possible.

The goal of the original implementation effort is to test the flow of control, lifetimes and accessibility of top level objects as well as initialization and shutdown processes. At this stage the program is not supposed to do anything useful, it cannot be demoed, it is not a prototype. If the management needs a prototype, it should be implemented by a separate group, possibly using a different language (Basic, Smalltalk, etc.). Trying to reuse code written for the prototype in the main project is usually a big mistake.
Only basic functionality, the part that's necessary for the program to make progress, is implemented at that point. Everything else is stubbed out. Stubs of class methods should only print debugging messages and display their arguments if they make sense. The debugging and error handling harness should be put in place and tested. If the program is interactive, we implement as much of the View and the Controller as is necessary to get the information flowing towards the model and showing in some minimal view. The model can be stubbed out completely.

Once the working skeleton of the program is in place, we can start implementing lower level objects. At every stage we repeat the same basic procedure. We first create stubs of all objects at that level, then test their interfaces and interactions. We continue the descent until we hit the bottom of the project, at which point we start implementing some "real" functionality. The goal is for the lowest level components to fit right into the whole structure. They should snap into place, get control when appropriate, get called with the right arguments, return the right stuff.

This strategy produces professional programs of uniform quality, with components that fit together very tightly and efficiently, as in a well designed sports car. Conversely, the bottom-up implementation creates programs whose parts are of widely varying quality, put together using scotch tape and strings. A lot of programmers' time is spent trying to fit square pegs into round holes. The result resembles anything but a well designed sports car.
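A first-pass skeleton of the kind described in this section might look like the sketch below. All the names (DebugLog, Model, View, Controller, AddItem, OnAddCommand) are hypothetical and chosen only for illustration. The Controller already routes a command to the right places, while the Model and the View are stubs that do nothing except report that they were called and with what arguments.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical debugging harness: every stub reports through it.
class DebugLog
{
public:
    explicit DebugLog(std::ostream & out) : _out(out) {}
    void Trace(std::string const & msg) { _out << "[stub] " << msg << '\n'; }
private:
    std::ostream & _out;
};

// The model is stubbed out completely: it only shows its arguments.
class Model
{
public:
    explicit Model(DebugLog & log) : _log(log) {}
    void AddItem(std::string const & name, int quantity)
    {
        std::ostringstream msg;
        msg << "Model::AddItem(" << name << ", " << quantity << ")";
        _log.Trace(msg.str());
    }
private:
    DebugLog & _log;
};

// Minimal view: enough to see that a refresh was requested.
class View
{
public:
    explicit View(DebugLog & log) : _log(log) {}
    void Refresh() { _log.Trace("View::Refresh()"); }
private:
    DebugLog & _log;
};

// The controller holds the real flow of control we want to test.
class Controller
{
public:
    Controller(Model & model, View & view) : _model(model), _view(view) {}
    void OnAddCommand(std::string const & name, int quantity)
    {
        assert(quantity > 0);   // contract with the caller
        _model.AddItem(name, quantity);
        _view.Refresh();
    }
private:
    Model & _model;
    View & _view;
};
```

Constructing the log over std::cerr (or a string stream in a test) and issuing a single command is enough to confirm that control reaches the model first and the view second. When the stubs are later replaced with real implementations, the interfaces exercised here need not change.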
Inheriting Somebody Else's Code

In the ideal world (from the programmer's point of view) every project would start from scratch and have no external dependencies. Once in a while such a situation happens and this is when real progress is made. New languages, new programming methodologies, new team structures can be applied and tested.

In the real world most projects inherit some source code, usually written using obsolete programming techniques, with its own model for error handling, debugging, use or misuse of global objects, goto's, spaghetti code, functions that go on for pages and pages, etc. Most projects have external dependencies--some code, tools, or libraries are being developed by external groups. Worst of all, those groups have different goals: they have to ship their own product, compete in the marketplace, etc. Sure, they are always enthusiastic about having their code or tool used by another group and they promise continuing support. Unfortunately they have different priorities. Make sure your manager has some leverage over their manager.

If you have full control over inherited code, plan on rewriting it step by step. Go through a series of code reviews to find out which parts will cause the most problems and rewrite them first. Then do parallel development, interlacing rewrites with the development of new code. The effort will pay back in terms of debugging time and overall code quality.
Multi-Platform Development

A lot of programs are developed for multiple platforms at once. The platforms could differ in hardware or in the set of APIs. Operating systems and computers evolve--at any point in time there is the obsolete platform, the most popular platform, and the platform of the future. Sometimes the target platform is different from the development platform. In any case, the platform-dependent things should be abstracted and separated into layers.
The operating system is supposed to provide an abstraction layer that separates applications from the hardware. Except for very specialized applications, access to disk is very well abstracted into the file system. In windowing systems, graphics and user input are abstracted into APIs. Our program should do the same with the platform-dependent services--abstract them into layers. A layer is a set of services through which our application can access some lower level functionality. The advantage of layering is that we can tweak the implementation without having to modify the code that uses it. Moreover, we can add new implementations or switch from one to another using a compile-time variable. Sometimes a platform doesn't provide or even need the functionality provided by other platforms. For instance, in a non-multitasking system one doesn't need semaphores. Still one can provide a locking system whose implementation can be switched on and off, depending on the platform.

We can construct a layer by creating a set of classes that abstract some functionality. For instance, memory mapped files can be combined with buffered files under one abstraction. It is advisable that the implementation choices be made in such a way that the platform-of-the-future implementation be the most efficient one.

It is worth noticing that if the platforms differ by the sizes of basic data types, such as 16-bit vs. 32-bit integers, one should be extremely careful with the design of the persistent data structures and data types that can be transmitted over the wire. The foolproof method would be to convert all fundamental data types into strings of bytes of well defined length. In this way we could even resolve the Big Endian vs. Little Endian differences.
This solution is not always acceptable though, because of the runtime overhead. A tradeoff is made to either support only those platforms where the sizes of shorts and longs are compatible (and the Endians are the same), or provide conversion programs that can translate persistent data from one format to another. In any case it is a good idea to avoid using ints, whose size varies from platform to platform, inside data types that are stored on disk or passed over the wire.
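The conversion of fundamental types into byte strings of well defined length can be sketched as follows. This is a minimal illustration, assuming a 32-bit unsigned value and a little-endian byte order on disk. The names Bytes, PutU32 and GetU32 are hypothetical, and the fixed-width type std::uint32_t from the standard header <cstdint> takes the place of a platform-dependent int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Append a 32-bit value as exactly four bytes, least significant first.
// The result is the same whatever the byte order of the host machine,
// because only shifts and masks on the value are used.
void PutU32(Bytes & out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFF));
}

// Read back four bytes starting at offset and reassemble the value.
std::uint32_t GetU32(Bytes const & in, std::size_t offset)
{
    assert(offset + 4 <= in.size());   // caller must have checked the length
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
    return value;
}
```

The per-byte loops are where the runtime overhead mentioned above comes from. A project that supports only platforms with identical layouts could write the whole structure in one call instead, at the price of portability.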
Program Modifications

Modifications of existing code range from cosmetic changes, such as renaming a variable, to sweeping global changes and major rewrites. Small changes are often suggested during code reviews. The rule of thumb is that when you see too many local variables or objects within a single function, or too many parameters passed back and forth, the code is ripe for a new abstraction.

It is interesting to notice how the object oriented paradigm gets distorted at the highest and at the lowest levels. It is often difficult to come up with a good set of top level objects, and all too often the main function ends up being a large procedure. Conversely, at the bottom of the hierarchy there is no good tradition of using a lot of short-lived lightweight local objects. The top level situation is a matter of good or poor design; the bottom level situation depends a lot on the quality of code reviews. The above rule of thumb is of great help there. You should also be on the lookout for too much cut-and-paste code. If the same set of actions with only small modifications happens in many places, it may be time to look for a new abstraction.

Rewrites of small parts of the program happen, and they are a good sign of healthy development. Sometimes the rewrites are more serious. It could be abstracting a layer, in which case all the users of a given service have to be modified; or changing the higher level structure, in which case a lot of lower level structures are influenced. Fortunately the top-down object-oriented design makes such sweeping changes much easier to make. It is quite possible to split a top level object into more independent parts, or change the containment or access structure at the highest level (for example, move a sub-object from one object to another).

How is it done? The key is to make the changes incrementally, top-down. During the first pass, you change the interfaces and pass different sets of arguments--for instance, pass reference variables to those places that used to have direct access to some objects but are about to lose it. Make as few changes to the implementation as possible. Compile and test. In the second pass, move objects around and see if they have access to all the data they need. Compile and test. In the third pass, once you have all the objects in place and all the arguments at your disposal, start making the necessary implementation changes, step by step.
Testing

Testing starts at the same time as the implementation. At all times you must have a working program. You need it for your testing, your teammates need it for their testing. The functionality will not be there, but the program will run and will at least print some debugging output. As soon as there is some functionality, start regression testing.

Regression Testing

Develop a test suite to test the basic functionality of your system. After every change run it to see if you haven't broken any functionality. Expand the test suite to include basic tests of all new functionality. Running the regression suite should not take a long time.

Stress Testing

As soon as some functionality starts approaching its final form, stress testing should start. Unlike regression testing, stress testing is there to test the limits of the system. For instance, a comprehensive test of all possible failure scenarios, like out-of-memory errors in various places, disk failures, unexpected power-downs, etc., should be made. The scalability under heavy loads should be tested. Depending on the type of program, it could be processing lots of small files, or one extremely large file, or lots of requests, etc.
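One way to exercise out-of-memory errors "in various places" is to inject a simulated failure at every possible point in turn and check that the program state survives each one. The sketch below assumes a hypothetical FaultInjector class and a hypothetical operation AppendAll. The call to Checkpoint merely stands in for a real allocation that might fail; a production harness would hook the allocator itself.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <vector>

// Hypothetical fault injector: after FailAt(n), the n-th call to
// Checkpoint throws std::bad_alloc. A countdown of zero means disarmed.
class FaultInjector
{
public:
    void FailAt(int n) { _countdown = n; }
    void Checkpoint()
    {
        if (_countdown > 0 && --_countdown == 0)
            throw std::bad_alloc();
    }
private:
    int _countdown = 0;
};

// Operation under test: appends all of src to dest, or leaves dest
// untouched if anything fails (work is done on a copy, then swapped in).
void AppendAll(std::vector<std::string> & dest,
               std::vector<std::string> const & src,
               FaultInjector & faults)
{
    std::vector<std::string> work(dest);
    for (std::string const & s : src)
    {
        faults.Checkpoint();   // simulated allocation failure point
        work.push_back(s);
    }
    dest.swap(work);
}

// Fail at the 1st checkpoint, then the 2nd, and so on, until the
// operation finally succeeds. Returns the number of failures exercised.
int StressAppend(std::vector<std::string> const & src)
{
    FaultInjector faults;
    int failures = 0;
    for (int n = 1; ; ++n)
    {
        std::vector<std::string> dest{"existing"};
        faults.FailAt(n);
        try
        {
            AppendAll(dest, src, faults);
            assert(dest.size() == 1 + src.size());   // complete success
            return failures;
        }
        catch (std::bad_alloc const &)
        {
            assert(dest.size() == 1);                // nothing was lost
            ++failures;
        }
    }
}
```

The loop ends on its own once n exceeds the number of failure points, so the test needs no prior knowledge of how many allocations the operation performs.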