Refactoring Databases: Evolutionary Database Design 9780321293534, 0321293533


English Pages 356 Year 2006



Author / Uploaded
Scott W. Ambler
Pramodkumar J. Sadalage


Refactoring Databases: Evolutionary Database Design
By Scott W. Ambler, Pramod J. Sadalage
Publisher: Addison Wesley Professional
Pub Date: March 06, 2006
Print ISBN-10: 0-321-29353-3
Print ISBN-13: 978-0-321-29353-4
Pages: 384


Refactoring has proven its value in a wide range of development projects, helping software professionals improve system designs, maintainability, extensibility, and performance. Now, for the first time, leading agile methodologist Scott Ambler and renowned consultant Pramodkumar Sadalage introduce powerful refactoring techniques specifically designed for database systems. Ambler and Sadalage demonstrate how small changes to table structures, data, stored procedures, and triggers can significantly enhance virtually any database design, without changing semantics. You'll learn how to evolve database schemas in step with source code, and become far more effective in projects relying on iterative, agile methodologies.

This comprehensive guide and reference helps you overcome the practical obstacles to refactoring real-world databases by covering every fundamental concept underlying database refactoring. Using start-to-finish examples, the authors walk you through refactoring simple standalone database applications as well as sophisticated multi-application scenarios. You'll master every task involved in refactoring database schemas, and discover best practices for deploying refactorings in even the most complex production environments.

The second half of this book systematically covers five major categories of database refactorings. You'll learn how to use refactoring to enhance database structure, data quality, and referential integrity; and how to refactor both architectures and methods. This book provides an extensive set of examples built with Oracle and Java and easily adaptable for other languages, such as C#, C++, or VB.NET, and other databases, such as DB2, SQL Server, MySQL, and Sybase.

Using this book's techniques and examples, you can reduce waste, rework, risk, and cost and build database systems capable of evolving smoothly, far into the future.


Copyright
Praise for Refactoring Databases
The Addison-Wesley Signature Series
The Addison-Wesley Signature Series
About the Authors
Forewords
Preface
  Why Evolutionary Database Development?
  Agility in a Nutshell
  How to Read This Book
  About the Cover
Acknowledgments
Chapter 1. Evolutionary Database Development
  Section 1.1. Database Refactoring
  Section 1.2. Evolutionary Data Modeling
  Section 1.3. Database Regression Testing
  Section 1.4. Configuration Management of Database Artifacts
  Section 1.5. Developer Sandboxes
  Section 1.6. Impediments to Evolutionary Database Development Techniques
  Section 1.7. What You Have Learned
Chapter 2. Database Refactoring
  Section 2.1. Code Refactoring
  Section 2.2. Database Refactoring
  Section 2.3. Categories of Database Refactorings
  Section 2.4. Database Smells
  Section 2.5. How Database Refactoring Fits In
  Section 2.6. Making It Easier to Refactor Your Database Schema
  Section 2.7. What You Have Learned
Chapter 3. The Process of Database Refactoring
  Section 3.1. Verify That a Database Refactoring Is Appropriate
  Section 3.2. Choose the Most Appropriate Database Refactoring
  Section 3.3. Deprecate the Original Database Schema
  Section 3.4. Test Before, During, and After
  Section 3.5. Modify the Database Schema
  Section 3.6. Migrate the Source Data
  Section 3.7. Refactor External Access Program(s)
  Section 3.8. Run Your Regression Tests
  Section 3.9. Version Control Your Work
  Section 3.10. Announce the Refactoring
  Section 3.11. What You Have Learned
Chapter 4. Deploying into Production
  Section 4.1. Effectively Deploying Between Sandboxes
  Section 4.2. Applying Bundles of Database Refactorings
  Section 4.3. Scheduling Deployment Windows
  Section 4.4. Deploying Your System
  Section 4.5. Removing Deprecated Schema
  Section 4.6. What You Have Learned
Chapter 5. Database Refactoring Strategies
  Section 5.1. Smaller Changes Are Easier to Apply
  Section 5.2. Uniquely Identify Individual Refactorings
  Section 5.3. Implement a Large Change by Many Small Ones
  Section 5.4. Have a Database Configuration Table
  Section 5.5. Prefer Triggers over Views or Batch Synchronization
  Section 5.6. Choose a Sufficient Transition Period
  Section 5.7. Simplify Your Database Change Control Board (CCB) Strategy
  Section 5.8. Simplify Negotiations with Other Teams
  Section 5.9. Encapsulate Database Access
  Section 5.10. Be Able to Easily Set Up a Database Environment
  Section 5.11. Do Not Duplicate SQL
  Section 5.12. Put Database Assets Under Change Control
  Section 5.13. Beware of Politics
  Section 5.14. What You Have Learned
Online Resources
Chapter 6. Structural Refactorings
  Common Issues When Implementing Structural Refactorings
  Drop Column
  Drop Table
  Drop View
  Introduce Calculated Column
  Introduce Surrogate Key
  Merge Columns
  Merge Tables
  Move Column
  Rename Column
  Rename Table
  Rename View
  Replace LOB With Table
  Replace Column
  Replace One-To-Many With Associative Table
  Replace Surrogate Key With Natural Key
  Split Column
  Split Table
Chapter 7. Data Quality Refactorings
  Common Issues When Implementing Data Quality Refactorings
  Add Lookup Table
  Apply Standard Codes
  Apply Standard Type
  Consolidate Key Strategy
  Drop Column Constraint
  Drop Default Value
  Drop Non-Nullable
  Introduce Column Constraint
  Introduce Common Format
  Introduce Default Value
  Make Column Non-Nullable
  Move Data
  Replace Type Code With Property Flags
Chapter 8. Referential Integrity Refactorings
  Add Foreign Key Constraint
  Add Trigger For Calculated Column
  Access Program Update Mechanics
  Drop Foreign Key Constraint
  Introduce Cascading Delete
  Introduce Hard Delete
  Introduce Soft Delete
  Introduce Trigger For History
Chapter 9. Architectural Refactorings
  Add CRUD Methods
  Add Mirror Table
  Add Read Method
  Encapsulate Table With View
  Introduce Calculation Method
  Introduce Index
  Introduce Read-Only Table
  Migrate Method From Database
  Migrate Method To Database
  Replace Method(s) With View
  Replace View With Method(s)
  Use Official Data Source
Chapter 10. Method Refactorings
  Section 10.1. Interface Changing Refactorings
  Section 10.2. Internal Refactorings
Chapter 11. Transformations
  Insert Data
  Introduce New Column
  Introduce New Table
  Introduce View
  Update Data
The UML Data Modeling Notation
Glossary
References and Recommended Reading
List of Refactorings and Transformations
Index

Copyright

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales
(800) 382-3419
corpsales@pearsontechgroup.com

For sales outside the United States, please contact:

International Sales
internat [email protected]

Visit us on the Web: www.awprofessional.com

Library of Congress Cataloging-in-Publication Data

Ambler, Scott W., 1966-
Refactoring databases : evolutionary database design / Scott W. Ambler and Pramod J. Sadalage.
p. cm.
Includes index.
ISBN 0-321-29353-3 (hardback : alk. paper)
1. Database design. 2. Computer software--Development. 3. Evolutionary programming (Computer science) I. Sadalage, Pramod J. II. Title.
QA76.9.D26A52 2006
005.74--dc22
2005031959

Copyright © 2006 Scott W. Ambler and Pramodkumar J. Sadalage

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:

Pearson Education, Inc.
Rights and Contracts Department
75 Arlington Street, Suite 300
Boston, MA 02116
Fax: (617) 848-7047

Text printed in the United States on recycled paper at R. R. Donnelley in Crawfordsville, Indiana.

First printing, March 2006

Dedication

Scott: For Beverley, my lovely new bride.

Pramod: To the women I love most, Rupali and our daughter, Arula.

Praise for Refactoring Databases

"This groundbreaking book finally reveals why database schemas need not be difficult to change, why data need not be difficult to migrate, and why database professionals need not be overburdened by change requests from customers and developers. Evolutionary design is at the heart of agility. Ambler and Sadalage have now shown the world how to evolve agile databases. Bravo!"
Joshua Kerievsky, founder, Industrial Logic, Inc.; author, Refactoring to Patterns

"This book not only lays out the fundamentals for evolutionary database development, it provides many practical, detailed examples of database refactoring. It is a must read for database practitioners interested in agile development."
Doug Barry, president, Barry & Associates, Inc.; author of Web Services and Service-Oriented Architectures: The Savvy Manager's Guide

"Ambler and Sadalage have taken the bold step of tackling an issue that other writers have found so daunting. Not only have they addressed the theory behind database refactoring, but they have also explained step-by-step processes for doing so in a controlled and thoughtful manner. But what really blew me away were the more than 200 pages of code samples and deep technical details illustrating how to overcome specific database refactoring hurdles. This is not just another introductory book convincing people that an idea is a good one; this is a tutorial and technical reference book that developers and DBAs alike will keep near their computers. Kudos to the brave duo for succeeding where others have failed to even try."
Kevin Aguanno, senior project manager, IBM Canada Ltd.

"Anybody working on non-greenfield projects will recognize the value that Scott and Pramod bring to the software development life cycle with Refactoring Databases. The realities of dealing with existing databases is one that is tough to crack. Though much of the challenge can be cultural and progress can be held in limbo by strong-armed DBA tactics, this book shows how technically the refactoring and evolutionary development of a database can indeed be handled in an agile manner. I look forward to dropping off a copy on the desk of the next ornery DBA I run into."
Jon Kern

"This book is excellent. It is perfect for the data professional who needs to produce results in the world of agile development and object technology. A well-organized book, it shows the what, why, and how of refactoring databases and associated code. Like the best cookbook, I will use it often when developing and improving databases."
David R. Haertzen, editor, The Data Management Center, First Place Software, Inc.

"This excellent book brings the agile practice of refactoring into the world of data. It provides pragmatic guidance on both the methodology to refactoring databases within your organization and the details of how to implement individual refactorings. Refactoring Databases also articulates the importance of developers and DBAs working side by side. It is a must have reference for developers and DBAs alike."
Per Kroll, development manager, RUP, IBM; project lead, Eclipse Process Framework

"Scott and Pramod have done for database refactoring what Martin Fowler did for code refactoring. They've put together a coherent set of procedures you can use to improve the quality of your database. If you deal with databases, this book is for you."
Ken Pugh, author, Prefactoring

"It's past time for data people to join the agile ranks, and Ambler and Sadalage are the right persons to lead them. This book should be read by data modelers and administrators, as well as software teams. We have lived in different worlds for too long, and this book will help to remove the barriers dividing us."
Gary K. Evans, Agile Process evangelist, Evanetics, Inc.

"Evolutionary design and refactoring are already exciting, and with Refactoring Databases this gets even better. In this book, the authors share with us the techniques and strategies to refactor at the database level. Using these refactorings, database schemas can safely be evolved even after a database has been deployed into production. With this book, database is within reach of any developer."
Sven Gorts

"Database refactoring is an important new topic and this book is a pioneering contribution to the community."
Floyd Marinescu, creator of InfoQ.com and TheServerSide.com; author of EJB Design Patterns

The Addison-Wesley Signature Series

The Addison-Wesley Signature Series provides readers with practical and authoritative information on the latest trends in modern technology for computer professionals. The series is based on one simple premise: great books come from great authors. Books in the series are personally chosen by expert advisors, world-class authors in their own right. These experts are proud to put their signatures on the covers, and their signatures ensure that these thought leaders have worked [...] uniqueness. The expert signatures also symbolize a promise to our readers: you are reading a future classic.

The Addison-Wesley Signature Series Signers: Kent Beck and Martin Fowler

Kent Beck has pioneered people-oriented technologies like JUnit, Extreme Programming, and patterns for software development. Kent is interested in helping teams do well by doing good: finding a style of software development that simultaneously satisfies economic, aesthetic, emotional, and practical constraints. His books focus on touching the lives of the creators and users of software.

Martin Fowler has been a pioneer of object technology in enterprise applications. His central concern is how to design software well. He focuses on getting to the heart of how to build enterprise software that will last well into the future. He is interested in looking behind the specifics of technologies to the patterns, practices, and principles that last for many years; these books should be usable a decade from now. Martin's criterion is that these are books he wished he could write.

TITLES IN THE SERIES

Test-Driven Development: By Example
Kent Beck, ISBN 0321146530

User Stories Applied: For Agile Software Development
Mike Cohn, ISBN 0321205685

Refactoring Databases: Evolutionary Database Design
Scott W. Ambler and Pramodkumar J. Sadalage, ISBN 0321293533

Patterns of Enterprise Application Architecture
Martin Fowler, ISBN 032112720

Beyond Software Architecture: Creating and Sustaining Winning Solutions
Luke Hohmann, ISBN 0201775948

Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions
Gregor Hohpe and Bobby Woolf, ISBN 0321200683

Refactoring to Patterns
Joshua Kerievsky, ISBN 0321213351

For more information, check out the series web site at www.awprofessional.com

About the Authors

Scott W. Ambler is a software process improvement (SPI) consultant living just north of Toronto. He is founder and practice leader of the Agile Modeling (AM) (www.agilemodeling.com), Agile Data (AD) (www.agiledata.org), Enterprise Unified Process (EUP) (www.enterpriseunifiedprocess.com), and Agile Unified Process (AUP) (www.ambysoft.com/unifiedprocess) methodologies. Scott is the (co-)author of several books, including Agile Modeling (John Wiley & Sons, 2002), Agile Database Techniques (John Wiley & Sons, 2003), The Object Primer, Third Edition (Cambridge University Press, 2004), The Enterprise Unified Process (Prentice Hall, 2005), and The Elements of UML 2.0 Style (Cambridge University Press, 2005). Scott is a contributing editor with Software Development magazine (www.sdmagazine.com) and has spoken and keynoted at a wide variety of international conferences, including Software Development, UML World, Object Expo, Java Expo, and Application Development. Scott graduated from the University of Toronto with a Master of Information Science. In his spare time Scott studies the Goju Ryu and Kobudo styles of karate.

Pramod J. Sadalage is a consultant for ThoughtWorks, an enterprise application development and integration company. He first pioneered the practices and processes of evolutionary database design and database refactoring in 1999 while working on a large J2EE application using the Extreme Programming (XP) methodology. Since then, Pramod has applied the practices and processes to many projects. Pramod writes and speaks about database administration on evolutionary projects, the adoption of evolutionary processes with regard to databases, and evolutionary practices' impact upon database administration, in order to make it easy for everyone to use evolutionary design with regard to databases. When he is not working, you can find him spending time with his wife and daughter and trying to improve his running.

Forewords A decade ago refact oring was a word only known t o a few people, m ost ly in t he Sm allt alk com m unit y. I t 's been wonderful t o wat ch m ore and m ore people learn how t o use refact oring t o m odify working code in a disciplined and effect ive m anner. As a result m any people now see code refact oring as an essent ial part of soft ware developm ent . I live in t he world of ent erprise applicat ions, and a big part of ent erprise applicat ion developm ent is working wit h dat abases. I n m y original book on refact oring, I picked out dat abases as a m aj or problem area in refact oring because refact oring dat abases int roduces a new set of problem s. These problem s are exacerbat ed by t he sad division t hat 's developed in t he ent erprise soft ware world where dat abase professionals and soft ware developers are separat ed by a wall of m ut ual incom prehension and cont em pt . One of t he t hings I like about Scot t and Pram od is t hat , in different ways, t hey have bot h worked hard t o t ry and cross t his division. Scot t 's writ ings on dat abases have been a consist ent at t em pt t o bridge t he gap, and his work on obj ect - relat ional m apping has been a great influence on m y own writ ings on ent erprise applicat ion archit ect ure. Pram od m ay be less known, but his im pact has been j ust as great on m e. When he st art ed work on a proj ect wit h m e at Thought Works we were t old t hat refact oring of dat abases was im possible. Pram od rej ect ed t hat not ion, t aking som e sket chy ideas and t urning t hem int o a disciplined program t hat kept t he dat abase schem a in const ant , but cont rolled, m ot ion. This freed up t he applicat ion developers t o use evolut ionary design in t he code, t oo. Pram od has since t aken t hese t echniques t o m any of our client s, spreading t hem around our Thought Works colleagues and, at least for us, forever banishing dat abases from t he list of roadblocks t o cont inual design. 
This book assem bles t he lessons of t wo people who have lived in t he no- m ans land bet ween applicat ions and dat a, and present s a guide on how t o use refact oring t echniques for dat abases. I f you're fam iliar wit h refact oring, you'll not ice t hat t he m aj or change is t hat you have t o m anage cont inual m igrat ion of t he dat a it self, not j ust change t he program and dat a st ruct ures. This book t ells you how t o do t hat , backed by t he proj ect experience ( and scars) t hat t hese t wo have accum ulat ed. Much t hough I 'm delight ed by t he appearance of t his book, I also hope it 's only a first st ep. Aft er m y refact oring book appeared I was delight ed t o find sophist icat ed t ools appear t hat aut om at ed m any refact oring t asks. I hope t he sam e t hing happens wit h dat abases, and we begin t o see vendors offer t ools t hat m ake cont inual m igrat ions of schem a and dat a easier for everyone. Before t hat happens, t his book will help you build your own processes and t ools t o help; aft erward t his book will have last ing value as a foundat ion for using such t ools successfully. M a r t in Fow le r , se r ie s e dit or ; ch ie f scie n t ist , Th ou gh t W or k s I n t he years since I first began m y career in soft ware developm ent , m any aspect s of t he indust ry and t echnology have changed dram at ically. What hasn't changed, however, is t he fundam ent al nat ure of soft ware developm ent . I t has never been hard t o creat e soft warej ust get a com put er and st art churning out code. But it was hard t o creat e good soft ware, and exponent ially harder t o creat e great soft ware. This sit uat ion hasn't changed t oday. Today it is easier t o creat e larger and m ore com plex soft ware syst em s by cobbling t oget her part s from a variet y of sources, soft ware developm ent t ools have advanced in bounds, and we know a lot m ore about what works and doesn't work for t he process of creat ing soft ware. 
Yet m ost soft ware is st ill brit t le and st ruggling t o achieve accept able qualit y levels. Perhaps t his is because we are creat ing larger and m ore com plex syst em s, or perhaps it is because t here are fundam ent al gaps in t he t echniques st ill used. I believe t hat soft ware developm ent t oday rem ains as challenging as ever because of a com binat ion of t hese t wo fact ors. Fort unat ely, from t im e t o t im e new t echnologies and t echniques appear t hat can help. Am ong t hese advances, a rare few are have t he power t o im prove great ly our abilit y t o realize t he pot ent ial envisioned at t he st art of m ost proj ect s. The t echniques involved in refact oring, along wit h t heir associat e Agile m et hodologies, were one of t hese rare advances. The work cont ained in t his book ext ends t his base in a very

im port ant direct ion. Refact oring is a cont rolled t echnique for safely im proving t he design of code wit hout changing it s behavioral sem ant ics. Anyone can t ake a chance at im proving code, but refact oring brings a discipline of safely m aking changes ( wit h t est s) and leveraging t he knowledge accum ulat ed by t he soft ware developm ent com m unit y ( t hrough refact orings) . Since Fowler's sem inal book on t he subj ect , refact oring has been widely applied, and t ools assist ing wit h det ect ion of refact oring candidat es and applicat ion of refact orings t o code have driven widespread adopt ion. At t he dat a t ier of applicat ions, however, refact oring has proven m uch m ore difficult t o apply. Part of t his problem is no doubt cult ural, as t his book shows, but also t here has not been a clear process and set of refact orings applicable t o t he dat a t ier. This is really unfort unat e, since poor design at t he dat a level alm ost always t ranslat es int o problem s at t he higher t iers, t ypically causing a chain of bad designs in a fut ile effort t o st abilize t he shaky foundat ion. Furt her, t he inabilit y t o evolve t he dat a t ier, whet her due t o denial or fear of change, ham pers t he abilit y of all t hat rest s on it t o deliver t he best soft ware possible. These problem s are exact ly what m ake t his work so im port ant : we now have a process and cat alog for enabling it erat ive design im provem ent s on in t his vit al area. I am very excit ed t o see t he publicat ion of t his book, and hope t hat it drives t he creat ion of t ools t o support t he t echniques it describes. The soft ware indust ry is current ly in an int erest ing st age, wit h t he rise of open- source soft ware and t he collaborat ive vehicles it brings. Proj ect s such as t he Eclipse Dat a Tools Plat form are nat ural collect ion areas for t hose int erest ed in bringing dat abase refact oring t o life in t ools. 
I hope t he open- source com m unit y will work hard t o realize t his vision, because t he pot ent ial payoff is great . Soft ware developm ent will m ove t o t he next level of m at urit y when dat abase refact oring is as com m on and widely applied as general refact oring it self. Joh n Gr a h a m , Eclipse D a t a Tools Pla t for m , Pr oj e ct M a n a ge m e n t , com m it t e e ch a ir , se n ior st a ff e n gin e e r , Syba se , I n c. I n m any ways t he dat a com m unit y has m issed t he ent ire agile soft ware developm ent revolut ion. While applicat ion developers have em braced refact oring, t est - driven developm ent , and ot her such t echniques t hat encourage it erat ion as a product ive and advant ageous approach t o soft ware developm ent , dat a professionals have largely ignored and even insulat ed t hem selves from t hese t rends. This becam e clear t o m e early in m y career as an applicat ion developer at a large financial services inst it ut ion. At t hat t im e I had a cubicle sit uat ed right bet ween t he developm ent and dat abase t eam s. What I quickly learned was t hat alt hough t hey were only a few feet apart , t he cult ure, pract ices, and processes of each group were significant ly different . A cust om er request t o t he developm ent t eam m eant som e refact oring, a code check- in, and aggressive accept ance t est ing. A sim ilar request t o t he dat abase t eam m eant a form al change request processed t hrough m any levels of approval before even t he m odificat ion of a schem a could begin. The burden of t he process const ant ly led t o frust rat ions for bot h developers and cust om ers but persist ed because t he dat abase t eam knew no ot her way. But t hey m ust learn anot her way if t heir businesses are t o t hrive in t oday's ever- evolving com pet it ive landscape. The dat a com m unit y m ust som ehow adopt t he agile t echniques of t heir developer count erpart s. 
Refact oring Dat abases is an invaluable resource t hat shows dat a professionals j ust how t hey can leap ahead and confident ly, safely em brace change. Scot t and Pram od show how t he im provem ent in design t hat result s from sm all, it erat ive refact orings allow t he agile DBA t o avoid t he m ist ake of big upfront design and evolve t he schem a along wit h t he applicat ion as t hey gradually gain a bet t er underst anding of cust om er requirem ent s. Make no m ist ake, refact oring dat abases is hard. Even a sim ple change like renam ing a colum n cascades t hroughout a schem a, t o it s obj ect s, persist ence fram eworks, and applicat ion t ier, m aking it seem t o t he DBA like a very inaccessible t echnique. Refact oring Dat abases out lines a set of prescript ive pract ices t hat show t he professional DBA exact ly how t o bring t his agile m et hod int o t he design and developm ent of dat abases. Scot t 's and Pram od's at t ent ion t o t he m inut e det ails of what it t akes t o act ually im plem ent every dat abase refact oring

t echnique proves t hat it can be done and paves t he way for it s widespread adopt ion. Thus, I propose a call t o act ion for all dat a professionals. Read on, em brace change, and spread t he word. Dat abase refact oring is key t o im proving t he dat a com m unit y's agilit y. Sa ch in Re k h i, pr ogr a m m a n a ge r , M icr osoft Cor por a t ion I n t he world of syst em developm ent , t here are t wo dist inct cult ures: t he world dom inat ed by obj ect orient ed ( OO) developers who live and breat he Java and agile soft ware developm ent , and t he relat ional dat abase world populat ed by people who appreciat e careful engineering and solid relat ional dat abase design. These t wo groups speak different languages, at t end different conferences, and rarely seem t o be on speaking t erm s wit h each ot her. This schism is reflect ed wit hin I T depart m ent s in m any organizat ions. OO developers com plain t hat DBAs are st odgy conservat ives, unable t o keep up wit h t he rapid pace of change. Dat abase professionals bem oan t he idiocy of Java developers who do not have a clue what t o do wit h a dat abase. Scot t Am bler and Pram od Sadalage belong t o t hat rare group of people who st raddle bot h worlds. Refact oring Dat abases: Evolut ionary Dat abase Design is about dat abase design writ t en from t he perspect ive of an OO archit ect . As a result , t he book provides value t o bot h OO developers and relat ional dat abase professionals. I t will help OO developers t o apply agile code refact oring t echniques t o t he dat abase arena as well as give relat ional dat abase professionals insight int o how OO archit ect s t hink. This book includes num erous t ips and t echniques for im proving t he qualit y of dat abase design. I t explicit ly focuses on how t o handle real- world sit uat ions where t he dat abase already exist s but is poorly designed, or when t he init ial dat abase design failed t o produce a good m odel. 
The book succeeds on a number of different levels. First, it can be used as a tactical guide for developers in the trenches. It is also a thought-provoking treatise about how to merge OO and relational thinking. I wish more system architects echoed the sentiments of Ambler and Sadalage in recognizing that a database is more than just a place to put persistent copies of classes.

Dr. Paul Dorsey, president, Dulcian, Inc.; president, New York Oracle Users Group; chairperson, J2EE SIG

Preface

Evolutionary, and often agile, software development methodologies, such as Extreme Programming (XP), Scrum, the Rational Unified Process (RUP), the Agile Unified Process (AUP), and Feature-Driven Development (FDD), have taken the information technology (IT) industry by storm over the past few years. For the sake of definition, an evolutionary method is one that is both iterative and incremental in nature, and an agile method is evolutionary and highly collaborative in nature. Furthermore, agile techniques such as refactoring, pair programming, Test-Driven Development (TDD), and Agile Model-Driven Development (AMDD) are also making headway into IT organizations. These methods and techniques have been developed and have evolved in a grassroots manner over the years, being honed in the software trenches, as it were, instead of formulated in ivory towers. In short, this evolutionary and agile stuff seems to work incredibly well in practice.

In the seminal book Refactoring, Martin Fowler describes a refactoring as a small change to your source code that improves its design without changing its semantics. In other words, you improve the quality of your work without breaking or adding anything. In the book, Martin discusses the idea that just as it is possible to refactor your application source code, it is also possible to refactor your database schema. However, he states that database refactoring is quite hard because of the significant levels of coupling associated with databases, and therefore he chose to leave it out of his book.

Since 1999 when Refactoring was published, the two of us have found ways to refactor database schemas. Initially, we worked separately, running into each other at conferences such as Software Development (www.sdexpo.com) and on mailing lists (www.agiledata.org/feedback.html). We discussed ideas, attended each other's conference tutorials and presentations, and quickly discovered that our ideas and techniques overlapped and were highly compatible with one another. So we joined forces to write this book, to share our experiences and techniques at evolving database schemas via refactoring.

The examples throughout the book are written in Java, Hibernate, and Oracle code. Virtually every database refactoring description includes code to modify the database schema itself, and for some of the more interesting refactorings, we show the effects they would have on Java application code. Because not all databases are created alike, we include discussions of alternative implementation strategies when important nuances exist between database products. In some instances we discuss alternative implementations of an aspect of a refactoring using Oracle-specific features such as the SET UNUSED or RENAME TO commands, and many of our code examples take advantage of Oracle's COMMENT ON feature. Other database products include other features that make database refactoring easier, and a good DBA will know how to take advantage of these things. Better yet, in the future database refactoring tools will do this for us. Furthermore, we have kept the Java code simple enough so that you should be able to convert it to C#, C++, or even Visual Basic with little problem at all.

Why Evolutionary Database Development?

Evolutionary database development is a concept whose time has come. Instead of trying to design your database schema up front early in the project, you instead build it up throughout the life of a project to reflect the changing requirements defined by your stakeholders. Like it or not, requirements change as your project progresses. Traditional approaches have denied this fundamental reality and have tried to "manage change," a euphemism for preventing change, through various means. Practitioners of modern development techniques instead choose to embrace change and follow techniques that enable them to evolve their work in step with evolving requirements. Programmers have adopted techniques such as TDD, refactoring, and AMDD and have built new development tools to make this easy. As we have done this, we have realized that we also need techniques and tools to support evolutionary database development.

Advantages to an evolutionary approach to database development include the following:

1. You minimize waste. An evolutionary, just-in-time (JIT) approach enables you to avoid the inevitable wastage inherent in serial techniques when requirements change. Any early investment in detailed requirements, architecture, and design artifacts is lost when a requirement is later found to be no longer needed. If you have the skills to do the work up front, clearly you must have the skills to do the same work JIT.
2. You avoid significant rework. As you will see in Chapter 1, "Evolutionary Database Development," you should still do some initial modeling up front to think major issues through, issues that could potentially lead to significant rework if identified late in the project; you just do not need to investigate the details early.
3. You always know that your system works. With an evolutionary approach, you regularly produce working software, even if it is just deployed into a demo environment, which works. When you have a new, working version of the system every week or two, you dramatically reduce your project's risk.
4. You always know that your database design is the highest quality possible. This is exactly what database refactoring is all about: improving your schema design a little bit at a time.
5. You work in a compatible manner with developers. Developers work in an evolutionary manner, and if data professionals want to be effective members of modern development teams, they also need to choose to work in an evolutionary manner.
6. You reduce the overall effort. By working in an evolutionary manner, you only do the work that you actually need today and no more.

There are also several disadvantages to evolutionary database development:

1. Cultural impediments exist. Many data professionals prefer to follow a serial approach to software development, often insisting that some form of detailed logical and physical data models be created and baselined before programming begins. Modern methodologies have abandoned this approach as being too inefficient and risky, thereby leaving many data professionals in the cold. Worse yet, many of the "thought leaders" in the data community are people who cut their teeth in the 1970s and 1980s but who missed the object revolution of the 1990s, and thereby missed gaining experience in evolutionary development. The world changed, but they did not seem to change with it. As you will learn in this book, it is not only possible for data professionals to work in an evolutionary, if not agile, manner, it is in fact a preferable way to work.

2. Learning curve. It takes time to learn these new techniques, and even longer if you also need to change a serial mindset into an evolutionary one.
3. Tool support is still evolving. When Refactoring was published in 1999, no tools supported the technique. Just a few years later, every single integrated development environment (IDE) has code-refactoring features built right into it. At the time of this writing, there are no database refactoring tools in existence, although we do include all the code that you need to implement the refactorings by hand. Luckily, the Eclipse Data Tools Project (DTP) has indicated in their project prospectus the need to develop database-refactoring functionality in Eclipse, so it is only a matter of time before the tool vendors catch up.

Agility in a Nutshell

Although this is not specifically a book about agile software development, the fact is that database refactoring is a primary technique for agile developers. A process is considered agile when it conforms to the four values of the Agile Alliance (www.agilealliance.org). The values define preferences, not alternatives, encouraging a focus on certain areas but not eliminating others. In other words, whereas you should value the concepts on the right side, you should value the things on the left side even more. For example, processes and tools are important, but individuals and interactions are more important. The four agile values are as follows:

1. Individuals and interactions OVER processes and tools. The most important factors that you need to consider are the people and how they work together; if you do not get that right, the best tools and processes will not be of any use.
2. Working software OVER comprehensive documentation. The primary goal of software development is to create working software that meets the needs of its stakeholders. Documentation still has its place; written properly, it describes how and why a system is built, and how to work with the system.
3. Customer collaboration OVER contract negotiation. Only your customer can tell you what they want. Unfortunately, they are not good at this: they likely do not have the skills to exactly specify the system, nor will they get it right at first, and worse yet they will likely change their minds as time goes on. Having a contract with your customers is important, but a contract is not a substitute for effective communication. Successful IT professionals work closely with their customers, they invest the effort to discover what their customers need, and they educate their customers along the way.
4. Responding to change OVER following a plan. As work progresses on your system, your stakeholders' understanding of what they want changes, the business environment changes, and so does the underlying technology. Change is a reality of software development, and as a result, your project plan and overall approach must reflect your changing environment if it is to be effective.

How to Read This Book

The majority of this book, Chapters 6 through 11, consists of reference material that describes each refactoring in detail. The first five chapters describe the fundamental ideas and techniques of evolutionary database development, and in particular, database refactoring. You should read these chapters in order:

Chapter 1, "Evolutionary Database Development," overviews the fundamentals of evolutionary development and the techniques that support it. It summarizes refactoring, database refactoring, database regression testing, evolutionary data modeling via an AMDD approach, configuration management of database assets, and the need for separate developer sandboxes.

Chapter 2, "Database Refactoring," explores in detail the concepts behind database refactoring and why it can be so hard to do in practice. It also works through a database-refactoring example in both a "simple" single-application environment as well as in a complex, multi-application environment.

Chapter 3, "The Process of Database Refactoring," describes in detail the steps required to refactor your database schema in both simple and complex environments. With single-application databases, you have much greater control over your environment, and as a result need to do far less work to refactor your schema. In multi-application environments, you need to support a transition period in which your database supports both the old and new schemas in parallel, enabling the application teams to update and deploy their code into production.

Chapter 4, "Deploying into Production," describes the process behind deploying database refactorings into production. This can prove particularly challenging in a multi-application environment because the changes of several teams must be merged and tested.

Chapter 5, "Database Refactoring Strategies," summarizes some of the "best practices" that we have discovered over the years when it comes to refactoring database schemas. We also float a couple of ideas that we have been meaning to try out but have not yet been able to do so.

About the Cover

Each book in the Martin Fowler Signature Series has a picture of a bridge on the front cover. This tradition reflects the fact that Martin's wife is a civil engineer, who at the time the book series started worked on horizontal projects such as bridges and tunnels. This bridge is the Burlington Bay James N. Allan Skyway in Southern Ontario, which crosses the mouth of Hamilton Harbor. At this site are three bridges: the two in the picture and the Eastport Drive lift bridge, not shown. This bridge system is significant for two reasons. Most importantly it shows an incremental approach to delivery. The lift bridge originally bore the traffic through the area, as did another bridge that collapsed in 1952 after being hit by a ship. The first span of the Skyway, the portion in the front with the metal supports above the roadway, opened in 1958 to replace the lost bridge. Because the Skyway is a major thoroughfare between Toronto to the north and Niagara Falls to the south, traffic soon exceeded capacity. The second span, the one without metal supports, opened in 1985 to support the new load. Incremental delivery makes good economic sense in both civil engineering and in software development. The second reason we used this picture is that Scott was raised in Burlington, Ontario; in fact, he was born in Joseph Brant hospital, which is near the northern footing of the Skyway. Scott took the cover picture with a Nikon D70S.

Acknowledgments

We want to thank the following people for their input into the development of this book: Doug Barry, Gary Evans, Martin Fowler, Bernard Goodwin, Joshua Graham, Sven Gorts, David Hay, David Haertzen, Michelle Housely, Sriram Narayan, Paul Petralia, Sachin Rekhi, Andy Slocum, Brian Smith, Michael Thurston, Michael Vizdos, and Greg Warren.

In addition, Pramod wants to thank Irfan Shah, Narayan Raman, Anishek Agarwal, and my other teammates who constantly challenged my opinions and taught me a lot about software development. I also want to thank Martin for getting me to write, talk, and generally be active outside of ThoughtWorks; Kent Beck for his encouragement; my colleagues at ThoughtWorks who have helped me in numerous ways and make working fun; my parents Jinappa and Shobha who put a lot of effort in raising me; and Praveen, my brother, who since my childhood days has critiqued and improved the way I write.

Chapter 1. Evolutionary Database Development

Waterfalls are wonderful tourist attractions. They are spectacularly bad strategies for organizing software development projects.
Scott Ambler

Modern software processes, also called methodologies, are all evolutionary in nature, requiring you to work both iteratively and incrementally. Examples of such processes include Rational Unified Process (RUP), Extreme Programming (XP), Scrum, Dynamic System Development Method (DSDM), the Crystal family, Team Software Process (TSP), Agile Unified Process (AUP), Enterprise Unified Process (EUP), Feature-Driven Development (FDD), and Rapid Application Development (RAD), to name a few. Working iteratively, you do a little bit of an activity such as modeling, testing, coding, or deployment at a time, and then do another little bit, then another, and so on. This process differs from a serial approach in which you identify all the requirements that you are going to implement, then create a detailed design, then implement to that design, then test, and finally deploy your system. With an incremental approach, you organize your system into a series of releases rather than one big one.

Furthermore, many of the modern processes are agile, which for the sake of simplicity we will characterize as both evolutionary and highly collaborative in nature. When a team takes a collaborative approach, they actively strive to find ways to work together effectively; you should even try to ensure that project stakeholders such as business customers are active team members. Cockburn (2002) advises that you should strive to adopt the "hottest" communication technique applicable to your situation: Prefer face-to-face conversation around a whiteboard over a telephone call, prefer a telephone call over sending someone an e-mail, and prefer an e-mail over sending someone a detailed document. The better the communication and collaboration within a software development team, the greater your chance of success.

Although both evolutionary and agile ways of working have been readily adopted within the development community, the same cannot be said within the data community. Most data-oriented techniques are serial in nature, requiring the creation of fairly detailed models before implementation is "allowed" to begin. Worse yet, these models are often baselined and put under change management control to minimize changes. (If you consider the end results, this should really be called a change prevention process.) Therein lies the rub: Common database development techniques do not reflect the realities of modern software development processes. It does not have to be this way.

Our premise is that data professionals need to adopt the evolutionary techniques similar to those of developers. Although you could argue that developers should return to the "tried-and-true" traditional approaches common within the data community, it is becoming more and more apparent that the traditional ways just do not work well. In Chapter 5 of Agile & Iterative Development, Craig Larman (2004) summarizes the research evidence, as well as the overwhelming support among the thought leaders within the information technology (IT) community, in support of evolutionary approaches.
The bottom line is that the evolutionary and agile techniques prevalent within the development community work much better than the traditional techniques prevalent within the data community.

It is possible for data professionals to adopt evolutionary approaches to all aspects of their work, if they choose to do so. The first step is to rethink the "data culture" of your IT organization to reflect the needs of modern IT project teams. The Agile Data (AD) method (Ambler 2003) does exactly that, describing a collection of philosophies and roles for modern data-oriented activities. The philosophies reflect how data is one of many important aspects of business software, implying that developers need to become more adept at data techniques and that data professionals need to learn modern development technologies and skills. The AD method recognizes that each project team is unique and needs to follow a process tailored for their situation. The importance of looking beyond your current project to address enterprise issues is also stressed, as is the need for enterprise professionals such as operational database administrators and data architects to be flexible enough to work with project teams in an agile manner.

The second step is for data professionals, in particular database administrators, to adopt new techniques that enable them to work in an evolutionary manner. In this chapter, we briefly overview these critical techniques, and in our opinion the most important technique is database refactoring, which is the focus of this book. The evolutionary database development techniques are as follows:

1. Database refactoring. Evolve an existing database schema a small bit at a time to improve the quality of its design without changing its semantics.
2. Evolutionary data modeling. Model the data aspects of a system iteratively and incrementally, just like all other aspects of a system, to ensure that the database schema evolves in step with the application code.
3. Database regression testing. Ensure that the database schema actually works.
4. Configuration management of database artifacts. Your data models, database tests, test data, and so on are important project artifacts that should be managed just like any other artifact.
5. Developer sandboxes. Developers need their own working environments in which they can modify the portion of the system that they are building and get it working before they integrate their work with that of their teammates.

Let's consider each evolutionary database technique in detail.

1.1. Database Refactoring

Refactoring (Fowler 1999) is a disciplined way to make small changes to your source code to improve its design, making it easier to work with. A critical aspect of a refactoring is that it retains the behavioral semantics of your code: you neither add nor remove anything when you refactor; you merely improve its quality. An example refactoring would be to rename the getPersons() operation to getPeople(). To implement this refactoring, you must change the operation definition, and then change every single invocation of this operation throughout your application code. A refactoring is not complete until your code runs again as before.

Similarly, a database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. You could refactor either structural aspects of your database schema such as table and view definitions or functional aspects such as stored procedures and triggers. When you refactor your database schema, not only must you rework the schema itself, but also the external systems, such as business applications or data extracts, which are coupled to your schema. Database refactorings are clearly more difficult to implement than code refactorings; therefore, you need to be careful. Database refactoring is described in detail in Chapter 2, and the process of performing a database refactoring in Chapter 3.
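To make the idea concrete, here is a minimal sketch of a structural database refactoring: a column rename carried out so that external programs coupled to the old name keep working. It is an illustration only, not the book's own code. It uses Python's built-in sqlite3 module in place of the Java and Oracle stack, and the Customer table, its sample rows, and the trigger names are all hypothetical. The approach assumed here is to add the new column, copy the data across, and let triggers keep both columns synchronized while old and new applications coexist.

```python
import sqlite3

# Hypothetical starting schema: a Customer table with a poorly named FName column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FName TEXT);
    INSERT INTO Customer (CustomerID, FName) VALUES (1, 'Ada'), (2, 'Grace');
""")


def rename_with_transition(db):
    """Introduce FirstName next to FName and keep the two columns in sync."""
    db.executescript("""
        ALTER TABLE Customer ADD COLUMN FirstName TEXT;
        UPDATE Customer SET FirstName = FName;

        -- Programs that have not been updated still write FName only.
        CREATE TRIGGER SyncFromFName AFTER INSERT ON Customer
        WHEN NEW.FirstName IS NULL
        BEGIN
            UPDATE Customer SET FirstName = NEW.FName
            WHERE CustomerID = NEW.CustomerID;
        END;

        -- Programs that have been updated write FirstName only.
        CREATE TRIGGER SyncFromFirstName AFTER INSERT ON Customer
        WHEN NEW.FName IS NULL
        BEGIN
            UPDATE Customer SET FName = NEW.FirstName
            WHERE CustomerID = NEW.CustomerID;
        END;
    """)


rename_with_transition(conn)

# An old-style writer and a new-style writer, side by side.
conn.execute("INSERT INTO Customer (CustomerID, FName) VALUES (3, 'Linus')")
conn.execute("INSERT INTO Customer (CustomerID, FirstName) VALUES (4, 'Margaret')")

rows = conn.execute(
    "SELECT FName, FirstName FROM Customer ORDER BY CustomerID"
).fetchall()
for old_name, new_name in rows:
    print(old_name, new_name)
```

Both columns hold the same value for every row, so the informational semantics are preserved for readers of either name. Once every external program has moved to FirstName, the old column and the triggers could be dropped; that final step is omitted from this sketch.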

1.2. Evolutionary Data Modeling

Regardless of what you may have heard, evolutionary and agile techniques are not simply "code and fix" with a new name. You still need to explore requirements and to think through your architecture and design before you build it, and one good way of doing so is to model before you code. Figure 1.1 reviews the life cycle for Agile Model-Driven Development (AMDD) (Ambler 2004; Ambler 2002). With AMDD, you create initial, high-level models at the beginning of a project, models that overview the scope of the problem domain that you are addressing as well as a potential architecture to build to. One of the models that you typically create is a "slim" conceptual/domain model that depicts the main business entities and the relationships between them (Fowler and Sadalage 2003). Figure 1.2 depicts an example for a simple financial institution. The amount of detail shown in this example is all that you need at the beginning of a project; your goal is to think through major issues early in your project without investing in needless details right away; you can work through the details later on a just-in-time (JIT) basis.

Figure 1.1. The Agile Model-Driven Development (AMDD) life cycle.

Figure 1.2. Conceptual/domain model for a fictional financial institution using UML.

Your conceptual model will naturally evolve as your understanding of the domain grows, but the level of detail will remain the same. Details are captured within your object model (which could be your source code) and your physical data model. These models are guided by your conceptual domain model and are developed in parallel along with other artifacts to ensure consistency. Figure 1.3 depicts a detailed physical data model (PDM) that represents the extent of the model at the end of the third development cycle. If "cycle 0" was one week in length, a period of time typical for projects of less than one year, and development cycles are two weeks in length, this is the PDM that exists at the end of the seventh week on the project. The PDM reflects the data requirements, and any legacy constraints, of the project up until this point. The data requirements for future development cycles are modeled during those cycles on a JIT basis.

Figure 1.3. Detailed physical data model (PDM) using UML.

Evolutionary data modeling is not easy. You need to take legacy data constraints into account, and as we all know, legacy data sources are often nasty beasts that will maim an unwary software development project. Luckily, good data professionals understand the nuances of their organization's data sources, and this expertise can be applied on a JIT basis as easily as it could on a serial basis. You still need to apply intelligent data modeling conventions, just as Agile Modeling's Apply Modeling Standards practice suggests. A detailed example of evolutionary/agile data modeling is posted at www.agiledata.org/essays/agileDataModeling.html.

1.3. Database Regression Testing

To safely change existing software, either to refactor it or to add new functionality, you need to be able to verify that you have not broken anything after you have made the change. In other words, you need to be able to run a full regression test on your system. If you discover that you have broken something, you must either fix it or roll back your changes. Within the development community, it has become increasingly common for programmers to develop a full unit test suite in parallel with their domain code, and in fact agilists prefer to write their test code before they write their "real" code. Just like you test your application source code, shouldn't you also test your database? Important business logic is implemented within your database in the form of stored procedures, data validation rules, and referential integrity (RI) rules, business logic that clearly should be tested thoroughly.

Test-First Development (TFD), also known as Test-First Programming, is an evolutionary approach to development; you must first write a test that fails before you write new functional code. As depicted by the UML activity diagram of Figure 1.4, the steps of TFD are as follows:

Figure 1.4. A test-first approach to development.

1. Quickly add a test, basically just enough code so that your tests now fail.
2. Run your tests, often the complete test suite, although for the sake of speed you may decide to run only a subset, to ensure that the new test does in fact fail.
3. Update your functional code so that it passes the new test.
4. Run your tests again. If the tests fail, return to Step 3; otherwise, start over again.
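The four steps above can be sketched against a database as follows. This is an illustrative sketch only: it uses Python's built-in sqlite3 module rather than the book's Java and Oracle tooling, and the Account table, the rule that a balance may never be negative, and every function name are assumptions made for the example. The "functional code" being developed is a data validation rule expressed as a CHECK constraint.

```python
import sqlite3


def build_schema_v1():
    """The existing functional code: no rule about negative balances yet."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE Account ("
        "AccountID INTEGER PRIMARY KEY, Balance INTEGER NOT NULL)"
    )
    return db


def build_schema_v2():
    """Step 3: the functional code updated so that the new test passes."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE Account ("
        "AccountID INTEGER PRIMARY KEY, "
        "Balance INTEGER NOT NULL CHECK (Balance >= 0))"
    )
    return db


def rejects_negative_balance(db):
    """Step 1: the new test. It passes only if the schema refuses the row."""
    try:
        db.execute("INSERT INTO Account (AccountID, Balance) VALUES (1, -50)")
    except sqlite3.IntegrityError:
        return True
    return False


# Step 2: run the new test against the current schema and confirm it fails.
first_run = rejects_negative_balance(build_schema_v1())

# Step 4: run it again against the updated schema and confirm it passes.
second_run = rejects_negative_balance(build_schema_v2())

print("before the change:", "pass" if first_run else "fail")
print("after the change:", "pass" if second_run else "fail")
```

Seeing the test fail first matters: it shows that the test actually exercises the rule, so a later pass is evidence that the constraint works rather than evidence of a test that can never fail.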

The primary advantages of TFD are that it forces you to think through new functionality before you implement it (you're effectively doing detailed design), it ensures that you have testing code available to validate your work, and it gives you the courage to know that you can evolve your system because you know that you can detect whether you have "broken" anything as the result of the change. Just like having a full regression test suite for your application source code enables code refactoring, having a full regression test suite for your database enables database refactoring (Meszaros 2006).

Test-Driven Development (TDD) (Astels 2003; Beck 2003) is the combination of TFD and refactoring. You first write your code taking a TFD approach; then after it is working, you ensure that your design remains of high quality by refactoring it as needed. As you refactor, you must rerun your regression tests to verify that you have not broken anything.

An important implication is that you will likely need several unit testing tools, at least one for your database and one for each programming language used in external programs. The XUnit family of tools (for example, JUnit for Java, VBUnit for Visual Basic, NUnit for .NET, and OUnit for Oracle) luckily are free and fairly consistent with one another.
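As a rough illustration of rerunning regression tests around a refactoring, the sketch below runs one small suite against a schema before and after a change. It is written with Python's unittest and sqlite3 modules, which stand in for the XUnit tools named above. The Customer and Account tables, the added index, and all class and function names are hypothetical and chosen only for the example.

```python
import sqlite3
import unittest

# Hypothetical schema with one referential integrity (RI) rule.
SCHEMA = """
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Account (
    AccountID INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
    Balance INTEGER NOT NULL
);
"""

# Hypothetical schema change whose safety the suite should confirm.
REFACTORING = "CREATE INDEX AccountCustomerIdx ON Account (CustomerID);"


def build_database(refactored):
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces RI only on request
    db.executescript(SCHEMA)
    if refactored:
        db.executescript(REFACTORING)
    return db


class DatabaseRegressionTests(unittest.TestCase):
    refactored = False

    def setUp(self):
        self.db = build_database(self.refactored)
        self.db.execute("INSERT INTO Customer VALUES (1, 'Ada')")

    def tearDown(self):
        self.db.close()

    def test_account_requires_existing_customer(self):
        with self.assertRaises(sqlite3.IntegrityError):
            self.db.execute("INSERT INTO Account VALUES (10, 999, 0)")

    def test_account_for_known_customer_is_stored(self):
        self.db.execute("INSERT INTO Account VALUES (10, 1, 250)")
        total = self.db.execute(
            "SELECT SUM(Balance) FROM Account WHERE CustomerID = 1"
        ).fetchone()[0]
        self.assertEqual(total, 250)


class RefactoredDatabaseRegressionTests(DatabaseRegressionTests):
    """The same tests, run against the schema after the change."""
    refactored = True


def run_suite(case):
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(verbosity=0).run(suite)


before = run_suite(DatabaseRegressionTests)
after = run_suite(RefactoredDatabaseRegressionTests)
print("safe to keep the change:", before.wasSuccessful() and after.wasSuccessful())
```

Running identical tests on both sides of the change is the point of the sketch: if the second run fails where the first passed, the change altered behavior and should be fixed or rolled back.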

1.4. Configuration Management of Database Artifacts

Sometimes a change to your system proves to be a bad idea and you need to roll back that change to the previous state. For example, renaming the Customer.FName column to Customer.FirstName might break 50 external programs, and the cost to update those programs may prove to be too great for now. To enable database refactoring, you need to put the following items under configuration management control:

- Data definition language (DDL) scripts to create the database schema
- Data load/extract/migration scripts
- Data model files
- Object/relational mapping meta data
- Reference data
- Stored procedure and trigger definitions
- View definitions
- Referential integrity constraints
- Other database objects like sequences, indexes, and so on
- Test data
- Test data generation scripts
- Test scripts
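One way to picture why versioned DDL scripts make rollback possible is the sketch below, which pairs each schema change with the statement that undoes it. This is an assumption-laden illustration, not a tool described in the text: the CHANGES list stands in for script files kept under version control, the migrate function is hypothetical, and Python's sqlite3 module replaces the Oracle database. The RENAME COLUMN statement assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical change log. In practice each entry would be a script file
# kept under configuration management alongside the application code.
CHANGES = [
    {
        "apply": "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FName TEXT)",
        "rollback": "DROP TABLE Customer",
    },
    {
        "apply": "ALTER TABLE Customer RENAME COLUMN FName TO FirstName",
        "rollback": "ALTER TABLE Customer RENAME COLUMN FirstName TO FName",
    },
]


def customer_columns(db):
    """Column names of Customer, in declaration order."""
    return [row[1] for row in db.execute("PRAGMA table_info(Customer)")]


def migrate(db, current, target):
    """Move the schema from version `current` to version `target`.

    A version is simply the number of changes applied so far.
    """
    if target > current:
        for change in CHANGES[current:target]:
            db.execute(change["apply"])
    else:
        for change in reversed(CHANGES[target:current]):
            db.execute(change["rollback"])
    return target


db = sqlite3.connect(":memory:")

version = migrate(db, 0, 2)            # apply both changes
after_rename = customer_columns(db)

version = migrate(db, version, 1)      # the rename proved too costly: undo it
after_rollback = customer_columns(db)

print(after_rename)
print(after_rollback)
```

Because every change carries its own undo step, backing out the rename becomes a routine operation rather than an emergency repair.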

1.5. Developer Sandboxes

A "sandbox" is a fully functioning environment in which a system may be built, tested, and/or run. You want to keep your various sandboxes separated for safety reasons: developers should be able to work within their own sandbox without fear of harming other efforts, your quality assurance/test group should be able to run their system integration tests safely, and your end users should be able to run their systems without having to worry about developers corrupting their source data and/or system functionality. Figure 1.5 depicts a logical organization for your sandboxes; we say that it is logical because a large/complex environment may have seven or eight physical sandboxes, whereas a small/simple environment may only have two or three physical sandboxes.

Figure 1.5. Logical sandboxes to provide developers with safety.

To successfully refactor your database schema, developers need to have their own physical sandboxes to work in, a copy of the source code to evolve, and a copy of the database to work with and evolve. By having their own environment, they can safely make changes, test them, and either adopt or back out of them. When they are satisfied that a database refactoring is viable, they promote it into their shared project environment, test it, and put it under change management control so that the rest of the team gets it. Eventually, the team promotes their work, including all database refactorings, into any demo and/or preproduction testing environments. This promotion often occurs once a development cycle, but could occur more or less often depending on your environment. (The more often you promote your system, the greater the chance of receiving valuable feedback.) Finally, after your system passes acceptance and system testing, it will be deployed into production. Chapter 4, "Deploying into Production," covers this promotion/deployment process in greater detail.

1.6. Impediments to Evolutionary Database Development Techniques

We would be remiss if we did not discuss the common impediments to adopting the techniques described in this book. The first impediment, and the hardest one to overcome, is cultural. Many of today's data professionals began their careers in the 1970s and early 1980s when "code-and-fix" approaches to development were common. The IT community recognized that this approach resulted in low-quality, difficult-to-maintain code and adopted the heavy, structured development techniques that many still follow today. Because of these experiences, the majority of data professionals believed that the evolutionary techniques introduced by the object technology revolution of the 1990s were just a rehash of the code-and-fix approaches of the 1970s; to be fair, many object practitioners did in fact choose to work that way. They have chosen to equate evolutionary approaches with low quality; but as the agile community has shown, this does not have to be the case. The end result is that the majority of data-oriented literature appears to be mired in the traditional, serial thought processes of the past and has mostly missed agile approaches. The data community has a lot of catching up to do, and that is going to take time.

The second impediment is a lack of tooling, although open source efforts (at least within the Java community) are quickly filling in the gaps. Although a lot of effort has been put into the development of object/relational (O/R) mapping tools, and some into database testing tools, there is still a lot of work to be done. Just like it took several years for programming tool vendors to implement refactoring functionality within their tools (in fact, now you would be hard pressed to find a modern integrated development environment (IDE) that does not offer such features), it will take several years for database tool vendors to do the same. Clearly, a need exists for usable, flexible tools that enable evolutionary development of a database schema. The open source community is clearly starting to fill that gap, and we suspect that the commercial tool vendors will eventually do the same.

1.7. What You Have Learned

Evolutionary approaches to development that are iterative and incremental in nature are the de facto standard for modern software development. When a project team decides to take this approach to development, everyone on that team must work in an evolutionary manner, including the data professionals. Luckily, evolutionary techniques exist that enable data professionals to work in an evolutionary manner. These techniques include database refactoring, evolutionary data modeling, database regression testing, configuration management of data-oriented artifacts, and separate developer sandboxes.

Chapter 2. Database Refactoring

As soon as one freezes a design, it becomes obsolete.
Fred Brooks

This chapter overviews the fundamental concepts behind database refactoring, explaining what it is, how it fits into your development efforts, and why it is often hard to do successfully. In the following chapters, we describe in detail the actual process of refactoring your database schema.

2.1. Code Refactoring

In Refactoring, Martin Fowler (1999) describes the programming technique called refactoring, which is a disciplined way to restructure code in small steps. Refactoring enables you to evolve your code slowly over time, to take an evolutionary (iterative and incremental) approach to programming. A critical aspect of a refactoring is that it retains the behavioral semantics of your code. You do not add functionality when you are refactoring, nor do you take it away. A refactoring merely improves the design of your code, nothing more and nothing less. For example, in Figure 2.1 we apply the Push Down Method refactoring to move the calculateTotal() operation from Offering into its subclass Invoice. This change looks easy on the surface, but you may also need to change the code that invokes this operation to work with Invoice objects rather than Offering objects. After you have made these changes, you can say you have truly refactored your code because it works again as before.

Figure 2.1. Pushing a method down into a subclass.

Clearly, you need a systematic way to refactor your code, including good tools and techniques to do so. Most modern integrated development environments (IDEs) now support code refactoring to some extent, which is a good start. However, to make refactoring work in practice, you also need to develop an up-to-date regression-testing suite that validates that your code still works; you will not have the confidence to refactor your code if you cannot be reasonably assured that you have not broken it.

Many agile developers, and in particular Extreme Programmers (XPers), consider refactoring to be a primary development practice. It is just as common to refactor a bit of code as it is to introduce an if statement or a loop. You should refactor your code mercilessly because you are most productive when you are working on high-quality source code. When you have a new feature to add to your code, the first question that you should ask is "Is this code the best design possible that enables me to add this feature?" If the answer is yes, add the feature. If the answer is no, first refactor your code to make it the best design possible, and then add the feature. On the surface, this sounds like a lot of work; in practice, however, if you start with high-quality source code, and then refactor it to keep it so, you will find that this approach works incredibly well.

2.2. Database Refactoring

A database refactoring (Ambler 2003) is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. In other words, you cannot add new functionality or break existing functionality, nor can you add new data or change the meaning of existing data. From our point of view, a database schema includes both structural aspects, such as table and view definitions, and functional aspects, such as stored procedures and triggers. From this point forward, we use the term code refactoring to refer to traditional refactoring as described by Martin Fowler and database refactoring to refer to the refactoring of database schemas. The process of database refactoring, described in detail in Chapter 3, is the act of making these simple changes to your database schema.

Database refactorings are conceptually more difficult than code refactorings: Code refactorings only need to maintain behavioral semantics, whereas database refactorings must also maintain informational semantics. Worse yet, database refactorings can become more complicated by the amount of coupling resulting from your database architecture, overviewed in Figure 2.2. Coupling is a measure of the dependence between two items; the more highly coupled two things are, the greater the chance that a change in one will require a change in another. The single-application database architecture is the simplest situation: your application is the only one interacting with your database, enabling you to refactor both in parallel and deploy both simultaneously. These situations do exist and are often referred to as standalone applications or stovepipe systems.

The second architecture is much more complicated because you have many external programs interacting with your database, some of which are beyond the scope of your control. In this situation, you cannot assume that all the external programs will be deployed at once, and must therefore support a transition period (also referred to as a deprecation period) during which both the old schema and the new schema are supported in parallel. More on this later.

Figure 2.2. The two categories of database architecture.

Although we discuss the single-application environment throughout the book, we focus more on the multi-application environment, in which your database currently exists in production and is accessed by many other external programs over which you have little or no control. Don't worry. In Chapter 3, we describe strategies for working in this sort of situation.

To put database refactoring into context, let's step through a quick example. You have been working on a banking application for a few weeks and have noticed something strange about the Customer and Account tables depicted in Figure 2.3. Does it really make sense that the Balance column be part of the Customer table? No, so let's apply the Move Column (page 103) refactoring to improve our database design.

Figure 2.3. The initial database schema for Customer and Account.

2.2.1. Single-Application Database Environments

Let's start by working through an example of moving a column from one table to another within a single-application database environment. This is the simplest situation that you will ever be in, because you have complete control over both the database schema and the application source code that accesses it. The implication is that you can refactor both your database schema and your application code simultaneously. You do not need to support both the original and new database schemas in parallel because only the one application accesses your database.

In this scenario, we suggest that two people work together as a pair; one person should have application programming skills, and the other database development skills, and ideally both people have both sets of skills. This pair begins by determining whether the database schema needs to be refactored. Perhaps the programmer is mistaken about the need to evolve the schema, or about how best to go about the refactoring. The refactoring is first developed and tested within the developer's sandbox. When it is finished, the changes are promoted into the project-integration environment, and the system is rebuilt, tested, and fixed as needed.

To apply the Move Column (page 103) refactoring in the development sandbox, the pair first runs all the tests to see that they pass. Next, they write a test because they are taking a Test-Driven Development (TDD) approach. A likely test is to access a value in the Account.Balance column. After running the tests and seeing them fail, they introduce the Account.Balance column, as you see in Figure 2.4. They rerun the tests and see that the tests now pass. They then refactor the existing tests that verify customer deposits so that they work with the Account.Balance column rather than the Customer.Balance column. They see that these tests fail, and therefore rework the deposit functionality to work with Account.Balance. They make similar changes to other code within the test suite and the application, such as withdrawal logic, that currently works with Customer.Balance.

Figure 2.4. The final database schema for Customer and Account.

After the application is running again, they then back up the data in Customer.Balance, for safety purposes, and then copy the data from Customer.Balance into the appropriate row of Account.Balance. They rerun their tests to verify that the data migration has safely occurred. To complete the schema changes, the final step is to drop the Customer.Balance column and then rerun all tests and fix anything as necessary. When they finish doing so, they promote their changes into the project-integration environment as described earlier.
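The schema steps just described (introduce Account.Balance, back up, copy, drop Customer.Balance) can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not the book's own implementation. The column lists, the sample rows, the CustomerBalanceBackup table, and the function names are assumptions made for the example, and it assumes one account per customer. The old column is dropped by rebuilding the table, which works on any SQLite version.

```python
import sqlite3

def build_initial_schema(conn):
    # Hypothetical starting point: Balance still lives on Customer.
    conn.executescript("""
        CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, Balance NUMERIC);
        CREATE TABLE Account (AccountID INTEGER PRIMARY KEY, CustomerID INTEGER);
        INSERT INTO Customer VALUES (1, 'Ann', 150), (2, 'Bob', 75);
        INSERT INTO Account VALUES (10, 1), (20, 2);
    """)

def move_balance_to_account(conn):
    """Move Customer.Balance to Account.Balance (assumes one account per customer)."""
    cur = conn.cursor()
    # Step 1: introduce the new column on the target table.
    cur.execute("ALTER TABLE Account ADD COLUMN Balance NUMERIC")
    # Step 2: back up the source data before touching it.
    cur.execute(
        "CREATE TABLE CustomerBalanceBackup AS "
        "SELECT CustomerID, Balance FROM Customer"
    )
    # Step 3: copy each customer's balance into the matching account row.
    cur.execute("""
        UPDATE Account
        SET Balance = (SELECT c.Balance FROM Customer c
                       WHERE c.CustomerID = Account.CustomerID)
    """)
    # Step 4: drop Customer.Balance by rebuilding the table without it.
    cur.execute("CREATE TABLE CustomerNew (CustomerID INTEGER PRIMARY KEY, FirstName TEXT)")
    cur.execute("INSERT INTO CustomerNew SELECT CustomerID, FirstName FROM Customer")
    cur.execute("DROP TABLE Customer")
    cur.execute("ALTER TABLE CustomerNew RENAME TO Customer")
    conn.commit()
```

In a real sandbox, the pair would rerun the regression tests between these steps rather than applying them all in one call.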

2.2.2. Multi-Application Database Environments

This situation is more difficult because the individual applications have new releases deployed at different times over the next year and a half. To implement this database refactoring, you do the same sort of work that you did for the single-application database environment, except that you do not delete the Customer.Balance column right away. Instead, you run both columns in parallel during a "transition period" of at least 1.5 years to give the development teams time to update and redeploy all of their applications. This portion of the database schema during the transition period is shown in Figure 2.5. Notice how there are two triggers, SynchronizeCustomerBalance and SynchronizeAccountBalance, which are run in production during the transition period to keep the two columns in sync.

Figure 2.5. The database schema during the transition period.

Why such a long period of time for the transition period? Because some applications currently are not being worked on, whereas other applications are following a traditional development life cycle and only release every year or so. Your transition period must take into account the slow teams as well as the fast ones. Furthermore, because you cannot count on the individual applications to update both columns, you need to provide a mechanism such as triggers to keep their values synchronized. There are other options to do this, such as views or synchronization after the fact, but as we discuss in Chapter 5, "Database Refactoring Strategies," we find that triggers work best.

After the transition period, you remove the original column plus the trigger(s), resulting in the final database schema of Figure 2.4. You remove these things only after sufficient testing to ensure that it is safe to do so. At this point, your refactoring is complete. In Chapter 3, we work through implementing this example in detail.
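As a rough illustration of the transition period, the sketch below keeps the two Balance columns synchronized with a pair of triggers, using Python's built-in sqlite3 module. The trigger names come from the text; everything else is an assumption for the example, including the SQLite trigger syntax, the column lists, which trigger sits on which table, one account per customer, and the end_transition helper. Each trigger only updates rows whose value actually differs, so the two triggers cannot bounce updates back and forth indefinitely.

```python
import sqlite3

TRANSITION_SCHEMA = """
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, Balance NUMERIC);
CREATE TABLE Account  (AccountID INTEGER PRIMARY KEY, CustomerID INTEGER, Balance NUMERIC);

-- Not-yet-updated applications still write Customer.Balance; push it to Account.
CREATE TRIGGER SynchronizeCustomerBalance
AFTER UPDATE OF Balance ON Customer
BEGIN
    UPDATE Account SET Balance = NEW.Balance
    WHERE CustomerID = NEW.CustomerID AND Balance IS NOT NEW.Balance;
END;

-- Updated applications write Account.Balance; push it back to Customer.
CREATE TRIGGER SynchronizeAccountBalance
AFTER UPDATE OF Balance ON Account
BEGIN
    UPDATE Customer SET Balance = NEW.Balance
    WHERE CustomerID = NEW.CustomerID AND Balance IS NOT NEW.Balance;
END;
"""

def end_transition(conn):
    """Hypothetical cleanup once every application has been redeployed."""
    conn.executescript("""
        DROP TRIGGER SynchronizeCustomerBalance;
        DROP TRIGGER SynchronizeAccountBalance;
    """)
    # Dropping the original Customer.Balance column would follow here.
```

A production version would also need to cover inserts and deletes, which this sketch leaves out for brevity.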

2.2.3. Maintaining Semantics

When you refactor a database schema, you must maintain both the informational and behavioral semantics; you should neither add anything nor take anything away. Informational semantics refers to the meaning of the information within the database from the point of view of the users of that information. Preserving the informational semantics implies that if you change the values of the data stored in a column, the clients of that information should not be affected by the change. Consider, for example, applying the Introduce Common Format (page 183) database refactoring to a character-based phone number column to transform data such as (416) 555-1234 and 905.555.1212 into 4165551234 and 9055551212, respectively. Although the format has been improved, requiring simpler code to work with the data, from a practical point of view the true information content has not. Note that you would still choose to display phone numbers in (XXX) XXX-XXXX format; you just would not store the information in that manner.

Focusing on practicality is a critical issue when it comes to database refactoring. Martin Fowler likes to talk about the issue of "observable behavior" when it comes to code refactoring, his point being that with many refactorings you cannot be completely sure that you have not changed the semantics in some small way, that all you can hope for is to think it through as best you can, to write what you believe to be sufficient tests, and then run those tests to verify that the semantics have not changed. In our experience, a similar issue exists when it comes to preserving informational semantics when refactoring a database schema: changing (416) 555-1234 to 4165551234 may in fact have changed the semantics of that information for an application in some slightly nuanced way that we do not know about. For example, perhaps a report exists that somehow only works with data rows that have phone numbers in the (XXX) XXX-XXXX format, and the report relies on that fact. Now the report is outputting numbers in the XXXXXXXXXX format, making it harder to read, even though from a practical sense the same information is still being output. When the problem is eventually discovered, the report may need to be updated to reflect the new format.

Similarly, with respect to behavioral semantics, the goal is to keep the black-box functionality the same; any source code that works with the changed aspects of your database schema must be reworked to accomplish the same functionality as before. For example, if you apply Introduce Calculation Method (page 245), you may want to rework other existing stored procedures to invoke that method rather than implement the same logic for that calculation. Overall, your database still implements the same logic, but now the calculation logic is just in one place.

It is important to recognize that database refactorings are a subset of database transformations. A database transformation may or may not change the semantics; a database refactoring does not. We describe several common database transformations in Chapter 11, "Non-Refactoring Transformations," because they are not only important to understand, they can often be a step within a database refactoring. For example, when applying Move Column earlier to move the Balance column from Customer to Account, you needed to apply the Introduce Column transformation (page 180) as one of the steps.

On the surface, Introduce Column sounds like a perfectly fine refactoring; adding an empty column to a table does not change the semantics of that table until new functionality begins to use it. We still consider it a transformation (but not a refactoring) because it could inadvertently change the behavior of an application. For example, if we introduce the column in the middle of the table, any program logic using positional access (for example, code that refers to column 17 rather than the column's name) will break. Furthermore, COBOL code bound to a DB2 table will break if it is not rebound to the new schema, even if the column is added at the end of the table. In the end, practicality should be your guide. If we were to label Introduce Column as a refactoring, or as a "Yabba Dabba Do" for all that matter, would it affect the way that you use it? We hope not.
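The phone number example from earlier in this section can be made concrete with a small sketch. The function names are hypothetical, and the ten-digit check is an assumption that only fits numbers like the ones quoted in the text. The point is that the stored value changes form while the information content, and the displayed form, stay the same.

```python
import re

def to_common_format(raw_phone):
    """Reduce a phone number to digits only, the common stored format."""
    digits = re.sub(r"\D", "", raw_phone)
    if len(digits) != 10:
        # Assumption: only ten-digit numbers are expected in this column.
        raise ValueError(f"expected 10 digits, got {raw_phone!r}")
    return digits

def to_display_format(stored_phone):
    """Render stored digits in (XXX) XXX-XXXX form for reports and screens."""
    return f"({stored_phone[:3]}) {stored_phone[3:6]}-{stored_phone[6:]}"
```

A report that depended on the old stored format, like the one described above, would call something like to_display_format instead of reading the column as-is.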

Why Not Just Get It Right Up Front?

We are often told by existing data professionals that the real solution is to model everything up front, and then you would not need to refactor your database schema. Although that is an interesting vision, and we have seen it work in a few situations, experience from the past three decades has shown that this approach does not seem to be working well in practice for the overall IT community. The traditional approach to data modeling does not reflect the evolutionary approach of modern methods such as the RUP and XP, nor does it reflect the fact that business customers are demanding new features and changes to existing functionality at an accelerating rate. The old ways are simply no longer sufficient.

As discussed in Chapter 1, "Evolutionary Database Development," we suggest that you take an Agile Model-Driven Development (AMDD) approach, in which you do some high-level modeling to identify the overall "landscape" of your system, and then model storm the details on a just-in-time (JIT) basis. Take advantage of the benefits of modeling without suffering from the costs of overmodeling, overdocumentation, and the resulting bureaucracy of trying to keep too many artifacts up-to-date and synchronized with one another. Your application code and your database schema evolve as your understanding of the problem domain evolves, and you maintain quality through refactoring both.

2.3. Categories of Database Refactorings

We also distinguish six different categories of database refactorings, as described in Table 2.1. This categorization strategy was introduced to help organize this book, and hopefully to help organize future database refactoring tools. Our categorization strategy is not perfect; for example, the Replace Method With View refactoring (page 265) arguably fits into both the Architectural and Method categories. (We have put it into the Architectural category.)

Table 2.1. Database Refactoring Categories

| Database Refactoring Category | Description | Example(s) |
| --- | --- | --- |
| Structural (Chapter 6) | A change to the definition of one or more tables or views. | Moving a column from one table to another or splitting a multipurpose column into several separate columns, one for each purpose. |
| Data Quality (Chapter 7) | A change that improves the quality of the information contained within a database. | Making a column non-nullable to ensure that it always contains a value or applying a common format to a column to ensure consistency. |
| Referential Integrity (Chapter 8) | A change that ensures that a referenced row exists within another table and/or that ensures that a row that is no longer needed is removed appropriately. | Adding a trigger to enable a cascading delete between two entities, code that was formerly implemented outside of the database. |
| Architectural (Chapter 9) | A change that improves the overall manner in which external programs interact with a database. | Replacing an existing Java operation in a shared code library with a stored procedure in the database. Having it as a stored procedure makes it available to non-Java applications. |
| Method (Chapter 10) | A change to a method (a stored procedure, stored function, or trigger) that improves its quality. Many code refactorings are applicable to database methods. | Renaming a stored procedure to make it easier to understand. |
| Non-Refactoring Transformation (Chapter 11) | A change to your database schema that changes its semantics. | Adding a new column to an existing table. |

2.4. Database Smells

Fowler (1997) introduced the concept of a "code smell," a common category of problem in your code that indicates the need to refactor it. Common code smells include switch statements, long methods, duplicated code, and feature envy. Similarly, there are common database smells that indicate the potential need to refactor it (Ambler 2003). These smells include the following:

Multipurpose column. If a column is being used for several purposes, it is likely that extra code exists to ensure that the source data is being used the "right way," often by checking the values of one or more other columns. An example is a column used to store either someone's birth date if he or she is a customer or the start date if that person is an employee. Worse yet, you are likely constrained in the functionality that you can now support. For example, how would you store the birth date of an employee?

Multipurpose table. Similarly, when a table is being used to store several types of entities, there is likely a design flaw. An example is a generic Customer table that is used to store information about both people and corporations. The problem with this approach is that data structures for people and corporations differ: people have a first, middle, and last name, for example, whereas a corporation simply has a legal name. A generic Customer table would have columns that are NULL for some kinds of customers but not others.

Redundant data. Redundant data is a serious problem in operational databases because when data is stored in several places, the opportunity for inconsistency occurs. For example, it is quite common to discover that customer information is stored in many different places within your organization. In fact, many companies are unable to put together an accurate list of who their customers actually are. The problem is that in one table John Smith lives at 123 Main Street, and in another table at 456 Elm Street. In this case, this is actually one person who used to live at 123 Main Street but who moved last year; unfortunately, John did not submit two change of address forms to your company, one for each application that knows about him.

Tables with too many columns. When a table has many columns, it is indicative that the table lacks cohesion, that it is trying to store data from several entities. Perhaps your Customer table contains columns to store three different addresses (shipping, billing, seasonal) or several phone numbers (home, work, cell, and so on). You likely need to normalize this structure by adding Address and PhoneNumber tables.

Tables with too many rows. Large tables are indicative of performance problems. For example, it is time-consuming to search a table with millions of rows. You may want to split the table vertically by moving some columns into another table, or split it horizontally by moving some rows into another table. Both strategies reduce the size of the table, potentially improving performance.

"Smart" columns. A smart column is one in which different positions within the data represent different concepts. For example, if the first four digits of the client ID indicate the client's home branch, then client ID is a smart column because you can parse it to discover more granular information (for example, home branch ID). Another example includes a text column used to store XML data structures; clearly, you can parse the XML data structure for smaller data fields. Smart columns often need to be reorganized into their constituent data fields at some point so that the database can easily deal with them as separate elements.

Fear of change. If you are afraid to change your database schema because you are afraid to break something (for example, the 50 applications that access it), that is the surest sign that you need to refactor your schema. Fear of change is a good indication that you have a serious technical risk on your hands, one that will only get worse over time.
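To make the "smart" column smell concrete, the sketch below pulls the home branch out of a client ID whose first four digits encode it, as in the example above. The function name, the sample ID, and the validation rules are assumptions for illustration. Logic like this tends to be repeated in every program that reads the column, which is the reason for eventually storing the parts as separate fields.

```python
def split_client_id(client_id):
    """Split a smart client ID into (home_branch_id, remainder).

    Assumes the first four characters are the home branch digits and
    that at least one more character follows them.
    """
    if len(client_id) <= 4 or not client_id[:4].isdigit():
        raise ValueError(f"not a recognizable smart client ID: {client_id!r}")
    return client_id[:4], client_id[4:]
```

For a hypothetical ID such as "1234567890", this yields the branch "1234" and the remainder "567890", which could then populate two separate columns.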

It is important to understand that just because something smells, it does not mean that it is bad; Limburger cheese smells even when it is perfectly fine. However, when milk smells bad, you know that you have a problem. If something smells, look at it, think about it, and refactor it if it makes sense.

2.5. How Database Refactoring Fits In

Modern software development processes, including the Rational Unified Process (RUP), Extreme Programming (XP), Agile Unified Process (AUP), Scrum, and Dynamic System Development Method (DSDM), are all evolutionary in nature. Craig Larman (2004) summarizes the research evidence, as well as the overwhelming support among the thought leaders within the IT community, in support of evolutionary approaches. Unfortunately, most data-oriented techniques are serial in nature, relying on specialists performing relatively narrow tasks, such as logical data modeling or physical data modeling. Therein lies the rub: the two groups need to work together, but both want to do so in different manners.

Our position is that data professionals can benefit from adopting modern evolutionary techniques similar to those of developers, and that database refactoring is one of several important skills that data professionals require. Unfortunately, the data community missed the object revolution of the 1990s, which means they missed out on opportunities to learn the evolutionary techniques that application programmers now take for granted. In many ways, the data community is also missing out on the agile revolution, which is taking evolutionary development one step further to make it highly collaborative and cooperative.

Database refactoring is a database implementation technique, just like code refactoring is an application implementation technique. You refactor your database schema to ease additions to it. You often find that you have to add a new feature to a database, such as a new column or stored procedure, but the existing design is not the best one possible to easily support that new feature. You start by refactoring your database schema to make it easier to add the feature, and after the refactoring has been successfully applied, you then add the feature. The advantage of this approach is that you are slowly, but constantly, improving the quality of your database design. This process not only makes your database easier to understand and use, it also makes it easier to evolve over time; in other words, you improve your overall development productivity.

Figure 2.6 provides a high-level overview of the critical development activities that occur on a modern project working with both object and relational database technologies. Notice how all the arrows are bidirectional. You iterate back and forth between activities as needed. Also notice how there is neither a defined starting point nor a defined ending point; this clearly is not a traditional, serial process.

Figure 2.6. Potential development activities on an evolutionary development project.

Database refactoring is only part of the evolutionary database development picture. You still need to take an evolutionary/agile approach to data modeling. You still need to test your database schema and put it under configuration management control. And, you still need to tune it appropriately. These are topics better left to other books.

2.6. Making It Easier to Refactor Your Database Schema

The greater the coupling, the harder it is to refactor something. This is true of code refactoring, and it is certainly true of database refactoring. Our experience is that coupling becomes a serious issue when you start to consider behavioral issues (for example, code), something that many database books choose not to address. The easiest scenario is clearly the single-application database because your database schema will only be coupled to itself and to your application. With the multi-application database architecture depicted in Figure 2.7, your database schema is potentially coupled to application source code, persistence frameworks and Object-Relational Mapping (ORM) tools, other databases (via replication, data extracts/loads, and so on), data file schemas, testing code, and even to itself.

Figure 2.7. Databases are highly coupled to external programs.

An effective way to decrease the coupling that your database is involved with is to encapsulate access to it. You do this by having external programs access your database via persistence layers, as depicted in Figure 2.8. A persistence layer can be implemented in several ways: via data access objects (DAOs), which implement the necessary SQL code; by frameworks; via stored procedures; or even via Web services. As you see in the diagram, you can never get the coupling down to zero, but you can definitely reduce it to something manageable.
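As a rough illustration of encapsulating access, the sketch below shows a data access object written in Python against an in-memory SQLite database. The class name BalanceDao, its methods, and the simplified Customer table are assumptions made for this example only; they are not taken from the book's figures. The point is that the SQL naming the Balance column lives in one place.

```python
import sqlite3


class BalanceDao:
    """Hypothetical DAO: the one place that knows which table holds the balance."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def get_balance(self, customer_id: int):
        # Only this SQL has to change if the Balance column moves to another table.
        row = self.conn.execute(
            "SELECT Balance FROM Customer WHERE CustomerID = ?", (customer_id,)
        ).fetchone()
        return None if row is None else row[0]

    def set_balance(self, customer_id: int, balance) -> None:
        self.conn.execute(
            "UPDATE Customer SET Balance = ? WHERE CustomerID = ?",
            (balance, customer_id),
        )


# Simplified, assumed schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, Balance NUMERIC)"
)
conn.execute("INSERT INTO Customer VALUES (1, 'Sally Jones', 100)")

dao = BalanceDao(conn)
dao.set_balance(1, 125)
print(dao.get_balance(1))  # 125
```

Callers that go through such an object are coupled to its methods rather than to the table layout, so a later schema change would touch the two SQL strings instead of every caller.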

Figure 2.8. Reducing coupling via encapsulating access.

2.7. What You Have Learned

Code refactoring is a disciplined way to restructure code in small, evolutionary steps to improve the quality of its design. A code refactoring retains the behavioral semantics of your code; it neither adds functionality nor takes functionality away. Similarly, a database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. Database refactoring is one of the core techniques that enable data professionals to take an evolutionary approach to database development. The greater the coupling that your database is involved with, the harder it will be to refactor.

Chapter 3. The Process of Database Refactoring

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.
Max Planck

This chapter describes how to implement a single refactoring within your database. We work through an example of applying Move Column (page 103), a structural refactoring. Although this seems like a simple refactoring, and it is, you will see that it can be quite complex to implement safely within a production environment. Figure 3.1 overviews how we will move the Customer.Balance column to the Account table, a straightforward change to improve the database design.

Figure 3.1. Moving the Customer.Balance column to Account.

In Chapter 1, "Evolutionary Database Development," we overviewed the concept of logical working sandboxes: development sandboxes in which developers have their own copy of the source code and database to work with; a project-integration environment where team members promote and then test their changes; preproduction environments for system, integration, and user acceptance testing; and production. The hard work of database refactoring is done within your development sandbox: it is considered, implemented, and tested before it is promoted into other environments. The focus of this chapter is on the work that is performed within your development sandbox. Chapter 4, "Deploying into Production," covers the promotion and eventual deployment of your refactorings.

Because we are describing what occurs within your development sandbox, this process applies to both the single-application database and the multi-application database environments. The only real difference between the two situations is the need for a longer transition period (more on this later) in the multi-application scenario.

Figure 3.2 depicts a UML 2 Activity diagram that overviews the database refactoring process. The process begins with a developer who is trying to implement a new requirement or to fix a defect. The developer realizes that the database schema may need to be refactored. In this example, Eddy, a developer, is adding a new type of financial transaction to his application and realizes that the Balance column actually describes Account entities, not Customer entities. Because Eddy follows common agile practices such as pair programming (Williams & Kessler 2002) and modeling with others (Ambler 2002), he decides to enlist the help of Beverley, the team's database administrator (DBA), to apply the refactoring. Together they iteratively work through the following activities:

Figure 3.2. The database refactoring process.

- Verify that a database refactoring is appropriate.
- Choose the most appropriate database refactoring.
- Deprecate the original database schema.
- Test before, during, and after.
- Modify the database schema.
- Migrate the source data.
- Modify external access program(s).
- Run regression tests.
- Version control your work.
- Announce the refactoring.

3.1. Verify That a Database Refactoring Is Appropriate

First, Beverley determines whether the suggested refactoring needs to occur. There are three issues to consider:

1. Does the refactoring make sense? Perhaps the existing table structure is correct. It is common for developers to either disagree with, or to simply misunderstand, the existing design of a database. This misunderstanding could lead them to believe that the design needs to change when it really does not. The DBA should have a good knowledge of the project team's database and other corporate databases, and will know whom to contact about issues such as this. Therefore, they will be in a better position to determine whether the existing schema is the best one. Furthermore, the DBA often understands the bigger picture of the overall enterprise, providing important insight that may not be apparent when you look at it from the point of view of the single project. However, in our example, it appears that the schema needs to change.

2. Is the change actually needed now? This is usually a "gut call" based on her previous experience with the application developer. Does Eddy have a good reason for making the schema change? Can Eddy explain the business requirement that the change supports? Does the requirement feel right? Has Eddy suggested good changes in the past? Has Eddy changed his mind several days later, requiring Beverley to back out of the change? Depending on this assessment, Beverley may suggest that Eddy think the change through some more, or she may decide to continue working with him but wait for a longer period of time before they actually apply the change in the project-integration environment (Chapter 4) if they believe the change will need to be reversed.

3. Is it worth the effort? The next thing that Beverley does is to assess the overall impact of the refactoring. To do this, Beverley should have an understanding of how the external program(s) are coupled to this part of the database. This is knowledge that Beverley has built up over time by working with the enterprise architects, operational database administrators, application developers, and other DBAs. When Beverley is not sure of the impact, she needs to either make a decision at the time and go with her gut feeling, or advise the application developer to wait while she talks to the right people. Her goal is to ensure that she implements database refactorings that will succeed: if you are going to need to update, test, and redeploy 50 other applications to support this refactoring, it may not be viable for her to continue. Even when there is only one application accessing the database, it may be so highly coupled to the portion of the schema that you want to change that the database refactoring simply is not worth it. In our example, the design problem is so clearly severe that she decides to implement it even though many applications will be affected.

Take Small Steps

Database refactoring changes the schema in small steps; each refactoring should be made one at a time. For example, assume you realize that you need to move an existing column, rename it, and apply a common format to it. Instead of trying this all at once, you should successfully implement Move Column (page 103), then successfully implement Rename Column (page 109), and then apply Introduce Common Format (page 183), one step at a time. The advantage is that if you make a mistake, it is easy to find the bug because it will likely be in the part of the schema that you just changed.

3.2. Choose the Most Appropriate Database Refactoring

As you can see in this book, you could potentially apply a large number of refactorings to your database schema. To determine which is the most appropriate refactoring for your situation, you must first analyze and understand the problem you face. When Eddy first approached Beverley, he may or may not have done this analysis. For example, he may have just gone to her and said that the Account table needs to store the current balance; therefore, we need to add a new column (via the Introduce Column transformation on page 180). However, what he did not realize was that the column already exists in the Customer table, which is arguably the wrong place for it to be. Eddy had identified the problem correctly, but had misidentified the solution. Based on her knowledge of the existing database schema, and her understanding of the problem identified by Eddy, Beverley instead suggests that they apply the Move Column (page 103) refactoring.

Sometimes the Data Is Elsewhere

Your database is likely not the only source of data within your organization. A good DBA should at least know about, if not understand, the various data sources within your enterprise to determine the best source of data. In our example, another database could potentially be the official repository of Account information. If that is the case, moving the column may not make sense because the true refactoring would be Use Official Data Source (page 271).

3.3. Deprecate the Original Database Schema

If multiple applications access your database, you likely need to work under the assumption that you cannot refactor and then deploy all of these programs simultaneously. Instead, you need a transition period, also called a deprecation period, for the original portion of the schema that you are changing (Sadalage & Schuh 2002; Ambler 2003). During the transition period, you support both the original and new schemas in parallel to provide time for the other application teams to refactor and redeploy their systems. Typical transition periods last for several quarters, if not years. The potentially long time to fully implement a refactoring underscores the need to automate as much of the process as possible. Over a several-year period, people within your department will change, putting you at risk if parts of the process are manual. Having said that, even in the case of a single-application database, your team may still require a transition period of a few days within your project-integration sandbox: your teammates need to refactor and retest their code to work with the updated database schema.

Figure 3.3 depicts the life cycle of a database refactoring within a multi-application scenario. You first implement it within the scope of your project, and if successful, you eventually deploy it into production. During the transition period, both the original schema and the new schema exist, with sufficient scaffolding code to ensure that any updates are correctly supported. During the transition period, you need to assume two things: first, that some applications will use the original schema whereas others will use the new schema; and second, that applications should only have to work with one but not both versions of the schema. In our example, some applications will work with Customer.Balance and others with Account.Balance, but not both simultaneously. Regardless of which column they work with, the applications should all run properly. When the transition period has expired, the original schema plus any scaffolding code is removed and the database retested. At this point, the assumption is that all applications work with Account.Balance.

Figure 3.3. The life cycle of a database refactoring in a multi-application scenario.

Figure 3.4 depicts the original database schema, and Figure 3.5 shows what the database schema would look like during the transition period when we apply the Move Column database refactoring to Customer.Balance. In Figure 3.5, the changes are shown in bold, a style that we use throughout the book. Notice how both versions of the schema are supported during this period. Account.Balance has been added as a column, and Customer.Balance has been marked for removal on or after June 14, 2006. A trigger was also introduced to keep the values contained in the two columns synchronized, the assumption being that new application code will work with Account.Balance but will not keep Customer.Balance up-to-date. Similarly, we assume that older application code that has not been refactored to use the new schema will not know to keep Account.Balance up-to-date. This trigger is an example of database scaffolding code, simple and common code that is required to keep your database "glued together." This code has been assigned the same removal date as Customer.Balance.

Figure 3.4. The original Customer/Account schema.

Figure 3.5. Supporting both versions of the schema.

Not all database refactorings require a transition period. For example, neither the Introduce Column Constraint (page 180) nor the Apply Standard Codes (page 157) database refactoring requires a transition period because they simply improve the data quality by narrowing the acceptable values within a column. Narrowing the acceptable values may still break existing applications, so apply these refactorings with care. Chapter 5, "Database Refactoring Strategies," discusses strategies for choosing an appropriate transition period.

3.4. Test Before, During, and After

You can have the confidence to change your database schema if you can easily validate that the database still works with your application after the change, and the only way to do that is to take a Test-Driven Development (TDD) approach, as suggested in Chapter 1. With a TDD-based approach, you write a test and then you write just enough code, often data definition language (DDL), to fulfill the test. You continue in this manner until the database refactoring has been implemented fully. You will potentially need to write tests that do the following:

- Test your database schema.
- Test the way your application uses the database schema.
- Validate your data migration.
- Test your external program code.
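The test-first rhythm described above can be sketched in a few lines of Python using SQLite. The helper column_exists and the simplified Account table are assumptions made for illustration; a real project would more likely express the same check in its own database unit-testing tool.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` has a column named `column` (case-insensitive)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1].lower() == column.lower() for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (AccountID INTEGER PRIMARY KEY, CustomerID INTEGER)")

# Step 1: write the test first; it fails against the original schema.
before = column_exists(conn, "Account", "Balance")

# Step 2: write just enough DDL to fulfill the test.
conn.execute("ALTER TABLE Account ADD COLUMN Balance NUMERIC")

# Step 3: the same test now passes.
after = column_exists(conn, "Account", "Balance")

print(before, after)  # False True
```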

3.4.1. Testing Your Database Schema

Because a database refactoring will affect your database schema, you need to write database-oriented tests. Although this may sound strange at first, you can validate many aspects of a database schema:

- Stored procedures and triggers. Stored procedures and triggers should be tested just like your application code would be.
- Referential integrity (RI). RI rules, in particular cascading deletes in which highly coupled "child" rows are deleted when a parent row is deleted, should also be validated. Existence rules, such as requiring that the customer row corresponding to an account row exist before the row can be inserted into the Account table, can be easily tested, too.
- View definitions. Views often implement interesting business logic. Things to look out for include: Does the filtering/select logic work properly? Do you get back the right number of rows? Are you returning the right columns? Are the columns, and rows, in the right order?
- Default values. Columns often have default values defined for them. Are the default values actually being assigned? (Someone could have accidentally removed this part of the table definition.)
- Data invariants. Columns often have invariants, implemented in the form of constraints, defined for them. For example, a number column may be restricted to containing the values 1 through 7. These invariants should be tested.

Database testing is new to many people, and as a result you are likely to face several challenges when adopting database refactoring as a development technique:

- Insufficient testing skills. This problem can be overcome through training, through pairing with someone with good testing skills (pairing a DBA without testing skills and a tester without DBA skills still works), or simply through trial and error. The important thing is that you recognize that you need to pick up these skills.
- Insufficient unit tests for your database. Few organizations have adopted the practice of database testing, so it is likely that you will not have a sufficient test suite for your existing schema. Although this is unfortunate, there is no better time than the present to start writing your test suite.
- Insufficient database testing tools. Luckily, tools such as DBUnit (dbunit.sourceforge.net) for managing test data and SQLUnit (sqlunit.sourceforge.net) for testing stored procedures are available as open source software (OSS). In addition, several commercial tools are available for database testing. However, at the time of this writing, there is still significant opportunity for tool vendors to improve their database testing offerings.

So how would we test the changes to the database schema? As you can see in Figure 3.5, there are two changes to the schema during the transition period that we must validate. The first one is the addition of the Balance column to the Account table. This change is covered by our data migration and external program testing efforts, discussed in the following sections. The second change is the addition of the two triggers, SynchronizeAccountBalance and SynchronizeCustomerBalance, which, as their names imply, keep the two data columns synchronized. We need tests to ensure that if Customer.Balance is updated, Account.Balance is similarly updated, and vice versa.
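A test of that synchronization might look roughly like the following Python sketch. It uses SQLite triggers as a stand-in for the Oracle triggers in this chapter, covers only updates (not inserts or deletes), and assumes one account per customer; the simplified schema and the balances helper are illustrative assumptions, not the book's code.

```python
import sqlite3

# SQLite approximation of the transition-period schema (assumed, simplified).
SCHEMA = """
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Balance NUMERIC);
CREATE TABLE Account (
    AccountID  INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customer (CustomerID),
    Balance    NUMERIC
);

-- Old applications write Customer.Balance; copy the value to Account.
CREATE TRIGGER SynchronizeAccountBalance
AFTER UPDATE OF Balance ON Customer
FOR EACH ROW
BEGIN
    UPDATE Account SET Balance = NEW.Balance
    WHERE CustomerID = NEW.CustomerID AND Balance IS NOT NEW.Balance;
END;

-- New applications write Account.Balance; copy the value to Customer.
CREATE TRIGGER SynchronizeCustomerBalance
AFTER UPDATE OF Balance ON Account
FOR EACH ROW
BEGIN
    UPDATE Customer SET Balance = NEW.Balance
    WHERE CustomerID = NEW.CustomerID AND Balance IS NOT NEW.Balance;
END;

INSERT INTO Customer VALUES (1, 100);
INSERT INTO Account VALUES (10, 1, 100);
"""


def balances(conn, customer_id):
    """Return (Customer.Balance, Account.Balance) for one customer."""
    return conn.execute(
        "SELECT c.Balance, a.Balance FROM Customer c "
        "JOIN Account a ON a.CustomerID = c.CustomerID WHERE c.CustomerID = ?",
        (customer_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# An unrefactored application updates the original column.
conn.execute("UPDATE Customer SET Balance = 250 WHERE CustomerID = 1")
after_customer_update = balances(conn, 1)

# A refactored application updates the new column.
conn.execute("UPDATE Account SET Balance = 75 WHERE CustomerID = 1")
after_account_update = balances(conn, 1)

print(after_customer_update, after_account_update)  # (250, 250) (75, 75)
```

The `Balance IS NOT NEW.Balance` guard means the second trigger finds nothing left to change, so the two triggers cannot keep firing each other.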

3.4.2. Validating Your Data Migration

Many database refactorings require you to migrate and sometimes even cleanse the source data. In our example, we must copy the data values from Customer.Balance to Account.Balance as part of implementing the refactoring. In this case, we want to validate that the correct balance was in fact copied over for individual customers.

In refactorings such as Apply Standard Codes (page 157) and Consolidate Key Strategy (page 168), you actually "cleanse" data values. This cleansing logic must be validated. With the first refactoring, you may convert code values such as USA and U.S. all to the standard value of US throughout your database. You would want to write tests to validate that the older codes were no longer being used and that they were converted properly to the official value. With the second refactoring, you might discover that customers are identified via their customer ID in some tables, by their social security number (SSN) in other tables, and by their phone number in other tables. You would want to choose one way to identify customers, perhaps by their customer ID, and then refactor the other tables to use this type of column instead. In this case, you would want to write tests to verify that the relationship between the various rows was still being maintained properly. (For example, if the telephone number 555-1234 referenced the Sally Jones customer record, the Sally Jones record should still be getting referenced when you replace it with customer ID 987654321.)
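To make the Apply Standard Codes check concrete, here is a small Python sketch against SQLite. The Address table, the CountryCode column, and the STANDARD_CODES mapping are hypothetical names chosen for this example; only the USA and U.S. to US conversion comes from the text.

```python
import sqlite3

# Hypothetical mapping of legacy codes to the official value.
STANDARD_CODES = {"USA": "US", "U.S.": "US"}


def apply_standard_codes(conn):
    """Cleanse step: rewrite every legacy code to its standard value."""
    for old, new in STANDARD_CODES.items():
        conn.execute(
            "UPDATE Address SET CountryCode = ? WHERE CountryCode = ?", (new, old)
        )


def count_legacy_codes(conn):
    """Validation step: how many rows still use a legacy code?"""
    placeholders = ", ".join("?" for _ in STANDARD_CODES)
    return conn.execute(
        f"SELECT COUNT(*) FROM Address WHERE CountryCode IN ({placeholders})",
        tuple(STANDARD_CODES),
    ).fetchone()[0]


def count_code(conn, code):
    return conn.execute(
        "SELECT COUNT(*) FROM Address WHERE CountryCode = ?", (code,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Address (AddressID INTEGER PRIMARY KEY, CountryCode TEXT)")
conn.executemany(
    "INSERT INTO Address (CountryCode) VALUES (?)",
    [("USA",), ("U.S.",), ("US",), ("CA",)],
)

legacy_before = count_legacy_codes(conn)
apply_standard_codes(conn)
legacy_after = count_legacy_codes(conn)

print(legacy_before, legacy_after, count_code(conn, "US"))  # 2 0 3
```

Checking both that the old codes are gone and that the row count for the official value is what you expect covers the two conditions named above: the older codes are no longer used, and they were converted rather than lost.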

3.4.3. Testing Your External Access Programs

Your database is accessed by one or more programs, including the application that you are working on. These programs should be validated just like any other IT asset within your organization. To successfully refactor your database, you need to be able to introduce the final schema, shown in Figure 3.6, and see what breaks in your external access programs. The only way that you can have the confidence to refactor your database schema is if you have a full regression test suite for these programs. Yes, we realize that you likely do not have these test suites. Once again, there is no better time than the present to start building up your test suite. We suggest that you write all the testing code you require to support each individual database refactoring for all external access programs. (Actually, the owners of these systems need to write those tests, not you.) If you work this way, over time you will build up the test suite that you require.

Figure 3.6. The final version of the database schema.

3.5. Modify the Database Schema

Eddy and Beverley work together to make the changes within their development sandbox. As you see in Figure 3.5, they need to add the Account.Balance column as well as the two triggers, SynchronizeAccountBalance and SynchronizeCustomerBalance. The DDL code to do this is shown here:

    ALTER TABLE Account ADD Balance Numeric;

    COMMENT ON COLUMN Account.Balance IS
      'Move of Customer.Balance column, finaldate = 2006-06-14';

    CREATE OR REPLACE TRIGGER SynchronizeCustomerBalance
    BEFORE INSERT OR UPDATE ON Account
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
      -- Move of Customer.Balance column to Account, dropdate = 2006-06-14
    BEGIN
      IF :NEW.Balance IS NOT NULL THEN
        UpdateCustomerBalance;
      END IF;
    END;
    /

    CREATE OR REPLACE TRIGGER SynchronizeAccountBalance
    BEFORE INSERT OR UPDATE OR DELETE ON Customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
      -- Move of Customer.Balance column to Account, dropdate = 2006-06-14
    BEGIN
      IF DELETING THEN
        DeleteCustomerIfAccountNotFound;
      END IF;
      IF (UPDATING OR INSERTING) THEN
        IF :NEW.Balance IS NOT NULL THEN
          UpdateAccountBalanceForCustomer;
        END IF;
      END IF;
    END;
    /

At the time of this writing, no automated database refactoring tools are available; therefore, you need to code everything by hand for now. Do not worry. This will change in time. For now, you want to write a single script containing the preceding code that you can apply against the database schema. We suggest assigning a unique, incremental number to each script. The easiest way to do so is just to start at the number one and increment a counter each time you define a new database refactoring; the easiest way to do that is to use the build number of your application. However, to make this strategy work within a multiple-team environment, you need a way to either assign unique numbers across all teams or to add a unique team identifier to the individual refactorings. Fundamentally, you need to be able to differentiate between Team A's refactoring number 1701 and Team B's refactoring number 1701. Another option, discussed in more detail in Chapter 5, is to assign timestamps to the refactoring.

There are several reasons why you want to work with small scripts for individual refactorings:

- Simplicity. Small, focused change scripts are easier to maintain than scripts comprising many steps. If you discover that a refactoring should not be performed because of unforeseen problems (perhaps you cannot update a major application that accesses the changed portion of the schema), you want to be able to easily skip that refactoring.
- Correctness. You want to be able to apply each refactoring, in the appropriate order, to your database schema so as to evolve it in a defined manner. Refactorings can build upon each other. For example, you might rename a column and then a few weeks later move it to another table. The second refactoring would depend on the first refactoring because its code would refer to the new name of the column.
- Versioning. Different database instances will have different versions of your database schema. For example, Eddy's development sandbox may have version 163, the project-integration sandbox version 161, the QA/Test sandbox version 155, and the production database version 134. To migrate the project-integration sandbox schema to version 163, you should merely have to apply database refactorings 162 and 163. To keep track of the version number, you need to introduce a common table, such as DatabaseConfiguration, that stores the current version number among other things. This table is discussed in further detail in Chapter 5.

The following DDL code must be run against your database after the transition period has ended (discussed in Chapter 4). Similarly, this code should be captured in a single script file, along with the identifier of 163 in this case, and run in sequential order against your database schema as appropriate.

    DROP TRIGGER SynchronizeAccountBalance;
    DROP TRIGGER SynchronizeCustomerBalance;
    ALTER TABLE Customer DROP COLUMN Balance;
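The numbered-script and version-table ideas above can be sketched as a tiny migration runner in Python with SQLite. Everything specific here is an assumption for illustration: the SchemaVersion column name, the contents of scripts 162 and 163, and the migrate function itself. The book defers the details of DatabaseConfiguration to Chapter 5.

```python
import sqlite3

# Hypothetical change scripts, keyed by refactoring number.
SCRIPTS = {
    162: "CREATE TABLE AuditLog (EntryID INTEGER PRIMARY KEY, Note TEXT);",
    163: "ALTER TABLE Account ADD COLUMN Balance NUMERIC;",
}


def current_version(conn):
    """Read the schema version recorded in the (assumed) configuration table."""
    return conn.execute(
        "SELECT SchemaVersion FROM DatabaseConfiguration"
    ).fetchone()[0]


def migrate(conn, scripts, target):
    """Apply, in ascending order, every script newer than the current version."""
    applied = []
    for number in sorted(scripts):
        if current_version(conn) < number <= target:
            conn.executescript(scripts[number])
            conn.execute(
                "UPDATE DatabaseConfiguration SET SchemaVersion = ?", (number,)
            )
            applied.append(number)
    conn.commit()
    return applied


# A project-integration sandbox that is currently at version 161.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DatabaseConfiguration (SchemaVersion INTEGER NOT NULL);
INSERT INTO DatabaseConfiguration VALUES (161);
CREATE TABLE Account (AccountID INTEGER PRIMARY KEY, CustomerID INTEGER);
""")

applied = migrate(conn, SCRIPTS, target=163)
print(applied, current_version(conn))  # [162, 163] 163
```

Running migrate a second time applies nothing, because the recorded version already equals the target; that is what lets the same set of scripts be replayed safely against sandboxes at different versions.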

Follow Your Database Design Conventions

An important part of implementing a refactoring is ensuring that the changed portion of your database schema follows your corporate database development guidelines. These guidelines should be provided and supported by your Database Administration group, and at a minimum should address naming and documentation guidelines.

3.6. Migrate the Source Data

Many database refactorings require you to manipulate the source data in some way. Sometimes, you just need to move data from one location to another, something we need to do with Move Data (page 192). Other times, you need to cleanse the values of the data itself; this is common with the data quality refactorings (Chapter 7, "Data Quality Refactorings") such as Apply Standard Type (page 162) and Introduce Common Format (page 183).

Similar to modifying your database schema, you will potentially need to create a script to perform the required data migration. This script should have the same identification number as your other script to make them easy to manage. In our example of moving the Customer.Balance column to Account, the data migration script would contain the following data manipulation language (DML) code:

    /* One-time migration of data from Customer.Balance to Account.Balance. */
    UPDATE Account
    SET Balance = (SELECT Customer.Balance
                   FROM Customer
                   WHERE Customer.CustomerID = Account.CustomerID);
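One way to check that such a migration copied the right values is sketched below in Python with SQLite. The sample rows and the helper names migrate_balances and count_mismatches are assumptions for the example; the UPDATE statement mirrors the DML above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Balance NUMERIC);
CREATE TABLE Account (
    AccountID  INTEGER PRIMARY KEY,
    CustomerID INTEGER,
    Balance    NUMERIC
);
INSERT INTO Customer VALUES (1, 100), (2, 0), (3, NULL);
INSERT INTO Account (AccountID, CustomerID) VALUES (10, 1), (20, 2), (30, 3);
""")


def migrate_balances(conn):
    """One-time copy of Customer.Balance into Account.Balance."""
    conn.execute("""
        UPDATE Account
        SET Balance = (SELECT Customer.Balance
                       FROM Customer
                       WHERE Customer.CustomerID = Account.CustomerID)
    """)


def count_mismatches(conn):
    """Count accounts whose balance differs from the owning customer's balance."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM Account
        JOIN Customer ON Customer.CustomerID = Account.CustomerID
        WHERE Account.Balance IS NOT Customer.Balance
    """).fetchone()[0]


mismatches_before = count_mismatches(conn)
migrate_balances(conn)
mismatches_after = count_mismatches(conn)

print(mismatches_before, mismatches_after)  # 2 0
```

The comparison uses SQLite's NULL-safe IS NOT, so a customer with no balance (NULL) counts as matching an account with no balance rather than being silently ignored.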

Depending on the quality of the existing data, you may quickly discover the need to further cleanse the source data. This would require the application of one or more data quality database refactorings. It is good practice to keep your eye out for data quality problems when you are working through structural and architectural database refactorings. Data quality problems are quite common with legacy database designs that have been allowed to degrade over time.

The Need to Document Reflects a Need to Refactor

When you find that you need to write supporting documentation to describe a table, column, or stored procedure, that is a good indication that you need to refactor that portion of your schema to make it easier to understand. Perhaps a simple renaming can avoid several paragraphs of documentation. The cleaner your design, the less documentation you require.

3.7. Refactor External Access Program(s)

When your database schema changes, you will often need to refactor any existing external programs that access the changed portion of the schema. As you learned in Chapter 2, "Database Refactoring," this includes legacy applications, persistence frameworks, data replication code, and reporting systems, to name a few. Several good books provide guidance for effective refactoring of external access programs:

- Refactoring: Improving the Design of Existing Code (Fowler 1999) is the classic text on the subject.
- Working Effectively with Legacy Code (Feathers 2004) describes how to refactor legacy systems that have existed within your organization for many years.
- Refactoring to Patterns (Kerievsky 2004) describes how to methodically refactor your code to implement common design and architectural patterns.

When many programs access your database, you run the risk that some of them will not be updated by the development teams responsible for them, or worse yet, that they may not even be assigned to a team at the present moment. The implication is that someone will need to be assigned responsibility for updating the application(s), as well as the responsibility to bear the cost. Hopefully, other teams are responsible for these external programs; otherwise, your team will need to accept responsibility for making the required changes. It is frustrating to discover that the political challenges surrounding the need to update other systems often far outweigh the technical challenges of doing so.

Aim for Continuous Development

Ideally, your organization would continuously work on all of its applications, evolving them over time and deploying them on a regular basis. Although this sounds complicated, and it can be, shouldn't your IT department actively strive to ensure that the systems within your organization meet its changing needs? In these environments, you can have a relatively short transition period because you know that all the applications accessing your database evolve on a regular basis and therefore can be updated to work with the changed schema.

So what do you do when there is no funding to update the external programs? You have two basic strategies from which to choose. First, make the database refactoring and assign it a transition period of several decades. This way the external programs that you cannot change still work; however, other applications can access the improved design. This strategy has the unfortunate disadvantage that the scaffolding code to support both schemas will exist for a long time, reducing database performance and cluttering your database. The second strategy is to not do the refactoring.

3.8. Run Your Regression Tests

Part of implementing your refactoring is to test it to ensure that it works. As indicated earlier, you will test a little, change a little, test a little, and so on until the refactoring is complete. Your testing activities should be automated as much as possible. A significant advantage of database refactoring is that because the refactorings represent small changes, when a test breaks you have a pretty good idea where the problem lies: where you just made the change.

3.9. Version Control Your Work

When your database refactoring is successful, you should put all your work under configuration management (CM) control by checking it into a version control tool. If you treat your database-oriented artifacts the exact same way that you treat your source code, you should be okay. Artifacts to version control include the following:

- Any scripts that you have created
- Test data and/or generation code
- Test cases
- Documentation
- Models

3.10. Announce the Refactoring

A database is a shared resource. Minimally, it is shared within your application development team, if not by several application teams. Therefore, you need to communicate to interested parties that the database refactoring has been made. Early in the life cycle of the refactoring, you need to communicate the changes within your team, something that could be as simple as announcing the change at your team's next stand-up meeting. In a multi-application database environment, you must communicate the changes to other teams, particularly when you decide to promote the refactoring into your preproduction test environments. This communication might be a simple e-mail on an internal mailing list specifically used to announce database changes, it could be a line item in your regular project status report, or it could be a formal report to your operational database administration group.

An important aspect of your announcement efforts will be the update of any relevant documentation. This documentation will be critical during your promotion and deployment efforts (see Chapter 4) because the other teams need to know how the database schema has evolved. A simple approach is to develop database release notes that summarize the changes that you have made, listing each database refactoring in order. Our example refactoring would appear in this list as "163: Move the Customer.Balance column into the Account table." These release notes will likely be required by enterprise administrators so that they can update the relevant meta data. (Better yet, your team should update this meta data as part of their refactoring efforts.)

You will want to update the physical data model (PDM) for your database. Your PDM is the primary model describing your database schema and is often one of the few "keeper" models created on application development projects, and therefore should be kept up-to-date as much as possible.
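The release notes described above lend themselves to simple automation. The following is a minimal sketch, assuming each refactoring is recorded with its number and a one-line description; the Refactoring class, the release_notes function, and entry 164 are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Refactoring:
    number: int        # unique, ordered identifier of the refactoring
    description: str   # one-line summary written for other teams


def release_notes(refactorings):
    """Return one release-note line per refactoring, in identifier order."""
    ordered = sorted(refactorings, key=lambda r: r.number)
    return [f"{r.number}: {r.description}" for r in ordered]


notes = release_notes([
    # Hypothetical entry, listed out of order on purpose.
    Refactoring(164, "Rename the Customer.FName column to FirstName."),
    # The example refactoring from the text.
    Refactoring(163, "Move the Customer.Balance column into the Account table."),
])
print("\n".join(notes))
```

Sorting by identifier rather than by insertion order keeps the notes in the order the refactorings must be applied, even when developers record them out of sequence.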

Do Not Publish Data Models Prematurely

Your object and database schemas are likely to fluctuate initially because with an evolutionary approach to development your design emerges over time. Because of this initial flux, you should wait until new portions of your schema have stabilized before publishing updates to your physical data model. This will reduce your documentation effort as well as minimize the impact to other application teams that rely on your database schema.

3.11. What You Have Learned

The hard work of database refactoring is done within your development sandbox, hopefully by a developer paired with a DBA. The first step is to verify that a database refactoring is even appropriate; perhaps the cost of performing the refactoring currently outweighs the benefit, or perhaps the current schema is the best design for that specific issue. If a refactoring is required, you must choose the most appropriate one to get the job done. In a multi-application environment, many refactorings require you to run both the original and new versions of the schema in parallel during a transition period that is long enough to allow any applications accessing that portion of the schema time to be redeployed.

To implement the refactoring, you should take a Test-First approach to increase the chance that you detect any breakages introduced by the refactoring. You must modify the database schema, potentially migrate any relevant source data, and then modify any external programs that access the schema. All of your work should be version controlled, and after the refactoring has been implemented within your development environment, it should be announced to your teammates and then eventually to appropriate external teams that might need to know about the schema change.

Chapter 4. Deploying into Production

If we do not change direction soon we will end up where we are going.
Dr. Irwin Corey

It is not enough just to refactor your database schemas within your development and integration sandboxes. You must also deploy the changes into production. The way that you do so must reflect your organization's existing deployment process, and you may need to improve this existing process to reflect the evolutionary approach described in this book. You will likely discover that your organization already has experience at deploying database changes, but because schema changes are often feared in many environments, you will also discover that your experiences have not been all that good. Time to change that.

The good news is that deploying database refactorings is much safer than deploying traditional database schema changes, assuming, of course, you are following the advice presented in this book. This is true for several reasons. First, individual database refactorings are less risky to deploy than traditional database schema changes because they are small and simple. However, collections of database refactorings, what you actually deploy, can become complex if you do not manage them well. This chapter provides advice for doing exactly that. Second, when you take a Test-Driven Development (TDD) approach, you have a full regression test suite in place that validates your refactorings. Knowing that the refactorings actually work enables you to deploy them with greater confidence. Third, by having a transition period during which both the old and new schemas exist in parallel, you are not required to also deploy a slew of application changes that reflect the schema changes.

To successfully deploy database refactorings into production, you need to adopt strategies for the following:

- Effectively deploying between sandboxes
- Applying bundles of database refactorings
- Scheduling deployment windows
- Deploying your system
- Removing deprecated schema

4.1. Effectively Deploying Between Sandboxes

Chapter 1, "Evolutionary Database Development," described the idea that you need a series of sandboxes in which to implement, test, and run your systems. As you see in Figure 4.1, each project team has a collection of developer sandboxes and possibly even their own corresponding demo sandbox. The preproduction test sandbox is shared between teams, as is the production environment. You can also see that there are deployment gates between the various sandboxes. It should be relatively easy to deploy database refactorings from a developer's workstation into your shared project-integration sandbox because the impact of a mistake is fairly low: You only affect the team. It should be a little more difficult to deploy into your preproduction testing environment(s) because the impact of a mistake is much greater: Not only could your system be unavailable to the testers, it could also cause other systems within these environments to crash, thereby affecting other teams. Deploying into your production environment is often a rigorous process because the potential cost of a mistake is quite high: You could easily impact your customers.

Figure 4.1. Deploying between sandboxes.

To deploy into each sandbox, you will need to both build your application and run your database management scripts (the change log, the update log, and the data migration log, or equivalent, described in Chapter 3, "The Process of Database Refactoring"). The next step is to rerun your regression tests to ensure that your system still works. If it does not, you must fix it in your development sandbox, redeploy, and retest. The goal in your project-integration sandbox is to validate that the changes made by the individual developer (pairs) work together, whereas your goal in the preproduction test/QA sandbox is to validate that your system works well with the other systems within your organization.

It is quite common to see developers promote changes from their development sandboxes into the project-integration sandbox several times a day. As a team, you should strive to deploy your system into at least your demo environment at least once an iteration so that you can share your current working software with appropriate internal project stakeholders. Better yet, you should also deploy your system into your preproduction test environments so that it can be acceptance tested and system tested, ideally at least once an iteration. You want to deploy regularly into your preproduction environment for two reasons. First, you get concrete feedback as to how well your system actually works. Second, because you deploy regularly, you discover ways to get better at deployment. By running your installation scripts on a regular basis, including the tests that validate the installation scripts, you will quickly get them to the point where they are robust enough to deploy your system successfully into production.

4.2. Applying Bundles of Database Refactorings

Modern development teams work in short iterations; within an agile team, iterations of one or two weeks in length are quite common. But just because a team is developing working software each week, it does not mean that they are going to deploy a new version of the system into production each week. Typically, they will deploy into production once every few months. The implication is that the team will need to bundle up all the database refactorings that they performed since the last time they deployed into production so that they can be applied appropriately.

As you saw in Chapter 3, the easiest way to do this is just to treat each database refactoring as its own transaction that is implemented as a combination of data definition language (DDL) scripts to change the database schema and to migrate any data as appropriate, as well as changes to your program source code that accesses that portion of the schema. This transaction should be assigned a unique ID (a number or date/timestamp suffices), which enables you to put the refactorings in order. This allows you to apply the refactorings in order, either with a handwritten script or some sort of generic tool, in any of your sandboxes as you need.

Because you cannot assume that all database schemas will be identical to one another, you need some way to safely apply different combinations of refactorings to each schema. For example, consider the number-based scheme depicted in Figure 4.2. The various developer databases are each at different versions. Your database is at version 813, another at 809, and another at 815. Because your project-integration database is at version 811, we can tell that your database schema has been changed since the last time you synced up with the project-integration environment, that the second developer has not changed his version of the schema and has not yet obtained the most recent version (809 is less than 811), and that the third developer has made changes in parallel to yours. The changes must have been made in parallel because neither of you has promoted your changes to the integration environment yet (otherwise, that environment would have the same number as one of the two databases). To update the project-integration database, you need to run the change scripts starting at 812; to update the preproduction test database, you start at change script 806; to update the demo environment, you start at change script 801; and to update the production database, you start at change number 758. Furthermore, you might not want to apply all changes to all versions; for example, you may only bring production up to version 794 for the next release.
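The version arithmetic in this example can be sketched in a few lines. This is an illustration only, assuming a hypothetical change log with one script per number from 750 to 815; the pending_changes function and the change_log variable are names invented here, and in practice the numbers need not be contiguous.

```python
def pending_changes(change_ids, current_version, target_version=None):
    """Return the change IDs to apply, in order, to a database at current_version.

    If target_version is given, changes beyond it are held back.
    """
    return [
        cid for cid in sorted(change_ids)
        if cid > current_version
        and (target_version is None or cid <= target_version)
    ]


# Hypothetical change log (assumption: one script per number).
change_log = range(750, 816)

# Project integration is at 811; promoting your work brings it up to 813.
integration = pending_changes(change_log, current_version=811, target_version=813)

# Production's first pending change is 758, so it is currently at 757;
# the next release only brings it up to 794.
production = pending_changes(change_log, current_version=757, target_version=794)

print(integration[0], production[0], production[-1])
```

Keeping the target version optional lets the same function serve both a full catch-up, as in a developer sandbox, and a partial one, as in the production release capped at 794.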

Figure 4.2. Different databases, different versions.

One way to think about this is that at the beginning of development for a new release of your system, you start a new stack of database refactorings. Throughout your development efforts, you keep adding new schema changes to the stack, and sometimes remove some of them that you decide to back out of. At the end of the development of that release, you baseline the stack, accepting it as the bundle of schema changes for the current release.

4.3. Scheduling Deployment Windows

A deployment window, often called a release window, is a specific period of time in which it is permissible to deploy a system into production. Your operations staff will likely have strict rules regarding when application teams may deploy systems. It is quite common to have rules such as application teams only being allowed to deploy new releases on Saturday evening between 2 a.m. and 6 a.m., and bug fixes between 4 a.m. and 6 a.m. on other evenings. These deployment windows are typically defined to coincide with periods of reduced system activity. Furthermore, they may have rules about when database schema changes are allowed to be made; for example, only on the third Saturday of each month. Smaller organizations, particularly those without many development projects underway, may choose to have deployment windows once every few months.

The implication is that your team will not be allowed to deploy your system into production whenever you want, but instead it must schedule deployment into a predefined deployment window. Figure 4.3 captures this concept, showing how two project teams schedule the deployment of their changes (including database refactorings) into available deployment windows. Sometimes there is nothing to deploy, sometimes one team has changes, and other times both teams have schema changes to deploy. The deployment windows in Figure 4.3 coincide with the final deployment from your preproduction test environment into your production environment in Figure 4.1.

Figure 4.3. Deploy schema changes into production at predefined points in time.

You will naturally need to coordinate with any other teams that are deploying during the same deployment window. This coordination will occur long before you go to deploy, and frankly, the primary reason why your preproduction test environment exists is to provide a sandbox in which you can resolve multisystem issues. Regardless of how many database refactorings are to be applied to your production database, or how many teams those refactorings were developed by, they will have first been tested within your preproduction testing environment before being applied in production.

The primary benefit of defined deployment windows is that they provide a control mechanism over what goes into production when. This helps your operations staff to organize their time, it provides development teams with target dates to aim for, and it sets the expectations of your end users as to when they might receive new functionality.

4.4. Deploying Your System

You generally will not deploy database refactorings on their own. Instead, you will deploy them as part of the overall deployment of one or more systems. Deployment is easiest when you have one application and one database to update, and this situation does occur in practice. Realistically, however, you must consider the situation in which you are deploying several systems and several data sources simultaneously. Figure 4.4 depicts a UML activity diagram overviewing the deployment process. You will need to do the following:

1. Back up your database. Although this is often difficult at best with large production databases, whenever possible you want to be able to back out to a known state if your deployment does not go well. One advantage of database refactorings is that they are small, so they are easy to back out of on an individual basis, an advantage that gets lost the longer you wait to deploy the refactorings because there is a greater chance that other refactorings will depend on them.

2. Run your previous regression tests. You first want to ensure that the existing production system(s) are in fact up and running properly. Although it should not have happened, someone may have inadvertently changed something that you do not know about. If the test suite corresponding to the previous deployment fails, your best bet is to abort and then investigate the problem. Note that you also need to ensure that your regression tests do not have any inadvertent side effects within your production environment. The implication is that you will need to be careful with your testing efforts.

3. Deploy your changed application(s). Follow your existing procedures to do this.

4. Deploy your database refactorings. You need to run the appropriate schema change scripts and data migration scripts against your data sources.

5. Run your current regression tests. After the application(s) and database refactorings have been deployed, you must run the current version of your test suite to verify that the deployed system(s) work in production. Once again, beware of side effects from your tests.

6. Back out if necessary. If your regression tests reveal serious defects, you must back out your applications and database schemas to the previous versions, in addition to restoring the database based on the backup from Step 1. If the deployment is complex, you may want to deploy in increments, testing each increment one at a time. An incremental deployment approach is more complex to implement but has the advantage that the entire deployment does not fail just because one portion of it is problematic.

7. Announce the deployment results. When systems are successfully deployed, you should let your stakeholders know immediately. Even if you have to abort the deployment, you should still announce what happened and when you will attempt to deploy again. You need to manage your stakeholders' expectations. They are hoping for the successful deployment of one or more systems that they have been waiting for, so they are likely interested to hear how things have gone.
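The seven steps above can be read as a small control flow. The sketch below is one possible rendering, not a prescribed tool: the deploy function and all of its parameters are hypothetical, and each step is passed in as a callable so that your own backup, test, and installation procedures can be plugged in.

```python
def deploy(backup, run_previous_tests, deploy_apps, deploy_refactorings,
           run_current_tests, back_out, announce):
    """Run the seven deployment steps; every argument is a caller-supplied callable."""
    backup()                                   # Step 1: back up the database
    if not run_previous_tests():               # Step 2: is production as expected?
        announce("aborted: production was not in the expected state")
        return False
    deploy_apps()                              # Step 3: changed application(s)
    deploy_refactorings()                      # Step 4: schema and data migration scripts
    if not run_current_tests():                # Step 5: current regression tests
        back_out()                             # Step 6: restore previous versions
        announce("backed out: current regression tests failed")
        return False
    announce("deployed successfully")          # Step 7: tell the stakeholders
    return True


# Usage with stand-in steps that only record what happened.
log = []
ok = deploy(
    backup=lambda: log.append("backup"),
    run_previous_tests=lambda: True,
    deploy_apps=lambda: log.append("apps"),
    deploy_refactorings=lambda: log.append("refactorings"),
    run_current_tests=lambda: False,   # simulate a failed post-deployment test run
    back_out=lambda: log.append("back out"),
    announce=log.append,
)
print(ok, log)
```

Note that the announcement happens on every path, including the abort and back-out paths, which mirrors the advice in Step 7.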

Figure 4.4. The deployment process.

4.5. Removing Deprecated Schema

A database refactoring has not been truly deployed until you have removed the deprecated schema from production. When the transition period has ended, the deprecated schema and any scaffolding code, such as triggers to keep different versions of the schema synchronized, must be removed. Because the transition period may last several years (that is how long it can take to update all the programs accessing the database), you need to have a process in place to manage these changes. The easiest approach, as described in Chapter 3, is to simply have specified dates (perhaps once a quarter) on which a transition period can end. The implication is that not only will you bundle up database schema improvements to apply them all at one time, you will also bundle up the removal of deprecated schema and apply those changes all at once.

We cannot say this enough: You must test thoroughly. Before removing the deprecated portions of the schema from production, you should first remove them from your preproduction test/QA environment and retest everything to ensure that it all still works. After you have done that, apply the changes in production, run your test suite there, and either back out or continue as appropriate.
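Bundling removals on specified dates can be sketched as a simple filter. This assumes you record, for each deprecated item, the date on which its transition period ends; the due_for_removal function, the item names, and the dates are all hypothetical.

```python
from datetime import date


def due_for_removal(deprecated, removal_date):
    """Return names of deprecated schema whose transition period has ended by removal_date."""
    return sorted(name for name, ends in deprecated.items() if ends <= removal_date)


# Hypothetical deprecated items mapped to the end of their transition periods.
deprecated = {
    "Customer.Balance": date(2007, 3, 31),
    "SynchronizeCustomerBalance trigger": date(2007, 3, 31),
    "Customer.FName": date(2007, 9, 30),
}

# Everything that may be dropped at the (hypothetical) end-of-June removal date.
batch = due_for_removal(deprecated, date(2007, 6, 30))
print(batch)
```

The resulting batch would be removed from the preproduction test/QA environment first, retested, and only then removed from production.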

4.6. What You Have Learned

Not only do you need to implement database refactorings, you also need to deploy them into production; otherwise, why do them at all? By having separate sandboxes, you can safely implement and test your refactorings to get them ready to be deployed. By deploying development versions of your system into your preproduction test environment on a regular basis, you improve and validate your deployment scripts long before you need to apply them within a production environment. Although you may develop working software on a regular basis, sometimes weekly, you generally will not release it into production that often. Instead, you will bundle up your database refactorings and deploy a collection of them all at one time during a predefined deployment window. Your application may not be the only one deploying into production during a given deployment window; therefore, you may need to coordinate with other teams to deploy successfully.

Chapter 5. Database Refactoring Strategies

Knowing more today than yesterday is good news about today, not bad news about yesterday.
Ron Jeffries

This chapter describes some of our experiences with database refactoring on actual projects, and suggests a few potential strategies that you may want to consider. In many ways, this chapter summarizes a collection of "lessons learned" that we hope will help your adoption efforts. These lessons include the following:

- Smaller changes are easier to apply.
- Uniquely identify individual refactorings.
- Implement a large change by many small ones.
- Have a database configuration table.
- Prefer triggers over views or batch synchronization.
- Choose a sufficient deprecation period.
- Simplify your database change control board (CCB) strategy.
- Simplify negotiations with other teams.
- Encapsulate database access.
- Be able to easily set up a database environment.
- Do not duplicate SQL.
- Put database assets under change control.
- Beware of politics.

5.1. Smaller Changes Are Easier to Apply

It is tempting to try to make several changes to your database at once. For example, what is stopping you from moving a column from one table to another, renaming it, and applying a standard type to it all at the same time? Absolutely nothing, other than the knowledge that doing this all at once is harder, and therefore riskier, than doing it one step at a time. If you make a small change and discover that something is broken, you pretty much know which change caused the problem.

Small Changes Decrease Project Risk

It is safer to proceed in small steps, one at a time. The larger the change, the greater the chance that you will introduce a defect, and the greater the difficulty in finding any defects that you do inject.

5.2. Uniquely Identify Individual Refactorings

During a software development project, you are likely to apply hundreds of refactorings and/or transformations to your database schema. Because these refactorings often build upon each other (for example, you may rename a column and then a few weeks later move it to another table), you need to ensure that the refactorings are applied in the right order. To do this, you need to identify each refactoring somehow and identify any dependencies between them. Table 5.1 compares and contrasts the three basic strategies for doing so. The strategies in Table 5.1 assume that you are working in a single-application, single-database environment.

Table 5.1. Version Identification Strategies

Approach: Build number. The application build number, typically an integer number that is assigned by your build tool (for example, CruiseControl) whenever your application compiles and all your unit tests run successfully after a change (even if that change is a database refactoring).
Advantages:
- Simple strategy.
- Refactorings can be treated as a First In, First Out (FIFO) queue to be applied in order by the build number.
- Database version directly linked to application version.
Disadvantages:
- Assumes that your database refactoring tool is integrated with your build tool, or that each refactoring is one or more scripts kept under configuration management control.
- Many builds do not involve database changes. Therefore, the version identifiers are not contiguous for the database. (For example, they may go 1, 7, 11, 12, ... rather than 1, 2, 3, 4, ....)
- Difficult to manage when you have multiple applications being developed against the same database, because each team will have the same build numbers.

Approach: Date/timestamp. The current date/time is assigned to the refactoring.
Advantages:
- Simple strategy.
- Refactorings managed as a FIFO queue.
Disadvantages:
- With a script-based approach to implementing refactorings, using a date/timestamp for a filename can be awkward.
- You need a strategy to associate the refactorings with the appropriate application build.

Approach: Unique identifier. A unique identifier, such as a GUID or an incremental value, is assigned to the refactoring.
Advantages:
- Existing strategies for generating unique values. (For example, a globally unique identifier (GUID) generator can be used.)
Disadvantages:
- GUIDs are awkward filenames.
- With GUIDs, you still need to identify the order in which to apply the refactorings.
- You need a strategy to associate the refactorings with the appropriate application build.

When you are in a multi-application environment in which several project teams may be applying refactorings to the same database schema, you also need to find a way to identify which team produced a refactoring. The easiest way to do this is to assign a unique identifier to each team and then include that value as part of the refactoring identifier. Therefore, with a build number strategy, team 1 might have refactorings with IDs 1-7, 1-12, 1-15, and so on; and team 7 could have refactorings with IDs 7-3, 7-7, 7-13, and so on.

Our experience is that when a single team is responsible for a database, the build number strategy works best. However, when several teams can evolve the same database, a date/timestamp approach works best because you can readily tell in which order the refactorings were applied from the date/timestamp. With a build number approach you cannot, for example, determine which refactoring comes first: 1-7 or 7-7.

It is not, however, as simple as applying refactorings in order. Scott worked in an organization in which four separate teams could evolve the same database schema. There were two database administrators (DBAs), whom we will call Fred and Barney, to support the teams, and they worked closely together. Although we tried to coordinate their efforts, and most of the time succeeded, mistakes would happen. One time Fred applied the refactoring Apply Standard Codes (page 157) to a column, and a few days later Barney applied the same refactoring on a different team, but used a different set of "standard" values. As it turned out, Barney's "standard" values were the right set, but we did not find that out until the two teams promoted their changes into the preproduction testing environment and effectively clobbered one another. The point to be made is that in a multiple-team environment, you need a coordination strategy. (Several are discussed later in this chapter.)
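The ordering problem with team-prefixed build numbers can be made concrete with a short sketch. The identifiers 1-7, 7-7, 1-12, and 7-3 follow the pattern used in the text; the timestamps assigned to them and the build_number helper are invented for illustration.

```python
from datetime import datetime

# Hypothetical date/timestamp identifiers for four refactorings that two
# teams would otherwise label team-buildnumber.
by_timestamp = {
    "1-7": datetime(2006, 3, 1, 9, 30),
    "7-3": datetime(2006, 3, 1, 14, 5),
    "7-7": datetime(2006, 3, 4, 11, 0),
    "1-12": datetime(2006, 3, 6, 16, 45),
}


def build_number(identifier):
    """Split a 'team-build' identifier and return the build number part."""
    _team, build = identifier.split("-")
    return int(build)


# Each team counts its own builds, so the numbers are not comparable across
# teams; here they even tie, leaving 1-7 versus 7-7 undecidable.
ambiguous = build_number("1-7") == build_number("7-7")

# Timestamps give a single order across all teams.
apply_order = sorted(by_timestamp, key=by_timestamp.get)
print(ambiguous, apply_order)
```

A shared clock orders the refactorings across teams, but as the Fred and Barney story shows, ordering alone does not prevent two teams from making conflicting changes.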

5.3. Implement a Large Change by Many Small Ones

Large changes to your database, such as implementing a common surrogate key strategy across all tables or applying a consistent naming strategy throughout your database, should be implemented as a collection of small refactorings. This strategy follows the old adage, "How do you eat an elephant? One bite at a time."

Consider splitting an existing table in two. Although we have a single refactoring called Split Table (page 145), the reality is that in practice you need to apply many refactorings to get this done. For example, you need to apply the Introduce New Table (page 304) transformation to add the new table, the Move Column (page 103) refactoring several times (once for each column) to move the columns, and potentially the Introduce Index (page 248) refactoring to implement the primary key of the new table. To implement each of the Move Column refactorings, you must apply the Introduce New Column (page 301) transformation and the Move Data (page 192) transformation. When doing this, you may discover that you need to apply one or more data quality refactorings (Chapter 7, "Data Quality Refactorings") to improve the source data in the individual columns.
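The decomposition of Split Table can be sketched as a list of small steps that are applied and committed one at a time. This illustration uses Python's sqlite3 module with an in-memory database; the Customer and CustomerAddress tables, their columns, and the sample row are hypothetical, and the SQL is SQLite's dialect rather than that of any particular production database. Removing the old columns is left out because it belongs to the end of the transition period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT,
                           Street TEXT, City TEXT);
    INSERT INTO Customer VALUES (1, 'Sally Jones', '12 Main St', 'Toronto');
""")

steps = [
    # Introduce New Table, seeded with the existing keys.
    "CREATE TABLE CustomerAddress (CustomerID INTEGER NOT NULL)",
    "INSERT INTO CustomerAddress (CustomerID) SELECT CustomerID FROM Customer",
    # Introduce Index to enforce the key of the new table.
    "CREATE UNIQUE INDEX CustomerAddressKey ON CustomerAddress (CustomerID)",
    # Move Column (Street): Introduce New Column, then Move Data.
    "ALTER TABLE CustomerAddress ADD COLUMN Street TEXT",
    "UPDATE CustomerAddress SET Street = (SELECT c.Street FROM Customer c "
    "WHERE c.CustomerID = CustomerAddress.CustomerID)",
    # Move Column (City): Introduce New Column, then Move Data.
    "ALTER TABLE CustomerAddress ADD COLUMN City TEXT",
    "UPDATE CustomerAddress SET City = (SELECT c.City FROM Customer c "
    "WHERE c.CustomerID = CustomerAddress.CustomerID)",
]

for sql in steps:
    with conn:              # commit each small step on its own
        conn.execute(sql)

row = conn.execute(
    "SELECT Street, City FROM CustomerAddress WHERE CustomerID = 1"
).fetchone()
print(row)
```

Because each step is committed separately, a regression test run can be slotted in after any of them, which is the point of working in small increments.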

5.4. Have a Database Configuration Table

Chapter 3, "The Process of Database Refactoring," discussed the need to identify the current schema version of the database to enable you to update the schema appropriately. This schema version should reflect your database refactoring strategy; for example, if you identify refactorings using a date/timestamp strategy, you should identify the current schema version with a date/timestamp, too. The easiest way to do that is to have a table that maintains this information. In the following code, we create a single-row, single-column table called DatabaseConfiguration that reflects a build number strategy:

    CREATE TABLE DatabaseConfiguration (SchemaVersion NUMBER NOT NULL);

    INSERT INTO DatabaseConfiguration (SchemaVersion) VALUES (0);

This table is updated with the identifier value of a database refactoring whenever the refactoring is applied to the database. For example, when you apply refactoring number 17 to the schema, DatabaseConfiguration.SchemaVersion would be updated to 17, as shown in the following code:

    UPDATE DatabaseConfiguration SET SchemaVersion = 17;
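One way a deployment tool might use this version number is sketched below in Java. The sketch assumes a build number strategy and that each refactoring's SQL is available as a string keyed by its build number; the class `PendingRefactorings` and the method `statementsToRun` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PendingRefactorings {
    // scripts maps a build number to the SQL of that refactoring;
    // currentVersion is the value read from DatabaseConfiguration.SchemaVersion.
    static List<String> statementsToRun(Map<Integer, String> scripts, int currentVersion) {
        TreeMap<Integer, String> ordered = new TreeMap<>(scripts);
        List<String> statements = new ArrayList<>();
        // tailMap(key, false) keeps only build numbers greater than currentVersion,
        // in ascending order.
        for (Map.Entry<Integer, String> entry : ordered.tailMap(currentVersion, false).entrySet()) {
            statements.add(entry.getValue());
            // Record the new schema version right after each refactoring.
            statements.add("UPDATE DatabaseConfiguration SET SchemaVersion = " + entry.getKey());
        }
        return statements;
    }
}
```

Recording the version after each refactoring, rather than once at the end, means that a run that fails partway leaves SchemaVersion pointing at the last refactoring that was actually applied.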

5.5. Prefer Triggers over Views or Batch Synchronization

In Chapter 2, "Database Refactoring," you learned that when several applications access the same database schema, you often require a transition period during which both the original and new schemas exist in production. You need a way to ensure that regardless of which version of the schema an application accesses, it accesses consistent data. Table 5.2 compares and contrasts the three strategies that you may use to keep the data synchronized. Our experience is that triggers are the best approach for the vast majority of situations. We have used views a couple of times and have taken a batch approach rarely. All of the examples throughout this book assume a trigger-based synchronization approach.

Table 5.2. Schema Synchronization Strategies

Strategy: Trigger. One or more triggers are implemented that make the appropriate update to the other version of the schema.
Advantages: Real-time update.
Disadvantages: Potential performance bottleneck. Potential for trigger cycles. Potential for deadlocks. Often introduces duplicate data. (Data is stored in both the original and new schema.)

Strategy: Views. View(s) representing the original table(s) are introduced; see Encapsulate Table With View (page 243), which updates both the original and new schemas appropriately.
Advantages: Real-time update. No need to move physical data between tables/columns.
Disadvantages: Updateable views are not supported by some databases, or the database does not support joins within an updateable view. Additional complexity of introducing and eventually removing the view(s).

Strategy: Batch updates. A batch job that processes and updates the data accordingly is run on a regular basis (for example, daily).
Advantages: Performance impact from data synchronization is absorbed during nonpeak loads.
Disadvantages: Huge potential for referential integrity problems. You need to keep track of previous versions of data to determine which changes were made to the record. When multiple changes are made during the batch period (for example, someone updates data contained in both the original and new schemas), it can be difficult to determine which change(s) to accept. Often introduces duplicate data. (Data is stored in both the original and new schema.)

5.6. Choose a Sufficient Transition Period

The DBA must assign a realistic transition period to the refactoring that is sufficient for all the other application teams. We have found that the easiest approach is to agree on a common transition period for the various categories of refactoring and then apply that transition period consistently. For example, structural refactorings may have a two-year transition period, whereas architecture refactorings may have a three-year transition period. The primary disadvantage is that this approach requires you to adopt the longest transition periods, even when you are refactoring a schema that is accessed by a handful of applications that are deployed frequently. You can mitigate this problem by actively removing schema within your production databases that is no longer required, even though the transition period may not have expired yet, or by negotiating a shorter transition period via a database change control board or through direct negotiation with the other teams.

5.7. Simplify Your Database Change Control Board (CCB) Strategy

Scott worked with one company in which they had a database CCB comprised of the operational DBAs, people who understood the enterprise data assets well. This CCB met once a week, and at their meetings the project DBAs would bring a list of suggested changes that their teams wanted to make to existing production data sources. (The teams were free to change data sources not yet in production.) The CCB would determine whether the change would be allowed and, if so, what the deprecation period would be. The advantage is that there is the perception of tighter control on the part of the CCB. (Of course, this strategy completely falls apart if a development team chooses to go around them.) It has the disadvantage that it slows down the development efforts of the teams. Even when you can get a decision within a few hours or days, that is still time during which the development team has to tolerate the original schema. Our suggestion is to either give the project DBA the authority to make the changes to the database schema as needed, with the understanding that the CCB may later decide to override the change, or to have the CCB meet daily to discuss changes.

5.8. Simplify Negotiations with Other Teams

An alternative strategy for defining the transition period is to negotiate individual refactorings with the owners of any system that might be affected by the database refactoring. You could do this one time for each refactoring, or in batch during a regular database schema change negotiation meeting. This approach has the advantages that it will help to communicate the potential changes to everyone affected and will likely result in the most accurate transition period (assuming that everyone at the meeting can accurately predict when they can deploy their required changes). The primary disadvantage is that this approach might be slow and arduous. We have never seen this tried in practice. If you do try it, however, we suggest that you keep things as simple as possible.

5.9. Encapsulate Database Access

In Chapter 2, we argued that the more database access is encapsulated, the easier it is to refactor. Minimally, even if your application contains hard-coded SQL, you should at least strive to put that SQL code in one place so that you can easily find and update it when you need to. You could implement the SQL logic in a consistent manner, such as having save(), delete(), retrieve(), and find() operations for each business class. Or you could implement data access objects (DAOs), classes that implement the data access logic separately from business classes. For example, your Customer and Account business classes would have CustomerDAO and AccountDAO classes respectively. Better yet, you could forego SQL code completely by generating your database access logic from mapping meta data (Ambler 2003).
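As a rough illustration of the DAO option, the following Java sketch keeps every SQL statement that touches the Customer table in one class. It uses only the standard java.sql interfaces; the method names, the choice of columns, and the SQL constants are assumptions made for this example rather than a prescribed design.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical DAO: all SQL that references Customer lives in this class.
final class CustomerDAO {
    static final String FIND_FIRST_NAME =
            "SELECT FirstName FROM Customer WHERE CustomerId = ?";
    static final String UPDATE_FIRST_NAME =
            "UPDATE Customer SET FirstName = ? WHERE CustomerId = ?";

    private final Connection connection;

    CustomerDAO(Connection connection) {
        this.connection = connection;
    }

    // Returns the customer's first name, or null when no row matches.
    String findFirstName(long customerId) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(FIND_FIRST_NAME)) {
            stmt.setLong(1, customerId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("FirstName") : null;
            }
        }
    }

    // Returns the number of rows changed (0 when the customer does not exist).
    int updateFirstName(long customerId, String firstName) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(UPDATE_FIRST_NAME)) {
            stmt.setString(1, firstName);
            stmt.setLong(2, customerId);
            return stmt.executeUpdate();
        }
    }
}
```

If Customer.FirstName is later renamed or moved, the two constants at the top of the class are the only application SQL that has to change.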

5.10. Be Able to Easily Set Up a Database Environment

People join, and eventually leave, your project throughout its life cycle. As you see in Figure 5.1, your team(s) will need to be able to create instances of your database, often with different versions of the schema on a range of machines, as you learned in Chapter 4, "Deploying into Production." The most effective way to do this is with an installation script that applies the initial DDL to create the database schema and any applicable change scripts, and then runs your regression test suite to ensure that the installation was successful.

Figure 5.1. Sandboxes.

5.11. Do Not Duplicate SQL

One of the great things about SQL is that it is fairly easy to write. Unfortunately, because it is fairly easy to write, we have found that SQL code is often duplicated throughout an application, and even throughout a database within its views, stored procedures, and triggers. The more SQL code you have, the harder it is to refactor your database schema, because there will potentially be more code coupled to whatever you are refactoring. It is best if you write SQL in a single package or class, have the SQL generated from meta data, or store the SQL code in XML files that are accessed at runtime.

5.12. Put Database Assets Under Change Control

Chapter 1, "Evolutionary Database Development," described how important it is to put all database assets, such as data models and database scripts, under change control management. Both of us have been involved with project teams where the DBAs, and sometimes even the developers, did not do this. As you would expect, these teams often struggled to identify the proper version of the data model, or of a change script, when it came time to deploy their applications into preproduction testing and sometimes even into production. Your database assets, just like the rest of your critical project assets, should be managed effectively. We have found it most helpful when database assets are colocated in the same repository as the application, enabling us to see who made changes and supporting rollback capabilities.

5.13. Beware of Politics

Introducing evolutionary database techniques, and in particular database refactoring, is likely to be a major change within your organization. As Manns and Rising (2005) point out in Fearless Change, politics is likely to rear its ugly head as you try to make these improvements to the way that you work. Although many IT professionals prefer to ignore politics, naively believing that they are above it, they do so at their peril. The techniques that we describe in this book work; we know this because we have applied them. We also know that many traditional data professionals are threatened by these techniques, and rightfully so, because these techniques force them to make significant changes to the way that they will work in the future. You will likely need to "play the political game" if you want to adopt database refactoring within your organization.

5.14. What You Have Learned

Database refactoring is a relatively new development technique, but you can still learn from the experiences of other people. In this chapter, we shared some of our experiences and suggested a few new strategies that you may want to try.

Online Resources

We administer the Agile Databases mailing list at Yahoo Groups. The URL for the group is groups.yahoo.com/group/agileDatabases/. We invite you to get involved with the discussions and share your experiences. We also maintain both www.databaserefactoring.com and www.agiledata.org, where up-to-date lists of database refactorings are maintained. New refactorings will be added as they are discovered.

Chapter 6. Structural Refactorings

Structural refactorings, as the name implies, change the table structure of your database schema. The structural refactorings are as follows:

Drop Column
Drop Table
Drop View
Introduce Calculated Column
Introduce Surrogate Key
Merge Columns
Merge Tables
Move Column
Rename Column
Rename Table
Rename View
Replace Large Object (LOB) With Table
Replace Column
Replace One-to-Many With Associative Table
Replace Surrogate Key With Natural Key
Split Column
Split Table

Common Issues When Implementing Structural Refactorings

When implementing structural refactorings, you need to consider several common issues when updating the database schema, including the following:

1. Avoid trigger cycles. You need to implement the trigger so that cycles do not occur: if the value in one of the original columns changes, Table.NewColumn1..N should be updated, but that update should not trigger the same update to the original columns and so on. The following code shows an example of keeping the value of two columns synchronized:

    CREATE OR REPLACE TRIGGER SynchronizeFirstName
    BEFORE INSERT OR UPDATE
    ON Customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF INSERTING THEN
        IF :NEW.FirstName IS NULL THEN
          :NEW.FirstName := :NEW.FName;
        END IF;
        IF :NEW.FName IS NULL THEN
          :NEW.FName := :NEW.FirstName;
        END IF;
      END IF;
      IF UPDATING THEN
        IF NOT(:NEW.FirstName=:OLD.FirstName) THEN
          :NEW.FName:=:NEW.FirstName;
        END IF;
        IF NOT(:NEW.FName=:OLD.FName) THEN
          :NEW.FirstName:=:NEW.FName;
        END IF;
      END IF;
    END;
    /

2. Fix broken views. Views are coupled to other portions of your database; so when you apply a structural refactoring, you may inadvertently break a view. View definitions are typically coupled to other views and table definitions. For example, the VCustomerBalance view is defined by joining the Customer and Account tables on CustomerNumber to obtain the total balance across all accounts for each individual customer. If you rename Customer.CustomerNumber, this view will effectively be broken.

3. Fix broken triggers. Triggers are coupled to table definitions; therefore, structural changes such as a renamed or moved column could potentially break a trigger. For example, an insert trigger may validate the data stored in a specific column; and if that column has been changed, the trigger will potentially be broken. The following code finds broken triggers in Oracle, something that you should add to your test suite. You still need other tests to find business logic defects:

    SELECT Object_Name, Status
    FROM User_Objects
    WHERE Object_Type='TRIGGER'
      AND Status='INVALID';

4. Fix broken stored procedures. Stored procedures invoke other stored procedures and access tables, views, and columns. Therefore, any structural refactoring has the potential to break an existing stored procedure. The following code finds broken stored procedures in Oracle, something that you should add to your test suite. You still need other tests to find business logic defects:

    SELECT Object_Name, Status
    FROM User_Objects
    WHERE Object_Type='PROCEDURE'
      AND Status='INVALID';

5. Fix broken tables. Tables are indirectly coupled to the columns of other tables via naming conventions. For example, if you rename the Customer.CustomerNumber column, you should similarly rename Account.CustomerNumber and Policy.CustomerNumber. The following code finds all the tables with column names containing the text CUSTOMERNUMBER in Oracle:

    SELECT Table_Name, Column_Name
    FROM User_Tab_Columns
    WHERE Column_Name LIKE '%CUSTOMERNUMBER%';

6. Define the transition period. Structural refactorings require a transition period when you implement them in a multi-application environment. You must assign the same drop date to the original schema that is being refactored as well as to any related columns and the trigger. This drop date must take into account the time required to update the external programs accessing that portion of the database.
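The Oracle data-dictionary checks in items 3 and 4 could be folded into a regression test suite along the following lines. This is only a sketch: the class `InvalidObjectCheck` and its method are hypothetical, the object type is passed as a bind parameter instead of being written into the query, and the code assumes an open JDBC connection to the Oracle schema under test.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class InvalidObjectCheck {
    // Same query as in the list above, with the object type as a bind parameter.
    static final String INVALID_OBJECTS =
            "SELECT Object_Name FROM User_Objects "
            + "WHERE Object_Type = ? AND Status = 'INVALID'";

    // objectType is an Oracle type name such as "TRIGGER" or "PROCEDURE".
    // Returns the names of all invalid objects of that type (empty if none).
    static List<String> findInvalid(Connection connection, String objectType)
            throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement stmt = connection.prepareStatement(INVALID_OBJECTS)) {
            stmt.setString(1, objectType);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }
}
```

A test would then fail whenever `findInvalid(connection, "TRIGGER")` or `findInvalid(connection, "PROCEDURE")` returns a non-empty list after a refactoring has been applied. As noted above, this catches only compilation-level breakage, not business logic defects.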

Drop Column

Remove a column from an existing table.

Figure 6.1. Drop the Customer.FavoriteColor column.

Motivation

The primary reasons to apply Drop Column are to refactor a database table design or to respond to a refactoring of external applications that leaves the column unused. Drop Column is often applied as a step of the Move Column (page 103) database refactoring because the column is removed from the source table. Or, sometimes you discover that some of the columns are not really used. Usually, it is better to remove these columns before someone starts using them by mistake.

Potential Tradeoffs

The column being dropped may contain valuable data; in that case, the data may need to be preserved. You can use Move Data (page 192) to move the data to some other table so that it is preserved. On tables containing many rows, the dropping of a column may run for a long time, making your table unavailable for update during the execution of Drop Column.

Schema Update Mechanics

To update the schema to remove a column, you must do the following:

1. Choose a remove strategy. Some database products may not allow for a column to be removed, forcing you to create a temporary table, move all the data into the temporary table, drop the original table, re-create the original table without the column, move the data from the temporary table, and then drop the temporary table. If your database provides a way to remove columns, you just have to remove the column using the DROP COLUMN option of the ALTER TABLE command.

2. Drop the column. Sometimes, when the amount of data is large, we have to make sure that the Drop Column runs in a reasonable amount of time. To minimize the disruption, schedule the physical removal of the column for a time when the table is least used. Another strategy is to mark the database column as unused; this can be achieved by using the SET UNUSED option of the ALTER TABLE command. The SET UNUSED command runs much faster, thus minimizing disruption. You can then remove the unused columns during scheduled downtimes. When this option is used, the database does not physically remove the column but hides the column from everyone.

3. Rework foreign keys. If FavoriteColor is part of a primary key, you must also remove the corresponding columns from the other tables that use it as (part of) the foreign key to Customer. You will have to re-create the foreign key constraints on these other tables. In this situation, you may want to consider applying refactorings such as Introduce Surrogate Key (page 85) or Replace Surrogate Key with Natural Key (page 135) before applying Drop Column to simplify this refactoring.
An alternative strategy to physically removing the column is masking its existence by introducing a table view that does not reference FavoriteColor via the refactoring Encapsulate Table With View (page 243). During the transition period, you just associate a comment with Customer.FavoriteColor to indicate that it will soon be dropped. After the transition period, you remove the column from the Customer table via the ALTER TABLE command, as you see here:

    COMMENT ON COLUMN Customer.FavoriteColor IS 'Drop date = September 14 2007';

    -- On September 14 2007
    ALTER TABLE Customer DROP COLUMN FavoriteColor;

If you are using the SET UNUSED option, you can use the following command to make Customer.FavoriteColor unused so that it is not physically removed from the Customer table, but is made unavailable and invisible to all the clients:

    ALTER TABLE Customer SET UNUSED COLUMN FavoriteColor;

Data-Migration Mechanics

To support the removal of a column from a table, you may discover that you need to preserve existing data, or you may need to plan for the performance of the Drop Column (because removing a column from a table can block data modifications on the table while it runs). The primary issue here is to preserve the data before you drop the column. When you are going to remove an existing column from a table that has been in production, you will likely be required by the business to preserve the existing data "just in case" they need it again at some point in the future. The simplest approach is to create a temporary table with the primary key of the source table and the column that is being removed and then move the appropriate data into this new temporary table. You can choose other methods of preserving the data, such as archiving data to external files.

The following code depicts the steps to remove the Customer.FavoriteColor column. To preserve the data, you create a temporary table called CustomerFavoriteColor, which includes the primary key from the Customer table and the FavoriteColor column.

    CREATE TABLE CustomerFavoriteColor AS
      SELECT CustomerID, FavoriteColor FROM Customer;

Access Program Update Mechanics

You must identify and then update any external programs that reference Customer.FavoriteColor. Issues to consider include the following:

1. Refactor code to use alternate data sources. Some external programs may include code that still uses the data currently contained within Customer.FavoriteColor. When this is the case, alternate data sources must be found, and the code reworked to use them; otherwise, the refactoring must be abandoned.

2. Slim down SELECT statements. Some external programs may include queries that read in the data but then ignore the retrieved value.

3. Refactor database inserts and updates. Some external programs may include code that puts a "fake value" into this column for inserts of new data, code that must be removed. Or the programs may include code that avoids overwriting FavoriteColor during an insert or update into the database. In other cases, you may have SELECT * FROM Customer where the application expects a certain number of columns and gets the columns from the result set using positional reference. This application code is likely to break because the result set of the SELECT statement now returns one fewer column. Generally, it is not a good idea to use SELECT * from any table in your application. Granted, the real problem here is the fact that the application is using positional references, something you should consider refactoring, too.

The following code shows how you have to remove the reference to FavoriteColor:

    // Before code
    public Customer findByCustomerID(Long customerID) {
      stmt = DB.prepare("SELECT CustomerId, FirstName, " +
        "FavoriteColor FROM Customer WHERE CustomerId = ?");
      stmt.setLong(1, customerID.longValue());
      ResultSet rs = stmt.executeQuery();
      if (rs.next()) {
        customer.setCustomerId(rs.getLong("CustomerId"));
        customer.setFirstName(rs.getString("FirstName"));
        customer.setFavoriteColor(rs.getString("FavoriteColor"));
      }
      return customer;
    }

    public void insert(long customerId, String firstName,
        String favoriteColor) {
      stmt = DB.prepare("INSERT into customer" +
        "(Customerid, FirstName, FavoriteColor)" +
        "values (?, ?, ?)");
      stmt.setLong(1, customerId);
      stmt.setString(2, firstName);
      stmt.setString(3, favoriteColor);
      stmt.execute();
    }

    public void update(long customerId, String firstName,
        String color) {
      stmt = DB.prepare("UPDATE Customer " +
        "SET FirstName = ?, FavoriteColor=? " +
        "WHERE Customerid = ?");
      stmt.setString(1, firstName);
      stmt.setString(2, color);
      stmt.setLong(3, customerId);
      stmt.executeUpdate();
    }

    // After code
    public Customer findByCustomerID(Long customerID) {
      // FavoriteColor is no longer selected
      stmt = DB.prepare("SELECT CustomerId, FirstName " +
        "FROM Customer WHERE CustomerId = ?");
      stmt.setLong(1, customerID.longValue());
      ResultSet rs = stmt.executeQuery();
      if (rs.next()) {
        customer.setCustomerId(rs.getLong("CustomerId"));
        customer.setFirstName(rs.getString("FirstName"));
      }
      return customer;
    }

    public void insert(long customerId, String firstName) {
      stmt = DB.prepare("INSERT into customer" +
        "(Customerid, FirstName) " +
        "values (?, ?)");
      stmt.setLong(1, customerId);
      stmt.setString(2, firstName);
      stmt.execute();
    }

    // The color parameter is dropped along with the column
    public void update(long customerId, String firstName) {
      stmt = DB.prepare("UPDATE Customer " +
        "SET FirstName = ? " +
        "WHERE Customerid = ?");
      stmt.setString(1, firstName);
      stmt.setLong(2, customerId);
      stmt.executeUpdate();
    }

Drop Table

Remove an existing table from the database.

Motivation

Apply Drop Table when a table is no longer required and/or used. This occurs when the table has been replaced by another similar data source, such as another table or a view, or simply when there is no longer need for that specific data source.

Potential Tradeoffs

Dropping a table deletes that specific data from your database, so you may need to preserve some or all of the data. If this is the case, the required data must be stored within another data source, especially when you are normalizing a database design and find that some of the data exists in other table(s). You can replace the table with a view or a query to the data source. In this scenario, you cannot write back to the same view or data source query.

Schema Update Mechanics

To perform Drop Table, you must resolve data-integrity issues. If TaxJurisdictions is being referenced by any other tables, you have to either remove the foreign key constraint or point the foreign key constraint to another table. Figure 6.2 depicts an example of how to go about removing the TaxJurisdictions table: you just mark the table as deprecated and then remove it after the transition date. The following code depicts the DDL to remove the table:

    -- drop date = June 14 2007
    DROP TABLE TaxJurisdictions;

Figure 6.2. Dropping the TaxJurisdictions table.

You can also choose to just rename the table, as shown below. When doing this, some database products automatically change all references from TaxJurisdictions to TaxJurisdictionsRemoved. You want to delete those referential integrity constraints by using Drop Foreign Key (page 213) because you may not want to have referential integrity constraints to a table that is going to be dropped:

    -- rename date = June 14 2007
    ALTER TABLE TaxJurisdictions RENAME TO TaxJurisdictionsRemoved;

Data-Migration Mechanics

The only data-migration issue with this refactoring is the potential need to archive the existing data so that it can be restored if needed at a later date. You can do this by using the CREATE TABLE AS SELECT command. The following code depicts the DDL to optionally preserve data in the TaxJurisdictions table:

    -- copy data before drop
    CREATE TABLE TaxJurisdictionsRemoved AS
      SELECT * FROM TaxJurisdictions;

    -- drop date = June 14 2007
    DROP TABLE TaxJurisdictions;

Access Program Update Mechanics

Any external programs referencing TaxJurisdictions must be refactored to access the alternative data source(s) that have replaced TaxJurisdictions. If there are no alternatives, and the data is still required, you must not remove the table until the alternative(s) exist.

Drop View

Remove an existing view.

Motivation

Apply Drop View when a view is no longer required and/or used. This occurs when the view has been replaced by another similar data source, such as another view or a table, or simply when there is no longer the need for that specific query.

Potential Tradeoffs

Dropping a view does not delete any data from your database; however, it does mean that the view is no longer available to the external programs that access it. Views are often used to obtain the data for reports. If the data is still required, the view should have already been replaced by another data source, either a view or a table, or a query to the source data itself. This new data access approach should ideally perform as well as or better than the view that is being removed.

Views are also used to implement security access control (SAC) to data values within a database. When this is the case, a new SAC strategy for the tables accessed by the view should have been previously implemented and deployed. A view-based security strategy is often a lowest-common-denominator approach that can be shared across many applications but is not as flexible as a programmatic SAC strategy (Ambler 2003).

Schema Update Mechanics

To remove the view in Figure 6.3, you must apply the DROP VIEW command to AccountDetails after the transition period. The following code to drop the AccountDetails view is very straightforward: you just mark the view as deprecated and then remove it after the transition date.

    -- drop date = June 14 2007
    DROP VIEW AccountDetails;

Figure 6.3. Dropping the AccountDetails view.

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You must identify and then update any external programs that reference AccountDetails. You may need to refactor SQL code that formerly used AccountDetails to now explicitly access the data directly from the source tables. Similarly, any meta data used to generate SQL using AccountDetails would need to be updated. The following code shows how you have to change your application code to use data from the base tables:

    // Before code
    stmt.prepare(
      "SELECT * " +
      "FROM AccountDetails " +
      "WHERE CustomerId = ?");
    stmt.setLong(1, customer.getCustomerID());
    ResultSet rs = stmt.executeQuery();

    // After code: join the base tables directly instead of using the view
    stmt.prepare(
      "SELECT * " +
      "FROM Customer, Account " +
      "WHERE" +
      " Customer.CustomerId = Account.CustomerId " +
      " AND Customer.CustomerId = ?");
    stmt.setLong(1, customer.getCustomerID());
    ResultSet rs = stmt.executeQuery();

Introduce Calculated Column

Introduce a new column based on calculations involving data in one or more tables. (Figure 6.4 depicts two tables, but it could be any number.)

Figure 6.4. Introducing the Customer.TotalAccountBalance calculated column.

Motivation

The primary reason you would apply Introduce Calculated Column is to improve application performance by providing prepopulated values for a given property derived from other data. For example, you may want to introduce a calculated column that indicates the credit risk level (for example, exemplary, good risk, bad risk) of a client based on that client's previous payment history with your firm.

Potential Tradeoffs

The calculated column may get out of sync with the actual data values, particularly when external applications are required to update the value. We suggest that you introduce a mechanism, such as a regular batch job or triggers on the source data, that automatically updates the calculated column.

Schema Update Mechanics

Applying Introduce Calculated Column can be complicated because of data dependencies and the need to keep the calculated column synchronized with the data values it is based on. You will need to do the following:

1. Determine a synchronization strategy. Your basic choices are batch jobs, application updates, or database triggers. A batch job can be used when you do not require the value to be updated in real time; otherwise, you need to use one of the other two strategies. When the applications are responsible for doing the appropriate updates, you run the risk of different applications doing it in different ways. The trigger approach is likely the safer of the two real-time strategies because the logic only needs to be implemented once, in the database. Figure 6.4 assumes the use of triggers.
2. Determine how to calculate the value. You have to identify the source data, and how it should be used, to determine the value of TotalAccountBalance.
3. Determine the table to contain the column. You have to determine which table should include TotalAccountBalance. To do this, ask yourself which business entity the calculated column best describes. For example, a customer's credit risk indicator is most applicable to the Customer entity.
4. Add the new column. Add Customer.TotalAccountBalance of Figure 6.4 via the Introduce New Column transformation (page 301).
5. Implement the update strategy. You need to implement and test the strategy chosen in Step 1.

The following code shows you how to add the Customer.TotalAccountBalance column and the UpdateCustomerTotalAccountBalance trigger, which is run any time the Account table is modified:

    -- Create the new column TotalAccountBalance
    ALTER TABLE Customer ADD TotalAccountBalance NUMBER;

    -- Create trigger to keep data in sync
    CREATE OR REPLACE TRIGGER UpdateCustomerTotalAccountBalance
    BEFORE UPDATE OR INSERT OR DELETE
    ON Account
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
      NewBalanceToUpdate NUMBER := 0;
      CustomerIdToUpdate NUMBER;
    BEGIN
      CustomerIdToUpdate := :NEW.CustomerID;
      IF UPDATING THEN
        NewBalanceToUpdate := :NEW.Balance - :OLD.Balance;
      END IF;
      IF INSERTING THEN
        NewBalanceToUpdate := :NEW.Balance;
      END IF;
      IF DELETING THEN
        NewBalanceToUpdate := -1 * :OLD.Balance;
        CustomerIdToUpdate := :OLD.CustomerID;
      END IF;
      UPDATE Customer SET
        TotalAccountBalance = TotalAccountBalance + NewBalanceToUpdate
        WHERE CustomerId = CustomerIdToUpdate;
    END;
    /
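The arithmetic inside the trigger can be checked outside the database. The sketch below restates it in Java purely for illustration; the class name BalanceDelta, the Op enum, and the use of whole cents stored as long values are assumptions of this example, not part of the schema above.

```java
// Illustrative sketch only: mirrors the delta logic of the
// UpdateCustomerTotalAccountBalance trigger. All names are hypothetical.
final class BalanceDelta {

    /** The kind of row-level change made to the Account table. */
    enum Op { INSERT, UPDATE, DELETE }

    private BalanceDelta() { }

    /**
     * Returns the amount (in cents) to add to Customer.TotalAccountBalance
     * for a single changed Account row.
     *
     * @param op       the row-level operation
     * @param oldCents the balance before the change (ignored for INSERT)
     * @param newCents the balance after the change (ignored for DELETE)
     */
    static long delta(Op op, long oldCents, long newCents) {
        switch (op) {
            case INSERT:
                return newCents;            // a new account adds its full balance
            case UPDATE:
                return newCents - oldCents; // only the difference is applied
            case DELETE:
                return -oldCents;           // a removed account subtracts its balance
            default:
                throw new IllegalArgumentException("Unknown operation: " + op);
        }
    }
}
```

For instance, raising an account from 100.00 to 150.00 gives a delta of 5000 cents. Like the trigger, this sketch assumes that an update leaves the account with the same customer; moving an account between customers would need to adjust both totals.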

Data-Migration Mechanics

There is no data to be migrated per se, although the value of Customer.TotalAccountBalance must be populated based on the calculation. This is typically done once using the UPDATE SQL command or can be done in batch via one or more scripts. The following code shows you how to set the initial value in Customer.TotalAccountBalance:

    -- COALESCE gives customers without accounts a total of 0 rather than NULL,
    -- so that the trigger's later additions do not yield NULL
    UPDATE Customer SET TotalAccountBalance =
      (SELECT COALESCE(SUM(Balance), 0) FROM Account
       WHERE Account.CustomerId = Customer.CustomerId);

Access Program Update Mechanics

When you introduce the calculated column, you need to identify all the places in external applications where this calculation is used and then rework that code to work with TotalAccountBalance. You need to replace existing calculation logic with access to the value of TotalAccountBalance. You may discover that the calculation is performed differently in various applications, either because of a bug or because of a different situation, and therefore you will need to negotiate the correct algorithm with your stakeholder(s). The following code shows how an application formerly calculated the total balance by summing over all the accounts of a customer. In the "after" version, it simply reads the value into memory when the customer object is retrieved from the database:

    // Before code
    // connection: an open java.sql.Connection (assumed)
    PreparedStatement stmt = connection.prepareStatement(
      "SELECT SUM(Account.Balance) Balance FROM Customer, Account " +
      "WHERE Customer.CustomerID = Account.CustomerID " +
      "AND Customer.CustomerID = ?");
    stmt.setLong(1, customer.getCustomerID());
    ResultSet rs = stmt.executeQuery();
    rs.next();  // an aggregate query always returns exactly one row
    return rs.getBigDecimal("Balance");

    // After code
    return customer.getBalance();

Introduce Surrogate Key

Replace an existing natural key with a surrogate key. This refactoring is the opposite of Replace Surrogate Key With Natural Key (page 135).

Motivation

There are several reasons why you may want to introduce a surrogate key to a table:

Reduce coupling. The primary reason is to reduce coupling between your table schema and the business domain. If part of a natural key is likely to change (for example, if a part number stored within an inventory table is likely to increase in size or change its type from numeric to alphanumeric), then having it as a primary key is a dangerous proposition.

Increase consistency. You may want to apply the Consolidate Key Strategy (page 168) refactoring, potentially improving performance and reducing code complexity.

Improve database performance. Your database performance may have degraded because of a large composite natural key. (Some databases struggle when a key is made up of several columns.) When you replace the large composite primary key with a single-column surrogate primary key, the database will be able to update the index more efficiently.

Potential Tradeoffs

Many data professionals prefer natural keys. The debate over surrogate and natural keys is a "religious issue" within the data community, but the reality is that both types of keys have their place. Even though a table has a surrogate primary key, you may still require natural alternate keys to support searching. Because a surrogate key has no business meaning, and because it is typically implemented as a collection of meaningless characters or numbers, your end users cannot use it for searches. As a result, they still need to identify data via natural identifiers. For example, the InventoryItem table has a surrogate primary key called InventoryItemPOID (POID is short for persistent object identifier) and a natural alternate key called InventoryID. Individual items are identified uniquely within the system by the surrogate key but identified by users via the natural key. The value in applying a surrogate key is that it simplifies your key strategy within your database and reduces the coupling between your database schema and your business domain.

Another challenge is that you may implement a surrogate key when it really is not needed. Many people can become overzealous when it comes to implementing keys, and often try to apply the same strategy throughout their schema. For example, in the United States, individual states are identified by a unique two-letter state code (for example, CA for California). This state code is guaranteed to be unique within the United States and Canada; the code for the province of Ontario is ON, and there will never be an American state with that code. The states and provinces are fairly stable entities, there is a large number of codes still available (only 65 of 676 possible combinations have been used to date), and, because of the upheaval it would cause within their own systems, the respective governments are unlikely to change the strategy. Therefore, does it really make sense to introduce a surrogate key to a lookup table listing all the states and provinces? Likely not.

Also, when OriginalKey is being used as a foreign key in other tables, you will want to apply Consolidate Key Strategy (page 168) and make similar updates to those tables. Note that this may be more work than it is worth; you might want to reconsider applying this refactoring.

Schema Update Mechanics

Applying Introduce Surrogate Key can be complicated because of the coupling that the original key (in our example, the combination of CustomerNumber, OrderDate, and StoreID) is potentially involved with. Because it is the primary key of a table, it is likely that it also forms (part of) the foreign key back to Order from other tables. You will need to do the following:

1. Introduce the new key column. Add the column to the target table via the SQL command ADD COLUMN. In Figure 6.5, this is OrderPOID. This column will need to be populated with unique values.

Figure 6.5. Introducing the Order.OrderPOID surrogate key.

2. Add a new index. A new index based on OrderPOID needs to be introduced for Order.
3. Deprecate the original columns. The original key columns must be marked for demotion to alternate key status, or nonkey status as the case may be, at the end of the transition period. In our example, the columns will not be deleted at this time from Order; they will just no longer be considered the primary key. They will be deleted from OrderItem.
4. Update and possibly add referential integrity (RI) triggers. Any triggers that exist to maintain referential integrity between tables need to be updated to work with the corresponding new key values in the other tables. Triggers need to be introduced to populate the value of the foreign key columns during the transition period because the applications may not have been updated to do so.

Figure 6.5 depicts how to introduce OrderPOID, a surrogate key, to the Order table. You also need to recursively apply Introduce Surrogate Key to the OrderItem table of Figure 6.5 to make use of the new key column. This is optional. Of course, OrderItem could still use the existing composite key made up of the CustomerNumber, OrderDate, and StoreID columns, but for consistency, we have decided to refactor that table, too.

The following SQL code introduces and initially populates the OrderPOID column in both the Order and OrderItem tables. It obtains unique values for OrderPOID by invoking the GenerateUniqueID stored procedure as needed, which implements the HIGH-LOW algorithm (Ambler 2003). It also introduces the appropriate index required to support OrderPOID as a key of Order:

    -- Add new surrogate key to Order table
    ALTER TABLE Order ADD OrderPOID NUMBER;

    -- Add new surrogate foreign key to OrderItem table
    ALTER TABLE OrderItem ADD OrderPOID NUMBER;

    -- Assign values to surrogate key column in Order
    UPDATE Order SET OrderPOID =
      getOrderPOIDFromOrder(CustomerNumber, OrderDate, StoreID);

    -- Propagate foreign key in OrderItem
    -- (the OrderItem columns are qualified so that they are not resolved against Order)
    UPDATE OrderItem SET OrderPOID =
      (SELECT OrderPOID FROM Order
       WHERE OrderItem.CustomerNumber = Order.CustomerNumber
       AND OrderItem.OrderDate = Order.OrderDate
       AND OrderItem.StoreID = Order.StoreID);

    CREATE INDEX OrderOrderPOIDIndex ON Order (OrderPOID);
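The text names the HIGH-LOW algorithm without showing it. The following Java sketch illustrates one common formulation of the idea, under stated assumptions: the class name HighLowGenerator, the block size, and the LongSupplier that stands in for a database round trip are all hypothetical, and this is not the implementation of GenerateUniqueID. A shared source hands out a "high" value once per block, and the "low" part is incremented in memory, so most keys are assigned without touching the database.

```java
import java.util.function.LongSupplier;

// Illustrative sketch of a HIGH-LOW key generator. All names are hypothetical.
final class HighLowGenerator {

    // Stands in for a database call that returns a fresh, never-reused HIGH value
    // (for example, by incrementing a row in a key table).
    private final LongSupplier nextHigh;
    private final int blockSize; // number of keys handed out per HIGH value
    private long high;
    private int low;

    HighLowGenerator(LongSupplier nextHigh, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.nextHigh = nextHigh;
        this.blockSize = blockSize;
        this.low = blockSize; // forces a HIGH fetch on the first call
    }

    /** Returns the next unique key; fetches a new HIGH value once per block. */
    synchronized long nextId() {
        if (low >= blockSize) {
            high = nextHigh.getAsLong();
            low = 0;
        }
        return high * blockSize + low++;
    }
}
```

Because each HIGH value reserves a disjoint range of blockSize keys, several applications can share one HIGH source and still produce unique OrderPOID values, which fits the suggestion of a common key-assignment service rather than per-application algorithms.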

To support this new key, we need to add the PopulateOrderPOID trigger, which is invoked whenever an insert occurs in OrderItem. This trigger obtains the value of Order.OrderPOID, as you can see in the following SQL code:

    CREATE OR REPLACE TRIGGER PopulateOrderPOID
    BEFORE INSERT
    ON OrderItem
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      -- Column values of the inserted row are read through :NEW
      IF :NEW.OrderPOID IS NULL THEN
        :NEW.OrderPOID :=
          getOrderPOIDFromOrder(:NEW.CustomerNumber, :NEW.OrderDate, :NEW.StoreID);
      END IF;
      IF :NEW.OrderPOID IS NOT NULL THEN
        IF :NEW.CustomerNumber IS NULL
          OR :NEW.OrderDate IS NULL
          OR :NEW.StoreID IS NULL THEN
          :NEW.CustomerNumber := getCustomerNumberFromOrder(:NEW.OrderPOID);
          :NEW.OrderDate := getOrderDateFromOrder(:NEW.OrderPOID);
          :NEW.StoreID := getStoreIDFromOrder(:NEW.OrderPOID);
        END IF;
      END IF;
    END;
    /

    -- On June 14 2007
    ALTER TABLE OrderItem DROP CONSTRAINT OrderItemToOrderForeignKey;
    ALTER TABLE Order DROP CONSTRAINT OrderPrimaryKey;
    ALTER TABLE Order MODIFY OrderPOID NOT NULL;
    ALTER TABLE Order ADD CONSTRAINT OrderPrimaryKey
      PRIMARY KEY (OrderPOID);
    ALTER TABLE OrderItem DROP CONSTRAINT OrderItemPrimaryKey;
    ALTER TABLE OrderItem MODIFY OrderPOID NOT NULL;
    ALTER TABLE OrderItem ADD CONSTRAINT OrderItemPrimaryKey
      PRIMARY KEY (OrderPOID, OrderItemNumber);
    ALTER TABLE OrderItem ADD (CONSTRAINT OrderItemToOrderForeignKey
      FOREIGN KEY (OrderPOID) REFERENCES Order);
    CREATE UNIQUE INDEX OrderNaturalKey ON Order
      (CustomerNumber, OrderDate, StoreID);
    DROP TRIGGER PopulateOrderPOID;

Data-Migration Mechanics

We must generate data in Order.OrderPOID and assign these values to the foreign key columns in other tables.

Access Program Update Mechanics

Any external programs referencing the original key columns must be updated to work with Order.OrderPOID. You may need to rework the code to do the following:

1. Assign new types of key values. If external application code assigns new surrogate key values, instead of the database itself, all external applications need to be reworked to assign values to Order.OrderPOID. Minimally, every single program must implement the same algorithm to do so, but a better strategy is to implement a common service that every application invokes.
2. Join based on the new key. Many external access programs will define joins involving Order, implemented either via hard-coded SQL or via metadata. These joins should be refactored to work with Order.OrderPOID.
3. Retrieve based on the new key. Some external programs will traverse the database one or more rows at a time, retrieving data based on the key values. These retrievals need to be updated to work with Order.OrderPOID.

The following Hibernate mapping shows you how the surrogate key is introduced:

//Before mapping

//After mapping

Merge Columns

Merge two or more columns within a single table.

Motivation

There are several reasons why you may want to apply Merge Columns:

An identical column. Two or more developers may have added the columns unbeknownst to each other, a common occurrence when the developers are on different teams or when metadata describing the table schema is not available. For example, the FeeStructure table has 37 columns, 2 of which are called CA_INIT and CheckingAccountOpeningFee, and both of which store the initial fee levied by the bank when opening a checking account. The second column was added because nobody was sure what the CA_INIT column was really being used for.

The columns are the result of overdesign. The original columns were introduced to ensure that the information was stored in its constituent forms, but actual usage shows that you do not need the fine details that you originally thought. For example, the Customer table of Figure 6.6 includes the columns PhoneCountryCode, PhoneAreaCode, and PhoneLocal to represent a single phone number.

Figure 6.6. Merging columns in the Customer table.

The actual usage of the columns has become the same. Several columns were originally added to a table, but over time the way that one or more of them are used has changed to the point where they are all being used for the same purpose. For example, the Customer table includes PreferredCheckStyle and SelectedCheckStyle columns (not shown in Figure 6.6). The first column was used to record which style of checks to send to the customer from next season's selection, and the second column was used to record the style that the customer previously had sent out to them. This was useful 20 years ago when it took months to order in new checks, but now that they can be printed overnight, we have started automatically storing the same value in both columns.

Potential Tradeoffs

This database refactoring can result in a loss of data precision when you merge finely detailed columns. When you merge columns that (you believe) are used for the same purpose, you run the risk that you should in fact be using them for separate things. (If so, you will discover that you need to reintroduce one or more of the original columns.) The usage of the data should determine whether the columns should be merged, something that you will need to explore with your stakeholders.

Schema Update Mechanics

To perform Merge Columns, you must do two things. First, you need to introduce the new column. Add the column to the table via the SQL command ADD COLUMN. In Figure 6.6, this is Customer.PhoneNumber. This step is optional because you may find it possible to use one of the existing columns as the column into which to merge the data. You also need to introduce a synchronization trigger to ensure that the columns remain synchronized with one another. The trigger must be invoked by any change to the columns.

Figure 6.6 shows an example where the Customer table initially stores the phone number of a person in three separate columns: PhoneCountryCode, PhoneAreaCode, and PhoneLocal. Over time, we have discovered that few applications are interested in the country code because they are used only within North America. We have also discovered that every application uses both the area code and the local phone number together. Therefore, we have decided to leave the PhoneCountryCode alone but to merge the PhoneAreaCode and PhoneLocal columns into PhoneNumber, reflecting the actual usage of the data by the application (because the application does not use PhoneAreaCode or PhoneLocal individually). We introduced the SynchronizePhoneNumber trigger to keep the values in the four columns synchronized. The following SQL code depicts the DDL to introduce the PhoneNumber column and to eventually drop the two original columns:

    ALTER TABLE Customer ADD PhoneNumber NUMBER(12);
    COMMENT ON COLUMN Customer.PhoneNumber IS 'Added as the result of merging Customer.PhoneAreaCode and Customer.PhoneLocal finaldate = December 14 2007';

    -- On December 14 2007
    ALTER TABLE Customer DROP COLUMN PhoneAreaCode;
    ALTER TABLE Customer DROP COLUMN PhoneLocal;

Data-Migration Mechanics

You must convert all the data from the original column(s) into the merge column, in this case from Customer.PhoneAreaCode and Customer.PhoneLocal into Customer.PhoneNumber. The following SQL code depicts the DML to initially combine the data from PhoneAreaCode and PhoneLocal into PhoneNumber.

    /* One-time migration of data from Customer.PhoneAreaCode and
       Customer.PhoneLocal to Customer.PhoneNumber. When both the columns
       are active, there is a need to have a trigger that keeps both the
       columns in sync */
    UPDATE Customer SET PhoneNumber =
      PhoneAreaCode * 10000000 + PhoneLocal;
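The multiplication by 10000000 shifts the area code left by seven decimal digits so that a seven-digit local number fits beneath it. The Java sketch below works through that arithmetic and its inverse, which a synchronization trigger would need during the transition period. It is only an illustration: the class and method names are hypothetical, and it assumes North American numbers with a seven-digit local part.

```java
// Illustrative sketch of the PhoneAreaCode/PhoneLocal merge arithmetic.
// All names are hypothetical; assumes a seven-digit local number.
final class PhoneNumberMerge {

    private static final long LOCAL_MODULUS = 10_000_000L; // 10^7

    private PhoneNumberMerge() { }

    /** Combines an area code and a local number into one merged value. */
    static long merge(int areaCode, int local) {
        if (areaCode < 0 || local < 0 || local >= LOCAL_MODULUS) {
            throw new IllegalArgumentException("Values out of range");
        }
        return areaCode * LOCAL_MODULUS + local;
    }

    /** Recovers the area code from a merged value. */
    static int areaCodeOf(long phoneNumber) {
        return (int) (phoneNumber / LOCAL_MODULUS);
    }

    /** Recovers the local number from a merged value. */
    static int localOf(long phoneNumber) {
        return (int) (phoneNumber % LOCAL_MODULUS);
    }
}
```

For area code 416 and local number 5551234, merge returns 4165551234, and the two accessor methods recover 416 and 5551234 again. The round trip only works while every local number has at most seven digits, which is one way the merge can lose precision if that assumption is ever broken.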

Access Program Update Mechanics

You need to analyze the access programs thoroughly, and then update them appropriately, during the transition period. In addition to the obvious need to work with Customer.PhoneNumber rather than the former unmerged columns, there are two potential updates. First, you may need to remove merging code. There may be code that combines the existing columns into a data attribute similar to the merged column. This code should be refactored and potentially removed entirely.

Second, you may also need to update data-validation code to work with merged data. Some data-validation code may exist solely because the columns have not yet been merged. For example, if a value is stored in two separate columns, you may have validation code in place that verifies that the values are the same. After the columns are merged, there may no longer be a need for this code.

The before and after code snippets show how the getCustomerPhoneNumber() method changes when we merge the Customer.PhoneAreaCode and Customer.PhoneLocal columns:

    // Before code
    public String getCustomerPhoneNumber(Customer customer) {
      // String.concat returns a new string, so each result is reassigned
      String phoneNumber = customer.getCountryCode();
      phoneNumber = phoneNumber.concat(phoneNumberDelimiter());
      phoneNumber = phoneNumber.concat(customer.getPhoneAreaCode());
      phoneNumber = phoneNumber.concat(customer.getPhoneLocal());
      return phoneNumber;
    }

    // After code
    public String getCustomerPhoneNumber(Customer customer) {
      String phoneNumber = customer.getCountryCode();
      phoneNumber = phoneNumber.concat(phoneNumberDelimiter());
      phoneNumber = phoneNumber.concat(customer.getPhoneNumber());
      return phoneNumber;
    }

Merge Tables

Merge two or more tables into a single table.

Motivation

There are several reasons why you may want to apply Merge Tables:

The tables are the result of overdesign. The original tables were introduced to ensure that the information was stored in its constituent forms, but actual usage shows that you do not need the fine details that you originally thought. For example, the Employee table includes columns for employee identification, as well as other data, whereas the EmployeeIdentification table specifically captures just identification information.

The actual usage of the tables has become the same. Over time, the way that one or more tables are used has changed to the point where several tables are being used for the same purpose. You could also have tables that are related to one another in one-to-one fashion; you may want to merge the tables to avoid making the join to the other table. A good example of this is the Employee table mentioned previously. It originally was used to record employee information, but the EmployeeIdentification table was introduced to store just identification information. Some people did not realize that this table existed, and evolved the Employee table to capture similar data.

A table is mistakenly repeated. Two or more developers may have added the tables unbeknownst to each other, a common occurrence when the developers are on different teams or when the metadata describing the table schema is not available. For example, the FeeStructure and FeeSchedule tables both store the initial fee levied by the bank when opening a checking account. The second table was added because nobody was sure what the FeeStructure table was really being used for.

Potential Tradeoffs

Merging two or more tables can result in a loss of data precision when you merge finely detailed tables. When you merge tables that (you believe) are used for the same purpose, you run the risk that you should in fact be using them for separate things. For example, the EmployeeIdentification table may have been introduced to separate security-critical information into a single table that had limited access rights. If so, you will discover that you need to reintroduce one or more of the original tables. The usage of the data should determine whether the tables should be merged.

Schema Update Mechanics

As depicted in Figure 6.7, to update the database schema when you perform Merge Tables, you must do two things. First, introduce the merged table by adding the columns from EmployeeIdentification to the Employee table via the SQL command ADD COLUMN. Note that Employee may already include some or all of the required columns, in which case inconsistencies or cluttered domain code may exist; this refactoring should allow for simplification of the application code.

Figure 6.7. Moving all the columns from EmployeeIdentification to Employee.


Second, introduce synchronization trigger(s) to ensure that the tables remain synchronized with one another. The trigger(s) must be invoked by any change to the columns. You need to implement the trigger so that cycles do not occur: if the value in one of the original columns changes, Employee should be updated, but that update should not trigger the same update to the original tables, and so on.

Figure 6.7 depicts an example where the Employee table initially stores the employee data. Over time, we have also added the EmployeeIdentification table, which stores employee identification information. Therefore, we have decided to merge the Employee and EmployeeIdentification tables so that we have all the information regarding the employee in one place. We introduced synchronization triggers to keep the values in the tables synchronized. The following SQL code depicts the DDL to introduce the Picture, VoicePrint, and RetinalPrint columns; the EmployeeIdentification table is eventually dropped at the end of the transition period.

    ALTER TABLE Employee ADD Picture BINARY;
    COMMENT ON COLUMN Employee.Picture IS 'Added as the result of merging Employee and EmployeeIdentification finaldate = December 14 2007';

    ALTER TABLE Employee ADD VoicePrint BINARY;
    COMMENT ON COLUMN Employee.VoicePrint IS 'Added as the result of merging Employee and EmployeeIdentification finaldate = December 14 2007';

    ALTER TABLE Employee ADD RetinalPrint BINARY;
    COMMENT ON COLUMN Employee.RetinalPrint IS 'Added as the result of merging Employee and EmployeeIdentification finaldate = December 14 2007';

Data-Migration Mechanics

You must copy all the data from the original table(s) into the merge table (in this case, from EmployeeIdentification to Employee). This can be done via several means, for example with an SQL script or with an extract-transform-load (ETL) tool. (With this refactoring, there should not be a transform step.) The following SQL code depicts the DML to initially combine the data from the Employee and EmployeeIdentification tables:

    /* One-time migration of data from EmployeeIdentification to Employee.
       When both the tables are active, there is a need to have a trigger
       that keeps both the tables in sync */
    UPDATE Employee e SET e.Picture =
      (SELECT ei.Picture FROM EmployeeIdentification ei
       WHERE ei.EmployeeNumber = e.EmployeeNumber);

    UPDATE Employee e SET e.VoicePrint =
      (SELECT ei.VoicePrint FROM EmployeeIdentification ei
       WHERE ei.EmployeeNumber = e.EmployeeNumber);

    UPDATE Employee e SET e.RetinalPrint =
      (SELECT ei.RetinalPrint FROM EmployeeIdentification ei
       WHERE ei.EmployeeNumber = e.EmployeeNumber);

    -- On December 14 2007
    DROP TRIGGER SynchronizeWithEmployee;
    DROP TRIGGER SynchronizeWithEmployeeIdentification;
    DROP TABLE EmployeeIdentification;

The following code shows how the SynchronizeWithEmployeeIdentification and SynchronizeWithEmployee triggers are used to keep the values in the tables synchronized:

    CREATE TRIGGER SynchronizeWithEmployeeIdentification
    BEFORE INSERT OR UPDATE OR DELETE
    ON Employee
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF UPDATING THEN
        updateOrCreateEmployeeIdentification;
      END IF;
      IF INSERTING THEN
        createNewEmployeeIdentification;
      END IF;
      IF DELETING THEN
        deleteEmployeeIdentification;
      END IF;
    END;
    /

    CREATE TRIGGER SynchronizeWithEmployee
    BEFORE INSERT OR UPDATE OR DELETE
    ON EmployeeIdentification
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF UPDATING THEN
        updateOrCreateEmployee;
      END IF;
      IF INSERTING THEN
        createNewEmployee;
      END IF;
      IF DELETING THEN
        deleteEmployee;
      END IF;
    END;
    /

Access Program Update Mechanics

In addition to the obvious need to work with Employee rather than the former unmerged table(s), potential updates you need to make are as follows:

1. Simplify data access code. Some access code may exist that accesses two or more of the tables involved with the merge. For example, the Employee class may update its information into the two tables in which it is currently stored, tables that have now been merged into one.
2. Incomplete or contradictory updates. Now that the data is stored in one place, you may discover that individual access programs worked only with subsets of the data. For example, the Customer class currently updates its home phone number information in two tables, yet it is really stored in three tables (which have now been merged into one). Other programs may realize that the data quality in the third table was not very good and may include code that counteracts the problems. For example, a reporting class may convert NULL phone numbers to "Unknown," but now that there are no NULL phone numbers, this code can be removed.
3. Some merged data is not required by some access programs. Some of the access programs that currently work with Employee need only the data that it contains. However, now that columns from EmployeeIdentification have been added, the potential exists that the existing access programs will not update these new columns appropriately. Existing access programs may need to be extended to accept and work with the new columns. For example, the source table for the Employee class may have had a BirthDate column merged into it. Minimally, the Employee class should not overwrite this column with invalid data, and it should insert an appropriate value when a new employee object is created. You may need to apply Introduce Default Value (page 186) to the columns that are merged into Employee.

The following example shows the code changes when you apply Merge Tables to Employee and EmployeeIdentification:

    // Before code
    public Employee getEmployeeInformation(Long employeeNumber) throws SQLException {
      Employee employee = new Employee();
      // connection: an open java.sql.Connection (assumed)
      PreparedStatement stmt = connection.prepareStatement(
        "SELECT EmployeeNumber, Name, PhoneNumber " +
        "FROM Employee " +
        "WHERE EmployeeNumber = ?");
      stmt.setLong(1, employeeNumber);
      ResultSet rs = stmt.executeQuery();
      if (rs.next()) {
        employee.setEmployeeNumber(rs.getLong("EmployeeNumber"));
        employee.setName(rs.getString("Name"));
        employee.setPhoneNumber(rs.getLong("PhoneNumber"));
      }
      stmt = connection.prepareStatement(
        "SELECT Picture, VoicePrint, RetinalPrint " +
        "FROM EmployeeIdentification " +
        "WHERE EmployeeNumber = ?");
      stmt.setLong(1, employeeNumber);
      rs = stmt.executeQuery();
      if (rs.next()) {
        employee.setPicture(rs.getBlob("Picture"));
        employee.setVoicePrint(rs.getBlob("VoicePrint"));
        employee.setRetinalPrint(rs.getBlob("RetinalPrint"));
      }
      return employee;
    }

    // After code
    public Employee getEmployeeInformation(Long employeeNumber) throws SQLException {
      Employee employee = new Employee();
      PreparedStatement stmt = connection.prepareStatement(
        "SELECT EmployeeNumber, Name, PhoneNumber, " +
        "Picture, VoicePrint, RetinalPrint " +
        "FROM Employee " +
        "WHERE EmployeeNumber = ?");
      stmt.setLong(1, employeeNumber);
      ResultSet rs = stmt.executeQuery();
      if (rs.next()) {
        employee.setEmployeeNumber(rs.getLong("EmployeeNumber"));
        employee.setName(rs.getString("Name"));
        employee.setPhoneNumber(rs.getLong("PhoneNumber"));
        employee.setPicture(rs.getBlob("Picture"));
        employee.setVoicePrint(rs.getBlob("VoicePrint"));
        employee.setRetinalPrint(rs.getBlob("RetinalPrint"));
      }
      return employee;
    }

Move Column

Migrate a table column, with all of its data, to another existing table.

Motivation

There are several reasons to apply Move Column. The first two reasons may appear contradictory, but remember that database refactoring is situational. Common motivations to apply Move Column include the following:

Normalization. It is common that an existing column breaks one of the rules of normalization. By moving the column to another table, you can increase the normalization of the source table and thereby reduce data redundancy within your database.

Denormalization to reduce common joins. It is quite common to discover that a table is included in a join simply to gain access to a single column. You can improve performance by removing the need to perform this join by moving the column into the other table.

Re or ga n iza t ion of a split t a ble . You previously perform ed Split Table ( page 145) , or t he t able was effec in t he original design, and you t hen realize t hat one m ore colum n needs t o be m oved. Perhaps t he colum n t he com m only accessed t able but is rarely needed, or perhaps it exist s in a rarely accessed t able but is nee oft en. I n t he first case, net work perform ance would be im proved by not select ing and t hen t ransm it t ing t h t o t he applicat ions when it is not required; in t he second case, dat abase perform ance would be im proved b few j oins would be required.

Potential Tradeoffs

Moving a column to increase normalization reduces data redundancy but may decrease performance if additional joins are required by your applications to obtain the data. Conversely, if you improve performance by denormalizing your schema through moving the column, you will increase data redundancy.

Schema Update Mechanics

To update the database schema when you perform Move Column, you must do the following:

1. Identify deletion rule(s). What should happen when a row from one table is deleted? Should the corresponding row in the other table be deleted, should the column value in the corresponding row be nulled/zeroed out, should the corresponding value be set to some sort of default value, or should the corresponding value be left alone? This rule will be implemented in trigger code (as discussed later in this section). Note that zero or more deletion triggers may already exist to support referential integrity rules between the tables.

2. Identify insertion rule(s). What should happen when a row is inserted into one of the tables? Should a corresponding row in the other table be inserted, or should nothing occur? This rule will be implemented in trigger code, and zero or more insertion triggers may already exist.

3. Introduce the new column. Add the column to the target table via the SQL command ADD COLUMN. In Figure 6.8, this is Account.Balance.

Figure 6.8. Moving the Balance column from Customer to Account.

4. Introduce triggers. You require triggers on both the original and new column to copy data from one column to the other during the transition period. These trigger(s) must be invoked by any change to a row.

Figure 6.8 depicts an example where Customer.Balance is moved into the Account table. This is a normalization issue: instead of storing a balance each time a customer's account is updated, we can instead store it once for each individual account. During the transition period, the Balance column appears in both Customer and Account, as you would expect.

The existing triggers are interesting. The Account table already had a trigger for inserts and updates that checks that the corresponding row exists in the Customer table, a basic referential integrity (RI) check. This trigger is left as is. The Customer table had a delete trigger to ensure that it is not deleted if an Account row refers to it, another RI check. The advantage of this is that we do not need to implement a deletion rule for the moved column because we cannot do "the wrong thing" and delete a Customer row that has one or more Account rows referencing it.

In the following code, we introduce the Account.Balance column and the SynchronizeCustomerBalance and SynchronizeAccountBalance triggers to keep the Balance columns synchronized. The code also includes the scripts to remove the scaffolding code after the transition period ends:

ALTER TABLE Account ADD Balance NUMBER(32,7);

COMMENT ON COLUMN Account.Balance IS
  'Moved from Customer table, finaldate = June 14 2007';
COMMENT ON COLUMN Customer.Balance IS
  'Moved to Account table, dropdate = June 14 2007';

-- Copies a changed Account.Balance back to Customer.
CREATE OR REPLACE TRIGGER SynchronizeCustomerBalance
BEFORE INSERT OR UPDATE
ON Account
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF :NEW.Balance IS NOT NULL THEN
    UpdateCustomerBalance;
  END IF;
END;
/

-- Copies a changed Customer.Balance to Account and guards deletes.
CREATE OR REPLACE TRIGGER SynchronizeAccountBalance
BEFORE INSERT OR UPDATE OR DELETE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF DELETING THEN
    DeleteCustomerIfAccountNotFound;
  END IF;
  IF (UPDATING OR INSERTING) THEN
    IF :NEW.Balance IS NOT NULL THEN
      UpdateAccountBalanceForCustomer;
    END IF;
  END IF;
END;
/

-- On June 14 2007
ALTER TABLE Customer DROP COLUMN Balance;
DROP TRIGGER SynchronizeCustomerBalance;
DROP TRIGGER SynchronizeAccountBalance;

Data-Migration Mechanics

Copy all the data from the original column into the new column, in this case from Customer.Balance to Account.Balance. This can be done via several means, for example with a SQL script or with an ETL tool. (With this refactoring, there would not be a transform step.) The following code depicts the DML to move the Balance column values from Customer to Account:

/* One-time migration of data from Customer.Balance to Account.Balance.
   While both columns are active, a trigger is needed to keep the two
   Balance columns in sync. */
UPDATE Account
  SET Balance =
    (SELECT Balance FROM Customer
      WHERE CustomerID = Account.CustomerID);

Access Program Update Mechanics

You need to analyze the access programs thoroughly and then update them appropriately during the transition period. Potential updates you need to make are as follows:

1. Rework joins to use the moved column. Joins, either hard-coded in SQL or defined via meta data, must be refactored to work with the moved column. For example, when you move Customer.Balance to Account.Balance, you have to change your queries to get the balance information from Account and not from Customer.

2. Add the new table to joins. The Account table must now be included in joins if it is not already included. This could degrade performance.

3. Remove the original table from joins. There may be some joins that included the Customer table for the purpose of joining in the data from Customer.Balance. Now that this column has been moved, the Customer table can be removed from the join, which could potentially improve performance.

The following code shows you how the Customer.Balance column is referenced in the original code and the updated code that works with Account.Balance:

//Before code
public BigDecimal getCustomerBalance(Long customerId) throws SQLException {
  PreparedStatement stmt = null;
  BigDecimal customerBalance = null;
  stmt = DB.prepare(
    "SELECT Balance FROM Customer " +
    "WHERE CustomerId = ?");
  stmt.setLong(1, customerId.longValue());
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customerBalance = rs.getBigDecimal("Balance");
  }
  return customerBalance;
}

//After code
public BigDecimal getCustomerBalance(Long customerId) throws SQLException {
  PreparedStatement stmt = null;
  BigDecimal customerBalance = null;
  // CustomerId exists in both tables, so it is qualified in the filter.
  stmt = DB.prepare(
    "SELECT SUM(Account.Balance) Balance " +
    "FROM Customer, Account " +
    "WHERE Customer.CustomerId = Account.CustomerId " +
    "AND Customer.CustomerId = ?");
  stmt.setLong(1, customerId.longValue());
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customerBalance = rs.getBigDecimal("Balance");
  }
  return customerBalance;
}

Rename Column

Rename an existing table column.

Motivation

The primary reasons to apply Rename Column are to increase the readability of your database schema, to conform to accepted database naming conventions in your enterprise, or to enable database porting. For example, when you are porting from one database product to another, you may discover that the original column name cannot be used because it is a reserved keyword in the new database.

Potential Tradeoffs

The primary trade-off is the cost of refactoring the external applications that access the column versus the improved readability and/or consistency provided by the new name.

Schema Update Mechanics

To rename a column, you must do the following:

1. Introduce the new column. In Figure 6.9, we first add FirstName to the target table via the SQL command ADD COLUMN.

Figure 6.9. Renaming the Customer.FName column.

2. Introduce a synchronization trigger. As you can see in Figure 6.9, you require a trigger to copy data from one column to the other during the transition period. This trigger must be invoked by any change to the data row.

3. Rename other columns. If FName is used in other tables as (part of) a foreign key, you may want to apply Rename Column recursively to ensure naming consistency. For example, if Customer.CustomerNumber is renamed as Customer.CustomerID, you may want to go ahead and rename all instances of CustomerNumber in other tables. Therefore, Account.CustomerNumber will now be renamed to Account.CustomerID to keep the column names consistent.

The following code depicts the DDL to rename Customer.FName to Customer.FirstName, creates the SynchronizeFirstName trigger that synchronizes the data during the transition period, and removes the original column and trigger after the transition period ends:

ALTER TABLE Customer ADD FirstName VARCHAR(40);

COMMENT ON COLUMN Customer.FirstName IS
  'Renaming of FName column, finaldate = November 14 2007';
COMMENT ON COLUMN Customer.FName IS
  'Renamed to FirstName, dropdate = November 14 2007';

UPDATE Customer SET FirstName = FName;

CREATE OR REPLACE TRIGGER SynchronizeFirstName
BEFORE INSERT OR UPDATE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  -- On insert, fill whichever column the application left empty.
  IF INSERTING THEN
    IF :NEW.FirstName IS NULL THEN
      :NEW.FirstName := :NEW.FName;
    END IF;
    IF :NEW.FName IS NULL THEN
      :NEW.FName := :NEW.FirstName;
    END IF;
  END IF;
  -- On update, copy whichever column changed to the other one.
  IF UPDATING THEN
    IF NOT(:NEW.FirstName = :OLD.FirstName) THEN
      :NEW.FName := :NEW.FirstName;
    END IF;
    IF NOT(:NEW.FName = :OLD.FName) THEN
      :NEW.FirstName := :NEW.FName;
    END IF;
  END IF;
END;
/

-- On Nov 30 2007
DROP TRIGGER SynchronizeFirstName;
ALTER TABLE Customer DROP COLUMN FName;
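The synchronization rule in the trigger can be easier to reason about when it is written as a plain function over the old and new row values. The following is a minimal sketch, not the book's code: the class NameSync, the Row record, and the method names are hypothetical, and records assume Java 16 or later. It also treats NULL like an ordinary value when comparing, whereas the PL/SQL comparison NOT(a = b) does not fire when either side is NULL.

```java
import java.util.Objects;

/** In-memory sketch of the FName/FirstName synchronization rule (illustrative only). */
final class NameSync {

    /** The two column values of one Customer row during the transition period. */
    record Row(String fName, String firstName) {}

    /** Insert rule: fill whichever column is missing from the other one. */
    static Row onInsert(Row incoming) {
        String first = incoming.firstName() != null ? incoming.firstName() : incoming.fName();
        String f = incoming.fName() != null ? incoming.fName() : first;
        return new Row(f, first);
    }

    /** Update rule: whichever column changed is copied to the other one. */
    static Row onUpdate(Row old, Row incoming) {
        String f = incoming.fName();
        String first = incoming.firstName();
        if (!Objects.equals(first, old.firstName())) {
            f = first;       // application wrote the new column
        }
        if (!Objects.equals(f, old.fName())) {
            first = f;       // application wrote the old column
        }
        return new Row(f, first);
    }

    public static void main(String[] args) {
        Row inserted = onInsert(new Row("Ann", null));
        System.out.println(inserted);
        Row updated = onUpdate(new Row("Bob", "Bob"), new Row("Bob", "Robert"));
        System.out.println(updated);
    }
}
```

An application that writes only one of the two columns ends up with both columns holding the same value, which is the property the transition period depends on.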

Data-Migration Mechanics

You need to copy all the data from the original column into the new column, in this case from FName to FirstName. See the refactoring Move Data (page 192) for details.

Access Program Update Mechanics

External programs that reference Customer.FName must be updated to reference the column by its new name. You should simply have to update any embedded SQL and/or mapping meta data. The following Hibernate mapping files show how your mapping files would change when the FName column is renamed:

//Before mapping

//Transition mapping

//After mapping

Rename Table

Rename an existing table.

Motivation

The primary reason to apply Rename Table is to clarify the table's meaning and intent in the overall database schema or to conform to accepted database naming conventions. Ideally, these reasons are one and the same.

Potential Tradeoffs

The primary tradeoff is the cost of refactoring the external applications that access the table versus the improved readability and/or consistency provided by the new name.

Schema Update Mechanics via a New Table

To perform Rename Table, you create a new table using the SQL command CREATE TABLE, in this case Customer. If any columns of Cust_TB_Prod are used in other tables as (part of) a foreign key, you must refactor those constraints and/or indices implementing the foreign key to refer to Customer. We want to rename Cust_TB_Prod to Customer as depicted in Figure 6.10. The SynchronizeCust_TB_Prod and SynchronizeCustomer triggers keep the two tables synchronized with each other. Each trigger is invoked by any change to a row in the Cust_TB_Prod or Customer table, respectively. The following code depicts the DDL to rename the table and introduce the triggers:

Figure 6.10. Renaming the Cust_TB_Prod table to Customer.

CREATE TABLE Customer (
  FirstName VARCHAR(40),
  LastName VARCHAR(40)
);

COMMENT ON TABLE Customer IS
  'Renaming of Cust_TB_Prod, finaldate = September 14 2006';
COMMENT ON TABLE Cust_TB_Prod IS
  'Renamed to Customer, dropdate = September 14 2006';

-- DELETE is included in the triggering events because each body handles deletes.
CREATE OR REPLACE TRIGGER SynchronizeCustomer
BEFORE INSERT OR UPDATE OR DELETE
ON Cust_TB_Prod
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    findAndUpdateIfNotFoundCreateCustomer;
  END IF;
  IF inserting THEN
    createNewIntoCustomer;
  END IF;
  IF deleting THEN
    deleteFromCustomer;
  END IF;
END;
/

CREATE OR REPLACE TRIGGER SynchronizeCust_TB_Prod
BEFORE INSERT OR UPDATE OR DELETE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    findAndUpdateIfNotFoundCreateCust_TB_Prod;
  END IF;
  IF inserting THEN
    createNewIntoCust_TB_Prod;
  END IF;
  IF deleting THEN
    deleteFromCust_TB_Prod;
  END IF;
END;
/

Schema Update Mechanics via an Updateable View

The second approach is to rename the table and then introduce an updateable view with the original name of the table. Note that some database products support the RENAME option of the ALTER TABLE command. If yours does not, you must re-create the table with the new name, and then load the data into the table. You should schedule the drop of the old table so that it is not accidentally used by someone. As you see in Figure 6.11, the updateable view is needed during the transition period to support external access programs that have yet to be refactored to use the renamed table. This strategy is viable only if your database supports updateable views.

Figure 6.11. Renaming the Cust_TB_Prod table to Customer via a view.

Figure 6.11 shows how to rename the table using a view. You simply use the RENAME TO clause of the ALTER TABLE SQL command to rename the table and then create the view, as shown here:

ALTER TABLE Cust_TB_Prod RENAME TO Customer;

CREATE VIEW Cust_TB_Prod AS
  SELECT * FROM Customer;

As with the new table approach, if any columns of Cust_TB_Prod are used in other tables as (part of) a foreign key, you must re-create those constraints and/or indices implementing the foreign key to refer to Customer.

Data-Migration Mechanics

With the updateable view approach, you do not need to migrate data. However, with the new table approach, you must first copy the data. Copy all the data from the original table into the new table, in this case from Cust_TB_Prod to Customer. Second, you must introduce triggers on both the original and new table to copy data from one table to the other during the transition period. These triggers must be invoked by any change to the tables. You need to implement the triggers so that cycles do not occur: if Cust_TB_Prod changes, Customer must also be updated, but that update should not trigger the same update to Cust_TB_Prod and so on. The following code shows how to copy the data from Cust_TB_Prod into Customer:

INSERT INTO Customer
  SELECT * FROM Cust_TB_Prod;
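The requirement that synchronization must not cycle can be illustrated outside the database. The sketch below is an in-memory model under stated assumptions: the class TableSync, its maps, and its methods are hypothetical stand-ins for the two tables and their triggers, and the guard flag stands in for whatever product-specific mechanism your triggers would use to detect that a write is itself a propagated copy.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of two mutually synchronized tables (illustrative only). */
final class TableSync {
    private final Map<Long, String> custTbProd = new HashMap<>();
    private final Map<Long, String> customer = new HashMap<>();
    private boolean propagating = false; // true while a copy to the other table is running
    private int propagations = 0;        // counts copies, to show that no cycle occurs

    /** A write to the old table; plays the role of its synchronization trigger. */
    void writeCustTbProd(long id, String name) {
        custTbProd.put(id, name);
        propagate(() -> writeCustomer(id, name));
    }

    /** A write to the new table; plays the role of its synchronization trigger. */
    void writeCustomer(long id, String name) {
        customer.put(id, name);
        propagate(() -> writeCustTbProd(id, name));
    }

    /** Runs the copy unless this write is itself the result of a copy. */
    private void propagate(Runnable copy) {
        if (propagating) {
            return; // break the cycle
        }
        propagating = true;
        try {
            propagations++;
            copy.run();
        } finally {
            propagating = false;
        }
    }

    String customerName(long id) { return customer.get(id); }
    String custTbProdName(long id) { return custTbProd.get(id); }
    int propagations() { return propagations; }

    public static void main(String[] args) {
        TableSync tables = new TableSync();
        tables.writeCustTbProd(1L, "Ann");
        System.out.println(tables.customerName(1L) + " copied " + tables.propagations() + " time(s)");
    }
}
```

Each application write is copied to the other table exactly once; the copy itself is recognized by the guard and is not copied back.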

Access Program Update Mechanics

External access programs must be refactored to work with Customer rather than Cust_TB_Prod. The following Hibernate mapping shows the change you have to make when you rename the Cust_TB_Prod table:

//Before mapping

.....

//After mapping

.....

Rename View

Rename an existing view.

Motivation

The primary reason to apply Rename View is to increase the readability of your database schema or to conform to accepted database naming conventions. Ideally, these reasons are one and the same.

Potential Tradeoffs

The primary tradeoff is the cost of refactoring the external applications that access the view versus the improved readability and/or consistency provided by the new name.

Schema Update Mechanics

To perform Rename View, you must do the following:

1. Introduce the new view. Create a new view using the SQL command CREATE VIEW. In Figure 6.12, this is CustomerOrders, the definition of which has to match that of CustOrds.

Figure 6.12. Renaming the CustOrds view to CustomerOrders.

2. Deprecate the original view. After you create CustomerOrders, you want to indicate that CustOrds should no longer be updated with new features or bug fixes.

3. Redefine the old view. You should redefine CustOrds to be based on CustomerOrders to avoid duplicate code streams. The benefit is that any changes to CustomerOrders, such as a new data source for a column, will propagate to CustOrds without any additional work.

The following code depicts the DDL to create CustomerOrders, which is identical to the code that was used to create CustOrds:

CREATE VIEW CustomerOrders AS
  SELECT
    Customer.CustomerCode,
    Order.OrderID,
    Order.OrderDate,
    Order.ProductCode
  FROM Customer, Order
  WHERE Customer.CustomerCode = Order.CustomerCode
    AND Order.ShipDate = TOMORROW;

COMMENT ON TABLE CustomerOrders IS
  'Renamed from CustOrds, CustOrds dropdate = September 15 2007';

The following code drops and then re-creates CustOrds so that it derives its results from CustomerOrders:

DROP VIEW CustOrds;

CREATE VIEW CustOrds AS
  SELECT CustomerCode, OrderID, OrderDate, ProductCode
  FROM CustomerOrders;

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You must refactor all the external programs currently accessing CustOrds to access CustomerOrders. In the case of hard-coded SQL, this will simply be the update of the FROM/INTO clauses, and in the case of meta data-driven approaches, the update of the name within the representation for this view. The following code shows how a reference to CustOrds should be changed to CustomerOrders:

// Before code
stmt.prepare(
  "SELECT * " +
  "FROM CustOrds " +
  "WHERE CustomerId = ?");
stmt.setLong(1, customer.getCustomerID());
ResultSet rs = stmt.executeQuery();

// After code
stmt.prepare(
  "SELECT * " +
  "FROM CustomerOrders " +
  "WHERE CustomerId = ?");
stmt.setLong(1, customer.getCustomerID());
ResultSet rs = stmt.executeQuery();

Replace LOB With Table

Replace a large object (LOB) column that contains structured data with a new table or with new columns in the same table. LOBs are typically stored as either a binary large object (BLOB), a variable character (VARCHAR), or in some cases as XML data.

Motivation

The primary reason to replace a LOB with a table is that you need to treat parts of the LOB as distinct data elements. This is quite common with XML data structures that have been stored in a single column, often to avoid "shredding" the structure into individual columns.

Potential Tradeoffs

The advantage of storing a complex data structure in a single column is that you can retrieve that specific data structure quickly and easily. This proves particularly valuable when existing code already works with the data structure in question and merely needs to use the database as a handy file-storage mechanism. By replacing the LOB with a table, or perhaps several tables if the structure of the data contained within the LOB is very complex, you can easily work with the individual data elements within your database. You also make the data more accessible to other applications that may not need the exact structure contained within the LOB. Furthermore, if the LOB contains some data that is already present within your database, you can potentially use those existing data sources to represent the appropriate portions of the LOB, reducing data redundancy (and thus integrity errors). The disadvantage of this approach is the increased time and complexity required to shred the data to store it within the database and similarly to retrieve and convert it back into the required structure.

Schema Update Mechanics

As you see in Figure 6.13, applying Replace LOB With Table is straightforward. You need to do the following:

Figure 6.13. Replacing a LOB with a table.

1. Determine a table schema. You need to analyze Customer.MailingAddress to determine the data that it contains, and then develop a table schema to store that data. If the structure contained within MailingAddress is complex, you either need to have smaller LOB columns within the new table or recursively apply Replace LOB With Table to deal with these smaller structures.

2. Add the table. In Figure 6.13, this is CustomerAddress. The columns of this table are the primary key of Customer, the CustomerPOID column, and the new columns containing the data from MailingAddress.

3. Deprecate the original column. MailingAddress must be marked for deletion at the end of the deprecation period.

4. Add a new index. For performance reasons, you may need to introduce a new index for CustomerAddress via the CREATE INDEX command.

5. Introduce synchronization triggers. Customer will require a trigger to populate the values in CustomerAddress appropriately. This trigger will need to shred the MailingAddress structure and store it appropriately. Similarly, a trigger on CustomerAddress is needed to update Customer during the transition period.

The code to create the CustomerAddress table, add an index, define the synchronization triggers, and eventually drop the column and triggers is shown here:

CREATE TABLE CustomerAddress (
  CustomerPOID NUMBER NOT NULL,
  Street VARCHAR(40),
  City VARCHAR(40),
  State VARCHAR(40),
  ZipCode VARCHAR(10)
);

CREATE INDEX IndexCustomerAddress ON CustomerAddress(CustomerPOID);

-- Triggers to keep the tables synchronized
CREATE OR REPLACE TRIGGER SynchronizeWithCustomerAddress
BEFORE INSERT OR UPDATE OR DELETE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    FindOrCreateCustomerAddress;
  END IF;
  IF inserting THEN
    CreateCustomerAddress;
  END IF;
  IF deleting THEN
    DeleteCustomerAddress;
  END IF;
END;
/

-- This trigger is defined on CustomerAddress so that changes flow back to Customer.
CREATE OR REPLACE TRIGGER SynchronizeWithCustomer
BEFORE INSERT OR UPDATE OR DELETE
ON CustomerAddress
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating OR inserting THEN
    FindAndUpdateCustomer;
  END IF;
  IF deleting THEN
    UpdateCustomerNullAddress;
  END IF;
END;
/

-- On Dec 14 2007
DROP TRIGGER SynchronizeWithCustomerAddress;
DROP TRIGGER SynchronizeWithCustomer;
ALTER TABLE Customer DROP COLUMN MailingAddress;

Data-Migration Mechanics

CustomerAddress must be populated by shredding and then copying the data contained in Customer.MailingAddress. The value of Customer.CustomerPOID must also be copied to maintain the relationship. If MailingAddress has a NULL or empty value, a row in CustomerAddress does not need to be created. This can be accomplished via one or more SQL scripts, as you see in the following code:

INSERT INTO CustomerAddress
  SELECT
    CustomerPOID,
    ExtractStreet(MailingAddress),
    ExtractCity(MailingAddress),
    ExtractState(MailingAddress),
    ExtractZipCode(MailingAddress)
  FROM Customer
  WHERE MailingAddress IS NOT NULL;
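What "shredding" involves depends entirely on how MailingAddress is encoded, which the text does not specify. Purely as an illustration, the sketch below assumes a semicolon-delimited layout of street; city; state; zip code. The class AddressShredder, the Address record, and that layout are all assumptions made for the example, and records assume Java 16 or later.

```java
import java.util.Optional;

/** Sketch of shredding a MailingAddress value (assumed format: street; city; state; zip). */
final class AddressShredder {

    /** The columns of one CustomerAddress row, minus the key. */
    record Address(String street, String city, String state, String zipCode) {}

    /**
     * Returns the shredded address, or empty when there is nothing to migrate
     * (a NULL or empty MailingAddress needs no CustomerAddress row).
     */
    static Optional<Address> shred(String mailingAddress) {
        if (mailingAddress == null || mailingAddress.isBlank()) {
            return Optional.empty();
        }
        String[] parts = mailingAddress.split(";", -1);
        if (parts.length != 4) {
            // Surface bad source data instead of silently storing a partial row.
            throw new IllegalArgumentException("Unexpected address format: " + mailingAddress);
        }
        return Optional.of(new Address(
            parts[0].trim(), parts[1].trim(), parts[2].trim(), parts[3].trim()));
    }

    public static void main(String[] args) {
        System.out.println(shred("1 Main St; Springfield; IL; 62701"));
        System.out.println(shred(""));
    }
}
```

Rejecting values that do not match the expected layout makes data quality problems visible before the original column is dropped.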

Access Program Update Mechanics

You must identify any external programs referencing Customer.MailingAddress so that they can be updated to work with CustomerAddress as appropriate. You will need to do the following:

1. Remove translation code. External programs could have code that shreds the data within MailingAddress to work with its subdata elements, or they could contain code that takes the source data elements and builds the format to be stored into MailingAddress. This code will no longer be needed with the new data structure.

2. Add translation code. Conversely, some external programs may require the exact data structure contained within MailingAddress. If several applications require this, you should consider introducing stored procedures or a library within the database to do this translation, enabling reuse.

3. Write code to access the new table. After you add CustomerAddress, you have to write application code that uses this new table rather than MailingAddress.

The following code shows how the code to retrieve data attributes from Customer.MailingAddress is replaced with a SELECT against the CustomerAddress table:

// Before code
public Customer findByCustomerID(Long customerPOID) throws SQLException {
  Customer customer = new Customer();
  PreparedStatement stmt = DB.prepare(
    "SELECT CustomerPOID, " +
    "MailingAddress, Name, PhoneNumber " +
    "FROM Customer " +
    "WHERE CustomerPOID = ?");
  stmt.setLong(1, customerPOID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerPOID"));
    customer.setName(rs.getString("Name"));
    customer.setPhoneNumber(rs.getString("PhoneNumber"));
    // The application shreds the LOB itself.
    String mailingAddress = rs.getString("MailingAddress");
    customer.setStreet(extractStreet(mailingAddress));
    customer.setCity(extractCity(mailingAddress));
    customer.setState(extractState(mailingAddress));
    customer.setZipCode(extractZipCode(mailingAddress));
  }
  return customer;
}

// After code
public Customer findByCustomerID(Long customerPOID) throws SQLException {
  Customer customer = new Customer();
  // CustomerPOID exists in both tables, so it is qualified in the select list.
  PreparedStatement stmt = DB.prepare(
    "SELECT Customer.CustomerPOID, " +
    "Name, PhoneNumber, " +
    "Street, City, State, ZipCode " +
    "FROM Customer, CustomerAddress " +
    "WHERE Customer.CustomerPOID = ? " +
    "AND Customer.CustomerPOID = CustomerAddress.CustomerPOID");
  stmt.setLong(1, customerPOID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerPOID"));
    customer.setName(rs.getString("Name"));
    customer.setPhoneNumber(rs.getString("PhoneNumber"));
    customer.setStreet(rs.getString("Street"));
    customer.setCity(rs.getString("City"));
    customer.setState(rs.getString("State"));
    customer.setZipCode(rs.getString("ZipCode"));
  }
  return customer;
}

Replace Column

Replace an existing nonkey column with a new one. For replacing a column that is part of a key, either the primary key or an alternate key, see the Introduce Surrogate Key (page 85) and Replace Surrogate Key With Natural Key (page 135) refactorings.

Motivation

There are two reasons why you may want to apply Replace Column. First, and most commonly, usage of the column has changed over time, requiring you to change its type. For example, you previously had a numeric customer identifier, but now your business stakeholders have made it alphanumeric. Second, this may be an intermediate step to implement other refactorings. For example, replacing an existing column is often an important step in merging two similar data sources, or in applying Consolidate Key Strategy (page 168), because you need to ensure type and format consistency with another column.

Potential Tradeoffs

A significant risk when replacing a column is information loss when transferring the data to the replacement column. This is particularly true when the types of the two columns are significantly different: converting from a CHAR to a VARCHAR is straightforward, as is NUMERIC to CHAR, but converting CHAR to NUMERIC can be problematic when the original column contains non-numeric characters.
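One way to gauge this risk before the refactoring is to scan the source values for anything that would not survive the conversion. The sketch below is one possible check, not a prescribed step: the class ConversionCheck and its methods are hypothetical, and it uses java.math.BigDecimal parsing as a stand-in for the database's own CHAR to NUMERIC conversion, whose accepted formats may differ.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.stream.Collectors;

/** Pre-migration check for CHAR values that cannot become NUMERIC (illustrative only). */
final class ConversionCheck {

    /** True when the value parses as a decimal number; NULL is treated as convertible. */
    static boolean isNumeric(String value) {
        if (value == null) {
            return true; // NULL simply stays NULL in the new column
        }
        try {
            new BigDecimal(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Returns the values that would be lost or rejected by the conversion. */
    static List<String> findUnconvertible(List<String> values) {
        return values.stream()
            .filter(v -> !isNumeric(v))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findUnconvertible(List.of("42", "A-17", " 7 ")));
    }
}
```

A non-empty result suggests applying data quality refactorings to the source column before replacing it.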

Schema Update Mechanics

To apply Replace Column, you must do the following:

1. Introduce the new column. Add the column to the target table via the SQL command ADD COLUMN. In Figure 6.14, this is CustomerID.

Figure 6.14. Replacing the Customer.CustomerNumber column.

2. Deprecate the original column. CustomerNumber must be marked for deletion at the end of your chosen transition period.

3. Introduce a synchronization trigger. As you can see in Figure 6.14, you require a trigger to copy data from one column to the other during the transition period. This trigger must be invoked by any change to a data row.

4. Update other tables. If CustomerNumber is used in other tables as part of a foreign key, you will want to replace those columns similarly, as well as update any corresponding index definitions.

The following SQL code depicts the DDL to replace the column, create the synchronization trigger, and eventually drop the column and trigger after the transition period:

ALTER TABLE Customer ADD CustomerID CHAR(12);

COMMENT ON COLUMN Customer.CustomerID IS
  'Replaces CustomerNumber column, finaldate = 2007-06-14';
COMMENT ON COLUMN Customer.CustomerNumber IS
  'Replaced with CustomerID, dropdate = 2007-06-14';

CREATE OR REPLACE TRIGGER SynchronizeCustomerIDNumber
BEFORE INSERT OR UPDATE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  -- Fill whichever column the application left empty.
  IF :NEW.CustomerID IS NULL THEN
    :NEW.CustomerID := formatCustomerNumber(:NEW.CustomerNumber);
  END IF;
  IF :NEW.CustomerNumber IS NULL THEN
    :NEW.CustomerNumber := :NEW.CustomerID;
  END IF;
END;
/

-- On June 14 2007
DROP TRIGGER SynchronizeCustomerIDNumber;
ALTER TABLE Customer DROP COLUMN CustomerNumber;

Data-Migration Mechanics

The data must be initially copied from CustomerNumber to CustomerID and then kept synchronized during the transition period (for example, via stored procedures). As described earlier, this can be problematic when the data formats are significantly different from one another. Before applying Replace Column, you may discover that you need to apply one or more data quality refactorings to clean up the source data first. The code to copy the values into the new column is shown here:

UPDATE Customer SET CustomerID = CustomerNumber;

Access Program Update Mechanics

The primary issue is that external programs need to be refactored to work with the new data type and format of CustomerID. This could imply that conversion code be written that converts back and forth between the old data format and the new. A longer-term strategy, although potentially a more expensive one, would be to completely rework all the external program code to use the new data format. The following code snippet shows you how the column name and the data type need to change in the application code:

// Before code
public Customer findByCustomerID(Long customerID) throws SQLException {
  Customer customer = new Customer();
  PreparedStatement stmt = DB.prepare(
    "SELECT CustomerPOID, " +
    "CustomerNumber, FirstName, LastName " +
    "FROM Customer " +
    "WHERE CustomerPOID = ?");
  stmt.setLong(1, customerID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerPOID(rs.getLong("CustomerPOID"));
    customer.setCustomerNumber(rs.getInt("CustomerNumber"));
    customer.setFirstName(rs.getString("FirstName"));
    customer.setLastName(rs.getString("LastName"));
  }
  return customer;
}

// After code
public Customer findByCustomerID(Long customerID) throws SQLException {
  Customer customer = new Customer();
  PreparedStatement stmt = DB.prepare(
    "SELECT CustomerPOID, " +
    "CustomerID, FirstName, LastName " +
    "FROM Customer " +
    "WHERE CustomerPOID = ?");
  stmt.setLong(1, customerID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerPOID(rs.getLong("CustomerPOID"));
    // The replacement column is alphanumeric, so it is read as a String.
    customer.setCustomerID(rs.getString("CustomerID"));
    customer.setFirstName(rs.getString("FirstName"));
    customer.setLastName(rs.getString("LastName"));
  }
  return customer;
}

Replace One-To-Many With Associative Table

Replace a one-to-many association between two tables with an associative table.

Motivation

The primary reason to introduce an associative table between two tables is to implement a many-to-many association between them later on. It is quite common for a one-to-many association to evolve into a many-to-many association. For example, any given employee currently has at most one manager. (The president of the company is the only person without a manager.) However, the company wants to move to a matrix organization structure where people can potentially report to several managers. Because a one-to-many association is a subset of a many-to-many association, the new associative table would implement the existing hierarchical organization structure yet be ready for the coming matrix structure. You may also want to add information to the relationship itself that does not belong to either of the existing tables.

Potential Tradeoffs

You are overbuilding your database schema when you use an associative table to implement a one-to-many association. If the association is not likely to evolve into a many-to-many relationship, it is not advisable to take this approach. When you add associative tables, you are increasing the number of joins you have to make to get to the relevant data, thus degrading performance and making it harder to understand the database schema.

Schema Update Mechanics

To apply Replace One-To-Many With Associative Table, you must do the following:

1. Add the associative table. In Figure 6.15, this is Holds. The columns of this table are the combination of the primary keys of Customer and Policy. Note that some tables may not necessarily have a primary key, although this is rare. When this is the case, you may decide to apply the Introduce Surrogate Key refactoring.

Figure 6.15. Replacing a one-to-many with an associative table.

2. Deprecate the original column. Because we no longer maintain the relationship directly from Policy to Customer, Policy.CustomerPOID must be marked for deletion at the end of the transition period. It is currently used to maintain the one-to-many relationship with Customer but will no longer be needed.

3. Add a new index. A new index for Holds should be introduced via the Introduce Index (page 248) refactoring.

4. Introduce synchronization triggers. Policy will require a trigger that will populate the key values in the Holds table, if the appropriate values do not already exist, during the transition period. Similarly, there will need to be a trigger on Holds that verifies that Policy.CustomerPOID is populated appropriately.

The code to add the Holds table, to add an index on Holds, to add the synchronization triggers, and finally to drop the old schema and triggers is shown here:

    CREATE TABLE Holds (
      CustomerPOID BIGINT,
      PolicyID INT
    );

    CREATE INDEX HoldsIndex ON Holds (CustomerPOID, PolicyID);

    CREATE OR REPLACE TRIGGER InsertHoldsRow
    BEFORE INSERT OR UPDATE OR DELETE
    ON Policy
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF updating THEN
        UpdateInsertHolds;
      END IF;
      IF inserting THEN
        CreateHolds;
      END IF;
      IF deleting THEN
        RemoveHolds;
      END IF;
    END;
    /

    CREATE OR REPLACE TRIGGER UpdatePolicyCustomerPOID
    BEFORE INSERT OR UPDATE OR DELETE
    ON Holds
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    BEGIN
      IF updating THEN
        UpdateInsertPolicy;
      END IF;
      IF inserting THEN
        CreatePolicy;
      END IF;
      IF deleting THEN
        RemovePolicy;
      END IF;
    END;
    /

    -- On Mar 15 2007
    DROP TRIGGER InsertHoldsRow;
    DROP TRIGGER UpdatePolicyCustomerPOID;
    -- Drop the deprecated foreign key column identified in step 2
    ALTER TABLE Policy DROP COLUMN CustomerPOID;

Set a Naming Convention

There are two common naming conventions for associative tables: either assign the table the same name as the original association, as we have done, or concatenate the two table names, which would have resulted in CustomerPolicy for the name.

Data-Migration Mechanics

The associative table must be populated by copying the values of Policy.CustomerPOID and Policy.PolicyID into Holds.CustomerPOID and Holds.PolicyID, respectively. This can be accomplished via a simple SQL script, as follows:

    INSERT INTO Holds (CustomerPOID, PolicyID)
      SELECT CustomerPOID, PolicyID FROM Policy;

Access Program Update Mechanics

To update external programs, you must do the following:

1. Remove updates to the foreign key. Any code to assign values to Policy.CustomerPOID should be refactored to write to Holds to maintain the association properly.

2. Rework joins. Many external access programs will define joins involving Customer and Policy, implemented either via hard-coded SQL or via metadata. These joins should be refactored to work with Holds.

3. Rework retrievals. Some external programs will traverse the database one or more rows at a time, retrieving data based on the key values, traversing from Policy to Customer. These retrievals will need to be updated similarly.

The following code shows how to change your application code so that retrieval of data is now done via a join using the associative table:

    //Before code
    stmt.prepare(
      "SELECT Customer.CustomerPOID, Customer.Name, " +
      "Policy.PolicyID, Policy.Amount " +
      "FROM Customer, Policy " +
      "WHERE Customer.CustomerPOID = Policy.CustomerPOID " +
      "AND Customer.CustomerPOID = ? ");
    stmt.setLong(1, customerPOID);
    ResultSet rs = stmt.executeQuery();

    //After code
    stmt.prepare(
      "SELECT Customer.CustomerPOID, Customer.Name, " +
      "Policy.PolicyID, Policy.Amount " +
      "FROM Customer, Holds, Policy " +
      "WHERE Customer.CustomerPOID = Holds.CustomerPOID " +
      "AND Holds.PolicyID = Policy.PolicyID " +
      "AND Customer.CustomerPOID = ? ");
    stmt.setLong(1, customerPOID);
    ResultSet rs = stmt.executeQuery();

Replace Surrogate Key With Natural Key

Replace a surrogate key with an existing natural key. This refactoring is the opposite of Introduce Surrogate Key (page 85).

Motivation

There are several reasons to apply Replace Surrogate Key With Natural Key:

Reduce overhead. When you replace a surrogate key with an existing natural key, you reduce the overhead within your table structure of maintaining the additional surrogate key column(s).

To consolidate your key strategy. To support Consolidate Key Strategy (page 168), you may decide to first replace an existing surrogate primary key with the "official" natural key.

Remove nonrequired keys. You may have discovered that a surrogate key was introduced to a table when it really was not needed. It is always better to remove unused indexes to improve performance.

Potential Tradeoffs

Although many data professionals debate the use of surrogate versus natural keys, the reality is that both types of keys have their place. When you have tables with natural keys, each external application, as well as the database itself, must access data from each table in its own unique way. Sometimes, the key will be a single numeric column, sometimes a single character column, or sometimes a combination of several columns. With a consistent surrogate key strategy, tables are accessed in the exact same manner, enabling you to simplify your code. Thus, by replacing a surrogate key with a natural key, you potentially increase the complexity of the code that accesses your database. The primary advantage is that you simplify your table schema.

Schema Update Mechanics

Applying Replace Surrogate Key With Natural Key can be complicated because of the coupling in which the surrogate key is potentially involved. Because it is a primary key of a table, it is likely that it also forms (part of) foreign keys within other tables. You will need to do the following:

1. Identify the column(s) to form the new primary key. In Figure 6.16, this is StateCode. (This could be several columns.) StateCode must have a unique value within each row for it to qualify to be a primary key.

Figure 6.16. Replacing a surrogate key with a natural key.

2. Add a new index. If one does not already exist, a new index based on StateCode needs to be introduced for State.

3. Deprecate the original column. StatePOID must be marked for deletion at the end of the transition period.

4. Update coupled tables. If StatePOID is used in other tables as part of a foreign key, you will want to update those tables to use the new key. You must remove the column(s) that currently correspond to StatePOID using Drop Column (page 172). You also need to add new column(s) that correspond to StateCode if those columns do not already exist. The corresponding index definition(s) need to be updated to reflect this change. When StatePOID is used in many tables, you may want to consider updating the tables one at a time to simplify the effort.

5. Update and possibly add RI triggers. Any triggers that exist to maintain referential integrity between tables must be updated to work with the corresponding StateCode values in the other tables.

Figure 6.16 depicts how to remove State.StatePOID, a surrogate key, replacing it with the existing State.StateCode as key. To support this new key, we must add the PopulateStateCode trigger, which is invoked whenever an insert occurs in Address, obtaining the value of State.StateCode.

    CREATE OR REPLACE TRIGGER PopulateStateCode
    BEFORE INSERT
    ON Address
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF :NEW.StateCode IS NULL THEN
        -- Stored function (not shown) that looks up the state for the row's StatePOID
        :NEW.StateCode := getStatePOIDFromState(:NEW.StatePOID);
      END IF;
    END;
    /

    ALTER TABLE Address ADD (CONSTRAINT AddressToStateForeignKey
      FOREIGN KEY (StateCode) REFERENCES State);

    -- June 14 2007
    ALTER TABLE Address DROP CONSTRAINT AddressToStateForeignKey;
    ALTER TABLE State DROP CONSTRAINT StatePrimaryKey;
    ALTER TABLE State MODIFY StateCode NOT NULL;
    ALTER TABLE State ADD CONSTRAINT StatePrimaryKey
      PRIMARY KEY (StateCode);
    DROP TRIGGER PopulateStateCode;

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

To update external access programs, you need to do the following:

1. Remove surrogate key code. The code to assign values to the surrogate key column (which may be implemented either within external applications or the database) should no longer be invoked. It may not even be needed any longer at all.

2. Joining based on the new key. Many external access programs will define joins involving State, implemented either via hard-coded SQL or via metadata. These joins should be refactored to work with StateCode, not StatePOID.

3. Retrievals based on the new key. Some external programs will traverse the database one or more rows at a time, retrieving data based on the key values. These retrievals must be updated similarly.

The following Hibernate mappings show how the referenced tables must refer to the new keys, and how the POID columns are no longer generated:

//Before mapping

//After mapping

Split Column

Split a column into one or more columns within a single table.

Note

If one or more of the new columns needs to appear in another table, first apply Split Column and then apply Move Column (page 103).

Motivation

There are two reasons why you may want to apply Split Column. First, you have a need for fine-grained data. For example, the Customer table has a Name column, which contains the full name of the person, but you want to split this column to store FirstName, MiddleName, and LastName as independent columns.

Second, the column has several uses. The original column was introduced to track the Account status, and now it is also being used to track the type of Account. For example, the Account.Status column contains the status of the account (Closed, OverDrawn, and so on). Unknowingly, someone else has also started using it for account type information (Checking, Savings, and so on). We need to split these usages into their own fields to avoid the introduction of bugs caused by the dual usage.

Potential Tradeoffs

This database refactoring can result in duplication of data when you split columns. When you split a column that is used for different purposes, you run the risk that you should in fact be using the new columns for the same things. (If so, you will discover that you need to apply Merge Columns.) The usage of a column should determine whether it should be split.

Schema Update Mechanics

To perform Split Column, you must first introduce the new columns. Add the column to the table via the SQL command ADD COLUMN. In Figure 6.17, this is FirstName, MiddleName, and LastName. This step is optional because you may be able to use one of the existing columns into which to split the data. Then you must introduce a synchronization trigger to ensure that the columns remain in sync with one another. The trigger must be invoked by any change to the columns.

Figure 6.17. Splitting the Customer.Name column.

Figure 6.17 depicts an example where the Customer table initially stores the name of a person in the Name column. We have discovered that few applications are interested in the full name, but instead need components of the full name, in particular the last name of the customer. We have also discovered that many applications include duplicate code to split the Name column, a source of potential bugs. Therefore, we have decided to split the Name column into FirstName, MiddleName, and LastName columns, reflecting the actual usage of the data. We introduced the SynchronizeCustomerName trigger to keep the values in the columns synchronized. The following code implements the changes to the schema:

    ALTER TABLE Customer ADD FirstName VARCHAR(40);
    COMMENT ON COLUMN Customer.FirstName IS
      'Added as the result of splitting Customer.Name finaldate = December 14 2007';

    ALTER TABLE Customer ADD MiddleName VARCHAR(40);
    COMMENT ON COLUMN Customer.MiddleName IS
      'Added as the result of splitting Customer.Name finaldate = December 14 2007';

    ALTER TABLE Customer ADD LastName VARCHAR(40);
    COMMENT ON COLUMN Customer.LastName IS
      'Added as the result of splitting Customer.Name finaldate = December 14 2007';

    COMMENT ON COLUMN Customer.Name IS
      'Replaced with FirstName, LastName and MiddleName, will be dropped December 14 2007';

    -- Trigger to keep all the split columns in sync
    CREATE OR REPLACE TRIGGER SynchronizeCustomerName
    BEFORE INSERT OR UPDATE
    ON Customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF :NEW.FirstName IS NULL THEN
        :NEW.FirstName := getFirstName(:NEW.Name);
      END IF;
      IF :NEW.MiddleName IS NULL THEN
        :NEW.MiddleName := getMiddleName(:NEW.Name);
      END IF;
      IF :NEW.LastName IS NULL THEN
        :NEW.LastName := getLastName(:NEW.Name);
      END IF;
    END;
    /

    -- On December 14 2007
    ALTER TABLE Customer DROP COLUMN Name;
    DROP TRIGGER SynchronizeCustomerName;

Data-Migration Mechanics

You must copy all the data from the original column(s) into the split columns; in this case, from Customer.Name into FirstName, MiddleName, and LastName. The following code depicts the DML to initially split the data from Name into the three new columns. (The source code for the three stored functions that are invoked is not shown for the sake of brevity.)

    /* One-time migration of data from Customer.Name to Customer.FirstName,
       Customer.MiddleName, and Customer.LastName. When both sets of columns
       are active, there is a need to have a trigger that keeps both sets of
       columns in sync. */
    UPDATE Customer SET FirstName = getFirstName(Name);
    UPDATE Customer SET MiddleName = getMiddleName(Name);
    UPDATE Customer SET LastName = getLastName(Name);
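The three stored functions used in the migration are not shown in the text. Purely as an illustration, the following Java sketch shows one way application-side equivalents could split a full name. The class name NameSplitter and the splitting rule (first token is the first name, last token is the last name, anything in between is the middle name) are assumptions and not the book's implementation. Real names frequently break such a rule, which is one reason the resulting rows should be inspected after migration.

```java
import java.util.Arrays;

/** Hypothetical helpers mirroring getFirstName, getMiddleName, and getLastName. */
final class NameSplitter {

    // Split on runs of whitespace after trimming; null or blank input yields no tokens.
    private static String[] tokens(String fullName) {
        if (fullName == null || fullName.trim().isEmpty()) {
            return new String[0];
        }
        return fullName.trim().split("\\s+");
    }

    /** First token, or null when the name is empty. */
    static String getFirstName(String fullName) {
        String[] t = tokens(fullName);
        return t.length > 0 ? t[0] : null;
    }

    /** Everything between the first and last token, or null when there is nothing between them. */
    static String getMiddleName(String fullName) {
        String[] t = tokens(fullName);
        if (t.length < 3) {
            return null;
        }
        return String.join(" ", Arrays.copyOfRange(t, 1, t.length - 1));
    }

    /** Last token, or null when the name has fewer than two tokens. */
    static String getLastName(String fullName) {
        String[] t = tokens(fullName);
        return t.length > 1 ? t[t.length - 1] : null;
    }
}
```

Under these assumptions, a single-word name populates only FirstName, leaving MiddleName and LastName null for manual review.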

Access Program Update Mechanics

You need to analyze the access programs thoroughly, and then update them appropriately, during the transition period. In addition to the obvious need to work with FirstName, MiddleName, and LastName rather than the former column, potential updates you need to make are as follows:

1. Remove splitting code. There may be code that splits the existing columns into a data attribute similar to the new columns. This code should be refactored and potentially removed entirely.

2. Update data-validation code to work with split data. Some data-validation code may exist only because the columns have not been split. For example, if a value is stored in the Customer.Name column, you may have validation code in place that verifies that the values contain the FirstName and LastName. After the column is split, there may no longer be a need for this code.

3. Refactor the user interface. After the original column is split, the presentation layer should make use of the finer-grained data, if it was not doing so already, as appropriate.

The following code shows how the application makes use of the finer-grained data available to it:

    //Before code
    public Customer findByCustomerID(Long customerID) {
      Customer customer = new Customer();
      stmt = DB.prepare("SELECT CustomerID, " +
        "Name, PhoneNumber " +
        "FROM Customer " +
        "WHERE CustomerID = ?");
      stmt.setLong(1, customerID);
      // executeQuery() runs the statement; a separate execute() would run it twice
      ResultSet rs = stmt.executeQuery();
      if (rs.next()) {
        customer.setCustomerId(rs.getLong("CustomerID"));
        String name = rs.getString("Name");
        customer.setFirstName(getFirstName(name));
        customer.setMiddleName(getMiddleName(name));
        customer.setLastName(getLastName(name));
        customer.setPhoneNumber(rs.getString("PhoneNumber"));
      }
      return customer;
    }

    //After code
    public Customer findByCustomerID(Long customerID) {
      Customer customer = new Customer();
      stmt = DB.prepare("SELECT CustomerID, " +
        "FirstName, MiddleName, LastName, PhoneNumber " +
        "FROM Customer " +
        "WHERE CustomerID = ?");
      stmt.setLong(1, customerID);
      ResultSet rs = stmt.executeQuery();
      if (rs.next()) {
        customer.setCustomerId(rs.getLong("CustomerID"));
        customer.setFirstName(rs.getString("FirstName"));
        customer.setMiddleName(rs.getString("MiddleName"));
        customer.setLastName(rs.getString("LastName"));
        customer.setPhoneNumber(rs.getString("PhoneNumber"));
      }
      return customer;
    }

Split Table

Vertically split (for example, by columns) an existing table into one or more tables.

Note

If the destination of the split columns happens to be an existing table, then in reality you would be applying Move Column(s) (page 103). To split a table horizontally (for example, by rows), apply Move Data (page 192).

Motivation

There are several reasons why you may want to apply Split Table:

Performance improvement. It is very common for most applications to require a core collection of data attributes of any given entity, and then a specific subset of the noncore data attributes. For example, the core columns of the Employee table would include the columns required to store their name, address, and phone numbers, whereas noncore columns would include the Picture column as well as salary information. Because Employee.Picture is large and required only by a few applications, you would want to consider splitting it off into its own table. This would help to improve retrieval access times for applications that select all columns from the Employee table yet do not require the picture.

Restrict data access. You may want to restrict access to some columns, perhaps the salary information within the Employee table, by splitting it off into its own table and assigning specific security access control (SAC) rules to it.

Reduce repeating data groups (apply 1NF). The original table may have been designed when requirements were not yet finalized, or by people who did not appreciate why you need to normalize data structures (Date 2001 [...]). For example, the Employee table may store descriptions of the five previous evaluation reviews for the person. This information is a repeating group that you would want to split off into an EmployeeEvaluation table.

Potential Tradeoffs

When you split a table that (you believe) is used for different purposes, you run the risk that you would in fact be using the new tables for the same things; if so, you will discover that you need to apply Merge Tables (page 96). The usage of a table should determine whether it should be split.

Schema Update Mechanics

To perform Split Table, you must first add the table(s) via the SQL command CREATE TABLE. This step is optional because you may find it possible to use an existing table(s) into which to move the columns. In this situation, you should apply the Move Column refactoring (page 103). Second, you must introduce a trigger to ensure that the columns remain synchronized with one another. The trigger must be invoked by any change to the tables. You need to implement the trigger so that cycles do not occur.

Figure 6.18 depicts an example where the Address table initially stores the address information along with the state code and state name. To reduce data duplication, we have decided to split the Address table into Address and State tables, reflecting the current refactor of the table design. We introduced the SynchronizeWithAddress and SynchronizeWithState triggers to keep the values in the tables synchronized:

Figure 6.18. Splitting the Address table.

    CREATE TABLE State (
      StateCode VARCHAR(2) NOT NULL,
      Name VARCHAR(40) NOT NULL,
      CONSTRAINT PKState PRIMARY KEY (StateCode)
    );

    COMMENT ON COLUMN State.Name IS
      'Added as the result of splitting Address into Address and State drop date = December 14 2007';

    -- Trigger to keep all the split tables in sync
    CREATE OR REPLACE TRIGGER SynchronizeWithAddress
    BEFORE INSERT OR UPDATE
    ON Address
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF updating THEN
        FindOrCreateState;
      END IF;
      IF inserting THEN
        CreateState;
      END IF;
    END;
    /

    -- State stores the state name in its Name column
    CREATE OR REPLACE TRIGGER SynchronizeWithState
    BEFORE UPDATE OF Name
    ON State
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF updating THEN
        FindAndUpdateAllAddressesForStateName;
      END IF;
    END;
    /

    -- On December 14 2007
    ALTER TABLE Address DROP COLUMN StateName;
    DROP TRIGGER SynchronizeWithAddress;
    DROP TRIGGER SynchronizeWithState;

Data-Migration Mechanics

You must copy all the data from the original column(s) into the new table's columns. In the case of Figure 6.18, this means copying from Address.StateCode and Address.StateName into State.StateCode and State.Name, respectively. The following code shows how to initially migrate this data:

    /* One-time migration of data from Address.StateCode and Address.StateName
       to State. When both sets of columns are active, there is a need to have
       a trigger that keeps both sets of columns in sync. */
    INSERT INTO State (StateCode, Name)
      SELECT StateCode, StateName FROM Address
      WHERE StateCode IS NOT NULL AND StateName IS NOT NULL
      GROUP BY StateCode, StateName;

Access Program Update Mechanics

You must analyze the access programs thoroughly, and then update them appropriately, during the transition period. In addition to the obvious need to work with the new columns rather than the former columns, potential updates you need to make are as follows:

1. Introduce new table metadata. If you are using a metadata-based persistence framework, you must introduce new metadata for State and change the metadata for Address.

2. Update SQL code. Similarly, any embedded SQL code that accesses Address must be updated to join in State where appropriate. This may slightly reduce performance of this code.

3. Refactor the user interface. After the original table is split, the presentation layer should make use of the finer-grained data, if it was not doing so already, as appropriate.

The following Hibernate mappings show how we split the Address table and create a new State table:

//Before mapping

//After mapping

//Address table

//State table

Chapter 7. Data Quality Refactorings

Data quality refactorings are changes that improve the quality of the information contained within a database. Data quality refactorings improve and/or ensure the consistency and usage of the values stored within the database. The data quality refactorings are as follows:

Add Lookup Table
Apply Standard Codes
Apply Standard Type
Consolidate Key Strategy
Drop Column Constraint
Drop Default Value
Drop Non-Nullable Constraint
Introduce Column Constraint
Introduce Common Format
Introduce Default Value
Make Column Non-Nullable
Move Data
Replace Type Code With Property Flags

Common Issues When Implementing Data Quality Refactorings

Because data quality refactorings change the values of the data stored within your database, several issues are common to all of them. As a result, you need to do the following:

1. Fix broken constraints. You may have constraints defined on the affected data. If so, you can apply Drop Column Constraint (page 172) to first remove the constraint and then apply Introduce Column Constraint (page 180) to add the constraint back, reflecting the values of the improved data.

2. Fix broken views. Views will often reference hard-coded data values in their WHERE clauses, usually to select a subset of the data. As a result, these views may become broken when the values of the data change. You will need to find these broken views by running your test suite and by searching for view definitions that reference the columns in which the changed data is stored.

3. Fix broken stored procedures. The variables defined within a stored procedure, any parameters passed to it, the return value(s) calculated by it, and any SQL defined within it are potentially coupled to the values of the improved data. Hopefully, your existing tests will reveal business logic problems arising from the application of any data quality refactorings; otherwise, you will need to search for any stored procedure code accessing the column(s) in which the changed data is stored.

4. Update the data. You will likely want to lock the source data rows during the update, affecting performance and availability of the data for the application(s). There are two strategies for doing this. First, you can lock all the data and then do the updates at that time. Second, you can lock subsets of the data, perhaps even just a single row at a time, and do the update just on the subset. The first approach ensures consistency but risks performance degradation with large amounts of data: updating millions of rows can take time, preventing applications from making updates during this period. The second approach enables applications to work with the source data during the update process but risks inconsistency between rows because some will have the older, "low-quality" values, whereas other rows will have been updated.
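The second strategy, updating subsets of the data, can be sketched as follows. This is a minimal illustration that assumes the affected table has a numeric key; the class BatchPlanner and its Range type are hypothetical names, not part of the book's code. It only computes the key ranges. Each range would then be applied in its own short transaction, for example with an UPDATE restricted by a BETWEEN clause on the key followed by a commit, so that locks are held briefly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner for the "lock subsets of the data" update strategy. */
final class BatchPlanner {

    /** An inclusive range of key values to update in one short transaction. */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Splits the inclusive key range [minKey, maxKey] into consecutive ranges
     * that each cover at most batchSize key values.
     */
    static List<Range> plan(long minKey, long maxKey, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        long from = minKey;
        while (from <= maxKey) {
            long remaining = maxKey - from;
            // The final batch may be shorter than batchSize.
            long to = remaining < batchSize ? maxKey : from + batchSize - 1;
            ranges.add(new Range(from, to));
            if (to == maxKey) {
                break;
            }
            from = to + 1;
        }
        return ranges;
    }
}
```

A smaller batch size shortens each lock but lengthens the window during which old and new values coexist, which is the tradeoff described above.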

Add Lookup Table

Create a lookup table for an existing column.

Motivation

There are several reasons why you may want to apply Add Lookup Table:

Introduce referential integrity. You may want to introduce a referential integrity constraint on an existing Address.State to ensure the quality of the data.

Provide code lookup. Many times you want to provide a defined list of codes in your database instead of having an enumeration in every application. The lookup table is often cached in memory.

Replace a column constraint. When you introduced the column, you added a column constraint to ensure that a small number of correct code values persisted. But, as your application(s) evolved, you needed to introduce more code values, until you got to the point where it was easier to maintain the values in a lookup table instead of updating the column constraint.

Provide detailed descriptions. In addition to defining the allowable codes, you may also want to store descriptive information about the codes. For example, in the State table, you may want to relate the code CA to California.

Potential Tradeoffs

There are two issues to consider when adding a lookup table. The first is data population: you need to be able to provide valid data to populate the lookup table. Although this sounds simple, in practice the implication is that you must have an agreement as to the semantics of the existing data values in Address.State of Figure 7.1. This is easier said than done. For example, in the case of introducing a State lookup table, some applications may work with all 50 U.S. states, whereas others may also include the four territories (Puerto Rico, Guam, the District of Columbia, and the U.S. Virgin Islands). In this situation, you may either need to add two lookup tables, one for the 50 states and the other for the territories, or implement a single table and then the appropriate validation logic within applications that only need a subset of the lookup data.

Figure 7.1. Adding a State lookup table.

The second issue is that there will be a performance impact resulting from the addition of a foreign key constraint. (See the refactoring Add Foreign Key Constraint on page 204 for details.)

Schema Update Mechanics

As depicted in Figure 7.1, to update the database schema, you must do the following:

1. Determine the table structure. You must identify the column(s) of the lookup table (State).

2. Introduce the table. Create State in the database via the CREATE TABLE command.

3. Determine lookup data. You have to determine what rows are going to be inserted in the State table. Consider applying the Insert Data refactoring (page 296).

4. Introduce referential constraint. To enforce referential integrity constraints from the code column in the source table(s) to State, you must apply the Add Foreign Key refactoring.

The following code depicts the DDL to introduce the State table and add a foreign key constraint between it and Address:

    -- Create the lookup table.
    CREATE TABLE State (
      State CHAR(2) NOT NULL,
      Name CHAR(50),
      CONSTRAINT PKState PRIMARY KEY (State)
    );

    -- Introduce Foreign Key to the lookup table
    ALTER TABLE Address ADD CONSTRAINT FK_Address_State
      FOREIGN KEY (State) REFERENCES State;

Data-Migration Mechanics

You must ensure that the data values in Address.State have corresponding values in State. The easiest way to populate State.State is to copy the unique values from Address.State. With this automated approach, you need to remember to inspect the resulting rows to ensure that invalid data values are not introduced; if they are, you need to update both Address and State appropriately. When there are descriptive information columns, such as State.Name, you must provide the appropriate values; this is often done manually via a script or a data-administration utility. An alternative strategy is to simply load the values into the State table from an external file.

The following code depicts the DML to populate the State table with distinct values from the Address.State column. We then cleanse the data, in this case ensuring that all addresses use the code TX instead of Tx or tx or Texas. The final step is to provide state names corresponding to each state code. (In the example, we populate values for just three states.)

    -- Populate data in the lookup table
    INSERT INTO State (State)
      SELECT DISTINCT UPPER(State) FROM Address;

    -- Update Address.State to valid values and clean data
    UPDATE Address SET State = 'TX' WHERE UPPER(State) = 'TX';
    ...

    -- Now provide state names
    UPDATE State SET Name = 'Florida' WHERE State = 'FL';
    UPDATE State SET Name = 'Illinois' WHERE State = 'IL';
    UPDATE State SET Name = 'California' WHERE State = 'CA';

Access Program Update Mechanics

When you add State, you have to ensure that external programs now use the data values from the lookup table. The following code shows how external programs can now get the name of the state from the State table; in the past, they would have had to get this information from an internally hardcoded collection:

    // After code
    ResultSet rs = statement.executeQuery(
      "SELECT State, Name FROM State");

Some programs may choose to cache the data values, whereas others will access State as needed; caching works well because the values in State rarely change. Furthermore, if you are also introducing a foreign key constraint along with Add Lookup Table, external programs will need to handle any exceptions thrown by the database. See the Add Foreign Key refactoring (page 204) for details.
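The caching option can be sketched as follows. This is a minimal illustration, not code from the original example: the StateNameCache class, its loader argument, and the invalidate method are hypothetical names, and the loader is assumed to wrap whatever query your program already uses to read the State table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: keeps the State lookup rows in memory after the first read.
final class StateNameCache {
    private final Supplier<Map<String, String>> loader; // assumed to query the State table
    private Map<String, String> namesByCode;            // null until the first lookup

    StateNameCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Returns the state name for a code, or null if the code is not in the lookup table.
    synchronized String nameFor(String stateCode) {
        if (namesByCode == null) {
            namesByCode = new HashMap<>(loader.get()); // single read of the lookup data
        }
        return namesByCode.get(stateCode);
    }

    // Discards the cached rows so that the next lookup reloads them.
    synchronized void invalidate() {
        namesByCode = null;
    }
}
```

Because the lookup values rarely change, reloading only on an explicit invalidate call is usually enough; a program that cannot tolerate stale names would query State directly instead.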

Apply Standard Codes

Apply a standard set of code values to a single column to ensure that it conforms to the values of similar columns stored elsewhere in the database.

Motivation

You may need to apply Apply Standard Codes to do the following:

Cleanse data. When you have the same semantic meaning for different code values in your database, it is generally better to standardize them so that you can apply standard logic across all data attributes. For example, in Figure 7.2, when the value in Country.Code is USA and the value in Address.CountryCode is US, you have a potential problem because you can no longer accurately join the two tables. Apply a consistent value, one or the other, throughout your database.

Figure 7.2. Applying standard state codes.

Support referential integrity. When you want to apply Add Foreign Key Constraint (page 204) to tables based on the code column, you need to standardize the code values first.

Add a lookup table. When you are applying Add Lookup Table (page 153), you often need to first standardize the code values on which the lookup is based.

Conform to corporate standards. Many organizations have detailed data and data modeling standards that development teams are expected to conform to. Often when applying Use Official Data Source (page 271), you discover that your current data schema does not follow your organization's standards and therefore needs to be refactored to reflect the official data source code values.

Reduce code complexity. When you have a variety of values for the semantically same data, you will be writing extra program code to deal with the different values. For example, your existing program code of countryCode = 'US' || countryCode = 'USA' . . . would be simplified to something like countryCode = 'USA'.

Potential Tradeoffs

Standardizing code values can be tricky because they are often used in many places. For example, several tables may use the code value as a foreign key to another table; therefore, not only does the source need to be standardized but so do the foreign key columns. Second, the code values may be hard-coded in one or more applications, requiring extensive updates. For example, applications that access the Country table may have the value USA hard-coded in SQL statements, whereas applications that use the Address table have US hard-coded.

Schema Update Mechanics

To Apply Standard Codes to the database schema, you must do the following:

1. Identify the standard values. You need to settle on the "official" values for the code. Are the values being provided from existing application tables or are they being provided by your business users? Either way, the values must be accepted by your project stakeholder(s).
2. Identify the tables where the code is stored. You must identify the tables that include the code column. This may require extensive analysis and many iterations before you discover all the tables where the code is used. Note that this refactoring is applied to a single column at a time; you will potentially need to apply it several times to ensure consistency across your database.
3. Update stored procedures. When you standardize code values, the stored procedures that access the affected columns may need to be updated. For example, if getUSCustomerAddress has the WHERE clause Address.CountryCode="USA", this needs to change to Address.CountryCode="US".

Data-Migration Mechanics

When you standardize on a particular code, you must update all the rows where there are nonstandard codes to use the standard ones. If you are updating small numbers of rows, a simple SQL script that updates the target table(s) is sufficient. When you have to update large amounts of data, or in cases where the code in transactional tables is being changed, apply Update Data (page 310) instead.

The following code depicts the DML to update data in the Address and Country tables to use the standard code values:

    UPDATE Address SET CountryCode = 'CA' WHERE CountryCode = 'CAN';
    UPDATE Address SET CountryCode = 'US' WHERE CountryCode = 'USA';
    UPDATE Country SET CountryCode = 'CA' WHERE CountryCode = 'CAN';
    UPDATE Country SET CountryCode = 'US' WHERE CountryCode = 'USA';

Access Program Update Mechanics

When Apply Standard Codes is applied, the following aspects of external programs must be examined:

1. Hard-coded WHERE clauses. You may need to update SQL statements to have the correct values in the WHERE clause. For example, if the Country.Code row value changes from 'US' to 'USA', you will have to change your WHERE clause to use the new value.
2. Validation code. Similarly, you may need to update source code that validates the values of data attributes. For example, code that looks like countryCode = 'US' must be updated to use the new code value.

3. Lookup constructs. The values of codes may be defined in various programming "lookup constructs" such as constants, enumerations, and collections for use throughout an application. The definition of these lookup constructs must be updated to use the new code values.
4. Test code. Code values are often hard-coded into testing logic and/or test data generation logic; you will have to change these to now use the new values.

The following code shows you the before and after state of the method to read U.S. addresses:

    // Before code
    stmt = DB.prepare("SELECT addressId, line1, line2, city, state, zipcode, country " +
      "FROM address WHERE countrycode = ?");
    stmt.setString(1, "USA");
    ResultSet rs = stmt.executeQuery();

    // After code
    stmt = DB.prepare("SELECT addressId, line1, line2, city, state, zipcode, country " +
      "FROM address WHERE countrycode = ?");
    stmt.setString(1, "US");
    ResultSet rs = stmt.executeQuery();
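One way to handle the lookup constructs described in item 3 is to keep the standard values in a single enumeration that also recognizes the retired codes during the transition. The sketch below is illustrative only: the CountryCode enum and its parse method are hypothetical names, and the legacy values USA and CAN are assumed from the data-migration script.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup construct: the single place that knows the standard country codes.
enum CountryCode {
    US, CA;

    // Retired code values, assumed here to match the ones replaced by the migration script.
    private static final Map<String, CountryCode> LEGACY = Map.of("USA", US, "CAN", CA);

    // Maps a raw value (standard or legacy) to the standard code.
    // Throws IllegalArgumentException for a value that is neither.
    static CountryCode parse(String raw) {
        String key = raw.trim().toUpperCase(Locale.ROOT);
        CountryCode legacy = LEGACY.get(key);
        return legacy != null ? legacy : valueOf(key);
    }
}
```

With a construct like this, validation code and WHERE clause parameters can use CountryCode.US.name() rather than a string literal, so a later change to the standard value touches one definition.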

Apply Standard Type

Ensure that the data type of a column is consistent with the data type of other similar columns within the database.

Motivation

The refactoring Apply Standard Type can be used to do the following:

Ensure referential integrity. When you want to apply Add Foreign Key (page 204) to all tables storing the same semantic information, you need to standardize the data types of the individual columns. For example, Figure 7.3 shows how all the phone number columns are refactored to be stored as integers. Another common example occurs when you have Address.ZipCode stored as a VARCHAR data type and Customer.Zip stored as a NUMERIC data type; you should standardize on one data type so that you can apply referential integrity constraints.

Add a lookup table. When you are applying Add Lookup Table (page 153), you will want a consistent type used for the two code columns.

Conform to corporate standards. Many organizations have detailed data and data modeling standards that development teams are expected to conform to. Often, when applying Use Official Data Source (page 271), you discover that your current data schema does not follow your standards and therefore needs to be refactored to reflect the official data source type.

Reduce code complexity. When you have a variety of data types for semantically the same data, you require extra program code to handle the different column types. For example, phone number validation code for the Customer, Branch, and Employee classes could be refactored to use a shared method.

Figure 7.3. Applying standard data types in Customer, Branch, and Employee tables.

Potential Tradeoffs

Standardizing data types can be tricky because the individual columns are often referred to in many places. For example, several application classes may reference the column, and that code will need to change when you change the column's data type. Second, when you are attempting to change the data type of a column, you may have situations in which the source data cannot be converted to the destination data type. For example, when you change Customer.Zip to a NUMERIC data type, you may not be able to convert international postal codes such as R2D 2C3 that contain character data.
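Before committing to a numeric type, it can help to check whether the existing values would survive the conversion. The following is a rough sketch of such a check in application code; the NumericConversionCheck class is a hypothetical name, and the values are assumed to have already been read from the character column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check: finds values that cannot be stored in a NUMERIC column.
final class NumericConversionCheck {

    // Returns the values that are not made up entirely of digits.
    static List<String> unconvertible(List<String> values) {
        List<String> rejected = new ArrayList<>();
        for (String value : values) {
            if (value == null) {
                continue; // a NULL stays NULL in the new column
            }
            if (!value.trim().matches("\\d+")) {
                rejected.add(value);
            }
        }
        return rejected;
    }
}
```

Given the values 90210 and R2D 2C3, only R2D 2C3 is reported. Note that a value such as 02134 passes this check but loses its leading zero once stored as a number, which is a further reason to review the data before choosing the standard type.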

Schema Update Mechanics

To apply this refactoring, you must first identify the standard data type. You need to settle on the "official" data type for the columns. This data type must handle all existing data, and external access programs must be able to handle it. (Older languages sometimes cannot process new data types.) Then you must identify the tables that include column(s) where the data type needs to change. This may require extensive analysis and many iterations before you find out all the tables where the column needs to change. Note that this refactoring is applied to a single column at a time; therefore, you will potentially need to apply it several times to ensure consistency across your database.

Figure 7.3 depicts how we change the Branch.Phone, Branch.FaxNumber, and Employee.PhoneNumber columns to use the same data type of integer; the column Customer.PhoneNumber is already an integer and therefore does not need to be refactored. Although this would really be three individual refactorings, because the three columns appear in the same table it behooves us to bundle these three refactorings together.

The following code depicts the three refactorings required to change the Branch.Phone, Branch.FaxNumber, and Employee.PhoneNumber columns. We are going to add a new column to the tables using Introduce New Column (page 301). Because we want to provide some time for all the applications to migrate to the new columns, during this transition phase we are going to maintain both the old and new columns and also synchronize the data in them:

    ALTER TABLE Branch ADD PhoneNumber INT;
    COMMENT ON COLUMN Branch.PhoneNumber IS 'Replaces Phone, dropdate=2007-03-27';

    ALTER TABLE Branch ADD FaxNo INT;
    COMMENT ON COLUMN Branch.FaxNo IS 'Replaces FaxNumber, dropdate=2007-03-27';

    ALTER TABLE Employee ADD PhoneNo INT;
    COMMENT ON COLUMN Employee.PhoneNo IS 'Replaces PhoneNumber, dropdate=2007-03-27';

The following code depicts how to synchronize changes in the Branch.Phone, Branch.FaxNumber, and Employee.PhoneNumber columns with the new columns:

    CREATE OR REPLACE TRIGGER SynchronizeBranchPhoneNumbers
    BEFORE INSERT OR UPDATE ON Branch
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF :NEW.PhoneNumber IS NULL THEN
        :NEW.PhoneNumber := :NEW.Phone;
      END IF;
      IF :NEW.Phone IS NULL THEN
        :NEW.Phone := :NEW.PhoneNumber;
      END IF;
      IF :NEW.FaxNumber IS NULL THEN
        :NEW.FaxNumber := :NEW.FaxNo;
      END IF;
      IF :NEW.FaxNo IS NULL THEN
        :NEW.FaxNo := :NEW.FaxNumber;
      END IF;
    END;
    /

    CREATE OR REPLACE TRIGGER SynchronizeEmployeePhoneNumbers
    BEFORE INSERT OR UPDATE ON Employee
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF :NEW.PhoneNumber IS NULL THEN
        :NEW.PhoneNumber := :NEW.PhoneNo;
      END IF;
      IF :NEW.PhoneNo IS NULL THEN
        :NEW.PhoneNo := :NEW.PhoneNumber;
      END IF;
    END;
    /

    -- Update existing data for the first time.
    UPDATE Branch SET
      PhoneNumber = formatPhone(Phone),
      FaxNo = formatPhone(FaxNumber);
    UPDATE Employee SET
      PhoneNo = formatPhone(PhoneNumber);

    -- Drop the old columns on Mar 23 2007.
    ALTER TABLE Branch DROP COLUMN Phone;
    ALTER TABLE Branch DROP COLUMN FaxNumber;
    ALTER TABLE Employee DROP COLUMN PhoneNumber;
    DROP TRIGGER SynchronizeBranchPhoneNumbers;
    DROP TRIGGER SynchronizeEmployeePhoneNumbers;
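The script above calls formatPhone, whose definition is not shown. As a rough illustration of the kind of conversion it implies, the following sketch strips formatting characters from a text phone number and returns the digits as a number. The PhoneFormat class and its toNumber method are hypothetical names, and the rule of discarding every non-digit character is an assumption, not the original function's documented behavior.

```java
// Hypothetical stand-in for the conversion that formatPhone is assumed to perform.
final class PhoneFormat {

    // Removes every non-digit character and returns the remaining digits as a number.
    // Returns null when the input is null or contains no digits at all.
    static Long toNumber(String rawPhone) {
        if (rawPhone == null) {
            return null;
        }
        String digits = rawPhone.replaceAll("\\D", "");
        // Long.valueOf throws NumberFormatException if the digits exceed the range of a long.
        return digits.isEmpty() ? null : Long.valueOf(digits);
    }
}
```

A ten-digit phone number does not fit in a 32-bit int, which is why the sketch returns a Long; this matches the move from getString to getLong in the access program code.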

Data-Migration Mechanics

When small numbers of rows need to be converted, you will likely find that a simple SQL script that converts the target column(s) is sufficient. When you want to convert large amounts of data, or you have complicated data conversions, consider applying Update Data (page 310).

Access Program Update Mechanics

When Apply Standard Type is applied, the external programs should be updated in the following manner:

1. Change application variables data type. You need to change the program code so that its data types match the data type of the column.
2. Database interaction code. The code that saves, deletes, and retrieves data from this column must be updated to work with the new data type. For example, if Customer.Zip has changed from a character to a numeric data type, you must change your application code from customerGateway.getString("ZIP") to customerGateway.getLong("ZIP").
3. Business logic code. Similarly, you may need to update application code that works with the column. For example, comparison code such as Branch.Phone = 'XXX-XXXX' must be updated to look like Branch.Phone = XXXXXXX.

The following code snippet shows the before and after state of the class that finds the Branch row for a given BranchID when we change the data type of PhoneNumber to a Long from a String:

    // Before code
    stmt = DB.prepare("SELECT BranchId, Name, PhoneNumber, " +
      "FaxNumber FROM branch WHERE BranchId = ?");
    stmt.setLong(1, findBranchId);
    ResultSet rs = stmt.executeQuery();
    if (rs.next()) {
      rs.getLong("BranchId");
      rs.getString("Name");
      rs.getString("PhoneNumber");
      rs.getString("FaxNumber");
    }

    // After code
    stmt = DB.prepare("SELECT BranchId, Name, PhoneNumber, " +
      "FaxNumber FROM branch WHERE branchId = ?");
    stmt.setLong(1, findBranchId);
    ResultSet rs = stmt.executeQuery();
    if (rs.next()) {
      rs.getLong("BranchId");
      rs.getString("Name");
      rs.getLong("PhoneNumber");
      rs.getString("FaxNumber");
    }

Consolidate Key Strategy

Choose a single key strategy for an entity and apply it consistently throughout your database.

Motivation

Many business entities have several potential keys. There are usually one or more keys with business meaning, and fundamentally you can always assign a surrogate key column to a table. For example, your Customer table may have CustomerPOID as the primary key and both CustomerNumber and SocialSecurityNumber as alternate/secondary keys. There are several reasons why you would want to apply Consolidate Key Strategy:

Improve performance. You likely have an index for each key that the database needs to insert, update, and delete in a manner that performs well.

Conform to corporate standards. Many organizations have detailed data and data modeling guidelines that development teams are expected to conform to, guidelines that indicate the preferred keys for entities. Often when applying Use Official Data Source (page 271), you discover that your current data schema does not follow your standards and therefore needs to be refactored to reflect the official data source code values.

Improve code consistency. When you have a variety of keys for a single entity, you have code that implements access to the table in various ways. This increases the maintenance burden for those working with that code because they must understand each approach being used.

Potential Tradeoffs

Consolidating key strategies can be difficult. Not only do you need to update the schema of Policy in Figure 7.4, you also need to update the schema of any tables that include foreign keys to Policy that do not use your chosen key strategy. To do this, you need to apply Replace Column (page 126). You may also find that your existing set of keys does not contain a palatable option for the "one, true key," and therefore you need to apply either Introduce Surrogate Key (page 85) or Introduce Index (page 248).

Figure 7.4. Consolidating the key strategy for the Policy table.

Schema Update Mechanics

To implement this refactoring within your database schema, you must do the following:

1. Identify the proper key. You need to settle on the "official" key column(s) for the entity. Ideally, this should reflect your corporate data standards, if any.
2. Update the source table schema. The simplest approach is to use the current primary key and to stop using the alternate keys. With this strategy, you simply drop the indices, if any, supporting the key. This approach still works if you choose to use one of the alternate keys rather than the current primary key. However, if none of the existing keys work, you may need to apply Introduce Surrogate Key.
3. Deprecate the unwanted keys. Any existing keys that are not to be the primary key, in this case PolicyNumber, should be marked to indicate that they will no longer be used as keys after the transition period. Note that you may want to retain the uniqueness constraint on these columns, even though you are not going to use them as keys anymore.
4. Add a new index. If one does not already exist, a new index based on the official key needs to be introduced for Policy via Introduce Index (page 248).

Figure 7.4 shows how we consolidate the key strategy for Policy to use only PolicyOID for the key. To do this, Policy.PolicyNumber is deprecated to indicate that it will no longer be used as a key as of December 14, 2007, and PolicyNotes.PolicyOID is introduced as a new key column to replace PolicyNotes.PolicyNumber. The following code adds the PolicyNotes.PolicyOID column:

    ALTER TABLE PolicyNotes ADD PolicyOID CHAR(16);

The following code is run after the transition period ends to drop PolicyNotes.PolicyNumber and the index for the alternate key based on Policy.PolicyNumber:

    COMMENT ON TABLE Policy IS
      'Consolidation of keys to use only PolicyOID, dropdate = ';
    DROP INDEX PolicyIndex2;

    COMMENT ON TABLE PolicyNotes IS
      'Consolidation of keys to use only PolicyOID, therefore drop the PolicyNumber column, dropdate = ';
    ALTER TABLE PolicyNotes ADD CONSTRAINT PolicyNotesPolicyOID_PK
      PRIMARY KEY (PolicyOID, NoteNumber);
    ALTER TABLE PolicyNotes DROP COLUMN PolicyNumber;

Data-Migration Mechanics

Tables with foreign keys maintaining relationships to Policy must now implement foreign keys that reflect the chosen key strategy. For example, the PolicyNotes table originally implemented a foreign key based on Policy.PolicyNumber. It must now implement a foreign key based on Policy.PolicyOID. The implication is that you will potentially need to apply Replace Column (page 126) to do so, and this refactoring requires you to copy the data from the source column (the value in Policy.PolicyOID) to the PolicyNotes.PolicyOID column. The following code sets the value of the PolicyNotes.PolicyOID column:

    UPDATE PolicyNotes
      SET PolicyOID = (
        SELECT Policy.PolicyOID
        FROM Policy
        WHERE Policy.PolicyNumber = PolicyNotes.PolicyNumber);

Access Program Update Mechanics

When this refactoring is applied, the primary task is to ensure that existing SQL statements use the official primary key column in WHERE clauses to ensure continued good performance of joins. For example, the before code joined the Policy, PolicyNotes, and PolicyDueDiligence tables by a combination of the PolicyOID and PolicyNumber columns. The after code joins them based solely on the PolicyOID column:

    // Before code
    stmt.prepare(
      "SELECT Policy.Note FROM Policy, PolicyNotes " +
      "WHERE Policy.PolicyNumber = PolicyNotes.PolicyNumber " +
      "AND Policy.PolicyOID=?");
    stmt.setLong(1, policyOIDToFind);
    ResultSet rs = stmt.executeQuery();

    // After code
    stmt.prepare(
      "SELECT Policy.Note FROM Policy, PolicyNotes " +
      "WHERE Policy.PolicyOID = PolicyNotes.PolicyOID " +
      "AND Policy.PolicyOID=?");
    stmt.setLong(1, policyOIDToFind);
    ResultSet rs = stmt.executeQuery();

Drop Column Constraint

Remove a column constraint from an existing table. When you need to remove a referential integrity constraint on a column, you should consider the use of Drop Foreign Key Constraint (page 213).

Motivation

The most common reason why you want to apply Drop Column Constraint is that the constraint is no longer applicable because of changes in business rules. For example, perhaps the Address.Country column is currently constrained to the values of "US" and "Canada", but now you are also doing business internationally and therefore have addresses all over the world. A second reason is that the constraint is applicable to only a subset of the applications that access the column, perhaps because of changes in some applications but not others. For example, some applications may be used in international settings, whereas others are still only North American; because the constraint is no longer applicable to all applications, it must be removed from the common database and implemented in the applications that actually need it. As a side effect, you may see some minor performance improvements because the database no longer needs to do the work of enforcing the constraint. A third reason is that the column has been refactored: perhaps Apply Standard Codes (page 157) or Apply Standard Type (page 162) has been applied to it, and now the column constraint needs to be dropped, refactored, and then reapplied via Introduce Column Constraint (page 180).

Potential Tradeoffs

The primary challenge with this refactoring is that you potentially need to implement the constraint logic in the subset of applications that require it. Because you are implementing the same logic in several places, you are at risk of implementing it in different ways.

Schema Update Mechanics

To update the database schema, you must simply drop the constraint from Balance via the DROP CONSTRAINT clause of the ALTER TABLE SQL command. Figure 7.5 depicts an example where the constraint is that the Account.Balance column be positive; this constraint is being removed to support the ability to allow accounts to be overdrawn. The following code depicts the DDL to drop the constraint on the Account.Balance column:

Figure 7.5. Dropping the column constraint from the Account table.

ALTER TABLE Account DROP CONSTRAINT Positive_Balance;

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

The access programs working with this column will include logic to handle any errors being thrown by the database when the data being written to the column does not conform to the constraint. You need to remove this code from the programs.

Drop Default Value

Remove the default value that is provided by a database from an existing table column.

Motivation

We often use Introduce Default Value (page 186) when we want the database to populate columns that the application does not assign any data to. When the application begins providing the data that is needed, we may no longer want the database to persist the default value, because we want the application to provide the value for the column in question. In this situation, we make use of the Drop Default Value refactoring.

Potential Tradeoffs

There are two potential challenges to consider regarding this refactoring. First, there may be unintended side effects. Some applications may assume that a default value is persisted by the database and therefore exhibit different behavior now that columns in new rows that formerly would have had some value are null. Second, you need to improve the exception handling of external programs, which may be more effort than it is worth. If a column is non-nullable and data is not provided by the application, the database will throw an exception, which the application will not be expecting.

Schema Update Mechanics

To update the schema to remove a default value, you must remove the default value from the column of the table in the database using the MODIFY clause of the ALTER TABLE command. The following code depicts the steps to remove the default value on the Customer.Status column of Figure 7.6. From a database point of view, a default value of null is the same as having no default value:

Figure 7.6. Dropping the default value for the Customer.Status column.

ALTER TABLE Customer MODIFY Status DEFAULT NULL;

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

If some access programs depend on the default value being supplied by the table, you either need to add data validation code to counteract the change to the table or consider backing out of this refactoring. The following code shows how your application code has to provide the value for the column now instead of depending on the database to provide the default value:

    // Before code
    public void createRetailCustomer(long customerId, String firstName) {
      stmt = DB.prepare("INSERT into customer" +
        "(Customerid, FirstName) " +
        "values (?, ?)");
      stmt.setLong(1, customerId);
      stmt.setString(2, firstName);
      stmt.execute();
    }

    // After code
    public void createRetailCustomer(long customerId, String firstName) {
      stmt = DB.prepare("INSERT into customer" +
        "(Customerid, FirstName, Status) " +
        "values (?, ?, ?)");
      stmt.setLong(1, customerId);
      stmt.setString(2, firstName);
      stmt.setString(3, RETAIL);
      stmt.execute();
    }

Drop Non-Nullable

Change an existing non-nullable column such that it accepts null values.

Figure 7.7. Making the CityLookup.StateID column nullable.

Motivation

There are two primary reasons to apply this refactoring. First, your business process has changed so that now parts of the entity are persisted at different times. For example, one application may create the entity without assigning a value to this column right away, and another application may update the row at a later time. Second, during transition periods, you may want a particular column to be nullable. For example, when one of the applications cannot provide a value for the column because of some refactoring that the application is going through, you want to change the non-nullable constraint for a limited amount of time during the transition phase so that the application can continue working. Later on, you will apply Make Column Non-Nullable (page 189) to revert the constraint back to the way it was.

Potential Tradeoffs

Every application that accesses this column must be able to accept a null value, even if it merely ignores it or, more likely, assumes an intelligent default when it discovers the column is null. If there is an intelligent default value, you should consider the refactoring Introduce Default Value (page 186), too.

Schema Update Mechanics

To perform this refactoring you must simply remove the NOT NULL constraint from the column. This is done via the SQL command ALTER TABLE MODIFY COLUMN. The following code shows how to make CityLookup.StateCode nullable:

    -- On 2007-06-14
    ALTER TABLE CityLookup MODIFY StateCode NULL;

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You must refactor all the external programs currently accessing CityLookup.StateCode so that the code can appropriately handle null values. Because null values can be persisted, they can therefore be retrieved, implying that we need to add null value checking code. Furthermore, you want to rework any code that checks for null value exceptions that are no longer being thrown by the database. The following code sample shows how to add null checking logic to application code:

    // Before code
    public StringBuffer getAddressString(Address address) {
      StringBuffer stringAddress = new StringBuffer();
      stringAddress.append(address.getStreetLine1());
      stringAddress.append(address.getStreetLine2());
      stringAddress.append(address.getCity());
      stringAddress.append(address.getPostalCode());
      stringAddress.append(states.getNameFor(address.getStateCode()));
      return stringAddress;
    }

    // After code
    public StringBuffer getAddressString(Address address) {
      StringBuffer stringAddress = new StringBuffer();
      stringAddress.append(address.getStreetLine1());
      stringAddress.append(address.getStreetLine2());
      stringAddress.append(address.getCity());
      stringAddress.append(address.getPostalCode());
      String stateCode = address.getStateCode();
      if (stateCode != null) {
        stringAddress.append(states.getNameFor(stateCode));
      }
      return stringAddress;
    }

Introduce Column Constraint

Introduce a column constraint in an existing table. When you have a need to add a non-nullable constraint on a column, you should consider the use of the Make Column Non-Nullable refactoring (page 189). When you need to add a referential integrity constraint on a column, you should consider the use of Add Foreign Key (page 204).

Motivation

The most common reason why you want to apply Introduce Column Constraint is to ensure that all applications interacting with your database persist valid data in the column. In other words, column constraints are typically used to implement common, and relatively simple, data-validation rules.

Potential Tradeoffs

The primary issue is whether there is truly a common constraint for this data element across all programs that access it. Individual applications may have their own unique version of a constraint for this column. For example, the Customer table could have a FavoriteColor column. Your corporate standard might be the values "Red", "Green", and "Blue", yet for competitive reasons one application may allow the color "Yellow" as a fourth value, whereas two other applications will only allow "Red" and "Blue". You could argue that there should be one consistent standard across all applications, but the reality is that individual applications have good reasons for doing things a certain way. The implication is that because individual applications will implement their own versions of business rules, even in the best of circumstances you will discover that you have existing data within your database that does not meet the constraint conditions. One strategy is to write a script that crawls through your database tables and then reports on any constraint violations, identifying data to be fixed. These processes will be run in batch mode during nonbusy hours for the database and the application.

You may also apply this refactoring because you need to refactor an existing column constraint. Refactorings such as Apply Standard Codes (page 157) and Apply Standard Type (page 162) could force you to change the underlying code value or data type, respectively, of the column, requiring you to reintroduce the constraint.

Schema Update Mechanics

As depicted in Figure 7.8, to update the database schema you must simply introduce a constraint on the column, in this case Customer.CreditLimit. As you can see in the following code, you do this via the ADD CONSTRAINT clause of the ALTER TABLE SQL command. In this case, the credit limit must be less than $50,000 as a fraud-prevention tactic:

Figure 7.8. Introducing a constraint to the Customer.CreditLimit column.

ALTER TABLE Customer ADD CONSTRAINT Check_Credit_Limit CHECK (CreditLimit < 50000.00);

Data-Migration Mechanics

Although not a data migration per se, a significant challenge with this refactoring is to make sure that existing data conforms to the constraint that is being applied on the column. You first need to define the constraint by working with your project stakeholders to identify what needs to be done to the data values that do not conform to the constraint being applied. Then you need to fix the source data. You may need to apply Update Data (page 310), as appropriate, to ensure that the values stored in this column conform to the constraint.
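Before fixing the source data, it can help to see which rows would fail the new constraint. The following is a minimal sketch, not the authors' script: it assumes the CustomerID and CreditLimit values have already been read into a map, and the class and method names (CreditLimitAudit, conforms, findViolations) are hypothetical. It mirrors the CHECK (CreditLimit < 50000.00) rule, including the fact that a SQL CHECK constraint does not reject a null value.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which existing rows would violate the new constraint.
final class CreditLimitAudit {
    // Upper bound taken from the Check_Credit_Limit constraint.
    static final BigDecimal LIMIT = new BigDecimal("50000.00");

    // Mirrors CHECK (CreditLimit < 50000.00); a null value passes a CHECK constraint.
    static boolean conforms(BigDecimal creditLimit) {
        return creditLimit == null || creditLimit.compareTo(LIMIT) < 0;
    }

    // Returns the customer IDs whose credit limit would violate the constraint.
    static List<Long> findViolations(Map<Long, BigDecimal> creditLimitsById) {
        List<Long> violations = new ArrayList<>();
        for (Map.Entry<Long, BigDecimal> row : creditLimitsById.entrySet()) {
            if (!conforms(row.getValue())) {
                violations.add(row.getKey());
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Map<Long, BigDecimal> rows = new LinkedHashMap<>();
        rows.put(1L, new BigDecimal("1200.00"));
        rows.put(2L, new BigDecimal("50000.00")); // not less than the limit
        rows.put(3L, null);                       // null passes a CHECK constraint
        System.out.println(findViolations(rows)); // prints [2]
    }
}
```

The resulting list is the input for the discussion with your stakeholders about how each nonconforming value should be corrected.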

Access Program Update Mechanics

You need to ensure that the access programs can handle any errors being thrown by the database when the data being written to the column does not conform to the constraint. You need to look for every point in the programs where an insert or update is being done to the table, and then add the proper exception handling code, as shown in the following code:

    //Before code
    stmt = conn.prepareStatement(
        "INSERT INTO Customer " +
        "(CustomerID,FirstName,Status,CreditLimit) " +
        "VALUES (?,?,?,?)");
    stmt.setLong(1, customerId);
    stmt.setString(2, firstName);
    stmt.setString(3, status);
    stmt.setBigDecimal(4, creditLimit);
    stmt.executeUpdate();

    //After code
    stmt = conn.prepareStatement(
        "INSERT INTO Customer " +
        "(CustomerID,FirstName,Status,CreditLimit) " +
        "VALUES (?,?,?,?)");
    stmt.setLong(1, customerId);
    stmt.setString(2, firstName);
    stmt.setString(3, status);
    stmt.setBigDecimal(4, creditLimit);
    try {
        stmt.executeUpdate();
    } catch (SQLException exception) {
        // Walk the chain of exceptions reported by the driver
        while (exception != null) {
            int errorCode = exception.getErrorCode();
            // 2290 is Oracle's ORA-02290: check constraint violated
            if (errorCode == 2290) {
                handleCreditLimitExceeded();
            }
            exception = exception.getNextException();
        }
    }

Introduce Common Format

Apply a consistent format to all the data values in an existing table column.

Figure 7.9. Applying a common format to the Address.PhoneNumber column.

Motivation

You typically apply Introduce Common Format to simplify external program code. When the same data is stored in different formats, external programs require unnecessarily complex logic to work with your data. It is generally better to have the data in a uniform format so that applications interfacing with your database do not have to handle multiple formats. For example, when the values in Customer.PhoneNumber are '4163458899', '905-777-8889', and '(990) 345-6666', every application accessing that column must be able to parse all three formats. This problem is even worse when an external program parses the data in several places, often because the program is poorly designed. A common format enables you to reduce the validation code dealing with Customer.PhoneNumber and to reduce the complexity of your display logic; although the data is stored as 1234567890, it might be output in a report as (123) 456-7890.

Another common reason is to conform to your existing corporate standards. Often when applying Use Official Data Source (page 271), you discover that your current data format differs from that of the "official" data source, and therefore your data needs to be reformatted for consistency.

Potential Tradeoffs

Standardizing the format for the data values can be tricky when multiple formats exist within the same column. For example, the Customer.PhoneNumber column could be stored using 15 different formats; you would need code that can detect and then convert the data in each row to the standard format. Luckily, the code to do this conversion should already exist within one or more external programs.

Schema Update Mechanics

To Introduce Common Format to a column, you must first identify the standard format. You need to settle on the "official" format for the data values. The data format may be an existing standard or it may be a new one defined by your business stakeholders, but either way the format must be accepted by your project stakeholder(s). Then, you must identify the column(s) where the format needs to be applied. You apply Introduce Common Format one column at a time; smaller refactorings are always easier to implement and then deploy into production. You will potentially need to apply it several times to ensure consistency across your database.

Data-Migration Mechanics

The first step is to identify the various formats currently being used within the column; often a simple SELECT statement will get you this information, which helps you understand what your data migration code needs to do. The second step is to write the code to convert the existing data into the standard format. You may want to write this code using standard SQL data manipulation language (DML), an application programming language such as Java or C#, or an extract-transform-load (ETL) tool. When small numbers of rows need to be updated, a simple SQL script that updates the target table(s) is sufficient. When you want to update a large amount of data, or in cases where the code in transactional tables is being changed, apply Update Data (page 310) instead. The following code depicts the DML to update data in the Customer table:

    UPDATE Customer SET PhoneNumber = REPLACE(PhoneNumber,'-','');
    UPDATE Customer SET PhoneNumber = REPLACE(PhoneNumber,' ','');
    UPDATE Customer SET PhoneNumber = REPLACE(PhoneNumber,'(','');
    UPDATE Customer SET PhoneNumber = REPLACE(PhoneNumber,')','');
    UPDATE Customer SET PhoneNumber = REPLACE(PhoneNumber,'+1','');
    UPDATE Customer SET PhoneNumber = REPLACE(PhoneNumber,'.','');

As you can see, we are updating one type of format at a time. You can also encapsulate the individual changes in a stored function, as shown here:

    UPDATE Address SET PhoneNumber = FormatPhoneNumber(PhoneNumber);
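If you write the migration in an application language instead, both steps described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the body of FormatPhoneNumber, which the text does not show: the class and method names are hypothetical, the target format is assumed to be ten digits, and a country code of 1 is dropped only when it leads an 11-digit value (the REPLACE statement above removes '+1' wherever it occurs).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for profiling and normalizing phone number values.
final class PhoneNumberFormats {
    // Step 1: replace every digit with 'N' so values sharing a layout collapse to one pattern.
    static String pattern(String phoneNumber) {
        return phoneNumber.replaceAll("[0-9]", "N");
    }

    // Counts how many values use each pattern, to show which formats the migration must handle.
    static Map<String, Integer> profile(List<String> phoneNumbers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String phoneNumber : phoneNumbers) {
            counts.merge(pattern(phoneNumber), 1, Integer::sum);
        }
        return counts;
    }

    // Step 2: keep only the digits, then drop a leading country code 1 from 11-digit values.
    static String normalize(String phoneNumber) {
        String digits = phoneNumber.replaceAll("[^0-9]", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(pattern("905-777-8889"));     // prints NNN-NNN-NNNN
        System.out.println(normalize("(990) 345-6666")); // prints 9903456666
    }
}
```

Profiling first tells you whether a simple rule like this covers every row, or whether some patterns (extensions, international numbers) need a decision from your stakeholders before you convert them.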

Access Program Update Mechanics

When Introduce Common Format is applied, the following aspects of the external programs must be examined:

1. Cleansing code. Your external programs will contain logic that accepts the various formats and converts them to the format that they want to work with. You may even be able to use some of this existing code as the basis for your data migration code above.

2. Validation code. You may need to update source code that validates the values of data attributes. For example, code that looks for formats such as PhoneNumber = 'NNN-NNN-NNNN' must be updated to use the new data format.

3. Display code. Your user interface code will often include logic to display data elements in a specific format, often one that is not being used for storage (for example, NNNNNNNNNN for storage, (NNN) NNN-NNNN for display). This includes display logic for both your screens and reports.

4. Test data. Your test data, or test data generation code, must now be changed to conform to the new standard data format. You may also want to add new test cases to test for data rows in inappropriate formats.
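Once the column holds a single format, the validation and display code in the external programs can shrink considerably. The sketch below assumes the storage format is exactly ten digits and the display format is the (123) 456-7890 style mentioned in the Motivation; the class and method names are hypothetical.

```java
// Hypothetical helper for validating and displaying phone numbers stored in one common format.
final class PhoneNumberDisplay {
    // Validation: true when the value is in the storage format NNNNNNNNNN (exactly ten digits).
    static boolean isStorageFormat(String value) {
        return value != null && value.matches("[0-9]{10}");
    }

    // Display: renders a stored value as (NNN) NNN-NNNN for screens and reports.
    static String forDisplay(String stored) {
        if (!isStorageFormat(stored)) {
            throw new IllegalArgumentException("Not in storage format: " + stored);
        }
        return "(" + stored.substring(0, 3) + ") "
                + stored.substring(3, 6) + "-"
                + stored.substring(6);
    }

    public static void main(String[] args) {
        System.out.println(forDisplay("1234567890")); // prints (123) 456-7890
    }
}
```

The isStorageFormat check is also a natural basis for the new test cases that look for rows in inappropriate formats.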

Introduce Default Value

Let the database provide a default value for an existing table column.

Motivation

We often want a column to be populated with a default value when a new row is added to a table. However, the insert statements may not always populate that column, often because the column has been added after the original insert was written or simply because the application code that is submitting the insert does not require that column. Generally, we have found that Introduce Default Value is useful when we want to make the column non-nullable later on (see the database refactoring Make Column Non-Nullable on page 189).

Potential Tradeoffs

There are several potential challenges to consider regarding this refactoring:

Identifying a true default can be difficult. When many applications share the same database, they may have different default values for the same column, often for good reasons. Or it may simply be that your business stakeholders cannot agree on a single value; you need to work closely with them to negotiate the correct value.

Unintended side effects. Some applications may assume that a null value within a column actually means something and will therefore exhibit different behavior now that columns in new rows that formerly would have been null now are not.

Confused context. When a column is not used by an application, the default value may introduce confusion over the column's usage with the application team.

Schema Update Mechanics

This is a single-step refactoring. You merely use the SQL ALTER TABLE command to define the default value for the column. An effective date for this refactoring is optionally indicated to tell people when this default value was first introduced into the schema. The following code shows how you would introduce a default value on the Customer.Status column, as depicted in Figure 7.10:

Figure 7.10. Introducing a default value on the Customer.Status column.

    ALTER TABLE Customer MODIFY Status DEFAULT 'NEW';

    COMMENT ON COLUMN Customer.Status IS 'Default value of NEW will be inserted when there is no data present on the insert as of June 14 2006';

Data-Migration Mechanics

The existing rows may already have null values in this column, rows that will not be automatically updated as a result of adding a default value. Furthermore, there may be invalid values in some rows, too. You need to examine the values contained in the column (simply looking at a unique list of values may suffice) to determine whether you need to do any updates. If appropriate, you need to write a script that runs through the table to introduce the default value to these rows.

Access Program Update Mechanics

On the surface, it seems unlikely that access programs would be affected by the introduction of a default value; however, appearances can be deceiving. Potential issues you may run across include the following:

1. Invariants are broken by the new value. For example, a class may assume that the value of a color column is red, green, or blue, but the default value has now been defined as yellow.

2. Code exists to apply default values. There may now be extraneous source code that checks for a null value and introduces the default value programmatically. This code should be removed.

3. Existing source code assumes a different default value. For example, existing code may look for the default value of none, which was set programmatically in the past, and if found it gives users the option to change the color. Now the default value is yellow, so this code will never be invoked.

You need to analyze the access programs thoroughly, and then update them appropriately, before introducing a default value to a column.

Make Column Non-Nullable

Change an existing column such that it does not accept any null values.

Motivation

There are two reasons to apply Make Column Non-Nullable. First, you want to enforce business rules at the database level such that every application updating this column is forced to provide a value for it. Second, you want to remove repetitious logic within applications that implement a not-null check; if you are not allowed to insert a null value, you will never receive one.

Potential Tradeoffs

Any external program that updates rows within the given table must provide a value for the column; some programs may currently assume that the column is nullable and therefore not provide such a value. Whenever an update or insertion occurs, you must ensure that a value is provided, implying that the external programs need to be updated and/or the database itself must provide a valid value. One technique we have found useful is to assign a default value using Introduce Default Value (page 186) for this column.

Schema Update Mechanics

As depicted in Figure 7.11, to perform Make Column Non-Nullable, you must simply add a NOT NULL constraint to the column. This is done via the SQL command ALTER TABLE, as shown in the following code:

Figure 7.11. Making the Customer.FirstName column non-nullable.

ALTER TABLE Customer MODIFY FirstName NOT NULL;

Depict Models Simply

Figure 7.11 shows an interesting style issue: it does not indicate in the original schema that FirstName is nullable. Because it is common to assume that columns that are not part of the primary key are nullable, it would merely clutter the diagram to include the stereotype of <<nullable>> on most columns. Your diagrams are more readable when you depict them simply.

Data-Migration Mechanics

You may need to clean the existing data because you cannot make a column non-nullable if there are existing rows with a null value in the column. To address this problem, write a script that replaces the null in each affected row with an appropriate value. The following code shows how to clean the existing data to support the change depicted in Figure 7.11. The initial step is to make sure that the FirstName column does not contain any null values; if it does, we have to update the data in the table. The query counts rows with COUNT(*) because COUNT(FirstName) skips null values and would always report zero here:

    SELECT COUNT(*) FROM Customer WHERE FirstName IS NULL;

If we do find rows where Customer.FirstName is null, we have to go ahead and apply some algorithm and make sure that Customer.FirstName does not contain any null values. In this example, we set Customer.FirstName to '???' to indicate that we need to update this record. This strategy was chosen carefully by our project stakeholders because the data being changed is critical to the business:

    UPDATE Customer SET FirstName='???' WHERE FirstName IS NULL;

Access Program Update Mechanics

You must refactor all the external programs currently accessing Customer.FirstName to provide an appropriate value whenever they modify a row within the table. Furthermore, they must also detect and then handle any new exceptions that are thrown by the database. If you are unsure about all the places where this column is used and do not have a way to change all those instances, you can apply the database refactoring Introduce Default Value (page 186) so that the database provides a default value when no value is provided by the application. This may be an interim strategy for you: after sufficient time has passed and you believe that the access programs have been updated, or at least until you are willing to quickly deal with the handful of access programs that were not updated, you can apply Drop Default Value (page 174) and thereby improve performance.

The following code shows the before and after state when the Make Column Non-Nullable refactoring is applied. In the example, we have decided to throw an exception when null values are found:

    //Before code
    stmt = conn.prepareStatement(
        "INSERT INTO Customer " +
        "(CustomerID,FirstName,Surname) " +
        "VALUES (?,?,?)");
    stmt.setLong(1, customerId);
    stmt.setString(2, firstName);
    stmt.setString(3, surname);
    stmt.executeUpdate();

    //After code
    // Reject a missing first name before the database does
    if (firstName == null) {
        throw new CustomerFirstNameCannotBeNullException();
    }
    stmt = conn.prepareStatement(
        "INSERT INTO Customer " +
        "(CustomerID,FirstName,Surname) " +
        "VALUES (?,?,?)");
    stmt.setLong(1, customerId);
    stmt.setString(2, firstName);
    stmt.setString(3, surname);
    stmt.executeUpdate();

Move Data

Move the data contained within a table, either all or a subset of its columns, to another existing table.

Motivation

You typically want to apply Move Data as the result of structural changes with your table design; but as always, application of this refactoring depends on the situation. You may want to apply Move Data as the result of the following:

Column renaming. When you rename a column, the process is first to introduce a new column with the new name, move the original data to the new column, and then remove the original column using Drop Column (page 72).

Vertically splitting a table. When you apply Split Table (page 145) to vertically reorganize an existing table, you need to move data from the source table into the new tables that have been split off from it.

Horizontally splitting a table. Sometimes a horizontal slice of data is moved from a table into another table with the same structure because the original table has grown so large that performance has degraded. For example, you might move all the rows in your Customer table that represent European customers into a EuropeanCustomer table. This horizontal split might be the first step toward building an inheritance hierarchy (for example, EuropeanCustomer inherits from Customer) within your database, and/or you might intend to add specific columns for Europeans.

Splitting a column. With Split Column (page 140), you need to move data to the new column(s) from the source.

Merging a table or column. Applying Merge Tables (page 96) or Merge Columns (page 92) requires you to move data from the source(s) into the target(s).

Consolidation of data without involving structural changes. Data is often stored in several locations. For example, you may have several customer-oriented tables, one to store Canadian customer information and several for customers who live in various parts of the United States, all of which have the same basic structure. As the result of a corporate reorganization, the data for the provinces of British Columbia and Alberta are moved from the CanadianCustomer table to the WestCoastCustomer table to reflect the data ownership within your organization.

Move data before removing a table or column. Applying Drop Table (page 77) or Drop Column (page 72) may require you to apply Move Data first if some or all of the data stored in the table or column is still required.

Potential Tradeoffs

Moving data between columns can be tricky at any time, but it is particularly challenging when millions of rows are involved. While the move occurs, the applications accessing the data may be impacted. You will likely want to lock the data during the move, which affects the performance and availability of the data because the applications cannot access it while the move is underway.

Schema Update Mechanics

To perform Move Data, you must first identify the data to move and any dependencies on it. Should the corresponding row in the other table be deleted, should the column value in the corresponding row be nulled or zeroed out, or should the corresponding value be left alone? Is the row being moved, or are we just moving some column(s)?

Second, you must identify the data destination. Where is the data being moved to? To another table or tables? Is the data being transformed while it is being moved? When you move rows, you must make sure that all the dependent tables with foreign key references to the source table now reference the destination table.

Data-Migration Mechanics

When small amounts of data need to be moved, you will likely find that a simple SQL script that inserts the source data into the target location, and then deletes the source, is sufficient. For large amounts of data, you must take a more sophisticated approach because of the time it will take to move the data. You may need to export the source data and then import it into the target location. You can also use database utilities such as Oracle's SQLLDR or a bulk loader.

Figure 7.12 depicts how we moved the data within the Customer.Status column to the CustomerStatusHistory table to enable us to track the status of a customer over a period of time. The following code depicts the DML to move the data from the Customer.Status column to CustomerStatusHistory. First, we insert the data into CustomerStatusHistory from the Customer table, and later we update the Customer.Status column to NULL:

Figure 7.12. Moving status data from Customer to CustomerStatusHistory.

INSERT INTO CustomerStatusHistory (CustomerId,Status,EffectiveDate) SELECT CustomerId,Status,Sysdate FROM Customer;

UPDATE Customer SET Status = NULL;

Access Program Update Mechanics

When Move Data is applied as the result of another database refactoring, the applications should have been updated to reflect that refactoring. However, when Move Data is applied to support a reorganization of data within an existing schema, it is possible that external application code will need to be updated. For example, SELECT clauses may need to be reworked to access the moved data from the new source(s) for it. The following code snippet shows how the getCustomerStatus() method changes:

    //Before code
    public String getCustomerStatus(Customer customer) {
        return customer.getStatus();
    }

    //After code
    public String getCustomerStatus(Customer customer) throws SQLException {
        stmt = conn.prepareStatement(
            "SELECT Status " +
            "FROM CustomerStatusHistory " +
            "WHERE " +
            "CustomerId = ? " +
            "AND EffectiveDate < TRUNC(SYSDATE) " +
            "ORDER BY EffectiveDate DESC");
        stmt.setLong(1, customer.getCustomerId());
        ResultSet rs = stmt.executeQuery();
        // The first row is the most recent status
        if (rs.next()) {
            return rs.getString("Status");
        }
        throw new CustomerStatusNotFoundInHistoryException();
    }

Replace Type Code With Property Flags

Replace a code column with individual property flags, usually implemented as Boolean columns, within the same table. Note that some database products support native Boolean columns. In other database products, you can use alternative data types for Boolean values (for example, NUMBER(1) in Oracle, where a value 1 means TRUE and 0 means FALSE).

Motivation

The refactoring Replace Type Code With Property Flags is used to do the following:

Denormalize for performance. You can improve performance by having a column for each instance of the type code. For example, the Address.Type column, which has values of Home and Business, would be replaced by isHome and isBusiness property flag columns. This enables a more efficient search because it is faster to compare two Boolean values than it is to compare two string values.

Simplify selections. Type code columns work well when the types are mutually exclusive; when they are not, however, they become problematic to search on. For example, assume that it is possible for an address to be a home, a business, or both. With the type code approach, you would need the Address.Type column values of Home, Business, and HomeAndBusiness, for example. To obtain a list of all business addresses, you would need to run a query along the lines of SELECT * FROM Address WHERE Type = 'Business' OR Type = 'HomeAndBusiness'. As you can imagine, this query would need to be updated if a new kind of address type, such as a vacation home address, was added that could also be a potential business. With the property flag column approach, the query would look like SELECT * FROM Address WHERE isBusiness = TRUE. This query is simple and would not need to be changed if a new address type was added.

Decouple applications from type code values. When you have multiple applications using Account.AccountType of Figure 7.13, making changes to the values becomes difficult because most applications will have hard coded the values. When these type codes are replaced with property flag columns, the applications will only have to check for the standard TRUE or FALSE values. With a type code column, the applications are coupled to the name of the column and the values within the column; with property flag columns, the applications are merely coupled to the names of the columns.

Figure 7.13. Replacing the AccountType code column with property flags.

Potential Tradeoffs

Every time you want to add a new type of value, you must change the table structure. For example, when you want to add a money market account type, you must add the isMoneyMarket column to the Account table. This will not be desirable after a while because tables with large numbers of columns are more difficult to understand than tables with smaller numbers of columns. The result of joins with this table increases in size each time you add a new type column. However, it is very easy to add a column independent of the rest of the columns. With the type code column approach, the column is coupled to all the applications accessing it.

Schema Update Mechanics

To apply Replace Type Code With Property Flags to your database table, you must do the following:

1. Identify the type code to replace. Find the type code column that you need to replace, which is Account.AccountType in our example. You should also find out the instances of the type code to replace; for example, when you want to replace the AddressType code, you need to find all the different types of addresses, and replace all of the instances of AddressType.

2. Introduce property flag columns. After you have identified the instances of the type codes that you want to replace, you have to add that many columns to the table using Introduce New Column (page 301). In Figure 7.13, we are going to add the HasChecks, HasFinancialAdviceRights, HasPassbook, and isRetailCustomer columns to Account.

3. Remove the type code column. After all the type codes have been converted to property flags, you have to remove the type code column using Drop Column (page 72).

Figure 7.13 shows how we change the Account.AccountType column to use type flag columns for every instance of the AccountType data. In the example, we have four different types of accounts that will be converted to type flag columns, named HasChecks, HasFinancialAdviceRights, HasPassbook, and isRetailCustomer. The following SQL depicts how to add the four type flag columns to the Account table, the synchronization trigger during the transition period, the code to drop the original column, and the code to drop the trigger after the transition period ends:

    ALTER TABLE Account ADD COLUMN HasChecks Boolean;
    COMMENT ON COLUMN Account.HasChecks IS 'Replaces AccountType of Checks';

    ALTER TABLE Account ADD COLUMN HasFinancialAdviceRights Boolean;
    COMMENT ON COLUMN Account.HasFinancialAdviceRights IS 'Replaces AccountType of FinancialAdviceRights';

    ALTER TABLE Account ADD COLUMN HasPassbook Boolean;
    COMMENT ON COLUMN Account.HasPassbook IS 'Replaces AccountType of Passbook';

    ALTER TABLE Account ADD COLUMN isRetailCustomer Boolean;
    COMMENT ON COLUMN Account.isRetailCustomer IS 'Replaces AccountType of RetailCustomer';

    CREATE OR REPLACE TRIGGER SynchronizeAccountTypeColumns
    BEFORE INSERT OR UPDATE
    ON Account
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF (:NEW.HasChecks IS NULL AND :NEW.AccountType = 'CHECKS') THEN
        :NEW.HasChecks := TRUE;
      END IF;
      IF (:NEW.HasFinancialAdviceRights IS NULL AND :NEW.AccountType = 'FINANCIALADVICERIGHTS') THEN
        :NEW.HasFinancialAdviceRights := TRUE;
      END IF;
      IF (:NEW.HasPassbook IS NULL AND :NEW.AccountType = 'PASSBOOK') THEN
        :NEW.HasPassbook := TRUE;
      END IF;
      IF (:NEW.isRetailCustomer IS NULL AND :NEW.AccountType = 'RETAILCUSTOMER') THEN
        :NEW.isRetailCustomer := TRUE;
      END IF;
    END;
    /

    -- On June 14 2007
    ALTER TABLE Account DROP COLUMN AccountType;

    DROP TRIGGER SynchronizeAccountTypeColumns;

Data-Migration Mechanics

You have to write data-migration scripts to update the property flags based on the value of the type code; you can use Update Data (page 310). During the transition phase, when the type code column and the type flag columns both exist, we have to keep the columns synchronized so that the applications using the data get consistent information. This can be achieved by using database triggers. The following SQL syntax shows how to update the existing data in the Account table (see Update Data for details). The new flag columns start out null, and a comparison with null never evaluates to true, so the rows that remain after each TRUE update are matched with IS NULL:

    UPDATE Account SET HasChecks = TRUE
      WHERE AccountType = 'CHECKS';
    UPDATE Account SET HasChecks = FALSE
      WHERE HasChecks IS NULL;

    UPDATE Account SET HasFinancialAdviceRights = TRUE
      WHERE AccountType = 'FINANCIALADVICERIGHTS';
    UPDATE Account SET HasFinancialAdviceRights = FALSE
      WHERE HasFinancialAdviceRights IS NULL;

    UPDATE Account SET HasPassbook = TRUE
      WHERE AccountType = 'PASSBOOK';
    UPDATE Account SET HasPassbook = FALSE
      WHERE HasPassbook IS NULL;

    UPDATE Account SET isRetailCustomer = TRUE
      WHERE AccountType = 'RETAILCUSTOMER';
    UPDATE Account SET isRetailCustomer = FALSE
      WHERE isRetailCustomer IS NULL;
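If the migration is driven from application code, or you need the same rule when generating test data, the mapping from a type code to its flags can be written once. The following is a minimal sketch; the AccountFlags class and its fromTypeCode method are hypothetical names, and the type code values are the ones used in the statements above.

```java
// Hypothetical value object: derives the four property flags from an AccountType code.
final class AccountFlags {
    final boolean hasChecks;
    final boolean hasFinancialAdviceRights;
    final boolean hasPassbook;
    final boolean isRetailCustomer;

    private AccountFlags(String accountType) {
        // Each flag is true only for its own type code and false otherwise, never null.
        // Calling equals on the literal keeps a null accountType from throwing.
        hasChecks = "CHECKS".equals(accountType);
        hasFinancialAdviceRights = "FINANCIALADVICERIGHTS".equals(accountType);
        hasPassbook = "PASSBOOK".equals(accountType);
        isRetailCustomer = "RETAILCUSTOMER".equals(accountType);
    }

    static AccountFlags fromTypeCode(String accountType) {
        return new AccountFlags(accountType);
    }

    public static void main(String[] args) {
        AccountFlags flags = fromTypeCode("PASSBOOK");
        System.out.println(flags.hasPassbook + " " + flags.hasChecks); // prints true false
    }
}
```

Using primitive boolean fields makes the "no null flags" outcome of the migration explicit: a row with an unknown or missing type code simply gets FALSE in every flag.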

Access Program Update Mechanics

When Replace Type Code With Property Flags is applied, you need to update external programs in two ways. First, the SQL code (or meta data) that saves, deletes, and retrieves data from this column must be updated to work with the individual type flag columns and not AccountType. For example, when we have SELECT * FROM Account WHERE AccountType = 'XXXX', this SQL will change to SELECT * FROM Account WHERE isXXXX = TRUE. Similarly, you will have to change the program code to update the type flag columns rather than AccountType during an insert or update operation.

Second, you may need to update application code that works with the column. For example, comparison code such as Customer.AddressType = 'Home' must be updated to work with isHome. The following code depicts how the Account class changes when you replace the account type code with property flags:

    //Before code
    public class Account {
        private Long accountID;
        private BigDecimal balance;
        private String accountType;
        private Boolean FALSE = Boolean.FALSE;
        private Boolean TRUE = Boolean.TRUE;

        public Long getAccountID() {
            return accountID;
        }

        public BigDecimal getBalance() {
            return balance;
        }

        public Boolean HasChecks() {
            return accountType.equals("CHECKS");
        }

        public Boolean HasFinancialAdviceRights() {
            return accountType.equals("FINANCIALADVICERIGHTS");
        }

        public Boolean HasPassBook() {
            return accountType.equals("PASSBOOK");
        }

        public Boolean IsRetailCustomer() {
            return accountType.equals("RETAILCUSTOMER");
        }
    }

    //After code
    public class Account {
        private Long accountID;
        private BigDecimal balance;
        private Boolean HasChecks;
        private Boolean HasFinancialAdviceRights;
        private Boolean HasPassBook;
        private Boolean IsRetailCustomer;

        public Long getAccountID() {
            return accountID;
        }

        public BigDecimal getBalance() {
            return balance;
        }

        public Boolean HasChecks() {
            return HasChecks;
        }

        public Boolean HasFinancialAdviceRights() {
            return HasFinancialAdviceRights;
        }

        public Boolean HasPassBook() {
            return HasPassBook;
        }

        public Boolean IsRetailCustomer() {
            return IsRetailCustomer;
        }
    }

Chapter 8. Referential Integrity Refactorings

Referential integrity refactorings are changes that ensure that a referenced row exists within another table and/or that a row that is no longer needed is removed appropriately. The referential integrity refactorings are as follows:

    Add Foreign Key Constraint
    Add Trigger For Calculated Column
    Drop Foreign Key Constraint
    Introduce Cascading Delete
    Introduce Hard Delete
    Introduce Soft Delete
    Introduce Trigger For History

Add Foreign Key Constraint

Add a foreign key constraint to an existing table to enforce a relationship to another table.

Motivation

The primary reason to apply Add Foreign Key Constraint is to enforce data dependencies at the database level, ensuring that the database enforces some referential integrity (RI) business rules, preventing invalid data from being persisted. This is particularly important when several applications access the same database because you cannot count on them enforcing data-integrity rules consistently.

Potential Tradeoffs

Foreign key constraints reduce performance within your database because the existence of the row in the foreign table will be verified whenever the source row is updated. Furthermore, when adding a foreign key constraint to the database, you must be careful about the order of inserts, updates, and deletions. For example, in Figure 8.1, you cannot add a row in Account without the corresponding row in AccountStatus. The implication is that your application, or your persistence layer as the case may be, must be aware of the table dependencies in the database. Luckily, many databases allow commit-time enforcement of database constraints, enabling you to insert, update, or delete rows in any order as long as data integrity is maintained at commit time. This type of feature makes development easier and provides a higher incentive to use foreign key constraints.

Figure 8.1. Adding a foreign key constraint to the Account.AccountStatus column.

Schema Update Mechanics

As depicted in Figure 8.1, to update the schema to add a foreign key constraint, you must do the following:

1. Choose a constraint checking strategy. Depending on your database product, it will support one or two ways to enforce foreign keys. First, with the immediate checking method, the foreign key constraint is checked when the data is inserted, updated, or deleted from the table. The immediate checking method is better at failing faster and will force you to consider the order of database changes (inserts, updates, and deletes). Second, with the deferred checking method, the foreign key constraint is checked when the transaction is committed from the application. This method allows you the flexibility to not worry about the order of database changes because the constraints are checked at commit time. This approach also enables you to cache all the dirty objects and write them in a batch; you just have to make sure that at commit time the database is in a clean state. In either case, the database returns an exception when the first foreign key constraint fails. (There could be several.)

2. Create the foreign key constraint. Create the foreign key constraint in the database via the ADD CONSTRAINT clause of the ALTER TABLE statement. The database constraint should be named according to your database naming conventions for clarity and effective error reporting by the database. If you are using the commit-time checking of constraints, there may be performance degradation because the database will be checking the integrity of the data at commit time, a significant problem with tables with millions of rows.

3. Introduce an index for the primary key of the foreign table (optional). Databases use select statements on the referenced tables to verify whether the data being entered in the child table is valid. If the AccountStatus.StatusCode column does not have an index, you may experience significant performance degradation and need to consider applying the Introduce Index (page 248) database refactoring. When you create an index, you will increase the performance of constraint checking, but you will be decreasing the performance of update, insert, and delete on AccountStatus because the database now has to maintain the added index.

The following code depicts the steps to add a foreign key constraint to the table. In this example, we are creating the constraint such that the foreign key constraint is checked immediately upon data modification:

    ALTER TABLE Account
      ADD CONSTRAINT FK_Account_AccountStatus
      FOREIGN KEY (StatusCode)
      REFERENCES AccountStatus;

In this example, we are creating the foreign key constraint such that the foreign key is checked at commit time:

    ALTER TABLE Account
      ADD CONSTRAINT FK_Account_AccountStatus
      FOREIGN KEY (StatusCode)
      REFERENCES AccountStatus
      INITIALLY DEFERRED;

Data-Migration Mechanics

To support the addition of a foreign key constraint to a table, you may discover that you need to update the existing data within the database first. This is a multi-step process:

1. Ensure the referenced data exists. First, we need to ensure that the rows being referred to exist in AccountStatus. You need to analyze the existing data in both Account and AccountStatus to determine whether there are missing rows in AccountStatus. The easiest way to do this is to compare the number of rows in Account to the number of rows resulting from the join of Account and AccountStatus.

2. Ensure that the foreign table contains all required rows. If the counts are different, either you are missing rows in AccountStatus and/or there are incorrect values in Account.StatusCode. First, create a list of unique values of Account.StatusCode and compare it with the list of values from AccountStatus.StatusCode. If the first list contains values that are valid but do not appear in the second list, AccountStatus needs to be updated. Second, there may be values that appear in neither list but are still valid within your business domain. To identify these values, you need to work closely with your project stakeholders; better yet, just wait until they actually need the data rows and then add them at that time.

3. Ensure that the source table's foreign key column contains valid values. Update the lists from the previous step. Any differences in the lists must now be the result of invalid or missing values in the Account.StatusCode column. You need to update these rows appropriately, either with an automated script that sets a default value or by manually updating them.

4. Introduce a default value for the foreign key column. You may optionally need to make the database insert default values when the external programs do not provide a value for Account.StatusCode. See the database refactoring Introduce Default Value (page 186).
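The list comparison in steps 2 and 3 can be sketched in application code. The following Java fragment is only an illustration: the class name StatusCodeAudit and the method unmatchedCodes are hypothetical, and it assumes the two lists of distinct codes have already been read from Account and AccountStatus (rows with a null code are assumed to be handled separately).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any library or of the original text.
final class StatusCodeAudit {
    private StatusCodeAudit() {}

    /**
     * Returns the codes used by Account rows that have no matching
     * AccountStatus row. Each returned code is either a valid status that is
     * missing from AccountStatus or an invalid value in Account.StatusCode;
     * deciding which is a business question.
     */
    static Set<String> unmatchedCodes(Set<String> accountCodes, Set<String> statusCodes) {
        Set<String> unmatched = new TreeSet<>(accountCodes); // sorted for stable reports
        unmatched.removeAll(statusCodes);
        return unmatched;
    }

    public static void main(String[] args) {
        // Sample data only; in practice these come from SELECT DISTINCT queries.
        Set<String> accountCodes = Set.of("ACTIVE", "DORMANT", "CLOSD");
        Set<String> statusCodes = Set.of("ACTIVE", "DORMANT", "CLOSED");
        System.out.println(unmatchedCodes(accountCodes, statusCodes));
    }
}
```

An empty result suggests that the constraint can be added without first changing any non-null values.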

For the example of Figure 8.1, you must ensure that the data is clean before the foreign key constraint is added; if it is not, you must update the data. Let's assume that we have some Account rows where the status is not set or is not part of the AccountStatus table. In this situation, you must update the Account.Status column to some known value that exists in the AccountStatus table:

    UPDATE Account SET Status = 'DORMANT'
      WHERE Status NOT IN (SELECT StatusCode FROM AccountStatus)
      AND Status IS NOT NULL;

In other cases, you may have the Account.Status column containing a null value. If so, update the Account.Status column with a known value, as shown here:

    UPDATE Account SET Status = 'NEW'
      WHERE Status IS NULL;

Access Program Update Mechanics

You must identify and then update any external programs that modify data in the table where the foreign key constraint was added. Issues to look for include the following:

1. Similar RI code. Some external programs will implement the RI business rule that will now be handled via the foreign key constraint within the database. This code should be removed.

2. Different RI code. Some external programs will include code that enforces different RI business rules than what you are about to implement. The implication is that you either need to reconsider adding this foreign key constraint, because there is no consensus within your organization regarding the business rule that it implements, or you need to rework the code to work based on this new version (from its point of view) of the business rule.

3. Nonexistent RI code. Some external programs will not even be aware of the RI business rule pertaining to these data tables.

All external programs must be updated to handle any exceptions thrown by the database as the result of the new foreign key constraint. The following code shows how the application code needs to change to handle exceptions thrown by the database:

    // Before code
    stmt = conn.prepare(
      "INSERT INTO Account(AccountID,StatusCode,Balance) "+
      "VALUES (?,?,?)");
    stmt.setLong(1,accountId);
    stmt.setString(2,statusCode);
    stmt.setBigDecimal(3,balance);
    stmt.executeUpdate();

    //After code
    stmt = conn.prepare(
      "INSERT INTO Account(AccountID,StatusCode,Balance) "+
      "VALUES (?,?,?)");
    stmt.setLong(1,accountId);
    stmt.setString(2,statusCode);
    stmt.setBigDecimal(3,balance);
    try {
      stmt.executeUpdate();
    } catch (SQLException exception){
      int errorCode = exception.getErrorCode();
      // 2291 is Oracle's ORA-02291: parent key not found
      if (errorCode == 2291) {
        handleParentRecordNotFoundError();
      }
      // 2292 is Oracle's ORA-02292: child record found
      if (errorCode == 2292) {
        handleParentDeletedWhenChildFoundError();
      }
    }

Add Trigger For Calculated Column

Introduce a new trigger to update the value contained in a calculated column. The calculated column may have been previously introduced by the Introduce Calculated Column refactoring (page 81).

Motivation

The primary reason you would apply Add Trigger For Calculated Column is to ensure that the value contained in the column is updated properly whenever the source data changes. This should be done by the database so that all the applications are not required to do so.

Potential Tradeoffs

When a calculated column is based on complex algorithms, or simply on data located in several places, your trigger will implement a lot of business logic. This may lead to inconsistency with similar business logic implemented within your applications.

The source data used by the trigger might be updated within the scope of a transaction. If the trigger fails, the transaction fails, too, causing it to be rolled back. This will likely be perceived as an undesirable side effect by external programs.

When a calculated column is based on data from the same table, it may not be possible to use a trigger to do the update because many database products do not allow this.

Schema Update Mechanics

Applying Add Trigger For Calculated Column can be complicated because of data dependencies of the calculated column. You will need to do the following:

1. Determine whether triggers can be used to update the calculated column. In Figure 8.2, the TotalPortfolioValue column is calculated. You know that because of the forward slash (/) in front of the name, a UML convention. When TotalPortfolioValue and the source data are in the same table, you likely cannot use triggers to update the data value.

Figure 8.2. Adding a trigger to calculate Customer.TotalPortfolioValue.

2. Identify the source data. You have to identify the source data, and how it should be used, to determine the value of the calculated column.

3. Identify the table to contain the column. You have to identify which table should include the calculated column if it does not already exist. To do this, ask yourself which business entity the calculated column describes best. For example, a customer's credit risk indicator is most applicable to the Customer entity.

4. Add the column. If the column does not exist, add it via the ALTER TABLE ADD COLUMN command. Use Introduce New Column (page 301).

5. Add the trigger(s). You need to add a trigger to each table that contains source data pertinent to calculating the value. In this case, the source data for TotalPortfolioValue exists in the Account and InsurancePolicy tables. Therefore, we need a trigger for each table, UpdateCustomerTotalPortfolioValue and UpdateTotalPortfolioValue, respectively.

The following code shows you how to add the two triggers:

    -- Update the TotalPortfolioValue with the trigger.
    CREATE OR REPLACE TRIGGER UpdateCustomerTotalPortfolioValue
    AFTER UPDATE OR INSERT OR DELETE
    ON Account
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      UpdateCustomerWithPortfolioValue;
    END;
    /

    -- The trigger on InsurancePolicy needs its own name; reusing the first
    -- name with CREATE OR REPLACE would replace the trigger on Account.
    CREATE OR REPLACE TRIGGER UpdateTotalPortfolioValue
    AFTER UPDATE OR INSERT OR DELETE
    ON InsurancePolicy
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      UpdateCustomerWithPortfolioValue;
    END;
    /

Data-Migration Mechanics

There is no data to be migrated per se, although the value of Customer.TotalPortfolioValue must be populated based on the calculation. This is typically done once in batch via one or more scripts. In our example, we have to update Customer.TotalPortfolioValue for all the existing rows in the Customer table with the sum of Account.Balance and Policy.Value for each customer:

    -- InsurancePolicy is aliased as Policy so that the Policy.* references resolve.
    UPDATE Customer SET TotalPortfolioValue =
      (SELECT SUM(Account.Balance) + SUM(Policy.Balance)
       FROM Account, CustomerInsurancePolicy, InsurancePolicy Policy
       WHERE Account.AccountID = CustomerInsurancePolicy.AccountID
       AND CustomerInsurancePolicy.PolicyID = Policy.PolicyID
       AND Account.CustomerID = Customer.CustomerID
      );
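Because the same calculation often lives in application code as well, it can help to state it once in a form that is easy to test. The following Java sketch is an assumption about what that calculation looks like, based only on the description above (account balances plus policy values for one customer); the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical reference implementation of the portfolio calculation.
final class PortfolioValue {
    private PortfolioValue() {}

    /**
     * Adds up one customer's account balances and policy values.
     * The two lists are summed independently, so every account and every
     * policy is counted exactly once.
     */
    static BigDecimal total(List<BigDecimal> accountBalances, List<BigDecimal> policyValues) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal balance : accountBalances) {
            sum = sum.add(balance);
        }
        for (BigDecimal value : policyValues) {
            sum = sum.add(value);
        }
        return sum;
    }
}
```

A value computed this way for a few sample customers gives you something to compare against the column populated by the batch script and later maintained by the triggers.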

Access Program Update Mechanics

You need to identify all the places in external applications where this calculation is currently performed and then rework that code to work with TotalPortfolioValue; this typically entails removing the calculation code and replacing it with a read from the database. You may discover that the calculation is performed differently in various applications, either because of a bug or because of a different situation, and need to negotiate the appropriate calculation with the business.

Drop Foreign Key Constraint

Remove a foreign key constraint from an existing table so that a relationship to another table is no longer enforced by the database.

Motivation

The primary reason to apply Drop Foreign Key Constraint is to no longer enforce data dependencies at the database level; instead, data integrity is enforced by external applications. This is particularly important when the performance cost of enforcing RI in the database can no longer be sustained and/or when the RI rules vary between applications.

Potential Tradeoffs

The fundamental tradeoff is performance versus quality: Foreign key constraints ensure the validity of the data at the database level at the cost of the constraint being enforced each time the source data is updated. When you apply Drop Foreign Key Constraint, your applications will be at risk of introducing invalid data if they do not validate the data before writing to the database.

Schema Update Mechanics

To drop a foreign key constraint, you can either apply the ALTER TABLE DROP CONSTRAINT command or just apply the ALTER TABLE DISABLE CONSTRAINT command. The advantage of the latter approach is that it ensures that the relationship is documented even though the constraint is not enforced. The following code depicts the two ways by which you can drop the foreign key constraint between Account.StatusCode and AccountStatus.StatusCode in Figure 8.3. The first way just drops the constraint; the other one disables it and thereby documents the need for it. We recommend the second approach:

Figure 8.3. Dropping a foreign key constraint from the Account table.

    ALTER TABLE Account DROP CONSTRAINT FK_Account_Status;

    ALTER TABLE Account DISABLE CONSTRAINT FK_Account_Status;

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You must identify and then update any external programs that modify the data column(s) on which the foreign key constraint was defined. There are two issues to look for in the external programs. First, individual applications need to be updated to ensure that appropriate RI rules are still being enforced. The rules could vary, but in general, you need to add logic to each application to ensure that a row exists in AccountStatus whenever it is referenced in Account. The implication is that you will need to either determine that the row already exists (if AccountStatus is small, you might consider caching it in memory) or be prepared to add a new row to the table. The second issue concerns exception handling. Because the database will no longer throw RI exceptions pertaining to this foreign key constraint, you will need to rework any external program code appropriately.
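As a minimal sketch of the caching idea, an application might load the codes from AccountStatus once and validate against them before each write. The class AccountStatusCache and its method are hypothetical names introduced for illustration, and the sketch assumes the cache is refreshed by some other means when AccountStatus changes.

```java
import java.util.Set;

// Hypothetical in-memory copy of AccountStatus.StatusCode values.
final class AccountStatusCache {
    private final Set<String> validCodes;

    /** The argument is assumed to hold the codes read from AccountStatus at startup. */
    AccountStatusCache(Set<String> codesLoadedFromAccountStatus) {
        this.validCodes = Set.copyOf(codesLoadedFromAccountStatus);
    }

    /** Rejects a status code that has no row in AccountStatus, before any SQL is issued. */
    void requireValid(String statusCode) {
        // The null test comes first because immutable sets reject contains(null).
        if (statusCode == null || !validCodes.contains(statusCode)) {
            throw new IllegalArgumentException("Unknown account status: " + statusCode);
        }
    }
}
```

Calling requireValid before every insert or update of Account gives the application roughly the check that the dropped constraint used to provide, although only for programs that actually call it.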

Introduce Cascading Delete

The database automatically deletes the appropriate "child records" when a "parent record" is deleted.

Figure 8.4. Introducing a cascading delete on Policy.

Note that an alternative to deleting child records is just to remove the reference to the parent record within the child records. This option can only be used when the foreign key column(s) in the child tables allow null values, although this alternative may lead to lots of orphaned rows.

Motivation

The primary reason you would apply Introduce Cascading Delete is to preserve the referential integrity of your data by ensuring that related rows are appropriately deleted when a parent row is deleted.

Potential Tradeoffs

There are three potential tradeoffs with this refactoring:

Deadlock. When you implement cascading deletes, you must avoid cyclic dependencies; otherwise, deadlock may occur. Modern databases detect deadlocks and will roll back the transaction that caused the deadlock.

Accidental mass deletion. You need to be very careful when applying this refactoring. For example, it may theoretically make sense that if you delete a row in your CorporateDivision table, then all rows corresponding to that division in your Employee table should also be deleted. However, you could easily delete thousands of employee records when someone inadvertently deletes a record representing a large division.

Duplicated functionality. Persistence frameworks such as Hibernate (www.hibernate.org) and Oracle's Toplink (www.oracle.com) automate relationship management between objects; therefore, you may be duplicating functionality when you apply this refactoring.

Schema Update Mechanics

To apply Introduce Cascading Delete, you need to do the following:

1. Identify what is to be deleted. You need to identify the "children" of a row that should be deleted when the parent row is deleted. For example, when you delete an order, you should also delete all the order items associated with that order. This activity is recursive; the child rows in turn may have children that also need to be deleted, motivating you to apply Introduce Cascading Delete for them, too. We do not recommend applying this refactoring all at once; instead, it is better to apply this refactoring to one set of tables at a time, fully implementing and testing it before moving on to the next. Small changes are easier to implement than large changes.

2. Choose cascading mechanism. You can implement cascading deletes either with triggers or with referential integrity constraints with the DELETE CASCADE option. (Not all database vendors may support this option.)

3. Implement the cascading delete. With the first approach, you write a trigger that deletes all the children of the parent row when it is deleted. This option is best suited when you want to have fine-grained control over what gets deleted when the parent row is deleted. The downside is that you must write code to implement this functionality. You may also introduce deadlock situations when you have not thought through the interrelationships between multiple triggers being executed at the same time. With the second approach, you define RI constraints with the DELETE CASCADE option turned on, using the ALTER TABLE MODIFY CONSTRAINT SQL command. When you choose this option, you must define referential constraints on the database, which could be a huge task if you do not already have referential constraints defined (because you would need to apply the Add Foreign Key Constraint refactoring [page 204] to pretty much all the relationships in the database). The primary advantage with this option is that you do not have to write any new code because the database automatically takes care of deleting the child rows. The challenge with this approach is that it can be very hard to debug.
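The recursive identification in step 1, together with the warning about cyclic dependencies, can be sketched as a small planning aid. The following Java fragment is a hypothetical helper, not a feature of any database: it assumes you have written down which child tables reference each parent table, walks that map from the table being deleted, and fails if it finds a cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical planning helper for cascading deletes.
final class CascadePlanner {
    private CascadePlanner() {}

    /** Returns every table reached by cascading from root, in discovery order. */
    static Set<String> tablesAffected(String root, Map<String, List<String>> children) {
        Set<String> visited = new LinkedHashSet<>();
        visit(root, children, visited, new ArrayDeque<>());
        return visited;
    }

    private static void visit(String table, Map<String, List<String>> children,
                              Set<String> visited, Deque<String> path) {
        // A table that is already on the current path means the cascade loops back.
        if (path.contains(table)) {
            throw new IllegalStateException("Cyclic cascade involving " + table);
        }
        if (!visited.add(table)) {
            return; // already reached through another parent
        }
        path.push(table);
        for (String child : children.getOrDefault(table, List.of())) {
            visit(child, children, visited, path);
        }
        path.pop();
    }
}
```

For the example in Figure 8.4, a map from Policy to PolicyNotes and Claim would report all three tables, which is the set to implement and test together before moving on to the next set.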

Figure 8.4 depicts how to apply Introduce Cascading Delete to the Policy table using the trigger method. The DeletePolicy trigger, shown below, deletes any rows from the PolicyNotes or Claim tables that are related to the row in the Policy table that is being deleted:

    -- Create trigger to delete the PolicyNotes and Claim.
    CREATE OR REPLACE TRIGGER DeletePolicy
    AFTER DELETE ON Policy
    FOR EACH ROW
    DECLARE
    BEGIN
      DeletePolicyNotes();
      DeletePolicyClaims();
    END;
    /

The following code shows you how to implement Introduce Cascading Delete using RI constraints with the DELETE CASCADE option:

    ALTER TABLE POLICYNOTES ADD
      CONSTRAINT FK_DELETEPOLICYNOTES
      FOREIGN KEY (POLICYID)
      REFERENCES POLICY (POLICYID) ON DELETE CASCADE ENABLE;

    ALTER TABLE CLAIMS ADD
      CONSTRAINT FK_DELETEPOLICYCLAIM
      FOREIGN KEY (POLICYID)
      REFERENCES POLICY (POLICYID) ON DELETE CASCADE ENABLE;

Data-Migration Mechanics

There is no data to migrate with this database refactoring.

Access Program Update Mechanics

When you apply this refactoring, you must remove any application code that currently implements the delete-children functionality. One challenge will be that some applications implement the deletion whereas others do not; perhaps one application deletes OrderItem rows when the corresponding Order row is deleted, but another application does not. Implementing the cascading delete within the database may inadvertently impact, perhaps for the better, the second application. The point is that you need to be very careful; you should not assume that all applications implement the same RI rules, regardless of how "obvious" it is to you that they should. You will also need to handle any new errors returned by the database when the cascading delete does not work. The following code shows how you would change your application code before and after Introduce Cascading Delete is applied:

    //Before code
    private void deletePolicy (Policy policyToDelete) {
      Iterator policyNotes = policyToDelete.getPolicyNotes().iterator();
      for (Iterator iterator = policyNotes; iterator.hasNext();) {
        PolicyNote policyNote = (PolicyNote) iterator.next();
        DB.remove(policyNote);
      }
      DB.remove(policyToDelete);
    }

    //After code
    private void deletePolicy (Policy policyToDelete) {
      DB.remove(policyToDelete);
    }

If you are using any of the O-R mapping tools, you will have to change the mapping file so that you can specify the cascade option in the mapping, as shown here:

    //After mapping

......

......

Introduce Hard Delete

Remove an existing column that indicates that a row has been deleted (this is called a soft delete or logical delete) and instead actually delete the row from the application (that is, do a hard delete). This refactoring is the opposite of Introduce Soft Delete (page 222).

Motivation

The primary reason to apply Introduce Hard Delete is to reduce the size of your tables, resulting in better performance and simpler querying because you no longer have to check to see whether a row is marked as deleted.

Potential Tradeoffs

The only disadvantage of this refactoring is the loss of historical data, although you can use Introduce Trigger For History (page 227) if required.

Schema Update Mechanics

As Figure 8.5 implies, to apply Introduce Hard Delete you first need to remove the identifying column. You must remove the deletion indicator column (see the Drop Column refactoring, page 72), which, in this case, is the Customer.isDeleted column. Next you remove any code, usually trigger code (although sometimes application code), that updates the Customer.isDeleted column. There may be code that sets the initial value to FALSE in the case of a Boolean indicator, or to a predetermined date in the case of a date/timestamp. This logic is typically implemented with a default value column constraint that would be automatically dropped along with the column. There may also be trigger code that sets the value to TRUE, or to the current date/time, when there is an attempt to delete the row. Most likely, you will only need to drop the trigger. The following code shows you how to remove the Customer.isDeleted column:

    ALTER TABLE Customer DROP COLUMN isDeleted;

Figure 8.5. Introducing a hard delete for Customer.

Data-Migration Mechanics

You have to delete all the data rows in the Customer table where isDeleted is set to TRUE, because these are the rows that have been logically deleted. Before you delete these rows, you need to update, and possibly delete, all the data that references the logically deleted data. This is typically done once in batch via one or more scripts. You should also consider archiving all the rows marked for deletion so that you can back out of this refactoring if need be. The following code shows you how to delete the rows from the Customer table where the Customer.isDeleted flag is set to TRUE:

    -- Delete customers marked for delete
    DELETE FROM Customer WHERE isDeleted = TRUE;

Access Program Update Mechanics

When you apply the Introduce Hard Delete refactoring, you must change external programs accessing the data in two ways. First, SELECT statements must not reference the Customer.isDeleted column. Second, all logical deletion code must be updated:

    //Before code
    public void customerDelete(Long customerIdToDelete)
      throws Exception {
      PreparedStatement stmt = null;
      try {
        stmt = DB.prepare("UPDATE Customer "+
          "SET isDeleted = ? "+
          "WHERE CustomerID = ?");
        // The flag is a Boolean, so bind it with setBoolean rather than setLong
        stmt.setBoolean(1, Boolean.TRUE);
        stmt.setLong(2, customerIdToDelete);
        stmt.execute();
      } catch (SQLException SQLexc){
        DB.HandleDBException(SQLexc);
      } finally {DB.cleanup(stmt);}
    }

    //After code
    public void customerDelete(Long customerIdToDelete)
      throws Exception {
      PreparedStatement stmt = null;
      try {
        stmt = DB.prepare("DELETE FROM Customer "+
          "WHERE CustomerID = ?");
        stmt.setLong(1, customerIdToDelete);
        stmt.execute();
      } catch (SQLException SQLexc){
        DB.HandleDBException(SQLexc);
      } finally {DB.cleanup(stmt);}
    }

Introduce Soft Delete

Introduce a flag to an existing table that indicates that a row has been deleted (this is called a soft/logical delete) instead of actually deleting the row (a hard delete). This refactoring is the opposite of Introduce Hard Delete (page 219).

Motivation

The primary reason to apply Introduce Soft Delete is to preserve all application data, typically for historical purposes.

Potential Tradeoffs

Performance is potentially impacted for two reasons. First, the database must store all the rows that have been marked as deleted. This could lead to significantly more disk space usage and reduced query performance. Second, applications must now do the additional work of distinguishing between deleted and nondeleted rows, decreasing performance while potentially increasing code complexity.

Schema Update Mechanics

As Figure 8.6 implies, to apply Introduce Soft Delete, you will need to do the following:

1. Introduce the identifying column. You must introduce a new column to Customer (see the Introduce New Column transformation, page 301) that marks the row as deleted or not. This column is usually either a Boolean field that contains the value TRUE when the row is deleted and FALSE otherwise, or a date/timestamp indicating when the row was deleted. In our example, we are introducing the Boolean column isDeleted. This column should not allow NULL values. (See the Make Column Non-Nullable refactoring on page 189.)

2. Determine how to update the flag. The Customer.isDeleted column can be updated either by your application(s) or within the database using triggers. We prefer the trigger-based approach because it is simple and it avoids the risk that the applications will not update the column properly.

3. Develop deletion code. The code must be written and tested to update the deletion indicator column appropriately upon "deletion" of a row. In the case of a Boolean column, set the value to TRUE; in the case of a date/timestamp, set it to the current date and time.

4. Develop insertion code. You have to set the deletion indicator column appropriately upon an insert: FALSE in the case of a Boolean column and a predetermined date (for example, January 1, 5000) for a date/timestamp. This could be easily implemented by using the Introduce Default Value refactoring (page 186) or via a trigger.

Figure 8.6. Introducing a soft delete to Customer.

The following code shows you how to add the Customer.isDeleted column and assign it a default value of FALSE:

    ALTER TABLE Customer ADD isDeleted BOOLEAN;

    ALTER TABLE Customer MODIFY isDeleted DEFAULT FALSE;

The following code shows you how to create a trigger that intercepts the DELETE SQL command and assigns the Customer.isDeleted flag to TRUE. The code copies the data row before deletion, updates the deletion indicator, and then inserts the row back into the table after the original is removed:

    -- Create an array to store the deleted Customers
    CREATE OR REPLACE PACKAGE SoftDeleteCustomerPKG
    AS
      TYPE ARRAY IS TABLE OF Customer%ROWTYPE INDEX BY BINARY_INTEGER;
      oldvals ARRAY;
      empty ARRAY;
    END;
    /

    -- Initialize the array
    CREATE OR REPLACE TRIGGER SoftDeleteCustomerBefore
    BEFORE DELETE ON Customer
    BEGIN
      SoftDeleteCustomerPKG.oldvals := SoftDeleteCustomerPKG.empty;
    END;
    /

    -- Capture the deleted rows
    CREATE OR REPLACE TRIGGER SoftDeleteCustomerStore
    BEFORE DELETE ON Customer
    FOR EACH ROW
    DECLARE
      i NUMBER DEFAULT SoftDeleteCustomerPKG.oldvals.COUNT+1;
    BEGIN
      SoftDeleteCustomerPKG.oldvals(i).CustomerID := :old.CustomerID;
      SoftDeleteCustomerPKG.oldvals(i).Name := :old.Name;
      SoftDeleteCustomerPKG.oldvals(i).PhoneNumber := :old.PhoneNumber;
    END;
    /

    -- Insert the customers back with the isDeleted flag set to TRUE
    CREATE OR REPLACE TRIGGER SoftDeleteCustomerAdd
    AFTER DELETE ON Customer
    DECLARE
    BEGIN
      FOR i IN 1 .. SoftDeleteCustomerPKG.oldvals.COUNT LOOP
        INSERT INTO Customer(CustomerID, Name, PhoneNumber, isDeleted)
        VALUES (SoftDeleteCustomerPKG.oldvals(i).CustomerID,
                SoftDeleteCustomerPKG.oldvals(i).Name,
                SoftDeleteCustomerPKG.oldvals(i).PhoneNumber,
                TRUE);
      END LOOP;
    END;
    /
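Steps 1, 3, and 4 also allow a date/timestamp indicator instead of a Boolean. The following is only a rough sketch of how application code might handle that variant. The class name DeletionTimestamp and its methods are hypothetical, and the sentinel is simply the January 1, 5000 example date from step 4; the indicator is assumed to be non-null, as step 1 recommends.

```java
import java.time.LocalDateTime;

// Hypothetical helper for the date/timestamp form of the deletion indicator.
class DeletionTimestamp {

    // Sentinel meaning "not deleted": the far-future example date from step 4.
    static final LocalDateTime NOT_DELETED = LocalDateTime.of(5000, 1, 1, 0, 0);

    // Value to store in the indicator column when a row is inserted.
    static LocalDateTime valueOnInsert() {
        return NOT_DELETED;
    }

    // Value to store when a row is "deleted": the current date and time.
    static LocalDateTime valueOnDelete(LocalDateTime now) {
        return now;
    }

    // A row counts as deleted once its indicator is earlier than the sentinel.
    // Assumes the column is non-nullable, so deletedOn is never null.
    static boolean isDeleted(LocalDateTime deletedOn) {
        return deletedOn.isBefore(NOT_DELETED);
    }
}
```

A read query for this variant would compare the column against the sentinel rather than against FALSE.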

Data-Migration Mechanics

There is no data to be migrated per se, although the value of Customer.isDeleted must be set to the appropriate default value within all rows when this refactoring is implemented. This is typically done once in batch via one or more scripts.

Access Program Update Mechanics

When you apply the Introduce Soft Delete refactoring, you must change external programs accessing the data. First, you must change read queries to ensure that data read from the database has not been marked as deleted. Applications must add a WHERE clause to all SELECT queries, such as WHERE isDeleted = FALSE. Instead of changing all the read queries, you can use the Encapsulate Table With View refactoring (page 243) so that the view returns rows from Customer where isDeleted is not TRUE. Another option is to apply the Add Read Method refactoring (page 240) so that the appropriate WHERE clause is implemented in one place.

Second, you must change deletion methods. All external programs must change physical deletes to updates that set Customer.isDeleted instead of physically removing the row. For example, DELETE FROM Customer WHERE PKColumn = nnn will change to UPDATE Customer SET isDeleted = TRUE WHERE PKColumn = nnn. Alternatively, as noted earlier, you can introduce a delete trigger that prevents the deletion and updates Customer.isDeleted to TRUE.

The following code shows you how to set the initial value of the Customer.isDeleted column:

    UPDATE Customer SET isDeleted = FALSE WHERE isDeleted IS NULL;

The following code depicts the read method for a simple Customer object before and after the Introduce Soft Delete refactoring is applied:

    // Before code
    stmt = conn.prepareStatement(
      "SELECT CustomerId, Name, PhoneNumber " +
      "FROM Customer " +
      "WHERE CustomerId = ?");
    stmt.setLong(1, customer.getCustomerID());
    ResultSet rs = stmt.executeQuery();

    // After code
    stmt = conn.prepareStatement(
      "SELECT CustomerId, Name, PhoneNumber " +
      "FROM Customer " +
      "WHERE CustomerId = ? " +
      "AND isDeleted = ?");
    stmt.setLong(1, customer.getCustomerID());
    stmt.setBoolean(2, false);
    ResultSet rs = stmt.executeQuery();

The following before and after code snippets show how the delete method changes after the Introduce Soft Delete refactoring is applied:

    // Before code
    stmt = conn.prepareStatement("DELETE FROM Customer WHERE CustomerID = ?");
    stmt.setLong(1, customer.getCustomerID());
    stmt.executeUpdate();

    // After code
    stmt = conn.prepareStatement(
      "UPDATE Customer SET isDeleted = ? " +
      "WHERE CustomerID = ?");
    stmt.setBoolean(1, true);
    stmt.setLong(2, customer.getCustomerID());
    stmt.executeUpdate();
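To make the combined effect of the read and delete changes concrete, here is a minimal in-memory sketch rather than real database code. The class InMemoryCustomerTable is a hypothetical stand-in for the Customer table: its delete method only sets the flag, and its read methods skip flagged rows, mirroring the UPDATE statement and the extra WHERE clause.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the Customer table; not a persistence layer.
class InMemoryCustomerTable {

    // One "row": the name plus the isDeleted indicator column.
    private static final class Row {
        final String name;
        boolean isDeleted; // starts as false, like the column's default value

        Row(String name) {
            this.name = name;
        }
    }

    private final Map<Long, Row> rows = new LinkedHashMap<>();

    // INSERT: the deletion indicator starts out FALSE.
    void insert(long customerId, String name) {
        rows.put(customerId, new Row(name));
    }

    // "DELETE": behaves like UPDATE Customer SET isDeleted = TRUE WHERE CustomerID = ?
    // Returns true only if a live row was marked as deleted.
    boolean delete(long customerId) {
        Row row = rows.get(customerId);
        if (row == null || row.isDeleted) {
            return false;
        }
        row.isDeleted = true;
        return true;
    }

    // SELECT Name ... WHERE CustomerID = ? AND isDeleted = FALSE; null if no live row.
    String findActiveName(long customerId) {
        Row row = rows.get(customerId);
        return (row == null || row.isDeleted) ? null : row.name;
    }

    // SELECT Name FROM Customer WHERE isDeleted = FALSE
    List<String> activeNames() {
        List<String> names = new ArrayList<>();
        for (Row row : rows.values()) {
            if (!row.isDeleted) {
                names.add(row.name);
            }
        }
        return names;
    }

    // Physical row count, including soft-deleted rows (the disk-space tradeoff).
    int storedRowCount() {
        return rows.size();
    }
}
```

Note that storedRowCount never shrinks after a delete, which is the storage cost described under Potential Tradeoffs.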

Introduce Trigger For History

Introduce a new trigger to capture data changes for historical or audit purposes.

Motivation

The primary reason you would apply Introduce Trigger For History is to delegate the tracking of data changes to the database itself. This strategy ensures that if any application modifies critical source data, the change will be tracked or audited.

Potential Tradeoffs

The primary trade-off is performance related: the addition of the trigger will increase the time it takes to update rows in the Customer table of Figure 8.7.

Figure 8.7. Introducing a history trigger for Customer.

Furthermore, you may be forced to update your applications to pass user context information so that the database can record who made the change.

Schema Update Mechanics

To apply Introduce Trigger For History, you need to do the following:

1. Determine actions to collect history for. Although it is possible to insert, update, and delete data in Customer, you may not need to track all the changes. For example, perhaps you are only interested in tracking updates and deletions but not original insertions.
2. Determine what columns to collect history for. You have to identify the columns in Customer whose changes you are interested in tracking. For example, perhaps you only want to track changes to the PhoneNumber column but nothing else.
3. Determine how to record the historical data. You have two basic choices: You could have a generic table that tracks all the historical data changes within your database, or you could introduce a corresponding history table for each table that you want to record history for. The single-table approach will not scale very well, although it is appropriate for smaller databases.
4. Introduce the history table. If the history table does not yet exist, create it via the CREATE TABLE command.
5. Introduce the trigger(s). Introduce the appropriate trigger(s) via the CREATE OR REPLACE TRIGGER command. One strategy is to have a trigger that captures the original image of the row and inserts it into the CustomerHistory table. A second strategy is to compare the before and after values of the pertinent columns and store descriptions of the changes in CustomerHistory.

The following code shows you how to add the CustomerHistory table and how to update it via the UpdateCustomerHistory trigger. We are capturing the change in every column on the Customer table and recording them in CustomerHistory:

    -- Create the CustomerHistory table
    CREATE TABLE CustomerHistory (
      CustomerID NUMBER,
      OldName VARCHAR2(32),
      NewName VARCHAR2(32),
      ChangedBy NUMBER,
      ChangedOn DATE
    );

    -- Auditing trigger for table Customer
    CREATE OR REPLACE TRIGGER UpdateCustomerHistory
    AFTER INSERT OR UPDATE OR DELETE ON Customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      -- Handle Updating of Rows
      IF UPDATING THEN
        IF NOT (NVL(:OLD.Name = :NEW.Name, FALSE) OR
                (:OLD.Name IS NULL AND :NEW.Name IS NULL)) THEN
          CreateCustomerHistoryRow(:NEW.CustomerId, :NEW.Name, :OLD.Name, User);
        END IF;
      END IF;
      -- Handle Deleting of Records
      IF DELETING THEN
        IF (:OLD.Name IS NOT NULL) THEN
          CreateCustomerHistoryRow(:OLD.CustomerId, :OLD.Name, NULL, User);
        END IF;
      END IF;
      IF INSERTING THEN
        IF (:NEW.Name IS NOT NULL) THEN
          CreateCustomerHistoryRow(:NEW.CustomerId, :NEW.Name, NULL, User);
        END IF;
      END IF;
    END;
    /
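The compare-before-and-after strategy from step 5 can also be sketched outside the database. The following Java is an illustration only: CustomerHistoryRecorder and HistoryEntry are hypothetical names, the list stands in for the CustomerHistory table, and passing the old and new name as explicit arguments is an assumption of this sketch rather than the signature of CreateCustomerHistoryRow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the CustomerHistory table and the decisions its trigger makes.
class CustomerHistoryRecorder {

    // One history row: which customer changed, the old and new Name, and who changed it.
    static final class HistoryEntry {
        final long customerId;
        final String oldName;
        final String newName;
        final String changedBy;

        HistoryEntry(long customerId, String oldName, String newName, String changedBy) {
            this.customerId = customerId;
            this.oldName = oldName;
            this.newName = newName;
            this.changedBy = changedBy;
        }
    }

    private final List<HistoryEntry> history = new ArrayList<>();

    // UPDATE: record a row only when Name really changed.
    // Objects.equals treats two nulls as equal, like the NVL / IS NULL test in the trigger.
    void onUpdate(long customerId, String oldName, String newName, String user) {
        if (!Objects.equals(oldName, newName)) {
            history.add(new HistoryEntry(customerId, oldName, newName, user));
        }
    }

    // DELETE: the old value disappears, so the new value is recorded as null.
    void onDelete(long customerId, String oldName, String user) {
        if (oldName != null) {
            history.add(new HistoryEntry(customerId, oldName, null, user));
        }
    }

    // INSERT: there is no old value, so it is recorded as null.
    void onInsert(long customerId, String newName, String user) {
        if (newName != null) {
            history.add(new HistoryEntry(customerId, null, newName, user));
        }
    }

    List<HistoryEntry> entries() {
        return history;
    }
}
```

The null-safe comparison is the part most worth copying: a plain equality test would treat a change from NULL to a value, or back, as "no change".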

Data-Migration Mechanics

Data migration is typically not required for this database refactoring. However, if you want to track insertions, you may decide to create a history record for each existing row within Customer. This works well when Customer includes a column recording the original creation date; otherwise, you will need to generate a value for the insertion date. (The easiest thing to do is to use the current date.)

Access Program Update Mechanics

When you apply Introduce Trigger For History, you need to do the following:

1. Stop application generation of history. When you add triggers to collect historical information, you need to identify any existing logic in external applications where the code is creating historical information and then rework it.
2. Update presentation code. If external programs currently display application-generated history, they will need to be reworked to use the information contained in CustomerHistory.
3. Provide user context to database. If you want the database trigger to record which user made the change to the data, you must provide a user context. Some applications, in particular those built using Web or n-tier technologies, may not be providing this information and will need to be updated to do so. Alternatively, you could just supply a default value for the user context when it is not provided.

Although it would not be part of the refactoring itself, a related schema transformation would be to modify the table design to add columns to record who modified the record and when it was done. A common strategy is to add columns such as UserCreated, CreationDate, UserLastModified, and LastModifiedDate to main tables (see Introduce New Column on page 301). The two user columns would be a user ID that could be used as a foreign key to a user details table. You may also need to add the user details table (see Add Lookup Table on page 153).

Chapter 9. Architectural Refactorings

Architectural refactorings are changes that improve the overall manner in which external programs interact with a database. The architectural refactorings are as follows:

Add CRUD Methods
Add Mirror Table
Add Read Method
Encapsulate Table With View
Introduce Calculation Method
Introduce Index
Introduce Read-Only Table
Migrate Method From Database
Migrate Method To Database
Replace Method(s) With View
Replace View With Method(s)
Use Official Data Source

Add CRUD Methods

Introduce four stored procedures (methods) to implement the creation, retrieval, update, and deletion (CRUD) of the data representing a business entity. After these stored procedures are in place, access to the source tables is restricted to just these procedures and administrators.

Motivation

There are several reasons to apply Add CRUD Methods:

Encapsulate data access. Stored procedures are a common way to encapsulate access to your database, albeit not as effective as persistence frameworks (see Chapter 1, "Evolutionary Database Development").

Decouple application code from database tables. Stored procedures are an effective way to decouple applications from database tables. They enable you to change database tables without changing application code.

Implement entity-based security access control (SAC). Instead of directly accessing source tables, applications invoke the relevant stored procedures. CRUD stored procedures provide a way to implement an entity-level approach to SAC. Database products typically enable data SAC at the table, column, and sometimes row levels. But when the data for complex business entities such as Customer are stored in several tables, SAC at the entity level can become quite complicated if you restrict yourself to data-oriented SAC. Luckily, most database products also implement method SAC, and thus you can implement entity-level SAC via CRUD methods.

Potential Tradeoffs

The primary advantage of encapsulating database access in this manner is that it makes it easier to refactor your table schema. When you implement common schema refactorings such as Rename Table (page 113), Move Column (page 103), and Split Column (page 140), you should only have to refactor the corresponding CRUD methods that access them (assuming that no other external program directly accesses the table schema).

Unfortunately, this approach to database encapsulation comes at a cost. Methods (stored procedures, stored functions, and triggers) are specific to database vendors; Oracle methods are not easily portable to Sybase, for example. Furthermore, there is no guarantee that the methods that work in one version of a database will be easy to port to a newer version, so you may be increasing your upgrade burden, too.

Another issue is lack of flexibility. What happens if you want to access a portion of a business entity? Do you really want to work with all the entity's data each time, or perhaps introduce CRUD methods for that subentity? What happens if you need data that crosses entities, perhaps for a report? Do you invoke several stored procedures to read each business entity that you require and then select the data you require, or do you apply Add Read Method (page 240) to retrieve the data specifically required by the report?

Schema Update Mechanics

To update the database schema, you must do the following:

1. Identify the business data. You need to identify the business entity, such as Customer of Figure 9.1, for which you want to implement the CRUD methods. After you know what entity you are encapsulating access to, you need to identify the source data within the database for that entity.
2. Write the stored procedures. You need to write at least four stored procedures: one to create the entity, one to read it based on its primary key, one to update the entity, and one to delete it. Additional stored procedures to retrieve the entity via means other than the primary key can be added by applying the Add Read Method (page 240) database refactoring.
3. Test the stored procedures. One of the best ways to work is to take a Test-Driven Development (TDD) approach; see Chapter 1, "Evolutionary Database Development."

Figure 9.1. Adding Customer CRUD methods.

Figure 9.1 depicts how to introduce the CRUD methods for the Customer entity. The code for ReadCustomer is shown here. The code for the other stored procedures is self-explanatory:

    CREATE OR REPLACE PACKAGE CustomerCRUD AS
      TYPE customerType IS REF CURSOR RETURN Customer%ROWTYPE;
      PROCEDURE ReadCustomer
        (readCustomerId IN NUMBER, customers OUT customerType);
      PROCEDURE CreateCustomer(....);
      PROCEDURE UpdateCustomer(....);
      PROCEDURE DeleteCustomer(....);
    END CustomerCRUD;
    /

    CREATE OR REPLACE PACKAGE BODY CustomerCRUD AS
      PROCEDURE ReadCustomer
        (readCustomerId IN NUMBER, customers OUT customerType)
      IS
      BEGIN
        -- Open the OUT cursor on the row that matches the primary key
        OPEN customers FOR
          SELECT * FROM Customer WHERE CustomerID = readCustomerId;
      END ReadCustomer;
    END CustomerCRUD;
    /

Set Naming Conventions

CRUD methods should follow a common naming convention to improve their readability. We prefer the CreateCustomer, ReadCustomer, UpdateCustomer, and DeleteCustomer format shown in Figure 9.1, although CustomerCreate, CustomerRead, CustomerUpdate, and CustomerDelete also make sense. Choose one approach and stick to it.

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

When you introduce the CRUD methods, you need to identify all the places in external applications where the entity is currently used and then refactor that code to invoke the new stored procedures as appropriate. You may discover that the individual programs require different subsets of the data; put another way, some programs require just a little bit more data than others. You may discover that you need to apply Add CRUD Methods for several entities simultaneously, potentially a major change.

The first code example shows how the Customer Java class originally submitted hard-coded SQL to the database to retrieve the appropriate data. The second code example shows how it would be refactored to invoke the stored procedure:

    // Before code
    stmt = conn.prepareStatement(
      "SELECT FirstName, Surname, PhoneNumber FROM Customer " +
      "WHERE CustomerId = ?");
    stmt.setLong(1, customerId);
    ResultSet rs = stmt.executeQuery();

    // After code
    // ReadCustomer is a packaged procedure: parameter 1 is the ID, parameter 2 the OUT cursor
    stmt = conn.prepareCall("begin CustomerCRUD.ReadCustomer(?, ?); end;");
    stmt.setLong(1, customerId);
    stmt.registerOutParameter(2, OracleTypes.CURSOR);
    stmt.execute();
    ResultSet rs = (ResultSet) stmt.getObject(2);

Add Mirror Table

Create a mirror table, an exact duplicate of an existing table in one database, in another database.

Motivation

There are several reasons why you may want to apply Add Mirror Table:

Improve query performance. Querying a given set of tables may be slow due to the database being in a remote location; therefore, a prepopulated table on a local server may improve overall performance.

Create redundant data. Many applications query data in real time from other databases. A table containing this data in your local database reduces your dependency on these other database(s), providing a buffer for when they go down or are taken down for maintenance.

Replace redundant reads. Several external programs, or stored procedures for that matter, often implement the same retrieval query. These queries to the remote database could be replaced by a mirror table on the local database that replicates with the remote database.

Potential Tradeoffs

The primary challenge when introducing a mirror table is "stale data." Stale data occurs when one table is updated but not the other; remember, either the source or the mirror table could potentially be updated. This problem increases as more mirrors of Customer are created in other databases. As you see in Figure 9.2, you need to implement some sort of synchronization strategy.

Figure 9.2. Adding a Customer mirror table.

Schema Update Mechanics

As depicted in Figure 9.2, to update the database schema when you perform Add Mirror Table, you must do the following:

1. Determine the location. You must decide where the mirrored table, Customer, will reside; in this case, we will be mirroring it in DesktopDB.
2. Introduce the mirror table. Create DesktopDB.Customer in the other database using the CREATE TABLE command of SQL.
3. Determine synchronization strategy. The real-time approach of Figure 9.2 should be taken when your end users require up-to-date, synchronized data.
4. Allow updates. If you want to allow updates to DesktopDB.Customer, you must provide a way to synchronize the data from DesktopDB.Customer back to Customer. An updatable DesktopDB.Customer is known as a peer-to-peer mirror; it is also known as a master/master mirror.

The following code depicts the DDL to introduce the DesktopDB.Customer table:

    CREATE TABLE Customer (
      CustomerID NUMBER NOT NULL,
      Name VARCHAR(40),
      PhoneNumber VARCHAR2(40),
      CONSTRAINT PKCustomer PRIMARY KEY (CustomerID)
    );

    COMMENT ON TABLE Customer IS 'Mirror table of Customer on Remote Location';

Data-Migration Mechanics

You must initially copy all the relevant source data into the mirror table, and then apply your synchronization strategy (real-time update or database replication). There are several implementation strategies you can apply for synchronization:

1. Periodic refresh. Use a scheduled job that synchronizes your Customer and DesktopDB.Customer tables. The job must be able to deal with data changes on both tables and be able to update data both ways. Periodic refreshes are usually better when you are building data warehouses and data marts.
2. Database replication. Some database products provide a feature, called multimaster replication, in which tables are set up to be replicated both ways. The database keeps both tables synchronized. Generally speaking, you would use this approach when you want both Customer and DesktopDB.Customer to be updatable by users. If your database product provides this feature, it is advisable to use it over custom-coded solutions.
3. Use trigger-based synchronization. Create triggers on Customer so that source data changes are propagated to DesktopDB.Customer, and create triggers on DesktopDB.Customer so that changes to that table are propagated to Customer. This technique enables you to custom code the data synchronization, which is desirable when you have complex data objects that need to be synchronized; however, you must write all the triggers, which could be time consuming.

The following code depicts how to synchronize data in the ProductionDB.Customer table and DesktopDB.Customer using triggers:

    CREATE OR REPLACE TRIGGER UpdateCustomerMirror
    AFTER UPDATE OR INSERT OR DELETE ON Customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF DELETING THEN
        DeleteCustomerMirror;
      END IF;
      IF (UPDATING OR INSERTING) THEN
        UpdateCustomerMirror;
      END IF;
    END;
    /

    CREATE OR REPLACE TRIGGER UpdateCustomer
    AFTER UPDATE OR INSERT OR DELETE ON CustomerMirror
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF DELETING THEN
        DeleteCustomer;
      END IF;
      IF (UPDATING OR INSERTING) THEN
        UpdateCustomer;
      END IF;
    END;
    /
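For comparison with the trigger-based code, the periodic refresh strategy might be sketched as a job that reconciles the two tables in both directions. This is a simplified illustration under stated assumptions: the class PeriodicMirrorRefresh, the Versioned row type, and the last-modified timestamp column are all hypothetical, conflicts are resolved by "most recent change wins" (the text does not prescribe a rule), and deletions are not handled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a two-way periodic refresh between Customer and its mirror.
class PeriodicMirrorRefresh {

    // A row version: the data plus a last-modified timestamp used to pick the winner.
    static final class Versioned {
        final String name;
        final long lastModified;

        Versioned(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    // Copies the most recently modified version of every row to the other table.
    // Returns the number of rows copied in either direction.
    static int refresh(Map<Long, Versioned> source, Map<Long, Versioned> mirror) {
        Set<Long> ids = new HashSet<>(source.keySet());
        ids.addAll(mirror.keySet());
        int copied = 0;
        for (Long id : ids) {
            Versioned s = source.get(id);
            Versioned m = mirror.get(id);
            if (m == null || (s != null && s.lastModified > m.lastModified)) {
                mirror.put(id, s); // mirror lacks the row, or the source copy is newer
                copied++;
            } else if (s == null || m.lastModified > s.lastModified) {
                source.put(id, m); // source lacks the row, or the mirror copy is newer
                copied++;
            }
            // Equal timestamps: the rows are treated as already synchronized.
        }
        return copied;
    }
}
```

A scheduler would call refresh at whatever interval the acceptable staleness allows; between runs, the two tables can disagree.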

Access Program Update Mechanics

You will have already analyzed why you are introducing the DesktopDB.Customer mirror table. For example, if you are mirroring the remote database's Customer table for easy access on the local database, you must make sure that the application connects to this local database, instead of the production database, when it wants to query the Customer table. You can also set up alternate connection properties that enable you to use DesktopDB or ProductionDB to query your data, as you see in the following code:

    // Before code
    stmt = remoteDB.prepareStatement(
      "select CustomerID,Name,PhoneNumber FROM Customer WHERE CustomerID = ?");
    stmt.setLong(1, customerID);
    ResultSet rs = stmt.executeQuery();

    // After code
    stmt = localDB.prepareStatement(
      "select CustomerID,Name,PhoneNumber FROM Customer WHERE CustomerID = ?");
    stmt.setLong(1, customerID);
    ResultSet rs = stmt.executeQuery();
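One way the alternate connection properties could be wired up is sketched below. Everything named here is an assumption for illustration: the class CustomerDataSourceSelector and the three property keys are hypothetical, and the sketch only chooses a JDBC URL, leaving the actual connection to the caller.

```java
import java.util.Properties;

// Hypothetical helper: decides which database Customer read queries should go to.
class CustomerDataSourceSelector {

    // Property names are assumptions made for this sketch.
    static final String USE_MIRROR = "customer.useMirror";
    static final String MIRROR_URL = "customer.mirrorUrl";         // for example, DesktopDB
    static final String PRODUCTION_URL = "customer.productionUrl"; // for example, ProductionDB

    // Returns the JDBC URL to use for read queries against Customer.
    // Falls back to production when the mirror is disabled or not configured.
    static String readUrl(Properties config) {
        boolean useMirror = Boolean.parseBoolean(config.getProperty(USE_MIRROR, "false"));
        String mirrorUrl = config.getProperty(MIRROR_URL);
        if (useMirror && mirrorUrl != null) {
            return mirrorUrl;
        }
        String productionUrl = config.getProperty(PRODUCTION_URL);
        if (productionUrl == null) {
            throw new IllegalStateException("No production URL configured: " + PRODUCTION_URL);
        }
        return productionUrl;
    }
}
```

The returned URL would then be handed to DriverManager.getConnection, so switching between the mirror and production becomes a configuration change rather than a code change.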

Add Read Method

Introduce a method, in this case a stored procedure, to implement the retrieval of the data representing zero or more business entities from the database.

Motivation

The primary reason to apply Add Read Method is to encapsulate the logic required to retrieve specific data from the database in accordance with defined criteria. The method, often a stored procedure, might be required to replace one or more SELECT statements implemented in your application and/or reporting code. Another common motivation is to implement a consistent search strategy for a given business entity.

Sometimes this refactoring is applied to support the Introduce Soft Delete (page 222) or Introduce Hard Delete (page 219) refactorings. When either of these refactorings is applied, the way in which data is deleted changes: either you mark a row as deleted or you physically delete the data, respectively. A change in strategy such as this may require a multitude of changes to be implemented within your applications to ensure that retrievals are performed correctly. By encapsulating the data retrieval in a stored procedure, and then invoking that stored procedure appropriately, it is much easier to implement these two refactorings because the requisite retrieval logic only needs to be reworked in one place (the stored procedure).

Potential Tradeoffs

The primary advantage of encapsulating retrieval logic in this manner, often in combination with Add CRUD Methods (page 232), is that it makes it easier to refactor your table schema. Unfortunately, this approach to database encapsulation comes at a cost. Stored procedures are specific to database vendors, reducing your potential for portability, and may decrease your overall performance if they are written poorly.

Schema Update Mechanics

To update the database schema, you must do the following:

1. Identify the data. You need to identify the data you want to retrieve, which may come from several tables.
2. Identify the criteria. How do you want to specify the subset of data to retrieve? For example, would you like to be able to retrieve information for bank accounts whose balance is over a specific amount, or which have been opened at a specific branch, or which have been accessed during a specific period of time, or for a combination thereof? Note that the criteria may or may not involve the primary key.
3. Write and test the stored procedure. We are firm believers in writing a full, 100 percent regression test suite for your stored procedures. Better yet, we recommend a Test-Driven Development (TDD) approach where you write a test before writing a little bit of code within your stored procedures (Astels 2003; Beck 2003).

Figure 9.3 shows how to introduce a read stored procedure for the Customer entity that takes as parameters a first and last name. The code for the ReadCustomer stored procedure is shown below. It is written assuming that it would get parameters passed to it such as S% and Ambler and return all the people with the surname Ambler whose first name begins with the letter S.

Figure 9.3. Adding a Customer read method.

    PROCEDURE ReadCustomer
    (
      firstNameToFind IN VARCHAR,
      lastNameToFind IN VARCHAR,
      customerRecords OUT CustomerREF
    ) IS
    BEGIN
      -- LIKE lets a pattern such as S% match every first name starting with S
      OPEN customerRecords FOR
        SELECT * FROM Customer
        WHERE Customer.FirstName LIKE firstNameToFind
          AND Customer.LastName = lastNameToFind;
    END ReadCustomer;
    /
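The intended matching behavior, where S% and Ambler return every Ambler whose first name begins with S, can be illustrated with a small in-memory sketch. This is not how the stored procedure is implemented: ReadCustomerSketch, its nested Customer class, and likeToRegex are hypothetical names, and the translation covers only the two standard SQL LIKE wildcards (% and _), with no escape character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical in-memory illustration of the read method's search criteria.
class ReadCustomerSketch {

    static final class Customer {
        final String firstName;
        final String lastName;

        Customer(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Translates a SQL LIKE pattern into a regex:
    // % matches any run of characters, _ matches exactly one character.
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // First name is matched as a LIKE pattern; last name must match exactly.
    static List<Customer> readCustomer(List<Customer> table,
                                       String firstNameToFind,
                                       String lastNameToFind) {
        Pattern firstNamePattern = likeToRegex(firstNameToFind);
        List<Customer> result = new ArrayList<>();
        for (Customer customer : table) {
            if (firstNamePattern.matcher(customer.firstName).matches()
                    && customer.lastName.equals(lastNameToFind)) {
                result.add(customer);
            }
        }
        return result;
    }
}
```

Because the criteria live in one method, a later change such as excluding soft-deleted rows would be made here once rather than in every caller.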

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

When you introduce the stored procedure, you need to identify all the places in external applications where the data is read and then refactor that code to invoke the new stored procedure appropriately. You may discover that the individual programs require different subsets of the data, and therefore you may need to apply Add Read Method several times, once for each major collection of data elements.

In the following code, applications originally submit SELECT statements to the database to retrieve customer information; after the refactoring, they just invoke the stored procedure:

    // Before code
    stmt = conn.prepareStatement(
      "SELECT * FROM Customer " +
      "WHERE Customer.FirstName=? AND Customer.LastName=?");
    stmt.setString(1, firstNameToFind);
    stmt.setString(2, lastNameToFind);
    ResultSet rs = stmt.executeQuery();

    // After code
    // Parameters 1 and 2 are the search criteria; parameter 3 is the OUT cursor
    stmt = conn.prepareCall("begin ReadCustomer(?, ?, ?); end;");
    stmt.setString(1, firstNameToFind);
    stmt.setString(2, lastNameToFind);
    stmt.registerOutParameter(3, OracleTypes.CURSOR);
    stmt.execute();
    ResultSet rs = (ResultSet) stmt.getObject(3);
    while (rs.next()) {
      getCustomerInformation(rs);
    }

Encapsulate Table With View

Wrap access to an existing table with a view.

Motivation

There are two reasons to apply Encapsulate Table With View. First, to implement a façade around an existing table that you intend to later refactor. By writing applications to access source data through views instead of directly through tables, you reduce the direct coupling to the table, making it easier to refactor. For example, you may want to encapsulate access to a table via a view before you apply refactorings such as Rename Column (page 109) or Drop Column (page 72).

Second, you may want to put a security access control (SAC) strategy in place for your database. You can do this by encapsulating access to a table via several views, and then restricting access to the table and the various views appropriately. Few users, if any, would have direct access to the source table; users would instead be granted access rights to one or more views. The first step of this process is to encapsulate access to the original table with a view, and then you introduce new views as appropriate.

Potential Tradeoffs

This refactoring, as shown in Figure 9.4, will only work when your database supports the same level of access through views as it does to tables. For example, if Customer is updated by an external program, your database must support updatable views. If it does not, you should consider the refactoring Add CRUD Methods (page 232) to encapsulate access to this table instead.

Figure 9.4. Encapsulating the TCustomer table with a view.

Schema Update Mechanics

As you can see in Figure 9.4, this refactoring is straightforward. The first step is to rename the existing table, in this case to TCustomer. The second step is to introduce a view with the name of the original table, in this case Customer. From the point of view of the external programs that access the table, nothing has really changed. They simply access the source data through the view, which to them appears to be the original table.

One way to rename the Customer table is to use the RENAME TO clause of the ALTER TABLE SQL command, as shown below. If your database does not support this clause, you must create TCustomer, copy the data from Customer, and then drop the Customer table. An example of this approach is provided with the Rename Table refactoring (page 113). Regardless of how you rename the table, the next step is to add the view via the CREATE VIEW command:

    ALTER TABLE Customer RENAME TO TCustomer;

    CREATE VIEW Customer AS
      SELECT * FROM TCustomer;

Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You should not have to update any access programs because the view is identical in structure to the original table.

Introduce Calculation Method

Introduce a new method, typically a stored function, that implements a calculation that uses data stored within the database.

Figure 9.5. Introducing a calculation method for Customer.

Motivation

There are several reasons to apply Introduce Calculation Method:

To improve application performance. You can improve overall performance by implementing a calculation that requires significant amounts of data within the database and then responding with just the answer. This avoids shipping the required data across the network to do the calculation.

To implement a common, reusable calculation. A calculation may be implemented by several existing methods. Therefore, it makes sense to extract that calculation into a single database method that is invoked by the others.

To support Introduce Calculated Column (page 81). You can implement the logic required to calculate the value of the column in a stored function.

To replace a calculated column. You may choose to replace a calculated column with a stored procedure that you invoke instead. To remove the column, apply the Drop Column (page 72) refactoring.

Potential Tradeoffs
When too much logic is implemented within your database, it can become a performance bottleneck if the database has not been architected to handle the load. You need to either ensure that your database is scalable or limit the amount of functionality that you implement in it.

Schema Update Mechanics
To update the database schema, you must write and then test the stored procedure. If existing stored procedures implement this logic, they should be refactored to invoke this stored procedure. The following code shows how to introduce the stored function GetCustomerAccountTotal to the CustomerDB database:

    CREATE OR REPLACE FUNCTION getCustomerAccountTotal
      (CustomerID IN NUMBER) RETURN NUMBER
    IS
      customerTotal NUMBER;
    BEGIN
      -- SUM over zero rows yields NULL rather than raising no_data_found,
      -- so NVL supplies the zero total for a customer with no policies.
      SELECT NVL(SUM(Amount), 0) INTO customerTotal
        FROM Policy
        WHERE PolicyCustomerId = CustomerID;
      RETURN customerTotal;
    END;
    /

Data-Migration Mechanics
There is no data to migrate for this database refactoring.

Access Program Update Mechanics
When you introduce the calculation method, you need to identify all the places in external applications where this calculation is currently implemented and then refactor the code to appropriately invoke the new stored procedure. The following code shows how the calculation used to be performed as a Java operation, but afterward the operation simply invokes the appropriate stored function:

    //Before code
    private BigDecimal getCustomerTotalBalance() {
      BigDecimal customerTotalBalance = new BigDecimal(0);
      for (Iterator iterator = customer.getPolicies().iterator();
           iterator.hasNext();) {
        Policy policy = (Policy) iterator.next();
        // BigDecimal is immutable: add() returns a new value that must be kept
        customerTotalBalance = customerTotalBalance.add(policy.getBalance());
      }
      return customerTotalBalance;
    }

    //After code
    private BigDecimal getCustomerTotalBalance() throws SQLException {
      // For a stored function, parameter 1 is the return value
      // and parameter 2 is the function's argument.
      CallableStatement stmt =
        connection.prepareCall("{? = call getCustomerAccountTotal(?)}");
      stmt.registerOutParameter(1, Types.NUMERIC);
      stmt.setString(2, customer.getId());
      stmt.execute();
      BigDecimal customerTotalBalance = stmt.getBigDecimal(1);
      stmt.close();
      return customerTotalBalance;
    }
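The difference between the two approaches can be sketched without a full application. This is an illustration only, assuming SQLite through Python's sqlite3 module in place of a stored function (SQLite has none), with a hypothetical Policy table and data. It also shows why an empty result needs explicit handling: SUM over zero rows is NULL, not an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Policy (
        PolicyID INTEGER PRIMARY KEY, PolicyCustomerId INTEGER, Amount INTEGER);
    INSERT INTO Policy VALUES (1, 10, 100), (2, 10, 250), (3, 20, 75);
""")

def total_in_app(customer_id):
    # "Before": every matching row travels to the application, which adds them up.
    rows = conn.execute(
        "SELECT Amount FROM Policy WHERE PolicyCustomerId = ?", (customer_id,))
    return sum(amount for (amount,) in rows)

def total_in_db(customer_id):
    # "After": the database computes the total and returns a single value.
    # COALESCE turns the NULL produced by an empty SUM into 0.
    row = conn.execute(
        "SELECT COALESCE(SUM(Amount), 0) FROM Policy WHERE PolicyCustomerId = ?",
        (customer_id,)).fetchone()
    return row[0]

# Without COALESCE, a customer with no policies yields None (SQL NULL).
raw_empty_sum = conn.execute(
    "SELECT SUM(Amount) FROM Policy WHERE PolicyCustomerId = 99").fetchone()[0]
```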

You may discover that the calculation is implemented in slightly different ways in various applications; perhaps the business rule has changed over time but the application was not updated, or perhaps there is a valid reason for the differences. Regardless, you need to negotiate any changes with the appropriate project stakeholders.

Introduce Index
Introduce a new index of either unique or nonunique type.

Motivation
The reason why you want to introduce an index to a table is to increase query performance on your database reads. You may also need to introduce an index to create a primary key for a table, or to support a foreign key to another table, when applying Consolidate Key Strategy (page 168).

Potential Tradeoffs
Too many indexes on a table will degrade performance when you update, insert into, or delete from the table. Many times you may want to introduce a unique index but will not be able to because the existing data contains duplicate values, forcing you to remove the duplicates before applying this refactoring.

Schema Update Mechanics
Applying Introduce Index can be complicated because of data dependencies, potentially requiring updates to both data and application code. You need to do the following:

1. Determine type of index. You have to decide whether you need a unique or nonunique index, which is usually determined by the business rules surrounding the data attributes and your usage of the data. For example, in the United States, individuals are assigned unique Social Security Numbers (SSNs). Within most companies, customers are assigned unique customer numbers. But your telephone number may not be unique (several people may share a single number). Therefore, both SSN and CustomerNumber would potentially be valid columns to define a unique index for, but TelephoneNumber would likely require a nonunique index.

2. Add a new index. In the example of Figure 9.6, a new index based on SocialSecurityNumber needs to be introduced for Customer, using the CREATE INDEX command of SQL.

3. Provide more disk space. When you use the Introduce Index refactoring, you must plan for more disk space usage, so you may need to allocate more disk space.

Figure 9.6. Introducing an index for the Customer table.

Data-Migration Mechanics
There is no data to be migrated per se, although the value of Customer.SocialSecurityNumber must be checked for duplicate values if you are introducing a unique index. Duplicate values must be addressed by either updating them or by deciding to use a nonunique index. The following code determines whether we have duplicate values in the Customer.SocialSecurityNumber column:

    -- Find any duplicate values in Customer.SocialSecurityNumber.
    SELECT SocialSecurityNumber, COUNT(SocialSecurityNumber)
      FROM Customer
      GROUP BY SocialSecurityNumber
      HAVING COUNT(SocialSecurityNumber) > 1;

If you find any duplicate values, you will have to change them before you can apply Introduce Index. The following code shows a way to do so: first, you use Customer.CustomerID as an anchor to find duplicate rows, and then you replace the duplicate values by applying the Update Data refactoring (page 310):

    -- Create a temp table of the duplicates
    CREATE TABLE temp_customer AS
      SELECT * FROM Customer parent
        WHERE CustomerID != (SELECT MAX(CustomerID)
          FROM Customer child
          WHERE parent.SocialSecurityNumber = child.SocialSecurityNumber);

    -- Now create the unique index
    CREATE UNIQUE INDEX Customer_Name_UNQ
      ON Customer(SocialSecurityNumber);
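The check, fix, and index sequence described above can be exercised as a small script. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module stands in for the real database, the rows are made up, and the UPDATE that corrects the duplicate is a hypothetical fix (in practice the correct value comes from the business). The index name Customer_SSN_UNQ is also chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY, SocialSecurityNumber TEXT);
    INSERT INTO Customer VALUES (1, '111'), (2, '222'), (3, '111');
""")

FIND_DUPLICATES = """
    SELECT SocialSecurityNumber, COUNT(*)
    FROM Customer
    GROUP BY SocialSecurityNumber
    HAVING COUNT(*) > 1
"""
CREATE_INDEX = (
    "CREATE UNIQUE INDEX Customer_SSN_UNQ ON Customer(SocialSecurityNumber)")

duplicates_before = conn.execute(FIND_DUPLICATES).fetchall()

# While duplicates exist, the database refuses to build the unique index.
try:
    conn.execute(CREATE_INDEX)
    created_with_duplicates = True
except sqlite3.DatabaseError:
    created_with_duplicates = False

# Hypothetical correction of the older duplicate row.
conn.execute(
    "UPDATE Customer SET SocialSecurityNumber = '333' WHERE CustomerID = 1")
duplicates_after = conn.execute(FIND_DUPLICATES).fetchall()

# With the duplicates resolved, the unique index can be created.
conn.execute(CREATE_INDEX)
index_names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```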

Access Program Update Mechanics
To update access programs, you need to first analyze the dependencies to determine which external programs access data in Customer. First, these external programs must be able to manage database exceptions that are thrown by the database in case they generate duplicate values. Second, you should change your queries to make use of this new index to get better performance of data retrieval. Some database products enable you to specify what index to use while retrieving rows from the database through the use of hints, an example of which follows (in Oracle, the INDEX hint names the table first and then the index):

    SELECT /*+ INDEX(Customer Customer_Name_UNQ) */
      CustomerID, SocialSecurityNumber
      FROM Customer
      WHERE SocialSecurityNumber = 'XXX-XXX-XXX';
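The first requirement above, that access programs cope with the exception raised when they produce a duplicate value, might look like the following. It is a minimal sketch assuming SQLite and Python's sqlite3 module; add_customer is a hypothetical helper, and a real application would report the failure to the user rather than return a flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY, SocialSecurityNumber TEXT);
    CREATE UNIQUE INDEX Customer_SSN_UNQ ON Customer(SocialSecurityNumber);
    INSERT INTO Customer VALUES (1, '111');
""")

def add_customer(customer_id, ssn):
    """Insert a customer; return False if the unique index rejects the SSN."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO Customer VALUES (?, ?)", (customer_id, ssn))
        return True
    except sqlite3.IntegrityError:
        # The unique index reported a duplicate SocialSecurityNumber.
        return False

first = add_customer(2, '222')    # new SSN
second = add_customer(3, '111')   # SSN already used by customer 1
row_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```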

Introduce Read-Only Table
Create a read-only data store based on existing tables in the database. There are two ways to implement this refactoring: as a table populated in real time or as a table populated in batch.

Motivation
There are several reasons why you may want to apply Introduce Read-Only Table:

Improve query performance. Querying a given set of tables may be very slow because of the requisite joins; therefore, a prepopulated table may improve overall performance.

Summarize data for reporting. Many reports require summary data, which can be prepopulated into a read-only table and then used many times over.

Create redundant data. Many applications query data in real time from other databases. A read-only table containing this data in your local database reduces your dependency on these other database(s), providing a buffer for when they go down or are taken down for maintenance.

Replace redundant reads. Several external programs, or stored procedures for that matter, often implement the same retrieval query. These queries can be replaced by a common read-only table or a new view; see Introduce View (page 306).

Data security. A read-only table enables end users to query the data but not update it.

Improve database readability. If you have a highly normalized database, it is usually difficult for users to navigate through all the tables to get to the required information. By introducing read-only tables that capture common, denormalized data structures, you make your database schema easier to understand because people can start by focusing just on the denormalized tables.

Potential Tradeoffs
The primary challenge with introducing a read-only table is "stale data," data that does not represent the current state of the source from which it was populated. For example, you could pull data from a remote system to populate a local table, and immediately after doing so, the source data gets updated. The users of the read-only table need to understand both the timeliness of the copied data and the volatility of the source data to determine whether the read-only table is acceptable to them.

Schema Update Mechanics
As depicted in Figure 9.7, to update the database schema when you perform Introduce Read-Only Table, you must first introduce the new table. You must decide on the structure of the read-only table that you want to introduce and then create it using the CREATE TABLE command. Next, determine a population strategy. A real-time approach should be taken when your end users require up-to-date information; the join to create the read-only table is relatively simple, and the data can therefore be safely updated; and the resulting table is small, thus allowing for adequate performance. A batch approach should be taken when the real-time approach is not appropriate.

Figure 9.7. Introducing the read-only table CustomerPortfolio.

Figure 9.7 shows an example where the CustomerPortfolio table is a read-only table based on the Customer, Account, and Insurance tables that summarizes the business that we do with each customer. This provides an easier way for end users to do ad-hoc queries that analyze the customer information. The following code depicts the DDL to introduce the CustomerPortfolio table:

    CREATE TABLE CustomerPortfolio (
      CustomerID NUMBER NOT NULL,
      Name VARCHAR(40),
      PhoneNumber VARCHAR2(40),
      AccountsTotalBalance NUMBER,
      InsuranceTotalPayment NUMBER,
      InsuranceTotalValue NUMBER,
      CONSTRAINT PKCustomerPortfolio
        PRIMARY KEY (CustomerID)
    );

There are several implementation strategies. First, you could trust developers to do the right thing and simply mark the table as read-only via a comment. Second, you could implement triggers that throw an error on insertion, update, or deletion. Third, you could restrict access to the table via your security access control (SAC) strategy. Fourth, apply Encapsulate Table With View (page 243) and make the view read-only.
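The second strategy, triggers that reject every write, can be sketched as follows. The example assumes SQLite through Python's sqlite3 module rather than Oracle, and the trigger names and the single-column table are simplifications made up for the illustration. One caveat that follows directly from the approach: the triggers also block the job that populates the table, so that job would need some way around them (for instance, dropping and recreating the triggers during a refresh).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerPortfolio (
        CustomerID INTEGER PRIMARY KEY, AccountsTotalBalance INTEGER);
    INSERT INTO CustomerPortfolio VALUES (1, 500);  -- initial population

    -- One trigger per write operation; each aborts the statement.
    CREATE TRIGGER CustomerPortfolio_NoInsert BEFORE INSERT ON CustomerPortfolio
    BEGIN SELECT RAISE(ABORT, 'CustomerPortfolio is read-only'); END;

    CREATE TRIGGER CustomerPortfolio_NoUpdate BEFORE UPDATE ON CustomerPortfolio
    BEGIN SELECT RAISE(ABORT, 'CustomerPortfolio is read-only'); END;

    CREATE TRIGGER CustomerPortfolio_NoDelete BEFORE DELETE ON CustomerPortfolio
    BEGIN SELECT RAISE(ABORT, 'CustomerPortfolio is read-only'); END;
""")

def try_write(sql):
    """Run a write statement; return True if it succeeded, False if rejected."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False

insert_ok = try_write("INSERT INTO CustomerPortfolio VALUES (2, 10)")
update_ok = try_write("UPDATE CustomerPortfolio SET AccountsTotalBalance = 0")
delete_ok = try_write("DELETE FROM CustomerPortfolio")
contents = conn.execute("SELECT * FROM CustomerPortfolio").fetchall()
```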

Data-Migration Mechanics
You must initially copy all the relevant source data into the read-only table, and then apply your population strategy (real-time update or periodic batch update). There are several implementation strategies you can apply for your population strategy:

1. Periodic refresh. Use a scheduled job that refreshes your read-only table. The job may refresh all the data in the read-only table or it may just update the changes since the last refresh. Note that the amount of time taken to refresh the data should be less than the scheduled interval time of the refresh. This technique is particularly suited for data warehouse environments, where data is generally summarized and used the next day. Hence, stale data can be tolerated; also, this approach provides you with an easier way to synchronize the data.

2. Materialized views. Some database products provide a feature where a view is no longer just a query; instead, it is actually a table based on a query. The database keeps this materialized view current based on the options you choose when you create it. This technique enables you to use the database's built-in features to refresh the data in the materialized view, with the major downside being the complexity of the view SQL. When the view SQL gets more complicated, the database products tend not to support automated synchronization of the view.

3. Use trigger-based synchronization. Create triggers on the source tables so that source data changes are propagated to the read-only table. This technique enables you to custom code the data synchronization, which is desirable when you have complex data objects that need to be synchronized; however, you must write all of the triggers, which could be time consuming.

4. Use real-time application updates. You can change your application so that it updates the read-only table, making the data current. This can only work when you know all the applications that are writing data to your source database tables. This technique allows for the application to update the read-only table, and hence it is always kept current, and you can make sure that the data is not used by the application. The downside of the technique is you must write your information twice, first to the original table and second to the denormalized read-only table; this could lead to duplication and hence bugs.

The following code depicts how to populate data in the CustomerPortfolio table for the first time:

    -- Note: if a customer has several Account rows and several Insurance
    -- rows, this three-way join repeats rows and inflates the sums;
    -- in that case, aggregate each source table separately before joining.
    INSERT INTO CustomerPortfolio
      SELECT Customer.CustomerID,
        Customer.Name,
        Customer.PhoneNumber,
        SUM(Account.Balance),
        SUM(Insurance.Payment),
        SUM(Insurance.Value)
      FROM Customer, Account, Insurance
      WHERE Customer.CustomerID = Account.CustomerID
        AND Customer.CustomerID = Insurance.CustomerID
      GROUP BY Customer.CustomerID,
        Customer.Name,
        Customer.PhoneNumber;
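One thing worth checking before relying on a summary query of this shape is how it behaves when a customer has more than one row in both Account and Insurance. The sketch below, which assumes SQLite via Python's sqlite3 module and uses hypothetical, simplified tables, compares the three-way join with a variant that totals each source table separately before joining. The variant is offered as one possible formulation, not as the book's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Account (CustomerID INTEGER, Balance INTEGER);
    CREATE TABLE Insurance (CustomerID INTEGER, Payment INTEGER);
    INSERT INTO Customer VALUES (1, 'Ada');
    INSERT INTO Account VALUES (1, 100), (1, 200);   -- true total: 300
    INSERT INTO Insurance VALUES (1, 10), (1, 20);   -- true total: 30
""")

# Joining both child tables at once pairs every account with every policy,
# so each balance and each payment is counted twice here.
THREE_WAY_JOIN = """
    SELECT Customer.CustomerID, SUM(Account.Balance), SUM(Insurance.Payment)
    FROM Customer, Account, Insurance
    WHERE Customer.CustomerID = Account.CustomerID
      AND Customer.CustomerID = Insurance.CustomerID
    GROUP BY Customer.CustomerID
"""

# Totalling each child table first yields one row per customer per table.
PRE_AGGREGATED = """
    SELECT c.CustomerID, a.TotalBalance, i.TotalPayment
    FROM Customer c
    JOIN (SELECT CustomerID, SUM(Balance) AS TotalBalance
          FROM Account GROUP BY CustomerID) a ON a.CustomerID = c.CustomerID
    JOIN (SELECT CustomerID, SUM(Payment) AS TotalPayment
          FROM Insurance GROUP BY CustomerID) i ON i.CustomerID = c.CustomerID
"""

joined = conn.execute(THREE_WAY_JOIN).fetchall()
pre_aggregated = conn.execute(PRE_AGGREGATED).fetchall()
```

Like the original inner join, the pre-aggregated variant omits customers who lack either accounts or insurance; whether that is wanted depends on the report.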

The following code depicts how to synchronize data from the source tables of Customer, Account, and Insurance. Because the source tables are updatable but CustomerPortfolio is not, we only need one-way updates:

    CREATE OR REPLACE TRIGGER UpdateCustomerPortfolioCustomer
    AFTER UPDATE OR INSERT OR DELETE ON Customer
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      IF DELETING THEN
        DeleteCustomerPortfolio;
      END IF;
      IF (UPDATING OR INSERTING) THEN
        UpdateCustomerPortfolio;
      END IF;
    END;
    /

    CREATE OR REPLACE TRIGGER UpdateCustomerPortfolioAccount
    AFTER UPDATE OR INSERT OR DELETE ON Account
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      UpdateCustomerPortfolioForAccount;
    END;
    /

    CREATE OR REPLACE TRIGGER UpdateCustomerPortfolioInsurance
    AFTER UPDATE OR INSERT OR DELETE ON Insurance
    REFERENCING OLD AS OLD NEW AS NEW
    FOR EACH ROW
    DECLARE
    BEGIN
      UpdateCustomerPortfolioForInsurance;
    END;
    /
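The triggers above delegate to procedures whose bodies are not shown. As an illustration of what one-way, trigger-based synchronization can look like when written out in full, here is a sketch for the Account table only. It assumes SQLite via Python's sqlite3 module (which needs one trigger per event), an incremental update strategy chosen for brevity, and hypothetical trigger names; it is not the book's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (
        AccountID INTEGER PRIMARY KEY, CustomerID INTEGER, Balance INTEGER);
    CREATE TABLE CustomerPortfolio (
        CustomerID INTEGER PRIMARY KEY, AccountsTotalBalance INTEGER);

    CREATE TRIGGER Account_AfterInsert AFTER INSERT ON Account
    BEGIN
        -- Create the summary row the first time a customer appears.
        INSERT INTO CustomerPortfolio (CustomerID, AccountsTotalBalance)
            SELECT NEW.CustomerID, 0
            WHERE NOT EXISTS (SELECT 1 FROM CustomerPortfolio
                              WHERE CustomerID = NEW.CustomerID);
        UPDATE CustomerPortfolio
           SET AccountsTotalBalance = AccountsTotalBalance + NEW.Balance
         WHERE CustomerID = NEW.CustomerID;
    END;

    -- Assumes an account is not moved to a customer without a summary row.
    CREATE TRIGGER Account_AfterUpdate AFTER UPDATE ON Account
    BEGIN
        UPDATE CustomerPortfolio
           SET AccountsTotalBalance = AccountsTotalBalance - OLD.Balance
         WHERE CustomerID = OLD.CustomerID;
        UPDATE CustomerPortfolio
           SET AccountsTotalBalance = AccountsTotalBalance + NEW.Balance
         WHERE CustomerID = NEW.CustomerID;
    END;

    CREATE TRIGGER Account_AfterDelete AFTER DELETE ON Account
    BEGIN
        UPDATE CustomerPortfolio
           SET AccountsTotalBalance = AccountsTotalBalance - OLD.Balance
         WHERE CustomerID = OLD.CustomerID;
    END;
""")

def portfolio_total(customer_id):
    return conn.execute(
        "SELECT AccountsTotalBalance FROM CustomerPortfolio WHERE CustomerID = ?",
        (customer_id,)).fetchone()[0]

conn.execute("INSERT INTO Account VALUES (1, 10, 100)")
conn.execute("INSERT INTO Account VALUES (2, 10, 50)")
after_insert = portfolio_total(10)

conn.execute("UPDATE Account SET Balance = 300 WHERE AccountID = 1")
after_update = portfolio_total(10)

conn.execute("DELETE FROM Account WHERE AccountID = 2")
after_delete = portfolio_total(10)
```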

Access Program Update Mechanics
You will have already analyzed why you are introducing this read-only table. For example, in Figure 9.7, if you are going to use CustomerPortfolio inside the application, you must make sure that the application uses this for read-only purposes. If you are providing CustomerPortfolio as a summarized table for integration to some other application, you must keep the data updated in CustomerPortfolio. When you decide to use data from CustomerPortfolio, you must change all the places where you currently access the source tables and rework them to use CustomerPortfolio instead.

You may also want to show your end users how recent the data is by displaying the value of the CustomerPortfolio.LastUpdatedTime column (this column is not part of the DDL shown earlier and would have to be added). Furthermore, if batch reports require up-to-date data, the batch job will need to be reworked to first refresh CustomerPortfolio and then run the reports:

    // Before code
    PreparedStatement stmt = connection.prepareStatement(
      "SELECT Customer.CustomerId, " +
      "  Customer.Name, " +
      "  Customer.PhoneNumber, " +
      "  SUM(Account.Balance) AccountsTotalBalance, " +
      "  SUM(Insurance.Payment) InsuranceTotalPayment, " +
      "  SUM(Insurance.Value) InsuranceTotalValue " +
      "FROM Customer, Account, Insurance " +
      "WHERE " +
      "  Customer.CustomerId = Account.CustomerId " +
      "  AND Customer.CustomerId = Insurance.CustomerId " +
      "  AND Customer.CustomerId = ? " +
      // the nonaggregated columns must appear in a GROUP BY clause
      "GROUP BY Customer.CustomerId, Customer.Name, Customer.PhoneNumber");
    stmt.setLong(1, customer.getCustomerID());
    ResultSet rs = stmt.executeQuery();

    // After code
    PreparedStatement stmt = connection.prepareStatement(
      "SELECT CustomerId, " +
      "  Name, " +
      "  PhoneNumber, " +
      "  AccountsTotalBalance, " +
      "  InsuranceTotalPayment, " +
      "  InsuranceTotalValue " +
      "FROM CustomerPortfolio " +
      "WHERE CustomerId = ?");
    stmt.setLong(1, customer.getCustomerID());
    ResultSet rs = stmt.executeQuery();

Migrate Method From Database
Rehost an existing database method (a stored procedure, stored function, or trigger) in the application(s) that currently invoke it.

Motivation
There are four reasons why you may want to apply Migrate Method From Database:

To support variability. When the method was originally implemented in the database, the business logic was consistent, or at least was thought to be consistent, between applications. Over time, the application requirements evolved, and now different logic is required in the individual applications.

To increase portability. Database method languages are specific to individual database vendors. By implementing logic outside of the database, perhaps in a shared component or Web service, you will make it easy to port to a new database.

To increase scalability. Although it is possible to scale databases through the use of grid technology, the fact still remains that you can also scale via other means, such as through the addition of application servers. Your enterprise architectural strategy may be to scale your systems via nondatabase means; and if the method is proving to be a bottleneck, you will want to migrate it out of the database.

To increase maintainability. The development tools for leading programming languages such as Java, C#, and Visual Basic are often much more sophisticated than the tools for database programming languages. Better tools (and, of course, better development practices) will make your code easier and therefore less expensive to maintain. Also, by programming in a single language, you reduce the programming skill requirements for project team members: it is easier to find someone with C# experience than it is to find someone with C# and Oracle PL/SQL experience.

Potential Tradeoffs
There are several trade-offs associated with this refactoring:

Reduced reusability. Relational databases are a lowest common denominator technology. Virtually any application technology can access them, and therefore implementing shared logic within a relational database offers the greatest chance for reusability.

Performance degradation. There is the potential for decreased performance, particularly if the method accesses significant amounts of data, which would need to get transmitted to the method before it could get processed.

Schema Update Mechanics
As depicted in Figure 9.8, the change to the database schema is straightforward: you just mark the method to be removed, and then after the transition period has ended, you physically remove the method from the database. In this example, the GetValidPolicies stored procedure was migrated from CustomerDB to the policies administration application. This was done for several reasons: only one application required that business logic, the network traffic remained the same regardless of where the functionality was performed, and the project team wanted to improve the maintainability of their application. It is interesting to note that the name of the method changed slightly to reflect the naming conventions of the programming language.

Figure 9.8. Migrating policy retrieval code into the policy administration application.

Data-Migration Mechanics
There is no data to migrate with this database refactoring.

Access Program Update Mechanics
The method must be developed and tested within the appropriate application(s). You need to search your program code for invocations of the existing method, and then replace that code with the corresponding function call. The following code example shows how application code will be developed to replace functionality provided by the method:

    //Before code
    CallableStatement stmt =
      connection.prepareCall("begin ? := getValidPolicies(); end;");
    stmt.registerOutParameter(1, OracleTypes.CURSOR);
    stmt.execute();
    ResultSet rs = (ResultSet) stmt.getObject(1);
    List validPolicies = new ArrayList();
    while (rs.next()) {
      // create a new Policy per row so that list entries are distinct objects
      Policy policy = new Policy();
      policy.setPolicyId(rs.getLong(1));
      policy.setCustomerID(rs.getLong(2));
      policy.setActivationDate(rs.getDate(3));
      policy.setExpirationDate(rs.getDate(4));
      validPolicies.add(policy);
    }
    return validPolicies;

    //After code
    PreparedStatement stmt = connection.prepareStatement(
      "SELECT PolicyId, CustomerId, " +
      "ActivationDate, ExpirationDate " +
      "FROM Policy " +
      "WHERE ActivationDate < TRUNC(SYSDATE) AND " +
      "ExpirationDate IS NOT NULL AND " +
      "ExpirationDate > TRUNC(SYSDATE)");
    ResultSet rs = stmt.executeQuery();
    List validPolicies = new ArrayList();
    while (rs.next()) {
      Policy policy = new Policy();
      policy.setPolicyId(rs.getLong("PolicyId"));
      policy.setCustomerID(rs.getLong("CustomerId"));
      policy.setActivationDate(rs.getDate("ActivationDate"));
      policy.setExpirationDate(rs.getDate("ExpirationDate"));
      validPolicies.add(policy);
    }
    return validPolicies;

Migrate Method To Database
Rehost existing application logic in the database.

Motivation
There are three reasons why you may want to apply Migrate Method To Database:

To support reuse. Virtually any application technology can access relational databases, and therefore implementing shared logic as database methods (stored procedures, functions, or triggers) within a relational database offers the greatest chance for reusability.

To increase scalability. It is possible to scale databases through the use of grid technology; therefore, your enterprise architecture may direct you to host all data-oriented logic within your database.

To improve performance. There is the potential for improved performance, particularly if the stored procedure will process significant amounts of data, because the processing will happen closer to the database, which results in a reduced result set being transmitted across the network.

Potential Tradeoffs
The primary drawback to this refactoring is the decreased portability of database methods, code that is specific to the individual database vendor. This problem is often overblown by purists: it is not common for organizations to switch database vendors, perhaps because the vendors have their existing client base locked in or perhaps because it simply is not worth the expense any more.

Schema Update Mechanics
As depicted in Figure 9.9, the change to the database schema is straightforward: you just develop and test the stored procedure. In this example, the logic is to identify a list of defaulted customers, those who are behind on their payments. This logic is implemented in two places, the Policy Administration and the Customer Credit applications. Each application named the operation differently, IdentifyDefaultedCustomers() and ListCustomersThatDefaulted(), respectively, although both are replaced by CustomerDB.GetDefaultedCustomers().

Figure 9.9. Migrating application logic to the database.

Data-Migration Mechanics
There is no data to migrate with this database refactoring.

Access Program Update Mechanics
The method logic must be removed from any application implementing it, and the stored procedure invoked instead. The easiest way to accomplish this is to simply invoke the stored procedure from the existing application method(s). You will likely discover that individual applications have named this method differently, or they may have even designed it differently. For example, one application may implement the business logic as a single large function, whereas another one may implement it as several smaller functions.

Similarly, you may discover that different applications implement the method in different ways, either because the code no longer reflects the actual requirements (if it ever did) or because the individual applications had good reason to implement the logic differently. For example, the DetermineVacationDays(Year) operation provides a list of dates for which employees get paid vacations. This operation is implemented in several applications, but upon examination of the code, it is implemented in different manners. The various versions were written at different times, the rules have changed since some of the older versions were written, and the rules vary by country, state, and sometimes even city. At this point, you would need to decide whether to leave the various versions alone (that is, live with the differences), fix the existing application methods, or write a stored procedure (or several) in the database that implements the correct version of the logic.

The following code sample shows how the application logic was moved to the database and how the application now uses the stored procedure to get a list of defaulted customers. The example does not show the code for the creation of the stored procedure:

    //Before code
    PreparedStatement stmt = connection.prepareStatement(
      "SELECT CustomerId, " +
      "PaymentAmount " +
      "FROM Transactions " +
      "WHERE LastPaymentdate < TRUNC(SYSDATE-90) AND " +
      "PaymentAmount > 30");
    ResultSet rs = stmt.executeQuery();
    List defaulters = new ArrayList();
    while (rs.next()) {
      // create a new object per row so that list entries are distinct
      DefaultedCustomer defaulted = new DefaultedCustomer();
      defaulted.setCustomerID(rs.getLong("CustomerId"));
      defaulted.setAmount(rs.getBigDecimal("PaymentAmount"));
      defaulters.add(defaulted);
    }
    return defaulters;

    //After code
    CallableStatement stmt =
      connection.prepareCall("begin ? := getDefaultedCustomers(); end;");
    stmt.registerOutParameter(1, OracleTypes.CURSOR);
    stmt.execute();
    ResultSet rs = (ResultSet) stmt.getObject(1);
    List defaulters = new ArrayList();
    while (rs.next()) {
      DefaultedCustomer defaulted = new DefaultedCustomer();
      defaulted.setCustomerID(rs.getLong(1));
      defaulted.setAmount(rs.getBigDecimal(2));
      defaulters.add(defaulted);
    }
    return defaulters;

Replace Method(s) With View
Create a view based on one or more existing database methods (stored procedures, stored functions, or triggers) within the database.

Motivation
There are three basic reasons why you may want to apply Replace Method(s) With View:

Ease of use. You may have adopted new tools, in particular reporting tools, that work with views much more easily than with methods.

Reduce maintenance. Many people find view definitions easier to maintain than methods.

Increase portability. You can increase the portability of your database schema. Database method languages are specific to individual database vendors, whereas view definitions can be written so that they are SQL standards compliant.

Potential Tradeoffs
This refactoring can typically be applied only to relatively simple methods that implement logic that could also be implemented by a view definition. The implication is that this refactoring limits your architectural flexibility. Furthermore, if your database does not support updatable views, you are limited to replacing only retrieval-oriented methods. Performance and scalability are rarely impacted because all the work still occurs within the database.

Schema Update Mechanics
To update the database schema, you must first introduce the view via the Introduce View refactoring (page 306). You must then mark the method for removal, and then eventually remove it after the transition period expires via the DROP PROCEDURE command. Figure 9.10 depicts an example where the GetCustomerAccountList stored procedure is replaced with the CustomerAccountList view. The following code depicts the DDL to do so:

    -- A view's column list gives names only; the data types
    -- come from the columns selected from the underlying tables.
    CREATE VIEW CustomerAccountList (
      CustomerID,
      CustomerName,
      CustomerPhone,
      AccountNumber,
      AccountBalance
    ) AS
      SELECT Customer.CustomerID,
        Customer.Name,
        Customer.PhoneNumber,
        Account.AccountNumber,
        Account.Balance
      FROM Customer, Account
      WHERE Customer.CustomerID = Account.CustomerID;

    -- Run this code after June 14 2007
    DROP PROCEDURE GetCustomerAccountList;

Figure 9.10. Introducing the CustomerAccountList view.

Data-Migration Mechanics
There is no data to migrate with this database refactoring.

Access Program Update Mechanics
You need to refactor the access programs to work through the view instead of invoking the stored procedure. The error codes thrown by the database will change when using a view, so you may need to rework your error handling code. The following code shows how an application would be changed to call the view rather than the stored procedure:

    //Before code
    CallableStatement stmt = connection.prepareCall(
      "begin ? := getCustomerAccountList(?); end;");
    stmt.registerOutParameter(1, OracleTypes.CURSOR);
    // parameter 1 is the returned cursor, so the argument is parameter 2
    stmt.setInt(2, customerId);
    stmt.execute();
    ResultSet rs = (ResultSet) stmt.getObject(1);
    List customerAccounts = new ArrayList();
    while (rs.next()) {
      customerAccounts.add(populateAccount(rs));
    }
    return customerAccounts;

    //After code
    PreparedStatement stmt = connection.prepareStatement(
      "SELECT CustomerID, CustomerName, " +
      "CustomerPhone, AccountNumber, AccountBalance " +
      "FROM CustomerAccountList " +
      "WHERE CustomerId = ? ");
    stmt.setLong(1, customerId);
    ResultSet rs = stmt.executeQuery();
    List customerAccounts = new ArrayList();
    while (rs.next()) {
      customerAccounts.add(populateAccount(rs));
    }
    return customerAccounts;

Replace View With Method(s)
Replace an existing view with one or more existing methods (stored procedures, stored functions, or triggers) within the database.

Motivation
There are two fundamental reasons why you may want to apply Replace View With Method(s). First, it is possible to implement more complex logic with methods than with views, so this refactoring would be the first step in that direction. Second, methods can update data tables. Some databases do not support updatable views, or if they do, it is often limited to a single table. By moving from a view-based encapsulation strategy to a method-based one, your database architecture becomes more flexible.

Potential Tradeoffs
There are several potential problems with this refactoring. First, reporting tools usually do not work well with methods, but they do with views. Second, there is a potential decrease in portability because database method languages are specific to individual database vendors. Third, maintainability may decrease because many people prefer to work with views rather than methods (and vice versa). Luckily, performance and scalability are rarely impacted because all the work still occurs within the database.

Schema Update Mechanics
To update the database schema, you must first introduce the appropriate method(s) required to replace the view. You must then mark the view for removal, and eventually remove it after the transition period expires using the Drop View (page 79) refactoring. Figure 9.11 depicts an example where the CustomerTransactionsHistory view is replaced with the GetCustomerTransactions stored procedure. The following code introduces the method and drops the view:

CREATE OR REPLACE PROCEDURE GetCustomerTransactions (
    P_CustomerID IN NUMBER,
    P_Start IN DATE,
    P_End IN DATE
) IS
BEGIN
    SELECT * FROM Transaction
    WHERE Transaction.CustomerID = P_CustomerID
    AND Transaction.PostingDate BETWEEN P_Start AND P_End;
END;

--On March 14 2007
DROP VIEW CustomerTransactionsHistory;

Figure 9.11. Introducing the GetCustomerTransactions stored procedure.

Data-Migration Mechanics
There is no data to migrate with this database refactoring.

Access Program Update Mechanics
You need to refactor the access programs to work through the method(s) rather than the view. The error codes thrown by the database will change when using methods, so you may need to rework your error handling code. When you replace the CustomerTransactionsHistory view with the GetCustomerTransactions stored procedure, your code needs to change as shown here:

//Before code
stmt.prepare(
    "SELECT * " +
    "FROM CustomerTransactionsHistory " +
    "WHERE CustomerId = ? " +
    "AND TransactionDate BETWEEN ? AND ? ");
stmt.setLong(1, customerId);
stmt.setDate(2, startDate);
stmt.setDate(3, endDate);
ResultSet rs = stmt.executeQuery();
List customerTransactions = new ArrayList();
while (rs.next()) {
    customerTransactions.add(populateTransactions(rs));
}
return customerTransactions;

//After code
stmt.prepareCall("begin ? := getCustomerTransactions(?,?,?); end;");
// The first placeholder is the returned cursor; the inputs follow it.
stmt.registerOutParameter(1, OracleTypes.CURSOR);
stmt.setInt(2, customerId);
stmt.setDate(3, startDate);
stmt.setDate(4, endDate);
stmt.execute();
ResultSet rs = (ResultSet) stmt.getObject(1);
List customerTransactions = new ArrayList();
while (rs.next()) {
    customerTransactions.add(populateTransactions(rs));
}
return customerTransactions;

Use Official Data Source
Use the official data source for a given entity, instead of the one you are currently using.

Motivation
The primary reason to apply Use Official Data Source is to use the correct version of the data for a given table or tables. When the same data is stored in several places, you run the risk of inconsistency and lack of availability. For example, assume that you have multiple databases in your enterprise, one of which, the CRM database, is the official source of customer information. If your application is using its own Customer table, you may not be working with all the customer data available within your organization. Worse yet, your application could record a new customer whose information is not made available to the CRM database, and therefore is not available to the other applications within your organization.

Potential Tradeoffs
The primary tradeoff is the cost of refactoring all the references to the table in your local database to now use the official database tables. If the key strategy of the official data source is different from what your table is using, you may want to first apply Consolidate Key Strategy (page 168) so that the identifiers used in your database reflect those of the official data source. Furthermore, the semantics and timeliness of the official source data may not be identical to those of the data you are currently using. You may need to apply refactorings such as Apply Standard Codes (page 157), Apply Standard Type (page 162), and Introduce Common Format (page 183) to convert your existing data over to something that conforms to the official data source.

Schema Update Mechanics
To update the schema, you must do the following:

1. Identify the official data source. You have to identify the official data source, which will often be in an external database that is outside the scope of your control. Worse yet, there could be several "official sources," which you either need to pick from or consolidate appropriately for your situation. You need to negotiate agreement with both your project stakeholder(s) and the owners of the data source that your application should be accessing.

2. Choose an implementation strategy. You have two choices: to either rework your application(s) to directly access the official data source (the strategy depicted in Figure 9.12) or to replicate the source data with your existing table (the strategy depicted in Figure 9.13). When the official data source is in another database, the replication strategy scales better than the direct access strategy because your application will potentially be coupled to only a single database.

3. Direct access strategy. You change your application such that the application now directly accesses CustomerDatabase. Note that this may require a different database connection than what is currently used, or the invocation of a web service or transaction. Following this strategy, you must be able to handle transactions across different databases.

4. Replication strategy. You can set up replication between YourDatabase and CustomerDatabase so that you replicate all the tables that you require. If you want to have updatable tables in YourDatabase, you must use multimaster replication. Following this strategy, you should not have to change your application code as long as both the schema and the semantics of your source data remain the same. If changes are required to your schema, such as renaming tables, or, worse yet, the semantics of the official data are different from those of your current data, you will likely need to change your external programs.

5. Remove tables that are not used anymore. When you choose to directly access the official source data, you no longer have any use for the tables in YourDatabase. You should use the Drop Table (page 77) refactoring to remove those tables.

Figure 9.12. Directly accessing the official customer data from the Customer database.

Figure 9.13. Using official customer data via a replication strategy.

Data-Migration Mechanics
There is no data to migrate for this refactoring if the data semantics of the official CustomerDatabase and the local YourDatabase are the same. If they are not the same, you need to consider backing out of this refactoring, following a replication strategy where you convert the different values back and forth, or refactoring your application(s) to accept the data semantics of the official source. Unless the data is semantically similar, or you get very lucky and find a way to safely convert the data back and forth, it is unlikely that a replication strategy will work for you.
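Why converting values back and forth is fragile can be seen in a small sketch. The status codes below are hypothetical and do not come from the original example; they only illustrate that when two local values map to one official value, the reverse conversion cannot recover the original.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: two local codes collapse into one official code,
// so converting to the official form and back loses information.
class CodeMappingSketch {
    // Local code -> official code (assumed values, for illustration only).
    static final Map<String, String> LOCAL_TO_OFFICIAL = new HashMap<>();
    // Official code -> local code; only one local code can be chosen per official code.
    static final Map<String, String> OFFICIAL_TO_LOCAL = new HashMap<>();

    static {
        LOCAL_TO_OFFICIAL.put("ACTIVE", "OPEN");
        LOCAL_TO_OFFICIAL.put("PENDING", "OPEN"); // many-to-one mapping
        LOCAL_TO_OFFICIAL.put("CLOSED", "CLOSED");
        OFFICIAL_TO_LOCAL.put("OPEN", "ACTIVE");
        OFFICIAL_TO_LOCAL.put("CLOSED", "CLOSED");
    }

    // Converts a local code to the official code and back again.
    static String roundTrip(String localCode) {
        String official = LOCAL_TO_OFFICIAL.get(localCode);
        return OFFICIAL_TO_LOCAL.get(official);
    }

    // True only when the round trip returns the code it started with.
    static boolean survivesRoundTrip(String localCode) {
        return localCode.equals(roundTrip(localCode));
    }

    public static void main(String[] args) {
        for (String code : new String[] {"ACTIVE", "PENDING", "CLOSED"}) {
            System.out.println(code + " -> " + roundTrip(code));
        }
    }
}
```

A check like `survivesRoundTrip`, run over every distinct value in the local table, is one possible way to judge whether a replication strategy is realistic before committing to it.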

Access Program Update Mechanics
The way that you change external programs varies depending on your implementation strategy. If you use the direct access strategy, the programs must now connect to and work with the official data source, CustomerDatabase. With the replication strategy, you must create replication scripts between the appropriate tables in YourDatabase and CustomerDatabase. When the table(s) in YourDatabase match the table(s) in CustomerDatabase, the programs should not need to be changed. Otherwise, you either have to adopt the schema and semantics of the official data source, and then refactor your application(s) appropriately, or you need to figure out a way to implement the replication in such a way that the schema in YourDatabase is not changed. One way to do that is to implement scripts that convert the data back and forth, which is often difficult to accomplish in practice because the mappings are rarely one to one. Another way, equally difficult, is to replace the source table(s) in YourDatabase with the appropriate ones from CustomerDatabase, and then apply the Encapsulate Table With View refactoring (page 243) so that the existing applications can still work with "the old schema." These new view(s) would need to implement the same conversion logic as the previously mentioned scripts. The following code shows how to change your application to work with the official data source by using the crmDB connection instead:

// Before code
stmt = DB.prepare("select CustomerID,Name,PhoneNumber " +
    "FROM Customer WHERE CustomerID = ?");
stmt.setLong(1, customerID);
ResultSet rs = stmt.executeQuery();

// After code
stmt = crmDB.prepare("select CustomerID,Name,PhoneNumber " +
    "FROM Customer WHERE CustomerID = ?");
stmt.setLong(1, customerID);
ResultSet rs = stmt.executeQuery();

Chapter 10. Method Refactorings
This chapter summarizes the code refactorings that we have found applicable to stored procedures, stored functions, and triggers. For the sake of simplicity, we refer to these three types of functionality simply as methods, the term that Martin Fowler used in Refactoring (Fowler 1999). As the name suggests, a method refactoring is a change to a stored procedure that improves its quality. Our goal is to present an overview of these refactorings. You can find detailed descriptions in Refactoring, the seminal work on the subject.

We differentiate between two categories of method refactorings: those that change the interface of the database offered to external programs and those that modify the internals of a method. For example, the refactoring Add Parameter (page 278) would change the interface of a method, whereas Consolidate Conditional Expression (page 283) does not. (It replaces a complex comparison with an invocation to a new method that implements just the comparison.) We distinguish between the two categories because refactorings that change the interface will require external programs to be refactored, whereas those that only modify the internals of a method do not.

10.1. Interface Changing Refactorings
Each of these refactorings includes a transition period to provide development teams sufficient time to update external programs that invoke the methods. Part of the refactoring is to rework the original version of the method(s) to invoke the new version(s) appropriately; after the transition period, the old version is removed.

10.1.1. Add Parameter
As the name suggests, this refactoring adds a new parameter to an existing method. In Figure 10.1, you see an example where MiddleName was added as a third parameter to ReadCustomer. The safest way to add a parameter is to add it to the end of the parameter list; that way, when you refactor the invocations of the method, you do not run the risk of reordering the parameters improperly. If you later decide that the parameters appear in the wrong order, you can apply the Reorder Parameters refactoring (page 281).

Figure 10.1. Adding a parameter.

10.1.2. Parameterize Method
Sometimes two or more methods do basically the same thing. For example, in Figure 10.2, the GetAmericanCustomers, GetCanadianCustomers, and GetBrazilianCustomers stored procedures produce a list of American, Canadian, and Brazilian customers, respectively. Although these separate procedures have the advantage of being specific, thereby ensuring that the right queries are being invoked on the database, they are more difficult to maintain than a single stored procedure. These stored procedures are replaced with the GetCustomerByCountry stored procedure, which takes a country identifier as a parameter. There is now less code to maintain as a result, and it is easy to add support for new countries just by adding new country codes.
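On the calling side, the same consolidation can be sketched as a lookup from each retired procedure name to the country identifier that GetCustomerByCountry would receive. This is only an illustrative sketch: the class name, the helper, and the country codes ("US", "CA", "BR") are assumptions, because the source does not say which identifiers the procedure accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical caller-side helper: maps each retired procedure to the country
// code that the single parameterized procedure would receive instead.
class ParameterizeMethodSketch {
    // Country codes are assumed values; the source does not list them.
    static final Map<String, String> COUNTRY_BY_LEGACY_PROCEDURE = new LinkedHashMap<>();

    static {
        COUNTRY_BY_LEGACY_PROCEDURE.put("GetAmericanCustomers", "US");
        COUNTRY_BY_LEGACY_PROCEDURE.put("GetCanadianCustomers", "CA");
        COUNTRY_BY_LEGACY_PROCEDURE.put("GetBrazilianCustomers", "BR");
    }

    // Returns the argument to pass to GetCustomerByCountry for an old call site.
    static String countryCodeFor(String legacyProcedure) {
        String code = COUNTRY_BY_LEGACY_PROCEDURE.get(legacyProcedure);
        if (code == null) {
            throw new IllegalArgumentException("Unknown procedure: " + legacyProcedure);
        }
        return code;
    }

    public static void main(String[] args) {
        for (String legacy : COUNTRY_BY_LEGACY_PROCEDURE.keySet()) {
            System.out.println(legacy + " -> GetCustomerByCountry('"
                + countryCodeFor(legacy) + "')");
        }
    }
}
```

Supporting another country then means adding one map entry rather than another stored procedure.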

Figure 10.2. Parameterizing a stored procedure.

10.1.3. Remove Parameter
Sometimes a method includes an extra parameter, perhaps because it was originally included in the belief that it would be needed in the future, or perhaps because the business changed and it is no longer required, or perhaps because the parameter value can be obtained in another manner, such as reading it from a lookup table. In Figure 10.3, you see that GetAccountList has an extraneous AsOfDate parameter that is removed to reflect the actual usage of the stored procedure.

Figure 10.3. Removing a parameter.

10.1.4. Rename Method
Sometimes a method is poorly named, or it simply does not follow your corporate naming conventions. For example, in Figure 10.4, the GetAccountList stored procedure is renamed to GetAccountsForCustomer to reflect your standard naming conventions.

Figure 10.4. Renaming a stored procedure.

10.1.5. Reorder Parameters
It is common to discover that the ordering of the parameters of a method does not make sense, perhaps because new parameters have been added in the wrong order via Add Parameter (page 278) or simply because the ordering does not reflect the current business needs. Regardless, you can make a stored procedure easier to understand by ordering its parameters appropriately (see Figure 10.5).

Figure 10.5. Reordering the parameters of a method.

As you see in Figure 10.6, the parameters of GetCustomers were reordered to reflect common business ordering conventions. Note that this is a highly risky example: you cannot have a transition period in this case because the types of all three parameters are identical, and therefore you cannot simply override the method. The implication is that you could reorder the parameters but forget to reorder them in external code that invokes the method. The method would still run, but with the wrong values for the parameters, a tricky defect to find without good test cases. Had the parameters been of different types, external applications that had not been refactored to pass the parameters in the new order would break, and it would be very easy to find the affected code as a result.
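The same hazard exists for any call that binds arguments by position, and a small Java analogue makes it concrete. The method and parameter names below are assumptions for illustration (the figure's actual parameter list is not reproduced here); the point is only that a stale caller compiles, runs, and silently produces the wrong result.

```java
// Hypothetical analogue of the GetCustomers example: every parameter is a String,
// so a caller still using the old order compiles and runs but passes wrong values.
class ReorderHazardSketch {
    // Assumed new parameter order after the refactoring: first, middle, last.
    static String customerKey(String firstName, String middleName, String lastName) {
        return lastName + ", " + firstName + " " + middleName;
    }

    public static void main(String[] args) {
        // Caller already updated to the new order.
        System.out.println(customerKey("Sally", "Ann", "Jones"));
        // Caller that still passes (last, first, middle): no error, wrong result.
        System.out.println(customerKey("Jones", "Sally", "Ann"));
    }
}
```

Only a test that checks the returned value, rather than merely that the call succeeds, would catch the second call.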

Figure 10.6. Reordering the parameters of a method without a transition period.

10.1.6. Replace Parameter with Explicit Methods
Sometimes you will have a single method doing one of several things, based on the value of a parameter being passed to it. For example, in Figure 10.7, you see that GetAccountValue, which is basically a generic getter that returns the value of a column within a row, is replaced with the more specific GetAccountBalance, GetAccountCustomerID, and GetAccountOpeningDate stored procedures.

Figure 10.7. Replacing a parameter with explicit stored procedures.

10.2. Internal Refactorings
Internal refactorings improve the quality of the implementation of a method without changing its interface.

10.2.1. Consolidate Conditional Expression
Within the logic of a method, you often have a series of conditional tests that produce the same result. You should refactor your code to combine those conditionals to make your code clearer, as you see in the following example, in which the three conditionals are combined into one. This refactoring also sets you up to apply Extract Method (page 285) to create a method that implements the conditional:

--Before code
CREATE OR REPLACE FUNCTION GetAccountAverageBalance (
    inAccountID IN NUMBER
) RETURN NUMBER
AS
    averageBalance NUMBER := 0;
BEGIN
    IF inAccountID = 10000 THEN
        RETURN 0;
    END IF;
    IF inAccountID = 123456 THEN
        RETURN 0;
    END IF;
    IF inAccountID = 987654 THEN
        RETURN 0;
    END IF;
    --Code to calculate the average balance
    RETURN averageBalance;
END;

--After code
CREATE OR REPLACE FUNCTION GetAccountAverageBalance (
    inAccountID IN NUMBER
) RETURN NUMBER
AS
    averageBalance NUMBER := 0;
BEGIN
    IF inAccountID = 10000 OR inAccountID = 123456 OR inAccountID = 987654 THEN
        RETURN 0;
    END IF;
    --Code to calculate the average balance
    RETURN averageBalance;
END;

10.2.2. Decompose Conditional
You have a conditional IF-THEN-ELSE expression that is complex and therefore difficult to understand. Replace the condition, then part, and else part with invocations to methods that implement the logic of the expression. In the first version of the following code, the interest rate to apply is determined based on whether the value of inBalance is below, within, or over a certain threshold. In the second version, the code is simplified by the addition of calls to BalanceIsSufficient, CalculateLowInterest, and CalculateHighInterest. (These functions are not shown.)

--Before code
CREATE OR REPLACE FUNCTION CalculateInterest (
    inBalance IN NUMBER
) RETURN NUMBER
AS
    lowBalance NUMBER;
    highBalance NUMBER;
    lowInterestRate NUMBER;
    highInterestRate NUMBER;
BEGIN
    lowBalance := GetLowBalance();
    highBalance := GetHighBalance();
    lowInterestRate := GetLowInterestRate();
    highInterestRate := GetHighInterestRate();
    IF inBalance < lowBalance THEN
        RETURN 0;
    END IF;
    IF inBalance >= lowBalance && inBalance = 500 THEN interest := medianBalance * 0.01; END IF; RETURN interest; END;

--Intermediate version of the code
CREATE OR REPLACE FUNCTION CalculateAccountInterest (
    inAccountID IN NUMBER,
    inStart IN DATE,
    inEnd IN DATE
) RETURN NUMBER
AS
    medianBalance NUMBER;
    startBalance NUMBER;
    endBalance NUMBER;
    interest NUMBER := 0;
BEGIN
    startBalance := GetDailyEndBalance ( inAccountID, inStart );
    BEGIN
        --Determine the ending balance
        SELECT Balance INTO endBalance
        FROM DailyEndBalance
        WHERE AccountID = inAccountID AND PostingDate = inEnd;
    EXCEPTION WHEN NO_DATA_FOUND THEN
        endBalance := 0;
    END;
    medianBalance := ( startBalance + endBalance ) / 2;
    IF medianBalance < 0 THEN
        medianBalance := 0;
    END IF;
    IF medianBalance >= 500 THEN
        interest := medianBalance * 0.01;
    END IF;
    RETURN interest;
END;

CREATE OR REPLACE FUNCTION GetDailyEndBalance (
    inAccountID IN NUMBER,
    inDate IN DATE
) RETURN NUMBER
AS
    endBalance NUMBER;
BEGIN
    BEGIN
        SELECT Balance INTO endBalance
        FROM DailyEndBalance
        WHERE AccountID = inAccountID AND PostingDate = inDate;
    EXCEPTION WHEN NO_DATA_FOUND THEN
        endBalance := 0;
    END;
    RETURN endBalance;
END;

--Final version of the code
CREATE OR REPLACE FUNCTION CalculateAccountInterest (
    inAccountID IN NUMBER,
    inStart IN DATE,
    inEnd IN DATE
) RETURN NUMBER
AS
    medianBalance NUMBER;
    startBalance NUMBER;
    endBalance NUMBER;
BEGIN
    startBalance := GetDailyEndBalance ( inAccountID, inStart );
    endBalance := GetDailyEndBalance ( inAccountID, inEnd );
    medianBalance := CalculateMedianBalance ( startBalance, endBalance );
    RETURN CalculateInterest ( medianBalance );
END;
--Code for GetDailyEndBalance, CalculateMedianBalance, and CalculateInterest goes here

10.2.4. Introduce Variable
Your method code contains a complicated expression that is difficult to read. You can introduce well-named variables in your code that are used to implement portions of the expression and then are composed to form the original expression, thereby making the method easier to understand. This refactoring is the equivalent of Martin Fowler's Introduce Explaining Variable (Fowler 1999). When we spoke with him, he suggested that we use the simpler name; in hindsight, he realized that he should have originally called this refactoring Introduce Variable, and most tool vendors also use the simpler name:

--Before code
CREATE OR REPLACE FUNCTION DetermineAccountStatus (
    inAccountID IN NUMBER,
    inStart IN DATE,
    inEnd IN DATE
) RETURN VARCHAR
AS
    lastAccessedDate DATE;
BEGIN
    --Some code to calculate lastAccessedDate
    IF ( inStart < lastAccessedDate AND inEnd > lastAccessedDate )
        AND ( inAccountID > 10000 )
        AND ( inAccountID != 123456 AND inAccountID != 987654 ) THEN
        --do something
    END IF;
    --do another thing
END;

--After code
CREATE OR REPLACE FUNCTION DetermineAccountStatus (
    inAccountID IN NUMBER,
    inStart IN DATE,
    inEnd IN DATE
) RETURN VARCHAR
AS
    lastAccessedDate DATE;
    isBetweenDates BOOLEAN;
    isValidAccountID BOOLEAN;
    isNotTestAccount BOOLEAN;
BEGIN
    --Some code to calculate lastAccessedDate
    isBetweenDates := inStart < lastAccessedDate AND inEnd > lastAccessedDate;
    isValidAccountID := inAccountID > 10000;
    isNotTestAccount := inAccountID != 123456 AND inAccountID != 987654;
    IF isBetweenDates AND isValidAccountID AND isNotTestAccount THEN
        --do something
    END IF;
    --do another thing
END;

10.2.5. Remove Control Flag
You have a variable acting as a control flag to break out of a control construct such as a loop. You can simplify your code by replacing the use of the control flag with an examination of the expression within the control construct. As you see in the following example, when you remove the controlFlag logic, the code is simplified:

--Before code
DECLARE
    controlFlag NUMBER := 0;
    anotherVariable NUMBER := 0;
BEGIN
    WHILE controlFlag = 0 LOOP
        --Do something
        IF anotherVariable > 20 THEN
            controlFlag := 1;
        ELSE
            --Do something else
        END IF;
    END LOOP;
END;

--After code
DECLARE
    anotherVariable NUMBER := 0;
BEGIN
    WHILE anotherVariable = 500 THEN interest := medianBalance * 0.01; END IF; RETURN interest; END;

--Intermediate version of the code
CREATE OR REPLACE FUNCTION CalculateInterest (
    inBalance IN NUMBER
) RETURN NUMBER
AS
    interest NUMBER := 0;
    minimumBalance NUMBER;
BEGIN
    BEGIN
        SELECT MinimumBalanceForInterest INTO minimumBalance
        FROM CorporateBusinessConstants
        WHERE RowNumber = 1;
    EXCEPTION WHEN NO_DATA_FOUND THEN
        minimumBalance := 0;
    END;
    IF inBalance >= minimumBalance THEN
        interest := medianBalance * 0.01;
    END IF;
    RETURN interest;
END;

--Final version of the code
CREATE OR REPLACE FUNCTION CalculateInterest (
    inBalance IN NUMBER
) RETURN NUMBER
AS
    interest NUMBER := 0;
    minimumBalance NUMBER;
    interestRate NUMBER;
BEGIN
    minimumBalance := GetMinimumBalance();
    interestRate := GetInterestRate();
    IF inBalance >= minimumBalance THEN
        interest := medianBalance * interestRate;
    END IF;
    RETURN interest;
END;

10.2.9. Replace Nested Conditional with Guard Clauses
Nested conditional statements can be difficult to understand. In the following example, the nested IF statements were replaced with a series of separate IF statements to improve the readability of the code:

--Before code
BEGIN
    IF condition1 THEN
        --do something 1
    ELSE
        IF condition2 THEN
            --do something 2
        ELSE
            IF condition3 THEN
                --do something 3
            END IF;
        END IF;
    END IF;
END;

--After code
BEGIN
    IF condition1 THEN
        --do something 1
        RETURN;
    END IF;
    IF condition2 THEN
        --do something 2
        RETURN;
    END IF;
    IF condition3 THEN
        --do something 3
        RETURN;
    END IF;
END;

10.2.10. Split Temporary Variable
You have a temporary variable being used for more than one purpose within your method. This has likely prevented you from giving it a meaningful name, or perhaps it has a meaningful name for one purpose but not the other. The solution is to introduce a temporary variable for each purpose, as you see in the following code, which originally uses aTemporaryVariable as a repository for imperial values converted into metric values:

--Before code
DECLARE
    aTemporaryVariable NUMBER := 0;
    fahrenheitTemperature NUMBER := 0;
    lengthInInches NUMBER := 0;
BEGIN
    --retrieve fahrenheitTemperature
    aTemporaryVariable := (fahrenheitTemperature - 32) * 5 / 9;
    --do something
    --retrieve lengthInInches
    aTemporaryVariable := lengthInInches * 2.54;
    --do something
END;

--After code
DECLARE
    celsiusTemperature NUMBER := 0;
    fahrenheitTemperature NUMBER := 0;
    lengthInCentimeters NUMBER := 0;
    lengthInInches NUMBER := 0;
BEGIN
    --retrieve fahrenheitTemperature
    celsiusTemperature := (fahrenheitTemperature - 32) * 5 / 9;
    --do something
    --retrieve lengthInInches
    lengthInCentimeters := lengthInInches * 2.54;
    --do something
END;

10.2.11. Substitute Algorithm
If you find a clearer way to write the logic of an algorithm implemented as a method, you should do it. If you need to change an existing algorithm, it is often easier to simplify it first and then update it. Because it is hard to rewrite a complicated algorithm, you should first simplify it via other refactorings, such as Extract Method (page 285), before applying Substitute Algorithm.

Chapter 11. Transformations
Transformations are changes that alter the semantics of your database schema by adding new features to it. The transformations described in this chapter are as follows:

Insert Data
Introduce New Column
Introduce New Table
Introduce View
Update Data

Insert Data
Insert data into an existing table.

Motivation
You typically need to apply Insert Data as the result of structural changes within your table design. You may need to apply Insert Data as the result of the following:

Table reorganization. When you are using the Rename Table (page 113), Merge Tables (page 96), Split Table (page 145), or Drop Table (page 77) refactorings, you may have to use Insert Data to reorganize the data in the existing tables.

Provide static lookup data. All applications need static lookup data: for example, tables listing the states/provinces (for instance, Illinois and Ontario) that you do business in, a list of address types (for instance, home, business, vacation), and a list of account types (for instance, checking, savings, and investment). If your application does not include administrative editing screens for maintaining these lists, you need to insert this data manually.

Create test data. During development, you need known data values inserted into your development database(s) to support your testing efforts.

Potential Tradeoffs
Inserting new data into tables can be tricky, especially when you are going to insert lookup data that is referenced from one or more other tables. For example, assume the Address table references data in the State table, which you are inserting new data into. The data that you insert into State must contain valid values that will appear in Address.
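One way to check this before running the insert is to compare the codes that Address rows reference with the codes you plan to load into State. The following is a minimal sketch in plain Java; the class name, method name, and sample codes are assumptions for illustration, and in practice both sets would be read from the database.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-insert check: which codes used by Address would still be
// missing from State after the planned insert?
class LookupCoverageSketch {
    // Returns the referenced codes that the planned State rows do not cover.
    static Set<String> missingCodes(Set<String> referencedByAddress,
                                    Set<String> plannedStateCodes) {
        Set<String> missing = new TreeSet<>(referencedByAddress);
        missing.removeAll(plannedStateCodes);
        return missing;
    }

    public static void main(String[] args) {
        // Sample values only; real code would query Address and the insert script.
        Set<String> referenced = new HashSet<>(Arrays.asList("IL", "ON", "NY"));
        Set<String> planned = new HashSet<>(Arrays.asList("IL", "ON"));
        System.out.println("Missing from State insert: "
            + missingCodes(referenced, planned));
    }
}
```

An empty result suggests the planned lookup rows cover every value that Address currently uses.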

Schema Update Mechanics
To update the database schema, you must do the following:

1. Identify the data to insert. This includes identifying any dependencies. If this is part of moving data from another table, should the source row be deleted? Is the inserted data new, and if so, have the values been accepted by your project stakeholder(s)?

2. Identify the data destination. Which table is the data being inserted into?

3. Identify the data source. Is the data coming from another table or is it a manual insert (for example, you are writing a script to create the data)?

4. Identify transformation needs. Does the source data need to be translated before it can be inserted into the target table? For example, you may have a list of metric measurements that need to be converted into imperial values before being inserted into a lookup table.
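The transformation step in item 4 can be sketched as a small script generator. The LengthLookup table and LengthInInches column below are hypothetical names chosen for illustration; the sketch converts centimeter values to inches and emits one INSERT statement per value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical generator: converts metric source values to imperial values
// and builds the INSERT statements for an assumed lookup table.
class MetricToImperialSketch {
    static final double CENTIMETERS_PER_INCH = 2.54;

    static double toInches(double centimeters) {
        return centimeters / CENTIMETERS_PER_INCH;
    }

    // Builds one INSERT per value for the assumed LengthLookup(LengthInInches) table.
    static List<String> buildInserts(double[] centimeterValues) {
        List<String> statements = new ArrayList<>();
        for (double cm : centimeterValues) {
            // Locale.ROOT keeps the decimal separator a period regardless of the JVM locale.
            statements.add(String.format(Locale.ROOT,
                "INSERT INTO LengthLookup (LengthInInches) VALUES (%.2f);",
                toInches(cm)));
        }
        return statements;
    }

    public static void main(String[] args) {
        for (String statement : buildInserts(new double[] {2.54, 25.4, 100.0})) {
            System.out.println(statement);
        }
    }
}
```

Generating the script rather than typing the converted values by hand keeps the conversion rule in one place, which helps if the stakeholders later ask for a different precision.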

Data-Migration Mechanics
When small amounts of data need to be inserted, you will likely find that a simple SQL script that inserts the source data into the target location is sufficient. For large amounts of data, you need to take a more sophisticated approach, such as using a database utility like Oracle's SQLLDR or a bulk loader (because of the time it will take to insert the data). Figure 11.1 shows how we insert a new record into the AccountType table representing brokerage accounts. This AccountType supports new functionality that needs to first be tested and then moved to production later on. The following code depicts the DML to insert data into the AccountType table:

Figure 11.1. Inserting a new AccountType.

INSERT INTO AccountType (AccountTypeId,Name,EffectiveDate) VALUES (6,'Brokerage','Feb 1 2007');

Access Program Update Mechanics
When Insert Data is applied as the result of another database refactoring, the applications should have been updated to reflect that refactoring. However, when Insert Data is applied to support the addition of data within an existing schema, it is possible that external application code will need to be changed.

You need to update WHERE clauses. For example, in Figure 11.1, we inserted Brokerage into the AccountType table. You may need to update SELECT statements to not read this value when you only want to work with standard banking accounts (everything but brokerage), as you see in the following code. You can also use Introduce View (page 306) to create a specific view that the application uses to return this subset of data:

// Before code
stmt.prepare(
    "SELECT * FROM AccountType " +
    "WHERE AccountTypeId NOT IN (?,?)");
stmt.setLong(1, PRIVATEACCOUNT.getId());
stmt.setLong(2, MONEYMARKETACCOUNT.getId());
ResultSet standardAccountTypes = stmt.executeQuery();

//After code
stmt.prepare(
    "SELECT * FROM AccountType " +
    "WHERE AccountTypeId NOT IN (?,?,?)");
stmt.setLong(1, PRIVATEACCOUNT.getId());
stmt.setLong(2, MONEYMARKETACCOUNT.getId());
stmt.setLong(3, BROKERAGE.getId());
ResultSet standardAccountTypes = stmt.executeQuery();
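Because the number of placeholders changes every time an excluded account type is added, one option is to build the NOT IN list from the number of excluded IDs instead of hard-coding it. This is a sketch under that assumption, not the book's approach; the class and method names are hypothetical.

```java
import java.util.Collections;

// Hypothetical helper: builds the exclusion clause with one "?" per excluded ID,
// so adding Brokerage changes only the list of IDs, not the SQL text.
class ExclusionClauseSketch {
    static String notInClause(int excludedCount) {
        if (excludedCount < 1) {
            throw new IllegalArgumentException("At least one excluded ID is required");
        }
        return "WHERE AccountTypeId NOT IN ("
            + String.join(",", Collections.nCopies(excludedCount, "?")) + ")";
    }

    public static void main(String[] args) {
        System.out.println("SELECT * FROM AccountType " + notInClause(2));
        System.out.println("SELECT * FROM AccountType " + notInClause(3));
    }
}
```

The caller would then bind each excluded ID in a loop, using the loop index plus one as the parameter position.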

Similarly, you need to update source code that validates the values of data attributes. For example, you may have code that defines premium accounts as being either of type Private or Money Market in your application logic. You now have to add Brokerage to this list, as you see in the following code:

//Before code
public enum PremiumAccountType {
    PRIVATEACCOUNT(new Long(3)),
    MONEYMARKET(new Long(4));

    private Long id;

    public Long getId() {
        return id;
    }

    PremiumAccountType(Long value) {
        this.id = value;
    }

    public static Boolean isPremiumAccountType(Long idToFind) {
        for (PremiumAccountType premiumAccountType : PremiumAccountType.values()) {
            if (premiumAccountType.id.equals(idToFind))
                return Boolean.TRUE;
        }
        return Boolean.FALSE;
    }
}

//After code
public enum PremiumAccountType {
    PRIVATEACCOUNT(new Long(3)),
    MONEYMARKET(new Long(4)),
    BROKERAGE(new Long(6));

    private Long id;

    public Long getId() {
        return id;
    }

    PremiumAccountType(Long value) {
        this.id = value;
    }

    public static Boolean isPremiumAccountType(Long idToFind) {
        for (PremiumAccountType premiumAccountType : PremiumAccountType.values()) {
            if (premiumAccountType.id.equals(idToFind))
                return Boolean.TRUE;
        }
        return Boolean.FALSE;
    }
}

Introduce New Column
Introduce a new column in an existing table.

Motivation
There are several reasons why you may want to apply the Introduce New Column transformation:

To persist a new attribute. A new requirement may necessitate the addition of a new column in your database.

Intermediate step of a refactoring. Many database refactorings, such as Move Column (page 103) and Rename Column (page 109), include a step in which you introduce a new column into an existing table.

Potential Tradeoffs You need t o ensure t hat t he colum n does not exist elsewhere; ot herwise, you are at risk of increasing referent ial int egrit y problem s due t o great er dat a redundancy.

Schema Update Mechanics As depict ed in Figure 11.2, t o updat e t he dat abase schem a you m ust sim ply int roduce Count ryCode via t he AD D clause of t he ALTER TABLE SQL com m and. The St at e.Count ryCode colum n is a foreign key reference t o t he Count ry t able ( not shown) , enabling our applicat ion( s) t o t rack st at es by count ry. The following code depict s t he DDL t o int roduce t he St at e.Count ryCode colum n. To creat e t he referent ial const raint , apply t he Add Foreign Key Const raint refact oring ( page 204) :

Figu r e 1 1 .2 . I n t r odu cin g t h e St a t e .Cou n t r yCode colu m n .

ALTER TABLE State ADD Country Code VARCHAR2(3) NULL;

Data-Migration Mechanics

Although this is not a data migration per se, a primary challenge with this transformation is populating the column with data values after you have added it to the table. You need to do the following:

1. Work with your project stakeholders to identify the appropriate values.
2. Either manually input the new values into the column or write a script to automatically populate the column (or a combination of the two strategies).
3. Consider applying the refactorings Introduce Default Value (page 186), Drop Non-Nullable (page 177), or Make Column Non-Nullable (page 189), as appropriate, to this new column.

The following code depicts the DML to populate State.CountryCode with the initial value of 'USA':

    UPDATE State SET CountryCode = 'USA' WHERE CountryCode IS NULL;

Access Program Update Mechanics

The update mechanics are simple: all you must do is use the new column in your application(s). In the following code, we show an example of how Hibernate O/R mapping metadata would be affected:

//Before mapping

//After mapping
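On the Java side, the class mapped to the State table would typically gain a property for the new column. The following is a minimal sketch under that assumption; the State class, its name field, and the accessor names are hypothetical and are not taken from the original mapping. The three-character limit mirrors the VARCHAR2(3) declaration in the DDL above.

```java
// Hypothetical persistent class for the State table (illustrative only).
class State {
    private String name;        // assumed pre-existing property
    private String countryCode; // new property for State.CountryCode; nullable

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getCountryCode() {
        return countryCode;
    }

    public void setCountryCode(String countryCode) {
        // The column is VARCHAR2(3), so reject longer values before they reach the database.
        if (countryCode != null && countryCode.length() > 3) {
            throw new IllegalArgumentException("countryCode is limited to 3 characters");
        }
        this.countryCode = countryCode;
    }
}
```

Existing rows load with a null countryCode until the column has been populated, which is why the property is left nullable here.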

Introduce New Table

Introduce a new table in an existing database.

Motivation

There are several reasons why you may want to apply the Introduce New Table transformation:

- To persist a new attribute. A new requirement may necessitate the addition of a new table in your database.
- As an intermediate step of a refactoring. Many database refactorings, such as Split Table (page 145) and Rename Table (page 113), include a step in which you introduce a new table in place of an existing table.
- To introduce a new official data source. It is quite common to discover that similar information is stored in several tables. For example, there may be several sources of customer information, sources that are often out of synchronization with each other and sometimes even contradictory. In this scenario, you must apply Use Official Data Source (page 271), and then over time apply Drop Table (page 77) to the original source tables.
- Need to back up data. While implementing some refactorings, such as Drop Table (page 77) or Merge Tables (page 96), you may need to create a table to hold table data for intermediate usages or for backup purposes.

Potential Tradeoffs

The primary tradeoff is that you need to ensure that the table you want to introduce does not exist elsewhere already. Often, the exact table that you require will not exist, but something close will; you may discover that it is easier to refactor that existing table than it is to add a new table containing redundant information.

Schema Update Mechanics

As depicted in Figure 11.3, to update the database schema you simply introduce CustomerIdentification via the CREATE TABLE SQL command. The following code depicts the DDL to introduce the CustomerIdentification table. The CustomerIdentification.CustomerID column is a foreign key reference to the Customer table (not shown), enabling our application(s) to track the various ways to identify individual customers:

Figure 11.3. Introducing the CustomerIdentification table.

    CREATE TABLE CustomerIdentification (
      CustomerID NUMBER NOT NULL,
      Photo BLOB,
      PassportID NUMBER,
      CONSTRAINT PK_CustomerIdentification
        PRIMARY KEY (CustomerID)
    );

Data-Migration Mechanics

Although this is not a data migration per se, a primary challenge with this transformation is populating the table with data values after you have added it to the database. You need to do the following:

1. Work with your project stakeholders to identify the appropriate values.
2. Either manually input the new values into the table or write a script to automatically populate the table (or a combination of the two strategies).
3. Consider applying the Insert Data transformation (page 296).

Access Program Update Mechanics

Ideally, the update mechanics are straightforward because all you should have to do is use the new table in your application(s). However, when you are introducing a new table as a replacement for several other tables, you will often discover that your new table has a slightly different schema, and worse yet implements slightly different data semantics, than the existing tables. When this is the case, you need to refactor the access programs to work with the new version.

Introduce View

Create a view based on existing tables in the database.

Motivation

There are several reasons why you may want to apply Introduce View:

- Summarize data for reporting. Many reports require summary data, which can be generated via the view definition.
- Replace redundant reads. Several external programs, or stored procedures for that matter, often implement the same retrieval query. These queries can be replaced by a common read-only table or view.
- Data security. A view can be used to provide end users with read access to data but not update privileges.
- Encapsulate access to a table. Some organizations choose to encapsulate access to tables by defining updateable views that external programs access instead of the source tables. This enables the organization to easily perform database refactorings such as Rename Column (page 109) or Rename Table (page 113) without impacting the external applications, because views add an encapsulation layer between your tables and your application.
- Reduce SQL duplication. When you have complex SQL queries in an application, it is common to discover that parts of the SQL are duplicated in many places. When this is the case, you should introduce views to extract out the duplicate SQL, as shown in the example section.

Potential Tradeoffs

There are two primary challenges with introducing a view. First, the performance of the joins defined by the view may not be acceptable to your end users, requiring a different approach such as the refactoring Introduce Read-Only Table (page 251). Second, the addition of a new view increases the coupling within your database schema; as you can see in Figure 11.4, the view definition depends on the table schema(s) definitions.

Figure 11.4. Introducing the CustomerPortfolio view.

Schema Update Mechanics

To update the database schema when you perform Introduce View, you simply introduce the view via the CREATE VIEW command. Figure 11.4 depicts an example where the CustomerPortfolio view is based on the Customer, Account, and Insurance tables to summarize the business that we do with each customer. This provides an easier way for end users to do ad-hoc queries. The following code depicts the DDL to introduce the CustomerPortfolio view:

    CREATE VIEW CustomerPortfolio (
      CustomerID,
      Name,
      PhoneNumber,
      AccountsTotalBalance,
      InsuranceTotalPayment,
      InsuranceTotalValue
    ) AS
    SELECT
      Customer.CustomerID,
      Customer.Name,
      Customer.PhoneNumber,
      SUM(Account.Balance),
      SUM(Insurance.Payment),
      SUM(Insurance.Value)
    FROM Customer, Account, Insurance
    WHERE Customer.CustomerID=Account.CustomerID
      AND Customer.CustomerID=Insurance.CustomerID
    -- Non-aggregated columns must be listed in GROUP BY when the select list uses SUM
    GROUP BY Customer.CustomerID, Customer.Name, Customer.PhoneNumber;

Data-Migration Mechanics

There is no data to migrate with this database refactoring.

Access Program Update Mechanics

The update mechanics depend on the situation. If you are introducing the view to replace common SQL retrieval code within access programs, you must refactor that code accordingly to use the new view instead. If you are introducing the view for reporting purposes, new reports should be written to take advantage of that view. For programs accessing the view, the view looks like a read-only table.

When SQL code in your application is duplicated, there is a greater potential for bugs because when you change the SQL in one place, you must make similar changes to all of the duplicates. For example, let's consider instances of application SQL code:

    SELECT Customer.CustomerID,
      SUM(Insurance.Payment),
      SUM(Insurance.Value)
    FROM Customer, Insurance
    WHERE Customer.CustomerID=Insurance.CustomerID
      AND Customer.Status = 'ACTIVE'
      AND Customer.InBusinessSince 1)

In Figure A.3, we would also say that a customer accesses zero or more accounts and that an account is accessed by one or more customers. Although there is a many-to-many association between the customer and account entities, we cannot natively implement this in a relational database, hence the addition of the CustomerAccount associative table to maintain the relationship. Furthermore, an order item is part of a single order, and an order is made up of one or more order items. The diamond on the end of the line indicates that the relationship between Order and OrderItem is one of aggregation, also called a "part of" association. When the multiplicity is not indicated beside the diamond, a 1 is assumed. Figure A.4 presents several more examples of relationships between tables and how to read them.
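To make the role of the associative table concrete, the following in-memory sketch models CustomerAccount as a set of (customerId, accountId) rows and answers the association in both directions. It is an illustration only: the CustomerAccountTable class, its method names, and the sample IDs are assumptions, and a real schema would hold these rows in the database with foreign keys to the Customer and Account tables.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the CustomerAccount associative table.
final class CustomerAccountTable {

    // One row per customer/account pairing, like one row in the table.
    private static final class Row {
        final long customerId;
        final long accountId;

        Row(long customerId, long accountId) {
            this.customerId = customerId;
            this.accountId = accountId;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Row)) {
                return false;
            }
            Row row = (Row) other;
            return customerId == row.customerId && accountId == row.accountId;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(customerId) + Long.hashCode(accountId);
        }
    }

    // A set behaves like a composite primary key: duplicate pairings are ignored.
    private final Set<Row> rows = new LinkedHashSet<>();

    void link(long customerId, long accountId) {
        rows.add(new Row(customerId, accountId));
    }

    // "A customer accesses zero or more accounts."
    Set<Long> accountsOf(long customerId) {
        Set<Long> result = new TreeSet<>();
        for (Row row : rows) {
            if (row.customerId == customerId) {
                result.add(row.accountId);
            }
        }
        return result;
    }

    // "An account is accessed by one or more customers."
    Set<Long> customersOf(long accountId) {
        Set<Long> result = new TreeSet<>();
        for (Row row : rows) {
            if (row.accountId == accountId) {
                result.add(row.customerId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CustomerAccountTable table = new CustomerAccountTable();
        table.link(1L, 10L);
        table.link(1L, 11L);
        table.link(2L, 10L);
        System.out.println(table.accountsOf(1L));
        System.out.println(table.customersOf(10L));
    }
}
```

Neither side stores a collection of the other; each pairing is its own row, which is how the many-to-many association is resolved into two one-to-many relationships.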

Figure A.4. Examples of relationships.

- A customer owns zero or more policies, and a policy is owned by one and only one customer.
- An order is made up of one or more order items, and an order item is part of a single order.
- A customer accesses zero or more accounts, and an account is accessed by one or more customers.
- A policy note will be written for a single policy, and a policy may have a single policy note written for it.

Figure A.5 overviews the notation for several other common database concepts:

Figure A.5. Notation for modeling stored procedures, views, and indices.

- Stored procedures. Stored procedures are presented in a two-section rectangle. The top section indicates the name of the database and the stereotype of Stored Procedures. The bottom section lists the signatures of the stored procedures (or functions) within the database.
- Indices. An index is modeled as a box with the name of the index, the stereotype Index, and dependency relationships pointing to the column(s) upon which the index is based.
- Views. A view is depicted as a two-section rectangle. The top section indicates the name of the view and the stereotype View. The bottom section, which is optional, lists the columns contained within the view.

Glossary

This glossary describes the major terms as we have used them within the context of this book.

Agile Data (AD) method
A collection of philosophies and roles for defining how data-oriented development activities can be applied in an agile manner.

Agile Model-Driven Development (AMDD)
A highly iterative approach to development in which you create agile models before you write source code.

Agile Modeling (AM)
A chaordic, practices-based methodology that describes how to be effective at modeling and documentation.

Agile software development
An evolutionary and highly collaborative approach to development in which the focus is on delivering high-quality, tested software that meets the highest-priority needs of its stakeholders on a regular basis.

Agile Unified Process (AUP)
An instantiation of the Unified Process (UP) that applies common agile practices such as database refactoring, Test-Driven Development, and Agile Model-Driven Development.

Architectural refactoring
A change that improves the overall manner in which external programs interact with a database.

Artifact
A document, model, file, diagram, or other item that is produced, modified, or used during the development, operation, or support of a system.

Behavioral semantics
The meaning of the functionality implemented within your database.

Code smell
A common category of problem in your source code that indicates the need to refactor it.

Conceptual/domain model
A model that depicts the main business entities and the relationships between them, and optionally the attributes or responsibilities of those entities.

Coupling
A measure of the dependence between two items; the more highly coupled two things are, the greater the chance that a change in one will require a change in another.

Crystal
A self-adapting family of "human-powered" agile methodologies developed by Alistair Cockburn.

Data access object (DAO)
A class that implements the necessary database code to persist a corresponding business class.

Database refactoring (noun)
A simple change to a database schema that improves its design while retaining both its behavioral and informational semantics; in other words, you can neither add new functionality nor break existing functionality, nor can you add new data nor change the meaning of existing data.

Database refactoring (verb)
The process by which you evolve an existing database schema a small bit at a time to improve the quality of its design without changing its semantics.

Database regression testing
A process in which you ensure that the database schema actually works by developing and then regularly running a test suite against it.

Database schema
The structural aspects, such as table and view definitions, and the functional aspects, such as database methods, of a database.

Database smell
A common problem within a database schema that indicates the potential need to refactor it.

Database transformation
A change to your database schema that may or may not change the semantics of the schema. A database refactoring is a kind of database transformation.

Data definition language (DDL)
Commands supported by a database that enable the creation, removal, or modification of structures (such as relational tables or classes) within it.

Data manipulation language (DML)
Commands supported by a database that enable the access of data within it, including the creation, retrieval, update, and deletion of that data.

Data quality refactoring
A change that improves the quality of the information contained within a database.

Demo sandbox
A technical environment into which you deploy software to demonstrate it to people outside of your immediate development team.

Deployment window
A specific point in time during which it is permissible to deploy a system into production. Often called a release window.

Deprecation period
See Transition period.

Development sandbox
A technical environment in which IT professionals write, test, and build software.

Dynamic System Development Method (DSDM)
An agile method that is a formalization of the Rapid Application Development (RAD) methods of the 1980s.

Enterprise Unified Process (EUP)
An extension to the Rational Unified Process (RUP) which addresses the cross-project/system needs.

Evolutionary data modeling
A process in which you model the data aspects of a system iteratively and incrementally, to ensure that the database schema evolves in step with the application code.

Evolutionary software development
An approach in which you work both iteratively and incrementally.

Extract-transform-load (ETL)
An approach to moving data from one source to another in which you "cleanse" it during the process.

Extreme Programming (XP)
A disciplined and deliberate agile development method that focuses on the critical activities required to build software.

Feature-Driven Development (FDD)
An agile development method based on short iterations that is driven by a shared object domain model and features (small requirements).

Incremental software development
An approach to software development that organizes a project into several releases instead of one "big-bang" release.

Informational semantics
The meaning of the information within the database from the point of view of the users of that information.

Iteration
A period of time, often a week or two, during which working software is written. Also called a development cycle.

Iterative software development
A nonserial approach to development in which you are likely to do some requirements definition, some modeling, some programming, or some testing on any given day.

Method
Within a database, a stored procedure, stored function, or trigger.

Method refactoring
A change to a method that improves its quality. Many code refactorings are applicable to database methods.

Model storming
A short burst of modeling, often 5 to 15 minutes in duration, in which two or more people work together to explore part of the problem or solution domain. Model storming sessions are immediately followed by coding sessions (often several hours or days in length).

Multi-application database
A database that is accessed by several applications, one or more of which are outside the scope of your control.

Object-relational mapping (ORM)
The definition of a relationship(s) between the data aspects of an object schema (for example, Java classes) and a relational database schema.

Production environment
The technical environment in which end users run one or more systems.

Production phase
The portion of the system life cycle where the system is run by its end users.

Project integration sandbox
A technical environment where the code of all project team members is compiled and tested.

Rapid Application Development (RAD)
An approach to software development that is highly evolutionary in nature and that typically involves significant amounts of user interface prototyping.

Rational Unified Process (RUP)
A rigorous, four-phase software development process created by IBM Rational that is evolutionary in nature.

Refactoring (noun)
A simple change to source code that retains its behavioral semantics: You neither add functionality when you are refactoring nor take it away. A refactoring merely improves the design of your code, nothing more and nothing less.

Refactoring (verb)
A programming technique that enables you to evolve your code slowly over time, to take an evolutionary approach to programming.

Referential integrity
The assurance that a reference from one entity to another entity is valid. If entity A references entity B, entity B exists. If entity B is removed, all references to entity B must also be removed.

Referential integrity refactoring
A change that ensures that a referenced row exists within another table and/or ensures that a row that is no longer needed is removed appropriately.

Regression test suite
A collection of tests that are run against a system on a regular basis to validate that it works according to the tests.

Sandbox
A fully functioning environment in which a system may be built, tested, and/or run.

Scrum
An agile method whose focus is on project management and requirements management. Scrum is often combined with XP.

Serial development
An approach in which fairly detailed models are created before implementation is "allowed" to begin. Also known as waterfall development.

Single-application database
A database that is accessed by a single application that is "owned" by the same team that owns the database.

Standup meeting
A short meeting that is held with team members standing up and giving status reports on tasks they did yesterday and plan to do today, problems they faced, architecture they changed, and other important things that need to be communicated to the team.

Stereotype
A UML construct that denotes a common use of a modeling element. Stereotypes are used to extend the UML in a consistent manner.

Structural refactoring
A change to the definition of one or more tables or views.

Test-Driven Development (TDD)
The combination of TFD and refactoring.

Test-First Development (TFD)
An evolutionary approach to development in which you must first write a test that fails before you write new functional code so that the test passes. This is also known as Test-First Programming.

Transaction
A single unit of work that either completely succeeds or completely fails. A transaction may be one or more updates to an object, one or more reads, one or more deletes, one or more insertions, or any combination thereof.

Transition period
The time during which both the old schema and the new schema are supported in parallel. Also referred to as a deprecation period.

Trigger
A database method that is automatically invoked as the result of Data Manipulation Language (DML) activity within a persistence mechanism.

Unified Modeling Language (UML)
The definition of a standard modeling language for object-oriented software, including the definition of a modeling notation and the semantics for applying it, as defined by the Object Management Group (OMG).

Waterfall development
See Serial development.

XUnit
A family of unit testing tools, including JUnit for Java, VBUnit for Visual Basic, NUnit for .NET, and OUnit for Oracle.


List of Refactorings and Transformations Add CRUD M e t h ods (page 232) : I nt roduce st ored procedures ( m et hods) t o im plem ent t he creat ion, ret rieval, updat e, and delet ion ( CRUD) of t he dat a represent ing a business ent it y. Add For e ign Ke y Con st r a in t (page 204) : Add a foreign key const raint t o an exist ing t able t o enforce a relat ionship t o anot her t able. Add Look u p Ta ble (page 153) : Creat e a lookup t able for an exist ing colum n. Add M ir r or Ta ble (page 236) : Creat e a m irror t able, an exact duplicat e of an exist ing t able in one dat abase, in anot her dat abase. Add Pa r a m e t e r (page 278) : Exist ing m et hod needs inform at ion t hat was not passed in before. Add Re a d M e t h od (page 240) : I nt roduce a m et hodin t his case, a st ored proceduret o im plem ent t he ret rieval of t he dat a represent ing zero or m ore business ent it ies from t he dat abase. Add Tr igge r For Ca lcu la t e d Colu m n (page 209) : I nt roduce a new t rigger t o updat e t he value cont ained in a calculat ed colum n. Apply St a n da r d Code s (page 157) : Apply a st andard set of code values t o a single colum n t o ensure t hat it conform s t o t he values of sim ilar colum ns st ored elsewhere in t he dat abase. Apply St a n da r d Type (page 162) : Ensure t hat t he dat a t ype of a colum n is consist ent wit h t he dat a t ype of ot her sim ilar colum ns wit hin t he dat abase. Con solida t e Con dit ion a l Ex pr e ssion (page 283) : Com bine sequence of condit ional t est s int o a single condit ional expression and ext ract it . Con solida t e Ke y St r a t e gy (page 168) : Choose a single key st rat egy for an ent it y and apply it consist ent ly t hroughout your dat abase. D e com pose Con dit ion a l (page 284) : Ext ract m et hods from t he condit ion. D r op Colu m n (page 72) : Rem ove a colum n from an exist ing t able. 
D r op Colu m n Con st r a in t (page 172) : Rem ove a colum n const raint from an exist ing t able. D r op D e fa u lt Va lu e (page 174) : Rem ove t he default value t hat is provided by a dat abase from an exist ing t able colum n. D r op For e ign Ke y Con st r a in t (page 213) : Rem ove a foreign key const raint from an exist ing t able so t hat a relat ionship t o anot her t able is no longer enforced by t he dat abase. D r op N on - N u lla ble (page 177) : Change an exist ing non- nullable colum n so t hat it accept s null values. D r op Ta ble (page 77) : Rem ove an exist ing t able from t he dat abase. D r op Vie w (page 79) : Rem ove an exist ing view. En ca psu la t e Ta ble W it h Vie w (page 243) : Wrap access t o an exist ing t able wit h a view. Ex t r a ct M e t h od (page 285) : Turn t he code fragm ent int o a m et hod whose nam e explains t he purpose of t he m et hod.

Insert Data (page 296): Insert data into an existing table.
Introduce Calculated Column (page 81): Introduce a new column based on calculations involving data in one or more tables.
Introduce Calculation Method (page 245): Introduce a new method, typically a stored function, which implements a calculation that uses data stored within the database.
Introduce Cascading Delete (page 215): Ensure that the database automatically deletes the appropriate "child records" when a "parent record" is deleted.
Introduce Column Constraint (page 180): Introduce a column constraint in an existing table.
Introduce Common Format (page 183): Apply a consistent format to all the data values in an existing table column.
Introduce Default Value (page 186): Let the database provide a default value for an existing table column.
Introduce Hard Delete (page 219): Remove an existing column which indicates that a row has been deleted and instead actually delete the row.
Introduce Index (page 248): Introduce a new index of either unique or non-unique type.
Introduce New Column (page 301): Introduce a new column in an existing table.
Introduce New Table (page 304): Introduce a new table in an existing database.
Introduce Read-Only Table (page 251): Create a read-only data store based on existing tables in the database.
Introduce Soft Delete (page 222): Introduce a flag to an existing table which indicates that a row has been deleted instead of actually deleting the row.
Introduce Surrogate Key (page 85): Replace an existing natural key with a surrogate key.
Introduce Trigger For History (page 227): Introduce a new trigger to capture data changes for historical or audit purposes.
Introduce Variable (page 287): Put the result of the expression, or parts of the expression, in a temporary variable with a name that explains the purpose.
Introduce View (page 306): Create a view based on existing tables in the database.
Make Column Non-Nullable (page 189): Change an existing column so that it does not accept any null values.
Merge Columns (page 92): Merge two or more columns within a single table.
Merge Tables (page 96): Merge two or more tables into a single table.
Migrate Method From Database (page 257): Rehost an existing database method (a stored procedure, stored function, or trigger) in the application(s) which currently invoke it.
Migrate Method To Database (page 261): Rehost existing application logic in the database.
Move Column (page 103): Migrate a table column, with all of its data, to another existing table.
Move Data (page 192): Move the data contained within a table, either all or a subset of its columns, to another existing table.
Parameterize Method (page 278): Create one method that uses a parameter for the different values.

Remove Control Flag (page 289): Use return or break instead of a variable acting as a control flag.
Remove Middle Man (page 289): Get the caller to call the method directly.
Remove Parameter (page 279): Remove a parameter no longer used by the method body.
Rename Column (page 109): Rename an existing table column with a name that explains its purpose.
Rename Method (page 279): Rename an existing method with a name that explains its purpose.
Rename Table (page 113): Rename an existing table with a name that explains its purpose.
Rename View (page 117): Rename an existing view with a name that explains its purpose.
Reorder Parameters (page 281): Change the order of the parameters of a method.
Replace Column (page 126): Replace an existing non-key column with a new one.
Replace LOB With Table (page 120): Replace a large object (LOB) column that contains structured data with a new table or with new columns in the same table.
Replace Literal With Table Lookup (page 290): Replace code constants with values from database tables.
Replace Method(s) With View (page 265): Create a view based on one or more existing database methods (stored procedures, stored functions, or triggers) within the database.
Replace Nested Conditional With Guard Clauses (page 292): Replace nested IF conditions with a series of separate IF statements.
Replace One-To-Many With Associative Table (page 130): Replace a one-to-many association between two tables with an associative table.
Replace Parameter With Explicit Methods (page 282): Create a separate method for each value of the parameter.
Replace Surrogate Key With Natural Key (page 135): Replace a surrogate key with an existing natural key.
Replace Type Code With Property Flags (page 196): Replace a code column with individual property flags, usually implemented as Boolean columns, within the same table.
Replace View With Method(s) (page 268): Replace an existing view with one or more existing methods (stored procedures, stored functions, or triggers) within the database.
Split Column (page 140): Split a column into one or more columns within a single table.
Split Table (page 145): Vertically split (e.g., by columns) an existing table into one or more tables.
Split Temporary Variable (page 292): Make a separate temporary variable for each assignment.
Substitute Algorithm (page 293): Replace the body of the method with the new algorithm.
Update Data (page 310): Update data within an existing table.
Use Official Data Source (page 271): Use the official data source for a given entity, instead of the current one you are using.
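As an illustration of how one catalog entry might look in practice, the following is a minimal sketch of Introduce Soft Delete using Python's built-in sqlite3 module. The customer table, the is_deleted flag column, and the helper function names are assumptions made up for this example; they are not taken from the book, whose own examples may use a different database and naming scheme.

```python
import sqlite3

# Hypothetical schema for illustration only: a small customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)


def introduce_soft_delete(conn):
    """Schema change: add a flag column (assumed name is_deleted); 0 means live."""
    conn.execute(
        "ALTER TABLE customer ADD COLUMN is_deleted INTEGER NOT NULL DEFAULT 0"
    )


def soft_delete_customer(conn, customer_id):
    """Mark the row as deleted instead of physically removing it."""
    conn.execute("UPDATE customer SET is_deleted = 1 WHERE id = ?", (customer_id,))


def live_customers(conn):
    """Readers must now filter on the flag to see only undeleted rows."""
    return conn.execute(
        "SELECT id, name FROM customer WHERE is_deleted = 0 ORDER BY id"
    ).fetchall()


def total_rows(conn):
    """Count every physical row, including those flagged as deleted."""
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]


introduce_soft_delete(conn)
soft_delete_customer(conn, 1)
```

After these calls, live_customers(conn) no longer returns the flagged row, while total_rows(conn) still counts it. Introduce Hard Delete is the reverse move: physically delete the flagged rows and then drop the flag column.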

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V]

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] access programs [See external access programs] AD (Agile Data) method Add CRUD Methods refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th Add Foreign Key Constraint refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Add Foreign Key refactoring 2nd Add Lookup Table refactoring 2nd 3rd 4th 5th 6th 7th 8th Add Mirror Table refactoring 2nd 3rd 4th 5th 6th 7th Add Parameter refactoring 2nd 3rd 4th Add Read Method refactoring 2nd 3rd 4th 5th 6th Add Trigger For Calculated Column refactoring 2nd 3rd 4th 5th 6th 7th adding calculation methods Introduce Calculation Method refactoring 2nd 3rd 4th 5th cascading deletes Introduce Cascading Delete refactoring 2nd 3rd 4th 5th 6th 7th columns Introduce New Column transformation 2nd 3rd 4th 5th constraints Introduce Column Constraint refactoring 2nd 3rd 4th 5th 6th data formats Introduce Common Format refactoring 2nd 3rd 4th 5th 6th default values Introduce Default Value refactoring 2nd 3rd 4th 5th hard deletes Introduce Hard Delete refactoring 2nd 3rd 4th 5th indexes Introduce Index refactoring 2nd 3rd 4th 5th 6th non-nullable columns Make Column Non Nullable refactoring 2nd 3rd 4th 5th 6th read-only tables Introduce Read-Only Table refactoring 2nd 3rd 4th 5th 6th 7th 8th soft deletes Introduce Soft Delete refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th tables Introduce New Table transformation 2nd 3rd 4th triggers Introduce Trigger For History refactoring 2nd 3rd 4th 5th 6th 7th variables Introduce Variable refactoring 2nd views Introduce View transformation 2nd 3rd 4th 5th 6th Agile Data (AD) method Agile Databases mailing list agile methods defined Agile Model-Driven Development (AMDD) 2nd agile software development values of 2nd agile software processes algorithms

Substitute Algorithm refactoring AMDD (Agile Model-Driven Development) 2nd announcing database refactoring 2nd 3rd applications [See external access programs] Apply Standard Codes refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th Apply Standard Type refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th Apply Standard Types refactoring apply Standard Types refactoring architectural refactorings Add CRUD Methods 2nd 3rd 4th 5th 6th 7th Add Mirror Table 2nd 3rd 4th 5th 6th 7th Add Read Method 2nd 3rd 4th 5th Encapsulate Table With View 2nd 3rd 4th 5th Introduce Calculation Method 2nd 3rd 4th 5th Introduce Index 2nd 3rd 4th 5th 6th Introduce Read-Only Table 2nd 3rd 4th 5th 6th 7th 8th list of Migrate Method From Database 2nd 3rd 4th 5th 6th Migrate Method To Database 2nd 3rd 4th 5th Replace Method(s) With View 2nd 3rd 4th 5th 6th Replace View With Method(s) 2nd 3rd 4th 5th 6th Use Official Data Source 2nd 3rd 4th 5th 6th associations [See relationships] associative tables naming conventions replacing one-to-many relationship with Replace One-To-Many With Associative Table refactoring 2nd 3rd 4th 5th 6th 7th 8th

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] batch synchronization behavioral semantics maintaining 2nd 3rd bundles deploying database refactorings as 2nd 3rd

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] calculated columns Add Trigger For Calculated Column refactoring 2nd 3rd 4th 5th 6th 7th Introduce Calculated Column refactoring 2nd 3rd 4th 5th 6th 7th calculation methods Introduce Calculation Method refactoring 2nd 3rd 4th 5th cascading deletes adding Introduce Cascading Delete refactoring 2nd 3rd 4th 5th 6th 7th CCB (change control board) change control board (CCB) change, fear of as database smell child records removing Introduce Cascading Delete refactoring 2nd 3rd 4th 5th 6th 7th cleansing data values 2nd 3rd code columns replacing with property flags Replace Type Code With Property Flags refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th code refactoring [See also database refactoring] database refactoring versus 2nd defined 2nd of external access programs 2nd 3rd columns Apply Standard Codes refactoring 2nd 3rd 4th 5th 6th Apply Standard Type refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th calculated columns Add Trigger For Calculated Column refactoring 2nd 3rd 4th 5th 6th 7th Introduce Calculated Column refactoring 2nd 3rd 4th 5th 6th 7th code columns replacing with property flags (Replace Type Code With Property Flags refactoring) 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th constraints adding (Introduce Column Constraint refactoring) 2nd 3rd 4th 5th 6th removing (Drop Column Constraint refactoring) 2nd 3rd 4th data formats adding (Introduce Common Format refactoring) 2nd 3rd 4th 5th 6th default values adding (Introduce Default Value refactoring) 2nd 3rd 4th 5th removing (Drop Default Value refactoring) 2nd 3rd 4th 5th excess columns as database smells Introduce New Column transformation 2nd 3rd 4th 5th lookup tables Add Lookup Table refactoring 2nd 3rd 4th 5th 6th merging Merge Columns refactoring 2nd 3rd 4th 5th 6th 7th moving Move Column refactoring 2nd 3rd 4th 5th 6th 7th 8th multipurpose columns

as database smells non-nullable columns adding (Make Column Non Nullable refactoring) 2nd 3rd 4th 5th 6th removing (Drop Non Nullable refactoring) 2nd 3rd 4th 5th removing Drop Column refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th renaming Rename Column refactoring 2nd 3rd 4th 5th 6th 7th replacing Replace Column refactoring 2nd 3rd 4th 5th 6th 7th 8th smart columns as database smells splitting Split Column refactoring 2nd 3rd 4th 5th 6th 7th 8th conditional expressions Consolidate Conditional Express refactoring 2nd Decompose Conditional refactoring 2nd nested conditionals Replace Nested Conditional with Guard Clauses refactoring 2nd configuration management [See version control] configuration management of database artifacts [See also version identification strategies] defined 2nd Consistent Key Strategy refactoring 2nd Consolidate Conditional Expression refactoring 2nd 3rd Consolidate Key Strategy refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th constraints Add Foreign Key Constraint refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th adding Introduce Column Constraint refactoring 2nd 3rd 4th 5th 6th Drop Foreign Key Constraint refactoring 2nd 3rd 4th 5th fixing after data quality refactorings removing Drop Column Constraint refactoring 2nd 3rd 4th continuous development control flags Remove Control Flag refactoring 2nd coupling external access programs reducing 2nd 3rd 4th with Introduce Surrogate Key refactoring CRUD (creation, retrieval, update, and deletion) methods Add CRUD Methods refactoring 2nd 3rd 4th 5th 6th 7th culture as impediment to evolutionary database development techniques

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] data Insert Data transformation 2nd 3rd 4th 5th 6th 7th 8th Update Data transformation 2nd 3rd 4th 5th 6th 7th data formats adding Introduce Common Format refactoring 2nd 3rd 4th 5th 6th data migration 2nd Add Foreign Key Constraint refactoring and 2nd 3rd Add Lookup Table refactoring and 2nd Add Mirror Table refactoring and 2nd 3rd Add Trigger For Calculated Column refactoring Apply Standard Codes refactoring and Apply Standard Type refactoring and Consolidate Key Strategy and Drop Column refactoring and 2nd Drop Table refactoring and Insert Data transformation and Introduce Calculated Column refactoring and Introduce Column Constraint refactoring and Introduce Common Format refactoring and 2nd Introduce Default Value refactoring and Introduce Hard Delete refactoring and Introduce Index refactoring and 2nd Introduce New Column transformation and Introduce New Table transformation and Introduce Read-Only Table refactoring and 2nd 3rd 4th 5th Introduce Soft Delete refactoring and Introduce Surrogate Key refactoring and Introduce Trigger For History refactoring and Make Column Non Nullable refactoring and 2nd Merge Columns refactoring and Merge Tables refactoring and 2nd 3rd 4th Move Column refactoring and Move Data refactoring and Rename Column refactoring and Rename Table refactoring and Replace Column refactoring and Replace LOB With Table refactoring and Replace One-To-Many With Associative Table refactoring and Replace Type Code With Property Flags refactoring and 2nd Split Column refactoring and Split Table refactoring and Update Data transformation and 2nd Use Official Data Source refactoring and validating 2nd data modeling notation 2nd 3rd 4th 5th 6th data quality [See cleansing data values] data quality refactorings Add Lookup Table 2nd 3rd 4th 5th 6th Apply Standard Codes 2nd 3rd 4th 5th 6th

Apply Standard Type 2nd 3rd 4th 5th 6th 7th 8th 9th common issues 2nd Consolidate Key Strategy 2nd 3rd 4th 5th 6th 7th Drop Column Constraint 2nd 3rd 4th Drop Default Value 2nd 3rd 4th 5th Drop Non Nullable 2nd 3rd 4th 5th Introduce Column Constraint 2nd 3rd 4th 5th 6th Introduce Common Format 2nd 3rd 4th 5th 6th Introduce Default Value 2nd 3rd 4th 5th list of Make Column Non Nullable 2nd 3rd 4th 5th 6th Move Data 2nd 3rd 4th 5th 6th Replace Type Code With Property Flags 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th data refactoring documentation and data sources Use Official Data Source refactoring 2nd 3rd 4th 5th 6th data types Apply Standard Type refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th data updates after data quality refactorings database access logic encapsulation of external access programs database change control board (CCB) database configuration tables 2nd database development techniques versus software development techniques 2nd database implementation database refactoring as technique 2nd 3rd database performance Introduce Surrogate Key refactoring and Split Table refactoring and database refactoring as database implementation technique 2nd 3rd categories of code refactoring versus 2nd data quality refactorings [See data quality refactorings] database smells and 2nd 3rd defined 2nd 3rd 4th 5th deployment 2nd as bundles 2nd 3rd between sandboxes 2nd process overview 2nd 3rd removing deprecated schemas 2nd scheduling deployment windows 2nd 3rd documentation and example 2nd in multi-application database architecture 2nd 3rd in single-application database architecture 2nd 3rd semantics, maintaining 2nd 3rd "lessons learned" strategies configuration management of database artifacts database change control board (CCB) database configuration tables 2nd encapsulation of database access logic installation scripts 2nd large changes implemented as small changes list of politics

small changes, ease of application 2nd SQL duplication synchronization with triggers team negotiations transition periods version identification 2nd 3rd modeling versus 2nd online resources process of announcing refactoring 2nd 3rd migrating source data 2nd modifying database schema 2nd 3rd 4th overview 2nd 3rd 4th refactoring external access programs 2nd 3rd regression testing selecting appropriate refactoring 2nd testing phases 2nd 3rd 4th 5th 6th transition period 2nd 3rd 4th 5th verifying appropriateness of refactoring 2nd 3rd version control reducing coupling 2nd 3rd 4th structural refactorings [See structural refactorings] database regression testing defined 2nd 3rd 4th database schema deprecation of 2nd 3rd 4th 5th modifying 2nd 3rd 4th testing 2nd 3rd versioning database smells list of 2nd 3rd database transformations database refactorings as subset of deadlock avoiding Decompose Conditional refactoring 2nd default values adding Introduce Default Value refactoring 2nd 3rd 4th 5th removing Drop Default Value refactoring 2nd 3rd 4th 5th Delete Data refactoring deploying database refactoring 2nd as bundles 2nd 3rd between sandboxes 2nd process overview 2nd 3rd removing deprecated schemas 2nd scheduling deployment windows 2nd 3rd deployment windows scheduling 2nd 3rd deprecated schemas removing 2nd deprecation of database schema 2nd 3rd 4th 5th developer sandboxes defined 2nd 3rd deployment between 2nd documentation data refactoring and database refactoring and

Drop Column Constraint refactoring 2nd 3rd 4th 5th Drop Column refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th Drop Default Value refactoring 2nd 3rd 4th 5th Drop Foreign Key Constraint refactoring 2nd 3rd 4th 5th 6th Drop Foreign Key refactoring Drop Non Nullable refactoring 2nd 3rd 4th 5th 6th Drop Table refactoring 2nd 3rd 4th 5th 6th 7th 8th Drop View refactoring 2nd 3rd 4th 5th duplication of SQL code

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] Encapsulate Table With View refactoring 2nd 3rd 4th 5th 6th encapsulation of database access logic with Add CRUD Methods refactoring with Add Read Method refactoring evolutionary data modeling defined 2nd 3rd 4th 5th 6th evolutionary database development advantages of 2nd disadvantages of 2nd evolutionary database development techniques database refactoring 2nd 2nd [See database refactoring] database regression testing 2nd 3rd developer sandboxes 2nd evolutionary data modeling 2nd 3rd 4th 5th impediments to 2nd list of 2nd evolutionary methods defined evolutionary software processes examples of excess columns as database smells excess rows as database smells external access programs continuous development coupling refactoring 2nd 3rd testing 2nd updating for Add CRUD Methods refactoring 2nd for Add Foreign Key Constraint refactoring 2nd for Add Lookup Table refactoring 2nd for Add Mirror Table refactoring 2nd for Add Read Method refactoring 2nd for Add Trigger For Calculated Column refactoring for Apply Standard Codes refactoring 2nd for Apply Standard Type refactoring 2nd for Consolidate Key Strategy refactoring 2nd for Drop Column Constraint refactoring for Drop Column refactoring 2nd 3rd 4th for Drop Default Value refactoring 2nd for Drop Foreign Key Constraint refactoring for Drop Non Nullable refactoring 2nd for Drop Table refactoring for Drop View refactoring for Encapsulate Table With View refactoring for Insert Data transformation 2nd 3rd 4th for Introduce Calculated Column refactoring 2nd

for Introduce Calculation Method refactoring 2nd for Introduce Cascading Delete refactoring 2nd 3rd for Introduce Column Constraint refactoring 2nd for Introduce Common Format refactoring 2nd for Introduce Default Value refactoring 2nd for Introduce Hard Delete refactoring 2nd for Introduce Index refactoring for Introduce New Column transformation 2nd for Introduce New Table transformation for Introduce Read-Only Table refactoring 2nd for Introduce Soft Delete refactoring 2nd 3rd for Introduce Surrogate Key refactoring 2nd 3rd for Introduce Trigger For History refactoring 2nd for Introduce View transformation 2nd 3rd for Make Column Non Nullable refactoring 2nd for Merge Columns refactoring 2nd for Merge Tables refactoring 2nd 3rd 4th for Migrate Method From Database refactoring 2nd for Migrate Method To Database refactoring 2nd 3rd for Move Column refactoring 2nd 3rd for Move Data refactoring 2nd for Rename Column refactoring 2nd for Rename Table refactoring 2nd for Rename View refactoring 2nd for Replace Column refactoring 2nd for Replace LOB With Table refactoring 2nd 3rd for Replace Method(s) With View refactoring 2nd for Replace One-To-Many With Associative Table refactoring 2nd for Replace Surrogate Key With Natural Key refactoring 2nd 3rd for Replace Type Code With Property Flags refactoring 2nd 3rd 4th for Replace View With Method(s) refactoring 2nd for Split Column refactoring 2nd 3rd for Split Table refactoring 2nd for Update Data transformation 2nd for Use Official Data Source refactoring 2nd Extract Method refactoring 2nd 3rd 4th 5th 6th Extract Stored Procedure refactoring

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] fear of change as database smell foreign keys Add Foreign Key Constraint refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th Drop Column refactoring and 2nd removing Drop Foreign Key Constraint refactoring 2nd 3rd 4th 5th Fowler, Martin Refactoring (ital) 2nd

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] guard clauses Replace Nested Conditional with Guard Clauses refactoring 2nd

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] hard deletes adding Introduce Hard Delete refactoring 2nd 3rd 4th 5th historical changes adding triggers for Introduce Trigger For History refactoring 2nd 3rd 4th 5th 6th 7th

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] indexes Introduce Index refactoring 2nd 3rd 4th 5th 6th indices UML notation for 2nd informational semantics maintaining 2nd 3rd Insert Data refactoring Insert Data transformation 2nd 3rd 4th 5th 6th 7th 8th installation scripts 2nd interface changing refactorings Add Parameter refactoring 2nd Parameterize Method refactoring Remove Parameter refactoring 2nd Rename Method refactoring 2nd Reorder Parameters refactoring 2nd 3rd 4th Replace Parameter with Explicit Methods refactoring 2nd internal refactorings (for methods) Consolidate Conditional Expression 2nd Decompose Conditional 2nd Extract Method 2nd 3rd 4th 5th Introduce Variable 2nd Remove Control Flag 2nd Remove Middle Man 2nd Replace Literal with Table Lookup 2nd 3rd Replace Nested Conditional with Guard Clauses 2nd Split Temporary Variable 2nd Substitute Algorithm Introduce Calculated Column refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th Introduce Calculation Method refactoring 2nd 3rd 4th 5th 6th Introduce Cascading Delete refactoring 2nd 3rd 4th 5th 6th 7th Introduce Column Constraint refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th Introduce Column transformation 2nd 3rd Introduce Common Format refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Introduce Default Value refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Introduce Hard Delete refactoring 2nd 3rd 4th 5th 6th 7th Introduce Index refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th Introduce New Column refactoring Introduce New Column transformation 2nd 3rd 4th 5th 6th 7th 8th Introduce New Table transformation 2nd 3rd 4th 5th Introduce Read-Only Table refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th Introduce Soft Delete refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th Introduce Surrogate Key refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th Introduce Trigger For History refactoring 2nd 3rd 4th 5th 6th 7th 8th Introduce Variable refactoring 2nd Introduce View 
refactoring 2nd Introduce View transformation 2nd 3rd 4th 5th 6th

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] joins Move Column refactoring and

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] keys [See foreign keys, natural keys, primary keys, surrogate keys] Consolidate Key Strategy refactoring 2nd 3rd 4th 5th 6th 7th UML notation for 2nd

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] large changes implemented as small changes large object [See LOB] literal numbers Replace Literal with Table Lookup refactoring 2nd 3rd LOB (large object) replacing Replace LOB With Table refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th logical deletes [See soft deletes] lookup tables Add Lookup Table refactoring 2nd 3rd 4th 5th 6th

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] Make Column Non Nullable refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th many-to-many relationship Replace One-To-Many With Associative Table refactoring and materialized views Merge Columns refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Merge Table refactoring Merge Tables refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th method refactorings interface changing refactorings Add Parameter refactoring 2nd Parameterize Method refactoring Remove Parameter refactoring 2nd Rename Method refactoring 2nd Reorder Parameters refactoring 2nd 3rd 4th Replace Parameter with Explicit Methods refactoring 2nd internal refactorings Consolidate Conditional Expression 2nd Decompose Conditional 2nd Extract Method 2nd 3rd 4th 5th Introduce Variable 2nd Remove Control Flag 2nd Remove Middle Man 2nd Replace Literal with Table Lookup 2nd 3rd Replace Nested Conditional with Guard Clauses 2nd Split Temporary Variable 2nd Substitute Algorithm Migrate Method From Database refactoring 2nd 3rd 4th 5th 6th Migrate Method To Database refactoring 2nd 3rd 4th 5th migration [See data migration] mirror tables Add Mirror Table refactoring 2nd 3rd 4th 5th 6th 7th modeling database refactoring versus 2nd Move Column refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th 16th 17th 18th 19th Move Data refactoring 2nd 3rd 4th 5th 6th 7th 8th Move Data refactorings Move Data transformation multi-application database architecture database refactoring example 2nd 3rd multiplicity indicators UML notation multipurpose columns as database smells multipurpose tables as database smells

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] naming conventions associative tables fixing after structural refactorings for CRUD methods natural keys replacing surrogate keys with Replace Surrogate Key with Natural Key refactoring 2nd 3rd 4th 5th 6th 7th 8th surrogate keys versus 2nd nested conditionals Replace Nested Conditional with Guard Clauses refactoring 2nd non-nullable columns adding Make Column Non Nullable refactoring 2nd 3rd 4th 5th 6th removing Drop Non Nullable refactoring 2nd 3rd 4th 5th normalization Move Column refactoring and Split Table refactoring and notation (UML data modeling) 2nd 3rd 4th 5th 6th numbering scripts

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] one-to-many relationship replacing Replace One-To-Many With Associative Table refactoring 2nd 3rd 4th 5th 6th 7th 8th online resources order (for keys) UML notation for

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] Parameterize Method refactoring parameters Add Parameter refactoring 2nd Parameterize Method refactoring Remove Parameter refactoring 2nd Reorder Parameters refactoring 2nd 3rd 4th Replace Parameter with Explicit Methods refactoring 2nd PDM (physical data model) performance [See database performance] periodic refreshes physical data model (PDM) politics production deployment [See deploying database refactoring] property flags Replace Type Code With Property Flags refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] quality [See cleansing data values]

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] read methods Add Read Method refactoring 2nd 3rd 4th 5th read-only tables Introduce Read-Only Table refactoring 2nd 3rd 4th 5th 6th 7th 8th real-time application updates for read-only tables redundant data as database smells refactoring [See also database refactoring] [See also database refactoring] [See code refactoring] defined Refactoring (ital) (Fowler) 2nd referential integrity refactorings Add Foreign Key Constraint 2nd 3rd 4th 5th 6th 7th 8th 9th Add Trigger For Calculated Column 2nd 3rd 4th 5th 6th 7th Drop Foreign Key Constraint 2nd 3rd 4th 5th Introduce Cascading Delete 2nd 3rd 4th 5th 6th 7th Introduce Hard Delete 2nd 3rd 4th 5th Introduce Soft Delete 2nd 3rd 4th 5th 6th 7th 8th 9th Introduce Trigger For History 2nd 3rd 4th 5th 6th 7th list of regression testing 2nd [See also database regression testing] defined relationships UML notation for 2nd 3rd 4th release windows [See deployment windows] Remove Control Flag refactoring 2nd Remove Middle Man refactoring 2nd Remove Parameter refactoring 2nd Remove Table refactoring removing child records Introduce Cascading Delete refactoring 2nd 3rd 4th 5th 6th 7th columns Drop Column refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th constraints Drop Column Constraint refactoring 2nd 3rd 4th default values Drop Default Value refactoring 2nd 3rd 4th 5th deprecated schemas 2nd foreign keys Drop Foreign Key Constraint refactoring 2nd 3rd 4th 5th non-nullable columns Drop Non Nullable refactoring 2nd 3rd 4th 5th rows Introduce Hard Delete refactoring 2nd 3rd 4th 5th Introduce Soft Delete refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th tables Drop Table refactoring 2nd 3rd 4th 5th views Drop View refactoring 2nd 3rd 4th

Rename Column refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th Rename Method refactoring 2nd Rename Table refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th Rename View refactoring 2nd 3rd 4th 5th 6th Reorder Parameters refactoring 2nd 3rd 4th 5th Replace Column refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Replace Literal with Table Lookup refactoring 2nd 3rd Replace LOB With Table refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Replace Magic Number With Symbolic Constant refactoring Replace Method(s) With View refactoring 2nd 3rd 4th 5th 6th Replace Nested Conditional with Guard Clauses refactoring 2nd Replace One-To-Many With Associative Table refactoring 2nd 3rd 4th 5th 6th 7th 8th Replace Parameter with Explicit Methods refactoring 2nd Replace Surrogate Key With Natural Key refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th Replace Type Code With Property Flags refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th Replace View With Method(s) refactoring 2nd 3rd 4th 5th 6th rows excess rows as database smells removing Introduce Hard Delete refactoring 2nd 3rd 4th 5th Introduce Soft Delete refactoring 2nd 3rd 4th 5th 6th 7th 8th 9th

Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V]

sandboxes [See developer sandboxes]
scheduling deployment windows
schema [See database schema]
    updating
        for Add CRUD Methods refactoring
        for Add Foreign Key Constraint refactoring
        for Add Lookup Table refactoring
        for Add Mirror Table refactoring
        for Add Read Method refactoring
        for Add Trigger For Calculated Column refactoring
        for Apply Standard Codes refactoring
        for Apply Standard Type refactoring
        for Consolidate Key Strategy refactoring
        for Drop Column Constraint refactoring
        for Drop Column refactoring
        for Drop Default Value refactoring
        for Drop Foreign Key Constraint refactoring
        for Drop Non Nullable refactoring
        for Drop Table refactoring
        for Drop View refactoring
        for Encapsulate Table With View refactoring
        for Insert Data transformation
        for Introduce Calculated Column refactoring
        for Introduce Calculation Method refactoring
        for Introduce Cascading Delete refactoring
        for Introduce Column Constraint refactoring
        for Introduce Common Format refactoring
        for Introduce Default Value refactoring
        for Introduce Hard Delete refactoring
        for Introduce Index refactoring
        for Introduce New Column transformation
        for Introduce New Table transformation
        for Introduce Read-Only Table refactoring
        for Introduce Soft Delete refactoring
        for Introduce Surrogate Key refactoring
        for Introduce Trigger For History refactoring
        for Introduce View transformation
        for Make Column Non Nullable refactoring
        for Merge Columns refactoring
        for Merge Tables refactoring
        for Migrate Method From Database refactoring
        for Migrate Method To Database refactoring
        for Move Column refactoring
        for Move Data refactoring
        for Rename Column refactoring
        for Rename Table refactoring
        for Rename View refactoring
        for Replace Column refactoring
        for Replace LOB With Table refactoring
        for Replace Method(s) With View refactoring
        for Replace One-To-Many With Associative Table refactoring
        for Replace Surrogate Key With Natural Key refactoring
        for Replace Type Code With Property Flags refactoring
        for Replace View With Method(s) refactoring
        for Split Column refactoring
        for Split Table refactoring
        for Use Official Data Source refactoring
schemas
    deprecated schemas
        removing
    version identification strategies
scripts
    numbering
semantics
    maintaining
single-application database architecture
    database refactoring example
small changes
    ease of application
    large changes implemented as
smart columns
    as database smells
smells [See database smells]
soft deletes
    adding
        Introduce Soft Delete refactoring
software development techniques
    versus database development techniques
source data
    Use Official Data Source refactoring
source data migration [See data migration]
Split Column refactoring
Split Table refactoring
Split Temporary Variable refactoring
SQL duplication
standard codes
    Apply Standard Codes refactoring
static lookup data
stored procedures
    Add CRUD Methods refactoring
    Add Read Method refactoring
    fixing
        after data quality refactorings
        after structural refactorings
    Introduce Calculation Method refactoring
    method refactorings [See method refactorings]
    Migrate Method From Database refactoring
    Migrate Method To Database refactoring
    Replace Method(s) With View refactoring
    Replace View With Method(s) refactoring
    UML notation for
structural refactorings
    common issues
    Drop Column
    Drop Table
    Drop View
    Introduce Calculated Column
    Introduce Surrogate Key
    list of
    Merge Columns
    Merge Tables
    Move Column
    Rename Column
    Rename Table
    Rename View
    Replace Column
    Replace LOB With Table
    Replace One-To-Many With Associative Table
    Replace Surrogate Key With Natural Key
    Split Column
    Split Table
Substitute Algorithm refactoring
surrogate keys
    Introduce Surrogate Key refactoring
    natural keys versus
    replacing
        Replace Surrogate Key with Natural Key refactoring
synchronization
    for Introduce Calculated Column refactoring
    trigger-based synchronization
    with triggers

tables
    associative tables
        naming conventions
        replacing one-to-many relationship with (Replace One-To-Many With Associative Table refactoring)
    columns
        adding constraints (Introduce Column Constraint refactoring)
        adding data formats (Introduce Common Format refactoring)
        adding default values (Introduce Default Value refactoring)
        adding non-nullable columns (Make Column Non Nullable refactoring)
        Apply Standard Codes refactoring
        Apply Standard Type refactoring
        calculated columns (Add Trigger For Calculated Column refactoring)
        calculated columns (Introduce Calculated Column refactoring)
        Introduce New Column transformation
        merging (Merge Columns refactoring)
        moving (Move Column refactoring)
        removing (Drop Column refactoring)
        removing constraints (Drop Column Constraint refactoring)
        removing default values (Drop Default Value refactoring)
        removing non-nullable columns (Drop Non Nullable refactoring)
        renaming (Rename Column refactoring)
        replacing (Replace Column refactoring)
        replacing code columns with property flags (Replace Type Code With Property Flags refactoring)
        splitting (Split Column refactoring)
    Encapsulate Table With View refactoring
    fixing after structural refactorings
    foreign keys [See foreign keys]
    Insert Data transformation
    Introduce New Table transformation
    lookup tables
        Add Lookup Table refactoring
    merging
        Merge Tables refactoring
    mirror tables
        Add Mirror Table refactoring
    Move Data refactoring
    multipurpose tables
        as database smells
    read-only tables
        Introduce Read-Only Table refactoring
    removing
        Drop Table refactoring
    renaming
        Rename Table refactoring
    Replace Literal with Table Lookup refactoring
    replacing LOB (large object) with
        Replace LOB With Table refactoring
    rows [See rows]
    splitting
        Split Table refactoring
    UML notation for
    Update Data transformation
    views
        Introduce View transformation
TDD (Test-Driven Development)
    in database refactoring process
team negotiations
    for selecting transition periods
temporary variables
    Split Temporary Variable refactoring
test data
Test-Driven Development [See TDD]
Test-Driven Development (TDD)
Test-First Development (TFD)
testing
    in database refactoring process
    regression testing
TFD (Test-First Development)
tools
    as impediment to evolutionary database development techniques
transformations [See database transformations]
    Insert Data
    Introduce New Column
    Introduce New Table
    Introduce View
    list of
    Update Data
transition period
    for structural refactorings
transition periods
    selecting
trigger-based synchronization
triggers
    Add Trigger For Calculated Column refactoring
    adding
        Introduce Trigger For History refactoring
    avoiding cycles
    fixing after structural refactorings
    synchronization with

UML data modeling notation
Unified Modeling Language (UML)
    data modeling notation
Update Data refactoring
Update Data transformation
updateable views
    Rename Table refactoring and
updating
    data
        after data quality refactorings
    external access programs
        for Add CRUD Methods refactoring
        for Add Foreign Key Constraint refactoring
        for Add Lookup Table refactoring
        for Add Mirror Table refactoring
        for Add Read Method refactoring
        for Add Trigger For Calculated Column refactoring
        for Apply Standard Codes refactoring
        for Apply Standard Type refactoring
        for Consolidate Key Strategy refactoring
        for Drop Column Constraint refactoring
        for Drop Column refactoring
        for Drop Default Value refactoring
        for Drop Foreign Key Constraint refactoring
        for Drop Non Nullable refactoring
        for Drop Table refactoring
        for Drop View refactoring
        for Encapsulate Table With View refactoring
        for Insert Data transformation
        for Introduce Calculated Column refactoring
        for Introduce Calculation Method refactoring
        for Introduce Cascading Delete refactoring
        for Introduce Column Constraint refactoring
        for Introduce Common Format refactoring
        for Introduce Default Value refactoring
        for Introduce Hard Delete refactoring
        for Introduce Index refactoring
        for Introduce New Column transformation
        for Introduce New Table transformation
        for Introduce Read-Only Table refactoring
        for Introduce Soft Delete refactoring
        for Introduce Surrogate Key refactoring
        for Introduce Trigger For History refactoring
        for Introduce View transformation
        for Make Column Non Nullable refactoring
        for Merge Columns refactoring
        for Merge Tables refactoring
        for Migrate Method From Database refactoring
        for Migrate Method To Database refactoring
        for Move Column refactoring
        for Move Data refactoring
        for Rename Column refactoring
        for Rename Table refactoring
        for Rename View refactoring
        for Replace Column refactoring
        for Replace LOB With Table refactoring
        for Replace Method(s) With View refactoring
        for Replace One-To-Many With Associative Table refactoring
        for Replace Surrogate Key With Natural Key refactoring
        for Replace Type Code With Property Flags refactoring
        for Replace View With Method(s) refactoring
        for Split Column refactoring
        for Split Table refactoring
        for Update Data transformation
        for Use Official Data Source refactoring
    schema
        for Add CRUD Methods refactoring
        for Add Foreign Key Constraint refactoring
        for Add Lookup Table refactoring
        for Add Mirror Table refactoring
        for Add Read Method refactoring
        for Add Trigger For Calculated Column refactoring
        for Apply Standard Codes refactoring
        for Apply Standard Type refactoring
        for Consolidate Key Strategy refactoring
        for Drop Column Constraint refactoring
        for Drop Column refactoring
        for Drop Default Value refactoring
        for Drop Foreign Key Constraint refactoring
        for Drop Non Nullable refactoring
        for Drop Table refactoring
        for Drop View refactoring
        for Encapsulate Table With View refactoring
        for Insert Data transformation
        for Introduce Calculated Column refactoring
        for Introduce Calculation Method refactoring
        for Introduce Cascading Delete refactoring
        for Introduce Column Constraint refactoring
        for Introduce Common Format refactoring
        for Introduce Default Value refactoring
        for Introduce Hard Delete refactoring
        for Introduce Index refactoring
        for Introduce New Column transformation
        for Introduce New Table transformation
        for Introduce Read-Only Table refactoring
        for Introduce Soft Delete refactoring
        for Introduce Surrogate Key refactoring
        for Introduce Trigger For History refactoring
        for Introduce View transformation
        for Make Column Non Nullable refactoring
        for Merge Columns refactoring
        for Merge Tables refactoring
        for Migrate Method From Database refactoring
        for Migrate Method To Database refactoring
        for Move Column refactoring
        for Move Data refactoring
        for Rename Column refactoring
        for Rename Table refactoring
        for Rename View refactoring
        for Replace Column refactoring
        for Replace LOB With Table refactoring
        for Replace Method(s) With View refactoring
        for Replace One-To-Many With Associative Table refactoring
        for Replace Surrogate Key With Natural Key refactoring
        for Replace Type Code With Property Flags refactoring
        for Replace View With Method(s) refactoring
        for Split Column refactoring
        for Split Table refactoring
        for Use Official Data Source refactoring
Use Official Data Source refactoring

validating
    data migration
variables
    Introduce Variable refactoring
    temporary variables
        Split Temporary Variable refactoring
version control
version identification strategies
    for database artifacts
versioning
    database schema
views
    Encapsulate Table With View refactoring
    fixing
        after data quality refactorings
        after structural refactorings
    Introduce View transformation
    materialized views
    removing
        Drop View refactoring
    renaming
        Rename View refactoring
    Replace Method(s) With View refactoring
    Replace View With Method(s) refactoring
    synchronization with
    UML notation for
    updateable views
        Rename Table refactoring and