MySQL Cookbook


MySQL Cookbook By Paul DuBois

Publisher: O'Reilly

Pub Date: October 2002

Preface

The MySQL database management system has become quite popular in recent years. This has been true especially in the Linux and open source communities, but MySQL's presence in the commercial sector now is increasing as well. It is well liked for several reasons: MySQL is fast, and it's easy to set up, use, and administrate. MySQL runs under many varieties of Unix and Windows, and MySQL-based programs can be written in many languages. MySQL is especially heavily used in combination with a web server for constructing database-backed web sites that involve dynamic content generation.

With MySQL's rise in popularity comes the need to address the questions posed by its users about how to solve specific problems. That is the purpose of MySQL Cookbook. It's designed to serve as a handy resource to which you can turn when you need quick solutions or techniques for attacking particular types of questions that come up when you use MySQL. Naturally, because it's a cookbook, it contains recipes: straightforward instructions you can follow rather than develop your own code from scratch. It's written using a problem-and-solution format designed to be extremely practical and to make the contents easy to read and assimilate. It contains many short sections, each describing how to write a query, apply a technique, or develop a script to solve a problem of limited and specific scope. This book doesn't attempt to develop full-fledged applications. Instead, it's intended to assist you in developing such applications yourself by helping you get past problems that have you stumped.

For example, a common question is, "How can I deal with quotes and special characters in data values when I'm writing queries?" That's not difficult, but figuring out how to do it is frustrating when you're not sure where to start. This book demonstrates what to do; it shows you where to begin and how to proceed from there. This knowledge will serve you repeatedly, because after you see what's involved, you'll be able to apply the technique to any kind of data, such as text, images, sound or video clips, news articles, compressed files, PDF files, or word processing documents. Another common question is, "Can I access tables from two databases at the same time?" The answer is "Yes," and it's easy to do because it's just a matter of knowing the proper SQL syntax. But it's hard to do until you see how; this book will show you. Other things that you'll learn from this book include:

• How to use SQL to select, sort, and summarize records.

• How to perform a transaction.

• How to determine intervals between dates or times, including age calculations.

• How to find matches or mismatches between records in two tables.

• How to remove duplicate records.

• How to store images into MySQL and retrieve them for display in web pages.

• How to convert the legal values of an ENUM column into radio buttons in a web page, or the values of a SET column into checkboxes.

• How to get LOAD DATA to read your datafiles properly, or find out which values in the file are bad.

• How to use pattern matching techniques to cope with mismatches between the CCYY-MM-DD date format that MySQL uses and dates in your datafiles.

• How to copy a table or a database to another server.

• How to resequence a sequence number column, and why you really don't want to.

One part of knowing how to use MySQL is understanding how to communicate with the server, that is, how to use SQL, the language through which queries are formulated. Therefore, one major emphasis of this book is on using SQL to formulate queries that answer particular kinds of questions. One helpful tool for learning and using SQL is the mysql client program that is included in MySQL distributions. By using this client interactively, you can send SQL statements to the server and see the results. This is extremely useful because it provides a direct interface to SQL. The mysql client is so useful, in fact, that the entire first chapter is devoted to it.

But the ability to issue SQL queries alone is not enough. Information extracted from a database often needs to be processed further or presented in a particular way to be useful. What if you have queries with complex interrelationships, such as when you need to use the results of one query as the basis for others? SQL by itself has little facility for making these kinds of choices, which makes it difficult to use decision-based logic to determine which queries to execute. Or what if you need to generate a specialized report with very specific formatting requirements? This too is difficult to achieve using just SQL. These problems bring us to the other major emphasis of the book: how to write programs that interact with the MySQL server through an application programming interface (API). When you know how to use MySQL from within the context of a programming language, you gain the ability to exploit MySQL's capabilities in the following ways:

• You can remember the result from a query and use it at a later time.

• You can make decisions based on success or failure of a query, or on the content of the rows that are returned. Difficulties in implementing control flow disappear when using an API because the host language provides facilities for expressing decision-based logic: if-then-else constructs, while loops, subroutines, and so forth.

• You can format and display query results however you like. If you're writing a command-line script, you can generate plain text. If it's a web-based script, you can generate an HTML table. If it's an application that extracts information for transfer to some other system, you might write a datafile expressed in XML.
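The three capabilities in the list above can be sketched in a few lines of Python, one of the API languages used in this book. This is only an illustration under stated assumptions, not code from the recipes distribution: the stock table, its contents, and the summarize_stock function are hypothetical, and Python's built-in sqlite3 module stands in for a MySQL driver so that the sketch runs without a server. With MySQLdb the overall shape would be similar, although the connection call and the placeholder style (%s rather than ?) differ.

```python
import sqlite3  # assumption: stand-in for a MySQL DB-API driver such as MySQLdb


def summarize_stock(conn):
    """Run a query, keep its result, then branch on what came back."""
    cursor = conn.cursor()
    cursor.execute("SELECT item, qty FROM stock ORDER BY item")
    rows = cursor.fetchall()      # remember the result for later use
    cursor.close()

    if not rows:                  # decide what to do based on the rows returned
        return "stock table is empty"

    lines = []
    for item, qty in rows:        # format the output however you like
        status = "reorder" if qty < 5 else "ok"
        lines.append("%-10s %3d  %s" % (item, qty, status))
    return "\n".join(lines)


# Hypothetical sample data, created in memory for the purpose of the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO stock (item, qty) VALUES (?, ?)",
                 [("bolt", 12), ("nut", 3)])
report = summarize_stock(conn)
print(report)
```

The if test and the for loop are the kind of decision-based logic that SQL by itself does not provide; the host language supplies them.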

When you combine SQL with a general purpose programming language and a MySQL client API, you have an extremely flexible framework for issuing queries and processing their results. Programming languages increase your expressive capabilities by giving you a great deal of additional power to perform complex database operations. This doesn't mean this book is complicated, though. It keeps things simple, showing how to construct small building blocks by using techniques that are easy to understand and easily mastered.

I'll leave it to you to combine these techniques within your own programs, which you can do to produce arbitrarily complex applications. After all, the genetic code is based on only four nucleotide bases, but these basic elements have been combined to produce the astonishing array of biological life we see all around us. Similarly, there are only 12 notes in the scale, but in the hands of skilled composers, they can be interwoven to produce a rich and endless variety of music. In the same way, when you take a set of simple recipes, add your imagination, and apply them to the database programming problems you want to solve, you can produce applications that are perhaps not works of art, but that are certainly useful and that will help you and others be more productive.

MySQL APIs Used in This Book

MySQL programming interfaces exist for many languages, including (in alphabetical order) C, C++, Eiffel, Java, Pascal, Perl, PHP, Python, Ruby, Smalltalk, and Tcl. [] Given this fact, writing a MySQL cookbook presents an author with something of a challenge. Clearly the book should provide recipes for doing many interesting and useful things with MySQL, but which API or APIs should the book use? Showing an implementation of every recipe in every language would result either in covering very few recipes or in a very, very large book! It would also result in a lot of redundancy when implementations in different languages bear a strong resemblance to each other. On the other hand, it's worthwhile taking advantage of multiple languages, because one language often will be more suitable than another for solving a particular type of problem.

[] To see what APIs are currently available, visit the development portal at the MySQL web site, located at http://www.mysql.com/portal/development/html/.

To resolve this dilemma, I've picked a small number of APIs from among those that are available and used them to write the recipes in this book. This limits its scope to a manageable number of APIs while allowing some latitude to choose from among them. The primary APIs covered here are:

Perl
Using the DBI module and its MySQL-specific driver

PHP
Using its set of built-in MySQL support functions

Python
Using the DB-API module and its MySQL-specific driver

Java
Using a MySQL-specific driver for the Java Database Connectivity (JDBC) interface

Why these languages? Perl and PHP were easy to pick. Perl is arguably the most widely used language on the Web, and it became so based on certain strengths such as its text-processing capabilities. In particular, it's very popular for writing MySQL programs. PHP also is widely deployed, and its use is increasing steadily. One of PHP's strengths is the ease with which you can use it to access databases, making it a natural choice for MySQL scripting. Python and Java are not as popular as Perl or PHP for MySQL programming, but each has significant numbers of followers. In the Java community in particular, MySQL seems to be making strong inroads among developers who use JavaServer Pages (JSP) technology to build database-backed web applications. (An anecdotal observation: After I wrote MySQL (New Riders), Python and Java were the two languages not covered in that book that readers most often said they would have liked to have seen addressed. So here they are!)

I believe these languages taken together reflect pretty well the majority of the existing user base of MySQL programmers. If you prefer some language not shown here, you can still use this book, but be sure to pay careful attention to Chapter 2, to familiarize yourself with the book's primary API languages. Knowing how database operations are performed with the APIs used here will help you understand the recipes in later chapters so that you can translate them into languages not discussed.

Who This Book Is For

This book should be useful for anybody who uses MySQL, ranging from novices who want to use a database for personal reasons, to professional database and web developers. The book should also appeal to people who do not now use MySQL, but would like to. For example, it should be useful to beginners who want to learn about databases but realize that Oracle isn't the best choice for that.

If you're relatively new to MySQL, you'll probably find lots of ways to use it here that you hadn't thought of. If you're more experienced, you'll probably be familiar with many of the problems addressed here, but you may not have had to solve them before and should find the book a great timesaver; take advantage of the recipes given in the book and use them in your own programs rather than figuring out how to write the code from scratch.

The book also can be useful for people who aren't even using MySQL. You might suppose that because this is a MySQL cookbook and not a PostgreSQL cookbook or an InterBase cookbook that it won't apply to databases other than MySQL. To some extent that's true, because some of the SQL constructs are MySQL-specific. On the other hand, many of the queries are standard SQL that is portable to many other database engines, so you should be able to use them with little or no modification. And several of our programming language interfaces provide database-independent access methods; you use them the same way regardless of which database you're connecting to.

The material ranges from introductory to advanced, so if a recipe describes techniques that seem obvious to you, skip it. Or if you find that you don't understand a recipe, it may be best to set it aside for a while and come back to it later, perhaps after reading some of the preceding recipes.

More advanced readers may wonder on occasion why in a book on MySQL I sometimes provide explanatory material on certain basic topics that are not directly MySQL-related, such as how to set environment variables. I decided to do this based on my experience in helping novice MySQL users. One thing that makes MySQL attractive is that it is easy to use, which makes it a popular choice for people without extensive background in databases. However, many of these same people also tend to be thwarted by simple barriers to more effective use of MySQL, as evidenced by the common question, "How can I avoid having to type the full pathname of mysql each time I invoke it?" Experienced readers will recognize immediately that this is simply a matter of appropriately setting the PATH environment variable to include the directory where mysql is installed. But other readers will not, particularly Windows users who are used to dealing only with a graphical interface and, more recently, Mac OS X users who find their familiar user interface now augmented by the powerful but sometimes mysterious command line provided by the Terminal application. If you are in this situation, you'll find these more elementary sections helpful in knocking down barriers that keep you from using MySQL more easily. If you're a more advanced user, just skip over such sections.

What's in This Book

It's very likely when you use this book that you'll have an application in mind you're trying to develop but are not sure how to implement certain pieces of it. In this case, you'll already know what type of problem you want to solve, so you should search the table of contents or the index looking for a recipe that shows how to do what you want. Ideally, the recipe will be just what you had in mind. Failing that, you should be able to find a recipe for a similar problem that you can adapt to suit the issue at hand. (I try to explain the principles involved in developing each technique so that you'll be able to modify it to fit the particular requirements of your own applications.)

Another way to approach this book is to just read through it with no specific problem in mind. This can help you because it will give you a broader understanding of the things MySQL can do, so I recommend that you page through the book occasionally. It's a more effective tool if you have a general familiarity with it and know the kinds of problems it addresses.

The following paragraphs summarize each chapter, to help give you an overview of the book's contents.

Chapter 1 describes how to use the standard MySQL command-line client. mysql is often the first interface to MySQL that people use, and it's important to know how to exploit its capabilities. This program allows you to issue queries and see the results interactively, so it's good for quick experimentation. You can also use it in batch mode to execute canned SQL scripts or send its output into other programs. In addition, the chapter discusses other ways to use mysql, such as how to number output lines or make long lines more readable, how to generate various output formats, and how to log mysql sessions.

Chapter 2 demonstrates the basic elements of MySQL programming in each API language: how to connect to the server, issue queries, retrieve the results, and handle errors. It also discusses how to handle special characters and NULL values in queries, how to write library files to encapsulate code for commonly used operations, and various ways to gather the parameters needed for making connections to the server.

Chapter 3 covers several aspects of the SELECT statement, which is the primary vehicle for retrieving data from the MySQL server: specifying which columns and rows you want to retrieve, performing comparisons, dealing with NULL values, selecting one section of a query result, using temporary tables, and copying results into other tables. Later chapters cover some of these topics in more detail, but this chapter provides an overview of the concepts on which they depend. You should read it if you need some introductory background on record selection, for example, if you don't yet know a lot about SQL.

Chapter 4 describes how to deal with string data. It addresses string comparisons, pattern matching, breaking apart and combining strings, dealing with case-sensitivity issues, and performing FULLTEXT searches.

Chapter 5 shows how to work with temporal data. It describes MySQL's date format and how to display date values in other formats. It also covers conversion between different temporal units, how to perform date arithmetic to compute intervals or generate one date from another, leap-year calculations, and how to use MySQL's special TIMESTAMP column type.

Chapter 6 describes how to put the rows of a query result in the order you want. This includes specifying the sort direction, dealing with NULL values, accounting for string case sensitivity, and sorting by dates or partial column values. It also provides examples that show how to sort special kinds of values, such as domain names, IP numbers, and ENUM values.

Chapter 7 shows techniques that are useful for assessing the general characteristics of a set of data, such as how many values it contains or what the minimum, maximum, or average values are.

Chapter 8 describes how to alter the structure of tables by adding, dropping, or modifying columns, and how to set up indexes.

Chapter 9 discusses how to get information about the data a query returns, such as the number of rows or columns in the result, or the name and type of each column. It also shows how to ask MySQL what databases and tables are available or about the structure of a table and its columns.

Chapter 10 describes how to transfer information between MySQL and other programs. This includes how to convert files from one format to another, extract or rearrange columns in datafiles, check and validate data, rewrite values such as dates that often come in a variety of formats, and how to figure out which data values cause problems when you load them into MySQL with LOAD DATA.

Chapter 11 discusses AUTO_INCREMENT columns, MySQL's mechanism for producing sequence numbers. It shows how to generate new sequence values or determine the most recent value, how to resequence a column, how to begin a sequence at a given value, and how to set up a table so that it can maintain multiple sequences at once. It also shows how to use AUTO_INCREMENT values to maintain a master-detail relationship between tables, including some of the pitfalls to avoid.

Chapter 12 shows how to perform joins, which are operations that combine rows in one table with those from another. It demonstrates how to compare tables to find matches or mismatches, produce master-detail lists and summaries, enumerate many-to-many relationships, and update or delete records in one table based on the contents of another.

Chapter 13 illustrates how to produce descriptive statistics, frequency distributions, regressions, and correlations. It also covers how to randomize a set of rows or pick a row at random from the set.

Chapter 14 discusses how to identify, count, and remove duplicate records, and how to prevent them from occurring in the first place.

Chapter 15 shows how to handle multiple SQL statements that must execute together as a unit. It discusses how to control MySQL's auto-commit mode and how to commit or roll back transactions, and it demonstrates some workarounds you can use if transactional capabilities are unavailable in your version of MySQL.

Chapter 16 gets you set up to write web-based MySQL scripts. Web programming allows you to generate dynamic pages or collect information for storage in your database. The chapter discusses how to configure Apache to run Perl, PHP, and Python scripts, and how to configure Tomcat to run Java scripts written using JSP notation. It also provides an overview of the Java Standard Tag Library (JSTL) that is used heavily in JSP pages in the following chapters.

Chapter 17 shows how to use the results of queries to produce various types of HTML structures, such as paragraphs, lists, tables, hyperlinks, and navigation indexes. It also describes how to store images into MySQL and retrieve and display them later, and how to send a downloadable result set to a browser.

Chapter 18 discusses ways to obtain input from users over the Web and use it to create new database records or as the basis for performing searches. It deals heavily with form processing, including how to construct form elements, such as radio buttons, pop-up menus, or checkboxes, based on information contained in your database.

Chapter 19 describes how to write web applications that remember information across multiple requests, using MySQL for backing store. This is useful when you want to collect information in stages, or when you need to make decisions based on what the user has done earlier.

Appendix A indicates where to get the source code for the examples shown in this book, and where to get the software you need to use MySQL and write your own database programs.

Appendix B provides a general overview of JSP and installation instructions for the Tomcat web server. Read this if you need to install Tomcat or are not familiar with it, or if you've never written JSP pages.

Appendix C lists sources that provide additional information about topics covered in this book. It also lists some books that provide introductory background for the programming languages used here.

As you get into later chapters, you'll sometimes find recipes that assume a knowledge of topics covered in earlier chapters. This also applies within a chapter, where later sections often use techniques discussed earlier in the chapter. If you jump into a chapter and find a recipe that uses a technique with which you're not familiar, check the table of contents or the index to find out where the technique is covered. You should find that it's been explained earlier. For example, if you find that a recipe sorts a query result using an ORDER BY clause that you don't understand, turn to Chapter 6, which discusses various sorting methods and explains how they work.

Platform Notes

Development of the code in this book took place under MySQL 3.23 and 4.0. Because new features are added to MySQL on a regular basis, some examples will not work under older versions. I've tried to point out version dependencies when introducing such features for the first time.

The MySQL language API modules that I used include DBI 1.20 and up, DBD::mysql 2.0901 and up, MySQLdb 0.9 and up, MM.MySQL 2.0.5 and up, and MySQL Connector/J 2.0.14. DBI requires Perl 5.004_05 or higher up through DBI 1.20, after which it requires Perl 5.005_03 or higher. MySQLdb requires Python 1.5.6 or higher. MM.MySQL and MySQL Connector/J require Java SDK 1.1 or higher.

Language processors include Perl 5.6 and 5.6.1; PHP 3 and 4; Python 1.5.6, 2.2, and 2.3; and Java SDK 1.3.1. Most PHP scripts shown here will run under either PHP 3 or PHP 4 (although I strongly recommend PHP 4 over PHP 3). Scripts that require PHP 4 are so noted.

I do not assume that you are using Unix, although that is my own preferred development platform. Most of the material here should be applicable both to Unix and Windows. The operating systems I used most for development of the recipes in this book were Mac OS X; RedHat Linux 6.2, 7.0, and 7.3; and various versions of Windows (Me, 98, NT, and 2000).

I do assume that MySQL is installed already and available for you to use. I also assume that if you plan to write your own MySQL-based programs, you're reasonably familiar with the language you'll use. If you need to install software, see Appendix A. If you require background material on the programming languages used here, see Appendix C.

Conventions Used in This Book

The following font conventions have been used throughout the book:

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names.

Constant width bold
Used to indicate text that you type when running commands.

Constant width italic
Used to indicate variable input; you should substitute a value of your own choosing.

Italic
Used for URLs, hostnames, names of directories and files, Unix commands and options, and occasionally for emphasis.

Commands often are shown with a prompt to illustrate the context in which they are used. Commands that you issue from the command line are shown with a % prompt:

% chmod 600 my.cnf

That prompt is one that Unix users are used to seeing, but it doesn't necessarily signify that a command will work only under Unix. Unless indicated otherwise, commands shown with a % prompt generally should work under Windows, too. If you should run a command under Unix as the root user, the prompt is # instead:

# chkconfig --add tomcat4

For commands that are specific only to Windows, the C:\> prompt is used:

C:\> copy C:\mysql\lib\cygwinb19.dll C:\Windows\System

SQL statements that are issued from within the mysql client program are shown with a mysql> prompt and terminated with a semicolon:

mysql> SELECT * FROM my_table;

For examples that show a query result as you would see it when using mysql, I sometimes truncate the output, using an ellipsis (...) to indicate that the result consists of more rows than are shown. The following query produces many rows of output, of which those in the middle have been omitted:

mysql> SELECT name, abbrev FROM states ORDER BY name;
+----------------+--------+
| name           | abbrev |
+----------------+--------+
| Alabama        | AL     |
| Alaska         | AK     |
| Arizona        | AZ     |
...
| West Virginia  | WV     |
| Wisconsin      | WI     |
| Wyoming        | WY     |
+----------------+--------+

Examples that just show the syntax for SQL statements do not include the mysql> prompt, but they do include semicolons as necessary to make it clear where statements end. For example, this is a single statement:

CREATE TABLE t1 (i INT) SELECT * FROM t2;

But this example represents two statements:

CREATE TABLE t1 (i INT);
SELECT * FROM t2;

The semicolon is a notational convenience used within mysql as a statement terminator. But it is not part of SQL itself, so when you issue SQL statements from within programs that you write (for example, using Perl or Java), you should not include terminating semicolons.
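To make the point about semicolons concrete, here is a minimal Python sketch of the two-statement example issued from within a program. It rests on assumptions: Python's built-in sqlite3 module stands in for a MySQL driver so that the sketch is runnable as is, and t2 is created and given one row first so that the SELECT has something to return. The thing to notice is that each statement is passed to the API as a separate string with no terminating semicolon. I make no claim here about how any particular driver reacts if a semicolon is included.

```python
import sqlite3  # assumption: stand-in for a MySQL DB-API driver

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Setup that is not part of the example in the text: give t2 some content.
cursor.execute("CREATE TABLE t2 (i INT)")
cursor.execute("INSERT INTO t2 (i) VALUES (1)")

# The two statements from the text, issued one call at a time.
# The string boundaries mark where each statement ends, so no semicolon is used.
statements = [
    "CREATE TABLE t1 (i INT)",
    "SELECT * FROM t2",
]
for stmt in statements:
    cursor.execute(stmt)

rows = cursor.fetchall()  # rows returned by the last statement, the SELECT
print(rows)
```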

This icon indicates a tip, suggestion, or general note.

The Companion Web Site

MySQL Cookbook has a companion web site that you can visit to obtain the source code and sample data for examples developed throughout this book:

http://www.kitebird.com/mysql-cookbook/

The main software distribution is named recipes and you'll find many references to it throughout the book. You can use it to save a lot of typing. For example, when you see a CREATE TABLE statement in the book that describes what a database table looks like, you'll find a SQL batch file in the tables directory of the recipes distribution that you can use to create the table instead of typing out the definition. Change location into the tables directory, then execute the following command, where filename is the name of the file containing the CREATE TABLE statement:

% mysql cookbook
GRANT ALL ON cookbook.* TO 'cbuser'@'localhost' IDENTIFIED BY 'cbpass';
Query OK, 0 rows affected (0.09 sec)
mysql> QUIT
Bye

After you enter the mysql command shown on the first line, if you get a message indicating that the program cannot be found or that it is a bad command, see Recipe 1.8. Otherwise, when mysql prints the password prompt, enter the MySQL root password where you see the ******. (If the MySQL root user has no password, just press Return at the password prompt.) Then issue a GRANT statement like the one shown.

To use a database name other than cookbook, substitute its name where you see cookbook in the GRANT statement. Note that you need to grant privileges for the database even if the user account already exists. However, in that case, you'll likely want to omit the IDENTIFIED BY 'cbpass' part of the statement, because otherwise you'll change that account's current password.

The hostname part of 'cbuser'@'localhost' indicates the host from which you'll be connecting to the MySQL server to access the cookbook database. To set up an account that will connect to a server running on the local host, use localhost, as shown. If you plan to make connections to the server from another host, substitute that host in the GRANT statement. For example, if you'll be connecting to the server as cbuser from a host named xyz.com, the GRANT statement should look like this:

mysql> GRANT ALL ON cookbook.* TO 'cbuser'@'xyz.com' IDENTIFIED BY 'cbpass';

It may have occurred to you that there's a bit of a paradox involved in the procedure just described. That is, to set up a user account that can make connections to the MySQL server, you must connect to the server first so that you can issue the GRANT statement. I'm assuming that you can already connect as the MySQL root user, because GRANT can be used only by a user such as root that has the administrative privileges needed to set up other user accounts. If you can't connect to the server as root, ask your MySQL administrator to issue the GRANT statement for you. Once that has been done, you should be able to use the new MySQL account to connect to the server, create your own database, and proceed from there on your own.

MySQL Accounts and Login Accounts

MySQL accounts and login accounts for your operating system are different. For example, the MySQL root user and the Unix root user are separate and have nothing to do with each other, even though the username is the same in each case. This means they are very likely to have different passwords. It also means you cannot create new MySQL accounts by creating login accounts for your operating system; use the GRANT statement instead.

1.3 Creating a Database and a Sample Table

1.3.1 Problem

You want to create a database and to set up tables within it.

1.3.2 Solution

Use a CREATE DATABASE statement to create a database, a CREATE TABLE statement for each table you want to use, and INSERT to add records to the tables.

1.3.3 Discussion

The GRANT statement used in the previous section defines privileges for the cookbook database, but does not create it. You need to create the database explicitly before you can use it. This section shows how to do that, and also how to create a table and load it with some sample data that can be used for examples in the following sections.

After the cbuser account has been set up, verify that you can use it to connect to the MySQL server. Once you've connected successfully, create the database. From the host that was named in the GRANT statement, run the following commands to do this (the host named after -h should be the host where the MySQL server is running):

% mysql -h localhost -p -u cbuser
Enter password: cbpass
mysql> CREATE DATABASE cookbook;
Query OK, 1 row affected (0.08 sec)

Now you have a database, so you can create tables in it. Issue the following statements to select cookbook as the default database, create a simple table, and populate it with a few records: [1]

[1] If you don't want to enter the complete text of the INSERT statements (and I don't blame you), skip ahead to Recipe 1.13 for a shortcut. And if you don't want to type in any of the statements, skip ahead to Recipe 1.16.

mysql> USE cookbook;
mysql> CREATE TABLE limbs (thing VARCHAR(20), legs INT, arms INT);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('human',2,2);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('insect',6,0);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('squid',0,10);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('octopus',0,8);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('fish',0,0);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('centipede',100,0);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('table',4,0);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('armchair',4,2);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('phonograph',0,1);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('tripod',3,0);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('Peg Leg Pete',1,2);
mysql> INSERT INTO limbs (thing,legs,arms) VALUES('space alien',NULL,NULL);

The table is named limbs and contains three columns to record the number of legs and arms possessed by various life forms and objects. (The physiology of the alien in the last row is such that the proper values for the arms and legs columns cannot be determined; NULL indicates "unknown value.") Verify that the table contains what you expect by issuing a SELECT statement:

mysql> SELECT * FROM limbs;
+--------------+------+------+
| thing        | legs | arms |
+--------------+------+------+
| human        |    2 |    2 |
| insect       |    6 |    0 |
| squid        |    0 |   10 |
| octopus      |    0 |    8 |
| fish         |    0 |    0 |
| centipede    |  100 |    0 |
| table        |    4 |    0 |
| armchair     |    4 |    2 |
| phonograph   |    0 |    1 |
| tripod       |    3 |    0 |
| Peg Leg Pete |    1 |    2 |
| space alien  | NULL | NULL |
+--------------+------+------+
12 rows in set (0.00 sec)

At this point, you're all set up with a database and a table that can be used to run some example queries.
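The INSERT statements for the limbs table follow a regular pattern, so they can also be produced by a short script instead of being typed by hand. The following is a minimal sketch in Python, not something taken from this recipe: the names LIMBS, sql_literal, insert_statement, and limbs_statements are made up for illustration, and the quoting rule (double any single quote, double any backslash) is a simplified assumption that is adequate for the sample values shown here.

```python
# Rows of the limbs table as listed in this recipe; None stands for SQL NULL.
LIMBS = [
    ("human", 2, 2),
    ("insect", 6, 0),
    ("squid", 0, 10),
    ("octopus", 0, 8),
    ("fish", 0, 0),
    ("centipede", 100, 0),
    ("table", 4, 0),
    ("armchair", 4, 2),
    ("phonograph", 0, 1),
    ("tripod", 3, 0),
    ("Peg Leg Pete", 1, 2),
    ("space alien", None, None),
]


def sql_literal(value):
    """Render a Python value as a SQL literal (simplified, illustrative quoting)."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    # Double backslashes and single quotes so they survive inside '...'.
    escaped = str(value).replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def insert_statement(table, columns, row):
    """Build one INSERT statement in the same layout used in this recipe."""
    values = ",".join(sql_literal(v) for v in row)
    return "INSERT INTO %s (%s) VALUES(%s);" % (table, ",".join(columns), values)


def limbs_statements():
    """Return the twelve INSERT statements for the limbs table."""
    columns = ("thing", "legs", "arms")
    return [insert_statement("limbs", columns, row) for row in LIMBS]


if __name__ == "__main__":
    print("\n".join(limbs_statements()))
```

Run as a script, this prints one statement per line; the output could be pasted at the mysql> prompt or saved in a file for later use.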

1.4 Starting and Terminating mysql

1.4.1 Problem

You want to start and stop the mysql program.

1.4.2 Solution

Invoke mysql from your command prompt to start it, specifying any connection parameters that may be necessary. To leave mysql, use a QUIT statement.

1.4.3 Discussion

To start the mysql program, try just typing its name at your command-line prompt. If mysql starts up correctly, you'll see a short message, followed by a mysql> prompt that indicates the program is ready to accept queries. To illustrate, here's what the welcome message looks like (to save space, I won't show it in any further examples):

% mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 18427 to server version: 3.23.51-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>

If mysql tries to start but exits immediately with an "access denied" message, you'll need to specify connection parameters. The most commonly needed parameters are the host to connect to (the host where the MySQL server runs), your MySQL username, and a password. For example:

% mysql -h localhost -p -u cbuser
Enter password: cbpass

In general, I'll show mysql commands in examples with no connection parameter options. I assume that you'll supply any parameters that you need, either on the command line, or in an option file (Recipe 1.5) so that you don't have to type them each time you invoke mysql.

If you don't have a MySQL username and password, you need to obtain permission to use the MySQL server, as described earlier in Recipe 1.2.

The syntax and default values for the connection parameter options are shown in the following table. These options have both a single-dash short form and a double-dash long form.

Parameter type    Option syntax forms               Default value
Hostname          -h hostname, --host=hostname      localhost
Username          -u username, --user=username      Your login name
Password          -p, --password                    None

As the table indicates, there is no default password. To supply one, use --password or -p, then enter your password when mysql prompts you for it:

% mysql -p
Enter password:    <- enter your password here

If you like, you can specify the password directly on the command line by using either -ppassword (note that there is no space after the -p) or --password=password. I don't recommend doing this on a multiple-user machine, because the password may be visible momentarily to other users who are running tools such as ps that report process information.

If you get an error message that mysql cannot be found or is an invalid command when you try to invoke it, that means your command interpreter doesn't know where mysql is installed. See Recipe 1.8.

To terminate a mysql session, issue a QUIT statement:

mysql> QUIT

You can also terminate the session by issuing an EXIT statement or (under Unix) by typing Ctrl-D.

The way you specify connection parameters for mysql also applies to other MySQL programs such as mysqldump and mysqladmin. For example, some of the actions that mysqladmin can perform are available only to the MySQL root account, so you need to specify name and password options for that user:

% mysqladmin -p -u root shutdown
Enter password:

1.5 Specifying Connection Parameters by Using Option Files

1.5.1 Problem

You don't want to type connection parameters on the command line every time you invoke mysql.

1.5.2 Solution

Put the parameters in an option file.

1.5.3 Discussion

To avoid entering connection parameters manually, put them in an option file for mysql to read automatically. Under Unix, your personal option file is named .my.cnf in your home directory. There are also site-wide option files that administrators can use to specify parameters that apply globally to all users. You can use /etc/my.cnf or the my.cnf file in the MySQL server's data directory. Under Windows, the option files you can use are C:\my.cnf, the my.ini file in your Windows system directory, or my.cnf in the server's data directory.

Windows may hide filename extensions when displaying files, so a file named my.cnf may appear to be named just my. Your version of Windows may allow you to disable extension-hiding. Alternatively, issue a DIR command in a DOS window to see full names.

The following example illustrates the format used to write MySQL option files:

# general client program connection options
[client]
host=localhost
user=cbuser
password=cbpass

# options specific to the mysql program
[mysql]
no-auto-rehash
# specify pager for interactive mode
pager=/usr/bin/less

This format has the following general characteristics:



• Lines are written in groups. The first line of the group specifies the group name inside of square brackets, and the remaining lines specify options associated with the group. The example file just shown has a [client] group and a [mysql] group. Within a group, option lines are written in name=value format, where name corresponds to an option name (without leading dashes) and value is the option's value. If an option doesn't take any value (such as for the no-auto-rehash option), the name is listed by itself with no trailing =value part.

• If you don't need some particular parameter, just leave out the corresponding line. For example, if you normally connect to the default host (localhost), you don't need any host line. If your MySQL username is the same as your operating system login name, you can omit the user line.

• In option files, only the long form of an option is allowed. This is in contrast to command lines, where options often can be specified using a short form or a long form. For example, the hostname can be given using either -h hostname or --host=hostname on the command line; in an option file, only host=hostname is allowed.

• Options often are used for connection parameters (such as host, user, and password). However, the file can specify options that have other purposes. The pager option shown for the [mysql] group specifies the paging program that mysql should use for displaying output in interactive mode. It has nothing to do with how the program connects to the server.

• The usual group for specifying client connection parameters is [client]. This group actually is used by all the standard MySQL clients, so by creating an option file to use with mysql, you make it easier to invoke other programs such as mysqldump and mysqladmin as well.

• You can define multiple groups in an option file. A common convention is for a program to look for parameters in the [client] group and in the group named after the program itself. This provides a convenient way to list general client parameters that you want all client programs to use, but still be able to specify options that apply only to a particular program. The preceding sample option file illustrates this convention for the mysql program, which gets general connection parameters from the [client] group and also picks up the no-auto-rehash and pager options from the [mysql] group. (If you put the mysql-specific options in the [client] group, that will result in "unknown option" errors for all other programs that use the [client] group and they won't run properly.)

• If a parameter is specified multiple times in an option file, the last value found takes precedence. This means that normally you should list any program-specific groups after the [client] group so that if there is any overlap in the options set by the two groups, the more general options will be overridden by the program-specific values.

• Lines beginning with # or ; characters are ignored as comments. Blank lines are ignored, too.

• Option files must be plain text files. If you create an option file with a word processor that uses some non-text format by default, be sure to save the file explicitly as text. Windows users especially should take note of this.

• Options that specify file or directory pathnames should be written using / as the pathname separator character, even under Windows.
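The group and precedence rules in the list above can be tried out without a MySQL installation. The following is a rough sketch that uses Python's standard configparser module as a stand-in for the option file reader; it is not the parser the MySQL programs themselves use, and the helper name options_for is invented for this example. It reads the sample file shown earlier, then merges the [client] group with the group named after a program so that later values override earlier ones.

```python
import configparser

# The sample option file from this recipe.
SAMPLE = """\
# general client program connection options
[client]
host=localhost
user=cbuser
password=cbpass

# options specific to the mysql program
[mysql]
no-auto-rehash
# specify pager for interactive mode
pager=/usr/bin/less
"""


def options_for(text, program):
    """Return the options a program would see: [client] first, then [program]."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # permits options such as no-auto-rehash
        interpolation=None,    # treat % characters in values literally
        strict=False,          # a repeated option keeps the last value found
        delimiters=("=",),     # only name=value lines, as in the sample
    )
    parser.read_string(text)
    merged = {}
    for group in ("client", program):
        if parser.has_section(group):
            # Options from the later group replace same-named earlier ones.
            merged.update(parser.items(group))
    return merged


if __name__ == "__main__":
    for name, value in options_for(SAMPLE, "mysql").items():
        print(name if value is None else "%s=%s" % (name, value))
```

An option that takes no value comes back with the value None. Asking for a program that has no group of its own, such as mysqldump in this sample, yields only the [client] options.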

If you want to find out which options will be taken from option files by mysql, use this command:

% mysql --print-defaults

You can also use the my_print_defaults utility, which takes as arguments the names of the option file groups that it should read. For example, mysql looks in both the [client] and [mysql] groups for options, so you can check which values it will take from option files like this:

% my_print_defaults client mysql

1.6 Protecting Option Files

1.6.1 Problem

Your MySQL username and password are stored in your option file, and you don't want other users reading it.

1.6.2 Solution

Change the file's mode to make it accessible only by you.

1.6.3 Discussion

If you use a multiple-user operating system such as Unix, you should protect your option file to prevent other users from finding out how to connect to MySQL using your account. Use chmod to make the file private by setting its mode to allow access only by yourself:

% chmod 600 .my.cnf
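If you want a script to confirm that an option file really is private, the file's mode can be inspected. This is a small sketch for Unix-like systems using Python's standard os and stat modules; the function names is_private and make_private are invented here, and the check assumes that "private" means no permission bits at all for group and others.

```python
import os
import stat


def is_private(path):
    """Return True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def make_private(path):
    """Equivalent in effect to: chmod 600 path"""
    os.chmod(path, 0o600)


if __name__ == "__main__":
    option_file = os.path.expanduser("~/.my.cnf")
    if os.path.exists(option_file) and not is_private(option_file):
        print("%s is readable by other users; consider chmod 600" % option_file)
```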

1.7 Mixing Command-Line and Option File Parameters

1.7.1 Problem

You'd rather not store your MySQL password in an option file, but you don't want to enter your username and server host manually.

1.7.2 Solution

Put the username and host in the option file, and specify the password interactively when you invoke mysql; it looks both in the option file and on the command line for connection parameters. If an option is specified in both places, the one on the command line takes precedence.

1.7.3 Discussion

mysql first reads your option file to see what connection parameters are listed there, then checks the command line for additional parameters. This means you can specify some options one way, and some the other way.

Command-line parameters take precedence over parameters found in your option file, so if for some reason you need to override an option file parameter, just specify it on the command line. For example, you might list your regular MySQL username and password in the option file for general purpose use. If you need to connect on occasion as the MySQL root user, specify the user and password options on the command line to override the option file values:

% mysql -p -u root

To explicitly specify "no password" when there is a non-empty password in the option file, use -p on the command line, and then just press Return when mysql prompts you for the password:

% mysql -p
Enter password:    <- press Return here
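The precedence rule can be pictured as two dictionaries merged in order, with the second one winning. The sketch below is only a model of that rule, written in Python; effective_params is a hypothetical name, and representing "just press Return" as an empty string is an assumption made for the example.

```python
def effective_params(option_file, command_line):
    """Merge parameters; anything given on the command line wins."""
    merged = dict(option_file)    # start from the option file values
    merged.update(command_line)   # command-line values override them
    return merged


if __name__ == "__main__":
    from_file = {"host": "localhost", "user": "cbuser", "password": "cbpass"}
    # Models: mysql -p -u root, then pressing Return at the password prompt.
    from_command_line = {"user": "root", "password": ""}
    print(effective_params(from_file, from_command_line))
```

The host is still taken from the option file because the command line does not mention it.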

1.8 What to Do if mysql Cannot Be Found

1.8.1 Problem

When you invoke mysql from the command line, your command interpreter can't find it.

1.8.2 Solution

Add the directory where mysql is installed to your PATH setting. Then you'll be able to run mysql from any directory easily.

1.8.3 Discussion

If your shell or command interpreter can't find mysql when you invoke it, you'll see some sort of error message. It may look like this under Unix:

% mysql
mysql: Command not found.

Or like this under Windows:

C:\> mysql
Bad command or invalid filename

One way to tell your shell where to find mysql is to type its full pathname each time you run it. The command might look like this under Unix:

% /usr/local/mysql/bin/mysql

Or like this under Windows:

C:\> C:\mysql\bin\mysql

Typing long pathnames gets tiresome pretty quickly, though. You can avoid doing so by changing into the directory where mysql is installed before you run it. However, I recommend that you not do that. If you do, the inevitable result is that you'll end up putting all your datafiles and query batch files in the same directory as mysql, thus unnecessarily cluttering up what should be a location intended only for programs.

A better solution is to make sure that the directory where mysql is installed is included in the PATH environment variable that lists pathnames of directories where the shell looks for commands. (See Recipe 1.9.) Then you can invoke mysql from any directory by entering just its name, and your shell will be able to find it. This eliminates a lot of unnecessary pathname typing. An additional benefit is that because you can easily run mysql from anywhere, you will have no need to put your datafiles in the same directory where mysql is located. When you're not operating under the burden of running mysql from a particular location, you'll be free to organize your files in a way that makes sense to you, not in a way imposed by some artificial necessity. For example, you can create a directory under your home directory for each database you have and put the files associated with each database in the appropriate directory.

I've pointed out the importance of the search path here because I receive many questions from people who aren't aware of the existence of such a thing, and who consequently try to do all their MySQL-related work in the bin directory where mysql is installed. This seems particularly common among Windows users. Perhaps the reason is that, except for Windows NT and its derivatives, the Windows Help application seems to be silent on the subject of the command interpreter search path or how to set it. (Apparently, Windows Help considers it dangerous for people to know how to do something useful for themselves.)

Another way for Windows users to avoid typing the pathname or changing into the mysql directory is to create a shortcut and place it in a more convenient location. That has the advantage of making it easy to start up mysql just by opening the shortcut. To specify command-line options or the startup directory, edit the shortcut's properties. If you don't always invoke mysql with the same options, it might be useful to create a shortcut corresponding to each set of options you need; for example, one shortcut to connect as an ordinary user for general work and another to connect as the MySQL root user for administrative purposes.

1.9 Setting Environment Variables

1.9.1 Problem

You need to modify your operating environment, for example, to change your shell's PATH setting.

1.9.2 Solution

Edit the appropriate shell startup file. Under Windows NT-based systems, another alternative is to use the System control panel.

1.9.3 Discussion

The shell or command interpreter you use to run programs from the command-line prompt includes an environment in which you can store variable values. Some of these variables are used by the shell itself. For example, it uses PATH to determine which directories to look in for programs such as mysql. Other variables are used by other programs (such as PERL5LIB, which tells Perl where to look for library files used by Perl scripts).

Your shell determines the syntax used to set environment variables, as well as the startup file in which to place the settings. Typical startup files for various shells are shown in the following table. If you've never looked through your shell's startup files, it's a good idea to do so to familiarize yourself with their contents.

Shell            Possible startup files
csh, tcsh        .login, .cshrc, .tcshrc
sh, bash, ksh    .profile, .bash_profile, .bash_login, .bashrc
DOS prompt       C:\AUTOEXEC.BAT

The following examples show how to set the PATH variable so that it includes the directory where the mysql program is installed. The examples assume there is an existing PATH setting in one of your startup files. If you have no PATH setting currently, simply add the appropriate line or lines to one of the files.

If you're reading this section because you've been referred here from another chapter, you'll probably be more interested in changing some variable other than PATH. The instructions are similar because you use the same syntax.

The PATH variable lists the pathnames for one or more directories. If an environment variable's value consists of multiple pathnames, it's conventional under Unix to separate them using the colon character (:). Under Windows, pathnames may contain colons, so the separator is the semicolon character (;).

To set the value of PATH, use the instructions that pertain to your shell:



• For csh or tcsh, look for a setenv PATH command in your startup files, then add the appropriate directory to the line. Suppose your search path is set by a line like this in your .login file:

setenv PATH /bin:/usr/bin:/usr/local/bin

If mysql is installed in /usr/local/mysql/bin, add that directory to the search path by changing the setenv line to look like this:

setenv PATH /usr/local/mysql/bin:/bin:/usr/bin:/usr/local/bin

It's also possible that your path will be set with set path, which uses different syntax:

set path = (/usr/local/mysql/bin /bin /usr/bin /usr/local/bin)

• For a shell in the Bourne shell family such as sh, bash, or ksh, look in your startup files for a line that sets up and exports the PATH variable:

export PATH=/bin:/usr/bin:/usr/local/bin

The assignment and the export might be on separate lines:

PATH=/bin:/usr/bin:/usr/local/bin
export PATH

Change the setting to this:

export PATH=/usr/local/mysql/bin:/bin:/usr/bin:/usr/local/bin

Or:

PATH=/usr/local/mysql/bin:/bin:/usr/bin:/usr/local/bin
export PATH

• Under Windows, check for a line that sets the PATH variable in your AUTOEXEC.BAT file. It might look like this:

PATH=C:\WINDOWS;C:\WINDOWS\COMMAND

Or like this:

SET PATH=C:\WINDOWS;C:\WINDOWS\COMMAND

Change the PATH value to include the directory where mysql is installed. If this is C:\mysql\bin, the resulting PATH setting looks like this:

PATH=C:\mysql\bin;C:\WINDOWS;C:\WINDOWS\COMMAND

Or:

SET PATH=C:\mysql\bin;C:\WINDOWS;C:\WINDOWS\COMMAND

• Under Windows NT-based systems, another way to change the PATH value is to use the System control panel (use its Environment or Advanced tab, whichever is present). In other versions of Windows, you can use the Registry Editor application. Unfortunately, the name of the Registry Editor key that contains the path value seems to vary among versions of Windows. For example, on the Windows machines that I use, the key has one name under Windows Me and a different name under Windows 98; under Windows 95, I couldn't find the key at all. It's probably simpler just to edit AUTOEXEC.BAT.

After setting an environment variable, you'll need to cause the modification to take effect. Under Unix, you can log out and log in again. Under Windows, if you set PATH using the System control panel, you can simply open a new DOS window. If you edited AUTOEXEC.BAT instead, restart the machine.
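The effect of the PATH edits described in this recipe comes down to splitting the value on the separator, putting the mysql directory first, and joining the pieces again. The following Python sketch models that step; prepend_dir is a made-up helper name, and dropping an existing copy of the directory before prepending it is a choice made for this example rather than something the shell requires.

```python
import os


def prepend_dir(path_value, directory, sep=os.pathsep):
    """Return path_value with directory placed first, without duplicates."""
    # Split into directories, dropping empty entries and any existing copy.
    parts = [p for p in path_value.split(sep) if p and p != directory]
    return sep.join([directory] + parts)


if __name__ == "__main__":
    # Unix-style value, separated by colons.
    print(prepend_dir("/bin:/usr/bin:/usr/local/bin", "/usr/local/mysql/bin", sep=":"))
    # Windows-style value, separated by semicolons.
    print(prepend_dir("C:\\WINDOWS;C:\\WINDOWS\\COMMAND", "C:\\mysql\\bin", sep=";"))
```

The separator defaults to os.pathsep, which is the colon on Unix and the semicolon on Windows, matching the convention described above.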

1.10 Issuing Queries

1.10.1 Problem

You've started mysql and now you want to send queries to the MySQL server.

1.10.2 Solution

Just type them in, but be sure to let mysql know where each one ends.

1.10.3 Discussion

To issue a query at the mysql> prompt, type it in, add a semicolon (;) at the end to signify the end of the statement, and press Return. An explicit statement terminator is necessary; mysql doesn't interpret Return as a terminator because it's allowable to enter a statement using multiple input lines. The semicolon is the most common terminator, but you can also use \g ("go") as a synonym for the semicolon. Thus, the following examples are equivalent ways of issuing the same query, even though they are entered differently and terminated differently: [2]

[2] Example queries in this book are shown with SQL keywords like SELECT in uppercase for distinctiveness, but that's simply a typographical convention. You can enter keywords in any lettercase.

mysql> SELECT NOW( );
+---------------------+
| NOW( )              |
+---------------------+
| 2001-07-04 10:27:23 |
+---------------------+
mysql> SELECT
    -> NOW( )\g
+---------------------+
| NOW( )              |
+---------------------+
| 2001-07-04 10:27:28 |
+---------------------+

Notice for the second query that the prompt changes from mysql> to -> on the second input line. mysql changes the prompt this way to let you know that it's still waiting to see the query terminator.

Be sure to understand that neither the ; character nor the \g sequence that serve as query terminators are part of the query itself. They're conventions used by the mysql program, which recognizes these terminators and strips them from the input before sending the query to the MySQL server. It's important to remember this when you write your own programs that send queries to the server (as we'll begin to do in the next chapter). In that context, you don't include any terminator characters; the end of the query string itself signifies the end of the query. In fact, adding a terminator may well cause the query to fail with an error.
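To make the point about terminators concrete, here is a small Python sketch of the kind of cleanup a program might apply to a statement copied from a mysql session before passing it to the server. The helper name strip_terminator is invented for this example and is not part of any MySQL programming interface; it handles only a single trailing ; or \g.

```python
def strip_terminator(statement):
    """Remove one trailing ; or \\g terminator, plus surrounding whitespace."""
    text = statement.strip()
    for terminator in (";", "\\g"):
        if text.endswith(terminator):
            return text[: -len(terminator)].rstrip()
    return text


if __name__ == "__main__":
    # Both forms reduce to the same query string.
    print(strip_terminator("SELECT NOW( );"))
    print(strip_terminator("SELECT NOW( )\\g"))
```

Both calls print SELECT NOW( ), which is the string a program would send without any terminator.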

1.11 Selecting a Database

1.11.1 Problem

You want to tell mysql which database to use.

1.11.2 Solution

Name the database on the mysql command line or issue a USE statement from within mysql.

1.11.3 Discussion

When you issue a query that refers to a table (as most queries do), you need to indicate which database the table is part of. One way to do so is to use a fully qualified table reference that begins with the database name. (For example, cookbook.limbs refers to the limbs table in the cookbook database.) As a convenience, MySQL also allows you to select a default (current) database so that you can refer to its tables without explicitly specifying the database name each time. You can specify the database on the command line when you start mysql:

% mysql cookbook

If you provide options on the command line such as connection parameters when you run mysql, they should precede the database name:

% mysql -h host -p -u user cookbook
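If a script assembles the mysql command for you, the same ordering applies: options first, database name last. The following is a minimal Python sketch; mysql_argv is a hypothetical helper, and it uses only the -h, -p, and -u options shown in this chapter. The password is deliberately never placed in the argument list, so mysql prompts for it.

```python
def mysql_argv(database=None, host=None, user=None, ask_password=False):
    """Build an argument list for mysql with options before the database name."""
    argv = ["mysql"]
    if host is not None:
        argv += ["-h", host]
    if ask_password:
        argv.append("-p")        # no value given: mysql prompts for the password
    if user is not None:
        argv += ["-u", user]
    if database is not None:
        argv.append(database)    # the database name goes after all options
    return argv


if __name__ == "__main__":
    print(" ".join(mysql_argv("cookbook", host="localhost", user="cbuser", ask_password=True)))
```

A list built this way could be handed to something like Python's subprocess.call to start the client.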

If you've already started a mysql session, you can select a database (or switch to a different one) by issuing a USE statement:

mysql> USE cookbook;
Database changed

If you've forgotten or are not sure which database is the current one (which can happen easily if you're using multiple databases and switching between them several times during the course of a mysql session), use the following statement:

mysql> SELECT DATABASE( );
+------------+
| DATABASE() |
+------------+
| cookbook   |
+------------+

DATABASE( ) is a function that returns the name of the current database. If no database has been selected yet, the function returns an empty string:

mysql> SELECT DATABASE( );
+------------+
| DATABASE() |
+------------+
|            |
+------------+

The STATUS command (and its synonym, \s) also displays the current database name, in addition to several other pieces of information:

mysql> \s
--------------
Connection id:          5589
Current database:       cookbook
Current user:           cbuser@localhost
Current pager:          stdout
Using outfile:          ''
Server version:         3.23.51-log
Protocol version:       10
Connection:             Localhost via UNIX socket
Client characterset:    latin1
Server characterset:    latin1
UNIX socket:            /tmp/mysql.sock
Uptime:                 9 days 39 min 43 sec

Threads: 4  Questions: 42265  Slow queries: 0  Opens: 82  Flush tables: 1  Open tables: 52  Queries per second avg: 0.054
--------------

Temporarily Using a Table from Another Database

To use a table from another database temporarily, you can switch to that database and then switch back when you're done using the table. However, you can also use the table without switching databases by referring to the table using its fully qualified name. For example, to use the table other_tbl in another database other_db, you can refer to it as other_db.other_tbl.
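The relationship between a bare table name, the current database, and a fully qualified name can be written down in a few lines. This Python sketch is only an illustration: qualified_name is an invented helper, and it assumes that database and table names themselves contain no period.

```python
def qualified_name(table, default_db):
    """Return db.table, leaving already qualified names unchanged."""
    if "." in table:
        return table                      # already names its database
    return default_db + "." + table       # resolve against the current database


if __name__ == "__main__":
    print(qualified_name("limbs", "cookbook"))
    print(qualified_name("other_db.other_tbl", "cookbook"))
```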

1.12 Canceling a Partially Entered Query

1.12.1 Problem

You start to enter a query, then decide not to issue it after all.

1.12.2 Solution

Cancel the query using your line kill character or the \c sequence.

1.12.3 Discussion

If you change your mind about issuing a query that you're entering, cancel it. If the query is on a single line, use your line kill character to erase the entire line. (The particular character to use depends on your terminal setup; for me, the character is Ctrl-U.) If you've entered a statement over multiple lines, the line kill character will erase only the last line. To cancel the statement completely, enter \c and type Return. This will return you to the mysql> prompt:

mysql> SELECT *
    -> FROM limbs
    -> ORDER BY\c
mysql>

Sometimes \c appears to do nothing (that is, the mysql> prompt does not reappear), which leads to the sense that you're "trapped" in a query and can't escape. If \c is ineffective, the cause usually is that you began typing a quoted string and haven't yet entered the matching end quote that terminates the string. Let mysql's prompt help you figure out what to do here. If the prompt has changed from mysql> to ">, that means mysql is looking for a terminating double quote. If the prompt is '> instead, mysql is looking for a terminating single quote. Type the appropriate matching quote to end the string, then enter \c followed by Return and you should be okay.
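The prompt changes described above can be modeled with a few lines of code that track whether the input typed so far is inside a quoted string. The Python sketch below is a simplified model, not mysql's actual input handling: prompt_for and keys_to_cancel are invented names, and the scanner knows only about single quotes, double quotes, and backslash escapes.

```python
def prompt_for(pending):
    """Return the prompt a mysql-like client would show for pending input."""
    quote = None                  # quote character we are currently inside
    i = 0
    while i < len(pending):
        ch = pending[i]
        if quote is not None:
            if ch == "\\":
                i += 1            # skip the character after a backslash
            elif ch == quote:
                quote = None      # matching end quote found
        elif ch in ("'", '"'):
            quote = ch            # a quoted string begins
        i += 1
    if quote is not None:
        return quote + ">"        # waiting for the terminating quote
    if pending.strip():
        return "->"               # waiting for the statement terminator
    return "mysql>"               # nothing pending


def keys_to_cancel(pending):
    """Return what to type to cancel: close any open quote, then \\c."""
    prompt = prompt_for(pending)
    if prompt in ("'>", '">'):
        return prompt[0] + "\\c"
    return "\\c"


if __name__ == "__main__":
    for text in ("", "SELECT *\nFROM limbs", "SELECT 'abc", 'SELECT "abc'):
        print(repr(text), prompt_for(text), keys_to_cancel(text))
```

For input that ends inside a single-quoted string, the model reports the '> prompt and suggests typing the closing quote followed by \c, which is the escape route described in the text.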

1.13 Repeating and Editing Queries

1.13.1 Problem

The query you just entered contained an error, and you want to fix it without typing the whole thing again. Or you want to repeat an earlier statement without retyping it.

1.13.2 Solution

Use mysql's built-in query editor.

1.13.3 Discussion

If you issue a long query only to find that it contains a syntax error, what should you do? Type in the entire corrected query from scratch? No need. mysql maintains a statement history and supports input-line editing. This allows you to recall queries so that you can modify and reissue them easily. There are many, many editing functions, but most people tend to use a small set of commands for the majority of their editing. [3] A basic set of useful commands is shown in the following table. Typically, you use Up Arrow to recall the previous line, Left Arrow and Right Arrow to move around within the line, and Backspace or Delete to erase characters. To add new characters to the line, just move the cursor to the appropriate spot and type them in. When you're done editing, press Return to issue the query (the cursor need not be at the end of the line when you do this).

[3] The input-line editing capabilities in mysql are based on the GNU Readline library. You can read the documentation for this library to find out more about the many editing functions that are available. For more information, check the Bash manual, available online at http://www.gnu.org/manual/.

Editing Key    Effect of Key
Up Arrow       Scroll up through statement history
Down Arrow     Scroll down through statement history
Left Arrow     Move left within line
Right Arrow    Move right within line
Ctrl-A         Move to beginning of line
Ctrl-E         Move to end of line
Backspace      Delete previous character
Ctrl-D         Delete character under cursor

Input-line editing is useful for more than just fixing mistakes. You can use it to try out variant forms of a query without retyping the entire thing each time. It's also handy for entering a series of similar statements. For example, if you wanted to use the query history to issue the series of INSERT statements shown earlier in Recipe 1.3 to create the limbs table, first enter the initial INSERT statement. Then, to issue each successive statement, press the Up Arrow key to recall the previous statement with the cursor at the end, backspace back through the column values to erase them, enter the new values, and press Return.

To recall a statement that was entered on multiple lines, the editing procedure is a little trickier than for single-line statements. In this case, you must recall and reenter each successive line of the query in order. For example, if you've entered a two-line query that contains a mistake, press Up Arrow twice to recall the first line. Make any modifications necessary and press Return. Then press Up Arrow twice more to recall the second line. Modify it, press Return, and the query will execute.

Under Windows, mysql allows statement recall only for NT-based systems. For versions such as Windows 98 or Me, you can use the special mysqlc client program instead. However, mysqlc requires an additional library file, cygwinb19.dll. If you find a copy of this library in the same directory where mysqlc is installed (the bin dir under the MySQL installation directory), you should be all set. If the library is located in the MySQL lib directory, copy it into your Windows system directory. The command looks something like this; you should modify it to reflect the actual locations of the two directories on your system:

C:\> copy C:\mysql\lib\cygwinb19.dll C:\Windows\System

After you make sure the library is in a location where mysqlc can find it, invoke mysqlc and it should be capable of input-line editing.

One unfortunate consequence of using mysqlc is that it's actually a fairly old program. (For example, even in MySQL 4.x distributions, mysqlc dates back to 3.22.7.) This means it doesn't understand newer statements such as SOURCE.

1.14 Using Auto-Completion for Database and Table Names
1.14.1 Problem
You wish there was a way to type database and table names more quickly.

1.14.2 Solution
There is; use mysql's name auto-completion facility.

1.14.3 Discussion
Normally when you use mysql interactively, it reads the list of database names and the names of the tables and columns in your current database when it starts up. mysql remembers this information to provide name completion capabilities that are useful for entering statements with fewer keystrokes:

• Type in a partial database, table, or column name and then hit the Tab key.
• If the partial name is unique, mysql completes it for you. Otherwise, you can hit Tab again to see the possible matches.
• Enter additional characters and hit Tab again once to complete it or twice to see the new set of matches.

mysql's name auto-completion capability is based on the table names in the current database, and thus is unavailable within a mysql session until a database has been selected, either on the command line or by means of a USE statement.

Auto-completion allows you to cut down the amount of typing you do. However, if you don't use this feature, reading name-completion information from the MySQL server may be counterproductive because it can cause mysql to start up more slowly when you have a lot of tables in your database. To tell mysql not to read this information so that it starts up more quickly, specify the -A (or --no-auto-rehash) option on the mysql command line. Alternatively, put a no-auto-rehash line in the [mysql] group of your MySQL option file:

[mysql]
no-auto-rehash

To force mysql to read name completion information even if it was invoked in no-completion mode, issue a REHASH or \# command at the mysql> prompt.

1.15 Using SQL Variables in Queries
1.15.1 Problem
You want to save a value from a query so you can refer to it in a subsequent query.

1.15.2 Solution
Use a SQL variable to store the value for later use.

1.15.3 Discussion
As of MySQL 3.23.6, you can assign a value returned by a SELECT statement to a variable, then refer to the variable later in your mysql session. This provides a way to save a result returned from one query, then refer to it later in other queries. The syntax for assigning a value to a SQL variable within a SELECT query is @var_name := value, where var_name is the variable name and value is a value that you're retrieving. The variable may be used in subsequent queries wherever an expression is allowed, such as in a WHERE clause or in an INSERT statement.

A common situation in which SQL variables come in handy is when you need to issue successive queries on multiple tables that are related by a common key value. Suppose you have a customers table with a cust_id column that identifies each customer, and an orders table that also has a cust_id column to indicate which customer each order is associated with. If you have a customer name and you want to delete the customer record as well as all the customer's orders, you need to determine the proper cust_id value for that customer, then delete records from both the customers and orders tables that match the ID. One way to do this is to first save the ID value in a variable, then refer to the variable in the DELETE statements: [4]

mysql> SELECT @id := cust_id FROM customers WHERE cust_id='customer name';
mysql> DELETE FROM customers WHERE cust_id = @id;
mysql> DELETE FROM orders WHERE cust_id = @id;

[4] In MySQL 4, you can use multiple-table DELETE statements to accomplish tasks like this with a single query. See Chapter 12 for examples.

The preceding SELECT statement assigns a column value to a variable, but variables also can be assigned values from arbitrary expressions. The following statement determines the highest sum of the arms and legs columns in the limbs table and assigns it to the @max_limbs variable:

mysql> SELECT @max_limbs := MAX(arms+legs) FROM limbs;

Another use for a variable is to save the result from LAST_INSERT_ID( ) after creating a new record in a table that has an AUTO_INCREMENT column:

mysql> SELECT @last_id := LAST_INSERT_ID( );

LAST_INSERT_ID( ) returns the new AUTO_INCREMENT value. By saving it in a variable, you can refer to the value several times in subsequent statements, even if you issue other statements that create their own AUTO_INCREMENT values and thus change the value returned by LAST_INSERT_ID( ). This is discussed further in Chapter 11.

SQL variables hold single values. If you assign a value to a variable using a statement that returns multiple rows, the value from the last row is used:

mysql> SELECT @name := thing FROM limbs WHERE legs = 0;
+----------------+
| @name := thing |
+----------------+
| squid          |
| octopus        |
| fish           |
| phonograph     |
+----------------+
mysql> SELECT @name;
+------------+
| @name      |
+------------+
| phonograph |
+------------+

If the statement returns no rows, no assignment takes place and the variable retains its previous value. If the variable has not been used previously, that value is NULL:

mysql> SELECT @name2 := thing FROM limbs WHERE legs < 0;
Empty set (0.00 sec)
mysql> SELECT @name2;
+--------+
| @name2 |
+--------+
| NULL   |
+--------+

To set a variable explicitly to a particular value, use a SET statement. SET syntax uses = rather than := to assign the value:

mysql> SET @sum = 4 + 7;
mysql> SELECT @sum;
+------+
| @sum |
+------+
|   11 |
+------+

A given variable's value persists until you assign it another value or until the end of your mysql session, whichever comes first.

Variable names are case sensitive:

mysql> SET @x = 1; SELECT @x, @X;
+------+------+
| @x   | @X   |
+------+------+
| 1    | NULL |
+------+------+

SQL variables can be used only where expressions are allowed, not where constants or literal identifiers must be provided. Although it's tempting to attempt to use variables for such things as table names, it doesn't work. For example, you might try to generate a temporary table name using a variable as follows, but the result is only an error message:

mysql> SET @tbl_name = CONCAT('tbl_',FLOOR(RAND( )*1000000));
mysql> CREATE TABLE @tbl_name (int_col INT);
ERROR 1064 at line 2: You have an error in your SQL syntax near '@tbl_name (int_col INT)' at line 1

SQL variables are a MySQL-specific extension, so they will not work with other database engines.
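One possible way around the table-name limitation is to build the statement outside the server, where the name is ordinary text, and then feed the result to mysql. The following is only a sketch under stated assumptions: make_temp_table_sql is a hypothetical helper, the name is derived from the shell's process ID rather than from RAND( ), and the mysql invocation in the comment assumes the cookbook database used elsewhere in this chapter.

```shell
# Hypothetical helper: print a CREATE TABLE statement whose table name
# is generated by the shell (from its process ID), not by a SQL variable.
make_temp_table_sql() {
    tbl_name="tbl_$$"
    printf 'CREATE TABLE %s (int_col INT);\n' "$tbl_name"
}

# Show the generated statement.
make_temp_table_sql

# To execute it (assumes a reachable server and the cookbook database):
#   make_temp_table_sql | mysql cookbook
```

Because the name is substituted before mysql ever sees the statement, the server receives a plain identifier and the syntax error does not arise.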

1.16 Telling mysql to Read Queries from a File
1.16.1 Problem
You want mysql to read queries stored in a file so you don't have to enter them manually.

1.16.2 Solution
Redirect mysql's input or use the SOURCE command.

1.16.3 Discussion
By default, the mysql program reads input interactively from the terminal, but you can feed it queries in batch mode using other input sources such as a file, another program, or the command arguments. You can also use copy and paste as a source of query input. This section discusses how to read queries from a file. The next few sections discuss how to take input from other sources.

To create a SQL script for mysql to execute in batch mode, put your statements in a text file, then invoke mysql and redirect its input to read from that file:

% mysql cookbook
mysql> CREATE TABLE counter (depth INT);
mysql> INSERT INTO counter SET depth = 0;

Then create a script file loop.sql that contains the following lines (be sure each line ends with a semicolon):

UPDATE counter SET depth = depth + 1;
SELECT depth FROM counter;
SOURCE loop.sql;

Finally, invoke mysql and issue a SOURCE command to read the script file:

% mysql cookbook
mysql> SOURCE loop.sql;

The first two statements in loop.sql increment the nesting counter and display the current depth value. In the third statement, loop.sql sources itself, thus creating an input loop. You'll see the output whiz by, with the counter display incrementing each time through the loop. Eventually mysql will run out of file descriptors and stop with an error:

ERROR: Failed to open file 'loop.sql', error: 24

What is error 24? Find out by using MySQL's perror (print error) utility:

% perror 24
Error code 24:  Too many open files

1.17 Telling mysql to Read Queries from Other Programs
1.17.1 Problem
You want to shove the output from another program into mysql.

1.17.2 Solution
Use a pipe.

1.17.3 Discussion
An earlier section used the following command to show how mysql can read SQL statements from a file:

% mysql cookbook < limbs.sql

mysql can also read a pipe, to receive output from other programs as its input. As a trivial example, the preceding command is equivalent to this one:

% cat limbs.sql | mysql cookbook

Before you tell me that I've qualified for this week's "useless use of cat award," [5] allow me to observe that you can substitute other commands for cat. The point is that any command that produces output consisting of semicolon-terminated SQL statements can be used as an input source for mysql. This can be useful in many ways. For example, the mysqldump utility is used to generate database backups. It writes a backup as a set of SQL statements that recreate the database, so to process mysqldump output, you feed it to mysql. This means you can use the combination of mysqldump and mysql to copy a database over the network to another MySQL server:

% mysqldump cookbook | mysql -h some.other.host.com cookbook

[5] Under Windows, the equivalent would be the "useless use of type award."

Program-generated SQL also can be useful when you need to populate a table with test data but don't want to write the INSERT statements by hand. Instead, write a short program that generates the statements and send its output to mysql using a pipe:

% generate-test-data | mysql cookbook

1.17.4 See Also
mysqldump is discussed further in Chapter 10.
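The generate-test-data command in this recipe stands for any program that writes SQL. As a minimal sketch of what such a generator might look like, the shell function below prints one INSERT statement per row for the limbs table. The function name generate_test_data, the made-up thingN values, and the arithmetic used for the leg and arm counts are all assumptions chosen for illustration.

```shell
# Hypothetical generator: print COUNT semicolon-terminated INSERT statements
# for the limbs table (columns thing, legs, arms). Values are arbitrary.
generate_test_data() {
    count=${1:-3}
    i=1
    while [ "$i" -le "$count" ]; do
        printf "INSERT INTO limbs (thing, legs, arms) VALUES ('thing%d', %d, %d);\n" \
            "$i" "$((i % 5))" "$((i % 3))"
        i=$((i + 1))
    done
}

# Inspect the statements before sending them anywhere.
generate_test_data 3

# To load them (assumes a reachable server and the cookbook database):
#   generate_test_data 100 | mysql cookbook
```

Printing the statements first makes it easy to check them by eye; the pipe to mysql is then the same as for any other SQL-producing command.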

1.18 Specifying Queries on the Command Line
1.18.1 Problem
You want to specify a query directly on the command line for mysql to execute.

1.18.2 Solution
mysql can read a query from its argument list. Use the -e (or --execute) option to specify a query on the command line.

1.18.3 Discussion
For example, to find out how many records are in the limbs table, run this command:

% mysql -e "SELECT COUNT(*) FROM limbs" cookbook
+----------+
| COUNT(*) |
+----------+
|       12 |
+----------+

To run multiple queries with the -e option, separate them with semicolons:

% mysql -e "SELECT COUNT(*) FROM limbs;SELECT NOW( )" cookbook
+----------+
| COUNT(*) |
+----------+
|       12 |
+----------+
+---------------------+
| NOW( )              |
+---------------------+
| 2001-07-04 10:42:22 |
+---------------------+

1.18.4 See Also
By default, results generated by queries that are specified with -e are displayed in tabular format if output goes to the terminal, and in tab-delimited format otherwise. To produce a different output style, see Recipe 1.22.

1.19 Using Copy and Paste as a mysql Input Source
1.19.1 Problem
You want to take advantage of your graphical user interface (GUI) to make mysql easier to use.

1.19.2 Solution
Use copy and paste to supply mysql with queries to execute. In this way, you can take advantage of your GUI's capabilities to augment the terminal interface presented by mysql.

1.19.3 Discussion
Copy and paste is useful in a windowing environment that allows you to run multiple programs at once and transfer information between them. If you have a document containing queries open in a window, you can just copy the queries from there and paste them into the window in which you're running mysql. This is equivalent to typing the queries yourself, but often quicker. For queries that you issue frequently, keeping them visible in a separate window can be a good way to make sure they're always at your fingertips and easily accessible.

1.20 Preventing Query Output from Scrolling off the Screen
1.20.1 Problem
Query output zooms off the top of your screen before you can see it.

1.20.2 Solution
Tell mysql to display output a page at a time, or run mysql in a window that allows scrollback.

1.20.3 Discussion
If a query produces many lines of output, normally they just scroll right off the top of the screen. To prevent this, tell mysql to present output a page at a time by specifying the --pager option. [6] --pager=program tells mysql to use a specific program as your pager:

% mysql --pager=/usr/bin/less

[6] The --pager option is not available under Windows.

--pager by itself tells mysql to use your default pager, as specified in your PAGER environment variable:

% mysql --pager

If your PAGER variable isn't set, you must either define it or use the first form of the command to specify a pager program explicitly. To define PAGER, use the instructions in Recipe 1.9 for setting environment variables.

Within a mysql session, you can turn paging on and off using \P and \n. \P without an argument enables paging using the program specified in your PAGER variable. \P with an argument enables paging using the argument as the name of the paging program:

mysql> \P
PAGER set to /bin/more
mysql> \P /usr/bin/less
PAGER set to /usr/bin/less
mysql> \n
PAGER set to stdout

Output paging was introduced in MySQL 3.23.28.

Another way to deal with long result sets is to use a terminal program that allows you to scroll back through previous output. Programs such as xterm for the X Window System, Terminal for Mac OS X, MacSSH or BetterTelnet for Mac OS, or Telnet for Windows allow you to set the number of output lines saved in the scrollback buffer.

Under Windows NT, 2000, or XP, you can set up a DOS window that allows scrollback using the following procedure:

1. Open the Control Panel.
2. Create a shortcut to the MS-DOS prompt by right clicking on the Console item and dragging the mouse to where you want to place the shortcut (on the desktop, for example).
3. Right click on the shortcut and select the Properties item from the menu that appears.
4. Select the Layout tab in the resulting Properties window.
5. Set the screen buffer height to the number of lines you want to save and click the OK button.

Now you should be able to launch the shortcut to get a scrollable DOS window that allows output produced by commands in that window to be retrieved by using the scrollbar.

1.21 Sending Query Output to a File or to a Program
1.21.1 Problem
You want to send mysql output somewhere other than to your screen.

1.21.2 Solution
Redirect mysql's output or use a pipe.

1.21.3 Discussion
mysql chooses its default output format according to whether you run it interactively or noninteractively. Under interactive use, mysql normally sends its output to the terminal and writes query results using tabular format:

mysql> SELECT * FROM limbs;
+--------------+------+------+
| thing        | legs | arms |
+--------------+------+------+
| human        |    2 |    2 |
| insect       |    6 |    0 |
| squid        |    0 |   10 |
| octopus      |    0 |    8 |
| fish         |    0 |    0 |
| centipede    |  100 |    0 |
| table        |    4 |    0 |
| armchair     |    4 |    2 |
| phonograph   |    0 |    1 |
| tripod       |    3 |    0 |
| Peg Leg Pete |    1 |    2 |
| space alien  | NULL | NULL |
+--------------+------+------+
12 rows in set (0.00 sec)

In noninteractive mode (that is, when either the input or output is redirected), mysql writes output in tab-delimited format:

% echo "SELECT * FROM limbs" | mysql cookbook
thing           legs    arms
human           2       2
insect          6       0
squid           0       10
octopus         0       8
fish            0       0
centipede       100     0
table           4       0
armchair        4       2
phonograph      0       1
tripod          3       0
Peg Leg Pete    1       2
space alien     NULL    NULL

However, in either context, you can select any of mysql's output formats by using the appropriate command-line options. This section describes how to send mysql output somewhere other than the terminal. The next several sections discuss the various mysql output formats and how to select them explicitly according to your needs when the default format isn't what you want.

To save output from mysql in a file, use your shell's standard redirection capability:

% mysql cookbook > outputfile

However, if you try to run mysql interactively with the output redirected, you won't be able to see what you're typing, so generally in this case you'll also take query input from a file (or another program):

% mysql cookbook


outputfile

You can also send query output to another program. For example, if you want to mail query output to someone, you might do so like this:

% mysql cookbook
outputfile

That's fairly cryptic, to say the least. You can achieve the same result with other languages that may be easier to read. Here's a short Perl script that does the same thing as the sed command (it converts tab-delimited input to CSV output), and includes comments to document how it works:

#! /usr/bin/perl -w
while (<>)          # read next input line
{
  s/"/""/g;         # double any quotes within column values
  s/\t/","/g;       # put `","' between column values
  s/^/"/;           # add `"' before the first value
  s/$/"/;           # add `"' after the last value
  print;            # print the result
}
exit (0);

If you name the script csv.pl, you can use it like this:

% mysql cookbook


outputfile

If you run the command under a version of Windows that doesn't know how to associate .pl files with Perl, it may be necessary to invoke Perl explicitly:

C:\> mysql cookbook


outputfile

Perl may be more suitable if you need a cross-platform solution, because it runs under both Unix and Windows. tr and sed normally are unavailable under Windows.
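For comparison on Unix, the same four substitutions that the Perl script performs can be sketched as a shell function built on sed. This is an illustration rather than a definitive command: the function name tsv_to_csv is an assumption, and the literal tab is produced with printf because not every sed understands \t.

```shell
# Convert tab-delimited lines on stdin to CSV on stdout,
# mirroring the substitutions in the Perl script.
tsv_to_csv() {
    tab=$(printf '\t')
    sed -e 's/"/""/g' \
        -e "s/${tab}/\",\"/g" \
        -e 's/^/"/' \
        -e 's/$/"/'
}

# Try it on two made-up rows; the second contains embedded quotes.
printf 'thing\tlegs\tarms\n' | tsv_to_csv
printf 'the "big" table\t4\t0\n' | tsv_to_csv

# With a server available, the input would come from mysql instead:
#   mysql -e "SELECT * FROM limbs" cookbook | tsv_to_csv
```

The order of the expressions matters: quotes inside values are doubled first, so that the quotes added around each value afterward are not doubled as well.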

1.23.4 See Also
An even better way to produce CSV output is to use the Perl Text::CSV_XS module, which was designed for that purpose. This module is discussed in Chapter 10, where it's used to construct a more general-purpose file reformatter.

1.24 Producing HTML Output
1.24.1 Problem
You'd like to turn a query result into HTML.

1.24.2 Solution
mysql can do that for you.

1.24.3 Discussion
mysql generates result set output as HTML tables if you use the -H (or --html) option. This gives you a quick way to produce sample output for inclusion into a web page that shows what the result of a query looks like. [8] Here's an example that shows the difference between tabular format and HTML table output (a few line breaks have been added to the HTML output to make it easier to read):

[8] I'm referring to writing static HTML pages here. If you're writing a script that produces web pages on the fly, there are better ways to generate HTML output from a query. For more information on writing web scripts, see Chapter 16.

% mysql -e "SELECT * FROM limbs WHERE legs=0" cookbook
+------------+------+------+
| thing      | legs | arms |
+------------+------+------+
| squid      |    0 |   10 |
| octopus    |    0 |    8 |
| fish       |    0 |    0 |
| phonograph |    0 |    1 |
+------------+------+------+
% mysql -H -e "SELECT * FROM limbs WHERE legs=0" cookbook
<TABLE BORDER=1>
<TR><TH>thing</TH><TH>legs</TH><TH>arms</TH></TR>
<TR><TD>squid</TD><TD>0</TD><TD>10</TD></TR>
<TR><TD>octopus</TD><TD>0</TD><TD>8</TD></TR>
<TR><TD>fish</TD><TD>0</TD><TD>0</TD></TR>
<TR><TD>phonograph</TD><TD>0</TD><TD>1</TD></TR>
</TABLE>

The first line of the table contains column headings. If you don't want a header row, see Recipe 1.26.

The -H and --html options produce output only for queries that generate a result set. No output is written for queries such as INSERT or UPDATE statements.

-H and --html may be used as of MySQL 3.22.26. (They actually were introduced in an earlier version, but the output was not quite correct.)

1.25 Producing XML Output
1.25.1 Problem
You'd like to turn a query result into XML.

1.25.2 Solution
mysql can do that for you.

1.25.3 Discussion
mysql creates an XML document from the result of a query if you use the -X (or --xml) option. Here's an example that shows the difference between tabular format and the XML created from the same query:

% mysql -e "SELECT * FROM limbs WHERE legs=0" cookbook
+------------+------+------+
| thing      | legs | arms |
+------------+------+------+
| squid      |    0 |   10 |
| octopus    |    0 |    8 |
| fish       |    0 |    0 |
| phonograph |    0 |    1 |
+------------+------+------+
% mysql -X -e "SELECT * FROM limbs WHERE legs=0" cookbook

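<?xml version="1.0"?>

<resultset statement="SELECT * FROM limbs WHERE legs=0">
  <row>
    <thing>squid</thing>
    <legs>0</legs>
    <arms>10</arms>
  </row>
  <row>
    <thing>octopus</thing>
    <legs>0</legs>
    <arms>8</arms>
  </row>
  <row>
    <thing>fish</thing>
    <legs>0</legs>
    <arms>0</arms>
  </row>
  <row>
    <thing>phonograph</thing>
    <legs>0</legs>
    <arms>1</arms>
  </row>
</resultset>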

-X and --xml may be used as of MySQL 4.0. If your version of MySQL is older than that, you can write your own XML generator. See Recipe 10.42.

1.26 Suppressing Column Headings in Query Output
1.26.1 Problem
You don't want to include column headings in query output.

1.26.2 Solution
Turn column headings off with the appropriate command-line option. Normally this is -N or --skip-column-names, but you can use -ss instead.

1.26.3 Discussion
Tab-delimited format is convenient for generating datafiles that you can import into other programs. However, the first row of output for each query lists the column headings by default, which may not always be what you want. Suppose you have a program named summarize that produces various descriptive statistics for a column of numbers. If you're producing output from mysql to be used with this program, you wouldn't want the header row because it would throw off the results. That is, if you ran a command like this, the output would be inaccurate because summarize would count the column heading:

% mysql -e "SELECT arms FROM limbs" cookbook | summarize

To create output that contains only data values, suppress the column header row with the -N (or --skip-column-names) option:

% mysql -N -e "SELECT arms FROM limbs" cookbook | summarize

-N and --skip-column-names were introduced in MySQL 3.22.20. For older versions, you can achieve the same effect by specifying the "silent" option (-s or --silent) twice:

% mysql -ss -e "SELECT arms FROM limbs" cookbook | summarize

Under Unix, another alternative is to use tail to skip the first line:

% mysql -e "SELECT arms FROM limbs" cookbook | tail -n +2 | summarize
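To see how a header row can throw off a downstream program, the sketch below substitutes a tiny awk function for summarize and feeds it canned input in place of real mysql output. Everything here is hypothetical: this summarize merely counts and sums its input lines, sample_output imitates what mysql would print for a three-row result, and skip_header uses the POSIX tail -n +2 form to drop the first line.

```shell
# Stand-in for the summarize program: count the lines and sum the values.
summarize() {
    awk '{ n++; sum += $1 } END { printf "n=%d sum=%d\n", n, sum }'
}

# Drop the first line (the column heading) from stdin.
skip_header() {
    tail -n +2
}

# Canned input imitating mysql output: a heading followed by three values.
sample_output() {
    printf 'arms\n2\n0\n10\n'
}

sample_output | summarize                 # prints n=4 sum=12 (heading counted)
sample_output | skip_header | summarize   # prints n=3 sum=12 (data rows only)
```

The sum happens to survive because awk treats the heading as zero, but the count is off by one, which is enough to distort any average computed from it.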

1.27 Numbering Query Output Lines
1.27.1 Problem
You'd like the lines of a query result nicely numbered.

1.27.2 Solution
Postprocess the output from mysql, or use a SQL variable.

1.27.3 Discussion
The -N option can be useful in combination with cat -n when you want to number the output rows from a query under Unix:

% mysql -N -e "SELECT thing, arms FROM limbs" cookbook | cat -n
     1  human           2
     2  insect          0
     3  squid           10
     4  octopus         8
     5  fish            0
     6  centipede       0
     7  table           0
     8  armchair        2
     9  phonograph      1
    10  tripod          0
    11  Peg Leg Pete    2
    12  space alien     NULL

Another option is to use a SQL variable. Expressions involving variables are evaluated for each row of a query result, a property that you can use to provide a column of row numbers in the output:

mysql> SET @n = 0;
mysql> SELECT @n := @n+1 AS rownum, thing, arms, legs FROM limbs;
+--------+--------------+------+------+
| rownum | thing        | arms | legs |
+--------+--------------+------+------+
|      1 | human        |    2 |    2 |
|      2 | insect       |    0 |    6 |
|      3 | squid        |   10 |    0 |
|      4 | octopus      |    8 |    0 |
|      5 | fish         |    0 |    0 |
|      6 | centipede    |    0 |  100 |
|      7 | table        |    0 |    4 |
|      8 | armchair     |    2 |    4 |
|      9 | phonograph   |    1 |    0 |
|     10 | tripod       |    0 |    3 |
|     11 | Peg Leg Pete |    2 |    1 |
|     12 | space alien  | NULL | NULL |
+--------+--------------+------+------+

1.28 Making Long Output Lines More Readable
1.28.1 Problem
The output lines from a query are too long. They wrap around and make a mess of your screen.

1.28.2 Solution
Use vertical output format.

1.28.3 Discussion
Some queries generate output lines that are so long they take up more than one line on your terminal, which can make query results difficult to read. Here is an example that shows what excessively long query output lines might look like on your screen: [9]

[9] Prior to MySQL 3.23.32, omit the FULL keyword from the SHOW COLUMNS statement.

mysql> SHOW FULL COLUMNS FROM limbs;
+-------+-------------+------+-----+---------+-------+---------------------------------+
| Field | Type        | Null | Key | Default | Extra | Privileges                      |
+-------+-------------+------+-----+---------+-------+---------------------------------+
| thing | varchar(20) | YES  |     | NULL    |       | select,insert,update,references |
| legs  | int(11)     | YES  |     | NULL    |       | select,insert,update,references |
| arms  | int(11)     | YES  |     | NULL    |       | select,insert,update,references |
+-------+-------------+------+-----+---------+-------+---------------------------------+

An alternative is to generate "vertical" output with each column value on a separate line. This is done by terminating a query with \G rather than with a ; character or with \g. Here's what the result from the preceding query looks like when displayed using vertical format:

mysql> SHOW FULL COLUMNS FROM limbs\G
*************************** 1. row ***************************
     Field: thing
      Type: varchar(20)
      Null: YES
       Key:
   Default: NULL
     Extra:
Privileges: select,insert,update,references
*************************** 2. row ***************************
     Field: legs
      Type: int(11)
      Null: YES
       Key:
   Default: NULL
     Extra:
Privileges: select,insert,update,references
*************************** 3. row ***************************
     Field: arms
      Type: int(11)
      Null: YES
       Key:
   Default: NULL
     Extra:
Privileges: select,insert,update,references

To specify vertical output from the command line, use the -E (or --vertical) option when you invoke mysql. This affects all queries issued during the session, something that can be useful when using mysql to execute a script. (If you write the statements in the SQL script file using the usual semicolon terminator, you can select normal or vertical output from the command line by selective use of -E.)

1.29 Controlling mysql's Verbosity Level
1.29.1 Problem
You want mysql to produce more output. Or less.

1.29.2 Solution
Use the -v or -s options for more or less verbosity.

1.29.3 Discussion
When you run mysql noninteractively, not only does the default output format change, it becomes more terse. For example, mysql doesn't print row counts or indicate how long queries took to execute. To tell mysql to be more verbose, use -v or --verbose. These options can be specified multiple times for increasing verbosity. Try the following commands to see how the output differs:

% echo "SELECT NOW( )" | mysql
% echo "SELECT NOW( )" | mysql -v
% echo "SELECT NOW( )" | mysql -vv
% echo "SELECT NOW( )" | mysql -vvv

The counterparts of -v and --verbose are -s and --silent. These options too may be used multiple times for increased effect.

1.30 Logging Interactive mysql Sessions
1.30.1 Problem
You want to keep a record of what you did in a mysql session.

1.30.2 Solution
Create a tee file.

1.30.3 Discussion
If you maintain a log of an interactive MySQL session, you can refer back to it later to see what you did and how. Under Unix, you can use the script program to save a log of a terminal session. This works for arbitrary commands, so it works for interactive mysql sessions, too. However, script also adds a carriage return to every line of the transcript, and it includes any backspacing and corrections you make as you're typing. A method of logging an interactive mysql session that doesn't add extra messy junk to the log file (and that works under both Unix and Windows) is to start mysql with a --tee option that specifies the name of the file in which to record the session: [10]

% mysql --tee=tmp.out cookbook

[10] It's called a "tee" because it's similar to the Unix tee utility. For more background, consult the tee manual page.

To control session logging from within mysql, use \T and \t to turn tee output on and off. This is useful if you want to record only parts of a session:

mysql> \T tmp.out
Logging to file 'tmp.out'
mysql> \t
Outfile disabled.

A tee file contains the queries you enter as well as the output from those queries, so it's a convenient way to keep a complete record of them. It's useful, for example, when you want to print or mail a session or parts of it, or for capturing query output to include as an example in a document. It's also a good way to try out queries to make sure you have the syntax correct before putting them in a script file; you can create the script from the tee file later by editing it to remove everything except those queries you want to keep.

mysql appends session output to the end of the tee file rather than overwriting it. If you want an existing file to contain only the contents of a single session, remove it before invoking mysql.

The ability to create tee files was introduced in MySQL 3.23.28.

1.31 Creating mysql Scripts from Previously Executed Queries

1.31.1 Problem

You want to reuse queries that were issued during an earlier mysql session.

1.31.2 Solution

Use a tee file from the earlier session, or look in mysql's statement history file.

1.31.3 Discussion

One way to create a batch file is to enter your queries into the file from scratch with a text editor and hope that you don't make any mistakes while typing them. But it's often easier to use queries that you've already verified as correct. How? First, try out the queries "by hand" using mysql in interactive mode to make sure they work properly. Then, extract the queries from a record of your session to create the batch file. Two sources of information are particularly useful for creating SQL scripts:

• You can record all or parts of a mysql session by using the --tee command-line option or the \T command from within mysql. (See Recipe 1.30 for more information.)

• Under Unix, a second option is to use your history file. mysql maintains a record of your queries, which it stores in the file .mysql_history in your home directory.

A tee file session log has more context because it contains both query input and output, not just the text of the queries. This additional information can make it easier to locate the parts of the session you want. (Of course, you must also remove the extra stuff to create a batch file from the tee file.) Conversely, the history file is more concise. It contains only the queries you issue, so there are fewer extraneous lines to delete to obtain the queries you want. Choose whichever source of information best suits your needs.
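Removing the "extra stuff" from a tee file can be partly automated. The following is a minimal Python sketch, not part of mysql itself. It assumes the tee file shows each statement after a "mysql> " prompt and each continuation line after a "-> " prompt, as in the sessions shown in this chapter. The function name extract_queries and the sample text are hypothetical.

```python
def extract_queries(tee_text):
    """Return the statements found in mysql tee-file text, one string each.

    Assumption: statements start after a "mysql> " prompt and continue
    on lines that begin with a "-> " prompt; everything else is output.
    """
    queries = []
    current = None
    for line in tee_text.splitlines():
        stripped = line.lstrip()
        if line.startswith("mysql> "):
            # A new statement begins; save any statement in progress.
            if current is not None:
                queries.append(current)
            current = line[len("mysql> "):].rstrip()
        elif current is not None and stripped.startswith("-> "):
            # Continuation line of a multiple-line statement.
            current += " " + stripped[len("-> "):].rstrip()
        else:
            # Query output or a blank line ends the current statement.
            if current is not None:
                queries.append(current)
                current = None
    if current is not None:
        queries.append(current)
    return queries


# Hypothetical tee-file contents used only for illustration.
sample = """mysql> SELECT item
    -> FROM shirt;
+-----------+
| item      |
+-----------+
| Pinstripe |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT 'ABC' = 'abc';
+---------------+
| 'ABC' = 'abc' |
+---------------+
|             1 |
+---------------+
"""

for query in extract_queries(sample):
    print(query)
```

The result still deserves a look in an editor before you use it as a batch file, because commands such as \t that you typed at the prompt are extracted along with the SQL statements.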

1.32 Using mysql as a Calculator

1.32.1 Problem

You need a quick way to evaluate an expression.

1.32.2 Solution

Use mysql as a calculator. MySQL doesn't require every SELECT statement to refer to a table, so you can select the results of arbitrary expressions.

1.32.3 Discussion

SELECT statements typically refer to some table or tables from which you're retrieving rows. However, in MySQL, SELECT need not reference any table at all, which means that you can use the mysql program as a calculator for evaluating an expression:

mysql> SELECT (17 + 23) / SQRT(64);
+----------------------+
| (17 + 23) / SQRT(64) |
+----------------------+
|           5.00000000 |
+----------------------+

This is also useful for checking how a comparison works. For example, to determine whether or not string comparisons are case sensitive, try the following query:

mysql> SELECT 'ABC' = 'abc';
+---------------+
| 'ABC' = 'abc' |
+---------------+
|             1 |
+---------------+

The result of this comparison is 1 (meaning "true"; in general, nonzero values are true). This tells you that string comparisons are not case sensitive by default. Expressions that evaluate to false return zero:

mysql> SELECT 'ABC' = 'abcd';
+----------------+
| 'ABC' = 'abcd' |
+----------------+
|              0 |
+----------------+

If the value of an expression cannot be determined, the result is NULL:

mysql> SELECT 1/0;
+------+
| 1/0  |
+------+
| NULL |
+------+

SQL variables may be used to store the results of intermediate calculations. The following statements use variables this way to compute the total cost of a hotel bill:

mysql> SET @daily_room_charge = 100.00;
mysql> SET @num_of_nights = 3;
mysql> SET @tax_percent = 8;
mysql> SET @total_room_charge = @daily_room_charge * @num_of_nights;
mysql> SET @tax = (@total_room_charge * @tax_percent) / 100;
mysql> SET @total = @total_room_charge + @tax;
mysql> SELECT @total;
+--------+
| @total |
+--------+
| 324    |
+--------+
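As a cross-check on the hotel bill arithmetic, here is the same calculation as a small Python sketch. It only mirrors the SET statements step by step; the variable names are copied from the SQL variables, and nothing here involves MySQL.

```python
import math

# Same intermediate steps as the SQL variables in the mysql session.
daily_room_charge = 100.00
num_of_nights = 3
tax_percent = 8

total_room_charge = daily_room_charge * num_of_nights   # 300.0
tax = (total_room_charge * tax_percent) / 100           # 24.0
total = total_room_charge + tax                         # 324.0

# The first calculator expression from this recipe, for comparison.
quotient = (17 + 23) / math.sqrt(64)                    # 5.0

print(total, quotient)
```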

1.33 Using mysql in Shell Scripts

1.33.1 Problem

You want to invoke mysql from within a shell script rather than using it interactively.

1.33.2 Solution

There's no rule against that. Just be sure to supply the appropriate arguments to the command.

1.33.3 Discussion

If you need to process query results within a program, you'll typically use a MySQL programming interface designed specifically for the language you're using (for example, in a Perl script you'd use the DBI interface). But for simple, short, or quick-and-dirty tasks, it may be easier just to invoke mysql directly from within a shell script, possibly postprocessing the results with other commands. For example, an easy way to write a MySQL server status tester is to use a shell script that invokes mysql, as is demonstrated later in this section. Shell scripts are also useful for prototyping programs that you intend to convert for use with a standard API later.

For Unix shell scripting, I recommend that you stick to shells in the Bourne shell family, such as sh, bash, or ksh. (The csh and tcsh shells are more suited to interactive use than to scripting.) This section provides some examples showing how to write Unix scripts for /bin/sh. It also comments briefly on DOS scripting. The sidebar "Using Executable Programs" describes how to make scripts executable and run them.

Using Executable Programs

When you write a program, you'll generally need to make it executable before you can run it. In Unix, you do this by setting the "execute" file access modes using the chmod command:

% chmod +x myprog

To run the program, name it on the command line:

% myprog

However, if the program is in your current directory, your shell might not find it. The shell searches for programs in the directories named in your PATH environment variable, but for security reasons, the search path for Unix shells often is deliberately set not to include the current directory (.). In that case, you need to include a leading path of ./ to explicitly indicate the program's location:

% ./myprog

Some of the programs developed in this book are intended only to demonstrate a particular concept and probably never will be run outside your current directory, so examples that use them generally show how to invoke them using the leading ./ path. For programs that are intended for repeated use, it's more likely that you'll install them in a directory named in your PATH setting. In that case, no leading path will be necessary to invoke them. This also holds for common Unix utilities (such as chmod), which are installed in standard system directories.

Under Windows, programs are interpreted as executable based on their filename extensions (such as .exe or .bat), so chmod is unnecessary. Also, the command interpreter includes the current directory in its search path by default, so you should be able to invoke programs that are located there without specifying any leading path. (Thus, if you're using Windows and you want to run an example command that is shown in this book using ./, you should omit the ./ from the command.)

1.33.4 Writing Shell Scripts Under Unix

Here is a shell script that reports the current uptime of the MySQL server. It runs a SHOW STATUS query to get the value of the Uptime status variable that contains the server uptime in seconds:

#! /bin/sh
# mysql_uptime.sh - report server uptime in seconds
mysql -B -N -e "SHOW STATUS LIKE 'Uptime'"

The first line of the script that begins with #! is special. It indicates the pathname of the program that should be invoked to execute the rest of the script, /bin/sh in this case. To use the script, create a file named mysql_uptime.sh that contains the preceding lines and make it executable with chmod +x. The mysql_uptime.sh script runs mysql using -e to indicate the query string, -B to generate batch (tab-delimited) output, and -N to suppress the column header line. The resulting output looks like this:

% ./mysql_uptime.sh
Uptime  1260142

The command shown here begins with ./, indicating that the script is located in your current directory. If you move the script to a directory named in your PATH setting, you can invoke it from anywhere, but then you should omit the ./ from the command. Note that moving the script may cause csh or tcsh not to know where the script is located until your next login. To remedy this without logging in again, use rehash after moving the script. The following example illustrates this process:

% ./mysql_uptime.sh
Uptime  1260348
% mv mysql_uptime.sh /usr/local/bin
% mysql_uptime.sh
mysql_uptime.sh: Command not found.
% rehash
% mysql_uptime.sh
Uptime  1260397

If you prefer a report that lists the time in days, hours, minutes, and seconds rather than just seconds, you can use the output from the mysql STATUS statement, which provides the following information:

mysql> STATUS;
Connection id:          12347
Current database:       cookbook
Current user:           cbuser@localhost
Current pager:          stdout
Using outfile:          ''
Server version:         3.23.47-log
Protocol version:       10
Connection:             Localhost via UNIX socket
Client characterset:    latin1
Server characterset:    latin1
UNIX socket:            /tmp/mysql.sock
Uptime:                 14 days 14 hours 2 min 46 sec

For uptime reporting, the only relevant part of that information is the line that begins with Uptime. It's a simple matter to write a script that sends a STATUS command to the server and filters the output with grep to extract the desired line:

#! /bin/sh
# mysql_uptime2.sh - report server uptime
mysql -e STATUS | grep "^Uptime"

The result looks like this:

% ./mysql_uptime2.sh
Uptime:                 14 days 14 hours 2 min 46 sec
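The relationship between the two report formats is simple arithmetic: the seconds value from SHOW STATUS can be broken into days, hours, minutes, and seconds by repeated division. The Python sketch below illustrates that conversion only. The function name format_uptime is hypothetical, and the output wording imitates the STATUS line rather than reproducing how mysql itself formats it.

```python
def format_uptime(seconds):
    """Convert an uptime in seconds to a 'days hours min sec' string."""
    days, remainder = divmod(seconds, 24 * 60 * 60)
    hours, remainder = divmod(remainder, 60 * 60)
    minutes, secs = divmod(remainder, 60)
    return f"{days} days {hours} hours {minutes} min {secs} sec"


# The seconds value reported by mysql_uptime.sh earlier in this section.
print(format_uptime(1260142))
```

For the value 1260142 shown earlier, this yields 14 days 14 hours 2 min 22 sec, which is consistent with the STATUS output having been captured a few seconds later.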

The preceding two scripts specify the statement to be executed by means of the -e command-line option, but you can use other mysql input sources described earlier in the chapter, such as files and pipes. For example, the following mysql_uptime3.sh script is like mysql_uptime2.sh but provides input to mysql using a pipe:

#! /bin/sh
# mysql_uptime3.sh - report server uptime
echo STATUS | mysql | grep "^Uptime"

Some shells support the concept of a "here-document," which serves essentially the same purpose as file input to a command, except that no explicit filename is involved. (In other words, the document is located "right here" in the script, not stored in an external file.) To provide input to a command using a here-document, use the following syntax:

command

"-");

# set up connection between DBI and output writer
my $gen = XML::Generator::DBI->new (
            dbh => $dbh,             # database handle
            Handler => $out,         # output writer
            RootElement => "rowset"  # document root element
          );

# issue query and write XML
$gen->execute ($query);

$dbh->disconnect ( );
exit (0);

10.43 Importing XML into MySQL

10.43.1 Problem

You want to import an XML document into a MySQL table.

10.43.2 Solution

Set up an XML parser to read the document. Then use the records in the document to construct and execute INSERT statements.

10.43.3 Discussion

Importing an XML document depends on being able to parse the document and extract record contents from it. The way you do this will depend on how the document is written. For example, one format might represent column names and values as attributes of elements:







mysql> CREATE TABLE shirt (item CHAR(20));
mysql> INSERT INTO shirt (item)
    -> VALUES('Pinstripe'),('Tie-Dye'),('Black');
mysql> CREATE TABLE tie (item CHAR(20));
mysql> INSERT INTO tie (item)
    -> VALUES('Fleur de lis'),('Paisley'),('Polka Dot');

You can list what's in each table by using separate single-table queries:

mysql> SELECT item FROM shirt;
+-----------+
| item      |
+-----------+
| Pinstripe |
| Tie-Dye   |
| Black     |
+-----------+
mysql> SELECT item FROM tie;
+--------------+
| item         |
+--------------+
| Fleur de lis |
| Paisley      |
| Polka Dot    |
+--------------+

But you can also ask MySQL to show you various combinations of wardrobe items by writing a query that performs a join. A join names two or more tables after the FROM keyword. In the output column list, you can name columns from any or all of the joined tables, or use expressions that are based on those columns. The simplest join involves two tables and selects all columns from each. With no WHERE clause, the join generates output for all combinations of rows. Thus, to find all possible combinations of shirts and ties, use the following query to produce a full join between the two tables:

mysql> SELECT * FROM shirt, tie;
+-----------+--------------+
| item      | item         |
+-----------+--------------+
| Pinstripe | Fleur de lis |
| Tie-Dye   | Fleur de lis |
| Black     | Fleur de lis |
| Pinstripe | Paisley      |
| Tie-Dye   | Paisley      |
| Black     | Paisley      |
| Pinstripe | Polka Dot    |
| Tie-Dye   | Polka Dot    |
| Black     | Polka Dot    |
+-----------+--------------+

You can see that each item from the shirt table is paired with every item from the tie table. To use the list to guide you in your wardrobe selections, print it out and tape it up on the wall. Each day, wear the items displayed in the first unused row and cross the row off the list.

The output column list in the previous query is specified as *. For a single-table query, an output list of * means "every column from the named table." Analogously, for a join it means "every column from every named table," so the query returns the columns from both shirt and tie. Other ways to specify output columns are to use tbl_name.* to select all columns from a particular table, or tbl_name.col_name to specify a single column from the table. Thus, all the following queries are equivalent:

SELECT * FROM shirt, tie;
SELECT shirt.*, tie.* FROM shirt, tie;
SELECT shirt.*, tie.item FROM shirt, tie;
SELECT shirt.item, tie.* FROM shirt, tie;
SELECT shirt.item, tie.item FROM shirt, tie;

The tbl_name.col_name notation that qualifies a column name with a table name is always allowable, but can be shortened to just col_name if the name appears in only one of the joined tables. In that case, MySQL can determine without ambiguity which table the column comes from and no table name qualifier is necessary. We can't do that for a join between shirt and tie; they both have a column with the same name (item), so the following query is ambiguous:

mysql> SELECT item, item FROM shirt, tie;
ERROR 1052 at line 1: Column: 'item' in field list is ambiguous

If the columns had distinct names such as s_item and t_item, the query could be written unambiguously without table qualifiers:

SELECT s_item, t_item FROM shirt, tie;

To make the meaning of a query clearer to human readers, it's often useful to qualify column names even when that's not strictly necessary as far as MySQL is concerned. I tend to use qualified names in join query examples for that reason.

Without a WHERE clause to restrict the output, a join produces an output row for every possible combination of input rows. For large tables, this is usually a bad idea, so it's typical to provide some kind of condition on the output rows. For example, if you're tired of having your office mates tease you about your polka dot tie, select only the other stylish combinations that are possible using your wardrobe items:

mysql> SELECT shirt.item, tie.item FROM shirt, tie
    -> WHERE tie.item != 'Polka Dot';
+-----------+--------------+
| item      | item         |
+-----------+--------------+
| Pinstripe | Fleur de lis |
| Tie-Dye   | Fleur de lis |
| Black     | Fleur de lis |
| Pinstripe | Paisley      |
| Tie-Dye   | Paisley      |
| Black     | Paisley      |
+-----------+--------------+

You can also limit the output other ways. To select wardrobe combinations at random, run the following query each morning to pick a single row from the full join: [1]

[1] ORDER BY RAND( ) is discussed further in Chapter 13.

mysql> SELECT shirt.item, tie.item FROM shirt, tie
    -> ORDER BY RAND( ) LIMIT 1;
+---------+--------------+
| item    | item         |
+---------+--------------+
| Tie-Dye | Fleur de lis |
+---------+--------------+

It's possible to perform joins between more than two tables. Suppose you set up a pants table:

mysql> SELECT * FROM pants;
+----------+
| item     |
+----------+
| Plaid    |
| Striped  |
| Corduroy |
+----------+

Then you can select combinations of shirts, ties, and pants:

mysql> SELECT shirt.item, tie.item, pants.item FROM shirt, tie, pants;
+-----------+--------------+----------+
| item      | item         | item     |
+-----------+--------------+----------+
| Pinstripe | Fleur de lis | Plaid    |
| Tie-Dye   | Fleur de lis | Plaid    |
| Black     | Fleur de lis | Plaid    |
| Pinstripe | Paisley      | Plaid    |
| Tie-Dye   | Paisley      | Plaid    |
| Black     | Paisley      | Plaid    |
| Pinstripe | Polka Dot    | Plaid    |
| Tie-Dye   | Polka Dot    | Plaid    |
| Black     | Polka Dot    | Plaid    |
| Pinstripe | Fleur de lis | Striped  |
| Tie-Dye   | Fleur de lis | Striped  |
| Black     | Fleur de lis | Striped  |
| Pinstripe | Paisley      | Striped  |
| Tie-Dye   | Paisley      | Striped  |
| Black     | Paisley      | Striped  |
| Pinstripe | Polka Dot    | Striped  |
| Tie-Dye   | Polka Dot    | Striped  |
| Black     | Polka Dot    | Striped  |
| Pinstripe | Fleur de lis | Corduroy |
| Tie-Dye   | Fleur de lis | Corduroy |
| Black     | Fleur de lis | Corduroy |
| Pinstripe | Paisley      | Corduroy |
| Tie-Dye   | Paisley      | Corduroy |
| Black     | Paisley      | Corduroy |
| Pinstripe | Polka Dot    | Corduroy |
| Tie-Dye   | Polka Dot    | Corduroy |
| Black     | Polka Dot    | Corduroy |
+-----------+--------------+----------+

Clearly, as you join more tables, the number of row combinations grows quickly, even when each individual table has few rows.

If you don't want to write out complete table names in the output column list, give each table a short alias and refer to table columns using the aliases:

SELECT s.item, t.item, p.item
FROM shirt AS s, tie AS t, pants AS p;

Aliases don't save much typing for the preceding statement, but for complicated queries that select many columns, aliases can make life much simpler. In addition, aliases are not only convenient but necessary for some types of queries, as will become evident when we get to the topic of self-joins (Recipe 12.12).
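The growth in row combinations is easy to confirm by counting: a full join of tables with 3, 3, and 3 rows produces 3 × 3 = 9 rows for two tables and 3 × 3 × 3 = 27 rows for three. The following Python sketch checks this using the standard sqlite3 module as a stand-in for a MySQL server. That substitution is an assumption made so the example runs without a server; the comma-join and alias syntax used here is accepted by both engines.

```python
import sqlite3

# In-memory SQLite database standing in for the cookbook database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

wardrobe = {
    "shirt": ["Pinstripe", "Tie-Dye", "Black"],
    "tie": ["Fleur de lis", "Paisley", "Polka Dot"],
    "pants": ["Plaid", "Striped", "Corduroy"],
}
for table, items in wardrobe.items():
    cur.execute(f"CREATE TABLE {table} (item CHAR(20))")
    cur.executemany(
        f"INSERT INTO {table} (item) VALUES (?)", [(item,) for item in items]
    )

# Full join of two tables: every shirt paired with every tie.
two_way = cur.execute(
    "SELECT s.item, t.item FROM shirt AS s, tie AS t"
).fetchall()

# Full join of three tables, using table aliases.
three_way = cur.execute(
    "SELECT s.item, t.item, p.item FROM shirt AS s, tie AS t, pants AS p"
).fetchall()

# A WHERE clause restricts the combinations that are produced.
no_polka = cur.execute(
    "SELECT s.item, t.item FROM shirt AS s, tie AS t "
    "WHERE t.item != 'Polka Dot'"
).fetchall()

print(len(two_way), len(three_way), len(no_polka))
conn.close()
```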

12.3 Performing a Join Between Tables in Different Databases

12.3.1 Problem

You want to use tables in a join, but they're not located in the same database.

12.3.2 Solution

Use database name qualifiers to tell MySQL where to find the tables.

12.3.3 Discussion

Sometimes it's necessary to perform a join on two tables that live in different databases. To do this, qualify table and column names sufficiently so that MySQL knows what you're referring to. We've been using the shirt and tie tables under the implicit understanding that both are in the cookbook database, which means that we can simply refer to the tables without specifying any database name. For example, the following query retrieves the combinations of items from the two tables:

mysql> SELECT shirt.item, tie.item FROM shirt, tie;
+-----------+--------------+
| item      | item         |
+-----------+--------------+
| Pinstripe | Fleur de lis |
| Tie-Dye   | Fleur de lis |
| Black     | Fleur de lis |
| Pinstripe | Paisley      |
| Tie-Dye   | Paisley      |
| Black     | Paisley      |
| Pinstripe | Polka Dot    |
| Tie-Dye   | Polka Dot    |
| Black     | Polka Dot    |
+-----------+--------------+

But suppose instead that shirt is in the db1 database and tie is in the db2 database. To indicate this, qualify each table name with a prefix that specifies which database it's part of. The fully qualified form of the join looks like this:

SELECT db1.shirt.item, db2.tie.item FROM db1.shirt, db2.tie;

If there is no current database, or it is neither db1 nor db2, it's necessary to use this fully qualified form. However, if the current database is db1 or db2, you can dispense with some of the qualifiers. For example, if the current database is db1, you can omit the db1 qualifiers:

SELECT shirt.item, db2.tie.item FROM shirt, db2.tie;

Conversely, if the current database is db2, no db2 qualifiers are necessary:

SELECT db1.shirt.item, tie.item FROM db1.shirt, tie;
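To experiment with database qualifiers without a MySQL server, the same fully qualified query can be tried in SQLite, where additional databases are attached to a connection under a name. This Python sketch is only an analogy and rests on assumptions: ATTACH DATABASE is SQLite's mechanism, not MySQL's, and the names db1 and db2 are simply chosen to match the text. The db_name.tbl_name.col_name notation itself is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Attach two separate in-memory databases under the names used in the text.
cur.execute("ATTACH DATABASE ':memory:' AS db1")
cur.execute("ATTACH DATABASE ':memory:' AS db2")

cur.execute("CREATE TABLE db1.shirt (item CHAR(20))")
cur.execute("CREATE TABLE db2.tie (item CHAR(20))")
cur.executemany(
    "INSERT INTO db1.shirt (item) VALUES (?)",
    [("Pinstripe",), ("Tie-Dye",), ("Black",)],
)
cur.executemany(
    "INSERT INTO db2.tie (item) VALUES (?)",
    [("Fleur de lis",), ("Paisley",), ("Polka Dot",)],
)

# Fully qualified join between tables that live in different databases.
rows = cur.execute(
    "SELECT db1.shirt.item, db2.tie.item FROM db1.shirt, db2.tie"
).fetchall()

for shirt_item, tie_item in rows:
    print(shirt_item, tie_item)
conn.close()
```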

12.4 Referring to Join Output Column Names in Programs

12.4.1 Problem

You need to process the result of a join query from within a program, but the column names in the result set aren't unique.

12.4.2 Solution

Use column aliases to assign unique names to each column, or refer to the columns by position.

12.4.3 Discussion

Joins often retrieve columns from similar tables, and it's not unusual for columns selected from different tables to have the same names. Consider again the three-way join between the shirt, tie, and pants tables that was used in Recipe 12.2:

mysql> SELECT shirt.item, tie.item, pants.item FROM shirt, tie, pants;
+-----------+--------------+----------+
| item      | item         | item     |
+-----------+--------------+----------+
| Pinstripe | Fleur de lis | Plaid    |
| Tie-Dye   | Fleur de lis | Plaid    |
| Black     | Fleur de lis | Plaid    |
| Pinstripe | Paisley      | Plaid    |
...

The query uses the table names to qualify each instance of item in the output column list to clarify which table each item comes from. But the column names in the output are not distinct, because MySQL doesn't include table names in the column headings. If you're processing the result of the join from within a program and fetching rows into a data structure that references column values by name, non-unique column names can cause some values to become inaccessible. The following Perl script fragment illustrates the difficulty:

$stmt = qq{
    SELECT shirt.item, tie.item, pants.item
    FROM shirt, tie, pants
};
$sth = $dbh->prepare ($stmt);
$sth->execute ( );
# Determine the number of columns in result set rows two ways:
# - Check the NUM_OF_FIELDS statement handle attribute
# - Fetch a row into a hash and see how many keys the hash contains
$count1 = $sth->{NUM_OF_FIELDS};
$ref = $sth->fetchrow_hashref ( );
$count2 = keys (%{$ref});
print "The statement is: $stmt\n";
print "According to NUM_OF_FIELDS, the result set has $count1 columns\n";
print "The column names are: " . join (",", sort (@{$sth->{NAME}})) . "\n";
print "According to the row hash size, the result set has $count2 columns\n";
print "The column names are: " . join (",", sort (keys (%{$ref}))) . "\n";

The script issues the wardrobe-selection query, then determines the number of columns in the result, first by checking the NUM_OF_FIELDS attribute, then by fetching a row into a hash and counting the number of hash keys. Executing this script results in the following output:

According to NUM_OF_FIELDS, the result set has 3 columns
The column names are: item,item,item
According to the row hash size, the result set has 1 columns
The column names are: item

There is a problem here: the column counts don't match. The second count is 1 because the non-unique column names cause multiple column values to be mapped onto the same hash element. As a result of these hash key collisions, some of the values are lost. To solve this problem, make the column names unique by supplying aliases. For example, the query can be rewritten from:

SELECT shirt.item, tie.item, pants.item
FROM shirt, tie, pants

to:

SELECT shirt.item AS shirt, tie.item AS tie, pants.item AS pants
FROM shirt, tie, pants

If you make that change and rerun the script, its output becomes:

According to NUM_OF_FIELDS, the result set has 3 columns
The column names are: pants,shirt,tie
According to the row hash size, the result set has 3 columns
The column names are: pants,shirt,tie

Now the column counts are the same; no values are lost when fetching into a hash.

Another way to address the problem that doesn't require renaming the columns is to fetch the row into something other than a hash. For example, you can fetch the row into an array and refer to the shirt, tie, and pants items as the first through third elements of the array:

while (my @val = $sth->fetchrow_array ( ))
{
    print "shirt: $val[0], tie: $val[1], pants: $val[2]\n";
}

The name-clash problem may have different solutions in other languages. For example, the problem doesn't occur in quite the same way in Python scripts. If you retrieve a row using a dictionary (Python's closest analog to a Perl hash), the MySQLdb module notices clashing column names and places them in the dictionary using a key consisting of the column name with the table name prepended. Thus, for the following query, the dictionary keys would be item, tie.item, and pants.item:

SELECT shirt.item, tie.item, pants.item
FROM shirt, tie, pants

That means column values won't get lost, but it's still necessary to be aware of non-unique names. If you try to refer to column values using just the column names, you won't get the results you expect for those names that are reported with a leading table name. If you use aliases to make each column name unique, the dictionary entries will have the names that you assign.
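The key collision itself has nothing to do with the database driver; it happens whenever a row is stored in a structure keyed by column name. The following Python sketch demonstrates the effect with plain dictionaries. It uses a hypothetical row and column-name lists rather than a live query, so it shows the general mechanism, not the behavior of DBI or MySQLdb.

```python
# One row of the three-way join, as a driver might return it by position.
row = ("Pinstripe", "Fleur de lis", "Plaid")

# Column names as reported without aliases, and with aliases assigned.
plain_names = ["item", "item", "item"]
alias_names = ["shirt", "tie", "pants"]

# Keying the row by name: duplicate names overwrite one another.
by_plain_name = dict(zip(plain_names, row))
by_alias_name = dict(zip(alias_names, row))

print(len(row), "columns by position")
print(len(by_plain_name), "entries keyed by non-unique names:", by_plain_name)
print(len(by_alias_name), "entries keyed by aliases:", by_alias_name)
```

With the non-unique names, only the last value stored under item survives, which matches the count of 1 reported by the Perl script. With aliases, or with positional access, all three values remain available.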

12.5 Finding Rows in One Table That Match Rows in Another

12.5.1 Problem

You want to use rows in one table to locate rows in another table.

12.5.2 Solution

Use a join with an appropriate WHERE clause to match up records from different tables.

12.5.3 Discussion

The records in the shirt, tie, and pants tables from Recipe 12.2 have no special relationship to each other, so no combination of rows is more meaningful than any other. That's okay, because the purpose of the examples that use those tables is to illustrate how to perform a join, not why you'd do so.

The "why" is that joins allow you to combine information from multiple tables when each table contains only part of the information in which you're interested. Output rows from a join are more complete than rows from either table by itself. This kind of operation often is based on matching rows in one table to rows in another, which requires that each table have one or more columns of common information that can be used to link them together logically.

To illustrate, suppose you're starting an art collection, using the following two tables to record your acquisitions. artist lists those painters whose works you want to collect, and painting lists each painting that you've purchased:

CREATE TABLE artist
(
    a_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,  # artist ID
    name  VARCHAR(30) NOT NULL,                  # artist name
    PRIMARY KEY (a_id),
    UNIQUE (name)
);

CREATE TABLE painting
(
    a_id  INT UNSIGNED NOT NULL,                 # artist ID
    p_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,  # painting ID
    title VARCHAR(100) NOT NULL,                 # title of painting
    state VARCHAR(2) NOT NULL,                   # state where purchased
    price INT UNSIGNED,                          # purchase price (dollars)
    INDEX (a_id),
    PRIMARY KEY (p_id)
);

You've just begun the collection, so the tables contain only the following records:

mysql> SELECT * FROM artist ORDER BY a_id;
+------+----------+
| a_id | name     |
+------+----------+
|    1 | Da Vinci |
|    2 | Monet    |
|    3 | Van Gogh |
|    4 | Picasso  |
|    5 | Renoir   |
+------+----------+
mysql> SELECT * FROM painting ORDER BY a_id, p_id;
+------+------+-------------------+-------+-------+
| a_id | p_id | title             | state | price |
+------+------+-------------------+-------+-------+
|    1 |    1 | The Last Supper   | IN    |    34 |
|    1 |    2 | The Mona Lisa     | MI    |    87 |
|    3 |    3 | Starry Night      | KY    |    48 |
|    3 |    4 | The Potato Eaters | KY    |    67 |
|    3 |    5 | The Rocks         | IA    |    33 |
|    5 |    6 | Les Deux Soeurs   | NE    |    64 |
+------+------+-------------------+-------+-------+

The low values in the price column of the painting table betray the fact that your collection actually contains only cheap facsimiles, not the originals. Well, that's all right; who can afford the originals?

Each table contains partial information about your collection. For example, the artist table doesn't tell you which paintings each artist produced, and the painting table lists artist IDs but not their names. To answer certain kinds of questions, you must combine the two tables, and do so in a way that matches up records properly. The "matching up" part is a matter of writing an appropriate WHERE clause. In Recipe 12.2, I mentioned that performing a full join generally is a bad idea because of the amount of output produced. Another reason not to perform a full join is that the result may be meaningless. The following full join between the artist and painting tables makes this clear. It includes no WHERE clause, and thus produces output that conveys no useful information:

mysql> SELECT * FROM artist, painting;
+------+----------+------+------+-------------------+-------+-------+
| a_id | name     | a_id | p_id | title             | state | price |
+------+----------+------+------+-------------------+-------+-------+
|    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 |
|    2 | Monet    |    1 |    1 | The Last Supper   | IN    |    34 |
|    3 | Van Gogh |    1 |    1 | The Last Supper   | IN    |    34 |
|    4 | Picasso  |    1 |    1 | The Last Supper   | IN    |    34 |
|    5 | Renoir   |    1 |    1 | The Last Supper   | IN    |    34 |
|    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    2 | Monet    |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    3 | Van Gogh |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    4 | Picasso  |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    5 | Renoir   |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    1 | Da Vinci |    3 |    3 | Starry Night      | KY    |    48 |
|    2 | Monet    |    3 |    3 | Starry Night      | KY    |    48 |
|    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 |
|    4 | Picasso  |    3 |    3 | Starry Night      | KY    |    48 |
|    5 | Renoir   |    3 |    3 | Starry Night      | KY    |    48 |
|    1 | Da Vinci |    3 |    4 | The Potato Eaters | KY    |    67 |
|    2 | Monet    |    3 |    4 | The Potato Eaters | KY    |    67 |
|    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 |
|    4 | Picasso  |    3 |    4 | The Potato Eaters | KY    |    67 |
|    5 | Renoir   |    3 |    4 | The Potato Eaters | KY    |    67 |
|    1 | Da Vinci |    3 |    5 | The Rocks         | IA    |    33 |
|    2 | Monet    |    3 |    5 | The Rocks         | IA    |    33 |
|    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 |
|    4 | Picasso  |    3 |    5 | The Rocks         | IA    |    33 |
|    5 | Renoir   |    3 |    5 | The Rocks         | IA    |    33 |
|    1 | Da Vinci |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
|    2 | Monet    |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
|    3 | Van Gogh |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
|    4 | Picasso  |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
|    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
+------+----------+------+------+-------------------+-------+-------+

Clearly, you're not maintaining these tables to match up each artist with each painting, which is what the preceding query does. An unrestricted join in this case produces nothing more than a lot of output with no value, so a WHERE clause is essential to give the query meaning.
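The contrast between an unrestricted join and one whose WHERE clause matches the common a_id column can be checked with a short Python sketch. As an assumption made for portability, it uses the standard sqlite3 module in place of a MySQL server and a simplified table definition (SQLite has no INT UNSIGNED or AUTO_INCREMENT column syntax). The data and the join queries follow the artist and painting tables shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified versions of the artist and painting tables.
cur.execute(
    "CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
cur.execute(
    "CREATE TABLE painting (a_id INTEGER NOT NULL, p_id INTEGER PRIMARY KEY, "
    "title TEXT NOT NULL, state TEXT NOT NULL, price INTEGER)"
)
cur.executemany(
    "INSERT INTO artist (a_id, name) VALUES (?, ?)",
    [(1, "Da Vinci"), (2, "Monet"), (3, "Van Gogh"), (4, "Picasso"), (5, "Renoir")],
)
cur.executemany(
    "INSERT INTO painting (a_id, p_id, title, state, price) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "The Last Supper", "IN", 34),
        (1, 2, "The Mona Lisa", "MI", 87),
        (3, 3, "Starry Night", "KY", 48),
        (3, 4, "The Potato Eaters", "KY", 67),
        (3, 5, "The Rocks", "IA", 33),
        (5, 6, "Les Deux Soeurs", "NE", 64),
    ],
)

# Unrestricted join: every artist paired with every painting.
unrestricted = cur.execute("SELECT * FROM artist, painting").fetchall()

# Restricted join: only pairs whose artist IDs match.
matched = cur.execute(
    "SELECT artist.name, painting.title FROM artist, painting "
    "WHERE artist.a_id = painting.a_id ORDER BY painting.p_id"
).fetchall()

print(len(unrestricted), "rows without WHERE;", len(matched), "rows with WHERE")
for name, title in matched:
    print(name, "-", title)
conn.close()
```

Five artists and six paintings give 5 × 6 = 30 rows in the unrestricted join, but only six rows, one per painting, once the a_id values must match.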

For exam ple, t o produce a list of paint ings t oget her wit h t he art ist nam es, you can associat e records from t he t wo t ables using a sim ple

WHERE clause t hat

m at ches up values in t he

art ist I D colum n t hat is com m on t o bot h t ables and t hat serves as t he link bet ween t hem :

mysql> SELECT * FROM artist, painting -> WHERE artist.a_id = painting.a_id; +------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 | | 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+ The colum n nam es in t he

WHERE clause include t able qualifiers t o m ake it

clear which

a_id

values t o com pare. The out put indicat es who paint ed each paint ing, and, conversely, which paint ings by each art ist are in your collect ion. However, t he out put is perhaps overly verbose.

a_id colum ns, for exam ple; one com es from t he artist t able, t he painting t able.) You m ay want t o see t he a_id values only once. Or

( I t includes t wo ident ical t he ot her from

you m ay not want t o see any I D colum ns at all. To exclude t hem , provide a colum n out put list t hat nam es specifically only t hose colum ns in which you're int erest ed:

mysql> SELECT artist.name, painting.title, painting.state, painting.price
    -> FROM artist, painting
    -> WHERE artist.a_id = painting.a_id;
+----------+-------------------+-------+-------+
| name     | title             | state | price |
+----------+-------------------+-------+-------+
| Da Vinci | The Last Supper   | IN    |    34 |
| Da Vinci | The Mona Lisa     | MI    |    87 |
| Van Gogh | Starry Night      | KY    |    48 |
| Van Gogh | The Potato Eaters | KY    |    67 |
| Van Gogh | The Rocks         | IA    |    33 |
| Renoir   | Les Deux Soeurs   | NE    |    64 |
+----------+-------------------+-------+-------+

By adding other conditions to the WHERE clause, you can use row-matching queries to answer more specific questions, such as the following:

• Which paintings did Van Gogh paint? To answer this question, identify the record from the artist table that corresponds to the artist name, use its a_id value to find matching records in the painting table, and select the title from those records:

mysql> SELECT painting.title
    -> FROM artist, painting
    -> WHERE artist.name = 'Van Gogh' AND artist.a_id = painting.a_id;
+-------------------+
| title             |
+-------------------+
| Starry Night      |
| The Potato Eaters |
| The Rocks         |
+-------------------+

• Who painted "The Mona Lisa"? To find out, go in the other direction, using information in the painting table to find information in the artist table:

mysql> SELECT artist.name
    -> FROM artist, painting
    -> WHERE painting.title = 'The Mona Lisa' AND painting.a_id = artist.a_id;
+----------+
| name     |
+----------+
| Da Vinci |
+----------+

• Which artists' paintings did you purchase in Kentucky or Indiana? This is somewhat similar to the last query, but tests a different column in the painting table to find the initial set of records to be joined with the artist table:

mysql> SELECT DISTINCT artist.name
    -> FROM artist, painting
    -> WHERE painting.state IN ('KY','IN') AND artist.a_id = painting.a_id;
+----------+
| name     |
+----------+
| Da Vinci |
| Van Gogh |
+----------+

The query also uses DISTINCT to display each artist name just once. Try it without DISTINCT and you'll see that Van Gogh is listed twice, because you obtained two Van Goghs in Kentucky.



Joins can also be used with aggregate functions to produce summaries. For example, to find out how many paintings you have per artist, use this query:

mysql> SELECT artist.name, COUNT(*) AS 'number of paintings'
    -> FROM artist, painting
    -> WHERE artist.a_id = painting.a_id
    -> GROUP BY artist.name;
+----------+---------------------+
| name     | number of paintings |
+----------+---------------------+
| Da Vinci |                   2 |
| Renoir   |                   1 |
| Van Gogh |                   3 |
+----------+---------------------+

A more elaborate query might also show how much you paid for each artist's paintings, in total and on average:

mysql> SELECT artist.name,
    -> COUNT(*) AS 'number of paintings',
    -> SUM(painting.price) AS 'total price',
    -> AVG(painting.price) AS 'average price'
    -> FROM artist, painting WHERE artist.a_id = painting.a_id
    -> GROUP BY artist.name;
+----------+---------------------+-------------+---------------+
| name     | number of paintings | total price | average price |
+----------+---------------------+-------------+---------------+
| Da Vinci |                   2 |         121 |       60.5000 |
| Renoir   |                   1 |          64 |       64.0000 |
| Van Gogh |                   3 |         148 |       49.3333 |
+----------+---------------------+-------------+---------------+

Note that the summary queries produce output only for those artists in the artist table for whom you actually have acquired paintings. (For example, Monet is listed in the artist table but is not present in the summary because you don't have any of his paintings yet.) If you want the summary to include all artists, even if you have none of their paintings yet, you must use a different kind of join, specifically a LEFT JOIN. See Recipe 12.6 and Recipe 12.9.
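If you'd like to experiment with the summary join without a MySQL server at hand, the following is a minimal sketch in Python. It assumes SQLite (through the standard sqlite3 module) as a stand-in for MySQL, and the column types in the CREATE TABLE statements are my own guesses rather than the definitions from Recipe 12.5. The query itself is the same join-plus-GROUP BY shown above, with an ORDER BY added so the row order is predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE painting (
    a_id  INTEGER NOT NULL,      -- links each painting to its artist
    p_id  INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    state TEXT NOT NULL,
    price INTEGER
);
INSERT INTO artist VALUES
    (1,'Da Vinci'),(2,'Monet'),(3,'Van Gogh'),(4,'Picasso'),(5,'Renoir');
INSERT INTO painting VALUES
    (1,1,'The Last Supper','IN',34),(1,2,'The Mona Lisa','MI',87),
    (3,3,'Starry Night','KY',48),(3,4,'The Potato Eaters','KY',67),
    (3,5,'The Rocks','IA',33),(5,6,'Les Deux Soeurs','NE',64);
""")

# Regular join restricted by matching a_id values, then summarized per artist.
summary = conn.execute("""
    SELECT artist.name, COUNT(*), SUM(painting.price), AVG(painting.price)
    FROM artist, painting
    WHERE artist.a_id = painting.a_id
    GROUP BY artist.name
    ORDER BY artist.name
""").fetchall()

for name, count, total, average in summary:
    print(f"{name:<10} {count:>2} {total:>4} {average:>8.4f}")
```

Monet and Picasso do not appear in the result, which is the limitation of a regular join that the preceding paragraph describes.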

Joins and Indexes

Because a join can easily cause MySQL to process large numbers of row combinations, it's a good idea to make sure that the columns you're comparing are indexed. Otherwise, performance can drop off quickly as table sizes increase. For the artist and painting tables, joins are made based on the values in the a_id column of each table. If you look back at the CREATE TABLE statements that were shown for these tables in Recipe 12.5, you'll see that a_id is indexed in each table.

12.6 Finding Rows with No Match in Another Table

12.6.1 Problem

You want to find rows in one table that have no match in another. Or you want to produce a list on the basis of a join between tables, but you want the list to include an entry even when there are no matches in the second table.

12.6.2 Solution

Use a LEFT JOIN. As of MySQL 3.23.25, you can also use a RIGHT JOIN.

12.6.3 Discussion

The preceding sections focused on finding matches between two tables. But the answers to some questions require determining which records do not have a match (or, stated another way, which records have values that are missing from the other table). For example, you might want to know which artists in the artist table you don't yet have any paintings by. The same kind of question occurs in other contexts, such as:

• You're working in sales. You have a list of potential customers, and another list of people who have placed orders. To focus your efforts on people who are not yet actual customers, you want to find people in the first list that are not in the second.

• You have one list of baseball players, another list of players who have hit home runs, and you want to know which players in the first list have not hit a home run. The answer is determined by finding those players in the first list who are not in the second.

For these types of questions, you need to use a LEFT JOIN.

To see why, let's determine which artists in the artist table are missing from the painting table. At present, the tables are small, so it's easy to examine them visually and determine that you have no paintings by Monet and Picasso (there are no painting records with an a_id value of 2 or 4):

mysql> SELECT * FROM artist ORDER BY a_id;
+------+----------+
| a_id | name     |
+------+----------+
|    1 | Da Vinci |
|    2 | Monet    |
|    3 | Van Gogh |
|    4 | Picasso  |
|    5 | Renoir   |
+------+----------+
mysql> SELECT * FROM painting ORDER BY a_id, p_id;
+------+------+-------------------+-------+-------+
| a_id | p_id | title             | state | price |
+------+------+-------------------+-------+-------+
|    1 |    1 | The Last Supper   | IN    |    34 |
|    1 |    2 | The Mona Lisa     | MI    |    87 |
|    3 |    3 | Starry Night      | KY    |    48 |
|    3 |    4 | The Potato Eaters | KY    |    67 |
|    3 |    5 | The Rocks         | IA    |    33 |
|    5 |    6 | Les Deux Soeurs   | NE    |    64 |
+------+------+-------------------+-------+-------+

But as you acquire more paintings and the tables get larger, it won't be so easy to eyeball them and answer the question by inspection. Can you answer the question using SQL? Sure, although first attempts at solving the problem generally look something like the following query, using a WHERE clause that looks for mismatches between the two tables:

mysql> SELECT * FROM artist, painting WHERE artist.a_id != painting.a_id;
+------+----------+------+------+-------------------+-------+-------+
| a_id | name     | a_id | p_id | title             | state | price |
+------+----------+------+------+-------------------+-------+-------+
|    2 | Monet    |    1 |    1 | The Last Supper   | IN    |    34 |
|    3 | Van Gogh |    1 |    1 | The Last Supper   | IN    |    34 |
|    4 | Picasso  |    1 |    1 | The Last Supper   | IN    |    34 |
|    5 | Renoir   |    1 |    1 | The Last Supper   | IN    |    34 |
|    2 | Monet    |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    3 | Van Gogh |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    4 | Picasso  |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    5 | Renoir   |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    1 | Da Vinci |    3 |    3 | Starry Night      | KY    |    48 |
|    2 | Monet    |    3 |    3 | Starry Night      | KY    |    48 |
|    4 | Picasso  |    3 |    3 | Starry Night      | KY    |    48 |
|    5 | Renoir   |    3 |    3 | Starry Night      | KY    |    48 |
|    1 | Da Vinci |    3 |    4 | The Potato Eaters | KY    |    67 |
|    2 | Monet    |    3 |    4 | The Potato Eaters | KY    |    67 |
|    4 | Picasso  |    3 |    4 | The Potato Eaters | KY    |    67 |
|    5 | Renoir   |    3 |    4 | The Potato Eaters | KY    |    67 |
|    1 | Da Vinci |    3 |    5 | The Rocks         | IA    |    33 |
|    2 | Monet    |    3 |    5 | The Rocks         | IA    |    33 |
|    4 | Picasso  |    3 |    5 | The Rocks         | IA    |    33 |
|    5 | Renoir   |    3 |    5 | The Rocks         | IA    |    33 |
|    1 | Da Vinci |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
|    2 | Monet    |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
|    3 | Van Gogh |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
|    4 | Picasso  |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
+------+----------+------+------+-------------------+-------+-------+

That's obviously not the correct result! The query produces a list of all combinations of rows from the two tables where the a_id values aren't the same, but what you really want is a list of values in artist that aren't present at all in painting. The trouble here is that a regular join can only produce combinations from values that are present in the tables. It can't tell you anything about values that are missing. When faced with the problem of finding values in one table that have no match in (or that are missing from) another table, you should get in the habit of thinking, "aha, that's a LEFT JOIN problem."

A LEFT JOIN is similar to a regular join in that it attempts to match rows in the first (left) table with the rows in the second (right) table. But in addition, if a left table row has no match in the right table, a LEFT JOIN still produces a row, one in which all the columns from the right table are set to NULL. This means you can find values that are missing from the right table by looking for NULL. It's easier to observe how this happens by working in stages. First, run a regular join to find matching rows:

mysql> SELECT * FROM artist, painting
    -> WHERE artist.a_id = painting.a_id;
+------+----------+------+------+-------------------+-------+-------+
| a_id | name     | a_id | p_id | title             | state | price |
+------+----------+------+------+-------------------+-------+-------+
|    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 |
|    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 |
|    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 |
|    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 |
|    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
+------+----------+------+------+-------------------+-------+-------+

In this output, the first a_id column comes from the artist table and the second one comes from the painting table.

Now compare that result with the output you get from a LEFT JOIN. A LEFT JOIN is written in somewhat similar fashion, but you separate the table names by LEFT JOIN rather than by a comma, and specify which columns to compare using an ON clause rather than a WHERE clause:

mysql> SELECT * FROM artist LEFT JOIN painting
    -> ON artist.a_id = painting.a_id;
+------+----------+------+------+-------------------+-------+-------+
| a_id | name     | a_id | p_id | title             | state | price |
+------+----------+------+------+-------------------+-------+-------+
|    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 |
|    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 |
|    2 | Monet    | NULL | NULL | NULL              | NULL  |  NULL |
|    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 |
|    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 |
|    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 |
|    4 | Picasso  | NULL | NULL | NULL              | NULL  |  NULL |
|    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 |
+------+----------+------+------+-------------------+-------+-------+

The output is similar to that from the regular join, except that the LEFT JOIN also produces an output row for artist rows that have no painting table match. For those output rows, all the columns from painting are set to NULL.

Next, to restrict the output only to the non-matched artist rows, add a WHERE clause that looks for NULL values in the painting column that is named in the ON clause:

mysql> SELECT * FROM artist LEFT JOIN painting
    -> ON artist.a_id = painting.a_id
    -> WHERE painting.a_id IS NULL;
+------+---------+------+------+-------+-------+-------+
| a_id | name    | a_id | p_id | title | state | price |
+------+---------+------+------+-------+-------+-------+
|    2 | Monet   | NULL | NULL | NULL  | NULL  |  NULL |
|    4 | Picasso | NULL | NULL | NULL  | NULL  |  NULL |
+------+---------+------+------+-------+-------+-------+

Finally, to show only the artist table values that are missing from the painting table, shorten the output column list to include only columns from the artist table:

mysql> SELECT artist.* FROM artist LEFT JOIN painting
    -> ON artist.a_id = painting.a_id
    -> WHERE painting.a_id IS NULL;
+------+---------+
| a_id | name    |
+------+---------+
|    2 | Monet   |
|    4 | Picasso |
+------+---------+

The preceding LEFT JOIN lists those left-table values that are not present in the right table. A similar kind of operation can be used to report each left-table value along with an indicator of whether or not it's present in the right table. To do this, perform a LEFT JOIN to count the number of times each left-table value occurs in the right table. A count of zero indicates the value is not present. The following query lists each artist from the artist table, and whether or not you have any paintings by the artist:

mysql> SELECT artist.name,
    -> IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'
    -> FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    -> GROUP BY artist.name;
+----------+---------------+
| name     | in collection |
+----------+---------------+
| Da Vinci | yes           |
| Monet    | no            |
| Picasso  | no            |
| Renoir   | yes           |
| Van Gogh | yes           |
+----------+---------------+

As of MySQL 3.23.25, you can also use RIGHT JOIN, which is like LEFT JOIN but reverses the roles of the left and right tables. In other words, RIGHT JOIN forces the matching process to produce a row for each row in the right table, even in the absence of a corresponding row in the left table. This means you would rewrite the preceding LEFT JOIN as follows to convert it to a RIGHT JOIN that produces the same results:

mysql> SELECT artist.name,
    -> IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'
    -> FROM painting RIGHT JOIN artist ON painting.a_id = artist.a_id
    -> GROUP BY artist.name;
+----------+---------------+
| name     | in collection |
+----------+---------------+
| Da Vinci | yes           |
| Monet    | no            |
| Picasso  | no            |
| Renoir   | yes           |
| Van Gogh | yes           |
+----------+---------------+

Elsewhere in this book, I'll generally refer in discussion only to LEFT JOIN for brevity, but the discussions apply to RIGHT JOIN as well if you reverse the roles of the tables.
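The staged LEFT JOIN approach can be tried out with the following minimal Python sketch. It assumes SQLite (via the standard sqlite3 module) in place of MySQL and keeps only the columns the queries need. Because SQLite has no IF() function, the indicator query uses a CASE expression instead; that substitution is mine, not part of the recipe.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; tables are trimmed
# to the columns these queries actually use.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE painting (a_id INTEGER NOT NULL, p_id INTEGER PRIMARY KEY,
                       title TEXT NOT NULL);
INSERT INTO artist VALUES
    (1,'Da Vinci'),(2,'Monet'),(3,'Van Gogh'),(4,'Picasso'),(5,'Renoir');
INSERT INTO painting VALUES
    (1,1,'The Last Supper'),(1,2,'The Mona Lisa'),(3,3,'Starry Night'),
    (3,4,'The Potato Eaters'),(3,5,'The Rocks'),(5,6,'Les Deux Soeurs');
""")

# Artists with no painting: unmatched left-table rows carry NULL in every
# right-table column, so test the column named in the ON clause.
missing = conn.execute("""
    SELECT artist.a_id, artist.name
    FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    WHERE painting.a_id IS NULL
    ORDER BY artist.a_id
""").fetchall()

# Presence indicator: COUNT(painting.a_id) ignores NULL values, so artists
# without paintings get a count of zero.
in_collection = conn.execute("""
    SELECT artist.name,
           CASE WHEN COUNT(painting.a_id) > 0 THEN 'yes' ELSE 'no' END
    FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    GROUP BY artist.name
    ORDER BY artist.name
""").fetchall()

print(missing)
print(in_collection)
```

Note that the indicator query counts painting.a_id rather than using COUNT(*). COUNT(*) would count the NULL-filled row that the LEFT JOIN generates for an unmatched artist, so every artist would appear to have at least one painting.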

Other Ways to Write LEFT JOIN and RIGHT JOIN Queries

When the names of the columns to be matched are the same in both tables, an alternative to ON can be used for writing LEFT JOIN and RIGHT JOIN queries. This syntax substitutes USING for ON. For example, the following two queries are equivalent:

SELECT t1.n, t2.n FROM t1 LEFT JOIN t2 ON t1.n = t2.n;
SELECT t1.n, t2.n FROM t1 LEFT JOIN t2 USING (n);

As are these:

SELECT t1.n, t2.n FROM t1 RIGHT JOIN t2 ON t1.n = t2.n;
SELECT t1.n, t2.n FROM t1 RIGHT JOIN t2 USING (n);

In the special case that you want to base the comparison on all columns that appear in both tables, you can use NATURAL LEFT JOIN or NATURAL RIGHT JOIN:

SELECT t1.n, t2.n FROM t1 NATURAL LEFT JOIN t2;
SELECT t1.n, t2.n FROM t1 NATURAL RIGHT JOIN t2;
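To convince yourself that the ON, USING, and NATURAL forms return the same rows, you can run a quick comparison. The sketch below is in Python and assumes SQLite (through the sqlite3 module) rather than MySQL; the sample values in t1 and t2 are invented for illustration. Only the LEFT JOIN variants are shown, because RIGHT JOIN is unavailable in older SQLite versions.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (n INTEGER);
CREATE TABLE t2 (n INTEGER);
INSERT INTO t1 VALUES (1),(2),(3);
INSERT INTO t2 VALUES (2),(3),(4);
""")

# Three spellings of the same left join on the shared column n.
queries = {
    "ON": "SELECT t1.n, t2.n FROM t1 LEFT JOIN t2 ON t1.n = t2.n ORDER BY t1.n",
    "USING": "SELECT t1.n, t2.n FROM t1 LEFT JOIN t2 USING (n) ORDER BY t1.n",
    "NATURAL": "SELECT t1.n, t2.n FROM t1 NATURAL LEFT JOIN t2 ORDER BY t1.n",
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}

for label, rows in results.items():
    print(label, rows)
```

Each form keeps the unmatched t1 row (n = 1) and pairs it with NULL, which Python reports as None. The NATURAL form matches only because n is the sole column the two tables have in common; with more shared columns, it would compare all of them.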

12.6.4 See Also

As shown in this section, LEFT JOIN is useful for finding values with no match in another table, or for showing whether each value is matched. LEFT JOIN may also be used for producing a summary that includes all items in a list, even those for which there's nothing to summarize. This is very common for characterizing the relationship between a master table and a detail table. For example, a LEFT JOIN can produce "total sales per customer" reports that list all customers, even those who haven't bought anything during the summary period. (See Recipe 12.9.)

Another application of LEFT JOIN is for performing consistency checking when you receive two datafiles that are supposed to be related, and you want to determine whether they really are. (That is, you want to check the integrity of the relationship.) Import each file into a MySQL table, then run a couple of LEFT JOIN statements to determine whether there are unattached records in one table or the other, that is, records that have no match in the other table. (If there are any such records and you want to delete them, see Recipe 12.22.)
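The consistency check just described amounts to running the same LEFT JOIN twice, once in each direction. Here is a minimal Python sketch of the idea, assuming SQLite (via sqlite3) in place of MySQL. The master and detail tables, their columns, and their contents are hypothetical, invented only to show one unattached record on each side.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. The master/detail
# tables and their rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE detail (d_id INTEGER PRIMARY KEY, master_id INTEGER, note TEXT);
INSERT INTO master VALUES (1,'a'),(2,'b'),(3,'c');
INSERT INTO detail VALUES (10,1,'x'),(11,1,'y'),(12,3,'z'),(13,7,'orphan');
""")

# Direction 1: master rows that no detail row refers to.
masters_without_detail = conn.execute("""
    SELECT master.id
    FROM master LEFT JOIN detail ON master.id = detail.master_id
    WHERE detail.master_id IS NULL
    ORDER BY master.id
""").fetchall()

# Direction 2: detail rows that refer to a nonexistent master row.
details_without_master = conn.execute("""
    SELECT detail.d_id, detail.master_id
    FROM detail LEFT JOIN master ON detail.master_id = master.id
    WHERE master.id IS NULL
    ORDER BY detail.d_id
""").fetchall()

print(masters_without_detail)
print(details_without_master)
```

If both result sets are empty, every record in each table is attached to a record in the other.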

12.7 Finding Rows Containing Per-Group Minimum or Maximum Values

12.7.1 Problem

You want to find which record within each group of rows in a table contains the maximum or minimum value for a given column. For example, you want to determine the most expensive painting in your collection for each artist.

12.7.2 Solution

Create a temporary table to hold the per-group maximum or minimum, then join the temporary table with the original one to pull out the matching record for each group.

12.7.3 Discussion

Many questions involve finding largest or smallest values in a particular table column, but it's also common to want to know what the other values are in the row that contains the value. For example, you can use MAX(pop) to find the largest state population recorded in the states table, but you might also want to know which state has that population. As shown in Recipe 7.6, one way to solve this problem is to use a SQL variable. The technique works like this:

mysql> SELECT @max := MAX(pop) FROM states;
mysql> SELECT * FROM states WHERE pop = @max;
+------------+--------+------------+----------+
| name       | abbrev | statehood  | pop      |
+------------+--------+------------+----------+
| California | CA     | 1850-09-09 | 29760021 |
+------------+--------+------------+----------+

Another way to answer the question is to use a join. First, select the maximum population value into a temporary table:

mysql> CREATE TABLE tmp SELECT MAX(pop) as maxpop FROM states;

Then join the temporary table to the original one to find the record matching the selected population:

mysql> SELECT states.* FROM states, tmp WHERE states.pop = tmp.maxpop;
+------------+--------+------------+----------+
| name       | abbrev | statehood  | pop      |
+------------+--------+------------+----------+
| California | CA     | 1850-09-09 | 29760021 |
+------------+--------+------------+----------+

By applying these techniques to the artist and painting tables, you can answer questions like "What is the most expensive painting in the collection, and who painted it?" To use a SQL variable, store the highest price in it, then use the variable to identify the record containing the price so you can retrieve other columns from it:

mysql> SELECT @max_price := MAX(price) FROM painting;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting
    -> WHERE painting.price = @max_price
    -> AND painting.a_id = artist.a_id;
+----------+---------------+-------+
| name     | title         | price |
+----------+---------------+-------+
| Da Vinci | The Mona Lisa |    87 |
+----------+---------------+-------+

The same thing can be done by creating a temporary table to hold the maximum price, and then joining it with the other tables:

mysql> CREATE TABLE tmp SELECT MAX(price) AS max_price FROM painting;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+---------------+-------+
| name     | title         | price |
+----------+---------------+-------+
| Da Vinci | The Mona Lisa |    87 |
+----------+---------------+-------+

On the face of it, using a temporary table and a join is just a more complicated way of answering the question. Does this technique have any practical value? Yes, it does, because it leads to a more general technique for answering more difficult questions. The previous queries show information only for the most expensive single painting in the entire painting table.

What if your question is, "What is the most expensive painting per artist?" You can't use a SQL variable to answer that question, because the answer requires finding one price per artist, and a variable can hold only a single value at a time. But the technique of using a temporary table works well, because the table can hold multiple values and a join can find matches for them all at once. To answer the question, select each artist ID and the corresponding maximum painting price into a temporary table. The table will contain not just the maximum painting price, but the maximum within each group, where "group" is defined as "paintings by a given artist." Then use the artist IDs and prices stored in the tmp table to match records in the painting table, and join the result with artist to get the artist names:

mysql> CREATE TABLE tmp
    -> SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.a_id = tmp.a_id
    -> AND painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+-------------------+-------+
| name     | title             | price |
+----------+-------------------+-------+
| Da Vinci | The Mona Lisa     |    87 |
| Van Gogh | The Potato Eaters |    67 |
| Renoir   | Les Deux Soeurs   |    64 |
+----------+-------------------+-------+

The same technique works for other kinds of values, such as temporal values. Consider the driver_log table that lists drivers and trips that they've taken:

mysql> SELECT name, trav_date, miles
    -> FROM driver_log
    -> ORDER BY name, trav_date;
+-------+------------+-------+
| name  | trav_date  | miles |
+-------+------------+-------+
| Ben   | 2001-11-29 |   131 |
| Ben   | 2001-11-30 |   152 |
| Ben   | 2001-12-02 |    79 |
| Henry | 2001-11-26 |   115 |
| Henry | 2001-11-27 |    96 |
| Henry | 2001-11-29 |   300 |
| Henry | 2001-11-30 |   203 |
| Henry | 2001-12-01 |   197 |
| Suzi  | 2001-11-29 |   391 |
| Suzi  | 2001-12-02 |   502 |
+-------+------------+-------+

One type of maximum-per-group problem for this table is, "show the most recent trip for each driver." It can be solved like this:

mysql> CREATE TABLE tmp
    -> SELECT name, MAX(trav_date) AS trav_date
    -> FROM driver_log GROUP BY name;
mysql> SELECT driver_log.name, driver_log.trav_date, driver_log.miles
    -> FROM driver_log, tmp
    -> WHERE driver_log.name = tmp.name
    -> AND driver_log.trav_date = tmp.trav_date
    -> ORDER BY driver_log.name;
+-------+------------+-------+
| name  | trav_date  | miles |
+-------+------------+-------+
| Ben   | 2001-12-02 |    79 |
| Henry | 2001-12-01 |   197 |
| Suzi  | 2001-12-02 |   502 |
+-------+------------+-------+
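The per-artist version of the temporary-table technique can be exercised with the following minimal Python sketch. It assumes SQLite (via sqlite3) in place of MySQL, so the table is created with CREATE TEMPORARY TABLE ... AS SELECT, which is SQLite's spelling of the CREATE TABLE ... SELECT statement used in this section. Making the table TEMPORARY and adding an ORDER BY clause are my own choices, made so the sketch cleans up after itself and returns rows in a predictable order.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; tables are trimmed
# to the columns the queries need.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE painting (a_id INTEGER NOT NULL, p_id INTEGER PRIMARY KEY,
                       title TEXT NOT NULL, price INTEGER);
INSERT INTO artist VALUES
    (1,'Da Vinci'),(2,'Monet'),(3,'Van Gogh'),(4,'Picasso'),(5,'Renoir');
INSERT INTO painting VALUES
    (1,1,'The Last Supper',34),(1,2,'The Mona Lisa',87),
    (3,3,'Starry Night',48),(3,4,'The Potato Eaters',67),
    (3,5,'The Rocks',33),(5,6,'Les Deux Soeurs',64);

-- Step 1: one row per artist ID, holding that artist's maximum price.
CREATE TEMPORARY TABLE tmp AS
    SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id;
""")

# Step 2: join back to painting on both the group key and the maximum,
# then to artist to pick up the name.
most_expensive = conn.execute("""
    SELECT artist.name, painting.title, painting.price
    FROM artist, painting, tmp
    WHERE painting.a_id = tmp.a_id
      AND painting.price = tmp.max_price
      AND painting.a_id = artist.a_id
    ORDER BY painting.price DESC
""").fetchall()

for row in most_expensive:
    print(row)
```

If an artist has two paintings tied at the maximum price, the join returns both of them; the technique finds every row that carries the per-group maximum, not exactly one row per group.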

12.7.4 See Also

The technique illustrated in this section shows how to answer maximum-per-group questions by selecting summary information into a temporary table and joining that table to the original one. This technique has many applications. One such application is calculation of team standings, where the standings for each group of teams are determined by comparing each team in the group to the team with the best record. Recipe 12.8 discusses how to do this.

12.8 Computing Team Standings

12.8.1 Problem

You want to compute team standings from their win-loss records, including the games-behind (GB) values.

12.8.2 Solution

Determine which team is in first place, then join that result to the original records.

12.8.3 Discussion

Standings for sports teams that compete against each other typically are ranked according to who has the best win-loss record, and the teams not in first place are assigned a "games-behind" value indicating how many games out of first place they are. This section shows how to calculate those values. The first example uses a table containing a single set of team records, to illustrate the logic of the calculations. The second example uses a table containing several sets of records; in this case, it's necessary to use a join to perform the calculations independently for each group of teams.

Consider the following table, standings1, which contains a single set of baseball team records (they represent the final standings for the Northern League in the year 1902):

mysql> SELECT team, wins, losses FROM standings1
    -> ORDER BY wins-losses DESC;
+-------------+------+--------+
| team        | wins | losses |
+-------------+------+--------+
| Winnipeg    |   37 |     20 |
| Crookston   |   31 |     25 |
| Fargo       |   30 |     26 |
| Grand Forks |   28 |     26 |
| Devils Lake |   19 |     31 |
| Cavalier    |   15 |     32 |
+-------------+------+--------+

The records are sorted by the win-loss differential, which is how to place teams in order from first place to last place. But displays of team standings typically include each team's winning percentage and a figure indicating how many games behind the leader all the other teams are. So let's add that information to the output. Calculating the percentage is easy. It's the ratio of wins to total games played and can be determined using this expression:

wins / (wins + losses)

If you want to perform standings calculations under conditions when a team may not have played any games yet, that expression evaluates to NULL because it involves a division by zero. For simplicity, I'll assume a nonzero number of games, but if you want to handle this condition by mapping NULL to zero, generalize the expression as follows:

IFNULL(wins / (wins + losses),0)

or as:

wins / IF(wins=0,1,wins + losses)

Determining the games-behind value is a little trickier. It's based on the relationship of the win-loss records for two teams, calculated as the average of two values:

• The number of games the second place team must win to have the same number of wins as the first place team

• The number of games the first place team must lose to have the same number of losses as the second place team
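The averaging of those two quantities can be written out as a small function. This is only an illustrative Python sketch; the function name games_behind and its parameter names are mine, and the sample figures are taken from the standings1 table shown earlier.

```python
def games_behind(leader_wins, leader_losses, team_wins, team_losses):
    """Average of the two catch-up quantities for a trailing team."""
    # Games the trailing team must win to equal the leader's win total.
    wins_needed = leader_wins - team_wins
    # Games the leader must lose to equal the trailing team's loss total.
    losses_needed = team_losses - leader_losses
    return (wins_needed + losses_needed) / 2


# Winnipeg (37-20) is the leader; compare Crookston (31-25) and Cavalier (15-32).
print(games_behind(37, 20, 31, 25))
print(games_behind(37, 20, 15, 32))
```

For Crookston, the two quantities are 6 and 5, so the average is 5.5 games behind. Comparing the leader with itself yields zero, as you'd expect.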

For example, suppose two teams A and B have the following win-loss records:

+------+------+--------+
| team | wins | losses |
+------+------+--------+
| A    |   17 |     11 |
| B    |   14 |     12 |
+------+------+--------+

Here, team B has to win three more games and team A has to lose one more game for the teams to be even. The average of three and one is two, thus B is two games behind A. Mathematically, the games-behind calculation for the two teams can be expressed like this:

((winsA - winsB) + (lossesB - lossesA)) / 2

With a little rearrangement of terms, the expression becomes:

((winsA - lossesA) - (winsB - lossesB)) / 2

The second expression is equivalent to the first, but it has each term written as a single team's win-loss differential, rather than as a comparison between teams. That makes it easier to work with, because each term can be determined independently from a single team record. The first term represents the first place team's win-loss differential, so if we calculate that value first, all the other team GB values can be determined in relation to it.

The first place team is the one with the largest win-loss differential. To find that value and save it in a variable, use this query:

mysql> SELECT @wl_diff := MAX(wins-losses) FROM standings1;
+------------------------------+
| @wl_diff := MAX(wins-losses) |
+------------------------------+
|                           17 |
+------------------------------+

Then use the differential as follows to produce team standings that include winning percentage and GB values:

mysql> SELECT team, wins AS W, losses AS L,
    -> wins/(wins+losses) AS PCT,
    -> (@wl_diff - (wins-losses)) / 2 AS GB
    -> FROM standings1
    -> ORDER BY wins-losses DESC, PCT DESC;
+-------------+------+------+------+------+
| team        | W    | L    | PCT  | GB   |
+-------------+------+------+------+------+
| Winnipeg    |   37 |   20 | 0.65 |    0 |
| Crookston   |   31 |   25 | 0.55 |  5.5 |
| Fargo       |   30 |   26 | 0.54 |  6.5 |
| Grand Forks |   28 |   26 | 0.52 |  7.5 |
| Devils Lake |   19 |   31 | 0.38 | 14.5 |
| Cavalier    |   15 |   32 | 0.32 |   17 |
+-------------+------+------+------+------+

There are a couple of minor formatting issues that can be addressed at this point. Percentages in standings generally are displayed to three decimals, and the GB value for the first place team is displayed as - rather than as 0. To display three decimals, TRUNCATE(expr,3) can be used. To display the GB value for the first place team appropriately, put the expression that calculates the GB column within a call to IF() that maps 0 to a dash:

mysql> SELECT team, wins AS W, losses AS L, -> TRUNCATE(wins/(wins+losses),3) AS PCT, -> IF((@wl_diff - (wins-losses)) = 0,'-',(@wl_diff - (wins-losses))/2) AS GB -> FROM standings1 -> ORDER BY wins-losses DESC, PCT DESC; +-------------+------+------+-------+------+ | team | W | L | PCT | GB | +-------------+------+------+-------+------+ | Winnipeg | 37 | 20 | 0.649 | | | Crookston | 31 | 25 | 0.553 | 5.5 | | Fargo | 30 | 26 | 0.535 | 6.5 | | Grand Forks | 28 | 26 | 0.518 | 7.5 | | Devils Lake | 19 | 31 | 0.380 | 14.5 | | Cavalier | 15 | 32 | 0.319 | 17 | +-------------+------+------+-------+------+ These queries order t he t eam s by win- loss different ial, using winning percent age as a t iebreaker in case t here are t eam s wit h t he sam e different ial value. I t would be sim pler j ust t o sort by percent age, of course, but t hen you wouldn't always get t he correct ordering. I t 's a

curious fact t hat a t eam wit h a lower winning percent age can act ually be higher in t he st andings t han a t eam wit h a higher percent age. ( This generally occurs early in t he season, when t eam s m ay have played highly disparat e num bers of gam es, relat ively speaking.) Consider t he case where t wo t eam s A and B have t he following records:

+------+------+--------+ | team | wins | losses | +------+------+--------+ | A | 4 | 1 | | B | 2 | 0 | +------+------+--------+ Applying t he GB and percent age calculat ions t o t hese t eam records yields t he following result , where t he first place t eam act ually has a lower winning percent age t han t he second place t eam :

+------+------+------+-------+------+ | team | W | L | PCT | GB | +------+------+------+-------+------+ | A | 4 | 1 | 0.800 | | | B | 2 | 0 | 1.000 | 0.5 | +------+------+------+-------+------+ The st andings calculat ions shown t hus far can be done wit hout a j oin. They involve only a single set of t eam records, so t he first place t eam 's win- loss different ial can be st ored in a variable. A m ore com plex sit uat ion occurs when a dat aset includes several set s of t eam records. For exam ple, t he 1997 Nort hern League had t wo divisions ( East ern and West ern) . I n addit ion, separat e st andings were m aint ained for t he first and second halves of t he season, because season- half winners in each division played each ot her for t he right t o com pet e in t he league cham pionship. The following t able,

standings2, shows what these records look like, ordered by season half, division, and win-loss differential:

mysql> SELECT half, div, team, wins, losses FROM standings2
    -> ORDER BY half, div, wins-losses DESC;
+------+---------+-----------------+------+--------+
| half | div     | team            | wins | losses |
+------+---------+-----------------+------+--------+
|    1 | Eastern | St. Paul        |   24 |     18 |
|    1 | Eastern | Thunder Bay     |   18 |     24 |
|    1 | Eastern | Duluth-Superior |   17 |     24 |
|    1 | Eastern | Madison         |   15 |     27 |
|    1 | Western | Winnipeg        |   29 |     12 |
|    1 | Western | Sioux City      |   28 |     14 |
|    1 | Western | Fargo-Moorhead  |   21 |     21 |
|    1 | Western | Sioux Falls     |   15 |     27 |
|    2 | Eastern | Duluth-Superior |   22 |     20 |
|    2 | Eastern | St. Paul        |   21 |     21 |
|    2 | Eastern | Madison         |   19 |     23 |
|    2 | Eastern | Thunder Bay     |   18 |     24 |
|    2 | Western | Fargo-Moorhead  |   26 |     16 |
|    2 | Western | Winnipeg        |   24 |     18 |
|    2 | Western | Sioux City      |   22 |     20 |
|    2 | Western | Sioux Falls     |   16 |     26 |
+------+---------+-----------------+------+--------+

Generating the standings for these records requires computing the GB values separately for each of the four combinations of season half and division. Begin by calculating the win-loss differential for the first place team in each group and saving the values into a separate

firstplace table:

mysql> CREATE TABLE firstplace
    -> SELECT half, div, MAX(wins-losses) AS wl_diff
    -> FROM standings2
    -> GROUP BY half, div;

Then join the firstplace table to the original standings, associating each team record with the proper win-loss differential to compute its GB value:

mysql> SELECT wl.half, wl.div, wl.team, wl.wins AS W, wl.losses AS L,
    -> TRUNCATE(wl.wins/(wl.wins+wl.losses),3) AS PCT,
    -> IF((fp.wl_diff - (wl.wins-wl.losses)) = 0,
    -> '-', (fp.wl_diff - (wl.wins-wl.losses)) / 2) AS GB
    -> FROM standings2 AS wl, firstplace AS fp
    -> WHERE wl.half = fp.half AND wl.div = fp.div
    -> ORDER BY wl.half, wl.div, wl.wins-wl.losses DESC, PCT DESC;
+------+---------+-----------------+------+------+-------+-------+
| half | div     | team            | W    | L    | PCT   | GB    |
+------+---------+-----------------+------+------+-------+-------+
|    1 | Eastern | St. Paul        |   24 |   18 | 0.571 | -     |
|    1 | Eastern | Thunder Bay     |   18 |   24 | 0.428 | 6.00  |
|    1 | Eastern | Duluth-Superior |   17 |   24 | 0.414 | 6.50  |
|    1 | Eastern | Madison         |   15 |   27 | 0.357 | 9.00  |
|    1 | Western | Winnipeg        |   29 |   12 | 0.707 | -     |
|    1 | Western | Sioux City      |   28 |   14 | 0.666 | 1.50  |
|    1 | Western | Fargo-Moorhead  |   21 |   21 | 0.500 | 8.50  |
|    1 | Western | Sioux Falls     |   15 |   27 | 0.357 | 14.50 |
|    2 | Eastern | Duluth-Superior |   22 |   20 | 0.523 | -     |
|    2 | Eastern | St. Paul        |   21 |   21 | 0.500 | 1.00  |
|    2 | Eastern | Madison         |   19 |   23 | 0.452 | 3.00  |
|    2 | Eastern | Thunder Bay     |   18 |   24 | 0.428 | 4.00  |
|    2 | Western | Fargo-Moorhead  |   26 |   16 | 0.619 | -     |
|    2 | Western | Winnipeg        |   24 |   18 | 0.571 | 2.00  |
|    2 | Western | Sioux City      |   22 |   20 | 0.523 | 4.00  |
|    2 | Western | Sioux Falls     |   16 |   26 | 0.380 | 10.00 |
+------+---------+-----------------+------+------+-------+-------+

That output is somewhat difficult to read, however. To make it easier to understand, you'd likely execute the query from within a program and reformat its results to display each set of team records separately. Here's some Perl code that does that by beginning a new output group each time it encounters a new group of standings. The code assumes that the join query has just been executed and that its results are available through the statement handle

$sth:

my ($cur_half, $cur_div) = ("", "");
while (my ($half, $div, $team, $wins, $losses, $pct, $gb)
            = $sth->fetchrow_array ( ))
{
    if ($cur_half ne $half || $cur_div ne $div) # new group of standings?
    {
        # print standings header and remember new half/division values
        print "\n$div Division, season half $half\n";
        printf "%-20s %3s %3s %5s %s\n", "Team", "W", "L", "PCT", "GB";
        $cur_half = $half;
        $cur_div = $div;
    }
    printf "%-20s %3d %3d %5s %s\n", $team, $wins, $losses, $pct, $gb;
}

The reformatted output looks like this:

Eastern Division, season half 1
Team                   W   L   PCT  GB
St. Paul              24  18  0.57  -
Thunder Bay           18  24  0.43  6.00
Duluth-Superior       17  24  0.41  6.50
Madison               15  27  0.36  9.00

Western Division, season half 1
Team                   W   L   PCT  GB
Winnipeg              29  12  0.71  -
Sioux City            28  14  0.67  1.50
Fargo-Moorhead        21  21  0.50  8.50
Sioux Falls           15  27  0.36  14.50

Eastern Division, season half 2
Team                   W   L   PCT  GB
Duluth-Superior       22  20  0.52  -
St. Paul              21  21  0.50  1.00
Madison               19  23  0.45  3.00
Thunder Bay           18  24  0.43  4.00

Western Division, season half 2
Team                   W   L   PCT  GB
Fargo-Moorhead        26  16  0.62  -
Winnipeg              24  18  0.57  2.00
Sioux City            22  20  0.52  4.00
Sioux Falls           16  26  0.38  10.00

The code just shown that produces plain text output comes from the script calc_standings.pl in the joins directory of the recipes distribution. That directory also contains a PHP script, calc_standings.php, that takes the alternative approach of producing output in the form of HTML tables, which you might prefer for generating standings in a web environment.
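The per-group GB calculation can also be tried outside MySQL. The following is a minimal sketch, not the book's script: it assumes Python's standard sqlite3 module as a stand-in for a MySQL connection, uses only four of the standings2 rows shown earlier, and the function name games_behind is made up for illustration. It builds the firstplace table, joins it back to the standings, and substitutes '-' for each group leader.

```python
import sqlite3

# A small subset of the standings2 rows shown above (first half only).
ROWS = [
    (1, "Eastern", "St. Paul", 24, 18),
    (1, "Eastern", "Thunder Bay", 18, 24),
    (1, "Western", "Winnipeg", 29, 12),
    (1, "Western", "Sioux City", 28, 14),
]


def games_behind(rows):
    """Return (half, div, team, gb) tuples; gb is '-' for each group leader."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
    conn.execute(
        "CREATE TABLE standings2 "
        "(half INT, div TEXT, team TEXT, wins INT, losses INT)"
    )
    conn.executemany("INSERT INTO standings2 VALUES (?,?,?,?,?)", rows)
    # Same idea as the firstplace table: best differential in each group.
    conn.execute(
        "CREATE TABLE firstplace AS "
        "SELECT half, div, MAX(wins - losses) AS wl_diff "
        "FROM standings2 GROUP BY half, div"
    )
    cur = conn.execute(
        "SELECT wl.half, wl.div, wl.team, "
        "       (fp.wl_diff - (wl.wins - wl.losses)) / 2.0 AS gb "
        "FROM standings2 AS wl JOIN firstplace AS fp "
        "  ON wl.half = fp.half AND wl.div = fp.div "
        "ORDER BY wl.half, wl.div, wl.wins - wl.losses DESC"
    )
    result = [(h, d, t, "-" if gb == 0 else gb) for h, d, t, gb in cur]
    conn.close()
    return result


for row in games_behind(ROWS):
    print(row)
```

Dividing by 2.0 rather than 2 matters in SQLite, where integer division would discard the half games; MySQL's / operator does not have that problem.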

12.9 Producing Master-Detail Lists and Summaries

12.9.1 Problem

Two related tables have a master-detail relationship and you want to produce a list that shows each master record with its detail records, or a list that summarizes the detail records for each master record.

12.9.2 Solution

The solution to this problem involves a join, but the type of join depends on the question you want answered. To produce a list containing only master records for which some detail record exists, use a regular join based on the primary key in the master table. To produce a list that includes entries for all master records, even those that have no detail records, use a LEFT JOIN.

12.9.3 Discussion

It's often useful to produce a list from two related tables. For tables that have a master-detail or parent-child relationship, a given record in one table might be matched by several records in the other. This section shows some questions of this type that you can ask (and answer), using the artist and painting tables from earlier in the chapter.

One form of master-detail question for these tables is, "Which artist painted each painting?" This is a simple join that matches each painting record to its corresponding artist record based on the artist ID values:

mysql> SELECT artist.name, painting.title
    -> FROM artist, painting WHERE artist.a_id = painting.a_id
    -> ORDER BY 1, 2;
+----------+-------------------+
| name     | title             |
+----------+-------------------+
| Da Vinci | The Last Supper   |
| Da Vinci | The Mona Lisa     |
| Renoir   | Les Deux Soeurs   |
| Van Gogh | Starry Night      |
| Van Gogh | The Potato Eaters |
| Van Gogh | The Rocks         |
+----------+-------------------+

That type of join suffices, as long as you want to list only master records that have detail records. However, another form of master-detail question you can ask is, "Which paintings did each artist paint?" That question is similar, but not quite identical. It will have a different answer if there are artists listed in the artist table that are not represented in the painting table, and the question requires a different query to produce the proper answer.

In that case, the join output should include records in one table that have no match in the other. That's a form of "find the non-matching records" problem (Recipe 12.6), so to list each artist record, whether or not there are any painting records for it, use a LEFT JOIN:

mysql> SELECT artist.name, painting.title
    -> FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    -> ORDER BY 1, 2;

+----------+-------------------+
| name     | title             |
+----------+-------------------+
| Da Vinci | The Last Supper   |
| Da Vinci | The Mona Lisa     |
| Monet    | NULL              |
| Picasso  | NULL              |
| Renoir   | Les Deux Soeurs   |
| Van Gogh | Starry Night      |
| Van Gogh | The Potato Eaters |
| Van Gogh | The Rocks         |
+----------+-------------------+

The rows in the result that have NULL in the title column correspond to artists that are listed in the artist table for whom you have no paintings.

The same principles apply when producing summaries using master and detail tables. For example, to summarize your art collection by number of paintings per painter, you might ask, "How many paintings are there per artist in the painting table?" To find the answer based on artist ID, you can count up the paintings easily with this query:

mysql> SELECT a_id, COUNT(a_id) AS count FROM painting GROUP BY a_id;
+------+-------+
| a_id | count |
+------+-------+
|    1 |     2 |
|    3 |     3 |
|    5 |     1 |
+------+-------+

Of course, that output is essentially meaningless unless you have all the artist ID numbers memorized. To display the artists by name rather than ID, join the painting table to the artist table:

mysql> SELECT artist.name AS painter, COUNT(painting.a_id) AS count
    -> FROM artist, painting
    -> WHERE artist.a_id = painting.a_id
    -> GROUP BY artist.name;
+----------+-------+
| painter  | count |
+----------+-------+
| Da Vinci |     2 |
| Renoir   |     1 |
| Van Gogh |     3 |
+----------+-------+

On the other hand, you might ask, "How many paintings did each artist paint?" This is the same question as the previous one (and the same query answers it), as long as every artist in the artist table has at least one corresponding painting table record. But if you have artists in the artist table that are not yet represented by any paintings in your collection, they will not appear in the query output. To produce a count-per-artist summary that includes even artists with no paintings in the painting table, use a LEFT JOIN:

mysql> SELECT artist.name AS painter, COUNT(painting.a_id) AS count
    -> FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    -> GROUP BY artist.name;
+----------+-------+
| painter  | count |
+----------+-------+
| Da Vinci |     2 |
| Monet    |     0 |
| Picasso  |     0 |
| Renoir   |     1 |
| Van Gogh |     3 |
+----------+-------+

Beware of a subtle error that is easy to make when writing that kind of query. Suppose you write it slightly differently, like so:

mysql> SELECT artist.name AS painter, COUNT(*) AS count
    -> FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    -> GROUP BY artist.name;
+----------+-------+
| painter  | count |
+----------+-------+
| Da Vinci |     2 |
| Monet    |     1 |
| Picasso  |     1 |
| Renoir   |     1 |
| Van Gogh |     3 |
+----------+-------+

Now every artist appears to have at least one painting. Why the difference? The cause of the problem is that the query uses COUNT(*) rather than COUNT(painting.a_id). The way LEFT JOIN works for unmatched rows in the left table is that it generates a row with all the columns from the right table set to NULL. In the example, the right table is painting. The query that uses COUNT(painting.a_id) works correctly, because COUNT(expr) doesn't count NULL values. The query that uses COUNT(*) works incorrectly because it counts all rows, including the NULL-filled rows generated for artists that have no paintings.
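The difference between the two counting expressions is easy to reproduce on a tiny dataset. This sketch assumes Python's sqlite3 module in place of MySQL (COUNT behaves the same way in both for this case) and a cut-down artist/painting pair in which Monet has no paintings; the helper name counts is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript("""
CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE painting (a_id INTEGER, title TEXT);
INSERT INTO artist VALUES (1, 'Da Vinci'), (2, 'Monet'), (3, 'Van Gogh');
INSERT INTO painting VALUES
    (1, 'The Last Supper'), (1, 'The Mona Lisa'), (3, 'Starry Night');
""")


def counts(count_expr):
    """Run the per-artist LEFT JOIN summary with the given COUNT expression."""
    sql = (
        "SELECT artist.name, " + count_expr + " "
        "FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id "
        "GROUP BY artist.name ORDER BY artist.name"
    )
    return dict(conn.execute(sql).fetchall())


by_column = counts("COUNT(painting.a_id)")  # skips the NULL-filled row
by_star = counts("COUNT(*)")                # counts the NULL-filled row too

print(by_column)
print(by_star)
```

Only the Monet entry differs between the two results, because Monet is the only artist whose single joined row consists of NULL values from the painting side.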

LEFT JOIN is suitable for other types of summaries as well. To produce additional columns showing the total and average values of the paintings for each artist in the artist table, use this query:

mysql> SELECT artist.name AS painter,
    -> COUNT(painting.a_id) AS 'number of paintings',
    -> SUM(painting.price) AS 'total price',
    -> AVG(painting.price) AS 'average price'
    -> FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    -> GROUP BY artist.name;

+----------+---------------------+-------------+---------------+
| painter  | number of paintings | total price | average price |
+----------+---------------------+-------------+---------------+
| Da Vinci |                   2 |         121 |       60.5000 |
| Monet    |                   0 |           0 |          NULL |
| Picasso  |                   0 |           0 |          NULL |
| Renoir   |                   1 |          64 |       64.0000 |
| Van Gogh |                   3 |         148 |       49.3333 |
+----------+---------------------+-------------+---------------+

Note that COUNT( ) and SUM( ) are zero for artists that are not represented, but AVG( ) is NULL. That's because AVG( ) is computed as the sum over the count; if the count is zero, the value is undefined. To display an average value of zero in that case, modify the query to test the value of AVG( ) with IFNULL( ):

mysql> SELECT artist.name AS painter,
    -> COUNT(painting.a_id) AS 'number of paintings',
    -> SUM(painting.price) AS 'total price',
    -> IFNULL(AVG(painting.price),0) AS 'average price'
    -> FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id
    -> GROUP BY artist.name;
+----------+---------------------+-------------+---------------+
| painter  | number of paintings | total price | average price |
+----------+---------------------+-------------+---------------+
| Da Vinci |                   2 |         121 |       60.5000 |
| Monet    |                   0 |           0 |             0 |
| Picasso  |                   0 |           0 |             0 |
| Renoir   |                   1 |          64 |       64.0000 |
| Van Gogh |                   3 |         148 |       49.3333 |
+----------+---------------------+-------------+---------------+

12.10 Using a Join to Fill in Holes in a List

12.10.1 Problem

You want to produce a summary for each of several categories, but some of the categories are not represented in the data to be summarized. Consequently, the summary has missing categories.

12.10.2 Solution

Create a reference table that lists each category and produce the summary based on a LEFT JOIN between the list and the table containing your data. Then every category in the reference table will appear in the result, even "empty" ones.

12.10.3 Discussion

When you run a summary query, normally it produces entries only for the values that are actually present in the data. Let's say you want to produce a time-of-day summary for the records in the mail table, which looks like this:

mysql> SELECT * FROM mail;
+---------------------+---------+---------+---------+---------+---------+
| t                   | srcuser | srchost | dstuser | dsthost | size    |
+---------------------+---------+---------+---------+---------+---------+
| 2001-05-11 10:15:08 | barb    | saturn  | tricia  | mars    |   58274 |
| 2001-05-12 12:48:13 | tricia  | mars    | gene    | venus   |  194925 |
| 2001-05-12 15:02:49 | phil    | mars    | phil    | saturn  |    1048 |
| 2001-05-13 13:59:18 | barb    | saturn  | tricia  | venus   |     271 |
| 2001-05-14 09:31:37 | gene    | venus   | barb    | mars    |    2291 |
| 2001-05-14 11:52:17 | phil    | mars    | tricia  | saturn  |    5781 |
...

To determine how many messages were sent for each hour of the day, use the following query:

mysql> SELECT HOUR(t) AS hour, COUNT(HOUR(t)) AS count
    -> FROM mail GROUP BY hour;
+------+-------+
| hour | count |
+------+-------+
|    7 |     1 |
|    8 |     1 |
|    9 |     2 |
|   10 |     2 |
|   11 |     1 |
|   12 |     2 |
|   13 |     1 |
|   14 |     1 |
|   15 |     1 |
|   17 |     2 |
|   22 |     1 |
|   23 |     1 |
+------+-------+

However, this summary is incomplete in the sense that it includes entries only for those hours of the day represented in the mail table. To produce a summary that includes all hours of the day, even those during which no messages were sent, create a reference table that lists each hour:

mysql> CREATE TABLE ref (h INT);
mysql> INSERT INTO ref (h)
    -> VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
    -> (12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);

Then join the reference table to the mail table using a LEFT JOIN:

mysql> SELECT ref.h AS hour, COUNT(HOUR(mail.t)) AS count
    -> FROM ref LEFT JOIN mail ON ref.h = HOUR(mail.t)
    -> GROUP BY hour;
+------+-------+
| hour | count |
+------+-------+
|    0 |     0 |
|    1 |     0 |
|    2 |     0 |
|    3 |     0 |
|    4 |     0 |
|    5 |     0 |
|    6 |     0 |
|    7 |     1 |
|    8 |     1 |
|    9 |     2 |
|   10 |     2 |
|   11 |     1 |
|   12 |     2 |
|   13 |     1 |
|   14 |     1 |
|   15 |     1 |
|   16 |     0 |
|   17 |     2 |
|   18 |     0 |
|   19 |     0 |
|   20 |     0 |
|   21 |     0 |
|   22 |     1 |
|   23 |     1 |
+------+-------+

Now the summary includes an entry for every hour of the day. The LEFT JOIN forces the output to include a row for every record in the reference table, regardless of the contents of the mail table.
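As a rough illustration of the hole-filling technique, the following sketch runs the same kind of query against an in-memory SQLite database. Several things here are assumptions rather than part of the recipe: Python's sqlite3 module replaces MySQL, SQLite's strftime('%H', ...) replaces MySQL's HOUR( ), and the mail table holds only the six timestamps visible in the listing above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL

# Only the timestamp column matters for the per-hour summary.
conn.execute("CREATE TABLE mail (t TEXT)")
conn.executemany("INSERT INTO mail VALUES (?)", [
    ("2001-05-11 10:15:08",),
    ("2001-05-12 12:48:13",),
    ("2001-05-12 15:02:49",),
    ("2001-05-13 13:59:18",),
    ("2001-05-14 09:31:37",),
    ("2001-05-14 11:52:17",),
])

# Reference table listing every hour of the day, generated with a loop.
conn.execute("CREATE TABLE ref (h INT)")
conn.executemany("INSERT INTO ref VALUES (?)", [(h,) for h in range(24)])

rows = conn.execute(
    "SELECT ref.h AS hour, COUNT(mail.t) AS count "
    "FROM ref LEFT JOIN mail "
    "  ON ref.h = CAST(strftime('%H', mail.t) AS INTEGER) "
    "GROUP BY ref.h ORDER BY ref.h"
).fetchall()
hourly = dict(rows)

for hour, count in rows:
    print(hour, count)
```

Every hour from 0 through 23 appears in the result, with a zero count wherever the reference row found no matching message.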

The example just shown uses the reference table with a LEFT JOIN to fill in holes in the category list. By rewriting the query slightly, you can also use the reference table to find holes in the dataset; that is, to determine which categories are not present in the data to be summarized. The following query shows those hours of the day during which no messages were sent by using a HAVING clause that selects only summary rows with a zero count:

mysql> SELECT ref.h AS hour, COUNT(HOUR(mail.t)) AS count
    -> FROM ref LEFT JOIN mail ON ref.h = HOUR(mail.t)
    -> GROUP BY hour
    -> HAVING count = 0;
+------+-------+
| hour | count |
+------+-------+
|    0 |     0 |
|    1 |     0 |
|    2 |     0 |
|    3 |     0 |
|    4 |     0 |
|    5 |     0 |
|    6 |     0 |
|   16 |     0 |
|   18 |     0 |
|   19 |     0 |
|   20 |     0 |
|   21 |     0 |
+------+-------+

In this case, it's possible to write a simpler query, based on the fact that each hour value appears in the reference table only once. This means that no GROUP BY is necessary; just look for reference rows that don't match any mail table rows:

mysql> SELECT ref.h AS hour
    -> FROM ref LEFT JOIN mail ON ref.h = HOUR(mail.t)
    -> WHERE mail.t IS NULL;
+------+
| hour |
+------+
|    0 |
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|   16 |
|   18 |
|   19 |
|   20 |
|   21 |
+------+

This query also has the advantage of not producing a count column (which is extraneous anyway, because the counts are always zero).

Reference tables that contain a list of categories are quite useful for summary queries, but creating such tables manually can be a mind-numbing and error-prone exercise. If a category list has a lot of entries, you might find it preferable to write a script that uses the endpoints of the range of category values to generate the reference table for you. In essence, this type of script acts as an iterator that generates a record for each value in the range. The following Perl script, make_date_list.pl, shows an example of this approach. It creates a reference table containing a row for every date in a particular date range:

#! /usr/bin/perl -w
# make_date_list.pl - create a table with an entry for every date in a
# given date range. The table can be used in a LEFT JOIN with a data table
# when producing a summary, to make sure that every date appears in the
# summary, whether or not the data table actually contains any values for
# a given day.

# Usage: make_date_list.pl tbl_name col_name min_date max_date

# This script assumes that you're using the cookbook database.

use strict;
use lib qw(/usr/local/apache/lib/perl);
use Cookbook;

# Check number of arguments, perform minimal tests for ISO-format dates

@ARGV == 4
    or die "Usage: make_date_list.pl tbl_name col_name min_date max_date\n";
my ($tbl_name, $col_name, $min_date, $max_date) = (@ARGV);
$min_date =~ /^\d+\D\d+\D\d+$/
    or die "Minimum date $min_date is not in ISO format\n";
$max_date =~ /^\d+\D\d+\D\d+$/
    or die "Maximum date $max_date is not in ISO format\n";

my $dbh = Cookbook::connect ( );

# Determine the number of days spanned by the date range.

my $days = $dbh->selectrow_array (qq{ SELECT TO_DAYS(?) - TO_DAYS(?) + 1 },
                                  undef, $max_date, $min_date);

print "Minimum date: $min_date\n";
print "Maximum date: $max_date\n";
print "Number of days spanned by range: $days\n";
die "Date range is too small\n" if $days < 1;

# Drop table if it exists, then recreate it

$dbh->do ("DROP TABLE IF EXISTS $tbl_name");
$dbh->do (qq{
    CREATE TABLE $tbl_name
    ($col_name DATE NOT NULL, PRIMARY KEY ($col_name))
});

# Populate table with each date in the date range

my $sth = $dbh->prepare (qq{
    INSERT INTO $tbl_name ($col_name) VALUES(DATE_ADD(?,INTERVAL ? DAY))
});
for (my $i = 0; $i < $days; $i++)
{
    $sth->execute ($min_date, $i);
}

$dbh->disconnect ( );
exit (0);

Tables generated by make_date_list.pl can be used for per-day summaries, or to find days not represented in the table. A date-based reference table can be used for calendar-day summaries, too. For example, you could use it to summarize the baseball1.com master table to find out how many ballplayers in the table were born each day of the year, or to find days of the year for which there are no birthdays. When creating a calendar day reference table, be sure to use a leap year so that the table contains an entry for February 29. The year 2004 is one such year, so a suitable reference table can be created like this:

% make_date_list.pl ref d 2004-01-01 2004-12-31

The master table stores birth dates in three columns named birthday, birthmonth, birthyear. After creating the reference table, use the following query to summarize birthdays in the master table for each calendar day:

SELECT MONTH(ref.d) AS month, DAYOFMONTH(ref.d) AS day,
COUNT(master.lahmanid) AS count
FROM ref LEFT JOIN master
ON MONTH(ref.d) = master.birthmonth
AND DAYOFMONTH(ref.d) = master.birthday
GROUP BY month, day;

To see if there are any days on which no birthdays occur, use this query instead:

SELECT MONTH(ref.d) AS month, DAYOFMONTH(ref.d) AS day
FROM ref LEFT JOIN master
ON MONTH(ref.d) = master.birthmonth
AND DAYOFMONTH(ref.d) = master.birthday
WHERE master.birthmonth IS NULL AND master.birthday IS NULL;
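The date iteration that make_date_list.pl performs earlier in this section can be sketched without a database at all. The version below is an illustration only: it is written in Python rather than Perl, the function name date_range is made up, and it returns the dates as a list instead of inserting them into a table. It does show why a leap year is the right choice for a calendar-day reference list.

```python
from datetime import date, timedelta


def date_range(min_date, max_date):
    """Return every date from min_date through max_date, inclusive."""
    # Equivalent of TO_DAYS(max) - TO_DAYS(min) + 1 in the Perl script.
    days = (max_date - min_date).days + 1
    if days < 1:
        raise ValueError("Date range is too small")
    # Equivalent of DATE_ADD(min_date, INTERVAL i DAY) for i = 0 .. days-1.
    return [min_date + timedelta(days=i) for i in range(days)]


# A leap year, so that the list contains an entry for February 29.
ref_2004 = date_range(date(2004, 1, 1), date(2004, 12, 31))

print(len(ref_2004))
print(date(2004, 2, 29) in ref_2004)
```

Each element of the list corresponds to one row that the Perl script would insert into the reference table.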

12.11 Enumerating a Many-to-Many Relationship

12.11.1 Problem

You want to display a relationship between tables when records in either table may be matched by multiple records in the other table.

12.11.2 Solution

This is a many-to-many relationship. It requires a third table for associating your two primary tables, and a three-way join to list the correspondences between them.

12.11.3 Discussion

The artist and painting tables used in earlier sections are related in a one-to-many relationship: A given artist may have produced many paintings, but each painting was created by only one artist. One-to-many relationships are relatively simple and the two tables in the relationship can be related by means of a key that is common to both tables.

Even simpler is the one-to-one relationship, which often is used for performing lookups that map one set of values to another. For example, the states table contains name and abbrev columns that list full state names and their corresponding abbreviations:

mysql> SELECT name, abbrev FROM states;
+----------------+--------+
| name           | abbrev |
+----------------+--------+
| Alabama        | AL     |
| Alaska         | AK     |
| Arizona        | AZ     |
| Arkansas       | AR     |
...

This is a one-to-one relationship. It can be used to map state name abbreviations in the painting table, which contains a state column indicating the state in which each painting was purchased. With no mapping, painting entries can be displayed like this:

mysql> SELECT title, state FROM painting ORDER BY state;
+-------------------+-------+
| title             | state |
+-------------------+-------+
| The Rocks         | IA    |
| The Last Supper   | IN    |
| Starry Night      | KY    |
| The Potato Eaters | KY    |
| The Mona Lisa     | MI    |
| Les Deux Soeurs   | NE    |
+-------------------+-------+

If you want to see the full state names rather than abbreviations, it's possible to use the one-to-one relationship that exists between the two that is enumerated in the states table. Join that table to the painting table as follows, using the abbreviation values that are common to the two tables:

mysql> SELECT painting.title, states.name AS state
    -> FROM painting, states
    -> WHERE painting.state = states.abbrev
    -> ORDER BY state;
+-------------------+----------+
| title             | state    |
+-------------------+----------+
| The Last Supper   | Indiana  |
| The Rocks         | Iowa     |
| Starry Night      | Kentucky |
| The Potato Eaters | Kentucky |
| The Mona Lisa     | Michigan |
| Les Deux Soeurs   | Nebraska |
+-------------------+----------+

A more complex relationship between tables is the many-to-many relationship, which occurs when a record in one table may have many matches in the other, and vice versa. To illustrate such a relationship, this is the point at which database books typically devolve into the "parts and suppliers" problem. (A given part may be available through several suppliers; how can you produce a list showing which parts are available from which suppliers?) However, having seen that example far too many times, I prefer to use a different illustration. So, even though conceptually it's really the same idea, let's use the following scenario: You and a bunch of your friends are avid enthusiasts of euchre, a four-handed card game played with two teams of partners. Each year, you all get together, pair off, and run a friendly tournament. Naturally, to avoid controversy about the results of each tournament, you record the pairings and outcomes in a database. One way to store the results would be with a table that is set up as follows, where for each tournament year, you record the team names, win-loss records, players, and player cities of residence:

mysql> SELECT * FROM euchre ORDER BY year, wins DESC, player;
+----------+------+------+--------+----------+-------------+
| team     | year | wins | losses | player   | player_city |
+----------+------+------+--------+----------+-------------+
| Kings    | 2001 |   10 |      2 | Ben      | Cork        |
| Kings    | 2001 |   10 |      2 | Billy    | York        |
| Crowns   | 2001 |    7 |      5 | Melvin   | Dublin      |
| Crowns   | 2001 |    7 |      5 | Tony     | Derry       |
| Stars    | 2001 |    4 |      8 | Franklin | Bath        |
| Stars    | 2001 |    4 |      8 | Wallace  | Cardiff     |
| Sceptres | 2001 |    3 |      9 | Maurice  | Leeds       |
| Sceptres | 2001 |    3 |      9 | Nigel    | London      |
| Crowns   | 2002 |    9 |      3 | Ben      | Cork        |
| Crowns   | 2002 |    9 |      3 | Tony     | Derry       |
| Kings    | 2002 |    8 |      4 | Franklin | Bath        |
| Kings    | 2002 |    8 |      4 | Nigel    | London      |
| Stars    | 2002 |    5 |      7 | Maurice  | Leeds       |
| Stars    | 2002 |    5 |      7 | Melvin   | Dublin      |
| Sceptres | 2002 |    2 |     10 | Billy    | York        |
| Sceptres | 2002 |    2 |     10 | Wallace  | Cardiff     |
+----------+------+------+--------+----------+-------------+

As shown by the table, each team has multiple players, and each player has participated in multiple teams. The table captures the nature of this many-to-many relationship, but it's also in non-normal form, because each row unnecessarily stores quite a bit of repetitive information. (Information for each team is recorded multiple times, as is information about each player.) A better way to represent this many-to-many relationship is as follows:

• Store each team name, year, and record once, in a table named euchre_team.
• Store each player name and city of residence once, in a table named euchre_player.
• Create a third table, euchre_link, that stores team-player associations and serves as a link, or bridge, between the two primary tables. To minimize the information stored in this table, assign unique IDs to each team and player within their respective tables, and store only those IDs in the euchre_link table.

The resulting team and player tables look like this:

mysql> SELECT * FROM euchre_team;
+----+----------+------+------+--------+
| id | name     | year | wins | losses |
+----+----------+------+------+--------+
|  1 | Kings    | 2001 |   10 |      2 |
|  2 | Crowns   | 2001 |    7 |      5 |
|  3 | Stars    | 2001 |    4 |      8 |
|  4 | Sceptres | 2001 |    3 |      9 |
|  5 | Kings    | 2002 |    8 |      4 |
|  6 | Crowns   | 2002 |    9 |      3 |
|  7 | Stars    | 2002 |    5 |      7 |
|  8 | Sceptres | 2002 |    2 |     10 |
+----+----------+------+------+--------+
mysql> SELECT * FROM euchre_player;
+----+----------+---------+
| id | name     | city    |
+----+----------+---------+
|  1 | Ben      | Cork    |
|  2 | Billy    | York    |
|  3 | Tony     | Derry   |
|  4 | Melvin   | Dublin  |
|  5 | Franklin | Bath    |
|  6 | Wallace  | Cardiff |
|  7 | Nigel    | London  |
|  8 | Maurice  | Leeds   |
+----+----------+---------+

The euchre_link table associates teams and players as follows:

mysql> SELECT * FROM euchre_link;
+---------+-----------+
| team_id | player_id |
+---------+-----------+
|       1 |         1 |
|       1 |         2 |
|       2 |         3 |
|       2 |         4 |
|       3 |         5 |
|       3 |         6 |
|       4 |         7 |
|       4 |         8 |
|       5 |         5 |
|       5 |         7 |
|       6 |         1 |
|       6 |         3 |
|       7 |         4 |
|       7 |         8 |
|       8 |         2 |
|       8 |         6 |
+---------+-----------+

To answer questions about the teams or players using these tables, you need to perform a three-way join, using the link table to relate the two primary tables to each other. Here are some examples:



• List all the pairings that show the teams and who played on them. This query enumerates all the correspondences between the euchre_team and euchre_player tables and reproduces the information that was originally in the non-normal euchre table:

mysql> SELECT t.name, t.year, t.wins, t.losses, p.name, p.city
    -> FROM euchre_team AS t, euchre_link AS l, euchre_player AS p
    -> WHERE t.id = l.team_id AND p.id = l.player_id
    -> ORDER BY t.year, t.wins DESC, p.name;
+----------+------+------+--------+----------+---------+
| name     | year | wins | losses | name     | city    |
+----------+------+------+--------+----------+---------+
| Kings    | 2001 |   10 |      2 | Ben      | Cork    |
| Kings    | 2001 |   10 |      2 | Billy    | York    |
| Crowns   | 2001 |    7 |      5 | Melvin   | Dublin  |
| Crowns   | 2001 |    7 |      5 | Tony     | Derry   |
| Stars    | 2001 |    4 |      8 | Franklin | Bath    |
| Stars    | 2001 |    4 |      8 | Wallace  | Cardiff |
| Sceptres | 2001 |    3 |      9 | Maurice  | Leeds   |
| Sceptres | 2001 |    3 |      9 | Nigel    | London  |
| Crowns   | 2002 |    9 |      3 | Ben      | Cork    |
| Crowns   | 2002 |    9 |      3 | Tony     | Derry   |
| Kings    | 2002 |    8 |      4 | Franklin | Bath    |
| Kings    | 2002 |    8 |      4 | Nigel    | London  |
| Stars    | 2002 |    5 |      7 | Maurice  | Leeds   |
| Stars    | 2002 |    5 |      7 | Melvin   | Dublin  |
| Sceptres | 2002 |    2 |     10 | Billy    | York    |
| Sceptres | 2002 |    2 |     10 | Wallace  | Cardiff |
+----------+------+------+--------+----------+---------+

• List the members for a particular team (the 2001 Crowns):

mysql> SELECT p.name, p.city
    -> FROM euchre_team AS t, euchre_link AS l, euchre_player AS p
    -> WHERE t.id = l.team_id AND p.id = l.player_id
    -> AND t.name = 'Crowns' AND t.year = 2001;
+--------+--------+
| name   | city   |
+--------+--------+
| Tony   | Derry  |
| Melvin | Dublin |
+--------+--------+

• List the teams that a given player (Billy) has been a member of:

mysql> SELECT t.name, t.year, t.wins, t.losses
    -> FROM euchre_team AS t, euchre_link AS l, euchre_player AS p
    -> WHERE t.id = l.team_id AND p.id = l.player_id
    -> AND p.name = 'Billy';
+----------+------+------+--------+
| name     | year | wins | losses |
+----------+------+------+--------+
| Kings    | 2001 |   10 |      2 |
| Sceptres | 2002 |    2 |     10 |
+----------+------+------+--------+
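The link-table design can be exercised on a reduced dataset. The sketch below makes several assumptions: Python's sqlite3 module stands in for MySQL, only three teams and three players from the listings above are loaded, explicit JOIN ... ON syntax replaces the comma-style joins, and the helper names teams_for and members_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript("""
CREATE TABLE euchre_team
    (id INTEGER PRIMARY KEY, name TEXT, year INT, wins INT, losses INT);
CREATE TABLE euchre_player (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE euchre_link (team_id INT, player_id INT);
INSERT INTO euchre_team VALUES
    (1, 'Kings', 2001, 10, 2), (2, 'Crowns', 2001, 7, 5),
    (8, 'Sceptres', 2002, 2, 10);
INSERT INTO euchre_player VALUES
    (2, 'Billy', 'York'), (3, 'Tony', 'Derry'), (4, 'Melvin', 'Dublin');
INSERT INTO euchre_link VALUES (1, 2), (2, 3), (2, 4), (8, 2);
""")

# The three-way join: the link table relates the two primary tables.
JOIN = (
    "FROM euchre_team AS t "
    "JOIN euchre_link AS l ON t.id = l.team_id "
    "JOIN euchre_player AS p ON p.id = l.player_id "
)


def teams_for(player):
    """Teams (name, year) that the named player has been a member of."""
    sql = "SELECT t.name, t.year " + JOIN + "WHERE p.name = ? ORDER BY t.year"
    return conn.execute(sql, (player,)).fetchall()


def members_of(team, year):
    """Names of the players on the given team in the given year."""
    sql = ("SELECT p.name " + JOIN +
           "WHERE t.name = ? AND t.year = ? ORDER BY p.name")
    return [name for (name,) in conn.execute(sql, (team, year))]


print(teams_for("Billy"))
print(members_of("Crowns", 2001))
```

Both questions use the same join and differ only in which primary table the WHERE clause restricts.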

Note that although questions about many-to-many relationships involve a three-way join, a three-way join in itself does not necessarily imply a many-to-many relationship. Earlier in this section, we joined the states table to the painting table to map state abbreviations to full names:

mysql> SELECT painting.title, states.name AS state
    -> FROM painting, states
    -> WHERE painting.state = states.abbrev
    -> ORDER BY state;
+-------------------+----------+
| title             | state    |
+-------------------+----------+
| The Last Supper   | Indiana  |
| The Rocks         | Iowa     |
| Starry Night      | Kentucky |
| The Potato Eaters | Kentucky |
| The Mona Lisa     | Michigan |
| Les Deux Soeurs   | Nebraska |
+-------------------+----------+

To display the artist who painted each painting, modify the query slightly by joining the results with the artist table:

mysql> SELECT artist.name, painting.title, states.name AS state
    -> FROM artist, painting, states
    -> WHERE artist.a_id = painting.a_id AND painting.state = states.abbrev;
+----------+-------------------+----------+
| name     | title             | state    |
+----------+-------------------+----------+
| Da Vinci | The Last Supper   | Indiana  |
| Da Vinci | The Mona Lisa     | Michigan |
| Van Gogh | Starry Night      | Kentucky |
| Van Gogh | The Potato Eaters | Kentucky |
| Van Gogh | The Rocks         | Iowa     |
| Renoir   | Les Deux Soeurs   | Nebraska |
+----------+-------------------+----------+

The query now involves a three-way join, but the nature of the relationship between artists and paintings remains the same. It's still one-to-many, not many-to-many.

12.12 Comparing a Table to Itself

12.12.1 Problem

You want to compare records in a table to other records in the same table. For example, you want to find all paintings in your collection by the artist who painted "The Potato Eaters." Or you want to know which states listed in the states table joined the Union in the same year as New York. Or you want to know which of the people listed in the profile table have some favorite food in common.

12.12.2 Solution

Problems that require comparing a table to itself involve an operation known as a self-join. It's much like other joins, except that you must always use table aliases so that you can refer to the same table different ways within the query.

12.12.3 Discussion

A special case of joining one table to another occurs when both tables are the same. This is called a self-join. Although many people find the idea confusing or strange to think about at first, it's perfectly legal. Be assured that you'll get used to the concept, and more than likely will find yourself using self-joins quite often because they are so important.

A tip-off that you need a self-join is when you want to know which pairs of elements in a table satisfy some condition. For example, suppose your favorite painting is "The Potato Eaters" and you want to identify all the items in your collection that were done by the artist who painted it. You can do so as follows:

1. Identify the row in the painting table that contains the title "The Potato Eaters," so that you can refer to its a_id value.
2. Use the a_id value to match other rows in the table that have the same a_id value.
3. Display the titles from those matching rows.

The artist ID and painting titles that we begin with look like this:

mysql> SELECT a_id, title FROM painting ORDER BY a_id;
+------+-------------------+
| a_id | title             |
+------+-------------------+
|    1 | The Last Supper   |
|    1 | The Mona Lisa     |
|    3 | Starry Night      |
|    3 | The Potato Eaters |
|    3 | The Rocks         |
|    5 | Les Deux Soeurs   |
+------+-------------------+

A two-step method for picking out the right titles without a join is to look up the artist's ID with one query, then use the ID in a second query that selects records that match it:

mysql> SELECT @id := a_id FROM painting WHERE title = 'The Potato Eaters';
+-------------+
| @id := a_id |
+-------------+
|           3 |
+-------------+
mysql> SELECT title FROM painting WHERE a_id = @id;
+-------------------+
| title             |
+-------------------+
| Starry Night      |
| The Potato Eaters |
| The Rocks         |
+-------------------+

Another solution, one that requires only a single query, is to use a self-join. The trick to this lies in figuring out the proper notation to use. The way many people first try to write a query that joins a table to itself looks something like this:

mysql> SELECT title FROM painting, painting
    -> WHERE title = 'The Potato Eaters' AND a_id = a_id;
ERROR 1066 at line 1: Not unique table/alias: 'painting'

The problem with that query is that the table and column references are ambiguous. The painting table is named twice with nothing to tell the two instances apart, so MySQL can't tell which instance any given column name refers to. The solution is to give at least one instance of the table an alias so that you can distinguish column references by using different table qualifiers. The following query shows how to do this, using the aliases p1 and p2 to refer to the painting table different ways:

mysql> SELECT p2.title
    -> FROM painting AS p1, painting AS p2
    -> WHERE p1.title = 'The Potato Eaters'
    -> AND p1.a_id = p2.a_id;
+-------------------+
| title             |
+-------------------+
| Starry Night      |
| The Potato Eaters |
| The Rocks         |
+-------------------+

The query output illustrates something typical of self-joins: when you begin with a reference value in one table instance ("The Potato Eaters") to find matching records in a second table instance (paintings by the same artist), the output includes the reference value. That makes sense; after all, the reference matches itself. If you want to find only other paintings by the same artist, explicitly exclude the reference value from the output:

mysql> SELECT p2.title
    -> FROM painting AS p1, painting AS p2
    -> WHERE p1.title = 'The Potato Eaters' AND p2.title != 'The Potato Eaters'
    -> AND p1.a_id = p2.a_id;
+--------------+
| title        |
+--------------+
| Starry Night |
| The Rocks    |
+--------------+
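The self-join with explicit exclusion of the reference value can be tried outside MySQL as well. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for a MySQL server; the helper name other_paintings_by_same_artist is hypothetical, and the table holds only the a_id and title columns shown above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE painting (a_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO painting (a_id, title) VALUES (?, ?)",
    [
        (1, "The Last Supper"),
        (1, "The Mona Lisa"),
        (3, "Starry Night"),
        (3, "The Potato Eaters"),
        (3, "The Rocks"),
        (5, "Les Deux Soeurs"),
    ],
)


def other_paintings_by_same_artist(conn, reference_title):
    """Hypothetical helper: titles by the artist of reference_title,
    with the reference title itself excluded from the result."""
    rows = conn.execute(
        """
        SELECT p2.title
        FROM painting AS p1, painting AS p2   -- two aliases for one table
        WHERE p1.title = ?                     -- reference row
          AND p2.title != ?                    -- exclude the reference value
          AND p1.a_id = p2.a_id                -- same artist
        ORDER BY p2.title
        """,
        (reference_title, reference_title),
    ).fetchall()
    return [title for (title,) in rows]


titles = other_paintings_by_same_artist(conn, "The Potato Eaters")
print(titles)
```

Passing the reference title as a bound parameter, rather than writing it into the query text, also sidesteps quoting problems in the title string.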

A more general way to exclude the reference value without naming it literally is to specify that you don't want output rows to have the same title as the reference, whatever that title happens to be:

mysql> SELECT p2.title
    -> FROM painting AS p1, painting AS p2
    -> WHERE p1.title = 'The Potato Eaters' AND p1.title != p2.title
    -> AND p1.a_id = p2.a_id;
+--------------+
| title        |
+--------------+
| Starry Night |
| The Rocks    |
+--------------+

The preceding queries use comparisons of ID values to match records in the two table instances, but any kind of value can be used. For example, to use the states table to answer the question "Which states joined the Union in the same year as New York?," perform a temporal pairwise comparison based on the year part of the dates in the table's statehood column:

mysql> SELECT s2.name, s2.statehood
    -> FROM states AS s1, states AS s2
    -> WHERE s1.name = 'New York'
    -> AND YEAR(s1.statehood) = YEAR(s2.statehood)
    -> ORDER BY s2.name;
+----------------+------------+
| name           | statehood  |
+----------------+------------+
| Connecticut    | 1788-01-09 |
| Georgia        | 1788-01-02 |
| Maryland       | 1788-04-28 |
| Massachusetts  | 1788-02-06 |
| New Hampshire  | 1788-06-21 |
| New York       | 1788-07-26 |
| South Carolina | 1788-05-23 |
| Virginia       | 1788-06-25 |
+----------------+------------+

Here again, the reference value (New York) appears in the output. If you want to prevent that, add an expression to the WHERE clause that explicitly excludes the reference:

mysql> SELECT s2.name, s2.statehood
    -> FROM states AS s1, states AS s2
    -> WHERE s1.name = 'New York' AND s1.name != s2.name
    -> AND YEAR(s1.statehood) = YEAR(s2.statehood)
    -> ORDER BY s2.name;
+----------------+------------+
| name           | statehood  |
+----------------+------------+
| Connecticut    | 1788-01-09 |
| Georgia        | 1788-01-02 |
| Maryland       | 1788-04-28 |
| Massachusetts  | 1788-02-06 |
| New Hampshire  | 1788-06-21 |
| South Carolina | 1788-05-23 |
| Virginia       | 1788-06-25 |
+----------------+------------+

Like the problem of finding other paintings by the painter of "The Potato Eaters," the statehood problem could have been solved by using a SQL variable and two queries. That will always be true when you're seeking matches for one particular row in your table. Other problems require finding matches between several pairs of rows, in which case the two-query method will not work. Suppose you want to determine which pairs of people listed in the profile table have favorite foods in common. In this case, the output potentially can include any pair of people in the table. There is no fixed reference value, so you cannot store the reference in a variable.

A self-join is perfect for this problem, although there is the question of how to identify which foods values share common elements. The foods column contains SET values, each of which may indicate multiple foods, so an exact comparison will not work:



• The comparison is true only if both foods values name an identical set of foods; this is unsuitable if you require only a common element.

• Two empty values will compare as equal, even though they have no foods in common.

To identify SET values that share common elements, use the fact that MySQL represents them as bit fields and perform the comparison using the & (bitwise AND) operator to look for pairs that have a non-zero intersection:

mysql> SELECT t1.name, t2.name, t1.foods, t2.foods
    -> FROM profile AS t1, profile AS t2
    -> WHERE t1.id != t2.id AND (t1.foods & t2.foods) != 0
    -> ORDER BY t1.name, t2.name;
+------+------+----------------------+----------------------+
| name | name | foods                | foods                |
+------+------+----------------------+----------------------+
| Alan | Brit | curry,fadge          | burrito,curry,pizza  |
| Alan | Fred | curry,fadge          | lutefisk,fadge,pizza |
| Alan | Mara | curry,fadge          | lutefisk,fadge       |
| Alan | Sean | curry,fadge          | burrito,curry        |
| Brit | Alan | burrito,curry,pizza  | curry,fadge          |
| Brit | Carl | burrito,curry,pizza  | eggroll,pizza        |
| Brit | Fred | burrito,curry,pizza  | lutefisk,fadge,pizza |
| Brit | Sean | burrito,curry,pizza  | burrito,curry        |
| Carl | Brit | eggroll,pizza        | burrito,curry,pizza  |
| Carl | Fred | eggroll,pizza        | lutefisk,fadge,pizza |
| Fred | Alan | lutefisk,fadge,pizza | curry,fadge          |
| Fred | Brit | lutefisk,fadge,pizza | burrito,curry,pizza  |
| Fred | Carl | lutefisk,fadge,pizza | eggroll,pizza        |
| Fred | Mara | lutefisk,fadge,pizza | lutefisk,fadge       |
| Mara | Alan | lutefisk,fadge       | curry,fadge          |
| Mara | Fred | lutefisk,fadge       | lutefisk,fadge,pizza |
| Sean | Alan | burrito,curry        | curry,fadge          |
| Sean | Brit | burrito,curry        | burrito,curry,pizza  |
+------+------+----------------------+----------------------+

Some self-join problems are of the "Which values have no match?" variety. An instance of this is the question, "Which message senders in the mail table didn't send any messages to themselves?" First, check who sent mail to whom:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail
    -> ORDER BY srcuser, dstuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| barb    | barb    |
| barb    | tricia  |
| gene    | barb    |
| gene    | gene    |
| gene    | tricia  |
| phil    | barb    |
| phil    | phil    |
| phil    | tricia  |
| tricia  | gene    |
| tricia  | phil    |
+---------+---------+

Of those pairs, several are for people who did send mail to themselves:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail
    -> WHERE srcuser = dstuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| phil    | phil    |
| barb    | barb    |
| gene    | gene    |
+---------+---------+

Finding people who didn't send mail to themselves is a "non-match" problem, which is the type of problem that typically involves a LEFT JOIN. In this case, the solution requires a LEFT JOIN of the mail table to itself:

mysql> SELECT DISTINCT m1.srcuser
    -> FROM mail AS m1 LEFT JOIN mail AS m2
    -> ON m1.srcuser = m2.srcuser AND m2.srcuser = m2.dstuser
    -> WHERE m2.dstuser IS NULL;
+---------+
| srcuser |
+---------+
| tricia  |
+---------+

For each record in the mail table, the query selects matches where the sender and recipient are the same. For records having no such match, the LEFT JOIN forces the output to contain a row anyway, with all the m2 columns set to NULL. These rows identify the senders who sent no messages to themselves.

Using a LEFT JOIN to join a table to itself also provides another way to answer maximum-per-group questions of the sort discussed in Recipe 12.7, but without using a secondary temporary table. Recall that in that recipe we found the most expensive painting per artist as follows using a temporary table:

mysql> CREATE TABLE tmp
    -> SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.a_id = tmp.a_id
    -> AND painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+-------------------+-------+
| name     | title             | price |
+----------+-------------------+-------+
| Da Vinci | The Mona Lisa     |    87 |
| Van Gogh | The Potato Eaters |    67 |
| Renoir   | Les Deux Soeurs   |    64 |
+----------+-------------------+-------+

Another way to identify the paintings and then pull out values from each of those rows is with a LEFT JOIN. The following query identifies the paintings:

mysql> SELECT p1.a_id, p1.title, p1.price
    -> FROM painting p1
    -> LEFT JOIN painting p2
    -> ON p1.a_id = p2.a_id AND p1.price < p2.price
    -> WHERE p2.a_id IS NULL;
+------+-------------------+-------+
| a_id | title             | price |
+------+-------------------+-------+
|    1 | The Mona Lisa     |    87 |
|    3 | The Potato Eaters |    67 |
|    5 | Les Deux Soeurs   |    64 |
+------+-------------------+-------+

To display the artist names, join the result with the artist table:

mysql> SELECT artist.name, p1.title, p1.price
    -> FROM (painting p1
    -> LEFT JOIN painting p2
    -> ON p1.a_id = p2.a_id AND p1.price < p2.price), artist
    -> WHERE p2.a_id IS NULL AND p1.a_id = artist.a_id;
+----------+-------------------+-------+
| name     | title             | price |
+----------+-------------------+-------+
| Da Vinci | The Mona Lisa     |    87 |
| Van Gogh | The Potato Eaters |    67 |
| Renoir   | Les Deux Soeurs   |    64 |
+----------+-------------------+-------+
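The maximum-per-group LEFT JOIN can be checked with a small script. This is a sketch only, assuming Python's sqlite3 module in place of MySQL. The three maximum prices come from the output above; the lower prices (34, 48, 33) are illustrative assumptions, since this section does not list them.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption made for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE painting (a_id INTEGER, title TEXT, price INTEGER);
    """
)
conn.executemany(
    "INSERT INTO artist (a_id, name) VALUES (?, ?)",
    [(1, "Da Vinci"), (3, "Van Gogh"), (5, "Renoir")],
)
conn.executemany(
    "INSERT INTO painting (a_id, title, price) VALUES (?, ?, ?)",
    [
        (1, "The Last Supper", 34),    # assumed price
        (1, "The Mona Lisa", 87),
        (3, "Starry Night", 48),       # assumed price
        (3, "The Potato Eaters", 67),
        (3, "The Rocks", 33),          # assumed price
        (5, "Les Deux Soeurs", 64),
    ],
)

# A painting is the most expensive for its artist when the LEFT JOIN finds
# no row p2 by the same artist with a higher price, so p2's columns are NULL.
most_expensive = conn.execute(
    """
    SELECT artist.name, p1.title, p1.price
    FROM painting AS p1
    LEFT JOIN painting AS p2
      ON p1.a_id = p2.a_id AND p1.price < p2.price
    JOIN artist ON artist.a_id = p1.a_id
    WHERE p2.a_id IS NULL
    ORDER BY p1.a_id
    """
).fetchall()
print(most_expensive)
```

If two paintings by one artist tie for the highest price, both rows survive the IS NULL test, which matches the behavior of the temporary-table approach.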

Note that a given "compare a table to itself" problem does not necessarily require a self-join, even if it's possible to solve it that way. The mail table serves to illustrate this. One way to determine which senders sent themselves a message is to use a self-join:

mysql> SELECT DISTINCT m1.srcuser, m2.dstuser
    -> FROM mail AS m1, mail AS m2
    -> WHERE m1.srcuser = m2.srcuser AND m2.dstuser = m1.srcuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| phil    | phil    |
| barb    | barb    |
| gene    | gene    |
+---------+---------+

But that's silly. The query doesn't need to compare records to each other. It needs only to compare different columns within each row, so a non-join query is sufficient, and simpler to write:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail
    -> WHERE srcuser = dstuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| phil    | phil    |
| barb    | barb    |
| gene    | gene    |
+---------+---------+
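Returning to the "non-match" question from earlier in this discussion (which senders sent no mail to themselves), the LEFT JOIN form can be sketched as follows. The sketch assumes Python's sqlite3 module as a stand-in for MySQL, and it loads only the distinct sender/recipient pairs listed in this recipe rather than the full mail table.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); only the srcuser and dstuser
# columns of the mail table are modeled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (srcuser TEXT, dstuser TEXT)")
conn.executemany(
    "INSERT INTO mail (srcuser, dstuser) VALUES (?, ?)",
    [
        ("barb", "barb"), ("barb", "tricia"),
        ("gene", "barb"), ("gene", "gene"), ("gene", "tricia"),
        ("phil", "barb"), ("phil", "phil"), ("phil", "tricia"),
        ("tricia", "gene"), ("tricia", "phil"),
    ],
)

# For each sender in m1, look in m2 for a self-addressed message from the
# same sender. Senders with no such message get NULL m2 columns.
non_self_senders = [
    srcuser
    for (srcuser,) in conn.execute(
        """
        SELECT DISTINCT m1.srcuser
        FROM mail AS m1 LEFT JOIN mail AS m2
          ON m1.srcuser = m2.srcuser AND m2.srcuser = m2.dstuser
        WHERE m2.dstuser IS NULL
        ORDER BY m1.srcuser
        """
    )
]
print(non_self_senders)
```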

12.13 Calculating Differences Between Successive Rows

12.13.1 Problem

You have a table containing successive cumulative values in its rows and you want to compute the differences between pairs of successive rows.

12.13.2 Solution

Use a self-join that matches up pairs of adjacent rows and calculates the differences between members of each pair.

12.13.3 Discussion

Self-joins are useful when you have a set of absolute (or cumulative) values that you want to convert to relative values representing the differences between successive pairs of rows. For example, if you take an automobile trip and write down the total miles traveled at each stopping point, you can compute the difference between successive points to determine the distance from one stop to the next. Here is such a table that shows the stops for a trip from San Antonio, Texas to Madison, Wisconsin. Each row shows the total miles driven as of each stop:

mysql> SELECT seq, city, miles FROM trip_log ORDER BY seq;
+-----+------------------+-------+
| seq | city             | miles |
+-----+------------------+-------+
|   1 | San Antonio, TX  |     0 |
|   2 | Dallas, TX       |   263 |
|   3 | Benton, AR       |   566 |
|   4 | Memphis, TN      |   745 |
|   5 | Portageville, MO |   878 |
|   6 | Champaign, IL    |  1164 |
|   7 | Madison, WI      |  1412 |
+-----+------------------+-------+

A self-join can convert these cumulative values to successive differences that represent the distances from each city to the next. The following query shows how to use the sequence numbers in the records to match up pairs of successive rows and compute the differences between each pair of mileage values:

mysql> SELECT t1.seq AS seq1, t2.seq AS seq2,
    -> t1.city AS city1, t2.city AS city2,
    -> t1.miles AS miles1, t2.miles AS miles2,
    -> t2.miles-t1.miles AS dist
    -> FROM trip_log AS t1, trip_log AS t2
    -> WHERE t1.seq+1 = t2.seq
    -> ORDER BY t1.seq;
+------+------+------------------+------------------+--------+--------+------+
| seq1 | seq2 | city1            | city2            | miles1 | miles2 | dist |
+------+------+------------------+------------------+--------+--------+------+
|    1 |    2 | San Antonio, TX  | Dallas, TX       |      0 |    263 |  263 |
|    2 |    3 | Dallas, TX       | Benton, AR       |    263 |    566 |  303 |
|    3 |    4 | Benton, AR       | Memphis, TN      |    566 |    745 |  179 |
|    4 |    5 | Memphis, TN      | Portageville, MO |    745 |    878 |  133 |
|    5 |    6 | Portageville, MO | Champaign, IL    |    878 |   1164 |  286 |
|    6 |    7 | Champaign, IL    | Madison, WI      |   1164 |   1412 |  248 |
+------+------+------------------+------------------+--------+--------+------+

The presence of the seq column in the trip_log table is important for calculating successive difference values. It's needed for establishing which row precedes another and matching each row n with row n+1. The implication is that a table should include a sequence column that has no gaps if you want to perform relative-difference calculations from absolute or cumulative values. If the table contains a sequence column but there are gaps, renumber it. If the table contains no such column, add one. Recipe 11.9 and Recipe 11.13 describe how to perform these operations.

A somewhat more complex situation occurs when you compute successive differences for more than one column and use the results in a calculation. The following table, player_stats, shows some cumulative numbers for a baseball player at the end of each month of his season. ab indicates the total at-bats and h the total hits the player has had as of a given date. (The first record indicates the starting point of the player's season, which is why the ab and h values are zero.)

mysql> SELECT id, date, ab, h, TRUNCATE(IFNULL(h/ab,0),3) AS ba
    -> FROM player_stats ORDER BY id;
+----+------------+-----+----+-------+
| id | date       | ab  | h  | ba    |
+----+------------+-----+----+-------+
|  1 | 2001-04-30 |   0 |  0 | 0.000 |
|  2 | 2001-05-31 |  38 | 13 | 0.342 |
|  3 | 2001-06-30 | 109 | 31 | 0.284 |
|  4 | 2001-07-31 | 196 | 49 | 0.250 |
|  5 | 2001-08-31 | 304 | 98 | 0.322 |
+----+------------+-----+----+-------+

The last column of the query result also shows the player's batting average as of each date. This column is not stored in the table, but is easily computed as the ratio of hits to at-bats. The result provides a general idea of how the player's hitting performance changed over the course of the season, but it doesn't give a very informative picture of how the player did during each individual month. To determine that, it's necessary to calculate relative differences between pairs of rows. This is easily done with a self-join that matches each row n with row n+1, to calculate differences between pairs of at-bats and hits values. These differences allow batting average during each month to be computed:

mysql> SELECT
    -> t1.id AS id1, t2.id AS id2,
    -> t2.date,
    -> t1.ab AS ab1, t2.ab AS ab2,
    -> t1.h AS h1, t2.h AS h2,
    -> t2.ab-t1.ab AS abdiff,
    -> t2.h-t1.h AS hdiff,
    -> TRUNCATE(IFNULL((t2.h-t1.h)/(t2.ab-t1.ab),0),3) AS ba
    -> FROM player_stats AS t1, player_stats AS t2
    -> WHERE t1.id+1 = t2.id
    -> ORDER BY t1.id;
+-----+-----+------------+-----+-----+----+----+--------+-------+-------+
| id1 | id2 | date       | ab1 | ab2 | h1 | h2 | abdiff | hdiff | ba    |
+-----+-----+------------+-----+-----+----+----+--------+-------+-------+
|   1 |   2 | 2001-05-31 |   0 |  38 |  0 | 13 |     38 |    13 | 0.342 |
|   2 |   3 | 2001-06-30 |  38 | 109 | 13 | 31 |     71 |    18 | 0.253 |
|   3 |   4 | 2001-07-31 | 109 | 196 | 31 | 49 |     87 |    18 | 0.206 |
|   4 |   5 | 2001-08-31 | 196 | 304 | 49 | 98 |    108 |    49 | 0.453 |
+-----+-----+------------+-----+-----+----+----+--------+-------+-------+

These results show much more clearly than the original table does that the player started off well, but had a slump in the middle of the season, particularly in July. They also indicate just how strong his performance was in August.
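The successive-difference technique of this recipe can be sketched with the trip_log data. The example assumes Python's sqlite3 module as a stand-in for MySQL, and it writes the join with an explicit JOIN ... ON clause, which is equivalent to the comma join with a WHERE condition used above.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption made for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_log (seq INTEGER, city TEXT, miles INTEGER)")
conn.executemany(
    "INSERT INTO trip_log (seq, city, miles) VALUES (?, ?, ?)",
    [
        (1, "San Antonio, TX", 0),
        (2, "Dallas, TX", 263),
        (3, "Benton, AR", 566),
        (4, "Memphis, TN", 745),
        (5, "Portageville, MO", 878),
        (6, "Champaign, IL", 1164),
        (7, "Madison, WI", 1412),
    ],
)

# Pair each row n (t1) with row n+1 (t2) and subtract cumulative mileage.
legs = conn.execute(
    """
    SELECT t1.city, t2.city, t2.miles - t1.miles AS dist
    FROM trip_log AS t1
    JOIN trip_log AS t2 ON t1.seq + 1 = t2.seq
    ORDER BY t1.seq
    """
).fetchall()

for start, end, dist in legs:
    print(f"{start} -> {end}: {dist} miles")
```

The seven stops yield six legs, because the last row has no successor to pair with. A gap in seq would silently drop a leg in the same way, which is why the sequence column must be unbroken.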

12.14 Finding Cumulative Sums and Running Averages

12.14.1 Problem

You have a set of observations measured over time and want to compute the cumulative sum of the observations at each measurement point. Or you want to compute a running average at each point.

12.14.2 Solution

Use a self-join to produce the sets of successive observations at each measurement point, then apply aggregate functions to each set of values to compute its sum or average.

12.14.3 Discussion

Recipe 12.13 illustrates how a self-join can produce relative values from absolute values. A self-join can do the opposite as well, producing cumulative values at each successive stage of a set of observations. The following table shows a set of rainfall measurements taken over a series of days. The values in each row show the observation date and the amount of precipitation in inches:

mysql> SELECT date, precip FROM rainfall ORDER BY date;
+------------+--------+
| date       | precip |
+------------+--------+
| 2002-06-01 |   1.50 |
| 2002-06-02 |   0.00 |
| 2002-06-03 |   0.50 |
| 2002-06-04 |   0.00 |
| 2002-06-05 |   1.00 |
+------------+--------+

To calculate cumulative rainfall for a given day, sum that day's precipitation value with the values for all the previous days. For example, the cumulative rainfall as of 2002-06-03 is determined like this:

mysql> SELECT SUM(precip) FROM rainfall WHERE date <= '2002-06-03';

A self-join can compute the cumulative value for every date in the table with a single query. For each date in one instance of the table, it sums the precip values of all rows up through that date in a second instance:

mysql> SELECT t1.date, t1.precip AS 'daily precip',
    -> SUM(t2.precip) AS 'cum. precip'
    -> FROM rainfall AS t1, rainfall AS t2
    -> WHERE t1.date >= t2.date
    -> GROUP BY t1.date;
+------------+--------------+-------------+
| date       | daily precip | cum. precip |
+------------+--------------+-------------+
| 2002-06-01 |         1.50 |        1.50 |
| 2002-06-02 |         0.00 |        1.50 |
| 2002-06-03 |         0.50 |        2.00 |
| 2002-06-04 |         0.00 |        2.00 |
| 2002-06-05 |         1.00 |        3.00 |
+------------+--------------+-------------+

The self-join can be extended to display the number of days elapsed at each date, as well as the running averages for amount of precipitation each day:

mysql> SELECT t1.date, t1.precip AS 'daily precip',
    -> SUM(t2.precip) AS 'cum. precip',
    -> COUNT(t2.precip) AS days,
    -> AVG(t2.precip) AS 'avg. precip'
    -> FROM rainfall AS t1, rainfall AS t2
    -> WHERE t1.date >= t2.date
    -> GROUP BY t1.date;
+------------+--------------+-------------+------+-------------+
| date       | daily precip | cum. precip | days | avg. precip |
+------------+--------------+-------------+------+-------------+
| 2002-06-01 |         1.50 |        1.50 |    1 |    1.500000 |
| 2002-06-02 |         0.00 |        1.50 |    2 |    0.750000 |
| 2002-06-03 |         0.50 |        2.00 |    3 |    0.666667 |
| 2002-06-04 |         0.00 |        2.00 |    4 |    0.500000 |
| 2002-06-05 |         1.00 |        3.00 |    5 |    0.600000 |
+------------+--------------+-------------+------+-------------+

In the preceding query, the number of days elapsed and the precipitation running averages can be computed easily using COUNT( ) and AVG( ) because there are no missing days in the table. If missing days are allowed, the calculation becomes more complicated, because the number of days elapsed for each calculation no longer will be the same as the number of records. You can see this by deleting the records for the days that had no precipitation to produce a couple of "holes" in the table:

mysql> DELETE FROM rainfall WHERE precip = 0;
mysql> SELECT date, precip FROM rainfall ORDER BY date;
+------------+--------+
| date       | precip |
+------------+--------+
| 2002-06-01 |   1.50 |
| 2002-06-03 |   0.50 |
| 2002-06-05 |   1.00 |
+------------+--------+

Deleting those records doesn't change the cumulative sum or running average for the dates that remain, but does change how they must be calculated. If you try the self-join again, it yields incorrect results for the days-elapsed and average precipitation columns:

mysql> SELECT t1.date, t1.precip AS 'daily precip',
    -> SUM(t2.precip) AS 'cum. precip',
    -> COUNT(t2.precip) AS days,
    -> AVG(t2.precip) AS 'avg. precip'
    -> FROM rainfall AS t1, rainfall AS t2
    -> WHERE t1.date >= t2.date
    -> GROUP BY t1.date;
+------------+--------------+-------------+------+-------------+
| date       | daily precip | cum. precip | days | avg. precip |
+------------+--------------+-------------+------+-------------+
| 2002-06-01 |         1.50 |        1.50 |    1 |    1.500000 |
| 2002-06-03 |         0.50 |        2.00 |    2 |    1.000000 |
| 2002-06-05 |         1.00 |        3.00 |    3 |    1.000000 |
+------------+--------------+-------------+------+-------------+

To fix the problem, it's necessary to determine the number of days elapsed a different way. Take the minimum and maximum date involved in each sum and calculate a days-elapsed value from them using the following expression:

TO_DAYS(MAX(t2.date)) - TO_DAYS(MIN(t2.date)) + 1

That value must be used for the days-elapsed column and for computing the running averages. The resulting query is as follows:

mysql> SELECT t1.date, t1.precip AS 'daily precip',
    -> SUM(t2.precip) AS 'cum. precip',
    -> TO_DAYS(MAX(t2.date)) - TO_DAYS(MIN(t2.date)) + 1 AS days,
    -> SUM(t2.precip) / (TO_DAYS(MAX(t2.date)) - TO_DAYS(MIN(t2.date)) + 1)
    -> AS 'avg. precip'
    -> FROM rainfall AS t1, rainfall AS t2
    -> WHERE t1.date >= t2.date
    -> GROUP BY t1.date;
+------------+--------------+-------------+------+-------------+
| date       | daily precip | cum. precip | days | avg. precip |
+------------+--------------+-------------+------+-------------+
| 2002-06-01 |         1.50 |        1.50 |    1 |      1.5000 |
| 2002-06-03 |         0.50 |        2.00 |    3 |      0.6667 |
| 2002-06-05 |         1.00 |        3.00 |    5 |      0.6000 |
+------------+--------------+-------------+------+-------------+

As this example illustrates, calculation of cumulative values from relative values requires only a column that allows rows to be placed into the proper order. (For the rainfall table, that's the date column.) Values in the column need not be sequential, or even numeric. This differs from calculations that produce difference values from cumulative values (Recipe 12.13), which require that a table have a column that contains an unbroken sequence.

The running averages in the rainfall examples are based on dividing cumulative precipitation sums by number of days elapsed as of each day. When the table has no gaps, the number of days is the same as the number of values summed, making it easy to find successive averages. When records are missing, the calculations become more complex. What this demonstrates is that it's necessary to consider the nature of your data and calculate averages appropriately.

The next example is conceptually similar to the previous ones in that it calculates cumulative sums and running averages, but it performs the computations yet another way. The following table shows a marathon runner's performance at each stage of a 26-kilometer run. The values in each row show the length of each stage in kilometers and how long the runner took to complete the stage. In other words, the values pertain to intervals within the marathon and thus are relative to the whole:

mysql> SELECT stage, km, t FROM marathon ORDER BY stage;
+-------+----+----------+
| stage | km | t        |
+-------+----+----------+
|     1 |  5 | 00:15:00 |
|     2 |  7 | 00:19:30 |
|     3 |  9 | 00:29:20 |
|     4 |  5 | 00:17:50 |
+-------+----+----------+

To calculate cumulative distance in kilometers at each stage, use a self-join that looks like this:

mysql> SELECT t1.stage, t1.km, SUM(t2.km) AS 'cum. km'
    -> FROM marathon AS t1, marathon AS t2
    -> WHERE t1.stage >= t2.stage
    -> GROUP BY t1.stage;
+-------+----+---------+
| stage | km | cum. km |
+-------+----+---------+
|     1 |  5 |       5 |
|     2 |  7 |      12 |
|     3 |  9 |      21 |
|     4 |  5 |      26 |
+-------+----+---------+

Cumulative distances are easy to compute because they can be summed directly. The calculation for accumulating time values is a little more involved. It's necessary to convert times to seconds, sum the resulting values, and convert the sum back to a time value. To compute the runner's average speed at the end of each stage, take the ratio of cumulative distance over cumulative time. Putting all this together yields the following query:

mysql> SELECT t1.stage, t1.km, t1.t,
    -> SUM(t2.km) AS 'cum. km',
    -> SEC_TO_TIME(SUM(TIME_TO_SEC(t2.t))) AS 'cum. t',
    -> SUM(t2.km)/(SUM(TIME_TO_SEC(t2.t))/(60*60)) AS 'avg. km/hour'
    -> FROM marathon AS t1, marathon AS t2
    -> WHERE t1.stage >= t2.stage
    -> GROUP BY t1.stage;
+-------+----+----------+---------+----------+--------------+
| stage | km | t        | cum. km | cum. t   | avg. km/hour |
+-------+----+----------+---------+----------+--------------+
|     1 |  5 | 00:15:00 |       5 | 00:15:00 |      20.0000 |
|     2 |  7 | 00:19:30 |      12 | 00:34:30 |      20.8696 |
|     3 |  9 | 00:29:20 |      21 | 01:03:50 |      19.7389 |
|     4 |  5 | 00:17:50 |      26 | 01:21:40 |      19.1020 |
+-------+----+----------+---------+----------+--------------+
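The gap-tolerant rainfall calculation from this recipe can be sketched as follows. The example assumes Python's sqlite3 module as a stand-in for MySQL. SQLite has no TO_DAYS( ) function, so the sketch substitutes SQLite's julianday( ) function, whose difference between two dates is likewise a day count. The underscore column aliases are also a choice of this sketch.

```python
import sqlite3

# SQLite stands in for MySQL; julianday() replaces TO_DAYS() (assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rainfall (date TEXT, precip REAL)")
# The table after the zero-precipitation days have been deleted.
conn.executemany(
    "INSERT INTO rainfall (date, precip) VALUES (?, ?)",
    [("2002-06-01", 1.5), ("2002-06-03", 0.5), ("2002-06-05", 1.0)],
)

# Days elapsed come from the span between the earliest and latest dates in
# each group, not from the number of rows, so gaps do not distort the average.
running = conn.execute(
    """
    SELECT t1.date,
           t1.precip,
           SUM(t2.precip) AS cum_precip,
           CAST(julianday(MAX(t2.date)) - julianday(MIN(t2.date)) + 1
                AS INTEGER) AS days,
           SUM(t2.precip)
             / (julianday(MAX(t2.date)) - julianday(MIN(t2.date)) + 1)
             AS avg_precip
    FROM rainfall AS t1, rainfall AS t2
    WHERE t1.date >= t2.date
    GROUP BY t1.date, t1.precip
    ORDER BY t1.date
    """
).fetchall()

for date, precip, cum, days, avg in running:
    print(f"{date}  daily={precip:.2f}  cum={cum:.2f}  days={days}  avg={avg:.4f}")
```

Grouping by both t1.date and t1.precip keeps every selected non-aggregate column in the GROUP BY clause, which stricter SQL engines require.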

12.15 Using a Join to Control Query Output Order

12.15.1 Problem

You want to sort a query's output using a characteristic of the output that cannot be specified using ORDER BY. For example, you want to sort a set of rows by subgroups, putting first those groups with the most rows and last those groups with the fewest rows. But "number of rows in each group" is not a property of individual rows, so you can't sort by it.

12.15.2 Solution

Derive the ordering information and store it in another table. Then join the original table to the derived table, using the derived table to control the sort order.

12.15.3 Discussion

Most of the time when you sort a query result, you use an ORDER BY (or GROUP BY) clause to name the column or columns to use for sorting. But sometimes the values you want to sort by aren't present in the rows to be sorted. This is the case, for example, if you want to use group characteristics to order the rows. The following example uses the records in the driver_log table to illustrate this. The table looks like this:

mysql> SELECT * FROM driver_log ORDER BY rec_id;
+--------+-------+------------+-------+
| rec_id | name  | trav_date  | miles |
+--------+-------+------------+-------+
|      1 | Ben   | 2001-11-30 |   152 |
|      2 | Suzi  | 2001-11-29 |   391 |
|      3 | Henry | 2001-11-29 |   300 |
|      4 | Henry | 2001-11-27 |    96 |
|      5 | Ben   | 2001-11-29 |   131 |
|      6 | Henry | 2001-11-26 |   115 |
|      7 | Suzi  | 2001-12-02 |   502 |
|      8 | Henry | 2001-12-01 |   197 |
|      9 | Ben   | 2001-12-02 |    79 |
|     10 | Henry | 2001-11-30 |   203 |
+--------+-------+------------+-------+

The preceding query sorts the records using the ID column, which is present in the rows. But what if you want to display a list and sort it on the basis of a summary value not present in the rows? That's a little trickier. Suppose you want to show each driver's records by date, but place those drivers who drive the most miles first. You can't do this with a summary query, because then you wouldn't get back the individual driver records. But you can't do it without a summary query, either, because the summary values are required for sorting. The way out of the dilemma is to create another table containing the summary values, then join it to the original table. That way you can produce the individual records, and also sort them by the summary values. To summarize the driver totals into another table, do this:

mysql> CREATE TABLE tmp
    -> SELECT name, SUM(miles) AS driver_miles FROM driver_log GROUP BY name;

That produces the values we need to put the names in the proper order:

mysql> SELECT * FROM tmp ORDER BY driver_miles DESC;
+-------+--------------+
| name  | driver_miles |
+-------+--------------+
| Henry |          911 |
| Suzi  |          893 |
| Ben   |          362 |
+-------+--------------+

Then use the name values to join the summary table to the driver_log table, and use the driver_miles values to sort the result. The query below shows the mileage totals in the result. That's only to make it clearer how the values are being sorted; it's not actually necessary to display them. They're needed only for the ORDER BY clause.

mysql> SELECT tmp.driver_miles, driver_log.*
    -> FROM driver_log, tmp
    -> WHERE driver_log.name = tmp.name
    -> ORDER BY tmp.driver_miles DESC, driver_log.trav_date;
+--------------+--------+-------+------------+-------+
| driver_miles | rec_id | name  | trav_date  | miles |
+--------------+--------+-------+------------+-------+
|          911 |      6 | Henry | 2001-11-26 |   115 |
|          911 |      4 | Henry | 2001-11-27 |    96 |
|          911 |      3 | Henry | 2001-11-29 |   300 |
|          911 |     10 | Henry | 2001-11-30 |   203 |
|          911 |      8 | Henry | 2001-12-01 |   197 |
|          893 |      2 | Suzi  | 2001-11-29 |   391 |
|          893 |      7 | Suzi  | 2001-12-02 |   502 |
|          362 |      5 | Ben   | 2001-11-29 |   131 |
|          362 |      1 | Ben   | 2001-11-30 |   152 |
|          362 |      9 | Ben   | 2001-12-02 |    79 |
+--------------+--------+-------+------------+-------+
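The two-step technique can also be driven from a program. The following is a minimal sketch rather than part of the recipe: it uses Python's standard sqlite3 module with an in-memory database as a stand-in for a MySQL server, so the connection setup, the TEMP table, and the function names rows_by_driver_total and demo are assumptions made for illustration. The rows are the driver_log records listed above.

```python
import sqlite3

# Sample rows copied from the driver_log listing in this recipe.
ROWS = [
    (1, "Ben", "2001-11-30", 152), (2, "Suzi", "2001-11-29", 391),
    (3, "Henry", "2001-11-29", 300), (4, "Henry", "2001-11-27", 96),
    (5, "Ben", "2001-11-29", 131), (6, "Henry", "2001-11-26", 115),
    (7, "Suzi", "2001-12-02", 502), (8, "Henry", "2001-12-01", 197),
    (9, "Ben", "2001-12-02", 79), (10, "Henry", "2001-11-30", 203),
]


def rows_by_driver_total(conn):
    """Return driver_log rows, highest-mileage driver first, then by date."""
    cur = conn.cursor()
    # Step 1: derive the per-driver totals and store them in another table.
    cur.execute(
        "CREATE TEMP TABLE tmp AS "
        "SELECT name, SUM(miles) AS driver_miles FROM driver_log GROUP BY name"
    )
    # Step 2: join the original table to the derived table and let the
    # derived column control the sort order.
    cur.execute(
        "SELECT tmp.driver_miles, driver_log.rec_id, driver_log.name, "
        "driver_log.trav_date, driver_log.miles "
        "FROM driver_log JOIN tmp ON driver_log.name = tmp.name "
        "ORDER BY tmp.driver_miles DESC, driver_log.trav_date"
    )
    result = cur.fetchall()
    cur.execute("DROP TABLE tmp")  # the summary table is no longer needed
    return result


def demo():
    """Build the sample table in memory (hypothetical helper) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE driver_log "
        "(rec_id INTEGER PRIMARY KEY, name TEXT, trav_date TEXT, miles INTEGER)"
    )
    conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?, ?)", ROWS)
    return rows_by_driver_total(conn)
```

Calling demo() returns the ten rows in the same order as the mysql output above. Dropping the summary table at the end keeps the helper safe to call more than once on the same connection.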

12.16 Converting Subselects to Join Operations

12.16.1 Problem

You want to use a query that involves a subselect, but MySQL will not support subselects until Version 4.1.

12.16.2 Solution

In many cases, you can rewrite a subselect as a join. Or you can write a program that simulates the subselect. Or you can even make mysql generate SQL statements that simulate it.

12.16.3 Discussion

Assume you have two tables, t1 and t2, that have the following contents:

mysql> SELECT col1 FROM t1;
+------+
| col1 |
+------+
| a    |
| b    |
| c    |
+------+
mysql> SELECT col2 FROM t2;
+------+
| col2 |
+------+
| b    |
| c    |
| d    |
+------+

Now suppose that you want to find values in t1 that are also present in t2, or values in t1 that are not present in t2. These kinds of questions sometimes are answered using subselect queries that nest one SELECT inside another, but MySQL won't have subselects until Version 4.1. This section shows how to work around that problem.

The following query shows an IN() subselect that produces the rows in table t1 having col1 values that match col2 values in table t2:

SELECT col1 FROM t1 WHERE col1 IN (SELECT col2 FROM t2);

That's essentially just a "find matching rows" query, and it can be rewritten as a simple join like this:

mysql> SELECT t1.col1 FROM t1, t2 WHERE t1.col1 = t2.col2;
+------+
| col1 |
+------+
| b    |
| c    |
+------+

The converse question (rows in t1 that have no match in t2) can be answered using a NOT IN() subselect:

SELECT col1 FROM t1 WHERE col1 NOT IN (SELECT col2 FROM t2);

That's a "find non-matching rows" query. Sometimes these can be rewritten as a LEFT JOIN, a type of join discussed in Recipe 12.6. For the case at hand, the NOT IN() subselect is equivalent to the following LEFT JOIN:

mysql> SELECT t1.col1 FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col2
    -> WHERE t2.col2 IS NULL;
+------+
| col1 |
+------+
| a    |
+------+

Within a program, you can simulate a subselect by working with the results of two queries. Suppose you want to simulate the IN() subselect that finds matching values in the two tables:

SELECT * FROM t1 WHERE col1 IN (SELECT col2 FROM t2);

If you expect that the inner SELECT will return a reasonably small number of col2 values, one way to achieve the same result as the subselect is to retrieve those values and generate an IN() clause that looks for them in col1. For example, the query SELECT col2 FROM t2 will produce the values b, c, and d. Using this result, you can select matching col1 values by generating a query that looks like this:

SELECT col1 FROM t1 WHERE col1 IN ('b','c','d')

That can be done as follows (shown here using Python):

cursor = conn.cursor()
cursor.execute("SELECT col2 FROM t2")
if cursor.rowcount > 0:                # do nothing if there are no values
    val = []                           # list to hold data values
    s = ""                             # string to hold placeholders
    # construct %s,%s,%s, ... string containing placeholders
    for (col2,) in cursor.fetchall():  # pull col2 value from each row
        if s != "":
            s = s + ","                # separate placeholders by commas
        s = s + "%s"                   # add placeholder
        val.append(col2)               # add value to list of values
    stmt = "SELECT col1 FROM t1 WHERE col1 IN (" + s + ")"
    cursor.execute(stmt, val)
    for (col1,) in cursor.fetchall():  # pull col1 values from final result
        print(col1)
cursor.close()

If you expect lots of col2 values, you may want to generate individual SELECT queries for each of them instead:

SELECT col1 FROM t1 WHERE col1 = 'b'
SELECT col1 FROM t1 WHERE col1 = 'c'
SELECT col1 FROM t1 WHERE col1 = 'd'

This can be done within a program as follows:

cursor = conn.cursor()
cursor2 = conn.cursor()
cursor.execute("SELECT col2 FROM t2")
stmt = "SELECT col1 FROM t1 WHERE col1 = %s"
for (col2,) in cursor.fetchall():        # pull col2 value from each row
    cursor2.execute(stmt, (col2,))       # look for that value in col1
    for (col1,) in cursor2.fetchall():   # pull col1 values from final result
        print(col1)
cursor.close()
cursor2.close()

If you have so many col2 values that you don't want to construct a single huge IN() clause, but don't want to issue zillions of individual SELECT statements, either, another option is to combine the approaches. Break the set of col2 values into smaller groups and use each group to construct an IN() clause. This gives you a set of shorter queries that each look for several values:

SELECT col1 FROM t1 WHERE col1 IN (first group of col2 values)
SELECT col1 FROM t1 WHERE col1 IN (second group of col2 values)
SELECT col1 FROM t1 WHERE col1 IN (third group of col2 values)
...

This approach can be implemented as follows:

grp_size = 1000                        # number of values to select at once
cursor = conn.cursor()
cursor.execute("SELECT col2 FROM t2")
if cursor.rowcount > 0:                # do nothing if there are no values
    col2 = []                          # list to hold data values
    for (val,) in cursor.fetchall():   # pull col2 value from each row
        col2.append(val)
    nvals = len(col2)
    i = 0
    while i < nvals:
        if nvals < i + grp_size:
            j = nvals
        else:
            j = i + grp_size
        group = col2[i : j]
        s = ""                         # string to hold placeholders
        val_list = []
        # construct %s,%s,%s, ... string containing placeholders
        for val in group:
            if s != "":
                s = s + ","            # separate placeholders by commas
            s = s + "%s"               # add placeholder
            val_list.append(val)       # add value to list of values
        stmt = "SELECT col1 FROM t1 WHERE col1 IN (" + s + ")"
        print(stmt)
        cursor.execute(stmt, val_list)
        for (col1,) in cursor.fetchall():  # pull col1 values from result
            print(col1)
        i = i + grp_size               # go to next group of values
cursor.close()

Simulating a NOT IN() subselect from within a program is a bit trickier than simulating an IN() subselect. The subselect looks like this:

SELECT col1 FROM t1 WHERE col1 NOT IN (SELECT col2 FROM t2);

The technique shown here works best for smaller numbers of col1 and col2 values, because you must hold at least the values returned by the inner SELECT in memory, so that you can compare them to the value returned by the outer SELECT. The example shown here holds both sets in memory. First, retrieve the col1 and col2 values:

cursor = conn.cursor()
cursor.execute("SELECT col1 FROM t1")
col1 = []
for (val,) in cursor.fetchall():
    col1.append(val)
cursor.execute("SELECT col2 FROM t2")
col2 = []
for (val,) in cursor.fetchall():
    col2.append(val)
cursor.close()

Then check each col1 value to see whether or not it's present in the set of col2 values. If not, it satisfies the NOT IN() constraint of the subselect:

for val1 in col1:
    present = 0
    for val2 in col2:
        if val1 == val2:
            present = 1
            break
    if not present:
        print(val1)

The code shown here performs a lookup in the col2 values by looping through the array that holds them. You may be able to perform this operation more efficiently by using an associative data structure. For example, in Perl or Python, you could put the col2 values in a hash or dictionary. Recipe 10.29 shows an example that uses that approach.

Yet another way to simulate subselects, at least those of the IN() variety, is to generate the necessary SQL from within one instance of mysql and feed it to another instance to be executed. Consider the result from this query:

mysql> SELECT CONCAT('SELECT col1 FROM t1 WHERE col1 = \'', col2, '\';')
    -> FROM t2;
+------------------------------------------------------------+
| CONCAT('SELECT col1 FROM t1 WHERE col1 = \'', col2, '\';') |
+------------------------------------------------------------+
| SELECT col1 FROM t1 WHERE col1 = 'b';                      |
| SELECT col1 FROM t1 WHERE col1 = 'c';                      |
| SELECT col1 FROM t1 WHERE col1 = 'd';                      |
+------------------------------------------------------------+

The query retrieves the col2 values from t2 and uses them to produce a set of SELECT statements that find matching col1 values in t1. If you issue that query in batch mode and suppress the column heading, mysql produces only the text of the SQL statements, not all the other fluff. You can feed that output into another instance of mysql to execute the queries. The result is the same as the subselect. Here's one way to carry out this procedure, assuming that you have the SELECT statement containing the CONCAT() expression stored in a file named make_select.sql:

% mysql -N cookbook < make_select.sql > tmp

Here mysql includes the -N option to suppress column headers so that they won't get written to the output file, tmp. The contents of tmp will look like this:

SELECT col1 FROM t1 WHERE col1 = 'b';
SELECT col1 FROM t1 WHERE col1 = 'c';
SELECT col1 FROM t1 WHERE col1 = 'd';

To execute the queries in that file and generate the output for the simulated subselect, use this command:

% mysql -N cookbook < tmp
b
c

This second instance of mysql also includes the -N option, because otherwise the output will include a header row for each of the SELECT statements that it executes. (Try omitting -N and see what happens.)

One significant limitation of using mysql to generate SQL statements is that it doesn't work well if your col2 values contain quotes or other special characters. In that case, the queries that this method generates would be malformed.[2]

[2] As we go to press, a QUOTE() function has been added to MySQL 4.0.3 that allows special characters to be escaped so that they are suitable for use in SQL statements.
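The rewrites in this recipe are easy to check against a real subselect on any engine that supports one. The sketch below is an illustration, not the book's code: it uses Python's standard sqlite3 module as a stand-in server (SQLite does support subselects), so placeholders are written as ? rather than the %s used by the MySQL driver in the examples above, and every helper name (make_db, column, in_by_join, and so on) is hypothetical. It also shows the associative-lookup idea for NOT IN(), using a Python set.

```python
import sqlite3


def make_db():
    """Create t1 and t2 in memory with the contents shown in this recipe."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (col1 TEXT)")
    conn.execute("CREATE TABLE t2 (col2 TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [("a",), ("b",), ("c",)])
    conn.executemany("INSERT INTO t2 VALUES (?)", [("b",), ("c",), ("d",)])
    return conn


def column(conn, stmt, params=()):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(stmt, params))


def in_by_join(conn):
    # IN() subselect rewritten as a simple join.
    return column(conn, "SELECT t1.col1 FROM t1, t2 WHERE t1.col1 = t2.col2")


def not_in_by_left_join(conn):
    # NOT IN() subselect rewritten as a LEFT JOIN that keeps unmatched rows.
    return column(
        conn,
        "SELECT t1.col1 FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col2 "
        "WHERE t2.col2 IS NULL",
    )


def in_by_batches(conn, grp_size=2):
    # Program-side simulation: one IN() list per group of col2 values.
    col2 = [row[0] for row in conn.execute("SELECT col2 FROM t2")]
    matches = []
    for i in range(0, len(col2), grp_size):
        group = col2[i:i + grp_size]
        placeholders = ",".join("?" * len(group))
        stmt = "SELECT col1 FROM t1 WHERE col1 IN (" + placeholders + ")"
        matches.extend(row[0] for row in conn.execute(stmt, group))
    return sorted(matches)


def not_in_by_set(conn):
    # Program-side simulation: a set gives constant-time membership tests,
    # replacing the nested loop over the col2 list.
    col2 = {row[0] for row in conn.execute("SELECT col2 FROM t2")}
    return sorted(
        row[0] for row in conn.execute("SELECT col1 FROM t1")
        if row[0] not in col2
    )
```

One caveat the sample data does not exercise: if t2 contained the same col2 value more than once, the plain join would return the matching col1 value once per duplicate, whereas IN() returns it once. Adding DISTINCT to the join restores the subselect's behavior in that case.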

12.17 Selecting Records in Parallel from Multiple Tables

12.17.1 Problem

You want to select rows one after the other from several tables, or several sets of rows from a single table, all as a single result set.

12.17.2 Solution

Use a UNION operation to combine multiple result sets into one.

12.17.3 Discussion

A join is useful for combining columns from different tables side by side. It's not so useful when you want a result set that includes a set of rows from several tables one after the other, or different sets of rows from the same table. These are instances of the type of operation for which a UNION is useful. A UNION allows you to run several SELECT statements and concatenate their results "vertically." You receive the output in a single result set, rather than running multiple queries and receiving multiple result sets.

UNION is available as of MySQL 4.0. This section illustrates how to use it, then describes some workarounds if you have an older version of MySQL.

Suppose you have two tables that list prospective and actual customers, a third that lists vendors from whom you purchase supplies, and you want to create a single mailing list by merging names and addresses from all three tables. UNION provides a way to do this. Assume the three tables have the following contents:

mysql> SELECT * FROM prospect;
+---------+-------+------------------------+
| fname   | lname | addr                   |
+---------+-------+------------------------+
| Peter   | Jones | 482 Rush St., Apt. 402 |
| Bernice | Smith | 916 Maple Dr.          |
+---------+-------+------------------------+
mysql> SELECT * FROM customer;
+-----------+------------+---------------------+
| last_name | first_name | address             |
+-----------+------------+---------------------+
| Peterson  | Grace      | 16055 Seminole Ave. |
| Smith     | Bernice    | 916 Maple Dr.       |
| Brown     | Walter     | 8602 1st St.        |
+-----------+------------+---------------------+
mysql> SELECT * FROM vendor;
+-------------------+---------------------+
| company           | street              |
+-------------------+---------------------+
| ReddyParts, Inc.  | 38 Industrial Blvd. |
| Parts-to-go, Ltd. | 213B Commerce Park. |
+-------------------+---------------------+

The tables have columns that are similar but not identical. prospect and customer use different names for the first name and last name columns, and the vendor table includes only a single name column. None of that matters for UNION; all you need do is make sure to select the same number of columns from each table, and in the same order. The following query illustrates how to select names and addresses from the three tables all at once:

mysql> SELECT fname, lname, addr FROM prospect
    -> UNION
    -> SELECT first_name, last_name, address FROM customer
    -> UNION
    -> SELECT company, '', street FROM vendor;
+-------------------+----------+------------------------+
| fname             | lname    | addr                   |
+-------------------+----------+------------------------+
| Peter             | Jones    | 482 Rush St., Apt. 402 |
| Bernice           | Smith    | 916 Maple Dr.          |
| Grace             | Peterson | 16055 Seminole Ave.    |
| Walter            | Brown    | 8602 1st St.           |
| ReddyParts, Inc.  |          | 38 Industrial Blvd.    |
| Parts-to-go, Ltd. |          | 213B Commerce Park.    |
+-------------------+----------+------------------------+

The names and types in the result set are taken from the names and types of the columns retrieved by the first SELECT statement. Notice that, by default, a UNION eliminates duplicates; Bernice Smith appears in both the prospect and customer tables, but only once in the final result. If you want to select all records, including duplicates, follow the first UNION keyword with ALL:

mysql> SELECT fname, lname, addr FROM prospect
    -> UNION ALL
    -> SELECT first_name, last_name, address FROM customer
    -> UNION
    -> SELECT company, '', street FROM vendor;
+-------------------+----------+------------------------+
| fname             | lname    | addr                   |
+-------------------+----------+------------------------+
| Peter             | Jones    | 482 Rush St., Apt. 402 |
| Bernice           | Smith    | 916 Maple Dr.          |
| Grace             | Peterson | 16055 Seminole Ave.    |
| Bernice           | Smith    | 916 Maple Dr.          |
| Walter            | Brown    | 8602 1st St.           |
| ReddyParts, Inc.  |          | 38 Industrial Blvd.    |
| Parts-to-go, Ltd. |          | 213B Commerce Park.    |
+-------------------+----------+------------------------+

(This output reflects MySQL 4.0, where a single ALL applies to the entire statement. Later MySQL versions treat a plain UNION as removing duplicates from everything to its left, so to keep duplicates there, write ALL after every UNION keyword.)

Because it's necessary to select the same number of columns from each table, the SELECT for the vendor table (which has just one name column) retrieves a dummy (empty) last name column. Another way to select the same number of columns is to combine the first and last name columns from the prospect and customer tables into a single column:

mysql> SELECT CONCAT(lname,', ',fname) AS name, addr FROM prospect
    -> UNION
    -> SELECT CONCAT(last_name,', ',first_name), address FROM customer
    -> UNION
    -> SELECT company, street FROM vendor;
+-------------------+------------------------+
| name              | addr                   |
+-------------------+------------------------+
| Jones, Peter      | 482 Rush St., Apt. 402 |
| Smith, Bernice    | 916 Maple Dr.          |
| Peterson, Grace   | 16055 Seminole Ave.    |
| Brown, Walter     | 8602 1st St.           |
| ReddyParts, Inc.  | 38 Industrial Blvd.    |
| Parts-to-go, Ltd. | 213B Commerce Park.    |
+-------------------+------------------------+

To sort the result set as a whole, add an ORDER BY clause after the final SELECT statement. Any columns specified by name in the ORDER BY should refer to the column names used in the first SELECT, because those are the names used for the result set. For example, to sort by name, do this:

mysql> SELECT CONCAT(lname,', ',fname) AS name, addr FROM prospect
    -> UNION
    -> SELECT CONCAT(last_name,', ',first_name), address FROM customer
    -> UNION
    -> SELECT company, street FROM vendor
    -> ORDER BY name;
+-------------------+------------------------+
| name              | addr                   |
+-------------------+------------------------+
| Brown, Walter     | 8602 1st St.           |
| Jones, Peter      | 482 Rush St., Apt. 402 |
| Parts-to-go, Ltd. | 213B Commerce Park.    |
| Peterson, Grace   | 16055 Seminole Ave.    |
| ReddyParts, Inc.  | 38 Industrial Blvd.    |
| Smith, Bernice    | 916 Maple Dr.          |
+-------------------+------------------------+

It's possible in MySQL to sort the results of individual SELECT statements within the UNION. To do this, enclose a given SELECT (including its ORDER BY clause) within parentheses:

mysql> (SELECT CONCAT(lname,', ',fname) AS name, addr
    -> FROM prospect ORDER BY 1)
    -> UNION
    -> (SELECT CONCAT(last_name,', ',first_name), address
    -> FROM customer ORDER BY 1)
    -> UNION
    -> (SELECT company, street FROM vendor ORDER BY 1);
+-------------------+------------------------+
| name              | addr                   |
+-------------------+------------------------+
| Jones, Peter      | 482 Rush St., Apt. 402 |
| Smith, Bernice    | 916 Maple Dr.          |
| Brown, Walter     | 8602 1st St.           |
| Peterson, Grace   | 16055 Seminole Ave.    |
| Parts-to-go, Ltd. | 213B Commerce Park.    |
| ReddyParts, Inc.  | 38 Industrial Blvd.    |
+-------------------+------------------------+

Similar syntax can be used for LIMIT as well. That is, you can limit the result set as a whole with a trailing LIMIT clause, or for individual SELECT statements by enclosing them within parentheses. In some cases, it may even be useful to combine ORDER BY and LIMIT. Suppose you want to select a lucky prizewinner for some kind of promotional giveaway. To select a single winner at random from the combined results of the three tables, do this:

mysql> SELECT CONCAT(lname,', ',fname) AS name, addr FROM prospect
    -> UNION
    -> SELECT CONCAT(last_name,', ',first_name), address FROM customer
    -> UNION
    -> SELECT company, street FROM vendor
    -> ORDER BY RAND() LIMIT 1;
+-----------------+---------------------+
| name            | addr                |
+-----------------+---------------------+
| Peterson, Grace | 16055 Seminole Ave. |
+-----------------+---------------------+

To select a single winner from each table and combine the results, do this instead:

mysql> (SELECT CONCAT(lname,', ',fname) AS name, addr
    -> FROM prospect ORDER BY RAND() LIMIT 1)
    -> UNION
    -> (SELECT CONCAT(last_name,', ',first_name), address
    -> FROM customer ORDER BY RAND() LIMIT 1)
    -> UNION
    -> (SELECT company, street
    -> FROM vendor ORDER BY RAND() LIMIT 1);
+------------------+---------------------+
| name             | addr                |
+------------------+---------------------+
| Smith, Bernice   | 916 Maple Dr.       |
| ReddyParts, Inc. | 38 Industrial Blvd. |
+------------------+---------------------+

If that result surprises you ("Why didn't it pick three rows?"), remember that Bernice is listed in two tables and that UNION eliminates duplicates. If the first and second SELECT statements each happen to pick Bernice, one instance will be eliminated and the final result will have only two rows. (If there are no duplicates among the three tables, the query will always return three rows.) You could of course assure three records in all cases by using UNION ALL, or by running the SELECT statements individually.
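The difference between UNION and UNION ALL, and the effect of a trailing ORDER BY, can be tried out from a short script. This is a sketch under stated assumptions: it runs against an in-memory SQLite database through Python's standard sqlite3 module rather than against MySQL, so string concatenation is written with SQLite's || operator instead of CONCAT(), and the helper names make_db and mailing_list are invented for the example. The tables hold the same rows as the listings above.

```python
import sqlite3


def make_db():
    """Create the prospect, customer and vendor tables with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prospect (fname TEXT, lname TEXT, addr TEXT);
        CREATE TABLE customer (last_name TEXT, first_name TEXT, address TEXT);
        CREATE TABLE vendor (company TEXT, street TEXT);
        INSERT INTO prospect VALUES ('Peter', 'Jones', '482 Rush St., Apt. 402');
        INSERT INTO prospect VALUES ('Bernice', 'Smith', '916 Maple Dr.');
        INSERT INTO customer VALUES ('Peterson', 'Grace', '16055 Seminole Ave.');
        INSERT INTO customer VALUES ('Smith', 'Bernice', '916 Maple Dr.');
        INSERT INTO customer VALUES ('Brown', 'Walter', '8602 1st St.');
        INSERT INTO vendor VALUES ('ReddyParts, Inc.', '38 Industrial Blvd.');
        INSERT INTO vendor VALUES ('Parts-to-go, Ltd.', '213B Commerce Park.');
    """)
    return conn


def mailing_list(conn, keep_duplicates=False):
    """Merge the three tables into (name, addr) rows, sorted by name.

    With keep_duplicates=True, ALL is written after every UNION keyword;
    a single plain UNION anywhere in the chain would remove the duplicates.
    """
    op = "UNION ALL" if keep_duplicates else "UNION"
    stmt = (
        "SELECT lname || ', ' || fname AS name, addr FROM prospect "
        "{op} SELECT last_name || ', ' || first_name, address FROM customer "
        "{op} SELECT company, street FROM vendor "
        "ORDER BY name"  # refers to the column name from the first SELECT
    ).format(op=op)
    return conn.execute(stmt).fetchall()
```

With the sample data, mailing_list(conn) returns six rows and mailing_list(conn, keep_duplicates=True) returns seven, the extra row being the second copy of Bernice Smith.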

If you don't have MySQL 4.0, you can't use UNION. But you can achieve a similar result by creating a temporary table, storing the result of multiple SELECT queries into that table, and then selecting its contents. With MySQL 3.23, you can use CREATE TABLE ... SELECT for the first SELECT, then successively retrieve the other result sets into it:

mysql> CREATE TABLE tmp
    -> SELECT CONCAT(lname,', ',fname) AS name, addr FROM prospect;
mysql> INSERT INTO tmp (name, addr)
    -> SELECT CONCAT(last_name,', ',first_name), address
    -> FROM customer;
mysql> INSERT INTO tmp (name, addr)
    -> SELECT company, street FROM vendor;

If your version of MySQL is older than 3.23, create the table first, then select each result set into it:

mysql> CREATE TABLE tmp (name CHAR(40), addr CHAR(40));
mysql> INSERT INTO tmp (name, addr)
    -> SELECT CONCAT(lname,', ',fname), addr
    -> FROM prospect;
mysql> INSERT INTO tmp (name, addr)
    -> SELECT CONCAT(last_name,', ',first_name), address
    -> FROM customer;
mysql> INSERT INTO tmp (name, addr)
    -> SELECT company, street FROM vendor;

After selecting the individual results into the temporary table, select its contents:

mysql> SELECT * FROM tmp;
+-------------------+------------------------+
| name              | addr                   |
+-------------------+------------------------+
| Jones, Peter      | 482 Rush St., Apt. 402 |
| Smith, Bernice    | 916 Maple Dr.          |
| Peterson, Grace   | 16055 Seminole Ave.    |
| Smith, Bernice    | 916 Maple Dr.          |
| Brown, Walter     | 8602 1st St.           |
| ReddyParts, Inc.  | 38 Industrial Blvd.    |
| Parts-to-go, Ltd. | 213B Commerce Park.    |
+-------------------+------------------------+

Note that the result is more like UNION ALL than UNION, because duplicates are not suppressed. To achieve the effect of UNION, create the table with a unique index on the name and addr columns:

mysql> CREATE TABLE tmp (name CHAR(40), addr CHAR(40), UNIQUE (name, addr));
mysql> INSERT INTO ...
...
mysql> SELECT * FROM tmp;
+-------------------+------------------------+
| name              | addr                   |
+-------------------+------------------------+
| Brown, Walter     | 8602 1st St.           |
| Jones, Peter      | 482 Rush St., Apt. 402 |
| Parts-to-go, Ltd. | 213B Commerce Park.    |
| Peterson, Grace   | 16055 Seminole Ave.    |
| ReddyParts, Inc.  | 38 Industrial Blvd.    |
| Smith, Bernice    | 916 Maple Dr.          |
+-------------------+------------------------+

If you create the table without a unique index, you can remove duplicates at retrieval time by using DISTINCT, though that is less efficient.
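The unique-index workaround can be sketched in a few lines. The code below is an illustration only: it uses Python's standard sqlite3 module instead of MySQL, so duplicate rows are skipped with SQLite's INSERT OR IGNORE (assumed here to play the role that INSERT IGNORE plays in MySQL), string concatenation uses || rather than CONCAT(), and the function names union_via_table and demo are hypothetical.

```python
import sqlite3


def union_via_table(conn, selects):
    """Emulate UNION by collecting each SELECT's (name, addr) rows in a
    table whose unique index rejects duplicate rows."""
    conn.execute(
        "CREATE TEMP TABLE tmp (name TEXT, addr TEXT, UNIQUE (name, addr))"
    )
    for select_stmt in selects:
        # OR IGNORE makes SQLite skip rows that violate the unique index
        # instead of aborting the whole INSERT ... SELECT.
        conn.execute("INSERT OR IGNORE INTO tmp (name, addr) " + select_stmt)
    rows = conn.execute("SELECT name, addr FROM tmp ORDER BY name").fetchall()
    conn.execute("DROP TABLE tmp")
    return rows


def demo():
    """Merge the prospect and customer rows from this recipe."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prospect (fname TEXT, lname TEXT, addr TEXT);
        CREATE TABLE customer (last_name TEXT, first_name TEXT, address TEXT);
        INSERT INTO prospect VALUES ('Peter', 'Jones', '482 Rush St., Apt. 402');
        INSERT INTO prospect VALUES ('Bernice', 'Smith', '916 Maple Dr.');
        INSERT INTO customer VALUES ('Peterson', 'Grace', '16055 Seminole Ave.');
        INSERT INTO customer VALUES ('Smith', 'Bernice', '916 Maple Dr.');
        INSERT INTO customer VALUES ('Brown', 'Walter', '8602 1st St.');
    """)
    return union_via_table(conn, [
        "SELECT lname || ', ' || fname, addr FROM prospect",
        "SELECT last_name || ', ' || first_name, address FROM customer",
    ])
```

Bernice Smith appears in both source tables but only once in the list that demo() returns, which is the behavior UNION would give.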

12.18 Inserting Records in One Table That Include Values from Another

12.18.1 Problem

You need to insert a record into a table that requires an ID value. But you know only the name associated with the ID, not the ID itself.

12.18.2 Solution

Assuming that you have a lookup table that associates names and IDs, create the record using INSERT INTO ... SELECT, where the SELECT performs a name lookup to obtain the corresponding ID value.

12.18.3 Discussion

We've used lookup tables often in this chapter in join queries, typically to map ID values or codes onto more descriptive names or labels. But lookup tables are useful for more than just SELECT statements. They can help you create new records as well. To illustrate, we'll use the artist and painting tables containing information about your art collection. Suppose you travel to Minnesota, where you find a bargain on a $51 reproduction of "Les jongleurs" by Renoir. Renoir is already listed in the artist table, so no new record is needed there. But you do need a record in the painting table. To create it, you need to store the artist ID, the title, the state where you bought it, and the price. You already know all of those except the artist ID, but it's tedious to look up the ID from the artist table yourself. Because Renoir is already listed there, why not let MySQL look up the ID for you? To do this, use INSERT ... SELECT to add the new record. Specify all the literal values that you know in the SELECT output column list, and use a WHERE clause to look up the artist ID from the name:

mysql> INSERT INTO painting (a_id, title, state, price)
    -> SELECT a_id, 'Les jongleurs', 'MN', 51
    -> FROM artist WHERE name = 'Renoir';

Naturally, you wouldn't want to write out the full text of such a query by hand each time you get a new painting. But it would be easy to write a short script that, given the artist name, painting title, origin, and price, would generate and issue the query for you. You could also write the code to make sure that if the artist is not already listed in the artist table, you generate a new ID value for the artist first. Just issue a statement like this prior to creating the new painting record:

INSERT IGNORE INTO artist (name) VALUES('artist name');
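A short script of the kind described above might look like the following sketch. It is not the book's code: it uses Python's standard sqlite3 module with an in-memory database in place of MySQL, SQLite's INSERT OR IGNORE in place of INSERT IGNORE, and an assumed schema in which artist.name carries a unique index (without one, the ignore clause has nothing to trigger on and a second row for the same artist would be created). The names make_db and add_painting are hypothetical.

```python
import sqlite3


def make_db():
    """Create minimal artist and painting tables (assumed schema) with
    Renoir already listed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE painting (p_id INTEGER PRIMARY KEY, a_id INTEGER, "
        "title TEXT, state TEXT, price INTEGER)"
    )
    conn.execute("INSERT INTO artist (name) VALUES ('Renoir')")
    return conn


def add_painting(conn, artist_name, title, state, price):
    """Insert a painting, letting the database look up the artist ID."""
    # Create the artist row if it is missing; an existing row is left alone
    # because the unique index on name causes the insert to be ignored.
    conn.execute(
        "INSERT OR IGNORE INTO artist (name) VALUES (?)", (artist_name,)
    )
    # The literal values go in the SELECT output column list; the WHERE
    # clause maps the artist name to its ID.
    conn.execute(
        "INSERT INTO painting (a_id, title, state, price) "
        "SELECT a_id, ?, ?, ? FROM artist WHERE name = ?",
        (title, state, price, artist_name),
    )
```

For example, add_painting(conn, "Renoir", "Les jongleurs", "MN", 51) stores the new painting under Renoir's existing ID. Passing the values as placeholders rather than splicing them into the statement also sidesteps the quoting problems that titles containing apostrophes would otherwise cause.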

12.19 Updating One Table Based on Values in Another

12.19.1 Problem

You need to update existing records in one table based on the contents of records in another table, but MySQL doesn't yet allow join syntax in the WHERE clause of UPDATE statements. So you have no way to associate the two tables.

12.19.2 Solution

Create a new table that is populated from the result of a join between the original table and the table containing the new information. Then replace the original table with the new one. Or write a program that selects information from the related table and issues the queries necessary to update the original table. Or use mysql to generate and execute the queries.

12.19.3 Discussion

Sometimes when updating records in one table, it's necessary to refer to records in another table. Recall that the states table used in several earlier recipes contains rows that look like this:

mysql> SELECT * FROM states;
+----------------+--------+------------+----------+
| name           | abbrev | statehood  | pop      |
+----------------+--------+------------+----------+
| Alaska         | AK     | 1959-01-03 |   550043 |
| Alabama        | AL     | 1819-12-14 |  4040587 |
| Arkansas       | AR     | 1836-06-15 |  2350725 |
| Arizona        | AZ     | 1912-02-14 |  3665228 |
...

Now suppose that you want to add some new columns to this table, using information from another table, city, that contains information about each state's capital city and largest (most populous) city:

mysql> SELECT * FROM city;
+----------------+----------------+----------------+
| state          | capital        | largest        |
+----------------+----------------+----------------+
| Alabama        | Montgomery     | Birmingham     |
| Alaska         | Juneau         | Anchorage      |
| Arizona        | Phoenix        | Phoenix        |
| Arkansas       | Little Rock    | Little Rock    |
...

It would be easy enough to add new columns named capital and largest to the states table structure using an ALTER TABLE statement. But then how would you modify the rows to fill in the new columns with the appropriate values? The most convenient way to do this would be to run an UPDATE query that uses join syntax in the WHERE clause:

UPDATE states,city
SET states.capital = city.capital, states.largest = city.largest
WHERE states.name = city.state;

That doesn't work, because MySQL does not yet support this syntax. Another solution would be to use a subselect in the WHERE clause, but subselects are not scheduled for inclusion until MySQL 4.1. What are the alternatives? Clearly, you don't want to update each row by hand. That's unacceptably tedious, and silly, too, given that the new information is already stored in the city table.

The states and city tables contain a common key (state names), so let's use that information to relate the two tables and perform the update. There are a few techniques you can use to achieve the same result as a multiple-table update:

• Create a new table that is like the original states table, but includes the additional columns to be added from the related table, city. Populate the new table using the result of a join between the states and city tables, then replace the original states table with the new one.

• Write a program that uses the information from the city table to generate and execute UPDATE statements that update the states table one state at a time.

• Use mysql to generate the UPDATE statements.

12.19.4 Performing a Related-Table Update Using Table Replacement

The table-replacement approach works as follows. To extend the states table with the capital and largest columns from the city table, create a tmp table that is like the states table but adds capital and largest columns:

CREATE TABLE tmp
(
  name      VARCHAR(30) NOT NULL,  # state name
  abbrev    CHAR(2) NOT NULL,      # 2-char abbreviation
  statehood DATE,                  # date of entry into the Union
  pop       BIGINT,                # population as of 4/1990
  capital   VARCHAR(30),           # capital city
  largest   VARCHAR(30),           # most populous city
  PRIMARY KEY (abbrev)
);

Then populate tmp using the result of a join between states and city that matches up rows in the two tables using state names:

INSERT INTO tmp (name, abbrev, statehood, pop, capital, largest)
SELECT
  states.name, states.abbrev, states.statehood, states.pop,
  city.capital, city.largest
FROM states LEFT JOIN city ON states.name = city.state;

The query uses a LEFT JOIN for a reason. Suppose the city table is incomplete and doesn't contain a row for every state. In that case, a regular join will fail to produce an output row for any states that are missing from the city table, and the resulting tmp table will be missing records for those states, even though they are present in the states table. Not good! The LEFT JOIN ensures that the SELECT produces output for every row in the states table, whether or not it's matched by a city table row. Any state that is missing in the city table would end up with NULL values in the tmp table for the capital and largest columns, but that's appropriate when you don't know the city names, and generating an incomplete row certainly is preferable to losing the row entirely.
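The effect of the join type is easy to see with a deliberately incomplete city table. The following sketch makes several assumptions for the sake of a runnable example: it uses Python's standard sqlite3 module rather than MySQL, it loads only the four states shown earlier, it omits Arkansas from city on purpose, and the names make_db and build_tmp are invented.

```python
import sqlite3


def make_db():
    """Create states (four sample rows) and a city table lacking Arkansas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE states (name TEXT, abbrev TEXT, statehood TEXT, pop INTEGER);
        CREATE TABLE city (state TEXT, capital TEXT, largest TEXT);
        INSERT INTO states VALUES ('Alaska', 'AK', '1959-01-03', 550043);
        INSERT INTO states VALUES ('Alabama', 'AL', '1819-12-14', 4040587);
        INSERT INTO states VALUES ('Arkansas', 'AR', '1836-06-15', 2350725);
        INSERT INTO states VALUES ('Arizona', 'AZ', '1912-02-14', 3665228);
        INSERT INTO city VALUES ('Alabama', 'Montgomery', 'Birmingham');
        INSERT INTO city VALUES ('Alaska', 'Juneau', 'Anchorage');
        INSERT INTO city VALUES ('Arizona', 'Phoenix', 'Phoenix');
    """)
    return conn


def build_tmp(conn, join_type):
    """Return the rows that would populate tmp for the given join type."""
    if join_type not in ("JOIN", "LEFT JOIN"):
        raise ValueError("join_type must be 'JOIN' or 'LEFT JOIN'")
    stmt = (
        "SELECT states.name, states.abbrev, states.statehood, states.pop, "
        "city.capital, city.largest "
        "FROM states " + join_type + " city ON states.name = city.state "
        "ORDER BY states.name"
    )
    return conn.execute(stmt).fetchall()
```

With this data, build_tmp(conn, "JOIN") yields three rows and silently drops Arkansas, while build_tmp(conn, "LEFT JOIN") yields all four, with None (SQL NULL) in the capital and largest positions of the Arkansas row.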

The resulting tmp table is like the original one, but has two new columns, capital and largest. You can examine it to see this. After verifying that you're satisfied with the tmp table, use it to replace the original states table:

DROP TABLE states;
ALTER TABLE tmp RENAME TO states;

If you want to make sure there is no time, however brief, during which the states table is unavailable, perform the replacement like this instead:

RENAME TABLE states TO states_old, tmp TO states;
DROP TABLE states_old;

12.19.5 Performing a Related-Table Update by Writing a Program

The table-replacement technique is efficient because it lets the server do all the work. On the other hand, it is most appropriate when you're updating all or most of the rows in the table. If you're updating just a few rows, it may be less work to update the table "in place" for just those rows that need it. Also, table replacement requires more than twice the space of the original states table while you're carrying out the update procedure. If you have a huge table to update, you may not want to use all that space.

A second technique for updating a table based on a related table is to read the information from the related table and use it to generate UPDATE statements. For example, to update states with the information stored in the city table, read the city names and use them to create and issue a series of queries like this:

UPDATE states SET capital = 'Montgomery', largest = 'Birmingham'
WHERE name = 'Alabama';
UPDATE states SET capital = 'Juneau', largest = 'Anchorage'
WHERE name = 'Alaska';
UPDATE states SET capital = 'Phoenix', largest = 'Phoenix'
WHERE name = 'Arizona';
UPDATE states SET capital = 'Little Rock', largest = 'Little Rock'
WHERE name = 'Arkansas';
...

To carry out this procedure, first alter the states table so that it includes the new columns: [3]

    ALTER TABLE states ADD capital VARCHAR(30), ADD largest VARCHAR(30);

[3] If you've already modified states using the table-replacement procedure, first restore the table to its original structure by dropping the capital and largest columns.

Next, write a program that reads the city table and uses its contents to produce UPDATE statements that modify the states table. Here is an example script, update_cities.pl, that does so:

    #! /usr/bin/perl -w
    # update_cities.pl - update states table capital and largest city columns,
    # using contents of city table. This assumes that the states table has
    # been modified to include columns named capital and largest.

    use strict;
    use lib qw(/usr/local/apache/lib/perl);
    use Cookbook;

    my $dbh = Cookbook::connect ( );
    my $sth = $dbh->prepare ("SELECT state, capital, largest FROM city");
    $sth->execute ( );
    while (my ($state, $capital, $largest) = $sth->fetchrow_array ( ))
    {
        $dbh->do ("UPDATE states SET capital = ?, largest = ? WHERE name = ?",
                  undef, $capital, $largest, $state);
    }

    $dbh->disconnect ( );
    exit (0);

The script has all the table and column names built in to it, which makes it very special purpose. You could generalize this procedure by writing a function that accepts parameters indicating the table names, the columns to use for matching records in the two tables, and the columns to use for updating the rows. The update_related.pl script in the joins directory of the recipes distribution shows one way to do this.
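To make the idea of a generalized function concrete, here is a minimal sketch. It is not the update_related.pl script from the recipes distribution. It is written in Python with the standard sqlite3 module standing in for MySQL so that it runs without a server, and the function name update_related and its parameter layout are assumptions made for illustration.

```python
import sqlite3

def update_related(conn, target, source, match_cols, update_cols):
    """Copy update_cols from source into target rows where match_cols agree.

    match_cols is a (target_column, source_column) pair; update_cols lists
    column names present in both tables. Table and column names are pasted
    into the SQL text, so they must come from trusted code, never from user
    input. Data values always go through placeholders.
    """
    target_key, source_key = match_cols
    select_sql = "SELECT %s, %s FROM %s" % (
        source_key, ", ".join(update_cols), source)
    update_sql = "UPDATE %s SET %s WHERE %s = ?" % (
        target, ", ".join("%s = ?" % col for col in update_cols), target_key)
    changed = 0
    for row in conn.execute(select_sql).fetchall():
        key, values = row[0], row[1:]
        changed += conn.execute(update_sql, (*values, key)).rowcount
    return changed

# Hypothetical sample data, in SQLite rather than MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (name TEXT, capital TEXT, largest TEXT);
    CREATE TABLE city (state TEXT, capital TEXT, largest TEXT);
    INSERT INTO states (name) VALUES ('Alabama'), ('Alaska'), ('Arizona');
    INSERT INTO city VALUES ('Alabama', 'Montgomery', 'Birmingham'),
                            ('Alaska', 'Juneau', 'Anchorage');
""")
changed = update_related(conn, "states", "city", ("name", "state"),
                         ["capital", "largest"])
rows = conn.execute(
    "SELECT name, capital, largest FROM states ORDER BY name").fetchall()
```

Arizona has no row in city, so its capital and largest columns stay NULL. That mirrors the in-place behavior described above: only the rows that have related data are touched.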

12.19.6 Performing a Related-Table Update Using mysql

If your data values don't require any special handling for internal quotes or other special characters, you can use mysql to generate and process the UPDATE statements. This is similar to the technique shown in Recipe 12.18 for using mysql to simulate a subselect. Put the following statement in a file, update_cities.sql:

    SELECT CONCAT('UPDATE states SET capital = \'',capital,
           '\', largest = \'',largest,'\' WHERE name = \'',state,'\';')
    FROM city;

The query reads the rows of the city table and uses them to generate statements that update states. Execute the query and save the result in tmp:

    % mysql -N cookbook < update_cities.sql > tmp

tmp will contain statements that look like the queries generated by the update_cities.pl script. Assuming that you've added the capital and largest columns to the states table, you can execute these statements as follows to update the table:

    % mysql cookbook < tmp
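The following sketch shows why this approach is limited to values without internal quotes. It is a Python illustration using the standard sqlite3 module rather than the mysql client, so the generating query uses SQLite's || operator and doubled quotes where the recipe uses CONCAT( ) and \'. The sample rows, including the Idaho entry, are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (name TEXT, capital TEXT, largest TEXT);
    CREATE TABLE city (state TEXT, capital TEXT, largest TEXT);
    INSERT INTO states (name) VALUES ('Alaska'), ('Idaho');
    INSERT INTO city VALUES ('Alaska', 'Juneau', 'Anchorage');
""")

# Same idea as update_cities.sql: build one UPDATE statement per city row
# by pasting the column values between literal quote characters.
GENERATE = """
    SELECT 'UPDATE states SET capital = ''' || capital ||
           ''', largest = ''' || largest ||
           ''' WHERE name = ''' || state || ''';'
    FROM city
"""

statements = [row[0] for row in conn.execute(GENERATE)]
# Running the generated text plays the role of "mysql cookbook < tmp".
conn.executescript("\n".join(statements))
updated = conn.execute(
    "SELECT capital, largest FROM states WHERE name = 'Alaska'").fetchone()

# A value containing a quote produces a malformed statement.
conn.execute("INSERT INTO city VALUES ('Idaho', 'Boise', 'Coeur d''Alene')")
broken = next(s for (s,) in conn.execute(GENERATE) if "Idaho" in s)
try:
    conn.execute(broken)
    failed = False
except sqlite3.Error:
    failed = True
```

The first generated statement runs cleanly, but the one for the value containing an apostrophe is rejected as a syntax error. When values may contain such characters, the placeholder-based program in the previous section is the safer choice.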

12.20 Using a Join to Create a Lookup Table from Descriptive Labels

12.20.1 Problem

A table stores long descriptive labels in an identifier column. You want to convert this column to short ID values and use the labels to create a lookup table that maps IDs to labels.

12.20.2 Solution

Use one of the related-table update techniques described in Recipe 12.19.

12.20.3 Discussion

It's a common strategy to store ID numbers or codes rather than descriptive strings in a table to save space. It also improves performance, because it's quicker to index and retrieve numbers than strings. (For queries in which you need to produce the names, join the ID values with an ID-to-name lookup table.) When you're creating a new table, you can keep this strategy in mind and design the table from the outset to be used with a lookup table. But you may also have an existing table that stores descriptive strings and that could benefit from a conversion to use ID values. This section discusses how to create the lookup table that maps each label to its ID, and how to convert the labels to IDs in the original table. The technique combines ALTER TABLE with a related-table update.

Suppose you collect coins, and you've begun to keep track of them in your database using the following table:

    CREATE TABLE coin
    (
        id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
        date  CHAR(5) NOT NULL,     # 4 digits + mint letter
        denom CHAR(20) NOT NULL,    # denomination (e.g., Lincoln cent)
        PRIMARY KEY (id)
    );

Each coin is assigned an ID automatically as an AUTO_INCREMENT value, and you also record each coin's date of issue and denomination. The records that you've entered into the table thus far are as follows:

    mysql> SELECT * FROM coin;
    +----+-------+---------------------+
    | id | date  | denom               |
    +----+-------+---------------------+
    |  1 | 1944s | Lincoln cent        |
    |  2 | 1977  | Roosevelt dime      |
    |  3 | 1955d | Lincoln cent        |
    |  4 | 1938  | Jefferson nickel    |
    |  5 | 1964  | Kennedy half dollar |
    |  6 | 1959  | Lincoln cent        |
    |  7 | 1945  | Jefferson nickel    |
    |  8 | 1905  | Buffalo nickel      |
    |  9 | 1924  | Mercury head dime   |
    | 10 | 2001  | Roosevelt dime      |
    | 11 | 1937  | Mercury head dime   |
    | 12 | 1977  | Kennedy half dollar |
    +----+-------+---------------------+

The table holds the information in which you're interested, but you notice that it's a waste of space to write out the denomination names in every record, and that the problem will become worse as you enter additional records into the table. It would be more space-efficient to store coded denomination IDs in the coin table rather than the names, then look up the names when necessary from a denom table that lists each denomination name and its ID code. (The benefit of this may not be evident with such a small table, but when your collection grows to include 10,000 coins, the space savings from storing numbers rather than strings will become more significant.)

The procedure for setting up the lookup table and converting the coin table to use it is as follows:

1. Create the denom lookup table to hold the ID-to-name mapping.
2. Populate the denom table using the denomination names currently in the coin table.
3. Replace the denomination names in the coin table with the corresponding ID values.

The denom table needs to record each denomination name and its associated ID, so it can be created using the following structure:

    CREATE TABLE denom
    (
        denom_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        name     CHAR(20) NOT NULL,
        PRIMARY KEY (denom_id)
    );

To populate the table, insert into it the set of denomination names that are present in the coin table. Use SELECT DISTINCT for this, because each name should be inserted just once:

    INSERT INTO denom (name) SELECT DISTINCT denom FROM coin;

The INSERT statement adds only the denomination name to the denom table; denom_id is an AUTO_INCREMENT column, so MySQL will assign sequence numbers to it automatically. The resulting table looks like this:

    +----------+---------------------+
    | denom_id | name                |
    +----------+---------------------+
    |        1 | Lincoln cent        |
    |        2 | Roosevelt dime      |
    |        3 | Jefferson nickel    |
    |        4 | Kennedy half dollar |
    |        5 | Buffalo nickel      |
    |        6 | Mercury head dime   |
    +----------+---------------------+

With MySQL 3.23 and up, you can create and populate the denom table using a single CREATE TABLE ... SELECT statement:

    CREATE TABLE denom
    (
        denom_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        PRIMARY KEY (denom_id)
    )
    SELECT DISTINCT denom AS name FROM coin;

After setting up the denom table, the next step is to convert the denomination names in the coin table to their associated IDs:



• Create a tmp table that is like coin but has a denom_id column rather than a denom column.
• Populate tmp from the result of a join between the coin and denom tables.
• Use the tmp table to replace the original coin table.

To create the tmp table, use a CREATE TABLE statement that is like the one used originally to create coin, but substitute a denom_id column for the denom column:

    CREATE TABLE tmp
    (
        id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
        date     CHAR(5) NOT NULL,        # 4 digits + mint letter
        denom_id INT UNSIGNED NOT NULL,   # denomination ID
        PRIMARY KEY (id)
    );

Then populate tmp using a join between coin and denom:

    INSERT INTO tmp (id, date, denom_id)
    SELECT coin.id, coin.date, denom.denom_id
    FROM coin, denom WHERE coin.denom = denom.name;

Finally, replace the original coin table with the tmp table:

    DROP TABLE coin;
    ALTER TABLE tmp RENAME TO coin;

With MySQL 3.23 and up, you can create and populate the tmp table using a single statement:

    CREATE TABLE tmp
    (
        PRIMARY KEY (id)
    )
    SELECT coin.id, coin.date, denom.denom_id
    FROM coin, denom WHERE coin.denom = denom.name;

Then replace coin with tmp, as before.
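The whole tmp-table conversion can be tried end to end on a small data set. The sketch below uses Python's standard sqlite3 module in place of MySQL so that it runs without a server. The column types are SQLite equivalents chosen for the example, not the definitions given in this recipe, and only four sample coins are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coin (id INTEGER PRIMARY KEY, date TEXT NOT NULL,
                       denom TEXT NOT NULL);
    INSERT INTO coin (date, denom) VALUES
        ('1944s', 'Lincoln cent'), ('1977', 'Roosevelt dime'),
        ('1955d', 'Lincoln cent'), ('1938', 'Jefferson nickel');
""")
before = conn.execute(
    "SELECT id, date, denom FROM coin ORDER BY id").fetchall()

conn.executescript("""
    -- Steps 1 and 2: create the lookup table and load each name once.
    CREATE TABLE denom (denom_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO denom (name) SELECT DISTINCT denom FROM coin;

    -- Step 3: build tmp from a join, then swap it in for coin.
    CREATE TABLE tmp (id INTEGER PRIMARY KEY, date TEXT NOT NULL,
                      denom_id INTEGER NOT NULL);
    INSERT INTO tmp (id, date, denom_id)
        SELECT coin.id, coin.date, denom.denom_id
        FROM coin, denom WHERE coin.denom = denom.name;
    DROP TABLE coin;
    ALTER TABLE tmp RENAME TO coin;
""")

# Joining the converted table to the lookup table should give back
# exactly the rows the original table held.
after = conn.execute("""
    SELECT coin.id, coin.date, denom.name
    FROM coin, denom WHERE coin.denom_id = denom.denom_id
    ORDER BY coin.id
""").fetchall()
n_denoms = conn.execute("SELECT COUNT(*) FROM denom").fetchone()[0]
```

Comparing the joined result with a snapshot taken before the conversion is a cheap way to confirm that no rows were lost or mismatched before you drop the original table for real.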

Another method for converting the coin table after creating the denom table is to modify coin in place without using a tmp table:

1. Add a denom_id column to the coin table with ALTER TABLE.
2. Fill in the denom_id value in each row with the ID corresponding to its denom name.
3. Drop the denom column.

To carry out this procedure, add a column to coin to hold the denomination ID values:

    ALTER TABLE coin ADD denom_id INT UNSIGNED NOT NULL;

Then fill in the denom_id column with the proper values using the denomination name-to-ID mapping in the denom table. One way to do that is to write a script to update the ID values in the coin table one denomination at a time. Here is a short script that does so:

    #! /usr/bin/perl -w
    # update_denom.pl - For each denomination in the denom table,
    # update the coin table records having that denomination with the
    # proper denomination ID.

    use strict;
    use lib qw(/usr/local/apache/lib/perl);
    use Cookbook;

    my $dbh = Cookbook::connect ( );
    my $sth = $dbh->prepare ("SELECT denom_id, name FROM denom");
    $sth->execute ( );
    while (my ($denom_id, $name) = $sth->fetchrow_array ( ))
    {
        # For coin table records with the given denomination name,
        # add the corresponding denom_id value from denom table
        $dbh->do ("UPDATE coin SET denom_id = ? WHERE denom = ?",
                  undef, $denom_id, $name);
    }

    $dbh->disconnect ( );
    exit (0);

The script retrieves each denomination ID/name pair from the denom table and constructs an appropriate UPDATE statement to modify all coin table rows containing the denomination name by setting their denom_id values to the corresponding ID. When the script finishes, all rows in the coin table will have the denom_id column updated properly. At that point, the denom column is no longer necessary and you can jettison it:

    ALTER TABLE coin DROP denom;

Whichever method you use to convert the coin table, the resulting contents look like this:

    mysql> SELECT * FROM coin;
    +----+-------+----------+
    | id | date  | denom_id |
    +----+-------+----------+
    |  1 | 1944s |        1 |
    |  2 | 1977  |        2 |
    |  3 | 1955d |        1 |
    |  4 | 1938  |        3 |
    |  5 | 1964  |        4 |
    |  6 | 1959  |        1 |
    |  7 | 1945  |        3 |
    |  8 | 1905  |        5 |
    |  9 | 1924  |        6 |
    | 10 | 2001  |        2 |
    | 11 | 1937  |        6 |
    | 12 | 1977  |        4 |
    +----+-------+----------+

When you need to display coin records with denomination names rather than IDs in a query result, perform a join using denom as a lookup table:

    mysql> SELECT coin.id, coin.date, denom.name
        -> FROM coin, denom
        -> WHERE coin.denom_id = denom.denom_id;
    +----+-------+---------------------+
    | id | date  | name                |
    +----+-------+---------------------+
    |  1 | 1944s | Lincoln cent        |
    |  2 | 1977  | Roosevelt dime      |
    |  3 | 1955d | Lincoln cent        |
    |  4 | 1938  | Jefferson nickel    |
    |  5 | 1964  | Kennedy half dollar |
    |  6 | 1959  | Lincoln cent        |
    |  7 | 1945  | Jefferson nickel    |
    |  8 | 1905  | Buffalo nickel      |
    |  9 | 1924  | Mercury head dime   |
    | 10 | 2001  | Roosevelt dime      |
    | 11 | 1937  | Mercury head dime   |
    | 12 | 1977  | Kennedy half dollar |
    +----+-------+---------------------+

That result looks like the contents of the original coin table, even though the table no longer stores a long descriptive string in each row.

What about entering new items into the coin table? Using the original coin table, you'd enter the denomination name into each record. But with the denominations converted to ID values, that won't work. Instead, use an INSERT INTO ... SELECT statement to look up the denomination ID based on the name. For example, to enter a 1962 Roosevelt dime, use this statement:

    INSERT INTO coin (date, denom_id)
    SELECT 1962, denom_id FROM denom WHERE name = 'Roosevelt dime';

This technique is described further in Recipe 12.18.
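One consequence of the INSERT INTO ... SELECT form is worth seeing in action: if the name is not in the lookup table, the SELECT produces no rows and nothing is inserted, without any error. The sketch below illustrates this in Python with the standard sqlite3 module standing in for MySQL. The add_coin helper and the reduced table definitions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE denom (denom_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE coin (id INTEGER PRIMARY KEY, date TEXT NOT NULL,
                       denom_id INTEGER NOT NULL);
    INSERT INTO denom (name) VALUES ('Lincoln cent'), ('Roosevelt dime');
""")

def add_coin(conn, date, denom_name):
    """Insert a coin, resolving the denomination name to its ID within the
    same statement. Returns the number of rows inserted, which is 0 when
    the name is not present in denom."""
    cur = conn.execute(
        "INSERT INTO coin (date, denom_id) "
        "SELECT ?, denom_id FROM denom WHERE name = ?",
        (date, denom_name))
    return cur.rowcount

added = add_coin(conn, "1962", "Roosevelt dime")
missing = add_coin(conn, "1905", "Buffalo nickel")   # not in denom yet
rows = conn.execute("""
    SELECT coin.date, denom.name FROM coin, denom
    WHERE coin.denom_id = denom.denom_id
""").fetchall()
```

Checking the affected-row count after each insert is one simple way to notice a misspelled or not-yet-registered denomination name. A new denomination has to be added to denom before any coin can refer to it.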

12.21 Deleting Related Rows in Multiple Tables

12.21.1 Problem

You want to delete related records from multiple tables. This is common, for example, when you have tables that are related in master-detail or parent-child fashion; deleting a parent record typically requires all the associated child records to be deleted as well.

12.21.2 Solution

You have several options. MySQL 4.0 supports cascaded delete with a multiple-table DELETE syntax; you can replace the tables with new versions that contain only the records not to be deleted; you can write a program to construct appropriate DELETE statements for each table, or you may be able to use mysql to do so.

12.21.3 Discussion

Applications that use related tables need to operate on both tables at once for many operations. Suppose you use MySQL to record information about the contents of software distributions that you maintain. The master (or parent) table lists each distribution's name, version number, and release date. The detail (or child) table lists information about the files in the distributions, thus serving as the manifest for each distribution's contents. To allow the parent and child records to be associated, each parent record has a unique ID number, and that number is stored in the child records. The tables might be defined something like this:

    CREATE TABLE swdist_head
    (
        dist_id  INT UNSIGNED NOT NULL AUTO_INCREMENT, # distribution ID
        name     VARCHAR(40),                          # distribution name
        ver_num  NUMERIC(5,2),                         # version number
        rel_date DATE NOT NULL,                        # release date
        PRIMARY KEY (dist_id)
    );
    CREATE TABLE swdist_item
    (
        dist_id   INT UNSIGNED NOT NULL,   # parent distribution ID
        dist_file VARCHAR(255) NOT NULL    # name of file in distribution
    );

For the examples here, assume the tables contain the following records:

    mysql> SELECT * FROM swdist_head ORDER BY name, ver_num;
    +---------+------------+---------+------------+
    | dist_id | name       | ver_num | rel_date   |
    +---------+------------+---------+------------+
    |       1 | DB Gadgets |    1.59 | 1996-03-25 |
    |       3 | DB Gadgets |    1.60 | 1998-12-26 |
    |       4 | DB Gadgets |    1.61 | 1998-12-28 |
    |       2 | NetGizmo   |    3.02 | 1998-11-10 |
    |       5 | NetGizmo   |    4.00 | 2001-08-04 |
    +---------+------------+---------+------------+
    mysql> SELECT * FROM swdist_item ORDER BY dist_id, dist_file;
    +---------+----------------+
    | dist_id | dist_file      |
    +---------+----------------+
    |       1 | db-gadgets.sh  |
    |       1 | README         |
    |       2 | NetGizmo.exe   |
    |       2 | README.txt     |
    |       3 | db-gadgets.sh  |
    |       3 | README         |
    |       3 | README.linux   |
    |       4 | db-gadgets.sh  |
    |       4 | README         |
    |       4 | README.linux   |
    |       4 | README.solaris |
    |       5 | NetGizmo.exe   |
    |       5 | README.txt     |
    +---------+----------------+

The tables describe the distributions for three versions of DB Gadgets and two versions of NetGizmo. But the tables are difficult to make sense of individually, so to display information for a given distribution, you'd use a join to select rows from both tables. For example, the following query shows the information stored for DB Gadgets 1.60:

    mysql> SELECT swdist_head.dist_id, swdist_head.name,
        -> swdist_head.ver_num, swdist_head.rel_date, swdist_item.dist_file
        -> FROM swdist_head, swdist_item
        -> WHERE swdist_head.name = 'DB Gadgets' AND swdist_head.ver_num = 1.60
        -> AND swdist_head.dist_id = swdist_item.dist_id;
    +---------+------------+---------+------------+---------------+
    | dist_id | name       | ver_num | rel_date   | dist_file     |
    +---------+------------+---------+------------+---------------+
    |       3 | DB Gadgets |    1.60 | 1998-12-26 | README        |
    |       3 | DB Gadgets |    1.60 | 1998-12-26 | README.linux  |
    |       3 | DB Gadgets |    1.60 | 1998-12-26 | db-gadgets.sh |
    +---------+------------+---------+------------+---------------+

Similarly, to delete a distribution, you'd need to access both tables. DB Gadgets 1.60 has an ID of 3, so one way to get rid of it would be to issue DELETE statements for each of the tables manually:

    mysql> DELETE FROM swdist_head WHERE dist_id = 3;
    mysql> DELETE FROM swdist_item WHERE dist_id = 3;

That's quick and easy, but problems can occur if you forget to issue DELETE statements for both tables (which is easier to do than you might think). In that case, your tables become inconsistent, with parent records that have no children, or children that are referenced by no parent. Also, manual deletion doesn't work well in situations where you have a large number of distributions to remove, or when you don't know in advance which ones to delete. Suppose you decide to purge all the old records, keeping only those for the most recent version of each distribution. (For example, the tables contain information for DB Gadgets distributions 1.59, 1.60, and 1.61, so you'd remove records for Versions 1.59 and 1.60.) For this kind of operation, you'd likely determine which distributions to remove based on some query that figures out the IDs of those that are not the most recent. But then what do you do? The query might produce many IDs; you probably wouldn't want to delete each distribution manually. And you don't have to. There are several options for deleting records from multiple tables:



• Use the multiple-table DELETE syntax that is available as of MySQL 4.0.0. This way you can write a query that takes care of identifying and removing records from both tables at once. You need not remember to issue multiple DELETE statements each time you remove records from related tables.
• Approach the problem in reverse: select the records that are not to be deleted into new tables, then use those tables to replace the original ones. The effect is the same as deleting the unwanted records.
• Use a program that determines the IDs of the distributions to be removed and generates the appropriate DELETE statements for you. The program might be one you write yourself, or an existing program such as mysql.

The remainder of this section examines each of these options in turn, showing how to use them to solve the problem of deleting old distributions. Because each example removes records from the swdist_head and swdist_item tables, you'll need to create and populate them anew before trying each method, so that you begin at the same starting point each time. You can do this using the swdist_create.sql script in the joins directory of the recipes distribution. Scripts that demonstrate each multiple-table delete method shown in the examples may be found in that directory as well.

Using Foreign Keys to Enforce Referential Integrity

One feature a database may offer for helping you maintain consistency between tables is the ability to define foreign key relationships. This means you can specify explicitly in the table definition that a primary key in a parent table (such as the dist_id column of the swdist_head table) is a parent to a key in another table (the dist_id column in the swdist_item table). By defining the ID column in the child table as a foreign key to the ID column in the parent, the database can enforce certain constraints against illegal operations. For example, it can prevent you from creating a child record with an ID that is not present in the parent, or from deleting parent records without also deleting the corresponding child records first. A foreign key implementation may also offer cascaded delete: if you delete a parent record, the database engine cascades the effect of the delete to any child tables and automatically deletes the child records for you. The InnoDB table type in MySQL offers support for foreign keys as of Version 3.23.44, and for cascaded delete as of 3.23.50. In addition, there are plans to implement foreign key support for all the table types in MySQL 4.1.
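As a rough illustration of the constraint checking and cascaded delete that the sidebar describes, here is a small Python sketch using the standard sqlite3 module. SQLite stands in for InnoDB purely so that the example runs without a server. The table definitions are cut down to two columns each, and the PRAGMA statement is SQLite-specific and has no counterpart in the MySQL examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE swdist_head (
        dist_id INTEGER PRIMARY KEY,
        name    TEXT
    );
    CREATE TABLE swdist_item (
        dist_id   INTEGER NOT NULL
                  REFERENCES swdist_head (dist_id) ON DELETE CASCADE,
        dist_file TEXT NOT NULL
    );
    INSERT INTO swdist_head VALUES (3, 'DB Gadgets'), (4, 'DB Gadgets');
    INSERT INTO swdist_item VALUES (3, 'README'), (3, 'README.linux'),
                                   (4, 'README');
""")

# A child row whose parent ID does not exist is rejected.
try:
    conn.execute("INSERT INTO swdist_item VALUES (99, 'orphan.txt')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting one parent row removes its child rows as well.
conn.execute("DELETE FROM swdist_head WHERE dist_id = 3")
remaining = conn.execute(
    "SELECT dist_id, dist_file FROM swdist_item").fetchall()
```

With a cascading foreign key in place, a single DELETE on the parent table is enough, and the inconsistent states described earlier (children without a parent) cannot be created in the first place.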

12.21.4 Performing a Cascaded Delete with a Multiple-Table DELETE Statement

As of MySQL 4.0.0, DELETE supports a syntax that allows you to identify records to be removed from multiple tables and clobber them all with a single statement. To use this for deleting software distributions from the swdist_head and swdist_item tables, determine the IDs of the relevant distributions and then apply the list to those tables. First, determine which version of each distribution is the most recent and select the names and version numbers into a separate table. The following query selects each distribution name and the highest version number for each one:

    mysql> CREATE TABLE tmp
        -> SELECT name, MAX(ver_num) AS newest
        -> FROM swdist_head
        -> GROUP BY name;

The resulting table looks like this:

    mysql> SELECT * FROM tmp;
    +------------+--------+
    | name       | newest |
    +------------+--------+
    | DB Gadgets |   1.61 |
    | NetGizmo   |   4.00 |
    +------------+--------+

Next, determine the ID numbers of the distributions that are older than those listed in the tmp table:

    mysql> CREATE TABLE tmp2
        -> SELECT swdist_head.dist_id, swdist_head.name, swdist_head.ver_num
        -> FROM swdist_head, tmp
        -> WHERE swdist_head.name = tmp.name AND swdist_head.ver_num < tmp.newest;

Note that you actually need to select only the dist_id column into tmp2. The example selects the name and version number as well so that you can look at tmp2 and see more easily that the IDs it chooses are indeed those for the older distributions that are to be deleted:

    mysql> SELECT * FROM tmp2;
    +---------+------------+---------+
    | dist_id | name       | ver_num |
    +---------+------------+---------+
    |       1 | DB Gadgets |    1.59 |
    |       3 | DB Gadgets |    1.60 |
    |       2 | NetGizmo   |    3.02 |
    +---------+------------+---------+

The table does not contain the IDs for DB Gadgets 1.61 or NetGizmo 4.00, which are the most recent distributions.

Now apply the ID list in tmp2 to the distribution tables using a multiple-table DELETE. The general form of this statement is:

    DELETE tbl_list1 FROM tbl_list2 WHERE conditions;

tbl_list1 names the tables from which to delete records. tbl_list2 names the tables used in the WHERE clause, which specifies the conditions that identify the records to delete. Each table list can name one or more tables, separated by commas. For the situation at hand, the tables to delete from are swdist_head and swdist_item. The tables used to identify the deleted records are those tables and the tmp2 table:

    mysql> DELETE swdist_head, swdist_item
        -> FROM tmp2, swdist_head, swdist_item
        -> WHERE tmp2.dist_id = swdist_head.dist_id
        -> AND tmp2.dist_id = swdist_item.dist_id;

The resulting tables look like this:

    mysql> SELECT * FROM swdist_head;
    +---------+------------+---------+------------+
    | dist_id | name       | ver_num | rel_date   |
    +---------+------------+---------+------------+
    |       4 | DB Gadgets |    1.61 | 1998-12-28 |
    |       5 | NetGizmo   |    4.00 | 2001-08-04 |
    +---------+------------+---------+------------+
    mysql> SELECT * FROM swdist_item;
    +---------+----------------+
    | dist_id | dist_file      |
    +---------+----------------+
    |       4 | README         |
    |       4 | README.linux   |
    |       4 | README.solaris |
    |       4 | db-gadgets.sh  |
    |       5 | README.txt     |
    |       5 | NetGizmo.exe   |
    +---------+----------------+

For the tables that we're using, the DELETE statement just shown works as expected. But be aware that it will fail for tables containing parent records that should be deleted but for which there are no corresponding child records. The WHERE clause will find no match for the parent record in the child table, and thus not select the parent record for deletion. To make sure that the query selects and deletes the parent record even in the absence of matching child records, use a LEFT JOIN:

    mysql> DELETE swdist_head, swdist_item
        -> FROM tmp2 LEFT JOIN swdist_head ON tmp2.dist_id = swdist_head.dist_id
        -> LEFT JOIN swdist_item ON swdist_head.dist_id = swdist_item.dist_id;
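The difference between the two DELETE forms comes down to which rows the join produces. The following Python sketch uses the standard sqlite3 module to show this. SQLite has no multiple-table DELETE, so the sketch runs the two joins as SELECT statements and reports which parent IDs each one would reach. The three tiny tables are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE swdist_head (dist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE swdist_item (dist_id INTEGER, dist_file TEXT);
    CREATE TABLE tmp2 (dist_id INTEGER);
    INSERT INTO swdist_head VALUES (1, 'DB Gadgets'), (2, 'NetGizmo');
    INSERT INTO swdist_item VALUES (1, 'README');  -- distribution 2 has no files
    INSERT INTO tmp2 VALUES (1), (2);              -- both are to be deleted
""")

# Join form used by the first DELETE: a parent appears only if it has
# at least one matching child row.
inner = conn.execute("""
    SELECT DISTINCT swdist_head.dist_id
    FROM tmp2, swdist_head, swdist_item
    WHERE tmp2.dist_id = swdist_head.dist_id
      AND tmp2.dist_id = swdist_item.dist_id
    ORDER BY swdist_head.dist_id
""").fetchall()

# LEFT JOIN form: a parent appears whether or not it has children.
left = conn.execute("""
    SELECT DISTINCT swdist_head.dist_id
    FROM tmp2 LEFT JOIN swdist_head ON tmp2.dist_id = swdist_head.dist_id
              LEFT JOIN swdist_item ON swdist_head.dist_id = swdist_item.dist_id
    ORDER BY swdist_head.dist_id
""").fetchall()
```

Here the first join reaches only distribution 1, while the LEFT JOIN form reaches both 1 and 2. A childless parent that is listed in tmp2 would therefore survive the first DELETE but not the second.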

LEFT JOIN is discussed in Recipe 12.6.

12.21.5 Performing a Multiple-Table Delete Using Table Replacement

Another way to delete related rows from multiple tables is to select only the records that should not be deleted into new tables, then replace the original tables with the new ones. This is especially useful when you want to delete more records than you want to keep.

Begin by creating two tables tmp_head and tmp_item that have the same structure as the swdist_head and swdist_item tables:

    CREATE TABLE tmp_head
    (
        dist_id  INT UNSIGNED NOT NULL AUTO_INCREMENT, # distribution ID
        name     VARCHAR(40),                          # distribution name
        ver_num  NUMERIC(5,2),                         # version number
        rel_date DATE NOT NULL,                        # release date
        PRIMARY KEY (dist_id)
    );
    CREATE TABLE tmp_item
    (
        dist_id   INT UNSIGNED NOT NULL,   # parent distribution ID
        dist_file VARCHAR(255) NOT NULL    # name of file in distribution
    );

Then determine the IDs of the distributions you want to keep (that is, the most recent version of each distribution). The IDs are found as follows, using queries similar to those just described in the multiple-table delete section:

    mysql> CREATE TABLE tmp
        -> SELECT name, MAX(ver_num) AS newest
        -> FROM swdist_head
        -> GROUP BY name;
    mysql> CREATE TABLE tmp2
        -> SELECT swdist_head.dist_id
        -> FROM swdist_head, tmp
        -> WHERE swdist_head.name = tmp.name AND swdist_head.ver_num = tmp.newest;

Next, select into the new tables the records that should be retained:

    mysql> INSERT INTO tmp_head
        -> SELECT swdist_head.*
        -> FROM swdist_head, tmp2
        -> WHERE swdist_head.dist_id = tmp2.dist_id;
    mysql> INSERT INTO tmp_item
        -> SELECT swdist_item.*
        -> FROM swdist_item, tmp2
        -> WHERE swdist_item.dist_id = tmp2.dist_id;

Finally, replace the original tables with the new ones:

    mysql> DROP TABLE swdist_head;
    mysql> ALTER TABLE tmp_head RENAME TO swdist_head;
    mysql> DROP TABLE swdist_item;
    mysql> ALTER TABLE tmp_item RENAME TO swdist_item;

12.21.6 Performing a Multiple-Table Delete by Writing a Program

The preceding two methods for deleting related rows from multiple tables are SQL-only techniques. Another approach is to write a program that generates the DELETE statements for you. The program should determine the key values (the distribution IDs) for the records to delete, then process the keys to turn them into appropriate DELETE statements. Identifying the IDs can be done the same way as shown for the previous methods, but you have some latitude in how you want to use them to delete records:

• Handle each ID individually. Construct DELETE statements that remove records from the tables one ID at a time.
• Handle the IDs as a group. Construct an IN( ) clause that names all the IDs, and use it with each table to delete all the matching IDs at once.
• If the ID list is huge, break it into smaller groups to construct shorter IN( ) clauses.
• You can also solve the problem by reversing the perspective. Select the IDs for the distributions you want to retain and use them to construct a NOT IN( ) clause that deletes all the other distributions. This will usually be less efficient, because MySQL will not use an index for NOT IN( ) operations.

I'll show how to implement each method using Perl. For each of the first three methods, begin by generating a list of the distribution IDs for the records to be deleted:

    # Identify the newest version for each distribution name
    $dbh->do ("CREATE TABLE tmp
               SELECT name, MAX(ver_num) AS newest
               FROM swdist_head
               GROUP BY name");
    # Identify the IDs for versions that are older than those.
    my $ref = $dbh->selectcol_arrayref (
                "SELECT swdist_head.dist_id
                 FROM swdist_head, tmp
                 WHERE swdist_head.name = tmp.name
                 AND swdist_head.ver_num < tmp.newest");
    # selectcol_arrayref( ) returns a reference to a list. Convert the reference
    # to a list, which will be empty if $ref is undef or points to an empty list.
    my @val = ($ref ? @{$ref} : ( ));

At this point, @val contains the list of IDs for the records to remove. To process them individually, run the following loop:

    # Use the ID list to delete records, one ID at a time
    foreach my $val (@val)
    {
        $dbh->do ("DELETE FROM swdist_head WHERE dist_id = ?", undef, $val);
        $dbh->do ("DELETE FROM swdist_item WHERE dist_id = ?", undef, $val);
    }

The loop will generate statements that look like this:

    DELETE FROM swdist_head WHERE dist_id = '1'
    DELETE FROM swdist_item WHERE dist_id = '1'
    DELETE FROM swdist_head WHERE dist_id = '3'
    DELETE FROM swdist_item WHERE dist_id = '3'
    DELETE FROM swdist_head WHERE dist_id = '2'
    DELETE FROM swdist_item WHERE dist_id = '2'

A drawback of this approach is that for large tables, the ID list may be quite large and you'll generate lots of DELETE statements. To be more efficient, combine the IDs into a single IN( ) clause that names them all at once. Generate the ID list the same way as for the first method, then process the list like this: [4]

    # Use the ID list to delete records for all IDs at once. If the list
    # is empty, don't bother; there's nothing to delete.
    if (@val)
    {
        # generate list of comma-separated "?" placeholders, one per value
        my $where = "WHERE dist_id IN (" . join (",", ("?") x @val) . ")";
        $dbh->do ("DELETE FROM swdist_head $where", undef, @val);
        $dbh->do ("DELETE FROM swdist_item $where", undef, @val);
    }

[4] In Perl, you can't bind an array to a placeholder, but you can construct the query string to contain the proper number of ? characters (see Recipe 2.7). Then pass the array to be bound to the statement, and each element will be bound to the corresponding placeholder.

This method generates only one DELETE statement per table:

    DELETE FROM swdist_head WHERE dist_id IN ('1','3','2')
    DELETE FROM swdist_item WHERE dist_id IN ('1','3','2')

If the list of IDs is extremely large, you may be in danger of producing DELETE statements that exceed the maximum query length (a megabyte by default). In this case, you can break the ID list into smaller groups and use each one to construct a shorter IN( ) clause:

# Use the ID list to delete records, using parts of the list at a time.
my $grp_size = 1000;    # number of IDs to delete at once
for (my $i = 0; $i < @val; $i += $grp_size)
{
    my $j = (@val < $i + $grp_size ? @val : $i + $grp_size);
    my @group = @val[$i .. $j-1];
    # generate list of comma-separated "?" placeholders, one per value
    my $where = "WHERE dist_id IN (" . join (",", ("?") x @group) . ")";
    $dbh->do ("DELETE FROM swdist_head $where", undef, @group);
    $dbh->do ("DELETE FROM swdist_item $where", undef, @group);
}

Each of the preceding programming methods finds the IDs of the records to remove and then deletes them. You can also achieve the same objective using reverse logic: select the IDs for the records you want to keep, then delete everything else. This approach can be useful if you expect to retain fewer records than you'll delete. To implement it, determine the newest version for each distribution and find the associated IDs. Then use the ID list to construct a NOT IN( ) clause:

# Identify the newest version for each distribution name
$dbh->do ("CREATE TABLE tmp
            SELECT name, MAX(ver_num) AS newest
            FROM swdist_head
            GROUP BY name");
# Identify the IDs for those versions.
my $ref = $dbh->selectcol_arrayref (
            "SELECT swdist_head.dist_id
            FROM swdist_head, tmp
            WHERE swdist_head.name = tmp.name
            AND swdist_head.ver_num = tmp.newest");
# selectcol_arrayref( ) returns a reference to a list. Convert the reference
# to a list, which will be empty if $ref is undef or points to an empty list.
my @val = ($ref ? @{$ref} : ( ));
# Use the ID list to delete records for all *other* IDs at once.
# The WHERE clause is empty if the list is empty (in that case,
# no records are to be kept, so they all can be deleted).
my $where = "";
if (@val)
{
    # generate list of comma-separated "?" placeholders, one per value
    $where = "WHERE dist_id NOT IN (" . join (",", ("?") x @val) . ")";
}
$dbh->do ("DELETE FROM swdist_head $where", undef, @val);
$dbh->do ("DELETE FROM swdist_item $where", undef, @val);

Note that with this reverse-logic approach, you must use the entire ID list in a single NOT IN( ) clause. If you try breaking the list into smaller groups and using NOT IN( ) with each of those, you'll empty your tables completely when you don't intend to.
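The difference between grouped IN( ) and grouped NOT IN( ) lists can be checked without touching a database. The following is a minimal sketch in Python rather than Perl; the helper names in_clause and chunks are hypothetical, and the "?" marker simply mirrors the DBI placeholder used in the text (other database drivers may use a different marker). Sets stand in for table contents.

```python
def in_clause(column, ids, negate=False):
    """Build 'WHERE col IN (?,?,...)' with one placeholder per ID."""
    if not ids:
        raise ValueError("ID list must not be empty")
    op = "NOT IN" if negate else "IN"
    marks = ",".join(["?"] * len(ids))
    return "WHERE %s %s (%s)" % (column, op, marks)


def chunks(ids, size):
    """Yield successive groups of at most `size` IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


all_ids = [1, 2, 3, 4, 5]   # IDs present in the (simulated) table
keep = [3, 5]               # IDs that should survive

# Grouped IN(): the groups together remove exactly the listed IDs.
deleted = set()
for group in chunks([1, 2, 4], 2):
    deleted |= {i for i in all_ids if i in group}

# Grouped NOT IN(): each statement deletes everything outside its own
# group, so after the last group nothing is left.
remaining = set(all_ids)
for group in chunks(keep, 1):
    remaining = {i for i in remaining if i in group}

print(in_clause("dist_id", [1, 3, 2]))
print(sorted(deleted), sorted(remaining))
```

In this simulation the grouped IN( ) pass removes only IDs 1, 2, and 4, whereas the grouped NOT IN( ) pass leaves no rows at all, which is the failure the note above warns about.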

12.21.7 Performing a Multiple-Table Delete Using mysql

If the keys that indicate which records to delete do not include quotes or other special characters, you can generate DELETE statements using mysql. For the software distribution tables, the keys (dist_id values) are integers, so they're susceptible to this approach. Generate the ID list using the same queries as those described in the multiple-table DELETE section, then use the list to create the DELETE statements:

CREATE TABLE tmp
SELECT name, MAX(ver_num) AS newest FROM swdist_head GROUP BY name;
CREATE TABLE tmp2
SELECT swdist_head.dist_id
FROM swdist_head, tmp
WHERE swdist_head.name = tmp.name AND swdist_head.ver_num < tmp.newest;
SELECT CONCAT('DELETE FROM swdist_head WHERE dist_id=',dist_id,';') FROM tmp2;
SELECT CONCAT('DELETE FROM swdist_item WHERE dist_id=',dist_id,';') FROM tmp2;

If you have those statements in a file swdist_mysql_delete.sql, execute the file as follows to produce the set of DELETE statements:

% mysql -N cookbook < swdist_mysql_delete.sql > tmp

The file tmp will look like this:

DELETE FROM swdist_head WHERE dist_id=1;
DELETE FROM swdist_head WHERE dist_id=3;
DELETE FROM swdist_head WHERE dist_id=2;
DELETE FROM swdist_item WHERE dist_id=1;
DELETE FROM swdist_item WHERE dist_id=3;
DELETE FROM swdist_item WHERE dist_id=2;

Then execute the contents of tmp as follows:

% mysql cookbook < tmp
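The restriction to keys without quotes or special characters matters because the key values are pasted directly into the statement text. As a rough illustration (a Python sketch, not part of the recipes distribution; the function name delete_statements is hypothetical), a generator of this kind can refuse anything that is not an integer:

```python
def delete_statements(table, column, ids):
    """Return one DELETE statement per ID; only integer keys are accepted."""
    stmts = []
    for value in ids:
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("only integer keys are safe to interpolate")
        stmts.append("DELETE FROM %s WHERE %s=%d;" % (table, column, value))
    return stmts


for table in ("swdist_head", "swdist_item"):
    for stmt in delete_statements(table, "dist_id", [1, 3, 2]):
        print(stmt)
```

The printed statements have the same form as the contents of the tmp file shown above.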

12.22 Identifying and Removing Unattached Records

12.22.1 Problem

You have tables that are related (for example, they have a master-detail relationship). But you suspect that some of the records are unattached and can be removed.

12.22.2 Solution

Use a LEFT JOIN to identify unmatched values and delete them by adapting the techniques shown in Recipe 12.21. Or use a table-replacement procedure that selects the matched records into a new table and replaces the original table with it.

12.22.3 Discussion

The previous section shows how to delete related records from multiple tables at once, using the relationship that exists between the tables. Sometimes the opposite problem presents itself, where you want to delete records based on the lack of relationship. Problems of this kind typically occur when you have tables that are supposed to match up, but some of the records are unattached; that is, they are unmatched by any corresponding record in the other table.

This can occur by accident, such as when you delete a parent record but forget to delete the associated child records, or vice versa. It can also occur as an anticipated consequence of a deliberate action. Suppose an online discussion board uses a parent table that lists discussion topics and a child table that records the articles posted for each topic. If you purge the child table of old article records, that may result in any given topic record in the parent table no longer having any children. If so, the lack of recent postings for the topic indicates that it is probably dead and that the parent record in the topic table can be deleted, too. In such a situation, you delete a set of child records with the explicit recognition that the operation may strand parent records and cause them to become eligible for being deleted as well.

However you arrive at the point where related tables have unmatched records, restoring the tables to a consistent state is a matter of identifying the unattached records and then deleting them:

• To identify the unattached records, use a LEFT JOIN, because this is a "find unmatched records" problem. (See Recipe 12.6 for information about LEFT JOIN.)

• To delete the records that have the unmatched IDs, use techniques similar to those shown in Recipe 12.21, for removing records from multiple related tables.

The examples here use the swdist_head and swdist_item software distribution tables that were used in Recipe 12.21. Create the tables in their initial state using the swdist_create.sql script in the joins directory of the recipes distribution.

They'll look like this:

mysql> SELECT * FROM swdist_head;
+---------+------------+---------+------------+
| dist_id | name       | ver_num | rel_date   |
+---------+------------+---------+------------+
|       1 | DB Gadgets |    1.59 | 1996-03-25 |
|       2 | NetGizmo   |    3.02 | 1998-11-10 |
|       3 | DB Gadgets |    1.60 | 1998-12-26 |
|       4 | DB Gadgets |    1.61 | 1998-12-28 |
|       5 | NetGizmo   |    4.00 | 2001-08-04 |
+---------+------------+---------+------------+
mysql> SELECT * FROM swdist_item;
+---------+----------------+
| dist_id | dist_file      |
+---------+----------------+
|       1 | README         |
|       1 | db-gadgets.sh  |
|       3 | README         |
|       3 | README.linux   |
|       3 | db-gadgets.sh  |
|       4 | README         |
|       4 | README.linux   |
|       4 | README.solaris |
|       4 | db-gadgets.sh  |
|       2 | README.txt     |
|       2 | NetGizmo.exe   |
|       5 | README.txt     |
|       5 | NetGizmo.exe   |
+---------+----------------+

The records in the tables are fully matched at this point: For every dist_id value in the parent table, there is at least one child record, and each child record has a parent. To "damage" the integrity of this relationship for purposes of illustration, remove a few records from each table:

mysql> DELETE FROM swdist_head WHERE dist_id IN (1,4);
mysql> DELETE FROM swdist_item WHERE dist_id IN (2,5);

The result is that there are unattached records in both tables:

mysql> SELECT * FROM swdist_head;
+---------+------------+---------+------------+
| dist_id | name       | ver_num | rel_date   |
+---------+------------+---------+------------+
|       2 | NetGizmo   |    3.02 | 1998-11-10 |
|       3 | DB Gadgets |    1.60 | 1998-12-26 |
|       5 | NetGizmo   |    4.00 | 2001-08-04 |
+---------+------------+---------+------------+
mysql> SELECT * FROM swdist_item;
+---------+----------------+
| dist_id | dist_file      |
+---------+----------------+
|       1 | README         |
|       1 | db-gadgets.sh  |
|       3 | README         |
|       3 | README.linux   |
|       3 | db-gadgets.sh  |
|       4 | README         |
|       4 | README.linux   |
|       4 | README.solaris |
|       4 | db-gadgets.sh  |
+---------+----------------+

A little inspection reveals that only distribution 3 has records in both tables. Distributions 2 and 5 in the swdist_head table are unmatched by any records in the swdist_item table. Conversely, distributions 1 and 4 in the swdist_item table are unmatched by any records in the swdist_head table.

The problem now is to identify the unattached records (by some means other than visual inspection), and then remove them. Identification is a matter of using a LEFT JOIN. For example, to find childless parent records in the swdist_head table, use the following query:

mysql> SELECT swdist_head.dist_id AS 'unmatched swdist_head IDs'
    -> FROM swdist_head LEFT JOIN swdist_item
    -> ON swdist_head.dist_id = swdist_item.dist_id
    -> WHERE swdist_item.dist_id IS NULL;
+---------------------------+
| unmatched swdist_head IDs |
+---------------------------+
|                         2 |
|                         5 |
+---------------------------+

Conversely, to find the IDs for orphaned children in the swdist_item table that have no parent, reverse the roles of the two tables:

mysql> SELECT swdist_item.dist_id AS 'unmatched swdist_item IDs'
    -> FROM swdist_item LEFT JOIN swdist_head
    -> ON swdist_item.dist_id = swdist_head.dist_id
    -> WHERE swdist_head.dist_id IS NULL;
+---------------------------+
| unmatched swdist_item IDs |
+---------------------------+
|                         1 |
|                         1 |
|                         4 |
|                         4 |
|                         4 |
|                         4 |
+---------------------------+

Note that in this case, an ID will appear more than once in the list if there are multiple children for a missing parent. Depending on how you choose to delete the unmatched records, you may want to use DISTINCT to select each unmatched child ID only once:

mysql> SELECT DISTINCT swdist_item.dist_id AS 'unmatched swdist_item IDs'
    -> FROM swdist_item LEFT JOIN swdist_head
    -> ON swdist_item.dist_id = swdist_head.dist_id
    -> WHERE swdist_head.dist_id IS NULL;
+---------------------------+
| unmatched swdist_item IDs |
+---------------------------+
|                         1 |
|                         4 |
+---------------------------+

After you identify the unattached records, the question becomes how to get rid of them. You can use either of the following techniques, which you'll recognize as similar to those discussed in Recipe 12.21:



• Use the IDs in a multiple-table DELETE statement. You'll be removing records from just one table at a time, but the syntax for this form of DELETE is still useful because it allows you to identify the records to remove by means of a join between the related tables.

• Run a program that selects the unmatched IDs and uses them to generate DELETE statements.

To use a multiple-table DELETE statement for removing unmatched records, just take the SELECT statement that you use to identify those records and replace the stuff leading up to the FROM keyword with DELETE tbl_name. For example, the SELECT that identifies childless parents looks like this:

SELECT swdist_head.dist_id AS 'unmatched swdist_head IDs'
FROM swdist_head LEFT JOIN swdist_item
ON swdist_head.dist_id = swdist_item.dist_id
WHERE swdist_item.dist_id IS NULL;

The corresponding DELETE looks like this:

DELETE swdist_head
FROM swdist_head LEFT JOIN swdist_item
ON swdist_head.dist_id = swdist_item.dist_id
WHERE swdist_item.dist_id IS NULL;

Conversely, the query to identify parentless children is as follows:

SELECT swdist_item.dist_id AS 'unmatched swdist_item IDs'
FROM swdist_item LEFT JOIN swdist_head
ON swdist_item.dist_id = swdist_head.dist_id
WHERE swdist_head.dist_id IS NULL;

And the corresponding DELETE statement removes them:

DELETE swdist_item
FROM swdist_item LEFT JOIN swdist_head
ON swdist_item.dist_id = swdist_head.dist_id
WHERE swdist_head.dist_id IS NULL;

To remove unmatched records by writing a program, select the ID list and turn it into a set of DELETE statements. Here's a Perl program that does so, first for the parent table and then for the child table:

#! /usr/bin/perl -w

use strict;
use lib qw(/usr/local/apache/lib/perl);
use Cookbook;

my $dbh = Cookbook::connect ( );

# Identify the IDs of childless parent records

my $ref = $dbh->selectcol_arrayref (
            "SELECT swdist_head.dist_id
            FROM swdist_head LEFT JOIN swdist_item
            ON swdist_head.dist_id = swdist_item.dist_id
            WHERE swdist_item.dist_id IS NULL");

# selectcol_arrayref( ) returns a reference to a list. Convert the reference
# to a list, which will be empty if $ref is undef or points to an empty list.

my @val = ($ref ? @{$ref} : ( ));

# Use the ID list to delete records for all IDs at once. If the list
# is empty, don't bother; there's nothing to delete.

if (@val)
{
    # generate list of comma-separated "?" placeholders, one per value
    my $where = "WHERE dist_id IN (" . join (",", ("?") x @val) . ")";
    $dbh->do ("DELETE FROM swdist_head $where", undef, @val);
}

# Repeat the procedure for the child table. Use SELECT DISTINCT so that
# each ID is selected only once.

$ref = $dbh->selectcol_arrayref (
            "SELECT DISTINCT swdist_item.dist_id
            FROM swdist_item LEFT JOIN swdist_head
            ON swdist_item.dist_id = swdist_head.dist_id
            WHERE swdist_head.dist_id IS NULL");

@val = ($ref ? @{$ref} : ( ));

if (@val)
{
    # generate list of comma-separated "?" placeholders, one per value
    my $where = "WHERE dist_id IN (" . join (",", ("?") x @val) . ")";
    $dbh->do ("DELETE FROM swdist_item $where", undef, @val);
}

$dbh->disconnect ( );
exit (0);

The program uses IN( ) to delete all the affected records in a given table at once. See Recipe 12.21 for other related approaches.

You can also use mysql to generate the DELETE statements; a script that shows how to do this can be found in the joins directory of the recipes distribution.
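The same identify-then-delete pattern can be tried end to end without a MySQL server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module and an in-memory database instead of DBI and MySQL, trims the tables to the columns needed, and uses hypothetical helper names (unmatched_ids, delete_ids). The LEFT JOIN ... IS NULL query and the IN( ) list with one placeholder per value follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE swdist_head (dist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE swdist_item (dist_id INTEGER, dist_file TEXT);
    INSERT INTO swdist_head VALUES
        (2,'NetGizmo'),(3,'DB Gadgets'),(5,'NetGizmo');
    INSERT INTO swdist_item VALUES
        (1,'README'),(3,'README'),(4,'README');
""")


def unmatched_ids(conn, table, other):
    """Return distinct dist_id values in `table` that have no row in `other`."""
    sql = (
        "SELECT DISTINCT %s.dist_id FROM %s LEFT JOIN %s "
        "ON %s.dist_id = %s.dist_id WHERE %s.dist_id IS NULL "
        "ORDER BY 1" % (table, table, other, table, other, other)
    )
    return [row[0] for row in conn.execute(sql)]


def delete_ids(conn, table, ids):
    """Delete the rows of `table` whose dist_id is in `ids`, all at once."""
    if ids:  # an empty list means there is nothing to delete
        marks = ",".join(["?"] * len(ids))
        conn.execute(
            "DELETE FROM %s WHERE dist_id IN (%s)" % (table, marks), ids)


childless = unmatched_ids(conn, "swdist_head", "swdist_item")
orphans = unmatched_ids(conn, "swdist_item", "swdist_head")
delete_ids(conn, "swdist_head", childless)
delete_ids(conn, "swdist_item", orphans)
print(childless, orphans)
```

Table names are interpolated into the SQL text here only because they are fixed strings chosen by the program; the ID values themselves always travel through placeholders.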

A different type of solution to the problem is to use a table-replacement procedure. This method comes at the problem in reverse. Instead of finding and removing unmatched records, find and keep matched records. For example, you can use a join to select matched records into a new table. Then replace the original table with it. Unattached records don't get carried along by the join, and so in effect are removed when the new table replaces the original one.

The table replacement procedure works as follows. For the swdist_head table, create a new table with the same structure:

CREATE TABLE tmp
(
    dist_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,   # distribution ID
    name        VARCHAR(40),                            # distribution name
    ver_num     NUMERIC(5,2),                           # version number
    rel_date    DATE NOT NULL,                          # release date
    PRIMARY KEY (dist_id)
);

Then select into the tmp table those swdist_head records that have a match in the swdist_item table:

INSERT IGNORE INTO tmp
SELECT swdist_head.*
FROM swdist_head, swdist_item
WHERE swdist_head.dist_id = swdist_item.dist_id;

Note that the query uses INSERT IGNORE; a parent record may be matched by multiple child records, but we want only one instance of its ID. (The symptom of failing to use IGNORE is that the query will fail with a "duplicate key" error.)

Finish by replacing the original table with the new one:

DROP TABLE swdist_head;
ALTER TABLE tmp RENAME TO swdist_head;

The procedure for replacing the child table with a table containing only matched child records is similar, except that IGNORE is not needed, because each child that is matched will be matched by only one parent:

CREATE TABLE tmp
(
    dist_id     INT UNSIGNED NOT NULL,      # parent distribution ID
    dist_file   VARCHAR(255) NOT NULL       # name of file in distribution
);

INSERT INTO tmp
SELECT swdist_item.*
FROM swdist_head, swdist_item
WHERE swdist_head.dist_id = swdist_item.dist_id;

DROP TABLE swdist_item;
ALTER TABLE tmp RENAME TO swdist_item;
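To see why the parent-table step needs to ignore duplicates, the procedure can be rehearsed on a tiny dataset. This is only a sketch under stated assumptions: it runs against SQLite through Python's sqlite3 module rather than MySQL, so the statement is spelled INSERT OR IGNORE (SQLite's form) instead of INSERT IGNORE, and the tables are cut down to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE swdist_head (dist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE swdist_item (dist_id INTEGER, dist_file TEXT);
    INSERT INTO swdist_head VALUES (2,'NetGizmo'),(3,'DB Gadgets');
    INSERT INTO swdist_item VALUES
        (3,'README'),(3,'README.linux'),(4,'README');

    -- Keep only parents that have at least one child. Distribution 3
    -- matches two children, so the join yields its row twice; the
    -- second copy collides with the primary key and is skipped.
    CREATE TABLE tmp (dist_id INTEGER PRIMARY KEY, name TEXT);
    INSERT OR IGNORE INTO tmp
        SELECT swdist_head.* FROM swdist_head, swdist_item
        WHERE swdist_head.dist_id = swdist_item.dist_id;
    DROP TABLE swdist_head;
    ALTER TABLE tmp RENAME TO swdist_head;
""")

parents = conn.execute("SELECT dist_id, name FROM swdist_head").fetchall()
print(parents)
```

After the replacement, the childless parent (distribution 2) is gone and distribution 3 appears exactly once.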

12.23 Using Different MySQL Servers Simultaneously

12.23.1 Problem

You want to run a query that uses tables in databases that are hosted by different MySQL servers.

12.23.2 Solution

There is no SQL-only solution to this problem. One workaround is to open separate connections to each server and relate the information from the two tables yourself. Another is to copy one of the tables from one server to the other so that you can work with both tables using a single server.

12.23.3 Discussion

Throughout this chapter, I've been making the implicit assumption that all the tables involved in a multiple-table operation are managed by a single MySQL server. If this assumption is invalid, the tables become more difficult to work with. A connection to a MySQL server is specific to that server. You can't write a SQL statement that refers to tables hosted by another server. (I've seen claims that this can be done, but they always turn out to have been made by people who haven't actually tried it.)

Here is an example that illustrates the problem, using the artist and painting tables. Suppose you want to find the names of paintings by Da Vinci. This requires determining the ID for Da Vinci in the artist table and matching it to records in the painting table. If both tables are located within the same database, you can identify the paintings by using the following query to perform a join between the tables:

mysql> SELECT painting.title
    -> FROM artist, painting
    -> WHERE artist.name = 'Da Vinci' AND artist.a_id = painting.a_id;
+-----------------+
| title           |
+-----------------+
| The Last Supper |
| The Mona Lisa   |
+-----------------+

If the tables are in different databases, but still managed by the same MySQL server, the query need only be modified a bit to include database qualifiers. (This technique is discussed in Recipe 12.3.) For the two tables at hand, the query looks something like this:

mysql> SELECT db2.painting.title
    -> FROM db1.artist, db2.painting
    -> WHERE db1.artist.name = 'Da Vinci'
    -> AND db1.artist.a_id = db2.painting.a_id;
+-----------------+
| title           |
+-----------------+
| The Last Supper |
| The Mona Lisa   |
+-----------------+

If the artist and painting tables are managed by different servers, you cannot issue a single query to perform a join between them. You must send a query to one server to fetch the appropriate artist ID:

mysql> SELECT a_id FROM artist WHERE name = 'Da Vinci';
+------+
| a_id |
+------+
|    1 |
+------+

Then use that a_id value (1) to construct a second query that you send to the other server:

mysql> SELECT title FROM painting WHERE a_id = 1;
+-----------------+
| title           |
+-----------------+
| The Last Supper |
| The Mona Lisa   |
+-----------------+

The preceding discussion uses a relatively simple example, which has a correspondingly simple solution. It's simple because it retrieves only a single value from the first table, and because it displays information only from the second table. If you wanted instead to display the artist name with the painting title, and to do so for several artists, the problem becomes correspondingly more difficult. You might solve it by writing a program that simulates a join:

1. Open a separate connection to each database server.

2. Run a loop that fetches artist IDs and names from the server that manages the artist table.

3. Each time through the loop, use the current artist ID to construct a query that looks for painting table rows that match the artist ID value. Send the query to the server that manages the painting table. As you retrieve painting titles, display them along with the current artist name.

This technique allows simulation of a join between tables located on any two servers. Incidentally, it also can be used when you need to work with tables that are hosted by different types of database engines. (For example, you can simulate a join between a MySQL table and a PostgreSQL table this way.) However, it's still messy, so when faced with this kind of problem, you may wish to consider another alternative: copy one of the tables from one server to the other. Then you can work with both tables using the same server, which allows you to perform a proper join between them. See Recipe 10.17 for information on copying tables between servers.
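The three-step simulated join described above might look roughly like the following. This is a sketch, not the book's program: two independent in-memory SQLite connections (Python's sqlite3 module) stand in for connections to two different servers, the function name simulated_join is hypothetical, and the second artist row ('Artist B', 'Untitled') is placeholder data invented purely so that the loop has more than one artist to process.

```python
import sqlite3

# Step 1: a separate connection for each "server".
artist_srv = sqlite3.connect(":memory:")
painting_srv = sqlite3.connect(":memory:")
artist_srv.executescript("""
    CREATE TABLE artist (a_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO artist VALUES (1,'Da Vinci'),(2,'Artist B');
""")
painting_srv.executescript("""
    CREATE TABLE painting (a_id INTEGER, title TEXT);
    INSERT INTO painting VALUES
        (1,'The Last Supper'),(1,'The Mona Lisa'),(2,'Untitled');
""")


def simulated_join(artist_conn, painting_conn):
    """Return (artist name, painting title) pairs, joined on the client."""
    pairs = []
    # Step 2: loop over artist IDs and names from the first server.
    for a_id, name in artist_conn.execute(
            "SELECT a_id, name FROM artist ORDER BY a_id"):
        # Step 3: look up the matching paintings on the second server.
        for (title,) in painting_conn.execute(
                "SELECT title FROM painting WHERE a_id = ? ORDER BY title",
                (a_id,)):
            pairs.append((name, title))
    return pairs


result = simulated_join(artist_srv, painting_srv)
for name, title in result:
    print(name, "-", title)
```

Note that this issues one lookup query per artist, which is part of why the text calls the approach messy; copying one table to the other server lets the server do the join in a single query.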

Chapter 13. Statistical Techniques

Section 13.1. Introduction
Section 13.2. Calculating Descriptive Statistics
Section 13.3. Per-Group Descriptive Statistics
Section 13.4. Generating Frequency Distributions
Section 13.5. Counting Missing Values
Section 13.6. Calculating Linear Regressions or Correlation Coefficients
Section 13.7. Generating Random Numbers
Section 13.8. Randomizing a Set of Rows
Section 13.9. Selecting Random Items from a Set of Rows
Section 13.10. Assigning Ranks

13.1 Introduction

This chapter covers several topics that relate to basic statistical techniques. For the most part, these recipes build on those described in earlier chapters, such as the summary techniques discussed in Chapter 7. The examples here thus show additional ways to apply the material from those chapters. Broadly speaking, the topics discussed in this chapter include:

• Techniques for data characterization, such as calculating descriptive statistics, generating frequency distributions, counting missing values, and calculating least-squares regressions or correlation coefficients

• Randomization methods, such as how to generate random numbers and apply them to randomization of a set of rows or to selecting individual items randomly from the rows

• Rank assignments

Statistics covers such a large and diverse array of topics that this chapter necessarily only scratches the surface, and simply illustrates a few of the potential areas in which MySQL may be applied to statistical analysis. Note that some statistical measures can be defined in different ways (for example, do you calculate standard deviation based on n degrees of freedom, or n-1?). For that reason, if the definition I use for a given term doesn't match the one you prefer, you'll need to adapt the queries or algorithms shown here to some extent.

You can find scripts related to the examples discussed here in the stats directory of the recipes distribution, and scripts for creating some of the example tables in the tables directory.

13.2 Calculating Descriptive Statistics

13.2.1 Problem

You want to characterize a dataset by computing general descriptive or summary statistics.

13.2.2 Solution

Many common descriptive statistics, such as mean and standard deviation, can be obtained by applying aggregate functions to your data. Others, such as median or mode, can be calculated based on counting queries.

13.2.3 Discussion

Suppose you have a table testscore containing observations representing subject ID, age, sex, and test score:

mysql> SELECT subject, age, sex, score FROM testscore ORDER BY subject;
+---------+-----+-----+-------+
| subject | age | sex | score |
+---------+-----+-----+-------+
|       1 |   5 | M   |     5 |
|       2 |   5 | M   |     4 |
|       3 |   5 | F   |     6 |
|       4 |   5 | F   |     7 |
|       5 |   6 | M   |     8 |
|       6 |   6 | M   |     9 |
|       7 |   6 | F   |     4 |
|       8 |   6 | F   |     6 |
|       9 |   7 | M   |     8 |
|      10 |   7 | M   |     6 |
|      11 |   7 | F   |     9 |
|      12 |   7 | F   |     7 |
|      13 |   8 | M   |     9 |
|      14 |   8 | M   |     6 |
|      15 |   8 | F   |     7 |
|      16 |   8 | F   |    10 |
|      17 |   9 | M   |     9 |
|      18 |   9 | M   |     7 |
|      19 |   9 | F   |    10 |
|      20 |   9 | F   |     9 |
+---------+-----+-----+-------+

A good first step in analyzing a set of observations is to generate some descriptive statistics that summarize their general characteristics as a whole. Common statistical values of this kind include:

• The number of observations, their sum, and their range (minimum and maximum)

• Measures of central tendency, such as mean, median, and mode

• Measures of variation, such as standard deviation or variance

Aside from the median and mode, all of these can be calculated easily by invoking aggregate functions:

mysql> SELECT COUNT(score) AS n,
    -> SUM(score) AS sum,
    -> MIN(score) AS minimum,
    -> MAX(score) AS maximum,
    -> AVG(score) AS mean,
    -> STD(score) AS 'std. dev.'
    -> FROM testscore;
+----+------+---------+---------+--------+-----------+
| n  | sum  | minimum | maximum | mean   | std. dev. |
+----+------+---------+---------+--------+-----------+
| 20 |  146 |       4 |      10 | 7.3000 |    1.7916 |
+----+------+---------+---------+--------+-----------+

The aggregate functions as used in this query count only non-NULL observations. If you use NULL to represent missing values, you may want to perform an additional characterization to assess the extent to which values are missing. (See Recipe 13.5.)

Variance is not shown in the query, and MySQL has no function for calculating it. However, variance is just the square of the standard deviation, so it's easily computed like this:

STD(score) * STD(score)

STDDEV( ) is a synonym for STD( ).

Standard deviation can be used to identify outliers (values that are uncharacteristically far from the mean). For example, to select values that lie more than three standard deviations from the mean, you can do something like this:

SELECT @mean := AVG(score), @std := STD(score) FROM testscore;
SELECT score FROM testscore WHERE ABS(score-@mean) > @std * 3;
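If the scores have already been fetched into a program, the same outlier test can be applied on the client side. The following Python sketch is one way to do it; the function name outliers and its k parameter are hypothetical, and it uses statistics.pstdev( ), the population standard deviation (dividing by n), as the counterpart of STD( ).

```python
from statistics import mean, pstdev


def outliers(values, k=3):
    """Return the values lying more than k standard deviations from the mean."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (divides by n)
    return [v for v in values if abs(v - m) > k * s]


# The 20 scores from the testscore table, in subject order.
scores = [5, 4, 6, 7, 8, 9, 4, 6, 8, 6, 9, 7, 9, 6, 7, 10, 9, 7, 10, 9]

far = outliers(scores)             # the three-standard-deviation test
nearer = outliers(scores, k=1.5)   # a looser threshold, for comparison
print(far, nearer)
```

For the testscore data no score lies more than three standard deviations from the mean, so the first list is empty; the looser threshold picks up the scores of 4 and 10.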

For a set of n values, the standard deviation produced by STD( ) is based on n degrees of freedom. This is equivalent to computing the standard deviation "by hand" as follows (@ss represents the sum of squares):

mysql> SELECT
    -> @n := COUNT(score),
    -> @sum := SUM(score),
    -> @ss := SUM(score*score)
    -> FROM testscore;
mysql> SELECT @var := ((@n * @ss) - (@sum * @sum)) / (@n * @n);
mysql> SELECT SQRT(@var);
+------------+
| SQRT(@var) |
+------------+
|   1.791647 |
+------------+

To calculate a standard deviation based on n-1 degrees of freedom instead, do it like this:

mysql> SELECT
    -> @n := COUNT(score),
    -> @sum := SUM(score),
    -> @ss := SUM(score*score)
    -> FROM testscore;
mysql> SELECT @var := ((@n * @ss) - (@sum * @sum)) / (@n * (@n - 1));
mysql> SELECT SQRT(@var);
+------------+
| SQRT(@var) |
+------------+
|   1.838191 |
+------------+

Or, more simply, like this:

mysql> SELECT @n := COUNT(score) FROM testscore;
mysql> SELECT STD(score)*SQRT(@n/(@n-1)) FROM testscore;
+----------------------------+
| STD(score)*SQRT(@n/(@n-1)) |
+----------------------------+
|                   1.838191 |
+----------------------------+

MySQL has no built-in function for computing the mode or median of a set of values, but you can compute them yourself. The mode is the value that occurs most frequently. To determine what it is, count each value and see which one is most common:

mysql> SELECT score, COUNT(score) AS count
    -> FROM testscore GROUP BY score ORDER BY count DESC;
+-------+-------+
| score | count |
+-------+-------+
|     9 |     5 |
|     6 |     4 |
|     7 |     4 |
|     4 |     2 |
|     8 |     2 |
|    10 |     2 |
|     5 |     1 |
+-------+-------+

In this case, 9 is the modal score value.

The median of a set of ordered values can be calculated like this: [1]

• If the number of values is odd, the median is the middle value.

• If the number of values is even, the median is the average of the two middle values.

[1] Note that the definition of median given here isn't fully general; it doesn't address what to do if there's duplication of the middle values in the dataset.

Based on that definition, use the following procedure to determine the median of a set of observations stored in the database:

• Issue a query to count the number of observations. From the count, you can determine whether the median calculation requires one or two values, and what their indexes are within the ordered set of observations.

• Issue a query that includes an ORDER BY clause to sort the observations, and a LIMIT clause to pull out the middle value or values.

• Take the average of the selected value or values.

For example, if a table t contains a score column with 37 values (an odd number), you need to select a single value, using a query like this:

SELECT score FROM t ORDER BY 1 LIMIT 18,1

If the column contains 38 values (an even number), the query becomes:

SELECT score FROM t ORDER BY 1 LIMIT 18,2

Then you can select the value or values returned by the query and compute the median from their average.

The following Perl function implements a median calculation. It takes a database handle and the names of the table and column that contain the set of observations, then generates the query that retrieves the relevant values, and returns their average:

sub median
{
my ($dbh, $tbl_name, $col_name) = @_;
my ($count, $limit);

    $count = $dbh->selectrow_array ("SELECT COUNT($col_name) FROM $tbl_name");
    return undef unless $count > 0;
    if ($count % 2 == 1)    # odd number of values; select middle value
    {
        $limit = sprintf ("LIMIT %d,1", ($count-1)/2);
    }
    else                    # even number of values; select middle two values
    {
        $limit = sprintf ("LIMIT %d,2", $count/2 - 1);
    }

    my $sth = $dbh->prepare (
                "SELECT $col_name FROM $tbl_name ORDER BY 1 $limit");
    $sth->execute ( );
    my ($n, $sum) = (0, 0);
    while (my $ref = $sth->fetchrow_arrayref ( ))
    {
        ++$n;
        $sum += $ref->[0];
    }
    return ($sum / $n);
}

The preceding technique works for a set of values stored in the database. If you happen to have already fetched an ordered set of values into an array @val, you can compute the median like this instead:

if (@val == 0)          # if array is empty, median is undefined
{
    $median = undef;
}
elsif (@val % 2 == 1)   # if array size is odd, median is middle number
{
    $median = $val[(@val-1)/2];
}
else                    # array size is even; median is average
{                       # of two middle numbers
    $median = ($val[@val/2 - 1] + $val[@val/2]) / 2;
}

The code works for arrays that have an initial subscript of 0; for languages that use 1-based array indexes, adjust the algorithm accordingly.
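As a cross-check on the figures in this section, the same statistics can be computed on the client from a fetched list of scores. The sketch below is in Python rather than Perl, and the function name describe is hypothetical. It applies the sum-of-squares formulas shown earlier for both the n and n-1 definitions, takes the most frequent value as the mode, and uses the same odd/even rule for the median (with 0-based indexes).

```python
import math
from collections import Counter


def describe(values):
    """Summarize a non-empty list of numbers with at least two elements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(values)
    ss = sum(v * v for v in values)              # sum of squares
    numerator = n * ss - total * total
    ordered = sorted(values)
    if n % 2 == 1:                               # odd: the middle value
        median = ordered[(n - 1) // 2]
    else:                                        # even: mean of middle two
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return {
        "n": n,
        "sum": total,
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "mean": total / n,
        "std_n": math.sqrt(numerator / (n * n)),         # n degrees of freedom
        "std_n1": math.sqrt(numerator / (n * (n - 1))),  # n-1 degrees of freedom
        "mode": Counter(values).most_common(1)[0][0],
        "median": median,
    }


# The 20 scores from the testscore table, in subject order.
scores = [5, 4, 6, 7, 8, 9, 4, 6, 8, 6, 9, 7, 9, 6, 7, 10, 9, 7, 10, 9]
stats = describe(scores)
for key, value in stats.items():
    print(key, value)
```

For the testscore data this reproduces the values shown above: 20 observations summing to 146, a mean of 7.3, standard deviations of about 1.791647 and 1.838191, and a mode of 9. If several values tie for most frequent, most_common( ) reports only one of them, so treat the mode with care on such data.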

13.4 Generating Frequency Distributions 13.4.1 Problem You want t o know t he frequency of occurrence for each value in a t able.

13.4.2 Solution Derive a frequency dist ribut ion t hat sum m arizes t he cont ent s of your dat aset .

13.4.3 Discussion A com m on applicat ion for per- group sum m ary t echniques is t o generat e a breakdown of t he num ber of t im es each value occurs. This is called a frequency dist ribut ion. For t he

testscore t able, t he frequency

dist ribut ion looks like t his:

mysql> SELECT score, COUNT(score) AS occurrence
    -> FROM testscore GROUP BY score;
+-------+------------+
| score | occurrence |
+-------+------------+
|     4 |          2 |
|     5 |          1 |
|     6 |          4 |
|     7 |          4 |
|     8 |          2 |
|     9 |          5 |
|    10 |          2 |
+-------+------------+

If you express the results in terms of percentages rather than as counts, you produce a relative frequency distribution. To break down a set of observations and show each count as a percentage of the total, use one query to get the total number of observations, and another to calculate the percentages for each group:

mysql> SELECT @n := COUNT(score) FROM testscore;
mysql> SELECT score, (COUNT(score)*100)/@n AS percent
    -> FROM testscore GROUP BY score;
+-------+---------+
| score | percent |
+-------+---------+
|     4 |      10 |
|     5 |       5 |
|     6 |      20 |
|     7 |      20 |
|     8 |      10 |
|     9 |      25 |
|    10 |      10 |
+-------+---------+

The distributions just shown summarize the number of values for individual scores. However, if the dataset contains a large number of distinct values and you want a distribution that shows only a small number of categories, you may wish to lump values into categories and produce a count for each category. "Lumping" techniques are discussed in Recipe 7.13.

One typical use of frequency distributions is to export the results for use in a graphing program. In the absence of such a program, you can use MySQL itself to generate a simple ASCII chart as a visual representation of the distribution. For example, to display an ASCII bar chart of the test score counts, convert the counts to strings of * characters:

mysql> SELECT score, REPEAT('*',COUNT(score)) AS occurrences
    -> FROM testscore GROUP BY score;
+-------+-------------+
| score | occurrences |
+-------+-------------+
|     4 | **          |
|     5 | *           |
|     6 | ****        |
|     7 | ****        |
|     8 | **          |
|     9 | *****       |
|    10 | **          |
+-------+-------------+

To chart the relative frequency distribution instead, use the percentage values:

mysql> SELECT @n := COUNT(score) FROM testscore;
mysql> SELECT score, REPEAT('*',(COUNT(score)*100)/@n) AS percent
    -> FROM testscore GROUP BY score;
+-------+---------------------------+
| score | percent                   |
+-------+---------------------------+
|     4 | **********                |
|     5 | *****                     |
|     6 | ********************      |
|     7 | ********************      |
|     8 | **********                |
|     9 | ************************* |
|    10 | **********                |
+-------+---------------------------+

The ASCII chart method is fairly crude, obviously, but it's a quick way to get a picture of the distribution of observations, and it requires no other tools.

If you generate a frequency distribution for a range of categories where some of the categories are not represented in your observations, the missing categories will not appear in the output. To force each category to be displayed, use a reference table and a LEFT JOIN (a technique discussed in Recipe 12.10). For the testscore table, the possible scores range from 0 to 10, so a reference table should contain each of those values:

mysql> CREATE TABLE ref (score INT);
mysql> INSERT INTO ref (score)
    -> VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

Then join the reference table to the test scores to generate the frequency distribution:

mysql> SELECT ref.score, COUNT(testscore.score) AS occurrences
    -> FROM ref LEFT JOIN testscore ON ref.score = testscore.score
    -> GROUP BY ref.score;
+-------+-------------+
| score | occurrences |
+-------+-------------+
|     0 |           0 |
|     1 |           0 |
|     2 |           0 |
|     3 |           0 |
|     4 |           2 |
|     5 |           1 |
|     6 |           4 |
|     7 |           4 |
|     8 |           2 |
|     9 |           5 |
|    10 |           2 |
+-------+-------------+

This distribution includes rows for scores 0 through 3, none of which appear in the frequency distribution shown earlier.

The same principle applies to relative frequency distributions:

mysql> SELECT @n := COUNT(score) FROM testscore;
mysql> SELECT ref.score, (COUNT(testscore.score)*100)/@n AS percent
    -> FROM ref LEFT JOIN testscore ON ref.score = testscore.score
    -> GROUP BY ref.score;
+-------+---------+
| score | percent |
+-------+---------+
|     0 |       0 |
|     1 |       0 |
|     2 |       0 |
|     3 |       0 |
|     4 |      10 |
|     5 |       5 |
|     6 |      20 |
|     7 |      20 |
|     8 |      10 |
|     9 |      25 |
|    10 |      10 |
+-------+---------+
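If you want to cross-check a distribution outside the database, the same logic is easy to sketch in a few lines of Python. The following is an illustration only and is not part of the recipes distribution: it hard-codes the 20 score values listed for the testscore table in Recipe 13.6, and the function name frequency_distribution is a hypothetical choice. Iterating over every possible category plays the role of the reference table, so empty categories still appear with a count of zero.

```python
from collections import Counter

# The 20 score values from the testscore table (hard-coded for illustration).
scores = [5, 4, 6, 7, 8, 9, 4, 6, 8, 6, 9, 7, 9, 6, 7, 10, 9, 7, 10, 9]

def frequency_distribution(values, categories):
    """Return (category, count, percent) rows, one per category.

    Looping over `categories` instead of the observed values mimics the
    LEFT JOIN against a reference table: a category with no observations
    still yields a row, with a count of zero.
    """
    counts = Counter(values)
    total = len(values)
    return [(c, counts[c], counts[c] * 100.0 / total) for c in categories]

dist = frequency_distribution(scores, range(0, 11))
for score, count, percent in dist:
    # One asterisk per occurrence gives the same crude chart as REPEAT().
    print("%2d %2d %5.1f%% %s" % (score, count, percent, "*" * count))
```

The percentages are computed as floating-point values here; round or truncate them if you need output that matches the integer display shown above.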

13.5 Counting Missing Values

13.5.1 Problem
A set of observations is incomplete. You want to find out how much so.

13.5.2 Solution
Count the number of NULL values in the set.

13.5.3 Discussion
Values can be missing from a set of observations for any number of reasons: A test may not yet have been administered, something may have gone wrong during the test that requires invalidating the observation, and so forth. You can represent such observations in a dataset as NULL values to signify that they're missing or otherwise invalid, then use summary queries to characterize the completeness of the dataset.

If a table t contains values to be summarized along a single dimension, a simple summary will do to characterize the missing values. Suppose t looks like this:

mysql> SELECT subject, score FROM t ORDER BY subject;
+---------+-------+
| subject | score |
+---------+-------+
|       1 |    38 |
|       2 |  NULL |
|       3 |    47 |
|       4 |  NULL |
|       5 |    37 |
|       6 |    45 |
|       7 |    54 |
|       8 |  NULL |
|       9 |    40 |
|      10 |    49 |
+---------+-------+

COUNT(*) counts the total number of rows and COUNT(score) counts only the number of non-missing scores. The difference between the two is the number of missing scores, and that difference in relation to the total provides the percentage of missing scores. These calculations are expressed as follows:

mysql> SELECT COUNT(*) AS 'n (total)',
    -> COUNT(score) AS 'n (non-missing)',
    -> COUNT(*) - COUNT(score) AS 'n (missing)',
    -> ((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS '% missing'
    -> FROM t;
+-----------+-----------------+-------------+-----------+
| n (total) | n (non-missing) | n (missing) | % missing |
+-----------+-----------------+-------------+-----------+
|        10 |               7 |           3 |     30.00 |
+-----------+-----------------+-------------+-----------+

As an alternative to counting NULL values as the difference between counts, you can count them directly using SUM(ISNULL(score)). The ISNULL( ) function returns 1 if its argument is NULL, zero otherwise:

mysql> SELECT COUNT(*) AS 'n (total)',
    -> COUNT(score) AS 'n (non-missing)',
    -> SUM(ISNULL(score)) AS 'n (missing)',
    -> (SUM(ISNULL(score)) * 100) / COUNT(*) AS '% missing'
    -> FROM t;
+-----------+-----------------+-------------+-----------+
| n (total) | n (non-missing) | n (missing) | % missing |
+-----------+-----------------+-------------+-----------+
|        10 |               7 |           3 |     30.00 |
+-----------+-----------------+-------------+-----------+

If values are arranged in groups, occurrences of NULL values can be assessed on a per-group basis. Suppose t contains scores for subjects that are distributed among conditions for two factors A and B, each of which has two levels:

mysql> SELECT subject, A, B, score FROM t ORDER BY subject;
+---------+------+------+-------+
| subject | A    | B    | score |
+---------+------+------+-------+
|       1 |    1 |    1 |    18 |
|       2 |    1 |    1 |  NULL |
|       3 |    1 |    1 |    23 |
|       4 |    1 |    1 |    24 |
|       5 |    1 |    2 |    17 |
|       6 |    1 |    2 |    23 |
|       7 |    1 |    2 |    29 |
|       8 |    1 |    2 |    32 |
|       9 |    2 |    1 |    17 |
|      10 |    2 |    1 |  NULL |
|      11 |    2 |    1 |  NULL |
|      12 |    2 |    1 |    25 |
|      13 |    2 |    2 |  NULL |
|      14 |    2 |    2 |    33 |
|      15 |    2 |    2 |    34 |
|      16 |    2 |    2 |    37 |
+---------+------+------+-------+

In this case, the query uses a GROUP BY clause to produce a summary for each combination of conditions:

mysql> SELECT A, B, COUNT(*) AS 'n (total)',
    -> COUNT(score) AS 'n (non-missing)',
    -> COUNT(*) - COUNT(score) AS 'n (missing)',
    -> ((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS '% missing'
    -> FROM t
    -> GROUP BY A, B;
+------+------+-----------+-----------------+-------------+-----------+
| A    | B    | n (total) | n (non-missing) | n (missing) | % missing |
+------+------+-----------+-----------------+-------------+-----------+
|    1 |    1 |         4 |               3 |           1 |     25.00 |
|    1 |    2 |         4 |               4 |           0 |      0.00 |
|    2 |    1 |         4 |               2 |           2 |     50.00 |
|    2 |    2 |         4 |               3 |           1 |     25.00 |
+------+------+-----------+-----------------+-------------+-----------+
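The per-group arithmetic can also be checked in application code. The following Python sketch is an illustration only: it copies the 16 rows of the two-factor table into a list, uses None to stand in for NULL, and the function name missing_summary is a hypothetical choice. It assumes the rows have already been fetched; it does not query MySQL.

```python
from collections import defaultdict

# (subject, A, B, score) rows copied from the two-factor table.
# None stands in for SQL NULL.
rows = [
    (1, 1, 1, 18), (2, 1, 1, None), (3, 1, 1, 23), (4, 1, 1, 24),
    (5, 1, 2, 17), (6, 1, 2, 23), (7, 1, 2, 29), (8, 1, 2, 32),
    (9, 2, 1, 17), (10, 2, 1, None), (11, 2, 1, None), (12, 2, 1, 25),
    (13, 2, 2, None), (14, 2, 2, 33), (15, 2, 2, 34), (16, 2, 2, 37),
]

def missing_summary(rows):
    """Map each (A, B) group to (total, non-missing, missing, % missing)."""
    totals = defaultdict(int)    # like COUNT(*) per group
    missing = defaultdict(int)   # like SUM(ISNULL(score)) per group
    for _subject, a, b, score in rows:
        totals[(a, b)] += 1
        if score is None:
            missing[(a, b)] += 1
    summary = {}
    for group in sorted(totals):
        n, m = totals[group], missing[group]
        summary[group] = (n, n - m, m, m * 100.0 / n)
    return summary

summary = missing_summary(rows)
for (a, b), (n, present, absent, pct) in summary.items():
    print("A=%d B=%d total=%d non-missing=%d missing=%d %%missing=%.2f"
          % (a, b, n, present, absent, pct))
```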

13.6 Calculating Linear Regressions or Correlation Coefficients

13.6.1 Problem
You want to calculate the least-squares regression line for two variables, or the correlation coefficient that expresses the strength of the relationship between them.

13.6.2 Solution
Apply summary functions to calculate the necessary terms.

13.6.3 Discussion
When the data values for two variables X and Y are stored in a database, the least-squares regression for them can be calculated easily using aggregate functions. The same is true for the correlation coefficient. The two calculations are actually fairly similar, and many terms for performing the computations are common to the two procedures.

Suppose you want to calculate a least-squares regression using the age and test score values for the observations in the testscore table:

mysql> SELECT age, score FROM testscore;
+-----+-------+
| age | score |
+-----+-------+
|   5 |     5 |
|   5 |     4 |
|   5 |     6 |
|   5 |     7 |
|   6 |     8 |
|   6 |     9 |
|   6 |     4 |
|   6 |     6 |
|   7 |     8 |
|   7 |     6 |
|   7 |     9 |
|   7 |     7 |
|   8 |     9 |
|   8 |     6 |
|   8 |     7 |
|   8 |    10 |
|   9 |     9 |
|   9 |     7 |
|   9 |    10 |
|   9 |     9 |
+-----+-------+

A regression line is expressed as follows, where a and b are the intercept and slope of the line:

Y = bX + a

Letting age be X and score be Y, begin by computing the terms needed for the correlation equation. These include the number of observations, the means, sums, and sums of squares for each variable, and the sum of the products of each variable: [2]

[2] You can see where these terms come from by consulting any standard statistics text.

mysql> SELECT
    -> @n := COUNT(score) AS N,
    -> @meanX := AVG(age) AS "X mean",
    -> @sumX := SUM(age) AS "X sum",
    -> @sumXX := SUM(age*age) "X sum of squares",
    -> @meanY := AVG(score) AS "Y mean",
    -> @sumY := SUM(score) AS "Y sum",
    -> @sumYY := SUM(score*score) "Y sum of square",
    -> @sumXY := SUM(age*score) AS "X*Y sum"
    -> FROM testscore\G
*************************** 1. row ***************************
               N: 20
          X mean: 7.0000
           X sum: 140
X sum of squares: 1020
          Y mean: 7.3000
           Y sum: 146
 Y sum of square: 1130
         X*Y sum: 1053

From those terms, the regression slope and intercept are calculated as follows:

mysql> SELECT
    -> @b := (@n*@sumXY - @sumX*@sumY) / (@n*@sumXX - @sumX*@sumX)
    -> AS slope;
+-------+
| slope |
+-------+
| 0.775 |
+-------+
mysql> SELECT @a :=
    -> (@meanY - @b*@meanX)
    -> AS intercept;
+-----------+
| intercept |
+-----------+
|     1.875 |
+-----------+

The regression equation then is:

mysql> SELECT CONCAT('Y = ',@b,'X + ',@a) AS 'least-squares regression';
+--------------------------+
| least-squares regression |
+--------------------------+
| Y = 0.775X + 1.875       |
+--------------------------+

To compute the correlation coefficient, many of the same terms are used:

mysql> SELECT
    -> (@n*@sumXY - @sumX*@sumY)
    -> / SQRT((@n*@sumXX - @sumX*@sumX) * (@n*@sumYY - @sumY*@sumY))
    -> AS correlation;
+------------------+
| correlation      |
+------------------+
| 0.61173620442199 |
+------------------+
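To see how the summary terms fit together, it can help to redo the arithmetic in a short program. The Python sketch below is an illustration only: it hard-codes the 20 age and score pairs from the testscore table, the function name regression_terms is a hypothetical choice, and the local variable names simply mirror the SQL variables used above. It applies the same slope, intercept, and correlation formulas as the queries.

```python
import math

# age (X) and score (Y) pairs copied from the testscore table.
ages = [5] * 4 + [6] * 4 + [7] * 4 + [8] * 4 + [9] * 4
scores = [5, 4, 6, 7, 8, 9, 4, 6, 8, 6, 9, 7, 9, 6, 7, 10, 9, 7, 10, 9]

def regression_terms(xs, ys):
    """Return (slope, intercept, correlation) for paired observations."""
    n = float(len(xs))
    sum_x = sum(xs)                                  # @sumX
    sum_y = sum(ys)                                  # @sumY
    sum_xx = sum(x * x for x in xs)                  # @sumXX
    sum_yy = sum(y * y for y in ys)                  # @sumYY
    sum_xy = sum(x * y for x, y in zip(xs, ys))      # @sumXY

    # Numerator shared by the slope and the correlation coefficient.
    covar = n * sum_xy - sum_x * sum_y
    slope = covar / (n * sum_xx - sum_x * sum_x)
    intercept = sum_y / n - slope * (sum_x / n)      # meanY - b*meanX
    correlation = covar / math.sqrt(
        (n * sum_xx - sum_x * sum_x) * (n * sum_yy - sum_y * sum_y))
    return slope, intercept, correlation

slope, intercept, correlation = regression_terms(ages, scores)
print("Y = %gX + %g (r = %.6f)" % (slope, intercept, correlation))
```

Computing the shared numerator once makes it plain why the two calculations have so many terms in common.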

13.7 Generating Random Numbers

13.7.1 Problem
You need a source of random numbers.

13.7.2 Solution
Invoke MySQL's RAND( ) function.

13.7.3 Discussion
MySQL has a RAND( ) function that can be invoked to produce random numbers between 0 and 1:

mysql> SELECT RAND( ), RAND( ), RAND( );
+------------------+------------------+------------------+
| RAND( )          | RAND( )          | RAND( )          |
+------------------+------------------+------------------+
| 0.31466114177803 | 0.89354679723601 | 0.52375059157959 |
+------------------+------------------+------------------+

When invoked with an integer argument, RAND( ) uses that value to seed the random number generator. Each time you seed the generator with a given value, RAND( ) will produce a repeatable series of numbers:

mysql> SELECT RAND(1), RAND( ), RAND( );
+------------------+------------------+------------------+
| RAND(1)          | RAND( )          | RAND( )          |
+------------------+------------------+------------------+
| 0.18109050223705 | 0.75023211143001 | 0.20788908117254 |
+------------------+------------------+------------------+
mysql> SELECT RAND(20000000), RAND( ), RAND( );
+------------------+-------------------+------------------+
| RAND(20000000)   | RAND( )           | RAND( )          |
+------------------+-------------------+------------------+
| 0.24628307879556 | 0.020315642487552 | 0.36272900678472 |
+------------------+-------------------+------------------+
mysql> SELECT RAND(1), RAND( ), RAND( );
+------------------+------------------+------------------+
| RAND(1)          | RAND( )          | RAND( )          |
+------------------+------------------+------------------+
| 0.18109050223705 | 0.75023211143001 | 0.20788908117254 |
+------------------+------------------+------------------+
mysql> SELECT RAND(20000000), RAND( ), RAND( );
+------------------+-------------------+------------------+
| RAND(20000000)   | RAND( )           | RAND( )          |
+------------------+-------------------+------------------+
| 0.24628307879556 | 0.020315642487552 | 0.36272900678472 |
+------------------+-------------------+------------------+

If you want to seed RAND( ) randomly, pick a seed value based on a source of entropy. Possible sources are the current timestamp or connection identifier, alone or perhaps in combination:

mysql> SELECT RAND(UNIX_TIMESTAMP( )) AS rand1,
    -> RAND(CONNECTION_ID( )) AS rand2,
    -> RAND(UNIX_TIMESTAMP( )+CONNECTION_ID( )) AS rand3;
+------------------+------------------+------------------+
| rand1            | rand2            | rand3            |
+------------------+------------------+------------------+
| 0.50452774158169 | 0.18113064782799 | 0.50456789089792 |
+------------------+------------------+------------------+

However, it's probably better to use other seed value sources if you have them. For example, if your system has a /dev/random or /dev/urandom device, you can read the device and use it to generate a value for seeding RAND( ).
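One way to obtain such a seed from a program is sketched below in Python. This is an assumption-laden illustration rather than a recipe from the distribution: it relies on os.urandom( ), which draws from the operating system's entropy source (typically /dev/urandom on Unix-like systems), and the function name random_seed and the choice of a 32-bit seed are hypothetical. The sketch only builds the query string; issuing it over a connection is left to your own code.

```python
import os
import struct

def random_seed():
    """Return an unsigned 32-bit integer built from 4 bytes of OS entropy."""
    # "<I" unpacks exactly 4 bytes as a little-endian unsigned integer.
    (seed,) = struct.unpack("<I", os.urandom(4))
    return seed

seed = random_seed()

# The seed is an integer generated locally, so formatting it with %d
# cannot introduce quoting problems into the statement.
query = "SELECT RAND(%d)" % seed
print(query)
```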

13.8 Randomizing a Set of Rows

13.8.1 Problem
You want to randomize a set of rows or values.

13.8.2 Solution
Use ORDER BY RAND( ).

13.8.3 Discussion
MySQL's RAND( ) function can be used to randomize the order in which a query returns its rows. Somewhat paradoxically, this randomization is achieved by adding an ORDER BY clause to the query. The technique is roughly equivalent to a spreadsheet randomization method. Suppose you have a set of values in a spreadsheet that looks like this:

Patrick
Penelope
Pertinax
Polly

To place these in random order, first add another column that contains randomly chosen numbers:

Patrick     .73
Penelope    .37
Pertinax    .16
Polly       .48

Then sort the rows according to the values of the random numbers:

Pertinax    .16
Penelope    .37
Polly       .48
Patrick     .73

At this point, the original values have been placed in random order, because the effect of sorting the random numbers is to randomize the values associated with them. To rerandomize the values, choose another set of random numbers and sort the rows again.

In MySQL, a similar effect is achieved by associating a set of random numbers with a query result and sorting the result by those numbers. For MySQL 3.23.2 and up, this is done with an ORDER BY RAND( ) clause:

mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name     |
+----------+
| Pertinax |
| Penelope |
| Patrick  |
| Polly    |
+----------+
mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name     |
+----------+
| Patrick  |
| Pertinax |
| Penelope |
| Polly    |
+----------+

How Random Is RAND( )?

Does the RAND( ) function generate evenly distributed numbers? Check it out for yourself with the following Python script, rand_test.py, from the stats directory of the recipes distribution. It uses RAND( ) to generate random numbers and constructs a frequency distribution from them, using .1-sized categories. This provides a means of assessing how evenly distributed the values are.

#! /usr/bin/python
# rand_test.py - create a frequency distribution of RAND( ) values.
# This provides a test of the randomness of RAND( ).
# Method is to draw random numbers in the range from 0 to 1.0,
# and count how many of them occur in .1-sized intervals (0 up
# to .1, .1 up to .2, ..., .9 up *through* 1.0).

import sys
sys.path.insert (0, "/usr/local/apache/lib/python")
import MySQLdb
import Cookbook

npicks = 1000          # number of times to pick a number
bucket = [0] * 10      # one counter per .1-sized interval

conn = Cookbook.connect ( )
cursor = conn.cursor ( )

for i in range (0, npicks):
    cursor.execute ("SELECT RAND( )")
    (val,) = cursor.fetchone ( )
    slot = int (val * 10)
    if slot > 9:
        slot = 9       # put 1.0 in last slot
    bucket[slot] = bucket[slot] + 1

cursor.close ( )
conn.close ( )

# Print the resulting frequency distribution (all ten slots)
for slot in range (0, 10):
    print ("%2d  %d" % (slot+1, bucket[slot]))

sys.exit (0)

The stats directory also contains equivalent scripts in other languages.

For versions of MySQL older than 3.23.2, ORDER BY clauses cannot refer to expressions, so you cannot use RAND( ) there (see Recipe 6.4). As a workaround, add a column of random numbers to the column output list, alias it, and refer to the alias for sorting:

mysql> SELECT name, name*0+RAND( ) AS rand_num FROM t ORDER BY rand_num;
+----------+-------------------+
| name     | rand_num          |
+----------+-------------------+
| Penelope | 0.372227413926485 |
| Patrick  | 0.431537678867148 |
| Pertinax | 0.566524063764628 |
| Polly    | 0.715938107777329 |
+----------+-------------------+

Note that the expression for the random number column is name*0+RAND( ), not just RAND( ). If you try using the latter, the pre-3.23 MySQL optimizer notices that the column contains only a function, assumes that the function returns a constant value for each row, and optimizes the corresponding ORDER BY clause out of existence. As a result, no sorting is done. The workaround is to fool the optimizer by adding extra factors to the expression that don't change its value, but make the column look like a non-constant. The query just shown illustrates one easy way to do this: Take any column name, multiply it by zero, and add the result to RAND( ). Granted, it may seem a little strange to use name in a mathematical expression, because that column's values aren't numeric. That doesn't matter; MySQL sees the * multiplication operator and performs a string-to-number conversion of the name values before the multiply operation. The important thing is that the result of the multiplication is zero, which means that name*0+RAND( ) has the same value as RAND( ).

Applications for randomizing a set of rows include any scenario that uses selection without replacement (choosing each item from a set of items, until there are no more items left). Some examples of this are:



• Determining the starting order for participants in an event. List the participants in a table and select them in random order.
• Assigning starting lanes or gates to participants in a race. List the lanes in a table and select a random lane order.
• Choosing the order in which to present a set of quiz questions.
• Shuffling a deck of cards. Represent each card by a row in a table and shuffle the deck by selecting the rows in random order. Deal them one by one until the deck is exhausted.

To use the last example as an illustration, let's implement a card deck shuffling algorithm. Shuffling and dealing cards is randomization plus selection without replacement: each card is dealt once before any is dealt twice; when the deck is used up, it is reshuffled to re-randomize it for a new dealing order. Within a program, this task can be performed with MySQL using a table deck that has 52 rows, assuming a set of cards with each combination of 13 face values and 4 suits:

• Select the entire table and store it into an array.
• Each time a card is needed, take the next element from the array.
• When the array is exhausted, all the cards have been dealt. "Reshuffle" the table to generate a new card order.

Setting up the deck table is a tedious task if you insert the 52 card records by writing out all the INSERT statements manually. The deck contents can be generated more easily in combinatorial fashion within a program by generating each pairing of face value with suit. Here's some PHP code that creates a deck table with face and suit columns, then populates the table using nested loops to generate the pairings for the INSERT statements:

mysql_query ("
    CREATE TABLE deck
    (
        face ENUM('A', 'K', 'Q', 'J', '10', '9', '8', '7', '6', '5', '4', '3', '2') NOT NULL,
        suit ENUM('hearts', 'diamonds', 'clubs', 'spades') NOT NULL
    )", $conn_id)
    or die ("Cannot issue CREATE TABLE statement\n");

$face_array = array ("A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2");
$suit_array = array ("hearts", "diamonds", "clubs", "spades");

# insert a "card" into the deck for each combination of suit and face
reset ($face_array);
while (list ($index, $face) = each ($face_array))
{
    reset ($suit_array);
    while (list ($index2, $suit) = each ($suit_array))
    {
        mysql_query ("INSERT INTO deck (face,suit) VALUES('$face','$suit')", $conn_id)
            or die ("Cannot insert card into deck\n");
    }
}

Shuffling the cards is a matter of issuing this statement:

SELECT face, suit FROM deck ORDER BY RAND( );

To do that and store the results in an array within a script, write a shuffle_deck( ) function that issues the query and returns the resulting values in an array (again shown in PHP):

function shuffle_deck ($conn_id)
{
    $query = "SELECT face, suit FROM deck ORDER BY RAND( )";
    $result_id = mysql_query ($query, $conn_id)
        or die ("Cannot retrieve cards from deck\n");
    $card = array ( );
    while ($obj = mysql_fetch_object ($result_id))
        $card[ ] = $obj;    # add card record to end of $card array
    mysql_free_result ($result_id);
    return ($card);
}

Deal the cards by keeping a counter that ranges from 0 to 51 to indicate which card to select. When the counter reaches 52, the deck is exhausted and should be shuffled again.
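The dealing logic with its counter can be sketched independently of the database. The Python fragment below is an illustration under stated assumptions, not code from the recipes distribution: random.shuffle( ) stands in for the ORDER BY RAND( ) query so that the example runs without a MySQL connection, and the names build_deck, shuffle_deck, and Dealer are hypothetical. In a real application, shuffle_deck( ) would issue the SELECT statement shown above instead.

```python
import random

FACES = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

def build_deck():
    """Generate each pairing of face value with suit (52 cards)."""
    return [(face, suit) for face in FACES for suit in SUITS]

def shuffle_deck(deck):
    """Return the cards in random order.

    Stand-in for: SELECT face, suit FROM deck ORDER BY RAND( )
    """
    cards = list(deck)
    random.shuffle(cards)
    return cards

class Dealer:
    """Deal without replacement; reshuffle when the deck is used up."""

    def __init__(self):
        self.deck = build_deck()
        self.cards = shuffle_deck(self.deck)
        self.next_card = 0           # counter ranging from 0 to 51

    def deal(self):
        if self.next_card >= len(self.cards):
            # Deck exhausted: reshuffle and start over.
            self.cards = shuffle_deck(self.deck)
            self.next_card = 0
        card = self.cards[self.next_card]
        self.next_card += 1
        return card

dealer = Dealer()
first_pass = [dealer.deal() for _ in range(52)]
print("First card dealt: %s of %s" % first_pass[0])
```

Every card appears exactly once in first_pass; the 53rd call to deal( ) triggers a reshuffle and begins a new dealing order.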

13.9 Selecting Random Items from a Set of Rows

13.9.1 Problem
You want to pick an item or items randomly from a set of values.

13.9.2 Solution
Randomize the values, then pick the first one (or the first few, if you need more than one).

13.9.3 Discussion
When a set of items is stored in MySQL, you can choose one at random as follows:

• Select the items in the set in random order, using ORDER BY RAND( ) as described in Recipe 13.8.
• Add LIMIT 1 to the query to pick the first item.

For example, a simple simulation of tossing a die can be performed by creating a die table containing rows with values from 1 to 6 corresponding to the six faces of a die cube, then picking rows from it at random:

mysql> SELECT n FROM die ORDER BY RAND( ) LIMIT 1;
+------+
| n    |
+------+
|    6 |
+------+
mysql> SELECT n FROM die ORDER BY RAND( ) LIMIT 1;
+------+
| n    |
+------+
|    4 |
+------+
mysql> SELECT n FROM die ORDER BY RAND( ) LIMIT 1;
+------+
| n    |
+------+
|    5 |
+------+
mysql> SELECT n FROM die ORDER BY RAND( ) LIMIT 1;
+------+
| n    |
+------+
|    4 |
+------+

As you repeat this operation, you pick a random sequence of items from the set. This is a form of selection with replacement: An item is chosen from a pool of items, then returned to the pool for the next pick. Because items are replaced, it's possible to pick the same item multiple times when making successive choices this way. Other examples of selection with replacement include:

• Selecting a banner ad to display on a web page.
• Picking a row for a "quote of the day" application.
• "Pick a card, any card" magic tricks that begin with a full deck of cards each time.

If you want to pick more than one item, change the LIMIT argument. For example, to draw five winning entries at random from a table named drawing that contains contest entries, use RAND( ) in combination with LIMIT:

SELECT * FROM drawing ORDER BY RAND( ) LIMIT 5;

A special case occurs when you're picking a single row from a table that you know contains a column with values in the range from 1 to n in unbroken sequence. Under these circumstances, it's possible to avoid performing an ORDER BY operation on the entire table by picking a random number in that range and selecting the matching row:

SET @id = FLOOR(RAND( )*n)+1;
SELECT ... FROM tbl_name WHERE id = @id;

This will be much quicker than ORDER BY RAND( ) LIMIT 1 as the table size increases.
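The FLOOR(RAND( )*n)+1 expression is worth a quick sanity check, because an off-by-one error would either miss a row or select a nonexistent one. The Python sketch below is an illustration only: random.random( ) stands in for RAND( ), on the assumption that both return values greater than or equal to 0 and less than 1, and the function name random_id is hypothetical. It confirms that the result always falls in the range from 1 to n.

```python
import math
import random

def random_id(n):
    """Pick an integer in the range 1..n, as FLOOR(RAND( )*n)+1 does.

    random.random() returns a value r with 0 <= r < 1, so
    floor(r * n) ranges over 0..n-1 and adding 1 shifts it to 1..n.
    """
    return int(math.floor(random.random() * n)) + 1

# Simulate many die tosses (n = 6) and tally which faces come up.
picks = [random_id(6) for _ in range(10000)]
for face in range(1, 7):
    print("%d: %d" % (face, picks.count(face)))
```

Remember that the shortcut is valid only when the id values form an unbroken sequence; if rows have been deleted, some picks will match no row at all.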

13.10 Assigning Ranks

13.10.1 Problem
You want to assign ranks to a set of values.

13.10.2 Solution
Decide on a ranking method, then put the values in the desired order and apply the method to them.

13.10.3 Discussion
Some kinds of statistical tests require assignment of ranks. I'll describe three ranking methods and show how each can be implemented using SQL variables. The examples assume that a table t contains the following scores, which are to be ranked with the values in descending order:

mysql> SELECT score FROM t ORDER BY score DESC;
+-------+
| score |
+-------+
|     5 |
|     4 |
|     4 |
|     3 |
|     2 |
|     2 |
|     2 |
|     1 |
+-------+

One type of ranking simply assigns each value its row number within the ordered set of values. To produce such rankings, keep track of the row number and use it for the current rank:

mysql> SET @rownum := 0;
mysql> SELECT @rownum := @rownum + 1 AS rank, score
    -> FROM t ORDER BY score DESC;
+------+-------+
| rank | score |
+------+-------+
|    1 |     5 |
|    2 |     4 |
|    3 |     4 |
|    4 |     3 |
|    5 |     2 |
|    6 |     2 |
|    7 |     2 |
|    8 |     1 |
+------+-------+

That kind of ranking doesn't take into account the possibility of ties (instances of values that are the same). A second ranking method does so by advancing the rank only when values change:

mysql> SET @rank = 0, @prev_val = NULL;
mysql> SELECT @rank := IF(@prev_val=score,@rank,@rank+1) AS rank,
    -> @prev_val := score AS score
    -> FROM t ORDER BY score DESC;
+------+-------+
| rank | score |
+------+-------+
|    1 |     5 |
|    2 |     4 |
|    2 |     4 |
|    3 |     3 |
|    4 |     2 |
|    4 |     2 |
|    4 |     2 |
|    5 |     1 |
+------+-------+

A third ranking method is something of a combination of the other two methods. It ranks values by row number, except when ties occur. In that case, the tied values each get a rank equal to the row number of the first of the values. To implement this method, keep track of the row number and the previous value, advancing the rank to the current row number when the value changes:

mysql> SET @rownum = 0, @rank = 0, @prev_val = NULL;
mysql> SELECT @rownum := @rownum + 1 AS row,
    -> @rank := IF(@prev_val!=score,@rownum,@rank) AS rank,
    -> @prev_val := score AS score
    -> FROM t ORDER BY score DESC;
+------+------+-------+
| row  | rank | score |
+------+------+-------+
|    1 |    1 |     5 |
|    2 |    2 |     4 |
|    3 |    2 |     4 |
|    4 |    4 |     3 |
|    5 |    5 |     2 |
|    6 |    5 |     2 |
|    7 |    5 |     2 |
|    8 |    8 |     1 |
+------+------+-------+

Ranks are easy to assign within a program as well. For example, the following PHP fragment ranks the scores in t using the third ranking method:

$result_id = mysql_query ("SELECT score FROM t ORDER BY score DESC", $conn_id)
    or die ("Cannot select scores\n");
$rownum = 0;
$rank = 0;
unset ($prev_score);
print ("Row\tRank\tScore\n");
while (list ($score) = mysql_fetch_row ($result_id))
{
    ++$rownum;
    if ($rownum == 1 || $prev_score != $score)
        $rank = $rownum;
    print ("$rownum\t$rank\t$score\n");
    $prev_score = $score;
}
mysql_free_result ($result_id);

The third type of ranking is commonly used outside the realm of statistical methods. Recall that in Recipe 3.19, we used a table al_winner that contains the top 15 winning pitchers in the American League for 2001:

mysql> SELECT name, wins FROM al_winner ORDER BY wins DESC, name;
+----------------+------+
| name           | wins |
+----------------+------+
| Mulder, Mark   |   21 |
| Clemens, Roger |   20 |
| Moyer, Jamie   |   20 |
| Garcia, Freddy |   18 |
| Hudson, Tim    |   18 |
| Abbott, Paul   |   17 |
| Mays, Joe      |   17 |
| Mussina, Mike  |   17 |
| Sabathia, C.C. |   17 |
| Zito, Barry    |   17 |
| Buehrle, Mark  |   16 |
| Milton, Eric   |   15 |
| Pettitte, Andy |   15 |
| Radke, Brad    |   15 |
| Sele, Aaron    |   15 |
+----------------+------+

These pitchers can be assigned ranks using the third method as follows:

mysql> SET @rownum = 0, @rank = 0, @prev_val = NULL;
mysql> SELECT @rownum := @rownum + 1 AS row,
    -> @rank := IF(@prev_val!=wins,@rownum,@rank) AS rank,
    -> name,
    -> @prev_val := wins AS wins
    -> FROM al_winner ORDER BY wins DESC;
+------+------+----------------+------+
| row  | rank | name           | wins |
+------+------+----------------+------+
|    1 |    1 | Mulder, Mark   |   21 |
|    2 |    2 | Clemens, Roger |   20 |
|    3 |    2 | Moyer, Jamie   |   20 |
|    4 |    4 | Garcia, Freddy |   18 |
|    5 |    4 | Hudson, Tim    |   18 |
|    6 |    6 | Abbott, Paul   |   17 |
|    7 |    6 | Mays, Joe      |   17 |
|    8 |    6 | Mussina, Mike  |   17 |
|    9 |    6 | Sabathia, C.C. |   17 |
|   10 |    6 | Zito, Barry    |   17 |
|   11 |   11 | Buehrle, Mark  |   16 |
|   12 |   12 | Milton, Eric   |   15 |
|   13 |   12 | Pettitte, Andy |   15 |
|   14 |   12 | Radke, Brad    |   15 |
|   15 |   12 | Sele, Aaron    |   15 |
+------+------+----------------+------+
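For comparison, here are all three ranking methods side by side in Python. This is a sketch for illustration, not code from the recipes distribution: the function names are hypothetical, and each function assumes its input list is already sorted in the desired order, just as the queries rely on ORDER BY. Like the PHP fragment, the tie-aware functions treat the first row as a special case rather than comparing it with an undefined previous value.

```python
def rank_by_row(values):
    """Method 1: the rank is simply the row number."""
    return list(range(1, len(values) + 1))

def rank_on_change(values):
    """Method 2: advance the rank by one only when the value changes."""
    ranks, rank, prev = [], 0, None
    for rownum, value in enumerate(values, start=1):
        if rownum == 1 or value != prev:
            rank += 1
        ranks.append(rank)
        prev = value
    return ranks

def rank_by_first_row_of_tie(values):
    """Method 3: tied values share the row number of the first of them."""
    ranks, rank, prev = [], 0, None
    for rownum, value in enumerate(values, start=1):
        if rownum == 1 or value != prev:
            rank = rownum
        ranks.append(rank)
        prev = value
    return ranks

scores = [5, 4, 4, 3, 2, 2, 2, 1]    # already in descending order
wins = [21, 20, 20, 18, 18, 17, 17, 17, 17, 17, 16, 15, 15, 15, 15]

print(rank_by_row(scores))
print(rank_on_change(scores))
print(rank_by_first_row_of_tie(scores))
print(rank_by_first_row_of_tie(wins))
```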

Chapter 14. Handling Duplicates

Section 14.1. Introduction
Section 14.2. Preventing Duplicates from Occurring in a Table
Section 14.3. Dealing with Duplicates at Record-Creation Time
Section 14.4. Counting and Identifying Duplicates
Section 14.5. Eliminating Duplicates from a Query Result
Section 14.6. Eliminating Duplicates from a Self-Join Result
Section 14.7. Eliminating Duplicates from a Table

14.1 Introduction
Tables or result sets sometimes contain duplicate records. In some cases this is acceptable. For example, if you conduct a web poll that records dates and client IP numbers along with the votes, duplicate records may be allowable, because it's possible for large numbers of votes to appear to originate from the same IP number for an Internet service that routes traffic from its customers through a single proxy host. In other cases, duplicates will be unacceptable, and you'll want to take steps to avoid them. Operations related to handling of duplicate records include the following:

• Counting the number of duplicates to determine whether they occur and to what extent.
• Identifying duplicated values (or the records containing them) so you can see what they are and where they occur.
• Eliminating duplicates to ensure that each record is unique. This may involve removing rows from a table to leave only unique records. Or it may involve selecting a result set in such a way that no duplicates appear in the output. (For example, to display a list of the states in which you have customers, you probably wouldn't want a long list of state names from all customer records. A list showing each state name only once suffices and is easier to understand.)
• Preventing duplicates from being created within a table in the first place. If each record in a table is intended to represent a single entity (such as a person, an item in a catalog, or a specific observation in an experiment), the occurrence of duplicates presents significant difficulties in using it that way. Duplicates make it impossible to refer to some records in the table unambiguously, so it's best to make sure duplicates never occur.

Several tools are at your disposal for dealing with duplicate records. These can be chosen according to the objective you're trying to achieve:

• Creating a table to include a unique index will prevent duplicates from being added to the table. MySQL will use the index to enforce the requirement that each record in the table contains a unique key in the indexed column or columns.
• In conjunction with a unique index, the INSERT IGNORE and REPLACE statements allow you to handle insertion of duplicate records gracefully without generating errors. For bulk-loading operations, the same options are available in the form of the IGNORE or REPLACE modifiers for the LOAD DATA statement.
• If you need to determine whether or not a table contains duplicates, GROUP BY categorizes rows into groups, and COUNT( ) shows how many rows are in each group. These are described in Chapter 7 in the context of producing summaries, but they're useful for duplicate counting and identification as well. After all, a counting summary is essentially an operation that groups values into categories to determine how frequently each occurs.
• SELECT DISTINCT is useful for removing duplicate rows from a result set to leave only unique records. Adding a unique index to a table can remove duplicates that are present in the table. If you determine that there are n identical records in a table, you can use DELETE ... LIMIT to eliminate n-1 instances from that specific set of rows.

This chapter describes how each of these techniques applies to duplicate identification and removal, but before proceeding further, I should define what "duplicate" means here. When people say "duplicate record," they may mean different things. For purposes of this chapter, one record is a duplicate of another if both rows contain the same values in columns that are supposed to distinguish them. Consider the following table:

mysql> SELECT * FROM person;
+------+-----------+------------+---------------+------+
| id   | last_name | first_name | address       | age  |
+------+-----------+------------+---------------+------+
|    1 | Smith     | Jim        | 428 Mill Road |   36 |
|    2 | Smith     | Joan       | 428 Mill Road |   36 |
|    3 | Smith     | Junior     | 428 Mill Road |   12 |
+------+-----------+------------+---------------+------+

None of these records are duplicates if you compare rows using all the columns, because then the records contain the id and first_name columns, each of which happens to contain only unique values. However, if you look only at the last_name or address columns, all the records contain duplicated values. Lying between these extremes, a result set consisting of the age column contains a mix of unique and duplicated values.

Scripts related to the examples shown in this chapter are located in the dups directory of the recipes distribution. For scripts that create the tables used here, look in the tables directory.

14.2 Preventing Duplicates from Occurring in a Table

14.2.1 Problem

You want to prevent a table from ever containing duplicates, so that you won't have to worry about eliminating them later.

14.2.2 Solution

Use a PRIMARY KEY or a UNIQUE index.

14.2.3 Discussion

To make sure that records in a table are unique, some column or combination of columns must be required to contain unique values in each row. When this requirement is satisfied, you can refer to any record in the table unambiguously using its unique identifier. To make sure a table has this characteristic, include a PRIMARY KEY or UNIQUE index in the table structure when you create the table. The following table contains no such index, so it would allow duplicate records:

CREATE TABLE person
(
    last_name   CHAR(20),
    first_name  CHAR(20),
    address     CHAR(40)
);

To prevent multiple records with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, it's also necessary to declare the indexed columns to be NOT NULL, because a PRIMARY KEY does not allow NULL values:

CREATE TABLE person
(
    last_name   CHAR(20) NOT NULL,
    first_name  CHAR(20) NOT NULL,
    address     CHAR(40),
    PRIMARY KEY (last_name, first_name)
);

The presence of a unique index in a table normally causes an error to occur if you insert a record into the table that duplicates an existing record in the column or columns that define the index. Recipe 14.3 discusses how to handle such errors or modify MySQL's duplicate-handling behavior.

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table. The two types of indexes behave alike, except that a UNIQUE index can be created on columns that allow NULL values. For the person table, it's likely that you'd require both the first and last names to be filled in. If so, you'd still declare the columns as NOT NULL, and the following table declaration would be effectively equivalent to the preceding one:

CREATE TABLE person
(
    last_name   CHAR(20) NOT NULL,
    first_name  CHAR(20) NOT NULL,
    address     CHAR(40),
    UNIQUE (last_name, first_name)
);

If a UNIQUE index does happen to allow NULL values, NULL is special because it is the one value that can occur multiple times. The rationale for this is that it is not possible to know whether one unknown value is the same as another, so multiple unknown values are allowed.

It may of course be that you'd want the person table to reflect the real world, in which people do sometimes have the same name. In this case, you cannot set up a unique index based on the name columns, because duplicate names must be allowed. Instead, each person must be assigned some sort of unique identifier, which becomes the value that distinguishes one record from another. In MySQL, a common technique for this is the AUTO_INCREMENT column:

CREATE TABLE person
(
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    last_name   CHAR(20),
    first_name  CHAR(20),
    address     CHAR(40),
    PRIMARY KEY (id)
);

In this case, when you create a record with an id value of NULL, MySQL assigns that column a unique ID automatically. Another possibility is to assign identifiers externally and use those IDs as unique keys. For example, citizens in a given country might have unique taxpayer ID numbers. If so, those numbers can serve as the basis for a unique index:

CREATE TABLE person
(
    tax_id      INT UNSIGNED NOT NULL,
    last_name   CHAR(20),
    first_name  CHAR(20),
    address     CHAR(40),
    PRIMARY KEY (tax_id)
);
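The two behaviors discussed in this recipe (a unique key rejects a repeated value, while a UNIQUE index on nullable columns accepts repeated NULLs) can be tried out with a short Python sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for a MySQL server, and the table name person_nullable is hypothetical. SQLite happens to treat NULL in unique indexes the same way, but its error messages and column types differ from MySQL's.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")

# Table keyed on the name columns, as in the PRIMARY KEY example.
conn.execute("""
    CREATE TABLE person (
        last_name  CHAR(20) NOT NULL,
        first_name CHAR(20) NOT NULL,
        address    CHAR(40),
        PRIMARY KEY (last_name, first_name)
    )
""")
conn.execute("INSERT INTO person (last_name, first_name) VALUES ('X1', 'Y1')")

try:
    # Same key values again: the unique index should refuse the row.
    conn.execute("INSERT INTO person (last_name, first_name) VALUES ('X1', 'Y1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Hypothetical table with a UNIQUE index on columns that allow NULL.
conn.execute("""
    CREATE TABLE person_nullable (
        last_name  CHAR(20),
        first_name CHAR(20),
        UNIQUE (last_name, first_name)
    )
""")
conn.execute("INSERT INTO person_nullable VALUES (NULL, NULL)")
conn.execute("INSERT INTO person_nullable VALUES (NULL, NULL)")
null_rows = conn.execute("SELECT COUNT(*) FROM person_nullable").fetchone()[0]

print("duplicate rejected:", duplicate_rejected)
print("rows with NULL keys:", null_rows)
```

Against a real MySQL server the same statements would be issued through a MySQL driver, and the duplicate insert would surface as the driver's own error type rather than sqlite3.IntegrityError.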

14.2.4 See Also

AUTO_INCREMENT columns are discussed further in Chapter 11.

14.3 Dealing with Duplicates at Record-Creation Time

14.3.1 Problem

You've created a table with a unique index to prevent duplicate values in the indexed column or columns. But this results in an error if you attempt to insert a duplicate record, and you want to avoid having to deal with such errors.

14.3.2 Solution

One approach is to just ignore the error. Another is to use either an INSERT IGNORE or REPLACE statement, each of which modifies MySQL's duplicate-handling behavior. For bulk-loading operations, LOAD DATA has modifiers that allow you to specify how to handle duplicates.

14.3.3 Discussion

By default, MySQL generates an error when you insert a record that duplicates an existing unique key. For example, you'll see the following result if the person table contains a unique index on the last_name and first_name columns:

mysql> INSERT INTO person (last_name, first_name)
    -> VALUES('X1','Y1');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO person (last_name, first_name)
    -> VALUES('X1','Y1');
ERROR 1062 at line 1: Duplicate entry 'X1-Y1' for key 1

If you're issuing the statements from the mysql program interactively, you can simply say, "Okay, that didn't work," ignore the error, and continue. But if you write a program to insert the records, an error may terminate the program. One way to avoid this is to modify the program's error-handling behavior to trap the error and then ignore it. See Recipe 2.3 for information about error-handling techniques.

If you want to prevent the error from occurring in the first place, you might consider using a two-query method to solve the duplicate-record problem: issue a SELECT to see if the record is already present, followed by an INSERT if it's not. But that doesn't really work. Another client might insert the same record after the SELECT and before the INSERT, in which case the error would still occur. To make sure that doesn't happen, you could use a transaction or lock the tables, but then you're up from two statements to four. MySQL provides two single-query solutions to the problem of handling duplicate records:



Use INSERT IGNORE rather than INSERT. If a record doesn't duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error:

mysql> INSERT IGNORE INTO person (last_name, first_name)
    -> VALUES('X2','Y2');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person (last_name, first_name)
    -> VALUES('X2','Y2');
Query OK, 0 rows affected (0.00 sec)

The row count value indicates whether the record was inserted or ignored. From within a program, you can obtain this value by checking the rows-affected function provided by your API. (See Recipe 2.5 and Recipe 9.2.)



Use REPLACE rather than INSERT. If the record is new, it's inserted just as with INSERT. If it's a duplicate, the new record replaces the old one:

mysql> REPLACE INTO person (last_name, first_name)
    -> VALUES('X3','Y3');
Query OK, 1 row affected (0.00 sec)
mysql> REPLACE INTO person (last_name, first_name)
    -> VALUES('X3','Y3');
Query OK, 2 rows affected (0.00 sec)

The rows-affected value in the second case is 2 because the original record is deleted and the new record is inserted in its place.

INSERT IGNORE and REPLACE should be chosen according to the duplicate-handling behavior you want to effect. INSERT IGNORE keeps the first of a set of duplicated records and discards the rest. REPLACE keeps the last of a set of duplicates and kicks out any earlier ones. INSERT IGNORE is more efficient than REPLACE because it doesn't actually insert duplicates. Thus, it's most applicable when you just want to make sure a copy of a given record is present in a table. REPLACE, on the other hand, is often more appropriate for tables in which other non-key columns may need updating. Suppose you're maintaining a table named passtbl for a web application that contains email addresses and passwords and that is keyed by email address:

CREATE TABLE passtbl
(
    email       CHAR(60) NOT NULL,
    password    CHAR(20) BINARY NOT NULL,
    PRIMARY KEY (email)
);

How do you create records for new users, and change passwords for existing users? Without REPLACE, creating a new user and changing an existing user's password must be handled differently. A typical algorithm for handling record maintenance might look like this:



Issue a SELECT to see if a record already exists with a given email value.

If no such record exists, add a new one with INSERT.

If the record does exist, update it with UPDATE.

All of that must be performed within a transaction or with the tables locked to prevent other users from changing the tables while you're using them. With REPLACE, you can simplify both cases to the same single-statement operation:

REPLACE INTO passtbl (email,password) VALUES(address,passval);

If no record with the given email address exists, MySQL creates a new one. If a record does exist, MySQL replaces it; in effect, this updates the password column of the record associated with the address.
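To make the "keeps the first" versus "keeps the last" distinction concrete, here is a minimal Python sketch built around the passtbl table. It rests on several assumptions: SQLite (through Python's built-in sqlite3 module) stands in for MySQL, SQLite spells INSERT IGNORE as INSERT OR IGNORE, the helper name surviving_rows and the sample address are hypothetical, and the BINARY attribute is omitted because SQLite does not have it. Rows-affected counts are deliberately left out, since SQLite does not necessarily report them the way MySQL does for REPLACE.

```python
import sqlite3

def surviving_rows(statement):
    """Insert two rows with the same email using `statement`; return what is left."""
    # Assumption: an in-memory SQLite database stands in for a MySQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE passtbl (
            email    CHAR(60) NOT NULL,
            password CHAR(20) NOT NULL,
            PRIMARY KEY (email)
        )
    """)
    sql = statement + " INTO passtbl (email, password) VALUES (?, ?)"
    conn.execute(sql, ("user@example.com", "first"))
    conn.execute(sql, ("user@example.com", "second"))  # duplicates the key
    rows = conn.execute("SELECT email, password FROM passtbl").fetchall()
    conn.close()
    return rows

# SQLite's spelling of MySQL's INSERT IGNORE: the first record is kept.
kept_by_ignore = surviving_rows("INSERT OR IGNORE")
# REPLACE: the later record takes the place of the earlier one.
kept_by_replace = surviving_rows("REPLACE")

print(kept_by_ignore)
print(kept_by_replace)
```

In both cases the table ends up with a single row for the address; only the surviving password differs, which is why REPLACE suits the password-change case described above.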

INSERT IGNORE and REPLACE have the benefit of eliminating overhead that might otherwise be required for a transaction. But this benefit comes at the price of portability, because both are MySQL-specific statements. If portability is a high priority, you might prefer to stick with a transactional approach.

For bulk-load operations in which you use the LOAD DATA statement to load a set of records from a file into a table, duplicate-record handling can be controlled using the statement's IGNORE and REPLACE modifiers. These produce behavior analogous to that of the INSERT IGNORE and REPLACE statements. See Recipe 10.8 for more information.

14.4 Counting and Identifying Duplicates

14.4.1 Problem

You want to find out if a table contains duplicates, and to what extent they occur. Or you want to see the records that contain the duplicated values.

14.4.2 Solution

Use a counting summary that looks for and displays duplicated values. To see the records in which the duplicated values occur, join the summary to the original table to display the matching records.

14.4.3 Discussion

Suppose that your web site includes a sign-up page that allows visitors to add themselves to your mailing list to receive periodic product catalog mailings. But you forgot to include a unique index in the table when you created it, and now you suspect that some people are signed up multiple times. Perhaps they forgot they were already on the list, or perhaps people added friends to the list who were already signed up. Either way, the result of the duplicate records is that you mail out duplicate catalogs. This is an additional expense to you, and it annoys the recipients. This section discusses how to find out if duplicates are present in a table, how prevalent they are, and how to display the duplicated records. (For tables that do contain duplicates, Recipe 14.7 describes how to eliminate them.)

To determine whether or not duplicates occur in a table, use a counting summary, a topic covered in Chapter 7. Summary techniques can be applied to identifying and counting duplicates by grouping records with GROUP BY and counting the rows in each group using COUNT(). For the examples, assume that catalog recipients are listed in a table named cat_mailing that has the following contents:

mysql> SELECT * FROM cat_mailing;
+-----------+-------------+--------------------------+
| last_name | first_name  | street                   |
+-----------+-------------+--------------------------+
| Isaacson  | Jim         | 515 Fordam St., Apt. 917 |
| Baxter    | Wallace     | 57 3rd Ave.              |
| McTavish  | Taylor      | 432 River Run            |
| Pinter    | Marlene     | 9 Sunset Trail           |
| BAXTER    | WALLACE     | 57 3rd Ave.              |
| Brown     | Bartholomew | 432 River Run            |
| Pinter    | Marlene     | 9 Sunset Trail           |
| Baxter    | Wallace     | 57 3rd Ave., Apt 102     |
+-----------+-------------+--------------------------+

Suppose you want to define "duplicate" using the last_name and first_name columns. That is, recipients with the same name are assumed to be the same person. (This is a simplification, of course.) The following queries are typical of those used to characterize the table and to assess the existence and extent of duplicate values:



The total number of rows in the table:

mysql> SELECT COUNT(*) AS rows FROM cat_mailing;
+------+
| rows |
+------+
|    8 |
+------+

The number of distinct names:

mysql> SELECT COUNT(DISTINCT last_name, first_name) AS 'distinct names'
    -> FROM cat_mailing;
+----------------+
| distinct names |
+----------------+
|              5 |
+----------------+

The number of rows containing duplicated names:

mysql> SELECT COUNT(*) - COUNT(DISTINCT last_name, first_name)
    -> AS 'duplicate names'
    -> FROM cat_mailing;
+-----------------+
| duplicate names |
+-----------------+
|               3 |
+-----------------+

The fraction of the records that contain unique or non-unique names:

mysql> SELECT COUNT(DISTINCT last_name, first_name) / COUNT(*)
    -> AS 'unique',
    -> 1 - (COUNT(DISTINCT last_name, first_name) / COUNT(*))
    -> AS 'non-unique'
    -> FROM cat_mailing;
+--------+------------+
| unique | non-unique |
+--------+------------+
|   0.62 |       0.38 |
+--------+------------+

These queries help you characterize the extent of duplicates, but don't show you which values are duplicated. To see which names are duplicated in the cat_mailing table, use a summary query that displays the non-unique values along with the counts:

mysql> SELECT COUNT(*) AS repetitions, last_name, first_name
    -> FROM cat_mailing
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;
+-------------+-----------+------------+
| repetitions | last_name | first_name |
+-------------+-----------+------------+
|           3 | Baxter    | Wallace    |
|           2 | Pinter    | Marlene    |
+-------------+-----------+------------+

The query includes a HAVING clause that restricts the output to include only those names that occur more than once. (If you omit the clause, the summary lists unique names as well, which is useless when you're interested only in duplicates.) In general, to identify sets of values that are duplicated, do the following:



Determine which columns contain the values that may be duplicated.

List those columns in the column selection list, along with COUNT(*).

List the columns in the GROUP BY clause as well.

Add a HAVING clause that eliminates unique values by requiring group counts to be greater than one.

Queries constructed this way have the following form:

SELECT COUNT(*), column_list
FROM tbl_name
GROUP BY column_list
HAVING COUNT(*) > 1

It's easy to generate duplicate-finding queries like that within a program, given a table name and a nonempty set of column names. For example, here is a Perl function, make_dup_count_query(), that generates the proper query for finding and counting duplicated values in the specified columns:

sub make_dup_count_query
{
# first argument is the table name, the rest are the column names
my ($tbl_name, @col_name) = @_;

    return (
        "SELECT COUNT(*)," . join (",", @col_name)
        . "\nFROM $tbl_name"
        . "\nGROUP BY " . join (",", @col_name)
        . "\nHAVING COUNT(*) > 1"
    );
}

make_dup_count_query() returns the query as a string. If you invoke it like this:

$str = make_dup_count_query ("cat_mailing", "last_name", "first_name");

The resulting value of $str is:

SELECT COUNT(*),last_name,first_name
FROM cat_mailing
GROUP BY last_name,first_name
HAVING COUNT(*) > 1

What you do with the query string is up to you. You can execute it from within the script that creates it, pass it to another program, or write it to a file for execution later. The dups directory of the recipes distribution contains a script named dup_count.pl that you can use to try out the function (as well as some translations into other languages). Later in this chapter, Recipe 14.7 uses the make_dup_count_query() function to implement a duplicate-removal technique.

Summary techniques are useful for assessing the existence of duplicates, how often they occur, and displaying which values are duplicated. But a summary in itself cannot display the entire content of the records that contain the duplicate values. (For example, the summaries shown thus far display counts of duplicated names in the cat_mailing table or the names themselves, but don't show the addresses associated with those names.) To see the original records containing the duplicate names, join the summary information to the table from which it's generated. The following example shows how to do this to display the cat_mailing records that contain duplicated names. The summary is written to a temporary table, which then is joined to the cat_mailing table to produce the records that match those names:

mysql> CREATE TABLE tmp
    -> SELECT COUNT(*) AS count, last_name, first_name
    -> FROM cat_mailing GROUP BY last_name, first_name HAVING count > 1;
mysql> SELECT cat_mailing.*
    -> FROM tmp, cat_mailing
    -> WHERE tmp.last_name = cat_mailing.last_name
    -> AND tmp.first_name = cat_mailing.first_name
    -> ORDER BY last_name, first_name;
+-----------+------------+----------------------+
| last_name | first_name | street               |
+-----------+------------+----------------------+
| Baxter    | Wallace    | 57 3rd Ave.          |
| BAXTER    | WALLACE    | 57 3rd Ave.          |
| Baxter    | Wallace    | 57 3rd Ave., Apt 102 |
| Pinter    | Marlene    | 9 Sunset Trail       |
| Pinter    | Marlene    | 9 Sunset Trail       |
+-----------+------------+----------------------+
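As a rough illustration of how the make_dup_count_query() function from this recipe might look in Python, the sketch below builds the same query string and then runs it. This is not the translation shipped in the recipes distribution. To keep it self-contained it makes some assumptions: Python's built-in sqlite3 module stands in for a MySQL connection, and the name columns are declared COLLATE NOCASE to imitate MySQL's case-insensitive comparison of non-binary strings. Table and column names are pasted directly into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3

def make_dup_count_query(tbl_name, *col_names):
    """Build a query that finds and counts duplicated values in col_names."""
    if not col_names:
        raise ValueError("at least one column name is required")
    col_list = ",".join(col_names)
    return (
        "SELECT COUNT(*)," + col_list
        + "\nFROM " + tbl_name
        + "\nGROUP BY " + col_list
        + "\nHAVING COUNT(*) > 1"
    )

query = make_dup_count_query("cat_mailing", "last_name", "first_name")

# Assumption: in-memory SQLite stands in for MySQL; NOCASE imitates
# MySQL's case-insensitive comparison for non-binary string columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cat_mailing (
        last_name  TEXT COLLATE NOCASE,
        first_name TEXT COLLATE NOCASE,
        street     TEXT
    )
""")
conn.executemany("INSERT INTO cat_mailing VALUES (?, ?, ?)", [
    ("Isaacson", "Jim", "515 Fordam St., Apt. 917"),
    ("Baxter", "Wallace", "57 3rd Ave."),
    ("McTavish", "Taylor", "432 River Run"),
    ("Pinter", "Marlene", "9 Sunset Trail"),
    ("BAXTER", "WALLACE", "57 3rd Ave."),
    ("Brown", "Bartholomew", "432 River Run"),
    ("Pinter", "Marlene", "9 Sunset Trail"),
    ("Baxter", "Wallace", "57 3rd Ave., Apt 102"),
])

# One result row per duplicated name; the first column is the repetition count.
dup_counts = sorted(row[0] for row in conn.execute(query))

print(query)
print(dup_counts)
```

Which spelling of a name (Baxter or BAXTER) is reported for a group is not guaranteed, so the sketch looks only at the counts.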

Duplicate Identification and String Case Sensitivity

Non-binary strings that differ in lettercase are considered the same for comparison purposes. To consider them as distinct, use the BINARY keyword to make them case sensitive.

14.5 Eliminating Duplicates from a Query Result

14.5.1 Problem

You want to select rows in a query result in such a way that it contains no duplicates.

14.5.2 Solution

Use SELECT DISTINCT.

14.5.3 Discussion

Query results sometimes contain duplicate rows. This is particularly common when you select only a subset of the columns in a table, because that reduces the amount of information available that might otherwise distinguish one row from another. To obtain only the unique rows in a result, eliminate the duplicates by adding the DISTINCT keyword. That tells MySQL to return only one instance of each set of column values. For example, if you select the name columns from the cat_mailing table without using DISTINCT, several duplicates occur:

mysql> SELECT last_name, first_name
    -> FROM cat_mailing ORDER BY last_name, first_name;
+-----------+-------------+
| last_name | first_name  |
+-----------+-------------+
| Baxter    | Wallace     |
| BAXTER    | WALLACE     |
| Baxter    | Wallace     |
| Brown     | Bartholomew |
| Isaacson  | Jim         |
| McTavish  | Taylor      |
| Pinter    | Marlene     |
| Pinter    | Marlene     |
+-----------+-------------+

With DISTINCT, the duplicates are eliminated:

mysql> SELECT DISTINCT last_name, first_name
    -> FROM cat_mailing ORDER BY last_name;
+-----------+-------------+
| last_name | first_name  |
+-----------+-------------+
| Baxter    | Wallace     |
| Brown     | Bartholomew |
| Isaacson  | Jim         |
| McTavish  | Taylor      |
| Pinter    | Marlene     |
+-----------+-------------+

An alternative to DISTINCT is to add a GROUP BY clause that names the columns you're selecting. This has the effect of removing duplicates and selecting only the unique combinations of values in the specified columns:

mysql> SELECT last_name, first_name FROM cat_mailing
    -> GROUP BY last_name, first_name;
+-----------+-------------+
| last_name | first_name  |
+-----------+-------------+
| Baxter    | Wallace     |
| Brown     | Bartholomew |
| Isaacson  | Jim         |
| McTavish  | Taylor      |
| Pinter    | Marlene     |
+-----------+-------------+

14.5.4 See Also

SELECT DISTINCT is discussed further in Recipe 7.5.

14.6 Eliminating Duplicates from a Self-Join Result

14.6.1 Problem

Self-joins often produce rows that are "near" duplicates, that is, rows that contain the same values but in different orders. Because of this, SELECT DISTINCT will not eliminate the duplicates.

14.6.2 Solution

Select column values in a specific order within rows to make rows with duplicate sets of values identical. Then you can use SELECT DISTINCT to remove duplicates. Alternatively, retrieve rows in such a way that near-duplicates are not even selected.

14.6.3 Discussion

Self-joins can produce rows that are duplicates in the sense that they contain the same values, yet are not identical. Consider the following query, which uses a self-join to find all pairs of states that joined the Union in the same year:

mysql> SELECT YEAR(s2.statehood) AS year, s1.name, s2.name
    -> FROM states AS s1, states AS s2
    -> WHERE YEAR(s1.statehood) = YEAR(s2.statehood)
    -> AND s1.name != s2.name
    -> ORDER BY year, s1.name, s2.name;
+------+----------------+----------------+
| year | name           | name           |
+------+----------------+----------------+
| 1787 | Delaware       | New Jersey     |
| 1787 | Delaware       | Pennsylvania   |
| 1787 | New Jersey     | Delaware       |
| 1787 | New Jersey     | Pennsylvania   |
| 1787 | Pennsylvania   | Delaware       |
| 1787 | Pennsylvania   | New Jersey     |
...
| 1912 | Arizona        | New Mexico     |
| 1912 | New Mexico     | Arizona        |
| 1959 | Alaska         | Hawaii         |
| 1959 | Hawaii         | Alaska         |
+------+----------------+----------------+

The condition in the WHERE clause that requires state pair names not to be identical eliminates the trivially true rows showing that each state joined the Union in the same year as itself. But each remaining pair of states still appears twice. For example, there is one row that lists Delaware and New Jersey, and another that lists New Jersey and Delaware. Each such pair of rows may be considered as effective duplicates because they contain the same values. However, because the values are not listed in the same order within the rows, they are not identical and you can't get rid of the duplicates by adding DISTINCT to the query.

One way to solve this problem is to make sure that state names are always listed in a specific order within a row. This can be done by selecting the names with a pair of expressions that place the lesser value first in the output column list:

IF(val1<val2,val1,val2) AS lesser_value,
IF(val1<val2,val2,val1) AS greater_value

Applying this technique to the state-pairs query yields the following result:

mysql> SELECT YEAR(s2.statehood) AS year,
    -> IF(s1.name<s2.name,s1.name,s2.name) AS state1,
    -> IF(s1.name<s2.name,s2.name,s1.name) AS state2
    -> FROM states AS s1, states AS s2
    -> WHERE YEAR(s1.statehood) = YEAR(s2.statehood)
    -> AND s1.name != s2.name
    -> ORDER BY year, state1, state2;
+------+----------------+----------------+
| year | state1         | state2         |
+------+----------------+----------------+
| 1787 | Delaware       | New Jersey     |
| 1787 | Delaware       | New Jersey     |
| 1787 | Delaware       | Pennsylvania   |
| 1787 | Delaware       | Pennsylvania   |
| 1787 | New Jersey     | Pennsylvania   |
| 1787 | New Jersey     | Pennsylvania   |
...
| 1912 | Arizona        | New Mexico     |
| 1912 | Arizona        | New Mexico     |
| 1959 | Alaska         | Hawaii         |
| 1959 | Alaska         | Hawaii         |
+------+----------------+----------------+

Duplicate rows are still present in the output, but now duplicate pairs are identical and the extra copies can be eliminated by adding DISTINCT to the query:

mysql> SELECT DISTINCT YEAR(s2.statehood) AS year,
    -> IF(s1.name<s2.name,s1.name,s2.name) AS state1,
    -> IF(s1.name<s2.name,s2.name,s1.name) AS state2
    -> FROM states AS s1, states AS s2
    -> WHERE YEAR(s1.statehood) = YEAR(s2.statehood)
    -> AND s1.name != s2.name
    -> ORDER BY year, state1, state2;
+------+----------------+----------------+
| year | state1         | state2         |
+------+----------------+----------------+
| 1787 | Delaware       | New Jersey     |
| 1787 | Delaware       | Pennsylvania   |
| 1787 | New Jersey     | Pennsylvania   |
...
| 1912 | Arizona        | New Mexico     |
| 1959 | Alaska         | Hawaii         |
+------+----------------+----------------+

An alternative approach to removing non-identical duplicates relies not so much on detecting and eliminating them as on selecting rows in such a way that only one row from each pair ever appears in the query result. This makes it unnecessary to reorder values within output rows or to use DISTINCT. For the state-pairs query, selecting only those rows where the first state name is lexically less than the second automatically eliminates rows where the names appear in the other order: [1]

[1] The same constraint also eliminates those rows where the state names are identical.

mysql> SELECT YEAR(s2.statehood) AS year, s1.name, s2.name
    -> FROM states AS s1, states AS s2
    -> WHERE YEAR(s1.statehood) = YEAR(s2.statehood)
    -> AND s1.name < s2.name
    -> ORDER BY year, s1.name, s2.name;
+------+----------------+----------------+
| year | name           | name           |
+------+----------------+----------------+
| 1787 | Delaware       | New Jersey     |
| 1787 | Delaware       | Pennsylvania   |
| 1787 | New Jersey     | Pennsylvania   |
...
| 1912 | Arizona        | New Mexico     |
| 1959 | Alaska         | Hawaii         |
+------+----------------+----------------+
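Both strategies from this recipe can be compared side by side with a small Python sketch. The details here are assumptions made for illustration: SQLite (via Python's built-in sqlite3 module) stands in for MySQL, substr(statehood, 1, 4) replaces MySQL's YEAR() function, a CASE expression replaces MySQL's IF() function, and the states table holds only seven sample rows rather than the full table used in the book.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, statehood TEXT)")
conn.executemany("INSERT INTO states VALUES (?, ?)", [
    ("Delaware", "1787-12-07"),
    ("Pennsylvania", "1787-12-12"),
    ("New Jersey", "1787-12-18"),
    ("New Mexico", "1912-01-06"),
    ("Arizona", "1912-02-14"),
    ("Alaska", "1959-01-03"),
    ("Hawaii", "1959-08-21"),
])

# Approach 1: list the lesser name first in every row, so the two copies
# of each pair become identical and DISTINCT can remove one of them.
reordered = conn.execute("""
    SELECT DISTINCT substr(s2.statehood, 1, 4) AS year,
        CASE WHEN s1.name < s2.name THEN s1.name ELSE s2.name END AS state1,
        CASE WHEN s1.name < s2.name THEN s2.name ELSE s1.name END AS state2
    FROM states AS s1, states AS s2
    WHERE substr(s1.statehood, 1, 4) = substr(s2.statehood, 1, 4)
      AND s1.name != s2.name
    ORDER BY year, state1, state2
""").fetchall()

# Approach 2: select only rows whose first name sorts before the second,
# so the reversed copy of each pair is never produced.
filtered = conn.execute("""
    SELECT substr(s2.statehood, 1, 4) AS year, s1.name, s2.name
    FROM states AS s1, states AS s2
    WHERE substr(s1.statehood, 1, 4) = substr(s2.statehood, 1, 4)
      AND s1.name < s2.name
    ORDER BY year, s1.name, s2.name
""").fetchall()

for row in filtered:
    print(row)
```

For this sample data the two result lists are the same; the second form simply avoids generating the reversed rows in the first place.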

14.7 Eliminating Duplicates from a Table

14.7.1 Problem

You want to remove duplicate records from a table so that it contains only unique rows.

14.7.2 Solution

Select the unique rows from the table into a second table that you use to replace the original one. Or add a unique index to the table using ALTER TABLE, which will remove duplicates as it builds the index. Or use DELETE ... LIMIT n to remove all but one instance of a specific set of duplicate rows.

14.7.3 Discussion

If you forget to create a table with a unique index to prevent the occurrence of duplicates within the table, you may discover later that it's necessary to apply some sort of duplicate-removal technique. The cat_mailing table used in earlier sections is an example of this, because it contains several instances where the same person is listed multiple times.

mysql> SELECT * FROM cat_mailing ORDER BY last_name, first_name;
+-----------+-------------+--------------------------+
| last_name | first_name  | street                   |
+-----------+-------------+--------------------------+
| Baxter    | Wallace     | 57 3rd Ave.              |
| BAXTER    | WALLACE     | 57 3rd Ave.              |
| Baxter    | Wallace     | 57 3rd Ave., Apt 102     |
| Brown     | Bartholomew | 432 River Run            |
| Isaacson  | Jim         | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor      | 432 River Run            |
| Pinter    | Marlene     | 9 Sunset Trail           |
| Pinter    | Marlene     | 9 Sunset Trail           |
+-----------+-------------+--------------------------+

The table contains redundant entries and it would be a good idea to remove them, to eliminate duplicate mailings and reduce postage costs. To do this, you have several options:



Select the table's unique rows into another table, then use that table to replace the original one. The result is to remove the table's duplicates. This works when "duplicate" means "the entire row is the same as another."

Add a unique index to the table using ALTER TABLE with the IGNORE keyword. This operation will remove duplicate rows based on the contents of the indexed columns.

You can remove duplicates for a specific set of duplicate rows by using DELETE ... LIMIT n to remove all but one of the rows.

This section discusses each of these duplicate-removal methods. When you consider which of them to choose under various circumstances, note that the applicability of a given method to a specific problem often will be determined by two factors:

Does the method require the table to have a unique index?

If the columns in which duplicate values occur may contain NULL, will the method remove duplicate NULL values?

14.7.4 Removing Duplicates Using Table Replacement

One way to eliminate duplicates from a table is to select its unique records into a new table that has the same structure. Then replace the original table with the new one. If a row is considered to duplicate another only if the entire row is the same, you can use SELECT DISTINCT to select the unique rows:

mysql> CREATE TABLE tmp SELECT DISTINCT * FROM cat_mailing;
mysql> SELECT * FROM tmp ORDER BY last_name, first_name;
+-----------+-------------+--------------------------+
| last_name | first_name  | street                   |
+-----------+-------------+--------------------------+
| Baxter    | Wallace     | 57 3rd Ave.              |
| Baxter    | Wallace     | 57 3rd Ave., Apt 102     |
| Brown     | Bartholomew | 432 River Run            |
| Isaacson  | Jim         | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor      | 432 River Run            |
| Pinter    | Marlene     | 9 Sunset Trail           |
+-----------+-------------+--------------------------+

This method works in the absence of an index (though it might be slow for large tables), and for tables that contain duplicate NULL values, it will remove those duplicates. Note that this method considers the rows for Wallace Baxter that have slightly different street values to be distinct.

If duplicates are defined only with respect to a subset of the columns in the table, create a new table that has a unique index first, then select rows into it using INSERT IGNORE.

mysql> CREATE TABLE tmp ( -> last_name CHAR(40) NOT NULL, -> first_name CHAR(40) NOT NULL, -> street CHAR(40) NOT NULL, -> PRIMARY KEY (last_name, first_name)); mysql> INSERT IGNORE INTO tmp SELECT * FROM cat_mailing; mysql> SELECT * FROM tmp ORDER BY last_name, first_name; +-----------+-------------+--------------------------+ | last_name | first_name | street | +-----------+-------------+--------------------------+ | Baxter | Wallace | 57 3rd Ave. | | Brown | Bartholomew | 432 River Run | | Isaacson | Jim | 515 Fordam St., Apt. 917 | | McTavish | Taylor | 432 River Run | | Pinter | Marlene | 9 Sunset Trail | +-----------+-------------+--------------------------+ The index prevent s records wit h duplicat e key values from being insert ed int o

IGNORE t ells MySQL not

tmp,

and

t o st op wit h an error if a duplicat e is found. One short com ing of

t his m et hod is t hat if t he indexed colum ns can cont ain

NULL values, you m ust

UNIQUE index rat her t han a PRIMARY KEY, in which case t he index duplicat e NULL keys. ( UNIQUE indexes allow m ult iple NULL values.) Aft er creat ing t he new t able

tmp t hat

use a

will not rem ove

cont ains unique rows, use it t o replace t he original

cat_mailing t able. The effect ive result

is t hat

cat_mailing no longer

will cont ain

duplicat es:

mysql> DROP TABLE cat_mailing; mysql> ALTER TABLE tmp RENAME TO cat_mailing;
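The table-replacement technique can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, so the syntax differs slightly from the MySQL statements in this recipe: SQLite requires AS in CREATE TABLE ... AS SELECT and spells INSERT IGNORE as INSERT OR IGNORE. The sample rows are a shortened, illustrative subset of cat_mailing.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE cat_mailing (last_name TEXT, first_name TEXT, street TEXT)"
)
conn.executemany(
    "INSERT INTO cat_mailing VALUES (?, ?, ?)",
    [
        ("Baxter", "Wallace", "57 3rd Ave."),
        ("Baxter", "Wallace", "57 3rd Ave., Apt 102"),
        ("Pinter", "Marlene", "9 Sunset Trail"),
        ("Pinter", "Marlene", "9 Sunset Trail"),
    ],
)

# Case 1: a duplicate is a row that matches in every column.
conn.execute("CREATE TABLE tmp AS SELECT DISTINCT * FROM cat_mailing")
whole_row_unique = conn.execute("SELECT COUNT(*) FROM tmp").fetchone()[0]
conn.execute("DROP TABLE tmp")

# Case 2: a duplicate is a row that matches on (last_name, first_name) only.
conn.execute(
    """CREATE TABLE tmp (
           last_name  TEXT NOT NULL,
           first_name TEXT NOT NULL,
           street     TEXT NOT NULL,
           PRIMARY KEY (last_name, first_name))"""
)
conn.execute("INSERT OR IGNORE INTO tmp SELECT * FROM cat_mailing")

# Replace the original table with the de-duplicated copy.
conn.execute("DROP TABLE cat_mailing")
conn.execute("ALTER TABLE tmp RENAME TO cat_mailing")

final_rows = conn.execute(
    "SELECT last_name, first_name, street FROM cat_mailing "
    "ORDER BY last_name, first_name"
).fetchall()
for row in final_rows:
    print(row)
```

Which of the two Wallace Baxter rows survives in the second case depends on the order in which the server reads the source table, so the sketch makes no promise about the street value that is kept.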

14.7.5 Removing Duplicates by Adding an Index

To remove duplicates from a table "in place," add a unique index to the table with ALTER TABLE, using the IGNORE keyword to tell it to discard records with duplicate key values during the index construction process. The original cat_mailing table looks like this without an index:

mysql> SELECT * FROM cat_mailing ORDER BY last_name, first_name;
+-----------+-------------+--------------------------+
| last_name | first_name  | street                   |
+-----------+-------------+--------------------------+
| Baxter    | Wallace     | 57 3rd Ave.              |
| BAXTER    | WALLACE     | 57 3rd Ave.              |
| Baxter    | Wallace     | 57 3rd Ave., Apt 102     |
| Brown     | Bartholomew | 432 River Run            |
| Isaacson  | Jim         | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor      | 432 River Run            |
| Pinter    | Marlene     | 9 Sunset Trail           |
| Pinter    | Marlene     | 9 Sunset Trail           |
+-----------+-------------+--------------------------+

Add a unique index, then see what effect doing so has on the table contents:

mysql> ALTER IGNORE TABLE cat_mailing
    -> ADD PRIMARY KEY (last_name, first_name);
mysql> SELECT * FROM cat_mailing ORDER BY last_name, first_name;
+-----------+-------------+--------------------------+
| last_name | first_name  | street                   |
+-----------+-------------+--------------------------+
| Baxter    | Wallace     | 57 3rd Ave.              |
| Brown     | Bartholomew | 432 River Run            |
| Isaacson  | Jim         | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor      | 432 River Run            |
| Pinter    | Marlene     | 9 Sunset Trail           |
+-----------+-------------+--------------------------+

If the indexed columns can contain NULL, you must use a UNIQUE index rather than a PRIMARY KEY. In that case, the index will not remove duplicate NULL key values.

14.7.6 Removing Duplicates of a Particular Row

As of MySQL 3.22.7, you can use LIMIT to restrict the effect of a DELETE statement to a subset of the rows that it otherwise would delete. This makes the statement applicable to removing duplicate records. Suppose you have a table t with the following contents:

+-------+
| color |
+-------+
| blue  |
| green |
| blue  |
| blue  |
| red   |
| green |
| red   |
+-------+

The table lists blue three times, and green and red twice each. To remove the extra instances of each color, do this:

mysql> DELETE FROM t WHERE color = 'blue' LIMIT 2;
mysql> DELETE FROM t WHERE color = 'green' LIMIT 1;
mysql> DELETE FROM t WHERE color = 'red' LIMIT 1;
mysql> SELECT * FROM t;
+-------+
| color |
+-------+
| blue  |
| green |
| red   |
+-------+

This technique works in the absence of a unique index, and it will eliminate duplicate NULL values. It's handy if you want to remove duplicates only for a specific set of rows within a table. However, if there are many different sets of duplicates that you want to remove, this is not a procedure you'd want to carry out by hand. The process can be automated by using the techniques discussed earlier in Recipe 14.4 for determining which values are duplicated. Recall that in that recipe we wrote a make_dup_count_query() function to generate the query needed to count the number of duplicate values in a given set of columns in a table:

sub make_dup_count_query
{
my ($tbl_name, @col_name) = @_;

    return (
        "SELECT COUNT(*)," . join (",", @col_name)
        . "\nFROM $tbl_name"
        . "\nGROUP BY " . join (",", @col_name)
        . "\nHAVING COUNT(*) > 1"
    );
}

We can write another function delete_dups() that uses make_dup_count_query() to find out which values in a table are duplicated and how often. From that information, we can figure out how many duplicates to remove with DELETE ... LIMIT n, so that only unique instances remain. The delete_dups() function looks like this:

sub delete_dups
{
my ($dbh, $tbl_name, @col_name) = @_;

    # Construct and run a query that finds duplicated values
    my $dup_info = $dbh->selectall_arrayref (
                        make_dup_count_query ($tbl_name, @col_name)
                    );
    return unless defined ($dup_info);

    # For each duplicated set of values, delete all but one instance
    # of the rows containing those values
    foreach my $row_ref (@{$dup_info})
    {
        my ($count, @col_val) = @{$row_ref};
        next unless $count > 1;
        # Construct condition string to match values, being
        # careful to match NULL with IS NULL
        my $str;
        for (my $i = 0; $i < @col_name; $i++)
        {
            $str .= " AND " if $str;
            $str .= defined ($col_val[$i])
                    ? "$col_name[$i] = " . $dbh->quote ($col_val[$i])
                    : "$col_name[$i] IS NULL";
        }
        $str = "DELETE FROM $tbl_name WHERE $str LIMIT " . ($count - 1);
        $dbh->do ($str);
    }
}

Suppose we have an employee table that contains the following records:

mysql> SELECT * FROM employee;
+----------+------------+
| name     | department |
+----------+------------+
| Fred     | accounting |
| Fred     | accounting |
| Fred     | accounting |
| Fred     | accounting |
| Bob      | shipping   |
| Mary Ann | shipping   |
| Mary Ann | shipping   |
| Mary Ann | sales      |
| Mary Ann | sales      |
| Mary Ann | sales      |
| Mary Ann | sales      |
| Mary Ann | sales      |
| Mary Ann | sales      |
| Boris    | NULL       |
| Boris    | NULL       |
+----------+------------+

To use the delete_dups() function to eliminate duplicates on the name and department columns of the employee table, call it like this:

delete_dups ($dbh, "employee", "name", "department");

delete_dups() calls make_dup_count_query() and executes the SELECT query that it generates. For the employee table, that query produces the following results:

+----------+----------+------------+
| COUNT(*) | name     | department |
+----------+----------+------------+
|        2 | Boris    | NULL       |
|        4 | Fred     | accounting |
|        6 | Mary Ann | sales      |
|        2 | Mary Ann | shipping   |
+----------+----------+------------+

delete_dups() uses that information to generate the following DELETE statements:

DELETE FROM employee WHERE name = 'Boris' AND department IS NULL LIMIT 1
DELETE FROM employee WHERE name = 'Fred' AND department = 'accounting' LIMIT 3
DELETE FROM employee WHERE name = 'Mary Ann' AND department = 'sales' LIMIT 5
DELETE FROM employee WHERE name = 'Mary Ann' AND department = 'shipping' LIMIT 1

In general, using DELETE ... LIMIT n is likely to be slower than removing duplicates by using a second table or by adding a unique index. Those methods keep the data on the server side and let the server do all the work. DELETE ... LIMIT n involves a lot of client-server interaction because it uses a SELECT query to retrieve information about duplicates, followed by several DELETE statements to remove instances of duplicated rows.

When you issue DELETE ... LIMIT n statements from within a program, be sure to execute them only for values of n greater than zero. That is not only sensible (why waste bandwidth issuing a query to delete nothing?), but it's also necessary to avoid a bug that affects some versions of MySQL. Logically, one would expect that a statement of the form DELETE ... LIMIT 0 would delete no records, and that's what happens for current versions of MySQL. But versions prior to 3.23.40 have a bug such that LIMIT 0 is treated as though the LIMIT clause is not present at all; the result is that DELETE deletes all the selected rows! (For affected versions of MySQL, this problem also occurs for UPDATE ... LIMIT 0.)
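For readers who work in Python rather than Perl, the same idea can be sketched as follows. This is an illustrative port, not code from the recipes distribution: build_delete_statements() is a hypothetical helper that only builds the DELETE statements, returning each one with its parameter values, and it skips any set where fewer than one row would be removed so that LIMIT 0 is never issued. The %s placeholders assume the parameter style used by common MySQL drivers for Python, and the table and column names are assumed to be trusted identifiers. The duplicate counts are produced with the built-in sqlite3 module purely so the sketch runs without a server; the generated DELETE ... LIMIT statements are meant for MySQL and are not executed here.

```python
import sqlite3


def make_dup_count_query(tbl_name, col_names):
    """Build the query that counts duplicated value sets."""
    cols = ", ".join(col_names)
    return (
        f"SELECT COUNT(*), {cols} FROM {tbl_name} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )


def build_delete_statements(tbl_name, col_names, dup_rows):
    """Return (sql, params) pairs that leave one row per duplicated set."""
    statements = []
    for count, *values in dup_rows:
        extra = count - 1
        if extra <= 0:
            continue  # never generate LIMIT 0
        conds, params = [], []
        for name, value in zip(col_names, values):
            if value is None:
                conds.append(f"{name} IS NULL")  # NULL needs IS NULL, not =
            else:
                conds.append(f"{name} = %s")
                params.append(value)
        where = " AND ".join(conds)
        sql = f"DELETE FROM {tbl_name} WHERE {where} LIMIT {extra}"
        statements.append((sql, tuple(params)))
    return statements


# Reproduce the employee table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employee (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Fred", "accounting")] * 4
    + [("Bob", "shipping")]
    + [("Mary Ann", "shipping")] * 2
    + [("Mary Ann", "sales")] * 6
    + [("Boris", None)] * 2,
)

columns = ["name", "department"]
dup_rows = conn.execute(make_dup_count_query("employee", columns)).fetchall()
stmts = build_delete_statements("employee", columns, dup_rows)
for sql, params in stmts:
    print(sql, params)
```

Returning the values separately, instead of quoting them into the statement text, leaves the quoting to whichever driver eventually executes the statements.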

Chapter 15. Performing Transactions

Section 15.1. Introduction
Section 15.2. Verifying Transaction Support Requirements
Section 15.3. Performing Transactions Using SQL
Section 15.4. Performing Transactions from Within Programs
Section 15.5. Using Transactions in Perl Programs
Section 15.6. Using Transactions in PHP Programs
Section 15.7. Using Transactions in Python Programs
Section 15.8. Using Transactions in Java Programs
Section 15.9. Using Alternatives to Transactions

15.1 Introduction

The MySQL server can service multiple clients at the same time because it is multithreaded. To deal with contention among clients, the server performs any necessary locking so that two clients cannot modify the same data at once. However, as the server executes statements, it's very possible that successive queries received from a given client will be interleaved with queries from other clients. If a client issues multiple statements that are dependent on each other, the fact that other clients may be updating tables in between those statements can cause difficulties. Statement failures can be problematic, too, if a multiple-statement operation does not run to completion. Suppose you have a flight table containing information about airline flight schedules and you want to update the record for flight 578 by choosing a pilot from among those available. You might do so using three statements as follows:

SELECT @p_val := pilot_id FROM pilot WHERE available = 'yes' LIMIT 1;
UPDATE pilot SET available = 'no' WHERE pilot_id = @p_val;
UPDATE flight SET pilot_id = @p_val WHERE flight_id = 578;

The first statement chooses one of the available pilots, the second marks the pilot as unavailable, and the third assigns the pilot to the flight. That's straightforward enough in practice, but in principle there are a couple of significant difficulties with the process:



• Concurrency issues. The MySQL server can handle multiple clients at the same time. If two clients want to schedule pilots, it's possible that both of them would run the initial SELECT query and retrieve the same pilot ID number before either of them has a chance to set the pilot's status to unavailable. If that happens, the same pilot would be scheduled for two flights at once.

• Integrity issues. All three statements must execute successfully as a unit. For example, if the SELECT and the first UPDATE run successfully, but the second UPDATE fails, the pilot's status is set to unavailable without the pilot being assigned a flight. The database will be left in an inconsistent state.

To prevent concurrency and integrity problems in these types of situations, transactions are helpful. A transaction groups a set of statements and guarantees the following properties:



• No other client can update the data used in the transaction while the transaction is in progress; it's as though you have the server all to yourself. For example, other clients cannot modify the pilot or flight records while you're booking a pilot for a flight. By preventing other clients from interfering with the operations you're performing, transactions solve concurrency problems arising from the multiple-client nature of the MySQL server. In effect, transactions serialize access to a shared resource across multiple-statement operations.

• Statements in a transaction are grouped and are committed (take effect) as a unit, but only if they all succeed. If an error occurs, any actions that occurred prior to the error are rolled back, leaving the relevant tables unaffected as though none of the statements had been issued at all. This keeps the database from becoming inconsistent. For example, if an update to the flight table fails, rollback causes the change to the pilot table to be undone, leaving the pilot still available. Rollback frees you from having to figure out how to undo a partially complete operation yourself.

This chapter discusses how to determine whether or not your MySQL server supports transactions and shows the syntax for the SQL statements that begin and end transactions. It also describes how to implement transactional operations from within programs, using error detection to determine whether to commit or roll back. The final section discusses some workarounds you can use if your MySQL server doesn't support transactions.

Scripts related to the examples shown here are located in the transactions directory of the recipes distribution.
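The integrity problem described in this introduction can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not part of the recipes distribution: it uses Python's built-in sqlite3 module as a stand-in for a MySQL server, the helper names (make_db, book_pilot, pilot_status) are hypothetical, and the failure of the second UPDATE is simulated by pointing it at a table that does not exist. It illustrates only the integrity issue; the concurrency issue needs several simultaneous clients and is not shown.

```python
import sqlite3


def make_db():
    """Create a one-pilot, one-flight database in auto-commit mode."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE pilot (pilot_id INTEGER PRIMARY KEY, available TEXT)")
    conn.execute("CREATE TABLE flight (flight_id INTEGER PRIMARY KEY, pilot_id INTEGER)")
    conn.execute("INSERT INTO pilot VALUES (1, 'yes')")
    conn.execute("INSERT INTO flight VALUES (578, NULL)")
    return conn


def book_pilot(conn, flight_id, use_transaction, break_second_update=False):
    """Assign an available pilot to a flight; return True on success."""
    # Simulated failure: the second UPDATE targets a missing table.
    flight_table = "no_such_table" if break_second_update else "flight"
    if use_transaction:
        conn.execute("BEGIN")
    try:
        row = conn.execute(
            "SELECT pilot_id FROM pilot WHERE available = 'yes' LIMIT 1"
        ).fetchone()
        if row is None:
            raise sqlite3.Error("no pilot available")
        pilot_id = row[0]
        conn.execute(
            "UPDATE pilot SET available = 'no' WHERE pilot_id = ?", (pilot_id,)
        )
        conn.execute(
            f"UPDATE {flight_table} SET pilot_id = ? WHERE flight_id = ?",
            (pilot_id, flight_id),
        )
        if use_transaction:
            conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        if use_transaction:
            conn.execute("ROLLBACK")
        return False


def pilot_status(conn):
    return conn.execute(
        "SELECT available FROM pilot WHERE pilot_id = 1"
    ).fetchone()[0]


# Without a transaction, the failed booking leaves the pilot unavailable.
plain = make_db()
book_pilot(plain, 578, use_transaction=False, break_second_update=True)
print("no transaction:  ", pilot_status(plain))

# With a transaction, rollback restores the pilot's status.
guarded = make_db()
book_pilot(guarded, 578, use_transaction=True, break_second_update=True)
print("with transaction:", pilot_status(guarded))
```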

15.2 Verifying Transaction Support Requirements

15.2.1 Problem

You want to use transactions, but don't know whether your MySQL server supports them.

15.2.2 Solution

Check your server version to be sure it's recent enough, and determine what table types it supports. You can also try creating a table with a transactional type and see whether MySQL actually uses that type for the table definition.

15.2.3 Discussion

To use transactions in MySQL, you need a server that is recent enough to support transaction-safe table handlers, and your applications must use tables that have a transactional type. To check the version of your server, use the following query:

mysql> SELECT VERSION();
+----------------+
| VERSION()      |
+----------------+
| 4.0.4-beta-log |
+----------------+

Transaction support first appeared in MySQL 3.23.17 with the inclusion of the BDB (Berkeley DB) transactional table type. Since then, the InnoDB type has become available; as of MySQL 3.23.29, both types can be used. In general, I'd recommend using as recent a version of MySQL as possible. Transaction support (and MySQL itself) have improved a lot since Version 3.23.29.

Even if your server is recent enough to include transaction support, it may not actually have transactional capabilities. The handlers for the appropriate table types may not have been configured in when the server was compiled. It's also possible for handlers to be present but disabled, if the server has been started with the --skip-bdb or --skip-innodb options. To check the availability and status of the transactional table handlers, use SHOW VARIABLES:

mysql> SHOW VARIABLES LIKE 'have_bdb';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| have_bdb      | YES   |
+---------------+-------+
mysql> SHOW VARIABLES LIKE 'have_innodb';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| have_innodb   | YES   |
+---------------+-------+

The query output shown here indicates that BDB and InnoDB tables both can be used. If either of these queries produces no output or the Value column says something other than YES (such as NO or DISABLED), the corresponding table type cannot be used.
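A program can apply the same two checks to values it has already fetched from the server. The following sketch shows one way to interpret them in Python; the function names are hypothetical, and the inputs are assumed to be the string returned by VERSION() and the (Variable_name, Value) rows returned by SHOW VARIABLES. No server connection is made.

```python
import re


def parse_version(version_string):
    """Turn a VERSION() string such as '4.0.4-beta-log' into (4, 0, 4)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def recent_enough(version_string, minimum=(3, 23, 17)):
    """True if the server is at least the version that introduced BDB tables."""
    return parse_version(version_string) >= minimum


def handler_usable(rows, variable_name):
    """True only if SHOW VARIABLES reported the handler with the value YES.

    A missing row, NO, or DISABLED all mean the table type cannot be used.
    """
    return dict(rows).get(variable_name) == "YES"


print(parse_version("4.0.4-beta-log"))
print(recent_enough("3.23.16"))
print(handler_usable([("have_innodb", "YES")], "have_innodb"))
print(handler_usable([("have_bdb", "DISABLED")], "have_bdb"))
```

Comparing version numbers as tuples of integers avoids the mistake of comparing them as strings, where "3.23.9" would sort after "3.23.17".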

For programmatic methods of checking the server version and the set of table types that the server supports, see Recipe 9.14 and Recipe 9.18.

Another way to check the availability of a specific table type is to try creating a table with that type. Then issue a SHOW CREATE TABLE statement to see what type MySQL actually uses. For example, try creating t as an InnoDB table by executing the following statements:

mysql> CREATE TABLE t (i INT) TYPE = InnoDB;
mysql> SHOW CREATE TABLE t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `i` int(11) default NULL
) TYPE=InnoDB

If the InnoDB type is available, the last part of the SHOW statement will say TYPE=InnoDB. If not, MySQL will create the table using MyISAM (the default table type), and the last part of the statement will say TYPE=MyISAM instead. (You can also use SHOW TABLE STATUS to check the type of a table.)

In the event that your MySQL server doesn't include the transaction-safe table handlers you want to use, you'll need to replace it with one that does. If you install MySQL from a source distribution, the installation instructions indicate which configuration flags to use to enable the desired handlers. If you prefer binaries, be sure to install a distribution that was built to include BDB or InnoDB handlers.

After you've verified that your server supports the appropriate transactional table types, your applications can go ahead and use them:

• If you're writing a new application, you can create its tables to have a transactional type right from the beginning. All that's necessary to create such a table is to add TYPE = tbl_type to the end of the CREATE TABLE statement:

  CREATE TABLE t1 (i INT) TYPE = BDB;
  CREATE TABLE t2 (i INT) TYPE = INNODB;

• If you modify an existing application in such a way that it becomes necessary to perform transactions with existing tables that were not originally created with transactions in mind, you can change the tables to have a different type. For example, the ISAM and MyISAM types are non-transactional. Trying to use them for transactions will yield incorrect results because they do not support rollback. In this case, you can use ALTER TABLE to convert the tables to a transactional type. Suppose t is a MyISAM table. To make it an InnoDB table, do this:

  ALTER TABLE t TYPE = INNODB;

  Note that changing a table's type to support transactions may affect its behavior in other ways. For example, MyISAM tables provide more flexible handling of AUTO_INCREMENT columns than do other table types. If you rely on MyISAM-only sequence features, changing the table type will cause problems. See Chapter 11 for more information.

If your server does not support transactions and you cannot replace it with one that does, you may be able to achieve somewhat the same effect in other ways. Sometimes it's possible to lock your tables across multiple statements using LOCK and UNLOCK. This prevents other clients from interfering, although there is no rollback if any of the statements fail. Another alternative may be to rewrite queries so that they don't require transactions. See Recipe 15.9 for information about both types of workarounds.

15.3 Performing Transactions Using SQL

15.3.1 Problem

You need to issue a set of queries that must succeed or fail as a unit.

15.3.2 Solution

Manipulate MySQL's auto-commit mode to allow multiple-statement transactions, then commit or roll back the statements depending on whether they succeed or fail.

15.3.3 Discussion

This section describes the SQL statements that control transactional behavior in MySQL. The immediately following sections discuss how to perform transactions from within programs. Some APIs require that you implement transactions by issuing the SQL statements discussed in this section; others provide a special mechanism that allows transaction management without writing SQL directly. However, even in the latter case, the API mechanism will map program operations onto transactional SQL statements, so reading this section will give you a better understanding of what the API is doing on your behalf.

MySQL normally operates in auto-commit mode, which commits the effect of each statement as it executes. (In effect, each statement is its own transaction.) To perform a multiple-statement transaction, you must disable auto-commit mode, issue the statements that make up the transaction, and then either commit or roll back your changes. In MySQL, you can do this two ways:



• Issue a BEGIN (or BEGIN WORK) statement to suspend auto-commit mode, then issue the queries that make up the transaction. If the queries succeed, record their effect in the database and terminate the transaction by issuing a COMMIT statement:

  mysql> CREATE TABLE t (i INT) TYPE = InnoDB;
  mysql> BEGIN;
  mysql> INSERT INTO t (i) VALUES(1);
  mysql> INSERT INTO t (i) VALUES(2);
  mysql> COMMIT;
  mysql> SELECT * FROM t;
  +------+
  | i    |
  +------+
  |    1 |
  |    2 |
  +------+

  If an error occurs, don't use COMMIT. Instead, cancel the transaction by issuing a ROLLBACK statement. In the following example, t remains empty after the transaction because the effects of the INSERT statements are rolled back:

  mysql> CREATE TABLE t (i INT) TYPE = InnoDB;
  mysql> BEGIN;
  mysql> INSERT INTO t (i) VALUES(1);
  mysql> INSERT INTO t (x) VALUES(2);
  ERROR 1054 at line 5: Unknown column 'x' in 'field list'
  mysql> ROLLBACK;
  mysql> SELECT * FROM t;
  Empty set (0.00 sec)



• Another way to group statements is to turn off auto-commit mode explicitly. Then each statement you issue becomes part of the current transaction. To end the transaction and begin the next one, issue a COMMIT or ROLLBACK statement:

  mysql> CREATE TABLE t (i INT) TYPE = InnoDB;
  mysql> SET AUTOCOMMIT = 0;
  mysql> INSERT INTO t (i) VALUES(1);
  mysql> INSERT INTO t (i) VALUES(2);
  mysql> COMMIT;
  mysql> SELECT * FROM t;
  +------+
  | i    |
  +------+
  |    1 |
  |    2 |
  +------+

  To turn auto-commit mode back on, use this statement:

  mysql> SET AUTOCOMMIT = 1;
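The BEGIN, COMMIT, and ROLLBACK sequence can be tried without a MySQL server. The sketch below is a stand-in that assumes Python's built-in sqlite3 module, opened with isolation_level=None so that the module stays in auto-commit mode and the transaction statements are issued by hand. SQLite accepts the same three statements, but it has no TYPE clause and no SET AUTOCOMMIT statement, so only the BEGIN form is shown.

```python
import sqlite3

# isolation_level=None keeps the connection in auto-commit mode, so
# transactions start and end only where the code says so.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (i INT)")

# A transaction that succeeds: both rows are committed together.
conn.execute("BEGIN")
conn.execute("INSERT INTO t (i) VALUES (1)")
conn.execute("INSERT INTO t (i) VALUES (2)")
conn.execute("COMMIT")
after_commit = [row[0] for row in conn.execute("SELECT i FROM t ORDER BY i")]

# A transaction that fails: the second INSERT names a column that does
# not exist, so the first INSERT is rolled back as well.
conn.execute("BEGIN")
conn.execute("INSERT INTO t (i) VALUES (3)")
try:
    conn.execute("INSERT INTO t (x) VALUES (4)")
    conn.execute("COMMIT")
except sqlite3.OperationalError as exc:
    print("statement failed:", exc)
    conn.execute("ROLLBACK")
after_rollback = [row[0] for row in conn.execute("SELECT i FROM t ORDER BY i")]

print(after_commit)
print(after_rollback)
```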

Not Everything Can Be Undone

Transactions have their limits, because not all statements can be part of a transaction. For example, if you issue a DROP DATABASE statement, don't expect to get back the database by executing a ROLLBACK.

15.4 Performing Transactions from Within Programs

15.4.1 Problem

You're writing a program that needs to implement transactional operations.

15.4.2 Solution

Use the transaction abstraction provided by your language API, if it has such a thing. If it doesn't, use the API's usual query execution mechanism to issue the transactional SQL statements directly.

15.4.3 Discussion

When you run queries interactively from mysql (as in the examples shown in the previous section), you can see by inspection whether statements succeed or fail and determine on that basis whether to commit or roll back. From within a non-interactive SQL script stored in a file, that doesn't work so well. You cannot commit or roll back conditionally according to statement success or failure, because MySQL includes no IF/THEN/ELSE construct for controlling the flow of the script. (There is an IF() function, but that's not the same thing.) For this reason, it's most common to perform transactional processing from within a program, because you can use your API language to detect errors and take appropriate action. This section discusses some general background on how to do this. The next sections provide language-specific details for the Perl, PHP, Python, and Java APIs.

Every API supports transactions, even if only in the sense that you can explicitly issue transaction-related SQL statements such as BEGIN and COMMIT. However, some APIs also provide a transaction abstraction that allows you to control transactional behavior without working directly with SQL. This approach hides the details and provides better portability to other databases that support transactions but for which the underlying SQL syntax may differ. The Perl, Python, and Java MySQL interfaces provide such an abstraction. PHP does not; you must issue the SQL statements yourself.

The next few sections each implement the same example to illustrate how to perform program-based transactions. They use a table named money whose initial contents show how much money two people have:

+------+------+
| name | amt  |
+------+------+
| Eve  |   10 |
| Ida  |    0 |
+------+------+

The sample transaction is a simple financial transfer that uses two UPDATE statements to give six dollars of Eve's money to Ida:

UPDATE money SET amt = amt - 6 WHERE name = 'Eve';
UPDATE money SET amt = amt + 6 WHERE name = 'Ida';

The result is a table that looks like this:

+------+------+
| name | amt  |
+------+------+
| Eve  |    4 |
| Ida  |    6 |
+------+------+

It's necessary to execute both statements within a transaction to ensure that both of them take effect at once. Without a transaction, Eve's money disappears without being credited to Ida if the second statement fails. By using a transaction, the table will be left unchanged if statement failure occurs.

The example programs for each language are located in the transactions directory of the recipes distribution. If you compare them, you'll see that they all employ a similar framework for performing transactional processing:

• The statements of the transaction are grouped within a control structure, along with a commit operation.

• If the status of the control structure indicates that it did not execute successfully to completion, the transaction is rolled back.

That logic can be expressed as follows, where block represents the control structure used to group statements:

block:
    statement 1
    statement 2
    ...
    statement n
    commit
if the block failed:
    roll back

In Perl, the control structure is an eval block that succeeds or fails and returns an error code. Python and Java use a try block that executes to the end if the transaction was successful. If an error occurs, an exception is raised that triggers execution of a corresponding error-handling block to roll back the transaction. PHP does not have these constructs, but you can achieve the same effect by executing the statements of the transaction and a commit operation within a function. If the function fails, roll back.

The benefit of structuring your code as just described is that it minimizes the number of tests needed to determine whether to roll back. The alternative (checking the result of each statement within the transaction and rolling back on individual statement errors) quickly turns your code into an unreadable mess.

A subtle point to be aware of when rolling back within languages that raise exceptions is that it may be possible for the rollback itself to fail, causing another exception to be raised. If you don't want to deal with that, issue the rollback within another block that has an empty exception handler. The example programs for Perl, Python, and Java do this.
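The block-and-rollback framework translates directly into a Python try block. The sketch below is one possible rendering under stated assumptions: it runs against the built-in sqlite3 module instead of a MySQL driver, run_transaction() and balances() are hypothetical helpers rather than functions from the recipes distribution, and the failure is simulated with a statement that names a nonexistent table. The rollback sits inside its own try block with an empty handler, as the last paragraph above suggests.

```python
import sqlite3


def run_transaction(conn, statements):
    """Run all statements as a unit; return True if they were committed."""
    try:
        conn.execute("BEGIN")
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")  # reached only if every statement succeeded
        return True
    except sqlite3.Error:
        try:
            conn.execute("ROLLBACK")
        except sqlite3.Error:
            pass  # a failed rollback must not raise a second exception
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, amt FROM money"))


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE money (name TEXT, amt INT)")
conn.executemany("INSERT INTO money VALUES (?, ?)", [("Eve", 10), ("Ida", 0)])

# The second statement fails, so Eve's debit is rolled back.
failed = run_transaction(conn, [
    "UPDATE money SET amt = amt - 6 WHERE name = 'Eve'",
    "UPDATE no_such_table SET amt = amt + 6 WHERE name = 'Ida'",
])
after_failure = balances(conn)

# Both statements succeed, so the transfer is committed.
succeeded = run_transaction(conn, [
    "UPDATE money SET amt = amt - 6 WHERE name = 'Eve'",
    "UPDATE money SET amt = amt + 6 WHERE name = 'Ida'",
])
after_success = balances(conn)

print(failed, after_failure)
print(succeeded, after_success)
```

Only one test decides whether to roll back, no matter how many statements the transaction contains.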

Checking How API Transaction Abstractions Map onto SQL Statements

For APIs that provide a transaction abstraction, you can see how the interface maps onto the underlying SQL by enabling logging in your MySQL server and then watching the query log to see what statements the API executes when you run a transactional program.

15.5 Using Transactions in Perl Programs

15.5.1 Problem

You want to perform a transaction in a DBI script.

15.5.2 Solution

Use the standard DBI transaction support mechanism.

15.5.3 Discussion

The DBI mechanism for performing transactions is based on explicit manipulation of auto-commit mode. The procedure is as follows:

1. Turn on the RaiseError attribute if it's not enabled and disable PrintError if it's on. You want errors to raise exceptions without printing anything; leaving PrintError enabled can interfere with failure detection in some cases.

2. Disable the AutoCommit attribute so that a commit will be done only when you say so.

3. Execute the statements that make up the transaction within an eval block so that errors raise an exception and terminate the block. The last thing in the block should be a call to commit(), which commits the transaction if all its statements completed successfully.

4. After the eval executes, check the $@ variable. If $@ contains the empty string, the transaction succeeded. Otherwise, the eval will have failed due to the occurrence of some error and $@ will contain an error message. Invoke rollback() to cancel the transaction. If you want to display an error message, print $@ before calling rollback().

The following code shows how to implement this procedure to perform our example transaction. It does so in such a way that the current values of the error-handling and auto-commit attributes are saved before and restored after executing the transaction. That may be overkill for your own applications. For example, if you know that RaiseError and PrintError are set properly already, you need not save or restore them.

# save error-handling and auto-commit attributes,
# then make sure they're set correctly.
$save_re = $dbh->{RaiseError};
$save_pe = $dbh->{PrintError};
$save_ac = $dbh->{AutoCommit};
$dbh->{RaiseError} = 1;     # raise exception if an error occurs
$dbh->{PrintError} = 0;     # don't print an error message
$dbh->{AutoCommit} = 0;     # disable auto-commit

eval
{
    # move some money from one person to the other
    $dbh->do ("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'");
    $dbh->do ("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'");
    # all statements succeeded; commit transaction
    $dbh->commit ();
};
if ($@)     # an error occurred
{
    print "Transaction failed, rolling back. Error was:\n$@\n";
    # roll back within eval to prevent rollback
    # failure from terminating the script
    eval { $dbh->rollback (); };
}

# restore attributes to original state
$dbh->{AutoCommit} = $save_ac;
$dbh->{PrintError} = $save_pe;
$dbh->{RaiseError} = $save_re;

You can see that the example goes to a lot of work just to issue a couple of statements. To make transaction processing easier, you might want to create a couple of convenience functions to handle the processing that occurs before and after the eval:

sub transact_init
{
my $dbh = shift;
my $attr_ref = {};  # create hash in which to save attributes

    $attr_ref->{RaiseError} = $dbh->{RaiseError};
    $attr_ref->{PrintError} = $dbh->{PrintError};
    $attr_ref->{AutoCommit} = $dbh->{AutoCommit};
    $dbh->{RaiseError} = 1;     # raise exception if an error occurs
    $dbh->{PrintError} = 0;     # don't print an error message
    $dbh->{AutoCommit} = 0;     # disable auto-commit
    return ($attr_ref);         # return attributes to caller
}

sub transact_finish
{
my ($dbh, $attr_ref, $error) = @_;

    if ($error)     # an error occurred
    {
        print "Transaction failed, rolling back. Error was:\n$error\n";
        # roll back within eval to prevent rollback
        # failure from terminating the script
        eval { $dbh->rollback (); };
    }
    # restore error-handling and auto-commit attributes
    $dbh->{AutoCommit} = $attr_ref->{AutoCommit};
    $dbh->{PrintError} = $attr_ref->{PrintError};
    $dbh->{RaiseError} = $attr_ref->{RaiseError};
}

By using those two functions, our example transaction can be simplified considerably:

$ref = transact_init ($dbh);
eval
{
    # move some money from one person to the other
    $dbh->do ("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'");
    $dbh->do ("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'");
    # all statements succeeded; commit transaction
    $dbh->commit ();
};
transact_finish ($dbh, $ref, $@);
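The save-and-restore idea behind the two convenience functions is not specific to Perl. As a rough illustration only, here is how the same pair might look in Python against the built-in sqlite3 module. This is an analogy, not a port of DBI: sqlite3 has no RaiseError or PrintError attributes (errors always raise exceptions), so the only setting saved and restored is the connection's isolation_level, which plays the part of AutoCommit in this sketch.

```python
import sqlite3


def transact_init(conn):
    """Save the connection's transaction setting and start a transaction."""
    saved = {"isolation_level": conn.isolation_level}
    conn.isolation_level = None  # take manual control of transactions
    conn.execute("BEGIN")
    return saved


def transact_finish(conn, saved, error):
    """Roll back if an error occurred, then restore the saved setting."""
    if error is not None:
        print(f"Transaction failed, rolling back. Error was:\n{error}")
        try:
            conn.execute("ROLLBACK")
        except sqlite3.Error:
            pass  # keep a rollback failure from terminating the script
    conn.isolation_level = saved["isolation_level"]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE money (name TEXT, amt INT)")
conn.executemany("INSERT INTO money VALUES (?, ?)", [("Eve", 10), ("Ida", 0)])
conn.commit()

saved = transact_init(conn)
error = None
try:
    # move some money from one person to the other
    conn.execute("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'")
    conn.execute("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'")
    # all statements succeeded; commit transaction
    conn.execute("COMMIT")
except sqlite3.Error as exc:
    error = exc
transact_finish(conn, saved, error)

result = dict(conn.execute("SELECT name, amt FROM money"))
print(result)
```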

AutoCommit at t ribut e m anually is t o begin a t ransact ion by invoking begin_work( ). This m et hod disables AutoCommit and causes it t o be enabled again aut om at ically when you invoke commit( ) or rollback( ) lat er. As of DBI 1.20, an alt ernat ive t o m anipulat ing t he

Transactions and Older Versions of DBD::mysql

The DBI transaction mechanism requires DBD::mysql 1.2216 or newer. For earlier versions, setting the AutoCommit attribute has no effect, so you'll need to issue the transaction-related SQL statements yourself (BEGIN, COMMIT, ROLLBACK).

15.6 Using Transactions in PHP Programs

15.6.1 Problem

You want to perform a transaction in a PHP script.

15.6.2 Solution

Issue the SQL statements that begin and end transactions.

15.6.3 Discussion

PHP provides no special transaction mechanism, so it's necessary to issue the relevant SQL statements directly. This means you can either use BEGIN to start a transaction, or disable and enable the auto-commit mode yourself using SET AUTOCOMMIT. The following example uses BEGIN. The statements of the transaction are placed within a function to avoid a lot of messy error checking. To determine whether or not to roll back, it's necessary only to test the function result:

    function do_queries ($conn_id)
    {
        # move some money from one person to the other
        if (!mysql_query ("BEGIN", $conn_id))
            return (0);
        if (!mysql_query ("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'", $conn_id))
            return (0);
        if (!mysql_query ("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'", $conn_id))
            return (0);
        if (!mysql_query ("COMMIT", $conn_id))
            return (0);
        return (1);
    }

    if (!do_queries ($conn_id))
    {
        print ("Transaction failed, rolling back. Error was:\n"
                . mysql_error ($conn_id) . "\n");
        mysql_query ("ROLLBACK", $conn_id);
    }

The do_queries( ) function tests each statement and returns failure as soon as one of them fails. That style of testing lends itself to situations in which you may need to perform additional processing between statements, or after executing the statements and before returning success. For the example shown, no other processing is necessary, so do_queries( ) could be reimplemented as a single long expression:

    function do_queries ($conn_id)
    {
        # move some money from one person to the other
        return (
            mysql_query ("BEGIN", $conn_id)
            && mysql_query ("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'", $conn_id)
            && mysql_query ("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'", $conn_id)
            && mysql_query ("COMMIT", $conn_id)
        );
    }

15.7 Using Transactions in Python Programs

15.7.1 Problem

You want to perform a transaction in a DB-API script.

15.7.2 Solution

Use the standard DB-API transaction support mechanism.

15.7.3 Discussion

The Python DB-API abstraction provides transaction processing control through connection object methods. Invoke begin( ) to begin a transaction and either commit( ) or rollback( ) to end it. The begin( ) and commit( ) calls go into a try block, and the rollback( ) goes into the corresponding except block to cancel the transaction if an error occurs:

    try:
        conn.begin ( )
        cursor = conn.cursor ( )
        # move some money from one person to the other
        cursor.execute ("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'")
        cursor.execute ("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'")
        cursor.close ( )
        conn.commit( )
    except MySQLdb.Error, e:
        print "Transaction failed, rolling back. Error was:"
        print e.args
        try:   # empty exception handler in case rollback fails
            conn.rollback ( )
        except:
            pass
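The try/commit/except/rollback pattern can also be wrapped in a helper, much as the Perl convenience functions in Recipe 15.5 wrap the eval block. The following is a minimal sketch, not code from the recipes distribution. It assumes Python 3 and uses the standard library's sqlite3 module as a stand-in for a MySQL connection so that it runs without a server. The names run_transaction, transfer, bad_transfer, and balances are hypothetical, and with a MySQL driver the connection setup and exception class would differ.

```python
import sqlite3

def run_transaction(conn, work):
    """Run work(cursor) as one transaction; return True if it committed."""
    cursor = conn.cursor()
    try:
        work(cursor)
        conn.commit()
        return True
    except sqlite3.Error as e:
        print("Transaction failed, rolling back. Error was:", e.args)
        try:
            # guard the rollback so that its failure cannot end the script
            conn.rollback()
        except sqlite3.Error:
            pass
        return False
    finally:
        cursor.close()

def transfer(cursor):
    # move some money from one person to the other
    cursor.execute("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'")
    cursor.execute("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'")

def bad_transfer(cursor):
    # the second statement fails, so the first must be rolled back
    cursor.execute("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'")
    cursor.execute("UPDATE no_such_table SET amt = amt + 6")

def balances(conn):
    return dict(conn.execute("SELECT name, amt FROM money"))

# stand-in table so the sketch is self-contained (assumed starting amounts)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE money (name TEXT PRIMARY KEY, amt INTEGER)")
conn.executemany("INSERT INTO money VALUES (?, ?)", [("Eve", 10), ("Ida", 0)])
conn.commit()
```

Calling run_transaction(conn, transfer) should move the money and return True. Calling run_transaction(conn, bad_transfer) should return False and leave both amounts as they were before the call.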

15.8 Using Transactions in Java Programs

15.8.1 Problem

You want to perform a transaction in a JDBC script.

15.8.2 Solution

Use the standard JDBC transaction support mechanism.

15.8.3 Discussion

To perform transactions in Java, use your Connection object to turn off auto-commit mode. Then, after issuing your queries, use the object's commit( ) method to commit the transaction or rollback( ) to cancel it. Typically, you execute the statements for the transaction in a try block, with commit( ) at the end of the block. To handle failures, invoke rollback( ) in the corresponding exception handler:

    try
    {
        conn.setAutoCommit (false);
        Statement s = conn.createStatement ( );
        // move some money from one person to the other
        s.executeUpdate ("UPDATE money SET amt = amt - 6 WHERE name = 'Eve'");
        s.executeUpdate ("UPDATE money SET amt = amt + 6 WHERE name = 'Ida'");
        s.close ( );
        conn.commit ( );
        conn.setAutoCommit (true);
    }
    catch (SQLException e)
    {
        System.err.println ("Transaction failed, rolling back.");
        Cookbook.printErrorMessage (e);
        // empty exception handler in case rollback fails
        try
        {
            conn.rollback ( );
            conn.setAutoCommit (true);
        }
        catch (Exception e2) { }
    }

15.9 Using Alternatives to Transactions

15.9.1 Problem

You need to perform transactional processing, but your MySQL server doesn't support transactions.

15.9.2 Solution

Some transactional operations are amenable to workarounds such as explicit table locking. In certain cases, you may not actually even need a transaction; by rewriting your queries, you can eliminate the need for a transaction entirely.

15.9.3 Discussion

Transactions are valuable, but sometimes they need not be or cannot be used:

• Your server may not support transactions at all. (It may be too old or not configured with the appropriate table handlers, as discussed in Recipe 15.2.) In this case, you have no choice but to use some kind of workaround for transactions. One strategy that can be helpful in some situations is to use explicit table locking to prevent concurrency problems.

• Applications sometimes use transactions when they're not really necessary. You may be able to eliminate the need for a transaction by rewriting statements. This may even result in a faster application.

15.9.4 Grouping Statements Using Locks

If your server doesn't have transactional capabilities but you need to execute a group of queries without interference by other clients, you can do so by using LOCK TABLE and UNLOCK TABLE: [1]

[1] LOCK TABLES and UNLOCK TABLES are synonyms for LOCK TABLE and UNLOCK TABLE.

• Use LOCK TABLE to obtain locks for all the tables you intend to use. (Acquire write locks for tables you need to modify, and read locks for the others.) This prevents other clients from modifying the tables while you're using them.

• Issue the queries that must be executed as a group.

• Release the locks with UNLOCK TABLE. Other clients will regain access to the tables.

Locks obtained with LOCK TABLE remain in effect until you release them and thus can apply over the course of multiple statements. This gives you the same concurrency benefits as transactions. However, there is no rollback if errors occur, so table locking is not appropriate for all applications. For example, you might try performing an operation that transfers funds from Eve to Ida like this:

    LOCK TABLE money WRITE;
    UPDATE money SET amt = amt - 6 WHERE name = 'Eve';
    UPDATE money SET amt = amt + 6 WHERE name = 'Ida';
    UNLOCK TABLE;

Unfortunately, if the second update fails, the effect of the first update is not rolled back. Despite this caveat, there are certain types of situations where table locking may be sufficient for your purposes:



• A set of statements consisting only of SELECT queries. If you want to run several SELECT statements and prevent other clients from modifying the tables while you're querying them, locking will do that. For example, if you need to run several summary queries on a set of tables, your summaries may appear to be based on different sets of data if other clients are allowed to change records in between your summary queries. This will make the summaries inconsistent. To prevent that from happening, lock the tables while you're using them.

• Locking also can be useful for a set of queries where only the last statement is an update. In this case, the earlier statements don't make any changes and there is nothing that needs to be rolled back should the update fail.

15.9.5 Rewriting Queries to Avoid Transactions

Sometimes applications use transactions unnecessarily. Suppose you have a table meeting that records meeting and convention information (including the number of tickets left for each event), and that you're writing a Perl application containing a function get_ticket( ) that dispenses tickets. One way to implement the function is to check the ticket count, decrement it if it's positive, and return a status indicating whether a ticket was available. To prevent multiple clients from attempting to grab the last ticket at the same time, issue the queries within a transaction: [2]

[2] The transact_init( ) and transact_finish( ) functions are discussed in Recipe 15.5.

    sub get_ticket
    {
        my ($dbh, $meeting_id) = @_;

        my $ref = transact_init ($dbh);
        my $count = 0;
        eval
        {
            # check the current ticket count
            $count = $dbh->selectrow_array (
                        "SELECT tix_left FROM meeting WHERE meeting_id = ?",
                        undef, $meeting_id);
            # if there are tickets left, decrement the count
            if ($count > 0)
            {
                $dbh->do (
                    "UPDATE meeting SET tix_left = tix_left-1 WHERE meeting_id = ?",
                    undef, $meeting_id);
            }
            $dbh->commit ( );
        };
        $count = 0 if $@;   # if an error occurred, no tix available
        transact_finish ($dbh, $ref, $@);
        return ($count > 0)
    }

The function dispenses tickets properly, but involves a certain amount of unnecessary work. It's possible to do the same thing without using a transaction at all. Decrement the ticket count only if the count is greater than zero, then check whether the statement affected a row:

    sub get_ticket
    {
        my ($dbh, $meeting_id) = @_;

        my $count = $dbh->do (
            "UPDATE meeting SET tix_left = tix_left-1
             WHERE meeting_id = ? AND tix_left > 0",
            undef, $meeting_id);
        return ($count > 0);
    }

In MySQL, the row count returned by an UPDATE statement indicates the number of rows changed. This means that if there are no tickets left for an event, the UPDATE won't change the row and the count will be zero. This makes it easy to determine whether a ticket is available using a single query rather than with the multiple queries required by the transactional approach. The lesson here is that although transactions are important and have their place, you may be able to avoid them and end up with a faster application as a result. (The single-query solution is an example of what the MySQL Reference Manual refers to as an "atomic operation." The manual discusses these as an efficient alternative to transactions.)
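The single-statement technique carries over to the other API languages. Here is a minimal Python 3 sketch of it, offered as an illustration rather than as code from the recipes distribution. It assumes the standard library's sqlite3 module as a stand-in for a MySQL connection, and a meeting table reduced to the two columns the query needs. A MySQL driver such as MySQLdb would use %s rather than ? as the placeholder.

```python
import sqlite3

def get_ticket(conn, meeting_id):
    """Try to take one ticket; return True if one was available."""
    cursor = conn.cursor()
    # decrement only when tickets remain, so the test and the
    # update happen in a single statement
    cursor.execute(
        "UPDATE meeting SET tix_left = tix_left - 1"
        " WHERE meeting_id = ? AND tix_left > 0",
        (meeting_id,))
    took_one = cursor.rowcount > 0   # rows changed by the UPDATE
    cursor.close()
    conn.commit()
    return took_one

# stand-in table with two tickets left for meeting 1 (assumed data)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meeting (meeting_id INTEGER PRIMARY KEY, tix_left INTEGER)")
conn.execute("INSERT INTO meeting VALUES (1, 2)")
conn.commit()
```

With two tickets in the table, the first two calls to get_ticket(conn, 1) should succeed and the third should report that none are left. The count never goes negative because the WHERE clause rejects the update.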

Chapter 16. Introduction to MySQL on the Web

Section 16.1. Introduction
Section 16.2. Basic Web Page Generation
Section 16.3. Using Apache to Run Web Scripts
Section 16.4. Using Tomcat to Run Web Scripts
Section 16.5. Encoding Special Characters in Web Output

16.1 Introduction

The next few chapters discuss some of the ways that MySQL can help you build a better web site. In general, the principal benefit is that MySQL makes it easier to provide dynamic rather than static content. Static content exists as pages in the web server's document tree that are served exactly as is. Visitors can access only the documents that you place in the tree, and changes occur only when you add, modify, or delete those documents. By contrast, dynamic content is created on demand. Rather than opening a file and serving its contents directly to the client, the web server executes a script that generates the page and sends the resulting output. As a simple example, a script can look up the current hit counter value in the database for a given web page, update the counter, and return the new value for display in the page. Each time the script executes, it produces a different value. More complex examples are scripts that show the names of people that have a birthday today, retrieve and display items in a product catalog, or provide information about the current status of the server. And that's just for starters; web scripts have access to the power of the programming language in which they're written, so the actions that they perform to generate pages can be quite extensive. For example, web scripts are important for form processing, and a single script may be responsible for generating a form and sending it to the user, processing the contents of the form when the user submits it later, and storing the contents in a database. By communicating with users this way, web scripts bring a measure of interactivity to your web site.

This chapter covers the introductory aspects of writing scripts that use MySQL in a web environment. Some of the initial material is not particularly MySQL-specific, but it is necessary to establish the general groundwork for using your database from within the context of web programming. The topics covered here include:



• How web scripting differs from writing static HTML documents or scripts intended to be executed from the command line.

• Some of the prerequisites for running web scripts. In particular, you must have a web server installed and it must be set up to recognize your scripts as programs to be executed, rather than as static files to be served literally over the network.

• How to use each of our API languages to write a short web script that queries the MySQL server for information and displays the results in a web page.

• How to properly encode output. HTML consists of text to be displayed interspersed with special markup constructs. However, if the text contains special characters, you must encode them to avoid generating malformed web pages. Each API provides a mechanism for doing this.
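As a small preview of the encoding topic in the last item, the following Python 3 sketch shows the idea. It is an illustration under stated assumptions, not the mechanism this book's Python recipes use: it relies on the standard library's html.escape( ) function, and the helper name paragraph is hypothetical.

```python
import html

def paragraph(text):
    """Return text wrapped in a paragraph, with special characters encoded."""
    # html.escape converts &, <, > and quote characters to entities
    return "<p>" + html.escape(text) + "</p>"

print(paragraph('a < b & "c"'))
```

Without the encoding step, the < character in the text could be taken as the start of a tag and produce a malformed page.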

The following chapters go into more detail on topics such as how to display query results in various formats (paragraphs, lists, tables, and so forth), working with images, form processing, and tracking a user across the course of several page requests as part of a single user session.

This book uses the Apache web server for Perl, PHP, and Python scripts, and the Tomcat server for Java scripts, which are written using JavaServer Pages (JSP) notation. Both servers are available from the Apache Group:

http://httpd.apache.org/
http://jakarta.apache.org/

Because Apache installations are fairly prevalent, I'm going to assume that it is already installed on your system and that you just need to configure it. Recipe 16.3 discusses how to configure Apache for Perl, PHP, and Python, and how to write a short web script in each language. Tomcat is less widely deployed than Apache, so some additional installation information is provided in Appendix B. Recipe 16.4 discusses JSP script writing using Tomcat. You can use different servers if you like, but you'll need to adapt the instructions given here.

The web-based example scripts in the recipes distribution may be found under the directories named for the servers used to run them. For Perl, PHP, and Python examples, look under the apache directory; for JSP examples, look under tomcat.

I will assume that you have some basic familiarity with HTML. For Tomcat, it's also helpful to know something about XML, because Tomcat's configuration files are written as XML documents, and JSP pages contain elements written using XML syntax. If you don't know any XML, see the quick summary in the sidebar "XML and XHTML in a Nutshell." In general, the web scripts in this book produce output that is valid not only as HTML, but as XHTML, the transitional format between HTML and XML. (This is another reason to become familiar with XML.) For example, XHTML requires closing tags, so paragraphs are written with a closing </p> tag following the paragraph body. The use of this output style will be obvious for scripts written using languages like PHP in which the HTML tags are included literally in the script. For interfaces that generate HTML for you, like the Perl CGI.pm module, conformance to XHTML is a matter of whether or not the module itself produces XHTML. CGI.pm does so beginning with Version 2.69, though its XHTML conformance improves in more recent versions.

XML and XHTML in a Nutshell

XML is similar in some ways to HTML, and because more people know HTML, it's perhaps easiest to characterize XML in terms of how it differs from HTML:

• Lettercase for HTML tag and attribute names does not matter; in XML, the names are case sensitive.

• In HTML, tag attributes can be specified with a quoted or unquoted value, or sometimes with no value at all. In XML, every tag attribute must have a value, and the value must be quoted.

• Every opening tag in XML must have a corresponding closing tag. This is true even if there is no body, although in that case, a shortcut tag form can be used. For example, in HTML, you can write <br>, but XML requires a closing tag. You could write this as <br></br>, but the element has no body, so a shortcut form <br/> can be used that combines the opening and closing tags. However, when writing XML that will be translated into HTML, it's safer to write the tag as <br /> with a space preceding the slash. The space helps browsers not to misinterpret the tag name as br/ and consequently ignore it as unrecognized.

XHTML is a transitional format used for the migration of the Web away from HTML and toward XML. It's less strict than XML, but more strict than HTML. For example, XHTML tag and attribute names must be lowercase and attributes must have a double-quoted value. In HTML you might write a radio button element like this:

In XHTML, the tag name must be lowercase, the attribute values must be quoted, the checked attribute must be given a value, and there must be a closing tag:

The element has no body in this case, so the single-tag shortcut form can be used:

Appendix C lists some references if you want additional general information about HTML, XHTML, or XML.

16.2 Basic Web Page Generation

16.2.1 Problem

You want to produce a web page from a script rather than by writing it manually.

16.2.2 Solution

Write a program that generates the page when it executes. This gives you more control over what gets sent to the client than when you write a static page, although it may also require that you provide more parts of the response. For example, it may be necessary to write the headers that precede the page body.

16.2.3 Discussion

HTML is a markup language (that's what the "ML" stands for) that consists of a mix of plain text to be displayed and special markup indicators or constructs that control how the plain text is displayed. Here is a very simple HTML page that specifies a title in the page header, and a body with white background containing a single paragraph:

    <html>
    <head>
    <title>Web Page Title</title>
    </head>
    <body bgcolor="white">
    <p>Web page body.</p>
    </body>
    </html>

It's possible to write a script that produces the same page, but doing so differs in some ways from writing a static page. For one thing, you're writing in two languages at once. (The script is written in your programming language, and the script itself writes HTML.) Another difference is that you may have to produce more of the response that is sent to the client. When a web server sends a static page to a client, it actually sends a set of one or more header lines first that provide additional information about the page. For example, an HTML document would be preceded by a Content-Type: header that lets the client know what kind of information to expect, and a blank line that separates any headers from the page body:

    Content-Type: text/html

    <html>
    <head>
    <title>Web Page Title</title>
    </head>
    <body bgcolor="white">
    <p>Web page body.</p>
    </body>
    </html>

The web server produces header information automatically for static HTML pages. When you write a web script, you may need to provide the header information yourself. Some APIs (such as PHP) may send a content-type header automatically, but allow you to override the default type. For example, if your script sends a JPEG image to the client instead of an HTML page, you would want to have the script change the content type from text/html to image/jpeg.

Writing web scripts also differs from writing command-line scripts, both for input and for output. On the input side, the information given to a web script is provided by the web server rather than by command-line arguments or by input that you type in. This means your scripts do not obtain input using read statements. Instead, the web server puts information into the execution environment of the script, which then extracts that information from its environment and acts on it.

On the output side, command-line scripts typically produce plain text output, whereas web scripts produce HTML, images, or whatever other type of content you need to send to the client. Output produced in a web environment usually must be highly structured, to ensure that it can be understood by the receiving client program.

Any API allows you to generate output by means of print statements, but some also offer special assistance for producing web pages. This support can be either built into the API itself or provided by means of special modules:



• For Perl scripts, a popular module is CGI.pm. It provides features for generating HTML markup, form processing, and more.

• PHP scripts are written as a mix of HTML and embedded PHP code. That is, you write HTML literally into the script, then drop into "program mode" whenever you need to generate output by executing code. The code is replaced by its output in the resulting page that is sent to the client.

• Python includes cgi and urllib modules that help perform web programming tasks.

• For Java, we'll write scripts according to the JSP specification, which allows scripting directives and code to be embedded into web pages. This is similar to the way PHP works.
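To make the header-plus-body structure of a response concrete, here is a minimal Python 3 sketch that uses nothing but print-style output. It is an illustration, not one of the scripts from the recipes distribution. The function name make_response is hypothetical, and the title and body text are assumed to contain no characters that need encoding.

```python
import sys

def make_response(title, body_text):
    """Return a complete response: header, blank line, then the HTML page."""
    lines = [
        "Content-Type: text/html",   # tells the client what to expect
        "",                          # blank line ends the headers
        "<html>",
        "<head><title>%s</title></head>" % title,
        '<body bgcolor="white">',
        "<p>%s</p>" % body_text,
        "</body>",
        "</html>",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # a script run by the web server writes the response to standard output
    sys.stdout.write(make_response("Web Page Title", "Web page body."))
```

Building the response as a string before writing it keeps the header and the blank separator line in one place, which makes it harder to omit either by accident.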

Other page-generating packages are available besides those used in this book, some of which can have a marked effect on the way you use a language. For example, Mason, embPerl, ePerl, and AxKit allow you to treat Perl as an embedded language, somewhat like the way that PHP works. Similarly, the mod_snake Apache module allows Python code to be embedded into HTML templates.

Before you can run any scripts in a web environment, your web server must be set up properly. Information about doing this for Apache and Tomcat is provided in Recipe 16.3 and Recipe 16.4, but conceptually, a web server typically runs a script in one of two ways. First, the web server can use an external program to execute the script. For example, it can invoke an instance of the Python interpreter to run a Python script. Second, if the server has been enabled with the appropriate language processing ability, it can execute the script itself. Using an external program to run scripts requires no special capability on the part of the web server, but is slower because it involves starting up a separate process, as well as some additional overhead for writing request information to the script and reading the results from it. If you embed a language processor into the web server, it can execute scripts directly, resulting in much better performance.

Like most web servers, Apache can run external scripts. It also supports the concept of extensions (modules) that become part of the Apache process itself (either by being compiled in or dynamically loaded at runtime). One common use of this feature is to embed language processors into the server to accelerate script execution. Perl, PHP, and Python scripts can be executed either way. Like command-line scripts, externally executed web scripts are written as executable files that begin with a #! line specifying the pathname of the appropriate language interpreter. Apache uses the pathname to determine which interpreter runs the script. Alternatively, you can extend Apache using modules such as mod_perl for Perl, mod_php for PHP, and mod_python or mod_snake for Python. This gives Apache the ability to directly execute scripts written in those languages.

For Java JSP scripts, the scripts are compiled into Java servlets and run inside a process known as a servlet container. This is similar to the embedded-interpreter approach in the sense that the scripts are run by a server process that manages them, rather than by starting up an external process for each script. The first time a JSP page is requested by a client, the container compiles it into a servlet in the form of executable Java byte code, then loads it and runs it. The container caches the byte code, so subsequent requests for the script run directly with no compilation phase. If you modify the script, the container notices this when the next request arrives, recompiles the script into a new servlet, and reloads it. The JSP approach provides a significant advantage over writing servlets directly, because you don't have to compile code yourself or handle servlet loading and unloading. Tomcat can handle the responsibilities of both the servlet container and of the web server that communicates with the container.

If you run multiple servers on the same host, they must listen for requests on different port numbers. In a typical configuration, Apache listens on the default HTTP port (80) and Tomcat listens on another port such as 8080. The examples here use server hostnames of apache.snake.net and tomcat.snake.net to represent URLs for scripts processed using Apache and Tomcat. These may or may not map to the same physical machine, depending on your DNS settings, so the examples use a different port (8080) for Tomcat. Typical forms for URLs that you'll see in this book are as follows:

http://apache.snake.net/cgi-bin/my_perl_script.pl
http://apache.snake.net/cgi-bin/my_python_script.py
http://apache.snake.net/mcb/my_php_script.php
http://tomcat.snake.net:8080/mcb/my_jsp_script.jsp

You'll need to change the hostname and port number appropriately for pages served by your own servers.

16.3 Using Apache to Run Web Scripts

16.3.1 Problem

You want to run Perl, PHP, or Python programs in a web environment.

16.3.2 Solution

Execute them using the Apache server.

16.3.3 Discussion

This section describes how to configure Apache for running Perl, PHP, and Python scripts, and illustrates how to write web-based scripts in each language.

There are typically several directories under the Apache root directory, which I'll assume here to be /usr/local/apache. These directories include:

bin
    Contains httpd (that is, Apache itself) and other Apache-related executable programs

conf
    For configuration files, notably httpd.conf, the primary file used by Apache

htdocs
    The root of the document tree

logs
    For log files

To configure Apache for script execution, edit the httpd.conf file in the conf directory. Typically, executable scripts are identified either by location or by filename suffix. A location can be either language-neutral or language-specific.

Apache configurations often have a cgi-bin directory under the server root directory where you can install scripts that should run as external programs. It's configured using a ScriptAlias directive:

    ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/

The second argument is the actual location of the script directory in your filesystem, and the first is the pathname in URLs that corresponds to that directory. Thus, the directive just shown associates scripts located in /usr/local/apache/cgi-bin with URLs that have cgi-bin following the hostname. For example, if you install the script myscript.py in the directory /usr/local/apache/cgi-bin on the host apache.snake.net, you'd request it with this URL:

http://apache.snake.net/cgi-bin/myscript.py

When configured this way, the cgi-bin directory can contain scripts written in any language. Because of this, the directory is language-neutral, so Apache needs to be able to figure out which language processor to use to execute each script that is installed there. To provide this information, the first line of the script should begin with #! followed by the pathname to the program that executes the script, and possibly some options. For example, a script that begins with the following line will be run by Perl, and the -w option tells Perl to warn about questionable language constructs:

    #! /usr/bin/perl -w

Under Unix, you must also make the script executable (use chmod +x), or it won't run properly. The #! line just shown is appropriate for a system that has Perl installed as /usr/bin/perl. If your Perl interpreter is installed somewhere else, modify the line accordingly. If you're on a Windows machine with Perl installed as D:\Perl\bin\perl.exe, the #! line should look like this:

    #! D:\Perl\bin\perl -w

For Windows users, another option that is simpler is to set up a filename extension association between script names that end with a .pl suffix and the Perl interpreter. Then the script can begin like this:

    #! perl -w

A ScriptAlias directive sets up a directory that can be used for scripts written in any language. It's also possible to associate a directory with a specific language processor, so that any script found there is assumed to be written in a particular language. For example, to designate /usr/local/apache/cgi-perl as a mod_perl directory, you might configure Apache like this:

    Alias /cgi-perl/ /usr/local/apache/cgi-perl/
    <Location /cgi-perl>
        SetHandler perl-script
        PerlHandler Apache::Registry
        PerlSendHeader on
        Options +ExecCGI
    </Location>

In this case, Perl scripts located in the designated directory would be invoked as follows:

http://apache.snake.net/cgi-perl/myscript.pl

Using mod_perl is beyond the scope of this book, so I won't say any more about it. Check Appendix C for some useful mod_perl resources.

Directories used only for scripts generally are placed outside of your Apache document tree. As an alternative to using specific directories for scripts, you can identify scripts by filename extension, so that files with a particular suffix become associated with a specific language processor. In this case, you can place them anywhere in the document tree. This is the most common way to use PHP. For example, if you have Apache configured with PHP support built in using the mod_php module, you can tell it that scripts having names ending with .php should be interpreted as PHP scripts. To do so, add this line to httpd.conf:

    AddType application/x-httpd-php .php

Then if you install a PHP script myscript.php under htdocs (the Apache document root directory), the URL for invoking the script becomes:

http://apache.snake.net/myscript.php

If PHP runs as an external standalone program, you'll need to tell Apache where to find it. For example, if you're running Windows and you have PHP installed as D:\Php\php.exe, put the following lines in httpd.conf (note the use of forward slashes in the pathnames rather than backslashes):

    ScriptAlias /php/ "D:/Php/"
    AddType application/x-httpd-php .php
    Action application/x-httpd-php /php/php.exe

For purposes of showing URLs in examples, I'm going to assume that Perl and Python scripts are in your cgi-bin directory, and that PHP scripts are in the mcb directory of your document tree, identified by the .php extension. This means that URLs for scripts in these languages will look like this:

http://apache.snake.net/cgi-bin/myscript.pl
http://apache.snake.net/cgi-bin/myscript.py
http://apache.snake.net/mcb/myscript.php

If you plan to use a similar setup, make sure you have a cgi-bin directory under your Apache root, and an mcb directory under your Apache document root. Then, to deploy Perl or Python web scripts, copy them to the cgi-bin directory. To deploy PHP scripts, copy them to the mcb directory.

If you request a web script and get an error page in response, have a look at the Apache error log, which can be a useful source of diagnostic information when you're trying to figure out why a script doesn't work. A common name for this log is error_log in the logs directory. If you don't find any such file, check httpd.conf for an ErrorLog directive to see where Apache logs its errors.

Web Security Note

Under Unix, scripts run with particular user and group IDs. Scripts that you execute from the command line run with your user and group IDs, and have the filesystem privileges associated with your account. However, scripts executed by a web server probably won't run with your user and group ID, nor will they have your user privileges. Instead, they run under the user and group ID of the account the web server has been set to run as, and with that account's privileges. (To determine what account this is, look for User and Group directives in the httpd.conf configuration file.) This means that if you expect web scripts to read and write files, those files must be accessible to the account used to run the web server. For example, if your server runs under the nobody account and you want a script to be able to store uploaded image files into a directory called uploads in the document tree, you must make that directory readable to and writable by the nobody user.

Another implication is that if other people can write scripts for your web server, those scripts too will run as nobody and they can read and write the same files as your own scripts. Solutions to this problem include using the suEXEC mechanism for Apache 1.x, or using Apache 2.x, which allows you to designate which user and group IDs to use for running a given set of scripts.

After Apache has been configured to support script execution, you can begin to write scripts that generate web pages. The remainder of this section describes how to do so for Perl, PHP, and Python. The examples for each language connect to the MySQL server, run a SHOW TABLES query, and display the results in a web page. The scripts shown here indicate any additional modules or libraries that web scripts typically need to include. (Later on, I'll generally assume that the proper modules have been referenced and show only script fragments.)
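As a quick check related to the Web Security Note above, a script can report the account it actually runs under and whether a given directory is accessible to that account. The following is a minimal sketch in Python 3 syntax (newer than the Python used for the scripts in this chapter). The function names and the uploads path are illustrative assumptions, and the pwd and grp modules it relies on are available only under Unix:

```python
import grp
import os
import pwd


def current_identity():
    """Return (user, group) names for the account running this process."""
    uid = os.getuid()
    gid = os.getgid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = str(uid)  # no passwd entry; fall back to the numeric ID
    try:
        group = grp.getgrgid(gid).gr_name
    except KeyError:
        group = str(gid)  # no group entry; fall back to the numeric ID
    return user, group


def can_read_and_write(path):
    """Return True if the current account may both read and write path."""
    return os.access(path, os.R_OK) and os.access(path, os.W_OK)


if __name__ == "__main__":
    user, group = current_identity()
    print("running as user %s, group %s" % (user, group))
    # "uploads" is a hypothetical directory name taken from the note above
    print("uploads readable and writable:", can_read_and_write("uploads"))
```

Run from the command line, this reports your own account; run by the web server, it should report the account named by the User and Group directives.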

16.3.3.1 Perl

Our first web-based Perl script, show_tables.pl, is shown below. It produces an appropriate Content-Type: header, a blank line to separate the header from the page content, and the initial part of the page. Then it retrieves and displays a list of tables in the cookbook database. The table list is followed by the trailing HTML tags that close the page:

#! /usr/bin/perl -w
# show_tables.pl - Issue SHOW TABLES query, display results
# by generating HTML directly

use strict;
use lib qw(/usr/local/apache/lib/perl);
use Cookbook;

# Print header, blank line, and initial part of page

print <<EOF;
Content-Type: text/html

<html>
<head>
<title>Tables in cookbook Database</title>
</head>
<body bgcolor="white">

<p>Tables in cookbook database:</p>
EOF

# Connect to database, display table list, disconnect

my $dbh = Cookbook::connect ( );
my $sth = $dbh->prepare ("SHOW TABLES");
$sth->execute ( );
while (my @row = $sth->fetchrow_array ( ))
{
  print "$row[0]<br />\n";
}
$dbh->disconnect ( );

# Print page trailer

print <<EOF;
</body>
</html>
EOF
exit (0);

The function call version, show_tables_fc.pl, builds the same page with CGI.pm functions such as start_html( ), p( ), br( ), and end_html( ). The part of the script that generates the page body looks like this:

print start_html (-title => "Tables in cookbook Database",
                  -bgcolor => "white");
print p ("Tables in cookbook database:");

# Connect to database, display table list, disconnect

my $dbh = Cookbook::connect ( );
my $sth = $dbh->prepare ("SHOW TABLES");
$sth->execute ( );
while (my @row = $sth->fetchrow_array ( ))
{
  print $row[0] . br ( );
}
$dbh->disconnect ( );

# Print page trailer

print end_html ( );
exit (0);

Install the show_tables_fc.pl script in your cgi-bin directory and try it out to verify that it produces the same output as show_tables_oo.pl.

This book uses the CGI.pm function call interface for Perl-based web scripts from this point on. You can get more information about CGI.pm at the command line by using the following commands to read the installed documentation:

% perldoc CGI
% perldoc CGI::Carp

Other references for this module, both online and in print form, are listed in Appendix C.

16.3.3.2 PHP

PHP doesn't provide much in the way of tag shortcuts, which is surprising given its web orientation. On the other hand, because PHP is an embedded language, you can simply write your HTML literally in your script without using print statements. Here's a script show_tables.php that shifts back and forth between HTML mode and PHP mode:



Tables in cookbook Database

Tables in cookbook database:





To try the script, put it in the mcb directory of your web server's document tree and invoke it as follows:

http://apache.snake.net/mcb/show_tables.php

Unlike the Perl versions of the MySQL show-tables script, the PHP script includes no code to produce the Content-Type: header, because PHP produces it automatically. (If you want to override this behavior and produce your own headers, consult the header( ) function section in the PHP manual.)

Except for the break tags, show_tables.php includes HTML content by writing it outside of the <?php and ?> tags so that the PHP interpreter simply writes it without interpretation. Here's a different version of the script that produces all the HTML using print statements:

Sometimes it makes sense to use one approach, sometimes the other, and sometimes both within the same script. If a section of HTML doesn't refer to any variable or expression values, it can be clearer to write it in HTML mode. Otherwise it may be clearer to write it using print or echo statements, to avoid switching between HTML and PHP modes frequently.

16.3.3.3 Python

A standard installation of Python includes cgi and urllib modules that are useful for web programming. However, we don't actually need them yet, because the only web-related activity of our first Python web script is to generate some simple HTML. Here's a Python version of the MySQL show-tables script:

#! /usr/bin/python
# show_tables.py - Issue SHOW TABLES query, display results

import sys
sys.path.insert (0, "/usr/local/apache/lib/python")
import MySQLdb
import Cookbook

# Print header, blank line, and initial part of page

print """Content-Type: text/html

<html>
<head>
<title>Tables in cookbook Database</title>
</head>
<body bgcolor="white">

<p>Tables in cookbook database:</p>
"""

# Connect to database, display table list, disconnect

conn = Cookbook.connect ( )
cursor = conn.cursor ( )
cursor.execute ("SHOW TABLES")
for (tbl_name, ) in cursor.fetchall ( ):
  print tbl_name + "<br />"
cursor.close ( )
conn.close ( )

# Print page trailer

print """
</body>
</html>
"""

Put the script in Apache's cgi-bin directory and invoke it like this:

http://apache.snake.net/cgi-bin/show_tables.py

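One way to make a script like show_tables.py easier to check without a web server or a MySQL connection is to separate page generation from the database access. The sketch below is an illustration in Python 3 syntax, not part of the recipes distribution: build_page is a hypothetical helper that takes a list of table names (which the script gets from SHOW TABLES) and returns the complete response text. Like the script itself, it performs no encoding of special characters in the names:

```python
def build_page(title, table_names):
    """Return a CGI response: header, blank line, then an HTML page."""
    lines = [
        "Content-Type: text/html",
        "",  # blank line separating the header from the page content
        "<html>",
        "<head>",
        "<title>%s</title>" % title,
        "</head>",
        '<body bgcolor="white">',
        "<p>Tables in cookbook database:</p>",
    ]
    # One line per table name, each followed by a break tag
    lines.extend("%s<br />" % name for name in table_names)
    lines.extend(["</body>", "</html>"])
    return "\n".join(lines) + "\n"
```

A script could then fetch the names with its cursor and print the value returned by build_page in a single statement.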
16.4 Using Tomcat to Run Web Scripts

16.4.1 Problem

You want to run Java-based programs in a web environment.

16.4.2 Solution

Write programs using JSP notation and execute them using a servlet container.

16.4.3 Discussion

As described in Recipe 16.3, Apache can be used to run Perl, PHP, and Python scripts. For Java, a different approach is needed, because Apache doesn't serve JSP pages. Instead, we'll use Tomcat, a server designed for processing Java in a web environment. Apache and Tomcat are very different servers, but there is a familial relationship: Tomcat is part of the Jakarta Project, which is overseen by the Apache Software Foundation.

This section provides an overview of JSP programming with Tomcat, but makes several assumptions:



• You have some familiarity with the concepts underlying JavaServer Pages, such as what a servlet container is, what an application context is, and what the basic JSP scripting elements are.

• The Tomcat server has been installed so that you can execute JSP pages, and you know how to start and stop it.

• You are familiar with the Tomcat webapps directory and how a Tomcat application is structured. In particular, you understand the purpose of the WEB-INF directory and the web.xml file.

• You know what a tag library is and how to use one.

I recognize that this is a lot to assume, because the use of JSP and Tomcat in the MySQL world is not so widespread as the use of our other languages with Apache. If you're unfamiliar with JSP or need instructions for installing Tomcat, Appendix B provides the necessary background information.

Once you have Tomcat in place, you should install the following components so that you can work through the JSP examples in this book:

• The mcb sample application located in the tomcat directory of the recipes distribution.

• A MySQL JDBC driver. You may already have one installed for use with the scripts in earlier chapters, but Tomcat needs a copy, too.

• The JSP Standard Tag Library (JSTL), which contains tags for performing database activities, conditional testing, and iterative operations within JSP pages.

I'll discuss how to install these components, provide a brief overview of some of the JSTL tags, and then describe how to write the JSP equivalent of the MySQL show-tables script that was implemented in Recipe 16.3 using Perl, PHP, and Python.

16.4.4 Installing the mcb Application

Web applications for Tomcat typically are packaged as WAR (web archive) files and installed under its webapps directory, which is roughly analogous to Apache's htdocs document root directory. The recipes distribution includes a sample application named mcb that you can use for trying the JSP examples described here. Look in the distribution's tomcat directory, where you will find a file named mcb.war. Copy that file to Tomcat's webapps directory.

Here's an example installation procedure for Unix, assuming that the recipes distribution and Tomcat are located at /u/paul/recipes and /usr/local/jakarta-tomcat. The command to install mcb.war would look like this:

% cp /u/paul/recipes/tomcat/mcb.war /usr/local/jakarta-tomcat/webapps

For Windows, if the relevant directories are D:\recipes and D:\jakarta-tomcat, the command looks like this:

C:\> copy D:\recipes\tomcat\mcb.war D:\jakarta-tomcat\webapps

After copying the mcb.war file to the webapps directory, restart Tomcat. As distributed, Tomcat is configured by default to look for WAR files under webapps when it starts up and automatically unpack any that have not already been unpacked. This means that copying mcb.war to the webapps directory and restarting Tomcat should be enough to unpack the mcb application. When Tomcat finishes its startup sequence, look under webapps and you should see a new mcb directory under which are all the files contained in mcb.war. (If Tomcat doesn't unpack mcb.war automatically, see the sidebar Unpacking a WAR File Manually.) If you like, have a look around in the mcb directory at this point. It should contain several files that clients can request using a browser. There should also be a WEB-INF subdirectory, which is used for information that is private, that is, available for use by scripts in the mcb directory, but not directly accessible by clients.

Next, verify that Tomcat can serve pages from the mcb application context by requesting some of them from your browser. The following URLs request in turn a static HTML page, a servlet, and a simple JSP page:

http://tomcat.snake.net:8080/mcb/test.html
http://tomcat.snake.net:8080/mcb/servlet/SimpleServlet
http://tomcat.snake.net:8080/mcb/simple.jsp

Adjust the hostname and port number in the URLs appropriately for your installation.

Unpacking a WAR File Manually

WAR files are actually ZIP-format archives that can be unpacked using jar, WinZip, or any other tool that understands ZIP files. However, when unpacking a WAR file manually, you'll need to create its top-level directory first. The following sequence of steps shows one way to do this, using the jar utility to unpack a WAR file named mcb.war that is assumed to be located in Tomcat's webapps directory. For Unix, change location to the webapps directory, then issue the following commands:

% mkdir mcb
% cd mcb
% jar xf ../mcb.war

For Windows, the commands are only slightly different:

C:\> mkdir mcb
C:\> cd mcb
C:\> jar xf ..\mcb.war

Unpacking the WAR file in the webapps directory creates a new application context, so you'll need to restart Tomcat before it notices the new application.

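The manual procedure in the sidebar can also be sketched in Python using the standard zipfile module, which works because WAR files are ZIP-format archives. The function name unpack_war and its arguments are illustrative assumptions, not part of Tomcat or the recipes distribution:

```python
import os
import zipfile


def unpack_war(war_path, webapps_dir):
    """Unpack war_path into webapps_dir/<name>, creating that directory first.

    The application directory name is the WAR filename without its extension
    (mcb.war unpacks into mcb). Returns the path of the application directory.
    """
    app_name = os.path.splitext(os.path.basename(war_path))[0]
    app_dir = os.path.join(webapps_dir, app_name)
    os.makedirs(app_dir, exist_ok=True)  # the top-level directory must exist
    with zipfile.ZipFile(war_path) as war:
        war.extractall(app_dir)
    return app_dir
```

For example, calling unpack_war("mcb.war", ".") from within the webapps directory creates an mcb directory there and unpacks the archive into it. Tomcat still needs to be restarted afterward.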
16.4.5 Installing the JDBC Driver

The JSP pages in the mcb application need a JDBC driver for connecting to the cookbook database. The following instructions describe how to install the MySQL Connector/J driver; the installation procedure for other drivers should be similar.

To install MySQL Connector/J for use by Tomcat applications, place a copy of it in Tomcat's directory tree. Assuming that the driver is packaged as a JAR file (as is the case for MySQL Connector/J), there are three likely places under the Tomcat root directory where you can install it, depending on how visible you want the driver to be:

• To make the driver available only to the mcb application, place it in the mcb/WEB-INF/lib directory under Tomcat's webapps directory.

• To make the driver available to all Tomcat applications but not to Tomcat itself, place it in the lib directory under the Tomcat root.

• To make the driver available both to applications and to Tomcat, place it in the common/lib directory under the Tomcat root.

I recommend installing a copy of the driver in the common/lib directory. That gives it the most global visibility (it will be accessible both by Tomcat and by applications), and you'll need to install it only once. If you enable the driver only for the mcb application by placing a copy in mcb/WEB-INF/lib, but then develop other applications that use MySQL, you'll need to either copy the driver into those applications or move it to a more global location.

Making the driver more globally accessible also is useful if you think it likely that at some point you'll elect to use JDBC-based session management or realm authentication. Those activities are handled by Tomcat itself above the application level, so Tomcat needs access to the driver to carry them out.

Here's an example installation procedure for Unix, assuming that the MySQL Connector/J driver and Tomcat are located at /src/Java/mysql-connector-java-bin.jar and /usr/local/jakarta-tomcat. The command to install the driver would look like this:

% cp /src/Java/mysql-connector-java-bin.jar /usr/local/jakarta-tomcat/common/lib

For Windows, if the components are installed at D:\mysql-connector-java-bin.jar and D:\jakarta-tomcat, the command looks like this:

C:\> copy D:\mysql-connector-java-bin.jar D:\jakarta-tomcat\common\lib

After installing the driver, restart Tomcat and then request the following mcb application page to verify that Tomcat can find the JDBC driver properly:

http://tomcat.snake.net:8080/mcb/jdbc_test.jsp

You may need to edit jdbc_test.jsp first to change the connection parameters.

16.4.6 Installing the JSTL Distribution

Most of the scripts that are part of the mcb sample application use JSTL, so it's necessary to install it or those scripts won't work. To install a tag library into an application context, copy the library's files into the proper locations under the application's WEB-INF directory. Generally, this means installing at least one JAR file and a tag library descriptor (TLD) file, and adding some tag library information to the application's web.xml file. JSTL actually consists of several tag sets, so there are several JAR files and TLD files. The following instructions describe how to install JSTL for use with the mcb application:

• Make sure that the mcb.war file has been unpacked to create the mcb application directory hierarchy under the Tomcat webapps directory. (See Recipe 16.4.4.) This is necessary because the JSTL files must be installed under the mcb/WEB-INF directory, which will not exist until mcb.war has been unpacked.

• Get the JSTL distribution from the Jakarta Project web site. Go to the Jakarta Taglibs project page, which is accessible at this URL:

http://jakarta.apache.org/taglibs/

Follow the Standard Taglib link to get to the JSTL information page; the latter has a Downloads section from which you can get the binary JSTL distribution.

• Unpack the JSTL distribution into some convenient location, preferably outside of the Tomcat hierarchy. The commands to do this are similar to those used to unpack Tomcat itself (see Recipe ). For example, to unpack a ZIP format distribution, use the following command, adjusting the filename as necessary:

% jar xf jakarta-taglibs-standard.zip

• Unpacking the distribution will create a directory containing several files. Copy the JAR files (jstl.jar, standard.jar, and so forth) to the mcb/WEB-INF/lib directory. These files contain the class libraries that implement the JSTL tag actions. Copy the tag library descriptor files (c.tld, sql.tld, and so forth) to the mcb/WEB-INF directory. These files define the interface for the actions implemented by the classes in the JAR files.

• The mcb/WEB-INF directory contains a file named web.xml that is the web application deployment descriptor file (a fancy name for "configuration file"). Modify web.xml to add <taglib> entries for each of the JSTL TLD files. The entries will look something like this:

<taglib>
  <taglib-uri>http://java.sun.com/jstl/core</taglib-uri>
  <taglib-location>/WEB-INF/c.tld</taglib-location>
</taglib>
<taglib>
  <taglib-uri>http://java.sun.com/jstl/sql</taglib-uri>
  <taglib-location>/WEB-INF/sql.tld</taglib-location>
</taglib>

Each <taglib> entry contains a <taglib-uri> element that specifies the symbolic name by which mcb JSP pages will refer to the corresponding TLD file, and a <taglib-location> element that indicates the location of the TLD file under the mcb application directory. (You'll find that web.xml as distributed already contains these entries. However, you should take a look at them to make sure they match the filenames of the TLD files that you just installed in the previous step.)



• The mcb/WEB-INF directory also contains a file named jstl-mcb-setup.inc. This file is not part of JSTL itself, but it contains a JSTL <sql:setDataSource> tag that is used by many of the mcb JSP pages to set up a data source for connecting to the cookbook database. The file looks like this:

<sql:setDataSource var="conn" driver="..." url="..." user="..." password="..." />

Edit the driver, url, user, and password tag attributes as necessary to change the connection parameters to those that you use for accessing the cookbook database. Do not change the var attribute.



• The JSTL distribution also includes WAR files containing documentation and examples (standard-doc.war and standard-examples.war). If you wish to install these, copy them into Tomcat's webapps directory. (You probably should install the documentation so that you can access it locally from your own server. It's useful to install the examples as well, because they provide helpful demonstrations showing how to use JSTL tags in JSP pages.)

• Restart Tomcat so that it notices the changes you've just made to the mcb application and so that it unpacks the WAR files containing the JSTL documentation and examples. If Tomcat doesn't unpack WAR files for you automatically, see the sidebar Unpacking a WAR File Manually.

After installing JSTL, restart Tomcat and request the following mcb application page to verify that Tomcat can find the JSTL tags properly:

http://tomcat.snake.net:8080/mcb/jstl_test.jsp

16.4.7 Writing JSP Pages with JSTL

This section discusses the syntax for some of the JSTL tags used most frequently by mcb JSP pages. The descriptions are very brief, and many of these tags have additional attributes that allow them to be used in ways other than those shown here. For more information, consult the JSTL specification (see Appendix C).

A JSP page that uses JSTL must include a taglib directive for each tag set that the page uses. Examples in this book use the core and database tags, identified by the following taglib directives:

<%@ taglib uri="http://java.sun.com/jstl/core" prefix="c" %>
<%@ taglib uri="http://java.sun.com/jstl/sql" prefix="sql" %>

The uri values should match the symbolic values that are listed in the web.xml <taglib> entries (see Recipe 16.4.6). The prefix values indicate the initial string used in tag names to identify tags as part of a given tag library.

JSTL tags are written in XML format, using a special syntax for tag attributes to include expressions. Within tag attributes, text is interpreted literally unless enclosed within ${...}, in which case it is interpreted as an expression to be evaluated.

The following tags are part of the JSTL core tag set:

<c:out>

This tag evaluates its value attribute and is replaced by the result. One common use for this tag is to provide content for the output page. The following tag would produce the value 3:

<c:out value="${1+2}" />

<c:set>

This tag assigns a value to a variable. For example, to assign a string to a variable named title and then use the variable later in the <title> element of the output page, do this:

<c:set var="title" value="JSTL Example Page" />
<html>
<head>
<title><c:out value="${title}" /></title>
</head>
...

This example illustrates a principle that is generally true for JSTL tags: To specify a variable into which a value is to be stored, name it without using ${...} notation. To refer to that variable's value later, use it within ${...} so that it is interpreted as an expression to be evaluated.

<c:if>

This tag evaluates the conditional test given in its test attribute. If the test result is true, the tag body is evaluated and becomes the tag's output; if the result is false, the body is ignored:

<c:if test="${1 != 0}">
1 is not equal to 0
</c:if>

The comparison operators are ==, !=, <, >, <=, and >=. The alternative operators eq, ne, lt, gt, le, and ge make it easier to avoid using special HTML characters in expressions. Arithmetic operators are +, -, *, / (or div), and % (or mod). Logical operators are && (and), || (or), and ! (not). The special empty operator is true if a value is empty or null:

<c:if test="${empty x}">
x is empty
</c:if>
<c:if test="${!empty y}">
y is not empty
</c:if>

<c:choose>

This is another conditional tag, but it allows multiple conditions to be tested. Include a <c:when> tag for each condition that you want to test explicitly, and a <c:otherwise> tag if there is a "default" case:

<c:choose>
<c:when test="${count == 0}">
Please choose an item
</c:when>
<c:when test="${count gt 1}">
Please choose only one item
</c:when>
<c:otherwise>
Thank you for choosing an item
</c:otherwise>
</c:choose>

<c:forEach>

This tag acts as an iterator, allowing you to loop over a set of values. The following example uses a <c:forEach> tag to loop through a set of rows in the result set from a query (represented here by the rs variable):

<c:forEach var="row" items="${rs.rows}">
id = <c:out value="${row.id}" />,
name = <c:out value="${row.name}" />
<br />
</c:forEach>

Each iteration of the loop assigns the current row to the variable row. Assuming that the query result includes columns named id and name, the column values are accessible as row.id and row.name.

The JSTL database tags are used to issue queries and access their results:

<sql:setDataSource>

This tag sets up connection parameters to be used when JSTL contacts the database server. For example, to specify parameters for using the MySQL Connector/J driver to access the cookbook database, the tag looks like this:

<sql:setDataSource var="conn" driver="..." url="..." user="..." password="..." />

The driver, url, user, and password attributes specify the connection parameters, and the var attribute names the variable to associate with the connection. By convention, JSP pages in this book use the variable conn, so tags occurring later in the page that require a data source can refer to the connection using the expression ${conn}.

To avoid listing connection parameters in each JSP page that uses MySQL, a <sql:setDataSource> tag for connecting to the cookbook database is placed in the include file WEB-INF/jstl-mcb-setup.inc. A JSP page can access the file as follows to set up the connection:

<%@ include file="/WEB-INF/jstl-mcb-setup.inc" %>

To change the connection parameters used by the mcb pages, just edit jstl-mcb-setup.inc.

<sql:update>

To issue a statement such as UPDATE, DELETE, or INSERT that doesn't return rows, use a <sql:update> tag. A dataSource tag attribute indicates the data source, the affected-rows count resulting from the statement is returned in the variable named by the var attribute, and the statement itself should be specified in the tag body:

<sql:update var="count" dataSource="${conn}">
DELETE FROM profile WHERE id > 100
</sql:update>
Number of rows deleted: <c:out value="${count}" />

<sql:query>

To process queries that return a result set, use <sql:query>. As with <sql:update>, the text of the query is given in the tag body, and the dataSource attribute indicates the data source. The <sql:query> tag also takes a var attribute that names the variable you want to associate with the result set:

<sql:query var="rs" dataSource="${conn}">
SELECT id, name FROM profile ORDER BY id
</sql:query>

The mcb JSP pages use rs as the name of the result set variable. Strategies for accessing the contents of a result set are outlined below.

<sql:param>

You can write data values literally into a query string, but JSTL also allows the use of placeholders, which is helpful for values that contain characters that are special in SQL statements. Use a ? character for each placeholder in the query string, and provide values to be bound to the placeholders using <sql:param> tags in the body of the query-issuing tag. The data value can be specified either in the body of the <sql:param> tag or in its value attribute:

<sql:update var="count" dataSource="${conn}">
DELETE FROM profile WHERE id > ?
<sql:param>100</sql:param>
</sql:update>

<sql:query var="rs" dataSource="${conn}">
SELECT id, name FROM profile WHERE cats = ? AND color = ?
<sql:param value="1" />
<sql:param value="green" />
</sql:query>
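Placeholders work in much the same way in most database APIs. As a rough analogy only, the following Python 3 sketch binds values to ? placeholders using the standard sqlite3 module with an in-memory database rather than JSTL and MySQL. The profile table layout, the sample rows, and the function names are made up for illustration:

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up profile table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profile (id INTEGER, name TEXT, color TEXT, cats INTEGER)"
    )
    conn.executemany(
        "INSERT INTO profile (id, name, color, cats) VALUES (?, ?, ?, ?)",
        [
            (1, "Fred", "green", 1),
            (2, "O'Malley", "green", 1),  # the quote needs no manual escaping
            (3, "Mort", "white", 3),
            (101, "Extra", "blue", 0),
        ],
    )
    return conn


def delete_above(conn, max_id):
    """Delete rows with id > max_id; return the affected-rows count."""
    cursor = conn.execute("DELETE FROM profile WHERE id > ?", (max_id,))
    return cursor.rowcount


def find_profiles(conn, cats, color):
    """Return (id, name) rows matching both placeholder values."""
    cursor = conn.execute(
        "SELECT id, name FROM profile WHERE cats = ? AND color = ? ORDER BY id",
        (cats, color),
    )
    return cursor.fetchall()
```

As with the JSTL tags, the statement text contains only ? markers and the data values are supplied separately, so values containing quotes or other special characters need no manual escaping.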



The contents of a result set returned by <sql:query> are accessible several ways. Assuming that a result set is available through a variable rs, row i of the result can be accessed either as rs.rows[i] or as rs.rowsByIndex[i], where row number values begin at 0. The first form produces a row with columns that can be accessed by name. The second form produces a row with columns that can be accessed by column number (beginning with 0). For example, if a result set has columns named id and name, you can access the values for row three using column names like this:

<c:out value="${rs.rows[2].id}" />
<c:out value="${rs.rows[2].name}" />

To use column numbers instead, do this:

<c:out value="${rs.rowsByIndex[2][0]}" />
<c:out value="${rs.rowsByIndex[2][1]}" />

You can also use <c:forEach> as an iterator to loop through the rows in a result set. Iterate through rs.rows if you want to access column values by name:

<c:forEach var="row" items="${rs.rows}">
id = <c:out value="${row.id}" />,
name = <c:out value="${row.name}" />
<br />
</c:forEach>

Iterate through rs.rowsByIndex to access column values by number:

<c:forEach var="row" items="${rs.rowsByIndex}">
id = <c:out value="${row[0]}" />,
name = <c:out value="${row[1]}" />
<br />
</c:forEach>

The row count is available as rs.rowCount:

Number of rows selected: <c:out value="${rs.rowCount}" />

Names of the columns in the result set are available using rs.columnNames:

<c:forEach var="name" items="${rs.columnNames}">
<c:out value="${name}" />
<br />
</c:forEach>


16.4.8 Writing a MySQL Script using JSP and JSTL

Recipe 16.3 shows how to write Perl, PHP, and Python versions of a script to display the names of the tables in the cookbook database. With the JSTL tags, we can write a JSP page that provides that information as follows:

<%@ taglib uri="http://java.sun.com/jstl/core" prefix="c" %>
<%@ taglib uri="http://java.sun.com/jstl/sql" prefix="sql" %>
<%@ include file="/WEB-INF/jstl-mcb-setup.inc" %>

<html>
<head>
<title>Tables in cookbook Database</title>
</head>
<body bgcolor="white">

<p>Tables in cookbook database:</p>

<sql:query var="rs" dataSource="${conn}">
SHOW TABLES
</sql:query>

<c:forEach var="row" items="${rs.rowsByIndex}">
<c:out value="${row[0]}" /><br />
</c:forEach>

</body>
</html>

The taglib directives identify which tag libraries the script needs, and the include directive pulls in the code that sets up a data source for accessing the cookbook database. The rest of the script generates the page content.

This script should be installed in the mcb subdirectory of your Tomcat server's webapps directory, and you can invoke it as follows:

http://tomcat.snake.net:8080/mcb/show_tables.jsp

Like the PHP script shown in Recipe 16.3, the JSP script does not produce any Content-Type: header explicitly. The JSP engine produces a default header with a content type of text/html automatically.

16.5 Encoding Special Characters in Web Output

16.5.1 Problem

Certain characters are special in HTML pages and must be encoded if you want to display them literally. Because database content often contains these characters, scripts that include query results in web pages should encode those results to prevent browsers from misinterpreting the information.

16.5.2 Solution

Use the methods that are provided by your API for performing HTML-encoding and URL-encoding.

16.5.3 Discussion

HTML is a markup language: it uses certain characters as markers that have a special meaning. To include literal instances of these characters in a page, you must encode them so that they are not interpreted as having their special meanings. For example, < should be encoded as &lt; to keep a browser from interpreting it as the beginning of a tag. Furthermore, there are actually two kinds of encoding, depending on the context in which you use a character. One encoding is appropriate for general HTML text, another is used for text that is part of a URL in a hyperlink.

The MySQL show-tables scripts shown in Recipe 16.3 and Recipe 16.4 are simple demonstrations of how to produce web pages using programs. But with one exception, the scripts have a common failing: they take no care to properly encode special characters that occur in the information retrieved from the MySQL server. (The exception is the JSP version of the script; the <c:out> tag used there handles encoding automatically, as we'll discuss shortly.)

As it happens, I deliberately chose information to display that is unlikely to contain any special characters, so the scripts should work properly even in the absence of any encoding. However, in the general case, it's unsafe to assume that a query result will contain no special characters and thus you must be prepared to encode it. Neglecting to do this often results in scripts that generate pages containing malformed HTML that displays incorrectly.

This section describes how to handle special characters, beginning with some general principles, and then discusses how each API implements encoding support. The API-specific examples show how to process information drawn from a database table, but they can be adapted to any content you include in a web page, no matter its source.

16.5.4 General Encoding Principles

One form of encoding applies to characters that are used in writing HTML constructs, another applies to text that is included in URLs. It's important to understand this distinction so that you don't encode text inappropriately. Note too that encoding text for inclusion in a web page is an entirely different issue than encoding special characters in data values for inclusion in a SQL statement. The latter issue is discussed in Recipe 2.8.

16.5.4.1 Encoding characters that are special in HTML

HTML markup uses < and > characters to begin and end tags, & to begin special entity names (such as &nbsp; to signify a non-breaking space), and " to quote attribute values in tags. Consequently, to display literal instances of these characters, you must encode them as HTML entities so that browsers or other clients understand your intent. To do this, convert <, >, &, and " to the corresponding HTML entity designators &lt; (less than), &gt; (greater than), &amp; (ampersand), and &quot; (quote).

Suppose you want to display the following string literally in a web page:

    Paragraphs begin and end with <p> & </p> tags.

If you send this text to the client browser exactly as shown, the browser will misinterpret it. (The <p> and </p> tags will be taken as paragraph markers and the & may be taken as the beginning of an HTML entity designator.) To display the string the way you intend, the special characters should be encoded as the &lt;, &gt;, and &amp; entities:

    Paragraphs begin and end with &lt;p&gt; &amp; &lt;/p&gt; tags.

The principle of encoding text this way is also useful within tags. For example, HTML tag attribute values usually are enclosed within double quotes, so it's important to perform HTML-encoding on attribute values. Suppose you want to include a text input box in a form, and you want to provide an initial value of Rich "Goose" Gossage to be displayed in the box. You cannot write that value literally in the tag like this:

    <input type="text" value="Rich "Goose" Gossage">

The problem here is that the double-quoted value attribute includes internal double quotes, which makes the <input> tag malformed. The proper way to write it is to encode the double quotes:

    <input type="text" value="Rich &quot;Goose&quot; Gossage">

When a browser receives this text, it will decode the &quot; entities back to " characters and interpret the value attribute value properly.
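As a minimal sketch of the principle just described, the following Python 3 fragment uses the standard library's html.escape() function to encode both the paragraph text and the attribute value. The helper name html_encode and the reduced <input> tag are assumptions made for illustration; they are not part of the recipe's own scripts.

```python
import html

def html_encode(text):
    """Encode <, >, & and quote characters as HTML entities."""
    # quote=True also converts " (and ') so the result is safe
    # inside a double-quoted attribute value.
    return html.escape(text, quote=True)

# Text that should appear literally in the page body
body = html_encode("Paragraphs begin and end with <p> & </p> tags.")

# Text used as the initial value of a form field (hypothetical tag)
value = html_encode('Rich "Goose" Gossage')
field = '<input type="text" value="%s">' % value

print(body)
print(field)
```

Note that html.escape() processes & before the other characters, so the ampersands it introduces in &lt; or &quot; are not themselves re-encoded.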

16.5.4.2 Encoding characters that are special in URLs

URLs for hyperlinks that occur within HTML pages have their own syntax, and their own encoding. This encoding applies to URL-valued attributes within several tags, such as the href attribute of <a>, the src attribute of <img>, and the action attribute of <form>.

Many characters have special meaning within URLs, such as :, /, ?, =, &, and ;. The following URL contains some of these characters:

    http://apache.snake.net/myscript.php?id=428&name=Gandalf

Here the : and / characters segment the URL into components, the ? character indicates that parameters are present, and the & character separates the parameters, each of which is specified as a name=value pair. (The ; character is not present in the URL just shown, but commonly is used instead of & to separate parameters.) If you want to include any of these characters literally within a URL, you must encode them to prevent the browser from interpreting them with their usual special meaning. Other characters such as spaces require special treatment as well. Spaces are not allowed within a URL, so if you want to reference a page named my home page.html on the site apache.snake.net, the URL in the following hyperlink won't work:

    <a href="http://apache.snake.net/my home page.html">My Home Page</a>

URL-encoding for special and reserved characters is performed by converting each such character to % followed by two hexadecimal digits representing the character's ASCII code. For example, the ASCII value of the space character is 32 decimal, or 20 hexadecimal, so you'd write the preceding hyperlink like this:

    <a href="http://apache.snake.net/my%20home%20page.html">My Home Page</a>

Sometimes you'll see spaces encoded as + in URLs. This too is legal.
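The percent-encoding rule can be checked with a short Python 3 sketch. It relies on quote(), quote_plus(), and unquote() from the standard urllib.parse module; the variable names are illustrative only.

```python
from urllib.parse import quote, quote_plus, unquote

page = "my home page.html"

# A space is ASCII 32 decimal, which is 20 in hexadecimal
space_code = "%%%02X" % ord(" ")

# quote( ) writes each space as %20
url = "http://apache.snake.net/" + quote(page)

# quote_plus( ) writes each space as + instead
plus_form = quote_plus(page)

# Decoding recovers the original page name
decoded = unquote(quote(page))

print(space_code, url, plus_form, decoded)
```

The + form produced by quote_plus() is intended for parameter values in the query part of a URL, so quote() is the safer choice for the path portion shown here.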

16.5.4.3 Encoding interactions

Be sure to encode information properly for the context in which you're using it. Suppose you want to create a hyperlink to trigger a search for items matching a search term, and you want the term itself to appear as the link label that is displayed in the page. In this case, the term appears as a parameter in the URL, and also as HTML text between the <a> and </a> tags. If the search term is "cats & dogs", the unencoded hyperlink construct looks like this:

    <a href="/cgi-bin/mysearch.pl?phrase=cats & dogs">cats & dogs</a>

That is incorrect because & is special in both contexts and the spaces are special in the URL. The link should be written like this instead:

    <a href="/cgi-bin/mysearch.pl?phrase=cats%20%26%20dogs">cats &amp; dogs</a>

Here, & is HTML-encoded as &amp; for the link label, and is URL-encoded as %26 for the URL, which also includes spaces encoded as %20.
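To make the two contexts concrete, here is a hedged Python 3 sketch that builds one hyperlink from a search term. The function name search_link is hypothetical, and the script path and phrase parameter name are borrowed from the Perl example later in this section purely for illustration.

```python
import html
from urllib.parse import quote

def search_link(term, script="/cgi-bin/mysearch.pl"):
    """Build a hyperlink that uses term both in the URL and as the label."""
    # URL context: percent-encode the term (safe="" leaves nothing unencoded
    # except letters, digits, and a few punctuation characters)
    url = "%s?phrase=%s" % (script, quote(term, safe=""))
    # HTML context: the finished URL sits inside a double-quoted attribute,
    # and the label is ordinary HTML text, so both are HTML-encoded
    href = html.escape(url, quote=True)
    label = html.escape(term)
    return '<a href="%s">%s</a>' % (href, label)

link = search_link("cats & dogs")
print(link)
```

The URL is percent-encoded first and HTML-encoded second. Doing the steps in the other order would percent-encode the & of &amp; and corrupt the parameter value.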

Granted, it's a pain to encode text before writing it to a web page, and sometimes you know enough about a value that you can skip the encoding (see the sidebar "Do You Always Need to Encode Web Page Output?"). But encoding is the safe thing to do most of the time. Fortunately, most APIs provide functions to do the work for you. This means you need not know every character that is special in a given context. You just need to know which kind of encoding to perform, and call the appropriate function to produce the intended result.

Do You Always Need to Encode Web Page Output?

If you know a value is legal in a particular context within a web page, you need not encode it. For example, if you obtain a value from an integer-valued column in a database table that cannot be NULL, it must necessarily be an integer. No HTML- or URL-encoding is needed to include the value in a web page, because digits are not special in HTML text or within URLs. On the other hand, suppose you solicit an integer value using a field in a web form. You might be expecting the user to provide an integer, but the user might be confused and enter an illegal value. You could handle this by displaying an error page that shows the value and explains that it's not an integer. But if the value contains special characters and you don't encode it, the page won't display the value properly, possibly confusing the user further.
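The form-field case from the sidebar might be handled along the following lines in Python 3. This is only a sketch: the function name describe_quantity and the wording of the messages are assumptions, not anything taken from the recipe's scripts.

```python
import html

def describe_quantity(raw):
    """Return an HTML fragment for a form value that should be an integer."""
    try:
        n = int(raw)
    except ValueError:
        # The value came from the user and may contain anything,
        # so encode it before showing it in the error message.
        return "<p>Not an integer: %s</p>" % html.escape(raw)
    # A validated integer consists only of digits (and perhaps a sign),
    # none of which is special in HTML, so no encoding is needed.
    return "<p>Quantity: %d</p>" % n

print(describe_quantity("12"))
print(describe_quantity("<b>3</b>"))
```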

16.5.5 Encoding Special Characters Using Web APIs

The following encoding examples show how to pull values out of MySQL and perform both HTML-encoding and URL-encoding on them to generate hyperlinks. Each example reads a table named phrase that contains short phrases, using its contents to construct hyperlinks that point to a (hypothetical) script that searches for instances of the phrases in some other table. The table looks like this:

    mysql> SELECT phrase_val FROM phrase ORDER BY phrase_val;
    +--------------------------+
    | phrase_val               |
    +--------------------------+
    | are we "there" yet?      |
    | cats & dogs              |
    | rhinoceros               |
    | the whole > sum of parts |
    +--------------------------+

The goal here is to generate a list of hyperlinks using each phrase both as the hyperlink label (which requires HTML-encoding) and in the URL as a parameter to the search script (which requires URL-encoding). The resulting links look like this:

    <a href="/cgi-bin/mysearch.pl?phrase=are%20we%20%22there%22%20yet%3F">are we &quot;there&quot; yet?</a>
    <a href="/cgi-bin/mysearch.pl?phrase=cats%20%26%20dogs">cats &amp; dogs</a>
    <a href="/cgi-bin/mysearch.pl?phrase=rhinoceros">rhinoceros</a>
    <a href="/cgi-bin/mysearch.pl?phrase=the%20whole%20%3E%20sum%20of%20parts">the whole &gt; sum of parts</a>

The links produced by some APIs will look slightly different, because they encode spaces as + rather than as %20.

16.5.5.1 Perl

The Perl CGI.pm module provides two methods, escapeHTML( ) and escape( ), that handle HTML-encoding and URL-encoding. There are three ways to use these methods to encode a string $str:

• Invoke escapeHTML( ) and escape( ) as CGI class methods using a CGI:: prefix:

    use CGI;
    printf "%s\n%s\n", CGI::escape ($str), CGI::escapeHTML ($str);

• Create a CGI object and invoke escapeHTML( ) and escape( ) as object methods:

    use CGI;
    my $cgi = new CGI;
    printf "%s\n%s\n", $cgi->escape ($str), $cgi->escapeHTML ($str);

• Import the names explicitly into your script's namespace. In this case, neither a CGI object nor the CGI:: prefix is necessary and you can invoke the methods as standalone functions. The following example imports the two method names in addition to the set of standard names:

    use CGI qw(:standard escape escapeHTML);
    printf "%s\n%s\n", escape ($str), escapeHTML ($str);

I prefer the last alternative because it is consistent with the CGI.pm function call interface that you use for other imported method names. Just remember to include the encoding method names in the use CGI statement for any Perl script that requires them, or you'll get "undefined subroutine" errors when the script executes.

The following code reads the contents of the phrase table and produces hyperlinks from them using escapeHTML( ) and escape( ):

    my $query = "SELECT phrase_val FROM phrase ORDER BY phrase_val";
    my $sth = $dbh->prepare ($query);
    $sth->execute ( );
    while (my ($phrase) = $sth->fetchrow_array ( ))
    {
        # URL-encode the phrase value for use in the URL
        # HTML-encode the phrase value for use in the link label
        my $url = "/cgi-bin/mysearch.pl?phrase=" . escape ($phrase);
        my $label = escapeHTML ($phrase);
        print a ({-href => $url}, $label) . br ( ) . "\n";
    }

16.5.5.2 PHP

In PHP, the htmlspecialchars( ) and urlencode( ) functions perform HTML-encoding and URL-encoding. They're used as follows:

    $query = "SELECT phrase_val FROM phrase ORDER BY phrase_val";
    $result_id = mysql_query ($query, $conn_id);
    if ($result_id)
    {
        while (list ($phrase) = mysql_fetch_row ($result_id))
        {
            # URL-encode the phrase value for use in the URL
            # HTML-encode the phrase value for use in the link label
            $url = "/mcb/mysearch.php?phrase=" . urlencode ($phrase);
            $label = htmlspecialchars ($phrase);
            printf ("<a href=\"%s\">%s</a><br />\n", $url, $label);
        }
        mysql_free_result ($result_id);
    }

16.5.5.3 Python

In Python, the cgi and urllib modules contain the relevant encoding methods. cgi.escape( ) performs HTML-encoding and urllib.quote( ) does URL-encoding:

    import cgi
    import urllib

    query = "SELECT phrase_val FROM phrase ORDER BY phrase_val"
    cursor = conn.cursor ( )
    cursor.execute (query)
    for (phrase,) in cursor.fetchall ( ):
        # URL-encode the phrase value for use in the URL
        # HTML-encode the phrase value for use in the link label
        url = "/cgi-bin/mysearch.py?phrase=" + urllib.quote (phrase)
        label = cgi.escape (phrase, 1)
        print "<a href=\"%s\">%s</a><br />" % (url, label)
    cursor.close ( )

The first argument to cgi.escape( ) is the string to be HTML-encoded. By default, this function converts <, >, and & characters to their corresponding HTML entities. To tell cgi.escape( ) also to convert double quotes to the &quot; entity, pass a second argument of 1, as shown in the example. This is especially important if you're encoding values to be placed into a double-quoted tag attribute.
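The listing above is written for the Python of the book's era. In Python 3, cgi.escape( ) is no longer available (it was removed in Python 3.8) and urllib.quote( ) has moved to urllib.parse.quote( ). The sketch below shows what I believe are the closest standard-library equivalents, applied to one phrase from the table. One difference to be aware of is that html.escape( ) converts quote characters by default, whereas cgi.escape( ) did so only on request.

```python
import html
from urllib.parse import quote, quote_plus

phrase = 'are we "there" yet?'

# Like cgi.escape (phrase): only <, >, and & are converted
text_only = html.escape(phrase, quote=False)

# Like cgi.escape (phrase, 1): double quotes are converted too
attr_safe = html.escape(phrase, quote=True)

# Like urllib.quote (phrase): spaces become %20
pct = quote(phrase)

# Like urllib.quote_plus (phrase): spaces become +
plus = quote_plus(phrase)

print(text_only, attr_safe, pct, plus, sep="\n")
```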

16.5.5.4 Java

The <c:out> JSTL tag automatically performs HTML-encoding for JSP pages. (Strictly speaking, it performs XML-encoding, but the set of characters affected is <, >, &, ", and ', which includes all those needed for HTML-encoding.) By using <c:out> to display text in a web page, you need not even think about converting special characters to HTML entities. If for some reason you want to suppress encoding, invoke <c:out> with an escapeXml attribute value of false.

To URL-encode parameters for inclusion in a URL, use the <c:url> tag. Specify the URL string in the tag's value attribute, and include any parameter values and names in <c:param> tags in the body of the <c:url> tag. A parameter value can be given either in the value attribute of a <c:param> tag or in its body. Here's an example that shows both ways:

    <c:url var="urlStr" value="...">
        <c:param name="id" value="..." />
        <c:param name="color">sky blue</c:param>
    </c:url>

This will URL-encode the values of the id and color parameters and add them to the end of the URL. The result is placed in an object named urlStr, which you can display as follows:

    <c:out value="${urlStr}" />

The <c:url> tag does not encode special characters such as spaces in the string supplied in its value attribute. You must encode them yourself, so it's probably best just to avoid creating pages with spaces in their names, to avoid the likelihood that you'll need to refer to them.

The <c:out> and <c:url> tags can be used as follows to display entries from the phrase table:

    SELECT phrase_val FROM phrase ORDER BY phrase_val








Chapter 17. Incorporating Query Results into Web Pages

Section 17.1. Introduction
Section 17.2. Displaying Query Results as Paragraph Text
Section 17.3. Displaying Query Results as Lists
Section 17.4. Displaying Query Results as Tables
Section 17.5. Displaying Query Results as Hyperlinks
Section 17.6. Creating a Navigation Index from Database Content
Section 17.7. Storing Images or Other Binary Data
Section 17.8. Retrieving Images or Other Binary Data
Section 17.9. Serving Banner Ads
Section 17.10. Serving Query Results for Download

17.1 Introduction

When you store information in your database, you can easily retrieve it for use on the Web in a variety of ways. Query results can be displayed as unstructured paragraphs or as structured elements such as lists or tables; you can display static text or create hyperlinks. Query metadata can be useful when formatting query results, too, such as when generating an HTML table that displays a result set and uses its metadata to get the column headings for the table. These tasks combine query processing with web scripting, and are primarily a matter of properly encoding any special characters in the results (like & or

Adding information to the web.xml file is a matter of placing new elements between the <web-app> and </web-app> tags. As a simple illustration, you can add a <welcome-file-list> element to specify a list of files that Tomcat should look for when clients send a request URL that ends with myapp and no specific page. Whichever file Tomcat finds first becomes the default page that is sent to the client. For example, to specify that Tomcat should consider page3.jsp and index.html to be valid default pages, create a web.xml file in the WEB-INF directory that looks like this:

    <web-app>
        <welcome-file-list>
            <welcome-file>page3.jsp</welcome-file>
            <welcome-file>index.html</welcome-file>
        </welcome-file-list>
    </web-app>

Restart Tomcat so it reads the new application configuration information, then issue a request that specifies no explicit page:

    http://tomcat.snake.net:8080/myapp/

The myapp directory contains a page named page3.jsp, which is listed as one of the default pages in the web.xml file, so Tomcat should execute page3.jsp and send the result to your browser.

B.4 Elements of JSP Pages

An earlier section of this appendix described some general characteristics of JSP pages. This section discusses in more detail the kinds of constructs you can use. JSP pages are templates that contain static parts and dynamic parts:

• Literal text in a JSP page that is not enclosed within special markers is static; it's sent to the client without change. The JSP examples in this book produce HTML pages, so the static parts of JSP scripts are written in HTML. But you can also write pages that produce other forms of output, such as plain text, XML, or WML.

• The non-static (dynamic) parts of JSP pages consist of code to be evaluated. The code is distinguished from static text by being enclosed within special markers. Some markers indicate page-processing directives or scriptlets. A directive gives the JSP engine information about how to process the page, whereas a scriptlet is a miniprogram that is evaluated and replaced by whatever output it produces. Other markers take the form of tags written as XML elements; they are associated with classes that act as tag handlers to perform the desired actions.

The following sections discuss the various types of dynamic elements that JSP pages can contain.

B.4.1 Scripting Elements

Several sets of scripting markers allow you to embed Java code or comments in a JSP page:

<% ... %>
The <% and %> markers indicate a scriptlet, that is, embedded Java code. The following scriptlet invokes print( ) to write a value to the output page:

    <% out.print (1+2); %>

<%= ... %>
These markers indicate an expression to be evaluated. The result is added to the output page, which makes it easy to display values with no explicit print statement. For example, these two constructs both display the value 3, but the second is easier to write:

    <% out.print (1+2); %>
    <%= 1+2 %>

<%! ... %>
The <%! and %> markers allow class variables and methods to be declared.

<%-- ... --%>
Text within these markers is treated as a comment and ignored. JSP comments disappear entirely and do not appear in the output that is returned to the client. If you're writing a JSP page that produces HTML and you want the comment to appear in the final output page, use an HTML comment (<!-- ... -->) instead.

When a JSP page is translated into a servlet, all scripting elements effectively become part of the same servlet. This means that a variable declared in one element can be used by other elements later in the page. It also means that if you declare a given variable in two elements, the resulting servlet is illegal and an error will occur.

The <% %> and <%! %> markers both can be used to declare variables, but differ in their effect. A variable declared within <% %> is an object (or instance) variable; it is initialized each time that the page is requested. A variable declared within <%! %> is a class variable, initialized only at the beginning of the life of the page. Consider the following JSP page, counter.jsp, which declares counter1 as an object variable and counter2 as a class variable:

    <% int counter1 = 0; %>
    <%! int counter2 = 0; %>
    <% counter1 = counter1 + 1; %>
    <% counter2 = counter2 + 1; %>
    <p>Counter 1 is <%= counter1 %></p>
    <p>Counter 2 is <%= counter2 %></p>

If you install the page and request it several times, the value of counter1 will be 1 for every request. The value of counter2 increments across successive requests (even if different clients request the page), until Tomcat is restarted.

In addition to variables that you declare yourself, JSP pages have access to a number of objects that are declared for you implicitly. These are discussed in "Implicit JSP Objects."
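The difference in lifetime between the two counters can be imitated outside JSP. The following Python 3 sketch is only an analogy, not a description of how Tomcat is implemented: the class CounterPage and its service( ) method are hypothetical stand-ins for the servlet generated from counter.jsp and for the handling of one request.

```python
class CounterPage:
    # Declared once for the life of the page, like a <%! %> declaration
    counter2 = 0

    def service(self):
        """Handle one request and return the text the page would display."""
        # Declared inside the request handler, like a <% %> scriptlet
        # variable, so it starts from 0 on every request
        counter1 = 0
        counter1 = counter1 + 1
        CounterPage.counter2 = CounterPage.counter2 + 1
        return "Counter 1 is %d, Counter 2 is %d" % (counter1, CounterPage.counter2)

page = CounterPage()
results = [page.service() for _ in range(3)]
for line in results:
    print(line)
```

Each call reports 1 for the first counter, while the second counter keeps climbing until the process that holds the class is restarted.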

B.4.2 JSP Directives

The <%@ and %> markers indicate a JSP directive that provides the JSP processor with information about the kind of output the page produces, the classes or tag libraries it requires, and so forth.

<%@ page ... %>
page directives provide several kinds of information, which are indicated by one or more attribute="value" pairs following the page keyword. The following directive specifies that the page scripting language is Java and that it produces an output page with a content type of text/html:

    <%@ page language="java" contentType="text/html" %>

This particular directive need not actually be specified at all, because java and text/html are the default values for their respective attributes.

If a JSP page produces non-HTML output, be sure to override the default content type. For example, if a page produces plain text, use this directive:

    <%@ page contentType="text/plain" %>

An import attribute causes Java classes to be imported. In a regular Java program, you would do this using an import statement. In a JSP page, use a page directive instead:

    <%@ page import="java.util.Date" %>
    <p>The date is <%= new Date ( ) %>.</p>

If you refer to a particular class only once, it may be more convenient to omit the directive and just refer to the class by its full name when you use it:

    <p>The date is <%= new java.util.Date ( ) %>.</p>

<%@ include ... %>
The include directive inserts the contents of a file into the page translation process. That is, the directive is replaced by the contents of the included file, which is then translated itself. The following directive causes inclusion of a file named mysetup-stuff.inc from the application's WEB-INF directory:

    <%@ include file="/WEB-INF/mysetup-stuff.inc" %>

A leading / indicates a filename relative to the application directory (a context-relative path). No leading / means the file is relative to the location of the page containing the include directive.

Include files allow content (either static or dynamic) to be shared easily among a set of JSP pages. For example, you can use them to provide standard headers or footers for a set of JSP pages, or to execute code for common operations such as setting up a connection to a database server.

<%@ taglib ... %>
A taglib directive indicates that the page uses custom actions from a given tag library. The directive includes attributes that tell the JSP engine how to locate the TLD file for the library and also the name you'll use in the rest of the page to signify tags from the library. For example, a page that uses the core and database-access tags from JSTL might include the following taglib directives:

    <%@ taglib uri="http://java.sun.com/jstl/core" prefix="c" %>
    <%@ taglib uri="http://java.sun.com/jstl/sql" prefix="sql" %>

The uri (Uniform Resource Identifier) attribute uniquely identifies the tag library so that the JSP engine can find its TLD file. The TLD defines the behavior (the interface) of the actions so that the JSP processor can make sure the page uses the library's tags correctly. A common convention for constructing unique uri values is to use a string that includes the host from which the tag library originates. That makes the uri value look like a URL, but it's just an identifier; it doesn't mean that the JSP engine actually goes to that host to fetch the descriptor file. The rules for interpreting the uri value are described in "Using a Tag Library."

The prefix attribute indicates how tags from the library will be invoked. The directives just shown indicate that core and database tags will have the forms <c:xxx> and <sql:xxx>. For example, you can use the out tag from the core library as follows to display a value:

    <c:out value="..." />

Or you might issue a query with the database query tag like this:

    <sql:query var="..." dataSource="...">SHOW TABLES</sql:query>

B.4.3 Action Elements

Action element tags can refer to standard (predefined) JSP actions, or to custom actions in a tag library. Tag names include a prefix and a specific action:

• Tag names with a jsp prefix indicate predefined action elements. For example, <jsp:forward> forwards the current request to another page. This action is available to any page run under a standard JSP processor.

• Custom actions are implemented by tag libraries. The prefix of the tag name must match the prefix attribute of a taglib directive that appears earlier in the page, so that the JSP processor can determine which library the tag is part of. To use custom tags, the library must be installed first. See "Using a Tag Library."

Actions are written as XML elements within a JSP page, and their syntax follows normal XML rules. An element with a body is written with separate opening and closing tags:

    <c:if test="${x == 0}">x is zero</c:if>

If the tag has no body, the opening and closing tags can be combined:

    <jsp:forward page="..." />

B.4.4 Using a Tag Library

Suppose that you have a tag library consisting of a JAR file mytags.jar and a tag library descriptor file mytags.tld. To make the library available to the JSP pages in a given application, both files must be installed. Typically, you'd put the JAR file in the application's WEB-INF/lib directory and the TLD file in the WEB-INF directory.

A JSP page that uses the tag library must include an appropriate taglib directive before using any of the actions that the library provides:

    <%@ taglib uri="..." prefix="..." %>

The prefix attribute tells Tomcat how you'll refer to tags from the library in the rest of the JSP page. If you use a prefix value of mytags, you can refer to tags later in the page like this:

    <mytags:sometag>tag body</mytags:sometag>

The prefix value is a name of your own choosing, but you must use it consistently throughout the page, and you cannot use the same value for two different tag libraries.

The uri attribute tells the JSP processor how to find the tag library's TLD file. The value can be either direct or indirect:

• You can specify the uri value directly as the pathname to the TLD file, which typically will be installed in the WEB-INF directory:

    <%@ taglib uri="/WEB-INF/mytags.tld" prefix="mytags" %>

A leading / indicates a filename relative to the application directory (a context-relative path). No leading / means the file is relative to the location of the page containing the taglib directive.

If an application uses lots of tag libraries, a common convention for keeping TLD files from cluttering up the WEB-INF directory is to put them in a tld subdirectory of the WEB-INF directory. In that case, the uri value would be written like this instead:

    <%@ taglib uri="/WEB-INF/tld/mytags.tld" prefix="mytags" %>

The disadvantage of specifying a TLD file pathname directly is that if a new version of the tag library is released and the TLD file has a different name, you'll need to modify the taglib directive in every JSP page that refers to the file.

• Another way to specify the location of the TLD file is by using the pathname to the tag library JAR file:

    <%@ taglib uri="/WEB-INF/lib/mytags.jar" prefix="mytags" %>

The JSP processor can find the TLD file this way, provided a copy of it is included in the JAR file as META-INF/taglib.tld. However, this method suffers the same problem as specifying the TLD filename directly: if a new version of the library comes out with a different JAR file pathname, you must update taglib directives in individual JSP pages. It also doesn't work for containers that can't find TLD files in JAR files. (Older versions of Tomcat have this problem, for example.)

• A third way to specify the location of the TLD file is to do so indirectly. Assign a symbolic name to the library and add a <taglib> entry to the application's web.xml file that maps the symbolic name to the pathname of the TLD file. Then refer to the symbolic name in your JSP pages. Suppose you define the symbolic name for the mytags tag library as:

    http://terrific-tags.com/mytags

The <taglib> entry in web.xml should list the symbolic name and provide the path to the corresponding TLD file. If the file is installed in the WEB-INF directory, write the entry like this:

    <taglib>
        <taglib-uri>http://terrific-tags.com/mytags</taglib-uri>
        <taglib-location>/WEB-INF/mytags.tld</taglib-location>
    </taglib>

If the file is installed in WEB-INF/tld instead, write the entry like this:

    <taglib>
        <taglib-uri>http://terrific-tags.com/mytags</taglib-uri>
        <taglib-location>/WEB-INF/tld/mytags.tld</taglib-location>
    </taglib>

Either way, you refer to the tag library in JSP pages using the symbolic name, like this:

    <%@ taglib uri="http://terrific-tags.com/mytags" prefix="mytags" %>

Using a symbolic TLD name involves a level of indirection, but has a significant advantage in that it provides a more stable means by which to refer to the tag library in JSP pages. You specify the actual location of the TLD file only in web.xml, rather than in individual JSP pages. If a new version of the tag library comes out and the TLD file has a different name, just change the <taglib-location> value in web.xml and restart Tomcat to allow your JSP pages to use the new library. There's no need to change any of the JSP pages.

B.4.5 Implicit JSP Objects

When a servlet runs, the servlet container passes it two arguments representing the request and the response, but you must declare other objects yourself. For example, you can use the response argument to obtain an output-writing object like this:

    PrintWriter out = response.getWriter ( );

A convenience that JSP provides in comparison to servlet writing is a set of implicit objects, that is, standard objects that are provided as part of the JSP execution environment. You can refer to any of these objects without explicitly declaring them. Thus, in a JSP page, the out object can be treated as having already been set up and made available for use. Some of the more useful implicit objects are:

pageContext
An object that provides the environment for the page.

request
An object that contains information about the request received from the client, such as the parameters submitted in a form.

response
The response being constructed for transmission to the client. You can use it to specify response headers, for example.

out
The output object. Writing to this object through methods such as print( ) or println( ) adds text to the response page.

session
Tomcat provides access to a session that can be used to carry information from request to request. This allows you to write applications that interact with the user in what seems to the user as a cohesive series of events. Sessions are described more fully in Chapter 19.

application
This object provides access to information that is shared on an application-wide basis.

B.4.6 Levels of Scope in JSP Pages

JSP pages have access to several scope levels, which can be used to store information that varies in how widely available it is. The scope levels are:

Page scope
Information that is available only to the current JSP page.

Request scope
Information that is available to any of the JSP pages or servlets that are servicing the current client request. It's possible for one page to invoke another during request processing; placing information in request scope allows such pages to communicate with each other.

Session scope
Information that is available to any page servicing a request that is part of a given session. Session scope can span multiple requests from a given client.

Application scope
Information that is available to any page that is part of the application context. Application scope can span multiple requests, sessions, or clients.

One context knows nothing about other contexts, but pages served from within the same context can share information with each other by registering attributes (objects) in one of the scopes that are higher than page scope.

To move information into or out of a given scope, use the setAttribute( ) or getAttribute( ) methods of the implicit object corresponding to that scope (pageContext, request, session, or application). For example, to place a string value tomcat.snake.net into request scope as an attribute named myhost, use the request object:

    request.setAttribute ("myhost", "tomcat.snake.net");

setAttribute( ) stores the value as an Object. To retrieve the value later, fetch it by name using getAttribute( ) and then coerce it back to string form:

    Object obj;
    String host;
    obj = request.getAttribute ("myhost");
    host = obj.toString ( );

When used with the pageContext object, setAttribute( ) and getAttribute( ) default to page scope. Alternatively, they can be invoked with an additional parameter of PAGE_SCOPE, REQUEST_SCOPE, SESSION_SCOPE, or APPLICATION_SCOPE to specify a scope level explicitly. The following statements have the same effect as those just shown:

    pageContext.setAttribute ("myhost", "tomcat.snake.net", pageContext.REQUEST_SCOPE);
    obj = pageContext.getAttribute ("myhost", pageContext.REQUEST_SCOPE);
    host = obj.toString ( );
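The way pageContext defaults to page scope unless told otherwise can be modeled in a few lines. The Python 3 sketch below is a toy model only: the class ScopeSketch, its method names, and the numeric values chosen for the scope constants are all assumptions for illustration and do not reproduce the real JSP classes.

```python
# Hypothetical scope identifiers; the values are arbitrary for this sketch
PAGE_SCOPE, REQUEST_SCOPE, SESSION_SCOPE, APPLICATION_SCOPE = 1, 2, 3, 4

class ScopeSketch:
    """Toy stand-in for pageContext: one dictionary per scope level."""

    def __init__(self):
        levels = (PAGE_SCOPE, REQUEST_SCOPE, SESSION_SCOPE, APPLICATION_SCOPE)
        self._scopes = {level: {} for level in levels}

    def set_attribute(self, name, value, scope=PAGE_SCOPE):
        # With no explicit scope, the attribute lands in page scope
        self._scopes[scope][name] = value

    def get_attribute(self, name, scope=PAGE_SCOPE):
        # Returns None when the name is not registered in that scope
        return self._scopes[scope].get(name)

ctx = ScopeSketch()
ctx.set_attribute("myhost", "tomcat.snake.net", REQUEST_SCOPE)

host = str(ctx.get_attribute("myhost", REQUEST_SCOPE))
missing = ctx.get_attribute("myhost")   # looks in page scope only

print(host, missing)
```

The second lookup finds nothing because the attribute was stored in request scope, which is the practical reason to name the scope explicitly whenever you store a value anywhere other than page scope.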

Appendix C. References

This appendix lists some references that you should find helpful if you want more information about topics discussed in this book.

C.1 MySQL Resources

Michael Widenius, David Axmark, and MySQL AB. MySQL Reference Manual. O'Reilly & Associates.
Paul DuBois. MySQL. New Riders.

A FAQ for MySQL is available at http://www.bitbybit.dk/mysqlfaq/. This site also provides a useful index of changes and updates to MySQL, which is helpful for determining whether a feature you're trying to use is present in your version.


C.3 PHP Resources

The primary PHP web site is http://www.php.net/, which provides access to PHP distributions and documentation. The site for PEAR, the PHP Extension and Add-on Repository, is http://pear.php.net/. PEAR includes a database abstraction module.

C.4 Python Resources

The primary Python web site is http://www.python.org/, which provides access to Python distributions and documentation. General documentation for the DB-API database access interface is available at http://www.python.org/topics/database/. Documentation for MySQLdb, the MySQL-specific DB-API driver, is at http://sourceforge.net/projects/mysql-python/. The Vaults of Parnassus serves as a general repository for Python source code: http://www.vex.net/~x/parnassus/.

David M. Beazley. Python Essential Reference. New Riders.

C.5 Java Resources

Sun's Java site provides access to documentation (including the specifications) for JDBC, servlets, JSP, and the JSP Standard Tag Library:

• JDBC general information: http://java.sun.com/products/jdbc/index.html
• JDBC documentation: http://java.sun.com/j2se/1.3/docs/guide/jdbc/index.html
• Java servlets: http://java.sun.com/products/servlet/
• JavaServer Pages: http://java.sun.com/products/jsp/
• JSP Standard Tag Library: http://java.sun.com/products/jsp/jstl/

George Reese. Database Programming with JDBC and Java. O'Reilly & Associates.
David Flanagan. Java Examples in a Nutshell. O'Reilly & Associates.
Hans Bergsten. JavaServer Pages. O'Reilly & Associates.
David Harms. JSP, Servlets, and MySQL. M & T Books.
Simon Brown, et al. Professional JSP. Wrox Press.
Shawn Bayern. JSTL in Action. Manning Publications.

C.6 Apache Resources

Peter Wainwright. Professional Apache. Wrox Press.

C.7 Other Resources

Chuck Musciano and Bill Kennedy. HTML & XHTML: The Definitive Guide. O'Reilly & Associates.
Erik T. Ray. Learning XML. O'Reilly & Associates.
Jeffrey E. F. Friedl. Mastering Regular Expressions. O'Reilly & Associates.